Friendli Inference: How to Serve Large-scale Transformer Models