Faster serving of the 4-bit quantized Llama 2 70B model with fewer GPUs: Friendli Inference vs. vLLM