- April 29, 2024
- 2 min read
Meta Llama 3 now available on Friendli

At FriendliAI, we’re on a mission to democratize access to cutting-edge generative AI models. That’s why we’re thrilled to announce the integration of Meta’s latest Llama 3 large language models (LLMs) into our platform.
Llama 3 represents a breakthrough in open-source LLM performance. This next-generation model family demonstrates state-of-the-art capabilities across a wide range of benchmarks, showcasing improved reasoning, multi-tasking, and few-shot learning abilities.
In line with our commitment to openness, we’ve made the 8 billion and 70 billion parameter versions of Llama 3 available through Friendli. These models unlock new frontiers in language understanding, generation, analysis, and more.
Some key advantages of Llama 3 include:
- Cutting-edge performance rivaling models an order of magnitude larger
- Stronger logical reasoning and multi-step problem-solving skills
- Improved few-shot learning from limited examples
- Robust handling of long contexts and document understanding
Whether you’re a researcher, a developer, or building innovative AI applications, Llama 3 offers a robust new foundation to build on. On Friendli, you can quickly fine-tune the models or leverage them for inference at scale. FriendliAI is trusted by major players in LLMs like Upstage, ScatterLab, TUNiB, and many more. If you want to be a part of it, sign up for our service → Friendli Dedicated Endpoints | Friendli Container.
- To use Llama 3 instantly, sign up for Friendli Serverless Endpoints: Sign up
- Go to Personal Settings > Tokens and create a personal access token by clicking ‘Create new token’.
- Save the generated token value; you’ll need it to authenticate.
- Install the `friendli-client` Python package to use the Python SDK with the Serverless Endpoints for Llama 3: run `pip install friendli-client`.
- Now initialize the Python client instance as follows:
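The snippet below is a minimal sketch. It assumes the `friendli-client` SDK's `Friendli` class and that the personal access token you saved earlier is exported as the `FRIENDLI_TOKEN` environment variable:

```python
import os

from friendli import Friendli

# Authenticate with the personal access token created in
# Personal Settings > Tokens.
client = Friendli(token=os.environ["FRIENDLI_TOKEN"])
```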
- You can create a response from Llama 3 as follows:
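As a sketch, assuming the SDK follows the OpenAI-style chat completions interface and that the serverless model ID is `meta-llama-3-8b-instruct` (check the Friendli model catalog for the exact identifier):

```python
# Request a chat completion from Llama 3 on the serverless endpoint.
chat_completion = client.chat.completions.create(
    model="meta-llama-3-8b-instruct",  # assumed model ID
    messages=[
        {"role": "user", "content": "Explain FP8 quantization in one paragraph."},
    ],
)

print(chat_completion.choices[0].message.content)
```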
How Llama 3 performs on Friendli:
Here is a small snippet of Llama 3, quantized to FP8 by FriendliAI, which is now available to run on Friendli.
You can also download the FP8 Meta Llama 3 checkpoints from the Hugging Face Hub: https://huggingface.co/FriendliAI
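If you want to pull one of those checkpoints locally, here is a minimal sketch using `huggingface_hub`; the repo ID below is illustrative, so browse https://huggingface.co/FriendliAI for the actual FP8 repository names:

```python
from huggingface_hub import snapshot_download

# Hypothetical repo ID shown for illustration; substitute the real
# FP8 checkpoint name from the FriendliAI organization page.
local_dir = snapshot_download(repo_id="FriendliAI/Meta-Llama-3-8B-Instruct-fp8")
print(f"Checkpoint files downloaded to {local_dir}")
```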
Friendli Inference:
Our LLM inference serving engine is the fastest on the market. It is built for serving LLMs, and more broadly generative AI models, with low latency and high throughput. Friendli Inference delivers this performance through optimizations that cover a wide range of use cases.
3 ways to use Llama 3 with Friendli Suite:
Friendli Suite offers three ways to leverage the power of Friendli Inference. Whether you want to run your LLMs on the cloud or on-premises, Friendli has got you covered.
- Friendli Dedicated Endpoints: Run your generative AI models on dedicated GPUs, conveniently on autopilot.
- Friendli Container: Deploy and serve your models in your GPU environment, whether in the cloud or on-premises, for complete control.
- Friendli Serverless Endpoints: Start instantly with open-source models through our user-friendly API, at the lowest cost on the market.
We’re excited to put this exceptional AI technology into the hands of our community and can’t wait to see what you create. The future of open and capable generative AI is here. Start building today on Friendli!
Check out our YouTube channel to see more model performance demos with FriendliAI!
Written by
FriendliAI Tech & Research
General FAQ
What is FriendliAI?
FriendliAI is a GPU-inference platform that lets you deploy, scale, and monitor large language and multimodal models in production, without owning or managing GPU infrastructure. We offer three things for your AI models: unmatched speed, cost efficiency, and operational simplicity. Find out which product is the best fit for you here.
How does FriendliAI help my business?
Our Friendli Inference engine allows you to squeeze more tokens per second out of every GPU. Because you need fewer GPUs to serve the same load, the metric that truly matters, tokens per dollar, comes out higher even if the hourly GPU rate looks similar on paper. View pricing
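For intuition, here is a back-of-the-envelope comparison; every number below is hypothetical and purely illustrative:

```python
# Hypothetical figures for illustration only.
baseline_tps = 1_000   # tokens/sec per GPU on a stock serving stack
faster_tps = 2_500     # tokens/sec per GPU on a more efficient engine
gpu_hourly_usd = 4.0   # same hardware, same hourly rate

def tokens_per_dollar(tokens_per_sec: float, hourly_rate: float) -> float:
    """Tokens served per dollar of GPU time."""
    return tokens_per_sec * 3600 / hourly_rate

print(tokens_per_dollar(baseline_tps, gpu_hourly_usd))  # 900,000 tokens/$
print(tokens_per_dollar(faster_tps, gpu_hourly_usd))    # 2,250,000 tokens/$
```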
Which models and modalities are supported?
Over 380,000 text, vision, audio, and multi-modal models are deployable out of the box. You can also upload custom models or LoRA adapters. Explore models
Can I deploy models from Hugging Face directly?
Yes. Selecting “Friendli Endpoints” on the Hugging Face Hub takes you to our model deployment page with a one-click deploy. The page provides an easy-to-use interface for setting up Friendli Dedicated Endpoints, a managed service for generative AI inference. Learn more about our Hugging Face partnership
Still have questions?
If you want a customized solution for the key issue slowing your growth, email contact@friendli.ai or click Contact Sales; our experts (not a bot) will reply within one business day.