- April 29, 2024
- 2 min read
Meta Llama 3 now available on Friendli
At FriendliAI, we’re on a mission to democratize access to cutting-edge generative AI models. That’s why we’re thrilled to announce the integration of Meta’s latest Llama 3 large language models (LLMs) into our platform.
Llama 3 represents a breakthrough in open-source LLM performance. This next-generation model family demonstrates state-of-the-art capabilities across a wide range of benchmarks, showcasing improved reasoning, multi-tasking, and few-shot learning abilities.
In line with our commitment to openness, we’ve made the 8 billion and 70 billion parameter versions of Llama 3 available through Friendli. These models unlock new frontiers in language understanding, generation, analysis, and more.
Some key advantages of Llama 3 include:
- Cutting-edge performance rivaling models an order of magnitude larger
- Stronger logical reasoning and multi-step problem-solving skills
- Improved few-shot learning from limited examples
- Robust handling of long contexts and document understanding
Whether you’re a researcher, a developer, or building innovative AI applications, Llama 3 offers a robust new foundation to build on. On Friendli, you can quickly fine-tune the models or leverage them for inference at scale. FriendliAI is trusted by major players in LLMs such as Upstage, ScatterLab, TUNiB, and many more. If you want to be a part of it, sign up for our service → Friendli Dedicated Endpoints | Friendli Container.
- To use Llama 3 instantly, sign up to access Friendli Serverless Endpoints: Sign up
- Go to Personal Settings > Tokens and create a personal access token by clicking ‘Create new token’.
- Save the token value somewhere safe; you’ll need it to authenticate the client.
- Install the `friendli-client` Python package to interact with the Serverless Endpoint for Llama through the Python SDK. Run `pip install friendli-client`.
- Now initialize the Python client instance as follows:
```python
from friendli import Friendli

client = Friendli(token="YOUR PERSONAL ACCESS TOKEN")
```
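Rather than hard-coding the token in source, you can keep it in an environment variable and read it at startup. This is a minimal sketch of that pattern; the variable name `FRIENDLI_TOKEN` and the `read_token` helper are illustrative, not part of the SDK (check the Friendli docs for any built-in convention):

```python
import os

def read_token(env_var="FRIENDLI_TOKEN"):
    """Read the personal access token from the environment,
    failing loudly if it has not been set."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set {env_var} before creating the client")
    return token

# Demo only: in practice you would `export FRIENDLI_TOKEN=...` in your shell.
os.environ["FRIENDLI_TOKEN"] = "dummy-token-for-demo"
print(read_token())
```

You would then pass the result to the client constructor, e.g. `Friendli(token=read_token())`.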
- You can create a response from Llama 3 as follows:
```python
chat_completion = client.chat.completions.create(
    model="meta-llama-3-70b-instruct",
    messages=[
        {
            "role": "user",
            "content": "Tell me how to make a delicious pancake",
        }
    ],
    stream=False,
)
print(chat_completion.choices[0].message.content)
```
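For chat UIs you will usually want streaming instead of a single blocking response. Assuming the SDK follows the common OpenAI-style streaming interface (pass `stream=True` and iterate over chunks exposing `chunk.choices[0].delta.content`), the consumption loop looks like the sketch below. The `fake_stream` generator is a hypothetical stand-in for the real response object so the snippet runs without a token:

```python
from dataclasses import dataclass

# Stand-in objects that mimic the shape of streamed chat chunks
# (chunk.choices[0].delta.content); the real objects come from the SDK.
@dataclass
class _Delta:
    content: str

@dataclass
class _Choice:
    delta: _Delta

@dataclass
class _Chunk:
    choices: list

def fake_stream():
    # Emulates: client.chat.completions.create(..., stream=True)
    for piece in ["Mix the batter", ", heat the pan", ", flip once."]:
        yield _Chunk(choices=[_Choice(delta=_Delta(content=piece))])

def consume(stream):
    """Accumulate streamed token pieces into the full reply text."""
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # some chunks may carry no content
            parts.append(delta)
    return "".join(parts)

print(consume(fake_stream()))
```

In a real application you would print or render each `delta` as it arrives rather than joining at the end.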
How Llama 3 performs on Friendli:
Here is a short demo of Llama 3, quantized to FP8 by FriendliAI, which is now available to run on Friendli.
You can also download the FP8 Meta Llama 3 checkpoints from the Hugging Face Hub: https://huggingface.co/FriendliAI
Friendli Engine:
Our LLM inference serving engine is the fastest on the market. Built for serving LLMs and, more broadly, generative AI models with low latency and high throughput, Friendli Engine delivers high inference serving performance through optimizations that cover a wide range of use cases.
3 ways to use Llama 3 with Friendli Suite:
Friendli Suite offers three ways to leverage the power of Friendli Engine. Whether you want to run your LLMs on the cloud or on-premises, Friendli has got you covered.
- Friendli Dedicated Endpoints: Run your generative AI models on dedicated GPUs, conveniently on autopilot.
- Friendli Container: Deploy and serve your models in your GPU environment, whether in the cloud or on-premises, for complete control.
- Friendli Serverless Endpoints: Start instantly with open-source models through our user-friendly API, which has the lowest costs in the market.
We’re excited to put this exceptional AI technology into the hands of our community and can’t wait to see what you create. The future of open, capable generative AI is here. Start building today on Friendli!
Check out our YouTube channel to see more model demos from FriendliAI!
Written by
FriendliAI Tech & Research