- April 29, 2024
- 2 min read
Meta Llama 3 now available on Friendli

At FriendliAI, we’re on a mission to democratize access to cutting-edge generative AI models. That’s why we’re thrilled to announce the integration of Meta’s latest Llama 3 large language models (LLMs) into our platform.
Llama 3 represents a breakthrough in open-source LLM performance. This next-generation model family demonstrates state-of-the-art capabilities across a wide range of benchmarks, showcasing improved reasoning, multi-tasking, and few-shot learning abilities.
In line with our commitment to openness, we’ve made the 8 billion and 70 billion parameter versions of Llama 3 available through Friendli. These models unlock new frontiers in language understanding, generation, analysis, and more.
Some key advantages of Llama 3 include:
- Cutting-edge performance rivaling models an order of magnitude larger
- Stronger logical reasoning and multi-step problem-solving skills
- Improved few-shot learning from limited examples
- Robust handling of long contexts and document understanding
Whether you’re a researcher, a developer, or building innovative AI applications, Llama 3 offers a robust new foundation to build on. On Friendli, you can quickly fine-tune the models or leverage them for inference at scale. FriendliAI is trusted by major players in LLMs like Upstage, ScatterLab, TUNiB, and many more. If you want to be a part of it, sign up for our service → Friendli Dedicated Endpoints | Friendli Container.
- To use Llama 3 instantly, sign up for Friendli Serverless Endpoints: Sign up
- Go to Personal Settings > Tokens and create a personal access token by clicking ‘Create new token’.
- Save the generated token value; you’ll need it to authenticate.
- Install the `friendli-client` Python package to use the Python SDK with the Serverless Endpoints for Llama 3: run `pip install friendli-client`.
- Now initialize the Python client instance as follows:
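The snippet below is a minimal sketch. It assumes the `friendli-client` SDK's `Friendli` class and that the personal access token you saved earlier is exported as the `FRIENDLI_TOKEN` environment variable:

```python
import os

from friendli import Friendli

# Authenticate with the personal access token created in
# Personal Settings > Tokens.
client = Friendli(token=os.environ["FRIENDLI_TOKEN"])
```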
- You can create a response from Llama 3 as follows:
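As a sketch, assuming the SDK follows the OpenAI-style chat completions interface and that the serverless model ID is `meta-llama-3-8b-instruct` (check the Friendli model catalog for the exact identifier):

```python
# Request a chat completion from Llama 3 on the serverless endpoint.
chat_completion = client.chat.completions.create(
    model="meta-llama-3-8b-instruct",  # assumed model ID
    messages=[
        {"role": "user", "content": "Explain FP8 quantization in one paragraph."},
    ],
)

print(chat_completion.choices[0].message.content)
```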
How Llama 3 performs on Friendli:
Here is a small snippet of Llama 3, quantized to FP8 by FriendliAI, which is now available to run on Friendli.
You can also download the FP8 Meta Llama 3 checkpoints from the Hugging Face Hub: https://huggingface.co/FriendliAI
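If you want to pull one of those checkpoints locally, here is a minimal sketch using `huggingface_hub`; the repo ID below is illustrative, so browse https://huggingface.co/FriendliAI for the actual FP8 repository names:

```python
from huggingface_hub import snapshot_download

# Hypothetical repo ID shown for illustration; substitute the real
# FP8 checkpoint name from the FriendliAI organization page.
local_dir = snapshot_download(repo_id="FriendliAI/Meta-Llama-3-8B-Instruct-fp8")
print(f"Checkpoint files downloaded to {local_dir}")
```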
Friendli Inference:
Our LLM inference serving engine is the fastest on the market. It is built for serving LLMs, and more broadly generative AI models, with low latency and high throughput. Friendli Inference delivers this performance through optimizations that cover a wide range of use cases.
3 ways to use Llama 3 with Friendli Suite:
Friendli Suite offers three ways to leverage the power of Friendli Inference. Whether you want to run your LLMs on the cloud or on-premises, Friendli has got you covered.
- Friendli Dedicated Endpoints: Run your generative AI models on dedicated GPUs, conveniently on autopilot.
- Friendli Container: Deploy and serve your models in your GPU environment, whether in the cloud or on-premises, for complete control.
- Friendli Serverless Endpoints: Start instantly with open-source models through our user-friendly API, at the lowest cost on the market.
We’re excited to put this exceptional AI technology into the hands of our community and can’t wait to see what you create. The future of open and capable generative AI is here. Start building today on Friendli!
Check out our YouTube channel to see more model performance demos with FriendliAI!
Written by
FriendliAI Tech & Research
General FAQ
What is FriendliAI?
FriendliAI is a GPU-inference platform that lets you deploy, scale, and monitor large language and multimodal models in production, without owning or managing GPU infrastructure. We offer three things for your AI models: unmatched speed, cost efficiency, and operational simplicity. Find out which product is the best fit for you here.
How does FriendliAI help my business?
Our Friendli Inference engine allows you to squeeze more tokens per second out of every GPU. Because you need fewer GPUs to serve the same load, the metric that truly matters, tokens per dollar, comes out higher even if the hourly GPU rate looks similar on paper. View pricing
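For intuition, here is a back-of-the-envelope comparison; every number below is hypothetical and purely illustrative:

```python
# Hypothetical figures for illustration only.
baseline_tps = 1_000   # tokens/sec per GPU on a stock serving stack
faster_tps = 2_500     # tokens/sec per GPU on a more efficient engine
gpu_hourly_usd = 4.0   # same hardware, same hourly rate

def tokens_per_dollar(tokens_per_sec: float, hourly_rate: float) -> float:
    """Tokens served per dollar of GPU time."""
    return tokens_per_sec * 3600 / hourly_rate

print(tokens_per_dollar(baseline_tps, gpu_hourly_usd))  # 900,000 tokens/$
print(tokens_per_dollar(faster_tps, gpu_hourly_usd))    # 2,250,000 tokens/$
```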
Which models and modalities are supported?
Over 380,000 text, vision, audio, and multi-modal models are deployable out of the box. You can also upload custom models or LoRA adapters. Explore models
Can I deploy models from Hugging Face directly?
Yes. Selecting “Friendli Endpoints” on the Hugging Face Hub takes you to our model deployment page with a one-click deploy. The page provides an easy-to-use interface for setting up Friendli Dedicated Endpoints, a managed service for generative AI inference. Learn more about our Hugging Face partnership
Still have questions?
If you want a customized solution for the key issue slowing your growth, email contact@friendli.ai or click Contact Sales; our experts (not a bot) will reply within one business day.