
Browse models supported by Friendli Engine


Model library

Each entry below lists the organization, the model name (with supported options such as LoRA and FP8 in parentheses), and a short model description.
Mistral AI
Mixtral-8x22B-v0.1
A sparse Mixture-of-Experts (SMoE) model that uses only 39B active parameters out of 141B.
Mistral AI
Mixtral-8x22B-Instruct-v0.1
The Mixtral-8x22B-Instruct-v0.1 Large Language Model (LLM) is an instruct fine-tuned version of the Mixtral-8x22B-v0.1.
Mistral AI
Mixtral-8x7B-v0.1
The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts.
Mistral AI
Mixtral-8x7B-Instruct-v0.1
The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts.
Mistral AI
Mixtral-8x7B-v0.1-fp8
The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts.
Hugging Face H4
zephyr-orpo-141b-A35b-v0.1
Zephyr is a series of language models trained to act as helpful assistants; this one is a fine-tuned version of Mixtral-8x22B-v0.1, a Mixture of Experts (MoE) model with 141B total parameters and 39B active parameters.
Mistral AI
Mistral-7B-Instruct-v0.2 (LoRA, FP8)
The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.2.
Mistral AI
Mistral-7B-Instruct-v0.1 (LoRA)
The Mistral-7B-Instruct-v0.1 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.1 generative text model, using a variety of publicly available conversation datasets.
Mistral AI
Mistral-7B-v0.1 (LoRA)
A 7B transformer model, fast-deployed and easily customisable. Small, yet very powerful for a variety of use cases.
Cohere For AI
c4ai-command-r-plus
C4AI Command R+ is an open weights research release of a 104 billion parameter model with highly advanced capabilities, including Retrieval Augmented Generation (RAG) and tool use to automate sophisticated tasks.
Cohere For AI
c4ai-command-r-v01
C4AI Command-R is a research release of a 35 billion parameter highly performant generative model. Command-R is a large language model with open weights optimized for a variety of use cases including reasoning, summarization, and question answering.
Google
codegemma-7b-it (LoRA)
CodeGemma is a collection of lightweight open code models built on top of Gemma. CodeGemma models are text-to-text and text-to-code decoder-only models, available as a 7 billion parameter pretrained variant that specializes in code completion and code generation tasks, a 7 billion parameter instruction-tuned variant for code chat and instruction following, and a 2 billion parameter pretrained variant for fast code completion.
Google
codegemma-7b (LoRA)
CodeGemma is a collection of lightweight open code models built on top of Gemma. CodeGemma models are text-to-text and text-to-code decoder-only models, available as a 7 billion parameter pretrained variant that specializes in code completion and code generation tasks, a 7 billion parameter instruction-tuned variant for code chat and instruction following, and a 2 billion parameter pretrained variant for fast code completion.
Google
gemma-7b-it (LoRA)
Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.
Google
gemma-7b (LoRA)
Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.
Google
gemma-2-9b (LoRA)
Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.
Google
gemma-2-27b (LoRA)
Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.
Google
gemma-2-9b-it (LoRA)
Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.
Google
gemma-2-27b-it (LoRA)
Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.
Databricks
dbrx-instruct
DBRX Instruct is a mixture-of-experts (MoE) large language model trained from scratch by Databricks. DBRX Instruct specializes in few-turn interactions.
Databricks
dbrx-base
DBRX Base is a mixture-of-experts (MoE) large language model trained from scratch by Databricks.
Meta
Llama-2-7b-hf (LoRA)
Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters.
Meta
Llama-2-7b-chat-hf (LoRA, FP8)
Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters.
Meta
Llama-2-13b-hf (LoRA)
Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters.
Meta
Llama-2-13b-chat-hf (LoRA, FP8)
Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters.
Meta
Llama-2-70b-hf (LoRA)
Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters.
Meta
Llama-2-70b-chat-hf (LoRA, FP8)
Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters.
Meta
CodeLlama-7b-hf (LoRA)
Code Llama is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 34 billion parameters. This model is designed for general code synthesis and understanding.
Meta
CodeLlama-13b-hf (LoRA)
Code Llama is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 34 billion parameters. This model is designed for general code synthesis and understanding.
Meta
CodeLlama-34b-hf (LoRA)
Code Llama is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 34 billion parameters. This model is designed for general code synthesis and understanding.
Meta
CodeLlama-70b-hf (LoRA)
Code Llama is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 34 billion parameters. This model is designed for general code synthesis and understanding.
Meta
Meta-Llama-3-8B (LoRA, FP8)
Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B and 70B sizes. The Llama 3 instruction-tuned models are optimized for dialogue use cases and outperform many of the available open-source chat models on common industry benchmarks.
Meta
Meta-Llama-3-8B-Instruct (LoRA, FP8)
Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B and 70B sizes. The Llama 3 instruction-tuned models are optimized for dialogue use cases and outperform many of the available open-source chat models on common industry benchmarks.
Meta
Meta-Llama-3-70B (LoRA, FP8)
Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B and 70B sizes. The Llama 3 instruction-tuned models are optimized for dialogue use cases and outperform many of the available open-source chat models on common industry benchmarks.
Meta
Meta-Llama-3-70B-Instruct (LoRA, FP8)
Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B and 70B sizes. The Llama 3 instruction-tuned models are optimized for dialogue use cases and outperform many of the available open-source chat models on common industry benchmarks.
Meta
Meta-Llama-3.1-8B (LoRA)
The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 8B, 70B, and 405B sizes (text in/text out). The Llama 3.1 instruction-tuned text-only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open-source and closed chat models on common industry benchmarks.
Meta
Meta-Llama-3.1-8B-Instruct (LoRA)
The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 8B, 70B, and 405B sizes (text in/text out). The Llama 3.1 instruction-tuned text-only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open-source and closed chat models on common industry benchmarks.
Meta
Meta-Llama-3.1-70B (LoRA)
The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 8B, 70B, and 405B sizes (text in/text out). The Llama 3.1 instruction-tuned text-only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open-source and closed chat models on common industry benchmarks.
Meta
Meta-Llama-3.1-70B-Instruct (LoRA)
The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 8B, 70B, and 405B sizes (text in/text out). The Llama 3.1 instruction-tuned text-only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open-source and closed chat models on common industry benchmarks.
Meta
Meta-Llama-3.1-405B (LoRA)
The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 8B, 70B, and 405B sizes (text in/text out). The Llama 3.1 instruction-tuned text-only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open-source and closed chat models on common industry benchmarks.
Meta
Meta-Llama-3.1-405B-Instruct (LoRA)
The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 8B, 70B, and 405B sizes (text in/text out). The Llama 3.1 instruction-tuned text-only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open-source and closed chat models on common industry benchmarks.
OpenLM Research
open_llama_13b (LoRA)
OpenLLaMA is an open reproduction of LLaMA.
LMSYS
vicuna-13b-v1.3 (LoRA)
Vicuna is a chat assistant trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT.
01.AI
Yi-6B (LoRA)
The Yi series models are the next generation of open-source large language models trained from scratch by 01.AI. Targeted as bilingual language models and trained on a 3T multilingual corpus, the Yi series models are among the strongest LLMs worldwide, showing promise in language understanding, commonsense reasoning, reading comprehension, and more.
01.AI
Yi-34B (LoRA)
The Yi series models are the next generation of open-source large language models trained from scratch by 01.AI. Targeted as bilingual language models and trained on a 3T multilingual corpus, the Yi series models are among the strongest LLMs worldwide, showing promise in language understanding, commonsense reasoning, reading comprehension, and more.
WizardLM Team
WizardLM-13B-V1.0 (LoRA)
WizardLM is a large language model developed by fine-tuning the LLaMA model using a novel method called Evol-Instruct. This method generates large amounts of complex instruction data automatically, reducing the need for manual data creation. Starting with an initial set of instructions, Evol-Instruct iteratively rewrites them to increase their complexity. The resulting instruction data is then used to fine-tune the LLaMA model, creating WizardLM.
WizardLM Team
WizardLM-13B-V1.2 (LoRA)
WizardLM is a large language model developed by fine-tuning the LLaMA model using a novel method called Evol-Instruct. This method generates large amounts of complex instruction data automatically, reducing the need for manual data creation. Starting with an initial set of instructions, Evol-Instruct iteratively rewrites them to increase their complexity. The resulting instruction data is then used to fine-tune the LLaMA model, creating WizardLM.
WizardLM Team
WizardLM-70B-V1.0 (LoRA)
WizardLM is a large language model developed by fine-tuning the LLaMA model using a novel method called Evol-Instruct. This method generates large amounts of complex instruction data automatically, reducing the need for manual data creation. Starting with an initial set of instructions, Evol-Instruct iteratively rewrites them to increase their complexity. The resulting instruction data is then used to fine-tune the LLaMA model, creating WizardLM.
WizardLM Team
WizardMath-7B-V1.0 (LoRA)
WizardMath enhances the mathematical reasoning abilities of Llama-2 by applying the WizardLM Team's Reinforcement Learning from Evol-Instruct Feedback (RLEIF) method to the domain of math.
WizardLM Team
WizardMath-70B-V1.0 (LoRA)
WizardMath enhances the mathematical reasoning abilities of Llama-2 by applying the WizardLM Team's Reinforcement Learning from Evol-Instruct Feedback (RLEIF) method to the domain of math.
WizardLM Team
WizardMath-7B-V1.1 (LoRA)
WizardMath enhances the mathematical reasoning abilities of Llama-2 by applying the WizardLM Team's Reinforcement Learning from Evol-Instruct Feedback (RLEIF) method to the domain of math.
WizardLM Team
WizardCoder-15B-V1.0 (LoRA)
WizardCoder empowers Code LLMs with complex instruction fine-tuning by adapting the Evol-Instruct method to the domain of code.
WizardLM Team
WizardCoder-33B-V1.1 (LoRA)
WizardCoder empowers Code LLMs with complex instruction fine-tuning by adapting the Evol-Instruct method to the domain of code.
WizardLM Team
WizardCoder-Python-13B-V1.0 (LoRA)
WizardCoder empowers Code LLMs with complex instruction fine-tuning by adapting the Evol-Instruct method to the domain of code.
WizardLM Team
WizardCoder-Python-34B-V1.0 (LoRA)
WizardCoder empowers Code LLMs with complex instruction fine-tuning by adapting the Evol-Instruct method to the domain of code.
BigCode
starcoder2-15b
The StarCoder2-15B model is a 15B parameter model trained on 600+ programming languages from The Stack v2, with opt-out requests excluded.
BigCode
starcoder2-7b
The StarCoder2-7B model is a 7B parameter model trained on 17 programming languages from The Stack v2, with opt-out requests excluded.
BigCode
starcoder2-3b
The StarCoder2-3B model is a 3B parameter model trained on 17 programming languages from The Stack v2, with opt-out requests excluded.
BigCode
starcoder
The StarCoder models are 15.5B parameter models trained on 80+ programming languages from The Stack (v1.2), with opt-out requests excluded. The model uses Multi Query Attention, a context window of 8192 tokens, and was trained using the Fill-in-the-Middle objective on 1 trillion tokens.
BigCode
gpt_bigcode-santacoder
This is the same model as SantaCoder, but it can be loaded with transformers >= 4.28.1 to use the GPTBigCode architecture.
MosaicML
mpt-7b (LoRA)
MPT-7B is a decoder-style transformer pretrained from scratch on 1T tokens of English text and code. This model was trained by MosaicML.
MosaicML
mpt-7b-storywriter (LoRA)
MPT-7B-StoryWriter-65k+ is a model designed to read and write fictional stories with super long context lengths. It was built by fine-tuning MPT-7B with a context length of 65k tokens on a filtered fiction subset of the books3 dataset.
MosaicML
mpt-30b (LoRA)
MPT-30B is a decoder-style transformer pretrained from scratch on 1T tokens of English text and code. This model was trained by MosaicML.
Technology Innovation Institute
falcon-7b
Falcon-7B is a 7B parameter causal decoder-only model built by TII and trained on 1,500B tokens of RefinedWeb enhanced with curated corpora. It is made available under the Apache 2.0 license.
Technology Innovation Institute
falcon-40b
Falcon-40B is a 40B parameter causal decoder-only model built by TII and trained on 1,000B tokens of RefinedWeb enhanced with curated corpora. It is made available under the Apache 2.0 license.
BigScience
bloom
BLOOM is an autoregressive Large Language Model (LLM), trained to continue text from a prompt on vast amounts of text data using industrial-scale computational resources. As such, it is able to output coherent text in 46 languages and 13 programming languages that is hardly distinguishable from text written by humans. BLOOM can also be instructed to perform text tasks it hasn't been explicitly trained for, by casting them as text generation tasks.
BigScience
bloomz
BLOOMZ is a model capable of following human instructions in dozens of languages zero-shot. It was created by fine-tuning the BLOOM pretrained multilingual language models on a cross-lingual task mixture (xP3), enabling it to generalize cross-lingually to unseen tasks and languages.
Facebook
opt-66b
OPT was first introduced in the paper Open Pre-trained Transformer Language Models and released in Meta AI's metaseq repository on May 3rd, 2022.
Facebook
opt-iml-max-30b
OPT-IML (OPT + Instruction Meta-Learning) is a set of instruction-tuned versions of OPT, trained on a collection of ~2000 NLP tasks gathered from 8 NLP benchmarks, called OPT-IML Bench.
EleutherAI
gpt-j-6b (LoRA)
GPT-J 6B is a transformer model trained using Ben Wang's Mesh Transformer JAX. "GPT-J" refers to the class of model, while "6B" represents the number of trainable parameters.
EleutherAI
gpt-neox-20b
GPT-NeoX-20B is a 20 billion parameter autoregressive language model trained on the Pile using the GPT-NeoX library. Its architecture intentionally resembles that of GPT-3, and is almost identical to that of GPT-J-6B. Its training dataset contains a multitude of English-language texts, reflecting the general-purpose nature of this model.
EleutherAI
pythia-12b
The Pythia model suite was deliberately designed to promote scientific research on large language models, especially interpretability research.
Databricks
dolly-v2-12b
Databricks' dolly-v2-12b is an instruction-following large language model trained on the Databricks machine learning platform and licensed for commercial use. Based on pythia-12b, Dolly is trained on ~15k instruction/response fine-tuning records (databricks-dolly-15k) generated by Databricks employees in capability domains from the InstructGPT paper, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization. dolly-v2-12b is not a state-of-the-art model, but it does exhibit surprisingly high-quality instruction-following behavior not characteristic of the foundation model on which it is based.
StabilityAI
stablelm-tuned-alpha-7b
StableLM-Tuned-Alpha is a suite of 3B and 7B parameter decoder-only language models built on top of the StableLM-Base-Alpha models and further fine-tuned on various chat and instruction-following datasets.
Microsoft
phi-1_5
The language model Phi-1.5 is a Transformer with 1.3 billion parameters. It was trained using the same data sources as phi-1, augmented with a new data source that consists of various NLP synthetic texts.
Microsoft
phi-2
Phi-2 is a Transformer with 2.7 billion parameters. It was trained using the same data sources as Phi-1.5, augmented with a new data source that consists of various NLP synthetic texts and filtered websites (for safety and educational value).
Microsoft
Phi-3-mini-4k-instruct
The Phi-3-Mini-4K-Instruct is a 3.8 billion parameter, lightweight, state-of-the-art open model trained with the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with a focus on high-quality and reasoning-dense properties.
Microsoft
Phi-3-mini-128k-instruct
The Phi-3-Mini-128K-Instruct is a 3.8 billion parameter, lightweight, state-of-the-art open model trained using the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with an emphasis on high-quality and reasoning-dense properties.
Qwen
Qwen1.5-7B
Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data.
Qwen
Qwen1.5-7B-Chat
Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data.
OpenAI community
gpt2
GPT-2 is a transformers model pretrained on a very large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data), with an automatic process to generate inputs and labels from those texts. More precisely, it was trained to guess the next word in sentences.
OpenAI community
gpt2-xl
GPT-2 XL is the 1.5B parameter version of GPT-2, a transformer-based language model created and released by OpenAI. It is pretrained on English text using a causal language modeling (CLM) objective.
Google
flan-t5-base
For the same number of parameters, FLAN-T5 has been fine-tuned on more than 1000 additional tasks covering more languages as well.
Google
flan-t5-xl
For the same number of parameters, FLAN-T5 has been fine-tuned on more than 1000 additional tasks covering more languages as well.
Google
flan-t5-xxl
For the same number of parameters, FLAN-T5 has been fine-tuned on more than 1000 additional tasks covering more languages as well.
Facebook
blenderbot-3B
The BlenderBot series has made progress in combining conversational skills (like personality, empathy, and knowledge), incorporating long-term memory, and searching the internet to carry out meaningful conversations.
xAI
grok-1
A base model trained on a large amount of text data, not fine-tuned for any particular task. It is a 314B parameter Mixture-of-Experts model with 25% of the weights active on a given token, trained from scratch by xAI using a custom training stack on top of JAX and Rust in October 2023.
Stability AI
stable-diffusion-xl-base-1.0
SDXL is a latent diffusion model for text-to-image synthesis. Compared to previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone: the increase in model parameters is mainly due to more attention blocks and a larger cross-attention context, as SDXL uses a second text encoder.
Runway
stable-diffusion-v1-5
The Stable-Diffusion-v1-5 checkpoint was initialized with the weights of the Stable-Diffusion-v1-2 checkpoint and subsequently fine-tuned for 595k steps at resolution 512x512 on "laion-aesthetics v2 5+" with 10% dropping of the text-conditioning to improve classifier-free guidance sampling.

Friendli Engine supports FP8 in all models and AWQ (4-bit) in most models, except for models like GPT-J.

The list above is not exhaustive. The current version of Friendli Engine supports direct loading of safetensors checkpoints for the following Hugging Face transformers model architectures:
GPT2LMHeadModel
GPTJForCausalLM
MPTForCausalLM
OPTForCausalLM
BloomForCausalLM
GPTNeoXForCausalLM
LlamaForCausalLM
FalconForCausalLM
MistralForCausalLM
MixtralForCausalLM
Qwen2ForCausalLM
GemmaForCausalLM
Starcoder2ForCausalLM
CohereForCausalLM
DbrxForCausalLM
If your model does not belong to one of the above model architectures, please contact us for support.
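A quick way to see whether a checkpoint falls under one of the architectures listed above is to inspect the architectures field of its Hugging Face config. The sketch below is illustrative only and is not part of Friendli's tooling; it assumes the transformers package is installed, and the model ID used in the example is just a placeholder you would swap for your own.

```python
# Minimal sketch: check whether a Hugging Face checkpoint uses one of the
# architectures that Friendli Engine can load directly from safetensors.
# Assumes the `transformers` package; the model ID below is only an example.
from transformers import AutoConfig

SUPPORTED_ARCHITECTURES = {
    "GPT2LMHeadModel", "GPTJForCausalLM", "MPTForCausalLM", "OPTForCausalLM",
    "BloomForCausalLM", "GPTNeoXForCausalLM", "LlamaForCausalLM",
    "FalconForCausalLM", "MistralForCausalLM", "MixtralForCausalLM",
    "Qwen2ForCausalLM", "GemmaForCausalLM", "Starcoder2ForCausalLM",
    "CohereForCausalLM", "DbrxForCausalLM",
}

def is_supported(model_id: str) -> bool:
    # config.json lists the model class name(s) in its `architectures` field.
    config = AutoConfig.from_pretrained(model_id)
    return any(arch in SUPPORTED_ARCHITECTURES for arch in (config.architectures or []))

if __name__ == "__main__":
    print(is_supported("mistralai/Mistral-7B-Instruct-v0.2"))  # expected: True
```

Note that some repositories (for example, MPT) ship custom configuration code and may require trust_remote_code=True when loading the config.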
HOW TO USE

Three ways to run generative AI models with Friendli Engine:

01

Friendli Container

Serve LLM/LMM inference with Friendli Engine in your GPU environment

Learn more

02

Friendli Dedicated Endpoints

Build and run LLMs/LMMs on autopilot with Friendli Dedicated Endpoints

Learn more

03

Friendli Serverless Endpoints

Fast and affordable API for open-source generative AI models (see the usage sketch below)

Learn more
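As an illustration of the Serverless Endpoints option, the sketch below assumes an OpenAI-compatible chat completions API. The base URL, the model ID, and the FRIENDLI_TOKEN environment variable are assumptions to verify against the Friendli documentation; this is not a definitive integration guide.

```python
# Minimal sketch of calling Friendli Serverless Endpoints through an
# OpenAI-compatible client. The base URL and model ID are assumptions;
# check the Friendli documentation for the exact values.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.friendli.ai/serverless/v1",  # assumed OpenAI-compatible endpoint
    api_key=os.environ["FRIENDLI_TOKEN"],              # personal access token (assumed env var)
)

response = client.chat.completions.create(
    model="meta-llama-3.1-8b-instruct",  # example model ID; actual IDs may differ
    messages=[{"role": "user", "content": "Summarize what a Mixture-of-Experts model is."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```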