How to Fine-tune Vision Language Models (VLMs)
Fine-tune Vision Language Models (VLMs) on Friendli Dedicated Endpoints using datasets.
Introduction
Effortlessly fine-tune your Vision Language Model (VLM) with Friendli Dedicated Endpoints, which leverage the Parameter-Efficient Fine-Tuning (PEFT) method to reduce training costs while preserving model quality comparable to full-parameter fine-tuning. Fine-tuning can make your model an expert on specific visual tasks and improve its ability to understand and describe images accurately.
In this tutorial, we will cover:
- How to upload your image-text dataset for VLM fine-tuning.
- How to fine-tune state-of-the-art VLMs like Qwen2.5-VL-32B-Instruct and gemma-3-27b-it on your dataset.
- How to deploy your fine-tuned VLM model.
Table of Contents
- Prerequisites
- Step 1. Prepare Your Dataset
- Step 2. Upload Your Dataset
- Step 3. Fine-tune Your VLM
- Step 4. Monitor Training Progress
- Step 5. Deploy Your Fine-tuned Model
- Resources
Prerequisites
- Head to Friendli Suite and create an account.
- Issue a Friendli Token and store it safely.
Step 1. Prepare Your Dataset
Your dataset should be a conversational dataset in JSONL format, where each line represents a sequence of messages. Each message in the conversation should include a `"role"` (e.g., `system`, `user`, or `assistant`) and `"content"`. For VLM fine-tuning, user content can contain both text and image data (for image data, we support URL and Base64).
Here’s an example of what it should look like. Note that it’s one line but beautified for readability:
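As an illustrative sketch (the field names below follow the common OpenAI-style chat schema and are assumptions; check the FriendliAI/sample-vision example dataset for the authoritative format), one record could look like:

```json
{
  "messages": [
    { "role": "system", "content": "You are a helpful visual assistant." },
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What is shown in this image?" },
        { "type": "image_url", "image_url": { "url": "https://example.com/cat.jpg" } }
      ]
    },
    { "role": "assistant", "content": "A tabby cat sleeping on a windowsill." }
  ]
}
```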
You can access our example datasets ‘FriendliAI/gsm8k’ (for chat) and ‘FriendliAI/sample-vision’ (for chat with images), and explore some of our quantized generative AI models, on our Hugging Face page.
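For Base64 image input, image bytes are typically embedded as a data URI inside the message content. A minimal sketch (the helper name is hypothetical, and the data-URI convention is an assumption; verify the expected encoding against the sample dataset):

```python
import base64

def image_to_data_uri(path: str, mime: str = "image/jpeg") -> str:
    """Read an image file and return it as a Base64 data URI (hypothetical helper)."""
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```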
Step 2. Upload Your Dataset
Once you have prepared your dataset, upload it to Friendli using the Python SDK:
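The SDK upload call itself is omitted here (see the Friendli Python SDK documentation for the exact invocation). Before uploading, it can help to sanity-check the file locally. A minimal sketch, assuming each JSONL line holds a top-level `"messages"` list of role/content entries (an assumption; confirm against the FriendliAI/sample-vision example dataset):

```python
import json

def validate_jsonl(path: str) -> int:
    """Check that every line is valid JSON with a 'messages' list of role/content dicts.

    Returns the number of records; raises ValueError on the first bad line.
    (The schema is an assumption; confirm against the sample dataset.)
    """
    count = 0
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as e:
                raise ValueError(f"line {lineno}: invalid JSON: {e}") from None
            messages = record.get("messages")
            if not isinstance(messages, list) or not messages:
                raise ValueError(f"line {lineno}: missing non-empty 'messages' list")
            for msg in messages:
                if "role" not in msg or "content" not in msg:
                    raise ValueError(f"line {lineno}: message missing 'role'/'content'")
            count += 1
    return count
```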
To view and edit the datasets you’ve uploaded, visit Friendli Suite > Dataset.
Step 3. Fine-tune Your VLM
Go to Friendli Suite > Fine-tuning, and click the ‘New job’ button to create a new job.
In the job creation form, you’ll need to configure the following settings:
- Job Name:
  - Enter a name for your fine-tuning job.
  - If not provided, a name will be automatically generated (e.g., `accomplished-shark`).
- Model:
  - Choose your base model from one of these sources:
    - Hugging Face: Select from models available on Hugging Face.
    - Weights & Biases: Use a model from your W&B projects.
    - Uploaded model: Use a model you’ve previously uploaded.
- Dataset:
  - Select the dataset to use.
- Weights & Biases Integration (Optional):
  - Enable W&B tracking by providing your W&B project name.
  - This allows you to monitor training metrics in W&B.
- Hyperparameters:
  - Learning Rate (required): Initial learning rate for the optimizer (e.g., 0.0001).
  - Batch Size (required): Total batch size used for training (e.g., 16).
  - Total amount of training (required), specified as either:
    - Number of Training Epochs: Total number of training epochs to perform (e.g., 1).
    - Training Steps: Total number of training steps to perform (e.g., 1000).
  - Evaluation Steps (required): Number of steps between evaluations of the model on the validation set (e.g., 300).
  - LoRA Rank (optional): Rank of the LoRA parameters (e.g., 16).
  - LoRA Alpha (optional): Scaling factor that determines the influence of the low-rank matrices during fine-tuning (e.g., 32).
  - LoRA Dropout (optional): Dropout rate applied during fine-tuning (e.g., 0.1).
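As a rough illustration of why LoRA (the PEFT method mentioned above) keeps training cheap: the adapter replaces a full weight-matrix update with two low-rank factors, and LoRA Alpha sets the scaling `alpha / rank` applied to their product. The dimensions below are illustrative, not tied to any specific model:

```python
def lora_budget(d_in: int, d_out: int, rank: int, alpha: int):
    """Compare trainable parameter counts for a full update vs. a LoRA adapter."""
    full_params = d_in * d_out           # updating the whole (d_out x d_in) weight matrix
    lora_params = rank * (d_in + d_out)  # A: (rank x d_in), B: (d_out x rank)
    scaling = alpha / rank               # multiplier applied to B @ A during the forward pass
    return full_params, lora_params, scaling

full, lora, scale = lora_budget(4096, 4096, rank=16, alpha=32)
print(full, lora, scale)                          # 16777216 131072 2.0
print(f"trainable fraction: {lora / full:.2%}")   # trainable fraction: 0.78%
```

With rank 16 on a 4096×4096 layer, the adapter trains under 1% of the parameters a full update would, which is where the cost savings come from.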
After configuring these settings, click the ‘Create’ button at the bottom to start your fine-tuning job.
Step 4. Monitor Training Progress
You can now monitor your fine-tuning job’s progress on Friendli Suite.
If you have integrated your Weights & Biases (W&B) account, you can also monitor the training status in your W&B project. Read our FAQ section on using W&B with dedicated fine-tuning to learn more about monitoring your fine-tuning jobs on their platform.
Step 5. Deploy Your Fine-tuned Model
Once the fine-tuning process is complete, you can immediately deploy the model by clicking the ‘Deploy’ button in the top right corner. The name of the fine-tuned LoRA adapter will be the same as your fine-tuning job name.
For more information about deploying a model, refer to Endpoints documentation.
Resources
Explore these additional resources to learn more about VLM fine-tuning and optimization:
- Browse all models supported by FriendliAI
- Example dataset
- FAQ on general requirements for a model
- FAQ on using a Hugging Face repository as a model
- FAQ on integrating a Hugging Face account
- FAQ on using a W&B artifact as a model
- FAQ on integrating a W&B account
- FAQ on using W&B with dedicated fine-tuning
- Endpoints documentation on model deployment