Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more
Container

Run this model inference with full control and performance in your environment.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

Model Details

  • Developed by: [Your Name/Organization]
  • Model type: Multimodal Large Language Model (Vision-Language)
  • Language(s): English
  • Finetuned from model: unsloth/Qwen2-VL-7B-Instruct
  • Finetuning approach: LoRA (Low-Rank Adaptation)

Training Details

Training Data

Fine-tuned on the images split of the Docmatix dataset, which focuses on document understanding and visual question answering.

Training Hyperparameters

  • Method: SFT (Supervised Fine-Tuning) with LoRA
  • LoRA Rank (r): 8
  • LoRA Alpha: 16
  • Optimizer: AdamW (8-bit)
  • Learning Rate: 1e-4
  • Batch Size: 1 (with Gradient Accumulation steps: 8)
  • Max Steps: 200
  • Precision: fp16/bf16 (depending on hardware compatibility)

How to Get Started with the Model

Loading the LoRA Adapter

python

from unsloth import FastVisionModel
import torch
model, tokenizer = FastVisionModel.from_pretrained(
"unsloth/Qwen2-VL-7B-Instruct",
load_in_4bit=True,
)
model = FastVisionModel.load_adapter(model, "path_to_your_lora_files")

Inference Example

python

from transformers import TextStreamer
FastVisionModel.for_inference(model)
# Standard Qwen2-VL inference code follows...

Framework versions

  • PEFT 0.19.1
  • Unsloth 2026.5.8
  • Transformers 5.0.0
  • PyTorch 2.11.0

Model provider

S-ABISHEAK

Model tree

Base

unsloth/Qwen2-VL-7B-Instruct

Adapter

this model

Modalities

Input

Text, Image

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Explore FriendliAI today