Dedicated Endpoints
Run this model inference on single tenant GPU with unmatched speed and reliability at scale.
Container
Run this model inference with full control and performance in your environment.
Get help setting up a custom Dedicated Endpoints.
Talk with our engineer to get a quote for reserved GPU instances with discounts.
README
Model Details
- Developed by: [Your Name/Organization]
- Model type: Multimodal Large Language Model (Vision-Language)
- Language(s): English
- Finetuned from model: unsloth/Qwen2-VL-7B-Instruct
- Finetuning approach: LoRA (Low-Rank Adaptation)
Training Details
Training Data
Fine-tuned on the images split of the Docmatix dataset, which focuses on document understanding and visual question answering.
Training Hyperparameters
- Method: SFT (Supervised Fine-Tuning) with LoRA
- LoRA Rank (r): 8
- LoRA Alpha: 16
- Optimizer: AdamW (8-bit)
- Learning Rate: 1e-4
- Batch Size: 1 (with Gradient Accumulation steps: 8)
- Max Steps: 200
- Precision: fp16/bf16 (depending on hardware compatibility)
How to Get Started with the Model
Loading the LoRA Adapter
python
from unsloth import FastVisionModelimport torchmodel, tokenizer = FastVisionModel.from_pretrained("unsloth/Qwen2-VL-7B-Instruct",load_in_4bit=True,)model = FastVisionModel.load_adapter(model, "path_to_your_lora_files")
Inference Example
python
from transformers import TextStreamerFastVisionModel.for_inference(model)# Standard Qwen2-VL inference code follows...
Framework versions
- PEFT 0.19.1
- Unsloth 2026.5.8
- Transformers 5.0.0
- PyTorch 2.11.0
Model provider
S-ABISHEAK
Model tree
Base
unsloth/Qwen2-VL-7B-Instruct
Adapter
this model
Modalities
Input
Text, Image
Output
Text
Pricing
Dedicated Endpoints
View detailsSupported Functionality
Model APIs
Dedicated Endpoints
Container
More information