ForeverBlue

Qwen3-VL-2B-GRACE-BF16

Deploy Dedicated

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Container

Run this model inference with full control and performance in your environment.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

Model Details

Base model: Qwen/Qwen3-VL-2B-Instruct
Training framework: GRACE
Precision: BF16 full precision
Training data: ShareGPT4V
Evaluation protocol: LLaVA-style multimodal evaluation
Repository: ForeverBlue/Qwen3-VL-2B-GRACE-BF16

📊 Results

Comparison on 7 VLM benchmarks. The 8B model is the distillation teacher (reference upper bound); all GRACE-Qwen3 variants are 2B students. Best result among the 2B Qwen3-VL models is in bold.

We release GRACE on Qwen3-VL here because it is the most current backbone and gives a fairer, up-to-date point of comparison, with the vanilla Qwen3-VL-2B-Instruct as the baseline. The paper itself reports GRACE on LLaVA-1.5 and Qwen2-VL; we additionally release the LLaVA-1.5 W4G128 INT4 checkpoint from the paper in the model zoo below.

Table with columns: Model, Params, Precision, HallB, MMBench, ScienceQA, AI2D, MMMU, SEED, MMStar, Avg
Model	Params	Precision	HallB	MMBench	ScienceQA	AI2D	MMMU	SEED	MMStar	Avg
Qwen3-VL-8B (teacher, ref.)	8B	BF16	61.1	84.5	85.0	85.7	69.6	77.5	70.9	76.3
Qwen3-VL-2B (baseline)	2B	BF16	51.4	78.4	81.4	76.9	53.4	71.2	58.3	67.3

GRACE lifts the Qwen3-VL-2B baseline by +9.4 avg and matches or slightly exceeds the 8B teacher on average (76.7 vs. 76.3) at roughly 1/4 the parameters. The W4G128 INT4 model retains 98% of the BF16 average.

🤗 Model Zoo

Table with columns: Model, Backbone, Bits, Group, Checkpoint description, HF Hub
Model	Backbone	Bits	Group	Checkpoint description	HF Hub
Qwen3-VL-2B-GRACE-BF16	Qwen3-VL-2B	bf16	—	Full-precision GRACE checkpoint; used as the student initialization for the W8/W4 Qwen3-VL runs.	FoeverBLUE/Qwen3-VL-2B-GRACE-BF16
Qwen3-VL-2B-GRACE-W8G128	Qwen3-VL-2B	int8	128	INT8 QAT checkpoint with group size 128; high-retention quantized Qwen3-VL student.

The BF16 Qwen3-VL checkpoint is the full-precision GRACE student used as the initial student weights for the W8 and W4 Qwen3-VL runs. The LLaVA-1.5 W4G128 checkpoint corresponds to the paper setting and includes GRACE-specific QAT quantized weights for reproducing the INT4 LLaVA experiments.

Intended Use

This model is intended for research on:

Efficient vision-language models
Knowledge distillation for VLMs
Multimodal alignment
Full-precision GRACE training
BF16 baseline / teacher-student comparison studies

Training Details

This checkpoint is a full-precision BF16 model trained under the GRACE framework.

Configuration:

Precision: BF16
Training method: GRACE
Backbone: Qwen3-VL-2B-Instruct
Dataset: ShareGPT4V
Evaluation: LLaVA-style multimodal benchmarks

Unlike the QAT releases, this model does not use weight quantization.

Files

model.safetensors / model-*.safetensors
config.json
generation_config.json
tokenizer files
processor files

Loading

python
from transformers import AutoProcessor
from transformers import AutoModelForImageTextToText
import torch

repo_id = "ForeverBlue/Qwen3-VL-2B-GRACE-BF16"

processor = AutoProcessor.from_pretrained(
    repo_id,
    trust_remote_code=True
)

model = AutoModelForImageTextToText.from_pretrained(
    repo_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

Important Notes

This is the full-precision BF16 GRACE checkpoint. It does not include INT8 or INT4 QAT weight compression. For quantized versions, please refer to the W8G128 and W4G128 checkpoints listed in the Model Zoo.

The standard from_pretrained call should load this BF16 checkpoint directly in a Qwen3-VL-compatible Transformers environment. For reproducing the GRACE training or distillation pipeline, please refer to the official code repository:

https://github.com/ForeverBlue816/GRACE

Limitations

This model is released for research purposes.
Performance may vary depending on the evaluation codebase, preprocessing, generation parameters, and multimodal benchmark implementation.
Users should follow the license and usage restrictions of the original Qwen3-VL-2B-Instruct base model.
This checkpoint is not optimized for low-bit inference; use the W8G128 or W4G128 release for quantized deployment studies.

Citation

If you use this model, please cite:

bibtex
@article{chen2026gated,
  title={Gated Relational Alignment via Confidence-based Distillation for Efficient VLMs},
  author={Chen, Yanlong and Habibian, Amirhossein and Benini, Luca and Li, Yawei},
  journal={arXiv preprint arXiv:2601.22709},
  year={2026}
}

Model provider

ForeverBlue

Model tree

Base

Qwen/Qwen3-VL-2B-Instruct

Fine-tuned

this model

Modalities

Input

Text, Image

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Model card

Explore FriendliAI today

Get started Talk to an engineer

Model Details

Base model: Qwen/Qwen3-VL-2B-Instruct
Training framework: GRACE
Precision: BF16 full precision
Training data: ShareGPT4V
Evaluation protocol: LLaVA-style multimodal evaluation
Repository: ForeverBlue/Qwen3-VL-2B-GRACE-BF16

📊 Results

Table with columns: Model, Params, Precision, HallB, MMBench, ScienceQA, AI2D, MMMU, SEED, MMStar, Avg
Model	Params	Precision	HallB	MMBench	ScienceQA	AI2D	MMMU	SEED	MMStar	Avg
Qwen3-VL-8B (teacher, ref.)	8B	BF16	61.1	84.5	85.0	85.7	69.6	77.5	70.9	76.3
Qwen3-VL-2B (baseline)	2B	BF16	51.4	78.4	81.4	76.9	53.4	71.2	58.3	67.3

GRACE lifts the Qwen3-VL-2B baseline by +9.4 avg and matches or slightly exceeds the 8B teacher on average (76.7 vs. 76.3) at roughly 1/4 the parameters. The W4G128 INT4 model retains 98% of the BF16 average.

🤗 Model Zoo

Table with columns: Model, Backbone, Bits, Group, Checkpoint description, HF Hub
Model	Backbone	Bits	Group	Checkpoint description	HF Hub
Qwen3-VL-2B-GRACE-BF16	Qwen3-VL-2B	bf16	—	Full-precision GRACE checkpoint; used as the student initialization for the W8/W4 Qwen3-VL runs.	FoeverBLUE/Qwen3-VL-2B-GRACE-BF16
Qwen3-VL-2B-GRACE-W8G128	Qwen3-VL-2B	int8	128	INT8 QAT checkpoint with group size 128; high-retention quantized Qwen3-VL student.

Intended Use

This model is intended for research on:

Efficient vision-language models
Knowledge distillation for VLMs
Multimodal alignment
Full-precision GRACE training
BF16 baseline / teacher-student comparison studies

Training Details

This checkpoint is a full-precision BF16 model trained under the GRACE framework.

Configuration:

Precision: BF16
Training method: GRACE
Backbone: Qwen3-VL-2B-Instruct
Dataset: ShareGPT4V
Evaluation: LLaVA-style multimodal benchmarks

Unlike the QAT releases, this model does not use weight quantization.

Files

model.safetensors / model-*.safetensors
config.json
generation_config.json
tokenizer files
processor files

Loading

python
from transformers import AutoProcessor
from transformers import AutoModelForImageTextToText
import torch

repo_id = "ForeverBlue/Qwen3-VL-2B-GRACE-BF16"

processor = AutoProcessor.from_pretrained(
    repo_id,
    trust_remote_code=True
)

model = AutoModelForImageTextToText.from_pretrained(
    repo_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

Important Notes

https://github.com/ForeverBlue816/GRACE

Limitations

This model is released for research purposes.
Performance may vary depending on the evaluation codebase, preprocessing, generation parameters, and multimodal benchmark implementation.
Users should follow the license and usage restrictions of the original Qwen3-VL-2B-Instruct base model.
This checkpoint is not optimized for low-bit inference; use the W8G128 or W4G128 release for quantized deployment studies.

Citation

If you use this model, please cite:

bibtex
@article{chen2026gated,
  title={Gated Relational Alignment via Confidence-based Distillation for Efficient VLMs},
  author={Chen, Yanlong and Habibian, Amirhossein and Benini, Luca and Li, Yawei},
  journal={arXiv preprint arXiv:2601.22709},
  year={2026}
}

Qwen3-VL-2B-GRACE-BF16

Get help setting up a custom Dedicated Endpoints.

README

Model Details

📊 Results

🤗 Model Zoo

Intended Use

Training Details

Files

Loading

Important Notes

Limitations

Citation

Explore FriendliAI today

README

Model Details

📊 Results

🤗 Model Zoo

Intended Use

Training Details

Files

Loading

Important Notes

Limitations

Citation