ohjoonhee/vlatents-qwen25vl7b-stage3-upstream-baseline-v1 API & Inference Endpoint

Recipe

Stage: 1 (NTP SFT; no alignment, no latent slots)
Base model: Qwen/Qwen2.5-VL-7B-Instruct
Init checkpoint: (none)
Dataset: ohjoonhee/visual-cot-50k-poc (Monet-SFT-125K Visual_CoT subset, eval-200 excluded)
Hardware: 4× H100 80GB, DeepSpeed ZeRO-2 + CPU optim offload, bf16
(no config available)

Notes

Pure NTP SFT — no Monet Stage 2 alignment loss, no latent-mode forward. The Monet special tokens (<observation>, <abs_vis_token>, etc.) ARE registered in the tokenizer and embedded so the model learns to produce them, but the architectural latent-slot mechanism is unused at this stage.

This revision (`step-1500`)

No training log row available.

Notes

Faithful upstream Monet Stage 3 reproduction (lambda_reg=0). Init: Monet-SFT-7B/stage1. Teacher: upstream-precomputed (124K latents). Trained ~1942 step target, walltime-cut at step ~1728 (epoch 1.77). Final: loss=0.19 alignment_loss=0.032 obs_acc=0.97 — collapse signature.

Other revisions: see the revisions dropdown on this page.

How to load

python
from transformers import AutoModelForVision2Seq, AutoProcessor
m = AutoModelForVision2Seq.from_pretrained(
    "ohjoonhee/vlatents-qwen25vl7b-stage3-upstream-baseline-v1", revision="step-1500", torch_dtype="bfloat16")
p = AutoProcessor.from_pretrained("ohjoonhee/vlatents-qwen25vl7b-stage3-upstream-baseline-v1", revision="step-1500")

Limitations

Research checkpoint, eval-only. Mid-training step (1500/?). Not for production.

Card generated 2026-06-01 from training_log.jsonl + the run's training config.

Recipe

Stage: 1 (NTP SFT; no alignment, no latent slots)
Base model: Qwen/Qwen2.5-VL-7B-Instruct
Init checkpoint: (none)
Dataset: ohjoonhee/visual-cot-50k-poc (Monet-SFT-125K Visual_CoT subset, eval-200 excluded)
Hardware: 4× H100 80GB, DeepSpeed ZeRO-2 + CPU optim offload, bf16
(no config available)

Notes

This revision (`step-1500`)

No training log row available.

Notes

Other revisions: see the revisions dropdown on this page.

How to load

python
from transformers import AutoModelForVision2Seq, AutoProcessor
m = AutoModelForVision2Seq.from_pretrained(
    "ohjoonhee/vlatents-qwen25vl7b-stage3-upstream-baseline-v1", revision="step-1500", torch_dtype="bfloat16")
p = AutoProcessor.from_pretrained("ohjoonhee/vlatents-qwen25vl7b-stage3-upstream-baseline-v1", revision="step-1500")

Limitations

Research checkpoint, eval-only. Mid-training step (1500/?). Not for production.

Card generated 2026-06-01 from training_log.jsonl + the run's training config.

vlatents-qwen25vl7b-stage3-upstream-baseline-v1

Get help setting up a custom Dedicated Endpoints.

README

Recipe

Notes

This revision (`step-1500`)

Notes

How to load

Limitations

Explore FriendliAI today

README

Recipe

Notes

This revision (`step-1500`)

Notes

How to load

Limitations

vlatents-qwen25vl7b-stage3-upstream-baseline-v1

Get help setting up a custom Dedicated Endpoints.

Recipe

Notes

This revision (step-1500)

Notes

How to load

Limitations

Explore FriendliAI today

Recipe

Notes

This revision (step-1500)

Notes

How to load

Limitations

This revision (`step-1500`)

This revision (`step-1500`)