License: apache-2.0

Model Details

| Item | Value |
|---|---|
| Base Model | Qwen/Qwen3.5-4B |
| Fine-tune Method | LoRA (PEFT) |
| LoRA Rank | 32 |
| LoRA Alpha | 64 |
| LoRA Dropout | 0.05 |
| Target Modules | all-linear (language_model) |
| Training Framework | ms-swift v4.3.0 |
| Precision | bfloat16 |
| Max Sequence Length | 10240 |
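
To make the rank and alpha values above concrete, here is a minimal sketch of the usual LoRA arithmetic: the low-rank update is scaled by alpha / rank, and each adapted linear layer gains two small matrices. The 2560 x 2560 layer shape is a hypothetical size chosen for illustration, not a value read from the Qwen3.5-4B configuration, and the helper names are made up for this example.

```python
def lora_scaling(rank: int, alpha: int) -> float:
    """Conventional LoRA scaling factor applied to the low-rank update."""
    return alpha / rank


def lora_params_per_linear(in_features: int, out_features: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one linear layer.

    A has shape (rank, in_features) and B has shape (out_features, rank).
    """
    return rank * in_features + out_features * rank


RANK, ALPHA = 32, 64  # values from the Model Details table

scaling = lora_scaling(RANK, ALPHA)

# Hypothetical layer shape, for illustration only.
added = lora_params_per_linear(2560, 2560, RANK)

print(f"scaling={scaling}, added parameters per layer={added}")
```

With rank 32 and alpha 64 the update is scaled by 2.0, assuming the default (non rank-stabilized) LoRA scaling.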

Quick Start

Installation

bash

pip install torch transformers peft accelerate pillow qwen_vl_utils
# Or use ms-swift (recommended):
pip install "ms-swift[all]"  # quoted so shells such as zsh do not expand the brackets

Inference with ms-swift (Recommended)

python

from htmlgen_infer import HTMLGenModel

model = HTMLGenModel(
    adapter_path="Yesianrohn",     # or local path to this repo
    base_model="Qwen/Qwen3.5-4B",  # auto-detected from adapter_config.json
    merge_lora=True,
)

# Single image inference
html_output = model.predict("path/to/document_image.png")
print(html_output)

# Batch inference
results = model.predict_batch(["img1.png", "img2.png"])

Inference with Transformers + PEFT (Manual)

python

import torch
from transformers import AutoProcessor, AutoModelForCausalLM
from peft import PeftModel
from PIL import Image

# Load base model
base_model_id = "Qwen/Qwen3.5-4B"
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(base_model_id, trust_remote_code=True)

# Load LoRA adapter
model = PeftModel.from_pretrained(model, "Yesianrohn")
model = model.merge_and_unload()  # Optional: merge for faster inference

# Prepare input
image = Image.open("document.png").convert("RGB")
system_prompt = (
    "You are an expert document parser. Given an image of a document page, "
    "reconstruct its source as a single complete, self-contained HTML5 "
    "document. Faithfully preserve the original layout, typography, tables, "
    "formulas, and visual hierarchy using inline CSS where appropriate. "
    "Output only the HTML source, with no explanations, no markdown fences, "
    "and no extra prose."
)
user_prompt = (
    "Convert this document page into a complete HTML document. "
    "Preserve the layout, headings, tables, and formulas exactly as shown. "
    "Return only the HTML source."
)
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": user_prompt},
    ]},
]

# Render the chat template to text, then tokenize text and image together
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

# Greedy decoding: do_sample=False is sufficient, so no temperature is passed
output_ids = model.generate(**inputs, max_new_tokens=10240, do_sample=False)

# Drop the prompt tokens and decode only the newly generated ones
output_text = processor.batch_decode(
    output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
)[0]
print(output_text)

Inference with ms-swift CLI

bash

# Direct inference with swift
swift infer \
    --model Qwen/Qwen3.5-4B \
    --adapters Yesianrohn \
    --merge_lora true \
    --torch_dtype bfloat16 \
    --stream false

System Prompt

The model was trained with the following system prompt:

You are an expert document parser. Given an image of a document page, reconstruct its source as a single complete, self-contained HTML5 document. Faithfully preserve the original layout, typography, tables, formulas, and visual hierarchy using inline CSS where appropriate. Output only the HTML source, with no explanations, no markdown fences, and no extra prose.
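
The prompt asks for raw HTML with no markdown fences, but generated text can still arrive wrapped in one. The following is a minimal post-processing sketch, not part of this repository: `strip_markdown_fence` is a hypothetical helper that removes a single surrounding fence if present and otherwise returns the text trimmed.

```python
import re

# A literal markdown code fence, built programmatically so the example embeds cleanly.
FENCE = "`" * 3

# Matches text that is entirely wrapped in one fence, with an optional language tag.
_FENCED = re.compile(
    r"^\s*" + re.escape(FENCE) + r"[A-Za-z0-9]*[ \t]*\n"
    r"(.*?)"
    r"\n?[ \t]*" + re.escape(FENCE) + r"\s*$",
    re.DOTALL,
)


def strip_markdown_fence(text: str) -> str:
    """Return the content inside a surrounding fence, or the trimmed text if unfenced."""
    match = _FENCED.match(text)
    return (match.group(1) if match else text).strip()


raw = FENCE + "html\n<!DOCTYPE html>\n<html><body><p>Hi</p></body></html>\n" + FENCE
clean = strip_markdown_fence(raw)
print(clean)
```

Applying the helper to already clean output is harmless, so it can be run unconditionally on `output_text` from the Transformers example.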

Limitations

  • The model works best on clean, well-scanned document pages.
  • Very complex multi-column layouts or low-resolution images may produce imperfect HTML.
  • The maximum output length is 10240 tokens; very long documents may be truncated.
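
One way to notice the truncation described in the last limitation is to check whether the generated document ever closes its root element. This is a rough heuristic sketched with the standard library only; `looks_truncated` is a hypothetical helper name, and a page that legitimately omits `</html>` would be flagged as well.

```python
from html.parser import HTMLParser


class _EndTagCollector(HTMLParser):
    """Records the name of every closing tag seen while parsing."""

    def __init__(self) -> None:
        super().__init__()
        self.closed: set[str] = set()

    def handle_endtag(self, tag: str) -> None:
        # HTMLParser passes tag names in lowercase.
        self.closed.add(tag)


def looks_truncated(html_text: str) -> bool:
    """Return True when no closing </html> tag appears in the text."""
    collector = _EndTagCollector()
    collector.feed(html_text)
    collector.close()
    return "html" not in collector.closed


complete = "<!DOCTYPE html><html><body><p>Done</p></body></html>"
cut_off = "<!DOCTYPE html><html><body><table><tr><td>Row 1"

print(looks_truncated(complete), looks_truncated(cut_off))
```

When the check fires, splitting the page into smaller regions before inference is one possible workaround.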

License

This adapter is released under the Apache 2.0 license. The base model (Qwen3.5-4B) has its own license; refer to Qwen/Qwen3.5-4B for details.

Model provider

Yesianrohn

Model tree

Base: Qwen/Qwen3.5-4B
Adapter: this model

Modalities

Input: Video, Text, Image
Output: Text
