License: apache-2.0

Model Details
| Item | Value |
|---|---|
| Base Model | Qwen/Qwen3.5-4B |
| Fine-tune Method | LoRA (PEFT) |
| LoRA Rank | 32 |
| LoRA Alpha | 64 |
| LoRA Dropout | 0.05 |
| Target Modules | all-linear (language_model) |
| Training Framework | ms-swift v4.3.0 |
| Precision | bfloat16 |
| Max Sequence Length | 10240 tokens |
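To make the rank and alpha values above concrete, here is a small worked sketch of standard LoRA arithmetic. It assumes PEFT's default scaling (`use_rslora=False`); the card does not say whether the training run changed this. Only the rank (32) and alpha (64) come from the table; the 2560 x 2560 layer shape is hypothetical and chosen purely for illustration.

```python
# Sketch (assumption): standard LoRA arithmetic with PEFT's default scaling
# (use_rslora=False). Only LORA_RANK and LORA_ALPHA come from the table above.

LORA_RANK = 32   # "LoRA Rank"
LORA_ALPHA = 64  # "LoRA Alpha"


def lora_scaling(alpha: int, rank: int) -> float:
    """Factor on the low-rank update: W_eff = W + (alpha / r) * (B @ A)."""
    return alpha / rank


def lora_params_per_linear(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_in -> d_out linear layer.

    A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * d_in + d_out * rank


if __name__ == "__main__":
    print(lora_scaling(LORA_ALPHA, LORA_RANK))  # 2.0
    # Hypothetical 2560 x 2560 projection, not necessarily a real Qwen3.5-4B shape.
    print(lora_params_per_linear(2560, 2560, LORA_RANK))  # 163840
```

With alpha = 2r, the low-rank update is scaled by 2, and each adapted linear layer gains r * (d_in + d_out) trainable parameters while the base weights stay frozen.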
Quick Start
Installation
```bash
pip install torch transformers peft accelerate pillow qwen_vl_utils

# Or use ms-swift (recommended):
pip install "ms-swift[all]"
```
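A quick way to confirm the installation is to check that each package can be found by Python. This is a minimal sketch with a hypothetical helper name; note that some import names differ from the pip names (`pillow` is imported as `PIL`, `ms-swift` as `swift`).

```python
# Hypothetical helper: check that the packages from the pip command above are
# importable. find_spec locates a top-level module without importing it.
import importlib.util

REQUIRED = {
    "torch": "torch",
    "transformers": "transformers",
    "peft": "peft",
    "accelerate": "accelerate",
    "pillow": "PIL",
    "qwen_vl_utils": "qwen_vl_utils",
}
OPTIONAL = {"ms-swift": "swift"}


def check_installed(packages: dict[str, str]) -> dict[str, bool]:
    """Map each pip name to whether its import name can be found."""
    return {
        pip_name: importlib.util.find_spec(module) is not None
        for pip_name, module in packages.items()
    }


if __name__ == "__main__":
    status = {**check_installed(REQUIRED), **check_installed(OPTIONAL)}
    for name, ok in status.items():
        print(f"{name:15s} {'ok' if ok else 'MISSING'}")
```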
Inference with ms-swift (Recommended)
```python
from htmlgen_infer import HTMLGenModel

model = HTMLGenModel(
    adapter_path="Yesianrohn",     # or local path to this repo
    base_model="Qwen/Qwen3.5-4B",  # auto-detected from adapter_config.json
    merge_lora=True,
)

# Single image inference
html_output = model.predict("path/to/document_image.png")
print(html_output)

# Batch inference
results = model.predict_batch(["img1.png", "img2.png"])
```
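To keep batch results, you can write each one to its own `.html` file. This sketch assumes `predict_batch` returns one HTML string per input image, in input order; the card does not confirm the return type. `save_html_results` is a hypothetical helper, not part of `htmlgen_infer`.

```python
# Sketch, assuming predict_batch returns a list of HTML strings in input order.
# save_html_results is a hypothetical helper name.
from pathlib import Path


def save_html_results(image_paths, results, out_dir="html_out"):
    """Write results[i] to out_dir/<stem of image_paths[i]>.html and return the paths."""
    if len(image_paths) != len(results):
        raise ValueError("expected one result per input image")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for image_path, html in zip(image_paths, results):
        target = out / (Path(image_path).stem + ".html")
        target.write_text(html, encoding="utf-8")
        written.append(target)
    return written
```

For example, `save_html_results(["img1.png", "img2.png"], results)` would produce `html_out/img1.html` and `html_out/img2.html`.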
Inference with Transformers + PEFT (Manual)
```python
import torch
from PIL import Image
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoProcessor

# Load base model
base_model_id = "Qwen/Qwen3.5-4B"
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(base_model_id, trust_remote_code=True)

# Load LoRA adapter
model = PeftModel.from_pretrained(model, "Yesianrohn")
model = model.merge_and_unload()  # Optional: merge for faster inference

# Prepare input
image = Image.open("document.png").convert("RGB")
system_prompt = (
    "You are an expert document parser. Given an image of a document page, "
    "reconstruct its source as a single complete, self-contained HTML5 "
    "document. Faithfully preserve the original layout, typography, tables, "
    "formulas, and visual hierarchy using inline CSS where appropriate. "
    "Output only the HTML source, with no explanations, no markdown fences, "
    "and no extra prose."
)
user_prompt = (
    "Convert this document page into a complete HTML document. "
    "Preserve the layout, headings, tables, and formulas exactly as shown. "
    "Return only the HTML source."
)
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": user_prompt},
    ]},
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

# Greedy decoding: do_sample=False alone is enough; a temperature is ignored
# (and triggers a warning) when sampling is disabled.
with torch.inference_mode():
    output_ids = model.generate(**inputs, max_new_tokens=10240, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt
output_text = processor.batch_decode(
    output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
)[0]
print(output_text)
```
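The system prompt asks for raw HTML with no markdown fences, but a defensive cleanup step can still be useful before saving `output_text`. This is a sketch of our own, not part of the original pipeline: it strips one surrounding markdown fence (three backticks plus an optional language tag) if present, and applies a simple heuristic check that the result is a complete HTML document.

```python
# Defensive post-processing sketch (an assumption, not from the card).
import re

# One outer markdown fence: three backticks, optional language tag, body, three backticks.
_FENCE = re.compile(r"^\s*`{3}[\w-]*[ \t]*\n(.*?)\n?[ \t]*`{3}\s*$", re.DOTALL)


def clean_html_output(text: str) -> str:
    """Remove an outer markdown code fence, if present, and trim whitespace."""
    match = _FENCE.match(text)
    return (match.group(1) if match else text).strip()


def looks_complete(html: str) -> bool:
    """Heuristic: starts with a doctype or <html> tag and ends with </html>."""
    doc = html.strip().lower()
    return doc.startswith(("<!doctype html", "<html")) and doc.endswith("</html>")
```

For example, `html = clean_html_output(output_text)` followed by `looks_complete(html)` gives a quick sanity check before writing the file.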
Inference with ms-swift CLI
```bash
# Direct inference with swift
swift infer \
    --model Qwen/Qwen3.5-4B \
    --adapters Yesianrohn \
    --merge_lora true \
    --torch_dtype bfloat16 \
    --stream false
```
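If you prefer to launch the same CLI run from a Python script, you can build the argument list programmatically. This is a sketch: the flags are copied verbatim from the `swift infer` command above, and `build_swift_infer_cmd` is a hypothetical helper name.

```python
# Sketch: build the `swift infer` argument list shown above. Passing a list to
# subprocess avoids shell quoting issues. The helper name is hypothetical.
def build_swift_infer_cmd(base_model="Qwen/Qwen3.5-4B", adapters="Yesianrohn",
                          merge_lora=True, torch_dtype="bfloat16", stream=False):
    """Return the argument list for `swift infer` with boolean flags as true/false."""
    return [
        "swift", "infer",
        "--model", base_model,
        "--adapters", adapters,
        "--merge_lora", str(merge_lora).lower(),
        "--torch_dtype", torch_dtype,
        "--stream", str(stream).lower(),
    ]
```

To run it, call `subprocess.run(build_swift_infer_cmd(), check=True)` after `import subprocess`, with ms-swift installed and `swift` on your `PATH`.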
System Prompt
The model was trained with the following system prompt:
You are an expert document parser. Given an image of a document page, reconstruct its source as a single complete, self-contained HTML5 document. Faithfully preserve the original layout, typography, tables, formulas, and visual hierarchy using inline CSS where appropriate. Output only the HTML source, with no explanations, no markdown fences, and no extra prose.
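Because the model was trained with this exact prompt, keep it as a single constant rather than retyping it. The following sketch uses the same message layout as the Transformers example earlier in this card; `build_messages` is a hypothetical helper name.

```python
# Sketch: reuse the exact training system prompt when building chat messages.
# build_messages is a hypothetical helper; the layout mirrors the Transformers example.
SYSTEM_PROMPT = (
    "You are an expert document parser. Given an image of a document page, "
    "reconstruct its source as a single complete, self-contained HTML5 "
    "document. Faithfully preserve the original layout, typography, tables, "
    "formulas, and visual hierarchy using inline CSS where appropriate. "
    "Output only the HTML source, with no explanations, no markdown fences, "
    "and no extra prose."
)


def build_messages(image, user_prompt):
    """Return a system + user (image, text) message list for apply_chat_template."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": user_prompt},
        ]},
    ]
```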
Limitations
- The model works best on clean, well-scanned document pages.
- Very complex multi-column layouts or low-resolution images may produce imperfect HTML.
- The maximum output length is 10240 tokens; very long documents may be truncated.
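A simple way to detect likely truncation is to check whether generation used the full token budget or whether the HTML lacks its closing tag. This is a heuristic sketch of our own, not a method described in the card; the 10240 default matches the `max_new_tokens` value used in the Transformers example.

```python
# Heuristic sketch (an assumption, not from the card): flag outputs that were
# probably cut off by the token limit.
MAX_NEW_TOKENS = 10240


def probably_truncated(html: str, num_generated_tokens: int,
                       max_new_tokens: int = MAX_NEW_TOKENS) -> bool:
    """True if generation hit the token budget or the document has no closing </html>."""
    hit_budget = num_generated_tokens >= max_new_tokens
    unclosed = not html.rstrip().lower().endswith("</html>")
    return hit_budget or unclosed
```

With the Transformers example, the generated token count is `output_ids.shape[1] - inputs.input_ids.shape[1]`.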
License
This adapter is released under the Apache 2.0 license. The base model (Qwen3.5-4B) has its own license; please refer to Qwen/Qwen3.5-4B for details.
Model provider
Yesianrohn
Model tree
Base: Qwen/Qwen3.5-4B
Adapter: this model
Modalities
Input: Video, Text, Image
Output: Text