Nalandadata

DrishtiTable-Qwen2.5-VL-7B


License: apache-2.0

Results

| Model | Method | TEDS | S-TEDS |
|---|---|---|---|
| Qwen2.5-VL-7B | Zero-shot | 58.8% | 74.0% |
| o4-mini (OpenAI) | Zero-shot | 61.4% | 70.0% |
| GPT-4.1 (OpenAI) | Zero-shot | 68.0% | 80.8% |
| GPT-4o (OpenAI) | Zero-shot | 71.1% | 84.3% |
| DrishtiTable-Qwen2.5-VL-7B (ours) | SFT | 83.2% | 89.7% |

Breakdown by Table Type

| Table Type | GPT-4o | Ours | Improvement (points) |
|---|---|---|---|
| Statistical | 77.7% | 82.8% | +5.1 |
| Financial | 60.3% | 82.0% | +21.7 |
| Lookup | 71.7% | 85.7% | +14.0 |
| Comparison | 72.4% | 95.9% | +23.5 |

Usage

```python
from unsloth import FastVisionModel
from qwen_vl_utils import process_vision_info
from PIL import Image

# Load model
model, tokenizer = FastVisionModel.from_pretrained(
    "Nalandadata/DrishtiTable-Qwen2.5-VL-7B",
    max_seq_length=4096,
    load_in_4bit=True,
)
FastVisionModel.for_inference(model)

# Prepare input
image = Image.open("table.png").convert("RGB")
messages = [
    {"role": "system", "content": "You are a table structure recognition expert. Given an image of a table, output the HTML representation of the table structure and content. Use <table>, <thead>, <tbody>, <tr>, <th>, <td> tags. Use colspan and rowspan attributes for merged cells. Output ONLY the HTML table, nothing else."},
    {"role": "user", "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "Convert this table image to HTML. Output only the HTML table structure with cell content."},
    ]},
]

# Generate
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = tokenizer(text=[text], images=image_inputs, videos=video_inputs,
                   padding=True, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=4096, do_sample=False)

# Strip the prompt tokens so only the newly generated tokens are decoded
generated = [o[len(i):] for i, o in zip(inputs.input_ids, output)]
html = tokenizer.batch_decode(generated, skip_special_tokens=True)[0].strip()
print(html)
```
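
Downstream code usually wants the predicted table as rows and columns rather than as raw HTML. The following is a minimal sketch of one way to do that with only the standard library; it is not part of the model's tooling. The names `TableGridParser`, `html_table_to_grid`, and `sample` are hypothetical, and the sketch assumes well-formed output with numeric `colspan`/`rowspan` values.

```python
from html.parser import HTMLParser


class TableGridParser(HTMLParser):
    """Collect <td>/<th> text into a (row, col) map, expanding merged cells."""

    def __init__(self):
        super().__init__()
        self.cells = {}    # (row, col) -> cell text, after span expansion
        self.row = -1      # index of the <tr> currently being read
        self._open = None  # [colspan, rowspan, text_parts] for the open cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row += 1
        elif tag in ("td", "th") and self.row >= 0:
            a = dict(attrs)
            self._open = [int(a.get("colspan") or 1), int(a.get("rowspan") or 1), []]

    def handle_data(self, data):
        if self._open is not None:
            self._open[2].append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._open is not None:
            colspan, rowspan, parts = self._open
            self._open = None
            text = "".join(parts).strip()
            # Skip columns already claimed by a rowspan from an earlier row.
            col = 0
            while (self.row, col) in self.cells:
                col += 1
            # Copy the text into every grid position the merged cell covers.
            for dr in range(rowspan):
                for dc in range(colspan):
                    self.cells[(self.row + dr, col + dc)] = text


def html_table_to_grid(html):
    """Return the table as a list of rows; uncovered positions become ''."""
    parser = TableGridParser()
    parser.feed(html)
    parser.close()
    if not parser.cells:
        return []
    n_rows = max(r for r, _ in parser.cells) + 1
    n_cols = max(c for _, c in parser.cells) + 1
    return [[parser.cells.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


# Hypothetical model output with one rowspan and one colspan header.
sample = (
    "<table><thead>"
    '<tr><th rowspan="2">Year</th><th colspan="2">Sales</th></tr>'
    "<tr><th>Q1</th><th>Q2</th></tr>"
    "</thead><tbody><tr><td>2023</td><td>10</td><td>12</td></tr></tbody></table>"
)
for row in html_table_to_grid(sample):
    print(row)
```

Repeating the text of a merged cell in every position it covers keeps each row the same length, which makes the grid easy to load into a spreadsheet or data frame. A caller that needs to tell merged cells apart would have to keep the span information instead.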

With Transformers + PEFT

```python
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from peft import PeftModel
import torch

# Load the base model in bfloat16
base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Attach the DrishtiTable adapter and load the matching processor
model = PeftModel.from_pretrained(base_model, "Nalandadata/DrishtiTable-Qwen2.5-VL-7B")
processor = AutoProcessor.from_pretrained("Nalandadata/DrishtiTable-Qwen2.5-VL-7B")
```
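
The snippet above only loads the model and processor. As a small sketch, the prompt from the Unsloth example can be packaged into a reusable helper so both loading paths send the same instructions. The names `SYSTEM_PROMPT`, `USER_PROMPT`, and `build_messages` are hypothetical; the prompt strings are copied from the Usage section.

```python
# Prompt text copied verbatim from the Usage example.
SYSTEM_PROMPT = (
    "You are a table structure recognition expert. Given an image of a table, "
    "output the HTML representation of the table structure and content. "
    "Use <table>, <thead>, <tbody>, <tr>, <th>, <td> tags. "
    "Use colspan and rowspan attributes for merged cells. "
    "Output ONLY the HTML table, nothing else."
)
USER_PROMPT = (
    "Convert this table image to HTML. "
    "Output only the HTML table structure with cell content."
)


def build_messages(image):
    """Build the chat-format message list for a single table image.

    `image` is whatever the vision preprocessing step accepts,
    for example a PIL image as in the Usage example.
    """
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": USER_PROMPT},
        ]},
    ]
```

The returned list has the same shape as `messages` in the Usage example, so it should be usable with the chat template and generation steps shown there. That combination with the PEFT-loaded model is an assumption and is not demonstrated in the original.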

Training Details

| Parameter | Value |
|---|---|
| Base model | Qwen2.5-VL-7B-Instruct |
| Method | QLoRA (4-bit) via Unsloth |
| LoRA rank | 32 |
| LoRA alpha | 32 |
| Target modules | all-linear (incl. vision layers) |
| Training data | 1,141 table images from DrishtiTable |
| Epochs | 3 |
| Learning rate | 2e-4 (cosine schedule) |
| Batch size | 1 (gradient accumulation 8) |
| Max sequence length | 4,096 |
| Optimizer | AdamW 8-bit |
| Hardware | 1x NVIDIA A100-80GB |
| Training time | ~35 minutes |
| Training cost | ~$5 (Modal cloud) |
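
The hyperparameters above imply a fairly short run. The arithmetic below is a rough sketch, not taken from the training logs. It assumes all 1,141 images are seen once per epoch on the single GPU, that a final partial accumulation window still counts as one optimizer step, and that the standard LoRA scaling `alpha / rank` applies. All variable names are illustrative.

```python
import math

# Values from the Training Details table.
num_train_images = 1141
per_device_batch = 1
grad_accum_steps = 8
epochs = 3
lora_rank, lora_alpha = 32, 32

# Samples contributing to each optimizer update.
effective_batch = per_device_batch * grad_accum_steps

# Optimizer updates per epoch and in total (assumes the last partial window is kept).
steps_per_epoch = math.ceil(num_train_images / effective_batch)
total_steps = steps_per_epoch * epochs

# Standard LoRA scales the low-rank update by alpha / rank.
lora_scale = lora_alpha / lora_rank

print(effective_batch)  # 8
print(steps_per_epoch)  # 143
print(total_steps)      # 429
print(lora_scale)       # 1.0
```

Under these assumptions the run is on the order of a few hundred optimizer steps, which is consistent with the reported training time of about 35 minutes on one A100.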

Dataset

Trained on DrishtiTable: 1,421 table images from 9 Indian academic textbooks (S. Chand Publications) spanning Financial Accounting, Business Statistics, Quantitative Techniques, Operations Research, Ethics, and Engineering Steam Tables.

Evaluation

Evaluated using TEDS (Tree-Edit-Distance-based Similarity), the standard metric for table structure recognition. TEDS measures structural and content similarity between predicted and ground-truth HTML table trees on a 0-100% scale. S-TEDS is the structure-only variant, which ignores cell text and scores only the tag tree.
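
TEDS is usually defined as 1 - EditDist(Ta, Tb) / max(|Ta|, |Tb|), where Ta and Tb are the predicted and ground-truth HTML trees and |T| is the number of nodes. The sketch below illustrates that formula on toy trees. It is a simplified, structure-only version with unit edit costs and tag names as labels, closer in spirit to S-TEDS. The published metric also compares cell text, and this is not the evaluation code behind the results reported here. The names `size`, `forest_dist`, `teds`, `truth`, and `pred` are hypothetical.

```python
from functools import lru_cache

# A tree is (tag, children) where children is a tuple of trees.
# A forest is a tuple of trees. Tuples keep everything hashable for caching.


def size(forest):
    """Count the nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_dist(f, g):
    """Ordered tree edit distance between two forests, unit costs."""
    if not f and not g:
        return 0
    if not f:
        return size(g)  # insert everything in g
    if not g:
        return size(f)  # delete everything in f
    (tag_f, kids_f), (tag_g, kids_g) = f[-1], g[-1]
    return min(
        # Delete the rightmost root of f; its children move up a level.
        forest_dist(f[:-1] + kids_f, g) + 1,
        # Insert the rightmost root of g.
        forest_dist(f, g[:-1] + kids_g) + 1,
        # Match the two rightmost roots, renaming if the tags differ.
        forest_dist(kids_f, kids_g) + forest_dist(f[:-1], g[:-1]) + (tag_f != tag_g),
    )


def teds(a, b):
    """Similarity in [0, 1]: 1 means the two trees are identical."""
    return 1 - forest_dist((a,), (b,)) / max(size((a,)), size((b,)))


# Ground truth has two cells in its row; the prediction dropped one.
truth = ("table", (("tr", (("td", ()), ("td", ()))),))
pred = ("table", (("tr", (("td", ()),)),))

print(teds(truth, truth))  # 1.0
print(teds(pred, truth))   # 0.75
```

In the second case one node out of four has to be inserted, so the score is 1 - 1/4. Reported scores such as 83.2% are this kind of value averaged over a test set and expressed as a percentage.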

| Resource | Link |
|---|---|
| Live Demo | DrishtiTable Space |
| Dataset (sample) | Nalandadata/DrishtiTable |
| Base Model | Qwen/Qwen2.5-VL-7B-Instruct |

Limitations

  • Trained on tables from a single publisher (S. Chand Publications); performance on other publishers/styles is untested
  • Optimized for Indian academic textbook tables; may not generalize to web tables, handwritten tables, or camera-captured tables
  • HTML output may contain OCR errors in cell text: the structure-only score (S-TEDS 89.7%) is higher than the full score (TEDS 83.2%), which also penalizes content mistakes

Citation

```bibtex
@article{drishtitable2026,
  title={Domain-Specific Fine-Tuning for Table Structure Recognition: A 7B Open Model Outperforms GPT-4o with 1,141 Training Samples},
  author={Nalanda Data},
  year={2026}
}
```

Commercial Use & Support

This model is released under Apache 2.0. The training data (DrishtiTable) is a public sample of a larger internal corpus of 1,421 expert-annotated tables from Indian academic textbooks.

Available on request:

  • Custom fine-tuned TSR models for your document layouts
  • Production deployment support (vLLM, quantization, serving)
  • Access to the full training corpus under custom commercial license terms
  • Partnerships for document-understanding evaluation and integration

Contact

For commercial licensing, full dataset access, custom data work, or partnerships:
📧 info@nalandadata.ai

For technical questions, integration help, or fine-tuning support:
📧 tech@nalandadata.ai

🌐 nalandadata.ai

Model provider: Nalandadata

Model tree: base Qwen/Qwen2.5-VL-7B-Instruct; this model is an adapter

Modalities: input text and image; output text
