README

License: MIT

Model Description

Babel is a LoRA adapter for Qwen2-VL, fine-tuned for multilingual OCR (Optical Character Recognition) and translation. It extracts text from images in multiple languages and translates between those languages, which suits it to document digitization, cross-language content processing, and international business automation.

Model Architecture

  • Base Model: Qwen/Qwen2-VL-7B-Instruct
  • Fine-tuning Method: LoRA (Low-Rank Adaptation) via PEFT
  • Checkpoint: Final checkpoint
  • Task: Multilingual OCR + Translation (Vision-Language)
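
As a rough illustration of what a LoRA adapter contributes on top of the frozen base model, the sketch below applies the standard LoRA update, W + (alpha / r) · B·A, to a toy weight matrix. The matrices, rank, and alpha are made-up illustrative values, and the helper names `matmul` and `lora_effective_weight` are hypothetical; the actual rank and scaling used by Babel are not stated in this card.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, lora_a, lora_b, alpha, r):
    """Return W + (alpha / r) * (B @ A), the weight a LoRA layer effectively applies."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)  # shape (out, in), rank at most r
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy 2x2 frozen weight with a rank-1 update (illustrative numbers only).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]    # shape (r, in) = (1, 2)
B = [[0.5], [0.0]]  # shape (out, r) = (2, 1)
W_eff = lora_effective_weight(W, A, B, alpha=2, r=1)
print(W_eff)
```

Only A and B are stored in the adapter weights, which is why the adapter is far smaller than the 7B base model it modifies.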

Training Details

  • Framework: HuggingFace PEFT + Transformers
  • Dataset: Multilingual document images with text annotations and translations
  • Languages: Multiple, including English and Hindi
  • Approach: Vision-language fine-tuning with OCR and translation objectives

Files

File                        Description
adapter_model.safetensors   LoRA adapter weights
adapter_config.json         PEFT adapter configuration
tokenizer.json              Tokenizer vocabulary
tokenizer_config.json       Tokenizer configuration
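
To see how the adapter was configured, one option is to read adapter_config.json directly. The sketch below is a minimal example, assuming the file follows the usual PEFT layout with keys such as `r`, `lora_alpha`, and `target_modules`; the helper name `summarize_adapter_config` is hypothetical, and which keys Babel's file actually defines has not been verified here.

```python
import json
from pathlib import Path


def summarize_adapter_config(path):
    """Return a few commonly present fields from a PEFT adapter_config.json."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    keys = ("base_model_name_or_path", "peft_type", "r", "lora_alpha", "target_modules")
    # .get() yields None for any key the file does not define.
    return {key: config.get(key) for key in keys}
```

The path can come from a local copy of the repository, for example the adapter_config.json inside the directory returned by `snapshot_download`.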

Usage

python

from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from peft import PeftModel
from PIL import Image
from huggingface_hub import snapshot_download

# Download the adapter files
adapter_dir = snapshot_download(repo_id='devanshty/Babel')

# Load the base model
base_model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct",
    torch_dtype="auto",
    device_map="auto"
)
# If the adapter repo lacks an image-processor config, load the processor
# from "Qwen/Qwen2-VL-7B-Instruct" instead.
processor = AutoProcessor.from_pretrained(adapter_dir)

# Attach the LoRA adapter to the base model
model = PeftModel.from_pretrained(base_model, adapter_dir)
model.eval()

# OCR + translate
image = Image.open("document.jpg").convert("RGB")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Extract all text from this image and translate it to English."}
        ]
    }
]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=1024)

# Drop the prompt tokens so only the newly generated text is printed
generated = output[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
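
The prompt in the example above targets English. A small helper can build the same message structure for another target language; this is only a sketch, the name `build_ocr_messages` is hypothetical, and which target languages the adapter handles well beyond those listed under Training Details is not documented here.

```python
def build_ocr_messages(image, target_language="English"):
    """Build a single-turn chat message asking for OCR plus translation."""
    prompt = f"Extract all text from this image and translate it to {target_language}."
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": prompt},
            ],
        }
    ]
```

The returned list can be passed to `processor.apply_chat_template` in place of the hard-coded `messages` list.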

Download & Use

python

from huggingface_hub import hf_hub_download
adapter = hf_hub_download(repo_id='devanshty/Babel', filename='adapter_model.safetensors')

Model provider

devanshty

Model tree

  • Base: Qwen/Qwen2-VL-7B-Instruct
  • Adapter: this model

Modalities

  • Input: Text, Image
  • Output: Text