devanshty

Babel

Model Description

Babel is a LoRA adapter for Qwen2-VL-7B-Instruct, fine-tuned for multilingual OCR (Optical Character Recognition) and translation. It extracts text from images in multiple languages and translates that text, which makes it suited to document digitization, cross-language content processing, and international business automation.

Model Architecture

Base Model: Qwen/Qwen2-VL-7B-Instruct
Fine-tuning Method: LoRA (Low-Rank Adaptation) via PEFT
Checkpoint: Final checkpoint
Task: Multilingual OCR + Translation (Vision-Language)
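
To make the LoRA entry above concrete, here is a minimal pure-Python sketch of the low-rank update idea: the frozen weight W is left untouched, and a trainable product B·A, scaled by alpha / r, is added to its output. The dimensions, rank, and alpha below are illustrative assumptions, not values read from this adapter's `adapter_config.json`, and the helper names are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, r):
    """Compute W x + (alpha / r) * B (A x), the LoRA-adapted output."""
    base = matvec(W, x)                 # frozen base projection
    low_rank = matvec(B, matvec(A, x))  # rank-r correction
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, low_rank)]


def lora_param_count(d_in, d_out, r):
    """Trainable parameters in one adapted layer: A is r x d_in, B is d_out x r."""
    return r * d_in + d_out * r


# Toy example with assumed sizes: d_in = 3, d_out = 2, r = 1
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
A = [[1.0, 1.0, 1.0]]  # r x d_in
B = [[0.5], [0.0]]     # d_out x r (a zero row leaves that output unchanged)
y = lora_forward(W, A, B, [1.0, 2.0, 3.0], alpha=2, r=1)
print(y)
print(lora_param_count(4096, 4096, 8), "trainable vs", 4096 * 4096, "frozen")
```

The parameter count is why the adapter file is small compared with the base model: only A and B are stored, while W comes from the base checkpoint.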

Training Details

Framework: Hugging Face PEFT + Transformers
Dataset: Multilingual document images with text annotations and translations
Languages: Multiple languages, including English and Hindi
Approach: Vision-language fine-tuning with OCR and translation objectives

Files

| File | Description |
|---|---|
| `adapter_model.safetensors` | LoRA adapter weights |
| `adapter_config.json` | PEFT adapter configuration |
| `tokenizer.json` | Tokenizer vocabulary |
| `tokenizer_config.json` | Tokenizer configuration |
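
Since `adapter_config.json` holds the PEFT settings, a quick way to see how the adapter was configured is to read a few of its fields. The sketch below assumes the file uses the usual PEFT LoRA keys (`base_model_name_or_path`, `peft_type`, `r`, `lora_alpha`, `target_modules`); the actual values in this repository are not stated here, and `summarize_adapter_config` is a hypothetical helper name.

```python
import json
from pathlib import Path


def summarize_adapter_config(config_path):
    """Return a few commonly present PEFT LoRA settings from adapter_config.json."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    keys = ("base_model_name_or_path", "peft_type", "r", "lora_alpha", "target_modules")
    # Use .get so a missing key yields None instead of raising
    return {key: config.get(key) for key in keys}
```

Called on the `adapter_config.json` inside a local copy of the repository, this returns a small dictionary that can be printed or logged before loading the model.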

Usage

python
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from peft import PeftModel
from PIL import Image
from huggingface_hub import snapshot_download

# Download adapter
adapter_dir = snapshot_download(repo_id='devanshty/Babel')

# Load base model
base_model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct",
    torch_dtype="auto",
    device_map="auto"
)
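# Load the processor from the adapter directory; if this fails because the
# repo ships only tokenizer files, load it from "Qwen/Qwen2-VL-7B-Instruct" instead
processor = AutoProcessor.from_pretrained(adapter_dir)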

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, adapter_dir)
model.eval()

# OCR + Translate
image = Image.open("document.jpg")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Extract all text from this image and translate it to English."}
        ]
    }
]
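# Render the chat template to a prompt string, then tokenize text and image together
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=1024)
# Drop the prompt tokens so only the newly generated text is printed
generated = output[0][inputs.input_ids.shape[1]:]
print(processor.decode(generated, skip_special_tokens=True))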
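
The message structure above hard-codes English as the target language. As a small sketch, the same structure can be produced by a helper that takes the target language as a parameter. `build_ocr_messages` is a hypothetical name, and how well the adapter handles any particular target language is an assumption rather than something this card documents.

```python
def build_ocr_messages(image, target_language="English"):
    """Build a single-turn chat message asking for OCR plus translation."""
    instruction = (
        f"Extract all text from this image and translate it to {target_language}."
    )
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": instruction},
            ],
        }
    ]


# A placeholder object stands in for a PIL image in this standalone demo
messages = build_ocr_messages(image=object(), target_language="Hindi")
print(messages[0]["content"][1]["text"])
```

The returned list can be passed to `processor.apply_chat_template` in place of the literal `messages` list.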

Download & Use

python
from huggingface_hub import hf_hub_download
adapter = hf_hub_download(repo_id='devanshty/Babel', filename='adapter_model.safetensors')
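
When files are fetched one at a time like this, it can help to confirm that a local directory holds everything listed under Files before loading the adapter. The following is a minimal sketch using only the standard library; `missing_adapter_files` is a hypothetical helper, and the expected names are taken from the Files table.

```python
from pathlib import Path

EXPECTED_FILES = (
    "adapter_model.safetensors",
    "adapter_config.json",
    "tokenizer.json",
    "tokenizer_config.json",
)


def missing_adapter_files(adapter_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from adapter_dir."""
    root = Path(adapter_dir)
    return [name for name in expected if not (root / name).is_file()]
```

An empty list means all four files are present; any names returned are the ones still to download.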

Model provider: devanshty
Base model: Qwen/Qwen2-VL-7B-Instruct (this repository is the adapter)
Input modalities: Text, Image
Output modalities: Text

