License: MIT
Model Description
Babel is a Qwen2-VL LoRA adapter fine-tuned for multilingual OCR (Optical Character Recognition) and translation tasks. It can extract text from images across multiple languages and translate between them, making it ideal for document digitization, cross-language content processing, and international business automation.
Model Architecture
- Base Model: Qwen/Qwen2-VL-7B-Instruct
- Fine-tuning Method: LoRA (Low-Rank Adaptation) via PEFT
- Checkpoint: Final checkpoint
- Task: Multilingual OCR + Translation (Vision-Language)
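As a rough illustration of what a LoRA adapter contributes on top of the frozen base model, the sketch below applies the standard LoRA update, W_eff = W + (alpha / r) * B A, to a tiny matrix in plain Python. The shapes, values, and function names here are made up for illustration; the actual rank, alpha, and target modules of this adapter live in adapter_config.json and are not stated in this README.

```python
# Minimal pure-Python sketch of a LoRA update: W_eff = W + (alpha / r) * (B @ A).
# All shapes and values are illustrative and NOT taken from the Babel adapter.

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def lora_effective_weight(w, a, b, alpha, r):
    """Return W + (alpha / r) * (B @ A) for a frozen weight W.

    w: d_out x d_in frozen base weight
    a: r x d_in low-rank factor
    b: d_out x r low-rank factor
    """
    scale = alpha / r
    delta = matmul(b, a)  # d_out x d_in low-rank update
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # 2 x 3 frozen weight
a = [[1.0, 2.0, 3.0]]                   # rank r = 1, shape 1 x 3
b = [[0.5], [0.0]]                      # shape 2 x 1
w_eff = lora_effective_weight(w, a, b, alpha=2.0, r=1)
print(w_eff)
```

Only the small factors A and B are trained and shipped, which is why the adapter can be distributed separately from the 7B base model.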
Training Details
- Framework: HuggingFace PEFT + Transformers
- Dataset: Multilingual document images with text annotations and translations
- Languages: Multiple, including English and Hindi
- Approach: Vision-language fine-tuning with OCR and translation objectives
Files
| File | Description |
|---|---|
| adapter_model.safetensors | LoRA adapter weights |
| adapter_config.json | PEFT adapter configuration |
| tokenizer.json | Tokenizer vocabulary |
| tokenizer_config.json | Tokenizer configuration |
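After downloading the adapter, it can be useful to confirm that the files listed above are actually present. The helper below is a small sketch of such a check; the function name `missing_adapter_files` is hypothetical, and the demo runs against a temporary directory rather than a real download.

```python
import tempfile
from pathlib import Path

# File names taken from the Files table in this README.
EXPECTED_FILES = (
    "adapter_model.safetensors",
    "adapter_config.json",
    "tokenizer.json",
    "tokenizer_config.json",
)


def missing_adapter_files(adapter_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in adapter_dir."""
    root = Path(adapter_dir)
    return [name for name in expected if not (root / name).is_file()]


# Demo on a temporary directory that contains only one of the files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "adapter_config.json").write_text("{}", encoding="utf-8")
    demo_missing = missing_adapter_files(tmp)

print(demo_missing)
```

In practice the directory would be whatever path the download step returns; an empty list means all four files were found.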
Usage
```python
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from peft import PeftModel
from PIL import Image
from huggingface_hub import snapshot_download

# Download adapter
adapter_dir = snapshot_download(repo_id='devanshty/Babel')

# Load base model
base_model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct",
    torch_dtype="auto",
    device_map="auto"
)
processor = AutoProcessor.from_pretrained(adapter_dir)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, adapter_dir)
model.eval()

# OCR + Translate
image = Image.open("document.jpg")
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "Extract all text from this image and translate it to English."}
    ]
}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=1024)
print(processor.decode(output[0], skip_special_tokens=True))
```
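The usage snippet decodes `output[0]` in full. With decoder-style generation in Transformers, the returned sequence typically starts with the prompt tokens, so the decoded string may include the chat prompt as well as the model's answer. One common way to keep only the newly generated part is to slice off the prompt length per sequence. The helper below is a sketch of that idea on plain lists of token ids; the name `trim_prompt_tokens` is hypothetical, and the ids are made up.

```python
def trim_prompt_tokens(input_ids, output_ids):
    """Drop the leading prompt tokens from each generated sequence.

    input_ids:  batch of prompt token-id sequences
    output_ids: batch of generated sequences, each assumed to start with its prompt
    """
    return [out[len(inp):] for inp, out in zip(input_ids, output_ids)]


# Illustrative token ids only; real ids come from the processor and model.generate.
prompt_batch = [[1, 2, 3], [4, 5, 6]]
generated_batch = [[1, 2, 3, 7, 8], [4, 5, 6, 9, 0]]
trimmed = trim_prompt_tokens(prompt_batch, generated_batch)
print(trimmed)
```

Applied to the usage snippet, this would correspond to passing `inputs.input_ids` and `output` and decoding the trimmed sequences, assuming the prompt is indeed echoed at the start of each output.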
Download & Use
```python
from huggingface_hub import hf_hub_download

adapter = hf_hub_download(repo_id='devanshty/Babel', filename='adapter_model.safetensors')
```
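The same single-file download pattern can be pointed at adapter_config.json to inspect how the adapter was configured before loading any weights. The sketch below summarizes a few fields that PEFT LoRA configs commonly contain (`peft_type`, `base_model_name_or_path`, `r`, `lora_alpha`, `target_modules`). The function name is hypothetical, and the demo values are invented for illustration; they are not the real settings of this adapter.

```python
import json
import tempfile
from pathlib import Path

# Fields commonly found in a PEFT LoRA adapter_config.json.
SUMMARY_KEYS = ("peft_type", "base_model_name_or_path", "r", "lora_alpha", "target_modules")


def summarize_adapter_config(path):
    """Read an adapter_config.json file and return a few key fields.

    Fields that are absent from the file are reported as None.
    """
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    return {key: config.get(key) for key in SUMMARY_KEYS}


# Demo with a HYPOTHETICAL config; these values are not taken from devanshty/Babel.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "adapter_config.json"
    demo_path.write_text(
        json.dumps({"peft_type": "LORA", "r": 8, "lora_alpha": 16}),
        encoding="utf-8",
    )
    summary = summarize_adapter_config(demo_path)

print(summary)
```

With a real download, the path returned by `hf_hub_download(..., filename='adapter_config.json')` would be passed in place of the temporary file.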
Model provider: devanshty
Model tree: base Qwen/Qwen2-VL-7B-Instruct; adapter: this model
Modalities: input text and image; output text