License: apache-2.0
Highlights
- 🗣️ Malayalam-first: tuned to respond in clear, idiomatic Malayalam.
- 🧩 Instruction-following: handles questions, explanations and short-form generation in a chat format.
- ⚙️ Drop-in: standard `transformers` text-generation model; use the chat template as usual.
Usage
This is a Gemma-4 model; load it with the multimodal class (`AutoModelForImageTextToText`) and use the chat template for text conversation.
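```python
import torch
from transformers import AutoModelForImageTextToText, AutoTokenizer

model_id = "Navneeth017/Malayalam_gemma-4-E4B-it"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, dtype=torch.bfloat16, device_map="auto"
)

messages = [
    # System prompt: "You are a helpful Malayalam assistant."
    {"role": "system", "content": "നിങ്ങൾ സഹായകരമായ ഒരു മലയാളം സഹായിയാണ്."},
    # User prompt: "Explain the Onam celebration in Kerala."
    {"role": "user", "content": "കേരളത്തിലെ ഓണം ആഘോഷത്തെക്കുറിച്ച് വിശദീകരിക്കൂ."},
]

# Render the chat template and tokenize; return_dict=True yields
# input_ids together with the attention mask.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)

out = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9)

# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(out[0][prompt_len:], skip_special_tokens=True))
```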
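The usage snippet samples with `do_sample=True, temperature=0.7, top_p=0.9`. As a rough illustration of what those two knobs do, the sketch below applies temperature scaling and then nucleus (top-p) filtering to a toy logit vector. It is a plain-Python approximation for intuition only, not the `transformers` implementation; the function name `top_p_filter` and the five-token vocabulary are hypothetical.

```python
import math


def top_p_filter(logits, temperature=0.7, top_p=0.9):
    """Return {token_index: probability} for the tokens kept by top-p sampling."""
    # Temperature scaling: values below 1.0 sharpen the distribution.
    scaled = [x / temperature for x in logits]

    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Walk tokens from most to least likely, keeping them until the
    # cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize so the kept probabilities sum to 1.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


# Toy logits for a hypothetical 5-token vocabulary (made-up values).
nucleus = top_p_filter([2.0, 1.0, 0.5, -1.0, -3.0])
print(nucleus)
```

Lowering `temperature` or `top_p` shrinks the set of candidate tokens and makes output more deterministic; raising them allows more varied Malayalam phrasing at the cost of occasional drift.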
Training
Fine-tuned with QLoRA (4-bit NF4 base + LoRA adapters); the adapters were then merged back into the base weights to produce a standalone model.
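To make "merged back into the base" concrete: a LoRA adapter stores two small matrices, A and B, and merging folds their product into the base weight as W' = W + (alpha / r) * B @ A. The sketch below shows that arithmetic on a toy 2x2 weight in plain Python. All values, the rank, and the helper names (`matmul`, `merge_lora`) are made up for illustration; the actual hyperparameters and tooling used for this model are not stated in the card. With PEFT, this step is commonly done via `merge_and_unload()`, though whether the author used it is an assumption.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_a, lora_b, alpha, r):
    """Fold a LoRA update into a base weight: W' = W + (alpha / r) * B @ A."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy base weight with a rank-1 adapter (hypothetical values).
alpha, r = 2, 1
w = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[0.5, -0.5]]   # shape (r, in_features)  = (1, 2)
lora_b = [[2.0], [4.0]]  # shape (out_features, r) = (2, 1)

merged = merge_lora(w, lora_a, lora_b, alpha, r)

# The merged weight gives the same output as running base and adapter separately.
x = [[1.0], [2.0]]  # column-vector input
direct = matmul(merged, x)
base_out = matmul(w, x)
adapter_out = matmul(lora_b, matmul(lora_a, x))
two_path = [[b[0] + (alpha / r) * a[0]] for b, a in zip(base_out, adapter_out)]
print(merged, direct, two_path)
```

Because the merged weight reproduces the base-plus-adapter output, the published checkpoint can be loaded like any other model without PEFT installed, which is what makes it standalone.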
License
Apache 2.0, inherited from the base model google/gemma-4-E4B-it. See the Gemma 4 license terms.
Model provider: Navneeth017
Model tree
- Base: google/gemma-4-E4B-it
- Fine-tuned: this model
Modalities
- Input: Text, Image
- Output: Text