Dedicated Endpoints

Run inference for this model on single-tenant GPUs with unmatched speed and reliability at scale.


Get help setting up a custom Dedicated Endpoint.

Talk with our engineers to get a quote for discounted reserved GPU instances.

README

License: apache-2.0

Highlights

  • 🗣️ Malayalam-first: tuned to respond in clear, idiomatic Malayalam.
  • 🧩 Instruction-following: handles questions, explanations and short-form generation in a chat format.
  • ⚙️ Drop-in: a standard transformers checkpoint for text generation; use the chat template as usual.

Usage

This is a Gemma 4 model, so load it with the multimodal class (AutoModelForImageTextToText) and use the chat template for text-only conversations.

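```python
from transformers import AutoModelForImageTextToText, AutoTokenizer
import torch

model_id = "Navneeth017/Malayalam_gemma-4-E2B-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Gemma 4 checkpoints are multimodal, so they load through the image-text-to-text class
model = AutoModelForImageTextToText.from_pretrained(
    model_id, dtype=torch.bfloat16, device_map="auto"
)

messages = [
    # System: "You are a helpful Malayalam assistant."
    {"role": "system", "content": "നിങ്ങൾ സഹായകരമായ ഒരു മലയാളം സഹായിയാണ്."},
    # User: "Explain the Onam celebration in Kerala."
    {"role": "user", "content": "കേരളത്തിലെ ഓണം ആഘോഷത്തെക്കുറിച്ച് വിശദീകരിക്കൂ."},
]

# Render the chat template and tokenize; return_dict=True yields input_ids and attention_mask
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)

out = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9)

# Decode only the newly generated tokens, skipping the prompt
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(out[0][prompt_len:], skip_special_tokens=True))
```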
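The generate call above samples with temperature=0.7 and top_p=0.9. As a rough illustration of what those two settings do to a single next-token distribution, the NumPy sketch below divides the logits by the temperature, applies a softmax, then keeps only the smallest set of highest-probability tokens whose combined mass reaches top_p (nucleus sampling) and renormalizes. The function names `temperature_top_p` and `sample` are hypothetical, and this is a simplified model of the idea, not the internal logits-processor code in transformers.

```python
import numpy as np

def temperature_top_p(logits, temperature=0.7, top_p=0.9):
    """Next-token distribution after temperature scaling and nucleus (top-p) filtering."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    # Numerically stable softmax
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    # Token indices sorted by probability, highest first
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    # Smallest prefix whose total mass reaches top_p (always keeps at least one token)
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

def sample(logits, rng, temperature=0.7, top_p=0.9):
    """Draw one token index from the filtered distribution."""
    p = temperature_top_p(logits, temperature, top_p)
    return int(rng.choice(len(p), p=p))

rng = np.random.default_rng(0)
toy_logits = [2.0, 1.0, 0.0, -1.0]  # hypothetical 4-token vocabulary
print(temperature_top_p(toy_logits))
print(sample(toy_logits, rng))
```

Lower temperatures sharpen the distribution toward the most likely token, while a lower top_p trims more of the low-probability tail; together they tend to keep output coherent while still allowing some variety.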

Training

Fine-tuned with QLoRA (a 4-bit NF4-quantized base model plus LoRA adapters); the adapters were then merged back into the base weights to produce a standalone model.
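Merging is what makes the result standalone. For one linear layer with frozen weight W and a rank-r LoRA adapter (A, B) scaled by alpha / r, the merged weight is W + (alpha / r) * B @ A, so the merged layer reproduces the adapter-augmented output with a single matmul. The NumPy sketch below checks that identity on random matrices; the sizes, rank and alpha are illustrative assumptions, not this model's training hyperparameters (the card does not state them). In practice, PEFT's `merge_and_unload()` applies this fold to every adapted layer, and with QLoRA a common approach is to merge into a higher-precision (for example bf16) copy of the base weights rather than into the 4-bit ones.

```python
import numpy as np

def merge_lora(W, A, B, alpha, r):
    """Fold a LoRA update into a frozen weight: W + (alpha / r) * B @ A."""
    return W + (alpha / r) * (B @ A)

# Illustrative sizes and hyperparameters (assumptions, not this model's actual config)
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 16, 4, 8

W = rng.standard_normal((d_out, d_in))      # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01   # LoRA down-projection (d_in -> r)
B = rng.standard_normal((d_out, r)) * 0.01  # LoRA up-projection (r -> d_out); random to mimic a trained adapter
x = rng.standard_normal(d_in)

# Adapter-style forward pass: base path plus the scaled low-rank path
y_adapter = W @ x + (alpha / r) * (B @ (A @ x))

# Merged forward pass: one dense matmul, no adapter needed at inference time
W_merged = merge_lora(W, A, B, alpha, r)
y_merged = W_merged @ x

print(np.max(np.abs(y_adapter - y_merged)))  # should be at floating-point noise level
```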

License

Apache 2.0, inherited from the base model google/gemma-4-E2B-it. See the Gemma 4 license terms.

Model provider

Navneeth017

Model tree

Base

google/gemma-4-E2B-it

Fine-tuned

this model

Modalities

Input

Text, Image

Output

Text

Pricing

Dedicated Endpoints


Supported Functionality

Model APIs

Dedicated Endpoints

Container
