Navneeth017/Malayalam_gemma-4-E4B-it API & Inference Endpoint

Highlights

🗣️ Malayalam-first: tuned to respond in clear, idiomatic Malayalam.
🧩 Instruction-following: handles questions, explanations, and short-form generation in a chat format.
⚙️ Drop-in: loads with the standard transformers Auto classes; use the chat template as usual.

Usage

This is a Gemma 4 model, so load it with the multimodal class AutoModelForImageTextToText and use the tokenizer's chat template for text-only conversation.

python
from transformers import AutoModelForImageTextToText, AutoTokenizer
import torch

model_id = "Navneeth017/Malayalam_gemma-4-E4B-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, dtype=torch.bfloat16, device_map="auto"
)

messages = [
    # System prompt: "You are a helpful Malayalam assistant."
    {"role": "system", "content": "നിങ്ങൾ സഹായകരമായ ഒരു മലയാളം സഹായിയാണ്."},
    # User prompt: "Explain the Onam celebration in Kerala."
    {"role": "user", "content": "കേരളത്തിലെ ഓണം ആഘോഷത്തെക്കുറിച്ച് വിശദീകരിക്കൂ."},
]
# return_dict=True yields input_ids and attention_mask, ready to unpack into generate()
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

out = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9)
# Decode only the newly generated tokens, skipping the prompt
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(out[0][prompt_len:], skip_special_tokens=True))
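For multi-turn conversations, the same chat-template call can be wrapped in a small helper. The following is a minimal sketch, not part of the original model card: the names build_messages and chat are hypothetical, and it assumes the model's chat template accepts the usual system / user / assistant roles and that model and tokenizer are the objects loaded in the snippet above.

```python
def build_messages(system_prompt, history, user_text):
    """Assemble a chat-template message list.

    `history` is a list of (user_text, assistant_text) pairs from earlier turns.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for user_turn, assistant_turn in history:
        messages.append({"role": "user", "content": user_turn})
        messages.append({"role": "assistant", "content": assistant_turn})
    messages.append({"role": "user", "content": user_text})
    return messages


def chat(model, tokenizer, messages, **gen_kwargs):
    """Generate one reply for `messages` (hypothetical helper).

    Extra keyword arguments such as max_new_tokens or temperature
    are forwarded unchanged to model.generate().
    """
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, tokenize=True,
        return_dict=True, return_tensors="pt",
    ).to(model.device)
    out = model.generate(**inputs, **gen_kwargs)
    # Slice off the prompt so only the new tokens are decoded
    prompt_len = inputs["input_ids"].shape[-1]
    return tokenizer.decode(out[0][prompt_len:], skip_special_tokens=True)
```

A typical call would be chat(model, tokenizer, build_messages(system_prompt, history, question), max_new_tokens=256), appending each (question, reply) pair to history afterwards so the next turn sees the full conversation.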

Training

Fine-tuned with QLoRA (LoRA adapters trained on top of a 4-bit NF4-quantized base); the adapters were then merged back into the base model to produce a standalone checkpoint.
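To make the merge step concrete, here is a toy illustration of what merging a LoRA adapter means arithmetically: the low-rank update (alpha / r) · B · A is added to the frozen weight W, so inference no longer needs the adapter. This is a sketch under stated assumptions, not the author's training code: the matrices, rank, and alpha below are made-up values, and the actual hyperparameters and merge procedure are not given in this card (with the PEFT library this step is commonly done via merge_and_unload()).

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(W, A, B, alpha, r):
    """Return W + (alpha / r) * (B @ A), the merged weight matrix."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


# Toy values (assumptions): W is 2x3 (out x in), rank r = 1,
# so B is 2x1 and A is 1x3.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
A = [[1.0, 2.0, 3.0]]
B = [[0.5], [-1.0]]
alpha, r = 2, 1

W_merged = merge_lora(W, A, B, alpha, r)

# The merged matrix gives the same output as base + adapter applied separately.
x = [[1.0], [1.0], [1.0]]
base_plus_adapter = [
    [wx[0] + (alpha / r) * bax[0]]
    for wx, bax in zip(matmul(W, x), matmul(B, matmul(A, x)))
]
assert matmul(W_merged, x) == base_plus_adapter
```

Because the merged weights have the same shape as the original ones, the resulting checkpoint loads like any other model, which is why no adapter files are needed at inference time.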

License

Apache 2.0, inherited from the base model google/gemma-4-E4B-it. See the Gemma 4 license terms.
