neuracoder/neura-fa-en-1.9b API & Inference Endpoint

✨ Key Features

Truly bilingual (Persian + English) – Understands and generates both languages fluently, with natural code‑switching (e.g., “بگو hello world به انگلیسی”).
Ultra‑lightweight – Only 1.9B parameters, ~3.8 GB of weights in FP16 / ~1.9 GB in INT8. The INT8 build runs on 4 GB RAM devices.
Offline & private – No internet connection or API key needed after download.
Fast inference – 40–60 tok/s on T4 GPU, 8–12 tok/s on Intel i7 CPU, 2–3 tok/s on Raspberry Pi 4.
Long context – 32,768 tokens (≈24,000 Persian words), enough for long conversations or short stories.
Iranian‑made, Apache 2.0 – Free for commercial and personal use, with full transparency.
Research‑friendly – Released as a research model to help the Persian AI community fine‑tune, quantise, or build upon it.

🎯 Suitable Use Cases

Daily chit‑chat – Casual conversation, small talk, jokes, and friendly assistant tasks.
Simple Q&A – Answering general knowledge questions (e.g., “پایتخت فرانسه کجاست؟” / “What is the capital of France?”).
Informal translation – Translating short sentences or phrases between Persian and English (not professional/legal grade).
Light summarisation – Summarising a paragraph or a short article in Persian or English.
Brainstorming & writing help – Generating ideas, rewriting a sentence, fixing simple grammar.
Educational tool for language learning – Practicing Persian or English conversations (basic to intermediate level).
Offline assistant for edge devices – Embedded in chatbots, local web UIs, or Telegram bots (simple integration).

❌ Not suitable for:

Code generation, debugging, or programming assistance.
Complex mathematical reasoning or multi‑step logic.
Professional translation (e.g., legal, medical).
Long document processing (>32k tokens).
Any task requiring up‑to‑date information after mid‑2024.

📊 Evaluation & Performance Metrics

We evaluated Neura‑FA‑EN‑1.9B on standard Persian and English benchmarks for conversational models.

Dataset	Metric	Score	Note
ParsiMMLU (5‑shot)	Accuracy	48.7%	General knowledge in Persian
PersianQA	Exact Match	56.2%	Reading comprehension (questions in Persian)
MMLU (English, 5‑shot)	Accuracy	51.3%	General knowledge in English
XNLI (fa)	Accuracy	62.1%	Natural language inference (Persian)
XNLI (en)	Accuracy	68.5%	Natural language inference (English)
Perplexity (fa‑wikitext)	PPL	18.3	Fluency on Persian texts

Interpretation: The model performs on par with much larger multilingual models (e.g., XLM‑R 3B) on Persian tasks while being 40% smaller. For English, it stays competitive with dedicated 1.5B models.

📈 Comparison with Similar‑Sized Models

Model	Params	Persian MMLU	English MMLU	VRAM (FP16)	Speed (tok/s, T4)	License
Neura-FA-EN-1.9B	1.9B	48.7%	51.3%	~3.8 GB	48	Apache 2.0
Arian‑2B (Persian)	2.0B	44.2%	28.7%	~4.0 GB	45	Apache 2.0
Phi‑2 (2.7B, English‑only)	2.7B	N/A	57.8%	~5.4 GB	40	MIT
Gemma‑2B (English‑only)	2.0B	N/A	52.6%	~4.0 GB	52	Gemma

Key point: Of the models compared above, Neura‑FA‑EN‑1.9B is the only one with competitive scores on both Persian and English MMLU.

🧪 Technical Details & Training Process

Neura‑FA‑EN‑1.9B uses the Qwen2 architecture but none of its weights: the model is not derived from any existing checkpoint and was trained from scratch by Neuracoder.

Architecture

Layers: 28 decoder‑only layers.
Attention: Grouped Query Attention (GQA) – 12 query heads, 2 key/value heads.
Activation: SwiGLU.
Context length: 32,768 tokens.
Embedding size: 2048.
Intermediate size: 5632.
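
To make the grouped-query layout above concrete, here is a minimal sketch of how 12 query heads can share 2 key/value heads. The contiguous grouping (query heads 0 to 5 use KV head 0, heads 6 to 11 use KV head 1) is the common GQA convention and is assumed here, not taken from the model's code; the function name is illustrative.

```python
NUM_QUERY_HEADS = 12  # from the architecture list above
NUM_KV_HEADS = 2      # from the architecture list above


def kv_head_for_query_head(q_head: int,
                           num_q: int = NUM_QUERY_HEADS,
                           num_kv: int = NUM_KV_HEADS) -> int:
    """Return the index of the key/value head shared by a given query head.

    Assumes contiguous grouping, the common GQA convention.
    """
    if num_q % num_kv != 0:
        raise ValueError("query heads must divide evenly into KV groups")
    if not 0 <= q_head < num_q:
        raise ValueError("query head index out of range")
    group_size = num_q // num_kv  # 6 query heads per KV head here
    return q_head // group_size


mapping = [kv_head_for_query_head(q) for q in range(NUM_QUERY_HEADS)]
print(mapping)

# Only the KV heads are cached, so the cache holds 2 heads per layer instead of 12
print(f"KV cache reduction vs. full multi-head attention: {NUM_QUERY_HEADS // NUM_KV_HEADS}x")
```

With equal head dimensions, caching 2 heads instead of 12 is what makes long contexts cheaper in memory on small devices.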

Pre‑training

Data: 350 billion tokens – 60% Persian (web texts, books, news, forums), 35% English (Common Crawl, books, Wikipedia), 5% code (to preserve basic formatting).
Duration: 18 days on 8× NVIDIA A100 (80GB) using DeepSpeed ZeRO‑3.
Hyperparameters: AdamW (lr=3e-4), cosine decay, warmup 2000 steps, batch size 512, seq len 2048 (later extended to 8192 with RoPE scaling).
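
The schedule described above can be sketched in plain Python. Several details are assumptions rather than facts from the training run: the warmup is taken to be linear, the decay floor is taken to be zero, the batch size of 512 is read as a count of sequences, and the step count is derived from the 2048-token phase only.

```python
import math

PEAK_LR = 3e-4          # from the hyperparameters above
WARMUP_STEPS = 2000     # from the hyperparameters above
TOTAL_TOKENS = 350e9    # from the data description above
BATCH_SIZE = 512        # assumed to be counted in sequences
SEQ_LEN = 2048          # initial sequence length; the later 8192 phase is ignored

# Assumption: one optimizer step consumes BATCH_SIZE * SEQ_LEN tokens
TOTAL_STEPS = int(TOTAL_TOKENS // (BATCH_SIZE * SEQ_LEN))


def learning_rate(step: int, min_lr: float = 0.0) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to min_lr (assumed 0)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    progress = min(progress, 1.0)  # clamp in case step overshoots TOTAL_STEPS
    return min_lr + 0.5 * (PEAK_LR - min_lr) * (1.0 + math.cos(math.pi * progress))


print(f"estimated optimizer steps: {TOTAL_STEPS}")
for step in (0, WARMUP_STEPS - 1, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(f"step {step:>7}: lr = {learning_rate(step):.2e}")
```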

Supervised Fine‑Tuning (SFT)

Data: 150,000 conversation pairs in Persian and English:
- 80,000 from public Persian chat datasets (ParsiNLU, FaChat).
- 50,000 from translated and cleaned ShareGPT data.
- 20,000 hand‑written by the Neuracoder team for natural code‑switching and cultural relevance.
Format: {"system": "You are a helpful assistant.", "user": "...", "assistant": "..."}
Hyperparameters: 3 epochs, lr=1e-5, batch size 128, LoRA (rank=32) then full fine‑tune last 6 layers.
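
As an illustration of the record format above, the following sketch converts one SFT record into the role/content message list that `tokenizer.apply_chat_template` takes in the usage examples. The helper name and the sample record are hypothetical, and treating the system prompt as optional is an assumption.

```python
import json


def record_to_messages(record: dict) -> list[dict]:
    """Convert one SFT record into a role/content message list."""
    messages = []
    # The system prompt is treated as optional here (an assumption)
    if record.get("system"):
        messages.append({"role": "system", "content": record["system"]})
    messages.append({"role": "user", "content": record["user"]})
    messages.append({"role": "assistant", "content": record["assistant"]})
    return messages


# Hypothetical record, written for illustration only
line = json.dumps({
    "system": "You are a helpful assistant.",
    "user": "سلام! حالت چطوره؟",
    "assistant": "سلام! خوبم، ممنون. چطور می‌تونم کمکت کنم؟",
}, ensure_ascii=False)

messages = record_to_messages(json.loads(line))
for m in messages:
    print(m["role"], "->", m["content"])
```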

Validation

The model was evaluated every 500 steps on held‑out Persian and English test sets.
The final checkpoint was chosen for the lowest perplexity on the Persian validation set and the highest MMLU score.
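
For reference, perplexity is the exponential of the mean negative log-likelihood per token. The sketch below shows the calculation on made-up log-probabilities; it is not the evaluation script used for this model.

```python
import math


def perplexity(token_log_probs: list[float]) -> float:
    """Perplexity = exp of the mean negative log-likelihood per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical natural-log probabilities for a 4-token sequence
log_probs = [math.log(0.25), math.log(0.5), math.log(0.125), math.log(0.25)]
print(round(perplexity(log_probs), 3))  # 4.0
```

A lower value means the model assigns higher probability to the held-out text, which is why the lowest-perplexity checkpoint is preferred.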

⚡ Inference Speed & Hardware Requirements

Hardware	Weight format	Avg tokens/sec (gen 256 tokens)	Memory usage
NVIDIA A100 (40GB)	FP16	78 tok/s	4.1 GB
NVIDIA T4 (16GB)	FP16	48 tok/s	3.9 GB
NVIDIA T4 (16GB)	INT8	55 tok/s	2.3 GB
NVIDIA GTX 1060 (6GB)	FP16	28 tok/s	3.9 GB
CPU (Intel i7-12700K)	INT8	9 tok/s	2.1 GB
Raspberry Pi 4 (4GB)	INT8 (ONNX)	2–3 tok/s	1.6 GB
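
How the figures above were measured is not stated. One way to take a comparable reading on your own hardware is sketched below; the helper and its parameters are hypothetical, and a stand-in generator is used so the snippet runs without a model.

```python
import time
from typing import Callable


def tokens_per_second(generate: Callable[[int], int],
                      new_tokens: int = 256,
                      warmup_runs: int = 1,
                      timed_runs: int = 3) -> float:
    """Average generated tokens per second over several timed runs.

    `generate(n)` must produce up to n new tokens and return how many
    it actually produced.
    """
    for _ in range(warmup_runs):  # exclude one-off start-up cost
        generate(new_tokens)
    total_tokens = 0
    start = time.perf_counter()
    for _ in range(timed_runs):
        total_tokens += generate(new_tokens)
    elapsed = time.perf_counter() - start
    return total_tokens / elapsed


# Stand-in generator so the sketch runs without a model: about 1 ms per token
def fake_generate(n: int) -> int:
    time.sleep(n * 0.001)
    return n


print(f"{tokens_per_second(fake_generate, new_tokens=50):.0f} tok/s")
```

With the real model, `generate` would wrap `model.generate(**inputs, max_new_tokens=n)` and return the number of new tokens in the output. On a GPU, calling `torch.cuda.synchronize()` before reading the clock gives more reliable timings.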

Recommendation: Use FP16 on any GPU with 6+ GB of VRAM. For CPU or low‑memory devices, use the INT8 quantised version (available separately).
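
As a rough cross-check when choosing a weight format, the memory taken by the weights alone is the parameter count times the bytes per parameter. The sketch below uses the 1.9B figure from this README; the helper name is illustrative.

```python
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, fmt: str) -> float:
    """Memory for the weights alone, in GB (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[fmt] / 1e9


NUM_PARAMS = 1.9e9  # parameter count stated in this README

for fmt in ("fp16", "int8", "int4"):
    print(f"{fmt}: ~{weight_memory_gb(NUM_PARAMS, fmt):.2f} GB of weights")
```

Runtime memory also includes activations, the KV cache, and framework overhead, so measured usage will typically differ from this estimate.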

🚀 Usage Guide

Installation

pip install transformers torch accelerate sentencepiece

Example 1: Basic Persian conversation

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "neuracoder/neura-fa-en-1.9b"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto"
)

prompt = "به نظرت بهترین راه برای یادگیری زبان انگلیسی چیه؟"
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)

Example 2: Mixed Persian‑English query

prompt = "یه جمله انگلیسی بنویس که معنی 'خورشید می‌تابد' رو برسونه"
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# do_sample=True is required for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=100, temperature=0.6, do_sample=True)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))

Example 3: Simple summarisation (English)

article = """The Persian cat is a long-haired breed characterized by its round face and short muzzle. 
It is one of the oldest cat breeds, originating from Persia (modern-day Iran)."""
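prompt = f"Summarise the following text in one sentence:\n\n{article}"
# Wrap the prompt in the chat template, as in Examples 1 and 2
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# do_sample=True is required for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=80, temperature=0.3, do_sample=True)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))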

Download the model directly

git lfs install
git clone https://huggingface.co/neuracoder/neura-fa-en-1.9b

Or via Python:

from huggingface_hub import snapshot_download
snapshot_download(repo_id="neuracoder/neura-fa-en-1.9b", local_dir="./neura-fa-en-1.9b")

⚠️ Limitations

Not a code model – Cannot write or debug programs reliably.
Not a mathematical engine – Struggles with multi‑step arithmetic or symbolic reasoning.
Knowledge cutoff – Mid‑2024. Unaware of very recent events or new APIs.
Persian dialect – Trained on standard Persian (Farsi); may not understand Dari or Tajik well.
Formal translation – Not suitable for legal, medical, or highly technical documents.
Hallucinations – Like all LLMs, may produce plausible but incorrect facts.
Context length – While 32k is generous, very long documents may degrade attention quality.
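
One way to stay inside the context limit in a long-running chat is to drop the oldest turns before each request. The sketch below is one possible approach, not part of the model's tooling: the helper is hypothetical, it ignores the extra tokens added by the chat template, and it uses a whitespace word count as a stand-in for the real tokenizer.

```python
from typing import Callable

MAX_CONTEXT_TOKENS = 32_768  # context length stated in this README


def trim_history(messages: list[dict],
                 count_tokens: Callable[[str], int],
                 reserve_for_reply: int = 256,
                 max_tokens: int = MAX_CONTEXT_TOKENS) -> list[dict]:
    """Drop the oldest turns until the conversation fits the token budget.

    The most recent message is always kept, even if it alone exceeds the budget.
    """
    budget = max_tokens - reserve_for_reply
    kept: list[dict] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(message["content"])
        if kept and used + cost > budget:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))


# Whitespace word count as a crude stand-in for the real tokenizer
def rough_count(text: str) -> int:
    return len(text.split())


history = [
    {"role": "user", "content": "one two three four"},
    {"role": "assistant", "content": "five six seven"},
    {"role": "user", "content": "eight nine"},
]
print(trim_history(history, rough_count, reserve_for_reply=0, max_tokens=5))
```

With the real tokenizer, `count_tokens` could be `lambda s: len(tokenizer(s).input_ids)`.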

🗺️ Roadmap

Q1 2026: Release of quantised versions (INT4, INT8, GGUF) for even lighter deployment.
Q2 2026: Neura‑FA‑EN‑3B – 3.5B parameters, expanded Persian vocabulary, improved reasoning.
Q3 2026: Fine‑tuned variant for formal translation (Persian ↔ English).
Ongoing: Open‑source training datasets (Persian conversational data) and evaluation benchmarks.

🤝 Contribute

This model is free and open‑source. You can help by:

Reporting bugs or suggesting improvements in the Discussions tab.
Providing high‑quality Persian conversational data (anonymised) to improve future versions.
Building tools (Gradio UI, Ollama modelfile, Telegram bot) using this model.
Financial sponsorship – Contact the Neuracoder team.
Spreading the word – Every user helps the Persian AI community grow.

📜 License

Apache License 2.0 – You may freely use, modify, distribute, and sell this model as part of your product, provided you include the original license and copyright notice and mark any files you modified as changed.

📞 Contact

Website: neuracoder.net (coming soon)
Email: info@neuracoder.net
Telegram: @Neuracoder
GitHub: github.com/neura_coder

Made with ❤️ in Iran – the neuracoder team
Democratising conversational AI for Persian speakers: fast, local, and free for everyone.
