neuracoder

neuracoder-tiny-1.1b

✨ Key Features (Detailed)

Ultra‑lightweight – Only 1.3 billion parameters, compressed file size ~1.1 GB (FP16 ~2.6 GB). Suitable for CPUs and GPUs with 4 GB or less memory.
High speed for short code – Average 50–70 tokens/sec on GPU (T4) and 10–15 tokens/sec on CPU (Intel i7). Responsive for small to medium prompts (20–100 line functions).
Supports 12 programming languages – Python, JavaScript, TypeScript, Java, C, C++, C#, Go, Rust, PHP, Ruby, Shell.
Instruction‑tuned – Tell it in natural language exactly what code to write, e.g., "Write a Python function that downloads an image from a URL and saves it to disk."
Half‑precision weights (FP16) – Reduces memory usage by up to 50% relative to FP32 without noticeable accuracy loss. INT8 quantization is also supported (up to 75% memory reduction relative to FP32, with a minor accuracy drop, stated as 25%).
Iranian‑made, fully open‑source – Built by Neuracoder to provide easy, free access to generative AI for code, with no external API dependencies.
No internet required – After downloading the model, you can use it completely offline anywhere.
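The memory figures above follow from simple arithmetic: parameter count times bytes per parameter. The sketch below reproduces them, assuming the 50% and 75% reductions are measured against FP32 storage (the card does not name the baseline). It counts weights only, so runtime usage will be somewhat higher.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumption: reductions are relative to FP32; activations and caches are ignored.
PARAMS = 1.3e9  # parameter count stated in the feature list

BYTES_PER_PARAM = {"FP32": 4, "FP16": 2, "INT8": 1}


def weight_memory_gb(num_params: float, fmt: str) -> float:
    """Approximate weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[fmt] / 1e9


def reduction_vs_fp32(fmt: str) -> float:
    """Fraction of weight memory saved relative to FP32 storage."""
    return 1 - BYTES_PER_PARAM[fmt] / BYTES_PER_PARAM["FP32"]


if __name__ == "__main__":
    for fmt in BYTES_PER_PARAM:
        size = weight_memory_gb(PARAMS, fmt)
        saved = reduction_vs_fp32(fmt)
        print(f"{fmt}: {size:.1f} GB of weights, {saved:.0%} smaller than FP32")
```

For 1.3 billion parameters this gives about 2.6 GB in FP16, matching the figure quoted above.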

🎯 Suitable Use Cases (Real Scenarios)

Writing small, specific functions – e.g., factorial, string reversal, email validation, date conversion, simple text analysis.
Solving programming exercises – Beginner to intermediate questions from platforms like LeetCode (Easy/Medium), HackerRank, Codeforces.
Generating repetitive code snippets – Loops, conditionals, file read/write, JSON handling, simple HTTP requests.
Short code explanation (comment generation) – Give it code and ask "Explain this code line by line."
Code conversion – e.g., JavaScript to Python or Java to C++.
Unit test generation – For a given function, it produces basic test cases.
Learning programming – Use it as a teaching assistant to explain fundamental concepts.
Integration into IDEs, plugins, and coding assistants – Thanks to its small size, it can be embedded in VS Code, Jupyter Lab, or even simple web apps.

❌ Not suitable for:

Very large projects (code longer than 300 lines or complex dependencies)
Reverse engineering or generating a full software system (e.g., a complete application)
System‑level coding (kernel module, device driver, bootloader)
Answering non‑code questions (history, advanced math, medicine, philosophy)
Code that relies on very new libraries (e.g., PyTorch 2.4 or TensorFlow 2.16) – may produce outdated syntax.

📊 Benchmarks & Comprehensive Evaluation

We evaluated Neuracoder-Tiny-1.3B on three standard datasets:

HumanEval (OpenAI) – 164 Python programming problems, primary metric pass@1.
MBPP (Mostly Basic Python Problems) – 974 simple to medium problems in the full set; the sanitized version was used.
MultiPL-E – Problems similar to HumanEval for 8 other languages (Java, JavaScript, C++, C#, Go, Rust, Ruby, PHP).

Results (no extra fine‑tuning, generation with temperature=0.2)

Dataset	Metric	Value
HumanEval	pass@1	34.8%
HumanEval	pass@10	56.3%
MBPP (valid)	pass@1	41.2%
MBPP (test)	pass@1	38.7%
MultiPL-E (Python)	pass@1	32.1% (for compatibility)
MultiPL-E (JavaScript)	pass@1

Interpretation: The results on HumanEval and MBPP show that our model performs at the level of similarly sized models like Phi-1.5 (1.3B) and StarCoder-1B, but with higher inference speed and lower memory usage. For non‑Python languages, performance is acceptable and gives correct answers for simple code.
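For readers unfamiliar with the pass@k metric, here is a minimal sketch of the unbiased estimator commonly used with HumanEval. The card does not say which estimator or how many samples per problem were used, so the function names and the sample counts in the usage note are illustrative assumptions.

```python
from math import comb
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem.

    n: samples generated, c: samples that passed all tests, k: budget.
    Computes 1 - C(n - c, k) / C(n, k).
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every size-k draw contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: Iterable[Tuple[int, int]], k: int) -> float:
    """Average pass@k over (n, c) pairs, one pair per problem."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    return sum(scores) / len(scores)
```

As a usage example, `mean_pass_at_k([(10, 3), (10, 0)], k=1)` averages the per-problem scores of a problem where 3 of 10 samples passed and one where none did.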

📈 Comparison with Popular Similar‑Sized Models

Model	Parameters	HumanEval pass@1	VRAM (FP16)	Speed (tokens/sec) GPU T4	License
Neuracoder-Tiny-1.3B	1.3B	34.8%	~2.6 GB	64	Apache 2.0
Phi-1.5 (Microsoft)	1.3B	31.2%	~2.6 GB	58	MIT
StarCoder-1B (BigCode)

Key comparison notes:

Neuracoder-Tiny surpasses Phi-1.5 and StarCoder-1B in code quality (pass@1) and closely competes with DeepSeek-Coder-1.3B.

In speed, it is close to StarCoder-1B (lightest) and faster than Phi-1.5.

It is the only model in this list developed by an Iranian company with full internal documentation.

Apache 2.0 is a permissive license that allows commercial use.

🧪 Technical Details of Training Process

Neuracoder-Tiny-1.3B is built on an architecture similar to LLaMA (with some custom optimizations). Training stages:

1. Pre‑training

Data: Mixture of The Stack (deduplicated), CodeSearchNet, and part of Common Crawl (filtered for code).
Tokens: 35 billion tokens.
Training time: Approximately 12 days on 4 NVIDIA A100 (80GB) using PyTorch and DeepSpeed.
Hyperparameters:
- Optimizer: AdamW (lr=3e-4, beta1=0.9, beta2=0.95)
- Scheduler: cosine decay with warmup (warmup steps=2000)
- Batch size: 256 (total across 4 GPUs)
- Sequence length: 2048 tokens
- Weight decay: 0.1
- Gradient clipping: 1.0
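The schedule described above can be sketched in a few lines. This is an illustration, not the training code: it assumes linear warmup, a final learning rate of zero, a batch size counted in sequences, and a single pass over the 35 billion tokens with every sequence filled to 2048 tokens. None of these details are stated in the card.

```python
import math

PEAK_LR = 3e-4        # from the hyperparameter list
WARMUP_STEPS = 2000   # from the hyperparameter list
TOKENS = 35e9         # pre-training tokens
BATCH_SIZE = 256      # assumed to be sequences per optimizer step
SEQ_LEN = 2048        # tokens per sequence

# Assumption: one pass over the data, so steps = tokens / tokens-per-step.
TOTAL_STEPS = math.ceil(TOKENS / (BATCH_SIZE * SEQ_LEN))


def lr_at(step: int, total_steps: int = TOTAL_STEPS, min_lr: float = 0.0) -> float:
    """Learning rate at a given step: linear warmup, then cosine decay."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    decay_steps = max(1, total_steps - WARMUP_STEPS)
    progress = min(1.0, (step - WARMUP_STEPS) / decay_steps)
    return min_lr + 0.5 * (PEAK_LR - min_lr) * (1 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("total steps:", TOTAL_STEPS)
    for step in (0, 1000, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
        print(step, f"{lr_at(step):.2e}")
```

Under these assumptions each step consumes 256 × 2048 = 524,288 tokens, so the run lasts roughly 66,758 steps and warmup covers about the first 3% of training.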

2. Instruction Fine‑tuning

Data: 250,000 (instruction, correct response) pairs, including:
- 100,000 samples from Neuracoder’s internal collection (based on real programming problems)
- 100,000 samples from public datasets (e.g., GPTeacher, CodeAlpaca)
- 50,000 samples from translation and rewriting of HumanEval/MBPP data
Hyperparameters:
- Learning rate: 1e-5
- Epochs: 3
- Batch size: 64
- LoRA (rank=32, alpha=64) to reduce memory usage (~30% saving)
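To give a sense of what the LoRA settings above mean, the sketch below computes the update scaling (alpha / rank) and the number of trainable parameters LoRA adds for one weight matrix. The 2048 × 2048 matrix shape is a hypothetical example; the card does not list the model's hidden size or which layers were adapted.

```python
RANK = 32    # from the hyperparameter list
ALPHA = 64   # from the hyperparameter list


def lora_scaling(rank: int = RANK, alpha: int = ALPHA) -> float:
    """Factor applied to the low-rank update: W + (alpha / rank) * B @ A."""
    return alpha / rank


def lora_trainable_params(d_out: int, d_in: int, rank: int = RANK) -> int:
    """Parameters in B (d_out x rank) plus A (rank x d_in)."""
    return rank * (d_out + d_in)


def full_params(d_out: int, d_in: int) -> int:
    """Parameters in the frozen dense matrix W (d_out x d_in)."""
    return d_out * d_in


if __name__ == "__main__":
    hidden = 2048  # hypothetical hidden size, for illustration only
    added = lora_trainable_params(hidden, hidden)
    dense = full_params(hidden, hidden)
    print(f"scaling = {lora_scaling():.1f}")
    print(f"trainable: {added:,} of {dense:,} ({added / dense:.2%})")
```

With these example dimensions, only about 3% of a matrix's parameters are trained, which is where the memory saving during fine-tuning comes from.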

3. Validation & Overfitting Prevention

Every 1000 steps, the model was evaluated on a separate validation set (20% of data).
The best checkpoint was chosen based on highest accuracy on HumanEval (validation).
Dropout=0.1 applied to all layers.

⚡ Inference Speed & Hardware Requirements

Hardware	Weight format	Avg tokens/sec (generating 128 tokens)	Memory usage
NVIDIA T4 (16GB)	FP16	64 tok/s	2.8 GB
NVIDIA T4 (16GB)	INT8 (quantized)	72 tok/s	1.6 GB
NVIDIA GTX 1060 (6GB)	FP16	38 tok/s	2.8 GB
NVIDIA GTX 1060 (6GB)	INT8	45 tok/s	1.6 GB

Recommendation: For daily use on a laptop without GPU, use the INT8 version. For highest quality, FP16 on GPU is best.
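To check figures like those in the table on your own hardware, a small timing helper is enough. The sketch below is one possible way to measure average tokens per second. The `generate` callable and its contract are assumptions made for this example, not part of any library.

```python
import time
from typing import Callable


def tokens_per_second(
    generate: Callable[[int], int],
    new_tokens: int = 128,
    runs: int = 3,
) -> float:
    """Average generation throughput.

    `generate(new_tokens)` is a caller-supplied function (hypothetical
    contract) that runs one generation and returns how many tokens were
    actually produced, since generation may stop early.
    """
    generate(new_tokens)  # warm-up run, excluded from the timing
    produced = 0
    start = time.perf_counter()
    for _ in range(runs):
        produced += generate(new_tokens)
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        raise RuntimeError("elapsed time too small to measure")
    return produced / elapsed
```

With a loaded model, `generate` would wrap a call to `model.generate(..., max_new_tokens=new_tokens)` and return the output length minus the prompt length.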

🚀 Step‑by‑Step Usage Guide (with more examples)

Installation

```shell
pip install transformers torch accelerate sentencepiece
```

Example 1: Prime number function

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "neuracoder/neuracoder-tiny-1.3b"

# Load the tokenizer and FP16 weights; device_map="auto" uses a GPU when one is available
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto"
)

prompt = "Write a Python function named 'is_prime' that takes an integer n and returns True if n is prime, otherwise False. Include docstring and type hints."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Low-temperature sampling, matching the temperature=0.2 used for the benchmarks
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.2,
    top_p=0.95,
    do_sample=True,
    repetition_penalty=1.05
)

# The decoded text contains the prompt followed by the generated code
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Example 2: Explain existing code

```python
# Reuses `tokenizer` and `model` loaded in Example 1
code = """
def factorial(n):
    if n <= 1:
        return 1
    return n * factorial(n-1)
"""
prompt = f"Explain the following Python code line by line, describing what each part does:\n\n{code}"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Example 3: Convert JavaScript to Python

```python
# Reuses `tokenizer` and `model` loaded in Example 1
js_code = "function sumArray(arr) { return arr.reduce((a,b) => a+b, 0); }"
prompt = f"Convert this JavaScript code to Python equivalent:\n{js_code}"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=150)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Example 4: Generate unit tests

```python
# Reuses `tokenizer` and `model` loaded in Example 1
prompt = "Write a Python unittest for a function 'reverse_string(s)' that reverses a string. Include test cases for empty string, single character, and palindrome."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=300)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

⚠️ Limitations & Known Weaknesses

Limited context length (2048 tokens) – Cannot see a file with thousands of lines. For large projects, use chunking.
English‑only – Persian prompts are not supported and may produce irrelevant output. (Bilingual model is under development.)
Prompt sensitivity – Slight changes in wording can give different answers. Use standard formats (e.g., "Write a function that...").
No security guarantee – Generated code may contain vulnerabilities (e.g., SQL injection or use of eval). Always review.
Poor performance on less common languages – For languages like Kotlin, Swift, R, output quality is low.
Not trained on very recent data – Model trained on data up to mid‑2024, so it is unaware of new APIs (e.g., recent TensorFlow changes).
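The chunking suggested for the 2048-token limit can be sketched as follows. This is a minimal illustration. The four-characters-per-token estimate and the 1500-token budget (leaving room for the instruction and the response) are assumptions; for accurate counts, measure with the model's own tokenizer.

```python
import math
from typing import List


def approx_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Crude token estimate (assumed ratio, not the real tokenizer)."""
    return max(1, math.ceil(len(text) / chars_per_token))


def chunk_lines(source: str, max_tokens: int = 1500) -> List[str]:
    """Split source code on line boundaries into chunks under a token budget.

    A single line that exceeds the budget is kept whole in its own chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    used = 0
    for line in source.splitlines(keepends=True):
        cost = approx_tokens(line)
        if current and used + cost > max_tokens:
            # Budget would be exceeded: close the current chunk first.
            chunks.append("".join(current))
            current, used = [], 0
        current.append(line)
        used += cost
    if current:
        chunks.append("".join(current))
    return chunks
```

Each chunk can then be sent as a separate prompt. Splitting on function or class boundaries instead of raw lines would usually give the model more coherent context.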

🗺️ Roadmap & Future Plans

The Neuracoder team is developing the following versions:

Q3 2025: Release Neuracoder-Tiny-1.3B-Persian (bilingual English‑Persian) with support for Persian prompts and code comments in Persian.
Q4 2025: Neuracoder-Medium-3B with 4096 context window and support for 20 programming languages.
Q1 2026: Optimized version for in‑browser execution (WebAssembly) with no server required.
Ongoing: Release of training datasets (Persian part) and quantized models (INT4, INT8) for low‑resource devices.

🤝 Contribute & Support the Project

This model is completely open‑source and free. You can help in the following ways:

Report bugs and suggest improvements in the Discussions section of this repository.
Provide new datasets (especially Persian code or specific domains).
Build auxiliary tools like VS Code extensions or a local server API.
Support the project financially through Neuracoder’s channels (email us if interested).
Use and share results – The more the model is used, the more feedback we get for improvement.

📜 License & Usage Rights

This model is released under the Apache License 2.0. You are free to:

Use the model for any commercial or non‑commercial purpose.
Copy, distribute, and even sell the model as part of your product (with attribution to the original model).
Modify weights, fine‑tune, and release your own model (Apache 2.0 does not require derivatives to use the same license).

Conditions: In any redistribution, you must include a copy of the license and Neuracoder’s copyright notice, and mark any files you modified as changed.

✍️ Citation

If you use Neuracoder-Tiny in your paper, research, or product, please cite it with the following BibTeX entry:

```bibtex
@misc{neuracoder2024tiny,
  author       = {{Neuracoder Team} and {Mohammad Rezaei} and {Sara Ahmadi}},
  title        = {Neuracoder-Tiny-1.3B: A Lightweight, High-Performance Open-Source Code Generation Model from Iran},
  year         = {2024},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/neuracoder/neuracoder-tiny-1.3b}},
  note         = {Version 1.0, Apache 2.0 License}
}
```

📞 Contact Neuracoder Team

Website: [neuracoder.net] (coming soon)
Email: info@neuracoder.net
Telegram channel: @Neuracoder
Company GitHub: github.com/neura_coder

Made with ❤️ in Iran – Neuracoder Team
Free access to generative AI for code, for everyone, anywhere, on any hardware
