ertghiu256

gemma-4-e2b-gemini-opus-reasoning-distill-lora

Deploy Dedicated

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

🌟 Overview

The gemma-4-e2b-gemini-opus-reasoning-distill model is a specialized variant of the Gemma 4 architecture. It has been fine-tuned specifically to enhance the logical structure and rigidity of its reasoning capabilities, particularly in technical domains like mathematics and coding.

This training process focused on refining how the model approaches problem-solving, aiming to instill a systematic, traceable approach to generating solutions. The primary goal is not to change the core conversational style of Gemma 4, but rather to make its internal thought processes more organized and deterministic.

🧠 Training Methodology

This model was trained using a focused distillation process on high-quality reasoning examples extracted from various large language models (LLMs). This approach aimed to transfer structured thinking patterns into the Gemma 4 architecture.

Core Objectives:

Structural Rigidity: To encourage the model to follow systematic, step-by-step procedures when tackling problems.
Traceability: To enable the generation of explicit thought processes (using tags like <\|think\|>) that clearly map out the logical progression from problem statement to final solution.
Domain Focus: To improve performance in mathematical problem-solving and code logic by exposing the model to high-quality reasoning patterns in these specific fields.

Training Datasets:

Table with columns: Dataset, Purpose, Size/Focus
Dataset	Purpose	Size/Focus
`angrygiraffe/claude-opus-4.6-4.7-reasoning-8.7k`	High-level logical deduction examples.	8.7k examples
`Jackrong/GLM-5.1-Reasoning-1M-Cleaned`	Large-scale reasoning patterns and structured output generation.	1 Million examples
`Roman1111111/gemini-3.1-pro-hard-high-reasoning`	Specialized, challenging reasoning scenarios in technical domains.	High-quality specialized dataset
`ertghiu256/safety-training-distilled-50-examples`	Additional safety fine-tuning to retain security protocols during the distillation process.

✨ Capabilities

Improved Logical Problem Solving: The model is capable of handling multi-step problems in mathematics and code logic, relying on structured deduction rather than purely creative generation.
Structured Reasoning Output: Excels at generating solutions that are clearly organized, featuring explicit thought steps (e.g., using the <\|think\|> tag) before presenting the final answer.
Technical Proficiency: Provides functional code snippets and detailed explanations for algorithmic choices, leveraging the patterns learned from technical reasoning datasets.

⚠️ Limitations and Risks

Reasoning Depth: While improved in structure, the model's depth of understanding may not match that of massive, general-purpose models on extremely niche or highly abstract conceptual tasks.
Hallucination Risk: This model retains the inherent risk of hallucination. It may generate false facts, incorrect mathematical steps, or biased code suggestions.
Data Scale Note: The training utilized a targeted distillation approach with curated datasets. While effective for structural refinement, the dataset size is focused and not designed to achieve broad, state-of-the-art general reasoning mastery.

⚙️ Usage Guidelines & Recommended Parameters

To maximize the model's rigid and structured reasoning capabilities, use the following settings:

Table with columns: Parameter, Value, Description
Parameter	Value	Description
Temperature (`temp`)	`0.5`	Low temperature promotes deterministic, logical, and less creative output, favoring accuracy over novelty.
Top-K (`top_k`)	`64`	Limits the sampling space to the 40 most likely tokens, ensuring focused and relevant reasoning paths.
Top-P (`top_p`)	`0.9`	Allows for sufficient diversity in vocabulary while maintaining a high degree of coherence and relevance.

Prompting Strategy

For optimal performance, structure your prompts to encourage the model to utilize its structured reasoning features:

Explicit Task Definition: Clearly define the domain (Math, Code, Logic).
Demand Structure: Ask the model to use a structured thought process (e.g., "First, think step-by-step using the <|think|> tag, then provide the final answer.").
Constraint Setting: Specify the required output format (e.g., "Provide only the Python code and the explanation," or "Show all intermediate mathematical steps.").

💻 Technical Deployment

This model is compatible with standard Hugging Face transformers library implementations and can be deployed using various inference engines:

Python Loading Example (Hugging Face Transformers)

python
from transformers import AutoProcessor, AutoModelForCausalLM

MODEL_ID = "ertghiu256/gemma-4-e2b-gemini-opus-reasoning-distill"

# Load model and processor
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    dtype="auto",
    device_map="auto"  # Automatically maps layers to available devices (GPU/CPU)
)

# Example inference setup (simplified)
prompt = "Solve the following quadratic equation: x^2 - 5x + 6 = 0. Use the <|think|> tag for your reasoning."

inputs = processor(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, temperature=0.5, top_k=40, top_p=0.95)

print(processor.decode(outputs[0], skip_special_tokens=True))

Recommended Inference Engines

vLLM: For high-throughput serving and low latency on GPU clusters.
llama.cpp: For efficient CPU/edge deployment and local running.
LM Studio / Ollama: For easy, user-friendly local experimentation and setup.

Model provider

ertghiu256

Model tree

Base

google/gemma-4-E2B-it

Adapter

this model

Modalities

Input

Text, Image

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Model card

Explore FriendliAI today

Get started Talk to an engineer

🌟 Overview

🧠 Training Methodology

Core Objectives:

Structural Rigidity: To encourage the model to follow systematic, step-by-step procedures when tackling problems.
Traceability: To enable the generation of explicit thought processes (using tags like <\|think\|>) that clearly map out the logical progression from problem statement to final solution.
Domain Focus: To improve performance in mathematical problem-solving and code logic by exposing the model to high-quality reasoning patterns in these specific fields.

Training Datasets:

Table with columns: Dataset, Purpose, Size/Focus
Dataset	Purpose	Size/Focus
`angrygiraffe/claude-opus-4.6-4.7-reasoning-8.7k`	High-level logical deduction examples.	8.7k examples
`Jackrong/GLM-5.1-Reasoning-1M-Cleaned`	Large-scale reasoning patterns and structured output generation.	1 Million examples
`Roman1111111/gemini-3.1-pro-hard-high-reasoning`	Specialized, challenging reasoning scenarios in technical domains.	High-quality specialized dataset
`ertghiu256/safety-training-distilled-50-examples`	Additional safety fine-tuning to retain security protocols during the distillation process.

✨ Capabilities

Improved Logical Problem Solving: The model is capable of handling multi-step problems in mathematics and code logic, relying on structured deduction rather than purely creative generation.
Structured Reasoning Output: Excels at generating solutions that are clearly organized, featuring explicit thought steps (e.g., using the <\|think\|> tag) before presenting the final answer.
Technical Proficiency: Provides functional code snippets and detailed explanations for algorithmic choices, leveraging the patterns learned from technical reasoning datasets.

⚠️ Limitations and Risks

Reasoning Depth: While improved in structure, the model's depth of understanding may not match that of massive, general-purpose models on extremely niche or highly abstract conceptual tasks.
Hallucination Risk: This model retains the inherent risk of hallucination. It may generate false facts, incorrect mathematical steps, or biased code suggestions.
Data Scale Note: The training utilized a targeted distillation approach with curated datasets. While effective for structural refinement, the dataset size is focused and not designed to achieve broad, state-of-the-art general reasoning mastery.

⚙️ Usage Guidelines & Recommended Parameters

To maximize the model's rigid and structured reasoning capabilities, use the following settings:

Table with columns: Parameter, Value, Description
Parameter	Value	Description
Temperature (`temp`)	`0.5`	Low temperature promotes deterministic, logical, and less creative output, favoring accuracy over novelty.
Top-K (`top_k`)	`64`	Limits the sampling space to the 40 most likely tokens, ensuring focused and relevant reasoning paths.
Top-P (`top_p`)	`0.9`	Allows for sufficient diversity in vocabulary while maintaining a high degree of coherence and relevance.

Prompting Strategy

For optimal performance, structure your prompts to encourage the model to utilize its structured reasoning features:

Explicit Task Definition: Clearly define the domain (Math, Code, Logic).
Demand Structure: Ask the model to use a structured thought process (e.g., "First, think step-by-step using the <|think|> tag, then provide the final answer.").
Constraint Setting: Specify the required output format (e.g., "Provide only the Python code and the explanation," or "Show all intermediate mathematical steps.").

💻 Technical Deployment

This model is compatible with standard Hugging Face transformers library implementations and can be deployed using various inference engines:

Python Loading Example (Hugging Face Transformers)

python
from transformers import AutoProcessor, AutoModelForCausalLM

MODEL_ID = "ertghiu256/gemma-4-e2b-gemini-opus-reasoning-distill"

# Load model and processor
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    dtype="auto",
    device_map="auto"  # Automatically maps layers to available devices (GPU/CPU)
)

# Example inference setup (simplified)
prompt = "Solve the following quadratic equation: x^2 - 5x + 6 = 0. Use the <|think|> tag for your reasoning."

inputs = processor(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, temperature=0.5, top_k=40, top_p=0.95)

print(processor.decode(outputs[0], skip_special_tokens=True))

Recommended Inference Engines

vLLM: For high-throughput serving and low latency on GPU clusters.
llama.cpp: For efficient CPU/edge deployment and local running.
LM Studio / Ollama: For easy, user-friendly local experimentation and setup.

gemma-4-e2b-gemini-opus-reasoning-distill-lora

Get help setting up a custom Dedicated Endpoints.

README

🌟 Overview

🧠 Training Methodology

✨ Capabilities

⚠️ Limitations and Risks

⚙️ Usage Guidelines & Recommended Parameters

Prompting Strategy

💻 Technical Deployment

Python Loading Example (Hugging Face Transformers)

Recommended Inference Engines

Explore FriendliAI today

README

🌟 Overview

🧠 Training Methodology

✨ Capabilities

⚠️ Limitations and Risks

⚙️ Usage Guidelines & Recommended Parameters

Prompting Strategy

💻 Technical Deployment

Python Loading Example (Hugging Face Transformers)

Recommended Inference Engines