Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

License: apache-2.0

🌟 Overview

The gemma-4-e2b-gemini-opus-reasoning-distill model is a specialized variant of the Gemma 4 architecture. It has been fine-tuned specifically to enhance the logical structure and rigidity of its reasoning capabilities, particularly in technical domains like mathematics and coding.

This training process focused on refining how the model approaches problem-solving, aiming to instill a systematic, traceable approach to generating solutions. The primary goal is not to change the core conversational style of Gemma 4, but rather to make its internal thought processes more organized and deterministic.

🧠 Training Methodology

This model was trained using a focused distillation process on high-quality reasoning examples extracted from various large language models (LLMs). This approach aimed to transfer structured thinking patterns into the Gemma 4 architecture.

Core Objectives:

  1. Structural Rigidity: To encourage the model to follow systematic, step-by-step procedures when tackling problems.
  2. Traceability: To enable the generation of explicit thought processes (using tags like <\|think\|>) that clearly map out the logical progression from problem statement to final solution.
  3. Domain Focus: To improve performance in mathematical problem-solving and code logic by exposing the model to high-quality reasoning patterns in these specific fields.

Training Datasets:

DatasetPurposeSize/Focus
angrygiraffe/claude-opus-4.6-4.7-reasoning-8.7kHigh-level logical deduction examples.8.7k examples
Jackrong/GLM-5.1-Reasoning-1M-CleanedLarge-scale reasoning patterns and structured output generation.1 Million examples
Roman1111111/gemini-3.1-pro-hard-high-reasoningSpecialized, challenging reasoning scenarios in technical domains.High-quality specialized dataset
ertghiu256/safety-training-distilled-50-examplesAdditional safety fine-tuning to retain security protocols during the distillation process.50 examples

✨ Capabilities

  • Improved Logical Problem Solving: The model is capable of handling multi-step problems in mathematics and code logic, relying on structured deduction rather than purely creative generation.
  • Structured Reasoning Output: Excels at generating solutions that are clearly organized, featuring explicit thought steps (e.g., using the <\|think\|> tag) before presenting the final answer.
  • Technical Proficiency: Provides functional code snippets and detailed explanations for algorithmic choices, leveraging the patterns learned from technical reasoning datasets.

⚠️ Limitations and Risks

  • Reasoning Depth: While improved in structure, the model's depth of understanding may not match that of massive, general-purpose models on extremely niche or highly abstract conceptual tasks.
  • Hallucination Risk: This model retains the inherent risk of hallucination. It may generate false facts, incorrect mathematical steps, or biased code suggestions.
  • Data Scale Note: The training utilized a targeted distillation approach with curated datasets. While effective for structural refinement, the dataset size is focused and not designed to achieve broad, state-of-the-art general reasoning mastery.

⚙️ Usage Guidelines & Recommended Parameters

To maximize the model's rigid and structured reasoning capabilities, use the following settings:

ParameterValueDescription
Temperature (temp)0.5Low temperature promotes deterministic, logical, and less creative output, favoring accuracy over novelty.
Top-K (top_k)64Limits the sampling space to the 40 most likely tokens, ensuring focused and relevant reasoning paths.
Top-P (top_p)0.9Allows for sufficient diversity in vocabulary while maintaining a high degree of coherence and relevance.

Prompting Strategy

For optimal performance, structure your prompts to encourage the model to utilize its structured reasoning features:

  1. Explicit Task Definition: Clearly define the domain (Math, Code, Logic).
  2. Demand Structure: Ask the model to use a structured thought process (e.g., "First, think step-by-step using the <|think|> tag, then provide the final answer.").
  3. Constraint Setting: Specify the required output format (e.g., "Provide only the Python code and the explanation," or "Show all intermediate mathematical steps.").

💻 Technical Deployment

This model is compatible with standard Hugging Face transformers library implementations and can be deployed using various inference engines:

Python Loading Example (Hugging Face Transformers)

python

from transformers import AutoProcessor, AutoModelForCausalLM
MODEL_ID = "ertghiu256/gemma-4-e2b-gemini-opus-reasoning-distill"
# Load model and processor
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
MODEL_ID,
dtype="auto",
device_map="auto" # Automatically maps layers to available devices (GPU/CPU)
)
# Example inference setup (simplified)
prompt = "Solve the following quadratic equation: x^2 - 5x + 6 = 0. Use the <|think|> tag for your reasoning."
inputs = processor(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, temperature=0.5, top_k=40, top_p=0.95)
print(processor.decode(outputs[0], skip_special_tokens=True))

Recommended Inference Engines

  • vLLM: For high-throughput serving and low latency on GPU clusters.
  • llama.cpp: For efficient CPU/edge deployment and local running.
  • LM Studio / Ollama: For easy, user-friendly local experimentation and setup.

Model provider

ertghiu256

ertghiu256

Model tree

Base

google/gemma-4-E2B-it

Adapter

this model

Modalities

Input

Text, Image

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Explore FriendliAI today