erenyeager-1

Huihui4-8B-A4B

README

License: apache-2.0

📌 Overview

Huihui4-8B-A4B is a lightweight MoE (Mixture of Experts) conversational model optimized from Google's gemma-4-26B-A4B-it architecture. Through expert pruning and supervised fine-tuning on high-quality dialogue data, this model significantly reduces computational overhead while preserving core reasoning and interaction capabilities. It is specifically designed for deployment on consumer-grade hardware and code-related conversational tasks.

This model is not an ablation variant.

ollama

Please use the latest version of ollama

You can use huihui_ai/huihui-4:8b directly,

markdown
ollama run huihui_ai/huihui-4:8b

🧱 Architecture & Configuration

Table with columns: Parameter, Description
Parameter	Description
Base Model	`google/gemma-4-26B-A4B-it`
Total MoE Experts	32 (pruned from the original 128)
Active Experts per Token	8 (maintaining the A4B activation scale)
Model Positioning	Lightweight MoE conversational base / Consumer-hardware friendly

📊 Training Data & Methodology

Data Source: 500+ high-quality dialogue samples carefully extracted from code preference data.
Training Method: Supervised Fine-Tuning (SFT).
Optimization Goal: Maintain semantic coherence, instruction-following capability, and code context understanding post-pruning.

📈 Evaluation & Performance

Evaluation Tool: Quantitative perplexity assessment using the calculate_perplexity script.
Test Results: Preliminary dialogue tests indicate smooth interactions and stable logic. The model performs reliably in daily conversations and code-assistance tasks, with no significant performance degradation observed after pruning.

💻 Inference & Deployment Recommendations

Recommended Frameworks: vLLM / llama.cpp / HuggingFace Transformers
VRAM Requirements:
- FP16: < 18GB
- INT4/INT8 Quantized: < 6~9GB (compatible with mainstream single consumer GPUs)
Use Cases: Code conversation assistants, lightweight task planning, local deployment prototyping, and baseline validation for MoE pruning/merging techniques.

🗺️ Roadmap

Multi-Domain Fine-Tuning: Further SFT on four distinct datasets to enhance the generalization capabilities of this 32-expert model.
Expert Merging Validation: Experiment with merging the four independently fine-tuned models back into a 128-expert architecture, validating the feasibility of a "prune → fine-tune → merge" pipeline.
Core Objective: Ultimately verify the engineering viability of training and iterating on large-scale MoE models using only consumer-grade hardware.
If you're interested, feel free to fine-tune this model on your own datasets. We plan to merge all resulting models into a unified version at the end.

📝 Notes

This model represents the initial pruned and fine-tuned iteration of the Huihui series. Future updates will involve multi-dataset integration and expert merging.
calculate_perplexity evaluation script).
Evaluation results

markdown
python evaluate_perplexity_final.py --model_path ./google/gemma-4-26B-A4B-it

Model Path     : ./google/gemma-4-26B-A4B-it
Eval Samples   : 100
Max Length     : 8192

Table with columns: model, Fine-tuning steps, num_experts, Perplexity, Average Loss
model	Fine-tuning steps	num_experts	Perplexity	Average Loss
gemma-4-26B-A4B-it	0	128	1.5964 (+ 0 )	0.4678 (+ 0 )
gemma-4-26B-A4B-it-Pruned-32	0	32	2.4826 (+ 0.8862)	0.9093 (+ 0.4415)
gemma-4-26B-A4B-it-Pruned-32-sft-750	750	32	1.3827 (- 0.2137)	0.3240 (- 0.1438)

Citation

markdown
@misc{huihui4-8b-a4b,
      title  = {{Huihui4-8B-A4B}: A lightweight MoE (Mixture of Experts) conversational model},
      author = {Huihui-ai},
      year   = {2026},
      url    = {https://hf.co/huihui-ai/Huihui4-8B-A4B}
}

Contact

If you have any questions, please raise an issue or contact us at support@huihui.ai.

Available on FriendliAI

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Container

Run this model inference with full control and performance in your environment.

Learn more

Model Details

Model Provider

erenyeager-1

Model Tree

Base

google/gemma-4-26B-A4B-it

Fine-tuned

this model

Input Modalities

TextImage

Output Modalities

Text

Supported Functionality

Dedicated EndpointsContainer

Explore FriendliAI today

Get started Talk to an engineer