License: apache-2.0

Model Details
| Property | Value |
|---|---|
| Author | Momin Aldahdouh |
| Base model | Qwen/Qwen3-0.6B (596M params) |
| Adapter size | ~39 MB (LoRA rank=16) |
| GGUF (Q4_K_M) | 379 MB — use with Ollama |
| License | Apache 2.0 |
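As a rough sanity check on the ~39 MB adapter size, the sketch below counts LoRA parameters at rank 16. The Qwen3-0.6B dimensions (hidden size 1024, 28 layers, 16 query heads and 8 KV heads of size 128, MLP size 3072) come from the base model's published config. Two things are assumptions, because the card does not state them: that LoRA was applied to all seven linear projections in every decoder layer, and that the adapter weights are stored in fp32.

```python
# Rough LoRA adapter size estimate for Qwen3-0.6B at rank 16.
# ASSUMPTIONS (not stated on the card): LoRA on all 7 linear projections
# per decoder layer, adapter weights stored in fp32 (4 bytes per parameter).

HIDDEN = 1024       # hidden_size
LAYERS = 28         # num_hidden_layers
Q_DIM = 16 * 128    # num_attention_heads * head_dim
KV_DIM = 8 * 128    # num_key_value_heads * head_dim
MLP = 3072          # intermediate_size
RANK = 16

# (in_features, out_features) of each adapted projection in one layer
PROJECTIONS = {
    "q_proj": (HIDDEN, Q_DIM),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (Q_DIM, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}


def lora_params(rank: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) to each adapted projection."""
    per_layer = sum(rank * (fin + fout) for fin, fout in PROJECTIONS.values())
    return per_layer * LAYERS


params = lora_params(RANK)
size_mib = params * 4 / 2**20  # fp32 bytes -> MiB
print(f"{params:,} LoRA params, ~{size_mib:.1f} MiB in fp32")
```

Under these assumptions the count is about 10.1M parameters, or about 38.5 MiB. That is consistent with the listed adapter size, although the safetensors file also carries a small header.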
Ollama (Recommended)
```bash
ollama run hf.co/Momin-Aldahdouh/MominoMoE-v3:Q4_K_M
```
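After the model is pulled, it can also be queried from code through Ollama's local HTTP chat endpoint. The minimal sketch below uses only the Python standard library and assumes an Ollama server on the default port 11434. The helper names are hypothetical. The temperature and token limit mirror the values suggested on this card and are otherwise arbitrary.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # assumed default local endpoint
MODEL = "hf.co/Momin-Aldahdouh/MominoMoE-v3:Q4_K_M"


def build_chat_payload(prompt: str, temperature: float = 0.1, num_predict: int = 512) -> dict:
    """Build a non-streaming /api/chat request body (hypothetical helper)."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for a single JSON object instead of a stream
        "options": {"temperature": temperature, "num_predict": num_predict},
    }


def chat(prompt: str) -> str:
    """Send one prompt to a running Ollama server and return the reply text."""
    body = json.dumps(build_chat_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]
```

With the server running, `chat("A process is stuck in D state. How do I see what it is waiting on?")` returns the assistant's reply as a string. Without a custom template, the reply may still contain a reasoning block.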
For clean output with thinking suppressed, use a Modelfile:
```
FROM hf.co/Momin-Aldahdouh/MominoMoE-v3:Q4_K_M

PARAMETER temperature 0.1
PARAMETER num_predict 512

SYSTEM You are MominOS, a kernel fault diagnostician and OS assistant. Be concise and direct.

TEMPLATE """{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{- range .Messages }}<|im_start|>{{ .Role }}
{{ .Content }}<|im_end|>
{{ end }}<|im_start|>assistant
<think>

</think>

"""
```
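The Modelfile's TEMPLATE does two things. It wraps each turn in ChatML `<|im_start|>` / `<|im_end|>` markers, and it opens the assistant turn with an empty, already-closed `<think></think>` block so the model skips its reasoning phase. The Python function below mirrors that layout. The function name is hypothetical, and the newline placement assumes the standard Qwen3 ChatML format rather than being a byte-for-byte copy of Ollama's renderer.

```python
from typing import Optional


def render_prompt(system: Optional[str], messages: list) -> str:
    """Render a ChatML prompt laid out like the Modelfile TEMPLATE."""
    parts = []
    if system:
        parts.append(f"<|im_start|>system\n{system}<|im_end|>\n")
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    # Open the assistant turn with an empty, already-closed think block,
    # so generation starts directly with the answer.
    parts.append("<|im_start|>assistant\n<think>\n\n</think>\n\n")
    return "".join(parts)


prompt = render_prompt(
    "You are MominOS, a kernel fault diagnostician and OS assistant. Be concise and direct.",
    [{"role": "user", "content": "Page fault at cr2=0x8 on a write. What happened?"}],
)
print(prompt)
```

To use the Modelfile itself, save it as `Modelfile`, run `ollama create mominos -f Modelfile`, and then run `ollama run mominos`. The name `mominos` is arbitrary.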
Python Usage
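```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# The tokenizer is loaded from the adapter repository
tokenizer = AutoTokenizer.from_pretrained("Momin-Aldahdouh/MominoMoE-v3")

# Load the base model in half precision; device_map="auto" requires `accelerate`
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B", torch_dtype=torch.float16, device_map="auto"
)

# Attach the LoRA adapter on top of the base weights
model = PeftModel.from_pretrained(model, "Momin-Aldahdouh/MominoMoE-v3")
model.eval()  # inference mode (disables dropout)
```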
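Outside Ollama, a Qwen3-based model may still emit a `<think>...</think>` block before its answer. The Qwen3 chat template also accepts `enable_thinking=False` in `tokenizer.apply_chat_template`, assuming the tokenizer shipped with this adapter keeps that template. As a fallback, the small post-processing helper below, which has a hypothetical name, drops any such block from decoded text.

```python
import re

# Matches a complete <think>...</think> block (including an empty one) plus
# any whitespace after it. DOTALL lets '.' span newlines inside the block.
_THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


def strip_think(text: str) -> str:
    """Remove reasoning blocks and surrounding whitespace from model output."""
    return _THINK_RE.sub("", text).strip()
```

Apply it to the decoded generation, for example `strip_think(tokenizer.decode(output_ids))`. The non-greedy `.*?` stops at the first closing tag, so an answer that follows the block is never swallowed.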
What it can do
| Task | Example |
|---|---|
| Kernel fault diagnosis | Page fault at cr2=0x8 → null deref, write+non-present, fix pointer |
| Tool calls | {"tool": "kill_process", "args": {"pid": 1847, "signal": 9}} |
| Shell commands | find /var/log -type f -size +100M -delete |
| Log analysis | SSH brute force, filesystem corruption, SYN flood |
| Sysadmin Q&A | OOM killer, context switches, cgroups, eBPF |
| Process debugging | D-state, zombie, memory leaks, page fault rate |
| Security events | /etc/shadow access, suspicious ports, AppArmor denials |
| Network diagnosis | TCP states, packet capture, routing, NAT |
| Systemd | Unit files, journalctl, service dependencies |
| Scripting | Bash/Python automation scripts |
| Docker | Container debugging, resource limits, networking |
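Tool calls come back as a single JSON object with a `tool` name and an `args` mapping, as in the `kill_process` example above. A caller should parse and validate such an object before acting on it, rather than executing model output blindly. The sketch below assumes that format. The allow-list and its argument schema are hypothetical, and dispatch is a dry run that only describes the action.

```python
import json

# Hypothetical allow-list: tool name -> required argument names and types.
ALLOWED_TOOLS = {
    "kill_process": {"pid": int, "signal": int},
}


class ToolCallError(ValueError):
    """Raised when model output is not an acceptable tool call."""


def parse_tool_call(raw: str) -> tuple:
    """Parse and validate a {"tool": ..., "args": {...}} object."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ToolCallError(f"not valid JSON: {exc}") from exc
    if not isinstance(call, dict):
        raise ToolCallError("tool call must be a JSON object")
    tool, args = call.get("tool"), call.get("args")
    if not isinstance(tool, str) or tool not in ALLOWED_TOOLS:
        raise ToolCallError(f"tool not allowed: {tool!r}")
    if not isinstance(args, dict):
        raise ToolCallError("'args' must be an object")
    schema = ALLOWED_TOOLS[tool]
    for name, typ in schema.items():
        if not isinstance(args.get(name), typ):
            raise ToolCallError(f"{tool}: argument {name!r} must be {typ.__name__}")
    unexpected = set(args) - set(schema)
    if unexpected:
        raise ToolCallError(f"{tool}: unexpected arguments {sorted(unexpected)}")
    return tool, args


def dry_run(raw: str) -> str:
    """Describe what a validated call would do, without executing it."""
    tool, args = parse_tool_call(raw)
    rendered = ", ".join(f"{k}={v}" for k, v in args.items())
    return f"would call {tool}({rendered})"
```

For example, `dry_run('{"tool": "kill_process", "args": {"pid": 1847, "signal": 9}}')` returns `"would call kill_process(pid=1847, signal=9)"`. An unknown tool name or a missing argument raises `ToolCallError` instead.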
Training
| Metric | Value |
|---|---|
| Training samples | 50,000 |
| Validation samples | 5,000 |
| Steps | 8,000 |
| Effective batch | 16 (4 × grad_accum 4) |
| Learning rate | 1e-4 (cosine) |
| Hardware | NVIDIA L4 (23GB VRAM), GCP g2-standard-8 |
| Duration | ~6h 37m |
| Final train loss | 0.1906 |
| Final eval loss | 0.1602 |
| Token accuracy | 94.7% |
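The table's numbers imply roughly how many passes were made over the data. 8,000 steps at an effective batch of 16 is 128,000 samples, or about 2.56 epochs over 50,000 training samples. This assumes one optimizer step per effective batch and no sample packing, neither of which the card states. The sketch below also shows a plain cosine decay from the 1e-4 peak. Warmup length and the final learning-rate floor are not given, so the sketch assumes no warmup and decay to zero.

```python
import math

PER_DEVICE_BATCH = 4
GRAD_ACCUM = 4
STEPS = 8_000
TRAIN_SAMPLES = 50_000
PEAK_LR = 1e-4


def effective_batch(per_device: int, grad_accum: int, devices: int = 1) -> int:
    """Samples consumed per optimizer step."""
    return per_device * grad_accum * devices


def epochs(steps: int, batch: int, samples: int) -> float:
    """Approximate passes over the training set."""
    return steps * batch / samples


def cosine_lr(step: int, total: int, peak: float) -> float:
    """Cosine decay from `peak` to 0, assuming no warmup."""
    return 0.5 * peak * (1 + math.cos(math.pi * step / total))


batch = effective_batch(PER_DEVICE_BATCH, GRAD_ACCUM)  # single L4 GPU
print(batch, epochs(STEPS, batch, TRAIN_SAMPLES))  # 16 2.56
print(cosine_lr(STEPS // 2, STEPS, PEAK_LR))  # 5e-05 at the midpoint
```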
Dataset categories
| Category | % |
|---|---|
| Tool calls (single-step) | 22% |
| Tool calls (multi-step) | 13% |
| Shell command generation | 13% |
| Kernel fault diagnosis | 12% |
| Sysadmin Q&A | 10% |
| Process/memory debugging | 8% |
| Log analysis | 7% |
| Security events | 5% |
| Network diagnostics | 4% |
| Systemd management | 3% |
| Scripting (bash/python) | 2% |
| Docker/containers | 1% |
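Assuming these shares apply to the 50,000 training samples, they translate into the approximate per-category counts computed below. The card does not say whether the validation split follows the same mix.

```python
TRAIN_SAMPLES = 50_000

# Category shares from the table above, in percent.
MIX = {
    "Tool calls (single-step)": 22,
    "Tool calls (multi-step)": 13,
    "Shell command generation": 13,
    "Kernel fault diagnosis": 12,
    "Sysadmin Q&A": 10,
    "Process/memory debugging": 8,
    "Log analysis": 7,
    "Security events": 5,
    "Network diagnostics": 4,
    "Systemd management": 3,
    "Scripting (bash/python)": 2,
    "Docker/containers": 1,
}

assert sum(MIX.values()) == 100, "shares should cover the whole dataset"

# Integer arithmetic keeps the counts exact for whole-percent shares.
counts = {name: TRAIN_SAMPLES * pct // 100 for name, pct in MIX.items()}
for name, n in counts.items():
    print(f"{name:<26} {n:>6,}")

tool_calls = counts["Tool calls (single-step)"] + counts["Tool calls (multi-step)"]
print(f"{'All tool calls':<26} {tool_calls:>6,}")
```

Tool calls alone account for about 17,500 samples (35%), which is more than any other task family in the mix.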
Lineage
| Version | Params | Training | Eval loss | Notes |
|---|---|---|---|---|
| MominoMoE_1.2B | 1.2B | From scratch | — | Wrong dataset (web APIs) |
| MominoMoE-v2 | 0.6B LoRA | 10k kernel samples | 0.2896 | Kernel faults only |
| MominoMoE-v3 | 0.6B LoRA | 50k broad OS samples | 0.1602 | Full OS assistant |
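If the eval losses are mean token cross-entropy in natural-log units (the usual Hugging Face Trainer setup, assumed here), each loss maps to a perplexity of exp(loss). Under that assumption, v3's 0.1602 corresponds to a perplexity of about 1.17 and v2's 0.2896 to about 1.34. Treat the comparison with care: the two versions were trained on different data, and the card does not say they share a validation set.

```python
import math

EVAL_LOSS = {"MominoMoE-v2": 0.2896, "MominoMoE-v3": 0.1602}

for version, loss in EVAL_LOSS.items():
    # Per-token perplexity for a mean cross-entropy loss in nats.
    print(f"{version}: loss {loss:.4f}, perplexity {math.exp(loss):.3f}")

v2, v3 = EVAL_LOSS["MominoMoE-v2"], EVAL_LOSS["MominoMoE-v3"]
print(f"relative loss reduction: {(v2 - v3) / v2:.1%}")
```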
Framework versions
- PEFT 0.19.1
Model provider: Momin-Aldahdouh

Model tree
- Base: Qwen/Qwen3-0.6B
- Adapter: this model

Modalities
- Input: text
- Output: text