README

License: apache-2.0

Model Details

| Property | Value |
|---|---|
| Author | Momin Aldahdouh |
| Base model | Qwen/Qwen3-0.6B (596M params) |
| Adapter size | ~39 MB (LoRA rank=16) |
| GGUF (Q4_K_M) | 379 MB (use with Ollama) |
| License | Apache 2.0 |

Ollama (Recommended)

```bash
ollama run hf.co/Momin-Aldahdouh/MominoMoE-v3:Q4_K_M
```

For clean output with thinking suppressed, use a Modelfile:

```
FROM hf.co/Momin-Aldahdouh/MominoMoE-v3:Q4_K_M
PARAMETER temperature 0.1
PARAMETER num_predict 512
SYSTEM You are MominOS, a kernel fault diagnostician and OS assistant. Be concise and direct.
TEMPLATE """{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}
{{- range .Messages }}<|im_start|>{{ .Role }}
{{ .Content }}<|im_end|>
{{ end }}<|im_start|>assistant
<think>
</think>
"""
```

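The template in the Modelfile suppresses thinking by pre-filling an empty `<think>` block at the start of the assistant turn. As a rough illustration, the sketch below rebuilds the same prompt string in Python. `build_prompt` is a hypothetical helper, not part of Ollama or this repository, and the exact newlines reflect one reading of the template's whitespace trimming.

```python
def build_prompt(messages, system=None):
    """Render chat messages in the ChatML layout used by the Modelfile template.

    `messages` is a list of {"role": ..., "content": ...} dicts.
    The prompt ends with an empty <think></think> block so the model
    starts its answer directly instead of emitting reasoning first.
    """
    parts = []
    if system:
        parts.append(f"<|im_start|>system\n{system}<|im_end|>\n")
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    # Open the assistant turn and close the thinking block immediately.
    parts.append("<|im_start|>assistant\n<think>\n</think>\n")
    return "".join(parts)


prompt = build_prompt(
    [{"role": "user", "content": "Why is my process stuck in D-state?"}],
    system="You are MominOS, a kernel fault diagnostician and OS assistant.",
)
print(prompt)
```

The same idea carries over to any runtime that accepts a raw prompt: whatever follows the closed `</think>` tag is treated as the visible answer.
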
Python Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# The tokenizer is loaded from the adapter repo; the base weights come from Qwen.
tokenizer = AutoTokenizer.from_pretrained("Momin-Aldahdouh/MominoMoE-v3")
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B", torch_dtype=torch.float16, device_map="auto"
)
# Attach the LoRA adapter on top of the base model.
model = PeftModel.from_pretrained(model, "Momin-Aldahdouh/MominoMoE-v3")
model.eval()  # inference mode (disables dropout)
```
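
The loading code above stops before any text is generated. A minimal sketch of a generation step might look like the following. `generate_reply` is a hypothetical helper, and it assumes the tokenizer in the adapter repo carries the Qwen3 chat template, whose `enable_thinking` option is used here to suppress the thinking block; if the template differs, drop that argument.

```python
def generate_reply(model, tokenizer, user_message, system_prompt=None,
                   max_new_tokens=512):
    """Generate one assistant reply for a single user message."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_message})

    # Assumption: the chat template accepts enable_thinking (Qwen3 option).
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=False,
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Greedy decoding keeps diagnostic answers repeatable.
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )

    # Keep only the tokens produced after the prompt.
    prompt_length = inputs["input_ids"].shape[1]
    new_tokens = output_ids[0][prompt_length:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

With `model` and `tokenizer` loaded as shown earlier, a call such as `generate_reply(model, tokenizer, "Page fault at cr2=0x8, what happened?")` would return the answer as a plain string.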

What it can do

| Task | Example |
|---|---|
| Kernel fault diagnosis | Page fault at cr2=0x8 → null deref, write+non-present, fix pointer |
| Tool calls | `{"tool": "kill_process", "args": {"pid": 1847, "signal": 9}}` |
| Shell commands | `find /var/log -type f -size +100M -delete` |
| Log analysis | SSH brute force, filesystem corruption, SYN flood |
| Sysadmin Q&A | OOM killer, context switches, cgroups, eBPF |
| Process debugging | D-state, zombie, memory leaks, page fault rate |
| Security events | /etc/shadow access, suspicious ports, AppArmor denials |
| Network diagnosis | TCP states, packet capture, routing, NAT |
| Systemd | Unit files, journalctl, service dependencies |
| Scripting | Bash/Python automation scripts |
| Docker | Container debugging, resource limits, networking |
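
The tool-call example above is a JSON object with a `tool` name and an `args` object. Since some tools are destructive (the example sends signal 9), a caller would probably want to parse and check the output before acting on it. The sketch below shows one way to do that. `parse_tool_call` and `ALLOWED_TOOLS` are hypothetical names, and the allowlist contains only `kill_process` because that is the only tool this README shows.

```python
import json

# Hypothetical allowlist: only "kill_process" appears in this README.
ALLOWED_TOOLS = {"kill_process"}


def parse_tool_call(raw):
    """Parse a model-emitted tool call and return (tool_name, args).

    Raises ValueError for malformed JSON, unknown tools, or bad argument types.
    """
    call = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")
    tool = call.get("tool")
    args = call.get("args", {})
    if not isinstance(tool, str) or tool not in ALLOWED_TOOLS:
        raise ValueError(f"unknown or disallowed tool: {tool!r}")
    if not isinstance(args, dict):
        raise ValueError("'args' must be a JSON object")
    return tool, args


tool, args = parse_tool_call(
    '{"tool": "kill_process", "args": {"pid": 1847, "signal": 9}}'
)
print(tool, args)
```

Rejecting anything outside the allowlist means an unexpected or hallucinated tool name fails early instead of reaching the host system.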

Training

| Metric | Value |
|---|---|
| Training samples | 50,000 |
| Validation samples | 5,000 |
| Steps | 8,000 |
| Effective batch | 16 (4 × grad_accum 4) |
| Learning rate | 1e-4 (cosine) |
| Hardware | NVIDIA L4 (23 GB VRAM), GCP g2-standard-8 |
| Duration | ~6h 37m |
| Final train loss | 0.1906 |
| Final eval loss | 0.1602 |
| Token accuracy | 94.7% |
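
The figures above can be cross-checked with a little arithmetic. The sketch below derives the effective batch size, the number of samples processed, the implied number of passes over the training set, and the average time per optimizer step. It assumes a single GPU (so no multiplication by device count) and treats the approximate duration as exact.

```python
# Values taken from the training table above.
per_device_batch = 4
grad_accum = 4
steps = 8_000
train_samples = 50_000
duration_s = 6 * 3600 + 37 * 60  # "~6h 37m", treated as exact here

# Assumption: one GPU, so effective batch = batch size x accumulation steps.
effective_batch = per_device_batch * grad_accum
samples_seen = steps * effective_batch
epochs = samples_seen / train_samples
sec_per_step = duration_s / steps

print(f"effective batch:  {effective_batch}")
print(f"samples seen:     {samples_seen}")
print(f"epochs (approx.): {epochs:.2f}")
print(f"seconds per step: {sec_per_step:.2f}")
```

Under these assumptions the run processes 128,000 samples, or about 2.56 passes over the 50,000 training samples.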

Dataset categories

| Category | Share |
|---|---|
| Tool calls (single-step) | 22% |
| Tool calls (multi-step) | 13% |
| Shell command generation | 13% |
| Kernel fault diagnosis | 12% |
| Sysadmin Q&A | 10% |
| Process/memory debugging | 8% |
| Log analysis | 7% |
| Security events | 5% |
| Network diagnostics | 4% |
| Systemd management | 3% |
| Scripting (bash/python) | 2% |
| Docker/containers | 1% |
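
To get a feel for how many examples each category contributes, the percentages can be converted to approximate sample counts. The sketch below assumes the shares apply to the 50,000 training samples; the README does not say whether the validation split follows the same mix.

```python
# Category shares (percent) from the table above.
category_pct = {
    "Tool calls (single-step)": 22,
    "Tool calls (multi-step)": 13,
    "Shell command generation": 13,
    "Kernel fault diagnosis": 12,
    "Sysadmin Q&A": 10,
    "Process/memory debugging": 8,
    "Log analysis": 7,
    "Security events": 5,
    "Network diagnostics": 4,
    "Systemd management": 3,
    "Scripting (bash/python)": 2,
    "Docker/containers": 1,
}

# Assumption: shares are measured against the training split only.
train_samples = 50_000

total_pct = sum(category_pct.values())
counts = {name: train_samples * pct // 100 for name, pct in category_pct.items()}

for name, count in counts.items():
    print(f"{name:<28}{count:>7,}")
print(f"{'Total':<28}{sum(counts.values()):>7,} ({total_pct}%)")
```

The two tool-call categories together make up 35% of the data, which is consistent with the model being positioned as an OS assistant that acts through tools.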

Lineage

| Version | Params | Training | Eval loss | Notes |
|---|---|---|---|---|
| MominoMoE_1.2B | 1.2B | From scratch | | Wrong dataset (web APIs) |
| MominoMoE-v2 | 0.6B LoRA | 10k kernel samples | 0.2896 | Kernel faults only |
| MominoMoE-v3 | 0.6B LoRA | 50k broad OS samples | 0.1602 | Full OS assistant |

Framework versions

  • PEFT 0.19.1
