clglavan

magos-k8s-0.6b

Scope and design

The model targets a narrow task: mapping a Kubernetes symptom (a failed or Warning condition, a kubectl describe/events excerpt, a misconfigured manifest) to the responsible spec field and the corrective action. The reasoning trace is intentionally short and templated (implicated condition → spec field → verdict → fix / next command) rather than open-ended chain-of-thought — that is the form a 0.6B model reproduces reliably without drifting into invented detail.

Because every response terminates in a concrete next action, the model fits as the inner-loop reasoner of a planner→executor devops agent. It is full-weight fine-tuned (no LoRA/adapters), ships as bf16 safetensors plus GGUF quantizations, and runs locally from a ~640 MB file (Q8_0). Knowledge is frozen at the training snapshot; treat it as a reasoning component, not a source of truth, and verify field and flag specifics against current docs or a live kubectl explain.
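Since the model's output is meant to feed an executor, one plausible safeguard is to let only read-only kubectl verbs run unattended. The following is a minimal sketch under that assumption; the allowlist and the helper name `is_read_only` are hypothetical and do not ship with the model.

```python
import shlex

# Hypothetical allowlist of read-only kubectl verbs; adjust to your own policy.
READ_ONLY_VERBS = {"get", "describe", "logs", "explain", "top", "events"}

# Characters that could chain or redirect commands if the string reached a shell.
SHELL_METACHARS = set(";|&<>`$\n")

def is_read_only(command: str) -> bool:
    """Return True only for a single kubectl invocation whose verb is read-only."""
    if any(ch in SHELL_METACHARS for ch in command):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:  # unbalanced quotes and similar parse errors
        return False
    if len(argv) < 2 or argv[0] != "kubectl":
        return False
    # Conservative: the verb must come first, so "kubectl -n prod get pods" is rejected.
    return argv[1] in READ_ONLY_VERBS
```

Anything the check rejects (including mutating verbs such as `apply` or `delete`) would then go to a human or a stricter policy step rather than being executed directly.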

What's new in v16 (current stable)

v16 is the largest and broadest corpus yet: ~108k <think> reasoning examples, all derived from the official Kubernetes sources and built so that the teacher model writing the examples only ever phrases scenarios around verified facts (every YAML field is checked against the v1.34 OpenAPI schema; every flag against the kubectl reference). It combines two tracks:

Event-grounded diagnostic matched pairs (the v15 design): a BROKEN case (failed/Warning events ↔ the exact offending YAML field) and a HEALTHY case (clean events ↔ the same field set correctly), across ~80 failure subcategories (scheduling, image, crashloop, probes, volumes, networking, RBAC/PodSecurity, controllers, quota/limits, …).
Command-reference: correct kubectl invocations across ~45 subcommands and their flags.
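To make the schema check concrete, here is a toy sketch of what "every YAML field is checked against the schema" could look like. It assumes a hand-written dictionary standing in for a small slice of the Pod schema; the actual pipeline and its use of the v1.34 OpenAPI document are not described in this card, and `POD_SCHEMA` and `unknown_fields` are illustrative names.

```python
# Toy stand-in for a slice of the Pod schema. A real check would load the
# published OpenAPI document instead of this hand-written dict.
POD_SCHEMA = {
    "apiVersion": None,
    "kind": None,
    "metadata": {"name": None, "namespace": None, "labels": None},
    "spec": {
        "nodeSelector": None,
        "containers": [{
            "name": None,
            "image": None,
            "resources": {"requests": None, "limits": None},
        }],
    },
}

def unknown_fields(obj, schema, path=""):
    """Return dotted paths of keys in obj that the schema does not define."""
    if schema is None:            # leaf or free-form map: accept anything below
        return []
    if isinstance(schema, list):  # list schema: check every item against element 0
        items = obj if isinstance(obj, list) else []
        return [p for item in items for p in unknown_fields(item, schema[0], path)]
    if not isinstance(obj, dict):
        return []
    bad = []
    for key, value in obj.items():
        here = f"{path}.{key}" if path else key
        if key not in schema:
            bad.append(here)
        else:
            bad.extend(unknown_fields(value, schema[key], here))
    return bad

# A manifest with a misspelled field ("resource" instead of "resources").
broken = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "web"},
    "spec": {"containers": [{"name": "web", "image": "nginx",
                             "resource": {"limits": {"memory": "64Mi"}}}]},
}
print(unknown_fields(broken, POD_SCHEMA))
```

A training example whose manifest produced a non-empty list here would be rejected before it ever reached the corpus.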

Every answer is a short, structured <think> chain (events → correlate to field → verdict → fix, or goal → command) followed by a concise YAML patch or command — the form a 0.6B model reproduces reliably without drifting into invented detail.
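An agent consuming this output usually wants the reasoning and the final patch or command separately. The sketch below splits a completion on the `<think>` block; the helper name `split_reasoning` and the sample completion are illustrative, not taken from the model.

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text: str) -> tuple[str, str]:
    """Split a completion into (reasoning, answer)."""
    match = THINK_RE.search(text)
    if match is None:
        # No think block at all: treat the whole completion as the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

# Made-up completion in the shape described above.
sample = (
    "<think>\nFailedScheduling: Insufficient memory -> "
    "resources.requests.memory exceeds node capacity\n</think>\n\n"
    "Lower spec.containers[0].resources.requests.memory."
)
reasoning, answer = split_reasoning(sample)
```

An empty `reasoning` string is worth logging, since it can indicate that the think block collapsed or was never produced.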

| | v15 | v16 |
|---|---|---|
| Corpus | ~16.6k diagnostic | ~108k (diagnostic + command-reference) |
| Coverage | ~80 diagnostic subcategories | + ~45 kubectl subcommands/flags |
| Recipe | 4 epochs · LR 2e-5 · batch 32 | 4 epochs · LR 2e-5 · batch 32 |

Strengths: diagnosing from pasted events/describe output, YAML generation/review, and structured next-step reasoning. It is full-weight fine-tuned (no LoRA), schema-grounded, and low-hallucination by construction.

To pin a specific version when loading:

python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("clglavan/magos-k8s-0.6b", revision="v16")
# or revision="v15" / "v8" / "v7" / "v6" / "v5" / "v3" / "v2" for previous versions

What it's good at

Diagnosing from events — paste kubectl get events / kubectl describe output and it correlates the failure to the responsible YAML field + fix.
YAML manifest generation and review — a top strength; correct apiVersion/field names across Pod, Deployment, Service, NetworkPolicy, PVC, HPA, Ingress, RBAC and many other Kinds (schema-validated training set).
kubectl command construction — broad subcommand/flag coverage from the reference (the v16 command-reference track).
Prometheus alert handling — meaning + diagnostic steps for the prometheus-operator runbook set.
Structured next-step reasoning — short <think> that ends in a concrete command or fix, suitable as an agent's inner-loop reasoner.
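For the events-based diagnosis case, the pasted output has to fit the context window. Below is a minimal sketch of assembling a single user turn from captured kubectl output; the prompt wording, the character budget, and the name `build_messages` are assumptions rather than a documented prompt format.

```python
def build_messages(describe_output: str, events_output: str = "",
                   max_chars: int = 6000) -> list[dict]:
    """Wrap pasted kubectl output in one user turn (hypothetical wording)."""
    # Keep the tail of the describe output: the Events section is printed last.
    describe_tail = describe_output.strip()[-max_chars:]
    parts = ["kubectl describe output:\n" + describe_tail]
    if events_output.strip():
        parts.append("kubectl get events output:\n" + events_output.strip()[-max_chars:])
    parts.append("Which spec field is responsible, and what is the fix?")
    return [{"role": "user", "content": "\n\n".join(parts)}]

messages = build_messages(
    "Name: web\nEvents:\n  Warning  FailedScheduling  0/3 nodes are available"
)
```

The resulting list can be passed as `messages` to either of the loading paths shown under "How to use". The character budget is only a rough proxy for tokens, so it should be tuned against the context size you actually configure.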

What it's not good at

Multi-step planning or complex tool chains — it's a 0.6B model.
Subtle/rare flags and multi-flag combinations — verify with kubectl --help.
General (non-Kubernetes) reasoning — this corpus is K8s-focused.
Knowledge of features released after the source docs were captured (mid-2026).
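Given these limits, it can help to check every field a suggested patch touches against the live cluster's schema. The sketch below turns a patch (already parsed into a dict) into `kubectl explain` argument lists; the helper name `explain_commands` is hypothetical, and running the commands is left to the caller.

```python
def explain_commands(resource: str, patch: dict) -> list[list[str]]:
    """Build one `kubectl explain` argv per leaf field path found in a patch."""
    paths: list[str] = []

    def walk(node, prefix: str) -> None:
        if isinstance(node, dict) and node:
            for key, value in node.items():
                walk(value, f"{prefix}.{key}")
        elif isinstance(node, list) and node:
            for item in node:  # list indices are not part of an explain path
                walk(item, prefix)
        elif prefix not in paths:
            paths.append(prefix)

    walk(patch, resource)
    return [["kubectl", "explain", path] for path in paths]

patch = {"spec": {"containers": [{"livenessProbe": {
    "initialDelaySeconds": 10, "periodSeconds": 5}}]}}
commands = explain_commands("pod", patch)
```

One caveat: paths that descend into free-form maps (labels, or the keys under `resources.requests`) will not resolve with `kubectl explain`, so those should be trimmed to the parent field before running.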

How to use

Important — sampling: v16 is a reasoning model. Run it greedy with repetition_penalty = 1.0. A repetition penalty > 1.0 penalizes the prompt words the <think> block needs to reference and collapses it to an empty <think></think>. (This differs from the terse v8, which used temp 0.05 / rep 1.15.)
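The effect described above can be illustrated with a few made-up logits. This sketch assumes the common repetition-penalty rule (divide a positive logit by the penalty, multiply a negative one) applied to every token already present in the context, prompt included; the token names and numbers are invented for illustration.

```python
def apply_repetition_penalty(logits: dict[str, float], seen: set[str],
                             penalty: float) -> dict[str, float]:
    """Rescale logits of already-seen tokens: divide if positive, multiply if negative."""
    out = {}
    for tok, score in logits.items():
        if tok in seen:
            score = score / penalty if score > 0 else score * penalty
        out[tok] = score
    return out

def greedy(logits: dict[str, float]) -> str:
    """Pick the highest-scoring token, as greedy decoding does."""
    return max(logits.items(), key=lambda kv: kv[1])[0]

# Made-up next-token logits inside <think>; "memory" already appears in the prompt.
logits = {"memory": 4.0, "cpu": 3.6, "</think>": 3.0}
seen = {"memory"}

greedy_plain = greedy(apply_repetition_penalty(logits, seen, 1.0))       # "memory"
greedy_penalized = greedy(apply_repetition_penalty(logits, seen, 1.15))  # "cpu"
```

With a penalty of 1.15 the logit for "memory" drops from 4.0 to about 3.48, below "cpu", so the trace can no longer quote the word it needs from the pasted event. Repeated across a whole think block, this is the kind of pressure that pushes the model away from referencing the prompt.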

llama.cpp / Ollama / LM Studio

| File | Size | Quality |
|---|---|---|
| `magos-k8s-0.6b-f16.gguf` | ~1.2 GB | reference (full precision) |
| `magos-k8s-0.6b-q8_0.gguf` | ~640 MB | effectively identical to f16; recommended |
| `magos-k8s-0.6b-q4_k_m.gguf` | ~400 MB | smallest; more field/flag mistakes, fine for casual use |
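The f16 and q8_0 sizes can be sanity-checked with back-of-the-envelope arithmetic. The sketch assumes a nominal 0.6 billion parameters (taken from the model name, not an exact count) and ignores file metadata. Q8_0 stores each block of 32 weights as 32 int8 values plus one fp16 scale, which is 34 bytes per block or 8.5 bits per weight.

```python
PARAMS = 0.6e9  # assumption: nominal parameter count taken from the model name

def size_mb(bits_per_weight: float, params: float = PARAMS) -> float:
    """Estimate file size in MB (10^6 bytes) from bits per weight."""
    return params * bits_per_weight / 8 / 1e6

f16_mb = size_mb(16.0)  # 2 bytes per weight
q8_mb = size_mb(8.5)    # Q8_0: 34 bytes per 32-weight block

print(f"f16  ~{f16_mb:.0f} MB")
print(f"q8_0 ~{q8_mb:.1f} MB")
```

These estimates land at about 1200 MB and 637.5 MB, in line with the ~1.2 GB and ~640 MB listed above. Q4_K_M is left out because it mixes quantization types across tensors, so a single bits-per-weight figure would only be a rough guess.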

python
from llama_cpp import Llama

llm = Llama(model_path="magos-k8s-0.6b-q8_0.gguf", n_ctx=4096, chat_format="chatml")
resp = llm.create_chat_completion(
    messages=[{"role": "user", "content":
        "kubectl describe pod shows: Warning FailedScheduling 0/3 nodes are available: 3 Insufficient memory. Why?"}],
    temperature=0.0,
    repeat_penalty=1.0,
    max_tokens=512,
)
print(resp["choices"][0]["message"]["content"])

Hugging Face transformers

python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok   = AutoTokenizer.from_pretrained("clglavan/magos-k8s-0.6b")
model = AutoModelForCausalLM.from_pretrained("clglavan/magos-k8s-0.6b",
                                             dtype="bfloat16",
                                             device_map="auto")

messages = [{"role": "user", "content":
    "My pod is CrashLoopBackOff right after deploy. What's the likely cause and fix?"}]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512,
                     do_sample=False, repetition_penalty=1.0)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Training

| | |
|---|---|
| Base model | Qwen/Qwen3-0.6B |
| Method | Two stage: continued pre-training (CPT) → supervised fine-tuning (SFT). Both full-weight (no LoRA). |
| Stage 1 corpus | ~8.5k document chunks: kubernetes.io docs + blog (~6.5k), Kubernetes API reference v1.34 (~1.9k), Prometheus alert runbooks (~106). Unchanged since v5. |
| Stage 1 | LR 5e-6, cosine, 1 epoch (~6.5M tokens) |
| Stage 2 corpus (v16) | ~108k synthetic Q&A pairs derived from the official documentation, all with a structured `<think>` reasoning block: event→YAML diagnostic matched BROKEN/HEALTHY pairs across ~80 K8s failure subcategories plus a `kubectl` command-reference track (~45 subcommands + flags). Every YAML field is validated against the v1.34 OpenAPI schema and every flag against the kubectl reference, so the teacher only phrases scenarios around verified facts. |
| Stage 2 | |
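The quoted corpus figures imply a rough training scale, worked through below. This is arithmetic on the approximate numbers in this card, and it assumes that the v16 recipe from the version comparison (4 epochs, batch 32) applies to Stage 2 and that "batch 32" means 32 examples per optimizer step.

```python
import math

# Approximate figures quoted in this card.
STAGE1_CHUNKS = {"docs_and_blog": 6_500, "api_reference": 1_900, "runbooks": 106}
STAGE1_TOKENS = 6_500_000
SFT_EXAMPLES = 108_000
BATCH_SIZE = 32  # assumption: examples per optimizer step
EPOCHS = 4

total_chunks = sum(STAGE1_CHUNKS.values())              # consistent with the quoted ~8.5k
tokens_per_chunk = STAGE1_TOKENS / total_chunks          # average chunk length in tokens
steps_per_epoch = math.ceil(SFT_EXAMPLES / BATCH_SIZE)   # optimizer steps per SFT epoch
total_steps = steps_per_epoch * EPOCHS

print(total_chunks, round(tokens_per_chunk), steps_per_epoch, total_steps)
```

Under those assumptions, Stage 1 averages several hundred tokens per chunk and Stage 2 runs on the order of ten thousand optimizer steps.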

Files

model.safetensors — fine-tuned weights, HF format (bf16)
magos-k8s-0.6b-f16.gguf / -q8_0.gguf / -q4_k_m.gguf — GGUF quantizations
tokenizer.json, tokenizer_config.json, chat_template.jinja — Qwen3 tokenizer + ChatML template
config.json, generation_config.json — standard HF configs

Limitations and intended use

This is a small experimental model. Always verify any command, YAML, or behavioral claim against current Kubernetes documentation before running in production. Intended for learning, prototyping, and as a component in local devops agents — not as an authoritative source.

License

Apache 2.0. Inherits from the Qwen3-0.6B base model license. The training data is derived from the official Kubernetes documentation (CC-BY 4.0) and the prometheus-operator Prometheus runbooks (Apache 2.0).

