License: apache-2.0
Support This Work
I'm a PhD student in visual neuroscience at the University of Toronto who also happens to spend way too much time fine-tuning, merging, and quantizing open-weight models on rented H100s and a local DGX Spark. All training compute is self-funded — balancing GPU costs against a student budget. If my uploads have been useful to you, consider buying a PhD student a coffee. It goes a long way toward keeping these experiments running.
Training Approach

799 curated rows. That's it. A small, precisely curated dataset instead of tens of thousands of unfiltered examples. The base model already has the knowledge from pretraining - the fine-tune teaches it a reasoning behavior pattern.
Every training row contains explicit self-correction ("wait, that's not right"), verification ("let me check by plugging back in"), and multi-path exploration ("alternatively, I could try..."). The data was generated from multiple frontier models and filtered through a custom structural quality pipeline that enforces reasoning depth, coherence, and flow patterns. 100% of rows pass all quality gates simultaneously.
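As a rough illustration of how the three behaviors above could be detected, here is a minimal sketch using phrase matching. The regex patterns and the `count_markers` helper are assumptions made for this example; the actual pipeline's markers are not documented in this card and are likely more elaborate.

```python
import re

# Hypothetical marker phrases, chosen for illustration only; the actual
# pipeline's patterns are not documented in this card.
MARKERS = {
    "self_correction": re.compile(r"\b(?:wait|actually|on second thought)\b", re.IGNORECASE),
    "verification": re.compile(r"\b(?:let me (?:check|verify)|double-check)\b", re.IGNORECASE),
    "exploration": re.compile(r"\b(?:alternatively|another approach)\b", re.IGNORECASE),
}


def count_markers(trace: str) -> dict[str, int]:
    """Count occurrences of each reasoning-marker category in a thinking trace."""
    return {name: len(pattern.findall(trace)) for name, pattern in MARKERS.items()}


trace = (
    "Let me try substitution. Wait, that's not right. "
    "Alternatively, I could try factoring. Let me check by plugging back in."
)
print(count_markers(trace))
```

Per-row counts like these would be the raw input for the averages reported in the metrics table.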
Training Data Quality

The reasoning data was curated using a custom structural process supervision pipeline. Key metrics:
| Metric | Value |
|---|---|
| Signal quality score | 78.7 mean (61.5 min, 90.0 max) |
| Thinking trace depth | 1,667 words average |
| Self-correction | 100% of rows (17.2 per row avg) |
| Verification | 100% of rows (10.3 per row avg) |
| Exploration | 100% of rows (6.3 per row avg) |
| Quality gate pass rate | 100% |
Every row was scored across multiple structural dimensions and only rows passing all thresholds simultaneously were included. No rows were manually curated - the pipeline is fully automated and reproducible.
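The "all thresholds simultaneously" rule can be sketched as a conjunction over per-dimension minimums. The field names, threshold values, and example rows below are hypothetical; the card does not publish the real thresholds.

```python
from dataclasses import dataclass


@dataclass
class RowScores:
    signal_score: float
    think_words: int
    self_corrections: int
    verifications: int
    explorations: int


# Hypothetical minimums for illustration; the real thresholds are not published here.
THRESHOLDS = {
    "signal_score": 60.0,
    "think_words": 500,
    "self_corrections": 1,
    "verifications": 1,
    "explorations": 1,
}


def passes_gate(row: RowScores) -> bool:
    """A row passes only if every dimension meets its minimum at the same time."""
    return all(getattr(row, name) >= minimum for name, minimum in THRESHOLDS.items())


def gate_pass_rate(rows: list[RowScores]) -> float:
    """Percentage of rows that pass the gate (0.0 for an empty list)."""
    if not rows:
        return 0.0
    return 100.0 * sum(passes_gate(r) for r in rows) / len(rows)


# Two made-up rows: the second has no exploration markers, so it fails.
rows = [
    RowScores(78.7, 1667, 17, 10, 6),
    RowScores(75.0, 1200, 3, 2, 0),
]
print(gate_pass_rate(rows))
```

Because the gate is a conjunction, a row that scores well on most dimensions but misses one is still excluded. This may help explain how a dataset can have high marker rates but a low pass rate.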
How It Compares

We ran our structural quality analysis against seven major public reasoning datasets used for Opus/Qwen distillation. The results:
| Dataset | Rows | Think Words | Self-Correction | Verification | Exploration | Signal Score | Gate Pass |
|---|---|---|---|---|---|---|---|
| Harmonic (ours) | 799 | 1,667 | 100% | 100% | 100% | 78.7 | 100% |
| Crownelius/Opus-3300x | 2,160 | 188 | 5.9% | 22.6% | 5.2% | 28.0 | 0.1% |
| nohurry/Opus-Filtered | 2,326 | 191 | 6.7% | 24.1% | 5.3% | 28.5 | 0.1% |
| TeichAI/Opus-250x | 250 | 323 | 17.2% | 26.8% | 6.8% | 24.6 | 0.4% |
| Jackrong/Qwen-700x | 633 | 6,653 | 97.5% | 97.6% | 69.8% | 75.6 | 22.7% |
| Bespoke-Stratos-17k | 16,710 | 1,322 | 88.2% | 72.7% | 59.7% | 71.7 | 49.0% |
| glaiveai/reasoning-20m | 22M+ | 799 | 64.1% | 41.4% | 37.3% | 46.2 | 12.8% |
| KingNish/reasoning-20k | 19,944 | 132 | 0.7% | 4.2% | 4.3% | 27.4 | 0.0% |
The popular Opus distillation datasets (Crownelius, nohurry, TeichAI) all have quality gate pass rates below 1%. Their thinking traces average only 188 to 323 words, and self-correction appears in at most 17.2% of rows. Models trained on this data learn to produce short, shallow chain-of-thought that looks like reasoning but lacks the structural behaviors that make reasoning reliable.
Jackrong and Stratos are closer competitors but still fall short on consistency. Jackrong has massive traces (6,653 words avg) but only 22.7% pass the quality gate - the thinking is verbose but wanders. Stratos has decent markers, but only 49.0% of rows pass, so roughly half the gradient updates during training push the model toward shallow patterns.
Harmonic's data is smaller by design. Every row passes. Every gradient update reinforces genuine reasoning behavior.
Reasoning Flow

Marker density is measured across 20 equal segments of each thinking trace. The characteristic curve shows reasoning intensity building through the middle of the trace and peaking in the later segments as the model enters verification and self-correction before committing to an answer.
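A segment-wise density curve of this kind could be computed roughly as follows. This is a sketch under stated assumptions: the marker pattern is hypothetical, segments are split by word count, and density is normalized to markers per 100 words. None of these choices are confirmed by the card.

```python
import re

# Hypothetical marker pattern for illustration only.
MARKER = re.compile(r"\b(?:wait|alternatively|let me (?:check|verify))\b", re.IGNORECASE)


def marker_density(trace: str, n_segments: int = 20) -> list[float]:
    """Split a trace into n_segments near-equal word spans and return
    markers per 100 words for each span."""
    words = trace.split()
    densities = []
    for i in range(n_segments):
        start = i * len(words) // n_segments
        end = (i + 1) * len(words) // n_segments
        segment = " ".join(words[start:end])
        hits = len(MARKER.findall(segment))
        # Empty spans (traces shorter than n_segments words) get density 0.
        densities.append(100.0 * hits / (end - start) if end > start else 0.0)
    return densities


# Toy trace: 95 filler words followed by 5 markers, so only the last segment is non-zero.
toy_trace = "word " * 95 + "wait " * 5
curve = marker_density(toy_trace)
print(curve[-1], sum(curve[:-1]))
```

Multi-word markers that straddle a segment boundary are missed in this simple version. Averaging the per-trace curves over a dataset would give one aggregate curve.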
Training Configuration
```yaml
base_model: Qwen/Qwen3.5-9B
dataset: 799 curated reasoning rows
epochs: 1
learning_rate: 1e-4
lr_scheduler: cosine
warmup_ratio: 0.1
max_seq_length: 8192
lora_rank: 32
lora_alpha: 32
dropout: 0.05
micro_batch_size: 1
gradient_accumulation_steps: 4
weight_decay: 0.01
```
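To make the configuration concrete, the sketch below derives a few quantities from it. It assumes a single device, a final partial accumulation step that still counts as an optimizer step, linear warmup followed by cosine decay to zero, and standard LoRA scaling (`lora_alpha / lora_rank`). The training framework actually used may round or schedule differently.

```python
import math

# Values copied from the configuration above.
rows = 799
epochs = 1
learning_rate = 1e-4
warmup_ratio = 0.1
lora_rank = 32
lora_alpha = 32
micro_batch_size = 1
gradient_accumulation_steps = 4

# Assumption: one device, so the effective batch is micro batch x accumulation.
effective_batch = micro_batch_size * gradient_accumulation_steps
total_steps = math.ceil(rows / effective_batch) * epochs
warmup_steps = round(warmup_ratio * total_steps)
lora_scale = lora_alpha / lora_rank  # standard LoRA scaling factor


def lr_at(step: int) -> float:
    """Linear warmup, then cosine decay to zero (an assumed schedule shape)."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch, total_steps, warmup_steps, lora_scale)
```

Under these assumptions the whole run is about 200 optimizer steps, which is why each row's quality matters so much for a dataset this small.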
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged model weights and the matching tokenizer from the Hub.
model = AutoModelForCausalLM.from_pretrained("DJLougen/Harmonic-9B")
tokenizer = AutoTokenizer.from_pretrained("DJLougen/Harmonic-9B")
```
Reasoning format
The model uses <think> blocks for reasoning:
```markdown
<think>
The user is asking about X. Let me consider two approaches...
Approach 1: ...
Approach 2: ...
I'll go with Approach 1 because...
Wait, I need to be careful here - this assumes Y, which may not hold.
Let me verify by checking a special case...
Yes, that confirms the result.
</think>
[Final answer here]
```
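When consuming generated text, it is often useful to separate the reasoning from the final answer. A minimal sketch, assuming the output contains at most one well-formed `<think>...</think>` block as in the format above; `split_reasoning` is a hypothetical helper, not part of any library.

```python
import re

# Non-greedy match so only the first <think> block is captured.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(output: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no <think> block is present."""
    match = THINK_BLOCK.search(output)
    if match is None:
        return "", output.strip()
    return match.group(1).strip(), output[match.end():].strip()


sample = "<think>\nLet me verify by checking a special case...\n</think>\nThe answer is 4."
reasoning, answer = split_reasoning(sample)
print(answer)
```

If generation is cut off before `</think>` is emitted, this sketch treats the whole output as the answer. Callers may want to handle that case separately.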
Intended Use
- Reasoning tasks requiring genuine multi-step thinking
- Mathematical problem-solving with self-correction
- Code analysis and generation with structured verification
- General conversation (conversational ability preserved through training design)
- Base model for Stage 2 agentic fine-tuning
Limitations
- 9B parameter model - not suitable for tasks requiring extensive world knowledge
- Reasoning traces can be verbose for simple questions
- Not optimized for tool calling - see Harmonic-Hermes-9B (coming soon) for agentic use
- Benchmark evaluation is ongoing
Architecture
- Base: Qwen 3.5 9B (9.65B parameters)
- Training: LoRA fine-tuning, merged into base weights
- Precision: BF16
- Context: 8192 tokens
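For a rough sense of the hardware needed, the weight footprint follows from the parameter count and precision listed above. This back-of-the-envelope sketch counts weights only. KV cache, activations, and framework overhead are excluded and will add to the total.

```python
# Figures from the Architecture list above.
params = 9.65e9          # total parameters
bytes_per_param = 2      # BF16 stores each parameter in 2 bytes

weight_bytes = params * bytes_per_param
weights_gb = weight_bytes / 1e9       # decimal gigabytes
weights_gib = weight_bytes / 2**30    # binary gibibytes

print(f"{weights_gb:.1f} GB ({weights_gib:.1f} GiB) for weights alone")
```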
License
Apache 2.0 - same as the base model. All training data is from Apache 2.0 or MIT licensed sources. Commercial use is fully permitted.
Links
- GGUF quantizations: DJLougen/Harmonic-9B-GGUF
- Agentic variant: Harmonic-Hermes-9B (coming soon)
- Filtered agent dataset: DJLougen/hermes-agent-traces-filtered