Run this model inference on single tenant GPU with unmatched speed and reliability at scale.
Run this model inference with full control and performance in your environment.
Get help setting up a custom Dedicated Endpoints.
Talk with our engineer to get a quote for reserved GPU instances with discounts.
README
License: apache-2.0Intended use & limitations
Retrieval-augmented Q&A for mining / quarry / blasting professionals. Not a standalone knowledge source — use with the retriever (intfloat/e5-base-v2 + a Qdrant collection). For SDS / safety content, always verify against the cited source PDF; the model is trained to give page numbers for exactly this reason. Domain-specific to Dyno Nobel AU products.
Training
- Base: Qwen/Qwen3-4B
- Method: QLoRA (4-bit nf4), r=16, α=32, dropout=0.05, targets all attn + MLP projections
- Data: 602 synthetic grounded examples — 584
[N]-cited answers (SDS sections, tech specs, guides, case studies) + 18 refusal / safe-decline examples — generated by a teacher model over retrieved context - Schedule: 3 epochs, lr 1e-4 cosine, full-sequence SFT
- Result: final
train_loss1.48 (2.54 → 1.12), token accuracy 57% → 76%
Files
*.safetensors— merged fp16 weights (load withtransformers)dyno-blast-4b-q8_0.gguf— q8_0 GGUF for llama.cpp / Ollama
Prompt format
Grounded system prompt (answer only from numbered SOURCEs, cite [N], refuse if
absent) + numbered SOURCE [N] blocks from the retriever, then the question. The
exact system prompt and chunk schema are in the companion dataset repo.
Model provider
kcherry497
Model tree
Base
Qwen/Qwen3-4B
Quantized
this model
Modalities
Input
Text
Output
Text
Pricing
Dedicated Endpoints
View detailsSupported Functionality
Model APIs
Dedicated Endpoints
Container
More information