Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more
Container

Run this model inference with full control and performance in your environment.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

License: apache-2.0

Intended use & limitations

Retrieval-augmented Q&A for mining / quarry / blasting professionals. Not a standalone knowledge source — use with the retriever (intfloat/e5-base-v2 + a Qdrant collection). For SDS / safety content, always verify against the cited source PDF; the model is trained to give page numbers for exactly this reason. Domain-specific to Dyno Nobel AU products.

Training

  • Base: Qwen/Qwen3-4B
  • Method: QLoRA (4-bit nf4), r=16, α=32, dropout=0.05, targets all attn + MLP projections
  • Data: 602 synthetic grounded examples — 584 [N]-cited answers (SDS sections, tech specs, guides, case studies) + 18 refusal / safe-decline examples — generated by a teacher model over retrieved context
  • Schedule: 3 epochs, lr 1e-4 cosine, full-sequence SFT
  • Result: final train_loss 1.48 (2.54 → 1.12), token accuracy 57% → 76%

Files

  • *.safetensors — merged fp16 weights (load with transformers)
  • dyno-blast-4b-q8_0.gguf — q8_0 GGUF for llama.cpp / Ollama

Prompt format

Grounded system prompt (answer only from numbered SOURCEs, cite [N], refuse if absent) + numbered SOURCE [N] blocks from the retriever, then the question. The exact system prompt and chunk schema are in the companion dataset repo.

Model provider

kcherry497

Model tree

Base

Qwen/Qwen3-4B

Quantized

this model

Modalities

Input

Text

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Explore FriendliAI today