README

License: apache-2.0

🔑 Key Highlights

  • Base Model: Qwen/Qwen3-32B (33B parameters)
  • Method: LoRA (Low-Rank Adaptation) — r=64, alpha=128, all-linear targets
  • Training Data: 2000 multi-turn telecom conversations across 7 domains
  • Hardware: AMD Instinct MI300X (192GB HBM3) on ROCm 6.2
  • Training Time: ~3 hours
  • Trainable Parameters: 536M (1.6% of total)
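
For readers less familiar with LoRA, the update applied to each adapted linear layer can be written as below. This is the standard LoRA formulation, not something specific to this card, and it assumes the default alpha/r scaling (not the rank-stabilized variant). With the r and alpha listed above, the scaling factor works out to 2.

```latex
\[
W' = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d_{\text{out}} \times r},
\quad A \in \mathbb{R}^{r \times d_{\text{in}}}
\]
\[
\frac{\alpha}{r} = \frac{128}{64} = 2,
\qquad \text{trainable weights per adapted layer} = r\,(d_{\text{in}} + d_{\text{out}})
\]
```

Only A and B are trained; the base weight W_0 stays frozen.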

📡 Domains Covered

| Domain | Description |
| --- | --- |
| 5G RAN | gNB configuration, beamforming, MIMO, cell planning |
| 5G Core | AMF/SMF/UPF operations, network slicing, NRF management |
| Transport | MPLS, segment routing, fronthaul/backhaul optimization |
| Security | IPsec, SUPI/SUCI encryption, network access control |
| Automation | Ansible/Terraform for network, closed-loop operations |
| VoLTE/IMS | SIP call flows, QoS, VoNR migration |
| Cloud Native | CNF deployment, Kubernetes for telco, service mesh |

🚀 Quick Start

Loading with PEFT

python

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model in bfloat16, placed automatically across available devices
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-32B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
# Attach the LoRA adapter on top of the frozen base weights
model = PeftModel.from_pretrained(base_model, "shaunak1234/qwen3-32b-telecom-expert")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-32B", trust_remote_code=True)

# Example prompt
messages = [
    {"role": "system", "content": "You are a senior 5G RAN engineer with expertise in network optimization."},
    {"role": "user", "content": "Our gNB is showing high RACH failure rate in a dense urban cell. What's your troubleshooting approach?"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# do_sample=True ensures temperature and top_p take effect
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Using with vLLM (Merged)

python

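# First merge the adapter for faster inference
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-32B", torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, "shaunak1234/qwen3-32b-telecom-expert")
merged = model.merge_and_unload()  # fold the LoRA deltas into the base weights
merged.save_pretrained("qwen3-32b-telecom-merged")
# Save the tokenizer next to the weights so the directory can be served as-is
AutoTokenizer.from_pretrained("Qwen/Qwen3-32B").save_pretrained("qwen3-32b-telecom-merged")

# Then serve with vLLM
# vllm serve qwen3-32b-telecom-merged --dtype bfloat16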
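
Once the server is running, it can be queried over vLLM's OpenAI-compatible HTTP API. The following is a minimal sketch using only the standard library. It assumes the default address `http://localhost:8000` and that the served model name equals the directory passed to `vllm serve`; `build_chat_request` and `chat` are illustrative helper names, not part of any library.

```python
import json
import urllib.request
from typing import Dict, List

# Assumptions: vLLM's default host/port, and a model name equal to the
# directory given to `vllm serve`. Adjust both to match your deployment.
BASE_URL = "http://localhost:8000/v1"
MODEL_NAME = "qwen3-32b-telecom-merged"


def build_chat_request(messages: List[Dict[str, str]], max_tokens: int = 1024) -> Dict:
    """Build an OpenAI-style chat completion payload (hypothetical helper)."""
    return {
        "model": MODEL_NAME,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": 0.7,
        "top_p": 0.9,
    }


def chat(messages: List[Dict[str, str]], timeout: float = 120.0) -> str:
    """POST the payload to the running server and return the reply text."""
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_request(messages)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server up, `chat([{"role": "user", "content": "..."}])` returns the assistant reply as a string. The sampling values mirror the PEFT example and are a starting point, not tuned settings.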

📊 Training Details

Dataset

  • Source: shaunak1234/telecom-agentic-dataset
  • Size: 2000 multi-turn conversations (12.8 MB)
  • Generation: Synthesized using Qwen3-32B via vLLM with domain-specific prompts
  • Format: ChatML (system/user/assistant turns)
  • Complexity: Mix of troubleshooting, configuration, architecture, and operational scenarios
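
To make the ChatML format concrete, here is a small sketch that renders a list of role/content turns into ChatML text. The example conversation is hypothetical, and the dataset's actual field names are not stated in this card; for real use, `tokenizer.apply_chat_template` remains the authoritative renderer.

```python
from typing import Dict, List

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def to_chatml(messages: List[Dict[str, str]]) -> str:
    """Render {"role", "content"} turns as ChatML, one delimited block per turn."""
    parts = []
    for turn in messages:
        parts.append(f"{IM_START}{turn['role']}\n{turn['content']}{IM_END}\n")
    return "".join(parts)


# Hypothetical conversation; not taken from the dataset.
example = [
    {"role": "system", "content": "You are a senior 5G core engineer."},
    {"role": "user", "content": "PDU session setup is failing. Where do I start?"},
    {"role": "assistant", "content": "Start by collecting the reject cause from the SMF logs."},
]

print(to_chatml(example))
```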

Hyperparameters

| Parameter | Value |
| --- | --- |
| LoRA rank (r) | 64 |
| LoRA alpha | 128 |
| LoRA dropout | 0.05 |
| Target modules | all-linear |
| Batch size | 2 |
| Gradient accumulation | 16 |
| Effective batch size | 32 |
| Learning rate | 2e-4 |
| LR scheduler | Cosine |
| Warmup ratio | 0.05 |
| Weight decay | 0.01 |
| Max sequence length | 2048 |
| Epochs | 3 |
| Total steps | 186 |
| Precision | bfloat16 |
| Gradient checkpointing | Yes (non-reentrant) |
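
The effective batch size and total step count follow from the other values in the table. The sketch below reproduces them (32 and 186) and illustrates a linear-warmup plus cosine-decay schedule. It assumes a single GPU, that all 2000 conversations were used for training, and that an incomplete accumulation window at the end of an epoch does not count as a step. The warmup step count is derived here, not stated in the card, and the card only says "Cosine", so the exact schedule shape is an assumption.

```python
import math

DATASET_SIZE = 2000      # conversations, from the Dataset section
PER_DEVICE_BATCH = 2
GRAD_ACCUM = 16
EPOCHS = 3
PEAK_LR = 2e-4
WARMUP_RATIO = 0.05

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM

# Assumption: leftover micro-batches that do not fill an accumulation
# window are not counted as an optimizer step.
micro_batches_per_epoch = math.ceil(DATASET_SIZE / PER_DEVICE_BATCH)
steps_per_epoch = micro_batches_per_epoch // GRAD_ACCUM
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero (assumed shape)."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"effective batch: {effective_batch}, total steps: {total_steps}")
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(f"step {step:3d}: lr = {lr_at(step):.2e}")
```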

Training Infrastructure

| Component | Details |
| --- | --- |
| GPU | AMD Instinct MI300X (192GB HBM3) |
| Platform | AMD DevCloud |
| Software | PyTorch 2.5.1 + ROCm 6.2 |
| Framework | Transformers 4.52.4, PEFT 0.19.1 |
| Training speed | ~58 seconds/step |
| Total training time | ~3 hours |
| Cost | ~$6 (at $2/hr) |
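
As a quick consistency check, the training time and cost can be recomputed from the step count in the Hyperparameters table and the per-step speed above. The hourly rate is an assumption: it reads the Cost row as roughly $2 per GPU-hour.

```python
TOTAL_STEPS = 186          # from the Hyperparameters table
SECONDS_PER_STEP = 58      # approximate, from the table above
HOURLY_RATE_USD = 2.0      # assumption: Cost row read as about $2 per GPU-hour

train_hours = TOTAL_STEPS * SECONDS_PER_STEP / 3600
cost_usd = train_hours * HOURLY_RATE_USD

print(f"training time: {train_hours:.2f} h, cost: ~${cost_usd:.2f}")
```

The result lands just under 3 hours and about $6, in line with the rounded figures in the table.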

Training Metrics

  • Initial loss: 2.34
  • Trainable parameters: 536,870,912 (1.6% of 33.3B total)
  • Gradient flow: Verified on 896 LoRA parameter tensors
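
The trainable parameter count and the number of LoRA tensors can be derived from the adapter configuration. The sketch below assumes the Qwen3-32B layer dimensions listed in the constants (taken from the public model configuration, not from this card) and that "all-linear" covers the seven projection layers in each decoder block but not the output head.

```python
# Assumed Qwen3-32B dimensions; verify against the model's config.json.
HIDDEN = 5120
INTERMEDIATE = 25600
NUM_LAYERS = 64
NUM_Q_HEADS = 64
NUM_KV_HEADS = 8
HEAD_DIM = 128
RANK = 64                  # LoRA rank from the Hyperparameters table

q_dim = NUM_Q_HEADS * HEAD_DIM
kv_dim = NUM_KV_HEADS * HEAD_DIM

# (in_features, out_features) of each linear projection in one decoder layer
linear_shapes = {
    "q_proj": (HIDDEN, q_dim),
    "k_proj": (HIDDEN, kv_dim),
    "v_proj": (HIDDEN, kv_dim),
    "o_proj": (q_dim, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

# Each adapted projection gets A (r x in) and B (out x r): r * (in + out) weights
per_layer = sum(RANK * (d_in + d_out) for d_in, d_out in linear_shapes.values())
trainable = per_layer * NUM_LAYERS
lora_tensors = 2 * len(linear_shapes) * NUM_LAYERS

print(f"{trainable:,} trainable parameters in {lora_tensors} LoRA tensors")
```

Under these assumptions the totals come to 536,870,912 parameters across 896 tensors, matching the figures reported above.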

⚠️ Limitations

  • Fine-tuned on synthetic data generated by the base model — may reflect base model biases
  • Focused on telecom domain; general capabilities may be slightly reduced
  • Not trained for real-time network operations or safety-critical decisions
  • English only

📄 License

Apache 2.0 (following Qwen3-32B base model license)

🙏 Acknowledgments

  • Qwen Team for the excellent Qwen3-32B base model
  • AMD for MI300X GPU access via DevCloud
  • Hugging Face for PEFT, Transformers, and model hosting

Model provider: shaunak1234

Model tree: Qwen/Qwen3-32B (base) → this model (adapter)

Modalities: text input, text output
