License: apache-2.0
🔑 Key Highlights
- Base Model: Qwen/Qwen3-32B (33B parameters)
- Method: LoRA (Low-Rank Adaptation) — r=64, alpha=128, all-linear targets
- Training Data: 2000 multi-turn telecom conversations across 7 domains
- Hardware: AMD Instinct MI300X (192GB HBM3) on ROCm 6.2
- Training Time: ~3 hours
- Trainable Parameters: 536M (1.6% of total)
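To make the LoRA settings above concrete, here is a toy sketch of the low-rank update that an adapter contributes once it is folded into a weight matrix. It assumes the standard LoRA scaling of `alpha / r` (so 128 / 64 = 2.0 for this adapter). The tiny matrices and the helper names `matmul` and `lora_merge` are purely illustrative and are not part of PEFT.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_merge(w, lora_a, lora_b, r, alpha):
    """Return W + (alpha / r) * (B @ A), the merged weight of a LoRA layer.

    w:      out x in  frozen base weight
    lora_a: r   x in  trainable down-projection
    lora_b: out x r   trainable up-projection
    """
    scale = alpha / r
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy example with rank 1 and the same alpha / r ratio as this adapter (2.0)
w = [[1, 0, 0], [0, 1, 0]]
lora_a = [[1, 1, 1]]
lora_b = [[1], [2]]
merged = lora_merge(w, lora_a, lora_b, r=1, alpha=2)
scale_for_this_adapter = 128 / 64
```

Only `lora_a` and `lora_b` are trained, which is why the trainable parameter count stays a small fraction of the base model.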
📡 Domains Covered
| Domain | Description |
|---|---|
| 5G RAN | gNB configuration, beamforming, MIMO, cell planning |
| 5G Core | AMF/SMF/UPF operations, network slicing, NRF management |
| Transport | MPLS, segment routing, fronthaul/backhaul optimization |
| Security | IPsec, SUPI/SUCI encryption, network access control |
| Automation | Ansible/Terraform for networks, closed-loop operations |
| VoLTE/IMS | SIP call flows, QoS, VoNR migration |
| Cloud Native | CNF deployment, Kubernetes for telco, service mesh |
🚀 Quick Start
Loading with PEFT
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-32B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base_model, "shaunak1234/qwen3-32b-telecom-expert")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-32B", trust_remote_code=True)

# Example prompt
messages = [
    {"role": "system", "content": "You are a senior 5G RAN engineer with expertise in network optimization."},
    {"role": "user", "content": "Our gNB is showing high RACH failure rate in a dense urban cell. What's your troubleshooting approach?"},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024, temperature=0.7, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Using with vLLM (Merged)
```python
# First merge the adapter for faster inference
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-32B", torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, "shaunak1234/qwen3-32b-telecom-expert")
merged = model.merge_and_unload()
merged.save_pretrained("qwen3-32b-telecom-merged")

# Save the tokenizer next to the weights so the directory can be served on its own
AutoTokenizer.from_pretrained("Qwen/Qwen3-32B").save_pretrained("qwen3-32b-telecom-merged")

# Then serve with vLLM
# vllm serve qwen3-32b-telecom-merged --dtype bfloat16
```
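Once the merged model is being served, it can be queried over vLLM's OpenAI-compatible HTTP API. The sketch below uses only the standard library and rests on a few assumptions: the server runs on vLLM's default `localhost:8000`, and the served model name is the path passed to `vllm serve`. The helper names `build_chat_request` and `chat` are illustrative, not part of any library.

```python
import json
import urllib.request

# Assumptions: vLLM's default host and port, and the default served model
# name (the path that was passed to `vllm serve`).
BASE_URL = "http://localhost:8000/v1"
MODEL_NAME = "qwen3-32b-telecom-merged"


def build_chat_request(user_prompt, system_prompt=None, max_tokens=1024):
    """Build an OpenAI-style chat completion payload."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return {
        "model": MODEL_NAME,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": 0.7,
        "top_p": 0.9,
    }


def chat(user_prompt, system_prompt=None):
    """POST the payload to the running server and return the assistant reply."""
    payload = json.dumps(build_chat_request(user_prompt, system_prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, a call such as `chat("How do I verify UPF session establishment?", system_prompt="You are a 5G Core engineer.")` should return the generated answer as a string.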
📊 Training Details
Dataset
- Source: shaunak1234/telecom-agentic-dataset
- Size: 2000 multi-turn conversations (12.8 MB)
- Generation: Synthesized using Qwen3-32B via vLLM with domain-specific prompts
- Format: ChatML (system/user/assistant turns)
- Complexity: Mix of troubleshooting, configuration, architecture, and operational scenarios
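For readers unfamiliar with ChatML, the sketch below shows roughly how a system/user/assistant conversation is laid out in that format. The conversation content is invented for illustration and is not taken from the dataset. The actual Qwen3 chat template applied by the tokenizer may add further details, so treat `to_chatml` as a simplified, hypothetical helper rather than the real template.

```python
def to_chatml(messages, add_generation_prompt=False):
    """Render a list of {"role", "content"} dicts as ChatML text."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


# Illustrative conversation, not a sample from the dataset
conversation = [
    {"role": "system", "content": "You are a senior 5G RAN engineer."},
    {"role": "user", "content": "Why is RACH failing in my cell?"},
    {"role": "assistant", "content": "Start by checking the PRACH configuration."},
]
rendered = to_chatml(conversation)
```

In practice, `tokenizer.apply_chat_template` (as used in the Quick Start) should be preferred over hand-built strings.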
Hyperparameters
| Parameter | Value |
|---|---|
| LoRA rank (r) | 64 |
| LoRA alpha | 128 |
| LoRA dropout | 0.05 |
| Target modules | all-linear |
| Batch size | 2 |
| Gradient accumulation | 16 |
| Effective batch size | 32 |
| Learning rate | 2e-4 |
| LR scheduler | Cosine |
| Warmup ratio | 0.05 |
| Weight decay | 0.01 |
| Max sequence length | 2048 |
| Epochs | 3 |
| Total steps | 186 |
| Precision | bfloat16 |
| Gradient checkpointing | Yes (non-reentrant) |
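The effective batch size and total step count in the table can be cross-checked with a little arithmetic. This sketch assumes a single GPU, that all 2000 conversations are used for training, and that a trailing partial accumulation window is not counted as an optimizer step. These are assumptions that happen to reproduce the reported numbers, not confirmed details of the training script.

```python
dataset_size = 2000        # conversations
per_device_batch = 2       # "Batch size"
grad_accum = 16            # "Gradient accumulation"
epochs = 3

# Samples consumed per optimizer step
effective_batch = per_device_batch * grad_accum

# Micro-batches per epoch, then optimizer steps per epoch (partial window dropped)
micro_batches_per_epoch = dataset_size // per_device_batch
steps_per_epoch = micro_batches_per_epoch // grad_accum

total_steps = steps_per_epoch * epochs
print(effective_batch, steps_per_epoch, total_steps)
```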
Training Infrastructure
| Component | Details |
|---|---|
| GPU | AMD Instinct MI300X (192GB HBM3) |
| Platform | AMD DevCloud |
| Software | PyTorch 2.5.1 + ROCm 6.2 |
| Framework | Transformers 4.52.4, PEFT 0.19.1 |
| Training speed | ~58 seconds/step |
| Total training time | ~3 hours |
| Cost | ~$6 (at $2/hr) |
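The training time and cost figures follow from the step count and per-step speed. The sketch below reads the cost row as roughly $6 at $2 per GPU-hour and takes the 186 total steps from the hyperparameter table. It ignores model loading, checkpointing, and evaluation overhead, so it is a rough estimate only.

```python
total_steps = 186
seconds_per_step = 58      # approximate, from the table above
hourly_rate_usd = 2.0      # assumed GPU price per hour

training_hours = total_steps * seconds_per_step / 3600
estimated_cost_usd = training_hours * hourly_rate_usd

print(f"{training_hours:.2f} h, ~${estimated_cost_usd:.2f}")
```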
Training Metrics
- Initial loss: 2.34
- Trainable parameters: 536,870,912 (1.6% of 33.3B total)
- Gradient flow: Verified on 896 LoRA parameter tensors
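The trainable parameter count and the number of LoRA tensors can be derived from the model's layer shapes. The dimensions below are assumptions based on the publicly listed Qwen3-32B configuration and should be checked against the model's `config.json`. The sketch also assumes that the "all-linear" target covers the seven projection matrices in each decoder layer and excludes the output head.

```python
# Assumed Qwen3-32B dimensions (verify against the model's config.json)
HIDDEN = 5120
INTERMEDIATE = 25600
NUM_LAYERS = 64
NUM_HEADS = 64
NUM_KV_HEADS = 8
HEAD_DIM = 128
RANK = 64  # LoRA r

# (in_features, out_features) of each linear projection in one decoder layer
linear_shapes = {
    "q_proj": (HIDDEN, NUM_HEADS * HEAD_DIM),
    "k_proj": (HIDDEN, NUM_KV_HEADS * HEAD_DIM),
    "v_proj": (HIDDEN, NUM_KV_HEADS * HEAD_DIM),
    "o_proj": (NUM_HEADS * HEAD_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

# Each adapted projection adds A (RANK x in) and B (out x RANK)
params_per_layer = sum(
    RANK * (fan_in + fan_out) for fan_in, fan_out in linear_shapes.values()
)
trainable_params = params_per_layer * NUM_LAYERS
lora_tensors = len(linear_shapes) * 2 * NUM_LAYERS

print(trainable_params, lora_tensors)
```

Under these assumptions the result matches the reported 536,870,912 trainable parameters and 896 LoRA tensors.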
⚠️ Limitations
- Fine-tuned on synthetic data generated by the base model — may reflect base model biases
- Focused on telecom domain; general capabilities may be slightly reduced
- Not trained for real-time network operations or safety-critical decisions
- English only
📄 License
Apache 2.0 (following Qwen3-32B base model license)
🙏 Acknowledgments
- Qwen Team for the excellent Qwen3-32B base model
- AMD for MI300X GPU access via DevCloud
- Hugging Face for PEFT, Transformers, and model hosting
Model provider: shaunak1234
Model tree: base Qwen/Qwen3-32B, adapter: this model
Modalities: text input, text output