License: apache-2.0
Architecture
| Parameter | Value |
|---|---|
| Parameters | 1,217,608 |
| Model type | qwen3 |
| Hidden size | 8 |
| Layers | 2 |
| Intermediate size | 32 |
| Attention heads | 1 |
| KV heads | 1 |
| Head dimension | 8 |
| Max position embeddings | 4,096 |
| Vocab size | 151,936 |
| Tensor dtype | bfloat16 |
| Tokenizer source | Qwen/Qwen3-0.6B local mirror |
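The parameter count in the table can be cross-checked by hand. The sketch below assumes the usual Hugging Face Qwen3 layout, which this card does not spell out: bias-free attention projections, an RMSNorm over the head dimension for queries and keys, a gated MLP with three weight matrices, two RMSNorms per layer, one final norm, and an output head tied to the input embeddings. The function name `qwen3_param_count` is hypothetical.

```python
def qwen3_param_count(vocab, hidden, layers, intermediate,
                      heads, kv_heads, head_dim, tied=True):
    """Count parameters under the assumed Qwen3 layout (no biases)."""
    embed = vocab * hidden                      # token embedding matrix
    q_proj = hidden * heads * head_dim
    kv_proj = 2 * hidden * kv_heads * head_dim  # k_proj and v_proj
    o_proj = heads * head_dim * hidden
    qk_norm = 2 * head_dim                      # q_norm and k_norm weights
    mlp = 3 * hidden * intermediate             # gate, up, and down projections
    layer_norms = 2 * hidden                    # pre-attention and pre-MLP norms
    per_layer = q_proj + kv_proj + o_proj + qk_norm + mlp + layer_norms
    total = embed + layers * per_layer + hidden  # trailing term: final norm
    if not tied:
        total += vocab * hidden                 # separate lm_head matrix
    return total


n = qwen3_param_count(vocab=151_936, hidden=8, layers=2, intermediate=32,
                      heads=1, kv_heads=1, head_dim=8)
print(f"{n:,}")  # 1,217,608
```

Almost all of the parameters (1,215,488 of them) sit in the embedding matrix; the two transformer layers contribute only 2,112.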
How this model was created
The model was generated by scripts/tools/create_mock_qwen3.py in the Relax ROCm Megatron workspace. The script:
- Loads the tokenizer and config metadata from a local Qwen3-0.6B checkpoint.
- Shrinks the Qwen3 config dimensions to the table above.
- Initializes `Qwen3ForCausalLM` with random weights.
- Ties word embeddings and saves the model as safetensors.
- Writes `mock_qwen3_info.json` with the exact generation metadata.
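The config-shrinking step listed above might look roughly like the following. This is a sketch, not the contents of `create_mock_qwen3.py`: the helper `shrink_config` and the `MOCK_OVERRIDES` mapping are hypothetical, and the base values in the usage lines are placeholders standing in for whatever the local checkpoint's `config.json` contains. The key names follow the Hugging Face Qwen3 config format, and the target values come from the architecture table.

```python
import json

# Target dimensions copied from the architecture table.
MOCK_OVERRIDES = {
    "hidden_size": 8,
    "num_hidden_layers": 2,
    "intermediate_size": 32,
    "num_attention_heads": 1,
    "num_key_value_heads": 1,
    "head_dim": 8,
    "max_position_embeddings": 4096,
    "tie_word_embeddings": True,
    "torch_dtype": "bfloat16",
}


def shrink_config(base: dict) -> dict:
    """Return a copy of a config dict with the mock dimensions applied.

    Keys that are not overridden (for example vocab_size) are kept, so the
    result stays compatible with the original tokenizer.
    """
    shrunk = dict(base)  # shallow copy; the caller's dict is not modified
    shrunk.update(MOCK_OVERRIDES)
    return shrunk


# Placeholder base config; a real run would read config.json from the checkpoint.
base = {"model_type": "qwen3", "vocab_size": 151936,
        "hidden_size": 1024, "num_hidden_layers": 28}
print(json.dumps(shrink_config(base), indent=2))
```

Keeping `vocab_size` untouched is what lets the tiny model reuse the Qwen3-0.6B tokenizer and chat template unchanged.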
The model weights are random. Only tokenizer/chat-template metadata is copied from Qwen3-0.6B.
Reproduction
From the Relax ROCm Megatron repository:
```bash
source /vast/users/qirong.ho/miniforge3/etc/profile.d/conda.sh
conda activate relaxrl_rocm
python scripts/tools/create_mock_qwen3.py \
  --tokenizer-source /vast/users/qirong.ho/erland/Python_project/relax_e2e_assets/Qwen3-0.6B \
  --output-dir /vast/users/qirong.ho/erland/Python_project/relax_e2e_assets/Qwen3-Mock-1M
```
Relax e2e validation
This checkpoint was validated with the Relax AMD ROCm e2e launcher:
```bash
NUM_ROLLOUT=2 SAVE_INTERVAL=1 CKPT_FORMAT=torch_dist NO_SAVE_OPTIM=0 \
WANDB_GROUP="qwen3-mock-1m-tmux-20260531_095214" \
./amd_qwen3_mock_2gpu_e2e.sh
```
Validation evidence:
- Ray job: `raysubmit_sGx5uTXcKu41nHzL`
- W&B run: `me4ticfh`
- completed `Actor training completed step 0/2`
- completed `Actor training completed step 1/2`
- saved `torch_dist` checkpoints at iterations 0 and 1
- checkpoint metadata contains optimizer state keys, including `optimizer.state.exp_avg` and `optimizer.state.exp_avg_sq`
The e2e validation exercised:
- Hugging Face model load
- SGLang transformers rollout
- Megatron Qwen3Bridge import
- distributed weight update
- optimizer step
- W&B application metrics
- optimizer-inclusive `torch_dist` checkpoint save
Intended use
- Fast Relax/Megatron/SGLang startup and integration tests
- ROCm smoke tests where Qwen3 code paths matter more than model quality
- Checkpointing and resume infrastructure checks
- Debugging model-provider, tokenizer, rollout, and weight-sync wiring
Not intended for
- Inference quality evaluation
- Benchmarking Qwen3 capability
- Any downstream task
- Reward/loss quality analysis
Because the model is random and extremely small, generated text is expected to be nonsense. During the validation run, rewards were invalid/negative and advantages collapsed to zero; this is expected for this smoke checkpoint.
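One plausible reason the advantages collapse: if every sampled response in a group receives the same (invalid or negative) reward, any baseline-subtracted advantage is zero. The sketch below assumes a group-normalized estimator of the form (reward - group mean) / (group std + eps). This card does not say which estimator Relax uses, so the formula and the function name `group_advantages` are assumptions for illustration only.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of rollouts for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# All rollouts fail identically, so there is no signal to separate them.
print(group_advantages([-1.0, -1.0, -1.0, -1.0]))  # [0.0, 0.0, 0.0, 0.0]

# Mixed rewards would produce nonzero advantages.
print(group_advantages([-1.0, 1.0]))
```

With zero advantages the policy-gradient term vanishes, which is harmless here because the run only checks that the training, weight-sync, and checkpoint plumbing executes.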
Model provider
Erland
Modalities
Input
Text
Output
Text