Architecture
| Parameter | Value |
|---|---|
| Parameters | 1,217,608 |
| Model type | qwen3 |
| Hidden size | 8 |
| Layers | 2 |
| Intermediate size | 32 |
| Attention heads | 1 |
| KV heads | 1 |
| Head dimension | 8 |
| Max position embeddings | 4,096 |
| Vocab size | 151,936 |
| Tensor dtype | bfloat16 |
| Tokenizer source | Qwen/Qwen3-0.6B local mirror |
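The parameter count in the table can be cross-checked by hand. The sketch below assumes the usual Qwen3 decoder layout (bias-free attention and MLP projections, per-head RMSNorm on queries and keys, two RMSNorms per layer plus a final one) and tied input/output embeddings; those layout details are assumptions, not something this card states beyond the tying.

```python
# Dimensions copied from the Architecture table.
hidden = 8
layers = 2
intermediate = 32
heads = 1
kv_heads = 1
head_dim = 8
vocab = 151_936

# Token embedding; counted once because lm_head shares it (tied embeddings).
embed = vocab * hidden

# Attention block (assumed bias-free, with q_norm and k_norm over head_dim).
attn = (
    hidden * heads * head_dim            # q_proj
    + 2 * hidden * kv_heads * head_dim   # k_proj and v_proj
    + heads * head_dim * hidden          # o_proj
    + 2 * head_dim                       # q_norm and k_norm
)

mlp = 3 * hidden * intermediate          # gate_proj, up_proj, down_proj
norms = 2 * hidden                       # input and post-attention RMSNorm

per_layer = attn + mlp + norms
total = embed + layers * per_layer + hidden  # trailing term is the final RMSNorm

print(f"per layer: {per_layer:,}")  # per layer: 1,056
print(f"total:     {total:,}")      # total:     1,217,608
```

Almost all of the weight (1,215,488 of 1,217,608 parameters) sits in the embedding matrix, which is why the checkpoint stays around 1M parameters despite the full Qwen3 vocabulary.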
How this model was created
scripts/tools/create_mock_qwen3.py in the Relax ROCm Megatron workspace:
- Loads the tokenizer and config metadata from a local Qwen3-0.6B checkpoint.
- Shrinks the Qwen3 config dimensions to the table above.
- Initializes Qwen3ForCausalLM with random weights.
- Ties word embeddings and saves the model as safetensors.
- Writes mock_qwen3_info.json with the exact generation metadata.
The model weights are random. Only tokenizer/chat-template metadata is copied from Qwen3-0.6B.
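The config-shrinking step can be pictured as a dictionary override. This is a minimal sketch, not the contents of create_mock_qwen3.py: the helper name shrink_config and the stand-in base config are hypothetical, and the key names are assumed to follow the Hugging Face Qwen3 config.json convention.

```python
import json

# Overrides taken from the Architecture table. Key names are assumed to
# match the Hugging Face Qwen3 config.json fields.
MOCK_OVERRIDES = {
    "hidden_size": 8,
    "num_hidden_layers": 2,
    "intermediate_size": 32,
    "num_attention_heads": 1,
    "num_key_value_heads": 1,
    "head_dim": 8,
    "max_position_embeddings": 4096,
    "vocab_size": 151936,
    "tie_word_embeddings": True,
    "torch_dtype": "bfloat16",
}


def shrink_config(base: dict) -> dict:
    """Return a copy of `base` with the mock dimensions applied.

    Hypothetical helper: fields not listed in MOCK_OVERRIDES are kept
    from the source config unchanged.
    """
    shrunk = dict(base)
    shrunk.update(MOCK_OVERRIDES)
    return shrunk


# Illustrative stand-in for the source checkpoint's config.json; the
# sizes here are placeholders, not values read from a real file.
base = {"model_type": "qwen3", "hidden_size": 1024, "num_hidden_layers": 28}
mock = shrink_config(base)
print(json.dumps(mock, indent=2))
```

Keeping the full vocabulary size while shrinking everything else is what lets the original tokenizer be reused without remapping token ids.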
Reproduction
From the Relax ROCm Megatron repository:
source /vast/users/qirong.ho/miniforge3/etc/profile.d/conda.sh
conda activate relaxrl_rocm
python scripts/tools/create_mock_qwen3.py \
--tokenizer-source /vast/users/qirong.ho/erland/Python_project/relax_e2e_assets/Qwen3-0.6B \
--output-dir /vast/users/qirong.ho/erland/Python_project/relax_e2e_assets/Qwen3-Mock-1M
Relax e2e validation
This checkpoint was validated with the Relax AMD ROCm e2e launcher:
NUM_ROLLOUT=2 SAVE_INTERVAL=1 CKPT_FORMAT=torch_dist NO_SAVE_OPTIM=0 \
WANDB_GROUP="qwen3-mock-1m-tmux-20260531_095214" \
./amd_qwen3_mock_2gpu_e2e.sh
Validation evidence:
- Ray job: raysubmit_sGx5uTXcKu41nHzL
- W&B run: me4ticfh
- completed "Actor training completed step 0/2"
- completed "Actor training completed step 1/2"
- saved torch_dist checkpoints at iterations 0 and 1
- checkpoint metadata contains optimizer state keys, including optimizer.state.exp_avg and optimizer.state.exp_avg_sq
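The optimizer-state check in the last item can be automated once the checkpoint's metadata key names are available as strings. The sketch below covers only the matching logic; how the keys are read from a torch_dist checkpoint is left out, and both the helper name and the sample key names are hypothetical.

```python
# Key prefixes named in the validation evidence above.
REQUIRED = ("optimizer.state.exp_avg", "optimizer.state.exp_avg_sq")


def missing_optimizer_state(keys):
    """Return the required prefixes that no key in `keys` matches.

    A key matches a prefix when it equals it or extends it with a dot,
    so an exp_avg_sq key is never mistaken for an exp_avg key.
    """
    keys = list(keys)
    return [
        req
        for req in REQUIRED
        if not any(k == req or k.startswith(req + ".") for k in keys)
    ]


# Hypothetical key names, for illustration only.
sample_keys = [
    "model.embedding.word_embeddings.weight",
    "optimizer.state.exp_avg.model.embedding.word_embeddings.weight",
    "optimizer.state.exp_avg_sq.model.embedding.word_embeddings.weight",
]

missing = missing_optimizer_state(sample_keys)
print("missing:", missing)  # missing: []
```

An empty result corresponds to an optimizer-inclusive save (NO_SAVE_OPTIM=0 in the launcher command above); a non-empty one would suggest the optimizer state was skipped.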
The e2e validation exercised:
- Hugging Face model load
- SGLang transformers rollout
- Megatron Qwen3Bridge import
- distributed weight update
- optimizer step
- W&B application metrics
- optimizer-inclusive torch_dist checkpoint save
Intended use
- Fast Relax/Megatron/SGLang startup and integration tests
- ROCm smoke tests where Qwen3 code paths matter more than model quality
- Checkpointing and resume infrastructure checks
- Debugging model-provider, tokenizer, rollout, and weight-sync wiring
Not intended for
- Inference quality evaluation
- Benchmarking Qwen3 capability
- Any downstream task
- Reward/loss quality analysis
Because the model is random and extremely small, generated text is expected to be nonsense. During the validation run, rewards were invalid/negative and advantages collapsed to zero; this is expected for this smoke checkpoint.