README

License: apache-2.0

Architecture

Parameter                  Value
Parameters                 1,217,608
Model type                 qwen3
Hidden size                8
Layers                     2
Intermediate size          32
Attention heads            1
KV heads                   1
Head dimension             8
Max position embeddings    4,096
Vocab size                 151,936
Tensor dtype               bfloat16
Tokenizer source           Qwen/Qwen3-0.6B local mirror
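
The parameter total can be cross-checked against the other rows of the table. The sketch below assumes the standard Qwen3 decoder layout (bias-free projections, RMSNorm, per-head Q/K norms, gated MLP) and tied input/output embeddings; the function name is illustrative and not part of the repository.

```python
def qwen3_param_count(vocab_size, hidden_size, num_layers, intermediate_size,
                      num_heads, num_kv_heads, head_dim, tie_embeddings=True):
    """Count parameters of a Qwen3-style decoder (assumed layout, no biases)."""
    embed = vocab_size * hidden_size
    q_proj = hidden_size * num_heads * head_dim
    kv_proj = 2 * hidden_size * num_kv_heads * head_dim   # k_proj + v_proj
    o_proj = num_heads * head_dim * hidden_size
    qk_norm = 2 * head_dim                                # q_norm + k_norm weights
    mlp = 3 * hidden_size * intermediate_size             # gate, up, down
    layer_norms = 2 * hidden_size                         # input + post-attention RMSNorm
    per_layer = q_proj + kv_proj + o_proj + qk_norm + mlp + layer_norms
    final_norm = hidden_size
    lm_head = 0 if tie_embeddings else vocab_size * hidden_size
    return embed + num_layers * per_layer + final_norm + lm_head


total = qwen3_param_count(
    vocab_size=151_936, hidden_size=8, num_layers=2, intermediate_size=32,
    num_heads=1, num_kv_heads=1, head_dim=8,
)
print(f"{total:,}")  # 1,217,608
```

Almost all of the total (1,215,488 parameters) sits in the embedding matrix; the two decoder layers and the final norm contribute only 2,120.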

How this model was created

The script scripts/tools/create_mock_qwen3.py in the Relax ROCm Megatron workspace performs the following steps:

  1. Loads the tokenizer and config metadata from a local Qwen3-0.6B checkpoint.
  2. Shrinks the Qwen3 config dimensions to the table above.
  3. Initializes Qwen3ForCausalLM with random weights.
  4. Ties word embeddings and saves the model as safetensors.
  5. Writes mock_qwen3_info.json with the exact generation metadata.

The model weights are random. Only tokenizer/chat-template metadata is copied from Qwen3-0.6B.
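
The steps above could be sketched roughly as follows with Hugging Face transformers. This is not the actual create_mock_qwen3.py, whose source is not shown here. It assumes a transformers release that ships Qwen3Config and Qwen3ForCausalLM, builds a fresh config from the table values instead of editing the loaded one, and leaves every other config field at the library default. The layout of mock_qwen3_info.json is hypothetical.

```python
import json
from pathlib import Path

# Dimensions copied from the Architecture table.
MOCK_DIMS = {
    "vocab_size": 151_936,
    "hidden_size": 8,
    "num_hidden_layers": 2,
    "intermediate_size": 32,
    "num_attention_heads": 1,
    "num_key_value_heads": 1,
    "head_dim": 8,
    "max_position_embeddings": 4096,
    "tie_word_embeddings": True,
}


def create_mock_qwen3(tokenizer_source: str, output_dir: str) -> None:
    """Write a randomly initialized tiny Qwen3 checkpoint plus tokenizer files."""
    # Heavy imports stay local so MOCK_DIMS can be inspected without torch installed.
    import torch
    from transformers import AutoTokenizer, Qwen3Config, Qwen3ForCausalLM

    # Tokenizer and chat template come from the real Qwen3-0.6B checkpoint.
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_source)

    # Assumption: unspecified config fields keep the library defaults.
    config = Qwen3Config(**MOCK_DIMS)
    model = Qwen3ForCausalLM(config).to(torch.bfloat16)  # random weights

    model.save_pretrained(output_dir, safe_serialization=True)  # safetensors
    tokenizer.save_pretrained(output_dir)

    # Hypothetical metadata layout; the real script's fields may differ.
    info = {
        "dims": MOCK_DIMS,
        "num_parameters": model.num_parameters(),
        "tokenizer_source": tokenizer_source,
    }
    Path(output_dir, "mock_qwen3_info.json").write_text(json.dumps(info, indent=2))
```

Calling create_mock_qwen3 with the same two paths passed to --tokenizer-source and --output-dir would produce a comparable checkpoint directory, though the random weights will differ from the published ones.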

Reproduction

From the Relax ROCm Megatron repository:

bash

source /vast/users/qirong.ho/miniforge3/etc/profile.d/conda.sh
conda activate relaxrl_rocm
python scripts/tools/create_mock_qwen3.py \
  --tokenizer-source /vast/users/qirong.ho/erland/Python_project/relax_e2e_assets/Qwen3-0.6B \
  --output-dir /vast/users/qirong.ho/erland/Python_project/relax_e2e_assets/Qwen3-Mock-1M

Relax e2e validation

This checkpoint was validated with the Relax AMD ROCm e2e launcher:

bash

NUM_ROLLOUT=2 SAVE_INTERVAL=1 CKPT_FORMAT=torch_dist NO_SAVE_OPTIM=0 \
WANDB_GROUP="qwen3-mock-1m-tmux-20260531_095214" \
./amd_qwen3_mock_2gpu_e2e.sh

Validation evidence:

  • Ray job: raysubmit_sGx5uTXcKu41nHzL
  • W&B run: me4ticfh
  • log line observed: "Actor training completed step 0/2"
  • log line observed: "Actor training completed step 1/2"
  • saved torch_dist checkpoints at iterations 0 and 1
  • checkpoint metadata contains optimizer state keys, including optimizer.state.exp_avg and optimizer.state.exp_avg_sq
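
One way to repeat the optimizer-state check on a saved checkpoint is to list the keys in its metadata and look for the two entries named above. This is a minimal sketch, assuming the torch_dist checkpoint directory can be read by torch.distributed.checkpoint's FileSystemReader and that the optimizer entries appear as key prefixes; the helper names are illustrative.

```python
REQUIRED_PREFIXES = ("optimizer.state.exp_avg", "optimizer.state.exp_avg_sq")


def missing_optimizer_prefixes(keys):
    """Return the required optimizer-state prefixes that no checkpoint key matches."""
    keys = list(keys)
    return [
        prefix
        for prefix in REQUIRED_PREFIXES
        # Match the exact key or a dotted child, so exp_avg_sq does not count as exp_avg.
        if not any(k == prefix or k.startswith(prefix + ".") for k in keys)
    ]


def read_checkpoint_keys(checkpoint_dir):
    """List state-dict keys recorded in a torch.distributed.checkpoint directory."""
    # Local import: only needed when a real checkpoint is inspected.
    from torch.distributed.checkpoint import FileSystemReader

    metadata = FileSystemReader(checkpoint_dir).read_metadata()
    return list(metadata.state_dict_metadata.keys())
```

An empty list from missing_optimizer_prefixes(read_checkpoint_keys(path)) would indicate that both Adam moment buffers were saved, where path is the directory of one saved iteration.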

The e2e validation exercised:

  • Hugging Face model load
  • SGLang transformers rollout
  • Megatron Qwen3Bridge import
  • distributed weight update
  • optimizer step
  • W&B application metrics
  • optimizer-inclusive torch_dist checkpoint save

Intended use

  • Fast Relax/Megatron/SGLang startup and integration tests
  • ROCm smoke tests where Qwen3 code paths matter more than model quality
  • Checkpointing and resume infrastructure checks
  • Debugging model-provider, tokenizer, rollout, and weight-sync wiring

Not intended for

  • Inference quality evaluation
  • Benchmarking Qwen3 capability
  • Any downstream task
  • Reward/loss quality analysis

Because the model is random and extremely small, generated text is expected to be nonsense. During the validation run, rewards were invalid/negative and advantages collapsed to zero; this is expected for this smoke checkpoint.
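
The collapse of advantages to zero follows directly from uniform rewards. The sketch below assumes a group-normalized advantage estimator (as used in GRPO-style training); the README does not state which estimator the validation run used, so treat this purely as an illustration.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Normalize rewards within one prompt's group of rollouts (assumed estimator)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Every rollout from a random model gets the same invalid/negative reward,
# so each reward equals the group mean and every advantage is zero.
print(group_advantages([-1.0, -1.0, -1.0, -1.0]))  # [0.0, 0.0, 0.0, 0.0]
```

With zero advantages the policy-gradient signal vanishes, which is harmless here because the run only needs to exercise the training and checkpointing code paths.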

Model provider

Erland

Modalities

Input

Text

Output

Text
