License: apache-2.0

Model Details
- Base model: Qwen/Qwen3-30B-A3B (Mixture-of-Experts)
- Architecture: Qwen3MoeForCausalLM
- Total parameters: 30B (3B active per token, 8 experts routed per token)
- Hidden layers: 48
- Quantization: W7A7 (7-bit weights, 7-bit activations) via Pearl quantization
- Model type: Causal LLM (instruction-tuned, MoE)
- Primary runtime target: Pearl vLLM plugin (`miner/vllm-miner`)
- Intended use: Text generation with Pearl mining integration
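To make the "7-bit" and "30B total / 3B active" figures above concrete, here is a small illustrative sketch. It uses a generic symmetric per-tensor quantizer, which is an assumption chosen for clarity: it is NOT the Pearl quantization scheme (this card does not describe that scheme), and it only shows the weight side of W7A7. The storage helper is back-of-the-envelope arithmetic that ignores quantization scales, any tensors kept at higher precision, and runtime overhead.

```python
# Illustrative only: a generic symmetric per-tensor 7-bit quantizer.
# This is NOT the Pearl quantization scheme, which this card does not describe.
BITS = 7
QMAX = 2 ** (BITS - 1) - 1  # 63: a symmetric signed 7-bit grid uses [-63, 63]


def quantize(values):
    """Map floats to integers in [-QMAX, QMAX] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / QMAX if max_abs > 0 else 1.0
    # Round to the nearest grid point and clamp to guard against float error.
    q = [max(-QMAX, min(QMAX, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats from the integer codes."""
    return [x * scale for x in q]


def weight_storage_gb(num_params, bits=BITS):
    """Rough weight footprint in GB (1e9 bytes); ignores scales and unquantized tensors."""
    return num_params * bits / 8 / 1e9


if __name__ == "__main__":
    codes, scale = quantize([0.5, -1.26, 0.1, 1.26])
    print("codes:", codes, "scale:", scale)
    print(f"30B params at 7 bits:  ~{weight_storage_gb(30e9):.2f} GB")
    print(f"30B params at 16 bits: ~{weight_storage_gb(30e9, bits=16):.2f} GB")
    print(f"3B active params at 7 bits: ~{weight_storage_gb(3e9):.2f} GB of weights used per token")
```

By this rough estimate the full weight set shrinks from about 60 GB at 16 bits to about 26 GB at 7 bits, while each token only exercises the roughly 3B active parameters (shared layers plus the 8 routed experts).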
Intended Use
This model is intended to be served through the Pearl miner stack, where vLLM inference is integrated with Pearl mining workflows.
Typical flow:
- Run `pearld` with RPC enabled.
- Start the Pearl miner/vLLM stack.
- Serve this model through vLLM while the Pearl gateway/miner components handle the mining-side integration.
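The flow is order-dependent: the miner stack expects `pearld`'s RPC endpoint to be up before it starts. A minimal sketch of a start-up guard follows, assuming the RPC endpoint is reachable over plain TCP at the URL passed as `PEARLD_RPC_URL` in the Docker example and that this URL includes an explicit port. `rpc_endpoint` and `wait_for_port` are hypothetical helpers, not part of the Pearl tooling, and a successful TCP connect only shows the port is open, not that the RPC credentials are valid.

```python
import socket
import time
from urllib.parse import urlsplit


def rpc_endpoint(url: str) -> tuple:
    """Split a URL such as http://host:port into (host, port); an explicit port is required."""
    parts = urlsplit(url)
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"expected a URL with an explicit host and port, got {url!r}")
    return parts.hostname, parts.port


def wait_for_port(host: str, port: int, timeout_s: float = 60.0, interval_s: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds; return False once timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # Opening and immediately closing a connection is enough to prove the port listens.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            time.sleep(interval_s)
    return False
```

Usage, before launching the miner/vLLM stack: `host, port = rpc_endpoint(os.environ["PEARLD_RPC_URL"])`, then abort if `wait_for_port(host, port)` returns `False`.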
How To Use (Pearl vLLM Plugin)
Follow the miner setup from the Pearl repo:
High-level prerequisites:
- Python 3.12
- `uv`
- CUDA and an NVIDIA GPU (sm90 class, e.g. H100/H200, per project docs)
- Rust toolchain
- A running `pearld` node with RPC credentials
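The prerequisites above can be checked before a long build. This is a sketch under stated assumptions: it treats "sm90 class" as CUDA compute capability 9.x (what H100/H200 report), uses `cargo` on `PATH` as a stand-in for the Rust toolchain, and expects the same RPC variables the Docker example passes (`PEARLD_RPC_URL`, `PEARLD_RPC_USER`, `PEARLD_RPC_PASSWORD`); a non-Docker setup may configure RPC differently. GPU detection goes through PyTorch's `torch.cuda.get_device_capability` if PyTorch is installed.

```python
import os
import shutil
import sys
from typing import Callable, Mapping, Optional, Sequence, Tuple

# Variable names taken from the Docker example (assumption for non-Docker setups).
REQUIRED_ENV = ("PEARLD_RPC_URL", "PEARLD_RPC_USER", "PEARLD_RPC_PASSWORD")
# "cargo" stands in for the Rust toolchain (assumption).
REQUIRED_TOOLS = ("uv", "cargo")


def preflight(
    python_version: Sequence[int],
    gpu_capability: Optional[Tuple[int, int]],
    env: Mapping[str, str],
    which: Callable[[str], Optional[str]],
) -> list:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if tuple(python_version[:2]) != (3, 12):
        problems.append(f"expected Python 3.12, found {python_version[0]}.{python_version[1]}")
    if gpu_capability is None:
        problems.append("no CUDA GPU detected")
    elif gpu_capability[0] != 9:
        # sm90-class parts such as H100/H200 report compute capability 9.x.
        problems.append(
            f"GPU compute capability {gpu_capability[0]}.{gpu_capability[1]} is not sm90-class"
        )
    missing_env = [name for name in REQUIRED_ENV if not env.get(name)]
    if missing_env:
        problems.append("missing environment variables: " + ", ".join(missing_env))
    missing_tools = [tool for tool in REQUIRED_TOOLS if which(tool) is None]
    if missing_tools:
        problems.append("tools not on PATH: " + ", ".join(missing_tools))
    return problems


def detect_gpu_capability() -> Optional[Tuple[int, int]]:
    """Query GPU 0 through PyTorch if it is installed; None when no CUDA device is visible."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    for issue in preflight(sys.version_info, detect_gpu_capability(), os.environ, shutil.which):
        print("preflight:", issue)
```

Passing the version, GPU query, environment, and `which` in as arguments keeps `preflight` a pure function, so the checks can be exercised without a GPU.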
Docker Example
From the Pearl repository root:
```bash
docker buildx build -t vllm_miner . -f miner/vllm-miner/Dockerfile
```
```bash
docker run --rm -it --gpus all \
  -p 8000:8000 -p 8337:8337 -p 8339:8339 \
  -e PEARLD_RPC_URL=<PEARLD_URL> \
  -e PEARLD_RPC_USER=<RPC_USER> \
  -e PEARLD_RPC_PASSWORD=<RPC_PASSWORD> \
  -e VLLM_USE_DEEP_GEMM=0 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --shm-size 8g \
  vllm_miner:latest \
  pearl-ai/Qwen3-30B-A3B-Instruct-2507-pearl \
  --host 0.0.0.0 --port 8000 \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.9 \
  --enforce-eager
```
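`--max-model-len 8192` caps the context per request, and `--gpu-memory-utilization 0.9` caps the fraction of GPU memory vLLM uses for weights, activations, and KV cache. The sketch below estimates how much KV cache one context window costs. Only the 48-layer count comes from this card; the KV-head count (4), head dimension (128), and 16-bit cache elements are assumptions based on the upstream Qwen3-30B-A3B configuration, so check this repository's `config.json` before relying on the numbers. Because vLLM allocates KV cache in pages as tokens arrive, `full_length_sequences` is a worst case, not a concurrency limit.

```python
# Only NUM_LAYERS comes from this card; the rest are ASSUMPTIONS taken from the
# upstream Qwen/Qwen3-30B-A3B config. Verify against this repo's config.json.
NUM_LAYERS = 48      # "Hidden layers: 48"
NUM_KV_HEADS = 4     # assumed (grouped-query attention)
HEAD_DIM = 128       # assumed
BYTES_PER_ELEM = 2   # assumed 16-bit KV cache; the W7A7 scheme may store it differently


def kv_bytes_per_token(layers=NUM_LAYERS, kv_heads=NUM_KV_HEADS, head_dim=HEAD_DIM, nbytes=BYTES_PER_ELEM):
    """Bytes of K and V cache one token occupies across all layers (factor 2 = key + value)."""
    return 2 * layers * kv_heads * head_dim * nbytes


def full_length_sequences(kv_budget_bytes, max_model_len=8192, per_token=None):
    """How many sequences filled to --max-model-len fit entirely in a KV-cache budget."""
    per_token = kv_bytes_per_token() if per_token is None else per_token
    return kv_budget_bytes // (max_model_len * per_token)


if __name__ == "__main__":
    per_token = kv_bytes_per_token()
    print(f"KV cache per token: {per_token / 1024:.0f} KiB")
    print(f"KV cache per 8192-token sequence: {per_token * 8192 / 2**20:.0f} MiB")
    # Hypothetical budget: whatever remains for KV cache after weights and activations.
    print("Full-length sequences in a 40 GiB budget:", full_length_sequences(40 * 2**30))
```

Under these assumptions a full 8192-token context costs 768 MiB of KV cache, so raising `--max-model-len` scales that cost linearly.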
vLLM Standalone
```bash
pip install vllm
vllm serve "pearl-ai/Qwen3-30B-A3B-Instruct-2507-pearl"
```
```bash
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "pearl-ai/Qwen3-30B-A3B-Instruct-2507-pearl",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```
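The same request can be sent from Python with only the standard library. This is a minimal sketch that sends the JSON body from the curl example and reads the reply from the OpenAI-style `choices[0].message.content` field; the helper names are hypothetical, and the optional `max_tokens` field is the standard OpenAI chat-completions parameter, added here for convenience.

```python
import json
import urllib.request
from typing import Optional

MODEL = "pearl-ai/Qwen3-30B-A3B-Instruct-2507-pearl"


def build_chat_request(prompt: str, model: str = MODEL, max_tokens: Optional[int] = None) -> dict:
    """Same JSON body the curl example sends, optionally with a max_tokens cap."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    if max_tokens is not None:
        body["max_tokens"] = max_tokens
    return body


def extract_reply(response: dict) -> str:
    """Pull the assistant text out of an OpenAI-style chat completion response."""
    return response["choices"][0]["message"]["content"]


def chat(base_url: str, prompt: str, timeout_s: float = 120.0) -> str:
    """POST to {base_url}/v1/chat/completions and return the reply text."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return extract_reply(json.load(resp))
```

Usage with the server above running: `print(chat("http://localhost:8000", "What is the capital of France?"))`.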
SGLang
```bash
pip install sglang
python3 -m sglang.launch_server \
  --model-path "pearl-ai/Qwen3-30B-A3B-Instruct-2507-pearl" \
  --host 0.0.0.0 \
  --port 30000
```
```bash
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "pearl-ai/Qwen3-30B-A3B-Instruct-2507-pearl",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```
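Both servers expose the same OpenAI-compatible `/v1/chat/completions` route, so one client can target either port. The sketch below streams tokens as they arrive, assuming the server follows the OpenAI streaming format when `"stream": true` is set: server-sent-event lines of the form `data: {...}` carrying text in `choices[].delta.content`, terminated by `data: [DONE]`. The function names are hypothetical.

```python
import json
import urllib.request
from typing import Iterable, Iterator

MODEL = "pearl-ai/Qwen3-30B-A3B-Instruct-2507-pearl"


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from OpenAI-style server-sent-event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators and SSE comments carry no tokens
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            piece = (choice.get("delta") or {}).get("content")
            if piece:
                yield piece


def stream_chat(base_url: str, prompt: str, model: str = MODEL) -> Iterator[str]:
    """Send a chat request with "stream": true and yield text as it arrives."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}], "stream": True}
    req = urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        # The HTTP response iterates line by line, which matches SSE framing.
        yield from iter_stream_text(line.decode("utf-8") for line in resp)
```

Usage: `for piece in stream_chat("http://localhost:30000", "What is the capital of France?"): print(piece, end="", flush=True)`; use port 8000 for the vLLM server instead. Keeping the parser separate from the HTTP call lets it be tested on captured lines.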
Transformers
```python
from transformers import pipeline

pipe = pipeline("text-generation", model="pearl-ai/Qwen3-30B-A3B-Instruct-2507-pearl")
messages = [
    {"role": "user", "content": "Who are you?"},
]
print(pipe(messages))
```
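When given a list of chat messages, recent `transformers` releases return a list with one dict per input, whose `generated_text` holds the conversation including the new assistant turn. The hypothetical helper below assumes that shape (check it against your installed version) and returns just the newest assistant reply.

```python
def last_assistant_reply(pipeline_output):
    """Return the newest assistant message from a text-generation pipeline result."""
    generated = pipeline_output[0]["generated_text"]
    if isinstance(generated, str):
        # Plain-string prompts come back as plain text (prompt included by default).
        return generated
    for message in reversed(generated):
        if message.get("role") == "assistant":
            return message["content"]
    raise ValueError("no assistant message found in pipeline output")
```

Usage: `print(last_assistant_reply(pipe(messages)))`.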
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pearl-ai/Qwen3-30B-A3B-Instruct-2507-pearl"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [
    {"role": "user", "content": "Who are you?"},
]
# Render the chat template and tokenize in one step, returning PyTorch tensors.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens (skip the prompt) and drop special tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
License
This model is based on Qwen3 and is distributed under the Apache 2.0 license.
Limitations
- Can generate incorrect, unsafe, or biased outputs.
- Requires careful deployment controls and output validation.
- Hardware/software compatibility depends on the Pearl miner stack and supported GPU architectures.
Model provider: pearl-ai
Model tree: Qwen/Qwen3-30B-A3B (base) → this model (quantized)
Modalities: text input, text output