README

License: apache-2.0

Model Details

  • Base model: Qwen/Qwen3-30B-A3B (Mixture-of-Experts)
  • Architecture: Qwen3MoeForCausalLM
  • Total parameters: 30B (3B active per token, 8 experts routed per token)
  • Hidden layers: 48
  • Quantization: W7A7 (7-bit weights, 7-bit activations) via Pearl quantization
  • Model type: Causal LLM (instruction-tuned, MoE)
  • Primary runtime target: Pearl vLLM plugin (miner/vllm-miner)
  • Intended use: Text generation with Pearl mining integration
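As a rough sanity check on the figures above, the raw weight footprint can be estimated from the parameter count and bit width. This is a back-of-envelope sketch only: it assumes every parameter is stored at exactly 7 bits with no scales, zero-points, or packing overhead, which the model card does not state, and it ignores activations and KV cache entirely.

```python
# Back-of-envelope weight footprint. Assumption: all parameters are stored at
# the stated bit width with no extra metadata; the real checkpoint may differ.

def weight_gb(num_params: float, bits_per_weight: int) -> float:
    """Return the raw weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9

TOTAL_PARAMS = 30e9   # "30B" from the model details
ACTIVE_PARAMS = 3e9   # "3B active per token"

w7 = weight_gb(TOTAL_PARAMS, 7)      # 7-bit weights
w16 = weight_gb(TOTAL_PARAMS, 16)    # 16-bit baseline, for comparison only
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"7-bit weights:    {w7:.2f} GB")
print(f"16-bit weights:   {w16:.2f} GB")
print(f"active per token: {active_fraction:.0%}")
```

Because only a fraction of the parameters is active per token, compute per token is far lower than the total size suggests, but all expert weights still have to be resident in memory.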

Intended Use

This model is intended to be served through the Pearl miner stack, where vLLM inference is integrated with Pearl mining workflows.

Typical flow:

  1. Run pearld with RPC enabled.
  2. Start the Pearl miner/vLLM stack.
  3. Serve this model through vLLM while Pearl gateway/miner components handle mining-side integration.

How To Use (Pearl vLLM Plugin)

Follow the miner setup instructions in the Pearl repo.

High-level prerequisites:

  • Python 3.12
  • uv
  • CUDA + NVIDIA GPU (sm90 class, e.g. H100/H200, per project docs)
  • Rust toolchain
  • Running pearld node with RPC credentials
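Some of the prerequisites above can be checked from a script before building anything. The sketch below is illustrative and not part of the Pearl tooling: the executable names are assumptions (`cargo` stands in for the Rust toolchain, `nvidia-smi` for a working NVIDIA driver), and it does not verify the GPU architecture or the pearld node.

```python
import shutil
import sys

# Assumed executable names; adjust to match your installation.
REQUIRED_TOOLS = ("uv", "cargo", "nvidia-smi")

def preflight(required_python=(3, 12), tools=REQUIRED_TOOLS) -> dict:
    """Map each prerequisite to True/False. Nothing is installed or changed."""
    results = {"python": tuple(sys.version_info[:2]) == tuple(required_python)}
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        results[tool] = shutil.which(tool) is not None
    return results

if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{'ok' if ok else 'MISSING':8s}{name}")
```

A missing entry only means the tool was not found on `PATH`; it may still be installed elsewhere.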

Docker Example

From the Pearl repository root:

bash

docker buildx build -t vllm_miner . -f miner/vllm-miner/Dockerfile

bash

docker run --rm -it --gpus all \
  -p 8000:8000 -p 8337:8337 -p 8339:8339 \
  -e PEARLD_RPC_URL=<PEARLD_URL> \
  -e PEARLD_RPC_USER=<RPC_USER> \
  -e PEARLD_RPC_PASSWORD=<RPC_PASSWORD> \
  -e VLLM_USE_DEEP_GEMM=0 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --shm-size 8g \
  vllm_miner:latest \
  pearl-ai/Qwen3-30B-A3B-Instruct-2507-pearl \
  --host 0.0.0.0 --port 8000 \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.9 \
  --enforce-eager
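Loading a model of this size can take a while after the container starts, so a client may want to wait until the server answers before sending requests. The following is a minimal polling sketch using only the standard library. It assumes the container exposes vLLM's OpenAI-compatible API on port 8000 (as the `--port 8000` flag suggests) and that `/v1/models` returns HTTP 200 once the model is loaded; `wait_until_ready` is a hypothetical helper, not part of Pearl or vLLM.

```python
import time
import urllib.error
import urllib.request

def wait_until_ready(url: str, timeout_s: float = 600.0, interval_s: float = 5.0) -> bool:
    """Poll `url` until it answers HTTP 200 or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server is not accepting connections yet, or returned an error
        time.sleep(interval_s)
    return False
```

For example, `wait_until_ready("http://localhost:8000/v1/models")` returns `True` once the endpoint responds and `False` if the timeout expires first.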

vLLM Standalone

bash

pip install vllm
vllm serve "pearl-ai/Qwen3-30B-A3B-Instruct-2507-pearl"

bash

curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "pearl-ai/Qwen3-30B-A3B-Instruct-2507-pearl",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
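The same request can be sent from Python without extra dependencies. This sketch mirrors the curl call above using `urllib`; `build_chat_request` and `chat` are illustrative helper names, and the default base URL assumes the server is listening on localhost port 8000.

```python
import json
import urllib.request

MODEL = "pearl-ai/Qwen3-30B-A3B-Instruct-2507-pearl"

def build_chat_request(prompt: str, base_url: str = "http://localhost:8000",
                       model: str = MODEL) -> urllib.request.Request:
    """Build the same POST the curl command sends, without sending it."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def chat(prompt: str, **kwargs) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as resp:
        return json.load(resp)
```

Splitting request construction from sending keeps the payload easy to inspect or log before anything goes over the network. Calling `chat("What is the capital of France?")` requires a running server.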

SGLang

bash

pip install sglang
python3 -m sglang.launch_server \
  --model-path "pearl-ai/Qwen3-30B-A3B-Instruct-2507-pearl" \
  --host 0.0.0.0 \
  --port 30000

bash

curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "pearl-ai/Qwen3-30B-A3B-Instruct-2507-pearl",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
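Both the vLLM and SGLang endpoints above follow the OpenAI chat-completions format, so the assistant text normally sits under `choices[0].message.content` in the returned JSON. The sketch below pulls that field out defensively; `extract_reply` is a hypothetical helper, and the sample response is hand-written to show the expected shape rather than captured from a real server.

```python
def extract_reply(response: dict) -> str:
    """Return the assistant text from an OpenAI-style chat completion."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError(f"no choices in response: {response!r}")
    content = choices[0].get("message", {}).get("content")
    if content is None:
        raise ValueError("first choice has no message content")
    return content

# Hand-written sample in the expected shape, not real model output.
sample = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Paris."},
            "finish_reason": "stop",
        }
    ]
}
print(extract_reply(sample))
```

Raising on a missing field, rather than returning an empty string, makes error payloads from the server visible instead of silently treating them as empty answers.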

Transformers

python

from transformers import pipeline

pipe = pipeline("text-generation", model="pearl-ai/Qwen3-30B-A3B-Instruct-2507-pearl")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

python

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("pearl-ai/Qwen3-30B-A3B-Instruct-2507-pearl")
model = AutoModelForCausalLM.from_pretrained("pearl-ai/Qwen3-30B-A3B-Instruct-2507-pearl")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Render the chat template and tokenize in one step, on the model's device.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
# Slice off the prompt tokens so only the newly generated text is decoded.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

License

This model is based on Qwen3 and is distributed under the Apache 2.0 license.

Limitations

  • Can generate incorrect, unsafe, or biased outputs.
  • Requires careful deployment controls and output validation.
  • Hardware/software compatibility depends on the Pearl miner stack and supported GPU architectures.

Model provider: pearl-ai

Model tree

  • Base: Qwen/Qwen3-30B-A3B
  • Quantized: this model

Modalities

  • Input: Text
  • Output: Text
