pearl-ai/Qwen3-30B-A3B-Instruct-2507-pearl

License: apache-2.0

Model Details

Base model: Qwen/Qwen3-30B-A3B-Instruct-2507 (Mixture-of-Experts)
Architecture: Qwen3MoeForCausalLM
Total parameters: 30B (3B active per token, 8 experts routed per token)
Hidden layers: 48
Quantization: Mixed precision (W7A7 + FP8) via Pearl quantization.
Model type: Causal LLM (instruction-tuned, MoE)
Primary runtime target: Pearl vLLM plugin (miner/vllm-miner)
Intended use: Text generation with Pearl mining integration
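For GPU sizing, it helps to separate the two parameter counts above. The sketch below does the back-of-envelope arithmetic. It assumes each quantized weight costs 7 to 8 bits, reading "W7A7" as 7-bit weights and FP8 as 8-bit. That reading is an assumption, not a published breakdown of the Pearl format. The helper names are illustrative.

```python
# Back-of-envelope sizing from the Model Details list.
# ASSUMPTION: each quantized weight costs 7 to 8 bits (one reading of
# "W7A7 + FP8"); KV cache, activations and runtime overhead are ignored.

TOTAL_PARAMS = 30e9   # nominal "30B" total parameters
ACTIVE_PARAMS = 3e9   # nominal "3B active per token"


def active_fraction(total: float = TOTAL_PARAMS, active: float = ACTIVE_PARAMS) -> float:
    """Share of the weights used in one token's forward pass."""
    return active / total


def weight_gb(bits_per_weight: float, params: float = TOTAL_PARAMS) -> float:
    """Weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    print(f"active per token: {active_fraction():.0%} of the weights")
    for bits, label in ((7, "7-bit"), (8, "8-bit / FP8"), (16, "BF16 reference")):
        print(f"{label:>15}: ~{weight_gb(bits):.2f} GB of weights")
```

The router can pick a different set of experts for every token, so all 30B parameters must stay resident in GPU memory. The 3B figure bounds per-token compute, not memory. Under the stated assumption, the weights alone come to roughly 26 to 30 GB. The KV cache comes on top of that.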

Evaluation

MMLU-Pro (full test split, 12,032 questions), generative 5-shot CoT, run via lm-eval-harness on vLLM with the Qwen3-30B-A3B-Instruct-2507 sampling recipe (temperature=0.7, top_p=0.8, top_k=20, min_p=0).

| Model | MMLU-Pro |
|---|---|
| Original (Qwen/Qwen3-30B-A3B-Instruct-2507) | 77.88% |
| Pearl (this model) | 77.33% |
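The two scores can be put on a more concrete footing with a little arithmetic. The sketch below computes the gap in percentage points (pp), the roughly equivalent number of questions, and the relative retention. It also computes a binomial standard error for a single run. That last figure assumes each question is an independent Bernoulli trial, which is a simplification. The function names are illustrative.

```python
import math

N_QUESTIONS = 12_032   # full MMLU-Pro test split
ORIGINAL_PCT = 77.88   # "Original" row
PEARL_PCT = 77.33      # "Pearl" row


def gap_points(original: float, quantized: float) -> float:
    """Absolute accuracy drop in percentage points."""
    return original - quantized


def retention(original: float, quantized: float) -> float:
    """Quantized accuracy as a fraction of the original accuracy."""
    return quantized / original


def questions_for(points: float, n: int = N_QUESTIONS) -> int:
    """Approximate number of questions a percentage-point gap corresponds to."""
    return round(points / 100 * n)


def binomial_se_points(pct: float, n: int = N_QUESTIONS) -> float:
    """Standard error of one accuracy estimate, in percentage points.

    Treats each question as an independent Bernoulli trial.
    """
    p = pct / 100
    return 100 * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    gap = gap_points(ORIGINAL_PCT, PEARL_PCT)
    print(f"gap: {gap:.2f} pp (~{questions_for(gap)} questions)")
    print(f"retention: {retention(ORIGINAL_PCT, PEARL_PCT):.2%}")
    print(f"single-run SE: ~{binomial_se_points(ORIGINAL_PCT):.2f} pp")
```

The script prints the following:

- a gap of 0.55 pp, about 66 questions;
- 99.29% retention;
- a per-run standard error of roughly 0.38 pp.

Treat the standard error as a rough scale, not a significance test. Both runs use the same questions, so a paired comparison would be tighter. The temperature-0.7 sampling also adds run-to-run variance that this formula ignores.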

Intended Use

This model is intended to be served through the Pearl miner stack, where vLLM inference is integrated with Pearl mining workflows.

Typical flow:

Run pearld with RPC enabled.
Start the Pearl miner/vLLM stack.
Serve this model through vLLM while the Pearl gateway and miner components handle the mining-side integration.

How To Use (Pearl vLLM Plugin)

Follow the miner setup from the Pearl repo:

pearl/miner README

High-level prerequisites:

Python 3.12
uv
CUDA and an NVIDIA GPU (sm90 class, e.g. H100/H200, per project docs)
Rust toolchain
A running pearld node with RPC credentials
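The local prerequisites above can be checked before building anything. The sketch below does that under a few assumptions:

- uv, cargo, rustc and nvidia-smi are expected on PATH.
- The driver is recent enough for nvidia-smi's compute_cap query field.
- "sm90 class" means compute capability 9.0.

It does not check pearld, which may run on another host. The helper names are illustrative, not part of the Pearl tooling.

```python
import shutil
import subprocess
import sys

REQUIRED_TOOLS = ("uv", "cargo", "rustc", "nvidia-smi")


def python_ok(version_info=sys.version_info) -> bool:
    """True for Python 3.12, matching the prerequisite as written."""
    return tuple(version_info[:2]) == (3, 12)


def missing_tools(tools=REQUIRED_TOOLS) -> list:
    """Names from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def is_sm90(compute_cap: str) -> bool:
    """True for compute capability 9.0 (sm90, e.g. H100/H200)."""
    try:
        major, minor = (int(part) for part in compute_cap.strip().split("."))
    except ValueError:
        return False
    return (major, minor) == (9, 0)


def gpu_compute_caps() -> list:
    """Compute capability of each visible GPU, as reported by nvidia-smi."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    print("Python 3.12:", "ok" if python_ok() else f"found {sys.version.split()[0]}")
    missing = missing_tools()
    print("Missing tools:", ", ".join(missing) if missing else "none")
    caps = gpu_compute_caps()
    all_sm90 = bool(caps) and all(is_sm90(cap) for cap in caps)
    print("GPUs:", caps or "none detected", "| all sm90:", "yes" if all_sm90 else "no")
```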

Docker Example

From the Pearl repository root, build the image:

```bash
docker buildx build -t vllm_miner . -f miner/vllm-miner/Dockerfile
```

Then start the container, replacing the placeholders with your pearld RPC URL and credentials:

```bash
docker run --rm -it --gpus all \
  -p 8000:8000 -p 8337:8337 -p 8339:8339 \
  -e PEARLD_RPC_URL=<PEARLD_URL> \
  -e PEARLD_RPC_USER=<RPC_USER> \
  -e PEARLD_RPC_PASSWORD=<RPC_PASSWORD> \
  -e VLLM_USE_DEEP_GEMM=0 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --shm-size 8g \
  vllm_miner:latest \
  pearl-ai/Qwen3-30B-A3B-Instruct-2507-pearl \
  --host 0.0.0.0 --port 8000 \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.9 \
  --enforce-eager
```
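The same launch can be wrapped in a small pre-flight script. The script refuses to start when the pearld RPC variables are unset. It also avoids typing the password into the docker command line: with `-e NAME` and no `=value`, Docker copies NAME from the calling process's environment. The sketch below mirrors the flags of the command above. It assumes the three variables are exported in your shell. `missing_env` and `docker_run_args` are hypothetical helper names.

```python
import os
import subprocess
import sys

MODEL = "pearl-ai/Qwen3-30B-A3B-Instruct-2507-pearl"
RPC_VARS = ("PEARLD_RPC_URL", "PEARLD_RPC_USER", "PEARLD_RPC_PASSWORD")


def missing_env(env=None, required=RPC_VARS) -> list:
    """Required variables that are unset or empty in `env` (default: os.environ)."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


def docker_run_args(model: str = MODEL) -> list:
    """argv equivalent to the docker run command above."""
    hf_cache = os.path.expanduser("~/.cache/huggingface")
    args = ["docker", "run", "--rm", "-it", "--gpus", "all"]
    for port in (8000, 8337, 8339):
        args += ["-p", f"{port}:{port}"]
    for name in RPC_VARS:
        # "-e NAME" without a value: Docker reads NAME from this process's
        # environment, so the secret is not passed on the docker command line.
        args += ["-e", name]
    args += [
        "-e", "VLLM_USE_DEEP_GEMM=0",
        "-v", f"{hf_cache}:/root/.cache/huggingface",
        "--shm-size", "8g",
        "vllm_miner:latest", model,
        "--host", "0.0.0.0", "--port", "8000",
        "--max-model-len", "8192",
        "--gpu-memory-utilization", "0.9",
        "--enforce-eager",
    ]
    return args


if __name__ == "__main__":
    missing = missing_env()
    if missing:
        sys.exit("Set these before launching: " + ", ".join(missing))
    subprocess.run(docker_run_args(), check=True)
```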

vLLM Standalone

```bash
pip install vllm

vllm serve "pearl-ai/Qwen3-30B-A3B-Instruct-2507-pearl"
```
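Loading a 30B checkpoint can take several minutes, and requests sent before then fail. A minimal readiness probe polls the OpenAI-compatible /v1/models endpoint until the model id appears. The sketch assumes `vllm serve` registers the model under its repository name, which is its default when `--served-model-name` is not given. `model_listed` and `wait_until_served` are hypothetical helper names.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8000"
MODEL = "pearl-ai/Qwen3-30B-A3B-Instruct-2507-pearl"


def model_listed(body: dict, model: str = MODEL) -> bool:
    """True if an OpenAI-style /v1/models response lists `model`."""
    return any(entry.get("id") == model for entry in body.get("data", []))


def wait_until_served(base_url: str = BASE_URL, model: str = MODEL,
                      timeout_s: float = 900.0, poll_s: float = 5.0) -> bool:
    """Poll /v1/models until `model` is listed or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/v1/models", timeout=10) as resp:
                if model_listed(json.load(resp), model):
                    return True
        except (OSError, json.JSONDecodeError):
            pass  # connection refused or partial startup; try again
        time.sleep(poll_s)
    return False


if __name__ == "__main__":
    print("ready" if wait_until_served() else "timed out")
```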

```bash
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "pearl-ai/Qwen3-30B-A3B-Instruct-2507-pearl",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
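The same request can be sent from Python using only the standard library. This sketch also attaches the sampling recipe quoted in the Evaluation section. It assumes the server from the previous step is listening on localhost:8000. vLLM's OpenAI-compatible server accepts top_k and min_p as extra sampling parameters beyond the OpenAI schema, but other OpenAI-compatible servers may reject them. The helper names are illustrative.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"
MODEL = "pearl-ai/Qwen3-30B-A3B-Instruct-2507-pearl"
# Sampling recipe quoted in the Evaluation section.
RECIPE = {"temperature": 0.7, "top_p": 0.8, "top_k": 20, "min_p": 0.0}


def build_payload(prompt: str, model: str = MODEL, **overrides) -> dict:
    """Chat-completions body: one user turn plus the sampling recipe."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    payload.update(RECIPE)
    payload.update(overrides)  # e.g. max_tokens=256
    return payload


def build_request(payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """POST request equivalent to the curl command above."""
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(body: dict) -> str:
    """Assistant text from an OpenAI-style chat-completions response."""
    return body["choices"][0]["message"]["content"]


def chat(prompt: str) -> str:
    req = build_request(build_payload(prompt))
    with urllib.request.urlopen(req, timeout=120) as resp:
        return extract_reply(json.load(resp))


if __name__ == "__main__":
    print(chat("What is the capital of France?"))
```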

Transformers

```python
from transformers import pipeline

pipe = pipeline("text-generation", model="pearl-ai/Qwen3-30B-A3B-Instruct-2507-pearl")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```
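Given a list of chat messages, the text-generation pipeline returns the whole conversation, with the model's reply appended as the last message. The sketch below pulls out just that reply. It also passes `device_map="auto"`, which needs the accelerate package, so the weights are placed on available GPUs. Whether the Pearl-quantized checkpoint loads outside the Pearl vLLM stack is not confirmed here. `last_assistant_reply` is a hypothetical helper.

```python
MODEL = "pearl-ai/Qwen3-30B-A3B-Instruct-2507-pearl"


def last_assistant_reply(outputs: list) -> str:
    """Newest message from a chat-style text-generation pipeline result.

    outputs[0]["generated_text"] is the input conversation with the
    model's reply appended as the final message.
    """
    return outputs[0]["generated_text"][-1]["content"]


if __name__ == "__main__":
    from transformers import pipeline

    # device_map="auto" (requires accelerate) places weights on available GPUs.
    pipe = pipeline("text-generation", model=MODEL, device_map="auto")
    messages = [{"role": "user", "content": "Who are you?"}]
    print(last_assistant_reply(pipe(messages, max_new_tokens=64)))
```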

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("pearl-ai/Qwen3-30B-A3B-Instruct-2507-pearl")
model = AutoModelForCausalLM.from_pretrained("pearl-ai/Qwen3-30B-A3B-Instruct-2507-pearl")

messages = [
    {"role": "user", "content": "Who are you?"},
]
# Render the chat template and tokenize in one step.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, dropping end-of-turn markers.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
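The example above uses generate()'s defaults. The sketch below applies the evaluation sampling recipe instead. `do_sample=True` is needed for temperature, top_p, top_k and min_p to take effect. `min_p` requires a transformers release that supports it, and 0 leaves it disabled; it is kept only to mirror the recipe. `recipe_generate_kwargs` is a hypothetical helper, and `device_map="auto"` assumes accelerate is installed.

```python
MODEL = "pearl-ai/Qwen3-30B-A3B-Instruct-2507-pearl"


def recipe_generate_kwargs(max_new_tokens: int = 256) -> dict:
    """generate() keyword arguments mirroring the evaluation sampling recipe."""
    return {
        "max_new_tokens": max_new_tokens,
        "do_sample": True,   # sampling must be on for the settings below to apply
        "temperature": 0.7,
        "top_p": 0.8,
        "top_k": 20,
        "min_p": 0.0,        # 0 disables min-p filtering; kept to mirror the recipe
    }


if __name__ == "__main__":
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForCausalLM.from_pretrained(MODEL, device_map="auto")
    inputs = tokenizer.apply_chat_template(
        [{"role": "user", "content": "Who are you?"}],
        add_generation_prompt=True, tokenize=True,
        return_dict=True, return_tensors="pt",
    ).to(model.device)
    outputs = model.generate(**inputs, **recipe_generate_kwargs(128))
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```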

License

This model is based on Qwen3 and is distributed under the Apache 2.0 license.

Limitations

Can generate incorrect, unsafe, or biased outputs.
Requires careful deployment controls and output validation.
Hardware/software compatibility depends on the Pearl miner stack and supported GPU architectures.
