Jackrong

Qwopus3.6-27B-Coder-FP8


README

Quantization

  • Source model: Jackrong/Qwopus3.6-27B-Coder
  • Output format: Hugging Face safetensors
  • Quantization: FP8 E4M3, dynamic activations, 128x128 weight blocks
  • Runtime target: vLLM FP8 loading path
  • MTP tensors: preserved; 7 MTP projection weights quantized to FP8 and indexed in mtp.safetensors
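The block-wise scaling in the list above can be illustrated with a small pure-Python sketch. It is not the script used to produce this checkpoint. It assumes the common convention in which each 128x128 weight block stores one scale equal to the block's absolute maximum divided by 448 (the largest finite value of `torch.float8_e4m3fn`), so that multiplying the FP8 value by the stored scale recovers the original magnitude. The helper names `scale_shape` and `block_scales` are hypothetical, and the actual rounding to FP8 is omitted.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3fn


def scale_shape(rows, cols, block=128):
    """Shape of the per-block scale grid for a (rows, cols) weight matrix."""
    return math.ceil(rows / block), math.ceil(cols / block)


def block_scales(weight, block=128):
    """Compute one scale per block for a 2-D list of floats.

    Assumed convention: scale = amax(block) / 448, so that
    original ~= fp8_value * scale. All-zero blocks get a scale of 1.0.
    """
    rows, cols = len(weight), len(weight[0])
    n_br, n_bc = scale_shape(rows, cols, block)
    scales = [[1.0] * n_bc for _ in range(n_br)]
    for br in range(n_br):
        for bc in range(n_bc):
            amax = max(
                abs(weight[r][c])
                for r in range(br * block, min((br + 1) * block, rows))
                for c in range(bc * block, min((bc + 1) * block, cols))
            )
            if amax > 0:
                scales[br][bc] = amax / FP8_E4M3_MAX
    return scales
```

With this layout, a matrix whose dimensions are not multiples of 128 still gets a scale for each partial block at the edges, which is why the grid shape uses a ceiling division.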

Validation

Validated locally on GB10 with vLLM before upload.

  • vLLM smoke test: passed, normal Python recursive factorial output, garbled=false, has_answer=true
  • Structural check: all 64 gate_proj.weight_scale_inv tensors present
  • Dtype check: language-model and MTP gate_proj.weight tensors are torch.float8_e4m3fn
  • 30-question no-thinking batch validation: 30/30 completed, empty outputs 0, dangerous repetition flags 0, control-character flags 0
  • Validation artifacts: test_data/vllm_fp8_30q_no_think_reviewed_report.md and JSON results
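A structural check like the one listed above could be sketched as follows. This is an illustration, not the validation script referenced in the report. It assumes the checkpoint ships a standard Hugging Face sharded index file named `model.safetensors.index.json` with a `weight_map` dictionary; the function names are hypothetical, and whether the MTP tensors appear in the same index is not stated in this card.

```python
import json
from pathlib import Path


def load_weight_map(checkpoint_dir):
    """Read tensor names from a sharded safetensors index (assumed file name)."""
    index_path = Path(checkpoint_dir) / "model.safetensors.index.json"
    return json.loads(index_path.read_text())["weight_map"]


def check_gate_scales(weight_map):
    """Return (number of gate_proj scale tensors, gate_proj weights lacking a scale)."""
    weights = [name for name in weight_map if name.endswith("gate_proj.weight")]
    scales = [name for name in weight_map if name.endswith("gate_proj.weight_scale_inv")]
    missing = [name for name in weights if name + "_scale_inv" not in weight_map]
    return len(scales), missing
```

Under these assumptions, a passing check would report the expected scale count (64 in the list above) and an empty `missing` list.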

Note: some validation prompts reached the 768-token cap because the answers were long; reviewed outputs did not show garbled text, empty responses, or mechanical looping.
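The per-output flags used in the batch validation (empty output, repetition, control characters) might be computed along these lines. The heuristics here are assumptions chosen for illustration, in particular the `max_repeat` threshold and the definition of repetition as one line repeated back to back; the actual criteria behind the reviewed report are not described in this card.

```python
import unicodedata


def flag_output(text, max_repeat=8):
    """Return simple quality flags for one generated completion."""
    stripped = text.strip()

    # Control characters other than ordinary whitespace suggest corrupted output.
    has_control = any(
        unicodedata.category(ch) == "Cc" and ch not in "\n\r\t" for ch in text
    )

    # Longest run of the same non-empty line repeated consecutively.
    lines = [ln.strip() for ln in stripped.splitlines() if ln.strip()]
    longest = run = 1 if lines else 0
    for prev, cur in zip(lines, lines[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)

    return {
        "empty": not stripped,
        "control_chars": has_control,
        "repetition": longest >= max_repeat,
    }
```

A completion that merely hits the token cap would not be flagged by any of these checks, which matches the note above that long answers were truncated but otherwise clean.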

Loading Example

python

from vllm import LLM, SamplingParams

# Load the FP8 checkpoint.
llm = LLM(
    model="Jackrong/Qwopus3.6-27B-Coder-FP8",
    trust_remote_code=True,
    max_model_len=8192,  # context-length cap for this example
)

# Generate up to 256 new tokens for a single prompt.
outputs = llm.generate(["Write a Python factorial function."], SamplingParams(max_tokens=256))
print(outputs[0].outputs[0].text)

Model provider

  • Jackrong

Model tree

  • Base: Jackrong/Qwopus3.6-27B-Coder
  • Quantized: this model

Modalities

  • Input: Video, Text, Image
  • Output: Text

Supported Functionality

Model APIs

Dedicated Endpoints

Container
