Jackrong
Qwopus3.6-27B-Coder-FP8
README
Quantization
- Source model: `Jackrong/Qwopus3.6-27B-Coder`
- Output format: Hugging Face safetensors
- Quantization: FP8 E4M3, dynamic activations, 128x128 weight blocks
- Runtime target: vLLM FP8 loading path
- MTP (multi-token prediction) tensors: preserved; 7 MTP projection weights quantized to FP8 and indexed in `mtp.safetensors`
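The quantization settings above can be illustrated with a small NumPy sketch of block-wise FP8 E4M3 weight quantization plus dynamic activation scaling. It is not the script used to produce this checkpoint. It assumes the DeepSeek-V3-style convention in which each 128x128 weight block stores one float32 `weight_scale_inv` that is multiplied back in during dequantization, assumes 1x128 per-token groups for the activation scales, and emulates E4M3 rounding in float arithmetic because NumPy has no FP8 dtype. All function names are hypothetical.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite float8_e4m3fn value
BLOCK = 128  # 128x128 weight blocks, as listed on the card


def round_to_e4m3(x: np.ndarray) -> np.ndarray:
    """Round onto the E4M3 grid (3 mantissa bits, min normal exponent -6), saturating at +/-448."""
    x = np.clip(x, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    # Values below 2**-6 are subnormal and share the spacing of exponent -6.
    exp = np.floor(np.log2(np.maximum(np.abs(x), 2.0 ** -6)))
    step = 2.0 ** (exp - 3)  # gap between neighbouring E4M3 values at this exponent
    return np.round(x / step) * step


def quantize_weight_blockwise(w: np.ndarray):
    """Return (q, scale_inv) with one scale per 128x128 block, so that w ~= q * scale_inv."""
    rows, cols = w.shape
    n_r, n_c = -(-rows // BLOCK), -(-cols // BLOCK)  # ceiling division
    q = np.empty_like(w, dtype=np.float32)
    scale_inv = np.empty((n_r, n_c), dtype=np.float32)
    for i in range(n_r):
        for j in range(n_c):
            rs, cs = slice(i * BLOCK, (i + 1) * BLOCK), slice(j * BLOCK, (j + 1) * BLOCK)
            amax = float(np.abs(w[rs, cs]).max())
            # Map the block's largest magnitude onto 448; all-zero blocks keep scale 1.
            s = np.float32(amax / FP8_E4M3_MAX) if amax > 0 else np.float32(1.0)
            scale_inv[i, j] = s
            q[rs, cs] = round_to_e4m3(w[rs, cs] / s)
    return q, scale_inv


def dequantize_weight_blockwise(q: np.ndarray, scale_inv: np.ndarray) -> np.ndarray:
    """Broadcast each block's scale_inv over its 128x128 tile and multiply back."""
    tiled = np.repeat(np.repeat(scale_inv, BLOCK, axis=0), BLOCK, axis=1)
    return q * tiled[: q.shape[0], : q.shape[1]]


def quantize_activation_dynamic(x: np.ndarray, group: int = BLOCK):
    """Compute scales at runtime for every token and every `group` channels (assumed 1x128)."""
    tokens, hidden = x.shape
    assert hidden % group == 0, "hidden size must be a multiple of the group size"
    g = x.reshape(tokens, hidden // group, group)
    amax = np.abs(g).max(axis=-1, keepdims=True)
    scale = np.where(amax > 0, amax / FP8_E4M3_MAX, 1.0)
    return round_to_e4m3(g / scale).reshape(tokens, hidden), scale.squeeze(-1)


rng = np.random.default_rng(0)
w = rng.standard_normal((256, 384)).astype(np.float32)
q, s_inv = quantize_weight_blockwise(w)
print(s_inv.shape)  # (2, 3): one scale per 128x128 block
print(np.abs(w - dequantize_weight_blockwise(q, s_inv)).max() / np.abs(w).max())
```

One scale per 128x128 block keeps an outlier in one region of a matrix from degrading the precision of every other block. "Dynamic" activations means the activation scales are recomputed from each batch at inference time instead of being calibrated ahead of time and stored in the checkpoint.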
Validation
Validated locally on GB10 with vLLM before upload.
- vLLM smoke test: passed; the model produced a normal recursive Python factorial function (`garbled=false`, `has_answer=true`)
- Structural check: all 64 `gate_proj.weight_scale_inv` tensors present
- Dtype check: language-model and MTP `gate_proj.weight` tensors are `torch.float8_e4m3fn`
- 30-question no-thinking batch validation: 30/30 completed, 0 empty outputs, 0 dangerous-repetition flags, 0 control-character flags
- Validation artifacts: `test_data/vllm_fp8_30q_no_think_reviewed_report.md` and JSON results
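The structural and dtype checks above need only tensor names and dtypes, which every `.safetensors` file stores in a JSON header behind an 8-byte little-endian length prefix. The standard-library sketch below reads those headers and applies both checks. It is a hypothetical helper, not the author's validation script: the function names are made up, it assumes every `.safetensors` file in the directory belongs to the checkpoint, and because the card does not say whether the count of 64 includes `mtp.safetensors`, the expected count is a parameter.

```python
import json
import struct
from pathlib import Path


def read_safetensors_header(path: Path) -> dict:
    """Return the JSON header of a .safetensors file without loading any tensor data."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # u64 little-endian header size
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form metadata, not a tensor
    return header


def check_fp8_checkpoint(model_dir: str, expected_scale_inv: int = 64) -> list[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    scale_inv_count = 0
    for path in sorted(Path(model_dir).glob("*.safetensors")):
        for name, info in read_safetensors_header(path).items():
            if name.endswith("gate_proj.weight_scale_inv"):
                scale_inv_count += 1
            elif name.endswith("gate_proj.weight") and info["dtype"] != "F8_E4M3":
                # safetensors records torch.float8_e4m3fn tensors with dtype "F8_E4M3"
                problems.append(f"{path.name}:{name} has dtype {info['dtype']}")
    if scale_inv_count != expected_scale_inv:
        problems.append(
            f"found {scale_inv_count} gate_proj.weight_scale_inv tensors, "
            f"expected {expected_scale_inv}"
        )
    return problems
```

On a local download, `check_fp8_checkpoint(path)` returning an empty list would mean both checks passed under these assumptions. Reading only the headers keeps the check fast, since no weight data is loaded into memory.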
Note: some validation prompts reached the 768-token cap because the answers were long; the reviewed outputs showed no garbled text (mojibake), empty responses, or mechanical looping.
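The card does not say how the empty-output, dangerous-repetition, and control-character flags were computed. As a hypothetical illustration only, simple heuristics of that kind could look like the sketch below; the 8-word window and the threshold of 6 repeats are arbitrary assumptions, not values from the author's report.

```python
import unicodedata
from collections import Counter


def has_control_chars(text: str) -> bool:
    """Flag Unicode control characters (category Cc) other than newline, carriage return, and tab."""
    return any(unicodedata.category(ch) == "Cc" and ch not in "\n\r\t" for ch in text)


def has_dangerous_repetition(text: str, n: int = 8, max_repeats: int = 6) -> bool:
    """Flag output in which any run of n consecutive words occurs more than max_repeats times."""
    words = text.split()
    windows = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return any(count > max_repeats for count in windows.values())


def review_outputs(outputs: list[str]) -> dict[str, int]:
    """Count the three kinds of flags reported for the 30-question batch."""
    return {
        "empty_outputs": sum(not o.strip() for o in outputs),
        "repetition_flags": sum(has_dangerous_repetition(o) for o in outputs),
        "control_char_flags": sum(has_control_chars(o) for o in outputs),
    }


print(review_outputs(["def fact(n):\n    return 1 if n < 2 else n * fact(n - 1)", ""]))
# {'empty_outputs': 1, 'repetition_flags': 0, 'control_char_flags': 0}
```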
Loading Example
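```python
from vllm import LLM, SamplingParams

# Load the FP8 checkpoint; max_model_len caps prompt + output tokens per request.
llm = LLM(
    model="Jackrong/Qwopus3.6-27B-Coder-FP8",
    trust_remote_code=True,
    max_model_len=8192,
)

# Generate up to 256 new tokens for a single prompt.
outputs = llm.generate(
    ["Write a Python factorial function."],
    SamplingParams(max_tokens=256),
)
print(outputs[0].outputs[0].text)
```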
Model provider: Jackrong
Model tree
- Base: Jackrong/Qwopus3.6-27B-Coder
- Quantized: this model
Modalities
- Input: Video, Text, Image
- Output: Text
Pricing
- Dedicated Endpoints
Supported Functionality
- Model APIs
- Dedicated Endpoints
- Container