Holo 3.1 35B A3B Mixed NVFP4 BF16-Head Overlay

This repository contains only the unique patched overlay files for the loadable mixed NVFP4 runtime variant. It does not reupload the full base NVFP4 checkpoint.

Base checkpoint:

text
Hcompany/Holo-3.1-35B-A3B-NVFP4

Patch purpose:

keep the base mixed ModelOpt NVFP4/FP8 model body
replace packed lm_head.weight with a dequantized BF16 full-width head
filter stale lm_head.* tensors out of shard 3 so vLLM does not load the packed head
preserve the patched config.json and model.safetensors.index.json

Reconstruct on a RunPod volume:

bash
BASE=/workspace/holo3/models/Holo-3.1-35B-A3B-NVFP4
PATCH=/workspace/holo3/models/Holo-3.1-35B-A3B-NVFP4-bf16-head
mkdir -p "$PATCH"
for f in "$BASE"/*; do ln -s "$f" "$PATCH/$(basename "$f")" 2>/dev/null || true; done
hf download akzaidan/holo31-mixed-nvfp4-bf16-head-overlay --local-dir /tmp/holo31-overlay
cp -f /tmp/holo31-overlay/config.json "$PATCH/config.json"
cp -f /tmp/holo31-overlay/model.safetensors.index.json "$PATCH/model.safetensors.index.json"
cp -f /tmp/holo31-overlay/model-00003-of-00003.safetensors "$PATCH/model-00003-of-00003.safetensors"
cp -f /tmp/holo31-overlay/model-lm-head-bf16.safetensors "$PATCH/model-lm-head-bf16.safetensors"
cp -f /tmp/holo31-overlay/start_vllm_nvfp4_bf16_head.sh /workspace/holo3/scripts/start_vllm_nvfp4_bf16_head.sh
chmod +x /workspace/holo3/scripts/start_vllm_nvfp4_bf16_head.sh

Launch:

bash
/workspace/holo3/scripts/start_vllm_nvfp4_bf16_head.sh

Served model name:

text
holo3-1-35b-a3b-mixed-nvfp4

For normal non-thinking OpenAI-compatible chat responses, send:

json
{"chat_template_kwargs":{"enable_thinking":false}}

With default thinking enabled and --reasoning-parser qwen3, vLLM routes open <think> text into the reasoning field, so simple prompts may return content: null until the model emits answer text after </think>.

holo31-mixed-nvfp4-bf16-head-overlay

Get help setting up a custom Dedicated Endpoints.

README

Holo 3.1 35B A3B Mixed NVFP4 BF16-Head Overlay

Explore FriendliAI today

holo31-mixed-nvfp4-bf16-head-overlay