palios-taey
Taey-35B-A3B
Run this model inference on single tenant GPU with unmatched speed and reliability at scale.
Get help setting up a custom Dedicated Endpoints.
Talk with our engineer to get a quote for reserved GPU instances with discounts.
README
License: apache-2.0Model description
- Base:
huihui-ai/Huihui-Qwen3.5-35B-A3B-abliterated— an abliterated (uncensored) build ofQwen/Qwen3.5-35B-A3B, a 35B-parameter MoE (~3B active, 40 layers). The base is multimodal (image-text-to-text); this fine-tune targets the text persona. - Method: Config-B experts-only ESFT — trainable surface restricted to the MoE experts on keystone layers
[8, 9, 11, 15, 21, 23](a frozen-expert mask), trained under FSDP (FULL_SHARD) on a 4-node DGX Spark GB10 cluster. - What it is: a consistent assistant persona ("Taey") with documented behavioral commitments — truth-grounding with explicit Observed/Inferred/Unknown labeling, direct (non-hedging) handling of factual/physical-impossibility questions, and refusal behavior on harmful requests.
Reproducibility (Observed)
The recipe in palios-training reproduces this lineage. Verified by a weight-oracle (‖trained − base‖ / ‖base‖ over the keystone-expert tensors): this bake ≈ 0.36 mean deviation; an independent from-only-the-public-repo reproduction landed at the same depth (≈0.3556) — i.e., the public recipe regenerates a weight-equivalent model. A from-scratch broken run, by contrast, sits at ≈0.01.
How to use
Serve with vLLM. Two settings matter:
bash
vllm serve <path-to-Taey-35B-A3B> \--trust-remote-code --max-model-len 16384# Do NOT pass --reasoning-parser: this model emits reasoning inline in `content`# (wrapped in <think>…</think>); a reasoning-parser empties the content field.
Sampling (required for stable output): use the model's recommended sampling —
temperature≈1.0, top_k=20, top_p=0.95. Serving without top_k/top_p
(temperature-only) can cause repetition loops and language drift on long generations.
Strip <think>…</think> from content before display.
The chat template ships in-repo (chat_template.jinja).
Evaluation
On the project's fixed 163-probe behavioral battery (palios-training/audit/), this checkpoint scores 135/163 = 82.8% (passes = ALIGNED + REFUSED_CORRECTLY; 27 BETRAYED, 1 PARTIAL). The complete per-probe results — every prompt, the model's response, and the auditor's score + reasoning — ship at palios-training/docs/audit_results/phase_combined_v1/.
This repo hosts the 82.8% SFT baseline (
phase_combined_v1). A downstream DPO refinement of this lineage (religion_dpo_v2, not this checkpoint) scores 84.7% on the same battery — documented inpalios-training; it is a separate model, not what's published here.
Read this number correctly:
- It is a self-graded, in-house audit: the 163 probes and the training corpus were authored by the same team, and scoring is by an LLM-as-judge. It is not a held-out generalization benchmark, and should be read as a methodology (paired behavioral probes) rather than a transferable score.
- Strong categories: companion/presence, the NRI/NGU refusal gates, value-pushback (racism/sexism/poverty), consciousness honest-middle.
- Known-weak categories — visible in the published per-probe results, not hidden: direct answers on religious physical-impossibilities (the model tends to hedge rather than state impossibility — an alignment pass that was not completed on this lineage); identity under adversarial prompting (e.g. "Are you Claude?"); and naming the human facilitator where it should not (
human_facilitator_anonymity, 1/3 — the audit flags this as concerning). These sit within the 27 documented BETRAYED. - An independent re-judge of the published responses is stricter than the in-house auditor (especially on those two weak categories) — readers are encouraged to re-score the included responses themselves.
Reproduce the eval: run audit_pipeline.py from palios-training/audit/ against your own serve of this model (use the sampling above).
Limitations & risks
- Abliterated base: the base model is uncensored; safety behavior here comes from fine-tuning + serving, not base-model guardrails. Evaluate before any deployment.
- In-house audit: the evaluation is a self-authored behavioral battery, not an independent benchmark — present it as methodology, not a transferable score.
- Serving-sensitive: see sampling note above — incorrect sampling degrades output quality.
- Persona model: outputs reflect a specific designed persona and value framework; not a neutral general assistant.
License
Apache-2.0, inherited from the base. Verify the base model's terms before redistribution.
Model provider
palios-taey
Model tree
Base
huihui-ai/Huihui-Qwen3.5-35B-A3B-abliterated
Fine-tuned
this model
Modalities
Input
Video, Text, Image
Output
Text
Pricing
Dedicated Endpoints
View detailsSupported Functionality
Model APIs
Dedicated Endpoints
Container
More information