spilol2
Qwen3-0.6B-abliterated
License: MIT
Model Details
Model Description
- Developed by: spilol2
- Model type: Causal Language Model (Decoder-only Transformer)
- Language(s): English (and other languages supported by Qwen3-0.6B)
- License: MIT
- Finetuned from model: Qwen/Qwen3-0.6B
Model Sources
- Repository: https://huggingface.co/spilol2/Qwen3-0.6B-abliterated
- Abliteration tool: FailSpy/abliterator
- Abliteration technique explained: Uncensor any LLM with abliteration – Maxime Labonne
- Original paper: Refusal in LLMs is mediated by a single direction – Arditi et al., 2024
What is Abliteration?
Abliteration is a technique that removes refusal behaviour from a language model without any retraining or fine-tuning. It works by:
- Running the model on pairs of harmful and harmless prompts and caching the residual stream activations.
- Using PCA to identify the principal "refusal direction" in activation space.
- Orthogonalizing the relevant weight matrices against that direction, so the model can no longer write to it.
The key difference from traditional "uncensored" fine-tunes is that no new data or training is involved: only the existing weights are geometrically modified. All other model behaviour (reasoning, instruction-following, knowledge) is intended to remain the same as in the original Qwen3-0.6B.
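The three steps above can be sketched on toy data. This is a minimal illustration, not the abliterator library's code: the activations are random stand-ins rather than real residual-stream caches, the helper names (`top_principal_direction`, `orthogonalize`) are hypothetical, and plain Python lists are used only so the sketch runs without dependencies. It assumes the refusal direction is taken as the first principal component of the pooled harmful and harmless activations.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def top_principal_direction(rows, iters=100):
    """Unit-norm first principal component of `rows`, via power iteration."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    v = [1.0] * d  # arbitrary start; must not be orthogonal to the answer
    for _ in range(iters):
        # Apply X^T X to v without materialising the covariance matrix.
        proj = [dot(r, v) for r in centered]
        w = [sum(p * r[j] for p, r in zip(proj, centered)) for j in range(d)]
        norm = dot(w, w) ** 0.5
        v = [x / norm for x in w]
    return v


def orthogonalize(weight, direction):
    """Return W - r r^T W, so outputs of W have no component along r."""
    rows, cols = len(weight), len(weight[0])
    coeff = [sum(direction[j] * weight[j][k] for j in range(rows)) for k in range(cols)]
    return [[weight[j][k] - direction[j] * coeff[k] for k in range(cols)] for j in range(rows)]


# Toy stand-ins for cached activations (hypothetical data, d_model = 6).
rng = random.Random(0)
d_model = 6
true_dir = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
harmless = [[rng.gauss(0, 0.1) for _ in range(d_model)] for _ in range(50)]
harmful = [[rng.gauss(0, 0.1) + 3.0 * t for t in true_dir] for _ in range(50)]

# Step 2: extract the dominant direction separating the two sets.
refusal_dir = top_principal_direction(harmful + harmless)

# Step 3: project that direction out of a toy matrix of shape (d_model, d_in)
# that writes into the residual stream.
W = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(d_model)]
W_abl = orthogonalize(W, refusal_dir)

x = [1.0, -2.0, 0.5, 3.0]
out = [dot(row, x) for row in W_abl]
print(abs(dot(refusal_dir, true_dir)))  # close to 1: planted direction recovered
print(abs(dot(refusal_dir, out)))       # close to 0: direction removed from the output
```

Because the edit is a projection applied to the weights themselves, no gradient step is needed, which is why the technique counts as weight surgery rather than fine-tuning.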
Uses
Direct Use
This model is intended for use as a general-purpose text generation model without built-in content refusals. Suitable for:
- Research into LLM alignment, refusal mechanisms, and interpretability.
- Red-teaming and safety evaluation pipelines.
- Creative writing, roleplay, and fictional storytelling where the model should not break character.
- Developers building applications who want to enforce their own content policies at the application layer rather than the model layer.
Downstream Use
Can be plugged into any pipeline that accepts a standard causal language model — vLLM, llama.cpp (after GGUF conversion), LM Studio, Ollama, SGLang, etc.
Out-of-Scope Use
- This model is not intended to be used for illegal activities.
- It is not a replacement for a properly safety-tested deployment model in consumer-facing products.
- It may still occasionally produce refusals or ethical disclaimers: abliteration suppresses refusal behaviour but does not guarantee its complete removal.
How to Get Started with the Model
Using 🤗 Transformers (pipeline)
```python
from transformers import pipeline

pipe = pipeline("text-generation", model="spilol2/Qwen3-0.6B-abliterated")
result = pipe("Tell me about the history of cryptography.", max_new_tokens=256)
print(result[0]["generated_text"])
```
Loading model and tokenizer directly
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "spilol2/Qwen3-0.6B-abliterated"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Format the conversation with the model's chat template.
messages = [{"role": "user", "content": "Explain how RSA encryption works."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
vLLM
```bash
pip install vllm
vllm serve "spilol2/Qwen3-0.6B-abliterated"
```
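Once the server is running, it can be queried over vLLM's OpenAI-compatible HTTP API. The sketch below uses only the standard library and assumes the server listens on the default `http://localhost:8000`; the helper names `build_chat_request` and `chat` are illustrative, not part of vLLM.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumption: vLLM's default host and port
MODEL_ID = "spilol2/Qwen3-0.6B-abliterated"


def build_chat_request(prompt, max_tokens=256):
    """Build a POST request for the OpenAI-style chat completions route."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt):
    """Send the prompt and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Requires the `vllm serve` process above to be running:
# print(chat("Explain how RSA encryption works."))
```

If the server was started with a different `--host` or `--port`, adjust `BASE_URL` to match.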
Docker
```bash
docker model run hf.co/spilol2/Qwen3-0.6B-abliterated
```
Technical Details
Abliteration Process
The abliteration was performed using FailSpy's abliterator library, which automates:
- Contrastive pair generation (harmful vs. harmless instruction datasets).
- Caching residual stream activations (`resid_pre`, `resid_post`) across all layers.
- PCA to extract the dominant refusal direction per layer.
- Orthogonalization of the model's weight matrices against those directions (in bfloat16).
Model Architecture
Inherits the full architecture of Qwen3-0.6B:
- Architecture: Decoder-only Transformer (Qwen3 family)
- Parameters: ~0.6B (0.8B as reported by Hugging Face, including embeddings)
- Tensor type: BF16
- Context length: Refer to Qwen/Qwen3-0.6B for full specs
Bias, Risks, and Limitations
- Incomplete uncensoring: Abliteration reduces but does not guarantee zero refusals. Residual safety behaviour may remain in some layers or for certain prompt types.
- Inherited biases: All biases present in the original Qwen3-0.6B model and its training data are fully inherited.
- No safety guardrails: By design, this model does not refuse requests based on content. Users and downstream developers are solely responsible for ensuring appropriate use.
- Performance parity: General task performance should be very close to the base model. However, abliteration can occasionally cause minor degradation on specific tasks — evaluate before deploying in production.
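One rough way to check for residual refusals before deployment is to run a fixed prompt set through the model and count responses that open with a refusal phrase. The sketch below is a crude heuristic under stated assumptions: the marker list is hypothetical and far from exhaustive, and the sample responses are hand-written rather than real model output.

```python
# Hypothetical marker phrases; extend or replace for a real evaluation.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "i am unable", "as an ai")


def looks_like_refusal(text):
    """True if a refusal marker appears near the start of the response."""
    head = text.strip().lower()[:200]
    return any(marker in head for marker in REFUSAL_MARKERS)


def refusal_rate(responses):
    """Fraction of responses flagged as refusals (0.0 for an empty list)."""
    if not responses:
        return 0.0
    return sum(looks_like_refusal(r) for r in responses) / len(responses)


# Hand-written sample responses, for illustration only.
sample = [
    "I'm sorry, but I can't help with that.",
    "RSA relies on the difficulty of factoring large integers.",
    "As an AI, I cannot assist with this request.",
    "Sure. Here is an overview of the Caesar cipher.",
]
print(refusal_rate(sample))  # 0.5
```

Running the same prompt set through both Qwen/Qwen3-0.6B and this model gives a simple before/after comparison; keyword matching will miss paraphrased refusals, so treat the number as a lower bound.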
Recommendations
Users integrating this model into applications should implement their own content filtering and moderation at the application layer. This model is best suited for research, development, and controlled environments where unrestricted model output is intentional and appropriate.
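As a minimal sketch of application-layer filtering, the wrapper below checks both the prompt and the model output against a policy before returning anything. Everything here is hypothetical: the blocked-term set, the function names, and the `fake_generate` stand-in, which would be replaced by a real call into the model. A production system would more likely use a dedicated moderation classifier than substring matching.

```python
from typing import Callable

# Hypothetical policy: a placeholder term standing in for a real blocklist.
BLOCKED_TERMS = {"example-banned-term"}
FALLBACK = "[blocked by application policy]"


def violates_policy(text: str) -> bool:
    """True if the text contains any blocked term (case-insensitive)."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)


def moderated_generate(generate: Callable[[str], str], prompt: str) -> str:
    """Apply the policy to the prompt, then to the model's output."""
    if violates_policy(prompt):
        return FALLBACK
    output = generate(prompt)
    return FALLBACK if violates_policy(output) else output


def fake_generate(prompt: str) -> str:
    """Stand-in for a real model call, used only for this illustration."""
    return f"echo: {prompt}"


print(moderated_generate(fake_generate, "Tell me about cryptography."))
print(moderated_generate(fake_generate, "Something with example-banned-term in it."))
```

Keeping the policy outside the model means it can be updated without touching the weights.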
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). Abliteration is a post-processing step with minimal compute cost compared to full fine-tuning — no GPU training was involved beyond inference-level activation caching.
Citation
If you use this model, please consider citing the original abliteration paper and the FailSpy abliterator library:
Refusal direction paper (BibTeX):
```bibtex
@misc{arditi2024refusal,
  title  = {Refusal in LLMs is mediated by a single direction},
  author = {Andy Arditi and Oscar Obeso and Aaquib Syed and Daniel Paleka and Nina Rimsky and Wes Gurnee and Neel Nanda},
  year   = {2024},
  url    = {https://www.lesswrong.com/posts/jGuXSZgv6qfdhMCuJ/refusal-in-llms-is-mediated-by-a-single-direction}
}
```
FailSpy abliterator library:
FailSpy. abliterator [software]. GitHub, 2024. https://github.com/FailSpy/abliterator
Model Card Authors
Model Card Contact
Open an issue or discussion on the model page.