axonlabsai/axon-oss API & Inference Endpoint

Model Details

Base model: Qwen/Qwen3-1.7B
Parameters: ~1.7B (base) + LoRA adapter (r=16, alpha=32)
Architecture: Qwen3 (transformer decoder) with LoRA adapter targeting all linear projections (q, k, v, o, gate, up, down)
LoRA rank: 16
LoRA alpha: 32
Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Tokenizer: Qwen3 tokenizer with ChatML-style formatting (<|im_start|> / <|im_end|>)
Context length: Up to 32K tokens (base model capability)
License: MIT
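The rank and alpha values above set both the size of the adapter and how strongly its update is applied. The following back-of-the-envelope sketch counts the extra weights LoRA adds and computes its scaling factor. It assumes Qwen3-1.7B layer shapes that this card does not state (hidden size 2048, intermediate size 6144, 28 decoder layers, 16 query heads and 8 key/value heads of dimension 128) and no LoRA on embeddings or biases. Verify these values against the base model's `config.json` before relying on the numbers.

```python
# ASSUMED Qwen3-1.7B shapes (not stated in this card; check the base model's config.json)
HIDDEN = 2048
INTERMEDIATE = 6144
NUM_LAYERS = 28
Q_HEADS, KV_HEADS, HEAD_DIM = 16, 8, 128

R, ALPHA = 16, 32  # LoRA rank and alpha, as listed on this card


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """LoRA places A (r x in) and B (out x r) beside a frozen Linear(in, out)."""
    return r * (in_features + out_features)


# (in_features, out_features) of each targeted projection in one decoder layer
PROJECTIONS = {
    "q_proj": (HIDDEN, Q_HEADS * HEAD_DIM),
    "k_proj": (HIDDEN, KV_HEADS * HEAD_DIM),
    "v_proj": (HIDDEN, KV_HEADS * HEAD_DIM),
    "o_proj": (Q_HEADS * HEAD_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

per_layer = sum(lora_params(i, o, R) for i, o in PROJECTIONS.values())
total = per_layer * NUM_LAYERS
scaling = ALPHA / R  # the low-rank update B @ A is multiplied by alpha / r

print(f"LoRA params per layer: {per_layer:,}")
print(f"LoRA params total: {total:,}")
print(f"Update scaling (alpha / r): {scaling}")
```

Under these assumptions, the adapter adds about 17.4M weights, roughly 1% of the base model, and its update is scaled by 2.0. To get the real figure for the loaded adapter, sum `p.numel()` over `model.named_parameters()` whose names contain `"lora_"`.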

Usage

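```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the Qwen3-1.7B base weights, then attach the axon-oss LoRA adapter on top
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-1.7B", torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(base, "axonlabsai/axon-oss")
tokenizer = AutoTokenizer.from_pretrained("axonlabsai/axon-oss")

messages = [{"role": "user", "content": "Hello! What can you do?"}]
# add_generation_prompt=True appends the assistant header so the model replies as the assistant
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the echoed prompt
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```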
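`apply_chat_template` converts the message list into ChatML-style text before tokenizing it. The hypothetical helper `render_chatml` below is a simplified approximation of that layout, included only to illustrate the `<|im_start|>` / `<|im_end|>` structure. The real Qwen3 template also handles system prompts, tool calls, and the thinking-mode block, so always use the tokenizer's own template for inference.

```python
def render_chatml(messages: list[dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Simplified ChatML rendering (hypothetical helper, NOT the exact Qwen3 template)."""
    # Each turn: <|im_start|>{role}\n{content}<|im_end|>\n
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model continues as the assistant
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([{"role": "user", "content": "Hello! What can you do?"}])
print(prompt)
```

To inspect the exact string the model receives, call `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` and compare it with this sketch.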

Limitations

Not fine-tuned on domain-specific data; intended for general-purpose use only
Small model size means limited reasoning depth compared to larger models
May hallucinate or produce incorrect information
Not suitable for production deployments without further fine-tuning

About Axon Labs

Axon Labs builds AI models and tools. This is our open-source contribution — a small, lightweight model for experimentation and chat.
