Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more
Container

Run this model inference with full control and performance in your environment.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

License: apache-2.0

What this repo is

  • A lightweight Hugging Face model repo with real weights and tokenizer files
  • Intended to be easier to deploy on consumer hardware than much larger LLMs
  • Branded and published under the Fox 1.6 name

What this repo is not

  • It is not trained from scratch here
  • It is not a claim that the model beats every large frontier model
  • It is a fork of the upstream Qwen 3 1.7B checkpoint

Intended use

  • On-device or low-footprint inference
  • Assistant-style chat and completion
  • Rapid experimentation with a compact base model

Notes

If you want Fox 1.6 to become a genuinely new model, the next step is to fine-tune this fork on a curated instruction dataset and evaluate it against your target benchmarks.

🤖 Run with Ollama

bash

ollama run hf.co/teolm30/fox-1.6

Model provider

teolm30

Model tree

Base

Qwen/Qwen3-1.7B

Quantized

this model

Modalities

Input

Text

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Explore FriendliAI today