flowty1

qwen-asr-0.6b-he

README

License: apache-2.0

Highlights

  • 🎯 Hebrew-first — fine-tuned on a purpose-built, carefully curated Hebrew speech corpus collected, cleaned, and aligned specifically for this task.
  • Real-time, low-latency streaming — transcribes as you speak.
  • 💻 Runs anywhere — CPU, NVIDIA GPU (vLLM), Apple Silicon (MLX), on-device.
  • 🌐 Hebrew + English (plus auto-detect).

Usage

The easiest path is the companion repository, which ships a one-line file transcriber and an OpenAI-compatible server (batch and realtime streaming) for both backends, CUDA/vLLM and Apple Silicon/mlx-audio:

👉 GitHub: https://github.com/flowtyone/QwenASR-he

```bash
# transcribe a file (auto-selects CUDA/vLLM or Apple-Silicon/mlx-audio)
uv run python examples/simple.py recording.m4a --language he
```
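For the server path, a client request might look like the sketch below. It assumes the companion server follows the usual OpenAI transcription convention (a multipart `POST` to `/v1/audio/transcriptions` with `file`, `model`, and `language` fields). The base URL, port, model name, and the `build_transcription_request` helper are illustrative assumptions, not taken from the repository, so check its README for the actual values.

```python
import urllib.request
import uuid


def build_transcription_request(base_url, audio_bytes, filename, model, language="he"):
    """Build a multipart/form-data POST in the OpenAI transcription style (hypothetical helper)."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text form fields.
    for name, value in (("model", model), ("language", language)):
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    # The audio file itself, sent as raw bytes.
    file_header = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n'
    ).encode()
    parts.append(file_header + audio_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/transcriptions",  # assumed endpoint path
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` sends it. If the server mirrors the OpenAI response shape, the transcript would be in the `text` field of the JSON reply.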

Apple Silicon (mlx-audio)

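```python
from mlx_audio.stt import load

# Load the fine-tuned checkpoint by its model ID.
model = load("flowty1/qwen-asr-0.6b-he")

# audio_16k_mono_float32: your audio as a 16 kHz, mono, float32 sample array.
out = model.generate(audio_16k_mono_float32, language="Hebrew")
print(out.text)
```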
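The name `audio_16k_mono_float32` suggests the input should be 16 kHz, mono, float32 samples. One way to get there from raw signed-integer PCM is sketched below with NumPy. The helper name `to_16k_mono_float32` is hypothetical, and the resampling is plain linear interpolation with no anti-aliasing filter, so treat it as an illustration and use a proper resampler for production audio.

```python
import numpy as np


def to_16k_mono_float32(samples: np.ndarray, sample_rate: int) -> np.ndarray:
    """Convert PCM of shape (n,) or (n, channels) to 16 kHz mono float32 (hypothetical helper)."""
    x = np.asarray(samples)
    if np.issubdtype(x.dtype, np.integer):
        # Scale signed integer PCM (e.g. int16) into [-1.0, 1.0).
        scale = float(np.iinfo(x.dtype).max) + 1.0
        x = x.astype(np.float32) / scale
    else:
        x = x.astype(np.float32)
    if x.ndim == 2:
        # Downmix by averaging the channels.
        x = x.mean(axis=1)
    if sample_rate != 16000:
        # Crude linear-interpolation resampling (no low-pass filtering).
        n_out = int(round(len(x) * 16000 / sample_rate))
        t_in = np.arange(len(x)) / sample_rate
        t_out = np.arange(n_out) / 16000
        x = np.interp(t_out, t_in, x)
    return x.astype(np.float32)
```

The result can then be passed as the first argument to `model.generate(...)` in the snippet above.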

NVIDIA GPU (vLLM) / transformers

This is a Qwen3ASRForConditionalGeneration model and uses the Qwen3-ASR runtime. See the companion repo above or the upstream Qwen3-ASR project for the vLLM/transformers inference toolkit.

Output stability & tuning

The model is accurate on real-world Hebrew but, like most compact ASR models, it can occasionally repeat or hallucinate on noisy or long audio. Decoding defaults to greedy (temperature = 0), which is the most reliable baseline. If you see looping, the most effective levers are a repetition penalty (~1.1 to 1.3) and a cap on max new tokens. The companion repo exposes these as environment variables and applies an additional deterministic repetition-cleanup pass.
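To illustrate what a deterministic repetition-cleanup pass can look like, here is a minimal sketch that collapses word n-grams repeated back to back. It is not the companion repo's implementation. The function name `collapse_repeats` and its parameters `max_ngram` and `keep` are assumptions made for this example.

```python
def collapse_repeats(text: str, max_ngram: int = 8, keep: int = 1) -> str:
    """Collapse immediately repeated word n-grams, longest spans first (illustrative only)."""
    words = text.split()
    for n in range(max_ngram, 0, -1):
        out = []
        i = 0
        while i < len(words):
            chunk = words[i:i + n]
            # Count how many times this n-gram occurs back to back.
            reps = 1
            while len(chunk) == n and words[i + reps * n:i + (reps + 1) * n] == chunk:
                reps += 1
            if reps > keep:
                # Keep only `keep` copies of the looping span.
                out.extend(chunk * keep)
                i += reps * n
            else:
                out.append(words[i])
                i += 1
        words = out
    return " ".join(words)
```

With `keep=1` this also collapses legitimate repetitions such as "very very", so a looser setting like `keep=2` may suit conversational speech better. A pass like this only tidies the output text. It does not replace the repetition penalty or the max-new-tokens cap, which act during decoding.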

Languages

Validated for Hebrew and English (plus auto-detect). Other languages from the base model are not guaranteed on this checkpoint.

License & attribution

Fine-tune of Qwen/Qwen3-ASR-0.6B; released under Apache-2.0, following the base model's terms. Please also refer to the base model card for its conditions.

Model provider: flowty1

Model tree: Qwen/Qwen3-ASR-0.6B (base), this model (fine-tuned)

Modalities: audio input, text output
