flowty1
qwen-asr-0.6b-he
License: apache-2.0
Highlights
- 🎯 Hebrew-first — fine-tuned on a purpose-built, carefully curated Hebrew speech corpus collected, cleaned, and aligned specifically for this task.
- ⚡ Real-time, low-latency streaming — transcribes as you speak.
- 💻 Runs anywhere — CPU, NVIDIA GPU (vLLM), Apple Silicon (MLX), on-device.
- 🌐 Hebrew + English (plus auto-detect).
Usage
The easiest path is the companion repository, which ships a one-line file transcriber and an OpenAI-compatible server (batch + realtime streaming) for both backends:
👉 GitHub: https://github.com/flowtyone/QwenASR-he
bash
# transcribe a file (auto-selects CUDA/vLLM or Apple-Silicon/mlx-audio)
uv run python examples/simple.py recording.m4a --language he
Apple Silicon (mlx-audio)
python
from mlx_audio.stt import load
model = load("flowty1/qwen-asr-0.6b-he")
out = model.generate(audio_16k_mono_float32, language="Hebrew")
print(out.text)
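The snippet above takes a variable named `audio_16k_mono_float32`, which suggests 16 kHz, single-channel, 32-bit float samples. The following is a minimal, standard-library-only sketch of one way to get such a buffer from a 16-bit PCM WAV file. The helper names `load_wav_16k_mono_float32` and `resample_linear` are hypothetical, the [-1, 1] scaling is an assumption, and linear interpolation is a stand-in for a proper anti-aliased resampler (ffmpeg or a dedicated audio library would be a better choice for real recordings).

```python
import array
import sys
import wave

TARGET_RATE = 16_000  # assumed from the variable name audio_16k_mono_float32


def resample_linear(samples, src_rate, dst_rate):
    """Resample by linear interpolation (no anti-aliasing filter)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate      # fractional index into the source
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


def load_wav_16k_mono_float32(path):
    """Read a 16-bit PCM WAV and return 16 kHz mono float32 samples."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        pcm = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    # Average the channels and scale int16 to roughly [-1, 1].
    mono = [
        sum(pcm[i:i + channels]) / (channels * 32768.0)
        for i in range(0, len(pcm), channels)
    ]
    # Typecode "f" stores each sample as a 32-bit float.
    return array.array("f", resample_linear(mono, rate, TARGET_RATE))
```

The returned `array.array` exposes the buffer protocol, so it can be wrapped with `numpy.asarray(...)` if the backend expects an array type. Whether `model.generate` accepts a plain buffer directly is not confirmed by this page.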
NVIDIA GPU (vLLM) / transformers
This is a Qwen3ASRForConditionalGeneration model and uses the Qwen3-ASR runtime.
See the companion repo above or the upstream
Qwen3-ASR project for the vLLM/transformers
inference toolkit.
Output stability & tuning
The model is accurate on real-world Hebrew, but — like most compact ASR models — it
can occasionally repeat or hallucinate on noisy or long audio. Decoding defaults to
greedy (temperature = 0), which is the most reliable baseline. If you see looping,
the most effective levers are a repetition penalty (~1.1–1.3) and capping
max new tokens. The companion repo exposes these as environment variables and
applies an additional deterministic repetition-cleanup pass.
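To make the two ideas above concrete, here is a minimal sketch of what a repetition penalty does to token scores and what a deterministic cleanup pass might look like. Both functions are illustrative and hypothetical: `apply_repetition_penalty` follows the common convention of dividing positive logits and multiplying negative ones, and `collapse_repeats` is one simple way to drop immediately repeated word n-grams. Neither is taken from the companion repo, whose actual implementation may differ.

```python
def apply_repetition_penalty(logits, seen_token_ids, penalty=1.2):
    """Return a copy of `logits` with already-generated tokens made less likely.

    Positive scores are divided by `penalty`, negative scores multiplied,
    so a penalty > 1 always lowers the score of a token seen before.
    """
    adjusted = list(logits)
    for tok in set(seen_token_ids):
        score = adjusted[tok]
        adjusted[tok] = score / penalty if score > 0 else score * penalty
    return adjusted


def collapse_repeats(text, max_ngram=8):
    """Collapse immediately repeated word n-grams, longest n first.

    Example: "a b a b a b" becomes "a b".
    """
    words = text.split()
    for n in range(max_ngram, 0, -1):
        out = []
        i = 0
        while i < len(words):
            if len(out) >= n and words[i:i + n] == out[-n:]:
                i += n  # next n words repeat the n-gram just emitted: skip
            else:
                out.append(words[i])
                i += 1
        words = out
    return " ".join(words)
```

Note that this cleanup also collapses legitimate repetitions (for example an intentionally doubled word), so a production pass would likely only trigger after several consecutive repeats. Capping max new tokens is complementary: it bounds how long a loop can run before any cleanup is applied.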
Languages
Validated for Hebrew and English (plus auto-detect). Other languages from the base model are not guaranteed on this checkpoint.
License & attribution
Fine-tune of Qwen/Qwen3-ASR-0.6B; released under Apache-2.0, following the base model's terms. Please also refer to the base model card for its conditions.
Model provider: flowty1
Model tree: base Qwen/Qwen3-ASR-0.6B; this model is a fine-tune
Modalities: audio input, text output