rushilsaraf

qwen3-actionable-v2-adapter

Use case

Reflections streams audio + video from MentraOS smart glasses, transcribes with Soniox, attributes speakers with an active-speaker-detection model, then asks this classifier whether the latest finalized sentence is actionable. When P(actionable) >= GLASSES_GATE_THRESHOLD (default 0.25), the pipeline escalates to Claude Haiku with tools (web search, Google Maps, Google Calendar). Otherwise the turn is dropped silently.

The classifier is not a general chat model. It is trained to output a single label (0 or 1) given five structured context inputs:

Transcript — recent speaker-attributed turns, with the target sentence marked.
Memory — short summary of prior sessions (read from memory.md).
Available tools — names of tools the agent could call this turn (e.g. send_message, create_calendar_event).
Location — a coarse description + lat/lon (used by maps-style tools).
Entity list — known people in the wearer's life, with facts (allergies, preferences, etc.).
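As a rough illustration of the five inputs listed above, they could be held in a small container like the one below. This is a sketch only: the class and field names (`GateContext`, `Turn`, `Entity`) and the sample values are hypothetical and are not taken from the Reflections code, which renders its prompts in packages/proactivity/render.py.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    speaker: str
    text: str
    is_target: bool = False  # True only for the sentence being scored


@dataclass
class Entity:
    name: str
    facts: list[str] = field(default_factory=list)


@dataclass
class GateContext:
    """Hypothetical container for the five structured inputs."""
    transcript: list[Turn]      # recent speaker-attributed turns
    memory: str                 # short summary of prior sessions
    available_tools: list[str]  # tool names callable this turn
    location_description: str   # coarse description
    lat: float
    lon: float
    entities: list[Entity]      # known people and facts about them

    def target(self) -> Turn:
        """Return the single marked target sentence."""
        targets = [t for t in self.transcript if t.is_target]
        if len(targets) != 1:
            raise ValueError("expected exactly one target sentence")
        return targets[0]


# Illustrative values only.
ctx = GateContext(
    transcript=[
        Turn("wearer", "I'm meeting Sam for lunch."),
        Turn("other", "Remind me to book a table for Friday.", is_target=True),
    ],
    memory="Sam is a coworker.",
    available_tools=["send_message", "create_calendar_event"],
    location_description="downtown cafe",
    lat=37.77,
    lon=-122.42,
    entities=[Entity("Sam", ["vegetarian"])],
)
print(ctx.target().text)
```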

Prompts are rendered into Qwen's ChatML format and the score is softmax(logits)[1] over the two-token vocabulary {0, 1} at the <label> position.
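The two-way softmax described above can be worked through without any model. The sketch below takes the two label logits as plain floats (the function name is illustrative) and shows that the score equals a sigmoid of the logit difference, so only the gap between the "1" and "0" logits matters.

```python
import math


def p_actionable_from_logits(logit_0: float, logit_1: float) -> float:
    """Softmax over the two label logits; returns the probability of label 1."""
    m = max(logit_0, logit_1)  # subtract the max for numerical stability
    e0 = math.exp(logit_0 - m)
    e1 = math.exp(logit_1 - m)
    return e1 / (e0 + e1)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


# Equal logits give a score of exactly 0.5.
print(p_actionable_from_logits(0.0, 0.0))
# The two-way softmax matches sigmoid(logit_1 - logit_0).
print(p_actionable_from_logits(1.0, 3.0), sigmoid(2.0))
```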

How to use

The adapter is intended to be loaded onto the Qwen3-1.7B base model with PEFT:

python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

BASE = "Qwen/Qwen3-1.7B"
ADAPTER = "rushilsaraf/qwen3-actionable-v2-adapter"

tokenizer = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, ADAPTER)
model.eval()

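# In the live Reflections pipeline, prompts are rendered by
# packages/proactivity/render.py. The label is read at the last position
# as a softmax over the logits of the "0" and "1" tokens only.
# Assumption: "0" and "1" each encode to a single token in this tokenizer.
tok0 = tokenizer.encode("0", add_special_tokens=False)[0]
tok1 = tokenizer.encode("1", add_special_tokens=False)[0]

@torch.no_grad()
def p_actionable(rendered_prompt: str) -> float:
    # rendered_prompt must already be in the 5-input ChatML format.
    input_ids = tokenizer(rendered_prompt, return_tensors="pt").input_ids
    logits = model(input_ids).logits[0, -1].float()
    return torch.softmax(logits[[tok0, tok1]], dim=0)[1].item()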

For end-to-end use, install Reflections and run python -m apps.viewer — the LoRA loads automatically from this Hub repo (override with REFLECTIONS_LORA_MODEL_ID).

Training

Training base	`unsloth/qwen3-1.7b-unsloth-bnb-4bit` (Unsloth 4-bit)
Inference base	`Qwen/Qwen3-1.7B` (float16)
Framework	Unsloth + TRL SFT
Hardware	Single T4 (free Colab tier)
Wall-clock	~50 minutes
Examples	~400 (synthetic, labeled)

Adapter config

Parameter	Value
PEFT type	LoRA
Rank (`r`)	8
LoRA alpha	16
LoRA dropout	0.05
Target modules	`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`,
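
For readers unfamiliar with how rank and alpha interact, the values above imply an update scale of alpha / r. The sketch below assumes standard (non-rank-stabilized) LoRA scaling. The dictionary is a plain illustration, not the adapter's actual config file, and the target-module list is omitted because it is truncated above.

```python
# Values copied from the adapter config table above.
lora_settings = {
    "peft_type": "LORA",
    "r": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.05,
}

# Standard LoRA multiplies the low-rank update B @ A by lora_alpha / r.
scaling = lora_settings["lora_alpha"] / lora_settings["r"]
print(scaling)  # 2.0
```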

Gate thresholds (Reflections-side)

Two distinct knobs; do not conflate them:

GLASSES_GATE_THRESHOLD (default 0.25) — the live gate used by the agent worker. Sentences scoring below this are silently dropped.
REASONING_TRIGGER (0.45) — only used by the offline smoke-test path (scripts/smoke_full_transcript.py, scripts/smoke_server.py) to decide whether to also generate a reasoning trace.

The live path never reads REASONING_TRIGGER.
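
The live gate decision can be sketched as below. This is not the Reflections implementation: it assumes GLASSES_GATE_THRESHOLD is read from an environment variable, and the function names are hypothetical. REASONING_TRIGGER is deliberately absent because the live path never reads it.

```python
import os

DEFAULT_GLASSES_GATE_THRESHOLD = 0.25  # default stated above


def gate_threshold(env=os.environ) -> float:
    """Read the live gate threshold, falling back to the documented default.

    Assumption: the setting is exposed as an environment variable.
    """
    return float(env.get("GLASSES_GATE_THRESHOLD", DEFAULT_GLASSES_GATE_THRESHOLD))


def should_escalate(p_actionable: float, threshold: float) -> bool:
    """Escalate to the downstream LLM when the score meets the threshold."""
    return p_actionable >= threshold


threshold = gate_threshold({})  # empty mapping, so the default applies
for score in (0.10, 0.25, 0.60):
    action = "escalate" if should_escalate(score, threshold) else "drop"
    print(f"{score:.2f} -> {action}")
```

Note that the comparison is inclusive: a score exactly equal to the threshold escalates, matching the `>=` in the use-case description.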

Performance (held-out synthetic test set)

Metric	Value
Test accuracy	88.6%
Train accuracy	94.3%
Train–test gap	+5.7 points
Raw Qwen3-1.7B (no LoRA)	51.1%
Lift from LoRA	+37.5 points
Mean inference latency (Apple Silicon MPS)	~196 ms
p95 latency	~257 ms
Throughput	~5 classifications / sec

The benchmark is a synthetic dataset matched to the training distribution. Real-world ASR transcripts are not yet part of the evaluation set — see Limitations below.
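
The derived rows in the table (train-test gap, lift from LoRA, throughput) follow from the reported numbers by simple arithmetic. The check below uses only values from the table and assumes throughput is measured as sequential, single-request classification, so it is the reciprocal of mean latency.

```python
test_acc = 88.6          # %
train_acc = 94.3         # %
raw_base_acc = 51.1      # % (Qwen3-1.7B without the LoRA)
mean_latency_ms = 196.0  # Apple Silicon MPS

train_test_gap = round(train_acc - test_acc, 1)           # percentage points
lora_lift = round(test_acc - raw_base_acc, 1)             # percentage points
throughput_per_sec = round(1000.0 / mean_latency_ms, 1)   # sequential requests

print(train_test_gap, lora_lift, throughput_per_sec)
```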

Limitations

English only.
Synthetic training data ceiling. The ~400-example training set was generated to cover entity / memory / tool / location signals. Real-world ASR disfluencies are not represented.
Weak categories. Per-category breakdowns show tool_dependent at ~40% and location_dependent at ~60% accuracy, well below the 88.6% overall test accuracy. The next training cycle will add ~25 paired-negative examples per category, which is expected to narrow this gap.
Not a general classifier. The model expects the exact 5-input prompt structure produced by packages/proactivity/render.py in Reflections. Out-of-distribution prompts will produce unreliable scores.
Not safety-critical. Do not use for medical, legal, or moderation decisions. This is a latency-saving gate in front of a stronger downstream LLM, not a standalone judgment.

License

Base model (Qwen/Qwen3-1.7B): Apache 2.0 — see the Qwen3 license.
This LoRA adapter: MIT, matching the Reflections repository.

Combined use of base + adapter remains subject to the Apache 2.0 license of the Qwen3 weights.

Citation

If you use this adapter, please reference the Reflections project and the Qwen3 base model:

bibtex
@misc{qwen3-2025,
  title  = {Qwen3 Technical Report},
  author = {Qwen Team},
  year   = {2025},
  url    = {https://huggingface.co/Qwen/Qwen3-1.7B}
}
