edithatogo

qwen3-4b-hermes-lora-peft-converted

README

License: apache-2.0

Summary

This is an experimental PEFT-format conversion of the public MLX LoRA adapter edithatogo/qwen3-4b-hermes-lora. It is intended to make the Qwen3 v4 strict Hermes tool-call adapter usable from CUDA/Hugging Face tooling such as transformers, peft, and lm-evaluation-harness.

The PEFT base model is:

text
Qwen/Qwen3-4B

Source MLX adapter repo:

text
https://huggingface.co/edithatogo/qwen3-4b-hermes-lora

Converted PEFT adapter repo:

text
https://huggingface.co/edithatogo/qwen3-4b-hermes-lora-peft-converted

The adapter is intended for local evaluation and agent-runtime packaging. It requires the recorded runtime prompt condition:

first user turn prefixed with /no_think
assistant prefill: <think>\n\n</think>\n\n

Without the assistant prefill, the model still emits an empty leading thinking wrapper and does not satisfy the strict raw-output gate.

Base Model

PEFT base: Qwen/Qwen3-4B
Source adapter base: Qwen/Qwen3-4B-MLX-4bit
Base license: Apache-2.0, checked via Hugging Face API on 2026-05-25

Conversion

Source adapter: gemma4/experiments/qwen3-4b-strict-toolcall-v4-targeted/lora_adapter
Conversion script: scripts/convert_mlx_lora_to_peft.py
Conversion report: reports/cloud/qwen3-v4-mlx-to-peft-conversion-20260613.md
Source tensors: 112
Converted PEFT tensors: 112
LoRA rank: 8
LoRA alpha: 16.0
Layers: 28-35
Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, ,

Training

Training config: gemma4/scripts/train_config.qwen3-4b.strict-toolcall-v4-targeted.yaml
Data: gemma4/data/strict_tool_call/expanded_splits_v4_targeted
Adapter: gemma4/experiments/qwen3-4b-strict-toolcall-v4-targeted/lora_adapter
Training tokens: 37,936
Dataset token audit: reports/publication/qwen3-4b-strict-toolcall-v4-targeted/dataset-token-audit.json
Dataset overlap audit: reports/publication/qwen3-4b-strict-toolcall-v4-targeted/dataset-overlap-audit.json
Peak memory: 3.785 GB

Evaluation

PEFT conversion checks:

Table with columns: Check, Status
Check	Status
Static PEFT config load	pass
Colab T4 4-bit PEFT load smoke	pass
Colab T4 `lm_eval[hf]` selected task route, limit 5	pass
Full no-limit `lm_eval` scorecard	blocked by Colab session pruning

Bounded lm_eval route pilot on Colab T4:

Table with columns: Task, Metric, Value, Samples
Task	Metric	Value	Samples
`arc_challenge`	`acc_norm`	0.2000	5
`hellaswag`	`acc_norm`	0.6000	5
`truthfulqa_mc2`	`acc`

These are route-pilot scores only and must not be used as no-limit benchmark claims.

Held-out strict local tool-call gate:

Table with columns: Suite, Pass, JSON valid, Arguments, Invalid tool, Multi-turn
Suite	Pass	JSON valid	Arguments	Invalid tool	Multi-turn
`benchmarks/tool_call_local/heldout_suite.json`	1.000	1.000	1.000	1.000	1.000

Mirrored regression:

Table with columns: Suite, Pass
Suite	Pass
`benchmarks/tool_call_local/suite.json`	1.000

Repo-native pilot benchmarks:

Table with columns: Pilot, Pass, Notes
Pilot	Pass	Notes
BFCL-style pilot	0.667	local pilot only, not official BFCL
IFEval-style pilot	0.667	local pilot only, not official IFEval
Coding sanity pilot	1.000	local pilot only, not HumanEval/MBPP

Exact held-out command:

bash
source scripts/env.sh
PYTHONPATH=scripts ./.venv/bin/python scripts/run_tool_call_benchmark.py \
  --model Qwen/Qwen3-4B-MLX-4bit \
  --adapter gemma4/experiments/qwen3-4b-strict-toolcall-v4-targeted/lora_adapter \
  --suite benchmarks/tool_call_local/heldout_suite.json \
  --user-prefix /no_think \
  --assistant-prefill $'<think>\n\n</think>\n\n' \
  --run-id qwen3-4b-strict-toolcall-v4-targeted-heldout-prefill-20260525 \
  --max-tokens 256

Raw local artifact:

text
/Volumes/PortableSSD/hermes-evals/tool-call-benchmark/qwen3-4b-strict-toolcall-v4-targeted-heldout-prefill-20260525

The reusable runtime prompt contract is recorded in RUNTIME_PROMPT_PROFILES.yaml as qwen3-no-think-assistant-prefill.

Limitations

This is an experimental conversion from MLX LoRA tensor orientation to PEFT tensor orientation. Use the original MLX adapter repo for the canonical MLX release.
This is a small local strict-format benchmark, not broad BFCL or production tool-use evidence.
The PEFT route has a successful Colab T4 load smoke and bounded lm_eval pilot, but no full no-limit lm_eval scorecard yet.
The release does not include official BFCL, HumanEval, MBPP, EvalPlus, BigCodeBench, LiveCodeBench, safety/refusal, or RULER long-context scores.
The selected lm_eval endpoint route was attempted separately, but the current local MLX endpoint is not loglikelihood-compatible for those tasks. A direct MLX adapter has scored bounded selected-task limit-10 and limit-25 runs; treat those as pilot evidence only, not as full official lm_eval or leaderboard scores.
The adapter is sensitive to runtime prompt formatting.
The V4 training data has no held-out user-prompt overlap in the recorded audit, but it shares one generic held-out tool name, notify_care_team.
Dataset/source redistribution review is complete for adapter-release purposes with caveats. The separately approved cleaned synthetic-only dataset has been published at .

Available on FriendliAI

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Container

Run this model inference with full control and performance in your environment.

Learn more

Model Details

Model Provider

edithatogo

Model Tree

Base

Qwen/Qwen3-4B

Adapter

this model

Input Modalities

Text

Output Modalities