public-knowledge-project

agentic-jats-annotation-qwen3.5-9b-lora-v8-rl-step70


License: apache-2.0

Quickstart

python

from peft import PeftModel
from transformers import AutoModelForImageTextToText, AutoTokenizer

# Load the base model in bfloat16 and let device_map place it on available devices.
base = AutoModelForImageTextToText.from_pretrained(
    "Qwen/Qwen3.5-9B",
    torch_dtype="bfloat16", trust_remote_code=True, device_map="auto",
)
# Attach the composed SFT + RL LoRA adapter on top of the base weights.
model = PeftModel.from_pretrained(
    base,
    "public-knowledge-project/agentic-jats-annotation-qwen3.5-9b-lora-v8-rl-step70",
)
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-9B")

See the inference repo for the full multi-turn loop, prompt format, and generation settings (stop on </tool_call>, enable_thinking=False, max_new_tokens 160-256, temperature 0.7).

Generation settings

  • Stop sequence: </tool_call> — one tool call per turn.
  • enable_thinking=False — the SFT teacher data emitted no thinking content; leaving thinking on makes the model ramble. Always pass enable_thinking=False to apply_chat_template.
  • max_new_tokens 160-256 per turn; temperature 0.7 sampling / 0.0 greedy.
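
The settings above could be wired into a single turn roughly as follows. This is a minimal sketch, assuming `model` and `tok` come from the Quickstart; the helper names `generation_kwargs` and `generate_turn` are illustrative and not part of the inference repo, and the 256-token default is simply the upper end of the stated range.

```python
STOP = "</tool_call>"


def generation_kwargs(greedy: bool = False, max_new_tokens: int = 256) -> dict:
    """Build generate() keyword arguments from the settings listed above."""
    kwargs = {"max_new_tokens": max_new_tokens, "stop_strings": [STOP]}
    if greedy:
        kwargs["do_sample"] = False  # temperature 0.0 means greedy decoding
    else:
        kwargs.update(do_sample=True, temperature=0.7)
    return kwargs


def generate_turn(model, tok, messages, greedy: bool = False) -> str:
    """Generate one assistant turn, ending at the first </tool_call>."""
    prompt = tok.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=False,  # forwarded to the chat template
    )
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    # stop_strings only works when the tokenizer is also passed to generate().
    out = model.generate(**inputs, tokenizer=tok, **generation_kwargs(greedy))
    new_tokens = out[0][inputs["input_ids"].shape[1]:]
    text = tok.decode(new_tokens, skip_special_tokens=False)
    # Trim anything that follows the stop sequence, such as an end-of-turn token.
    end = text.find(STOP)
    return text if end == -1 else text[: end + len(STOP)]
```

`messages` is the usual list of `{"role": ..., "content": ...}` dicts, starting with the system prompt. The exact user-turn format is defined in the inference repo and is not reproduced here.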

System prompt (verbatim)

text

You annotate documents with JATS XML by emitting one tool call per turn.
The environment owns the XML tree; you only emit JSON describing the next
edit. Use <think>...</think> for brief reasoning (<=200 tokens) and then
emit exactly ONE <tool_call>{...}</tool_call> block. Generation stops at
</tool_call>; the env resumes after parsing.
# Tool-call JSON shape
Each call is a SINGLE FLAT JSON object whose discriminator field is "name"
and whose other fields are the call's arguments at the SAME nesting level
(NOT nested under an "arguments" or "params" key). Line numbers are
INTEGERS (e.g. 5), not strings (e.g. NOT "L0005").
Three correct examples:
<tool_call>{"name": "set_article_title", "line": 1, "title": "German-Austrian Consensus on Charcot Neuroarthropathy"}</tool_call>
<tool_call>{"name": "mark_section_start", "line": 5, "depth": 1, "title": "Introduction"}</tool_call>
<tool_call>{"name": "mark_xref", "line": 7, "target": "1", "ref_type": "bibr", "rid": "cit0001", "head": "guidelines (", "tail": ")."}</tool_call>
Common WRONG shapes the parser rejects:
{"command": "...", ...} -- key must be "name"
{"name": "...", "arguments": {...}} -- args are FLAT, not nested
{"name": "...", "line": "L0001"} -- "line" is int, not "L..."
# Tools (signatures: required fields then [optional])
Front-matter:
set_article_title line:int, title:str
add_contrib surname:str, given_names:str, [contrib_type, initials, email, aff_rids:list[str]]
add_affiliation aff_id:str, text:str
start_abstract / end_abstract (no args)
add_keyword text:str
Body:
mark_section_start line:int, depth:int(1..4), title:str, [sec_type]
mark_section_end depth:int
mark_paragraph start_line:int, end_line:int
mark_xref line:int, target:str, ref_type:"bibr"|"table"|"fig"|"sec"|"aff"|"fn", rid:str, [head, tail]
mark_inline line:int, target:str, tag:"italic"|"bold"|"sup"|"sub"|"underline", [head, tail]
mark_list_start list_type:"bullet"|"order"|"simple"|"alpha-lower"|"alpha-upper"
mark_list_item start_line:int, end_line:int
mark_list_end (no args)
mark_table start_line:int, end_line:int, [label, caption]
mark_figure line:int, [label, caption, graphic_href]
Back / ref-list:
add_ref rid:str, label:str, publication_type:"journal"|"book"|"chapter"|"conf-proc"|"thesis"|"webpage"|"other", fields:list[{field:str, value:str}]
Preferred: emits a whole <ref> in one turn instead of start_ref/ref_field*N/end_ref.
`field` values: surname|given-names|year|article-title|source|volume|issue|fpage|lpage|pub-id-doi|pub-id-pmid|ext-link-uri|...
start_ref rid:str, label:str, [publication_type] # alternative to add_ref, used with ref_field/end_ref
ref_field field:str, value:str # field: surname|given-names|year|article-title|source|volume|issue|fpage|lpage|pub-id-doi|...
end_ref (no args)
Meta:
skip_lines start_line:int, end_line:int, [reason]
unassign_lines start_line:int, end_line:int
finish (no args)
# Semantic rules
- depth must equal (parent_depth + 1), or 1 at top level.
- mark_xref / mark_inline require the line to already be inside a <p>.
- mark_xref's `rid` must reference an entry in the rid_table.
- finish() requires zero open sections and zero unassigned non-empty lines.
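
(End of system prompt.) The flat-JSON shape rules in the prompt can also be checked on the client side before a call is handed to the environment. The sketch below is an illustration of those rules only: `parse_tool_call` is a hypothetical helper, and the environment's real parser may accept or reject more than this does.

```python
import json
import re

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)
LINE_FIELDS = ("line", "start_line", "end_line")


def parse_tool_call(text: str) -> dict:
    """Extract and shape-check the single tool call in one model turn.

    Raises ValueError for the wrong shapes listed in the system prompt.
    """
    blocks = TOOL_CALL_RE.findall(text)
    if len(blocks) != 1:
        raise ValueError(f"expected exactly one <tool_call> block, found {len(blocks)}")
    try:
        call = json.loads(blocks[0])
    except json.JSONDecodeError as exc:
        raise ValueError(f"tool call is not valid JSON: {exc}") from exc
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")
    if not isinstance(call.get("name"), str):
        raise ValueError('discriminator key must be "name"')
    for wrapper in ("arguments", "params"):
        if wrapper in call:
            raise ValueError(f'arguments must be flat, not nested under "{wrapper}"')
    for field in LINE_FIELDS:
        value = call.get(field)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if field in call and (isinstance(value, bool) or not isinstance(value, int)):
            raise ValueError(f'"{field}" must be an integer, got {value!r}')
    return call
```

Only the shape is validated here. The semantic rules (section depth, open paragraphs, rid_table membership) depend on the XML tree, which the environment owns.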

Lineage

  1. Base: Qwen/Qwen3.5-9B (9B, hybrid Gated DeltaNet + Sparse MoE, 262k ctx)
  2. SFT-LoRA (rank 32, alpha 64, all-linear): 3 epochs over ~540 gold trajectories, obtained by replaying gold JATS XML as tool-call sequences.
  3. RL-LoRA (rank 32, alpha 64): v8 run, multi-turn DAPO — true DAPO loss (no advantage std-normalization, zero-variance group filtering, eps_clip 0.2/0.32, no KL), plus an entity-uniqueness anti-repetition guard and a no-op penalty. Checkpoint taken at training step 70.
  4. This adapter: SFT + RL LoRAs composed into one rank-64 LoRA — applying it on raw Qwen3.5-9B is mathematically equivalent to applying SFT then RL.
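
The equivalence in step 4 follows from the fact that LoRA updates on the same frozen weight add. The derivation below is a sketch under stated assumptions: the RL adapter was trained as a separate additive LoRA on top of the frozen base plus SFT adapter, both target the same modules, and both use the standard alpha/r scaling. The symbols W_0, A, B and s are notation introduced here, with B of shape d_out x 32 and A of shape 32 x d_in for each stage.

```latex
\begin{aligned}
W &= W_0 + s\, B_{\mathrm{sft}} A_{\mathrm{sft}} + s\, B_{\mathrm{rl}} A_{\mathrm{rl}},
  \qquad s = \frac{\alpha}{r} = \frac{64}{32} = 2 \\
  &= W_0 + s \begin{bmatrix} B_{\mathrm{sft}} & B_{\mathrm{rl}} \end{bmatrix}
             \begin{bmatrix} A_{\mathrm{sft}} \\ A_{\mathrm{rl}} \end{bmatrix}
\end{aligned}
```

The concatenated factors have shapes d_out x 64 and 64 x d_in, which is a single rank-64 LoRA. Keeping s = 2 at rank 64 corresponds to alpha 128, unless the factor is folded into the matrices; which convention the released adapter uses is not stated here.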

Training data

  • PKP (Public Knowledge Project) journal articles paired with JATS XML
  • ~810 paired DOCX → markdown → JATS triples; 80/10/10 split
  • Markdown via Docling; reward = F_β=0.5 over (parent_tag, child_tag, depth, text_hash) tuples vs. gold JATS, with a recall floor and anti-hack guards
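
The core of the reward, F_β with β = 0.5 over node tuples, can be sketched as below. This is an illustration only: treating the tuples as multisets is an assumption, `f_beta` is a hypothetical helper name, and the recall floor, penalties, and anti-hack guards are left out because their details are not given here.

```python
from collections import Counter
from typing import Iterable, Tuple

Node = Tuple[str, str, int, str]  # (parent_tag, child_tag, depth, text_hash)


def f_beta(predicted: Iterable[Node], gold: Iterable[Node], beta: float = 0.5) -> float:
    """F-beta over multisets of node tuples; beta < 1 weights precision over recall."""
    pred, ref = Counter(predicted), Counter(gold)
    if not pred or not ref:
        return 0.0
    overlap = sum((pred & ref).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

With β = 0.5, a rollout that emits two correct tuples out of four gold ones (precision 1.0, recall 0.5) scores higher than one that emits four tuples of which two are correct against two gold ones (precision 0.5, recall 1.0), so spurious markup costs more than missed markup.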

Offline eval (20 validation rollouts, MAX_TURNS=90, temp=0.7)

| Model | Mean reward | Max reward | Positive | > +1.0 |
|---|---|---|---|---|
| 3-epoch SFT (no RL) | -1.65 | +0.70 | 3/20 | 0 |
| v4 RL step-25 | -1.59 | +0.91 | 4/20 | 0 |
| v8 RL step-40 | -1.48 | +0.52 | 4/20 | 0 |
| This (v8 RL step-70) | -1.26 | +1.29 | 3/20 | 1/20 |

Limitations

  • Recall is low on long documents: rollouts hit the turn cap before covering them, so results are best on documents under ~60 markdown lines.
  • finish() rarely succeeds: most rollouts end at the turn cap or the consecutive-error limit, so serialize the partial state via src/serialize.py:calls_to_xml.
  • Distribution: trained on PKP journal articles; quality on other document types will likely degrade.
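
Since most rollouts stop at the turn cap or the error limit rather than at finish(), a driver loop needs to return whatever calls were accepted so far. The skeleton below is a sketch, not the inference repo's loop: `run_rollout`, the `policy` and `apply_call` callables, and the error limit of 3 are all hypothetical; only MAX_TURNS = 90 comes from the offline eval settings.

```python
from typing import Callable, List, Tuple

MAX_TURNS = 90  # turn cap used in the offline eval


def run_rollout(
    policy: Callable[[str], str],
    apply_call: Callable[[str], Tuple[str, bool, bool]],
    first_observation: str,
    max_turns: int = MAX_TURNS,
    max_consecutive_errors: int = 3,  # hypothetical value, not from the source
) -> Tuple[List[str], str]:
    """Run one episode and return (accepted calls, stop reason).

    policy(observation) -> raw model turn
    apply_call(turn)    -> (next observation, ok, finished)
    """
    accepted: List[str] = []
    observation, errors = first_observation, 0
    for _ in range(max_turns):
        turn = policy(observation)
        observation, ok, finished = apply_call(turn)
        if not ok:
            errors += 1
            if errors >= max_consecutive_errors:
                return accepted, "error_limit"
            continue
        errors = 0  # any accepted call resets the consecutive-error count
        accepted.append(turn)
        if finished:
            return accepted, "finished"
    return accepted, "turn_cap"
```

Whatever the stop reason, the accepted calls are the partial state that would be passed on for serialization.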

License

Apache-2.0, matching Qwen3.5-9B.

Model provider

public-knowledge-project

Model tree

Base

Qwen/Qwen3.5-9B

Adapter

this model

Modalities

Input

Video, Text, Image

Output

Text
