public-knowledge-project
agentic-jats-annotation-qwen3.5-9b-lora-v8-rl-step70
README
License: apache-2.0
Quickstart
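```python
from peft import PeftModel
from transformers import AutoModelForImageTextToText, AutoTokenizer

# Load the base model in bfloat16, placed automatically across available devices.
base = AutoModelForImageTextToText.from_pretrained(
    "Qwen/Qwen3.5-9B",
    torch_dtype="bfloat16",
    trust_remote_code=True,
    device_map="auto",
)
# Attach this adapter (SFT + RL LoRAs composed into one) on top of the base.
model = PeftModel.from_pretrained(
    base,
    "public-knowledge-project/agentic-jats-annotation-qwen3.5-9b-lora-v8-rl-step70",
)
# The tokenizer is loaded from the base model repo.
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-9B")
```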
See the inference repo for the full multi-turn loop, prompt format, and
generation settings (stop on </tool_call>, enable_thinking=False,
max_new_tokens 160-256, temperature 0.7).
Generation settings
- Stop sequence: `</tool_call>` (one tool call per turn).
- `enable_thinking=False`: the SFT teacher data emitted no thinking content, so leaving thinking on makes the model ramble. Always pass `enable_thinking=False` to `apply_chat_template`.
- `max_new_tokens`: 160-256 per turn.
- Temperature: 0.7 for sampling, 0.0 for greedy decoding.
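A minimal sketch of how these per-turn settings might map onto Hugging Face `transformers` generation arguments. `generation_kwargs` is a hypothetical helper, not part of the inference repo; it assumes `stop_strings` (which requires passing the tokenizer to `generate`) as the mechanism for stopping on `</tool_call>`, and it expresses greedy decoding as `do_sample=False` rather than a literal temperature of 0.0.

```python
from typing import Any

STOP_STRING = "</tool_call>"  # one tool call per turn


def generation_kwargs(greedy: bool = False, max_new_tokens: int = 256) -> dict[str, Any]:
    """Hypothetical helper: the card's settings as `model.generate` kwargs."""
    kwargs: dict[str, Any] = {
        "max_new_tokens": max_new_tokens,  # card suggests 160-256 per turn
        "stop_strings": [STOP_STRING],  # needs tokenizer=... in generate()
    }
    if greedy:
        # "temperature 0.0" means greedy decoding; transformers expresses it
        # as do_sample=False rather than a zero temperature.
        kwargs["do_sample"] = False
    else:
        kwargs.update(do_sample=True, temperature=0.7)
    return kwargs


# Usage with `model` and `tok` from the Quickstart (not executed here).
# `messages` holds the system prompt plus a user turn in the inference
# repo's prompt format.
# inputs = tok.apply_chat_template(
#     messages,
#     add_generation_prompt=True,
#     enable_thinking=False,  # forwarded to the Qwen chat template
#     return_dict=True,
#     return_tensors="pt",
# ).to(model.device)
# out = model.generate(**inputs, tokenizer=tok, **generation_kwargs())
# reply = tok.decode(out[0, inputs["input_ids"].shape[1]:])
```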
System prompt (verbatim)
```text
You annotate documents with JATS XML by emitting one tool call per turn.
The environment owns the XML tree; you only emit JSON describing the next
edit. Use <think>...</think> for brief reasoning (<=200 tokens) and then
emit exactly ONE <tool_call>{...}</tool_call> block. Generation stops at
</tool_call>; the env resumes after parsing.

# Tool-call JSON shape
Each call is a SINGLE FLAT JSON object whose discriminator field is "name"
and whose other fields are the call's arguments at the SAME nesting level
(NOT nested under an "arguments" or "params" key). Line numbers are
INTEGERS (e.g. 5), not strings (e.g. NOT "L0005").
Three correct examples:
<tool_call>{"name": "set_article_title", "line": 1, "title": "German-Austrian Consensus on Charcot Neuroarthropathy"}</tool_call>
<tool_call>{"name": "mark_section_start", "line": 5, "depth": 1, "title": "Introduction"}</tool_call>
<tool_call>{"name": "mark_xref", "line": 7, "target": "1", "ref_type": "bibr", "rid": "cit0001", "head": "guidelines (", "tail": ")."}</tool_call>
Common WRONG shapes the parser rejects:
{"command": "...", ...} -- key must be "name"
{"name": "...", "arguments": {...}} -- args are FLAT, not nested
{"name": "...", "line": "L0001"} -- "line" is int, not "L..."

# Tools (signatures: required fields then [optional])
Front-matter:
set_article_title line:int, title:str
add_contrib surname:str, given_names:str, [contrib_type, initials, email, aff_rids:list[str]]
add_affiliation aff_id:str, text:str
start_abstract / end_abstract (no args)
add_keyword text:str
Body:
mark_section_start line:int, depth:int(1..4), title:str, [sec_type]
mark_section_end depth:int
mark_paragraph start_line:int, end_line:int
mark_xref line:int, target:str, ref_type:"bibr"|"table"|"fig"|"sec"|"aff"|"fn", rid:str, [head, tail]
mark_inline line:int, target:str, tag:"italic"|"bold"|"sup"|"sub"|"underline", [head, tail]
mark_list_start list_type:"bullet"|"order"|"simple"|"alpha-lower"|"alpha-upper"
mark_list_item start_line:int, end_line:int
mark_list_end (no args)
mark_table start_line:int, end_line:int, [label, caption]
mark_figure line:int, [label, caption, graphic_href]
Back / ref-list:
add_ref rid:str, label:str, publication_type:"journal"|"book"|"chapter"|"conf-proc"|"thesis"|"webpage"|"other", fields:list[{field:str, value:str}]
Preferred: emits a whole <ref> in one turn instead of start_ref/ref_field*N/end_ref.
`field` values: surname|given-names|year|article-title|source|volume|issue|fpage|lpage|pub-id-doi|pub-id-pmid|ext-link-uri|...
start_ref rid:str, label:str, [publication_type] # alternative to add_ref, used with ref_field/end_ref
ref_field field:str, value:str # field: surname|given-names|year|article-title|source|volume|issue|fpage|lpage|pub-id-doi|...
end_ref (no args)
Meta:
skip_lines start_line:int, end_line:int, [reason]
unassign_lines start_line:int, end_line:int
finish (no args)

# Semantic rules
- depth must equal (parent_depth + 1), or 1 at top level.
- mark_xref / mark_inline require the line to already be inside a <p>.
- mark_xref's `rid` must reference an entry in the rid_table.
- finish() requires zero open sections and zero unassigned non-empty lines.
```
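The prompt's tool-call shape rules can also be checked on the client side before a turn is handed to the environment. This is a minimal sketch under stated assumptions: `parse_tool_call` is a hypothetical helper, not the environment's actual parser, and it treats every int-typed field in the signatures (`line`, `start_line`, `end_line`, `depth`) the way the prompt treats line numbers. It takes the last `<tool_call>` block, since generation stops at `</tool_call>`.

```python
import json

OPEN, CLOSE = "<tool_call>", "</tool_call>"
# Fields typed as int in the tool signatures (assumption: all checked alike).
INT_FIELDS = ("line", "start_line", "end_line", "depth")


def parse_tool_call(completion: str) -> dict:
    """Extract the tool call from one model turn and check its shape.

    Raises ValueError on any violation (json.JSONDecodeError is a subclass).
    """
    start = completion.rfind(OPEN)  # last block, after any <think> text
    if start == -1:
        raise ValueError("no <tool_call> block")
    end = completion.find(CLOSE, start)
    if end == -1:
        raise ValueError("unterminated <tool_call> block")
    call = json.loads(completion[start + len(OPEN):end])
    if not isinstance(call, dict) or "name" not in call:
        raise ValueError('discriminator key must be "name"')
    if "arguments" in call or "params" in call:
        raise ValueError("arguments must be flat, not nested")
    for key in INT_FIELDS:
        # type(...) is int also rejects bool, which subclasses int in Python.
        if key in call and type(call[key]) is not int:
            raise ValueError(f'"{key}" must be an int, got {call[key]!r}')
    return call
```

Each "Common WRONG shape" from the prompt raises `ValueError`, so a driver loop could turn it into an error message for the next turn instead of sending it to the environment.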
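The section rules ("depth must equal parent_depth + 1, or 1 at top level"; `finish()` needs zero open sections) amount to a stack of open depths. A minimal sketch, assuming a literal reading of the rules and assuming `mark_section_end` closes the innermost open section; `SectionTracker` is hypothetical and may differ from how the environment actually enforces these rules.

```python
class SectionTracker:
    """Hypothetical client-side mirror of the section rules in the prompt."""

    MAX_DEPTH = 4  # mark_section_start allows depth 1..4

    def __init__(self) -> None:
        self.open_depths: list[int] = []  # stack of currently open sections

    def start(self, depth: int) -> None:
        # A new section must be one level below the innermost open section.
        parent = self.open_depths[-1] if self.open_depths else 0
        if depth != parent + 1 or depth > self.MAX_DEPTH:
            raise ValueError(f"depth {depth} invalid here; expected {parent + 1}")
        self.open_depths.append(depth)

    def end(self, depth: int) -> None:
        # Assumption: mark_section_end closes the innermost open section.
        if not self.open_depths or self.open_depths[-1] != depth:
            raise ValueError(f"no innermost open section at depth {depth}")
        self.open_depths.pop()

    def can_finish(self) -> bool:
        # finish() also needs zero unassigned lines, which is not tracked here.
        return not self.open_depths
```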
Lineage
- Base: Qwen/Qwen3.5-9B (9B, hybrid Gated DeltaNet + Sparse MoE, 262k ctx).
- SFT-LoRA (rank 32, alpha 64, all-linear): 3 epochs over ~540 gold trajectories from replaying gold JATS XML → tool-call sequences.
- RL-LoRA (rank 32, alpha 64): v8 run of multi-turn DAPO with the true DAPO loss (no advantage std-normalization, zero-variance group filtering, asymmetric clip eps 0.2 low / 0.32 high, no KL term), plus an entity-uniqueness anti-repetition guard and a no-op penalty. Checkpoint taken at training step 70.
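- This adapter: the SFT and RL LoRAs composed into a single rank-64 LoRA. Because each LoRA is an additive low-rank weight delta, applying this adapter to raw Qwen3.5-9B is mathematically equivalent to applying the SFT adapter and then the RL adapter.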
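The composition claim follows from LoRA updates being additive: each adapter adds a scaled low-rank delta `s * B @ A` to a frozen weight, and two such deltas equal a single delta whose `B` is the column-concatenation and whose `A` is the row-concatenation of the originals, so the ranks add (32 + 32 = 64). The plain-Python sketch below checks this on toy integer matrices. It assumes both adapters target the same layer and uses the card's scale alpha/r = 64/32 = 2; `compose_loras` is a hypothetical helper, not the script used to build this adapter, and folding the scales into `B` is one possible convention.

```python
def matmul(a, b):
    """Plain-Python matrix product of nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def madd(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mscale(a, s):
    """Multiply every entry of a matrix by the scalar s."""
    return [[s * x for x in row] for row in a]


def compose_loras(b1, a1, s1, b2, a2, s2):
    """Merge two LoRAs (delta = s * B @ A) into one of rank r1 + r2.

    B = [s1*B1 | s2*B2] and A = [A1 ; A2], so B @ A = s1*B1@A1 + s2*B2@A2.
    The scales are folded into B, so the composed adapter has scale 1.
    """
    b = [rb1 + rb2 for rb1, rb2 in zip(mscale(b1, s1), mscale(b2, s2))]
    a = a1 + a2  # stack the rows of A1 on top of the rows of A2
    return b, a


# Toy check: d_out=3, d_in=2, each adapter rank 1, scale alpha/r = 64/32 = 2.
W = [[1, 0], [0, 1], [1, 1]]
B1, A1 = [[1], [2], [0]], [[1, -1]]
B2, A2 = [[0], [1], [1]], [[2, 3]]
S = 64 // 32

sequential = madd(madd(W, mscale(matmul(B1, A1), S)), mscale(matmul(B2, A2), S))
B, A = compose_loras(B1, A1, S, B2, A2, S)
composed = madd(W, matmul(B, A))
assert sequential == composed
```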
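The RL objective described above can be sketched for a single group of rollouts in plain Python. It follows the card's description (mean-centred advantages with no std division, zero-variance groups dropped, asymmetric clipping at 0.2 / 0.32, no KL term); averaging over every token in the group is taken from the DAPO paper's token-level loss and is an assumption about this run. `group_advantages` and `dapo_group_loss` are hypothetical names, and the entity-uniqueness guard and no-op penalty are not modelled.

```python
import math

EPS_LOW, EPS_HIGH = 0.2, 0.32  # asymmetric clip bounds from the card


def group_advantages(rewards):
    """Mean-centred advantages (no std division); None for a zero-variance group."""
    if all(r == rewards[0] for r in rewards):
        return None  # identical rewards carry no signal, so the group is dropped
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]


def dapo_group_loss(rewards, logp_new, logp_old):
    """Clipped surrogate loss for one group, averaged over all its tokens.

    logp_new / logp_old hold one list of per-token log-probs per rollout.
    There is no KL term.
    """
    advantages = group_advantages(rewards)
    if advantages is None:
        return None
    total, n_tokens = 0.0, 0
    for adv, new, old in zip(advantages, logp_new, logp_old):
        for lp_new, lp_old in zip(new, old):
            ratio = math.exp(lp_new - lp_old)  # importance ratio per token
            clipped = min(max(ratio, 1 - EPS_LOW), 1 + EPS_HIGH)
            total += min(ratio * adv, clipped * adv)  # pessimistic PPO-style bound
            n_tokens += 1
    return -total / n_tokens  # minimize the negative surrogate
```

Because advantages are not divided by the group's standard deviation, their magnitude tracks the raw reward spread within the group.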
Training data
- PKP (Public Knowledge Project) journal articles paired with JATS XML
- ~810 paired DOCX → markdown → JATS triples; 80/10/10 split
- Markdown via Docling; reward = F_β (β = 0.5) over `(parent_tag, child_tag, depth, text_hash)` tuples vs. gold JATS, with a recall floor and anti-hack guards
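The F_β term of the reward can be sketched as a multiset match over node tuples. `node_key` and its whitespace-normalized SHA-1 hash are hypothetical stand-ins for however `text_hash` is actually computed, multiset (rather than set) matching is an assumption, and the recall floor and anti-hack guards are omitted. This covers only the F_β component, not the full reward, which can be negative in the eval table.

```python
import hashlib
from collections import Counter


def node_key(parent_tag: str, child_tag: str, depth: int, text: str) -> tuple:
    """Hypothetical tuple builder: whitespace-normalized text, SHA-1 hashed."""
    normalized = " ".join(text.split())
    text_hash = hashlib.sha1(normalized.encode("utf-8")).hexdigest()[:12]
    return (parent_tag, child_tag, depth, text_hash)


def f_beta(pred: list, gold: list, beta: float = 0.5) -> float:
    """F-beta between predicted and gold tuples, matched as multisets."""
    pred_c, gold_c = Counter(pred), Counter(gold)
    tp = sum((pred_c & gold_c).values())  # Counter & keeps the minimum counts
    if tp == 0:
        return 0.0
    precision = tp / sum(pred_c.values())
    recall = tp / sum(gold_c.values())
    b2 = beta * beta  # beta < 1 weights precision more than recall
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

With β = 0.5, a precise but incomplete prediction (precision 1.0, recall 0.5) scores about 0.83, while a complete but noisy one (precision 0.5, recall 1.0) scores about 0.56. A recall floor is one way to keep a policy from exploiting that precision bias.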
Offline eval (20 validation rollouts, MAX_TURNS=90, temp=0.7)
| Model | Mean reward | Max reward | Rollouts with reward > 0 | Rollouts with reward > +1.0 |
|---|---|---|---|---|
| 3-epoch SFT (no RL) | -1.65 | +0.70 | 3/20 | 0/20 |
| v4 RL step-25 | -1.59 | +0.91 | 4/20 | 0/20 |
| v8 RL step-40 | -1.48 | +0.52 | 4/20 | 0/20 |
| This (v8 RL step-70) | -1.26 | +1.29 | 3/20 | 1/20 |
Limitations
- Recall is low on long documents: rollouts are truncated before they cover the whole document. Works best on documents under ~60 markdown lines.
- `finish()` rarely succeeds: most rollouts end via the turn cap or the consecutive-error limit. Serialize partial state with `src/serialize.py:calls_to_xml`.
- Distribution: trained on PKP journal articles; quality on other document types will likely degrade.
License
Apache-2.0, matching Qwen3.5-9B.
Model provider
public-knowledge-project
Model tree
Base: Qwen/Qwen3.5-9B
Adapter: this model
Modalities
Input: Video, Text, Image
Output: Text