cometadata

affiliation-parsing-lora-Qwen3-8B-distil-GLM_4.5_Air-dapo-latex

License: apache-2.0

Eval

987-prompt LaTeX-extracted test split. Author names are matched case-insensitively with fuzz.ratio (threshold 85). Affiliations are matched with token_sort_ratio after domain normalization (expand abbreviations, drop postal codes), also at threshold 85. The affiliation matcher was audited at precision 1.0 on a 64-pair labeled set, so the metric does not credit genuinely different institutions.
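The affiliation matching described above can be sketched roughly as follows. This is an illustration only, not the project's evaluation code: `difflib.SequenceMatcher` stands in for `token_sort_ratio`, the `ABBREVIATIONS` table is hypothetical, and treating every all-digit token as a postal code is a simplifying assumption.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table; the real list lives in the evaluation code.
ABBREVIATIONS = {"univ": "university", "dept": "department", "inst": "institute"}


def normalize_affiliation(text: str) -> str:
    """Lowercase, strip punctuation, expand abbreviations, drop numeric tokens."""
    tokens = re.findall(r"[^\W_]+", text.lower())
    kept = []
    for tok in tokens:
        if tok.isdigit():  # assumption: all-digit tokens are postal-code parts
            continue
        kept.append(ABBREVIATIONS.get(tok, tok))
    return " ".join(kept)


def token_sort_score(a: str, b: str) -> float:
    """Stand-in for token_sort_ratio: sort tokens, then score on a 0-100 scale."""
    sorted_a = " ".join(sorted(normalize_affiliation(a).split()))
    sorted_b = " ".join(sorted(normalize_affiliation(b).split()))
    return 100.0 * SequenceMatcher(None, sorted_a, sorted_b).ratio()


def affiliations_match(a: str, b: str, threshold: float = 85.0) -> bool:
    """True when the normalized, token-sorted strings score at or above threshold."""
    return token_sort_score(a, b) >= threshold
```

Sorting tokens before scoring makes the comparison insensitive to word order, so "Dept. of Physics, Univ. of Warsaw" and "University of Warsaw, Department of Physics" normalize to the same token multiset.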

Reward (normalized `(format + author_IoU + affiliation_IoU) / 3`)

| Stage | Test reward |
|---|---|
| Stage 1 only (distil) | 0.918 |
| Stage 1 + Stage 2 (this adapter) | 0.921 |
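A minimal sketch of the normalized reward, under stated assumptions: `format` is taken to be a 0/1 indicator for schema-valid output, each IoU is computed from match counts as TP / (TP + FP + FN), and an empty gold list with an empty prediction scores 1.0. None of these details is confirmed by the card; the function names are illustrative.

```python
def iou(tp: int, fp: int, fn: int) -> float:
    """Intersection over union from match counts.

    Assumption: empty gold and empty prediction count as a perfect score.
    """
    union = tp + fp + fn
    return tp / union if union else 1.0


def reward(format_ok: bool, author_counts, affiliation_counts) -> float:
    """Mean of the three components, each in [0, 1].

    author_counts and affiliation_counts are (tp, fp, fn) tuples.
    """
    return (float(format_ok) + iou(*author_counts) + iou(*affiliation_counts)) / 3


# One prompt: valid JSON, 3 of 4 gold authors found, 2 of 3 gold affiliations
# found plus one spurious affiliation.
print(reward(True, (3, 0, 1), (2, 1, 1)))
```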

Per-category precision / recall / F0.5 / F1

Pooled (micro) TP/FP/FN across all 987 prompts. Parse rate (schema-valid JSON emitted): 0.992 for both stages.

Authors — fuzzy name match across each prompt's gold vs. predicted author list.

| Stage | TP | FP | FN | P | R | F0.5 | F1 |
|---|---|---|---|---|---|---|---|
| Stage 1 only | 3095 | 147 | 680 | 0.955 | 0.820 | 0.924 | 0.882 |
| Stage 1 + 2 | 3008 | 90 | 767 | 0.971 | 0.797 | 0.930 | 0.875 |
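The P, R, F0.5 and F1 columns follow from the pooled TP/FP/FN counts by the standard definitions. A short sketch (the function name `prf` is illustrative) that can be used to check a row:

```python
def prf(tp: int, fp: int, fn: int, beta: float = 1.0):
    """Precision, recall and F-beta from pooled (micro) counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    b2 = beta * beta
    denom = b2 * precision + recall
    f_beta = (1 + b2) * precision * recall / denom if denom else 0.0
    return precision, recall, f_beta


# Stage 1 only, authors: TP=3095, FP=147, FN=680
p, r, f1 = prf(3095, 147, 680)
_, _, f05 = prf(3095, 147, 680, beta=0.5)
print(round(p, 3), round(r, 3), round(f05, 3), round(f1, 3))
```

With beta = 0.5 the score weights precision more heavily than recall, which is why F0.5 sits closer to P than F1 does.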

Affiliations — affiliation matching within matched-author pairs; gold affiliations of unmatched authors count as FN, predicted of unmatched as FP.

| Stage | TP | FP | FN | P | R | F0.5 | F1 |
|---|---|---|---|---|---|---|---|
| Stage 1 only | 2821 | 596 | 1348 | 0.826 | 0.677 | 0.791 | 0.744 |
| Stage 1 + 2 | 2793 | 361 | 1376 | 0.886 | 0.670 | 0.832 | 0.763 |
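The affiliation counting rule (match inside matched-author pairs, with unmatched authors contributing all of their affiliations as FN or FP) might look like the sketch below. The greedy first-match pairing and the `same_name` / `same_affiliation` callbacks are assumptions made for illustration; the actual pairing strategy is defined in the evaluation code.

```python
def count_affiliations(gold, pred, same_name, same_affiliation):
    """Count affiliation TP/FP/FN for one prompt.

    gold and pred are lists of {"name": str, "affiliations": [str, ...]}.
    same_name and same_affiliation are two-argument predicates.
    """
    tp = fp = fn = 0
    unmatched_pred = list(range(len(pred)))
    for g in gold:
        # Greedy first match (assumption; the real pairing may differ).
        hit = next(
            (i for i in unmatched_pred if same_name(g["name"], pred[i]["name"])),
            None,
        )
        if hit is None:
            fn += len(g["affiliations"])  # unmatched gold author: all FN
            continue
        unmatched_pred.remove(hit)
        remaining = list(pred[hit]["affiliations"])
        for gold_aff in g["affiliations"]:
            match = next((a for a in remaining if same_affiliation(gold_aff, a)), None)
            if match is None:
                fn += 1
            else:
                remaining.remove(match)
                tp += 1
        fp += len(remaining)  # predicted affiliations with no gold counterpart
    # Predicted authors that matched no gold author: all their affiliations are FP.
    fp += sum(len(pred[i]["affiliations"]) for i in unmatched_pred)
    return tp, fp, fn
```

Summing the returned triples over all prompts gives the pooled counts used for the micro scores.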

Macro (mean of per-prompt P/R/F) for Stage 1 + 2 is higher than micro because large multi-author papers, where many items are missed, dominate the pooled counts but count only once each in the per-prompt mean: authors P 0.954 / R 0.954 / F0.5 0.953 / F1 0.953; affiliations P 0.840 / R 0.822 / F0.5 0.833 / F1 0.826.
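A toy illustration of why macro can exceed micro. The counts below are invented for the example and are not taken from the test split: one small paper is extracted perfectly, while one very large paper loses half of its authors.

```python
def micro_recall(counts):
    """Pool TP and FN over all prompts, then divide once."""
    tp = sum(c[0] for c in counts)
    fn = sum(c[1] for c in counts)
    return tp / (tp + fn)


def macro_recall(counts):
    """Compute recall per prompt, then average the per-prompt values."""
    return sum(tp / (tp + fn) for tp, fn in counts) / len(counts)


# (tp, fn) per prompt: a 2-author paper fully recovered, and a
# 100-author paper where 50 authors are skipped (made-up numbers).
counts = [(2, 0), (50, 50)]
print(round(micro_recall(counts), 3), round(macro_recall(counts), 3))
```

The large paper contributes 100 of the 102 gold items to the pooled count, so it pulls micro recall toward its own 0.5, while macro gives both papers equal weight.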

DAPO moves precision, not recall: author FP falls 39% (147 → 90) and affiliation FP falls 39% (596 → 361). The model learned to stop emitting hallucinated or wrong items. Recall does not improve (author TP 3095 → 3008, affiliation TP 2821 → 2793). The unreachable items are ~4% truly bad data (the author block was lost during source extraction for those papers) plus very large author lists where some authors are consistently skipped.

Training surfaced a metric bug: the naive case-sensitive fuzz.ratio metric was scoring ~22% of correctly extracted papers as 0. Gold labels are often ALL-CAPS (LUKASZ PAWELEC), while a correct extraction from the paper text is mixed-case (Łukasz Pawelec). Under the corrected, case-insensitive metric the model had been at ~0.91 all along, not the 0.81 that the buggy metric reported.
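The effect of case sensitivity on the example name can be reproduced with a small sketch. `difflib.SequenceMatcher` is used here as a stand-in for `fuzz.ratio` (the two scores are similar in spirit but not identical), and `names_match` is a hypothetical helper, not the project's function.

```python
from difflib import SequenceMatcher


def ratio(a: str, b: str) -> float:
    """Stand-in for fuzz.ratio: string similarity on a 0-100 scale."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def names_match(gold: str, pred: str, threshold: float = 85.0,
                ignore_case: bool = True) -> bool:
    """Compare two author names, optionally lowercasing both first."""
    if ignore_case:
        gold, pred = gold.lower(), pred.lower()
    return ratio(gold, pred) >= threshold


gold, pred = "LUKASZ PAWELEC", "Łukasz Pawelec"
print(names_match(gold, pred, ignore_case=False))  # case-sensitive comparison
print(names_match(gold, pred))                     # case-insensitive comparison
```

In the case-sensitive comparison almost no characters line up, so the pair falls far below the threshold. After lowercasing, only the leading `l` / `ł` differs and the pair clears it.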

Usage

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the Qwen3-8B base in bfloat16, then attach the LoRA adapter.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B", torch_dtype="bfloat16")
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
model = PeftModel.from_pretrained(
    base,
    "cometadata/affiliation-parsing-lora-Qwen3-8B-distil-GLM_4.5_Air-dapo-latex",
)

SYSTEM = (
    "You are an expert at reading academic articles and parsing information "
    "about their affiliations. The user will show you an academic article and "
    "your job is to extract the authors and their affiliations in a structured "
    "format (a JSON array of {name, affiliations}). Respond after </think>."
)

messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "<the paper text>"},
]
# return_dict=True yields input_ids plus attention_mask, so generate() receives both.
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
out = model.generate(**inputs, max_new_tokens=2048, temperature=0.7, do_sample=True)
# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[1]
print(tok.decode(out[0, prompt_len:], skip_special_tokens=True))
```

The model emits <think>…</think> reasoning followed by a JSON array [{"name": ..., "affiliations": [...]}, ...].
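Downstream code usually needs only the JSON part. The helper below is a sketch of one way to strip the reasoning and validate the result; `parse_response` is a hypothetical name, and the schema check (each item has a string `name` and a list of string `affiliations`) is an assumption based on the format described above.

```python
import json


def parse_response(text: str):
    """Return the list of author records, or None if the output is not schema-valid."""
    # Keep whatever follows the last </think>; if the tag is absent,
    # rpartition returns the whole text as the tail.
    _, _, tail = text.rpartition("</think>")
    try:
        data = json.loads(tail.strip())
    except json.JSONDecodeError:
        return None
    if not isinstance(data, list):
        return None
    for item in data:
        if not (
            isinstance(item, dict)
            and isinstance(item.get("name"), str)
            and isinstance(item.get("affiliations"), list)
            and all(isinstance(a, str) for a in item["affiliations"])
        ):
            return None
    return data


sample = '<think>two authors listed</think>\n[{"name": "A. Author", "affiliations": ["Univ. X"]}]'
print(parse_response(sample))
```

Returning `None` for anything that fails the check gives a simple way to compute a parse rate over a batch of outputs.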

Training & evaluation code

github.com/cometadata/affiliation-parsing-cl-latex (or the project directory /scratch/m000152-pm05/affiliation-parsing-cl-latex/).

License

Apache-2.0 (matches the Qwen3-8B base).
