License: apache-2.0
Links
- Training environment (Prime Intellect, public): smolclaims/abercrombie
- Base model: Qwen/Qwen3.5-4B
- Benchmark: LegalBench Abercrombie
Results
On the 95-row LegalBench Abercrombie held-out test set (non-thinking inference, greedy decoding):
| Category | Base Qwen3.5-4B | + Abercrombie-GRPO LoRA | Delta (pts) |
|---|---|---|---|
| Generic | 89% | 100% | +11 |
| Descriptive | 100% | 74% | -26 |
| Suggestive | 5% | 26% | +21 |
| Arbitrary | 0% | 47% | +47 |
| Fanciful | 5% | 95% | +90 |
| Overall | 40.0% | 68.4% | +28.4 |
Mean ordinal distance: 1.09 -> 0.53 (roughly halved).
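The card does not define ordinal distance, so the following is only a minimal sketch. It assumes the metric is the number of steps between the predicted and true labels on the spectrum, ordered Generic, Descriptive, Suggestive, Arbitrary, Fanciful. The helper names and toy data are hypothetical and are not the benchmark's evaluation code.

```python
# Spectrum order from least to most distinctive (assumed indexing).
SPECTRUM = ["Generic", "Descriptive", "Suggestive", "Arbitrary", "Fanciful"]
RANK = {label: i for i, label in enumerate(SPECTRUM)}


def ordinal_distance(predicted: str, truth: str) -> int:
    """Number of steps between two labels on the spectrum (0 = exact match)."""
    return abs(RANK[predicted] - RANK[truth])


def mean_ordinal_distance(predictions, truths) -> float:
    """Average ordinal distance over paired predictions and gold labels."""
    pairs = list(zip(predictions, truths))
    return sum(ordinal_distance(p, t) for p, t in pairs) / len(pairs)


# Hypothetical toy data, not the benchmark results.
preds = ["Fanciful", "Arbitrary", "Descriptive", "Generic"]
golds = ["Fanciful", "Suggestive", "Descriptive", "Fanciful"]
print(mean_ordinal_distance(preds, golds))  # (0 + 1 + 0 + 4) / 4 = 1.25
```

Under this reading, a mean of 0.53 would mean the average prediction lands about half a step from the true label.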
Output format
The model is trained to emit exactly six lines and nothing else:
```markdown
Q1: [Yes/No]
Q2: [Yes/No]
Q3: [Yes/No]
Q4: [Yes/No]
Q5: [Yes/No]
FINAL_CLASSIFICATION: [Generic/Descriptive/Suggestive/Arbitrary/Fanciful]
```
Each Qn is a doctrinal sub-question. Q1 = coined term test, Q2 = semantic relationship, Q3 = imagination test, Q4 = immediate conveyance, Q5 = genus test. The routing rule (Q1=Yes -> Fanciful, else Q2=No -> Arbitrary, else Q5=Yes -> Generic, else Q4=Yes -> Descriptive, else Q3=Yes -> Suggestive) is baked into the system prompt.
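The routing rule is mechanical, so it can be re-applied to the model's own answers as a sanity check. The sketch below parses the six-line output and routes it in the priority order stated above. `parse_output` and `route` are hypothetical helper names, and the regex assumes the model emits the format exactly, one field per line.

```python
import re

# Strict six-line format: five Yes/No answers, then the final label.
OUTPUT_RE = re.compile(
    r"\AQ1: (Yes|No)\nQ2: (Yes|No)\nQ3: (Yes|No)\nQ4: (Yes|No)\nQ5: (Yes|No)\n"
    r"FINAL_CLASSIFICATION: (Generic|Descriptive|Suggestive|Arbitrary|Fanciful)\Z"
)


def parse_output(text: str):
    """Return ({'Q1': bool, ...}, final_label), or None if the format is violated."""
    m = OUTPUT_RE.match(text.strip())
    if m is None:
        return None
    answers = {f"Q{i}": m.group(i) == "Yes" for i in range(1, 6)}
    return answers, m.group(6)


def route(answers):
    """Apply the routing rule in priority order; None if no branch fires."""
    if answers["Q1"]:
        return "Fanciful"
    if not answers["Q2"]:
        return "Arbitrary"
    if answers["Q5"]:
        return "Generic"
    if answers["Q4"]:
        return "Descriptive"
    if answers["Q3"]:
        return "Suggestive"
    return None


sample = "Q1: No\nQ2: Yes\nQ3: Yes\nQ4: No\nQ5: No\nFINAL_CLASSIFICATION: Suggestive"
answers, final = parse_output(sample)
print(route(answers) == final)  # True
```

A mismatch between `route(answers)` and the stated final label, or a `None` from either helper, is a cheap signal that an output should not be trusted.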
Usage
1. Install
```bash
pip install transformers accelerate peft torch
```
2. System prompt (required - do not modify)
```python
SYSTEM_PROMPT = """You are a trademark distinctiveness classifier. Given a mark and the goods or services it identifies, classify the mark on the Abercrombie spectrum: Generic, Descriptive, Suggestive, Arbitrary, or Fanciful.

Answer five questions about the mark, then provide a final classification. Evaluate each question in relation to the specific goods or services and the relevant purchasing public. Treat the mark as a whole; do not decompose compound marks into separate components.

Q1 - Coined Term Test. Is the mark an invented term created solely for trademark use, with no prior independent meaning?

Q2 - Semantic Relationship Test. Does the mark's ordinary dictionary meaning have any plausible semantic relationship to the goods or services?

Q3 - Imagination Test. Must the consumer use imagination, thought, or a multi-step mental process to connect the mark to the nature of the goods or services?

Q4 - Immediate Conveyance Test. Does the mark immediately convey an idea of a feature, quality, function, ingredient, or characteristic of the goods or services to the relevant purchasing public?

Q5 - Genus Test. Does the relevant purchasing public understand the mark primarily as the name of the general category of goods or services, rather than as an indicator of source?

When Q2=Yes and Q5=No, exactly one of Q3 or Q4 must be Yes: a semantically-related, non-generic mark is either descriptively immediate or suggestively imaginative, never neither.

Apply this routing rule to determine the final classification:
- If Q1 = Yes, classify as Fanciful
- Else if Q2 = No, classify as Arbitrary
- Else if Q5 = Yes, classify as Generic
- Else if Q4 = Yes, classify as Descriptive
- Else if Q3 = Yes, classify as Suggestive

Respond in exactly this format with no other text:
Q1: [Yes/No]
Q2: [Yes/No]
Q3: [Yes/No]
Q4: [Yes/No]
Q5: [Yes/No]
FINAL_CLASSIFICATION: [Generic/Descriptive/Suggestive/Arbitrary/Fanciful]"""
```
3. Load and run
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "Qwen/Qwen3.5-4B"
LORA = "DoodDood/abercrombie-grpo"
dtype = torch.bfloat16

# Load the base model, then attach the LoRA adapter on top of it.
tok = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=dtype, device_map="auto")
model = PeftModel.from_pretrained(model, LORA)
model.eval()


def classify(mark_and_goods: str) -> str:
    msgs = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": mark_and_goods},
    ]
    prompt = tok.apply_chat_template(
        msgs, tokenize=False, add_generation_prompt=True,
        enable_thinking=False,
    )
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(
            **inputs, max_new_tokens=128, do_sample=False,
            pad_token_id=tok.eos_token_id,
        )
    # Decode only the newly generated tokens, not the prompt.
    return tok.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)


# Input format: `The mark "X" for Y.` (matches LegalBench phrasing)
print(classify('The mark "Kodak" for cameras.'))
# Expected: Q1: Yes, Q2-Q5: No, FINAL_CLASSIFICATION: Fanciful

print(classify('The mark "Apple" for personal computers.'))
# Expected: Q1: No, Q2: No, ..., FINAL_CLASSIFICATION: Arbitrary

print(classify('The mark "Salt" for packages of sodium chloride.'))
# Expected: Q1: No, Q2: Yes, ..., Q5: Yes, FINAL_CLASSIFICATION: Generic
# (Q2 must be Yes here: under the routing rule, Q2 = No routes to Arbitrary before Q5 is checked.)
```
Important caveats
- Don't modify the system prompt. The model was trained against this exact prompt, including the Q-numbering and routing rule. Changes will degrade output.
- Always use `enable_thinking=False`. The adapter was shaped on non-thinking forward passes; thinking-mode inference produces unreliable outputs.
- Greedy decoding only. Sampling adds noise to a strict-format task. Use `do_sample=False`.
- Phrase the input as `The mark "X" for Y.` This matches the LegalBench surface form the model was trained on. Other phrasings may work but are not guaranteed.
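To keep inputs on the expected surface form, the user message can be built from its two parts rather than typed by hand. This is a small illustrative helper; `format_input` is a hypothetical name, and the clean-up of stray quotes and trailing periods is an assumption about what inputs might look like.

```python
def format_input(mark: str, goods: str) -> str:
    """Build the `The mark "X" for Y.` user message from a mark and its goods."""
    mark = mark.strip().strip('"')        # drop surrounding whitespace and quotes
    goods = goods.strip().rstrip(".")     # avoid a doubled trailing period
    return f'The mark "{mark}" for {goods}.'


print(format_input("Kodak", "cameras"))  # The mark "Kodak" for cameras.
```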
Method
Trained with Prime Intellect's hosted RL and the Verifiers framework on a custom synthetic dataset of 2,100 marks, balanced across the 5 classes. A generator blacklist excludes every LegalBench test mark, so there is no train/test contamination.
Reward stack (5 functions, weights 1.0 / 0.3 / 0.2 / 0.15 / 0.3):
- Ordinal accuracy on the final label - distance-based, dominant signal.
- Decisive Q - the dispositive sub-element for the true label only.
- Consistency bonus - gated on correct answer AND matching decisive Q.
- Routing consistency - stated FINAL matches own self-routing.
- Routed truth - own Q-chain decomposition lands on the true label.
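As a rough illustration of how such a stack could combine, the sketch below scores one parsed rollout. It is not the environment's code, and it rests on four assumptions: the weights apply in the order listed, the components are combined by a plain weighted sum, the ordinal term decays linearly with distance, and every other term is 0 or 1. The decisive sub-question for each label is read off the routing rule, and all names are hypothetical.

```python
SPECTRUM = ["Generic", "Descriptive", "Suggestive", "Arbitrary", "Fanciful"]

# Dispositive sub-question and required answer per label, from the routing rule.
DECISIVE = {
    "Fanciful": ("Q1", True),
    "Arbitrary": ("Q2", False),
    "Generic": ("Q5", True),
    "Descriptive": ("Q4", True),
    "Suggestive": ("Q3", True),
}

WEIGHTS = [1.0, 0.3, 0.2, 0.15, 0.3]  # assumed to follow the listed order


def route(q):
    """Routing rule applied to the model's own Q1-Q5 answers."""
    if q["Q1"]:
        return "Fanciful"
    if not q["Q2"]:
        return "Arbitrary"
    if q["Q5"]:
        return "Generic"
    if q["Q4"]:
        return "Descriptive"
    if q["Q3"]:
        return "Suggestive"
    return None


def total_reward(q, final, truth):
    """Weighted sum of five illustrative components, each in [0, 1]."""
    distance = abs(SPECTRUM.index(final) - SPECTRUM.index(truth))
    ordinal = 1.0 - distance / (len(SPECTRUM) - 1)           # assumed linear decay
    key, want = DECISIVE[truth]
    decisive = float(q[key] == want)                          # decisive Q for the true label
    consistency = float(final == truth and decisive == 1.0)   # gated bonus
    routing = float(route(q) == final)                        # FINAL matches self-routing
    routed_truth = float(route(q) == truth)                   # Q-chain lands on the truth
    parts = [ordinal, decisive, consistency, routing, routed_truth]
    return sum(w * p for w, p in zip(WEIGHTS, parts))


q = {"Q1": False, "Q2": True, "Q3": True, "Q4": False, "Q5": False}
print(total_reward(q, "Suggestive", "Suggestive"))  # all five components are 1.0 here
```

With these weights the ordinal term outweighs any other single component, which matches its description as the dominant signal.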
300 steps, batch 128, 16 rollouts/example, LoRA r=16. Total compute: ~$12.
The full environment, reward functions, and synthetic training data are public at the Prime Intellect env page.
Model provider: DoodDood
Model tree
- Base: Qwen/Qwen3.5-4B
- Adapter: this model
Modalities
- Input: Video, Text, Image
- Output: Text