
License: apache-2.0


Results

On the 95-row LegalBench Abercrombie held-out test set (non-thinking inference, greedy decoding):

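| Category | Base Qwen3.5-4B | + Abercrombie-GRPO LoRA | Delta (pp) |
|---|---|---|---|
| Generic | 89% | 100% | +11 |
| Descriptive | 100% | 74% | -26 |
| Suggestive | 5% | 26% | +21 |
| Arbitrary | 0% | 47% | +47 |
| Fanciful | 5% | 95% | +90 |
| Overall | 40.0% | 68.4% | +28.4 |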

Mean ordinal distance between predicted and true labels (lower is better): 1.09 -> 0.53, roughly halved.
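Both metrics can be recomputed from a list of (true, predicted) label pairs. The card does not define the ordinal distance precisely. The sketch below assumes it is the absolute difference between positions on the spectrum Generic=0, Descriptive=1, Suggestive=2, Arbitrary=3, Fanciful=4. The function names are illustrative and are not part of the release.

```python
from collections import Counter

# Assumed ordinal positions on the Abercrombie spectrum (least to most distinctive).
SPECTRUM = ["Generic", "Descriptive", "Suggestive", "Arbitrary", "Fanciful"]
RANK = {label: i for i, label in enumerate(SPECTRUM)}


def per_class_accuracy(pairs):
    """Fraction of correct predictions for each true label."""
    total, correct = Counter(), Counter()
    for true, pred in pairs:
        total[true] += 1
        correct[true] += int(true == pred)
    return {label: correct[label] / total[label] for label in total}


def overall_accuracy(pairs):
    """Fraction of all pairs where the prediction equals the true label."""
    return sum(t == p for t, p in pairs) / len(pairs)


def mean_ordinal_distance(pairs):
    """Average |rank(true) - rank(pred)|; 0 means every label is exact."""
    return sum(abs(RANK[t] - RANK[p]) for t, p in pairs) / len(pairs)


# Toy example: one exact hit, one off-by-one miss, one off-by-two miss.
pairs = [("Generic", "Generic"), ("Fanciful", "Arbitrary"), ("Suggestive", "Generic")]
print(per_class_accuracy(pairs), overall_accuracy(pairs), mean_ordinal_distance(pairs))
```

Under this definition, a distance of 0.53 means predictions land about half a category away from the truth on average.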

Output format

The model is trained to emit exactly six lines and nothing else:

```markdown
Q1: [Yes/No]
Q2: [Yes/No]
Q3: [Yes/No]
Q4: [Yes/No]
Q5: [Yes/No]
FINAL_CLASSIFICATION: [Generic/Descriptive/Suggestive/Arbitrary/Fanciful]
```

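Each Qn is a doctrinal sub-question: Q1 = coined-term test, Q2 = semantic-relationship test, Q3 = imagination test, Q4 = immediate-conveyance test, Q5 = genus test. The routing rule (Q1=Yes -> Fanciful, else Q2=No -> Arbitrary, else Q5=Yes -> Generic, else Q4=Yes -> Descriptive, else Q3=Yes -> Suggestive) is baked into the system prompt. The rule checks Q5 and Q4 before Q3, not in numeric order. The prompt also requires exactly one of Q3 and Q4 to be Yes whenever Q2=Yes and Q5=No, so every valid answer set reaches a label.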
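Because the routing rule is a fixed decision list, it can be checked mechanically. In the minimal sketch below, `route` applies the five steps in the prompt's order. The enumeration then confirms that every Yes/No combination allowed by the Q3/Q4 constraint reaches a label. Both function names are hypothetical helpers, not code shipped with the adapter.

```python
from itertools import product
from typing import Optional


def route(q1: bool, q2: bool, q3: bool, q4: bool, q5: bool) -> Optional[str]:
    """Apply the system prompt's routing rule; True means "Yes"."""
    if q1:
        return "Fanciful"
    if not q2:
        return "Arbitrary"
    if q5:
        return "Generic"
    if q4:
        return "Descriptive"
    if q3:
        return "Suggestive"
    return None  # only reachable when the Q3/Q4 constraint is violated


def satisfies_constraint(q1: bool, q2: bool, q3: bool, q4: bool, q5: bool) -> bool:
    """Prompt constraint: if Q2=Yes and Q5=No, exactly one of Q3, Q4 is Yes.

    q1 is unused; it is kept so this takes the same five answers as route().
    """
    return q3 != q4 if (q2 and not q5) else True


# Enumerate all 32 answer sets and keep those that respect the constraint.
combos = [qs for qs in product([True, False], repeat=5) if satisfies_constraint(*qs)]
assert all(route(*qs) is not None for qs in combos)
```

Only the four combinations with Q2=Yes, Q5=No and Q3 equal to Q4 fall through unrouted, and the constraint excludes exactly those.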

Usage

1. Install

```bash
pip install transformers accelerate peft torch
```

2. System prompt (required - do not modify)

```python
SYSTEM_PROMPT = """You are a trademark distinctiveness classifier. Given a mark and the goods or services it identifies, classify the mark on the Abercrombie spectrum: Generic, Descriptive, Suggestive, Arbitrary, or Fanciful.
Answer five questions about the mark, then provide a final classification. Evaluate each question in relation to the specific goods or services and the relevant purchasing public. Treat the mark as a whole; do not decompose compound marks into separate components.
Q1 - Coined Term Test. Is the mark an invented term created solely for trademark use, with no prior independent meaning?
Q2 - Semantic Relationship Test. Does the mark's ordinary dictionary meaning have any plausible semantic relationship to the goods or services?
Q3 - Imagination Test. Must the consumer use imagination, thought, or a multi-step mental process to connect the mark to the nature of the goods or services?
Q4 - Immediate Conveyance Test. Does the mark immediately convey an idea of a feature, quality, function, ingredient, or characteristic of the goods or services to the relevant purchasing public?
Q5 - Genus Test. Does the relevant purchasing public understand the mark primarily as the name of the general category of goods or services, rather than as an indicator of source?
When Q2=Yes and Q5=No, exactly one of Q3 or Q4 must be Yes: a semantically-related, non-generic mark is either descriptively immediate or suggestively imaginative, never neither.
Apply this routing rule to determine the final classification:
- If Q1 = Yes, classify as Fanciful
- Else if Q2 = No, classify as Arbitrary
- Else if Q5 = Yes, classify as Generic
- Else if Q4 = Yes, classify as Descriptive
- Else if Q3 = Yes, classify as Suggestive
Respond in exactly this format with no other text:
Q1: [Yes/No]
Q2: [Yes/No]
Q3: [Yes/No]
Q4: [Yes/No]
Q5: [Yes/No]
FINAL_CLASSIFICATION: [Generic/Descriptive/Suggestive/Arbitrary/Fanciful]"""
```

3. Load and run

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "Qwen/Qwen3.5-4B"
LORA = "DoodDood/abercrombie-grpo"
dtype = torch.bfloat16

tok = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=dtype, device_map="auto")
model = PeftModel.from_pretrained(model, LORA)  # attach the GRPO-trained LoRA adapter
model.eval()


def classify(mark_and_goods: str) -> str:
    # SYSTEM_PROMPT is the exact string defined in step 2.
    msgs = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": mark_and_goods},
    ]
    # Non-thinking mode: the adapter was trained without thinking traces.
    prompt = tok.apply_chat_template(
        msgs, tokenize=False, add_generation_prompt=True,
        enable_thinking=False,
    )
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        # Greedy decoding; 128 new tokens leaves ample room for the six-line answer.
        out = model.generate(
            **inputs, max_new_tokens=128, do_sample=False,
            pad_token_id=tok.eos_token_id,
        )
    # Decode only the newly generated tokens, not the echoed prompt.
    return tok.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)


# Input format: `The mark "X" for Y.` (matches LegalBench phrasing)
print(classify('The mark "Kodak" for cameras.'))
# Expected: Q1: Yes, Q2-Q5: No, FINAL_CLASSIFICATION: Fanciful
print(classify('The mark "Apple" for personal computers.'))
# Expected: Q1: No, Q2: No, ..., FINAL_CLASSIFICATION: Arbitrary
print(classify('The mark "Salt" for packages of sodium chloride.'))
# Expected: Q1: No, Q2: Yes, ..., Q5: Yes, FINAL_CLASSIFICATION: Generic
# (Generic is only reachable with Q2=Yes; Q2=No would route to Arbitrary.)
```

Important caveats

  • Don't modify the system prompt. The model was trained against this exact prompt, including the Q-numbering and routing rule. Changes will degrade output.
  • Always use enable_thinking=False. The adapter was trained on non-thinking forward passes; thinking-mode inference produces unreliable outputs.
  • Greedy decoding only. Sampling adds noise to a strict-format task, so use do_sample=False.
  • Phrase the input as `The mark "X" for Y.` This matches the LegalBench surface form the model was trained on; other phrasings may work but are not guaranteed.
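The caveats about input phrasing and strict output format suggest two small programmatic checks:

- `make_input` builds inputs in the exact surface form.
- `parse_output` rejects anything that is not the six documented lines.

Both helpers are hypothetical conveniences, not part of the adapter release.

```python
import re
from typing import Dict, Optional, Tuple

LABELS = {"Generic", "Descriptive", "Suggestive", "Arbitrary", "Fanciful"}
QUESTION_KEYS = ["Q1", "Q2", "Q3", "Q4", "Q5"]


def make_input(mark: str, goods: str) -> str:
    """Build the LegalBench-style surface form: The mark "X" for Y."""
    return f'The mark "{mark}" for {goods}.'


def parse_output(text: str) -> Optional[Tuple[Dict[str, bool], str]]:
    """Return ({"Q1": True, ...}, label), or None if the format is violated."""
    lines = text.strip().splitlines()
    if len(lines) != 6:
        return None
    answers = {}
    # Lines 1-5 must be "Qn: Yes" or "Qn: No", in order.
    for key, line in zip(QUESTION_KEYS, lines[:5]):
        m = re.fullmatch(rf"{key}: (Yes|No)", line.strip())
        if m is None:
            return None
        answers[key] = m.group(1) == "Yes"
    # Line 6 must name one of the five Abercrombie categories.
    m = re.fullmatch(r"FINAL_CLASSIFICATION: (\w+)", lines[5].strip())
    if m is None or m.group(1) not in LABELS:
        return None
    return answers, m.group(1)
```

A parsed result can also be compared against the routing rule in the system prompt, to flag outputs whose FINAL_CLASSIFICATION disagrees with their own Q answers.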

Method

Trained with reinforcement learning on Prime Intellect's hosted RL platform, using the Verifiers framework, on a custom synthetic dataset: 2,100 marks balanced across the 5 classes. The data generator uses a blacklist that excludes every LegalBench test mark, so there is no train/test contamination.
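The contamination guard amounts to a set-membership filter on normalized mark strings. The minimal sketch below assumes case- and whitespace-insensitive matching; the actual generator's normalization is not documented here. It also adds a simple class-balance check. `normalize`, `filter_candidates`, `is_balanced` and the sample marks are illustrative only.

```python
from collections import Counter


def normalize(mark: str) -> str:
    # Assumed normalization: case-fold and collapse internal whitespace.
    return " ".join(mark.casefold().split())


def filter_candidates(candidates, blacklist):
    """Drop generated (mark, label) pairs whose mark appears in the blacklist."""
    banned = {normalize(m) for m in blacklist}
    return [(mark, label) for mark, label in candidates if normalize(mark) not in banned]


def is_balanced(examples, n_classes=5):
    """True if exactly n_classes labels occur and all occur equally often."""
    counts = Counter(label for _, label in examples)
    return len(counts) == n_classes and len(set(counts.values())) == 1


# Illustrative data: the blacklist catches both spellings of the same mark.
candidates = [("Kodak", "Fanciful"), ("Zorblex", "Fanciful"), ("  KODAK ", "Fanciful")]
print(filter_candidates(candidates, ["kodak"]))
```

With 2,100 marks and 5 classes, an exact balance would mean 420 marks per class.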

Reward stack (5 functions, weighted 1.0 / 0.3 / 0.2 / 0.15 / 0.3 in the order below):

  1. Ordinal accuracy on the final label - distance-based; the dominant signal.
  2. Decisive Q - scores only the sub-question that is dispositive for the true label.
  3. Consistency bonus - awarded only when the final label is correct AND the decisive Q matches.
  4. Routing consistency - the stated FINAL_CLASSIFICATION matches the label that the model's own Q1-Q5 answers route to.
  5. Routed truth - the model's own Q-chain, routed through the rule, lands on the true label.
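The reward stack can be sketched as a weighted sum of per-component scores. This is an illustrative reconstruction, not the published environment code, and it rests on three assumptions:

- The ordinal score is linear, 1 - d/4.
- The other four components are binary.
- Each label's decisive Q is the step of the routing rule that assigns it.

```python
SPECTRUM = ["Generic", "Descriptive", "Suggestive", "Arbitrary", "Fanciful"]
WEIGHTS = [1.0, 0.3, 0.2, 0.15, 0.3]  # same order as the five listed components

# Routing rule as an ordered decision list: (question, triggering answer, label).
RULES = [("Q1", True, "Fanciful"), ("Q2", False, "Arbitrary"), ("Q5", True, "Generic"),
         ("Q4", True, "Descriptive"), ("Q3", True, "Suggestive")]
# Assumed decisive Q per label: the rule step that assigns that label.
DECISIVE = {label: (key, want) for key, want, label in RULES}


def route(q):
    """First matching rule wins; q maps "Q1".."Q5" to bool (True = Yes)."""
    return next((label for key, want, label in RULES if q[key] == want), None)


def total_reward(q, final, true):
    """Weighted sum of the five components; assumes final is a valid label."""
    # 1. Ordinal accuracy (assumed linear in distance on the 5-point scale).
    ordinal = 1.0 - abs(SPECTRUM.index(final) - SPECTRUM.index(true)) / 4
    # 2. Decisive Q: only the sub-question dispositive for the true label is scored.
    key, want = DECISIVE[true]
    decisive = float(q[key] == want)
    # 3. Consistency bonus: correct final label AND correct decisive Q.
    bonus = float(final == true and decisive == 1.0)
    # 4. Routing consistency: stated FINAL matches the model's own routing.
    routing = float(route(q) == final)
    # 5. Routed truth: the model's own Q-chain routes to the true label.
    routed_truth = float(route(q) == true)
    scores = [ordinal, decisive, bonus, routing, routed_truth]
    return sum(w * s for w, s in zip(WEIGHTS, scores))
```

Under these assumptions, a fully correct, self-consistent answer scores the sum of the weights (1.95). A self-consistent but wrong answer still collects the routing-consistency term, which is one reason the ordinal term carries the largest weight.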

300 steps, batch 128, 16 rollouts/example, LoRA r=16. Total compute: ~$12.

The full environment, reward functions, and synthetic training data are public on the Prime Intellect environment page.

Model provider

DoodDood

Model tree

Base

Qwen/Qwen3.5-4B

Adapter

this model

Modalities

Input

Video, Text, Image

Output

Text


Supported Functionality

Model APIs

Dedicated Endpoints

Container
