Orionfold

patent-strategist-v3-unsloth

README

License: apache-2.0

What this model does

Offline patent-prosecution reasoning on Spark-class hardware

Patent prosecution work — claim construction, MPEP-grounded office-action responses, Markush analysis, doctrine-of-equivalents reasoning — happens inside firms that can't ship privileged client text to a hosted frontier API. This release distills DeepSeek-R1's chain-of-thought reasoning onto a 5,000-row synthetic patent-reasoning corpus so a single Spark-class box can run the workflow offline, with full IRAC-shaped reasoning chains.

Use cases:

Claim construction (Markush groups, doctrine of equivalents)
MPEP-grounded office-action argument drafting
Prior-art relevance + non-obviousness reasoning chains
Patent-licensing scenario analysis (most-favored-licensee, FTO)

Who this is for: Patent attorneys, prosecution-team engineers, and IP-strategy teams running privileged workflows offline on Spark-class hardware (GB10, 128 GB unified memory) or comparable edge devices.

Notebooks

Two runnable notebooks ship with this model — open either on a free cloud GPU:

Notebook	What it does	Open
Builder	Reproduce this model's build and DGX Spark benchmarks end-to-end with `fieldkit`.
User	Load the published model and call it from your own app in a few lines.

Choosing this lane

Unsloth-trained BF16 merged weights. Pick this lane if you want to continue training in Unsloth's 4-bit QLoRA workflow or if you need transformers-format weights for inference paths outside llama.cpp. The bakeoff measured a training wall time of 7h 34m on this lane (vs 5h 38m on NeMo Framework), with a probe think rate of 0.80 and a mean chain length of 916 tokens. For pure inference on Spark-class hardware, the GGUF sibling is faster; for the bakeoff-winning checkpoint, see the NeMo lane.

Spark measurements (BF16 merged):

Variant	Size	Train wall	Probe think rate	Mean chain
BF16	15.26 GB	7h 34m	0.80	916 tok
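
To put the measurements in proportion, the short sketch below works out the wall-time ratio between the two lanes and the share of a 128 GB unified-memory box taken up by the BF16 weights. The figures are copied from this card; the helper and variable names are illustrative only, and the memory share covers weights alone (no KV cache or activations).

```python
# Figures copied from the card; names here are illustrative, not part of any tool.
def to_minutes(hours: int, minutes: int) -> int:
    """Convert an 'Xh Ym' wall time to whole minutes."""
    return hours * 60 + minutes

unsloth_wall = to_minutes(7, 34)  # Unsloth lane training wall time
nemo_wall = to_minutes(5, 38)     # NeMo Framework lane training wall time
wall_ratio = unsloth_wall / nemo_wall

bf16_size_gb = 15.26      # merged BF16 weights
unified_memory_gb = 128   # Spark-class (GB10) unified memory
weight_fraction = bf16_size_gb / unified_memory_gb  # weights only

print(f"Unsloth wall time is {wall_ratio:.2f}x the NeMo wall time")
print(f"BF16 weights occupy {weight_fraction:.1%} of unified memory")
```

On these numbers the Unsloth lane takes roughly a third longer to train, and the weights use a little under an eighth of unified memory.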

How to run

HuggingFace Transformers:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Orionfold/patent-strategist-v3-unsloth"
tok = AutoTokenizer.from_pretrained(model_id)
# Load the merged BF16 weights and place them on the available device(s).
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The turn markers are written inline rather than via a chat template.
prompt = (
    "<｜User｜>A patent claim recites \"a fastener selected from the group consisting "
    "of bolts, screws, and rivets.\" Walk through the Markush-group construction "
    "and explain how doctrine of equivalents applies to a magnetic snap.<｜Assistant｜>"
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
# Set do_sample=True explicitly so temperature and top_p are applied
# regardless of the saved generation config.
out = model.generate(
    **inputs, max_new_tokens=1024, do_sample=True, temperature=0.6, top_p=0.95
)
# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```
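
Because the model emits a reasoning chain before its answer, downstream code will often want the two parts separately. The following is a minimal sketch, assuming the chain is delimited by `<think>` and `</think>` tags as in the DeepSeek-R1 family and that those tags survive decoding; `split_reasoning` is a hypothetical helper, not something shipped with this model.

```python
# Assumption: the reasoning chain ends at a literal "</think>" tag, and the
# opening "<think>" tag may or may not be present in the decoded text.
def split_reasoning(text: str) -> tuple[str, str]:
    """Return (chain, answer); chain is empty when no closing tag is found."""
    head, sep, tail = text.partition("</think>")
    if not sep:
        # No closing tag: treat the whole text as the answer.
        return "", text.strip()
    chain = head.replace("<think>", "", 1).strip()
    return chain, tail.strip()

sample = (
    "<think>The claim uses closed 'consisting of' language.</think>"
    "A magnetic snap is not literally within the recited group."
)
chain, answer = split_reasoning(sample)
print("chain:", chain)
print("answer:", answer)
```

If the decoded output contains no closing tag, the helper returns the full text as the answer, so callers can detect a missing chain by checking for an empty first element.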

Methods

Full methodology and Spark-side measurement protocol: Two paths to the same chain — Unsloth vs NeMo Framework on Spark.

Known drift

Bounded limitations observed during Spark-side measurement. Each item below names the artifact and the scope of the drift; the balance of the bench measures clean — see Methods for the full breakdown.

"metes-and-times" terminology: Two known terminology drifts inherited from the v3 synthetic corpus; the balance of probe answers (~99%) cite real MPEP sections. The correct legal term in claim construction is "metes and bounds".
Fabricated MPEP §2163.05(s) citation: Same scope as the item above; a corpus-generator artifact, not a model-wide hallucination pattern. Real §2163.05 has subsections (a)–(f) on written-description support; subsection (s) does not exist.
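
Since both artifacts are literal strings, generated text can be screened for them before it reaches a reviewer. The sketch below is one possible check under that assumption; `KNOWN_DRIFT` and `flag_known_drift` are hypothetical names, and the patterns cover only the two items listed above, not hallucinations in general.

```python
import re

# Patterns for the two drift artifacts named in this card (illustrative only).
KNOWN_DRIFT = {
    "metes-and-times (should read 'metes and bounds')": re.compile(
        r"metes[-\s]and[-\s]times", re.IGNORECASE
    ),
    "fabricated MPEP §2163.05(s) citation": re.compile(r"2163\.05\s*\(s\)"),
}

def flag_known_drift(text: str) -> list[str]:
    """Return the labels of every known drift artifact found in text."""
    return [label for label, pattern in KNOWN_DRIFT.items() if pattern.search(text)]

draft = "The metes-and-times of claim 1 are supported under MPEP §2163.05(s)."
for label in flag_known_drift(draft):
    print("flagged:", label)
```

A clean result from this check only means the two known artifacts are absent; citations still need ordinary attorney review.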

Other Orionfold variants

Sibling repos from the same release:

Variant	Lane	Format
`Orionfold/patent-strategist-v3-unsloth`	Unsloth	BF16 (transformers)
`Orionfold/patent-strategist-v3-unsloth-GGUF`	Unsloth	GGUF (llama.cpp)
`Orionfold/patent-strategist-v3-nemo`	NeMo Framework

Published by Orionfold LLC · orionfold.com · Methods documented at ainative.business/field-notes.

Model Details

Model Provider

Orionfold

Model Tree

Base

deepseek-ai/DeepSeek-R1-0528-Qwen3-8B

Fine-tuned

this model

Input Modalities

Text

Output Modalities