License: apache-2.0

Current Status

Training completed on May 27, 2026. This card now includes the audited final run findings and W&B-style result charts.

  • HF repo: HarleyCooper/Qwen3.6-35B-A3B-Dakota1890-GRPO
  • Base model: Qwen/Qwen3.6-35B-A3B
  • Training platform: Thinking Machines Tinker
  • Method: GRPO-style RL with a custom Dakota grammar verifier
  • Adapter type: LoRA, rank 32
  • W&B project: christian-cooper-us/dakota-rl-grammar
  • Completed full run: owf98569
  • Reward-channel pilot: d44bra91
  • Thinking Machines cost: $68.75
  • Tokens processed: 82.05 million
  • Final Tinker sampler path: tinker://1f23df9c-5d88-59d9-a7e8-dd4e169ea7d0:train:0/sampler_weights/final
  • Final Tinker state path: tinker://1f23df9c-5d88-59d9-a7e8-dd4e169ea7d0:train:0/weights/final
  • Inference adapter weights: adapter_model.safetensors
  • Adapter config: adapter_config.json

The reward-channel pilot completed before the full run and cost about $0.26. It verified that the repaired environment can emit nonzero pattern_raw and exact_match_raw channels locally and in W&B before scaling up.

Final Run Findings

[Figure: Dakota1890 full run dashboard]

The full run logged 199 metric rows, ending at training step 198. It cost $68.75 in Thinking Machines credits and processed 82.05 million tokens. The final audit found:

  • composite reward improved from 0.1664 to 0.2297;
  • character-overlap reward improved from 0.1424 to 0.4027;
  • affix reward stayed high and ended at 1.0000;
  • all-task pattern_raw was nonzero in 186 of 199 logged training rows;
  • identify_pattern pattern reward reached 0.90625 and was nonzero in 179 of 199 rows;
  • eval pattern_raw remained nonzero, ending at 0.0586;
  • exact-match reward stayed at 0.0 throughout the mixed-task run;
  • composite_diff stayed exactly 0.0, confirming that the emitted ledger reconstructs the scalar reward.

The key result is that the repaired pattern channel is live in a full paid Tinker run. Exact match remains a task-design and prompting problem for short answer-only completions, not a reward-plumbing failure.

The machine-readable summary and markdown findings are included in analysis/final_run_summary.json and analysis/FINAL_RUN_FINDINGS.md.
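The audit figures above can be turned into relative gains and coverage shares with a few lines of arithmetic. This is a minimal sketch that uses only numbers quoted in this card; the helper names (`relative_gain`, `nonzero_share`) are illustrative and are not part of the project's analysis code.

```python
# Figures copied from the final audit in this card.
ROWS_LOGGED = 199
FINAL_STEP = 198
COST_USD = 68.75
TOKENS_MILLIONS = 82.05


def relative_gain(start: float, end: float) -> float:
    """Relative improvement of `end` over `start`, as a fraction of `start`."""
    return (end - start) / start


def nonzero_share(nonzero_rows: int, total_rows: int = ROWS_LOGGED) -> float:
    """Share of logged rows in which a reward channel was nonzero."""
    return nonzero_rows / total_rows


summary = {
    # 199 rows ending at step 198 is consistent with zero-indexed steps.
    "rows_match_zero_indexed_steps": FINAL_STEP == ROWS_LOGGED - 1,
    "composite_gain": relative_gain(0.1664, 0.2297),
    "char_overlap_gain": relative_gain(0.1424, 0.4027),
    "pattern_nonzero_share_all_tasks": nonzero_share(186),
    "pattern_nonzero_share_identify_pattern": nonzero_share(179),
    "usd_per_million_tokens": COST_USD / TOKENS_MILLIONS,
}

for name, value in summary.items():
    if isinstance(value, float):
        print(f"{name}: {value:.4f}")
    else:
        print(f"{name}: {value}")
```

Read this way, the composite reward rose by roughly 38% relative to its starting value, while character overlap nearly tripled; the pattern channel was nonzero in well over 85% of logged rows in both views.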

[Figures: composite reward progression; pattern reward channel; reward components]

Source Lineage

Dakota1890 is built around historical Dakota language source material rather than generic web text. The local project contains the primary Riggs scan, 440 scanned page images, public visual artifacts used for documentation, extracted grammar rules, extracted dictionary vocabulary, generated RL tasks, and W&B/Tinker run logs.

Source and extraction inventory:

  • primary source PDF: grammardictionar00riggrich.pdf
  • JP2 source scans: 440 local page images under Dictionary/grammardictionar00riggrich_jp2
  • processed page images: 440 JPG conversions under data/processed_images
  • page-layout manifest: 440 rows, including 345 two-column pages and 84 single-column pages
  • grammar extraction: pages 1-92, with no missing pages in that intended range
  • dictionary extraction: pages 95-430, with no missing pages in that intended range
  • verified dictionary entries: 24,224
  • median dictionary entries per extracted page: 63
  • average extraction confidence: about 0.9245
  • first and last verified headwords: a through zig'zag

The current RL grammar dataset contains:

  • 10,576 total packaged RL tasks
  • 1,497 extracted grammar-rule records
  • 684 grammar rules in the grammar extraction pass
  • 350 interlinear texts in the grammar extraction pass
  • 396 linguistic terms in the grammar extraction pass
  • 33 special-character forms observed in the grammar extraction pass
  • 1,497 rows with pattern-bearing verification metadata
  • 514 rows with affix metadata
  • median reference answer length of about 4 words

Task family counts in the packaged dataset:

| Task family | Count |
| --- | --- |
| word_translation | 2,879 |
| reverse_translation | 2,137 |
| morphology | 1,934 |
| identify_pattern | 1,497 |
| positive_negative_evidence | 584 |
| exception_trigger | 584 |
| syntax | 420 |
| sentence_translation | 316 |
| affix_insertion | 115 |
| multi_step_morphology | 110 |
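As a quick consistency check, the task family counts can be summed and compared with the 10,576 packaged tasks reported earlier. The sketch below uses only the counts from this card; the variable names are illustrative.

```python
# Task family counts as listed in this card.
TASK_FAMILY_COUNTS = {
    "word_translation": 2879,
    "reverse_translation": 2137,
    "morphology": 1934,
    "identify_pattern": 1497,
    "positive_negative_evidence": 584,
    "exception_trigger": 584,
    "syntax": 420,
    "sentence_translation": 316,
    "affix_insertion": 115,
    "multi_step_morphology": 110,
}

EXPECTED_TOTAL = 10_576  # total packaged RL tasks reported in this card

total = sum(TASK_FAMILY_COUNTS.values())
assert total == EXPECTED_TOTAL, f"family counts sum to {total}"

# Share of the dataset per task family, largest first.
for family, count in sorted(TASK_FAMILY_COUNTS.items(), key=lambda kv: -kv[1]):
    print(f"{family:28s} {count:6,d}  {count / total:6.1%}")
```

The identify_pattern count (1,497) also equals the number of rows with pattern-bearing verification metadata, which is why that family is the natural place to watch the pattern reward channel.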

The dictionary extraction and synthetic Q&A generation work are adjacent dataset-building tracks. This Tinker RL run does not require OpenAI SFT data or a synthetic dictionary Q&A dump to start; it trains against the existing grammar-task environment and its reward function. The dictionary artifacts remain important for future evaluation, documentation, and broader Dakota lexical coverage.

Visual Project Artifacts

These local project images are included to preserve the visual record of the build alongside the model card:

| Artifact | Preview |
| --- | --- |
| Grammar source page | [image: Grammar source page] |
| Dictionary source page | [image: Dictionary source page] |
| Training screenshot | [image: Training screenshot] |

What Changed For This Run

Earlier public runs showed metrics/pattern_reward = 0.0 and metrics/exact_match_reward = 0.0 throughout the public W&B surface. The pattern channel was a real plumbing bug: the packaged dataset stores metadata under entry["info"], while the environment had been reading top-level fields such as entry["verification_pattern"], entry["task_type"], and entry["difficulty"].

That has been fixed. The environment now preserves:

  • task type
  • difficulty
  • rule id
  • hints
  • verification pattern, including info.pattern
  • special-character and affix metadata
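The fix described above amounts to reading metadata from `entry["info"]` instead of from top-level keys. The following is a minimal sketch of that idea, not the project's actual environment code: the function name `read_task_metadata`, the key names `rule_id`, `hints`, `special_chars`, and `required_affixes`, and the sample entry are all assumptions made for illustration. Only `info`, `task_type`, `difficulty`, `verification_pattern`, and `info.pattern` are named in this card.

```python
from typing import Any

# Keys named in this card plus HYPOTHETICAL names for the remaining metadata.
METADATA_KEYS = (
    "task_type",
    "difficulty",
    "verification_pattern",
    "rule_id",           # hypothetical key name
    "hints",             # hypothetical key name
    "special_chars",     # hypothetical key name
    "required_affixes",  # hypothetical key name
)


def read_task_metadata(entry: dict[str, Any]) -> dict[str, Any]:
    """Collect metadata, preferring entry["info"] over top-level fields."""
    info = entry.get("info") or {}
    metadata: dict[str, Any] = {}
    for key in METADATA_KEYS:
        if key in info:
            metadata[key] = info[key]
        elif key in entry:  # fallback for older, flat entries
            metadata[key] = entry[key]
    # The verification pattern may be stored as info["pattern"].
    if "verification_pattern" not in metadata and "pattern" in info:
        metadata["verification_pattern"] = info["pattern"]
    return metadata


# Placeholder entry shaped like the packaged dataset (values are made up).
entry = {
    "question": "<prompt text>",
    "answer": "<reference answer>",
    "info": {"task_type": "identify_pattern", "difficulty": "easy", "pattern": "<pattern>"},
}

print(entry.get("verification_pattern"))  # the old top-level read finds nothing
print(read_task_metadata(entry))
```

A top-level read silently returns nothing for entries shaped like this, which is consistent with a pattern reward that stays at 0.0 without raising any error.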

The full run now logs the composite reward ledger under namespaces such as env/all/ledger/*, env/task/identify_pattern/ledger/*, and difficulty-specific ledgers. Early full-run metrics already showed nonzero pattern reward on identify_pattern tasks with composite_diff = 0.0, meaning the logged scalar reward matches the reconstructed component ledger.
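To make the namespace layout concrete, here is a small sketch of how one ledger might be fanned out into metric keys. The `env/all/ledger/*` and `env/task/<task>/ledger/*` prefixes come from this card; the `env/difficulty/<level>/ledger/*` prefix, the function name, and the ledger field names are assumptions, and a real run would aggregate values over a batch rather than log a single rollout.

```python
def ledger_metric_keys(
    task_type: str, difficulty: str, ledger: dict[str, float]
) -> dict[str, float]:
    """Map one ledger onto the all-task, per-task, and per-difficulty namespaces."""
    scopes = (
        "env/all",
        f"env/task/{task_type}",
        f"env/difficulty/{difficulty}",  # HYPOTHETICAL prefix for difficulty ledgers
    )
    metrics: dict[str, float] = {}
    for scope in scopes:
        for field, value in ledger.items():
            metrics[f"{scope}/ledger/{field}"] = value
    return metrics


# Illustrative ledger values only; field names are assumptions.
example = ledger_metric_keys(
    task_type="identify_pattern",
    difficulty="easy",
    ledger={"pattern_raw": 1.0, "composite_diff": 0.0},
)
for key in sorted(example):
    print(key, example[key])
```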

Reward Function

The Dakota grammar verifier uses a composite reward:

| Component | Weight | Purpose |
| --- | --- | --- |
| exact match | 40% | Rewards short answers that exactly match rigid references |
| character overlap | 20% | Rewards lexical and orthographic overlap with the reference |
| pattern match | 15% | Rewards structural matches for rule-bearing tasks |
| affix accuracy | 10% | Rewards required Dakota affix behavior |
| length control | 15% | Penalizes verbose completions when the gold answer is short |

Difficulty multipliers are applied after the component sum. The run logs raw component values, normalized values, weights, weighted contributions, reconstructed composites, final reward scalar, and composite_diff for auditability.
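The weighting and the composite_diff audit can be sketched as follows. The weights come from the table above; everything else is an assumption for illustration: the multiplier values, the component and ledger field names, the convention that each component is a score in [0, 1], and the definition of composite_diff as the absolute gap between the logged scalar and the value rebuilt from the ledger. The verifier's real code may differ.

```python
# Component weights from the reward table in this card.
WEIGHTS = {
    "exact_match": 0.40,
    "char_overlap": 0.20,
    "pattern": 0.15,
    "affix": 0.10,
    "length": 0.15,
}

# HYPOTHETICAL multiplier values: the card says multipliers are applied
# after the component sum but does not state the numbers.
DIFFICULTY_MULTIPLIER = {"easy": 1.0, "medium": 1.2, "hard": 1.5}


def build_ledger(components: dict[str, float], difficulty: str) -> dict[str, float]:
    """Record raw values, weighted contributions, composite, and final reward."""
    weighted = {name: WEIGHTS[name] * components[name] for name in WEIGHTS}
    ledger = {f"{name}_raw": components[name] for name in WEIGHTS}
    ledger.update({f"{name}_weighted": value for name, value in weighted.items()})
    ledger["composite"] = sum(weighted.values())
    ledger["multiplier"] = DIFFICULTY_MULTIPLIER[difficulty]
    ledger["reward"] = ledger["composite"] * ledger["multiplier"]
    return ledger


def composite_diff(logged_reward: float, ledger: dict[str, float]) -> float:
    """Absolute gap between the logged reward and the ledger reconstruction."""
    rebuilt = sum(ledger[f"{name}_weighted"] for name in WEIGHTS) * ledger["multiplier"]
    return abs(logged_reward - rebuilt)


# Illustrative component scores, not values taken from the run.
scores = {"exact_match": 0.0, "char_overlap": 0.4, "pattern": 1.0, "affix": 1.0, "length": 1.0}
ledger = build_ledger(scores, difficulty="easy")
print(round(ledger["composite"], 4))
print(composite_diff(ledger["reward"], ledger))
```

Because exact match carries 40% of the weight, a completion that scores perfectly on every other component still tops out at 0.60 before the difficulty multiplier, which helps explain why composite reward stays modest while exact match is 0.0.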

Exact match is intentionally strict. In the older public 0.6B run, completions were often much longer than the reference answers, while the packaged dataset has a median reference answer length of about four words. That makes exact match a behavioral and prompting problem, not evidence that the exact-match function is dead. The new run keeps generation short (max tokens 128) and logs enough ledger detail to diagnose whether exact-match-sensitive task families are improving.
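A small sketch shows why verbose completions fail a strict exact-match check even when they contain the right answer. The normalization used here (Unicode NFC plus whitespace collapsing, case-sensitive) and the word-count ratio are assumptions chosen for illustration; this card does not specify how the verifier normalizes text, and the strings below are English placeholders, not Dakota data.

```python
import unicodedata


def normalize(text: str) -> str:
    """Assumed normalization: NFC form with runs of whitespace collapsed."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def exact_match(completion: str, reference: str) -> float:
    """1.0 only when the whole completion equals the reference after normalization."""
    return 1.0 if normalize(completion) == normalize(reference) else 0.0


def length_ratio(completion: str, reference: str) -> float:
    """Completion length in words relative to the reference length."""
    reference_words = max(len(reference.split()), 1)
    return len(completion.split()) / reference_words


reference = "short reference answer here"  # four words, like the dataset median
concise = "  short reference answer here "
verbose = "The answer is: short reference answer here, because the rule applies."

for label, completion in (("concise", concise), ("verbose", verbose)):
    print(label, exact_match(completion, reference), round(length_ratio(completion, reference), 2))
```

Under these assumptions the verbose completion contains the reference verbatim but still scores 0.0, so the signal to watch is completion length relative to the reference rather than the exact-match function itself.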

Training Configuration

The full run was launched with:

| Setting | Value |
| --- | --- |
| model | Qwen/Qwen3.6-35B-A3B |
| batches planned | 199 |
| batch size | 48 |
| group size | 16 |
| max tokens | 128 |
| temperature | 0.5 |
| learning rate | 4e-5 |
| LoRA rank | 32 |
| eval interval | 20 batches |
| save interval | 20 batches |
| W&B sync | enabled |
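For reference, the settings above can be captured in a small config object and used to estimate rollout volume. This is a sketch, not the launch script: the `RunConfig` class and its field names are hypothetical, and it assumes that batch size counts prompts per batch while group size counts sampled completions per prompt, which is a common GRPO convention but is not confirmed by this card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical container for the settings listed in this card."""

    model: str = "Qwen/Qwen3.6-35B-A3B"
    batches_planned: int = 199
    batch_size: int = 48      # assumed: prompts per batch
    group_size: int = 16      # assumed: sampled completions per prompt
    max_tokens: int = 128
    temperature: float = 0.5
    learning_rate: float = 4e-5
    lora_rank: int = 32
    eval_interval: int = 20
    save_interval: int = 20
    wandb_sync: bool = True

    @property
    def rollouts_per_batch(self) -> int:
        return self.batch_size * self.group_size

    @property
    def total_rollouts(self) -> int:
        return self.rollouts_per_batch * self.batches_planned

    @property
    def max_sampled_tokens(self) -> int:
        """Upper bound on generated tokens if every rollout hits max_tokens."""
        return self.total_rollouts * self.max_tokens


config = RunConfig()
print(config.rollouts_per_batch)
print(config.total_rollouts)
print(config.max_sampled_tokens)
```

Under that reading, each batch contains 768 rollouts, and the generated-token upper bound is well below the 82.05 million tokens processed, so the reported total evidently counts more than sampled completion tokens.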

The local runtime gate passed before launch with current Tinker, Tinker cookbook, W&B, Gemini, tokenizer, and reward-channel smoke checks.

Intended Use

This adapter is intended for research and tool-building around historical Dakota grammar tasks:

  • grammar-rule drills
  • short translation and reverse-translation tasks
  • morphology and affix experiments
  • verifier-driven RL experiments for low-resource language work
  • reproducible study of reward components for historical grammar sources

It is not intended as a standalone Dakota language authority, a substitute for community language expertise, or a production translation system.

Limitations And Ethical Notes

The source material is a historical grammar and dictionary published in 1890. It reflects the terminology, analysis, orthography, and colonial-era framing of its time. Outputs from this model can inherit mistakes, omissions, and outdated descriptions from the source extraction process and from the base model.

Dakota language work should be reviewed with appropriate community and linguistic expertise. This repository should be treated as an experimental technical artifact: useful for transparent research, not authoritative for teaching, cultural interpretation, or official translation.

Usage

The Tinker final sampler checkpoint is available now for direct Tinker sampling:

```text
tinker://1f23df9c-5d88-59d9-a7e8-dd4e169ea7d0:train:0/sampler_weights/final
```

The Hugging Face PEFT adapter for this run lives in this repository (HarleyCooper/Qwen3.6-35B-A3B-Dakota1890-GRPO). Use it against the base model Qwen/Qwen3.6-35B-A3B with a standard PEFT loading path:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_name = "Qwen/Qwen3.6-35B-A3B"
adapter_name = "HarleyCooper/Qwen3.6-35B-A3B-Dakota1890-GRPO"

# Load the base model, then attach the LoRA adapter from this repository.
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = PeftModel.from_pretrained(model, adapter_name)

messages = [
    {"role": "system", "content": "Answer Dakota grammar tasks concisely. Return only the answer."},
    {"role": "user", "content": "Translate 'my elder brother' to Dakota. Return only the answer."},
]

# Render the chat template to a prompt string, then tokenize it.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Greedy decoding with a short budget, matching the short-answer task format.
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

Citation

Primary source:

Riggs, Stephen Return. 1890. A Dakota-English Dictionary. Contributions to North American Ethnology, Volume VII. Washington: Government Printing Office.

Training and experiment tracking:

  • Thinking Machines Tinker for the RL training run
  • W&B for experiment tracking and reward-ledger audit trails
  • Dakota1890 repository artifacts for extraction, task generation, and verifier code

Model provider

HarleyCooper

Model tree

Base

Qwen/Qwen3.6-35B-A3B

Adapter

this model

Modalities

Input

Video, Text, Image

Output

Text
