cds-jb

qwen3-8b-gist-instruction-compression


README

Qwen3-8B gist-token instruction compression (model organism)

Reproduction of "Learning to Compress Prompts with Gist Tokens" (Mu, Li & Goodman 2023, arXiv:2304.08467) on Qwen/Qwen3-8B, packaged as a model organism for activation-verbalizer (activation-oracle) evals: an entire instruction is compressed into the activations of ONE learned <GIST> token (id 151669).

A 4D attention mask enforces the bottleneck during training AND inference: tokens after the gist cannot attend to the instruction, so the instruction reaches the completion only through the gist token's hidden states. Executable check: corrupting the K/V cache entries of every instruction token after prefill leaves greedy generations bit-identical.
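The masking rule above can be sketched in a few lines. This is a minimal, dependency-free illustration, not the mask utility from the gist_tokens/ code. It assumes a single gist token at index `gist_pos`, with the instruction occupying every earlier position. The function names are hypothetical. Each query attends causally, except that queries after the gist are blocked from all keys before it. The gist token itself still sees the instruction, which is what lets it absorb it.

```python
from typing import List


def gist_attention_mask(seq_len: int, gist_pos: int) -> List[List[bool]]:
    """mask[q][k] is True when query position q may attend to key position k."""
    if not 0 <= gist_pos < seq_len:
        raise ValueError("gist_pos must be a valid position in the sequence")
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            causal = k <= q  # standard causal rule: no attending to future positions
            # Gist bottleneck: positions after the gist cannot see the instruction (k < gist_pos).
            blocked = q > gist_pos and k < gist_pos
            row.append(causal and not blocked)
        mask.append(row)
    return mask


def to_4d_additive(mask: List[List[bool]], batch_size: int = 1,
                   neg: float = float("-inf")) -> List[List[List[List[float]]]]:
    """Expand a [q][k] boolean mask to shape [batch, 1, q, k]: 0.0 where allowed, `neg` where blocked."""
    plane = [[0.0 if allowed else neg for allowed in row] for row in mask]
    # Copy rows per batch entry so the batch elements do not share list objects.
    return [[[row[:] for row in plane]] for _ in range(batch_size)]
```

For a 6-token sequence with the gist at index 2, row 4 of the mask is `[False, False, True, True, True, False]`: position 4 sees the gist and the later tokens up to itself, but none of the instruction. The 4D expansion follows an additive convention (0 for allowed, a large negative value for blocked). Some implementations use the dtype's most negative finite value instead of `-inf`, so check what your attention backend expects.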

  • LoRA r=64, alpha=128 on all linear modules, plus a trainable <GIST> embedding row (PEFT trainable_token_indices; the row's trained VALUES are stored in the adapter, so loading does not depend on how the base model initialized that row).
  • Data: Alpaca+ (Self-Instruct + Alpaca, 128k), 3 epochs, effective batch size 128, lr 1e-4 with cosine schedule. Prompt format (the bracketed Input line is present only when there is an input): Instruction: {instruction}\n<GIST>\n[Input: {input}\n]Output:
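  • Held-out ROUGE-L on the seen / unseen / human splits, each given as gist mask / full-attention positive control / no-instruction floor: seen 0.559 / 0.577 / 0.212, unseen 0.548 / 0.557 / 0.252, human 0.300 / 0.309 / 0.149. The gist model stays within 0.02 of the full-attention control on every split.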
  • IMPORTANT for inference: the gist behavior assumes the gist mask (post-gist tokens must not attend to the instruction); under plain causal attention the model can still read the instruction directly. Load the adapter UNMERGED (merge_and_unload adds bf16 rounding noise). Mask utilities and eval code live in gist_tokens/ in the activation_oracles_dev repo.
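The held-out scores in the list above are ROUGE-L. As a rough guide to what that number measures, here is a minimal sketch of ROUGE-L computed as an F1 score over the longest common subsequence (LCS) of prediction and reference tokens. Lowercased whitespace tokenization is a simplifying assumption. Reference implementations such as the `rouge_score` package tokenize differently (and can optionally stem), so this sketch will not reproduce the reported figures exactly.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, via the classic DP keeping one row at a time."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            # Extend the diagonal on a match, otherwise carry the best of left/up.
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_f1(prediction: str, reference: str) -> float:
    """ROUGE-L F1: harmonic mean of LCS precision and LCS recall."""
    pred = prediction.lower().split()  # assumption: simple whitespace tokenization
    ref = reference.lower().split()
    if not pred or not ref:
        return 0.0
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, `rouge_l_f1("the cat sat on the mat", "the cat is on the mat")` shares the 5-token subsequence "the cat on the mat", giving precision = recall = 5/6 and an F1 of about 0.833.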

Used by the gist_tokens/gist_instruction task in cds-jb/AVBench: the activation oracle (AO) reads the single gist position (token-exact rows) and must recover the compressed held-out instruction.
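Reading the gist activation requires the token index of `<GIST>` in each prompt. The following is a minimal sketch that assumes the prompt format from the training notes, with the bracketed Input line emitted only when an input exists. It also assumes the tokenizer maps `<GIST>` to the single id 151669. `build_prompt` and `gist_position` are hypothetical helpers. The tokenizer call itself is omitted, so `input_ids` stands for whatever the Qwen3 tokenizer returns for the prompt.

```python
from typing import List, Optional

GIST_TOKEN = "<GIST>"
GIST_TOKEN_ID = 151669  # id given on this model card


def build_prompt(instruction: str, input_text: Optional[str] = None) -> str:
    """Assemble 'Instruction: {instruction}\\n<GIST>\\n[Input: {input}\\n]Output:'."""
    prompt = f"Instruction: {instruction}\n{GIST_TOKEN}\n"
    # Assumption: the Input line is emitted only for a non-empty input.
    if input_text:
        prompt += f"Input: {input_text}\n"
    return prompt + "Output:"


def gist_position(input_ids: List[int], gist_id: int = GIST_TOKEN_ID) -> int:
    """Return the index of the single gist token; raise if it is missing or repeated."""
    positions = [i for i, tok in enumerate(input_ids) if tok == gist_id]
    if len(positions) != 1:
        raise ValueError(f"expected exactly one gist token, found {len(positions)}")
    return positions[0]
```

Raising on zero or multiple matches keeps the read token-exact: a prompt that lost or duplicated the gist token fails loudly instead of silently reading the wrong row.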

Model provider

cds-jb

Model tree

Base

Qwen/Qwen3-8B

Adapter

this model

Modalities

Input

Text

Output

Text
