README

License: apache-2.0

Repository role

code-tape publishes the same model family in three forms:

| Repository | Purpose |
| --- | --- |
| ceilf6/code-tape-subtitle-postprocessor-lora | LoRA adapter for reproducibility and continued fine-tuning. |
| ceilf6/code-tape-subtitle-postprocessor-merged | This full merged model, useful for Python/Transformers inspection or re-export. |
| ceilf6/code-tape-subtitle-postprocessor-onnx | Transformers.js-compatible ONNX export used by the browser app. |

For browser-local inference in code-tape, use the ONNX repository. Use this repository when you need a standard Transformers checkpoint.

Intended contract

The input is a single user chat message whose content is a JSON string:

```json
{
  "context": {
    "fileName": "ReplayControls.tsx",
    "code": "const canSeek = durationMs > 0;",
    "runtimeOutput": "",
    "glossary": ["ReplayControls", "canSeek", "durationMs"]
  },
  "segments": [
    { "id": "subtitle-1", "startMs": 0, "endMs": 1400, "text": "这里先判断 can seek 是否可用" }
  ]
}
```
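
A minimal Python sketch of building such a message from the fields shown above. The helper name `build_user_message` is hypothetical and not part of code-tape; the compact separators are an assumption, not a documented requirement.

```python
import json


def build_user_message(context: dict, segments: list) -> dict:
    """Wrap code context and subtitle segments as one user chat message.

    Hypothetical helper for illustration; not part of code-tape.
    """
    payload = {"context": context, "segments": segments}
    # ensure_ascii=False keeps non-ASCII subtitle text readable instead of \uXXXX escapes.
    content = json.dumps(payload, ensure_ascii=False, separators=(",", ":"))
    return {"role": "user", "content": content}


message = build_user_message(
    {
        "fileName": "ReplayControls.tsx",
        "code": "const canSeek = durationMs > 0;",
        "runtimeOutput": "",
        "glossary": ["ReplayControls", "canSeek", "durationMs"],
    },
    [{"id": "subtitle-1", "startMs": 0, "endMs": 1400, "text": "这里先判断 can seek 是否可用"}],
)
```

The resulting `message` can be appended after the system message in a chat-template call.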

Expected output shape:

```json
{
  "segments": [
    { "id": "subtitle-1", "text": "这里先判断 canSeek 是否可用" }
  ],
  "chapters": [
    { "title": "判断回放是否可 seek", "startMs": 0, "endMs": 1400 }
  ]
}
```

Rules expected by the code-tape application:

  • output JSON only, with no Markdown or explanation;
  • segments contains only changed segments and may be empty;
  • every returned segment id must exist in the input and must not be duplicated;
  • chapter times must be monotonic, non-overlapping, and inside the subtitle timeline;
  • invalid output is discarded by the application.
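
The rules above could be checked on the caller side roughly as follows. This is a sketch, not the validator used by code-tape: the function name `validate_output` is hypothetical, missing `segments` or `chapters` keys are treated as empty lists, and "monotonic, non-overlapping" is interpreted as each chapter having positive duration and starting at or after the end of the previous one.

```python
import json
from typing import Optional


def validate_output(raw: str, input_payload: dict) -> Optional[dict]:
    """Return the parsed output if it passes the checks, else None (discard)."""
    try:
        data = json.loads(raw)  # fails on Markdown fences or surrounding prose
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None

    # Assumption: a missing key is treated like an empty list.
    segments = data.get("segments", [])
    chapters = data.get("chapters", [])
    if not isinstance(segments, list) or not isinstance(chapters, list):
        return None

    source = input_payload.get("segments", [])
    known_ids = {s["id"] for s in source}
    seen = set()
    for seg in segments:
        if not isinstance(seg, dict) or not isinstance(seg.get("text"), str):
            return None
        seg_id = seg.get("id")
        # Every id must exist in the input and appear at most once.
        if not isinstance(seg_id, str) or seg_id not in known_ids or seg_id in seen:
            return None
        seen.add(seg_id)

    if chapters and not source:
        return None  # no subtitle timeline to place chapters on
    timeline_start = min((s["startMs"] for s in source), default=0)
    timeline_end = max((s["endMs"] for s in source), default=0)
    previous_end = timeline_start
    for chapter in chapters:
        if not isinstance(chapter, dict):
            return None
        start, end = chapter.get("startMs"), chapter.get("endMs")
        if not isinstance(start, (int, float)) or not isinstance(end, (int, float)):
            return None
        # Ordered, non-overlapping, and inside the subtitle timeline.
        if not (previous_end <= start < end <= timeline_end):
            return None
        previous_end = end
    return data
```

Returning `None` mirrors the "invalid output is discarded" rule; the caller then keeps the original subtitles.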

Usage with Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ceilf6/code-tape-subtitle-postprocessor-merged"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [
    {
        "role": "system",
        "content": (
            "You are the code-tape subtitle post-processing model.\n"
            "Only output one JSON object.\n"
            "Goal: correct ASR subtitle text for frontend/code terms and create playback chapter jump points."
        ),
    },
    # The user message is the input JSON serialized as a string.
    {"role": "user", "content": "{\"context\":{},\"segments\":[]}"},
]

# Render the chat template to text, then tokenize it.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding keeps the output deterministic.
outputs = model.generate(**inputs, max_new_tokens=384, do_sample=False)

# Decode only the newly generated tokens so the prompt is not echoed back.
generated = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))
```

Training and conversion

The model was created from the code-tape subtitle post-processing LoRA workflow:

  1. prepare seed records with ASR-like subtitles, code context, runtime output, and glossary terms;
  2. distill strict JSON correction/chapter examples;
  3. fine-tune a LoRA adapter on HuggingFaceTB/SmolLM2-135M-Instruct;
  4. merge the adapter into a full model;
  5. export the merged model to ONNX for browser use.
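
Step 4 can be sketched with the PEFT library as shown here. The project's actual merge script is not included in this README, so this is an assumption about one common way to do it: the function name `merge_lora_adapter` is hypothetical, and taking the tokenizer from the base model is also an assumption.

```python
def merge_lora_adapter(base_id: str, adapter_id: str, output_dir: str) -> None:
    """Merge a LoRA adapter into its base model and save a standalone checkpoint.

    Hypothetical helper for illustration; not the project's own script.
    """
    # Imported lazily so the function can be defined without these packages installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(base_id)
    # Attach the adapter weights, then fold them into the base weights.
    merged = PeftModel.from_pretrained(base, adapter_id).merge_and_unload()
    merged.save_pretrained(output_dir)
    # Assumption: the tokenizer is unchanged from the base model.
    AutoTokenizer.from_pretrained(base_id).save_pretrained(output_dir)
```

With the repositories named in this README, a call might look like `merge_lora_adapter("HuggingFaceTB/SmolLM2-135M-Instruct", "ceilf6/code-tape-subtitle-postprocessor-lora", "merged-out")`, where `merged-out` is a placeholder directory.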

The merged checkpoint is mainly an intermediate artifact for reproducibility and export.

Evaluation

code-tape evaluates this model family with project-specific checks instead of broad language-model benchmarks:

  • valid JSON object output;
  • valid sparse segment references;
  • glossary preservation after sparse corrections are applied back to the source subtitles;
  • non-empty, ordered, non-overlapping chapter supervision for training/evaluation records;
  • chapter bounds inside the subtitle timeline.
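
As an illustration of the glossary check, the sketch below applies sparse corrections back to the source subtitles and then compares texts. The exact definition used by code-tape is not given here; this version assumes "preservation" means a glossary term present in a source segment must still be present after correction. Both function names are hypothetical.

```python
def apply_corrections(source_segments: list, corrections: list) -> list:
    """Overlay sparse corrections onto copies of the source segments."""
    new_text = {c["id"]: c["text"] for c in corrections}
    return [
        {**seg, "text": new_text.get(seg["id"], seg["text"])}
        for seg in source_segments
    ]


def glossary_preserved(source_segments: list, corrected_segments: list, glossary: list) -> bool:
    """Assumed rule: no glossary term found in a source segment may be lost."""
    for before, after in zip(source_segments, corrected_segments):
        for term in glossary:
            if term in before["text"] and term not in after["text"]:
                return False
    return True


source = [
    {"id": "subtitle-1", "startMs": 0, "endMs": 1400, "text": "这里先判断 can seek 是否可用"},
    {"id": "subtitle-2", "startMs": 1400, "endMs": 2600, "text": "durationMs 大于 0"},
]
corrected = apply_corrections(
    source, [{"id": "subtitle-1", "text": "这里先判断 canSeek 是否可用"}]
)
ok = glossary_preserved(source, corrected, ["canSeek", "durationMs"])
```

Segments that the model did not return keep their original text and timing, which is why the check runs on the merged result rather than on the sparse output alone.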

The model output must always be validated by the caller.

Limitations

  • Narrowly trained for code-tape subtitle correction and chapter generation.
  • Not suitable as a general chat assistant or general summarizer.
  • Not an ASR model and cannot process audio directly.
  • Small local models may produce malformed JSON; callers must keep a fallback path.
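
One possible fallback path is sketched below. It assumes that an empty result (no changed segments, no chapters) is an acceptable fallback, so the caller simply keeps the original subtitles. The function name `parse_or_fallback` is hypothetical.

```python
import json


def parse_or_fallback(raw: str) -> dict:
    """Parse model output; on malformed JSON fall back to 'no changes, no chapters'."""
    fallback = {"segments": [], "chapters": []}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return fallback
    # A JSON array, string, or number is not a usable result object either.
    return data if isinstance(data, dict) else fallback
```

This only guards against unparseable output; the structural rules in the contract still need their own validation.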

Privacy and security

The intended production path is the ONNX export running in the browser with @huggingface/transformers. Public browser loading does not require a Hugging Face token.

Do not put secrets, credentials, private code, or access tokens in prompts unless your inference environment is trusted.

License

Apache-2.0, following the base model license.

Model provider

ceilf6

Model tree

Base: HuggingFaceTB/SmolLM2-135M-Instruct
Fine-tuned: this model

Modalities

Input: Text
Output: Text
