License: apache-2.0
Repository role
code-tape publishes the same model family in three forms:
| Repository | Purpose |
|---|---|
| ceilf6/code-tape-subtitle-postprocessor-lora | LoRA adapter for reproducibility and continued fine-tuning. |
| ceilf6/code-tape-subtitle-postprocessor-merged | This full merged model, useful for Python/Transformers inspection or re-export. |
| ceilf6/code-tape-subtitle-postprocessor-onnx | Transformers.js-compatible ONNX export used by the browser app. |
For browser-local inference in code-tape, use the ONNX repository. Use this repository when you need a standard Transformers checkpoint.
Intended contract
Input is a chat message containing JSON:
```json
{
  "context": {
    "fileName": "ReplayControls.tsx",
    "code": "const canSeek = durationMs > 0;",
    "runtimeOutput": "",
    "glossary": ["ReplayControls", "canSeek", "durationMs"]
  },
  "segments": [
    { "id": "subtitle-1", "startMs": 0, "endMs": 1400, "text": "这里先判断 can seek 是否可用" }
  ]
}
```
Expected output shape:
```json
{
  "segments": [
    { "id": "subtitle-1", "text": "这里先判断 canSeek 是否可用" }
  ],
  "chapters": [
    { "title": "判断回放是否可 seek", "startMs": 0, "endMs": 1400 }
  ]
}
```
Rules expected by the code-tape application:
- output JSON only, with no Markdown or explanation;
- `segments` contains only changed segments and may be empty;
- every returned segment id must exist in the input and must not be duplicated;
- chapter times must be monotonic, non-overlapping, and inside the subtitle timeline;
- invalid output is discarded by the application.
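The rules above can be checked mechanically before any output is applied. The following is a minimal sketch, not the code-tape application's actual validator: the function name `validate_output` is hypothetical, and it assumes that a missing `segments` or `chapters` key counts as empty and that the subtitle timeline spans from the earliest `startMs` to the latest `endMs` of the input segments.

```python
import json


def validate_output(raw, input_segments):
    """Return the parsed output if it satisfies the rules above, else None."""
    try:
        data = json.loads(raw)
    except (TypeError, ValueError):
        return None  # Markdown fences or explanatory prose fail here
    if not isinstance(data, dict):
        return None

    # Assumption: a missing key is treated the same as an empty list.
    segments = data.get("segments", [])
    chapters = data.get("chapters", [])
    if not isinstance(segments, list) or not isinstance(chapters, list):
        return None

    # Every returned id must exist in the input and appear at most once.
    known_ids = {seg["id"] for seg in input_segments}
    seen = set()
    for seg in segments:
        if not isinstance(seg, dict) or not isinstance(seg.get("text"), str):
            return None
        seg_id = seg.get("id")
        if not isinstance(seg_id, str) or seg_id not in known_ids or seg_id in seen:
            return None
        seen.add(seg_id)

    # Assumption: the timeline runs from the first startMs to the last endMs.
    if input_segments:
        timeline_start = min(seg["startMs"] for seg in input_segments)
        timeline_end = max(seg["endMs"] for seg in input_segments)
    else:
        timeline_start = timeline_end = 0

    # Chapters must be ordered, non-overlapping, and inside the timeline.
    previous_end = timeline_start
    for chapter in chapters:
        if not isinstance(chapter, dict):
            return None
        start, end = chapter.get("startMs"), chapter.get("endMs")
        if not isinstance(start, (int, float)) or not isinstance(end, (int, float)):
            return None
        if start < previous_end or end < start or end > timeline_end:
            return None
        previous_end = end

    return data
```

A caller would pass the raw model text together with the `segments` array it sent, and treat a `None` result as "discard this output".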
Usage with Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ceilf6/code-tape-subtitle-postprocessor-merged"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [
    {
        "role": "system",
        "content": (
            "You are the code-tape subtitle post-processing model.\n"
            "Only output one JSON object.\n"
            "Goal: correct ASR subtitle text for frontend/code terms and create playback chapter jump points."
        ),
    },
    {"role": "user", "content": "{\"context\":{},\"segments\":[]}"},
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=384, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
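The user message in the example above is an empty payload. For a real request, the context and segments can be serialized into that message programmatically. This is a small sketch: `build_messages` is a hypothetical helper, and the compact separators are chosen only to match the hand-written string in the example, not because the training format is known to require them.

```python
import json

SYSTEM_PROMPT = (
    "You are the code-tape subtitle post-processing model.\n"
    "Only output one JSON object.\n"
    "Goal: correct ASR subtitle text for frontend/code terms and create playback chapter jump points."
)


def build_messages(context, segments):
    """Serialize one request into the chat format used in the example above."""
    payload = {"context": context, "segments": segments}
    # ensure_ascii=False keeps non-ASCII subtitle text as-is instead of \uXXXX escapes.
    user_content = json.dumps(payload, ensure_ascii=False, separators=(",", ":"))
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_content},
    ]


messages = build_messages(
    context={
        "fileName": "ReplayControls.tsx",
        "code": "const canSeek = durationMs > 0;",
        "runtimeOutput": "",
        "glossary": ["ReplayControls", "canSeek", "durationMs"],
    },
    segments=[
        {"id": "subtitle-1", "startMs": 0, "endMs": 1400, "text": "这里先判断 can seek 是否可用"}
    ],
)
```

The resulting `messages` list can be passed to `tokenizer.apply_chat_template` in place of the hand-written one.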
Training and conversion
The model was created from the code-tape subtitle post-processing LoRA workflow:
- prepare seed records with ASR-like subtitles, code context, runtime output, and glossary terms;
- distill strict JSON correction/chapter examples;
- fine-tune a LoRA adapter on HuggingFaceTB/SmolLM2-135M-Instruct;
- merge the adapter into a full model;
- export the merged model to ONNX for browser use.
The merged checkpoint is mainly an intermediate artifact for reproducibility and export.
Evaluation
code-tape evaluates this model family with project-specific checks instead of broad language-model benchmarks:
- valid JSON object output;
- valid sparse segment references;
- glossary preservation after sparse corrections are applied back to the source subtitles;
- non-empty, ordered, non-overlapping chapter supervision for training/evaluation records;
- chapter bounds inside the subtitle timeline.
The model output must always be validated by the caller.
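Because the model returns only changed segments, the glossary check has to run on the merged subtitles rather than on the raw output. The sketch below shows one possible reading of that check and is not the project's evaluation code: both function names are hypothetical, and "glossary preservation" is assumed to mean that a glossary term present in the source text must still be present after the corrections are applied.

```python
def apply_corrections(source_segments, corrected_segments):
    """Merge sparse corrections into a copy of the source subtitles, keyed by id."""
    replacements = {seg["id"]: seg["text"] for seg in corrected_segments}
    return [
        # Timing fields are copied unchanged; only the text may be replaced.
        {**seg, "text": replacements.get(seg["id"], seg["text"])}
        for seg in source_segments
    ]


def lost_glossary_terms(source_segments, merged_segments, glossary):
    """Return glossary terms found in the source text but missing after merging."""
    before = " ".join(seg["text"] for seg in source_segments)
    after = " ".join(seg["text"] for seg in merged_segments)
    return [term for term in glossary if term in before and term not in after]
```

An empty list from `lost_glossary_terms` means no glossary term was dropped by the corrections under this reading of the check.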
Limitations
- Narrowly trained for code-tape subtitle correction and chapter generation.
- Not suitable as a general chat assistant or general summarizer.
- Not an ASR model and cannot process audio directly.
- Small local models may produce malformed JSON; callers must keep a fallback path.
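One way to keep a fallback path is to treat any failure as "no corrections and no chapters", which is already a legal output under the contract. The wrapper below is a sketch under that assumption: `safe_postprocess` and the caller-supplied `generate` callable are hypothetical names, and the real application may handle failures differently.

```python
import json


def _no_changes():
    """Fallback result: leave every subtitle untouched and emit no chapters."""
    return {"segments": [], "chapters": []}


def safe_postprocess(generate, payload):
    """Call generate(payload) -> str and never let a bad result reach the caller."""
    try:
        data = json.loads(generate(payload))
    except Exception:
        # Covers malformed JSON as well as runtime failures inside generate().
        return _no_changes()
    if not isinstance(data, dict):
        return _no_changes()
    return data
```

The parsed result still needs the contract checks described earlier; this wrapper only guarantees that the caller always receives a dictionary.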
Privacy and security
The intended production path is the ONNX export running in the browser with @huggingface/transformers. Public browser loading does not require a Hugging Face token.
Do not put secrets, credentials, private code, or access tokens in prompts unless your inference environment is trusted.
License
Apache-2.0, following the base model license.
Model provider: ceilf6
Base model: HuggingFaceTB/SmolLM2-135M-Instruct (this model is a fine-tune of it)
Modalities: text input, text output