TLLMC

g-1.1.0-mxfp4-fixed-2512

README

Training

Datasets

Dataset	Samples
qa-dataset-raft	73232
multi_dataset	35690

Hyperparameters

Parameter	Value
epochs	5
learning rate	5e-6
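
As a rough sanity check on the training scale, the figures above can be combined into a total sample count and an approximate number of optimizer steps. This is only a sketch: it assumes both datasets were used in full and mixed together in every epoch, and the batch size (`ASSUMED_BATCH_SIZE`) is a hypothetical value that the model card does not state.

```python
import math

# Sample counts taken from the Datasets table.
DATASET_SAMPLES = {
    "qa-dataset-raft": 73232,
    "multi_dataset": 35690,
}
EPOCHS = 5  # from the hyperparameter table

# ASSUMPTION: hypothetical effective batch size; not given in the model card.
ASSUMED_BATCH_SIZE = 32

# Total samples seen in one pass over both datasets.
total_samples = sum(DATASET_SAMPLES.values())

# Fraction of the combined data contributed by each dataset.
shares = {name: count / total_samples for name, count in DATASET_SAMPLES.items()}

# Optimizer steps under the assumed batch size (last partial batch kept).
steps_per_epoch = math.ceil(total_samples / ASSUMED_BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS

print(f"total samples per epoch: {total_samples}")
for name, share in shares.items():
    print(f"  {name}: {share:.1%}")
print(f"steps per epoch (assumed batch size {ASSUMED_BATCH_SIZE}): {steps_per_epoch}")
print(f"total steps over {EPOCHS} epochs: {total_steps}")
```

Changing `ASSUMED_BATCH_SIZE` to the value actually used in training gives the real step count; the sample totals and shares do not depend on it.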

Inference

Single-turn generation with the Transformers pipeline.

python
from transformers import pipeline

model_id = "./g-1.1.0-mxfp4-fixed-2512"  # or a local directory path

pipe = pipeline(
    "text-generation",
    model=model_id,
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {
        "role": "user",
        "content": "USER PROMPT HERE",
    },
]

prompt = pipe.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

outputs = pipe(
    prompt,
    max_new_tokens=2048,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    return_full_text=False,
)

print(outputs[0]["generated_text"])
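
For repeated calls, the single-turn flow above can be wrapped in a small helper. The sketch below is one possible arrangement, not part of the model card: `build_messages` and `generate_reply` are hypothetical names, the optional system message assumes the model's chat template accepts a `system` role, and `pipe` is expected to be a text-generation pipeline created as shown above.

```python
from typing import Any, Dict, List, Optional


def build_messages(user_prompt: str, system_prompt: Optional[str] = None) -> List[Dict[str, str]]:
    """Build a single-turn chat message list (hypothetical helper)."""
    messages: List[Dict[str, str]] = []
    if system_prompt:
        # ASSUMPTION: the chat template supports a "system" role.
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return messages


def generate_reply(
    pipe: Any,
    user_prompt: str,
    system_prompt: Optional[str] = None,
    **gen_kwargs: Any,
) -> str:
    """Render the chat template, generate, and return only the new text."""
    messages = build_messages(user_prompt, system_prompt)
    prompt = pipe.tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )
    # Defaults mirror the sampling settings used in the example above.
    params: Dict[str, Any] = {
        "max_new_tokens": 2048,
        "do_sample": True,
        "temperature": 0.7,
        "top_p": 0.9,
        "return_full_text": False,
    }
    params.update(gen_kwargs)  # caller overrides take precedence
    outputs = pipe(prompt, **params)
    return outputs[0]["generated_text"]
```

With the pipeline from the example, `generate_reply(pipe, "USER PROMPT HERE", temperature=0.2)` would run one turn with a lower temperature while keeping the other defaults.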