Marcoson320/codeparrot-gpt2-mi50 API & Inference Endpoint

Training configuration

Item	Value
Model architecture	GPT-2 (n_layer=12, n_head=12, n_embd=768)
Parameters	124,242,432
Tokenizer	huggingface-course/code-search-net-tokenizer (BPE, vocab=50,000)
Context length	128 tokens
Training set	huggingface-course/codeparrot-ds-train (16,702,061 length-128 chunks)
Validation set	huggingface-course/codeparrot-ds-valid
Optimizer	AdamW (β₁=0.9, β₂=0.999, weight_decay=0.1)
Learning rate	5×10⁻⁴, cosine schedule, warmup 1,000 steps
Effective batch size	256 (per_device_bs=64 × grad_accum=2 × world_size=2)
Precision	fp16
Parallelism	DistributedDataParallel (DDP), NCCL/RCCL backend
Total steps	65,243 (1 epoch)
Eval / save interval	every 5,000 steps
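
The parameter count, effective batch size, and step count in the table can be cross-checked with a little arithmetic. The sketch below is not taken from the training code. It assumes the position-embedding table keeps the GPT-2 default of 1,024 entries (`n_positions`, a value not stated above), that the output head shares weights with the token embedding, and that the step count is a plain ceiling division of chunks by effective batch size.

```python
import math

# Values taken from the configuration table above.
n_layer, n_embd, vocab_size = 12, 768, 50_000
# Assumption: n_positions stays at the GPT-2 default of 1024, even though
# the training chunks are only 128 tokens long.
n_positions = 1024


def gpt2_param_count(n_layer, n_embd, vocab_size, n_positions):
    """Count GPT-2 parameters, assuming the LM head reuses the token embedding."""
    embeddings = vocab_size * n_embd + n_positions * n_embd
    layer_norm = 2 * n_embd                           # weight + bias
    attention = n_embd * 3 * n_embd + 3 * n_embd      # fused Q/K/V projection
    attention += n_embd * n_embd + n_embd             # attention output projection
    mlp = n_embd * 4 * n_embd + 4 * n_embd            # MLP up projection
    mlp += 4 * n_embd * n_embd + n_embd               # MLP down projection
    block = 2 * layer_norm + attention + mlp
    return embeddings + n_layer * block + layer_norm  # plus the final layer norm


total_params = gpt2_param_count(n_layer, n_embd, vocab_size, n_positions)

per_device_bs, grad_accum, world_size = 64, 2, 2
effective_batch = per_device_bs * grad_accum * world_size

num_chunks = 16_702_061
steps_per_epoch = math.ceil(num_chunks / effective_batch)

print(f"{total_params:,}")     # 124,242,432
print(effective_batch)         # 256
print(f"{steps_per_epoch:,}")  # 65,243
```

Under these assumptions all three figures match the table.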

Hardware environment

GPU: 2 × AMD Radeon Instinct MI50 (32 GB HBM2 each, gfx906)
Platform: PyTorch + ROCm, containerized deployment
Training time: about 19 hours
Average throughput: 159.8 samples/sec, ~1.41 sec/step
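
For readers who want to try a comparable two-GPU run, the hyperparameters above map onto keyword names accepted by the `TrainingArguments` class in `transformers`. This is only a sketch: the card does not say that the Hugging Face `Trainer` was used, and the script name `train.py`, the `output_dir` value, and the `torchrun` command are hypothetical placeholders.

```python
# Hypothetical launch for two GPUs on a single node (not from the original run):
#   torchrun --nproc_per_node=2 train.py
# PyTorch's ROCm builds expose RCCL under the "nccl" backend name.
WORLD_SIZE = 2

# Hyperparameters from the configuration table, keyed by the names that
# transformers.TrainingArguments uses for them.
training_kwargs = {
    "per_device_train_batch_size": 64,
    "gradient_accumulation_steps": 2,
    "num_train_epochs": 1,
    "learning_rate": 5e-4,
    "lr_scheduler_type": "cosine",
    "warmup_steps": 1_000,
    "weight_decay": 0.1,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "fp16": True,
    "eval_steps": 5_000,
    "save_steps": 5_000,
}

for name, value in training_kwargs.items():
    print(f"{name} = {value}")
```

Inside a training script, the dictionary could be passed as `TrainingArguments(output_dir="out", **training_kwargs)`. A step-based evaluation strategy also has to be enabled for `eval_steps` to take effect, and the keyword for that setting differs between `transformers` releases, so it is left out here.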

Loss and training dynamics

Training and eval metrics were recorded every 5,000 steps.

step	epoch	learning_rate	train_loss	grad_norm	eval_loss
5,000	0.077	4.952×10⁻⁴	2.677	0.180	1.752
10,000	0.153	4.762×10⁻⁴	1.685	0.152	1.520
15,000	0.230	4.437×10⁻⁴	1.529	0.153	1.415
20,000	0.307	3.996×10⁻⁴	1.447	0.145	1.347
25,000	0.383	3.467×10⁻⁴	1.386	0.154	1.295
30,000	0.460	2.880×10⁻⁴	1.334	0.160	1.247
35,000	0.537	2.271×10⁻⁴	1.288	0.160	1.204
40,000	0.613	1.675×10⁻⁴	1.241	0.170	1.160
45,000	0.690	1.128×10⁻⁴	1.200	0.175	1.123
50,000	0.766	6.631×10⁻⁵	1.162	0.174	1.090
55,000	0.843	3.072×10⁻⁵	1.135	0.180	1.066
60,000	0.920	8.175×10⁻⁶	1.113	0.191	1.054
65,000	0.996	1.78×10⁻⁸	1.106	0.180	1.051

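No additional epochs or hyperparameter search were run. In the second half of training, the cosine decay drives the learning rate toward zero, while the gradient norm stays in the 0.15-0.19 range with no signs of divergence or instability.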
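
The learning-rate column can be approximated from the schedule settings alone. The following is a minimal sketch of linear warmup followed by a half-cosine decay to zero, assuming that is the exact shape of the "cosine schedule" named in the configuration. The function name `lr_at` is made up for this example.

```python
import math

PEAK_LR = 5e-4        # from the configuration table
WARMUP_STEPS = 1_000  # from the configuration table
TOTAL_STEPS = 65_243  # from the configuration table


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then half-cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (5_000, 30_000, 60_000, 65_000):
    print(f"{step:>6,}  {lr_at(step):.3e}")
```

The printed values should closely track the learning_rate column. Differences in the last digit can come from the exact step at which the rate was logged.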

Usage

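```python
from transformers import pipeline

# Load the model into a text-generation pipeline.
pipe = pipeline(
    "text-generation",
    model="Marcoson320/codeparrot-gpt2-mi50",
    device=0,  # first GPU; use device=-1 to run on CPU
)

# Complete a short comment prompt with up to 64 new tokens.
print(pipe("# scatter plot of x, y\n", max_new_tokens=64)[0]["generated_text"])
```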

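Limitations

The context length is capped at 128 tokens, so the model cannot handle longer code passages.
The training data is skewed toward GitHub Python code that uses pandas / sklearn / matplotlib / seaborn; coverage of code from other domains is limited.
The model is small and its completions tend to repeat; setting repetition_penalty>1.0 or no_repeat_ngram_size at inference time can mitigate this.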
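
The first and third limitations can be worked around to some degree at inference time. The sketch below is one possible approach, not a recommended recipe from the model author. The values `repetition_penalty=1.2` and `no_repeat_ngram_size=4` are illustrative guesses, and the helper names `truncate_prompt_ids` and `complete` are hypothetical.

```python
CONTEXT_LEN = 128  # training context length from the configuration table

# Illustrative values; the card only suggests repetition_penalty > 1.0
# or setting no_repeat_ngram_size.
GENERATION_KWARGS = {
    "max_new_tokens": 64,
    "repetition_penalty": 1.2,
    "no_repeat_ngram_size": 4,
}


def truncate_prompt_ids(input_ids, max_new_tokens, context_len=CONTEXT_LEN):
    """Keep the most recent tokens so prompt + completion fits in context_len."""
    budget = context_len - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for a prompt")
    return list(input_ids)[-budget:]


def complete(prompt: str) -> str:
    """Generate a completion with repetition controls (downloads the model)."""
    from transformers import pipeline  # imported lazily: heavy dependency

    pipe = pipeline("text-generation", model="Marcoson320/codeparrot-gpt2-mi50")
    ids = pipe.tokenizer(prompt)["input_ids"]
    ids = truncate_prompt_ids(ids, GENERATION_KWARGS["max_new_tokens"])
    # Decoding and re-encoding may shift the token count slightly.
    short_prompt = pipe.tokenizer.decode(ids)
    return pipe(short_prompt, **GENERATION_KWARGS)[0]["generated_text"]
```

Calling `complete("# scatter plot of x, y\n")` would then return the prompt followed by the generated continuation. Because real code often repeats short patterns legitimately, a strict n-gram ban may also suppress valid output, so these settings are worth tuning per use case.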
