License: apache-2.0

Training configuration
| Item | Value |
|---|---|
| Model architecture | GPT-2 (n_layer=12, n_head=12, n_embd=768) |
| Parameter count | 124,242,432 |
| Tokenizer | huggingface-course/code-search-net-tokenizer (BPE, vocab=50,000) |
| Context length | 128 tokens |
| Training set | huggingface-course/codeparrot-ds-train (16,702,061 length-128 chunks) |
| Validation set | huggingface-course/codeparrot-ds-valid |
| Optimizer | AdamW (β₁=0.9, β₂=0.999, weight_decay=0.1) |
| Learning rate | 5×10⁻⁴, cosine schedule, warmup 1,000 steps |
| Effective batch size | 256 (per_device_bs=64 × grad_accum=2 × world_size=2) |
| Precision | fp16 |
| Parallelism | DistributedDataParallel (DDP), NCCL/RCCL backend |
| Total steps | 65,243 (1 epoch) |
| Eval / save interval | every 5,000 steps |
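
The derived figures in the table can be reproduced from the listed settings. The sketch below is a sanity check, not the author's training script. It assumes the standard GPT-2 layout (tied input/output embeddings, a 4× MLP expansion) and a position-embedding table of 1,024 entries, the GPT-2 default; neither is stated in the table. Under those assumptions it recovers the parameter count, the effective batch size, and the number of optimizer steps in one epoch (rounding the last partial batch up).

```python
import math

# Values taken from the training configuration table.
N_LAYER, N_EMBD, VOCAB = 12, 768, 50_000
PER_DEVICE_BS, GRAD_ACCUM, WORLD_SIZE = 64, 2, 2
TRAIN_CHUNKS = 16_702_061

# Assumption (not in the table): the position-embedding table keeps the
# GPT-2 default of 1,024 entries even though training chunks are 128 tokens.
N_POSITIONS = 1024


def gpt2_param_count(n_layer, n_embd, vocab, n_positions):
    """Count parameters of a GPT-2 style model whose LM head is tied to the token embedding."""
    embeddings = vocab * n_embd + n_positions * n_embd
    layer_norm = 2 * n_embd                          # weight + bias
    attention = n_embd * 3 * n_embd + 3 * n_embd     # fused Q/K/V projection
    attention += n_embd * n_embd + n_embd            # attention output projection
    mlp = n_embd * 4 * n_embd + 4 * n_embd           # MLP up projection
    mlp += 4 * n_embd * n_embd + n_embd              # MLP down projection
    block = 2 * layer_norm + attention + mlp         # two LayerNorms per block
    return embeddings + n_layer * block + layer_norm  # plus the final LayerNorm


n_params = gpt2_param_count(N_LAYER, N_EMBD, VOCAB, N_POSITIONS)
effective_batch = PER_DEVICE_BS * GRAD_ACCUM * WORLD_SIZE
steps_per_epoch = math.ceil(TRAIN_CHUNKS / effective_batch)

print(f"parameters:      {n_params:,}")
print(f"effective batch: {effective_batch}")
print(f"steps per epoch: {steps_per_epoch:,}")
```

With a 128-entry position table instead, the count would come out 688,128 lower than the figure in the table, which is why the 1,024-entry assumption is used here.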
Hardware environment
- GPU: 2 × AMD Radeon Instinct MI50 (32 GB HBM2 each, gfx906)
- Platform: PyTorch + ROCm, containerized deployment
- Training time: about 19 hours
- Average throughput: 159.8 samples/sec, ~1.41 sec/step
Loss and training dynamics
Training and eval metrics were logged once every 5,000 steps.
| step | epoch | learning_rate | train_loss | grad_norm | eval_loss |
|---|---|---|---|---|---|
| 5,000 | 0.077 | 4.952×10⁻⁴ | 2.677 | 0.180 | 1.752 |
| 10,000 | 0.153 | 4.762×10⁻⁴ | 1.685 | 0.152 | 1.520 |
| 15,000 | 0.230 | 4.437×10⁻⁴ | 1.529 | 0.153 | 1.415 |
| 20,000 | 0.307 | 3.996×10⁻⁴ | 1.447 | 0.145 | 1.347 |
| 25,000 | 0.383 | 3.467×10⁻⁴ | 1.386 | 0.154 | 1.295 |
| 30,000 | 0.460 | 2.880×10⁻⁴ | 1.334 | 0.160 | 1.247 |
| 35,000 | 0.537 | 2.271×10⁻⁴ | 1.288 | 0.160 | 1.204 |
| 40,000 | 0.613 | 1.675×10⁻⁴ | 1.241 | 0.170 | 1.160 |
| 45,000 | 0.690 | 1.128×10⁻⁴ | 1.200 | 0.175 | 1.123 |
| 50,000 | 0.766 | 6.631×10⁻⁵ | 1.162 | 0.174 | 1.090 |
| 55,000 | 0.843 | 3.072×10⁻⁵ | 1.135 | 0.180 | 1.066 |
| 60,000 | 0.920 | 8.175×10⁻⁶ | 1.113 | 0.191 | 1.054 |
| 65,000 | 0.996 | 1.78×10⁻⁸ | 1.106 | 0.180 | 1.051 |
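No additional epochs or hyperparameter search were run. In the second half of training the cosine decay brings the learning rate close to zero, while the gradient norm stays in the 0.15-0.19 range with no sign of divergence or instability.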
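
The learning_rate column is consistent with a linear warmup followed by a half-cosine decay to zero. The sketch below assumes that form of the schedule (the exact scheduler used in training is not stated, and the function name `cosine_lr` is hypothetical). The peak rate, warmup length, and total step count come from the training configuration.

```python
import math

# From the training configuration.
PEAK_LR = 5e-4
WARMUP_STEPS = 1_000
TOTAL_STEPS = 65_243


def cosine_lr(step):
    """Learning rate at a given optimizer step: linear warmup, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    # Fraction of the post-warmup schedule that has elapsed, in [0, 1].
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (5_000, 30_000, 60_000):
    print(f"{step:>6}  {cosine_lr(step):.3e}")
```

The printed values can be compared against the learning_rate column above; they should closely match the logged entries for the same steps.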
Usage
```python
from transformers import pipeline

# device=0 selects the first GPU.
pipe = pipeline(
    "text-generation",
    model="Marcoson320/codeparrot-gpt2-mi50",
    device=0,
)
print(pipe("# scatter plot of x, y\n", max_new_tokens=64)[0]["generated_text"])
```
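Limitations
- The context limit is 128 tokens, so longer code passages cannot be handled.
- The training data is skewed toward GitHub Python code that uses pandas / sklearn / matplotlib / seaborn; coverage of code from other domains is limited.
- The model is small, so completions tend to repeat themselves; at inference time, setting `repetition_penalty>1.0` or `no_repeat_ngram_size` can mitigate this.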
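
The context and repetition limitations can be partly worked around at inference time. The sketch below is illustrative only: the helper names `fit_prompt` and `complete`, the penalty value 1.2, and the n-gram size 4 are assumptions chosen for the example, not settings from the model card, and it assumes the tokenizer is shipped alongside the model weights. It trims a tokenized prompt from the left so that prompt plus completion stay within the 128-token context, then generates with options that discourage repetition.

```python
CONTEXT_LENGTH = 128  # context length the model was trained with

# Assumed example values, not taken from the model card; tune for your prompts.
GENERATION_KWARGS = {
    "max_new_tokens": 64,
    "repetition_penalty": 1.2,
    "no_repeat_ngram_size": 4,
}


def fit_prompt(token_ids, max_new_tokens, context_length=CONTEXT_LENGTH):
    """Keep only the most recent prompt tokens so prompt + completion fit the context."""
    budget = context_length - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    return token_ids[-budget:]


def complete(prompt, model_id="Marcoson320/codeparrot-gpt2-mi50"):
    """Generate a completion for `prompt`, trimming it to the trained context first."""
    # Imported lazily so fit_prompt can be used without torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    ids = fit_prompt(tokenizer(prompt)["input_ids"], GENERATION_KWARGS["max_new_tokens"])
    output = model.generate(torch.tensor([ids]), **GENERATION_KWARGS)
    return tokenizer.decode(output[0], skip_special_tokens=True)


# Example call (downloads the model on first use):
# print(complete("# scatter plot of x, y\n"))
```

Trimming from the left keeps the code closest to the cursor, which is usually the most relevant context for a completion; a different policy may suit other use cases.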
Model provider
Marcoson320