Dedicated Endpoints

Run inference for this model on single-tenant GPUs with unmatched speed and reliability at scale.

Container

Run inference for this model in your own environment with full control and performance.


Get help setting up a custom Dedicated Endpoint.

Talk with our engineers to get a quote for discounted reserved GPU instances.

README

License: apache-2.0

Training Data

  • Corpus: Unified State Register of Court Decisions of Ukraine (EDRSR)
  • Documents: 33.9M court decisions (after deduplication and quality filtering of 38.5M raw decisions)
  • Tokens: 161.4B tokens (Qwen2 BPE tokenizer, fertility = 0.515 for Ukrainian legal text)
  • Sequence length: 8,192 tokens
  • Shards: 1,233 pre-packaged numpy shards
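The card states only that the corpus was packed into 8,192-token sequences and stored as numpy shards. It does not describe the packing procedure itself. A common approach, sketched below under that assumption, concatenates tokenized documents with a separator token and slices the stream into fixed-length blocks. The separator ID and the choice to drop the final partial block are assumptions, not details from the card. At 8,192 tokens per sequence, 161.4B tokens correspond to roughly 19.7M packed sequences.

```python
from itertools import chain
from typing import Iterable, List

SEQ_LEN = 8192  # sequence length stated in the card


def pack_documents(docs: Iterable[List[int]], eos_id: int,
                   seq_len: int = SEQ_LEN) -> List[List[int]]:
    """Concatenate tokenized documents, appending eos_id after each one,
    and cut the resulting stream into fixed-length sequences.
    The trailing partial sequence is dropped (one possible policy)."""
    stream = list(chain.from_iterable(list(doc) + [eos_id] for doc in docs))
    n_full = len(stream) // seq_len
    return [stream[i * seq_len:(i + 1) * seq_len] for i in range(n_full)]


# Toy usage with a tiny sequence length and a placeholder separator ID of 0.
toy = pack_documents([[1, 2, 3], [4, 5], [6, 7, 8, 9]], eos_id=0, seq_len=4)
print(toy)  # [[1, 2, 3, 0], [4, 5, 0, 6], [7, 8, 9, 0]]
```

In a real pipeline the packed sequences would then be written out as arrays, one file per shard. The shard layout used for this model is not documented in the card.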

Training Details

  • Hardware: 8x NVIDIA H100 SXM 80GB (NVIDIA Innovation Lab via Brev)
  • Framework: HuggingFace Trainer + DeepSpeed ZeRO-3
  • Precision: bfloat16
  • Global batch size: 128 sequences (1.05M tokens/step)
  • Total steps: 9,536 (10B tokens processed)
  • Learning rate: 1e-4, cosine schedule, 300-step linear warmup
  • Training time: 31 hours
  • Throughput: 91K tokens/sec, 11.5 sec/step
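The step, token, and time figures above are mutually consistent, and the learning-rate schedule can be written out explicitly. The sketch below recomputes the derived numbers and implements a linear-warmup cosine schedule. Decaying to zero at the final step is an assumption, because the card gives no minimum learning rate. The formula follows the common shape of Hugging Face's `get_cosine_schedule_with_warmup` and is not a confirmed copy of the authors' configuration.

```python
import math

# Figures taken from the Training Details list.
SEQ_LEN = 8192
GLOBAL_BATCH = 128      # sequences per optimizer step
TOTAL_STEPS = 9536
SEC_PER_STEP = 11.5
PEAK_LR = 1e-4
WARMUP_STEPS = 300

tokens_per_step = GLOBAL_BATCH * SEQ_LEN        # 1,048,576 (~1.05M)
total_tokens = tokens_per_step * TOTAL_STEPS    # ~10.0B
throughput = tokens_per_step / SEC_PER_STEP     # ~91.2K tokens/sec
step_hours = TOTAL_STEPS * SEC_PER_STEP / 3600  # ~30.5 h of pure step time


def lr_at(step: int, peak: float = PEAK_LR, warmup: int = WARMUP_STEPS,
          total: int = TOTAL_STEPS) -> float:
    """Linear warmup from 0 to `peak`, then cosine decay.
    Decaying all the way to 0 at `total` is an assumption."""
    if step < warmup:
        return peak * step / max(1, warmup)
    progress = min(1.0, (step - warmup) / max(1, total - warmup))
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))


for s in (0, 150, 300, 4918, 9536):
    print(s, f"{lr_at(s):.2e}")
```

Pure step time comes to about 30.5 hours, a little under the reported 31 hours. The card does not say what accounts for the difference. The 10B tokens processed are also only about 6% of the 161.4B-token corpus, so training covered a subset of the data rather than a full epoch.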

Results

Metric                     Value
Initial loss (step 10)     1.08
Final loss (step 9,536)    0.231
Loss reduction             -79%
Base perplexity            3.83
CPT perplexity             1.30
Perplexity reduction       -66.1%
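Perplexity is the exponential of the mean per-token negative log-likelihood (in nats), and the reduction rows are signed relative changes. The helpers below are a minimal sketch of both, applied to the values in the table. They are illustrative and are not the authors' evaluation code.

```python
import math
from typing import Sequence


def perplexity(token_nlls: Sequence[float]) -> float:
    """Perplexity = exp(mean per-token negative log-likelihood, in nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def relative_change(before: float, after: float) -> float:
    """Signed relative change in percent (negative means a reduction)."""
    return (after - before) / before * 100.0


print(f"loss:       {relative_change(1.08, 0.231):.1f}%")  # loss:       -78.6%
print(f"perplexity: {relative_change(3.83, 1.30):.1f}%")   # perplexity: -66.1%
```

Note that exp(0.231) is about 1.26, which is not exactly the reported CPT perplexity of 1.30. The final loss is a training-step value, and the card does not say which evaluation set the perplexities were measured on.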

Scaling Law

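All four models in the series converge to a similar perplexity (1.28 to 1.35) after continued pretraining (CPT), even though their base perplexities range from 2.84 to 6.83: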

Model    Base PPL    CPT PPL    Reduction
0.5B     6.83        1.35       -80%
1.5B     4.61        1.31       -72%
3B       3.83        1.30       -66%
14B      2.84        1.28       -55%
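One way to make the convergence concrete is to fit a power law PPL ≈ a · N^b to each perplexity column on a log-log scale, where N is the parameter count. The sketch below does this with ordinary least squares over the four rows of the table. It is a rough illustration computed from those rows, not an analysis reported by the authors, and with only four points the fitted exponents are indicative at best.

```python
import math
from typing import List, Sequence

# (parameters in billions, base PPL, CPT PPL), copied from the table.
ROWS = [(0.5, 6.83, 1.35), (1.5, 4.61, 1.31), (3.0, 3.83, 1.30), (14.0, 2.84, 1.28)]


def loglog_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Least-squares slope of log(y) on log(x), i.e. b in y ≈ a * x**b."""
    lx: List[float] = [math.log(x) for x in xs]
    ly: List[float] = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    sxx = sum((u - mx) ** 2 for u in lx)
    return sxy / sxx


sizes = [r[0] for r in ROWS]
base_slope = loglog_slope(sizes, [r[1] for r in ROWS])
cpt_slope = loglog_slope(sizes, [r[2] for r in ROWS])
print(f"base PPL exponent: {base_slope:.3f}")
print(f"CPT PPL exponent:  {cpt_slope:.3f}")
```

The fitted base-model exponent (about -0.26) is more than an order of magnitude larger in magnitude than the CPT exponent (about -0.015). In other words, after continued pretraining on this domain, perplexity depends only weakly on model size.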

Intended Use

This is a base model (not instruction-tuned). It is intended for:

  • Research on domain adaptation of LLMs to legal text in low-resource languages
  • Downstream fine-tuning for Ukrainian legal NLP tasks
  • Scaling law analysis of continued pretraining
  • Perplexity evaluation on Ukrainian legal text

Limitations

  • Not instruction-tuned: it will not follow instructions or engage in chat
  • Trained on Ukrainian court decisions only; may not generalize to other legal systems

Related Resources

Model provider: overthelex

Model tree

Base: Qwen/Qwen2.5-3B
Fine-tuned: this model

Modalities

Input: Text
Output: Text

Pricing

Dedicated Endpoints


Supported Functionality

Model APIs

Dedicated Endpoints

Container
