overthelex/qwen2.5-3b-edrsr-legal-uk API & Inference Endpoint

Training Data

Corpus: Unified State Register of Court Decisions of Ukraine (EDRSR)
Documents: 33.9M court decisions (after dedup + quality filtering from 38.5M)
Tokens: 161.4B tokens (Qwen2 BPE tokenizer, fertility = 0.515 for Ukrainian legal text)
Sequence length: 8,192 tokens
Shards: 1,233 pre-packaged numpy shards
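
The corpus figures above imply a few derived quantities that are useful when planning data loading. The following is a rough sketch, not part of the original pipeline; it assumes tokens are spread evenly across shards, which the card does not state, and it takes the 10B-token training budget from Training Details.

```python
# Derived corpus statistics from the figures quoted in this card.
DOCS_RAW = 38.5e6        # court decisions before dedup + quality filtering
DOCS_KEPT = 33.9e6       # court decisions after filtering
TOTAL_TOKENS = 161.4e9   # tokens in the filtered corpus
SEQ_LEN = 8_192          # tokens per packed sequence
NUM_SHARDS = 1_233       # numpy shards
TOKENS_TRAINED = 10e9    # tokens processed during CPT (see Training Details)

retention = DOCS_KEPT / DOCS_RAW               # share of documents kept
tokens_per_doc = TOTAL_TOKENS / DOCS_KEPT      # mean tokens per decision
packed_sequences = TOTAL_TOKENS / SEQ_LEN      # 8,192-token sequences available
tokens_per_shard = TOTAL_TOKENS / NUM_SHARDS   # ASSUMPTION: evenly sized shards
corpus_fraction = TOKENS_TRAINED / TOTAL_TOKENS  # share of corpus seen in training

print(f"documents kept:    {retention:.1%}")
print(f"tokens/document:   {tokens_per_doc:,.0f}")
print(f"packed sequences:  {packed_sequences / 1e6:.1f}M")
print(f"tokens/shard:      {tokens_per_shard / 1e6:.1f}M (if evenly sized)")
print(f"corpus seen:       {corpus_fraction:.1%}")
```

On these numbers, the run processes only a small fraction of the available corpus.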

Training Details

Hardware: 8x NVIDIA H100 SXM 80GB (NVIDIA Innovation Lab via Brev)
Framework: HuggingFace Trainer + DeepSpeed ZeRO-3
Precision: bfloat16
Global batch size: 128 sequences (1.05M tokens/step)
Total steps: 9,536 (10B tokens processed)
Learning rate: 1e-4, cosine schedule, 300-step linear warmup
Training time: 31 hours
Throughput: 91K tokens/sec, 11.5 sec/step
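
The training figures above can be cross-checked against each other, and the learning-rate schedule can be sketched in a few lines. This is a minimal illustration, not the training code: `lr_at` is a hypothetical helper, and it assumes the cosine phase decays to zero at the final step, which the card does not specify.

```python
import math

SEQ_LEN = 8_192        # tokens per sequence
GLOBAL_BATCH = 128     # sequences per optimizer step
TOTAL_STEPS = 9_536
SEC_PER_STEP = 11.5
PEAK_LR = 1e-4
WARMUP_STEPS = 300

tokens_per_step = SEQ_LEN * GLOBAL_BATCH            # about 1.05M
total_tokens = tokens_per_step * TOTAL_STEPS        # about 10B
throughput = tokens_per_step / SEC_PER_STEP         # about 91K tokens/sec
compute_hours = TOTAL_STEPS * SEC_PER_STEP / 3600   # pure step time, no overhead


def lr_at(step: int) -> float:
    """Linear warmup followed by cosine decay.

    ASSUMPTION: the schedule decays to 0 at TOTAL_STEPS; the card only
    says "cosine schedule, 300-step linear warmup".
    """
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"tokens/step:   {tokens_per_step:,}")
print(f"total tokens:  {total_tokens:,}")
print(f"throughput:    {throughput:,.0f} tokens/sec")
print(f"step time sum: {compute_hours:.1f} h")
print(f"lr at steps 150 / 300 / 9536: {lr_at(150):.1e} / {lr_at(300):.1e} / {lr_at(9536):.1e}")
```

The summed step time comes out slightly below the reported 31 hours; the difference is presumably checkpointing and other overhead, though the card does not break it down.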

Results

Metric	Value
Initial loss (step 10)	1.08
Final loss (step 9,536)	0.231
Loss reduction	-79%
Base perplexity	3.83
CPT perplexity	1.30
Perplexity reduction	-66.1%
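
The percentage rows in this table follow from the raw values, and perplexity relates to mean cross-entropy loss (in nats per token) through the exponential. The sketch below reproduces both; `percent_change` is a hypothetical helper introduced only for illustration. The card does not say which data the perplexities were measured on, so the comparison with the training loss is indicative only.

```python
import math


def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


loss_change = percent_change(1.08, 0.231)   # initial vs final training loss
ppl_change = percent_change(3.83, 1.30)     # base vs CPT perplexity

# Perplexity is exp(mean negative log-likelihood per token), so each
# reported perplexity corresponds to a loss of log(perplexity).
ppl_from_final_loss = math.exp(0.231)
loss_from_cpt_ppl = math.log(1.30)

print(f"loss change:        {loss_change:.0f}%")
print(f"perplexity change:  {ppl_change:.1f}%")
print(f"exp(final loss):    {ppl_from_final_loss:.2f}")
print(f"log(CPT PPL):       {loss_from_cpt_ppl:.3f}")
```

The final training loss maps to a perplexity a little below the reported CPT perplexity, which would be consistent with the perplexity being computed on a separate evaluation set.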

Scaling Law

All four models in the series converge to similar perplexity after CPT:

Model	Base PPL	CPT PPL	Reduction
0.5B	6.83	1.35	-80%
1.5B	4.61	1.31	-72%
3B	3.83	1.30	-66%
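
The Reduction column and the convergence claim can be checked directly from the rows listed here. This is a small sketch over the three rows present in this card only; any further model in the series is not listed, so it is not included.

```python
# (model, base perplexity, perplexity after CPT) for the rows shown above
ROWS = [
    ("0.5B", 6.83, 1.35),
    ("1.5B", 4.61, 1.31),
    ("3B", 3.83, 1.30),
]

# Relative perplexity change per model, rounded to whole percent
reductions = {
    name: round((cpt - base) / base * 100.0)
    for name, base, cpt in ROWS
}

# Spread (max - min) of perplexity across model sizes, before and after CPT
base_spread = max(b for _, b, _ in ROWS) - min(b for _, b, _ in ROWS)
cpt_spread = max(c for _, _, c in ROWS) - min(c for _, _, c in ROWS)

for name, pct in reductions.items():
    print(f"{name}: {pct}%")
print(f"base PPL spread: {base_spread:.2f}")
print(f"CPT PPL spread:  {cpt_spread:.2f}")
```

The spread across model sizes shrinks from about 3.0 perplexity points before CPT to about 0.05 after, which is what "converge to similar perplexity" refers to.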

Intended Use

This is a base model (not instruction-tuned). It is intended for:

Research on domain adaptation of LLMs for low-resource legal languages
Downstream fine-tuning for Ukrainian legal NLP tasks
Scaling law analysis of continued pretraining
Perplexity evaluation on Ukrainian legal text
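
For the perplexity-evaluation use case, a minimal sketch with the `transformers` library might look like the following. It assumes the checkpoint is available on the Hugging Face Hub under the repository id from this page's title and loads through `AutoModelForCausalLM`; neither is confirmed by the card. `perplexity_from_nll` and `score_text` are hypothetical helper names.

```python
import math

# ASSUMPTION: repository id taken from the page title
MODEL_ID = "overthelex/qwen2.5-3b-edrsr-legal-uk"


def perplexity_from_nll(total_nll: float, n_tokens: int) -> float:
    """Perplexity = exp(mean negative log-likelihood per predicted token)."""
    if n_tokens <= 0:
        raise ValueError("need at least one predicted token")
    return math.exp(total_nll / n_tokens)


def score_text(text: str, model_id: str = MODEL_ID, max_len: int = 8192) -> float:
    """Perplexity of `text` under the model, truncated to `max_len` tokens."""
    # Heavy dependencies are imported here so the helper above works without them.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    model.eval()

    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=max_len)
    with torch.no_grad():
        # With labels set, the model returns the mean cross-entropy over the
        # shifted (next-token) positions.
        out = model(**enc, labels=enc["input_ids"])

    n_predicted = enc["input_ids"].shape[1] - 1
    return perplexity_from_nll(out.loss.item() * n_predicted, n_predicted)


if __name__ == "__main__":
    # Any Ukrainian legal text can be passed here; a file path is read from argv.
    import sys

    with open(sys.argv[1], encoding="utf-8") as f:
        print(f"perplexity: {score_text(f.read()):.2f}")
```

Because this is a base model, the text is scored as a plain continuation with no chat template applied.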

Limitations

Not instruction-tuned; will not follow instructions or chat
Trained on Ukrainian court decisions only; may not generalize to other legal systems
