anasxs

icd10-llama3-8b-qlora-adapter

Model Summary

| Field | Value |
| --- | --- |
| Base model | `meta-llama/Meta-Llama-3-8B` |
| Adapter type | PEFT LoRA |
| Fine-tuning method | QLoRA |
| Task | ICD-10-CM code generation |
| Dataset | `generative-technologies/synth-ehr-icd10-llama3-format` |
| Training hardware | Kaggle 2× NVIDIA T4 |
| Language | English |

This repository contains only the LoRA adapter weights. You must have access to the gated meta-llama/Meta-Llama-3-8B base model on Hugging Face to use it.

Problem

Medical coding converts clinical documentation into standardized ICD-10-CM diagnosis codes. This project explores whether a compact LoRA adapter can teach an open-weight LLM to map synthetic EHR-style notes to ICD-10-CM codes.

Example task:

Input: Patient is a 58-year-old male with type 2 diabetes mellitus without complications and essential hypertension.
Output: E11.9, I10

Training Setup

| Setting | Value |
| --- | --- |
| Train samples | 50,000 |
| Validation samples | 2,000 |
| Epochs | 1 |
| Max sequence length | 768 |
| Quantization | 4-bit NF4 |
| Double quantization | Enabled |
| Compute dtype | float16 |
| LoRA rank | 16 |
| LoRA alpha | 32 |
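
To give a feel for what rank 16 and alpha 32 mean in practice, the arithmetic below works out the LoRA scaling factor and the number of trainable parameters added to a single linear layer. It is a rough sketch: the 4096 × 4096 layer shape is chosen because 4096 is the hidden size of Llama 3 8B, but this card does not state which modules the adapter targets, so the layer choice is an assumption for illustration only.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one (d_out x d_in) linear layer.

    LoRA learns two small matrices, A (rank x d_in) and B (d_out x rank),
    and applies the frozen weight W as W + (alpha / rank) * B @ A.
    """
    return rank * (d_in + d_out)


RANK = 16   # "LoRA rank" from the training setup
ALPHA = 32  # "LoRA alpha" from the training setup

# Assumption for illustration: a 4096 x 4096 projection. The modules this
# adapter actually targets are not listed in the model card.
D_IN = D_OUT = 4096

scaling = ALPHA / RANK
added = lora_param_count(D_IN, D_OUT, RANK)
frozen = D_IN * D_OUT

print(f"scaling factor alpha / rank: {scaling}")
print(f"LoRA parameters for this layer: {added:,}")
print(f"share of the frozen weight: {added / frozen:.2%}")
```

Under these assumptions the adapter trains well under 1% as many parameters as the frozen weight it modifies, which is what keeps fine-tuning feasible on two T4 GPUs.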

Training used completion-only loss: prompt tokens were masked, and loss was computed only on the ICD-10-CM answer tokens.
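
The idea behind completion-only loss can be sketched in a few lines. The training code is not included in this card, so the helper below is a hypothetical illustration rather than the author's implementation: it sets the label of every prompt token to `-100`, the value PyTorch's cross-entropy loss ignores by default, so that only the answer tokens contribute to the loss. The token ids are made up.

```python
from typing import List, Tuple

IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def mask_prompt_labels(
    prompt_ids: List[int], answer_ids: List[int]
) -> Tuple[List[int], List[int]]:
    """Return (input_ids, labels) where only answer tokens carry a loss."""
    input_ids = prompt_ids + answer_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(answer_ids)
    return input_ids, labels


# Toy token ids for illustration only (not real Llama 3 ids).
prompt_ids = [1, 15, 27, 42]
answer_ids = [88, 91, 2]

input_ids, labels = mask_prompt_labels(prompt_ids, answer_ids)
print(input_ids)  # [1, 15, 27, 42, 88, 91, 2]
print(labels)     # [-100, -100, -100, -100, 88, 91, 2]
```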

Dataset Processing

Before training, examples were cleaned and normalized:

Removed empty clinical notes and empty targets
Normalized ICD-10-CM code formatting
Filtered examples with ICD code leakage in the prompt
Removed duplicate examples
Converted each example into Llama 3 chat format
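
The cleaning steps listed above could look roughly like the sketch below. The preprocessing script is not part of this card, so everything here is an assumption made for illustration: the field names `note` and `codes`, the normalization rule (uppercase, dot after the third character), and the deliberately loose regular expression used to spot code-shaped strings in a note. A pattern this loose will also flag strings such as lab abbreviations, so a real pipeline would need something stricter.

```python
import re
from typing import Dict, Iterable, List

# Assumption: a loose pattern for ICD-10-CM-shaped strings (letter, digit,
# alphanumeric, optional dot, up to four more alphanumerics).
CODE_RE = re.compile(r"\b[A-Z][0-9][0-9A-Z](?:\.?[0-9A-Z]{1,4})?\b")


def normalize_code(code: str) -> str:
    """Uppercase, strip whitespace, and place the dot after the third character."""
    raw = code.strip().upper().replace(".", "")
    return raw if len(raw) <= 3 else f"{raw[:3]}.{raw[3:]}"


def clean_examples(examples: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Apply the four filtering/normalization steps to raw examples."""
    seen = set()
    cleaned = []
    for ex in examples:
        note = ex.get("note", "").strip()
        codes = [normalize_code(c) for c in ex.get("codes", "").split(",") if c.strip()]
        if not note or not codes:
            continue  # empty clinical note or empty target
        if CODE_RE.search(note):
            continue  # code-shaped string leaked into the prompt
        key = (note, tuple(codes))
        if key in seen:
            continue  # duplicate after normalization
        seen.add(key)
        cleaned.append({"note": note, "codes": ", ".join(codes)})
    return cleaned


raw = [
    {"note": "Type 2 diabetes without complications.", "codes": "e119"},
    {"note": "Type 2 diabetes without complications.", "codes": "E11.9"},
    {"note": "", "codes": "I10"},
    {"note": "Essential hypertension, coded I10.", "codes": "I10"},
    {"note": "Essential hypertension.", "codes": " i10 "},
]
for row in clean_examples(raw):
    print(row)
```

Of the five toy rows, only the first and the last survive: the second is a duplicate once its code is normalized, the third has an empty note, and the fourth leaks the answer into the note.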

Prompt format: <|begin_of_text|><|start_header_id|>system<|end_header_id|> You are an expert medical coder. Read the clinical note and return only the correct ICD-10-CM code or codes. <|eot_id|><|start_header_id|>user<|end_header_id|> {clinical_note} <|eot_id|><|start_header_id|>assistant<|end_header_id|> {icd_codes}<|eot_id|>
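
Keeping the template in one place helps training and inference stay in sync. The helper functions below are hypothetical (they are not part of this repository) and assume the same line breaks as the generation snippet in the Usage section; the exact whitespace used during training is not documented here.

```python
SYSTEM_PROMPT = (
    "You are an expert medical coder. Read the clinical note and return "
    "only the correct ICD-10-CM code or codes."
)


def build_prompt(clinical_note: str) -> str:
    """Inference prompt: everything up to and including the assistant header."""
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{SYSTEM_PROMPT}\n"
        "<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{clinical_note}\n"
        "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )


def build_training_text(clinical_note: str, icd_codes: str) -> str:
    """Training text: the prompt, then the answer and the end-of-turn token."""
    return build_prompt(clinical_note) + f"{icd_codes}<|eot_id|>"


note = "Patient with essential hypertension."
print(build_training_text(note, "I10"))
```

Splitting the text this way also makes the prompt/answer boundary explicit, which is the boundary a completion-only loss needs.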

Results

| Split | Examples | Precision | Recall | F1 | Exact Match | Micro F1 |
| --- | --- | --- | --- | --- | --- | --- |
| Validation | 2,000 | 0.8705 | 0.8705 | 0.8705 | 0.8705 | 0.8714 |
| Test | 500 | 0.8720 | 0.8720 | 0.8720 | 0.8720 | 0.8755 |

Test evaluation was run on 500 examples due to Kaggle runtime constraints. Validation evaluation was run on 2,000 examples.
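
The card does not say how these metrics were computed, so the sketch below shows one common way to score multi-code predictions, under stated assumptions: predictions and references are comma-separated strings, precision, recall, and F1 are averaged per example over sets of codes, exact match requires the two sets to be equal, and micro F1 pools true positives, false positives, and false negatives across all examples. With this definition, precision, recall, F1, and exact match coincide whenever every example has exactly one predicted and one reference code, which would be consistent with the identical values in the table, although the source does not confirm it.

```python
from typing import Dict, List, Set


def to_code_set(text: str) -> Set[str]:
    """Split a comma-separated answer into a set of uppercase codes."""
    return {c.strip().upper() for c in text.split(",") if c.strip()}


def evaluate(predictions: List[str], references: List[str]) -> Dict[str, float]:
    """Example-averaged P/R/F1, exact match, and micro F1 over code sets."""
    n = len(references)
    p_sum = r_sum = f_sum = exact = 0.0
    tp = fp = fn = 0
    for pred_text, ref_text in zip(predictions, references):
        pred, ref = to_code_set(pred_text), to_code_set(ref_text)
        hits = len(pred & ref)
        p = hits / len(pred) if pred else 0.0
        r = hits / len(ref) if ref else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        p_sum += p
        r_sum += r
        f_sum += f
        exact += float(pred == ref)
        tp += hits
        fp += len(pred - ref)
        fn += len(ref - pred)
    micro_p = tp / (tp + fp) if tp + fp else 0.0
    micro_r = tp / (tp + fn) if tp + fn else 0.0
    micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r) if micro_p + micro_r else 0.0
    return {
        "precision": p_sum / n,
        "recall": r_sum / n,
        "f1": f_sum / n,
        "exact_match": exact / n,
        "micro_f1": micro_f1,
    }


# Toy data for illustration only; these are not the card's evaluation examples.
preds = ["E11.9, I10", "I10", "J45.909"]
refs = ["E11.9, I10", "I10, E78.5", "J45.901"]
metrics = evaluate(preds, refs)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```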

Usage

Load the model

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

base_model = "meta-llama/Meta-Llama-3-8B"  # gated: requires approved access on Hugging Face
adapter_id = "anasxs/icd10-llama3-8b-qlora-adapter"

# Same 4-bit settings as in training: NF4, double quantization, float16 compute.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model)
# Reuse EOS as the pad token and pad on the left for generation.
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

# Load the quantized base model first.
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach the LoRA adapter weights and switch to inference mode.
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()
```

Generate ICD-10-CM codes

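```python
# Continues from the loading snippet above (uses torch, tokenizer, and model).
clinical_note = "Patient is a 58-year-old male with type 2 diabetes mellitus without complications and essential hypertension."

# The prompt follows the Llama 3 chat format described under Dataset Processing.
prompt = f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are an expert medical coder. Read the clinical note and return only the correct ICD-10-CM code or codes.
<|eot_id|><|start_header_id|>user<|end_header_id|>

{clinical_note}
<|eot_id|><|start_header_id|>assistant<|end_header_id|>

"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Stop at the tokenizer's EOS token or at <|eot_id|>, the end-of-turn marker
# that closes the answer in the prompt format.
stop_ids = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")]

with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=32,
        do_sample=False,  # greedy decoding
        num_beams=1,
        eos_token_id=stop_ids,
        pad_token_id=tokenizer.pad_token_id,
    )

# Decode only the newly generated tokens, not the prompt.
prediction = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
).strip()

print(prediction)
# Expected: E11.9, I10
```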

Intended Use

This adapter is intended for:

ML engineering portfolio demonstration
QLoRA and PEFT experimentation
Educational examples of fine-tuning open-weight LLMs
Synthetic EHR medical coding research practice

Limitations

Trained on synthetic EHR data, not real patient records, so the results do not demonstrate clinical usefulness
May generate malformed text around ICD-10-CM codes
A production system would require stronger output parsing, clinical validation, privacy review, monitoring, and expert human oversight
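
As a small step toward stronger output parsing, generated text can be filtered down to strings that have the shape of an ICD-10-CM code. The sketch below is an assumption-laden illustration, not part of this repository: the function name is hypothetical, and the pattern checks only the format (letter, digit, alphanumeric, optional dot plus up to four alphanumerics). It does not verify that a code exists in the ICD-10-CM code set or that it is clinically correct.

```python
import re
from typing import List

# Format check only: this does not confirm that a code is a real ICD-10-CM entry.
ICD10_SHAPE = re.compile(r"\b[A-Z][0-9][0-9A-Z](?:\.[0-9A-Z]{1,4})?\b")


def extract_codes(generated_text: str) -> List[str]:
    """Return code-shaped strings in order of first appearance, without repeats."""
    seen = set()
    codes = []
    for match in ICD10_SHAPE.finditer(generated_text.upper()):
        code = match.group(0)
        if code not in seen:
            seen.add(code)
            codes.append(code)
    return codes


print(extract_codes("E11.9, I10"))                 # ['E11.9', 'I10']
print(extract_codes("Codes: e11.9 and I10. I10"))  # ['E11.9', 'I10']
print(extract_codes("no codes here"))              # []
```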

Out-of-Scope Use

Do not use this model for:

Real patient care
Billing or insurance claims
Diagnosis or treatment decisions
Clinical coding automation without expert review
Any regulated medical workflow

Reproducibility

Trained with:

PyTorch · Hugging Face Transformers · PEFT · TRL · bitsandbytes · datasets
Weights & Biases (experiment tracking)
Kaggle T4 ×2
Hugging Face Hub checkpointing (to survive Kaggle session resets)

License

This adapter depends on meta-llama/Meta-Llama-3-8B. Users must comply with the Llama 3 Community License and Hugging Face access requirements.
