Model Details
- Developed by: k111191114
- Base model: google/gemma-3-4b-it
- Model type: Text generation language model
- Languages: Vietnamese, English
- License: Apache 2.0
- Library: Transformers
- Training framework: Unsloth + Hugging Face TRL
The finetuned model achieved a score of 74.16 on the test dataset, compared to 42.62 for the original base model.
Table with columns: Model, Score| Model | Score |
|---|
| Original model | 42.62 |
| Finetuned model | 74.16 |
Training Data
The model was finetuned using Vietnamese and English medical question-answering and healthcare-related datasets.
Datasets used include:
-
PB3002/ViMedical_Disease
A Vietnamese dataset of over 12,000 questions about common disease symptoms.
Used for Vietnamese healthcare chatbot and disease/symptom prediction tasks.
-
hungnm/vietnamese-medical-qa
A Vietnamese medical question-answering dataset with approximately 9.3k samples.
-
urnus11/Vietnamese-Healthcare
A Vietnamese healthcare dataset with approximately 173k samples.
-
NIDDK Diabetes Overview
Medical information about diabetes from the National Institute of Diabetes and Digestive and Kidney Diseases:
https://www.niddk.nih.gov/health-information/diabetes/overview
-
PubMedQA
A biomedical question-answering dataset containing:
- Around 1,000 expert-labeled questions
- Around 61,200 unlabeled questions
- Around 211,300 artificially generated questions
-
Training
This model was trained with Unsloth and Hugging Face's TRL library.
Unsloth was used to make finetuning faster and more memory-efficient.
Intended Use
This model is intended for research and educational use in Vietnamese and English medical text-generation tasks, such as:
- Vietnamese medical question answering
- Healthcare chatbot research
- Medical text vẻification
- Medical assistant prototyping
Out-of-Scope Use
This model should not be used as a replacement for professional medical advice, diagnosis, or treatment.
Do not use this model as the sole basis for:
- Medical diagnosis
- Emergency medical decisions
- Prescription or dosage recommendations
- Treatment planning
- Clinical decision-making without human medical supervision
Limitations
This model may produce inaccurate, incomplete, outdated, biased, or hallucinated medical information.
Medical information generated by the model should always be verified by qualified healthcare professionals and trusted medical sources.
The model may also have limitations in:
- Rare diseases
- Complex clinical cases
- Emergency symptoms
- Drug interactions
- Patient-specific recommendations
- Non-Vietnamese or non-English medical contexts
How to Use
from transformers import AutoTokenizer, AutoModelForCausalLM
model_name = "k111191114/gemma-3-finetune-medical-vie"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
prompt = "Thường xuyên bị nhiễm trùng là triệu chứng của bệnh tiểu đường."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
**inputs,
max_new_tokens=256,
temperature=0.7,
top_p=0.9,
)
Acknowledgements
This model was trained with Unsloth and Hugging Face's TRL library.