Run this model inference on single tenant GPU with unmatched speed and reliability at scale.
Run this model inference with full control and performance in your environment.
Get help setting up a custom Dedicated Endpoints.
Talk with our engineer to get a quote for reserved GPU instances with discounts.
README
Model Details
Model Description
This is a fine-tuned version of unsloth/Llama-3.1-8B using QLoRA (Quantized Low-Rank Adaptation) via Unsloth. The model is trained on Ichsan2895/alpaca-gpt4-indonesian dataset containing 49,969 Indonesian instruction-response pairs.
The model uses Llama 3.1 chat template with the system prompt: "Kamu adalah asisten AI yang membantu menjawab pertanyaan pengguna berdasarkan konteks yang diberikan."
- Developed by: Threedotz
- Model type: Language Model (Causal LM)
- Language(s) (NLP): Indonesian (id)
- License: llama3.1 license
- Finetuned from model: unsloth/Llama-3.1-8B
Model Sources
- Repository: https://huggingface.co/threedotz/llama3.1-8b-qlora-alpaca-indonesian
- Base Model: https://huggingface.co/unsloth/Llama-3.1-8B
- Training Dataset: https://huggingface.co/datasets/Ichsan2895/alpaca-gpt4-indonesian
Uses
Direct Use
Indonesian language tasks:
- Question answering in Bahasa Indonesia
- Instruction following
- Text generation in Indonesian
Out-of-Scope Use
- Non-Indonesian language tasks
- Medical, legal, or financial advice without human oversight
How to Get Started with the Model
python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamerimport torch# Load modelmodel_name = "threedotz/llama3.1-8b-qlora-alpaca-indonesian"tokenizer = AutoTokenizer.from_pretrained(model_name)model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")# Promptprompt = "Apa itu machine learning?"# Tokenize & generateinputs = tokenizer(prompt, return_tensors="pt").to("cuda")text_streamer = TextStreamer(tokenizer, skip_prompt=True)_ = model.generate(input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"],max_new_tokens=256, streamer=text_streamer)
Training Details
Training Data
- Dataset: Ichsan2895/alpaca-gpt4-indonesian
- Size: 49,969 instruction-response pairs
- Language: Indonesian (Bahasa Indonesia)
- Format: Alpaca instruction format (input → output)
- License: CC-BY-SA-4.0
Training Procedure
Preprocessing
Dataset formatted using Llama 3.1 chat template with system prompt.
Training Hyperparameters
| Hyperparameter | Value |
|---|---|
| Training regime | 4-bit QLoRA (bf16/fp16) |
| Max steps | 800 |
| Per device batch size | 1 |
| Gradient accumulation steps | 4 |
| Total batch size | 4 |
| Learning rate | 2e-4 |
| LR scheduler | linear |
| Warmup steps | 5 |
| Max sequence length | 512 |
| Optimizer | paged_adamw_8bit |
| Gradient checkpointing | unsloth |
QLoRA Configuration
| Parameter | Value |
|---|---|
| LoRA rank (r) | 16 |
| LoRA alpha | 16 |
| LoRA dropout | 0 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Load in 4-bit | True |
Model Statistics
| Metric | Value |
|---|---|
| Total parameters | 8,072,204,288 |
| Trainable parameters | 41,943,040 |
| Trainable % | 0.52% |
Technical Specifications
Model Architecture
- Architecture: Llama 3.1 (Decoder-only Transformer)
- Parameters: 8 billion
- Quantization: 4-bit (QLoRA)
- Max sequence length: 2048
Compute Infrastructure
- GPU: Tesla T4 (Kaggle)
- Training time: ~1h 17min 11s
- Framework: Unsloth + TRL + Transformers
Citation
BibTeX:
bibtex
@misc{threedotz2024llama31indonesian,author = {Threedotz},title = {Llama 3.1 8B QLoRA Fine-tuned on Alpaca Indonesian},year = {2024},publisher = {HuggingFace},url = {https://huggingface.co/threedotz/llama3.1-8b-qlora-alpaca-indonesian}}
Model Card Contact
For questions, please contact Threedotz on HuggingFace.
Model provider
Threedotz
Model tree
Base
this model
Modalities
Input
Text
Output
Text
Pricing
Dedicated Endpoints
View detailsSupported Functionality
Model APIs
Dedicated Endpoints
Container
More information