azherali
Aqal-1.0-8B-Instruct
README
License: apache-2.0
Quick start
```python
from unsloth import FastLanguageModel  # import unsloth before transformers
from transformers import TextStreamer

max_seq_length = 2048  # Choose any! RoPE scaling is supported internally.
dtype = None  # None for auto detection. Float16 for Tesla T4, V100; Bfloat16 for Ampere+.
load_in_4bit = False  # Use 4-bit quantization to reduce memory usage. Can be False.
load_in_8bit = False  # Use 8-bit quantization to reduce memory usage. Can be False.

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="azherali/Aqal-1.0-8B-Instruct",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
    load_in_8bit=load_in_8bit,
    # token="YOUR_HF_TOKEN",  # HF token for gated models
)
FastLanguageModel.for_inference(model)  # Enable native 2x faster inference

messages = [
    {
        "role": "user",
        # "Five children shared 20 chocolates equally. How many chocolates will each child get?"
        "content": "پانچ بچوں نے 20 چاکلیٹس برابر بانٹیں۔ ہر بچے کو کتنی چاکلیٹس ملیں گی؟",
    }
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # Must add for generation
)

_ = model.generate(
    **tokenizer(text, return_tensors="pt").to("cuda"),
    max_new_tokens=512,  # output length budget (assumed value); adjust as needed
    do_sample=True,  # sampling must be on for temperature/top_p/top_k to take effect
    temperature=0.6,
    top_p=0.95,
    top_k=20,  # For non-thinking mode
    streamer=TextStreamer(tokenizer, skip_prompt=True),
)
```
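The quick-start call samples with temperature=0.6, top_p=0.95 and top_k=20. The sketch below is a simplified, pure-Python illustration of what those three settings do to one vector of next-token logits: scale by temperature, keep the top-k candidates, then keep the smallest "nucleus" whose probability reaches top_p. It is not the transformers implementation (which applies logits processors to batched tensors), and the function names are hypothetical.

```python
import math
import random


def filter_candidates(logits, temperature=0.6, top_k=20, top_p=0.95):
    """Return [(token_id, prob), ...] left after temperature, top-k and top-p filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # 1. Temperature scaling: values below 1.0 sharpen the distribution.
    scaled = [x / temperature for x in logits]
    # 2. Top-k: keep the k highest-scoring token ids (ties keep their original order).
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax over the survivors; subtracting the max avoids overflow in exp().
    peak = scaled[ranked[0]]
    weights = [math.exp(scaled[i] - peak) for i in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]
    # 3. Top-p (nucleus): keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for token_id, p in zip(ranked, probs):
        kept.append((token_id, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalise so the kept probabilities sum to 1.
    mass = sum(p for _, p in kept)
    return [(t, p / mass) for t, p in kept]


def sample_next_token(logits, temperature=0.6, top_k=20, top_p=0.95, rng=None):
    """Draw one token id from the filtered distribution."""
    rng = rng or random.Random()
    candidates = filter_candidates(logits, temperature, top_k, top_p)
    r = rng.random()
    for token_id, p in candidates:
        r -= p
        if r < 0:
            return token_id
    return candidates[-1][0]  # guard against floating-point rounding
```

For example, with logits [2.0, 1.0, 0.5, -1.0], temperature 0.6, top_k=3 and top_p=0.9, only token ids 0 and 1 survive: the top two already cover about 93% of the probability mass. Lower temperature or smaller top_k/top_p make the output more deterministic; top_k=1 reduces to greedy decoding.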
Training procedure
This model was trained with supervised fine-tuning (SFT).
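The card does not describe the training data or its format. As an assumption, the sketch below shows the conversational "messages" layout that TRL's SFTTrainer accepts, built from (prompt, response) pairs and serialised as JSONL. The helper names are hypothetical and this is not the author's actual data pipeline.

```python
import json


def to_conversational_record(prompt, response, system=None):
    """Build one example in the chat "messages" layout (hypothetical helper)."""
    messages = []
    if system is not None:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})
    messages.append({"role": "assistant", "content": response})
    return {"messages": messages}


def to_jsonl(pairs):
    """Serialise (prompt, response) pairs as one JSON object per line.

    ensure_ascii=False keeps Urdu script readable instead of \\u-escaping it.
    """
    return "".join(
        json.dumps(to_conversational_record(p, r), ensure_ascii=False) + "\n"
        for p, r in pairs
    )
```

For instance, a hypothetical pair made of the quick-start question and the answer "ہر بچے کو 4 چاکلیٹس ملیں گی۔" ("Each child will get 4 chocolates", since 20 / 5 = 4) becomes one JSONL line. Such a file could then be loaded with `datasets.load_dataset("json", data_files=...)` and passed to `SFTTrainer` as `train_dataset`.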
Framework versions
- TRL: 0.22.2
- Transformers: 4.56.2
- PyTorch: 2.12.0+rocm7.2
- Datasets: 4.3.0
- Tokenizers: 0.22.2
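To compare a local environment against the versions listed above, a small check using the standard library's `importlib.metadata` is enough. This is a convenience sketch, not part of the original training setup; it compares release numbers only and ignores local build tags such as `+rocm7.2`, on the assumption that a CUDA or ROCm build of the same release is equivalent for this purpose.

```python
from importlib import metadata

# Versions listed in the "Framework versions" section of this model card.
EXPECTED = {
    "trl": "0.22.2",
    "transformers": "4.56.2",
    "torch": "2.12.0+rocm7.2",
    "datasets": "4.3.0",
    "tokenizers": "0.22.2",
}


def base_version(v):
    """Drop a local build tag such as '+rocm7.2' or '+cu121'."""
    return v.split("+", 1)[0]


def compare_versions(expected, installed):
    """Return {package: (expected, installed_or_None)} for packages whose release differs."""
    mismatches = {}
    for pkg, want in expected.items():
        have = installed.get(pkg)
        if have is None or base_version(have) != base_version(want):
            mismatches[pkg] = (want, have)
    return mismatches


def installed_versions(packages):
    """Look up installed distribution versions, mapping missing packages to None."""
    found = {}
    for pkg in packages:
        try:
            found[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            found[pkg] = None
    return found


if __name__ == "__main__":
    for pkg, (want, have) in compare_versions(EXPECTED, installed_versions(EXPECTED)).items():
        print(f"{pkg}: card lists {want}, installed {have}")
```

A mismatch is only a hint: newer releases usually load the model fine, but matching versions is the safest way to reproduce the reported setup.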
Model provider: azherali
Model tree
- Base: azherali/Aqal-1.0-8B
- Fine-tuned: this model
Modalities
- Input: Text
- Output: Text
Supported functionality
- Model APIs
- Dedicated Endpoints
- Container