License: apache-2.0
Model Description
This model is a fine-tuned version of Qwen/Qwen1.5-1.8B-Chat adapted for Text-to-SQL generation using the Spider dataset.
Fine-tuning used QLoRA (Quantized Low-Rank Adaptation), a parameter-efficient method that keeps the base model frozen in 4-bit precision and trains only a small set of adapter weights instead of the full model.
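To give a sense of how small the adapter is, here is a rough parameter count for the LoRA rank and target modules listed under Hyperparameters. The hidden size (2048), layer count (24), and square attention projections are assumptions about the base model's architecture, not values stated in this card, so treat the result as an estimate.

```python
# Rough count of trainable LoRA parameters on the attention projections.
# ASSUMPTION: hidden_size=2048, 24 transformer layers, and square
# 2048x2048 q/k/v/o projections. These are not stated in this card.

def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds two matrices per adapted layer: A (rank x in) and B (out x rank)."""
    return rank * (in_features + out_features)

HIDDEN_SIZE = 2048          # assumed
NUM_LAYERS = 24             # assumed
RANK = 16                   # from the hyperparameter table
TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj"]
BASE_PARAMS = 1.8e9         # nominal size of the 1.8B base model

per_module = lora_params(HIDDEN_SIZE, HIDDEN_SIZE, RANK)
trainable = per_module * len(TARGET_MODULES) * NUM_LAYERS

print(f"{trainable:,} trainable adapter parameters")
print(f"{trainable / BASE_PARAMS:.2%} of the nominal base model size")
```

Under these assumptions the adapter holds a few million parameters, well under 1% of the base model.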
Intended Use
Convert natural language questions into SQL queries.
Example:
- Input: "How many singers do we have?"
- Output: `SELECT count(*) FROM singer`
Training Data
- Dataset: Spider
- Samples used: 500 training samples (subset)
- Format: Qwen chat instruction format
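The exact training template is not given in this card. The sketch below assumes each training sample mirrors the inference prompt shown under How to Load and Use, with the gold SQL appended as the assistant turn. The helper name `format_sample` is hypothetical.

```python
# Sketch of one training sample in Qwen chat (ChatML-style) format.
# ASSUMPTION: the training template matches the inference prompt in this card,
# with the gold SQL as the assistant reply. `format_sample` is a hypothetical name.

def format_sample(question: str, db_id: str, sql: str) -> str:
    return (
        "<|im_start|>system\n"
        "You are an expert SQL assistant.<|im_end|>\n"
        "<|im_start|>user\n"
        f"Database: {db_id}\n"
        f"Question: {question}\n"
        "Write only the SQL query.<|im_end|>\n"
        "<|im_start|>assistant\n"
        f"{sql}<|im_end|>\n"
    )

sample = format_sample(
    "How many singers do we have?",
    "concert_singer",
    "SELECT count(*) FROM singer",
)
print(sample)
```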
Training Procedure
Hardware
- GPU: NVIDIA GeForce RTX 5060 Laptop GPU (8GB VRAM)
- Training time: ~40 minutes
Hyperparameters
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen1.5-1.8B-Chat |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Target modules | q_proj, v_proj, k_proj, o_proj |
| Learning rate | 2e-4 |
| Batch size | 8 |
| Gradient accumulation | 2 (effective batch: 16) |
| Epochs | 2 |
| LR scheduler | cosine |
| Quantization | 4-bit NF4 (QLoRA) |
| Max sequence length | 512 |
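The batch and schedule settings above imply a fairly short run. The sketch below works out the number of optimizer steps for 500 samples and evaluates a plain cosine decay from the peak learning rate. It assumes no warmup and that a partial final batch still counts as a step; neither detail is stated in this card, and a given training framework may round differently.

```python
import math

# Values from this card.
NUM_SAMPLES = 500
BATCH_SIZE = 8
GRAD_ACCUM = 2
EPOCHS = 2
PEAK_LR = 2e-4

effective_batch = BATCH_SIZE * GRAD_ACCUM
# ASSUMPTION: a partial final batch still produces an optimizer step.
steps_per_epoch = math.ceil(NUM_SAMPLES / effective_batch)
total_steps = steps_per_epoch * EPOCHS

def cosine_lr(step: int, total: int, peak: float) -> float:
    """Cosine decay from `peak` to 0 over `total` steps (ASSUMPTION: no warmup)."""
    return 0.5 * peak * (1.0 + math.cos(math.pi * step / total))

print(f"effective batch: {effective_batch}, optimizer steps: {total_steps}")
for step in (0, total_steps // 2, total_steps):
    print(f"step {step:3d}: lr = {cosine_lr(step, total_steps, PEAK_LR):.2e}")
```

With these settings the whole run is only a few dozen optimizer steps, which is consistent with the short training time reported above.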
Evaluation Results
SQL Generation (Spider validation set, 200 samples)
| Metric | Baseline | Fine-Tuned | Improvement |
|---|---|---|---|
| Exact Match Accuracy | 0.0% | 6.0% | +6.0 pp |
| Avg Token Match | 34.39% | 54.75% | +20.36 pp |
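The card does not define how the two metrics were computed. One plausible reading, sketched below, treats exact match as equality after whitespace and case normalization, and token match as the fraction of gold tokens that also appear in the prediction. Both definitions and the function names are assumptions for illustration, not the evaluation code used for the table.

```python
from collections import Counter

# ASSUMPTION: these metric definitions are illustrative, not the ones
# used to produce the numbers in this card.

def normalize(sql: str) -> str:
    """Lowercase, drop a trailing semicolon, and collapse whitespace."""
    return " ".join(sql.strip().rstrip(";").lower().split())

def exact_match(pred: str, gold: str) -> bool:
    return normalize(pred) == normalize(gold)

def token_match(pred: str, gold: str) -> float:
    """Fraction of gold tokens also present in the prediction (multiset overlap)."""
    pred_tokens = Counter(normalize(pred).split())
    gold_tokens = Counter(normalize(gold).split())
    if not gold_tokens:
        return 0.0
    overlap = sum((pred_tokens & gold_tokens).values())
    return overlap / sum(gold_tokens.values())

gold = "SELECT count(*) FROM singer"
print(exact_match("select COUNT(*)  from singer;", gold))
print(token_match("SELECT name FROM singer", gold))
```

A token-level score can be high even when the query is wrong, which is why the two rows in the table differ so much.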
Catastrophic Forgetting Check (MMLU)
| Metric | Score |
|---|---|
| MMLU Accuracy (50 samples) | 16.0% |
| Random baseline | 25.0% |
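At 16.0% on 50 samples, the MMLU score is below the 25.0% random baseline, so this check does not show that the model retains general knowledge after SQL fine-tuning. A larger sample and a comparison against the base model's score would be needed to draw a conclusion.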
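As a rough guide to how much a 50-sample score can say, the sketch below computes a normal-approximation 95% confidence interval for 8 correct answers out of 50 (16.0%). The choice of interval method is an assumption made for illustration; a Wilson or exact binomial interval would give slightly different bounds.

```python
import math

def normal_ci(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation confidence interval for a proportion."""
    p = correct / total
    se = math.sqrt(p * (1.0 - p) / total)
    return p - z * se, p + z * se

RANDOM_BASELINE = 0.25  # four-way multiple choice

low, high = normal_ci(correct=8, total=50)  # 8/50 = 16.0%
print(f"95% CI: {low:.1%} to {high:.1%}")
print("random baseline inside interval:", low < RANDOM_BASELINE < high)
```

The interval is about ten percentage points wide on each side, so a 50-sample result cannot separate this score from random guessing.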
Limitations
- Trained on a 500-sample subset of Spider; the full dataset has 7,000+ samples
- May struggle with complex multi-table JOIN queries
- Best performance on simple SELECT, COUNT, GROUP BY queries
- Training was short (two epochs on a single laptop GPU); longer training would likely improve results
How to Load and Use
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

base_model_name = "Qwen/Qwen1.5-1.8B-Chat"
adapter_name = "faltooz123/qwen1.5-sql-qlora-spider"

tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)

# Load the base model in 4-bit NF4, matching the quantization used for training.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    device_map={"": 0},
    trust_remote_code=True,
)

# Attach the LoRA adapter weights on top of the quantized base model.
model = PeftModel.from_pretrained(base_model, adapter_name)
model.eval()


def generate_sql(question, db_id):
    # Build the prompt in Qwen chat format.
    prompt = (
        "<|im_start|>system\n"
        "You are an expert SQL assistant.<|im_end|>\n"
        "<|im_start|>user\n"
        f"Database: {db_id}\n"
        f"Question: {question}\n"
        "Write only the SQL query.<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    with torch.no_grad():
        outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    # Decode only the newly generated tokens, not the prompt.
    generated = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(generated, skip_special_tokens=True).strip()


sql = generate_sql("How many singers do we have?", "concert_singer")
print(sql)
```
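Given the low exact-match accuracy reported above, it may be worth checking that a generated query at least compiles against the target schema before running it. The sketch below uses Python's built-in `sqlite3` with an in-memory database. The helper name `is_valid_sql` and the `singer` table definition are hypothetical and stand in for the real Spider schema.

```python
import sqlite3

def is_valid_sql(query: str, schema_ddl: str) -> bool:
    """Return True if `query` compiles against the schema, without running it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_ddl)
        # EXPLAIN QUERY PLAN makes SQLite parse and plan the statement only.
        conn.execute(f"EXPLAIN QUERY PLAN {query}")
        return True
    except (sqlite3.Error, sqlite3.Warning):
        return False
    finally:
        conn.close()

# ASSUMPTION: a simplified, hypothetical schema for illustration only.
SCHEMA = "CREATE TABLE singer (singer_id INTEGER PRIMARY KEY, name TEXT, country TEXT);"

print(is_valid_sql("SELECT count(*) FROM singer", SCHEMA))
print(is_valid_sql("SELECT count(*) FROM missing_table", SCHEMA))
```

This only catches syntax errors and references to missing tables or columns; a query can pass this check and still return the wrong answer.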
Citation
```bibtex
@misc{qwen1.5-sql-qlora,
  title = {Qwen1.5-1.8B SQL Fine-Tuned with QLoRA on Spider},
  year = {2025},
}
```
Model provider: faltooz123
Base model: Qwen/Qwen1.5-1.8B-Chat (this model is an adapter)
Modalities: text input, text output