License: apache-2.0

Model Description

This model is a fine-tuned version of Qwen/Qwen1.5-1.8B-Chat adapted for Text-to-SQL generation using the Spider dataset.

Fine-tuning used QLoRA (Quantized Low-Rank Adaptation), a parameter-efficient method that keeps the base model frozen in 4-bit precision and trains only a small set of low-rank adapter weights instead of the full model.

Intended Use

Convert natural language questions into SQL queries.

Example:

  • Input: "How many singers do we have?"
  • Output: SELECT count(*) FROM singer

Training Data

  • Dataset: Spider
  • Samples used: 500 training samples (subset)
  • Format: Qwen chat instruction format
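
The exact training template is not spelled out here. Below is a minimal sketch of how a Spider record might be rendered into the Qwen chat (ChatML) format. It assumes the same prompt layout as the inference example under "How to Load and Use", with the gold SQL as the assistant turn, and it uses Spider's `db_id`, `question`, and `query` fields. `format_example` is a hypothetical helper, not the author's code.

```python
def format_example(record: dict) -> str:
    """Render one Spider record as a ChatML training string.

    Assumption: training used the same layout as the inference prompt,
    with the gold SQL query as the assistant's reply.
    """
    return (
        "<|im_start|>system\n"
        "You are an expert SQL assistant.<|im_end|>\n"
        "<|im_start|>user\n"
        f"Database: {record['db_id']}\n"
        f"Question: {record['question']}\n"
        "Write only the SQL query.<|im_end|>\n"
        "<|im_start|>assistant\n"
        f"{record['query']}<|im_end|>\n"
    )


sample = {
    "db_id": "concert_singer",
    "question": "How many singers do we have?",
    "query": "SELECT count(*) FROM singer",
}
print(format_example(sample))
```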

Training Procedure

Hardware

  • GPU: NVIDIA GeForce RTX 5060 Laptop GPU (8GB VRAM)
  • Training time: ~40 minutes

Hyperparameters

| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen1.5-1.8B-Chat |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Target modules | q_proj, v_proj, k_proj, o_proj |
| Learning rate | 2e-4 |
| Batch size | 8 |
| Gradient accumulation | 2 (effective batch: 16) |
| Epochs | 2 |
| LR scheduler | cosine |
| Quantization | 4-bit NF4 (QLoRA) |
| Max sequence length | 512 |
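
These hyperparameters map onto standard `peft` and `transformers` configuration objects. The following is a sketch of that mapping, not the author's training script. The `bias` and `task_type` settings, the `output_dir` path, and the use of `TrainingArguments` are assumptions, and the compute dtype is borrowed from the loading example later in this card.

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig, TrainingArguments

# 4-bit NF4 quantization of the frozen base model (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # assumption: same as the loading example
)

# LoRA adapters on the four attention projections.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    bias="none",            # assumption: not stated in the card
    task_type="CAUSAL_LM",  # assumption: not stated in the card
)

# Optimizer schedule: 8 samples x 2 accumulation steps = effective batch of 16.
training_args = TrainingArguments(
    output_dir="qwen-sql-qlora",  # hypothetical path
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,
    num_train_epochs=2,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
)

# Assumption: used as the truncation length when tokenizing training samples.
MAX_SEQ_LEN = 512
```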

Evaluation Results

SQL Generation (Spider validation set, 200 samples)

| Metric | Baseline | Fine-Tuned | Improvement (percentage points) |
|---|---|---|---|
| Exact Match Accuracy | 0.0% | 6.0% | +6.0 |
| Avg Token Match | 34.39% | 54.75% | +20.36 |
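
The card does not define either metric. One plausible reading, sketched below as an assumption rather than the author's evaluation script, treats exact match as string equality after case and whitespace normalization, and token match as the fraction of reference tokens that also appear in the prediction. Spider's official exact-match metric compares parsed SQL components instead, so it is not the same as this string comparison. The helper names `normalize`, `exact_match`, and `token_match` are hypothetical.

```python
from collections import Counter


def normalize(sql: str) -> list[str]:
    """Lowercase, drop a trailing semicolon, and split on whitespace."""
    return sql.strip().rstrip(";").lower().split()


def exact_match(pred: str, gold: str) -> bool:
    """True when prediction and reference have identical normalized tokens."""
    return normalize(pred) == normalize(gold)


def token_match(pred: str, gold: str) -> float:
    """Fraction of gold tokens (with multiplicity) also present in pred."""
    gold_tokens = normalize(gold)
    if not gold_tokens:
        return 0.0
    overlap = Counter(normalize(pred)) & Counter(gold_tokens)
    return sum(overlap.values()) / len(gold_tokens)


# Two toy (prediction, reference) pairs for illustration only.
pairs = [
    ("SELECT count(*) FROM singer;", "SELECT count(*) FROM singer"),
    ("SELECT name FROM singer", "SELECT count(*) FROM singer"),
]
em = sum(exact_match(p, g) for p, g in pairs) / len(pairs)
tm = sum(token_match(p, g) for p, g in pairs) / len(pairs)
print(f"exact match: {em:.1%}, avg token match: {tm:.1%}")
```

Under this definition a prediction can score high on token match while still being a different query, which is one way a 54.75% token match can coexist with 6.0% exact match.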

Catastrophic Forgetting Check (MMLU)

| Metric | Score |
|---|---|
| MMLU Accuracy (50 samples) | 16.0% |
| Random baseline | 25.0% |

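At 16.0% on 50 samples, the score falls below the 25.0% random baseline, so this check does not show that general knowledge is retained after SQL fine-tuning. The sample is small and no base-model score is reported for comparison, so the result is inconclusive.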
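
With only 50 questions, the sampling error on this score is large. As a worked check (added for illustration, not part of the original evaluation), a 95% Wilson score interval for 8 correct answers out of 50 can be computed with the standard library. The count of 8 is inferred from 16.0% of 50.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


low, high = wilson_interval(correct=8, total=50)  # 16.0% of 50 samples
print(f"95% interval: {low:.1%} to {high:.1%}")
print("random baseline inside interval:", low <= 0.25 <= high)
```

The interval runs from roughly 8% to 29%, which includes the 25% random baseline, so a 50-question sample cannot distinguish this score from random guessing.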

Limitations

  • Trained on a 500-sample subset of Spider; the full training set has 7,000+ samples
  • May struggle with complex multi-table JOIN queries
  • Best performance on simple SELECT, COUNT, GROUP BY queries
  • Fine-tuned only briefly, for 2 epochs on 500 samples; longer training on more data would likely improve results

How to Load and Use

python

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

base_model_name = "Qwen/Qwen1.5-1.8B-Chat"
adapter_name = "faltooz123/qwen1.5-sql-qlora-spider"

tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)

# Load the base model in 4-bit NF4, matching the quantization used for training.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    device_map={"": 0},  # place the whole model on GPU 0
    trust_remote_code=True,
)

# Attach the LoRA adapter weights on top of the quantized base model.
model = PeftModel.from_pretrained(base_model, adapter_name)
model.eval()


def generate_sql(question, db_id):
    # Build the prompt in Qwen's chat format, ending at the open assistant turn.
    prompt = (
        "<|im_start|>system\n"
        "You are an expert SQL assistant.<|im_end|>\n"
        "<|im_start|>user\n"
        f"Database: {db_id}\n"
        f"Question: {question}\n"
        "Write only the SQL query.<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    with torch.no_grad():
        # Greedy decoding keeps the output deterministic.
        outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    # Keep only the newly generated tokens, dropping the prompt.
    generated = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(generated, skip_special_tokens=True).strip()


sql = generate_sql("How many singers do we have?", "concert_singer")
print(sql)
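
Given the low exact-match accuracy reported in the evaluation, it can help to check that a generated query at least compiles against the target schema before using it. Here is a minimal sketch with the standard-library `sqlite3` module, assuming the schema is available as DDL. The `singer` table definition and the `is_valid_sql` helper are hypothetical stand-ins, not part of the model or of Spider's tooling.

```python
import sqlite3


def is_valid_sql(sql: str, schema_ddl: str) -> bool:
    """Return True if `sql` compiles against an empty in-memory copy of the schema."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_ddl)
        # EXPLAIN compiles the statement without returning query results.
        conn.execute(f"EXPLAIN {sql}")
        return True
    except (sqlite3.Error, sqlite3.Warning):
        # Syntax errors, unknown tables or columns, multi-statement input.
        return False
    finally:
        conn.close()


# Hypothetical schema for illustration only.
schema = "CREATE TABLE singer (singer_id INTEGER PRIMARY KEY, name TEXT);"

print(is_valid_sql("SELECT count(*) FROM singer", schema))
print(is_valid_sql("SELECT count(*) FROM singers", schema))
```

A query that passes this check is syntactically valid and references existing tables and columns, but it may still answer a different question than the one asked.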

Citation

bibtex

@misc{qwen1.5-sql-qlora,
  title = {Qwen1.5-1.8B SQL Fine-Tuned with QLoRA on Spider},
  year  = {2025},
}

Model provider

  • faltooz123

Model tree

  • Base: Qwen/Qwen1.5-1.8B-Chat
  • Adapter: this model

Modalities

  • Input: Text
  • Output: Text
