License: apache-2.0
Results
Evaluated on a held-out internal text-to-SQL test set (500 examples, schema-aware prompt):
| Model | Exact Set Match | Valid SQL |
|---|---|---|
| mistralai/Mistral-7B-v0.3 (base) | 9.2% | 86.2% |
| SQLForge-Mistral-7B-QLoRA | 87.0% | 97.4% |
The adapter raises exact set match from 9.2% to 87.0% (about 9.5×) and the valid-SQL rate from 86.2% to 97.4% (+11.2 pp), using a LoRA adapter of roughly 336 MB.
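The headline figures can be re-derived from the table with a few lines of arithmetic. This is only a sanity check; the implied example counts assume each percentage is computed over all 500 test examples.

```python
# Figures copied from the results table above.
TEST_SET_SIZE = 500
results = {
    "base": {"exact_set_match": 9.2, "valid_sql": 86.2},
    "adapter": {"exact_set_match": 87.0, "valid_sql": 97.4},
}

# Relative gain in exact set match and absolute gain in validity.
ratio = results["adapter"]["exact_set_match"] / results["base"]["exact_set_match"]
validity_lift_pp = results["adapter"]["valid_sql"] - results["base"]["valid_sql"]

# Implied example counts (assumption: percentages are over all 500 examples).
counts = {
    name: {metric: round(pct / 100 * TEST_SET_SIZE) for metric, pct in row.items()}
    for name, row in results.items()
}

print(f"exact-match ratio: {ratio:.1f}x")
print(f"validity lift: {validity_lift_pp:.1f} pp")
print(counts)
```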
Usage
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

BASE = "mistralai/Mistral-7B-v0.3"
ADAPT = "shreyash-pandey-katni/SQLForge-Mistral-7B-QLoRA"

# Load the base model in BF16, then attach the LoRA adapter.
tok = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(
    BASE,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, ADAPT)
model.eval()

schema = "CREATE TABLE employees (id INT, name TEXT, dept TEXT, salary REAL);"
question = "List the names of employees in the Engineering department earning over 100000."

prompt = f"""[INST] You are an expert SQL assistant. Given the schema and question, write a single SQL query.

Schema:
{schema}

Question: {question} [/INST]"""

inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
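When calling the model for many questions, it can help to wrap the prompt template and a light clean-up step in small functions. The sketch below is illustrative: `build_prompt` and `first_statement` are hypothetical helpers that are not part of this repository, and the exact line breaks inside the template are an assumption that should be checked against the training format.

```python
def build_prompt(schema: str, question: str) -> str:
    """Format one request in the [INST] ... [/INST] template (hypothetical helper)."""
    return (
        "[INST] You are an expert SQL assistant. "
        "Given the schema and question, write a single SQL query.\n\n"
        f"Schema:\n{schema.strip()}\n\n"
        f"Question: {question.strip()} [/INST]"
    )


def first_statement(generated: str) -> str:
    """Keep only the text up to the first semicolon (hypothetical post-processing)."""
    text = generated.strip()
    head, sep, _ = text.partition(";")
    # If no semicolon was generated, return the text unchanged.
    return (head.strip() + ";") if sep else text


prompt = build_prompt(
    "CREATE TABLE employees (id INT, name TEXT, dept TEXT, salary REAL);",
    "List the names of employees in the Engineering department earning over 100000.",
)
print(prompt)
print(first_statement("SELECT name FROM employees; SELECT 1;"))
```

The semicolon split is deliberately naive: it would cut a query short if a string literal contained a semicolon, so a real SQL parser is the safer choice for anything beyond quick experiments.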
4-bit inference (single 12 GB GPU)
```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# BASE and ADAPT are the model IDs defined in the Usage example above.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(BASE, quantization_config=bnb, device_map="auto")
model = PeftModel.from_pretrained(model, ADAPT)
```
Training Details
- Base model: `mistralai/Mistral-7B-v0.3`
- Method: QLoRA (4-bit NF4 base, BF16 LoRA adapters) via `trl.SFTTrainer`
- Datasets:
  - `spider`: cross-domain text-to-SQL
  - `b-mc2/sql-create-context`: schema-grounded SQL
  - Combined: 90,719 train / 3,929 val rows
- Prompt format: `[INST] … [/INST]` (Mistral instruction template). Loss is masked so that it applies only to tokens after `[/INST]` (the SQL response).
- Max sequence length: 512 tokens
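The loss masking described above can be sketched in plain Python. This is a minimal illustration and not the training code: `mask_prompt_labels` and `find_subsequence` are hypothetical helpers, the token IDs are toy values rather than real Mistral vocabulary IDs, and the only library fact relied on is that `-100` is the default ignore index of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch cross-entropy loss by default


def find_subsequence(sequence: list[int], pattern: list[int]) -> int:
    """Return the index just past the first occurrence of pattern, or -1."""
    n, m = len(sequence), len(pattern)
    for start in range(n - m + 1):
        if sequence[start:start + m] == pattern:
            return start + m
    return -1


def mask_prompt_labels(input_ids: list[int], marker_ids: list[int]) -> list[int]:
    """Copy input_ids into labels, ignoring every token up to and including the marker."""
    response_start = find_subsequence(input_ids, marker_ids)
    if response_start == -1:
        # Marker missing (e.g. truncated at max length): contribute no loss at all.
        return [IGNORE_INDEX] * len(input_ids)
    return [IGNORE_INDEX] * response_start + input_ids[response_start:]


# Toy token IDs, purely illustrative.
marker = [4, 5]  # stands in for the tokens of "[/INST]"
ids = [1, 3, 10, 11, 12, 4, 5, 20, 21, 22, 2]
labels = mask_prompt_labels(ids, marker)
print(labels)
```

Only the positions after the marker keep their token IDs as labels, so gradients come from the SQL response and not from the schema or question text.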
LoRA configuration
| Setting | Value |
|---|---|
| Rank r | 64 |
| Alpha | 16 (standard LoRA scaling α/r would give 0.25) |
| Dropout | 0.05 |
| RSLoRA | enabled (adapter scale is α/√r = 16/8 = 2.0, for stability at r=64) |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Bias | none |
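For reference, the table maps onto PEFT's `LoraConfig` arguments roughly as follows. The dictionary is a sketch that could be passed as `peft.LoraConfig(**lora_kwargs)`; `task_type` is an assumption, since it is not stated in the table. The small `lora_scale` function is a hypothetical helper showing how the adapter scaling factor differs between standard LoRA (α/r) and rank-stabilized LoRA (α/√r).

```python
import math

# Keyword arguments mirroring the table above.
# task_type is an assumption (not stated in the table).
lora_kwargs = {
    "r": 64,
    "lora_alpha": 16,
    "lora_dropout": 0.05,
    "use_rslora": True,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "bias": "none",
    "task_type": "CAUSAL_LM",
}


def lora_scale(alpha: float, r: int, use_rslora: bool) -> float:
    """Adapter scaling factor: alpha / sqrt(r) with rsLoRA, alpha / r otherwise."""
    return alpha / math.sqrt(r) if use_rslora else alpha / r


standard = lora_scale(lora_kwargs["lora_alpha"], lora_kwargs["r"], use_rslora=False)
rank_stabilized = lora_scale(lora_kwargs["lora_alpha"], lora_kwargs["r"], use_rslora=True)
print(f"standard LoRA scale: {standard}")
print(f"rsLoRA scale:        {rank_stabilized}")
```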
Training hyperparameters
| Setting | Value |
|---|---|
| Epochs | 3 |
| Effective batch size | 8 (1 × 8 grad-accum) |
| Learning rate | 2e-4 |
| LR scheduler | cosine |
| Warmup ratio | 0.03 |
| Precision | BF16 |
| Gradient checkpointing | enabled |
| Optimizer | AdamW (transformers default) |
| Checkpoint | checkpoint-34020 (final) |
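The final checkpoint number is consistent with the dataset size and batch settings, which the arithmetic below reproduces. It is a sketch under two stated assumptions: a trailing partial gradient-accumulation group in each epoch counts as one optimizer step, and the warmup step count is rounded up. `lr_at` is a hypothetical helper that follows the usual linear-warmup plus half-cosine-decay shape.

```python
import math

TRAIN_ROWS = 90_719
EFFECTIVE_BATCH = 1 * 8  # per-device batch 1 x 8 gradient-accumulation steps
EPOCHS = 3
PEAK_LR = 2e-4
WARMUP_RATIO = 0.03

# Assumption: a trailing partial accumulation group still counts as one step.
steps_per_epoch = math.ceil(TRAIN_ROWS / EFFECTIVE_BATCH)
total_steps = steps_per_epoch * EPOCHS
# Assumption: warmup steps are rounded up.
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then half-cosine decay to zero."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"steps per epoch: {steps_per_epoch}")
print(f"total steps:     {total_steps}")
print(f"warmup steps:    {warmup_steps}")
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, f"{lr_at(step):.2e}")
```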
Hardware
- GPU: 1× NVIDIA RTX 3080 Ti (12 GB VRAM)
- Peak VRAM: ~8.25 GB (4-bit NF4 base + BF16 adapters + grad-ckpt)
- Training time: ~4–8 hours
Limitations
- Schema sensitivity: trained with explicit `CREATE TABLE` schemas in the prompt; performance degrades without schema context.
- Single-query bias: optimized for single SELECT statements; multi-statement scripts, DDL, and procedural SQL are out of scope.
- Dialect: training data is primarily SQLite/ANSI; vendor-specific syntax (T-SQL, PL/pgSQL, BigQuery functions) may not generalize.
- English-only: prompts and questions are English; non-English NL queries untested.
- No execution safety: generated SQL should be sandboxed/parameterized before running against production databases.
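One way to act on the execution-safety point is to try generated SQL against a throwaway SQLite database with a read-only authorizer before it goes anywhere near real data. The sketch below uses only the standard-library `sqlite3` module; `run_sandboxed` and `read_only_authorizer` are hypothetical helpers, and the sample rows are made up for illustration. It is a first line of defence, not a substitute for database permissions.

```python
import sqlite3

# Only read-oriented authorizer actions are allowed; everything else is denied.
# SQLITE_FUNCTION (code 31 in sqlite3.h) covers calls such as COUNT() or LOWER().
ALLOWED_ACTIONS = {
    sqlite3.SQLITE_SELECT,
    sqlite3.SQLITE_READ,
    getattr(sqlite3, "SQLITE_FUNCTION", 31),
}


def read_only_authorizer(action, arg1, arg2, db_name, source):
    """Permit reads, deny writes, DDL, PRAGMAs, ATTACH, and so on."""
    return sqlite3.SQLITE_OK if action in ALLOWED_ACTIONS else sqlite3.SQLITE_DENY


def run_sandboxed(schema_sql: str, rows_sql: str, generated_sql: str):
    """Execute generated SQL against a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_sql + "\n" + rows_sql)  # trusted setup statements
        conn.commit()
        conn.set_authorizer(read_only_authorizer)  # lock down before untrusted SQL
        return conn.execute(generated_sql).fetchall()
    finally:
        conn.close()


schema = "CREATE TABLE employees (id INT, name TEXT, dept TEXT, salary REAL);"
rows = (
    "INSERT INTO employees VALUES "
    "(1, 'Ada', 'Engineering', 120000.0), (2, 'Bob', 'Sales', 90000.0);"
)
query = "SELECT name FROM employees WHERE dept = 'Engineering' AND salary > 100000"
result = run_sandboxed(schema, rows, query)
print(result)
```

A statement such as `DROP TABLE employees` is rejected by the authorizer and surfaces as a `sqlite3.DatabaseError`, which the caller can catch and log.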
Evaluation
The metrics above use `exact_set_match` (the normalized gold and predicted SQL match exactly as token sets) and `valid_sql_pct` (the predicted SQL parses with sqlparse/sqlite3). See `evaluate_sql.py` in the training repository for the full eval harness.
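To make the two metric definitions concrete, here is a rough standard-library approximation. It is not the code from `evaluate_sql.py`: the normalization rules and tokenizer regex are assumptions, string literals are lowercased along with keywords for simplicity, and validity is checked with SQLite's `EXPLAIN` only (no sqlparse).

```python
import re
import sqlite3

# Identifiers/keywords, numbers, single-quoted strings, two-char operators, any other symbol.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+(?:\.\d+)?|'[^']*'|<=|>=|<>|!=|[^\s]")


def normalize(sql: str) -> str:
    """Lowercase, collapse whitespace, and drop a trailing semicolon."""
    return re.sub(r"\s+", " ", sql).strip().rstrip(";").strip().lower()


def exact_set_match(gold: str, pred: str) -> bool:
    """True when both queries yield the same set of tokens after normalization."""
    return set(TOKEN_RE.findall(normalize(gold))) == set(TOKEN_RE.findall(normalize(pred)))


def is_valid_sql(schema_sql: str, pred: str) -> bool:
    """True when SQLite can compile the query against the given schema."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_sql)
        # EXPLAIN compiles the statement and returns its bytecode listing
        # instead of the query's own result rows.
        conn.execute("EXPLAIN " + pred)
        return True
    except (sqlite3.Error, sqlite3.Warning):
        return False
    finally:
        conn.close()


schema = "CREATE TABLE employees (id INT, name TEXT, dept TEXT, salary REAL);"
gold = "SELECT name FROM employees WHERE dept = 'Engineering' AND salary > 100000;"
pred = "select name from employees where salary > 100000 and dept = 'Engineering'"
print(exact_set_match(gold, pred), is_valid_sql(schema, pred))
```

Because tokens are compared as sets, reordered `WHERE` conditions still count as a match, while a wrong column name or literal does not.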
Files
| File | Purpose |
|---|---|
| `adapter_model.safetensors` | LoRA weights (~336 MB) |
| `adapter_config.json` | PEFT/LoRA configuration |
| `tokenizer.json` / `tokenizer_config.json` | Mistral tokenizer |
| `optimizer.pt`, `scheduler.pt`, `rng_state.pth`, `training_args.bin`, `trainer_state.json` | Training-state snapshot (resume-capable) |
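The adapter file size lines up with the LoRA settings if the parameter count is worked out by hand. The sketch below assumes the published Mistral-7B layer dimensions and that the adapter weights are stored at 2 bytes per parameter; both are assumptions of this example and are not read from the checkpoint.

```python
# Mistral-7B dimensions (assumed from the base model's published config).
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 1024  # 8 key/value heads x 128 head dim
LAYERS = 32
RANK = 64
BYTES_PER_PARAM = 2  # assumption: adapter weights saved in a 16-bit dtype

# (in_features, out_features) for each targeted projection.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

# LoRA adds A (r x in) and B (out x r) per projection: r * (in + out) parameters.
per_layer = sum(RANK * (fan_in + fan_out) for fan_in, fan_out in PROJECTIONS.values())
total_params = per_layer * LAYERS
size_mb = total_params * BYTES_PER_PARAM / 1e6

print(f"{total_params:,} adapter parameters, about {size_mb:.0f} MB")
```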
License
Released under Apache 2.0, the same license as the base model `mistralai/Mistral-7B-v0.3`.
Citation
```bibtex
@misc{sqlforge_mistral7b_qlora_2026,
  title  = {SQLForge-Mistral-7B-QLoRA: A QLoRA Text-to-SQL Adapter for Mistral-7B-v0.3},
  author = {Shreyash Pandey Katni},
  year   = {2026},
  url    = {https://huggingface.co/shreyash-pandey-katni/SQLForge-Mistral-7B-QLoRA}
}
```
Model provider: shreyash-pandey-katni
Model tree: base `mistralai/Mistral-7B-v0.3`; adapter: this model
Modalities: text input, text output