Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more
Container

Run this model inference with full control and performance in your environment.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

License: apache-2.0

Results

Evaluated on a held-out internal text-to-SQL test set (500 examples, schema-aware prompt):

ModelExact Set MatchValid SQL
mistralai/Mistral-7B-v0.3 (base)9.2%86.2%
SQLForge-Mistral-7B-QLoRA87.0%97.4%

A 9.5× improvement in exact-match accuracy and a +11.2 pp lift in syntactic validity, delivered by a 336 MB LoRA adapter.

Usage

python

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
BASE = "mistralai/Mistral-7B-v0.3"
ADAPT = "shreyash-pandey-katni/SQLForge-Mistral-7B-QLoRA"
tok = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(
BASE,
torch_dtype=torch.bfloat16,
device_map="auto",
)
model = PeftModel.from_pretrained(model, ADAPT)
model.eval()
schema = "CREATE TABLE employees (id INT, name TEXT, dept TEXT, salary REAL);"
question = "List the names of employees in the Engineering department earning over 100000."
prompt = f"""[INST] You are an expert SQL assistant. Given the schema and question, write a single SQL query.
Schema:
{schema}
Question: {question} [/INST]"""
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

4-bit inference (single 12 GB GPU)

python

from transformers import BitsAndBytesConfig
bnb = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(BASE, quantization_config=bnb, device_map="auto")
model = PeftModel.from_pretrained(model, ADAPT)

Training Details

  • Base model: mistralai/Mistral-7B-v0.3
  • Method: QLoRA (4-bit NF4 base, BF16 LoRA adapters) via trl.SFTTrainer
  • Datasets:
  • Prompt format: [INST] … [/INST] (Mistral instruction template). Loss is masked to apply only on tokens after [/INST] (the SQL response).
  • Max sequence length: 512 tokens

LoRA configuration

SettingValue
Rank r64
Alpha16 (scale = α/r = 0.25, conservative — preserves reasoning)
Dropout0.05
RSLoRAenabled (1/√r scaling for r=64 stability)
Target modulesq_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Biasnone

Training hyperparameters

SettingValue
Epochs3
Effective batch size8 (1 × 8 grad-accum)
Learning rate2e-4
LR schedulercosine
Warmup ratio0.03
PrecisionBF16
Gradient checkpointingenabled
OptimizerAdamW (transformers default)
Checkpointcheckpoint-34020 (final)

Hardware

  • GPU: 1× NVIDIA RTX 3080 Ti (12 GB VRAM)
  • Peak VRAM: ~8.25 GB (4-bit NF4 base + BF16 adapters + grad-ckpt)
  • Training time: ~4–8 hours

Limitations

  • Schema sensitivity: trained with explicit CREATE TABLE schemas in the prompt; performance degrades without schema context.
  • Single-query bias: optimized for single SELECT statements; multi-statement scripts, DDL, and procedural SQL are out of scope.
  • Dialect: training data is primarily SQLite/ANSI; vendor-specific syntax (T-SQL, PL/pgSQL, BigQuery functions) may not generalize.
  • English-only: prompts and questions are English; non-English NL queries untested.
  • No execution safety: generated SQL should be sandboxed/parameterized before running against production databases.

Evaluation

The metrics above use exact_set_match (gold and predicted normalized SQL match exactly as token sets) and valid_sql_pct (predicted SQL parses with sqlparse/sqlite3). See evaluate_sql.py in the training repository for the full eval harness.

Files

FilePurpose
adapter_model.safetensorsLoRA weights (~336 MB)
adapter_config.jsonPEFT/LoRA configuration
tokenizer.json / tokenizer_config.jsonMistral tokenizer
optimizer.pt, scheduler.pt, rng_state.pth, training_args.bin, trainer_state.jsonTraining-state snapshot (resume-capable)

License

Released under Apache 2.0, the same license as the base model mistralai/Mistral-7B-v0.3.

Citation

bibtex

@misc{sqlforge_mistral7b_qlora_2026,
title = {SQLForge-Mistral-7B-QLoRA: A QLoRA Text-to-SQL Adapter for Mistral-7B-v0.3},
author = {Shreyash Pandey Katni},
year = {2026},
url = {https://huggingface.co/shreyash-pandey-katni/SQLForge-Mistral-7B-QLoRA}
}

Model provider

shreyash-pandey-katni

Model tree

Base

mistralai/Mistral-7B-v0.3

Adapter

this model

Modalities

Input

Text

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Explore FriendliAI today