qwen3.5-2b-sql-lora API & Inference Endpoint

Usage

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, then attach the LoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-2B", dtype="auto", device_map="auto")
tok = AutoTokenizer.from_pretrained("Vicen-te/qwen3.5-2b-sql-lora")
model = PeftModel.from_pretrained(base, "Vicen-te/qwen3.5-2b-sql-lora")

# The user turn carries the schema and the question, and ends with "### SQL".
messages = [
    {"role": "system", "content": "You are a precise Text-to-SQL assistant. Output only the SQL query."},
    {"role": "user", "content": "### Schema\nCREATE TABLE employees (id INT, name TEXT, salary REAL)\n\n### Question\nWhat is the average salary?\n\n### SQL"},
]
# Render the chat template with thinking mode switched off.
text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=False)
inputs = tok([text], return_tensors="pt").to(model.device)
# Greedy decoding; only the newly generated tokens are decoded and printed.
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tok.decode(out[0][inputs['input_ids'].shape[-1]:], skip_special_tokens=True))
```
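
To reuse the prompt layout above for other schemas and questions, the message construction can be factored into a small helper. The following is a minimal sketch: `build_messages` and `extract_sql` are hypothetical names, not part of this repository, and the fence-stripping step assumes the model may occasionally wrap its answer in a Markdown code fence, which the card does not state.

```python
FENCE = "`" * 3  # Markdown code-fence marker, built this way to keep the example readable

SYSTEM_PROMPT = "You are a precise Text-to-SQL assistant. Output only the SQL query."


def build_messages(schema: str, question: str) -> list[dict[str, str]]:
    """Build chat messages in the '### Schema / ### Question / ### SQL' layout."""
    user = (
        f"### Schema\n{schema.strip()}\n\n"
        f"### Question\n{question.strip()}\n\n"
        "### SQL"
    )
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user},
    ]


def extract_sql(completion: str) -> str:
    """Trim whitespace and, if present, a surrounding Markdown code fence (assumed case)."""
    text = completion.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence line
        text = "\n".join(lines).strip()
    return text


messages = build_messages(
    "CREATE TABLE employees (id INT, name TEXT, salary REAL)",
    "What is the average salary?",
)
```

The resulting `messages` list can be passed to `tok.apply_chat_template` exactly as in the snippet above, and `extract_sql` applied to the decoded output.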

Training

Base model: Qwen/Qwen3.5-2B (instruction-tuned, thinking mode disabled for SQL)
Method: LoRA (rank=16, α=32, dropout=0.05) on all linear layers
Dataset: Vicen-te/sql-create-context-mini — 300 train / 200 eval examples
Hardware and schedule: single GPU, bf16 precision, 3 epochs, effective batch size 16, learning rate 2e-4 with a cosine schedule
Trainer: TRL SFTTrainer
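
The hyperparameters above imply a very short training run. The sketch below works through the arithmetic using only the numbers listed in this section. It assumes that a final partial batch counts as one optimizer step, that the cosine schedule decays from the stated rate to zero with no warmup, and that the standard LoRA scaling of α / rank applies; none of these details is confirmed by the card, and the helper names are illustrative.

```python
import math

# Values taken from the training summary above.
TRAIN_EXAMPLES = 300
EPOCHS = 3
EFFECTIVE_BATCH = 16
PEAK_LR = 2e-4
LORA_RANK = 16
LORA_ALPHA = 32


def total_optimizer_steps(n_examples: int, batch: int, epochs: int) -> int:
    """Optimizer steps, assuming a final partial batch still counts as a step."""
    return math.ceil(n_examples / batch) * epochs


def cosine_lr(step: int, total_steps: int, peak_lr: float) -> float:
    """Cosine decay from peak_lr at step 0 to 0 at total_steps (no warmup assumed)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


steps = total_optimizer_steps(TRAIN_EXAMPLES, EFFECTIVE_BATCH, EPOCHS)
lora_scale = LORA_ALPHA / LORA_RANK  # standard (non-rank-stabilized) LoRA scaling

print(f"optimizer steps: {steps}")
print(f"LoRA update scale (alpha / rank): {lora_scale}")
for s in (0, steps // 2, steps):
    print(f"step {s:3d}: lr = {cosine_lr(s, steps, PEAK_LR):.2e}")
```

Under these assumptions the run has only a few dozen optimizer steps, which is consistent with the small-scale framing in the Limitations section. In PEFT, the listed LoRA settings would plausibly correspond to `LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, target_modules="all-linear")`, though the exact configuration used is not shown here.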

Evaluation

See the project repo for the full evaluation report, which compares the adapter with the unmodified base model on a held-out 200-example split using executable accuracy, exact match, and BLEU.
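
For readers unfamiliar with the first two metrics, here is a minimal sketch of how exact match and executable accuracy are commonly computed for Text-to-SQL. It is not the project's evaluation harness. The function names, the normalization rules, and the sample rows are assumptions made for illustration. Results are compared as sorted row lists, so queries whose correctness depends on `ORDER BY` are not distinguished, and lowercasing also affects string literals.

```python
import sqlite3
from typing import Optional


def normalize_sql(sql: str) -> str:
    """Lowercase, collapse whitespace, and drop a trailing semicolon (a crude normalizer)."""
    return " ".join(sql.strip().rstrip(";").lower().split())


def exact_match(pred: str, gold: str) -> bool:
    """True when the two queries are textually identical after normalization."""
    return normalize_sql(pred) == normalize_sql(gold)


def execute(setup_sql: str, query: str) -> Optional[list]:
    """Run the query on a fresh in-memory SQLite database; return sorted rows, or None on error."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(setup_sql)
        return sorted(conn.execute(query).fetchall(), key=repr)
    except (sqlite3.Error, sqlite3.Warning):
        return None
    finally:
        conn.close()


def execution_match(setup_sql: str, pred: str, gold: str) -> bool:
    """True when the predicted query runs and returns the same rows as the gold query."""
    gold_rows = execute(setup_sql, gold)
    return gold_rows is not None and execute(setup_sql, pred) == gold_rows


# Illustrative data only; not taken from the evaluation set.
SETUP = """
CREATE TABLE employees (id INT, name TEXT, salary REAL);
INSERT INTO employees VALUES (1, 'Ana', 100.0), (2, 'Bo', 200.0);
"""
GOLD = "SELECT AVG(salary) FROM employees"
PRED = "SELECT SUM(salary) / COUNT(*) FROM employees"

print(exact_match(PRED, GOLD))             # textual comparison
print(execution_match(SETUP, PRED, GOLD))  # result-set comparison
```

The example pair shows why the two metrics can disagree: the predicted query is written differently from the gold query but returns the same result on this data.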

Limitations

SQLite-flavoured SQL only; other dialects untested.
The training set is intentionally small (300 rows); this is a small-scale fine-tune, not a production-grade Text-to-SQL system.
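
Given these limitations, it may be worth checking that a generated query at least compiles under SQLite before running it. The sketch below is one possible guard, not something the project provides: `is_valid_sqlite` is a hypothetical helper that prepares the query with `EXPLAIN` against an empty in-memory copy of the schema, so the query itself is never executed against real data.

```python
import sqlite3


def is_valid_sqlite(schema_sql: str, query: str) -> bool:
    """Return True if the query compiles against the schema in SQLite."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_sql)     # create empty tables from the schema
        conn.execute(f"EXPLAIN {query}")   # compiles the statement without running it
        return True
    except (sqlite3.Error, sqlite3.Warning):
        return False                       # syntax error, unknown table/column, or multiple statements
    finally:
        conn.close()


SCHEMA = "CREATE TABLE employees (id INT, name TEXT, salary REAL);"

print(is_valid_sqlite(SCHEMA, "SELECT AVG(salary) FROM employees"))
print(is_valid_sqlite(SCHEMA, "SELECT AVG(wage) FROM employees"))
print(is_valid_sqlite(SCHEMA, "SELECT TOP 1 name FROM employees"))
```

A check like this catches references to missing columns and syntax from other dialects (such as `TOP`), but it says nothing about whether the query answers the question correctly.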

