EphAsad

Mnemosyne-3B

License: apache-2.0

Intended Use

Mnemosyne-3B is suited for:

Generating SQL queries from natural language questions against a provided database schema
Applications in laboratory information management systems (LIMS), food and water testing, and scientific data management
General-purpose text-to-SQL use cases where low-latency local inference is required
Developer tooling, data analyst assistants, and schema-aware chatbots

Mnemosyne-3B is not suited for:

Tasks requiring external knowledge beyond the provided schema
Applications without a schema context (schema must be provided at inference time)
Safety-critical automated execution without a human review step

Model Details

Property	Value
Base model	Qwen/Qwen2.5-Coder-3B-Instruct
Parameters	3B
Fine-tuning method	QLoRA
LoRA rank / alpha	r=64, alpha=128
LoRA target modules	q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Training hardware	NVIDIA A100 40GB
Training framework	Unsloth + TRL SFTTrainer
Precision (training)	bf16 with 4-bit quantised base (QLoRA)
Precision (release)	bf16 (merged), Q4_K_M GGUF, Q8_0 GGUF
License	Apache 2.0
Author	Zain Asad

Training

Hyperparameters

Setting	Value
Epochs	2 (best checkpoint at step 1000 of 1224)
Per-device batch size	32
Gradient accumulation steps	2
Effective batch size	64
Learning rate	2e-4
LR scheduler	Cosine
Warmup ratio	0.05
Weight decay	0.01
Optimiser	AdamW 8-bit

The model converged at step 1000 — eval loss plateaued beyond this point, and checkpoint-1000 was selected as the final model.
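The batch-size and step figures in the table are linked by simple arithmetic. The sketch below is illustrative only: it assumes a single GPU (the card lists one A100) and that no partial batch is dropped, and the variable names are not taken from the training script.

```python
per_device_batch_size = 32
gradient_accumulation_steps = 2
num_gpus = 1  # assumption: a single A100, as listed under training hardware

# Examples consumed per optimiser step.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_gpus

total_steps = 1224
epochs = 2
steps_per_epoch = total_steps // epochs

# Upper bound on the number of examples seen per epoch at this step count.
max_examples_per_epoch = steps_per_epoch * effective_batch_size

# How far through the schedule the selected checkpoint sits.
checkpoint_fraction = 1000 / total_steps

print(effective_batch_size)           # 64
print(steps_per_epoch)                # 612
print(max_examples_per_epoch)         # 39168
print(round(checkpoint_fraction, 3))  # 0.817
```

Under these assumptions, checkpoint-1000 falls about 82% of the way through the schedule, inside the second epoch. The implied per-epoch count is slightly below the combined dataset size reported under Training Data; the card does not state why, so this is an observation rather than a correction.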

Prompt Format

Mnemosyne-3B uses the Qwen2.5 ChatML format with a task-specific system prompt:

markdown
<|im_start|>system
You are Mnemosyne, an expert SQL assistant specialising in laboratory,
scientific, food safety, water quality, and general-purpose database queries.
Given a database schema and a natural language question, generate a correct,
well-formatted SQL query. Return only the SQL with no explanation.<|im_end|>
<|im_start|>user
### Schema:
{DDL}

### Question:
{natural_language_question}<|im_end|>
<|im_start|>assistant
{sql_query}<|im_end|>

Training Data

Mnemosyne-3B was trained on a combination of three datasets, capped at 20,000 examples per source and shuffled before training:

Dataset	Examples used	Role
b-mc2/sql-create-context	20,000	General SQL foundation — single and multi-table queries
gretelai/synthetic_text_to_sql	20,000	Complex SQL: CTEs, window functions, subqueries
Mnemosyne Lab Dataset (custom)	579	Laboratory / LIMS domain specialisation

Combined training set: ~40,579 examples (after quality filtering).

Mnemosyne Lab Dataset

The lab domain dataset was purpose-built for this fine-tune. It covers an 8-table LIMS schema (clients, samples, analysts, methods, determinands, results, worksheets, worksheet_samples) spanning food safety, drinking water, surface water, and environmental microbiology testing.

All examples are entirely synthetic and fictional. Company names, client names, staff names, and sample identifiers are invented. Analyte names and method references (ISO 9308-1, ISO 11290-1, ISO 6579-1, EC 2073/2005, EU DWD 2020/2184, etc.) reflect public international standards and are not proprietary. The dataset contains no real personal data, no real employer information, and no confidential laboratory records.

The dataset covers three complexity tiers:

Tier	Examples	Coverage
Simple	217	Single-table SELECT, basic WHERE/COUNT/LIMIT, NULL checks, date filters
Moderate	215	Multi-table JOINs, GROUP BY + HAVING, CASE WHEN, NOT EXISTS, turnaround calculations
Complex	147	CTEs, LAG/LEAD, RANK/DENSE_RANK/NTILE, rolling averages, correlated subqueries, UNION, year-on-year pivots

Evaluation

All evaluations use execution accuracy (EX), which compares the result set of the generated query against that of the gold query on a live SQLite database, rather than exact match. EX is the preferred metric for text-to-SQL because syntactically different queries can return identical results. Exact match (EM) and valid SQL rate (VLD) are reported as supplementary metrics.
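A minimal sketch of how an execution-accuracy check of this kind can be implemented with Python's built-in `sqlite3` module. This is not the evaluation harness used for the reported numbers: the function name `execution_match`, the toy table, and the choice to ignore row order are all assumptions made for illustration.

```python
import sqlite3
from collections import Counter


def execution_match(conn, predicted_sql, gold_sql):
    """Return True if both queries run and return the same multiset of rows."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # a prediction that fails to execute counts as a miss
    gold_rows = conn.execute(gold_sql).fetchall()
    # Compare as multisets so that row order does not matter (an assumption).
    return Counter(predicted_rows) == Counter(gold_rows)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL, stock INTEGER);
    INSERT INTO products VALUES (1, 'Laptop', 999.0, 3), (2, 'Cable', 9.5, 0), (3, 'Phone', 599.0, 7);
""")

gold = "SELECT name FROM products WHERE stock > 0"
predicted = "SELECT p.name FROM products AS p WHERE NOT p.stock <= 0"

ex = execution_match(conn, predicted, gold)
em = predicted.strip().lower() == gold.strip().lower()
print(ex, em)  # True False
```

The two queries differ textually, so exact match fails, but they return the same rows, so execution accuracy counts the prediction as correct. A stricter variant would compare ordered lists when the gold query contains an ORDER BY clause.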

Results

Benchmark	n	Metric	Base (Qwen2.5-Coder-3B-Instruct)	Mnemosyne-3B	Delta
Spider (train split)	500	EX	65.4%	57.6%	-7.8%
Spider (train split)	500	EM	17.8%	15.2%	-2.6%
Spider (train split)	500

Interpretation

Mnemosyne-3B shows the expected trade-off of targeted domain fine-tuning: a modest regression on general cross-domain SQL (Spider, -7.8 percentage points EX) in exchange for a large gain on laboratory domain SQL (+48% EX overall). The base model scores near zero (2.6% EX) on complex LIMS queries; Mnemosyne-3B reaches 43.6%, roughly a 17× improvement.

The exact match score on the lab suite is higher than its execution accuracy (EM=89% vs EX=78%), which reflects a known limitation of the evaluation setup: gold SQL was authored in PostgreSQL syntax, and some PostgreSQL-specific constructs (DATE_TRUNC, INTERVAL, EXTRACT) are not natively supported by SQLite. A query that exactly matches the gold SQL can therefore still fail execution against SQLite even though it is valid PostgreSQL. Execution accuracy on a PostgreSQL deployment would likely be higher than reported.
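The dialect gap described above can be reproduced with a few lines of Python. The sketch below uses a toy `results` table (hypothetical data, not from the evaluation suite) to show SQLite rejecting a `DATE_TRUNC` call while accepting a `strftime` equivalent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE results (result_id INTEGER PRIMARY KEY, test_date TEXT);
    INSERT INTO results (test_date) VALUES ('2025-01-15'), ('2025-01-20'), ('2025-02-03');
""")

# PostgreSQL-style monthly bucketing; DATE_TRUNC is not a built-in SQLite function.
postgres_style = (
    "SELECT DATE_TRUNC('month', test_date) AS month, COUNT(*) "
    "FROM results GROUP BY 1"
)
# SQLite equivalent using strftime.
sqlite_style = (
    "SELECT strftime('%Y-%m', test_date) AS month, COUNT(*) "
    "FROM results GROUP BY 1 ORDER BY 1"
)

try:
    conn.execute(postgres_style)
    postgres_ran = True
except sqlite3.OperationalError as exc:
    postgres_ran = False
    print("SQLite rejected the PostgreSQL-style query:", exc)

monthly_counts = conn.execute(sqlite_style).fetchall()
print(monthly_counts)  # [('2025-01', 2), ('2025-02', 1)]
```

A prediction that matches PostgreSQL gold SQL character for character would be scored as an execution failure in this setting, which is the effect the EM/EX gap points to.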

Limitations

General SQL regression: Mnemosyne-3B trades approximately 8 percentage points of Spider EX for domain specialisation. For purely general-purpose SQL use cases, the base Qwen2.5-Coder-3B-Instruct may perform better.
Schema required at inference time: The model has no implicit knowledge of any specific database. A DDL schema must be provided in every prompt.
Schema length: Very long schemas (many tables, many columns) may be truncated at the 2048-token context limit. Prioritise relevant tables where possible.
Complex SQL ceiling: At 3B parameters, performance on multi-CTE, deeply nested, or multi-schema queries is limited. Consider larger models for enterprise-grade analytical SQL.
Dialect sensitivity: The model was primarily trained on ANSI/PostgreSQL-style SQL. Highly dialect-specific syntax (T-SQL, PL/pgSQL procedural blocks) is not a primary use case.
No execution or error correction: The model generates SQL in a single forward pass. It does not self-correct on execution errors. Downstream agents should implement error-feedback loops if needed.
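The error-feedback loop mentioned in the last limitation could look roughly like the sketch below. It is an illustration under stated assumptions: `generate_with_retry` and `fake_generate` are hypothetical names, SQLite stands in for the target database, and the feedback wording appended to the question is not part of the training prompt format, so how well the model responds to it is untested.

```python
import sqlite3
from typing import Callable, Optional


def generate_with_retry(
    generate: Callable[[str, str], str],
    conn: sqlite3.Connection,
    schema: str,
    question: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Return the first generated query that SQLite can compile, or None."""
    prompt_question = question
    for _ in range(max_attempts):
        sql = generate(schema, prompt_question)
        try:
            # EXPLAIN QUERY PLAN compiles the statement without running it.
            conn.execute(f"EXPLAIN QUERY PLAN {sql}")
            return sql
        except (sqlite3.Error, sqlite3.Warning) as exc:
            # Feed the failing query and the error text into the next attempt.
            prompt_question = (
                f"{question}\n\nThe previous query failed:\n{sql}\n"
                f"Error: {exc}\nReturn a corrected query."
            )
    return None


# Stub standing in for the model: wrong column first, corrected on retry.
attempts = iter(["SELECT title FROM products", "SELECT name FROM products"])


def fake_generate(schema: str, question: str) -> str:
    return next(attempts)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
fixed_sql = generate_with_retry(
    fake_generate, conn, "CREATE TABLE products (...)", "List product names"
)
print(fixed_sql)  # SELECT name FROM products
```

In practice the `generate` argument would be a function such as `generate_sql` from the How to Use section. Compiling the query only catches syntax and schema errors; it does not show that the query answers the question.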

How to Use

Transformers (Python)

python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "EphAsad/Mnemosyne-3B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

SYSTEM_PROMPT = (
    "You are Mnemosyne, an expert SQL assistant specialising in laboratory, "
    "scientific, food safety, water quality, and general-purpose database queries. "
    "Given a database schema and a natural language question, generate a correct, "
    "well-formatted SQL query. Return only the SQL with no explanation."
)

def generate_sql(schema: str, question: str) -> str:
    prompt = (
        f"<|im_start|>system\n{SYSTEM_PROMPT}<|im_end|>\n"
        f"<|im_start|>user\n"
        f"### Schema:\n{schema.strip()}\n\n"
        f"### Question:\n{question.strip()}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.1,
        do_sample=True,
        eos_token_id=tokenizer.convert_tokens_to_ids("<|im_end|>"),
    )
    decoded = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True,
    )
    return decoded.split("<|im_end|>")[0].strip()

# General SQL example
schema = """
CREATE TABLE products (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    price FLOAT,
    category VARCHAR(50),
    stock INT
);
"""
question = "Show the top 5 most expensive products in the Electronics category that are still in stock."
print(generate_sql(schema, question))
# Example output (sampling is enabled, so the exact text may vary):
# SELECT name, price FROM products
# WHERE category = 'Electronics' AND stock > 0
# ORDER BY price DESC LIMIT 5;

# Laboratory domain example
lab_schema = """
CREATE TABLE results (
    result_id SERIAL PRIMARY KEY,
    sample_id VARCHAR(20),
    determinand_id INT,
    numeric_value FLOAT,
    pass_fail CHAR(1),
    test_date DATE
);
CREATE TABLE determinands (
    determinand_id SERIAL PRIMARY KEY,
    determinand_name VARCHAR(100),
    unit VARCHAR(20)
);
CREATE TABLE samples (
    sample_id VARCHAR(20) PRIMARY KEY,
    matrix VARCHAR(50),
    collection_date DATE,
    client_id INT
);
CREATE TABLE clients (
    client_id SERIAL PRIMARY KEY,
    client_name VARCHAR(100)
);
"""
lab_question = "Show all failed E. coli results from drinking water samples in the last 30 days, ordered by numeric value descending."
print(generate_sql(lab_schema, lab_question))

Unsloth (fast inference)

python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="EphAsad/Mnemosyne-3B",
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

GGUF — Ollama

bash
ollama run EphAsad/Mnemosyne-3B-Q4_K_M

GGUF — llama.cpp

bash
./llama-cli \
  -m Mnemosyne-3B-Q4_K_M.gguf \
  --temp 0.1 \
  -n 256 \
  -p "<|im_start|>system\nYou are Mnemosyne...

Available Files

File	Format	Size (approx)	Use case
`model.safetensors` (sharded)	bf16	~6.2 GB	Full precision inference, further fine-tuning
`Mnemosyne-3B-Q4_K_M.gguf`	GGUF 4-bit	~2.0 GB	llama.cpp, LM Studio, Ollama — recommended for most users
`Mnemosyne-3B-Q8_0.gguf`	GGUF 8-bit	~3.3 GB	llama.cpp, LM Studio, Ollama — higher quality

Ethical Considerations

Training data privacy: The custom lab domain dataset used in training contains no real personal data, no real employer information, and no confidential laboratory records. All company names, client names, staff names, and identifiers are entirely fictional. Analyte names and method references reflect public international standards (ISO, EN, EPA, EC regulations).

Intended for decision support, not autonomous operation: SQL generated by this model should be reviewed before execution in production systems. The model may produce syntactically valid but semantically incorrect queries, particularly on complex schemas it has not been trained on. Human review is strongly recommended in regulated environments.

Potential for misuse: As with all SQL generation models, outputs should not be executed with elevated database privileges without appropriate access controls. The model has no awareness of data sensitivity or access permissions.

Citation

If you use Mnemosyne-3B in your work, please cite:

bibtex
@misc{asad2025mnemosyne3b,
  author       = {Zain Asad},
  title        = {Mnemosyne-3B: A Domain-Specialised Text-to-SQL Model for Laboratory and Scientific Databases},
  year         = {2025},
  publisher    = {Hugging Face},
  url          = {https://huggingface.co/EphAsad/Mnemosyne-3B}
}

Support

If you find this model useful for your research or projects, you can support further development of my datasets and models here:
☕ ko-fi.com/ephraim123

Acknowledgements

Qwen team at Alibaba Cloud for the Qwen2.5-Coder base model
Unsloth for the efficient QLoRA training framework
b-mc2 and Gretel AI for the open SQL training datasets
