README
License: apache-2.0
Installation
```bash
pip install torch transformers bitsandbytes accelerate
```
Inference Example
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

MODEL_NAME = "ahammad115566/smeft-qwen-7b"
RESPONSE_PREFIX = "\n### Response:\n"

# 4-bit NF4 quantization with bfloat16 compute and double quantization.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# The LoRA weights are already merged, so the checkpoint loads as a plain
# causal LM and no PEFT adapter step is needed.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=bnb_config,
    device_map={"": 0},  # place the whole model on GPU 0
    trust_remote_code=True,
)
model.eval()

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token


# ================= Inference ==============
def build_prompt(instruction: str, context: str = "") -> str:
    # `context` is part of the signature but is not inserted into the prompt.
    parts = [f"\n### Instruction:\n{instruction}"]
    parts.append(RESPONSE_PREFIX)
    return "\n".join(parts)


@torch.inference_mode()
def ask(instruction: str, context: str = "") -> str:
    inputs = tokenizer(
        build_prompt(instruction, context),
        return_tensors="pt",
        add_special_tokens=True,
    ).to("cuda:0")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=False,  # greedy decoding
        repetition_penalty=1.1,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
    )
    # Drop the prompt tokens and decode only the newly generated ones.
    new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```
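As a quick sanity check that needs neither a GPU nor the model weights, the prompt template from the example can be reproduced in plain Python. The sketch below copies `build_prompt` and `RESPONSE_PREFIX` from the example, leaving out the `context` argument because the template never uses it. The question is a hypothetical placeholder.

```python
# Standalone copy of the prompt template from the inference example.
RESPONSE_PREFIX = "\n### Response:\n"


def build_prompt(instruction: str) -> str:
    parts = [f"\n### Instruction:\n{instruction}"]
    parts.append(RESPONSE_PREFIX)
    return "\n".join(parts)


# Hypothetical question, used only for illustration.
prompt = build_prompt("Define the Warsaw basis.")
print(repr(prompt))
# '\n### Instruction:\nDefine the Warsaw basis.\n\n### Response:\n'
```

With the model loaded, `ask("Define the Warsaw basis.")` tokenizes this same string and returns only the text generated after the response marker.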
Training Details
| Parameter | Value |
|---|---|
| Base Model | Qwen2.5-7B-Instruct |
| Fine-tuning Method | LoRA (merged) |
| Inference Quantization | 4-bit NF4 (bitsandbytes) |
| Domain | Standard Model Effective Field Theory |
| Training Corpus | Curated SMEFT and HEP preprints |
| Task Format | Instruction-following scientific QA |
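To give a rough sense of what 4-bit NF4 loading saves, the sketch below estimates weight storage only. It assumes a nominal 7e9 parameters (the exact count is not stated here) and the per-weight overhead of block-wise scaling constants reported in the QLoRA paper for block size 64 with double quantization. Treat the result as a lower bound: activations, the KV cache, and any layers that bitsandbytes keeps in higher precision are not counted.

```python
def weight_memory_gib(n_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GiB for a given bit width."""
    return n_params * bits_per_param / 8 / 2**30


# Assumption: nominal parameter count, not the exact size of the checkpoint.
N_PARAMS = 7e9

# NF4 stores 4 bits per weight. With block size 64 and double quantization,
# the scaling constants add about 8/64 + 32/(64*256) bits per weight
# (figure from the QLoRA paper; assumed to match the settings used here).
NF4_OVERHEAD_BITS = 8 / 64 + 32 / (64 * 256)

bf16 = weight_memory_gib(N_PARAMS, 16)
nf4 = weight_memory_gib(N_PARAMS, 4 + NF4_OVERHEAD_BITS)
print(f"bf16: {bf16:.1f} GiB, NF4 + double quant: {nf4:.1f} GiB")
```

Under these assumptions the quantized weights take roughly a quarter of the bf16 footprint, which is why the model fits on a single consumer-class GPU.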
Limitations
- The model may occasionally hallucinate operator identities.
- Domain-locked by design: the model is not suitable for general-purpose tasks.
- Trained on 1,700 examples. Coverage of the SMEFT operator space may be uneven; questions about rare operators or non-Warsaw bases may be answered less reliably.
- Omitting the system prompt will cause the model to behave like the base Qwen2.5-7B-Instruct.
Outputs should be independently verified against the primary literature.
Authors
Ahmed Hammad
High-Energy Physics Researcher
The High Energy Accelerator Research Organization (KEK)
Veronica Sanz
Professor of Theoretical Physics
University of Valencia
Citation
A technical paper describing the dataset construction and fine-tuning procedure is forthcoming.
Please cite the model as:
```bibtex
@misc{hammad2026smeftqwen,
  author       = {Ahmed Hammad and Veronica Sanz},
  title        = {SMEFT-Qwen-7B: A Domain-Adapted Language Model for Standard Model Effective Field Theory},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/ahammad115566/smeft-qwen-7b}}
}
```
Model provider
ahammad115566
Model tree
Base
Qwen/Qwen2.5-7B-Instruct
Fine-tuned
this model
Modalities
Input
Text
Output
Text