License: mit

Usage
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, then attach the LoRA adapter on top of it.
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")
model = PeftModel.from_pretrained(base_model, "JasonShiii/step-llm-llama3b-no_rag")

# The tokenizer is loaded from the adapter repository.
tokenizer = AutoTokenizer.from_pretrained("JasonShiii/step-llm-llama3b-no_rag")
```
Or use the inference script from the GitHub repo:
```bash
python generate_step.py \
    --ckpt_path JasonShiii/step-llm-llama3b-no_rag \
    --caption "A cylindrical bolt with a hexagonal head"
```
Note: this adapter was trained with the no-RAG prompt template, so do not pass --use_rag when using it. For RAG inference, use JasonShiii/step-llm-llama3b instead.
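
The pairing rule in the note above can be captured in a small helper so that the checkpoint and the flag never get mismatched. This is only a sketch: `build_generate_command` is a hypothetical helper that is not part of the repo, and it assumes `--use_rag` is a boolean switch that takes no value. The script name, the `--ckpt_path` and `--caption` flags, and both checkpoint names are taken from this README.

```python
import shlex

# Checkpoint names as given in this README.
NO_RAG_CKPT = "JasonShiii/step-llm-llama3b-no_rag"
RAG_CKPT = "JasonShiii/step-llm-llama3b"


def build_generate_command(caption: str, use_rag: bool = False) -> list[str]:
    """Return the argument list for generate_step.py (hypothetical helper).

    The no-RAG adapter is only ever paired with a command that omits
    --use_rag, and the RAG adapter is only paired with one that passes it.
    """
    ckpt = RAG_CKPT if use_rag else NO_RAG_CKPT
    cmd = ["python", "generate_step.py", "--ckpt_path", ckpt, "--caption", caption]
    if use_rag:
        # Assumption: --use_rag is a plain on/off switch with no argument.
        cmd.append("--use_rag")
    return cmd


if __name__ == "__main__":
    # Print a shell-safe command line for the no-RAG adapter.
    print(shlex.join(build_generate_command("A cylindrical bolt with a hexagonal head")))
```

The returned list can be passed directly to `subprocess.run` from the directory that contains `generate_step.py`.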
Training Details
| Parameter | Value |
|---|---|
| Base model | Llama-3.2-3B-Instruct |
| LoRA rank (r) | 16 |
| lora_alpha | 16 |
| Learning rate | 5e-5 |
| Batch size | 2 (x4 grad accum = effective 8) |
| max_seq_length | 16384 |
| Training data | ~20k STEP files, 0-500 entities |
| Training steps | 6300 |
| Prompt template | no-RAG (caption -> output, no retrieved example) |
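
A few derived quantities follow from the table by simple arithmetic. The sketch below works them out under two assumptions that the table does not state: that "training steps" counts optimizer steps (one per effective batch), and that the dataset holds exactly 20,000 files (the table only says "~20k"). The epoch count is therefore a rough estimate. The LoRA scaling uses the standard `lora_alpha / r` factor, assuming rank-stabilized scaling was not enabled.

```python
# Values copied from the Training Details table.
per_step_batch_size = 2
grad_accum_steps = 4
training_steps = 6300
lora_r = 16
lora_alpha = 16

# Assumption: "~20k STEP files" is treated as exactly 20,000.
approx_num_files = 20_000

# Effective batch size, matching the "effective 8" in the table.
effective_batch_size = per_step_batch_size * grad_accum_steps

# Assumption: each training step is one optimizer step over one effective batch.
samples_seen = training_steps * effective_batch_size
approx_epochs = samples_seen / approx_num_files

# Standard LoRA scaling applied to the low-rank update.
lora_scaling = lora_alpha / lora_r

print(f"effective batch size: {effective_batch_size}")
print(f"samples seen:         {samples_seen}")
print(f"approximate epochs:   {approx_epochs:.2f}")
print(f"LoRA scaling:         {lora_scaling}")
```

Under these assumptions the run processes about 50,400 samples, or roughly 2.5 passes over the data, and the adapter update is applied with a scaling factor of 1.0.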
Citation
```bibtex
@article{shi2026step,
  title={STEP-LLM: Generating CAD STEP Models from Natural Language with Large Language Models},
  author={Shi, Xiangyu and Ding, Junyang and Zhao, Xu and Zhan, Sinong and Mohapatra, Payal and Quispe, Daniel and Welbeck, Kojo and Cao, Jian and Chen, Wei and Guo, Ping and others},
  journal={arXiv preprint arXiv:2601.12641},
  year={2026}
}
```