VetalValera/acestep-5Hz-lm-4B API & Inference Endpoint

Model Details

🚀 ACE-Step v1.5 is a highly efficient open-source music foundation model designed to bring commercial-grade music generation to consumer hardware.

Key Features

💰 Commercial-Ready: Unlike many models trained on ambiguous datasets, ACE-Step v1.5 is designed for creators. The generated music may be used for commercial purposes.
📚 Safe & Robust Training Data: The model is trained on a massive, legally compliant dataset consisting of:
- Licensed Data: Professionally licensed music tracks.
- Royalty-Free / No-Copyright Data: A vast collection of public domain and royalty-free music.
- Synthetic Data: High-quality audio generated via advanced MIDI-to-Audio conversion.
⚡ Extreme Speed: Generates a full song in under 2 seconds on an A100 and under 10 seconds on an RTX 3090.
🖥️ Consumer Hardware Friendly: Runs locally with less than 4GB of VRAM.

Technical Capabilities

🌉 At its core lies a novel hybrid architecture where the Language Model (LM) functions as an omni-capable planner: it transforms simple user queries into comprehensive song blueprints—scaling from short loops to 10-minute compositions—while synthesizing metadata, lyrics, and captions via Chain-of-Thought to guide the Diffusion Transformer (DiT). ⚡ Uniquely, this alignment is achieved through intrinsic reinforcement learning relying solely on the model's internal mechanisms, thereby eliminating the biases inherent in external reward models or human preferences. 🎚️
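The planner-to-DiT hand-off described above can be pictured as a small data flow. The following is only a minimal sketch under stated assumptions: `SongBlueprint`, `plan`, and `render` are hypothetical names invented for illustration and are not part of the ACE-Step codebase, and the stand-in functions do no real generation. The only facts taken from the text are that the blueprint carries metadata, lyrics, and a caption, and that compositions range up to 10 minutes.

```python
from dataclasses import dataclass, field

# Upper bound taken from the text: compositions of up to 10 minutes.
MAX_DURATION_S = 600


@dataclass
class SongBlueprint:
    """Hypothetical container for what the LM planner hands to the DiT."""
    caption: str
    lyrics: str
    metadata: dict = field(default_factory=dict)
    duration_s: float = 30.0

    def __post_init__(self):
        # Reject durations outside the range the model card describes.
        if not 0 < self.duration_s <= MAX_DURATION_S:
            raise ValueError(f"duration must be in (0, {MAX_DURATION_S}] seconds")


def plan(query: str, duration_s: float = 30.0) -> SongBlueprint:
    """Stand-in for the LM planner (assumption): a real planner would
    generate the caption, lyrics, and metadata instead of copying the query."""
    return SongBlueprint(
        caption=query.strip(),
        lyrics="",
        metadata={"source_query": query},
        duration_s=duration_s,
    )


def render(blueprint: SongBlueprint) -> dict:
    """Stand-in for the DiT (assumption): returns the conditioning it would
    receive rather than audio."""
    return {
        "caption": blueprint.caption,
        "lyrics": blueprint.lyrics,
        "metadata": blueprint.metadata,
        "duration_s": blueprint.duration_s,
    }


if __name__ == "__main__":
    print(render(plan("  lo-fi piano loop ", duration_s=12)))
```

The point of the split is that the user query never reaches the synthesis stage directly; only the expanded blueprint does.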

🔮 Beyond standard synthesis, ACE-Step v1.5 unifies precise stylistic control with versatile editing capabilities—such as cover generation, repainting, and vocal-to-BGM conversion—while maintaining strict adherence to prompts across 50+ languages. This paves the way for powerful tools that seamlessly integrate into the creative workflows of music artists, producers, and content creators. 🎸

Developed by: [ACE-STEP]
Model type: [Text2Music]
Language(s): [50+ languages]
License: [MIT]

Evaluation

🏗️ Architecture

🦁 Model Zoo

DiT Models

DiT Model	Pre-Training	SFT	RL	CFG	Steps	Reference audio	Text2Music	Cover	Repaint	Extract	Lego	Complete	Quality	Diversity	Fine-Tunability	Hugging Face
`acestep-v15-base`	✅	❌	❌	✅	50	✅	✅	✅	✅	✅	✅	✅	Medium	High	Easy	Link
`acestep-v15-sft`	✅	✅	❌	✅	50	✅	✅	✅	✅	❌	❌	❌	High	Medium	Easy	Link
`acestep-v15-turbo`	✅	✅	❌	❌	8	✅	✅	✅	✅	❌	❌	❌	Very High	Medium	Medium	Link
`acestep-v15-turbo-rl`	✅	✅	✅	❌	8	✅	✅	✅	✅	❌	❌	❌	Very High	Medium	Medium	To be released
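To pick a DiT checkpoint programmatically, the table above can be transcribed into a small lookup. This is a sketch, not an official API: the `DiTModel` class, the helper names, and the lowercase task keys are assumptions made for illustration, and only the step counts, CFG flags, task support, and release status are copied from the table.

```python
from dataclasses import dataclass

# Task columns from the DiT table, lowercased (naming is this sketch's choice).
BASIC_TASKS = frozenset({"text2music", "cover", "repaint"})
ALL_TASKS = BASIC_TASKS | {"extract", "lego", "complete"}


@dataclass(frozen=True)
class DiTModel:
    name: str
    steps: int        # step count from the table
    uses_cfg: bool    # "CFG" column
    tasks: frozenset  # task columns marked with a check
    released: bool    # False where the table says "To be released"


MODELS = (
    DiTModel("acestep-v15-base", 50, True, ALL_TASKS, True),
    DiTModel("acestep-v15-sft", 50, True, BASIC_TASKS, True),
    DiTModel("acestep-v15-turbo", 8, False, BASIC_TASKS, True),
    DiTModel("acestep-v15-turbo-rl", 8, False, BASIC_TASKS, False),
)


def _candidates(task: str) -> list:
    """Released models whose table row marks `task` as supported."""
    if task not in ALL_TASKS:
        raise ValueError(f"unknown task: {task!r}")
    return [m for m in MODELS if m.released and task in m.tasks]


def models_for(task: str) -> list:
    """Names of released models that support `task`, in table order."""
    return [m.name for m in _candidates(task)]


def fewest_steps_for(task: str) -> str:
    """Released model supporting `task` with the lowest step count."""
    return min(_candidates(task), key=lambda m: m.steps).name


if __name__ == "__main__":
    print(models_for("lego"))
    print(fewest_steps_for("text2music"))
```

Step count is used here only as a rough proxy for generation speed; the table also reports quality and diversity, which a real selection would likely weigh as well.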

LM Models

LM Model	Pretrain from	Pre-Training	SFT	RL	CoT metas	Query rewrite	Audio Understanding	Composition Capability	Copy Melody	Hugging Face
`acestep-5Hz-lm-0.6B`	Qwen3-0.6B	✅	✅	✅	✅	✅	Medium	Medium	Weak	✅
`acestep-5Hz-lm-1.7B`	Qwen3-1.7B	✅	✅	✅	✅	✅	Medium	Medium	Medium	✅
`acestep-5Hz-lm-4B`	Qwen3-4B	✅	✅	✅	✅	✅	Strong	Strong	Strong	✅

🙏 Acknowledgements

This project is co-led by ACE Studio and StepFun.

📖 Citation

If you find this project useful for your research, please consider citing:

BibTeX
@misc{gong2026acestep,
	title={ACE-Step 1.5: Pushing the Boundaries of Open-Source Music Generation},
	author={Junmin Gong and Yulin Song and Wenxiao Zhao and Sen Wang and Shengyuan Xu and Jing Guo},
	howpublished={\url{https://github.com/ace-step/ACE-Step-1.5}},
	year={2026},
	note={GitHub repository}
}
