zuiho-kai
jianghan-runtime-v5-lora
Usage
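```bash
git clone https://github.com/zuiho-kai/jianghan-roleplay-data-pipeline.git
cd jianghan-roleplay-data-pipeline
# --stage 第三阶段 selects "stage three". The prompt reads roughly:
# "The 猫灯 turned simple bookkeeping into an art exhibition; you have to handle it."
python scripts/jianghan_runtime_chat.py \
  --model Qwen/Qwen3.5-4B \
  --adapter zuiho-kai/jianghan-runtime-v5-lora \
  --profile role \
  --stage 第三阶段 \
  --prompt "猫灯们把简单账目做成了艺术展,你要处理这件事。" \
  --hide-stage-prefix \
  --retry 2
```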
Current promoted stack:
```text
Qwen/Qwen3.5-4B + v5 checkpoint-150 + runtime_rag_v1 + phase-hidden + audit/retry
```
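The `audit/retry` part of the stack, together with the `--retry 2` flag in the usage command, suggests a generate, check, retry loop. The sketch below shows only that general pattern. The names `generate_with_retry`, `generate`, and `audit` are hypothetical, and reading `--retry 2` as "up to two extra attempts" is an assumption; the actual audit rules live in the GitHub repo and are not reproduced here.

```python
from typing import Callable, Optional


def generate_with_retry(
    generate: Callable[[str], str],   # hypothetical: wraps the model call
    audit: Callable[[str], bool],     # hypothetical: True if the reply is acceptable
    prompt: str,
    retries: int = 2,                 # assumed meaning of --retry 2
) -> Optional[str]:
    """Return the first reply that passes the audit, or None if all attempts fail."""
    for _ in range(retries + 1):      # one initial attempt plus `retries` retries
        reply = generate(prompt)
        if audit(reply):
            return reply
    return None
```

Returning `None` when every attempt fails leaves the fallback decision (error, canned reply, or unaudited output) to the caller.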
Quick online deploy
To serve the model together with the runtime RAG over HTTP, follow the quick deploy guide on GitHub:
https://github.com/zuiho-kai/jianghan-roleplay-data-pipeline/blob/main/docs/QUICK_DEPLOY.md
The service exposes GET /health, POST /rag, and POST /chat.
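A client for these three endpoints might look like the sketch below. Only the methods and paths come from this README. The base URL, the JSON request and response format, and the payload field names (`prompt`, `stage`, `profile`, borrowed from the CLI flags) are assumptions, so check QUICK_DEPLOY.md for the real schema before relying on it.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: host and port depend on your deployment


def build_request(path, payload=None, base_url=BASE_URL):
    """Build a GET request when payload is None, otherwise a JSON POST request."""
    url = base_url.rstrip("/") + path
    if payload is None:
        return urllib.request.Request(url, method="GET")
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    headers = {"Content-Type": "application/json; charset=utf-8"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def call(path, payload=None, base_url=BASE_URL, timeout=60):
    """Send the request and decode the response, assuming the service returns JSON."""
    request = build_request(path, payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example calls against a running service (payload field names are assumptions):
# call("/health")
# call("/rag", {"prompt": "猫灯和奥维利亚是什么关系?", "stage": "第三阶段", "profile": "fact"})
# call("/chat", {"prompt": "猫灯们把简单账目做成了艺术展,你要处理这件事。", "stage": "第三阶段"})
```

Keeping `build_request` separate from `call` makes it possible to inspect the method, URL, and body without a running server.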
Runtime RAG files
The adapter does not contain the RAG index. Runtime RAG code and the minimal deployable index live in the GitHub repo:
- GitHub runtime repo: https://github.com/zuiho-kai/jianghan-roleplay-data-pipeline
- scripts/jianghan_runtime_rag.py
- data/world/worldbook/worldbook_knowledge_index_v1.jsonl
- data/runtime/jianghan_phase_context_v1.jsonl
- data/runtime/jianghan_stage3_runtime_policy_v1.md
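Because the adapter ships without these files, it can help to confirm that a local clone contains them before starting the runtime. The following is a minimal sketch: the helper name `missing_runtime_files` is hypothetical, and the paths are copied from the list above and assumed to be relative to the repository root.

```python
from pathlib import Path

# Paths as listed in this README, assumed relative to the repository root.
RUNTIME_FILES = [
    "scripts/jianghan_runtime_rag.py",
    "data/world/worldbook/worldbook_knowledge_index_v1.jsonl",
    "data/runtime/jianghan_phase_context_v1.jsonl",
    "data/runtime/jianghan_stage3_runtime_policy_v1.md",
]


def missing_runtime_files(repo_root):
    """Return the listed runtime files that are absent under repo_root."""
    root = Path(repo_root)
    return [rel for rel in RUNTIME_FILES if not (root / rel).is_file()]
```

For example, `missing_runtime_files("jianghan-roleplay-data-pipeline")` returns an empty list when the clone is complete.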
RAG-only smoke test:
```bash
# The prompt asks what the relationship between 猫灯 and 奥维利亚 is.
python scripts/jianghan_runtime_rag.py \
  --prompt "猫灯和奥维利亚是什么关系?" \
  --stage 第三阶段 \
  --profile fact
```
Model provider
zuiho-kai
Model tree
- Base: Qwen/Qwen3.5-4B
- Adapter: this model
Modalities
- Input: Video, Text, Image
- Output: Text