zuiho-kai

jianghan-runtime-v5-lora


README

Usage

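```bash
# The --stage and --prompt values are kept in the original Chinese.
# --stage 第三阶段 means "third stage".
# Prompt gloss: "The 猫灯 have turned simple bookkeeping into an art exhibition; you have to handle it."
git clone https://github.com/zuiho-kai/jianghan-roleplay-data-pipeline.git
cd jianghan-roleplay-data-pipeline
python scripts/jianghan_runtime_chat.py \
  --model Qwen/Qwen3.5-4B \
  --adapter zuiho-kai/jianghan-runtime-v5-lora \
  --profile role \
  --stage 第三阶段 \
  --prompt "猫灯们把简单账目做成了艺术展,你要处理这件事。" \
  --hide-stage-prefix \
  --retry 2
```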
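If the chat script needs to be driven from Python rather than a shell, passing the arguments as a list avoids any shell quoting of the Chinese prompt. The sketch below only reuses the flags shown in the command above. The helper names `build_chat_command` and `run_chat` are hypothetical, and the idea that the script prints its reply to stdout is an assumption, not something the README states.

```python
import subprocess
import sys
from typing import List, Optional


def build_chat_command(
    prompt: str,
    stage: str = "第三阶段",
    profile: str = "role",
    model: str = "Qwen/Qwen3.5-4B",
    adapter: str = "zuiho-kai/jianghan-runtime-v5-lora",
    retry: int = 2,
    hide_stage_prefix: bool = True,
    python_exe: Optional[str] = None,
) -> List[str]:
    """Build the argv list for scripts/jianghan_runtime_chat.py (hypothetical helper)."""
    cmd = [
        python_exe or sys.executable,
        "scripts/jianghan_runtime_chat.py",
        "--model", model,
        "--adapter", adapter,
        "--profile", profile,
        "--stage", stage,
        "--prompt", prompt,  # passed as one argv entry, so no shell quoting is needed
    ]
    if hide_stage_prefix:
        cmd.append("--hide-stage-prefix")
    cmd += ["--retry", str(retry)]
    return cmd


def run_chat(prompt: str, **kwargs) -> str:
    """Run the script from the cloned repo root and return its stdout.

    Assumption: the script writes the reply to stdout.
    """
    result = subprocess.run(
        build_chat_command(prompt, **kwargs),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

`run_chat` must be called with the cloned `jianghan-roleplay-data-pipeline` directory as the working directory, because the script path is relative.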

Current promoted stack:

```text
Qwen/Qwen3.5-4B + v5 checkpoint-150 + runtime_rag_v1 + phase-hidden + audit/retry
```
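The `audit/retry` element of the stack, together with the `--retry 2` flag, suggests a generate, check, and regenerate loop. The README does not describe how the audit works, so the following is only a generic sketch of that pattern. The function `generate_with_audit`, its callbacks, and the reading of `retries` as "extra attempts after the first" are all assumptions and are not taken from the repository.

```python
from typing import Callable, Optional, Tuple


def generate_with_audit(
    generate: Callable[[str], str],
    audit: Callable[[str], bool],
    prompt: str,
    retries: int = 2,
) -> Tuple[Optional[str], int]:
    """Call `generate` until `audit` accepts the reply or attempts run out.

    Returns (reply, attempts). `reply` is None if every attempt failed the audit.
    Assumption: `retries` counts extra attempts after the first one.
    """
    attempts = 0
    for _ in range(1 + retries):
        attempts += 1
        reply = generate(prompt)
        if audit(reply):
            return reply, attempts
    return None, attempts


# Usage with stand-in callbacks: the first two replies are empty and get rejected.
canned = iter(["", "", "a usable reply"])
reply, attempts = generate_with_audit(
    generate=lambda p: next(canned),
    audit=lambda r: bool(r.strip()),  # hypothetical audit: reject blank output
    prompt="example prompt",
    retries=2,
)
print(reply, attempts)
```

In a real pipeline, `generate` would wrap the model call and `audit` would encode whatever checks the runtime applies. Both are placeholders here.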

Quick online deploy

For model + runtime RAG HTTP serving, use the GitHub quick deploy guide:

https://github.com/zuiho-kai/jianghan-roleplay-data-pipeline/blob/main/docs/QUICK_DEPLOY.md

The service exposes GET /health, POST /rag, and POST /chat.
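A small client for these three endpoints might look like the sketch below, using only the Python standard library. Only the paths and HTTP methods come from the line above. The base URL, the JSON request fields (`prompt`, `stage`, `profile`, which simply mirror the CLI flags), and the assumption that responses are JSON are all guesses. The actual schema should be in the quick deploy guide.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "http://localhost:8000"  # assumption: adjust to wherever the service is deployed


def build_request(
    path: str,
    payload: Optional[Dict[str, Any]] = None,
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build a GET request (no payload) or a JSON POST request (with payload)."""
    url = base_url.rstrip("/") + path
    if payload is None:
        return urllib.request.Request(url, method="GET")
    data = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def call(
    path: str,
    payload: Optional[Dict[str, Any]] = None,
    base_url: str = BASE_URL,
    timeout: float = 60.0,
) -> Any:
    """Send the request and decode the body, assuming the service replies with JSON."""
    req = build_request(path, payload, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example calls (field names are hypothetical):
#   call("/health")
#   call("/rag", {"prompt": "猫灯和奥维利亚是什么关系?", "stage": "第三阶段", "profile": "fact"})
#   call("/chat", {"prompt": "...", "stage": "第三阶段", "profile": "role"})
```

Separating `build_request` from `call` keeps the request construction checkable without a running server.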

Runtime RAG files

The adapter does not contain the RAG index. The runtime RAG code and the minimal deployable index live in the GitHub repo (https://github.com/zuiho-kai/jianghan-roleplay-data-pipeline).

RAG-only smoke test:

```bash
# Prompt gloss: "What is the relationship between 猫灯 and 奥维利亚?"
python scripts/jianghan_runtime_rag.py \
  --prompt "猫灯和奥维利亚是什么关系?" \
  --stage 第三阶段 \
  --profile fact
```

Model provider: zuiho-kai

Model tree: base Qwen/Qwen3.5-4B, adapter this model

Modalities: input Video, Text, Image; output Text
