README
License: other
Runtime
Recommended runtime repository:
    https://github.com/MapleRhythm/ASA-ArknightStoryAgent
Download into the release tree:
    huggingface-cli download MapleRhythm/asa-arknightstoryagent-4b-lora \
      --local-dir model/lora/asa-arknightstoryagent-4b-lora
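The same download can be scripted from Python. The sketch below assumes the `huggingface_hub` package is installed and uses its `snapshot_download` function; the constant and function names are illustrative and do not come from the release tree.

```python
from pathlib import Path

# Repository ID and target directory taken from the CLI command above.
REPO_ID = "MapleRhythm/asa-arknightstoryagent-4b-lora"
LOCAL_DIR = Path("model/lora/asa-arknightstoryagent-4b-lora")


def download_adapter(local_dir: Path = LOCAL_DIR) -> Path:
    """Fetch the adapter files into local_dir and return that directory."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    local_dir.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id=REPO_ID, local_dir=str(local_dir))
    return local_dir
```

Calling `download_adapter()` from the root of the release tree should leave the adapter files where the runtime defaults expect them.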
Run with the GPU release config:
    bash scripts/run_gpu_reranker_qwen35_4b.sh --answer-only "岁兽是什么?"
Current runtime defaults:
- Base model path: model/qwen3.5-4b
- LoRA path: model/lora/asa-arknightstoryagent-4b-lora
- Context size: 10000
- Max generation tokens: 1536
- answer_grounding_mode: quote
- conclusion_prompt_mode: minimal
- Web context: disabled by default
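For reference, the defaults above can be captured in a small typed structure. This is a minimal sketch: the class name, field names, and the `prompt_token_budget` helper are hypothetical and are not the runtime's actual config schema. The budget calculation also assumes that generated tokens count against the context window, which this README does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeDefaults:
    """Hypothetical mirror of the documented runtime defaults."""

    base_model_path: str = "model/qwen3.5-4b"
    lora_path: str = "model/lora/asa-arknightstoryagent-4b-lora"
    context_size: int = 10000
    max_generation_tokens: int = 1536
    answer_grounding_mode: str = "quote"
    conclusion_prompt_mode: str = "minimal"
    web_context: bool = False  # disabled by default


def prompt_token_budget(cfg: RuntimeDefaults) -> int:
    """Tokens left for prompt plus retrieved evidence.

    Assumption: generation shares the same context window as the prompt.
    """
    return cfg.context_size - cfg.max_generation_tokens
```

Under that assumption, `prompt_token_budget(RuntimeDefaults())` gives 8464 tokens for the question, instructions, and retrieved evidence.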
The adapter was trained from an internal merged Qwen3.5-4B checkpoint. The release runtime is the supported way to load it; direct AutoPeftModel usage may require adapting local base-model paths.
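One way to adapt the base-model path for direct PEFT loading is to edit the adapter's `adapter_config.json`, where PEFT stores the base checkpoint location under the `base_model_name_or_path` key. The helper below is a sketch, not part of the release runtime: its name is hypothetical, and it is best run on a copy of the adapter directory. Because the adapter was trained from an internal merged checkpoint, pointing it at a different base may still change behavior.

```python
import json
from pathlib import Path
from typing import Optional


def point_adapter_at_local_base(adapter_dir: Path, base_model_path: str) -> Optional[str]:
    """Rewrite base_model_name_or_path in adapter_config.json.

    Returns the previous value so the caller can log or restore it.
    """
    config_file = adapter_dir / "adapter_config.json"
    config = json.loads(config_file.read_text(encoding="utf-8"))

    previous = config.get("base_model_name_or_path")
    config["base_model_name_or_path"] = base_model_path

    # All other keys (rank, alpha, target modules, ...) are left untouched.
    config_file.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return previous
```

For example, `point_adapter_at_local_base(Path("model/lora/asa-arknightstoryagent-4b-lora"), "model/qwen3.5-4b")` would align the adapter config with the documented default paths.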
Training
Method: LoRA preference tuning with LLaMA-Factory / PEFT.
Key hyperparameters:
- LoRA rank: 8
- LoRA alpha: 16
- LoRA dropout: 0.05
- Learning rate: 8e-7
- Epochs: 2
- Scheduler: cosine
- Effective batch size: 4
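Two of these values are easier to read as derived quantities. The sketch below assumes the standard LoRA scaling of alpha divided by rank (PEFT's default when rank-stabilized LoRA is off). The decomposition of the effective batch size into per-device batch, gradient accumulation, and device count is purely illustrative, since the README reports only the product.

```python
LORA_RANK = 8
LORA_ALPHA = 16


def lora_scaling(alpha: int, rank: int) -> float:
    """Standard LoRA scaling applied to the low-rank update: alpha / rank."""
    return alpha / rank


def effective_batch_size(per_device: int, grad_accum_steps: int, num_devices: int) -> int:
    """Examples contributing to each optimizer step."""
    return per_device * grad_accum_steps * num_devices


scaling = lora_scaling(LORA_ALPHA, LORA_RANK)  # 2.0

# Hypothetical split; any combination whose product is 4 matches the README.
example_batch = effective_batch_size(per_device=1, grad_accum_steps=4, num_devices=1)
```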
The training objective focused on ASA runtime failures: keeping grounded action JSON stable, avoiding over-abstention when evidence is sufficient, and improving answer behavior under the current RAG chain.
Evaluation Snapshot
Training-time eval metrics:
- eval loss: 0.5542
- rewards/chosen: 0.0427
- rewards/rejected: 0.0250
- rewards/margins: 0.0177
- KL: 195.9924
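As a quick sanity check, the reported reward margin should equal the chosen reward minus the rejected reward, up to rounding. The snippet below verifies this for the figures above; the function name and tolerance are illustrative choices.

```python
REWARD_CHOSEN = 0.0427
REWARD_REJECTED = 0.0250
REWARD_MARGIN = 0.0177


def margin_consistent(chosen: float, rejected: float, reported: float,
                      tol: float = 5e-5) -> bool:
    """True if reported matches chosen - rejected within 4-decimal rounding."""
    return abs((chosen - rejected) - reported) < tol


margins_ok = margin_consistent(REWARD_CHOSEN, REWARD_REJECTED, REWARD_MARGIN)
```

The margin is positive but small, so the adapter separates chosen from rejected responses only slightly on this eval set.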
Pipeline evaluation before the runtime truncation-recovery patch:
- eval50: 50 questions, 3 JSON errors, 13 abstain-like answers, 34 direct answers.
- hard10: 10 questions, 1 JSON error, 3 abstain-like answers, 6 direct answers.
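The counts above can be turned into rates for comparison across suites. This is a small sketch with hypothetical names; the numbers are copied from the two result lines.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SuiteResult:
    name: str
    questions: int
    json_errors: int
    abstain_like: int
    direct: int

    def is_complete(self) -> bool:
        """Every question falls into exactly one of the three buckets."""
        return self.json_errors + self.abstain_like + self.direct == self.questions

    def rates(self) -> Dict[str, float]:
        """Share of questions in each bucket."""
        return {
            "json_errors": self.json_errors / self.questions,
            "abstain_like": self.abstain_like / self.questions,
            "direct": self.direct / self.questions,
        }


EVAL50 = SuiteResult("eval50", 50, 3, 13, 34)
HARD10 = SuiteResult("hard10", 10, 1, 3, 6)
```

On these figures, eval50 comes out at 6% JSON errors, 26% abstain-like answers, and 68% direct answers; hard10 comes out at 10%, 30%, and 60%.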
Runtime truncation-recovery regression after the patch:
- 4 truncation-prone questions, 0 JSON errors.
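A JSON-error count like the one above can be reproduced with a simple parse check. The sketch below is not the runtime's actual validator: it assumes each model output should be a single JSON object, and the field names in the sample strings are invented for illustration. A truncated output fails to parse and is counted as an error.

```python
import json
from typing import Iterable


def is_valid_action_json(raw: str) -> bool:
    """True if raw parses as JSON and the top-level value is an object."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict)


def count_json_errors(outputs: Iterable[str]) -> int:
    """Number of outputs that are not valid action JSON."""
    return sum(1 for raw in outputs if not is_valid_action_json(raw))


# Hypothetical outputs: one complete object, one cut off mid-string.
samples = [
    '{"action": "answer", "text": "complete"}',
    '{"action": "answer", "text": "cut off',
]
errors = count_json_errors(samples)
```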
The full eval50 + hard10 suite should be rerun after each runtime or model change before treating this as a production-quality release.
Limitations
- The model should not be used without retrieval evidence. It can hallucinate or over-infer from weak evidence.
- Low-confidence retrieval cases, especially fusion_score=0 or missing dense/sparse/MiniRAG/evidence-chain scores, should be handled conservatively by the application.
- Subjective questions such as "what bad things did X do" need framing by viewpoint; the model may otherwise mix factual actions with moral judgments.
- This adapter does not include the base model, game text, retrieval indexes, or reranker weights.
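The conservative handling of low-confidence retrieval described above could be implemented as an application-side guard. The sketch below is an assumption about how such a check might look: only `fusion_score` is named in this README, and the other score keys are hypothetical stand-ins for the dense, sparse, MiniRAG, and evidence-chain scores.

```python
from typing import Mapping, Optional

# Hypothetical key names for the per-hit component scores.
REQUIRED_SCORES = (
    "dense_score",
    "sparse_score",
    "minirag_score",
    "evidence_chain_score",
)


def is_low_confidence(hit: Mapping[str, Optional[float]]) -> bool:
    """Flag a retrieval hit the application should treat conservatively."""
    # A fusion score of 0, None, or an absent key all count as low confidence.
    if not hit.get("fusion_score"):
        return True
    # Any missing component score also makes the hit untrustworthy.
    return any(hit.get(key) is None for key in REQUIRED_SCORES)
```

An application might skip generation or return an explicit "insufficient evidence" response when every retrieved hit is flagged this way.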
License And Data Notes
This adapter is released under the "other" license because downstream use depends on the base model license and on the rights to the source story corpus used to build the retrieval system. The adapter repository does not include raw game story text or prebuilt story indexes.
Model provider: MapleRhythm
Model tree: base Qwen/Qwen3.5-4B; adapter: this model
Modalities: input Video, Text, Image; output Text