MapleRhythm/asa-arknightstoryagent-4b-lora API & Inference Endpoint

Runtime

Recommended runtime repository:

https://github.com/MapleRhythm/ASA-ArknightStoryAgent

Download into the release tree:

huggingface-cli download MapleRhythm/asa-arknightstoryagent-4b-lora \
  --local-dir model/lora/asa-arknightstoryagent-4b-lora

Run with the GPU release config:

bash scripts/run_gpu_reranker_qwen35_4b.sh --answer-only "岁兽是什么？"

Current runtime defaults:

Base model path: model/qwen3.5-4b
LoRA path: model/lora/asa-arknightstoryagent-4b-lora
Context size: 10000
Max generation tokens: 1536
answer_grounding_mode: quote
conclusion_prompt_mode: minimal
Web context: disabled by default
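The defaults above can be captured in a small config object to see how much of the context window is left for the prompt and retrieved evidence. This is a minimal sketch: the class name and field names are illustrative and are not taken from the runtime repository, and it assumes that generated tokens count against the same 10000-token window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeDefaults:
    # Values copied from the defaults listed above; the field names are hypothetical.
    base_model_path: str = "model/qwen3.5-4b"
    lora_path: str = "model/lora/asa-arknightstoryagent-4b-lora"
    context_size: int = 10000
    max_generation_tokens: int = 1536
    answer_grounding_mode: str = "quote"
    conclusion_prompt_mode: str = "minimal"
    web_context_enabled: bool = False

    def prompt_budget(self) -> int:
        # Assumption: the generation budget is reserved inside the context window.
        return self.context_size - self.max_generation_tokens


defaults = RuntimeDefaults()
print(defaults.prompt_budget())  # 8464
```

Under that assumption, roughly 8464 tokens remain for the system prompt, the question, and the retrieved evidence.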

The adapter was trained from an internal merged Qwen3.5-4B checkpoint. The release runtime is the supported way to load it; direct AutoPeftModel usage may require adapting local base-model paths.
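For readers who still want to try direct PEFT loading, one way to adapt the base-model path is to rewrite the base_model_name_or_path field that PEFT stores in adapter_config.json. The sketch below is an assumption about what "adapting local base-model paths" could look like; the helper name is hypothetical, and whether this adapter's config actually records an internal path has not been verified here.

```python
import json
from pathlib import Path
from typing import Optional


def point_adapter_at_base(adapter_dir: str, base_model_path: str) -> Optional[str]:
    """Rewrite base_model_name_or_path in adapter_config.json and return the old value."""
    config_path = Path(adapter_dir) / "adapter_config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    previous = config.get("base_model_name_or_path")
    # Point the adapter at a local copy of the base model.
    config["base_model_name_or_path"] = base_model_path
    config_path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return previous
```

A call such as point_adapter_at_base("model/lora/asa-arknightstoryagent-4b-lora", "model/qwen3.5-4b") would use the two paths from the runtime defaults. The release runtime remains the supported loading path.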

Training

Method: LoRA preference tuning with LLaMA-Factory / PEFT.

Key hyperparameters:

LoRA rank: 8
LoRA alpha: 16
LoRA dropout: 0.05
Learning rate: 8e-7
Epochs: 2
Scheduler: cosine
Effective batch size: 4
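Two quantities follow from the hyperparameters above. The sketch below assumes the standard LoRA scaling of alpha / rank (not the rank-stabilized variant), and the split of the effective batch size into per-device batch, gradient accumulation, and device count is hypothetical; only the product of 4 comes from this card.

```python
lora_rank = 8
lora_alpha = 16

# Standard LoRA scales the low-rank update by alpha / rank.
lora_scaling = lora_alpha / lora_rank
print(lora_scaling)  # 2.0

# Hypothetical decomposition of the effective batch size; only the product is documented.
per_device_batch_size = 1
gradient_accumulation_steps = 4
num_devices = 1
effective_batch_size = (
    per_device_batch_size * gradient_accumulation_steps * num_devices
)
print(effective_batch_size)  # 4
```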

The training objective focused on ASA runtime failures: keeping grounded action JSON stable, avoiding over-abstention when evidence is sufficient, and improving answer behavior under the current RAG chain.

Evaluation Snapshot

Training-time eval metrics:

eval loss: 0.5542
rewards/chosen: 0.0427
rewards/rejected: 0.0250
rewards/margins: 0.0177
KL: 195.9924
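The reward figures above are internally consistent if rewards/margins is read as the mean chosen reward minus the mean rejected reward, which is how common preference-tuning trainers report it. That reading is an assumption about the training logs; the quick check below uses only the numbers listed above.

```python
import math

rewards_chosen = 0.0427
rewards_rejected = 0.0250
reported_margin = 0.0177

# Margin read as chosen minus rejected; the tolerance covers rounding in the logged values.
computed_margin = rewards_chosen - rewards_rejected
margin_matches = math.isclose(computed_margin, reported_margin, abs_tol=5e-5)
print(f"{computed_margin:.4f}", margin_matches)
```

The margin is positive but small, so the adapter prefers chosen responses only slightly more than rejected ones on the eval split.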

Pipeline evaluation before the runtime truncation-recovery patch:

eval50: 50 questions, 3 JSON errors, 13 abstain-like answers, 34 direct answers.
hard10: 10 questions, 1 JSON error, 3 abstain-like answers, 6 direct answers.
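The counts above convert to per-suite rates as follows. This is a small sketch using only the reported numbers; the dictionary keys are illustrative labels, not identifiers from the evaluation scripts.

```python
results = {
    "eval50": {"questions": 50, "json_errors": 3, "abstain_like": 13, "direct": 34},
    "hard10": {"questions": 10, "json_errors": 1, "abstain_like": 3, "direct": 6},
}

OUTCOMES = ("json_errors", "abstain_like", "direct")


def outcome_rates(counts: dict) -> dict:
    """Return each outcome as a fraction of the suite, checking the counts add up."""
    total = counts["questions"]
    if sum(counts[name] for name in OUTCOMES) != total:
        raise ValueError("outcome counts do not sum to the number of questions")
    return {name: counts[name] / total for name in OUTCOMES}


for suite, counts in results.items():
    rates = outcome_rates(counts)
    print(suite, {name: f"{rate:.0%}" for name, rate in rates.items()})
```

This gives 6% JSON errors, 26% abstain-like, and 68% direct answers on eval50, and 10%, 30%, and 60% on hard10.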

Runtime truncation-recovery regression after the patch:

4 truncation-prone questions, 0 JSON errors.

The full eval50 + hard10 suite should be rerun after each runtime or model change before treating this as a production-quality release.

Limitations

The model should not be used without retrieval evidence. It can hallucinate or over-infer from weak evidence.
Low-confidence retrieval cases, especially fusion_score=0 or missing dense/sparse/MiniRAG/evidence-chain scores, should be handled conservatively by the application.
Subjective questions such as "what bad things did X do" need framing by viewpoint; the model may otherwise mix factual actions with moral judgments.
This adapter does not include the base model, game text, retrieval indexes, or reranker weights.
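Conservative handling of low-confidence retrieval could be implemented as a guard in the application before the model is called. The sketch below is one possible shape, not the runtime's actual logic: fusion_score is named above, but the other score keys, the hit dictionary layout, and both function names are hypothetical.

```python
from typing import Mapping, Optional, Sequence

# Hypothetical key names for the dense/sparse/MiniRAG/evidence-chain scores.
REQUIRED_SCORES = (
    "dense_score",
    "sparse_score",
    "minirag_score",
    "evidence_chain_score",
)


def is_low_confidence(hit: Mapping[str, Optional[float]]) -> bool:
    """Flag a hit whose fusion score is zero or absent, or that lacks a component score."""
    fusion = hit.get("fusion_score")
    # Assumption: a missing fusion score is treated the same as fusion_score=0.
    if fusion is None or fusion == 0:
        return True
    return any(hit.get(name) is None for name in REQUIRED_SCORES)


def should_answer(hits: Sequence[Mapping[str, Optional[float]]]) -> bool:
    """Answer only if at least one retrieved hit is not low-confidence."""
    return any(not is_low_confidence(hit) for hit in hits)
```

With no hits at all, should_answer returns False, so the application can fall back to an abstain response instead of letting the model answer without evidence.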

License And Data Notes

This adapter is released under the "other" license category because downstream use depends on the base model license and on the rights to the source story corpus used to build the retrieval system. The adapter repository does not include raw game story text or prebuilt story indexes.
