gyung

gyung

qwen35-9b-ko-legal-rag-transition-lora-20260620-v3

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

License: other

Scope

  • Artifact type: PEFT LoRA adapter
  • This is not a standalone legal-advice model.
  • Use it to retrieve, rank, or prepare evidence for a separate answer model.

Basic Use

Install:

bash

pip install -U "huggingface_hub[cli]" peft transformers

Download:

bash

hf download gyung/qwen35-9b-ko-legal-rag-transition-lora-20260620-v3 --local-dir qwen35-9b-ko-legal-rag-transition-lora-20260620-v3

For LoRA adapters, load with the matching base model:

python

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
base_id = "Qwen/Qwen3.5-9B"
adapter_id = "gyung/qwen35-9b-ko-legal-rag-transition-lora-20260620-v3"
tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
base_id,
device_map="auto",
torch_dtype="auto",
trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, adapter_id)

For vLLM LoRA serving:

bash

python -m vllm.entrypoints.openai.api_server \
--model Qwen/Qwen3.5-9B \
--enable-lora \
--lora-modules qwen35-9b-ko-legal-rag-transition-lora-20260620-v3=./qwen35-9b-ko-legal-rag-transition-lora-20260620-v3 \
--served-model-name qwen35-9b-ko-legal-rag-transition-lora-20260620-v3 \
--max-model-len 12288
  • Main code: gyung/ko-law-retriever
  • Large data/eval artifacts: gyung/ko-law-retriever-artifacts-20260622

Model provider

gyung

gyung

Model tree

Base

Qwen/Qwen3.5-9B

Adapter

this model

Modalities

Input

Video, Text, Image

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Explore FriendliAI today