gyung
qwen35-9b-harness-local-search-lora-20260619-v1
Dedicated Endpoints
Run this model inference on single tenant GPU with unmatched speed and reliability at scale.
Get help setting up a custom Dedicated Endpoints.
Talk with our engineer to get a quote for reserved GPU instances with discounts.
README
License: otherScope
- Artifact type: PEFT LoRA adapter
- This is not a standalone legal-advice model.
- Use it to retrieve, rank, or prepare evidence for a separate answer model.
Basic Use
Install:
bash
pip install -U "huggingface_hub[cli]" peft transformers
Download:
bash
hf download gyung/qwen35-9b-harness-local-search-lora-20260619-v1 --local-dir qwen35-9b-harness-local-search-lora-20260619-v1
For LoRA adapters, load with the matching base model:
python
from peft import PeftModelfrom transformers import AutoModelForCausalLM, AutoTokenizerbase_id = "Qwen/Qwen3.5-9B"adapter_id = "gyung/qwen35-9b-harness-local-search-lora-20260619-v1"tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)model = AutoModelForCausalLM.from_pretrained(base_id,device_map="auto",torch_dtype="auto",trust_remote_code=True,)model = PeftModel.from_pretrained(model, adapter_id)
For vLLM LoRA serving:
bash
python -m vllm.entrypoints.openai.api_server \--model Qwen/Qwen3.5-9B \--enable-lora \--lora-modules qwen35-9b-harness-local-search-lora-20260619-v1=./qwen35-9b-harness-local-search-lora-20260619-v1 \--served-model-name qwen35-9b-harness-local-search-lora-20260619-v1 \--max-model-len 12288
Related Repositories
- Main code:
gyung/ko-law-retriever - Large data/eval artifacts:
gyung/ko-law-retriever-artifacts-20260622
Model provider
gyung
Model tree
Base
Qwen/Qwen3.5-9B
Adapter
this model
Modalities
Input
Video, Text, Image
Output
Text
Pricing
Dedicated Endpoints
View detailsSupported Functionality
Model APIs
Dedicated Endpoints
Container
More information