achuthc1298
qwen_llm_scs
README
Model details
- Architecture: Qwen3_5ForConditionalGeneration (model_type: qwen3_5)
- Base model: Qwen/Qwen3.6-27B (full VLM)
- Adaptation: LoRA r=16, alpha=32, dropout=0.05, continued pre-training
- LoRA targets: q_proj, k_proj, v_proj, o_proj, out_proj, gate_proj, up_proj, down_proj (language layers only; vision tower not touched)
- Precision: BF16 (base FP8 dequantized at load time, then LoRA merged)
- Size: ~51 GB, 12 safetensors shards
- Domain: spinal cord stimulation clinical and engineering literature
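For readers unfamiliar with what "LoRA merged" means for the shipped weights, here is a minimal sketch of the merge step for a single linear layer. It assumes the standard LoRA formulation, in which the low-rank update is scaled by alpha / r (here 32 / 16 = 2); the model card does not say whether a variant scaling was used. The layer sizes, the random matrices, and the helper name merge_lora are illustrative only and are not taken from the training code.

```python
import numpy as np

# Hyperparameters from the model card.
R, ALPHA = 16, 32
SCALING = ALPHA / R  # standard LoRA scaling (assumption): 32 / 16 = 2.0


def merge_lora(weight, lora_a, lora_b, scaling=SCALING):
    """Fold a LoRA update into a dense weight: W + scaling * (B @ A)."""
    return weight + scaling * (lora_b @ lora_a)


rng = np.random.default_rng(0)
d_out, d_in = 64, 48  # toy sizes (assumption); real projection layers are far larger

W = rng.standard_normal((d_out, d_in))     # frozen base weight
A = rng.standard_normal((R, d_in)) * 0.01  # LoRA down-projection, shape (r, d_in)
B = rng.standard_normal((d_out, R)) * 0.01 # LoRA up-projection, shape (d_out, r)

W_merged = merge_lora(W, A, B)

# The merged layer should reproduce "base output + scaled adapter output".
x = rng.standard_normal((4, d_in))
adapter_out = x @ W.T + SCALING * (x @ A.T) @ B.T
merged_out = x @ W_merged.T
outputs_match = bool(np.allclose(adapter_out, merged_out))
print("merged layer matches base + adapter:", outputs_match)
```

Because the update is folded into the dense weights, the published checkpoint loads like an ordinary model and needs no adapter library at inference time.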
Usage
```python
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

repo = "achuthc1298/qwen_llm_scs"
processor = AutoProcessor.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    repo,
    dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
    attn_implementation="sdpa",
)
model.eval()

# Text-only
messages = [{"role": "user", "content": [{"type": "text", "text": "Summarize the principle of high-frequency SCS."}]}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt"
).to(model.device)
out = model.generate(**inputs, max_new_tokens=400, do_sample=False)
print(processor.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))

# Vision (figure from a paper)
from PIL import Image

img = Image.open("figure.png").convert("RGB")
messages = [{"role": "user", "content": [
    {"type": "image", "image": img},
    {"type": "text", "text": "Describe this figure."},
]}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt"
).to(model.device)
out = model.generate(**inputs, max_new_tokens=400, do_sample=False)
print(processor.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
Hardware
Tested on 2× RTX A6000 (48 GB each) with device_map="auto" and per-GPU memory limits of 44 GiB. Total VRAM at inference ≈ 57 GB in BF16.
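The per-GPU limit mentioned above can be passed through the max_memory argument that from_pretrained accepts alongside device_map. The following is a sketch of one way to do that, not necessarily the exact setup used for testing; the helper names build_max_memory and load_model are hypothetical, and the headroom check assumes the reported 57 GB is a decimal figure.

```python
def build_max_memory(num_gpus: int, limit_gib: int) -> dict:
    """Build per-device memory caps in the {device_index: "<N>GiB"} form used by max_memory."""
    return {i: f"{limit_gib}GiB" for i in range(num_gpus)}


def load_model(repo: str, max_memory: dict):
    """Load the model sharded across GPUs under the given caps (hypothetical helper)."""
    # Imports are deferred so the helper above can be used without a GPU stack installed.
    import torch
    from transformers import AutoModelForImageTextToText

    return AutoModelForImageTextToText.from_pretrained(
        repo,
        dtype=torch.bfloat16,
        device_map="auto",
        max_memory=max_memory,
        attn_implementation="sdpa",
    )


# Setup described in the Hardware section: 2 GPUs, 44 GiB cap each.
max_memory = build_max_memory(num_gpus=2, limit_gib=44)
print(max_memory)  # {0: '44GiB', 1: '44GiB'}

# Rough headroom check against the reported inference footprint.
total_cap_gib = 2 * 44
reported_gib = 57e9 / 2**30  # assumes "57 GB" means 57 * 10^9 bytes
fits = reported_gib < total_cap_gib
print(f"reported ~{reported_gib:.1f} GiB of {total_cap_gib} GiB cap, fits: {fits}")

# To actually load (requires the GPUs and the downloaded weights):
# model = load_model("achuthc1298/qwen_llm_scs", max_memory)
```

Capping each card below its physical 48 GB leaves room for activations and the KV cache, which grow with prompt length and image count.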
Notes
- The vision tower (model.visual.*) is identical to the base model; only the language layers received SCS-domain LoRA updates.
- Loading uses the native qwen3_5 integration in modern transformers; no custom remote code is bundled.
- The chat template is the standard Qwen3-VL template.
License
Inherits the Qwen license of the base model.
Model provider
- achuthc1298
Model tree
- Base: Qwen/Qwen3.6-27B
- Fine-tuned: this model
Modalities
- Input: Video, Text, Image
- Output: Text