lhordking

Shadow-coder-v2-LoRA

README

License: apache-2.0

Model Details

Property	Value
Base Model	Qwen2.5-Coder-3B-Instruct
Method	LoRA (r=16, alpha=32)
GPU	AMD Radeon RX 9060 XT (ROCm)
Framework	Unsloth + HuggingFace TRL
Epochs	3
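
To make the `r=16, alpha=32` setting concrete, the sketch below works out the standard LoRA scaling factor (`alpha / r`) and the number of trainable parameters the adapter adds to a single weight matrix. The 2048 x 2048 layer shape is an illustrative assumption, not a value taken from this model card, and the helper names are hypothetical.

```python
def lora_scaling(r: int, alpha: int) -> float:
    """Standard LoRA scaling applied to the low-rank update: alpha / r."""
    return alpha / r


def lora_added_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters in the two low-rank factors A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)


R, ALPHA = 16, 32        # from the Model Details table
D_IN = D_OUT = 2048      # assumed layer shape, for illustration only

scaling = lora_scaling(R, ALPHA)
added = lora_added_params(D_IN, D_OUT, R)
full = D_IN * D_OUT

print(f"scaling = {scaling}")
print(f"adapter params = {added:,} vs full matrix = {full:,} ({added / full:.2%})")
```

Under these assumptions the adapter trains only a small fraction of the weights in each adapted matrix, which is what makes fine-tuning a 3B model on a single consumer GPU practical.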

Training Data

Custom fullstack dataset (750 examples)
CodeAlpaca-20k (5,000 examples)
Magicoder-OSS-Instruct-75K (3,000 examples)
Total: ~8,750 examples
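
The mixture above can be checked with a few lines of arithmetic. This sketch computes each source's share of the training set and the number of examples seen over the 3 epochs listed in Model Details, assuming every example is used once per epoch with no filtering (an assumption, since the card does not describe preprocessing).

```python
# Example counts as listed in the Training Data section.
sources = {
    "Custom fullstack dataset": 750,
    "CodeAlpaca-20k": 5_000,
    "Magicoder-OSS-Instruct-75K": 3_000,
}
EPOCHS = 3  # from the Model Details table

total = sum(sources.values())
shares = {name: count / total for name, count in sources.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%}")

# Assumes each example is seen exactly once per epoch.
examples_seen = total * EPOCHS
print(f"total = {total:,}, examples seen over {EPOCHS} epochs = {examples_seen:,}")
```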

Languages & Frameworks

Python, JavaScript, TypeScript, Rust, Go, SQL, PHP, FastAPI, React, Vue, NestJS, Express, PostgreSQL, Docker

Usage

python
from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name     = "lhordking/Shadow-coder-v2-LoRA",
    max_seq_length = 2048,
    dtype          = torch.bfloat16,
    load_in_4bit   = False,
)
FastLanguageModel.for_inference(model)

prompt = (
    "### Instruction:\n"
    "Build a FastAPI endpoint for user authentication with JWT\n\n"
    "### Context:\n"
    "Use PostgreSQL and return access + refresh tokens\n\n"
    "### Response:\n"
)

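# ROCm builds of PyTorch expose the GPU under the "cuda" device name.
inputs  = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(
    **inputs,
    max_new_tokens     = 512,
    do_sample          = True,   # make sampling explicit so temperature applies
    temperature        = 0.7,
    repetition_penalty = 1.3,
)
# The decoded text contains the prompt followed by the generated response.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))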
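
The prompt in the usage example follows an Instruction / Context / Response layout. A minimal sketch of two helpers for that layout is shown below: one builds the prompt string, the other strips the echoed prompt from the decoded output. Both function names are hypothetical, and omitting the Context section when it is empty is an assumption, since the card only shows a prompt that includes it.

```python
RESPONSE_MARKER = "### Response:"


def build_prompt(instruction: str, context: str = "") -> str:
    """Assemble a prompt in the layout used by the usage example."""
    parts = [f"### Instruction:\n{instruction}\n\n"]
    if context:
        # Assumption: the Context section may be dropped when there is none.
        parts.append(f"### Context:\n{context}\n\n")
    parts.append(f"{RESPONSE_MARKER}\n")
    return "".join(parts)


def extract_response(decoded: str) -> str:
    """Return the text after the first response marker, or all text if absent."""
    _, sep, tail = decoded.partition(RESPONSE_MARKER)
    return tail.strip() if sep else decoded.strip()


prompt = build_prompt(
    "Build a FastAPI endpoint for user authentication with JWT",
    "Use PostgreSQL and return access + refresh tokens",
)
print(prompt)

# Stand-in for tokenizer.decode(...) output: the prompt plus generated text.
decoded = prompt + "from fastapi import FastAPI\n"
print(extract_response(decoded))
```

Passing the result of `tokenizer.decode(outputs[0], skip_special_tokens=True)` to `extract_response` would return only the generated code, provided the decoded text still contains the marker.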