ericmao

linkd-dsl-qwen3-4b-lora

README

License: apache-2.0

Usage

Serve with vLLM (OpenAI-compatible):

bash
vllm serve Qwen/Qwen3-4B-Instruct-2507 \
  --enable-lora --lora-modules linkd-dsl=ericmao/linkd-dsl-qwen3-4b-lora \
  --max-model-len 2048 \
  --speculative-config '{"method":"ngram","num_speculative_tokens":8,"prompt_lookup_max":4,"prompt_lookup_min":2}'

Then call it with the exact production prompt (see the linkd-search repo, slm/common.py:SYSTEM_PROMPT), model="linkd-dsl", temperature=0. The response is a raw JSON Mongo filter; run it as collection.find(filter).limit(20).

A merged full-weights variant (no LoRA runtime needed) is published at ericmao/linkd-dsl-qwen3-4b.

Available on FriendliAI

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Container

Run this model inference with full control and performance in your environment.

Learn more

Model Details

Model Provider

ericmao

Model Tree