Run this model inference on single tenant GPU with unmatched speed and reliability at scale.
Run this model inference with full control and performance in your environment.
Get help setting up a custom Dedicated Endpoints.
Talk with our engineer to get a quote for reserved GPU instances with discounts.
README
License: otherIntended Runtime Profile
json
{"adapter_scale": 1.15,"temperature": 0.72,"top_p": 0.86,"top_k": 60,"repetition_penalty": 1.08,"no_repeat_ngram_size": 4,"max_new_tokens": 260,"min_new_tokens": 0,"max_context_tokens": 3072,"primer": "hard","user_wrapper": "board-hard","assistant_prefix": ""}
The behavior depends on the runtime wrapper in chat_lora.py. If you load only the adapter in a generic chat UI, it may become softer or more assistant-like.
Local Python Usage
powershell
cd F:\mistral-board-trainingpowershell -ExecutionPolicy Bypass -File .\scripts\start_chat_lora_hard.ps1
Minimal PEFT Example
python
import torchfrom peft import PeftModelfrom transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfigbase = "mistralai/Mistral-7B-Instruct-v0.3"adapter = "YOUR_USERNAME/mistral-board-hard-character-lora"quant = BitsAndBytesConfig(load_in_4bit=True,bnb_4bit_quant_type="nf4",bnb_4bit_use_double_quant=True,bnb_4bit_compute_dtype=torch.float16,)tokenizer = AutoTokenizer.from_pretrained(base)model = AutoModelForCausalLM.from_pretrained(base,quantization_config=quant,torch_dtype=torch.float16,device_map="auto",)model = PeftModel.from_pretrained(model, adapter)model.eval()
Prompting Note
For best behavior, prompt it like a thread post rather than like a polite assistant. The local script wraps user input with a hard character instruction and uses the saved sampling settings.
Jan / LM Studio
Jan and LM Studio usually work best with GGUF. See JAN_LMSTUDIO_GGUF_GUIDE_RU.md and export_hard_character_gguf.ps1.
Files
adapter_model.safetensors- LoRA adapter weightsadapter_config.json- PEFT configHARD_CHARACTER_SETTINGS.json- saved runtime settingschat_lora.py- local chat runner with hard wrapper/primerREADME_HARD_CHARACTER_RU.md- Russian local usage notesJAN_LMSTUDIO_GGUF_GUIDE_RU.md- Russian Jan/LM Studio guide
Model provider
XLEB985
Model tree
Base
mistralai/Mistral-7B-Instruct-v0.3
Adapter
this model
Modalities
Input
Text
Output
Text
Pricing
Dedicated Endpoints
View detailsSupported Functionality
Model APIs
Dedicated Endpoints
Container
More information