Dedicated Endpoints
Run inference for this model on a single-tenant GPU with unmatched speed and reliability at scale.
Get help setting up a custom Dedicated Endpoint.
Talk with our engineers to get a quote for discounted reserved GPU instances.
README
License: apache-2.0
Use with mlx
bash
pip install mlx-lm
python
from mlx_lm import load, generate

model, tokenizer = load("FlatFootInternational/Darwin-9B-NEG-mlx-fp16")

prompt = "hello"

# Wrap the prompt in the model's chat template when one is available.
if hasattr(tokenizer, "apply_chat_template") and tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
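The chat-template check in the snippet above can be pulled into a small helper, which makes the fallback to the raw prompt easier to see and to test without loading the model. The following is a minimal sketch, not part of mlx-lm: `build_prompt`, `StubTokenizer`, and `PlainTokenizer` are hypothetical names introduced only for illustration, and the stub's tag format is invented and does not reflect this model's real chat template.

```python
from typing import Any


def build_prompt(tokenizer: Any, user_text: str) -> str:
    """Return a chat-formatted prompt when the tokenizer has a chat
    template; otherwise return the user text unchanged."""
    has_template = getattr(tokenizer, "chat_template", None) is not None
    if has_template and hasattr(tokenizer, "apply_chat_template"):
        messages = [{"role": "user", "content": user_text}]
        return tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
    return user_text


class StubTokenizer:
    """Hypothetical stand-in for a tokenizer that has a chat template."""

    chat_template = "stub"

    def apply_chat_template(self, messages, tokenize=False, add_generation_prompt=True):
        # Invented tag format, for illustration only.
        text = "".join(f"<{m['role']}>{m['content']}</{m['role']}>" for m in messages)
        return text + ("<assistant>" if add_generation_prompt else "")


class PlainTokenizer:
    """Hypothetical stand-in for a tokenizer without a chat template."""

    chat_template = None


templated = build_prompt(StubTokenizer(), "hello")
plain = build_prompt(PlainTokenizer(), "hello")
print(templated)
print(plain)
```

With a real tokenizer returned by `load`, the result of `build_prompt(tokenizer, "hello")` would be passed as the `prompt` argument to `generate`, exactly as in the original snippet.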
Model provider
FlatFootInternational
Model tree
Base
FINAL-Bench/Darwin-9B-NEG
Fine-tuned
this model
Modalities
Input
Video, Text, Image
Output
Text
Pricing
Dedicated Endpoints
Supported Functionality
Model APIs
Dedicated Endpoints
Container
More information