README

License: other

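Benchmark table (group comparison)

| Model             | avg_bits | group               | MMMU  | MMBench | ScienceQA | Avg   | ΔMMMU | ΔMMBench | ΔScienceQA | ΔAvg  |
|-------------------|----------|---------------------|-------|---------|-----------|-------|-------|----------|------------|-------|
| FP16 (baseline)   | 16.0     | llama13b_aggressive | 35.33 | 63.78   | 71.24     | 56.78 | 0.00  | 0.00     | 0.00       | 0.00  |
| INT4 (bnb NF4)    | 4.0      | llama13b_aggressive | 35.56 | 61.76   | 71.15     | 56.16 | 0.23  | -2.02    | -0.09      | -0.62 |
| RL-MPQ Aggressive | 3.75     | llama13b_aggressive | 34.56 | 63.00   | 71.10     | 56.22 | -0.77 | -0.78    | -0.14      | -0.56 |

Avg is the mean of MMMU, MMBench, and ScienceQA. Each Δ column is the difference from the FP16 baseline, so the baseline row is 0.00 throughout.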
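The Avg and Δ columns follow directly from the three per-benchmark scores. As a minimal sketch (the names ROWS, summarize, and weight_gb are illustrative, not part of this repo), the snippet below recomputes them from the table values and adds a rough weight-size estimate from avg_bits. That estimate assumes a nominal 13e9 parameters and ignores quantization metadata such as scales, so treat it as a ballpark only. One detail the sketch makes visible: ΔAvg matches the table only when it is taken between the rounded Avg values (56.16 - 56.78 = -0.62 for INT4, whereas the unrounded difference is about -0.63).

```python
# Minimal sketch: recompute the Avg and Δ columns of the benchmark table
# from the three per-benchmark scores, plus a rough weight-size estimate.

# Values copied from the table: (avg_bits, MMMU, MMBench, ScienceQA)
ROWS = {
    "FP16 (baseline)": (16.0, 35.33, 63.78, 71.24),
    "INT4 (bnb NF4)": (4.0, 35.56, 61.76, 71.15),
    "RL-MPQ Aggressive": (3.75, 34.56, 63.0, 71.1),
}
BASELINE = "FP16 (baseline)"

# Assumption: a nominal 13e9 weights. The real parameter count differs slightly,
# and quantization metadata (scales, zero points) is ignored.
NOMINAL_PARAMS = 13e9


def summarize(rows, baseline):
    """Return {model: (Avg, ΔMMMU, ΔMMBench, ΔScienceQA, ΔAvg)} rounded to 2 dp.

    Avg is rounded before ΔAvg is taken, which is what reproduces the table.
    """
    base_scores = rows[baseline][1:]
    base_avg = round(sum(base_scores) / len(base_scores), 2)
    out = {}
    for name, (_bits, *scores) in rows.items():
        avg = round(sum(scores) / len(scores), 2)
        deltas = [round(s - b, 2) for s, b in zip(scores, base_scores)]
        out[name] = (avg, *deltas, round(avg - base_avg, 2))
    return out


def weight_gb(avg_bits, n_params=NOMINAL_PARAMS):
    """Approximate weight storage in decimal GB: n_params * avg_bits / 8 bytes."""
    return n_params * avg_bits / 8 / 1e9


if __name__ == "__main__":
    for name, row in summarize(ROWS, BASELINE).items():
        print(f"{name:<20} {row}  ~{weight_gb(ROWS[name][0]):.2f} GB")
```

Under the nominal-parameter assumption, the estimate comes out to 26.0 GB for FP16, 6.5 GB at 4.0 bits, and about 6.09 GB at 3.75 bits.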

Artifacts in this repo

  • artifacts/figures/ — plots for the model group
  • artifacts/benchmark_table.csv — FP16 / INT4 / RL-MPQ accuracies
  • eval_results.json — structured eval metadata (if present)
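To work with these artifacts programmatically, here is a minimal standard-library sketch. It assumes, hypothetically, that artifacts/benchmark_table.csv uses the same column names as the table above; the real header may differ. It also treats eval_results.json as optional and returns None when the file is missing, because the file may be absent and its schema is not described here. The function names are illustrative, not part of this repo.

```python
import csv
import json
from pathlib import Path

# Assumption: the CSV header reuses the table's column names
# (Model, avg_bits, group, MMMU, ...). Check the actual file before relying on this.
NUMERIC_COLUMNS = {
    "avg_bits", "MMMU", "MMBench", "ScienceQA", "Avg",
    "ΔMMMU", "ΔMMBench", "ΔScienceQA", "ΔAvg",
}


def load_benchmark_table(path, group=None):
    """Read the benchmark CSV into a list of dicts, casting numeric columns to float.

    If `group` is given, only rows whose "group" column equals it are kept.
    Empty numeric cells are left unchanged rather than cast.
    """
    rows = []
    with Path(path).open(newline="", encoding="utf-8") as f:
        for record in csv.DictReader(f):
            if group is not None and record.get("group") != group:
                continue
            rows.append({
                key: float(value) if key in NUMERIC_COLUMNS and value else value
                for key, value in record.items()
            })
    return rows


def load_eval_results(path="eval_results.json"):
    """Return the parsed eval metadata, or None if the file is absent.

    The JSON schema is not documented here, so the raw object is returned as-is.
    """
    p = Path(path)
    if not p.exists():
        return None
    with p.open(encoding="utf-8") as f:
        return json.load(f)
```

For example, load_benchmark_table("artifacts/benchmark_table.csv", group="llama13b_aggressive") would return one dict per table row, provided the header assumption holds.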

Model provider: AvoCahDoe

Model tree
  • Base: llava-hf/llava-1.5-13b-hf
  • Fine-tuned: this model

Modalities
  • Input: Text, Image
  • Output: Text

Supported functionality
  • Model APIs
  • Dedicated Endpoints
  • Container
