License: other

Benchmark table (group comparison)

| Model | avg_bits | group | MMMU | MMBench | ScienceQA | Avg | ΔMMMU | ΔMMBench | ΔScienceQA | ΔAvg |
|---|---|---|---|---|---|---|---|---|---|---|
| FP16 (baseline) | 16.0 | llama13b_aggressive | 35.33 | 63.78 | 71.24 | 56.78 | 0.00 | 0.00 | 0.00 | 0.00 |
| INT4 (bnb NF4) | 4.0 | llama13b_aggressive | 35.56 | 61.76 | 71.15 | 56.16 | 0.23 | -2.02 | -0.09 | -0.62 |
| RL-MPQ Aggressive | 3.75 | llama13b_aggressive | 34.56 | 63.00 | 71.10 | 56.22 | -0.77 | -0.78 | -0.14 | -0.56 |
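
The Avg and Δ columns can be re-derived from the three benchmark scores. The sketch below is only an illustration, not the script that produced the table: the function name `summarize` and the dictionary layout are hypothetical. It assumes Avg is the unweighted mean of MMMU, MMBench, and ScienceQA, and that ΔAvg is the difference of the already rounded averages, which is the convention that reproduces the INT4 row (-0.62 rather than -0.63).

```python
# Scores copied from the benchmark table: (MMMU, MMBench, ScienceQA).
SCORES = {
    "FP16 (baseline)": (35.33, 63.78, 71.24),
    "INT4 (bnb NF4)": (35.56, 61.76, 71.15),
    "RL-MPQ Aggressive": (34.56, 63.00, 71.10),
}


def summarize(scores, baseline="FP16 (baseline)"):
    """Return {model: {"avg": ..., "deltas": (...), "delta_avg": ...}}.

    Assumption: averages are rounded to two decimals first, and the
    average delta is taken between those rounded values.
    """
    base = scores[baseline]
    base_avg = round(sum(base) / len(base), 2)
    summary = {}
    for name, vals in scores.items():
        avg = round(sum(vals) / len(vals), 2)
        # Per-benchmark difference against the FP16 baseline.
        deltas = tuple(round(v - b, 2) for v, b in zip(vals, base))
        summary[name] = {
            "avg": avg,
            "deltas": deltas,
            "delta_avg": round(avg - base_avg, 2),
        }
    return summary


if __name__ == "__main__":
    for model, row in summarize(SCORES).items():
        print(model, row)
```

Read this way, RL-MPQ Aggressive at 3.75 average bits ends up within 0.56 points of the FP16 average, slightly closer than the 4-bit NF4 row at 0.62 points.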
Artifacts in this repo
- `artifacts/figures/`: plots for the model group
- `artifacts/benchmark_table.csv`: FP16 / INT4 / RL-MPQ accuracies
- `eval_results.json`: structured eval metadata (if present)
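
The files listed above can be read with the Python standard library alone. The following is a minimal sketch under stated assumptions: the helper name `load_artifacts` is hypothetical, the repository is assumed to be checked out locally, and `eval_results.json` is assumed to sit at the repository root. The column names of the CSV are not documented here, so the code simply returns whatever headers the file contains.

```python
import csv
import json
from pathlib import Path


def load_artifacts(repo_dir="."):
    """Load the benchmark CSV and, if present, the eval metadata.

    Returns (rows, meta): rows is a list of dicts keyed by the CSV's own
    header names; meta is the parsed JSON or None when the file is absent.
    """
    root = Path(repo_dir)

    # Assumed location, taken from the artifact list above.
    csv_path = root / "artifacts" / "benchmark_table.csv"
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))

    # The artifact list marks this file as optional ("if present").
    eval_path = root / "eval_results.json"
    meta = None
    if eval_path.exists():
        meta = json.loads(eval_path.read_text(encoding="utf-8"))

    return rows, meta
```

Calling `rows, meta = load_artifacts("path/to/checkout")` yields one dictionary per CSV row; all values are strings and need converting to `float` before any arithmetic.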
Model provider: AvoCahDoe

Model tree: base llava-hf/llava-1.5-13b-hf, fine-tuned into this model

Modalities: Text and Image input, Text output