AvoCahDoe

llava-1.5-13b-rlmpq-balanced

License: other

| Model | avg_bits | group | MMMU | MMBench | ScienceQA | Avg | ΔMMMU | ΔMMBench | ΔScienceQA | ΔAvg |
|---|---|---|---|---|---|---|---|---|---|---|
| FP16 (baseline) | 16.0 | llama13b_aggressive | 35.33 | 63.78 | 71.24 | 56.78 | 0.0 | 0.0 | 0.0 | 0.0 |
| INT4 (bnb NF4) | 4.0 | llama13b_aggressive | 35.56 | 61.76 | 71.15 | 56.16 | 0.23 | -2.02 | -0.09 | -0.62 |
| RL-MPQ Aggressive | 3.75 | llama13b_aggressive | 34.56 | 63.0 | 71.1 | 56.22 | -0.77 | -0.78 | -0.14 | -0.56 |
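
The derived columns in the table above can be rechecked with a short script. This is a minimal sketch, not the author's evaluation code. It assumes that `Avg` is the unweighted mean of the three benchmark scores and that each Δ column is the difference from the FP16 baseline, which matches the reported numbers. The function name `summarize` is hypothetical. Note that `ΔAvg` matches the table only when computed from the rounded averages; the unrounded difference for INT4 rounds to -0.63 rather than -0.62.

```python
# Scores copied from the table above, in the order (MMMU, MMBench, ScienceQA).
scores = {
    "FP16 (baseline)": (35.33, 63.78, 71.24),
    "INT4 (bnb NF4)": (35.56, 61.76, 71.15),
    "RL-MPQ Aggressive": (34.56, 63.0, 71.1),
}


def summarize(scores, baseline="FP16 (baseline)"):
    """Recompute Avg and the delta columns relative to the baseline row.

    Assumption: Avg is an unweighted mean, and the average delta is taken
    between averages that were already rounded to two decimals.
    """
    base = scores[baseline]
    base_avg = round(sum(base) / len(base), 2)
    rows = {}
    for name, vals in scores.items():
        avg = round(sum(vals) / len(vals), 2)
        rows[name] = {
            "avg": avg,
            "deltas": tuple(round(v - b, 2) for v, b in zip(vals, base)),
            "delta_avg": round(avg - base_avg, 2),
        }
    return rows


rows = summarize(scores)
for name, row in rows.items():
    print(name, row["avg"], row["deltas"], row["delta_avg"])
```

On these numbers, the 3.75-bit RL-MPQ row loses 0.56 points on average against FP16, compared with 0.62 points for the 4-bit NF4 row.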