cpral

nex-n2-pro-mix-2

License: apache-2.0

Custom-optimized EXL3 quant of Nex-N2-Pro 397B at roughly 3.7 bits per weight (BPW).

```
 -- A perplexity:  3.27230039
 -- B perplexity:  3.29155778
 -- A label in top-K:
      K = 1: 0.7132
      K = 2: 0.8111
      K = 3: 0.8533
      K = 4: 0.8766
      K = 5: 0.8925
 -- B label in top-K:
      K = 1: 0.7115
      K = 2: 0.8104
      K = 3: 0.8528
      K = 4: 0.8760
      K = 5: 0.8918
 -- Top-K agreement, A vs B:
      K = 1: 0.9567
      K = 2: 0.8250
      K = 3: 0.6535
      K = 4: 0.4868
      K = 5: 0.3470
 -- KL divergence (A, B):  0.02210134
 -- KL divergence (B, A):  0.02157226
```
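
For readers unfamiliar with these metrics, here is a minimal pure-Python sketch of how numbers like the ones above could be computed from two models' per-token logits. It is not the script that produced this output. The function names (`log_softmax`, `top_k`, `compare`) are hypothetical, and it assumes that "top-K agreement" means both models rank the same K tokens in the same order and that "KL divergence (A, B)" is the mean per-token KL(A ‖ B). Neither definition is confirmed by this README.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over one logit vector.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def top_k(logits, k):
    # Indices of the k largest logits, best first (ties broken by index).
    return sorted(range(len(logits)), key=lambda i: -logits[i])[:k]

def kl(lp, lq):
    # KL(P || Q) from log-probabilities.
    return sum(math.exp(a) * (a - b) for a, b in zip(lp, lq))

def compare(logits_a, logits_b, labels, max_k=5):
    # logits_a / logits_b: one logit vector per token position.
    # labels: the true next-token id at each position.
    n = len(labels)
    nll_a = nll_b = kl_ab = kl_ba = 0.0
    hit_a, hit_b, agree = [0] * max_k, [0] * max_k, [0] * max_k
    for la, lb, y in zip(logits_a, logits_b, labels):
        lpa, lpb = log_softmax(la), log_softmax(lb)
        nll_a -= lpa[y]
        nll_b -= lpb[y]
        kl_ab += kl(lpa, lpb)
        kl_ba += kl(lpb, lpa)
        ta, tb = top_k(la, max_k), top_k(lb, max_k)
        for k in range(1, max_k + 1):
            hit_a[k - 1] += y in ta[:k]        # label in A's top-K
            hit_b[k - 1] += y in tb[:k]        # label in B's top-K
            agree[k - 1] += ta[:k] == tb[:k]   # assumed: ordered match
    return {
        "ppl_a": math.exp(nll_a / n),
        "ppl_b": math.exp(nll_b / n),
        "label_in_top_k_a": [h / n for h in hit_a],
        "label_in_top_k_b": [h / n for h in hit_b],
        "top_k_agreement": [a / n for a in agree],
        "kl_ab": kl_ab / n,
        "kl_ba": kl_ba / n,
    }

# Toy example: two positions, vocabulary of three tokens.
demo_a = [[2.0, 1.0, 0.0], [0.0, 3.0, 1.0]]
demo_b = [[2.0, 0.0, 1.0], [0.0, 3.0, 1.0]]
demo = compare(demo_a, demo_b, labels=[0, 1], max_k=2)
```

Under the ordered-match assumption, agreement can only fall as K grows, since a mismatch at any rank carries over to every larger K. That would be consistent with the drop from 0.9567 at K = 1 to 0.3470 at K = 5 while the two KL values stay small.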