| Histogram | Metric Name | Description |
|---|---|---|
| Friendli TCache hit ratio (0 ≤ value ≤ 1) | `friendli_tcache_hit_ratio_bucket` | Bucketized number of histogram samples for TCache hit ratio, with `le` label |
| | `friendli_tcache_hit_ratio_count` | Total number of histogram samples for TCache hit ratio |
| | `friendli_tcache_hit_ratio_sum` | Sum of histogram sample values for TCache hit ratio |
| The length of input tokens (Experimental metric) | `friendli_input_lengths_bucket` | Bucketized number of histogram samples for length of input tokens, with `le` label |
| | `friendli_input_lengths_count` | Total number of histogram samples for length of input tokens |
| | `friendli_input_lengths_sum` | Sum of histogram sample values for length of input tokens |
| The length of output tokens (Experimental metric) | `friendli_output_lengths_bucket` | Bucketized number of histogram samples for length of output tokens, with `le` label |
| | `friendli_output_lengths_count` | Total number of histogram samples for length of output tokens |
| | `friendli_output_lengths_sum` | Sum of histogram sample values for length of output tokens |
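
As a rough illustration of how the `_bucket`, `_count`, and `_sum` series fit together, the sketch below computes a mean and an approximate median from a scrape of `friendli_input_lengths`. It assumes the metrics are exposed in the Prometheus text format with cumulative `le` buckets. The sample text, its bucket boundaries, and the helper names `parse_histogram` and `estimate_quantile` are hypothetical, not part of Friendli's API.

```python
import math
import re

# Hypothetical scrape output: bucket boundaries and values are invented for
# illustration and do not come from a real deployment.
SAMPLE = """\
friendli_input_lengths_bucket{le="128"} 10
friendli_input_lengths_bucket{le="512"} 30
friendli_input_lengths_bucket{le="2048"} 38
friendli_input_lengths_bucket{le="+Inf"} 40
friendli_input_lengths_count 40
friendli_input_lengths_sum 18000
"""

# Matches `name{le="bound"} value` or `name value`.
LINE = re.compile(r'^(\w+)(?:\{le="([^"]+)"\})?\s+(\S+)$')


def parse_histogram(text):
    """Return (buckets, count, total); buckets is a sorted list of
    (upper_bound, cumulative_count) pairs."""
    buckets, count, total = [], 0.0, 0.0
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue
        name, le, value = match.group(1), match.group(2), float(match.group(3))
        if name.endswith("_bucket") and le is not None:
            buckets.append((float(le), value))  # float("+Inf") is math.inf
        elif name.endswith("_count"):
            count = value
        elif name.endswith("_sum"):
            total = value
    return sorted(buckets), count, total


def estimate_quantile(q, buckets):
    """Approximate the q-quantile by linear interpolation inside the bucket
    that contains the target rank (assumes cumulative bucket counts)."""
    if not buckets or buckets[-1][1] == 0:
        return math.nan
    rank = q * buckets[-1][1]
    prev_bound, prev_count = 0.0, 0.0
    for bound, cumulative in buckets:
        if cumulative >= rank:
            if math.isinf(bound):
                return prev_bound  # cannot interpolate into the +Inf bucket
            if cumulative == prev_count:
                return bound
            fraction = (rank - prev_count) / (cumulative - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, cumulative
    return prev_bound


buckets, count, total = parse_histogram(SAMPLE)
mean_input_length = total / count if count else math.nan
median_input_length = estimate_quantile(0.5, buckets)
print(mean_input_length, median_input_length)
```

The quantile is only an estimate: its accuracy depends on how finely the buckets are spaced around the target rank.
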

| Quantiles | Metric Name | Description |
|---|---|---|
| Request completion latency (in nanoseconds) | `friendli_requests_latencies` | Percentile value for request completion latency (`quantile` label is either `0.5`, `0.9`, or `0.99`) |
| | `friendli_requests_latencies_count` | Total number of samples for request completion latency |
| | `friendli_requests_latencies_sum` | Sum of sample values for request completion latency |
| Time to first token (TTFT) (in nanoseconds) | `friendli_requests_ttft` | Percentile value for time to first token (TTFT) (`quantile` label is either `0.5`, `0.9`, or `0.99`) |
| | `friendli_requests_ttft_count` | Total number of samples for time to first token (TTFT) |
| | `friendli_requests_ttft_sum` | Sum of sample values for time to first token (TTFT) |
| Request queueing delay (in nanoseconds) | `friendli_requests_queueing_delays` | Percentile value for queueing delay (`quantile` label is either `0.5`, `0.9`, or `0.99`) |
| | `friendli_requests_queueing_delays_count` | Total number of samples for queueing delay |
| | `friendli_requests_queueing_delays_sum` | Sum of sample values for queueing delay |
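
Because these values are reported in nanoseconds, dashboards usually need a unit conversion. The sketch below converts a quantile value to milliseconds and derives a mean from the `_sum` and `_count` series. The numbers and the helper names `ns_to_ms` and `mean_ms` are hypothetical, chosen only for illustration.

```python
NS_PER_MS = 1_000_000  # nanoseconds per millisecond


def ns_to_ms(value_ns):
    """Convert a nanosecond value (e.g. a quantile sample) to milliseconds."""
    return value_ns / NS_PER_MS


def mean_ms(sum_ns, count):
    """Mean in milliseconds from a `_sum` (nanoseconds) and `_count` pair.
    Returns None when no samples have been recorded."""
    if count == 0:
        return None
    return sum_ns / count / NS_PER_MS


# Hypothetical values, as if read from friendli_requests_ttft{quantile="0.99"},
# friendli_requests_ttft_sum, and friendli_requests_ttft_count.
p99_ttft_ns = 250_000_000
ttft_sum_ns = 1_200_000_000
ttft_count = 10

print(ns_to_ms(p99_ttft_ns))             # p99 TTFT in ms
print(mean_ms(ttft_sum_ns, ttft_count))  # mean TTFT in ms
```

Unlike `_sum` and `_count`, quantile values generally cannot be meaningfully averaged across multiple instances, so per-instance display is the safer default.
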