| Histogram | Metric Name | Description |
|---|---|---|
| Friendli TCache hit ratio (0 ≤ value ≤ 1) | friendli\_tcache\_hit\_ratio\_bucket | Bucketized number of histogram samples for TCache hit ratio, with `le` label |
| | friendli\_tcache\_hit\_ratio\_count | Total number of histogram samples for TCache hit ratio |
| | friendli\_tcache\_hit\_ratio\_sum | Sum of histogram sample values for TCache hit ratio |
| The length of input tokens (Experimental metric) | friendli\_input\_lengths\_bucket | Bucketized number of histogram samples for length of input tokens, with `le` label |
| | friendli\_input\_lengths\_count | Total number of histogram samples for length of input tokens |
| | friendli\_input\_lengths\_sum | Sum of histogram sample values for length of input tokens |
| The length of output tokens (Experimental metric) | friendli\_output\_lengths\_bucket | Bucketized number of histogram samples for length of output tokens, with `le` label |
| | friendli\_output\_lengths\_count | Total number of histogram samples for length of output tokens |
| | friendli\_output\_lengths\_sum | Sum of histogram sample values for length of output tokens |
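
The `_bucket`, `_count`, and `_sum` series of a histogram can be combined to recover a mean and an approximate quantile. The following is a minimal sketch, assuming the metrics are scraped in the Prometheus text format and that bucket counts are cumulative, as is standard for Prometheus histograms. The sample text, its bucket boundaries, and the helper names `parse_histogram` and `estimate_quantile` are hypothetical and exist only for illustration.

```python
import math

# Hypothetical scrape output: bucket boundaries and counts are made up
# for illustration and do not come from a real deployment.
SAMPLE = """\
friendli_tcache_hit_ratio_bucket{le="0.25"} 2
friendli_tcache_hit_ratio_bucket{le="0.5"} 5
friendli_tcache_hit_ratio_bucket{le="0.75"} 9
friendli_tcache_hit_ratio_bucket{le="1.0"} 10
friendli_tcache_hit_ratio_bucket{le="+Inf"} 10
friendli_tcache_hit_ratio_count 10
friendli_tcache_hit_ratio_sum 5.5
"""


def parse_histogram(text, name):
    """Return (sorted cumulative buckets, count, sum) for one histogram.

    Assumes each sample line is "<series> <value>" with no trailing timestamp.
    """
    buckets, count, total = [], 0.0, 0.0
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        key, value = line.rsplit(" ", 1)
        if key.startswith(name + "_bucket{") and 'le="' in key:
            le = key.split('le="', 1)[1].split('"', 1)[0]
            buckets.append((float(le), float(value)))
        elif key == name + "_count":
            count = float(value)
        elif key == name + "_sum":
            total = float(value)
    return sorted(buckets), count, total


def estimate_quantile(buckets, q):
    """Estimate quantile q by linear interpolation inside the matching bucket."""
    if not buckets or buckets[-1][1] == 0:
        return math.nan  # no observations yet
    rank = q * buckets[-1][1]
    prev_bound, prev_count = 0.0, 0.0
    for bound, cumulative in buckets:
        if cumulative >= rank:
            if math.isinf(bound) or cumulative == prev_count:
                return prev_bound  # cannot interpolate in this bucket
            fraction = (rank - prev_count) / (cumulative - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, cumulative
    return prev_bound


buckets, count, total = parse_histogram(SAMPLE, "friendli_tcache_hit_ratio")
mean_ratio = total / count if count else math.nan
median_ratio = estimate_quantile(buckets, 0.5)
print(f"mean={mean_ratio:.2f} median~{median_ratio:.2f}")
```

The same helpers apply to `friendli_input_lengths` and `friendli_output_lengths` by changing the `name` argument. The quantile is only an estimate, since its accuracy is bounded by the bucket widths.
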

| Quantiles | Metric Name | Description |
|---|---|---|
| Request completion latency (in nanoseconds) | friendli\_requests\_latencies | Percentile value for request completion latency (`quantile` label is either 0.5, 0.9, or 0.99) |
| | friendli\_requests\_latencies\_count | Total number of samples for request completion latency |
| | friendli\_requests\_latencies\_sum | Sum of sample values for request completion latency |
| Time to first token (TTFT) (in nanoseconds) | friendli\_requests\_ttft | Percentile value for time to first token (TTFT) (`quantile` label is either 0.5, 0.9, or 0.99) |
| | friendli\_requests\_ttft\_count | Total number of samples for time to first token (TTFT) |
| | friendli\_requests\_ttft\_sum | Sum of sample values for time to first token (TTFT) |
| Request queueing delay (in nanoseconds) | friendli\_requests\_queueing\_delays | Percentile value for queueing delay (`quantile` label is either 0.5, 0.9, or 0.99) |
| | friendli\_requests\_queueing\_delays\_count | Total number of samples for queueing delay |
| | friendli\_requests\_queueing\_delays\_sum | Sum of sample values for queueing delay |
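
Because the quantile metrics are reported in nanoseconds, they usually need converting before they are displayed or compared against a latency budget. The following is a minimal sketch, assuming the metrics are scraped in the Prometheus text format. The sample values and the helper names `parse_summary` and `to_millis` are hypothetical and exist only for illustration.

```python
NANOS_PER_MILLI = 1_000_000

# Hypothetical scrape output: all values are made up for illustration.
SAMPLE = """\
friendli_requests_ttft{quantile="0.5"} 120000000
friendli_requests_ttft{quantile="0.9"} 450000000
friendli_requests_ttft{quantile="0.99"} 900000000
friendli_requests_ttft_count 200
friendli_requests_ttft_sum 40000000000
"""


def parse_summary(text, name):
    """Return ({quantile: value_ns}, count, sum_ns) for one quantile metric.

    Assumes each sample line is "<series> <value>" with no trailing timestamp.
    """
    quantiles, count, total = {}, 0.0, 0.0
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        key, value = line.rsplit(" ", 1)
        if key.startswith(name + "{") and 'quantile="' in key:
            q = key.split('quantile="', 1)[1].split('"', 1)[0]
            quantiles[float(q)] = float(value)
        elif key == name + "_count":
            count = float(value)
        elif key == name + "_sum":
            total = float(value)
    return quantiles, count, total


def to_millis(nanoseconds):
    """Convert a nanosecond value to milliseconds."""
    return nanoseconds / NANOS_PER_MILLI


quantiles, count, total = parse_summary(SAMPLE, "friendli_requests_ttft")
for q in sorted(quantiles):
    print(f"p{q * 100:g} TTFT: {to_millis(quantiles[q]):.1f} ms")
if count:
    # The mean comes from _sum / _count, not from the quantile series.
    print(f"mean TTFT: {to_millis(total / count):.1f} ms")
```

The same helpers apply to `friendli_requests_latencies` and `friendli_requests_queueing_delays` by changing the `name` argument.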