Observability is an integral part of DevOps. To support it, Friendli Container exports internal metrics in the Prometheus text format. By default, metrics are served at http://localhost:8281/metrics. You can configure the port number with the command-line option --metrics-port.
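
To inspect the endpoint without a Prometheus server, you can scrape it once and parse the response. The following Python sketch uses only the standard library and handles the common subset of the text format: sample lines with optional labels and an optional timestamp, with # HELP and # TYPE comment lines skipped. The helper names parse_metrics and fetch_metrics are hypothetical, and the default URL assumes the default port.

```python
import re
import urllib.request

# One sample line: metric name, optional {labels}, value, optional timestamp.
_SAMPLE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$'
)
# One label pair such as version="1.0.0"; values may contain backslash escapes.
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def _unescape(value):
    # Undo the backslash escapes permitted in label values (\n, \", \\).
    return re.sub(r'\\(.)', lambda m: '\n' if m.group(1) == 'n' else m.group(1), value)


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this minimal parser does not understand
        name, raw_labels, raw_value = match.groups()
        labels = {k: _unescape(v) for k, v in _LABEL_RE.findall(raw_labels or '')}
        samples.append((name, labels, float(raw_value)))  # float() accepts NaN, +Inf
    return samples


def fetch_metrics(url='http://localhost:8281/metrics', timeout=5.0):
    """Scrape the metrics endpoint once and parse the response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_metrics(response.read().decode('utf-8'))
```

A line such as friendli_requests_total 42 becomes ('friendli_requests_total', {}, 42.0). For anything beyond quick checks, a Prometheus server or the text parser in the official prometheus_client package is a more robust choice.
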
Supported metrics
Counters
Counters are cumulative metrics whose values monotonically increase. They are often combined with the Prometheus rate() function to calculate throughput.

| Metric Name | Description |
|---|---|
| friendli_requests_total | Cumulative number of requests received |
| friendli_responses_total | Cumulative number of responses sent |
| friendli_items_total | Cumulative number of items requested |
| friendli_failure_by_cancel | Cumulative number of failed requests due to cancellation |
| friendli_failure_by_timeout | Cumulative number of failed requests due to timeout |
| friendli_failure_by_nan_error | Cumulative number of failed requests due to NaN error |
| friendli_failure_by_reject | Cumulative number of failed requests due to rejection |
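
The idea behind rate() can be sketched for two scrapes of a counter such as friendli_requests_total. This Python sketch is a simplification, not Prometheus's implementation: rate() also uses every sample in the range and extrapolates to the window edges. counter_rate is a hypothetical helper, and timestamps are assumed to be in seconds.

```python
def counter_rate(prev_value, prev_time, curr_value, curr_time):
    """Per-second increase of a counter between two scrapes (times in seconds)."""
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    increase = curr_value - prev_value
    if increase < 0:
        # A counter never decreases on its own, so a drop means the process
        # restarted and the counter began again from 0; count the new value.
        increase = curr_value
    return increase / elapsed
```

For example, counter_rate(1200, 0.0, 1500, 60.0) returns 5.0, that is, five requests per second over that minute. In PromQL, the corresponding query over a five-minute window is rate(friendli_requests_total[5m]).
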

A single inference request can generate multiple results when the n field is set in the request body. Upon receiving such a request, friendli_requests_total is incremented by 1 and friendli_items_total by n.
Gauges
Gauges are numerical values that can go up or down to represent a current value.

| Metric Name | Description |
|---|---|
| friendli_current_requests | Current number of requests in the engine (either assigned or waiting) |
| friendli_current_items | Current number of items in the engine (either assigned or waiting) |
| friendli_current_assigned_items | Current number of items actively processed by the engine |
| friendli_current_waiting_items | Current number of items waiting in the internal queue |
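
Because gauges report a snapshot rather than a running total, they are read directly instead of through rate(). The sketch below derives the share of queued items from friendli_current_assigned_items and friendli_current_waiting_items. It assumes, based on the descriptions above, that every item in the engine is either assigned or waiting; queue_pressure is a hypothetical helper name.

```python
def queue_pressure(assigned_items, waiting_items):
    """Share of in-engine items still waiting in the internal queue (0.0 to 1.0).

    Assumption: every item in the engine is either assigned or waiting, so the
    two gauges together add up to friendli_current_items.
    """
    total = assigned_items + waiting_items
    if total == 0:
        return 0.0  # idle engine: nothing is queued
    return waiting_items / total
```

A value that stays close to 1.0 means most items are queued rather than being processed, which suggests the engine is saturated.
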

Histograms
Histograms are used to track the distribution of variables over time.

| Histogram | Metric Name | Description |
|---|---|---|
| Friendli TCache hit ratio (0 ≤ value ≤ 1) | friendli_tcache_hit_ratio_bucket | Bucketized number of histogram samples for TCache hit ratio, with the le label |
| | friendli_tcache_hit_ratio_count | Total number of histogram samples for TCache hit ratio |
| | friendli_tcache_hit_ratio_sum | Sum of histogram sample values for TCache hit ratio |
| Length of input tokens (experimental metric) | friendli_input_lengths_bucket | Bucketized number of histogram samples for length of input tokens, with the le label |
| | friendli_input_lengths_count | Total number of histogram samples for length of input tokens |
| | friendli_input_lengths_sum | Sum of histogram sample values for length of input tokens |
| Length of output tokens (experimental metric) | friendli_output_lengths_bucket | Bucketized number of histogram samples for length of output tokens, with the le label |
| | friendli_output_lengths_count | Total number of histogram samples for length of output tokens |
| | friendli_output_lengths_sum | Sum of histogram sample values for length of output tokens |

For tips on visualizing histograms in Grafana, see How to visualize Prometheus histograms in Grafana.
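
Each *_bucket series is cumulative: the sample with le="x" counts every observation less than or equal to x, and the le="+Inf" bucket equals the matching *_count. The sketch below estimates a quantile from one set of buckets by linear interpolation inside the bucket that contains the target rank, which is the approach PromQL's histogram_quantile() takes. The Python function here is written for illustration only, and the first bucket is assumed to start at 0 because all three histograms above are non-negative.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile (0 < q <= 1) from cumulative histogram buckets.

    buckets is a list of (upper_bound, cumulative_count) pairs sorted by
    upper bound and ending with the +Inf bucket, i.e. the *_bucket samples
    keyed by their le label.
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    if len(buckets) < 2 or buckets[-1][1] == 0:
        return math.nan  # not enough buckets or no observations yet
    rank = q * buckets[-1][1]  # the +Inf bucket counts every observation
    # First bucket whose cumulative count reaches the target rank.
    index = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if index == len(buckets) - 1:
        # The rank lies in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    upper, count = buckets[index]
    # Assumption: the first bucket starts at 0 (all values here are non-negative).
    lower, below = buckets[index - 1] if index > 0 else (0.0, 0.0)
    # Interpolate linearly, as if observations were spread evenly in the bucket.
    return lower + (upper - lower) * (rank - below) / (count - below)
```

Buckets from a single scrape give an estimate over the whole process lifetime. To cover only a recent window, pass per-bucket increases between two scrapes instead, as histogram_quantile(0.9, rate(friendli_tcache_hit_ratio_bucket[5m])) does in PromQL. Dividing *_sum by *_count gives the mean.
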

Quantiles
Quantiles are used to show the current p50 (median), p90, and p99 percentiles of variables.

| Quantiles | Metric Name | Description |
|---|---|---|
| Request completion latency (in nanoseconds) | friendli_requests_latencies | Percentile value for request completion latency (quantile label is 0.5, 0.9, or 0.99) |
| | friendli_requests_latencies_count | Total number of samples for request completion latency |
| | friendli_requests_latencies_sum | Sum of sample values for request completion latency |
| Time to first token (TTFT) (in nanoseconds) | friendli_requests_ttft | Percentile value for time to first token (TTFT) (quantile label is 0.5, 0.9, or 0.99) |
| | friendli_requests_ttft_count | Total number of samples for time to first token (TTFT) |
| | friendli_requests_ttft_sum | Sum of sample values for time to first token (TTFT) |
| Request queueing delay (in nanoseconds) | friendli_requests_queueing_delays | Percentile value for queueing delay (quantile label is 0.5, 0.9, or 0.99) |
| | friendli_requests_queueing_delays_count | Total number of samples for queueing delay |
| | friendli_requests_queueing_delays_sum | Sum of sample values for queueing delay |
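
The quantile values are reported per engine process, so p90 values from different replicas cannot be meaningfully averaged. The *_sum and *_count series can be combined, though: the mean over a window is the increase in the sum divided by the increase in the count. A minimal sketch for two scrapes, converting nanoseconds to milliseconds (windowed_mean_ms is a hypothetical helper):

```python
NS_PER_MS = 1_000_000  # the metrics above are reported in nanoseconds


def windowed_mean_ms(prev_sum, prev_count, curr_sum, curr_count):
    """Mean of the samples recorded between two scrapes, in milliseconds.

    Takes *_sum and *_count values (nanoseconds) from an earlier and a later
    scrape of the same series, e.g. friendli_requests_ttft_sum and _count.
    """
    new_samples = curr_count - prev_count
    if new_samples <= 0:
        return None  # nothing completed in the window, or the process restarted
    return (curr_sum - prev_sum) / new_samples / NS_PER_MS
```

With hypothetical values, if friendli_requests_ttft_sum grows by 2.5e9 ns while friendli_requests_ttft_count grows by 50, the mean TTFT over that window is 50.0 ms. The PromQL equivalent is rate(friendli_requests_ttft_sum[5m]) / rate(friendli_requests_ttft_count[5m]).
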

Info
The following information metric always has a value of 1; its labels carry useful information as text.

| Metric Name | Label | Description |
|---|---|---|
| friendli_engine_version | version | Engine version |
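
Because the value of friendli_engine_version is always 1, only its label matters. A minimal sketch that extracts the version label from raw exposition text, such as the body of one scrape of the metrics endpoint (engine_version is a hypothetical helper):

```python
import re

# Captures the version label of the friendli_engine_version info metric.
# \b keeps a label such as engine_version="..." from matching by accident.
_VERSION_RE = re.compile(r'^friendli_engine_version\{.*?\bversion="([^"]*)"', re.MULTILINE)


def engine_version(metrics_text):
    """Return the engine version reported in exposition text, or None."""
    match = _VERSION_RE.search(metrics_text)
    return match.group(1) if match else None
```

In PromQL, querying friendli_engine_version on its own lists the version reported by each scraped process.
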

Grafana dashboard template
You can import the dashboard templates into your Grafana instance. The Grafana instance must be connected to a Prometheus instance (or a Prometheus-compatible data source) that scrapes metrics from Friendli Container processes.

The dashboard template works with Grafana v8.0.0 or later. We recommend Grafana v10.0.0 or later for the best experience.