Observability is an integral part of DevOps. To support this, Friendli Container exports internal metrics in the Prometheus text format.

By default, metrics are served at http://localhost:8281/metrics. You can configure the port number using the command line option --metrics-port.
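
To inspect the endpoint without a full Prometheus setup, the metrics page can be fetched and parsed directly. The following is a minimal sketch that uses only the Python standard library. The helper names `parse_metrics` and `fetch_metrics` are hypothetical, and the parser handles only the common `name{labels} value` line shape without unescaping label values.

```python
import re
import urllib.request

# One sample line: metric name, optional {label="value",...} block, then the value.
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text exposition into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines such as "# HELP" and "# TYPE".
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def fetch_metrics(url="http://localhost:8281/metrics", timeout=5.0):
    """Fetch the metrics page (default URL from this document) and parse it."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))


# Usage, assuming a container is running locally on the default port:
#   for name, labels, value in fetch_metrics():
#       print(name, labels, value)
```

If the container was started with a different --metrics-port value, pass the matching URL to `fetch_metrics`.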

Supported Metrics

Counters

Counters are cumulative metrics whose values increase monotonically. They are often used in combination with the Prometheus function rate() to calculate throughput.

| Metric Name | Description |
| --- | --- |
| friendli_requests_total | Cumulative number of requests received |
| friendli_responses_total | Cumulative number of responses sent |
| friendli_items_total | Cumulative number of items requested |
| friendli_failure_by_cancel | Cumulative number of failed requests due to cancellation |
| friendli_failure_by_timeout | Cumulative number of failed requests due to timeout |
| friendli_failure_by_nan_error | Cumulative number of failed requests due to NaN error |
| friendli_failure_by_reject | Cumulative number of failed requests due to rejection |

One inference request may generate multiple results when the n field is set in the request body. Upon receiving such a request, friendli_requests_total is increased by 1 and friendli_items_total is increased by n.
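
As a rough illustration of how these counters turn into throughput figures, the sketch below computes a per-second rate and the average number of items per request from two scrapes. It is a simplification, not a reimplementation of Prometheus rate(), which also extrapolates to the edges of the query window. The function names and the sample numbers are hypothetical.

```python
def per_second_rate(earlier, later, interval_seconds):
    """Average per-second increase of a counter between two scrapes."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = later - earlier
    if delta < 0:
        # The counter went backwards, which suggests a process restart.
        # Assume it restarted from zero and count the later value as the increase.
        delta = later
    return delta / interval_seconds


def items_per_request(requests_delta, items_delta):
    """Average n per request over an interval, from the two counter increases."""
    if requests_delta == 0:
        return 0.0
    return items_delta / requests_delta


# Hypothetical scrapes taken 30 seconds apart:
#   friendli_requests_total: 100 -> 160
#   friendli_items_total:    250 -> 430
request_rate = per_second_rate(100, 160, 30)    # requests per second
average_n = items_per_request(160 - 100, 430 - 250)
```

With these numbers, 60 requests and 180 items arrived in the interval, so the request rate is 2 per second and each request asked for 3 items on average.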

Gauges

Gauges are numerical values that can go up and down to represent the current value.

| Metric Name | Description |
| --- | --- |
| friendli_current_requests | Current number of requests in the engine (either assigned or waiting) |
| friendli_current_items | Current number of items in the engine (either assigned or waiting) |
| friendli_current_assigned_items | Current number of items actively processed by the engine |
| friendli_current_waiting_items | Current number of items waiting in the internal queue |

Histograms

Histograms are used to track the distribution of variables over time.

| Histogram | Metric Name | Description |
| --- | --- | --- |
| Friendli TCache hit ratio (0 ≤ value ≤ 1) | friendli_tcache_hit_ratio_bucket | Bucketized number of histogram samples for TCache hit ratio, with le label |
| | friendli_tcache_hit_ratio_count | Total number of histogram samples for TCache hit ratio |
| | friendli_tcache_hit_ratio_sum | Sum of histogram sample values for TCache hit ratio |
| The length of input tokens (Experimental metric) | friendli_input_lengths_bucket | Bucketized number of histogram samples for length of input tokens, with le label |
| | friendli_input_lengths_count | Total number of histogram samples for length of input tokens |
| | friendli_input_lengths_sum | Sum of histogram sample values for length of input tokens |
| The length of output tokens (Experimental metric) | friendli_output_lengths_bucket | Bucketized number of histogram samples for length of output tokens, with le label |
| | friendli_output_lengths_count | Total number of histogram samples for length of output tokens |
| | friendli_output_lengths_sum | Sum of histogram sample values for length of output tokens |

For tips on visualizing histograms in Grafana, see the article "How to visualize Prometheus histograms in Grafana".
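
The _bucket series hold cumulative counts keyed by the le (less than or equal) label, so a quantile can only be estimated by interpolating within a bucket. The sketch below shows one way to do that, similar in spirit to the Prometheus function histogram_quantile(). The function name and the bucket boundaries in the example are hypothetical, since the actual boundaries used by Friendli Container are not listed here.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    The list should include the +Inf bucket. The lowest bucket is assumed to
    start at 0, which suits non-negative values such as ratios and lengths.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound) or count == lower_count:
                # No finite upper edge or an empty bucket: fall back to the lower edge.
                return lower_bound
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Hypothetical friendli_tcache_hit_ratio_bucket samples as (le, cumulative count):
hit_ratio_buckets = [(0.25, 10), (0.5, 30), (0.75, 60), (1.0, 100), (math.inf, 100)]
median_hit_ratio = estimate_quantile(0.5, hit_ratio_buckets)
```

The accuracy of the estimate depends on bucket width, because samples are assumed to be spread evenly within each bucket. The mean needs no interpolation: it is the _sum value divided by the _count value.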

Quantiles

Quantiles are used to show the current p50 (median), p90, and p99 percentiles of variables.

| Quantiles | Metric Name | Description |
| --- | --- | --- |
| Request completion latency (in nanoseconds) | friendli_requests_latencies | Percentile value for request completion latency (quantile label is either 0.5, 0.9, or 0.99) |
| | friendli_requests_latencies_count | Total number of samples for request completion latency |
| | friendli_requests_latencies_sum | Sum of sample values for request completion latency |
| Time to first token (TTFT) (in nanoseconds) | friendli_requests_ttft | Percentile value for time to first token (TTFT) (quantile label is either 0.5, 0.9, or 0.99) |
| | friendli_requests_ttft_count | Total number of samples for time to first token (TTFT) |
| | friendli_requests_ttft_sum | Sum of sample values for time to first token (TTFT) |
| Request queueing delay (in nanoseconds) | friendli_requests_queueing_delays | Percentile value for queueing delay (quantile label is either 0.5, 0.9, or 0.99) |
| | friendli_requests_queueing_delays_count | Total number of samples for queueing delay |
| | friendli_requests_queueing_delays_sum | Sum of sample values for queueing delay |
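
Because these values are reported in nanoseconds, they usually need converting before display. The sketch below derives a mean latency in milliseconds over an interval from the _sum and _count series of two scrapes. The function name and the sample numbers are hypothetical.

```python
NANOSECONDS_PER_MILLISECOND = 1_000_000


def mean_latency_ms(sum_before, count_before, sum_after, count_after):
    """Mean latency in milliseconds for requests completed between two scrapes.

    Returns None when no new samples were recorded or the series was reset.
    """
    completed = count_after - count_before
    if completed <= 0:
        return None
    total_nanoseconds = sum_after - sum_before
    return total_nanoseconds / completed / NANOSECONDS_PER_MILLISECOND


# Hypothetical scrapes of friendli_requests_latencies_sum and _count:
#   sum:   4_000_000_000 -> 10_000_000_000 (nanoseconds)
#   count: 20 -> 40
interval_mean_ms = mean_latency_ms(4_000_000_000, 20, 10_000_000_000, 40)
```

Here 20 requests took 6 seconds in total, which gives a mean of 300 ms. Unlike the _sum and _count series, the precomputed quantile values cannot be meaningfully averaged across several container processes.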

Info

The following information metric always has a value of 1. The metric labels contain useful information in text.

| Metric Name | Label | Description |
| --- | --- | --- |
| friendli_engine_version | version | Engine version |

Grafana Dashboard Template

You can import the dashboard templates into your Grafana instance. The Grafana instance must be connected to a Prometheus instance (or a Prometheus-compatible data source) that is configured to scrape metrics from Friendli Container processes.

The dashboard template works with Grafana v8.0.0 or later versions. We recommend using Grafana v10.0.0 or later for the best experience.