# Observability for Friendli Container

export const RoundedBorderBox = ({children, caption}) => <div className="rounded-border-box">
    {children}
    {caption && <p className="text-sm text-gray-700 dark:text-gray-400">{caption}</p>}
  </div>;

Observability is an integral part of DevOps. To support it, Friendli Container exports its internal metrics in the [Prometheus](https://prometheus.io) text format.

By default, metrics are served at `http://localhost:8281/metrics`. To change the port, use the `--metrics-port` command-line option.
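To check the endpoint by hand, a minimal Python sketch like the following can fetch and parse the output. The helper names `parse_metrics` and `fetch_metrics` are hypothetical, and the parser is deliberately simplified: it assumes each sample line ends with its value and carries no trailing timestamp.

```python
from urllib.request import urlopen

# Default endpoint from this page; adjust the port if you set --metrics-port.
METRICS_URL = "http://localhost:8281/metrics"


def parse_metrics(text):
    """Parse Prometheus text-format output into {series: value}.

    Simplified sketch: comment lines (# HELP, # TYPE) and blank lines are
    skipped, and the last space-separated field is taken as the value.
    The key keeps any label set, e.g. 'metric{le="0.5"}'.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def fetch_metrics(url=METRICS_URL):
    """Fetch the metrics endpoint and return the parsed samples."""
    with urlopen(url, timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

With a container running, `fetch_metrics()["friendli_requests_total"]` would return the current value of that counter. For anything beyond a quick check, a full Prometheus server or client library is the more robust choice.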

## Supported Metrics

### Counters

Counters are cumulative metrics whose values monotonically increase.
They are often combined with the Prometheus [`rate()`](https://prometheus.io/docs/prometheus/latest/querying/functions/#rate) function to calculate throughput.

| Metric Name                       | Description                                              |
| --------------------------------- | -------------------------------------------------------- |
| friendli\_requests\_total         | Cumulative number of requests received                   |
| friendli\_responses\_total        | Cumulative number of responses sent                      |
| friendli\_items\_total            | Cumulative number of items requested                     |
| friendli\_failure\_by\_cancel     | Cumulative number of failed requests due to cancellation |
| friendli\_failure\_by\_timeout    | Cumulative number of failed requests due to timeout      |
| friendli\_failure\_by\_nan\_error | Cumulative number of failed requests due to NaN error    |
| friendli\_failure\_by\_reject     | Cumulative number of failed requests due to rejection    |

<Note>
  A single inference request can generate multiple results when the `n` field is set in the request body.
  Upon receiving such a request, `friendli_requests_total` increases by 1 and `friendli_items_total` increases by `n`.
</Note>
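To illustrate how counters turn into throughput, the sketch below computes a per-second rate from two scrapes of the same counter and derives the average `n` per request from the request and item counters. It is a simplified approximation of what `rate()` does (Prometheus also extrapolates to the window boundaries); the function names and the sample numbers in the usage note are hypothetical.

```python
def per_second_rate(earlier, later, interval_seconds):
    """Average per-second increase of a counter between two scrapes."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = later - earlier
    if delta < 0:
        # The counter dropped, so the process likely restarted and the
        # counter started again from zero; count only the new value.
        delta = later
    return delta / interval_seconds


def average_items_per_request(requests_delta, items_delta):
    """Average number of items (the `n` field) per request in a window."""
    if requests_delta == 0:
        return float("nan")
    return items_delta / requests_delta
```

For example, if `friendli_requests_total` grows from 100 to 160 and `friendli_items_total` grows from 100 to 220 over 60 seconds, the request rate is 1 request per second, the item rate is 2 items per second, and the average `n` is 2.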

### Gauges

Gauges are numerical values that can go up or down. Each gauge reports the current value of a quantity.

| Metric Name                        | Description                                                           |
| ---------------------------------- | --------------------------------------------------------------------- |
| friendli\_current\_requests        | Current number of requests in the engine (either assigned or waiting) |
| friendli\_current\_items           | Current number of items in the engine (either assigned or waiting)    |
| friendli\_current\_assigned\_items | Current number of items actively processed by the engine              |
| friendli\_current\_waiting\_items  | Current number of items waiting in the internal queue                 |

### Histograms

[Histograms](https://prometheus.io/docs/practices/histograms) are used to track the distribution of variables over time.

<table>
  <thead>
    <tr>
      <th>Histogram</th>
      <th>Metric Name</th>
      <th>Description</th>
    </tr>
  </thead>

  <tbody>
    <tr>
      <td rowspan="3"><a href="https://friendli.ai/blog/friendli-tcache/">Friendli TCache</a> hit ratio (0≤value≤1)</td>
      <td>friendli\_tcache\_hit\_ratio\_bucket</td>
      <td>Bucketized number of histogram samples for TCache hit ratio, with <code>le</code> label</td>
    </tr>

    <tr>
      <td>friendli\_tcache\_hit\_ratio\_count</td>
      <td>Total number of histogram samples for TCache hit ratio</td>
    </tr>

    <tr>
      <td>friendli\_tcache\_hit\_ratio\_sum</td>
      <td>Sum of histogram sample values for TCache hit ratio</td>
    </tr>

    <tr>
      <td rowspan="3">The length of input tokens (Experimental metric)</td>
      <td>friendli\_input\_lengths\_bucket</td>
      <td>Bucketized number of histogram samples for length of input tokens, with <code>le</code> label</td>
    </tr>

    <tr>
      <td>friendli\_input\_lengths\_count</td>
      <td>Total number of histogram samples for length of input tokens</td>
    </tr>

    <tr>
      <td>friendli\_input\_lengths\_sum</td>
      <td>Sum of histogram sample values for length of input tokens</td>
    </tr>

    <tr>
      <td rowspan="3">The length of output tokens (Experimental metric)</td>
      <td>friendli\_output\_lengths\_bucket</td>
      <td>Bucketized number of histogram samples for length of output tokens, with <code>le</code> label</td>
    </tr>

    <tr>
      <td>friendli\_output\_lengths\_count</td>
      <td>Total number of histogram samples for length of output tokens</td>
    </tr>

    <tr>
      <td>friendli\_output\_lengths\_sum</td>
      <td>Sum of histogram sample values for length of output tokens</td>
    </tr>
  </tbody>
</table>

<Note>
  For tips on visualizing histograms in Grafana, see [How to visualize Prometheus histograms in Grafana](https://grafana.com/blog/2020/06/23/how-to-visualize-prometheus-histograms-in-grafana).
</Note>
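The three series of a histogram can be combined outside Grafana as well. The sketch below shows two common calculations: the mean from `_sum` and `_count`, and a quantile estimate from the cumulative `_bucket` counts using linear interpolation, similar in spirit to the Prometheus `histogram_quantile()` function. The function names and the bucket boundaries in the usage note are assumptions for illustration; the actual boundaries are whatever the container exports in the `le` label.

```python
import math


def histogram_mean(sum_value, count_value):
    """Mean of the observed values, from the _sum and _count series."""
    return sum_value / count_value if count_value else math.nan


def estimate_quantile(q, buckets):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative buckets.

    `buckets` is a list of (le_upper_bound, cumulative_count) pairs sorted
    by upper bound, ending with the +Inf bucket. Values are assumed to be
    non-negative, so the lowest bucket starts at 0.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, cumulative in buckets:
        if cumulative >= rank:
            in_bucket = cumulative - lower_count
            if math.isinf(upper_bound) or in_bucket == 0:
                # Cannot interpolate here; fall back to the last known bound.
                return lower_bound
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, cumulative
    return lower_bound
```

As a usage example with made-up hit-ratio buckets `[(0.25, 10), (0.5, 30), (0.75, 60), (1.0, 100), (math.inf, 100)]`, `estimate_quantile(0.5, ...)` lands in the `le="0.75"` bucket and interpolates to roughly 0.667. The estimate is only as precise as the bucket widths allow.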

### Quantiles

Quantiles report the current p50 (median), p90, and p99 values of a variable.

<table>
  <thead>
    <tr>
      <th>Quantiles</th>
      <th>Metric Name</th>
      <th>Description</th>
    </tr>
  </thead>

  <tbody>
    <tr>
      <td rowspan="3">Request completion latency (in nanoseconds)</td>
      <td>friendli\_requests\_latencies</td>
      <td>Percentile value for request completion latency (<code>quantile</code> label is either <code>0.5</code>, <code>0.9</code>, or <code>0.99</code>)</td>
    </tr>

    <tr>
      <td>friendli\_requests\_latencies\_count</td>
      <td>Total number of samples for request completion latency</td>
    </tr>

    <tr>
      <td>friendli\_requests\_latencies\_sum</td>
      <td>Sum of sample values for request completion latency</td>
    </tr>

    <tr>
      <td rowspan="3">Time to first token (TTFT) (in nanoseconds)</td>
      <td>friendli\_requests\_ttft</td>
      <td>Percentile value for time to first token (TTFT) (<code>quantile</code> label is either <code>0.5</code>, <code>0.9</code>, or <code>0.99</code>)</td>
    </tr>

    <tr>
      <td>friendli\_requests\_ttft\_count</td>
      <td>Total number of samples for time to first token (TTFT)</td>
    </tr>

    <tr>
      <td>friendli\_requests\_ttft\_sum</td>
      <td>Sum of sample values for time to first token (TTFT)</td>
    </tr>

    <tr>
      <td rowspan="3">Request queueing delay (in nanoseconds)</td>
      <td>friendli\_requests\_queueing\_delays</td>
      <td>Percentile value for queueing delay (<code>quantile</code> label is either <code>0.5</code>, <code>0.9</code>, or <code>0.99</code>)</td>
    </tr>

    <tr>
      <td>friendli\_requests\_queueing\_delays\_count</td>
      <td>Total number of samples for queueing delay</td>
    </tr>

    <tr>
      <td>friendli\_requests\_queueing\_delays\_sum</td>
      <td>Sum of sample values for queueing delay</td>
    </tr>
  </tbody>
</table>
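Because these metrics are reported in nanoseconds, dashboards and alerts usually convert them to a friendlier unit first. The following sketch, with hypothetical helper names, converts a quantile value to milliseconds and computes the mean latency from the `_sum` and `_count` series.

```python
NANOSECONDS_PER_MILLISECOND = 1_000_000


def ns_to_ms(value_ns):
    """Convert a value in nanoseconds to milliseconds."""
    return value_ns / NANOSECONDS_PER_MILLISECOND


def mean_latency_ms(sum_ns, count):
    """Mean latency in milliseconds from the _sum and _count series."""
    if count == 0:
        return float("nan")
    return ns_to_ms(sum_ns / count)
```

For instance, a `friendli_requests_ttft` sample of `250000000` with `quantile="0.9"` would mean that 90% of the sampled requests produced their first token within 250 ms.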

### Info

The following info metric always has a value of 1. Its labels carry the useful information as text.

| Metric Name               | Label     | Description    |
| ------------------------- | --------- | -------------- |
| friendli\_engine\_version | `version` | Engine version |
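
Since the value itself is always 1, reading an info metric means reading its labels. A minimal sketch for extracting the labels from one sample line is shown below; `parse_labels` is a hypothetical helper, the version string in the usage note is made up, and escape sequences inside label values are left as-is rather than decoded.

```python
import re

# Matches name="value" pairs; the value may contain backslash escapes.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_labels(sample_line):
    """Return the labels of one Prometheus text-format sample as a dict."""
    start = sample_line.find("{")
    end = sample_line.rfind("}")
    if start == -1 or end == -1:
        return {}  # the sample has no label set
    return dict(LABEL_RE.findall(sample_line[start + 1:end]))
```

Given a line such as `friendli_engine_version{version="1.2.3"} 1`, `parse_labels` returns `{"version": "1.2.3"}`.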

## Grafana Dashboard Template

<RoundedBorderBox>
  <img alt="Grafana Dashboard" src="https://mintcdn.com/friendliai/SRK7vx0X1v_2rjkU/static/images/guides/container/grafana-template-dashboard-example.png?fit=max&auto=format&n=SRK7vx0X1v_2rjkU&q=85&s=16f13d2b171b076f6b9b7f37115ea63d" width="6016" height="3078" data-path="static/images/guides/container/grafana-template-dashboard-example.png" />
</RoundedBorderBox>

You can import [the dashboard templates](https://github.com/friendliai/container-resource/tree/main/grafana) into your Grafana instance.
The Grafana instance must be connected to a Prometheus instance (or a Prometheus-compatible data source) that scrapes metrics from the Friendli Container processes.

<Note>
  The dashboard template works with Grafana v8.0.0 or later. We recommend Grafana v10.0.0 or later for the best experience.
</Note>
