The Distributed dashboard lets you monitor important distribution layer health and performance metrics.
To view this dashboard, access the DB Console, click Metrics in the left-hand navigation, and then select Dashboard > Distributed.
Dashboard navigation
Use the Graph menu to display metrics for your entire cluster or for a specific node.
To the right of the Graph and Dashboard menus, a time interval selector allows you to filter the view for a predefined or custom time interval. Use the navigation buttons to move to the previous, next, or current time interval. When you select a time interval, the same interval is selected in the SQL Activity pages. However, if you select 10 or 30 minutes, the interval defaults to 1 hour in SQL Activity pages.
Hovering your mouse pointer over the graph title will display a tooltip with a description and the metrics used to create the graph.
When you hover over a graph, crosshair lines appear at your mouse pointer. The series values at the time marked by the crosshairs are displayed in the legend under the graph. Hovering the mouse pointer over a given series displays the corresponding value near the mouse pointer and highlights the series line (graying out other series lines). Click anywhere within the graph to freeze the values in place. Click anywhere within the graph again to unfreeze the values so that they follow your mouse movements once more.
In the legend, click an individual series to isolate it on the graph. The other series are hidden, and hovering still works. Click the individual series again to make the other series visible. If there are many series, a scrollbar may appear on the right of the legend. This limits the size of the legend, particularly on clusters with many nodes.
All timestamps in the DB Console are shown in Coordinated Universal Time (UTC).
The Distributed dashboard displays the following time series graphs:
Batches
The Batches graph displays various details about BatchRequest traffic in the Distribution layer.
Hovering over the graph displays values for the following metrics:
Metric | Description |
---|---|
Batches | The number of BatchRequests made, as tracked by the distsender.batches metric. |
Partial Batches | The number of partial BatchRequests made, as tracked by the distsender.batches.partial metric. |
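To reproduce similar numbers outside the DB Console, the two counters can be read from a node's Prometheus-format metrics output and turned into per-second rates. The following is a minimal sketch, not the DB Console's own calculation. It assumes the dotted metric names above appear with underscores (distsender_batches, distsender_batches_partial) in that output, that sample lines carry no trailing timestamp, and it uses made-up scrape values in place of a real HTTP request.

```python
def parse_counters(text, names):
    """Sum the values of the named metrics in Prometheus text-format output."""
    totals = {name: 0.0 for name in names}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # Assumes "name value" or "name{labels} value" with no trailing
        # timestamp and no spaces inside label values.
        series, _, value = line.rpartition(" ")
        name = series.split("{", 1)[0]
        if name in totals:
            totals[name] += float(value)
    return totals


def per_second_rate(earlier, later, seconds):
    """Average per-second increase of a counter between two scrapes."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (later - earlier) / seconds


# Hypothetical scrapes taken 10 seconds apart; the values are made up.
SCRAPE_T0 = """
# TYPE distsender_batches counter
distsender_batches 1000
distsender_batches_partial 400
"""
SCRAPE_T1 = """
# TYPE distsender_batches counter
distsender_batches 1600
distsender_batches_partial 700
"""

NAMES = ("distsender_batches", "distsender_batches_partial")
before = parse_counters(SCRAPE_T0, NAMES)
after = parse_counters(SCRAPE_T1, NAMES)
rates = {name: per_second_rate(before[name], after[name], 10) for name in NAMES}
print(rates)
```

In practice the scrape text would come from each node's metrics endpoint, and the interval would be the actual time between the two scrapes.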
RPCs
The RPCs graph displays various details about RPC traffic in the Distribution layer.
Hovering over the graph displays values for the following metrics:
Metric | Description |
---|---|
RPCs Sent | The number of RPC calls made, as tracked by the distsender.rpc.sent metric. |
Local Fast-path | The number of local fast-path RPC calls made, as tracked by the distsender.rpc.sent.local metric. |
RPC Errors
The RPC Errors graph displays various details about RPC errors encountered in the Distribution layer.
Hovering over the graph displays values for the following metrics:
Metric | Description |
---|---|
Replica Errors | The number of RPCs sent due to per-replica errors, as tracked by the distsender.rpc.sent.nextreplicaerror metric. |
Not Leaseholder Errors | The number of NotLeaseHolderErrors logged, as tracked by the distsender.errors.notleaseholder metric. |
KV Transactions
The KV Transactions graph displays various details about transactions in the Transaction layer.
Hovering over the graph displays values for the following metrics:
Metric | Description |
---|---|
Committed | The number of committed KV transactions (including fast-path), as tracked by the txn.commits metric. |
Fast-path Committed | The number of committed one-phase KV transactions, as tracked by the txn.commits1PC metric. |
Aborted | The number of aborted KV transactions, as tracked by the txn.aborts metric. |
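Because the Committed value already includes fast-path commits, the two series should not be added together. The sketch below illustrates the relationship with hypothetical counts. The helper names and the abort fraction are illustrative derived values, not metrics that CockroachDB reports.

```python
def non_fast_path_commits(commits, commits_1pc):
    """Commits that did not use the one-phase fast path.

    commits corresponds to txn.commits (which includes fast-path commits)
    and commits_1pc to txn.commits1PC.
    """
    return commits - commits_1pc


def abort_fraction(commits, aborts):
    """Share of finished transactions that aborted (illustrative only)."""
    finished = commits + aborts
    return aborts / finished if finished else 0.0


# Hypothetical counts over some interval; the values are made up.
commits, commits_1pc, aborts = 950, 600, 50
print(non_fast_path_commits(commits, commits_1pc))
print(abort_fraction(commits, aborts))
```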
KV Transaction Durations: 99th percentile
The KV Transaction Durations: 99th percentile graph displays the 99th percentile of transaction durations over a one-minute period.
Hovering over the graph displays values for the following metrics:
Metric | Description |
---|---|
<node> | The 99th percentile of transaction durations observed over a one-minute period for that node, as calculated from the txn.durations metric. |
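As a reminder of what a percentile means here, the sketch below computes the 99th and 90th percentiles of a list of raw durations with the nearest-rank method. This is an illustration only: it assumes access to individual samples and is not how CockroachDB derives the value from the txn.durations metric. The sample durations are made up.

```python
import math


def percentile_nearest_rank(samples, p):
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: the smallest rank such that at least p% of the
    # samples are less than or equal to the value at that rank.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical durations in milliseconds observed during one minute:
# 98 fast transactions and two slow outliers.
durations_ms = [10] * 98 + [500, 900]

p99 = percentile_nearest_rank(durations_ms, 99)
p90 = percentile_nearest_rank(durations_ms, 90)
print(p99, p90)
```

With these samples the 99th percentile picks up one of the slow outliers while the 90th percentile does not, which is why the two graphs can diverge sharply when only a few transactions are slow.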
KV Transaction Durations: 90th percentile
The KV Transaction Durations: 90th percentile graph displays the 90th percentile of transaction durations over a one-minute period.
Hovering over the graph displays values for the following metrics:
Metric | Description |
---|---|
<node> | The 90th percentile of transaction durations observed over a one-minute period for that node, as calculated from the txn.durations metric. |
Node Heartbeat Latency: 99th percentile
The Node Heartbeat Latency: 99th percentile graph displays the 99th percentile of time elapsed between node liveness heartbeats on the cluster over a one-minute period.
Hovering over the graph displays values for the following metrics:
Metric | Description |
---|---|
<node> | The 99th percentile of time elapsed between node liveness heartbeats on the cluster over a one-minute period for that node, as calculated from the liveness.heartbeatlatency metric. |
Node Heartbeat Latency: 90th percentile
The Node Heartbeat Latency: 90th percentile graph displays the 90th percentile of time elapsed between node liveness heartbeats on the cluster over a one-minute period.
Hovering over the graph displays values for the following metrics:
Metric | Description |
---|---|
<node> | The 90th percentile of time elapsed between node liveness heartbeats on the cluster over a one-minute period for that node, as calculated from the liveness.heartbeatlatency metric. |
Summary and events
Summary panel
A Summary panel of key metrics is displayed to the right of the time series graphs.
Metric | Description |
---|---|
Total Nodes | The total number of nodes in the cluster. Decommissioned nodes are not included in this count. |
Capacity Used | The storage capacity used as a percentage of usable capacity allocated across all nodes. |
Unavailable Ranges | The number of unavailable ranges in the cluster. A non-zero number indicates an unstable cluster. |
Queries per second | The total number of SELECT, UPDATE, INSERT, and DELETE queries executed per second across the cluster. |
P99 Latency | The 99th percentile of service latency. |
If you are testing your deployment locally with multiple CockroachDB nodes running on a single machine (this is not recommended in production), you must explicitly set the store size per node in order to display the correct capacity. Otherwise, the machine's actual disk capacity will be counted as a separate store for each node, thus inflating the computed capacity.
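The inflation described above is simple arithmetic. The sketch below uses a hypothetical helper and made-up sizes to show how the computed capacity changes once each node has an explicit store size (for example, one set through the size field of the --store flag when starting the node).

```python
def computed_capacity_gb(disk_gb, node_count, store_size_gb=None):
    """Capacity summed across nodes that share one machine's disk.

    Without an explicit store size, each node's store is counted at the
    machine's full disk capacity. This helper is hypothetical and only
    illustrates the arithmetic.
    """
    per_store_gb = disk_gb if store_size_gb is None else store_size_gb
    return per_store_gb * node_count


# Three local nodes on a machine with a 500 GB disk (made-up numbers).
inflated = computed_capacity_gb(disk_gb=500, node_count=3)
explicit = computed_capacity_gb(disk_gb=500, node_count=3, store_size_gb=100)
print(inflated, explicit)
```

Here the unset case counts the same 500 GB disk three times, while the explicit case sums only the space each node is actually allowed to use.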
Events panel
Underneath the Summary panel, the Events panel lists the 5 most recent events logged for all nodes across the cluster. To list all events, click View all events.
The following types of events are listed:
- Database created
- Database dropped
- Table created
- Table dropped
- Table altered
- Index created
- Index dropped
- View created
- View dropped
- Schema change reversed
- Schema change finished
- Node joined
- Node decommissioned
- Node restarted
- Cluster setting changed