New in v23.1.4:
This tutorial describes how to configure logging of sampled_query
events to Datadog for finer granularity and long-term retention of SQL statistics. The sampled_query
events contain common SQL event and execution details for sessions, transactions, and statements.
CockroachDB supports a built-in integration with Datadog which sends query events as logs via the Datadog HTTP API. This integration is the recommended path to achieve high throughput data ingestion, which will in turn provide more query events for greater workload observability.
This feature is in preview. This feature is subject to change. To share feedback and/or issues, contact Support.
Step 1. Create a Datadog API key
- In Datadog, navigate to Organization Settings > API keys.
- Follow the steps in the Datadog documentation on how to add an API key.
- Copy the newly created API key to be used in Step 2.
Step 2. Configure an HTTP network collector for Datadog
Configure an HTTP network collector by creating or modifying the logs.yaml
file.
Given the volume of sampled_query
events, do not write sampled_query
events to disk, or file-groups
. Writing a high volume of sampled_query
events to a file group will unnecessarily consume cluster resources and impact workload performance.
To disable the creation of a telemetry file and avoid writing sampled_query
events and other telemetry events to disk, change the telemetry file-groups
setting from the default of channels: [TELEMETRY]
to channels: []
.
In this logs.yaml
example:
- To send
sampled_query
events directly to Datadog without writing events to disk, override telemetry default configuration by settingfile-groups: telemetry: channels:
to[]
. - To connect to Datadog, replace
{DATADOG API KEY}
with the value you copied in Step 1. - To control the ingestion and potential drop rate for
sampled_query
events, configure the followingbuffering
values depending on your workload:
max-staleness
: The maximum time a log message will wait in the buffer before a flush is triggered. Set to0
to disable flushing based on elapsed time. Default:5s
flush-trigger-size
: The number of bytes that will trigger the buffer to flush. Set to0
to disable flushing based on accumulated size. Default:1MiB
. In this example, override to2.5MiB
.max-buffer-size
: The maximum size of the buffer: new log messages received when the buffer is full cause older messages to be dropped. Default:50MiB
sinks:
http-servers:
datadog:
channels: [TELEMETRY]
address: https://http-intake.logs.datadoghq.com/api/v2/logs
format: json
method: POST
compression: gzip
headers: {DD-API-KEY: "{DATADOG API KEY}"} # replace with actual API key
buffering:
format: json-array
max-staleness: 5s
flush-trigger-size: 2.5MiB # override default value
max-buffer-size: 50MiB
file-groups: # override default configuration
telemetry: # do not write telemetry events to disk
channels: [] # set to empty square brackets
Pass the logs.yaml
file to the cockroach
process with either --log-config-file
or --log
flag.
Step 3. Configure CockroachDB to emit query events
Enable the sql.telemetry.query_sampling.enabled
cluster setting so that executed queries will emit an event on the telemetry logging channel:
SET CLUSTER SETTING sql.telemetry.query_sampling.enabled = true;
Configure the following cluster setting to a value that is dependent on the level of granularity you require and how much performance impact from frequent logging you can tolerate:
sql.telemetry.query_sampling.max_event_frequency
(default8
) is the max event frequency (events per second) at which we sample executed queries for telemetry. In practice, this means that we only sample an executed query if 1/max_event_frequency
seconds have elapsed since the last executed query was sampled. Sampling impacts the volume of query events emitted which can have downstream impact to workload performance and third-party processing costs. Slowly increase this sampling threshold and monitor potential impact.
The sql.telemetry.query_sampling.max_event_frequency
cluster setting and the buffering
options in the logs.yaml
control how many events are emitted to Datadog and that can be potentially dropped. Adjust this setting and these options according to your workload, depending on the size of events and the queries per second (QPS) observed through monitoring.
Step 4. Monitor TELEMETRY logs in Datadog
- Navigate to Datadog > Logs.
- Filter by OTHERS > channel: TELEMETRY to see the logs for the query events that are emitted. For example: