Log SQL statistics to Datadog

On this page Carat arrow pointing down

New in v23.1.4: This tutorial describes how to configure logging of sampled_query events to Datadog for finer granularity and long-term retention of SQL statistics. The sampled_query events contain common SQL event and execution details for sessions, transactions, and statements.

CockroachDB supports a built-in integration with Datadog which sends query events as logs via the Datadog HTTP API. This integration is the recommended path to achieve high throughput data ingestion, which will in turn provide more query events for greater workload observability.

Note:

This feature is in preview. This feature is subject to change. To share feedback and/or issues, contact Support.

Step 1. Create a Datadog API key

  1. In Datadog, navigate to Organization Settings > API keys.
  2. Follow the steps in the Datadog documentation on how to add an API key.
  3. Copy the newly created API key to be used in Step 2.

Step 2. Configure an HTTP network collector for Datadog

Configure an HTTP network collector by creating or modifying the logs.yaml file.

Warning:

Given the volume of sampled_query events, do not write sampled_query events to disk, or file-groups. Writing a high volume of sampled_query events to a file group will unnecessarily consume cluster resources and impact workload performance.

To disable the creation of a telemetry file and avoid writing sampled_query events and other telemetry events to disk, change the telemetry file-groups setting from the default of channels: [TELEMETRY] to channels: [].

In this logs.yaml example:

  1. To send sampled_query events directly to Datadog without writing events to disk, override telemetry default configuration by setting file-groups: telemetry: channels: to [].
  2. To connect to Datadog, replace {DATADOG API KEY} with the value you copied in Step 1.
  3. To control the ingestion and potential drop rate for sampled_query events, configure the following buffering values depending on your workload:
  • max-staleness: The maximum time a log message will wait in the buffer before a flush is triggered. Set to 0 to disable flushing based on elapsed time. Default: 5s
  • flush-trigger-size: The number of bytes that will trigger the buffer to flush. Set to 0 to disable flushing based on accumulated size. Default: 1MiB. In this example, override to 2.5MiB.
  • max-buffer-size: The maximum size of the buffer: new log messages received when the buffer is full cause older messages to be dropped. Default: 50MiB
icon/buttons/copy
sinks:
  http-servers:
    datadog:
      channels: [TELEMETRY]
      address: https://http-intake.logs.datadoghq.com/api/v2/logs
      format: json
      method: POST
      compression: gzip
      headers: {DD-API-KEY: "{DATADOG API KEY}"} # replace with actual API key
      buffering:
        format: json-array
        max-staleness: 5s
        flush-trigger-size: 2.5MiB # override default value
        max-buffer-size: 50MiB
  file-groups: # override default configuration
    telemetry:  # do not write telemetry events to disk
      channels: [] # set to empty square brackets

Pass the logs.yaml file to the cockroach process with either --log-config-file or --log flag.

Step 3. Configure CockroachDB to emit query events

Enable the sql.telemetry.query_sampling.enabled cluster setting so that executed queries will emit an event on the telemetry logging channel:

icon/buttons/copy
SET CLUSTER SETTING sql.telemetry.query_sampling.enabled = true;

Configure the following cluster setting to a value that is dependent on the level of granularity you require and how much performance impact from frequent logging you can tolerate:

  • sql.telemetry.query_sampling.max_event_frequency (default 8) is the max event frequency (events per second) at which we sample executed queries for telemetry. In practice, this means that we only sample an executed query if 1/max_event_frequency seconds have elapsed since the last executed query was sampled. Sampling impacts the volume of query events emitted which can have downstream impact to workload performance and third-party processing costs. Slowly increase this sampling threshold and monitor potential impact.
Note:

The sql.telemetry.query_sampling.max_event_frequency cluster setting and the buffering options in the logs.yaml control how many events are emitted to Datadog and that can be potentially dropped. Adjust this setting and these options according to your workload, depending on the size of events and the queries per second (QPS) observed through monitoring.

Step 4. Monitor TELEMETRY logs in Datadog

  1. Navigate to Datadog > Logs.
  2. Filter by OTHERS > channel: TELEMETRY to see the logs for the query events that are emitted. For example:

Datadog Telemetry Logs

See also


Yes No
On this page

Yes No