Cookie Consent

Logging Use Cases

On this page Carat arrow pointing down

This page describes some common logging use cases, their relevant logging channels, and examples of notable events to be found in the logs:

We provide an example file sink configuration for each use case. These configurations are entirely optional and are intended to highlight the contents of each logging channel. A sink can include any combination of logging channels. Moreover, a single logging channel can be used in more than one sink in your logging configuration.

Your deployment may use an external service (e.g., Elasticsearch, Splunk) to collect and programmatically read logging data.

Note:

All log examples on this page use the default crdb-v2 format, except for the network logging configuration, which uses the default json-fluent-compact format for network output. Most log entries for non-DEV channels record structured events, which use a standardized format that can be reliably parsed by an external collector. All structured event types and their fields are detailed in the Notable events reference.

Logging channels may also contain events that are unstructured. Unstructured events can routinely change between CockroachDB versions, including minor patch revisions, so they are not officially documented.

Note:

‹ and › are placed around values in log messages that may contain sensitive data (PII). To customize this behavior, see Redact logs.

Operational monitoring

A database operator can use the OPS, HEALTH, and SQL_SCHEMA channels to monitor operational events initiated by users or automatic processes, DDL changes from applications, and overall cluster health.

In this example configuration, the channels are grouped into a file sink called ops. The combined logging output will be found in a cockroach-ops.log file at the configured logging directory.

sinks:
  file-groups:
    ops:
      channels: [OPS, HEALTH, SQL_SCHEMA]
Tip:

When monitoring your cluster, consider using these logs in conjunction with Prometheus, which can be set up to track node-level metrics.

OPS

The OPS channel logs operational events initiated by users or automation. These can include node additions and removals, process starts and shutdowns, gossip connection events, and zone configuration changes on the SQL schema or system ranges.

Example: Node decommissioning

This node_decommissioning event shows that a node is in the decommissioning state:

I210401 23:30:49.319360 5943 1@util/log/event_log.go:32 â‹® [-] 42 ={"Timestamp":1617319848793433000,"EventType":"node_decommissioning","RequestingNodeID":1,"TargetNodeID":4}
  • Preceding the = character is the crdb-v2 event metadata. See the reference documentation for details on the fields.
  • TargetNodeID shows that the decommissioning node is 4.
  • RequestingNodeID shows that decommissioning was requested by node 1. You will see this when specifying the node ID explicitly in addition to the --host flag.

Example: Node restart

This node_restart event shows that a node has rejoined the cluster after being offline (e.g., by being restarted after being fully decommissioned):

I210323 20:53:44.765068 611 1@util/log/event_log.go:32 â‹® [n1] 20 ={"Timestamp":1616532824096394000,"EventType":"node_restart","NodeID":1,"StartedAt":1616532823668899000,"LastUp":1616532816150919000}
  • Preceding the = character is the crdb-v2 event metadata. See the reference documentation for details on the fields.
  • NodeID shows that the restarted node is 1.
  • StartedAt shows the timestamp when the node was most recently restarted.
  • LastUp shows the timestamp when the node was up before being restarted.
Note:

All possible OPS event types are detailed in the reference documentation.

HEALTH

The HEALTH channel logs operational events initiated by CockroachDB or reported by automatic processes. These can include resource usage details, connection errors, gossip status, replication events, and runtime statistics.

Example: Runtime stats

A runtime_stats event is recorded every 10 seconds to reflect server health:

I210517 17:38:20.403619 586 2@util/log/event_log.go:32 â‹® [n1] 168 ={"Timestamp":1621273100403617000,"EventType":"runtime_stats","MemRSSBytes":119361536,"GoroutineCount":262,"MemStackSysBytes":4063232,"GoAllocBytes":40047584,"GoTotalBytes":68232200,"GoStatsStaleness":0.008556,"HeapFragmentBytes":6114336,"HeapReservedBytes":6324224,"HeapReleasedBytes":10559488,"CGoAllocBytes":8006304,"CGoTotalBytes":11997184,"CGoCallRate":0.6999931,"CPUUserPercent":5.4999456,"CPUSysPercent":6.2399383,"GCRunCount":12,"NetHostRecvBytes":16315,"NetHostSendBytes":21347}
  • Preceding the = character is the crdb-v2 event metadata. See the reference documentation for details on the fields.
Note:

runtime_stats events are typically used for troubleshooting. To monitor your cluster's health, see Monitoring and Alerting.

SQL_SCHEMA

The SQL_SCHEMA channel logs changes to the SQL logical schema resulting from DDL operations.

Example: Schema change initiated

This alter_table event shows an ALTER TABLE ... ADD FOREIGN KEY schema change being initiated on a movr.public.vehicles table:

I210323 20:21:04.621132 113397 5@util/log/event_log.go:32 ⋮ [n1,client=‹[::1]:50812›,hostnossl,user=root] 14 ={"Timestamp":1616530864502127000,"EventType":"alter_table","Statement":"‹ALTER TABLE movr.public.vehicles ADD FOREIGN KEY (city, owner_id) REFERENCES movr.public.users (city, id)›","User":"‹root›","DescriptorID":59,"ApplicationName":"‹movr›","TableName":"‹movr.public.vehicles›","MutationID":1}
  • Preceding the = character is the crdb-v2 event metadata. See the reference documentation for details on the fields.
  • ApplicationName shows that the events originated from an application named movr. You can use this field to filter the logging output by application.
  • DescriptorID identifies the object descriptor (e.g., movr.public.vehicles) undergoing the schema change.
  • MutationID identifies the job that is processing the schema change.

Example: Schema change completed

This finish_schema_change event shows that the above schema change has completed:

I210323 20:21:05.916626 114212 5@util/log/event_log.go:32 â‹® [n1,job=643761650092900353,scExec,id=59,mutation=1] 15 ={"Timestamp":1616530865791439000,"EventType":"finish_schema_change","InstanceID":1,"DescriptorID":59,"MutationID":1}
  • Preceding the = character is the crdb-v2 event metadata. See the reference documentation for details on the fields.
  • DescriptorID identifies the object descriptor (e.g., movr.public.vehicles) affected by the schema change.
  • MutationID identifies the job that processed the schema change.

Note that the DescriptorID and MutationID values match in both of the above log entries, indicating that they are related.

Note:

All possible SQL_SCHEMA event types are detailed in the reference documentation.

Security and audit monitoring

A security engineer can use the SESSIONS, USER_ADMIN, PRIVILEGES, and SENSITIVE_ACCESS channels to monitor connection and authentication events, changes to user/role administration and privileges, and any queries on audited tables.

In this example configuration, the channels are grouped into a file sink called security. The combined logging output will be found in a cockroach-security.log file at the configured logging directory.

In addition, the security channels are configured as auditable. This feature guarantees non-repudiability by enabling exit-on-error (stops nodes when they encounter a logging error) and disabling buffered-writes (flushes each log entry and synchronizes writes). This setting can incur a performance overhead and higher disk IOPS consumption, so it should only be used when necessary (e.g., for security purposes).

sinks:
  file-groups:
    security:
      channels: [SESSIONS, USER_ADMIN, PRIVILEGES, SENSITIVE_ACCESS]
      auditable: true

SESSIONS

The SESSIONS channel logs SQL session events. This includes client connection and session authentication events, for which logging must be enabled separately. For complete logging of client connections, we recommend enabling both types of events.

Note:

These logs perform one disk I/O per event. Enabling each setting will impact performance.

Example: Client connection events

To log SQL client connection events to the SESSIONS channel, enable the server.auth_log.sql_connections.enabled cluster setting:

icon/buttons/copy
SET CLUSTER SETTING server.auth_log.sql_connections.enabled = true;
Note:

In addition to SQL sessions, connection events can include SQL-based liveness probe attempts.

These logs show a client_connection_start (client connection established) and a client_connection_end (client connection terminated) event over a hostssl (TLS transport over TCP) connection:

I210323 21:53:58.300180 53298 4@util/log/event_log.go:32 ⋮ [n1,client=‹[::1]:52632›] 49 ={"Timestamp":1616536438300176000,"EventType":"client_connection_start","InstanceID":1,"Network":"tcp","RemoteAddress":"‹[::1]:52632›"}
I210323 21:53:58.305074 53298 4@util/log/event_log.go:32 ⋮ [n1,client=‹[::1]:52632›,hostssl] 54 ={"Timestamp":1616536438305072000,"EventType":"client_connection_end","InstanceID":1,"Network":"tcp","RemoteAddress":"‹[::1]:52632›","Duration":4896000}
  • Preceding the = character is the crdb-v2 event metadata. See the reference documentation for details on the fields.
  • Network shows the network protocol of the connection.
  • RemoteAddress shows the address of the SQL client, proxy, or other intermediate server.

Example: Session authentication events

To log SQL session authentication events to the SESSIONS channel, enable the server.auth_log.sql_sessions.enabled cluster setting on every cluster:

icon/buttons/copy
SET CLUSTER SETTING server.auth_log.sql_sessions.enabled = true;

These logs show certificate authentication success over a hostssl (TLS transport over TCP) connection:

I210323 23:35:19.458098 122619 4@util/log/event_log.go:32 ⋮ [n1,client=‹[::1]:53884›,hostssl,user=‹roach›] 62 ={"Timestamp":1616542519458095000,"EventType":"client_authentication_info","InstanceID":1,"Network":"tcp","RemoteAddress":"‹[::1]:53884›","Transport":"hostssl","User":"‹roach›","Method":"cert-password","Info":"‹HBA rule: host  all all  all cert-password # built-in CockroachDB default›"}
I210323 23:35:19.458136 122619 4@util/log/event_log.go:32 ⋮ [n1,client=‹[::1]:53884›,hostssl,user=‹roach›] 63 ={"Timestamp":1616542519458135000,"EventType":"client_authentication_info","InstanceID":1,"Network":"tcp","RemoteAddress":"‹[::1]:53884›","Transport":"hostssl","User":"‹roach›","Method":"cert-password","Info":"‹client presented certificate, proceeding with certificate validation›"}
I210323 23:35:19.458154 122619 4@util/log/event_log.go:32 ⋮ [n1,client=‹[::1]:53884›,hostssl,user=‹roach›] 64 ={"Timestamp":1616542519458154000,"EventType":"client_authentication_ok","InstanceID":1,"Network":"tcp","RemoteAddress":"‹[::1]:53884›","Transport":"hostssl","User":"‹roach›","Method":"cert-password"}
  • Preceding the = character is the crdb-v2 event metadata. See the reference documentation for details on the fields.
  • The two client_authentication_info events show the progress of certificate authentication. The Info fields show the progress of certificate validation.
  • The client_authentication_ok event shows that certificate authentication was successful.
  • User shows that the SQL session is authenticated for user roach.

These logs show password authentication failure over a hostssl (TLS transport over TCP) connection:

I210323 21:53:58.304573 53299 4@util/log/event_log.go:32 ⋮ [n1,client=‹[::1]:52632›,hostssl,user=‹roach›] 50 ={"Timestamp":1616536438304572000,"EventType":"client_authentication_info","InstanceID":1,"Network":"tcp","RemoteAddress":"‹[::1]:52632›","Transport":"hostssl","User":"‹roach›","Method":"cert-password","Info":"‹HBA rule: host  all all  all cert-password # built-in CockroachDB default›"}
I210323 21:53:58.304648 53299 4@util/log/event_log.go:32 ⋮ [n1,client=‹[::1]:52632›,hostssl,user=‹roach›] 51 ={"Timestamp":1616536438304647000,"EventType":"client_authentication_info","InstanceID":1,"Network":"tcp","RemoteAddress":"‹[::1]:52632›","Transport":"hostssl","User":"‹roach›","Method":"cert-password","Info":"‹no client certificate, proceeding with password authentication›"}
I210323 21:53:58.304797 53299 4@util/log/event_log.go:32 ⋮ [n1,client=‹[::1]:52632›,hostssl,user=‹roach›] 52 ={"Timestamp":1616536438304796000,"EventType":"client_authentication_failed","InstanceID":1,"Network":"tcp","RemoteAddress":"‹[::1]:52632›","Transport":"hostssl","User":"‹roach›","Reason":6,"Detail":"‹password authentication failed for user roach›","Method":"cert-password"}
I210323 21:53:58.305016 53298 4@util/log/event_log.go:32 ⋮ [n1,client=‹[::1]:52632›,hostssl,user=‹roach›] 53 ={"Timestamp":1616536438305014000,"EventType":"client_session_end","InstanceID":1,"Network":"tcp","RemoteAddress":"‹[::1]:52632›","Duration":2273000}
  • Preceding the = character is the crdb-v2 event metadata. See the reference documentation for details on the fields.
  • The two client_authentication_info events show the progress of certificate authentication. The Info fields show that password authentication was attempted, in the absence of a client certificate.
  • The client_authentication_failed event shows that password authentication was unsuccessful. The Detail field shows the related error.
  • The client_session_end event shows that the SQL session was terminated. This would typically be followed by a client_connection_end event.
  • User shows that the SQL session authentication was attempted for user roach.
Note:

All possible SESSIONS event types are detailed in the reference documentation. For more details on certificate and password authentication, see Authentication.

SENSITIVE_ACCESS

The SENSITIVE_ACCESS channel logs SQL audit events. These include all queries being run against audited tables, when enabled, as well as queries executed by users with the admin role.

Note:

Enabling these logs can negatively impact performance. We recommend using SENSITIVE_ACCESS for security purposes only.

Note:

This feature is in preview. This feature is subject to change. To share feedback and/or issues, contact Support.

To log all queries against a specific table, enable auditing on the table with ALTER TABLE ... EXPERIMENTAL_AUDIT.

Example: Audit events

This command enables auditing on a customers table:

icon/buttons/copy
ALTER TABLE customers EXPERIMENTAL_AUDIT SET READ WRITE;

This sensitive_table_access event shows that the audited table customers was accessed by user root issuing an INSERT statement:

I210323 18:50:04.518707 1182 8@util/log/event_log.go:32 ⋮ [n1,client=‹[::1]:49851›,hostnossl,user=root] 2 ={"Timestamp":1616525404415644000,"EventType":"sensitive_table_access","Statement":"‹INSERT INTO \"\".\"\".customers(name, address, national_id, telephone, email) VALUES ('Pritchard M. Cleveland', '23 Crooked Lane, Garden City, NY USA 11536', 778124477, 12125552000, 'pritchmeister@aol.com')›","User":"‹root›","DescriptorID":52,"ApplicationName":"‹$ cockroach sql›","ExecMode":"exec","NumRows":1,"Age":103.066,"TxnCounter":28,"TableName":"‹defaultdb.public.customers›","AccessMode":"rw"}
  • Preceding the = character is the crdb-v2 event metadata. See the reference documentation for details on the fields.
  • AccessMode shows that the table was accessed with a read/write (rw) operation.
  • ApplicationName shows that the event originated from the cockroach sql shell. You can use this field to filter the logging output by application.
Note:

All possible SENSITIVE_ACCESS event types are detailed in the reference documentation. For a detailed tutorial on table auditing, see SQL Audit Logging.

PRIVILEGES

The PRIVILEGES channel logs SQL privilege changes. These include DDL operations performed by SQL operations that modify the privileges granted to roles and users on databases, schemas, tables, and user-defined types.

Example: Database privileges

This change_database_privilege event shows that user root granted all privileges to user roach on the database movr:

I210329 22:54:48.888312 1742207 7@util/log/event_log.go:32 ⋮ [n1,client=‹[::1]:52487›,hostssl,user=root] 1 ={"Timestamp":1617058488747117000,"EventType":"change_database_privilege","Statement":"‹GRANT ALL ON DATABASE movr TO roach›","User":"‹root›","DescriptorID":57,"ApplicationName":"‹$ cockroach sql›","Grantee":"‹roach›","GrantedPrivileges":["ALL"],"DatabaseName":"‹movr›"}
  • Preceding the = character is the crdb-v2 event metadata. See the reference documentation for details on the fields.
  • ApplicationName shows that the event originated from the cockroach sql shell. You can use this field to filter the logging output by application.
  • GrantedPrivileges shows the privileges that were granted.
Note:

All possible PRIVILEGE event types are detailed in the reference documentation.

USER_ADMIN

The USER_ADMIN channel logs changes to users and roles. This includes user and role creation and assignment and changes to privileges, options, and passwords.

Example: SQL user creation

This create_role event shows that a user roach was created and assigned a password by user root. Note that the password in the SQL statement is pre-redacted even if redact is set to false for the logging sink. For more details on redaction behavior, see Redact logs.

I210323 20:54:53.122681 1943 6@util/log/event_log.go:32 ⋮ [n1,client=‹[::1]:51676›,hostssl,user=root] 1 ={"Timestamp":1616532892887402000,"EventType":"create_role","Statement":"‹CREATE USER 'roach' WITH PASSWORD *****›","User":"‹root›","ApplicationName":"‹$ cockroach sql›","RoleName":"‹roach›"}
  • Preceding the = character is the crdb-v2 event metadata. See the reference documentation for details on the fields.
  • ApplicationName shows that the event originated from the cockroach sql shell. You can use this field to filter the logging output by application.
  • RoleName shows the name of the user/role. For details on user and role terminology, see Users and roles.
Note:

All possible USER_ADMIN event types are detailed in the reference documentation.

Performance tuning

An application developer can use the SQL_EXEC and SQL_PERF channels to examine SQL queries and filter slow queries in order to optimize or troubleshoot performance.

In this example configuration, the channels are grouped into a file sink called performance. The combined logging output will be found in a cockroach-performance.log file at the configured logging directory.

sinks:
  file-groups:
    performance:
      channels: [SQL_EXEC, SQL_PERF]

SQL_EXEC

The SQL_EXEC channel reports all SQL executions on the cluster, when enabled.

To log cluster-wide executions, enable the sql.trace.log_statement_execute cluster setting:

icon/buttons/copy
SET CLUSTER SETTING sql.trace.log_statement_execute = true;

Each node of the cluster will write all SQL queries it executes to the SQL_EXEC channel. These are recorded as query_execute events.

Warning:

Logging cluster-wide executions by enabling the sql.trace.log_statement_execute cluster setting will incur considerable overhead and may have a negative performance impact.

Example: SQL query

This event details a SELECT statement that was issued by user root:

I210401 22:57:20.047235 5475 9@util/log/event_log.go:32 ⋮ [n1,client=‹[::1]:59116›,hostnossl,user=root] 900 ={"Timestamp":1617317840045704000,"EventType":"query_execute","Statement":"‹SELECT * FROM \"\".\"\".users WHERE name = 'Cheyenne Smith'›","User":"‹root›","ApplicationName":"‹$ cockroach sql›","ExecMode":"exec","NumRows":1,"Age":1.583,"FullTableScan":true,"TxnCounter":12}

Note the FullTableScan value in the logged event, which shows that this query performed a full table scan and likely caused a performance hit. To learn more about when this issue appears and how it can be resolved, see Statement Tuning with EXPLAIN.

  • Preceding the = character is the crdb-v2 event metadata. See the reference documentation for details on the fields.
  • ApplicationName shows that the event originated from the cockroach sql shell. You can use this field to filter the logging output by application.

Example: Internal SQL query

Internal queries are also logged to the SQL_EXEC channel. For example, this event details a statement issued on the internal system.jobs table:

I210330 16:09:04.966129 1885738 9@util/log/event_log.go:32 ⋮ [n1,intExec=‹find-scheduled-jobs›] 13 ={"Timestamp":1617120544952784000,"EventType":"query_execute","Statement":"‹SELECT (SELECT count(*) FROM \"\".system.jobs AS j WHERE ((j.created_by_type = 'crdb_schedule') AND (j.created_by_id = s.schedule_id)) AND (j.status NOT IN ('succeeded', 'canceled', 'failed'))) AS num_running, s.* FROM \"\".system.scheduled_jobs AS s WHERE next_run < current_timestamp() ORDER BY random() LIMIT 10 FOR UPDATE›","User":"‹root›","ApplicationName":"‹$ internal-find-scheduled-jobs›","ExecMode":"exec-internal","Age":2.934,"FullTableScan":true}

If you no longer need to log queries across the cluster, you can disable the setting:

icon/buttons/copy
SET CLUSTER SETTING sql.trace.log_statement_execute = false;
Note:

All possible SQL_EXEC event types are detailed in the reference documentation.

SQL_PERF

The SQL_PERF channel reports slow SQL queries, when enabled. This includes queries whose latency exceeds a configured threshold, as well as queries that perform a full table or index scan.

To enable slow query logging, enable the sql.log.slow_query.latency_threshold cluster setting by setting it to a non-zero value. This will log queries whose service latency exceeds a specified threshold value. The threshold value must be specified with a unit of time (e.g., 500ms for 500 milliseconds, 5us for 5 nanoseconds, or 5s for 5 seconds). A threshold of 0s disables the slow query log.

Note:

Setting sql.log.slow_query.latency_threshold to a non-zero time enables tracing on all queries, which impacts performance. After debugging, set the value back to 0s to disable the log.

To log all queries that perform full table or index scans to SQL_PERF, regardless of query latency, set the sql.log.slow_query.experimental_full_table_scans.enabled cluster setting to true.

Example: Slow SQL query

For example, to enable the slow query log for all queries with a latency above 100 milliseconds:

icon/buttons/copy
SET CLUSTER SETTING sql.log.slow_query.latency_threshold = '100ms';

Each gateway node will now record queries that take longer than 100 milliseconds to the SQL_PERF channel as slow_query events.

This slow_query event was logged with a service latency (age) of 100.205 milliseconds:

I210323 20:02:16.857133 59270 10@util/log/event_log.go:32 ⋮ [n1,client=‹[::1]:50595›,hostnossl,user=root] 573 ={"Timestamp":1616529736756959000,"EventType":"slow_query","Statement":"‹UPDATE \"\".\"\".bank SET balance = CASE id WHEN $1 THEN balance - $3 WHEN $2 THEN balance + $3 END WHERE id IN ($1, $2)›","User":"‹root›","ApplicationName":"‹bank›","PlaceholderValues":["‹158›","‹210›","‹257›"],"ExecMode":"exec","NumRows":2,"Age":100.205,"TxnCounter":97}
  • ApplicationName shows that the events originated from an application named bank. You can use this field to filter the logging output by application.

The following query was logged with a service latency (age) of 9329.26 milliseconds, a very high latency that resulted from a transaction retry error:

I210323 20:02:12.095253 59168 10@util/log/event_log.go:32 ⋮ [n1,client=‹[::1]:50621›,hostnossl,user=root] 361 ={"Timestamp":1616529731816553000,"EventType":"slow_query","Statement":"‹UPDATE \"\".\"\".bank SET balance = CASE id WHEN $1 THEN balance - $3 WHEN $2 THEN balance + $3 END WHERE id IN ($1, $2)›","User":"‹root›","ApplicationName":"‹bank›","PlaceholderValues":["‹351›","‹412›","‹206›"],"ExecMode":"exec","SQLSTATE":"40001","ErrorText":"‹TransactionRetryWithProtoRefreshError: WriteTooOldError: write at timestamp 1616529731.152644000,2 too old; wrote at 1616529731.816553000,1: \"sql txn\" meta={id=6c8f776f pri=0.02076160 epo=1 ts=1616529731.816553000,1 min=1616529722.766004000,0 seq=0} lock=true stat=PENDING rts=1616529731.152644000,2 wto=false gul=1616529723.266004000,0›","Age":9329.26,"NumRetries":1,"TxnCounter":1}
  • Preceding the = character is the crdb-v2 event metadata. See the reference documentation for details on the fields.
  • ApplicationName shows that the events originated from an application named bank. You can use this field to filter the logging output by application.
  • ErrorText shows that this query encountered a type of transaction retry error. For details on transaction retry errors and how to resolve them, see the Transaction Retry Error Reference.
  • NumRetries shows that the transaction was retried once before succeeding.
Note:

All possible SQL_PERF event types are detailed in the reference documentation.

Network logging

A database operator can send logs over the network to a Fluentd or HTTP server.

Warning:

TLS is not supported yet: the connection to the log collector is neither authenticated nor encrypted. Given that logging events may contain sensitive information, care should be taken to keep the log collector and the CockroachDB node close together on a private network, or connect them using a secure VPN. TLS support may be added at a later date.

In this example configuration, operational and security logs are grouped into separate ops and security network sinks. The logs from both sinks are sent to a Fluentd server, which can then route them to a compatible log collector (e.g., Elasticsearch, Splunk).

Note:

A network sink can be listed more than once with different address values. This routes the same logs to different Fluentd servers.

sinks:
  fluent-servers:
    ops:
      channels: [OPS, HEALTH, SQL_SCHEMA]
      address: 127.0.0.1:5170
      net: tcp
      redact: true
    security:
      channels: [SESSIONS, USER_ADMIN, PRIVILEGES, SENSITIVE_ACCESS]
      address: 127.0.0.1:5170
      net: tcp
      auditable: true

In this case, defining separate ops and security network sinks allows us to:

Otherwise, it is generally more flexible to configure Fluentd to route logs to different destinations.

By default, fluent-servers log messages use the json-fluent-compact format for ease of processing over a stream.

For example, this JSON message found in the OPS logging channel contains a node_restart event. The event shows that a node has rejoined the cluster after being offline (e.g., by being restarted after being fully decommissioned):

{"tag":"cockroach.ops","c":1,"t":"1625766470.804899000","s":1,"sev":"I","g":7,"f":"util/log/event_log.go","l":32,"n":17,"r":1,"tags":{"n":"1"},"event":{"Timestamp":1625766470804896000,"EventType":"node_restart","NodeID":1,"StartedAt":1625766470561283000,"LastUp":1617319541533204000}}
  • tag is a field required by the Fluentd protocol.
  • sev shows that the message has the INFO severity level.
  • event contains the fields for the structured node_restart event.
    • NodeID shows that the restarted node is 1.
    • StartedAt shows the timestamp when the node was most recently restarted.
    • LastUp shows the timestamp when the node was up before being restarted.

See the reference documentation for details on the remaining JSON fields.

Network logging with log buffering

A database operator can configure CockroachDB to buffer log messages for a configurable time period or collected message size before writing them to the log sink. This is especially useful for writing log messages to network log sinks, such as Fluentd-compatible servers or HTTP servers, where high-traffic or high-contention scenarios can result in log message write latency.

Log buffering is enabled by default on the Fluentd-compatible and HTTP log sink destinations, but you may wish to adjust the buffering configuration for these log sinks based on your needs.

For example, the following logging configuration adjusts the default log buffering behavior for both a Fluentd-compatible and an HTTP log sink destination:

fluent-defaults:
  buffering:
    flush-trigger-size: 2MiB
http-defaults:
  buffering:
    max-staleness: 10s
    max-buffer-size: 75MiB
sinks:
  fluent-servers:
    health:
      channels: HEALTH
      buffering:
        flush-trigger-size: 100KB  # Override flush-trigger-size for HEALTH channel only
  http-servers:
    health:
      channels: HEALTH
      buffering:
        max-staleness: 2s  # Override max-staleness for HEALTH channel only
        max-buffer-size: 100MiB # Override max-buffer-size for HEALTH channel only

Together, this configuration ensures that log messages to the Fluentd log sink target are buffered for up to 2MiB in accumulated size, and log messages to the HTTP server log sink target are buffered for up to 10s duration (with a limit of up to 75MiB accumulated message size in the buffer before messages begin being dropped), before being written to the log sink. Further, each long sink target is configured with an overridden value for these settings specific to log messages in the HEALTH log channel, which are flushed more aggressively in both cases.

See Log buffering for more information.

See also


Yes No
On this page

Yes No