The cockroach debug zip command connects to your cluster and gathers information from each active node into a single .zip file (inactive nodes are not included). For details on the .zip contents, see Files.
You can use the cockroach debug merge-logs command in conjunction with cockroach debug zip to merge the collected logs into one file, making them easier to parse.
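As a rough sketch of that workflow, the function below merges the per-node logs from a debug zip that has already been extracted. It assumes the archive unpacks to a directory containing nodes/{node ID}/logs/; the function name and the paths in the usage note are placeholders, not part of CockroachDB.

```shell
# Merge the per-node logs from an already-extracted debug zip into one file.
# Assumption: the archive unpacks to <dir>/nodes/<node ID>/logs/.
merge_zip_logs() {
  extracted_dir=$1
  merged_file=$2
  # Expand the glob into the function's positional parameters.
  set -- "$extracted_dir"/nodes/*/logs/*
  # An unmatched glob stays literal, so check that the first entry exists.
  if [ ! -e "$1" ]; then
    echo "no log files found under $extracted_dir/nodes/*/logs" >&2
    return 1
  fi
  cockroach debug merge-logs "$@" > "$merged_file"
}
```

For example, after extracting the archive into ./debug, `merge_zip_logs ./debug ./merged.log` would write the merged output to ./merged.log.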
The files produced by cockroach debug zip can contain highly sensitive, personally identifiable information (PII), such as usernames, hashed passwords, and possibly table data. If you intend to share the .zip file with Cockroach Labs, use the --redact flag to configure CockroachDB to redact sensitive data (excluding range keys) when generating it.
Details
Use cases
cockroach debug zip is an expensive operation and impacts cluster performance.
Only use this command as an emergency measure under the guidance of Cockroach Labs.
In particular, fetching stack traces for all goroutines is a "stop-the-world" operation, which can momentarily but significantly increase SQL service latency. Exclude these goroutine stacks by using the --include-goroutine-stacks=false flag.
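A minimal sketch of such an invocation follows. The helper only prints the command so it can be reviewed before anyone runs it against a production cluster; the function name is hypothetical, and the connection flags in the usage note are the ones used in the examples later on this page.

```shell
# Print (rather than run) a debug zip command that skips the stop-the-world
# goroutine stack collection. Review the output before running it.
low_impact_zip_cmd() {
  zip_path=$1
  shift
  # Any remaining arguments (for example --host=...) are passed through as-is.
  echo cockroach debug zip "$zip_path" --include-goroutine-stacks=false "$@"
}
```

Calling `low_impact_zip_cmd ./cockroach-data/logs/debug.zip --host=200.100.50.25` prints the full command line for that node.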
There are two scenarios in which debug zip is useful:
- If you experience severe or difficult-to-reproduce issues with your cluster, Cockroach Labs might ask you to send us your cluster's debugging information using cockroach debug zip. We recommend reducing the .zip file size by retrieving debugging information only for the relevant time range of the issue, using the --files-from and/or --files-until flags.
- To collect all of your nodes' logs, which you can then parse to locate issues. You can optionally use the flags to retrieve only the log files. For more information about logs, see Logging. Also note:
  - Nodes that are currently down cannot deliver their logs over the network. For these nodes, you must log on to the machine where the cockroach process would otherwise be running, and gather the files manually.
  - Nodes that are currently up but disconnected from other nodes (e.g., because of a network partition) may not be able to respond to debug zip requests forwarded by other nodes, but can still respond to requests for data when asked directly. In such situations, we recommend using the --host flag to point debug zip at each of the disconnected nodes until data has been gathered for the entire cluster.
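For the disconnected-node case, one way to organize the work is to generate one command per node and run each in turn. The sketch below prints those commands instead of executing them; the function name, output directory, and host addresses are placeholders, and a secure cluster would still need its certificates available as described under Client connection.

```shell
# Print one debug zip command per disconnected node so each can be reviewed
# and run separately. Host names and output paths are placeholders.
zip_cmds_per_host() {
  out_dir=$1
  shift
  for host in "$@"; do
    # Replace ':' so a host:port value yields a usable file name.
    file_name=$(printf '%s' "$host" | tr ':' '_')
    echo "cockroach debug zip $out_dir/debug-$file_name.zip --host=$host"
  done
}
```

For example, `zip_cmds_per_host ./zips 10.0.0.1:26257 10.0.0.2:26257` prints two commands, each writing to its own .zip file so that one node's data does not overwrite another's.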
Files
cockroach debug zip collects log files, heap profiles, CPU profiles, and goroutine dumps from the last 48 hours, by default.
These files can greatly increase the size of the cockroach debug zip output. To limit the .zip file size for a large cluster, we recommend first experimenting with cockroach debug list-files and then using flags to filter the files.
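The preview-then-collect approach might look like the following sketch. It assumes cockroach debug list-files accepts the same filter and connection flags as cockroach debug zip, as the flag descriptions on this page suggest; the function name and the confirmation prompt are illustrative additions.

```shell
# Preview the files a set of filter flags would select, then build the
# archive with the same flags. The first argument is the .zip destination;
# the remaining arguments are passed unchanged to both commands.
preview_then_zip() {
  zip_path=$1
  shift
  cockroach debug list-files "$@" || return 1
  printf 'Create %s with these filters? [y/N] ' "$zip_path"
  read -r answer
  # Anything other than "y" leaves the cluster untouched.
  [ "$answer" = "y" ] || return 0
  cockroach debug zip "$zip_path" "$@"
}
```

A call such as `preview_then_zip ./debug.zip --include-files='*.log' --host=200.100.50.25` lists the matching files first and only creates the archive after confirmation.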
The following files collected by cockroach debug zip, which are found in the individual node directories, can be filtered using the --exclude-files, --include-files, --files-from, and/or --files-until flags:
Information | Filename |
---|---|
Log files | cockroach-{log-file-group}.{host}.{user}.{start timestamp in UTC}.{process ID}.log |
Goroutine dumps | goroutine_dump.{date-and-time}.{metadata}.double_since_last_dump.{metadata}.txt.gz |
Heap profiles | memprof.{date-and-time}.{heapsize}.pprof |
Memory statistics | memstats.{date-and-time}.{heapsize}.txt |
CPU profiles | cpuprof.{date-and-time} |
Active query dumps | activequeryprof.{date-and-time}.csv |
The following information is also contained in the .zip file, and cannot be filtered:
- System tables. The following system tables are not included:
  - system.users
  - system.web_sessions
  - system.join_tokens
  - system.comments
  - system.ui
  - system.zones
  - system.statement_bundle_chunks
  - system.statement_statistics
  - system.transaction_statistics
- Cluster events
- Database details
- Schema change events
- Database, table, node, and range lists
- Node details
- Node liveness
- Gossip data
- Stack traces
- Range details
- Jobs
- Cluster Settings
- Metrics
- Replication Reports
- CPU profiles
- A script (hot-ranges.sh) that summarizes the hottest ranges (ranges receiving a high number of reads or writes)
Subcommands
While the cockroach debug command has a few subcommands, users are expected to use only the zip, encryption-active-key, merge-logs, list-files, tsdump, and ballast subcommands.
We recommend using the encryption-decrypt and job-trace subcommands only when directed by the Cockroach Labs support team.
The other debug subcommands are useful only to Cockroach Labs. Output of debug commands may contain sensitive or secret information.
Synopsis
$ cockroach debug zip {ZIP file destination} {flags}
The following flags must apply to an active CockroachDB node. If no nodes are live, you must start at least one node.
Flags
The debug zip subcommand supports the following general-use, client connection, and logging flags.
General
Flag | Description |
---|---|
--cpu-profile-duration | Fetch CPU profiles from the cluster with the specified sample duration in seconds. The debug zip command will block for the duration specified. A value of 0 disables this feature. Default: 5s |
--concurrency | The maximum number of nodes to concurrently poll for data. This can be any value between 1 and 15. |
--exclude-files | Files to exclude from the generated .zip. This can be used to limit the size of the generated .zip, and affects logs, heap profiles, goroutine dumps, and/or CPU profiles. The files are specified as a comma-separated list of glob patterns (for example, --exclude-files=*.log). Note that this flag is applied after --include-files. Use cockroach debug list-files with this flag to see a list of files that will be contained in the .zip. |
--exclude-nodes | Specify nodes to exclude from inspection as a comma-separated list or range of node IDs (for example, --exclude-nodes=1,10,13-15). |
--files-from | Start timestamp for log file, goroutine dump, and heap profile collection. This can be used to limit the size of the generated .zip, which is increased by these files. The timestamp uses the format YYYY-MM-DD, followed optionally by HH:MM:SS or HH:MM (for example, --files-from='2021-07-01 15:00'). When specifying a narrow time window, we recommend adding extra seconds/minutes to account for uncertainties such as clock drift. Default: 48 hours before now |
--files-until | End timestamp for log file, goroutine dump, and heap profile collection. This can be used to limit the size of the generated .zip, which is increased by these files. The timestamp uses the format YYYY-MM-DD, followed optionally by HH:MM:SS or HH:MM (for example, --files-until='2021-07-01 16:00'). When specifying a narrow time window, we recommend adding extra seconds/minutes to account for uncertainties such as clock drift. Default: 24 hours beyond now (to include files created during .zip creation) |
--include-files | Files to include in the generated .zip. This can be used to limit the size of the generated .zip, and affects logs, heap profiles, goroutine dumps, and/or CPU profiles. The files are specified as a comma-separated list of glob patterns (for example, --include-files=*.pprof). Note that this flag is applied before --exclude-files. Use cockroach debug list-files with this flag to see a list of files that will be contained in the .zip. |
--include-goroutine-stacks | Fetch stack traces for all goroutines running on each targeted node in nodes/*/stacks.txt and nodes/*/stacks_with_labels.txt files. Note that fetching stack traces for all goroutines is a "stop-the-world" operation, which can momentarily have negative impacts on SQL service latency. Exclude these goroutine stacks by using the --include-goroutine-stacks=false flag. Note that any periodic goroutine dumps previously taken on the node will still be included in nodes/*/goroutines/*.txt.gz, as these would have already been generated and don't require any additional stop-the-world operations to be collected. Default: true |
--include-range-info | Include one file per node with information about the KV ranges stored on that node, in nodes/{node ID}/ranges.json. This information can be vital when debugging issues that involve the KV layer (which includes everything below the SQL layer), such as data placement, load balancing, performance or other behaviors. In certain situations, on large clusters with large numbers of ranges, these files can be omitted if and only if the issue being investigated is already known to be in another layer of the system (for example, an error message about an unsupported feature or incompatible value in a SQL schema change or statement). However, many higher-level issues are ultimately related to the underlying KV layer described by these files. Only set this to false if directed to do so by Cockroach Labs support. In addition, include problem ranges information in reports/problemranges.json. Default: true |
--include-running-job-traces | Include information about each traceable job that is running or reverting (such as backup, restore, import, physical cluster replication) in jobs/*/*/trace.zip files. This involves collecting cluster-wide traces for each running job in the cluster. Default: true |
--nodes | Specify nodes to inspect as a comma-separated list or range of node IDs (for example, --nodes=1,10,13-15). |
--redact | Redact sensitive data from the generated .zip, with the exception of range keys, which must remain unredacted because they are essential to support CockroachDB. This flag replaces the deprecated --redact-logs flag, which only applied to log messages contained within the .zip. See Redact sensitive information for an example. |
--redact-logs | Deprecated. Redact sensitive data from collected log files only. Use the --redact flag instead, which redacts sensitive data across the entire generated .zip as well as the collected log files. Passing the --redact-logs flag will be interpreted as the --redact flag. |
--timeout | In the process of generating a debug zip, many internal requests are made. Each request is allowed the maximum duration specified by the timeout. If an internal request does not complete within the timeout duration, an error is displayed for that request and its artifact is not included in the zip file. The timeout is suffixed with s (seconds), m (minutes), or h (hours). Default: 60s |
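To make the ordering of --include-files and --exclude-files concrete, the sketch below models it with shell glob matching: a file is kept when it matches an include pattern and does not match any exclude pattern. This is an illustration of the documented order, not CockroachDB's implementation; the function names are hypothetical, and treating an empty include list as "include everything" is an assumption.

```shell
# Model of the documented ordering: include patterns are applied first,
# then exclude patterns. This is an illustration, not CockroachDB's code.
# Usage: would_collect FILE "INCLUDE_GLOBS" "EXCLUDE_GLOBS" (comma-separated).
matches_any() {
  name=$1
  old_ifs=$IFS
  IFS=,
  set -f            # keep the patterns from expanding against local files
  for pattern in $2; do
    case $name in
      $pattern) IFS=$old_ifs; set +f; return 0 ;;
    esac
  done
  IFS=$old_ifs
  set +f
  return 1
}

would_collect() {
  file=$1
  includes=$2
  excludes=$3
  # An empty include list is treated as "include everything" (assumption).
  if [ -n "$includes" ] && ! matches_any "$file" "$includes"; then
    return 1
  fi
  # Excludes are checked last, so they win over a matching include.
  if [ -n "$excludes" ] && matches_any "$file" "$excludes"; then
    return 1
  fi
  return 0
}
```

Under this model, `would_collect cockroach.log '*.log,*.pprof' '*.log'` fails because the exclude pattern removes a file that the include pattern had selected, while the same call for a .pprof file succeeds.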
Client connection
Flag | Description |
---|---|
--cert-principal-map | A comma-separated list of <cert-principal>:<db-principal> mappings. This allows mapping the principal in a cert to a DB principal such as node or root or any SQL user. This is intended for use in situations where the certificate management system places restrictions on the Subject.CommonName or SubjectAlternateName fields in the certificate (e.g., disallowing a CommonName like node or root). If multiple mappings are provided for the same <cert-principal>, the last one specified in the list takes precedence. A principal not specified in the map is passed through as-is via the identity function. A cert is allowed to authenticate a DB principal if the DB principal name is contained in the mapped CommonName or DNS-type SubjectAlternateName fields. |
--certs-dir | The path to the certificate directory containing the CA and client certificates and client key. Env Variable: COCKROACH_CERTS_DIR; Default: ${HOME}/.cockroach-certs/ |
--cluster-name | The cluster name to use to verify the cluster's identity. If the cluster has a cluster name, you must include this flag. For more information, see cockroach start. |
--disable-cluster-name-verification | Disables the cluster name check for this command. This flag must be paired with --cluster-name. For more information, see cockroach start. |
--host | The server host and port number to connect to. This can be the address of any node in the cluster. Env Variable: COCKROACH_HOST; Default: localhost:26257 |
--insecure | Use an insecure connection. Env Variable: COCKROACH_INSECURE; Default: false |
--url | A connection URL to use instead of the other arguments. To convert a connection URL to the syntax that works with your client driver, run cockroach convert-url. Env Variable: COCKROACH_URL; Default: no URL |
Logging
By default, this command logs messages to stderr. This includes events with WARNING severity and higher.
If you need to troubleshoot this command's behavior, you can customize its logging behavior.
Examples
Generate a debug zip file
Generate the debug zip file for an insecure cluster:
$ cockroach debug zip ./cockroach-data/logs/debug.zip --insecure --host=200.100.50.25
Generate the debug zip file for a secure cluster:
$ cockroach debug zip ./cockroach-data/logs/debug.zip --host=200.100.50.25
Secure examples assume you have the appropriate certificates in the default certificate directory, ${HOME}/.cockroach-certs/.
Generate a debug zip file for a time range
Generate a debug zip file containing only debugging information for a specified time range:
$ cockroach debug zip ./cockroach-data/logs/debug.zip --files-from='2023-10-03 13:30' --files-until='2023-10-03 14:30'
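The flag descriptions recommend widening a narrow window slightly to allow for clock drift. The sketch below shows one way to compute padded timestamps; it assumes GNU date (the BSD/macOS date command uses different options), and both the function name and the five-minute padding are arbitrary choices for illustration.

```shell
# Shift a timestamp by a number of seconds (negative values move it earlier).
# Assumes GNU date; the BSD/macOS date command uses different options.
shift_timestamp() {
  # Parse as UTC so the arithmetic is a pure offset, unaffected by DST.
  epoch=$(date -u -d "$1 UTC" +%s) || return 1
  date -u -d "@$((epoch + $2))" '+%Y-%m-%d %H:%M:%S'
}

# Widen a 13:30 to 14:30 window by five minutes on each side and print the
# resulting command for review.
window_start=$(shift_timestamp '2023-10-03 13:30' -300)
window_end=$(shift_timestamp '2023-10-03 14:30' 300)
echo "cockroach debug zip ./cockroach-data/logs/debug.zip --files-from='$window_start' --files-until='$window_end'"
```

The printed command uses the YYYY-MM-DD HH:MM:SS form, which the --files-from and --files-until flags accept.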
Generate a debug zip file with logs only
Generate a debug zip file containing only log files:
$ cockroach debug zip ./cockroach-data/logs/debug.zip --include-files=*.log
Redact sensitive information
Example of a log string without redaction enabled:
server/server.go:1423 ⋮ password of user ‹admin› was set to ‹"s3cr34?!@x_"›
Enable log redaction:
$ cockroach debug zip ./cockroach-data/logs/debug.zip --redact --insecure --host=200.100.50.25
Example of the same log string with redaction enabled:
server/server.go:1423 ⋮ password of user ‹×› was set to ‹×›