MOLT Fetch

On this page

MOLT Fetch moves data from a source database into CockroachDB as part of a database migration.

MOLT Fetch uses IMPORT INTO or COPY FROM to move the source data to cloud storage (Google Cloud Storage, Amazon S3, or Azure Blob Storage), a local file server, or local memory. Once the data is exported, MOLT Fetch loads the data into a target CockroachDB database. For details, refer to Migration phases.

Terminology

Shard: A portion of a table's data exported concurrently during the data export phase. Tables are divided into shards to enable parallel processing. For details, refer to Table sharding.
Continuation token: An identifier that marks the progress of a fetch task. Used to resume data loading from the point of interruption if a fetch task fails. For details, refer to Fetch continuation.
Intermediate files: Temporary data files written to cloud storage or a local file server during the data export phase. These files are used to stage exported data before importing it into CockroachDB during the data import phase. For details, refer to Data path.

Prerequisites

Supported databases

The following source databases are supported:

PostgreSQL 11-16
MySQL 5.7, 8.0 and later
Oracle Database 19c (Enterprise Edition) and 21c (Express Edition)

Database configuration

Ensure that the source and target schemas are identical, unless you enable automatic schema creation with the drop-on-target-and-recreate option. If you are creating the target schema manually, review the behaviors in Mismatch handling.

Note:

MOLT Fetch does not support migrating sequences. If your source database contains sequences, refer to the guidance on indexing with sequential keys. If a sequential key is necessary in your CockroachDB table, you must create it manually. After using MOLT Fetch to load the data onto the target, but before cutover, make sure to update each sequence's current value using setval() so that new inserts continue from the correct point.

If you plan to use cloud storage for the data migration, follow the steps in Cloud storage security.

User permissions

The SQL user running MOLT Fetch requires specific privileges on both the source and target databases:

Database	Required Privileges	Details
PostgreSQL source	`CONNECT` on database. `USAGE` on schema. `SELECT` on tables to migrate.	Create PostgreSQL migration user
MySQL source	`SELECT` on tables to migrate.	Create MySQL migration user
Oracle source	`CONNECT` and `CREATE SESSION`. `SELECT` and `FLASHBACK` on tables to migrate. `SELECT` on metadata views (`ALL_USERS`, `DBA_USERS`, `DBA_OBJECTS`, `DBA_SYNONYMS`, `DBA_TABLES`).	Create Oracle migration user
CockroachDB target	`ALL` on target database. `CREATE` on schema. `SELECT`, `INSERT`, `UPDATE`, `DELETE` on target tables. For `IMPORT INTO`: `SELECT`, `INSERT`, `DROP` on target tables. Optionally `EXTERNALIOIMPLICITACCESS` for implicit cloud storage authentication. For `COPY FROM`: `admin` role.	Create CockroachDB user

Installation

To install MOLT, download the binary that matches your architecture and source database:

Operating System	Architecture	PostgreSQL/MySQL	Oracle
Windows	AMD 64-bit	Download	N/A
	ARM 64-bit	Download	N/A
Linux	AMD 64-bit	Download	Download
	ARM 64-bit	Download	N/A
Mac	AMD 64-bit	Download	Download
	ARM 64-bit	Download	Download

The following binaries are included:

molt
replicator

Tip:

For ease of use, keep both molt and replicator in your current working directory.

To display the current version of each binary, run molt --version and replicator --version.

Note:

molt is bundled with the latest replicator version available at the time of the MOLT release. This means that the MOLT download always contains the latest released version of MOLT Replicator. To verify that the molt and replicator versions match, run molt --version and replicator --version.

For previous binaries, refer to the MOLT version manifest. For release details, refer to the MOLT changelog.

Note:

MOLT Fetch is supported on Red Hat Enterprise Linux (RHEL) 9 and above.

Docker images

MOLT Fetch

Docker multi-platform images containing both the AMD and ARM molt and replicator binaries are available. To pull the latest image for PostgreSQL and MySQL:

docker pull cockroachdb/molt

To pull a specific version (for example, 1.1.3):

docker pull cockroachdb/molt:1.1.3

To pull the latest image for Oracle (note that only linux/amd64 is supported):

docker pull cockroachdb/molt:oracle-latest

Docker usage

Performance

MOLT Fetch, Verify, and Replicator are likely to run more slowly in a Docker container than on a local machine. To improve performance, increase the memory or compute resources, or both, on your Docker container.

Authentication

When using MOLT Fetch with cloud storage, it is necessary to specify volumes and environment variables, as described in the following sections for Google Cloud Storage and Amazon S3.

No additional configuration is needed when running MOLT Fetch with a local file server or in direct copy mode:

docker run -it cockroachdb/molt fetch ...

For more information on docker run, see the Docker documentation.

Google Cloud Storage

If you are using Google Cloud Storage for cloud storage:

Volume map the application_default_credentials.json file into the container, and set the GOOGLE_APPLICATION_CREDENTIALS environment variable to point to this file.

docker run \
  -v ~/.config/gcloud/application_default_credentials.json:/gcp/creds.json:ro \
  -e GOOGLE_APPLICATION_CREDENTIALS=/gcp/creds.json \
  -it \
  cockroachdb/molt fetch ...

In case the previous authentication method fails, you can volume map the entire Google Cloud configuration directory into the container. In addition to setting the GOOGLE_APPLICATION_CREDENTIALS environment variable, set CLOUDSDK_CONFIG to point to the configuration directory:

docker run \
  -v ~/.config/gcloud:/gcp/config:ro \
  -e CLOUDSDK_CONFIG=/gcp/config \
  -e GOOGLE_APPLICATION_CREDENTIALS=/gcp/config/application_default_credentials.json \
  -it \
  cockroachdb/molt fetch ...

For details on Google Cloud Storage authentication, see Application Default Credentials.

Amazon S3

If you are using Amazon S3 for cloud storage:

Volume map the host's ~/.aws directory to the /root/.aws directory inside the container, and set the required AWS_REGION, AWS_SECRET_ACCESS_KEY, and AWS_ACCESS_KEY_ID environment variables:

docker run \
  -v ~/.aws:/root/.aws \
  -e AWS_REGION=your-region \
  -e AWS_SECRET_ACCESS_KEY=your-secret-access-key \
  -e AWS_ACCESS_KEY_ID=your-access-key-id \
  -it \
  cockroachdb/molt fetch ...

Local connection strings

When testing locally, specify the host as follows:

For macOS, use host.docker.internal. For example:

--source 'postgres://postgres:postgres@host.docker.internal:5432/migration_db?sslmode=disable'
--target "postgres://root@host.docker.internal:26257/defaultdb?sslmode=disable"

For Linux and Windows, use 172.17.0.1. For example:

--source 'postgres://postgres:postgres@172.17.0.1:5432/migration_db?sslmode=disable'
--target "postgres://root@172.17.0.1:26257/defaultdb?sslmode=disable"

Migration phases

MOLT Fetch operates in distinct phases to move data from source databases to CockroachDB. For details on available modes, refer to Fetch mode.

Data export phase

MOLT Fetch connects to the source database and exports table data to intermediate storage. Data is written to cloud storage (Amazon S3, Google Cloud Storage, Azure Blob Storage), a local file server, or directly to CockroachDB memory. Multiple tables and table shards can be exported simultaneously using --table-concurrency and --export-concurrency, with large tables divided into shards for parallel processing. For details, refer to:

Fetch mode
Table sharding

Data import phase

MOLT Fetch loads the exported data into the target CockroachDB database. The process uses IMPORT INTO (faster, tables offline during import) or COPY FROM (slower, tables remain queryable) to move data. Data files are imported in configurable batches using --import-batch-size, and target tables can be automatically created, truncated, or left unchanged based on --table-handling settings. For details, refer to:

Data movement
Target table handling

Commands

Command	Usage
`fetch`	Start the fetch task. This loads data from a source database to a target CockroachDB database.

Subcommands

Command	Usage
`tokens list`	List active continuation tokens.

Flags

Global flags

Flag	Description
`--source`	(Required) Connection string used to connect to the Oracle PDB (in a CDB/PDB architecture) or to a standalone database (non‑CDB). For details, refer to Source and target databases.
`--source-cdb`	Connection string for the Oracle container database (CDB) when using a multitenant (CDB/PDB) architecture. Omit this flag on a non‑multitenant Oracle database. For details, refer to Source and target databases.
`--target`	(Required) Connection string for the target database. For details, refer to Source and target databases.
`--allow-tls-mode-disable`	Allow insecure connections to databases. Secure SSL/TLS connections should be used by default. This should be enabled only if secure SSL/TLS connections to the source or target database are not possible.
`--assume-role`	Service account to use for assume role authentication. `--use-implicit-auth` must be included. For example, `--assume-role='user-test@cluster-ephemeral.iam.gserviceaccount.com' --use-implicit-auth`. For details, refer to Cloud Storage Authentication.
`--bucket-path`	The path within the cloud storage bucket where intermediate files are written (e.g., `'s3://bucket/path'` or `'gs://bucket/path'`). Only the URL path is used; query parameters (e.g., credentials) are ignored. To pass in query parameters, use the appropriate flags: `--assume-role`, `--import-region`, `--use-implicit-auth`.
`--case-sensitive`	Toggle case sensitivity when comparing table and column names on the source and target. To disable case sensitivity, set `--case-sensitive=false`. If `=` is not included (e.g., `--case-sensitive false`), the flag is interpreted as `--case-sensitive` (i.e., `--case-sensitive=true`). Default: `false`
`--cleanup`	Whether to delete intermediate files after moving data using cloud or local storage. Note: Cleanup does not occur on continuation.
`--compression`	Compression method for data when using `IMPORT INTO` (`gzip`/`none`). Default: `gzip`
`--continuation-file-name`	Restart fetch at the specified filename if the task encounters an error. `--fetch-id` must be specified. For details, see Fetch continuation.
`--continuation-token`	Restart fetch at a specific table, using the specified continuation token, if the task encounters an error. `--fetch-id` must be specified. For details, see Fetch continuation.
`--crdb-pts-duration`	The duration for which each timestamp used in data export from a CockroachDB source is protected from garbage collection. This ensures that the data snapshot remains consistent. For example, if set to `24h`, each timestamp is protected for 24 hours from the initiation of the export job. This duration is extended at regular intervals specified in `--crdb-pts-refresh-interval`. Default: `24h0m0s`
`--crdb-pts-refresh-interval`	The frequency at which the protected timestamp's validity is extended. This interval maintains protection of the data snapshot until data export from a CockroachDB source is completed. For example, if set to `10m`, the protected timestamp's expiration will be extended by the duration specified in `--crdb-pts-duration` (e.g., `24h`) every 10 minutes while export is not complete. Default: `10m0s`
`--direct-copy`	Enables direct copy, which copies data directly from source to target without using an intermediate store.
`--export-concurrency`	Number of shards to export at a time per table, each on a dedicated thread. This controls how many shards are created for each individual table during the data export phase and is distinct from `--table-concurrency`, which controls how many tables are processed simultaneously. The total number of concurrent threads is the product of `--export-concurrency` and `--table-concurrency`. Tables can be sharded with a range-based or stats-based mechanism. For details, refer to Table sharding. Default: `4`
`--export-retry-max-attempts`	Maximum number of retry attempts for source export queries when connection failures occur. Only supported for PostgreSQL and CockroachDB sources. Default: `3`
`--export-retry-max-duration`	Maximum total duration for retrying source export queries. If `0`, no time limit is enforced. Only supported for PostgreSQL and CockroachDB sources. Default: `5m0s`
`--filter-path`	Path to a JSON file defining row-level filters for the data import phase. Refer to Selective data movement.
`--fetch-id`	Restart fetch task corresponding to the specified ID. If `--continuation-file-name` or `--continuation-token` are not specified, fetch restarts for all failed tables.
`--flush-rows`	Number of rows before the source data is flushed to intermediate files. Note: If `--flush-size` is also specified, the fetch behavior is based on the flag whose criterion is met first.
`--flush-size`	Size (in bytes) before the source data is flushed to intermediate files. Note: If `--flush-rows` is also specified, the fetch behavior is based on the flag whose criterion is met first.
`--ignore-replication-check`	Skip querying for replication checkpoints such as `pg_current_wal_insert_lsn()` on PostgreSQL, `gtid_executed` on MySQL, and `CURRENT_SCN` on Oracle. This option is intended for use during bulk load migrations or when doing a one-time data export from a read replica.
`--import-batch-size`	The number of files to be imported at a time to the target database during the data import phase. This applies only when using `IMPORT INTO` for data movement. Note: Increasing this value can improve the performance of full-scan queries on the target database shortly after fetch completes, but very high values are not recommended. If any individual file in the import batch fails, you must retry the entire batch. Default: `1000`
`--import-region`	The region of the cloud storage bucket. This applies only to Amazon S3 buckets. Set this flag only if you need to specify an `AWS_REGION` explicitly when using `IMPORT INTO` for data movement. For example, `--import-region=ap-south-1`.
`--local-path`	The path within the local file server where intermediate files are written (e.g., `data/migration/cockroach`). `--local-path-listen-addr` must be specified.
`--local-path-crdb-access-addr`	Address of a local file server that is publicly accessible. This flag is only necessary if CockroachDB cannot reach the local address specified with `--local-path-listen-addr` (e.g., when moving data to a CockroachDB Cloud deployment). `--local-path` and `--local-path-listen-addr` must be specified. Default: Value of `--local-path-listen-addr`.
`--local-path-listen-addr`	Write intermediate files to a local file server at the specified address (e.g., `'localhost:3000'`). `--local-path` must be specified.
`--log-file`	Write messages to the specified log filename. If no filename is provided, messages write to `fetch-{datetime}.log`. If `"stdout"` is provided, messages write to `stdout`.
`--logging`	Level at which to log messages (`trace`/`debug`/`info`/`warn`/`error`/`fatal`/`panic`). Default: `info`
`--metrics-listen-addr`	Address of the Prometheus metrics endpoint, which has the path `{address}/metrics`. For details on important metrics to monitor, refer to Monitoring. Default: `'127.0.0.1:3030'`
`--mode`	Configure the MOLT Fetch behavior: `data-load`, `export-only`, or `import-only`. For details, refer to Fetch mode. Default: `data-load`
`--non-interactive`	Run the fetch task without interactive prompts. This is recommended only when running `molt fetch` in an automated process (i.e., a job or continuous integration).
`--pglogical-replication-slot-name`	Name of a PostgreSQL replication slot that will be created before taking a snapshot of data. Must match the slot name specified with `--slotName` in the MOLT Replicator command. For details, refer to Load before replication.
`--pglogical-publication-and-slot-drop-and-recreate`	Drop the PostgreSQL publication and replication slot if they exist, then recreate them. Creates a publication named `molt_fetch` and the replication slot specified with `--pglogical-replication-slot-name`. For details, refer to Load before replication. Default: `false`
`--pprof-listen-addr`	Address of the pprof endpoint. Default: `'127.0.0.1:3031'`
`--row-batch-size`	Number of rows per shard to export at a time. For details on sharding, refer to Table sharding. See also Best practices. Default: `100000`
`--schema-filter`	Move schemas that match a specified regular expression. Not used with MySQL sources. For Oracle sources, this filter is case-insensitive. Default: `'.*'`
`--skip-pk-check`	Skip primary-key matching to allow data load when source or target tables have missing or mismatched primary keys. Disables sharding and bypasses `--export-concurrency` and `--row-batch-size` settings. Refer to Skip primary key matching. Default: `false`
`--table-concurrency`	Number of tables to export at a time. The number of concurrent threads is the product of `--export-concurrency` and `--table-concurrency`. Default: `4`
`--table-exclusion-filter`	Exclude tables that match a specified POSIX regular expression. This value cannot be set to `'.'`, which would cause every table to be excluded. Default:* Empty string
`--table-filter`	Move tables that match a specified POSIX regular expression. Default: `'.*'`
`--table-handling`	How tables are initialized on the target database (`none`/`drop-on-target-and-recreate`/`truncate-if-exists`). For details, see Target table handling. Default: `none`
`--transformations-file`	Path to a JSON file that defines transformations to be performed on the target schema during the fetch task. Refer to Transformations.
`--type-map-file`	Path to a JSON file that contains explicit type mappings for automatic schema creation, when enabled with `--table-handling drop-on-target-and-recreate`. For details on the JSON format and valid type mappings, see type mapping.
`--use-console-writer`	Use the console writer, which has cleaner log output but introduces more latency. Default: `false` (log as structured JSON)
`--use-copy`	Use `COPY FROM` to move data. This makes tables queryable during data load, but is slower than using `IMPORT INTO`. For details, refer to Data load mode.
`--use-implicit-auth`	Use implicit authentication for cloud storage URIs.
`--use-stats-based-sharding`	Enable statistics-based sharding for PostgreSQL sources. This allows sharding of tables with primary keys of any data type and can create more evenly distributed shards compared to the default numerical range sharding. Requires PostgreSQL 11+ and access to `pg_stats`. For details, refer to Table sharding.

`tokens list` flags

Flag	Description
`--conn-string`	(Required) Connection string for the target database. For details, see List active continuation tokens.
`-n`, `--num-results`	Number of results to return.

Usage

The following sections describe how to use the molt fetch flags.

Source and target databases

Tip:

Follow the recommendations in Connection security.

--source specifies the connection string of the source database.

PostgreSQL or CockroachDB connection string:

--source 'postgresql://{username}:{password}@{host}:{port}/{database}'

MySQL connection string:

--source 'mysql://{username}:{password}@{protocol}({host}:{port})/{database}'

Oracle connection string:

--source 'oracle://{username}:{password}@{host}:{port}/{service_name}'

For Oracle Multitenant databases, --source-cdb specifies the container database (CDB) connection. --source specifies the pluggable database (PDB):

--source 'oracle://{username}:{password}@{host}:{port}/{pdb_service_name}'
--source-cdb 'oracle://{username}:{password}@{host}:{port}/{cdb_service_name}'

--target specifies the CockroachDB connection string:

--target 'postgresql://{username}:{password}@{host}:{port}/{database}'

Fetch mode

--mode specifies the MOLT Fetch behavior.

data-load (default) instructs MOLT Fetch to load the source data into CockroachDB:

--mode data-load

export-only instructs MOLT Fetch to export the source data to the specified cloud storage or local file server. It does not load the data into CockroachDB:

--mode export-only

import-only instructs MOLT Fetch to load the source data in the specified cloud storage or local file server into the CockroachDB target:

--mode import-only

Data load mode

MOLT Fetch can use either IMPORT INTO or COPY FROM to load data into CockroachDB.

By default, MOLT Fetch uses IMPORT INTO:

IMPORT INTO achieves the highest throughput, but requires taking the CockroachDB tables offline to achieve its import speed. Tables are taken back online once an import job completes successfully. See Best practices.
IMPORT INTO supports compression using the --compression flag, which reduces the amount of storage used.

--use-copy configures MOLT Fetch to use COPY FROM:

COPY FROM enables your tables to remain online and accessible. However, it is slower than using IMPORT INTO.
COPY FROM does not support compression.

Note:

COPY FROM is also used for direct copy.

Table sharding

During the data export phase, MOLT Fetch can divide large tables into multiple shards for concurrent export.

To control the number of shards created per table, use the --export-concurrency flag. For example:

--export-concurrency=4

Tip:

For performance considerations with concurrency settings, refer to Best practices.

Two sharding mechanisms are available:

Range-based sharding (default): Tables are divided based on numerical ranges found in primary key values. Only tables with INT, FLOAT, or UUID primary keys can use range-based sharding. Tables with other primary key data types export as a single shard.
Stats-based sharding (PostgreSQL only): Enable with --use-stats-based-sharding for PostgreSQL 11+ sources. Tables are divided by analyzing the pg_stats view to create more evenly distributed shards, up to a maximum of 200 shards. Primary keys of any data type are supported.

Stats-based sharding requires that the user has SELECT permissions on source tables and on each table's pg_stats view. The latter permission is automatically granted to users that can read the table.

To optimize stats-based sharding, run ANALYZE on source tables before migration to ensure that table statistics are up-to-date and shards are evenly distributed. This requires MAINTAIN or OWNER permissions on the table. You can analyze specific primary key columns or the entire table. For example:

ANALYZE table_name(PK1, PK2, PK3);

ANALYZE table_name;

Large tables may take time to analyze, but ANALYZE can run in the background. You can run ANALYZE with MAINTAIN or OWNER privileges during migration preparation, then perform the actual migration with standard SELECT privileges.

Note:

Migration without running ANALYZE will still work, but shard distribution may be less even.

When using --use-stats-based-sharding, monitor the log output for each table you want to migrate.

If stats-based sharding is successful on a table, MOLT logs the following INFO message:

Stats based sharding enabled for table {table_name}

If stats-based sharding fails on a table, MOLT logs the following WARNING message and defaults to range-based sharding:

Warning: failed to shard table {table_name} using stats based sharding: {reason_for_failure}, falling back to non stats based sharding

The number of shards is dependent on the number of distinct values in the first primary key column of the table to be migrated. If this is different from the number of shards requested with --export-concurrency, MOLT logs the following WARNING and continues with the migration:

number of shards formed: {num_shards_formed} is not equal to number of shards requested: {num_shards_requested} for table {table_name}

Because stats-based sharding analyzes the entire table, running --use-stats-based-sharding with --filter-path (refer to Selective data movement) will cause imbalanced shards to form.

Data path

MOLT Fetch can move the source data to CockroachDB via cloud storage, a local file server, or directly without an intermediate store.

Bucket path

Tip:

Only the path specified in --bucket-path is used. Query parameters, such as credentials, are ignored. To authenticate cloud storage, follow the steps in Secure cloud storage.

--bucket-path instructs MOLT Fetch to write intermediate files to a path within Google Cloud Storage, Amazon S3, or Azure Blob Storage to which you have the necessary permissions. Use additional flags, shown in the following examples, to specify authentication or region parameters as required for bucket access.

Connect to a Google Cloud Storage bucket with implicit authentication and assume role:

--bucket-path 'gs://migration/data/cockroach'
--assume-role 'user-test@cluster-ephemeral.iam.gserviceaccount.com'
--use-implicit-auth

Connect to an Amazon S3 bucket and explicitly specify the ap_south-1 region:

--bucket-path 's3://migration/data/cockroach'
--import-region 'ap-south-1'

Note:

When --import-region is set, IMPORT INTO must be used for data movement.

Connect to an Azure Blob Storage container with implicit authentication:

--bucket-path 'azure-blob://migration/data/cockroach'
--use-implicit-auth

Local path

--local-path instructs MOLT Fetch to write intermediate files to a path within a local file server. local-path-listen-addr specifies the address of the local file server. For example:

--local-path /migration/data/cockroach
--local-path-listen-addr 'localhost:3000'

In some cases, CockroachDB will not be able to use the local address specified by --local-path-listen-addr. This will depend on where CockroachDB is deployed, the runtime OS, and the source dialect.

For example, if you are migrating to CockroachDB Cloud, such that the Cloud cluster is in a different physical location than the machine running molt fetch, then CockroachDB cannot reach an address such as localhost:3000. In these situations, use --local-path-crdb-access-addr to specify an address for the local file server that is publicly accessible. For example:

--local-path /migration/data/cockroach
--local-path-listen-addr 'localhost:3000'
--local-path-crdb-access-addr '44.55.66.77:3000'

Tip:

Cloud storage is often preferable to a local file server, which can require considerable disk space.

Direct copy

--direct-copy specifies that MOLT Fetch should use COPY FROM to move the source data directly to CockroachDB without an intermediate store:

Because the data is held in memory, the machine must have sufficient RAM for the data currently in flight:
```
average size of each row * --row-batch-size * --export-concurrency * --table-concurrency
```
Direct copy does not support compression or continuation.
The --use-copy flag is redundant with --direct-copy.

Schema and table selection

By default, MOLT Fetch moves all data from the --source database to CockroachDB. Use the following flags to move a subset of data.

--schema-filter specifies a range of schema objects to move to CockroachDB, formatted as a POSIX regex string. For example, to move every table in the source database's migration_schema schema:

--schema-filter 'migration_schema'

Note:

--schema-filter does not apply to MySQL sources because MySQL tables belong directly to the database specified in the connection string, not to a separate schema.

--table-filter and --table-exclusion-filter specify tables to include and exclude from the migration, respectively, formatted as POSIX regex strings. For example, to move every source table that has "user" in the table name and exclude every source table that has "temp" in the table name:

--table-filter '.*user.*' --table-exclusion-filter '.*temp.*'

Selective data movement

Use --filter-path to specify the path to a JSON file that defines row-level filtering for data load. This enables you to move a subset of data in a table, rather than all data in the table. To apply row-level filters during replication, use MOLT Replicator with userscripts.

--filter-path 'data-filter.json'

The JSON file should contain one or more entries in filters, each with a resource_specifier (schema and table) and a SQL expression expr. For example, the following example exports only rows from migration_schema.t1 where v > 100:

{
  "filters": [
    {
      "resource_specifier": {
        "schema": "migration_schema",
        "table": "t1"
      },
      "expr": "v > 100"
    }
  ]
}

expr is case-sensitive and must be valid in your source dialect. For example, when using Oracle as the source, quote all identifiers and escape embedded quotes:

{
  "filters": [
    {
      "resource_specifier": {
        "schema": "C##FETCHORACLEFILTERTEST",
        "table": "FILTERTBL"
      },
      "expr": "ABS(\"X\") > 10 AND CEIL(\"X\") < 100 AND FLOOR(\"X\") > 0 AND ROUND(\"X\", 2) < 100.00 AND TRUNC(\"X\", 0) > 0 AND MOD(\"X\", 2) = 0 AND FLOOR(\"X\" / 3) > 1"
    }
  ]
}

Note:

If the expression references columns that are not indexed, MOLT Fetch will emit a warning like: filter expression ‘v > 100' contains column ‘v' which is not indexed. This may lead to performance issues.

Target table handling

--table-handling defines how MOLT Fetch loads data on the CockroachDB tables that match the selection.

To load the data without changing the existing data in the tables, use none:

--table-handling none

To truncate tables before loading the data, use truncate-if-exists:

--table-handling truncate-if-exists

To drop existing tables and create new tables before loading the data, use drop-on-target-and-recreate:

--table-handling drop-on-target-and-recreate

When using the drop-on-target-and-recreate option, MOLT Fetch creates a new CockroachDB table to load the source data if one does not already exist. To guide the automatic schema creation, you can explicitly map source types to CockroachDB types. drop-on-target-and-recreate does not create indexes or constraints other than PRIMARY KEY and NOT NULL.

Mismatch handling

If either none or truncate-if-exists is set, molt fetch loads data into the existing tables on the target CockroachDB database. If the target schema mismatches the source schema, molt fetch will exit early in certain cases, and will need to be re-run from the beginning. For details, refer to Fetch exits early due to mismatches.

Note:

This does not apply when drop-on-target-and-recreate is specified, since this option automatically creates a compatible CockroachDB schema.

Skip primary key matching

--skip-pk-check removes the requirement that source and target tables share matching primary keys for data load. When this flag is set:

The data load proceeds even if the source or target table lacks a primary key, or if their primary key columns do not match.
Table sharding is disabled. Each table is exported in a single batch within one shard, bypassing --export-concurrency and --row-batch-size. As a result, memory usage and execution time may increase due to full table scans.
If the source table contains duplicate rows but the target has PRIMARY KEY or UNIQUE constraints, duplicate rows are deduplicated during import.

When --skip-pk-check is set, all tables are treated as if they lack a primary key, and are thus exported in a single unsharded batch. To avoid performance issues, use this flag with --table-filter to target only tables without a primary key.

For example:

molt fetch \
  --mode data-load \
  --table-filter 'nopktbl' \
  --skip-pk-check

Example log output when --skip-pk-check is enabled:

{"level":"info","message":"sharding is skipped for table public.nopktbl - flag skip-pk-check is specified and thus no PK for source table is specified"}

Type mapping

If drop-on-target-and-recreate is set, MOLT Fetch automatically creates a CockroachDB schema that is compatible with the source data. The column types are determined as follows:

PostgreSQL types are mapped to existing CockroachDB types that have the same OID.

The following MySQL types are mapped to corresponding CockroachDB types:

MySQL type	CockroachDB type	Notes
`CHAR`, `CHARACTER`, `VARCHAR`, `NCHAR`, `NVARCHAR`	`VARCHAR`	Varying-length string; raises warning if BYTE semantics used
`TINYTEXT`, `TEXT`, `MEDIUMTEXT`, `LONGTEXT`	`STRING`	Unlimited-length string
`GEOMETRY`	`GEOMETRY`	Spatial type (PostGIS-style)
`LINESTRING`	`LINESTRING`	Spatial type (PostGIS-style)
`POINT`	`POINT`	Spatial type (PostGIS-style)
`POLYGON`	`POLYGON`	Spatial type (PostGIS-style)
`MULTIPOINT`	`MULTIPOINT`	Spatial type (PostGIS-style)
`MULTILINESTRING`	`MULTILINESTRING`	Spatial type (PostGIS-style)
`MULTIPOLYGON`	`MULTIPOLYGON`	Spatial type (PostGIS-style)
`GEOMETRYCOLLECTION`, `GEOMCOLLECTION`	`GEOMETRYCOLLECTION`	Spatial type (PostGIS-style)
`JSON`	`JSONB`	CRDB's native JSON format
`TINYINT`, `INT1`	`INT2`	2-byte integer
`BLOB`	`BYTES`	Binary data
`SMALLINT`, `INT2`	`INT2`	2-byte integer
`MEDIUMINT`, `INT`, `INTEGER`, `INT4`	`INT4`	4-byte integer
`BIGINT`, `INT8`	`INT`	8-byte integer
`FLOAT`	`FLOAT4`	32-bit float
`DOUBLE`	`FLOAT`	64-bit float
`DECIMAL`, `NUMERIC`, `REAL`	`DECIMAL`	Validates scale ≤ precision; warns if precision > 19
`BINARY`, `VARBINARY`	`BYTES`	Binary data
`DATETIME`	`TIMESTAMP`	Date and time (no time zone)
`TIMESTAMP`	`TIMESTAMPTZ`	Date and time with time zone
`TIME`	`TIME`	Time of day (no date)
`BIT`	`VARBIT`	Variable-length bit array
`DATE`	`DATE`	Date only (no time)
`TINYBLOB`, `MEDIUMBLOB`, `LONGBLOB`	`BYTES`	Binary data
`BOOL`, `BOOLEAN`	`BOOL`	Boolean

The following Oracle types are mapped to CockroachDB types:

Oracle type(s)	CockroachDB type	Notes
`NCHAR`, `CHAR`, `CHARACTER`	`CHAR`(n) or `CHAR`	Fixed-length character; falls back to unbounded if length not specified
`VARCHAR`, `VARCHAR2`, `NVARCHAR2`	`VARCHAR`(n) or `VARCHAR`	Varying-length string; raises warning if BYTE semantics used
`STRING`	`STRING`	Unlimited-length string
`SMALLINT`	`INT2`	2-byte integer
`INTEGER`, `INT`, `SIMPLE_INTEGER`	`INT4`	4-byte integer
`LONG`	`INT8`	8-byte integer
`FLOAT`, `BINARY_FLOAT`, `REAL`	`FLOAT4`	32-bit float
`DOUBLE`, `BINARY_DOUBLE`	`FLOAT8`	64-bit float
`DEC`, `NUMBER`, `DECIMAL`, `NUMERIC`	`DECIMAL`(p, s) or `DECIMAL`	Validates scale ≤ precision; warns if precision > 19
`DATE`	`DATE`	Date only (no time)
`BLOB`, `RAW`, `LONG RAW`	`BYTES`	Binary data
`JSON`	`JSONB`	CRDB's native JSON format
`CLOB`, `NCLOB`	`STRING`	Treated as large text
`BOOLEAN`	`BOOL`	Boolean
`TIMESTAMP`	`TIMESTAMP` or `TIMESTAMPTZ`	If `WITH TIME ZONE` → `TIMESTAMPTZ`, else `TIMESTAMP`
`ROWID`, `UROWID`	`STRING`	Treated as opaque identifier
`SDO_GEOMETRY`	`GEOMETRY`	Spatial type (PostGIS-style)
`XMLTYPE`	`STRING`	Stored as text

To override the default mappings for automatic schema creation, you can map source to target CockroachDB types explicitly. These are defined in the JSON file indicated by the --type-map-file flag. The allowable custom mappings are valid CockroachDB aliases, casts, and the following mappings specific to MOLT Fetch and Verify:
- TIMESTAMP <> TIMESTAMPTZ
- VARCHAR <> UUID
- BOOL <> INT2
- VARBIT <> TEXT
- VARBIT <> BOOL
- JSONB <> TEXT
- INET <> TEXT

--type-map-file specifies the path to the JSON file containing the explicit type mappings. For example:

--type-map-file 'type-mappings.json'

The following JSON example defines two type mappings:

 [
  {
    "table": "public.t1",
    "column_type_map": [
      {
        "column": "*",
        "source_type": "int",
        "crdb_type": "INT2"
      },
      {
        "column": "name",
        "source_type": "varbit",
        "crdb_type": "string"
      }
    ]
  }
]

table specifies the table that will use the custom type mappings in column_type_map. The value is written as {schema}.{table}.
column specifies the column that will use the custom type mapping. If * is specified, then all columns in the table with the matching source_type are converted.
source_type specifies the source type to be mapped.
crdb_type specifies the target CockroachDB type to be mapped.

Transformations

You can define transformation rules to be performed on the target database during the fetch task. These can be used to:

Map computed columns from source to target.
Map partitioned tables to a single target table.
Rename tables on the target database.
Rename database schemas.

Transformation rules are defined in the JSON file indicated by the --transformations-file flag. For example:

--transformations-file 'transformation-rules.json'

Transformation rules example

The following JSON example defines three transformation rules: rule 1 maps computed columns, rule 2 renames tables, and rule 3 renames schemas.

{
  "transforms": [
    {
      "id": 1,
      "resource_specifier": {
        "schema": ".*",
        "table": ".*"
      },
      "column_exclusion_opts": {
        "add_computed_def": true,
        "column": "^age$"
      }
    },
    {
      "id": 2,
      "resource_specifier": {
        "schema": "public",
        "table": "charges_part.*"
      },
      "table_rename_opts": {
        "value": "charges"
      }
    },
    {
      "id": 3,
      "resource_specifier": {
        "schema": "previous_schema"
      },
      "schema_rename_opts": {
        "value": "new_schema"
      }
    }
  ]
}

Column exclusions and computed columns

resource_specifier: Identifies which schemas and tables to transform.
- schema: POSIX regex matching source schemas.
- table: POSIX regex matching source tables.
column_exclusion_opts: Exclude columns or map them as computed columns.
- column: POSIX regex matching source columns to exclude.
- add_computed_def: When true, map matching columns as computed columns on target tables using ALTER TABLE ... ADD COLUMN and the source column definition. All matching columns must be computed columns on the source.
  Warning:
  Columns matching column are not moved to CockroachDB if add_computed_def is false (default) or if matching columns are not computed columns.

Example rule 1 maps all source age columns to computed columns on CockroachDB. This assumes that all matching age columns are defined as computed columns on the source:

{
  "id": 1,
  "resource_specifier": {
    "schema": ".*",
    "table": ".*"
  },
  "column_exclusion_opts": {
    "add_computed_def": true,
    "column": "^age$"
  }
},

Table renaming

resource_specifier: Identifies which schemas and tables to transform.
- schema: POSIX regex matching source schemas.
- table: POSIX regex matching source tables.
table_rename_opts: Rename tables on the target.
- value: Target table name. For a single matching source table, renames it to this value. For multiple matches (n-to-1), consolidates matching partitioned tables with the same table definition into a single table with this name.
  
  For n-to-1 mappings:
  - Use --use-copy or --direct-copy for data movement.
  - Manually create the target table. Do not use --table-handling drop-on-target-and-recreate.

Example rule 2 maps all table names with prefix charges_part to a single charges table on CockroachDB (an n-to-1 mapping). This assumes that all matching charges_part.* tables have the same table definition:

{
  "id": 2,
  "resource_specifier": {
    "schema": "public",
    "table": "charges_part.*"
  },
  "table_rename_opts": {
    "value": "charges"
  }
},

Schema renaming

resource_specifier: Identifies which schemas and tables to transform.
- schema: POSIX regex matching source schemas.
- table: POSIX regex matching source tables.
schema_rename_opts: Rename database schemas on the target.
- value: Target schema name. For example, previous_schema.table1 becomes new_schema.table1.

Example rule 3 renames the database schema previous_schema to new_schema on CockroachDB:

{
  "id": 3,
  "resource_specifier": {
    "schema": "previous_schema"
  },
  "schema_rename_opts": {
    "value": "new_schema"
  }
}

General notes

Each rule is applied in the order it is defined. If two rules overlap, the later rule will override the earlier rule.

To verify that the logging shows that the computed columns are being created:

When running molt fetch, set --logging debug and look for ALTER TABLE ... ADD COLUMN statements with the STORED or VIRTUAL keywords in the log output:

{"level":"debug","time":"2024-07-22T12:01:51-04:00","message":"running: ALTER TABLE IF EXISTS public.computed ADD COLUMN computed_col INT8 NOT NULL AS ((col1 + col2)) STORED"}

After running molt fetch, issue a SHOW CREATE TABLE statement on CockroachDB:

SHOW CREATE TABLE computed;

  table_name |                         create_statement
-------------+-------------------------------------------------------------------
  computed   | CREATE TABLE public.computed (
  ...
             |     computed_col INT8 NOT NULL AS (col1 + col2) STORED
             | )

Fetch continuation

If MOLT Fetch fails while loading data into CockroachDB from intermediate files, it exits with an error message, fetch ID, and continuation token for each table that failed to load on the target database. You can use this information to continue the task from the continuation point where it was interrupted.

Continuation is only possible under the following conditions:

All data has been exported from the source database into intermediate files on cloud or local storage.
The initial load of source data into the target CockroachDB database is incomplete.

Note:

Only one fetch ID and set of continuation tokens, each token corresponding to a table, are active at any time. See List active continuation tokens.

To retry all data starting from the continuation point, reissue the molt fetch command and include the --fetch-id.

--fetch-id d44762e5-6f70-43f8-8e15-58b4de10a007

To retry a specific table that failed, include both --fetch-id and --continuation-token. The latter flag specifies a token string that corresponds to a specific table on the source database. A continuation token is written in the molt fetch output for each failed table. If the fetch task encounters a subsequent error, it generates a new token for each failed table. See List active continuation tokens.

Note:

This will retry only the table that corresponds to the continuation token. If the fetch task succeeds, there may still be source data that is not yet loaded into CockroachDB.

--fetch-id d44762e5-6f70-43f8-8e15-58b4de10a007
--continuation-token 011762e5-6f70-43f8-8e15-58b4de10a007

To retry all data starting from a specific file, include both --fetch-id and --continuation-file-name. The latter flag specifies the filename of an intermediate file in cloud or local storage. All filenames are prepended with part_ and have the .csv.gz or .csv extension, depending on compression type (gzip by default). For example:

--fetch-id d44762e5-6f70-43f8-8e15-58b4de10a007
--continuation-file-name part_00000003.csv.gz

Note:

Continuation is not possible when using direct copy.

List active continuation tokens

To view all active continuation tokens, issue a molt fetch tokens list command along with --conn-string, which specifies the connection string for the target CockroachDB database. For example:

molt fetch tokens list \
--conn-string 'postgres://root@localhost:26257/defaultdb?sslmode=verify-full'

+--------------------------------------+--------------------------------------+------------------+----------------------+
|                  ID                  |               FETCH ID               |    TABLE NAME    |      FILE NAME       |
+--------------------------------------+--------------------------------------+------------------+----------------------+
| f6f0284c-d9c1-43c9-8fde-af609d0dbd82 | 66443597-5689-4df3-a7b9-9fc5e27180eb | public.employees | part_00000001.csv.gz |
+--------------------------------------+--------------------------------------+------------------+----------------------+
Continuation Tokens.

CDC cursor

A change data capture (CDC) cursor is written to the output as cdc_cursor at the beginning and end of the fetch task.

For MySQL:

{"level":"info","type":"summary","fetch_id":"735a4fe0-c478-4de7-a342-cfa9738783dc","num_tables":1,"tables":["public.employees"],"cdc_cursor":"b7f9e0fa-2753-1e1f-5d9b-2402ac810003:3-21","net_duration_ms":4879.890041,"net_duration":"000h 00m 04s","time":"2024-03-18T12:37:02-04:00","message":"fetch complete"}

For Oracle:

{"level":"info","type":"summary","fetch_id":"735a4fe0-c478-4de7-a342-cfa9738783dc","num_tables":3,"tables":["migration_schema.employees"],"cdc_cursor":"backfillFromSCN=26685444,scn=26685786","net_duration_ms":6752.847625,"net_duration":"000h 00m 06s","time":"2024-03-18T12:37:02-04:00","message":"fetch complete"}

Use the cdc_cursor value as the checkpoint for MySQL or Oracle replication with MOLT Replicator.

You can also use the cdc_cursor value with an external change data capture (CDC) tool to continuously replicate subsequent changes from the source database to CockroachDB.

Security

Cockroach Labs strongly recommends the following security practices.

Connection security

To keep your database credentials out of shell history and logs, follow these best practices when specifying your source and target connection strings:
- Avoid plaintext connection strings.
- Provide your connection strings as environment variables. For example:
```
export SOURCE="postgres://migration_user:a%2452%26@localhost:5432/migration_db?sslmode=verify-full"
export TARGET="postgres://root@localhost:26257/defaultdb?sslmode=verify-full"
```
  Afterward, reference the environment variables in MOLT commands:
```
--source $SOURCE
--target $TARGET
```
- If possible, use an external secrets manager to load the environment variables from stored secrets.
Use TLS-enabled connection strings to encrypt data in transit from MOLT to the database. When using TLS certificates, ensure certificate files are accessible to the MOLT binary on the same machine.

For example, a PostgreSQL connection string with TLS certificates:
```
postgresql://migration_user@db.example.com:5432/appdb?sslmode=verify-full&sslrootcert=/etc/migration_db/certs/ca.pem&sslcert=/etc/migration_db/certs/client.crt&sslkey=/etc/migration_db/certs/client.key
```
URL-encode connection strings for the source database and CockroachDB so special characters in passwords are handled correctly.
- Given a password a$52&, pass it to the molt escape-password command with single quotes:
```
molt escape-password --password 'a$52&'
```
  Use the encoded password in your connection string. For example:
```
postgres://migration_user:a%2452%26@localhost:5432/migration_db
```
Remove sslmode=disable from production connection strings.

Note:

By default, insecure connections (i.e., sslmode=disable on PostgreSQL; sslmode not set on MySQL) are disallowed. When using an insecure connection, molt fetch returns an error. To override this check, you can enable the --allow-tls-mode-disable flag. Do this only when testing, or if a secure SSL/TLS connection to the source or target database is not possible.

Cloud storage security

Ensure that access control is properly configured for Amazon S3, Google Cloud Storage, or Azure Blob Storage.

Amazon S3

Set the following environment variables in the terminal running molt fetch:
```
export AWS_REGION='us-east-1'
export AWS_SECRET_ACCESS_KEY='key'
export AWS_ACCESS_KEY_ID='id'
```
- To run molt fetch in a containerized environment (e.g., Docker), pass the required environment variables using -e. If your authentication method relies on local credential files, you may also need to volume map the host path to the appropriate location inside the container using -v. For example:
```
docker run \
  -e AWS_ACCESS_KEY_ID='your-access-key' \
  -e AWS_SECRET_ACCESS_KEY='your-secret-key' \
  -v ~/.aws:/root/.aws \
  -it \
  cockroachdb/molt fetch \
  --bucket-path 's3://migration/data/cockroach' ...
```
Alternatively, set --use-implicit-auth to use implicit authentication. When using assume role authentication, specify the service account with --assume-role. For example:
```
--bucket-path 's3://migration/data/cockroach'
--assume-role 'arn:aws:iam::123456789012:role/MyMigrationRole'
--use-implicit-auth
```
Set --import-region to specify an AWS_REGION (e.g., --import-region 'ap-south-1').
Ensure the S3 bucket is created and accessible by authorized roles and users only.

Google Cloud Storage

Authenticate your local environment with Application Default Credentials:

Using gcloud:
```
gcloud init
gcloud auth application-default login
```
Using the environment variable:
```
export GOOGLE_APPLICATION_CREDENTIALS={path_to_cred_json}
```
- To run molt fetch in a containerized environment (e.g., Docker), pass the required environment variables using -e. If your authentication method relies on local credential files, you may also need to volume map the host path to the appropriate location inside the container using -v. For example:
```
docker run \
  -e GOOGLE_APPLICATION_CREDENTIALS='/root/.config/gcloud/application_default_credentials.json' \
  -v ~/.config/gcloud:/root/.config/gcloud \
  -it \
  cockroachdb/molt fetch \
  --bucket-path 'gs://migration/data/cockroach' ...
```
Alternatively, set --use-implicit-auth to use implicit authentication. When using assume role authentication, specify the service account with --assume-role. For example:
```
--bucket-path 'gs://migration/data/cockroach
--use-implicit-auth
--assume-role 'user-test@cluster-ephemeral.iam.gserviceaccount.com'
```
Ensure the Google Cloud Storage bucket is created and accessible by authorized roles and users only.

Azure Blob Storage

Set the following environment variables in the terminal running molt fetch:
```
export AZURE_ACCOUNT_NAME='account'
export AZURE_ACCOUNT_KEY='key'
```
You can also speicfy client and tenant credentials as environment variables:
```
export AZURE_CLIENT_SECRET='secret'
export AZURE_TENANT_ID='id'
```
- To run molt fetch in a containerized environment (e.g., Docker), pass the required environment variables using -e. If your authentication method relies on local credential files, you may also need to volume map the host path to the appropriate location inside the container using -v. For example:
```
docker run \
  -e AZURE_ACCOUNT_NAME='account' \
  -e AZURE_ACCOUNT_KEY='key' \
  -e AZURE_CLIENT_SECRET='secret' \
  -e AZURE_TENANT_ID='id' \
  -v ~/.azure:/root/.azure \
  -it \
  cockroachdb/molt fetch \
  --bucket-path 'azure-blob://migration/data/cockroach' ...
```
Alternatively, set --use-implicit-auth to use implicit authentication: For example:
```
--bucket-path 'azure-blob://migration/data/cockroach'
--use-implicit-auth
```
This mode supports Azure managed identities and workload identities.
Ensure the Azure Blob Storage container is created and accessible by authorized roles and users only.

Common workflows

Bulk data load

To perform a bulk data load migration from your source database to CockroachDB, run the molt fetch command with the required flags.

Specify the source and target database connections. For connection string formats, refer to Source and target databases.

--source $SOURCE
--target $TARGET

For Oracle Multitenant (CDB/PDB) sources, also include --source-cdb to specify the container database (CDB) connection string.

--source $SOURCE
--source-cdb $SOURCE_CDB
--target $TARGET

Specify how to move data to CockroachDB. Use cloud storage for intermediate file storage:

--bucket-path 's3://bucket/path'

Alternatively, use a local file server for intermediate storage:

--local-path /migration/data/cockroach
--local-path-listen-addr 'localhost:3000'

Alternatively, use direct copy to move data directly without intermediate storage:

--direct-copy

Optionally, filter the source data to migrate. By default, all schemas and tables are migrated. For details, refer to Schema and table selection.

--schema-filter 'migration_schema'
--table-filter '.*user.*'

For Oracle sources, --schema-filter is case-insensitive. You can use either lowercase or uppercase:

--schema-filter 'migration_schema'
--table-filter '.*user.*'

For MySQL sources, omit --schema-filter because MySQL tables belong directly to the database specified in the connection string, not to a separate schema. If needed, use --table-filter to select specific tables:

--table-filter '.*user.*'

Specify how to handle target tables. By default, --table-handling is set to none, which loads data without changing existing data in the tables. For details, refer to Target table handling:

--table-handling truncate-if-exists

When performing a bulk load without subsequent replication, use --ignore-replication-check to skip querying for replication checkpoints (such as pg_current_wal_insert_lsn() on PostgreSQL, gtid_executed on MySQL, and CURRENT_SCN on Oracle). This is appropriate when:

Performing a one-time data migration with no plan to replicate ongoing changes.
Exporting data from a read replica where replication checkpoints are unavailable.

--ignore-replication-check

At minimum, the molt fetch command should include the source, target, data path, and --ignore-replication-check flags:

molt fetch \
--source $SOURCE \
--target $TARGET \
--bucket-path 's3://bucket/path' \
--ignore-replication-check

For detailed steps, refer to Bulk load migration.

Load before replication

To perform an initial data load before setting up ongoing replication with MOLT Replicator, run the molt fetch command without --ignore-replication-check. This captures replication checkpoints during the data load.

The workflow is the same as Bulk data load, except:

Exclude --ignore-replication-check. MOLT Fetch will query and record replication checkpoints.
You must include --pglogical-replication-slot-name and --pglogical-publication-and-slot-drop-and-recreate to automatically create the publication and replication slot during the data load.
After the data load completes, check the CDC cursor in the output for the checkpoint value to use with MOLT Replicator.

At minimum, the molt fetch command should include the source, target, and data path flags:

molt fetch \
--source $SOURCE \
--target $TARGET \
--bucket-path 's3://bucket/path' \
--pglogical-replication-slot-name molt_slot \
--pglogical-publication-and-slot-drop-and-recreate

molt fetch \
--source $SOURCE \
--target $TARGET \
--bucket-path 's3://bucket/path'

molt fetch \
--source $SOURCE \
--source-cdb $SOURCE_CDB \
--target $TARGET \
--bucket-path 's3://bucket/path'

The output will include a cdc_cursor value at the end of the fetch task:

{"level":"info","type":"summary","fetch_id":"735a4fe0-c478-4de7-a342-cfa9738783dc","num_tables":1,"tables":["public.employees"],"cdc_cursor":"b7f9e0fa-2753-1e1f-5d9b-2402ac810003:3-21","net_duration_ms":4879.890041,"net_duration":"000h 00m 04s","time":"2024-03-18T12:37:02-04:00","message":"fetch complete"}

Use this cdc_cursor value when starting MOLT Replicator to ensure replication begins from the correct position. For detailed steps, refer to Load and replicate.

Monitoring

Metrics

By default, MOLT Fetch exports Prometheus metrics at 127.0.0.1:3030/metrics. You can configure this endpoint with the --metrics-listen-addr flag.

Cockroach Labs recommends monitoring the following metrics:

Metric Name	Description
`molt_fetch_num_tables`	Number of tables that will be moved from the source.
`molt_fetch_num_task_errors`	Number of errors encountered by the fetch task.
`molt_fetch_overall_duration`	Duration (in seconds) of the fetch task.
`molt_fetch_rows_exported`	Number of rows that have been exported from a table. For example: `molt_fetch_rows_exported{table="public.users"}`
`molt_fetch_rows_imported`	Number of rows that have been imported from a table. For example: `molt_fetch_rows_imported{table="public.users"}`
`molt_fetch_table_export_duration_ms`	Duration (in milliseconds) of a table's export. For example: `molt_fetch_table_export_duration_ms{table="public.users"}`
`molt_fetch_table_import_duration_ms`	Duration (in milliseconds) of a table's import. For example: `molt_fetch_table_import_duration_ms{table="public.users"}`

You can also use the sample Grafana dashboard to view the preceding metrics.

Best practices

Test and validate

To verify that your connections and configuration work properly, run MOLT Fetch in a staging environment before migrating any data in production. Use a test or development environment that closely resembles production.

Configure the source database and connection

To prevent connections from terminating prematurely during the data export phase, set the following to high values on the source database:
- Maximum allowed number of connections. MOLT Fetch can export data across multiple connections. The number of connections it will create is the number of shards (--export-concurrency) multiplied by the number of tables (--table-concurrency) being exported concurrently.
  
  Note:
  
  With the default numerical range sharding, only tables with primary key types of INT, FLOAT, or UUID can be sharded. PostgreSQL users can enable --use-stats-based-sharding to use statistics-based sharding for tables with primary keys of any data type. For details, refer to Table sharding.
- Maximum lifetime of a connection.
If a PostgreSQL database is set as a source, ensure that idle_in_transaction_session_timeout on PostgreSQL is either disabled or set to a value longer than the duration of the data export phase. Otherwise, the connection will be prematurely terminated. To estimate the time needed to export the PostgreSQL tables, you can perform a dry run and sum the value of molt_fetch_table_export_duration_ms for all exported tables.

Optimize performance

To optimize data load performance, drop all non-PRIMARY KEY constraints and indexes on the target CockroachDB database before migrating:
- FOREIGN KEY
- UNIQUE
- Secondary indexes
- CHECK
- DEFAULT
- NOT NULL (you do not need to drop this constraint when using drop-on-target-and-recreate for table handling)
Warning:

Do not drop PRIMARY KEY constraints.

You can recreate constraints and indexes after loading the data.
For PostgreSQL sources using --use-stats-based-sharding, run ANALYZE on source tables before migration to ensure optimal shard distribution. This is especially important for large tables where even distribution can significantly improve export performance.
To prevent memory outages during READ COMMITTED data export of tables with large rows, estimate the amount of memory used to export a table:
```
--row-batch-size * --export-concurrency * average size of the table rows
```
If you are exporting more than one table at a time (i.e., --table-concurrency is set higher than 1), add the estimated memory usage for the tables with the largest row sizes. Ensure that you have sufficient memory to run molt fetch, and adjust --row-batch-size accordingly. For details on how concurrency and sharding interact, refer to Table sharding.
If a table in the source database is much larger than the other tables, filter and export the largest table in its own molt fetch task. Repeat this for each of the largest tables. Then export the remaining tables in another task.
Ensure that the machine running MOLT Fetch is large enough to handle the amount of data being migrated. Fetch performance can sometimes be limited by available resources, but should always be making progress. To identify possible resource constraints, observe the molt_fetch_rows_exported metric for decreases in the number of rows being processed. You can use the sample Grafana dashboard to view metrics. For details on optimizing export performance through sharding, refer to Table sharding.

Import and continuation handling

When using IMPORT INTO during the data import phase to load tables into CockroachDB, if the fetch task terminates before the import job completes, the hanging import job on the target database will keep the table offline. To make this table accessible again, manually resume or cancel the job. Then resume molt fetch using continuation, or restart the task from the beginning.

Troubleshooting

Fetch issues

Fetch exits early due to mismatches

When run in none or truncate-if-exists mode, molt fetch exits early in the following cases, and will output a log with a corresponding mismatch_tag and failable_mismatch set to true:

A source table is missing a primary key.
A source primary key and target primary key have mismatching types.
Tip:
These restrictions (missing or mismatching primary keys) can be bypassed with --skip-pk-check.
A STRING primary key has a different collation on the source and target.
A source and target column have mismatching types that are not allowable mappings.
A target table is missing a column that is in the corresponding source table.
A source column is nullable, but the corresponding target column is not nullable (i.e., the constraint is more strict on the target).

molt fetch can continue in the following cases, and will output a log with a corresponding mismatch_tag and failable_mismatch set to false:

A target table has a column that is not in the corresponding source table.
A source column has a NOT NULL constraint, and the corresponding target column is nullable (i.e., the constraint is less strict on the target).
A DEFAULT, CHECK, FOREIGN KEY, or UNIQUE constraint is specified on a target column and not on the source column.

Failed to export snapshot: no rows in result set

failed to export snapshot: please ensure that you have GTID-based replication enabled: sql: no rows in result set

This typically occurs on a new MySQL cluster that has not had any writes committed. The GTID set will not appear in SHOW MASTER STATUS until at least one transaction has been committed on the database.

Resolution: Execute a minimal transaction to initialize the GTID set:

START TRANSACTION;
SELECT 1;
COMMIT;

Verify that the GTID set now appears:

SHOW MASTER STATUS;

This should return a valid GTID value instead of an empty result.

ORA-01950: no privileges on tablespace

If you receive ORA-01950: no privileges on tablespace 'USERS', it means the Oracle table owner (migration_schema in the preceding examples) does not have sufficient quota on the tablespace used to store its data. By default, this tablespace is USERS, but it can vary. To resolve this issue, grant a quota to the table owner. For example:

-- change UNLIMITED to a suitable limit for the table owner
ALTER USER migration_schema QUOTA UNLIMITED ON USERS;

No tables to drop and recreate on target

When expecting a bulk load but seeing no tables to drop and recreate on the target, ensure the migration user has SELECT and FLASHBACK privileges on each table to be migrated. For example:

GRANT SELECT, FLASHBACK ON migration_schema.employees TO C##MIGRATION_USER;
GRANT SELECT, FLASHBACK ON migration_schema.payments TO C##MIGRATION_USER;
GRANT SELECT, FLASHBACK ON migration_schema.orders TO C##MIGRATION_USER;

Table or view does not exist

If the Oracle migration user lacks privileges on certain tables, you may receive errors stating that the table or view does not exist. Either use --table-filter to limit the tables to be migrated, or grant the migration user SELECT privileges on all objects in the schema. Refer to Create migration user on source database.

Oracle sessions remain open after forcefully stopping `molt` or `replicator`

If you shut down molt or replicator unexpectedly (e.g., with kill -9 or a system crash), Oracle sessions opened by these tools may remain active.

Check your operating system for any running molt or replicator processes and terminate them manually.
After confirming that both processes have stopped, ask a DBA to check for active Oracle sessions using:
```
SELECT sid, serial#, username, status, osuser, machine, program
FROM v$session
WHERE username = 'C##MIGRATION_USER';
```
Wait until any remaining sessions display an INACTIVE status, then terminate them using:
```
ALTER SYSTEM KILL SESSION 'sid,serial#' IMMEDIATE;
```
Replace sid and serial# in the preceding statement with the values returned by the SELECT query.

MOLT Fetch

Terminology

Prerequisites

Supported databases

Database configuration

User permissions

Installation

Docker images

MOLT Fetch

Docker usage

Performance

Authentication

Google Cloud Storage

Amazon S3

Local connection strings

Migration phases

Data export phase

Data import phase

Commands

Subcommands

Flags

Global flags

tokens list flags

Usage

Source and target databases

Fetch mode

Data load mode

Table sharding

Data path

Bucket path

Local path

Direct copy

Schema and table selection

Selective data movement

Target table handling

Mismatch handling

Skip primary key matching

Type mapping

Transformations

Transformation rules example

Column exclusions and computed columns

Table renaming

Schema renaming

General notes

Fetch continuation

List active continuation tokens

CDC cursor

Security

Connection security

Cloud storage security

Amazon S3

Google Cloud Storage

Azure Blob Storage

Common workflows

Bulk data load

Load before replication

Monitoring

Metrics

Best practices

Test and validate

Configure the source database and connection

Optimize performance

Import and continuation handling

Troubleshooting

Fetch issues

Fetch exits early due to mismatches

Failed to export snapshot: no rows in result set

ORA-01950: no privileges on tablespace

No tables to drop and recreate on target

Table or view does not exist

Oracle sessions remain open after forcefully stopping molt or replicator

See also

Tell us about your experience

Select the problem area

Thank you for your feedback!

Explore More Documentation:

`tokens list` flags

Oracle sessions remain open after forcefully stopping `molt` or `replicator`