MOLT Replicator Best Practices

On this page Carat arrow pointing down

This page describes best practices for using MOLT Replicator to ensure reliable, secure, and performant data migration to CockroachDB.

Test and validate

To verify that your connections and configuration work properly, run MOLT Replicator in a staging environment before replicating any data in production. Use a test or development environment that closely resembles production.

Optimize performance

Configure the following replicator flags to optimize replication throughput and resource usage. Test different combinations in a pre-production environment to find the optimal balance of stability and performance for your workload.

Note:

The following parameters apply to PostgreSQL, Oracle, and CockroachDB (failback) sources.

Flag Description
--parallelism Control the maximum number of concurrent target transactions. Higher values increase throughput but require more target connections. Start with a conservative value and increase based on target database capacity.
--flushSize Balance throughput and latency. Controls how many mutations are batched into each query to the target. Increase for higher throughput at the cost of higher latency.
--targetApplyQueueSize Control memory usage during operation. Increase to allow higher throughput at the expense of memory; decrease to apply backpressure and limit memory consumption.
--targetMaxPoolSize Set larger than --parallelism by a safety factor to avoid exhausting target pool connections. Replicator enforces setting parallelism to 80% of this value.
--collapseMutations Reduce the number of queries to the target by combining multiple mutations on the same primary key within each batch. Disable only if exact mutation order matters more than end state.
--enableParallelApplies Improve apply throughput for independent tables and table groups that share foreign key dependencies. Increases memory and target connection usage, so ensure you increase --targetMaxPoolSize or reduce --parallelism.
--flushPeriod Set to the maximum allowable time between flushes (for example, 10s if data must be applied within 10 seconds). Works with --flushSize to control when buffered mutations are committed to the target.
--quiescentPeriod Lower this value if constraint violations resolve quickly on your workload to make retries more frequent and reduce latency. Do not lower if constraint violations take time to resolve.
--scanSize Applies to failback (replicator start) scenarios only. Balance memory usage and throughput. Increase to read more rows at once from the CockroachDB staging cluster for higher throughput, at the cost of memory pressure. Decrease to reduce memory pressure and increase stability.

Docker performance

MOLT Replicator is likely to run more slowly in a Docker container than on a local machine. To improve performance, increase the memory or compute resources, or both, on your Docker container.

Security

Cockroach Labs strongly recommends the following:

Connection security and credentials

  • To keep your database credentials out of shell history and logs, follow these best practices when specifying your source and target connection strings:

    • Avoid plaintext connection strings.
    • Provide your connection strings as environment variables. For example:

      export SOURCE="postgres://migration_user:a%2452%26@localhost:5432/migration_db?sslmode=verify-full"
      export TARGET="postgres://root@localhost:26257/defaultdb?sslmode=verify-full"
      

      Afterward, reference the environment variables in MOLT commands:

      --source $SOURCE
      --target $TARGET
      
    • If possible, use an external secrets manager to load the environment variables from stored secrets.

  • Use TLS-enabled connection strings to encrypt data in transit from MOLT to the database. When using TLS certificates, ensure certificate files are accessible to the MOLT binary on the same machine.

    For example, a PostgreSQL connection string with TLS certificates:

    icon/buttons/copy
    postgresql://migration_user@db.example.com:5432/appdb?sslmode=verify-full&sslrootcert=/etc/migration_db/certs/ca.pem&sslcert=/etc/migration_db/certs/client.crt&sslkey=/etc/migration_db/certs/client.key
    
  • URL-encode connection strings for the source database and CockroachDB so special characters in passwords are handled correctly.

    • Given a password a$52&, pass it to the molt escape-password command with single quotes:

      icon/buttons/copy
      molt escape-password --password 'a$52&'
      

      Use the encoded password in your connection string. For example:

      postgres://migration_user:a%2452%26@localhost:5432/migration_db
      
  • Remove sslmode=disable from production connection strings.

CockroachDB changefeed security

For failback scenarios, secure the connection from CockroachDB to MOLT Replicator using TLS certificates. Generate TLS certificates using self-signed certificates, certificate authorities like Let's Encrypt, or your organization's certificate management system.

TLS from CockroachDB to Replicator

Configure MOLT Replicator with server certificates using the --tlsCertificate and --tlsPrivateKey flags to specify the certificate and private key file paths. For example:

icon/buttons/copy
replicator start \
--tlsCertificate ./certs/server.crt \
--tlsPrivateKey ./certs/server.key \
...

These server certificates must correspond to the client certificates specified in the changefeed webhook URL to ensure proper TLS handshake.

Encode client certificates for changefeed webhook URLs:

  • Webhook URLs: Use both URL encoding and base64 encoding: base64 -i ./client.crt | jq -R -r '@uri'
  • Non-webhook contexts: Use base64 encoding only: base64 -w 0 ca.cert

JWT authentication

You can use JSON Web Tokens (JWT) to authorize incoming changefeed connections and restrict writes to a subset of SQL databases or user-defined schemas in the target cluster.

Replicator accepts any JWT token that meets the following requirements:

  • Tokens must be signed using RSA or EC keys. HMAC and None signatures are automatically rejected.
  • Tokens must include a jti (JWT ID) claim for revocation support.
  • Tokens must include a custom claim with the schema authorization list.
Tip:

You can generate tokens using the make-jwt command.

To configure JWT authentication:

  1. Add PEM-formatted public signing keys to the _replicator.jwt_public_keys table in the staging database.

  2. To revoke a specific token, add its jti value to the _replicator.jwt_revoked_ids table in the staging database.

The Replicator process re-reads these tables every minute to pick up changes.

To pass the JWT token from the changefeed to the Replicator webhook sink, use the webhook_auth_header option:

icon/buttons/copy
CREATE CHANGEFEED ... WITH webhook_auth_header='Bearer <encoded_token>';
Generate JWT tokens

The make-jwt command generates JWT tokens or claims for authorizing changefeed connections. It requires a signing key (-k) and the database or schema to authorize (-a). You can output a signed token to a file (-o) or generate an unsigned claim (--claim) for signing with an external JWT provider.

The format of the -a argument depends on your target database. For CockroachDB and PostgreSQL, which have a schema concept, use the database.schema format:

icon/buttons/copy
replicator make-jwt -k ec.key -a database_name.schema_name -o out.jwt

For MySQL and Oracle, which do not have a schema concept, use only the database name:

icon/buttons/copy
replicator make-jwt -k ec.key -a database_name -o out.jwt
Tip:

You can repeat the -a flag to authorize multiple schemas.

Token quickstart

The following example uses OpenSSL to generate keys, but any PEM-encoded RSA or EC keys will work. When using this example, ensure the -a argument format matches your target database as specified in Generate JWT tokens.

icon/buttons/copy
# Generate an EC private key using OpenSSL.
openssl ecparam -out ec.key -genkey -name prime256v1

# Write the public key components to a separate file.
openssl ec -in ec.key -pubout -out ec.pub

# Upload the public key for all instances of Replicator to find it.
cockroach sql -e "INSERT INTO _replicator.jwt_public_keys (public_key) VALUES ('$(cat ec.pub)')"

# Reload configuration, or wait one minute.
killall -HUP replicator

# Generate a token which can write to the ycsb.public schema.
# The key can be decoded using the debugger at https://jwt.io.
# Add the contents of out.jwt to the CREATE CHANGEFEED command:
# WITH webhook_auth_header='Bearer {out.jwt}'
replicator make-jwt -k ec.key -a ycsb.public -o out.jwt
External JWT providers

To use an external JWT provider, generate a claim with the --claim flag. The PEM-formatted public key or keys for that provider must be inserted into the _replicator.jwt_public_keys table. The iss (issuers) and jti (token id) fields will likely be specific to your auth provider, but the custom claim must be retained in its entirety:

icon/buttons/copy
replicator make-jwt -a 'database.schema' --claim
{
  "iss": "replicator",
  "jti": "d5ffa211-8d54-424b-819a-bc19af9202a5",
  "https://github.com/cockroachdb/replicator": {
    "schemas": [
      [
        "database",
        "schema"
      ]
    ]
  }
}

Production considerations

Supply chain security

Use the version command to verify the integrity of your MOLT Replicator build and identify potential upstream vulnerabilities.

icon/buttons/copy
replicator version

The output includes:

  • Module name
  • go.mod checksum
  • Version

Use this information to determine if your build may be subject to vulnerabilities from upstream packages. Cockroach Labs uses Dependabot to automatically upgrade Go modules, and the team regularly merges Dependabot updates to address security issues.

See also

×