Why we need a multi-cloud database, and how to build one

In our State of Multi-Cloud 2024 report, we were a bit surprised to find that half of our respondents said their companies were already multi-cloud, and of those, half had already begun working with complex multi-cloud deployment patterns like deploying a single workload across multiple clouds.

What are the benefits of that sort of complex deployment, and how can it actually be achieved in the real world to bring the advantages of multi-cloud to mission-critical workloads like your transactional database workloads?

In this post, we will cover what multi-cloud architecture is and why it matters. We will also walk through a working example of deploying CockroachDB across three different cloud providers using Kubernetes and network VPNs. This is not an easy task, so the example should help get you on the road to a true multi-cloud database that can support your multi-cloud application.

What is multi-cloud?

Multi-cloud usually refers to using more than one cloud to deliver an application. Doing this lets organizations take advantage of the best-in-class services of each cloud provider, rather than having to commit to just one. A multi-cloud deployment could span several public cloud providers, several private clouds, or a mix of both.

With a multi-cloud application you can:

  • Power a single application with data stored across multiple clouds.
  • Use data that is created in one cloud to perform analysis in another cloud without having to manage or maintain manual data movement.
  • Enhance the mobility of applications by being able to move them from one cloud to another.

[Figure: multi-cloud database architecture diagram. CockroachDB runs across multiple clouds while still working like a single logical database from a developer perspective.]

Why is multi-cloud important?

Putting all your eggs in one basket with a single cloud provider is risky. No one is immune to outages, and that includes the big cloud providers. By spreading the risk across multiple cloud providers, you reduce the chance of an outage affecting your customers.

Also, each cloud provider has its forte: the services that it does best. By spreading your data across cloud providers, you can take advantage of the best services each cloud offers (like analytics services in GCP, for example).

Manage your own destiny

Having a multi-cloud strategy also gives you the flexibility to migrate your applications between clouds with relative ease. Historically, cross-cloud migration has been challenging, particularly when cloud-provider-specific services were involved. Cloud-agnostic solutions like CockroachDB make moving applications simpler: CockroachDB’s built-in data distribution makes the data available in all locations, while the cluster still functions like a single logical database from a developer perspective.

Multi-cloud can thus prevent vendor lock-in and give you the power to do things like take advantage of the best prices and services, regardless of which cloud is offering them.

Avoid cloud concentration risk

When organizations rely on a single cloud, they become dependent on that provider. This is a particular concern for financial institutions and other critical national infrastructure, where an outage at that single provider could take down services that society depends on.

Regulators of critical industries are increasingly nervous about the risks presented by this type of cloud concentration. To avoid the possibility of critical infrastructure being impacted by a cloud provider outage or other disruption, you need to adopt a multi-cloud strategy, with your infrastructure and application deployed across two or more cloud providers.

Of course, actually doing that can be a challenge, especially when we think of how to distribute the data.

How does CockroachDB enable multi-cloud?

Replication across cloud providers: CockroachDB automatically replicates data across multiple cloud providers. This means that you can run the nodes of a single CockroachDB cluster on different cloud platforms, and data is automatically kept in sync between them to maintain consistency and availability.

Global data distribution: CockroachDB supports row-level geo-replication, enabling data to be distributed across specific geographical regions hosted by different cloud providers. This enables you to place data closer to end-users, reducing latency and improving the application’s performance.

Active-active deployments: With CockroachDB’s distributed architecture and data replication capabilities, you can set up active-active deployments across different cloud providers. In an active-active setup, read and write operations can be handled by nodes in every cloud simultaneously, offering better load distribution and fault tolerance.

Failover and disaster recovery: By deploying CockroachDB nodes across multiple cloud providers, you can implement effective failover and disaster recovery strategies. If one cloud provider experiences an outage or becomes unavailable, the application can automatically fail over to another cloud provider where CockroachDB is running.
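The failover claim can be checked with a little arithmetic. CockroachDB replicates data using a consensus protocol, so a piece of data stays available as long as a majority of its replicas can still be reached. The sketch below is an illustration only: the placement (one replica per cloud) and the function names are assumptions made for the example, not part of CockroachDB's API.

```python
def quorum(replicas: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replicas // 2 + 1


def survives_cloud_outage(replicas_per_cloud: dict[str, int], failed_cloud: str) -> bool:
    """True if the replicas outside failed_cloud still form a majority."""
    total = sum(replicas_per_cloud.values())
    surviving = total - replicas_per_cloud.get(failed_cloud, 0)
    return surviving >= quorum(total)


# Hypothetical placement: three replicas, one in each cloud.
placement = {"aws": 1, "azure": 1, "gcp": 1}

for cloud in placement:
    print(cloud, "outage survivable:", survives_cloud_outage(placement, cloud))
```

With one replica in each of three clouds, losing any single cloud leaves two of three replicas, which is still a majority. Placing two of the three replicas in the same cloud would not survive an outage of that cloud.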

Data sovereignty and compliance: Multi-cloud deployments can help organizations adhere to data sovereignty regulations that require certain data to be stored within specific geographic regions. CockroachDB’s ability to replicate data across clouds while maintaining strong consistency facilitates compliance with such regulations.

Vendor lock-in mitigation: Using multiple cloud providers reduces the risk of vendor lock-in. Organizations can take advantage of competitive pricing, unique features, and specialized services from different cloud providers without being tied to a single vendor.

Load balancing and performance optimization: CockroachDB’s automatic load balancing ensures that data and workloads are evenly distributed across the multi-cloud environment, maximizing resource utilization and maintaining optimal performance.

Security and compliance: CockroachDB provides built-in security features, such as encryption at rest and in transit, ensuring that data remains secure and compliant with regulatory requirements, even in multi-cloud setups.

By combining these multi-cloud capabilities, CockroachDB allows developers and organizations to create highly available, fault-tolerant, and performant applications that can span multiple cloud providers. It provides the flexibility to leverage the strengths of different cloud platforms while mitigating the risks associated with relying solely on one provider.

How to create a multi-cloud SQL database

In this GitHub repo, we have created a working example of a multi-cloud CockroachDB cluster. A Kubernetes cluster is created in each of the three cloud providers using their managed Kubernetes services. The clusters are then connected using VPN devices, and CockroachDB is deployed across all three cloud providers.

Sounds simple, right? However, there are a number of considerations to take into account when deploying such a solution.

Networking

With any infrastructure solution, IP addressing is important. When designing your multi-cloud solution, ensure that no address space overlaps across any of the cloud networks or any of the pod networks in the Kubernetes clusters. This keeps routing simple and allows traffic to flow seamlessly across clouds. If address spaces overlap, Network Address Translation (NAT) has to take place, which adds a lot of complexity and makes the solution hard to manage and administer.
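An address plan can be checked for overlaps before anything is provisioned. The following is a minimal sketch using Python's standard ipaddress module; the network names and CIDR ranges are hypothetical and are not taken from the example repo.

```python
import ipaddress
from itertools import combinations


def find_overlaps(cidrs: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of named networks whose address ranges overlap."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    return [
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(networks.items(), 2)
        if net_a.overlaps(net_b)
    ]


# Hypothetical plan: one node network and one pod network per cloud.
plan = {
    "aws-nodes": "10.0.0.0/16",
    "aws-pods": "10.1.0.0/16",
    "azure-nodes": "10.2.0.0/16",
    "azure-pods": "10.3.0.0/16",
    "gcp-nodes": "10.4.0.0/16",
    "gcp-pods": "10.5.0.0/16",
}

overlaps = find_overlaps(plan)
print("overlapping networks:", overlaps if overlaps else "none")
```

Running a check like this against the real plan, including the Kubernetes service ranges if they need to be routable, catches a clash before it turns into a NAT problem.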

Connectivity

All of the nodes that make up your CockroachDB cluster need to be able to communicate with each other. Clouds can be joined using Virtual Private Networks (VPNs), encrypted tunnels over the internet that connect two or more local networks. Other more private and resilient networking solutions are available, but these come at a premium from a cost perspective. For this reason, in the demo code we will stick to VPNs. These are deployed in Step 2 and Step 3.

Another consideration is the cost of networking. Cloud providers tend to charge more for data leaving their cloud, which encourages you to keep all of your services with them. If your workload dictates a multi-cloud strategy, these egress costs must be factored into your plans and budgets.
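A rough budget estimate is simple arithmetic: traffic leaving each cloud per day, multiplied by the price per GB and the number of days. The sketch below shows the calculation only. The cloud names, traffic volumes, and prices are placeholders, not quoted rates from any provider.

```python
def monthly_egress_cost(gb_per_day: float, price_per_gb: float, days: int = 30) -> float:
    """Estimated cost of data leaving one cloud over a billing period."""
    if gb_per_day < 0 or price_per_gb < 0:
        raise ValueError("traffic and price must be non-negative")
    return gb_per_day * price_per_gb * days


# Placeholder inputs: (GB leaving the cloud per day, price per GB).
# Replace both with measured traffic and your providers' published rates.
estimates = {
    "cloud-a": (200.0, 0.09),
    "cloud-b": (150.0, 0.08),
    "cloud-c": (100.0, 0.12),
}

for cloud, (gb_per_day, price_per_gb) in estimates.items():
    cost = monthly_egress_cost(gb_per_day, price_per_gb)
    print(f"{cloud}: {cost:.2f} per month")
```

In a replicated database, every write is copied to nodes in the other clouds, so write volume is a reasonable starting point for the traffic figure.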

Name resolution

Along with network connectivity, name resolution is also important. CockroachDB nodes running in one cloud need to be able to resolve the names of the nodes running in the other clouds. In Kubernetes there are generally two DNS solutions: CoreDNS and kube-dns. AKS and EKS use CoreDNS, but GKE uses kube-dns. This makes a single DNS solution tricky when deploying across all three Kubernetes clusters.

The solution to this is to replace kube-dns with CoreDNS on GKE. You can follow the instructions in Step 5. Doing this gives us a unified solution for DNS across all three clusters, enabling cross-cluster name resolution.
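With CoreDNS running in every cluster, cross-cluster resolution usually comes down to forwarding each remote cluster's DNS zone to that cluster's DNS servers. The sketch below renders such a Corefile server block using the CoreDNS errors, cache, and forward plugins. It is an assumption about what the configuration could look like, not the repo's actual configuration, and the zone names and IP addresses are hypothetical.

```python
def forward_block(zone: str, dns_ips: list[str], cache_seconds: int = 30) -> str:
    """Render a Corefile server block that forwards one zone to remote DNS servers."""
    if not dns_ips:
        raise ValueError("at least one upstream DNS address is required")
    upstreams = " ".join(dns_ips)
    return (
        f"{zone}:53 {{\n"
        f"    errors\n"
        f"    cache {cache_seconds}\n"
        f"    forward . {upstreams}\n"
        f"}}\n"
    )


# Hypothetical zones: one namespace per cloud, each forwarded to the DNS
# endpoint exposed by that cloud's cluster over the VPN.
remote_zones = {
    "azure.svc.cluster.local": ["10.2.0.10"],
    "gcp.svc.cluster.local": ["10.4.0.10"],
}

for zone, ips in remote_zones.items():
    print(forward_block(zone, ips))
```

Each cluster would carry blocks like these for the other two clouds, alongside its own default server block.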

Deploying multi-region CockroachDB in Kubernetes

To deploy CockroachDB in a multi-region setup, manifests are the recommended approach, as documented in Step 5. CockroachDB is best deployed in a secure configuration so that all traffic between nodes and clients is encrypted using TLS. To configure this, use the cockroach binary, which can create a certificate authority (CA) and issue certificates signed by it. These certificates need to be shared across all the nodes in the cluster, so they must be added as secrets in all three Kubernetes clusters. This ensures that all the nodes trust each other and can communicate with each other.
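The certificate flow can be sketched as a short list of commands: create the CA, issue a client certificate for the root user, then load it as a secret into each cluster. The sketch below only builds and prints the commands, it does not run them. The cockroach cert and kubectl create secret commands are real, but the kubectl context names, directory paths, and secret name are assumptions for illustration, and the secret name must match whatever your manifests mount. Node certificates, which list each region's hostnames, are omitted for brevity.

```python
import shlex


def cert_commands(
    contexts: list[str],
    certs_dir: str = "certs",
    ca_key: str = "my-safe-directory/ca.key",
) -> list[list[str]]:
    """Build the commands that create a CA and client cert and share them as secrets."""
    commands = [
        ["cockroach", "cert", "create-ca",
         f"--certs-dir={certs_dir}", f"--ca-key={ca_key}"],
        ["cockroach", "cert", "create-client", "root",
         f"--certs-dir={certs_dir}", f"--ca-key={ca_key}"],
    ]
    # One secret per Kubernetes cluster, so every cluster trusts the same CA.
    for context in contexts:
        commands.append(
            ["kubectl", "create", "secret", "generic", "cockroachdb.client.root",
             f"--from-file={certs_dir}", f"--context={context}"]
        )
    return commands


# Hypothetical kubectl context names for the three managed clusters.
for command in cert_commands(["eks-cluster", "aks-cluster", "gke-cluster"]):
    print(shlex.join(command))
```

Keeping the CA key in a separate directory from the issued certificates, as the two path arguments suggest, means the key never ends up inside a Kubernetes secret.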

Follow the remaining steps to deploy CockroachDB across the three Kubernetes clusters.

Additional multi-cloud considerations

No cloud provider is immune from outages. We have seen this in recent months with both AWS and GCP suffering major outages. If you are running mission-critical applications that require a high level of resilience, with CockroachDB you are able to create true multi-cloud applications that have a common data plane extending beyond the traditional boundaries of a single cloud, protecting your workload in the event of a cloud outage.

By adopting a multi-cloud strategy and using CockroachDB, you take back control over where to run your application. Whether you are moving from on-prem to cloud or between clouds, and whether your goals are avoiding vendor lock-in, taking advantage of best-in-class services at different clouds, maintaining regulatory compliance, increasing resilience, or all of the above, CockroachDB makes it possible.

That said, multi-cloud isn’t the right solution for every company or every workload. Considerations such as the cost of data transfers between clouds and the additional complexity of such a solution may rule it out as an option in some cases. Still, the many benefits of multi-cloud applications mean that in many cases, they can provide you with the competitive advantage you are looking for to stay ahead of the pack.
