Apache Kafka is a cornerstone of modern distributed systems, enabling reliable, high-throughput messaging between applications. However, as organizations scale and adopt geographically distributed systems, the need to replicate Kafka data across clusters becomes critical. This is where Cross-Cluster Replication (CCR) with Confluent Kafka comes into play. In this blog, we’ll explore the need for CCR, its use cases, and how to set it up with code examples.
Why Cross-Cluster Replication?
Cross-Cluster Replication allows Kafka topics from one cluster to be replicated in another, ensuring data consistency and availability across geographically distributed environments. Here are some key scenarios where CCR is invaluable:
- Disaster Recovery: Protect your data from regional failures by maintaining a replica in another cluster.
- Geo-Proximity: Serve low-latency data to consumers located in different regions by replicating data closer to them.
- Cloud Migration: Facilitate seamless migration from on-premise clusters to cloud-based ones.
- Load Balancing: Distribute workloads by replicating data to clusters closer to specific applications.
- Compliance and Archiving: Meet compliance requirements by keeping backups in different jurisdictions.
How Cross-Cluster Replication Works

Confluent’s Cross-Cluster Replication is powered by Confluent Replicator or MirrorMaker 2. Both tools replicate Kafka topics by building on the Kafka Connect framework.
- Confluent Replicator: A commercial offering optimized for ease of use and operational stability.
- MirrorMaker 2 (MM2): An open-source alternative that ships with Apache Kafka, covering core replication needs and supporting advanced topologies such as active-active replication.
Setting Up Cross-Cluster Replication with Confluent Replicator
Step 1: Prerequisites
- Kafka Clusters: Ensure you have two Kafka clusters running (source and destination).
- Confluent Platform: Install Confluent Platform on both clusters.
- Connectivity: Ensure network connectivity between the source and destination clusters.
- Access Control: Set up authentication and authorization (e.g., SASL/SSL).
Step 2: Install Confluent Replicator Connector
Confluent Replicator is deployed as a Kafka Connect source connector. Download the Confluent Platform, then start a Kafka Connect worker in distributed mode. Replicator is typically run on a Connect cluster co-located with the destination, so that it pulls data from the source cluster over the network.
# Start a Kafka Connect worker (here, on the destination cluster)
bin/connect-distributed.sh config/connect-distributed.properties
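The worker itself is configured through connect-distributed.properties. As a sketch, the settings most relevant to running Replicator look like the following (the group id and plugin path are assumptions for this example):

```properties
# File: config/connect-distributed.properties (excerpt)
# The Connect worker stores its own state in the destination cluster
bootstrap.servers=destination-cluster:9092
# All workers sharing this id form one Connect cluster
group.id=replicator-connect-cluster
# Replicator copies raw bytes, so byte-array converters are a safe default
key.converter=org.apache.kafka.connect.converters.ByteArrayConverter
value.converter=org.apache.kafka.connect.converters.ByteArrayConverter
# Directory containing the Replicator connector jars (assumed path)
plugin.path=/usr/share/java,/usr/share/confluent-hub-components
```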
Step 3: Configure the Replicator
Create a JSON configuration file for the Replicator connector; the Kafka Connect REST API accepts JSON rather than a .properties file. Below is an example configuration for replicating a topic named source-topic.
# File: replicator.json
{
  "name": "replicator",
  "config": {
    "connector.class": "io.confluent.connect.replicator.ReplicatorSourceConnector",
    "src.kafka.bootstrap.servers": "source-cluster:9092",
    "src.kafka.security.protocol": "SASL_SSL",
    "src.kafka.sasl.mechanism": "PLAIN",
    "src.kafka.sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"source-user\" password=\"source-password\";",
    "dest.kafka.bootstrap.servers": "destination-cluster:9092",
    "dest.kafka.security.protocol": "SASL_SSL",
    "dest.kafka.sasl.mechanism": "PLAIN",
    "dest.kafka.sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"dest-user\" password=\"dest-password\";",
    "topic.whitelist": "source-topic",
    "topic.replication.factor": "3",
    "key.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
    "value.converter": "org.apache.kafka.connect.converters.ByteArrayConverter"
  }
}
Deploy the connector by posting the configuration to the Kafka Connect REST API on the destination cluster:
curl -X POST -H "Content-Type: application/json" --data @replicator.json http://destination-cluster:8083/connectors
Monitoring the Replication
Once the Replicator is running, you can monitor its status using the Kafka Connect REST API:
curl -X GET http://destination-cluster:8083/connectors/replicator/status
You should see the replication status for the connector, including task states and any errors.
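To automate health checks, the status JSON returned by that endpoint can be inspected programmatically. Here is a minimal sketch in Python; the helper names are hypothetical, but the response shape matches the Kafka Connect status API:

```python
def failed_tasks(status: dict) -> list:
    """Return (task_id, trace) pairs for tasks that are not RUNNING.

    `status` is the parsed JSON body of GET /connectors/<name>/status,
    which has the shape:
      {"name": "replicator",
       "connector": {"state": "RUNNING", "worker_id": "..."},
       "tasks": [{"id": 0, "state": "FAILED", "trace": "..."}]}
    """
    return [(t["id"], t.get("trace", ""))
            for t in status.get("tasks", [])
            if t.get("state") != "RUNNING"]


def is_healthy(status: dict) -> bool:
    """The connector and every one of its tasks must report RUNNING."""
    return (status.get("connector", {}).get("state") == "RUNNING"
            and not failed_tasks(status))
```

In practice you would fetch the status with any HTTP client (for example, urllib.request against the /connectors/replicator/status endpoint) and alert an operator whenever is_healthy returns False.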
Using MirrorMaker 2 for Cross-Cluster Replication
If you prefer an open-source solution, MirrorMaker 2 is a great alternative. Below is a guide to setting it up.
Step 1: Configure MirrorMaker 2
MirrorMaker 2 uses a properties file for configuration. Here’s an example to replicate source-topic:
# File: mm2.properties
# Cluster aliases
clusters=source,target
# Source cluster configuration
source.bootstrap.servers=source-cluster:9092
source.security.protocol=SASL_SSL
source.sasl.mechanism=PLAIN
source.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="source-user" password="source-password";
# Target cluster configuration
target.bootstrap.servers=destination-cluster:9092
target.security.protocol=SASL_SSL
target.sasl.mechanism=PLAIN
target.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="dest-user" password="dest-password";
# Replication flow: enable source -> target and pick the topics to mirror
source->target.enabled=true
source->target.topics=source-topic
sync.topic.acls.enabled=true
Step 2: Start MirrorMaker 2
Run MirrorMaker 2 with the configuration file. Note that under MirrorMaker 2’s default replication policy, replicated topics are created on the target cluster with the source alias as a prefix (for example, source.source-topic).
bin/connect-mirror-maker.sh mm2.properties
Advanced Use Cases and Configurations
1. Active-Active Clusters
For environments that require bi-directional replication (e.g., active-active setups), configure both clusters to mirror each other. Be mindful of potential data conflicts.
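With MirrorMaker 2, an active-active setup is simply two enabled flows in one properties file. A minimal sketch, reusing the cluster aliases from the earlier example (connection settings omitted):

```properties
# File: mm2-active-active.properties (excerpt)
clusters=source,target
# Mirror in both directions
source->target.enabled=true
source->target.topics=.*
target->source.enabled=true
target->source.topics=.*
```

Because the default replication policy renames remote topics with the origin alias as a prefix, each cluster keeps its local writes and its mirrored copies in separate topics, which avoids infinite replication loops.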
2. Selective Topic Replication
You can use a regular expression to replicate only matching topics. With MirrorMaker 2, set it on the replication flow:
source->target.topics=source-topic-.*
Confluent Replicator offers the equivalent topic.regex parameter.
3. Handling Schema Changes
If you’re using Confluent Schema Registry, ensure it is reachable from both clusters, and point the connector’s converters at it (for example, via value.converter.schema.registry.url when using the Avro converter).
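For illustration, wiring a Connect converter to Schema Registry looks like the following (the registry hostname is an assumption for this example):

```properties
# Use the Avro converter backed by Schema Registry instead of raw bytes
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://schema-registry:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://schema-registry:8081
```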
Common Challenges
- Network Latency: Replication is asynchronous, so WAN latency and constrained bandwidth between clusters translate directly into replication lag; monitor lag and provision network capacity accordingly.
- Data Conflicts: For bi-directional replication, implement strategies to handle conflicting writes.
- Scaling: Monitor and scale Kafka Connect workers as needed to handle large replication loads.
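One way to quantify replication lag is to compare the source topic’s end offsets with the offsets the replication pipeline has reached. A minimal sketch of the arithmetic in Python (the function name is hypothetical, and in practice both offset maps would come from a Kafka admin or consumer client):

```python
def replication_lag(source_end_offsets: dict, replicated_offsets: dict) -> dict:
    """Per-partition lag: messages on the source not yet replicated.

    Both arguments map partition number -> offset. Partitions absent
    from `replicated_offsets` count as fully unreplicated (offset 0).
    """
    return {partition: max(0, end - replicated_offsets.get(partition, 0))
            for partition, end in source_end_offsets.items()}


# Example: partition 0 is 10 messages behind, partition 1 is caught up.
print(replication_lag({0: 1000, 1: 500}, {0: 990, 1: 500}))  # {0: 10, 1: 0}
```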
Conclusion
Cross-Cluster Replication is an essential capability for organizations that rely on Apache Kafka for distributed data streaming. Whether using Confluent Replicator or MirrorMaker 2, CCR ensures high availability, disaster recovery, and low-latency data access across regions. With the detailed steps and configurations provided in this blog, you’re well-equipped to implement CCR in your environment.
By replicating Kafka topics across clusters, you can create robust, resilient systems that scale seamlessly to meet the demands of modern distributed applications. Start experimenting with CCR today and unlock the full potential of your Kafka infrastructure! That’s it for now. I hope this blog gave you some useful insights. Please feel free to post a comment, question or suggestion.