Apache Kafka is a cornerstone of modern distributed systems, enabling reliable, high-throughput messaging between applications. However, as organizations scale and adopt geographically distributed systems, the need to replicate Kafka data across clusters becomes critical. This is where Cross-Cluster Replication (CCR) with Confluent Kafka comes into play. In this blog, we’ll explore the need for CCR, its use cases, and how to set it up with code examples.
Why Cross-Cluster Replication?
Cross-Cluster Replication allows Kafka topics from one cluster to be replicated in another, ensuring data consistency and availability across geographically distributed environments. Here are some key scenarios where CCR is invaluable:
- Disaster Recovery: Protect your data from regional failures by maintaining a replica in another cluster.
- Geo-Proximity: Serve low-latency data to consumers located in different regions by replicating data closer to them.
- Cloud Migration: Facilitate seamless migration from on-premise clusters to cloud-based ones.
- Load Balancing: Distribute workloads by replicating data to clusters closer to specific applications.
- Compliance and Archiving: Meet compliance requirements by keeping backups in different jurisdictions.
How Cross-Cluster Replication Works

Confluent’s Cross-Cluster Replication is powered by Confluent Replicator or MirrorMaker 2. Both tools replicate Kafka topics by building on the Kafka Connect framework.
- Confluent Replicator: A commercial offering optimized for ease of use and operational stability.
- MirrorMaker 2 (MM2): An open-source alternative that ships with Apache Kafka, covering core replication needs and supporting advanced topologies such as active-active replication.
Setting Up Cross-Cluster Replication with Confluent Replicator
Step 1: Prerequisites
- Kafka Clusters: Ensure you have two Kafka clusters running (source and destination).
- Confluent Platform: Install Confluent Platform on both clusters.
- Connectivity: Ensure network connectivity between the source and destination clusters.
- Access Control: Set up authentication and authorization (e.g., SASL/SSL).
Step 2: Install Confluent Replicator Connector
Confluent Replicator is deployed as a Kafka Connect source connector. Download the Confluent Platform, then start a Kafka Connect worker in distributed mode. Replicator is typically run on a Connect cluster co-located with the destination, so that it pulls data from the source cluster over the network.
# Start a Kafka Connect worker (here, on the destination cluster)
bin/connect-distributed.sh config/connect-distributed.properties
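The worker itself is configured through connect-distributed.properties. As a sketch, the settings most relevant to running Replicator look like the following (the group id and plugin path are assumptions for this example):

```properties
# File: config/connect-distributed.properties (excerpt)
# The Connect worker stores its own state in the destination cluster
bootstrap.servers=destination-cluster:9092
# All workers sharing this id form one Connect cluster
group.id=replicator-connect-cluster
# Replicator copies raw bytes, so byte-array converters are a safe default
key.converter=org.apache.kafka.connect.converters.ByteArrayConverter
value.converter=org.apache.kafka.connect.converters.ByteArrayConverter
# Directory containing the Replicator connector jars (assumed path)
plugin.path=/usr/share/java,/usr/share/confluent-hub-components
```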
Step 3: Configure the Replicator
Create a JSON configuration file for the Replicator connector; the Kafka Connect REST API accepts JSON rather than a .properties file. Below is an example configuration for replicating a topic named source-topic.
# File: replicator.json
{
  "name": "replicator",
  "config": {
    "connector.class": "io.confluent.connect.replicator.ReplicatorSourceConnector",
    "src.kafka.bootstrap.servers": "source-cluster:9092",
    "src.kafka.security.protocol": "SASL_SSL",
    "src.kafka.sasl.mechanism": "PLAIN",
    "src.kafka.sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"source-user\" password=\"source-password\";",
    "dest.kafka.bootstrap.servers": "destination-cluster:9092",
    "dest.kafka.security.protocol": "SASL_SSL",
    "dest.kafka.sasl.mechanism": "PLAIN",
    "dest.kafka.sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"dest-user\" password=\"dest-password\";",
    "topic.whitelist": "source-topic",
    "topic.replication.factor": "3",
    "key.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
    "value.converter": "org.apache.kafka.connect.converters.ByteArrayConverter"
  }
}
Deploy the connector by posting the configuration to the Kafka Connect REST API on the destination cluster:
curl -X POST -H "Content-Type: application/json" --data @replicator.json http://destination-cluster:8083/connectors
Monitoring the Replication
Once the Replicator is running, you can monitor its status using the Kafka Connect REST API:
curl -X GET http://destination-cluster:8083/connectors/replicator/status
You should see the replication status for the connector, including task states and any errors.
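To automate health checks, the status JSON returned by that endpoint can be inspected programmatically. Here is a minimal sketch in Python; the helper names are hypothetical, but the response shape matches the Kafka Connect status API:

```python
def failed_tasks(status: dict) -> list:
    """Return (task_id, trace) pairs for tasks that are not RUNNING.

    `status` is the parsed JSON body of GET /connectors/<name>/status,
    which has the shape:
      {"name": "replicator",
       "connector": {"state": "RUNNING", "worker_id": "..."},
       "tasks": [{"id": 0, "state": "FAILED", "trace": "..."}]}
    """
    return [(t["id"], t.get("trace", ""))
            for t in status.get("tasks", [])
            if t.get("state") != "RUNNING"]


def is_healthy(status: dict) -> bool:
    """The connector and every one of its tasks must report RUNNING."""
    return (status.get("connector", {}).get("state") == "RUNNING"
            and not failed_tasks(status))
```

In practice you would fetch the status with any HTTP client (for example, urllib.request against the /connectors/replicator/status endpoint) and alert an operator whenever is_healthy returns False.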
Using MirrorMaker 2 for Cross-Cluster Replication
If you prefer an open-source solution, MirrorMaker 2 is a great alternative. Below is a guide to setting it up.
Step 1: Configure MirrorMaker 2
MirrorMaker 2 uses a properties file for configuration. Here’s an example to replicate source-topic:
# File: mm2.properties
# Cluster aliases
clusters=source,target
# Source cluster configuration
source.bootstrap.servers=source-cluster:9092
source.security.protocol=SASL_SSL
source.sasl.mechanism=PLAIN
source.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="source-user" password="source-password";
# Target cluster configuration
target.bootstrap.servers=destination-cluster:9092
target.security.protocol=SASL_SSL
target.sasl.mechanism=PLAIN
target.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="dest-user" password="dest-password";
# Replication flow: enable source -> target and pick the topics to mirror
source->target.enabled=true
source->target.topics=source-topic
sync.topic.acls.enabled=true
Step 2: Start MirrorMaker 2
Run MirrorMaker 2 with the configuration file. Note that under MirrorMaker 2’s default replication policy, replicated topics are created on the target cluster with the source alias as a prefix (for example, source.source-topic).
bin/connect-mirror-maker.sh mm2.properties
Advanced Use Cases and Configurations
1. Active-Active Clusters
For environments that require bi-directional replication (e.g., active-active setups), configure both clusters to mirror each other. Be mindful of potential data conflicts.
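With MirrorMaker 2, an active-active setup is simply two enabled flows in one properties file. A minimal sketch, reusing the cluster aliases from the earlier example (connection settings omitted):

```properties
# File: mm2-active-active.properties (excerpt)
clusters=source,target
# Mirror in both directions
source->target.enabled=true
source->target.topics=.*
target->source.enabled=true
target->source.topics=.*
```

Because the default replication policy renames remote topics with the origin alias as a prefix, each cluster keeps its local writes and its mirrored copies in separate topics, which avoids infinite replication loops.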
2. Selective Topic Replication
You can use a regular expression to replicate only matching topics. With MirrorMaker 2, set it on the replication flow:
source->target.topics=source-topic-.*
Confluent Replicator offers the equivalent topic.regex parameter.
3. Handling Schema Changes
If you’re using Confluent Schema Registry, ensure it is reachable from both clusters, and point the connector’s converters at it (for example, via value.converter.schema.registry.url when using the Avro converter).
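For illustration, wiring a Connect converter to Schema Registry looks like the following (the registry hostname is an assumption for this example):

```properties
# Use the Avro converter backed by Schema Registry instead of raw bytes
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://schema-registry:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://schema-registry:8081
```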
Common Challenges
- Network Latency: Replication is asynchronous, so WAN latency and constrained bandwidth between clusters translate directly into replication lag; monitor lag and provision network capacity accordingly.
- Data Conflicts: For bi-directional replication, implement strategies to handle conflicting writes.
- Scaling: Monitor and scale Kafka Connect workers as needed to handle large replication loads.
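One way to quantify replication lag is to compare the source topic’s end offsets with the offsets the replication pipeline has reached. A minimal sketch of the arithmetic in Python (the function name is hypothetical, and in practice both offset maps would come from a Kafka admin or consumer client):

```python
def replication_lag(source_end_offsets: dict, replicated_offsets: dict) -> dict:
    """Per-partition lag: messages on the source not yet replicated.

    Both arguments map partition number -> offset. Partitions absent
    from `replicated_offsets` count as fully unreplicated (offset 0).
    """
    return {partition: max(0, end - replicated_offsets.get(partition, 0))
            for partition, end in source_end_offsets.items()}


# Example: partition 0 is 10 messages behind, partition 1 is caught up.
print(replication_lag({0: 1000, 1: 500}, {0: 990, 1: 500}))  # {0: 10, 1: 0}
```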
Conclusion
Cross-Cluster Replication is an essential capability for organizations that rely on Apache Kafka for distributed data streaming. Whether using Confluent Replicator or MirrorMaker 2, CCR ensures high availability, disaster recovery, and low-latency data access across regions. With the detailed steps and configurations provided in this blog, you’re well-equipped to implement CCR in your environment.
By replicating Kafka topics across clusters, you can create robust, resilient systems that scale seamlessly to meet the demands of modern distributed applications. Start experimenting with CCR today and unlock the full potential of your Kafka infrastructure! That’s it for now. I hope this blog gave you some useful insights. Please feel free to post a comment, question or suggestion.