The following tutorial on how to run a multi-broker cluster provides examples for both KRaft mode and ZooKeeper mode. Confluent recommends KRaft mode for new deployments. To learn more about running Kafka in KRaft mode, see KRaft Overview, the KRaft steps in the Platform Quick Start, and Settings for other components. The fundamental capabilities, concepts, design ethos, and ways of working that you already know from using Kafka also apply to Confluent Platform. By definition, Confluent Platform ships with all of the basic Kafka command utilities and APIs used in development, along with several additional CLIs to support Confluent-specific features. For detailed information about converters, see Configuring Key and Value Converters.

  1. Confluent offers Confluent Cloud, a data-streaming service, and Confluent Platform, software you download and manage yourself.
  2. The key part of a Kafka event is not necessarily a unique identifier for the event, like the primary key of a row in a relational database would be.
  3. This happens automatically, and while you can tune some settings in the producer to produce varying levels of durability guarantees, this is not usually a process you have to think about as a developer building systems on Kafka.
  4. Likewise, reading from a relational database, Salesforce, or a legacy HDFS filesystem is the same operation no matter what sort of application does it.

Once applications are busily producing messages to Kafka and consuming messages from it, two things will happen. First, new consumers of existing topics will emerge. These are brand new applications (perhaps written by the team that wrote the original producer of the messages, perhaps by another team), and they will need to understand the format of the messages in the topic. Second, the format of those messages will evolve as the business evolves. Order objects gain a new status field, usernames split into first and last name from full name, and so on. The schema of our domain objects is a constantly moving target, and we must have a way of agreeing on the schema of messages in any given topic.

Kafka can connect to nearly any other data source in traditional enterprise information systems, modern databases, or in the cloud. It forms an efficient point of integration with built-in data connectors, without hiding logic or routing inside brittle, centralized infrastructure.

Security and resilience features

When you produce data to the leader—in general, reading and writing are done to the leader—the leader and the followers work together to replicate those new writes to the followers. Partitioning takes the single topic log and breaks it into multiple logs, each of which can live on a separate node in the Kafka cluster. This way, the work of storing messages, writing new messages, and processing existing messages can be split among many nodes in the cluster. Events have a tendency to proliferate—just think of the events that happened to you this morning—so we’ll need a system for organizing them. Kafka’s most fundamental unit of organization is the topic, which is something like a table in a relational database.

Typically, if a message has no key, subsequent messages will be distributed round-robin among all the topic’s partitions. In this case, all partitions get an even share of the data, but we don’t preserve any kind of ordering of the input messages. If the message does have a key, then the destination partition will be computed from a hash of the key. This allows Kafka to guarantee that messages having the same key always land in the same partition, and therefore are always in order. As a distributed pub/sub messaging system, Kafka works well as a modernized version of the traditional message broker.
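The two routing rules described above can be sketched in a few lines of Python. This is only an illustration: `make_partitioner` and `choose_partition` are hypothetical names, and `zlib.crc32` is a stand-in hash chosen for simplicity, not the hash function Kafka clients actually use.

```python
import itertools
import zlib


def make_partitioner(num_partitions):
    """Return a function that maps an optional key (bytes) to a partition index."""
    round_robin = itertools.cycle(range(num_partitions))

    def choose_partition(key=None):
        if key is None:
            # No key: spread messages evenly, with no ordering guarantee.
            return next(round_robin)
        # Keyed: the same key always hashes to the same partition.
        # crc32 is an assumption made for this sketch only.
        return zlib.crc32(key) % num_partitions

    return choose_partition


choose = make_partitioner(3)
print([choose() for _ in range(4)])               # unkeyed messages cycle through partitions
print(choose(b"user-42") == choose(b"user-42"))   # same key, same partition
```

Because the keyed branch depends only on the key and the partition count, every message with a given key lands in one partition, which is what preserves per-key ordering.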

Confluent was founded by the creators of Kafka, and its product line includes proprietary products based on open-source Kafka. This topic describes Kafka use cases, the relationship between Confluent and Kafka, and key differences between the Confluent products. Confluent Platform is a full-scale streaming platform that enables you to easily access, store, and manage data as continuous, real-time streams. Built by the original creators of Apache Kafka®, Confluent Platform is an enterprise-ready platform that completes Kafka with advanced capabilities designed to help accelerate application development and connectivity.

Run a multi-broker cluster

Replace [Add your cluster API key here] and [Add your cluster API secret here] with your cluster API key and secret. Operators and developers who want to set up production-ready deployments can follow the workflows for Install Confluent Platform On-Premises or Ansible Playbooks. Confluent offers a number of features to scale effectively and get the maximum performance for your investment. Confluent Platform provides several features to supplement Kafka’s Admin API, and built-in JMX monitoring.

When a producer is configured to use the Schema Registry, it calls an API at the Schema Registry REST endpoint and presents the schema of the new message. If it is the same as the last message produced, then the produce may succeed. If it is different from the last message but matches the compatibility rules defined for the topic, the produce may still succeed. But if it is different in a way that violates the compatibility rules, the produce will fail in a way that the application code can detect. Schema Registry is a standalone server process that runs on a machine external to the Kafka brokers.
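The accept-or-reject flow described above can be illustrated with a toy, in-memory model. Everything here is hypothetical: `ToySchemaRegistry`, `is_compatible`, and the dictionary schema format are invented for this sketch, and the compatibility rule is a deliberately simplified stand-in, not the rule set Schema Registry applies.

```python
class IncompatibleSchemaError(Exception):
    """Raised when a new schema breaks the simplified compatibility rule."""


def is_compatible(old, new):
    # Simplified stand-in rule (an assumption for this sketch):
    # existing fields keep their type, and any added field needs a default.
    for name, spec in new.items():
        if name in old:
            if old[name]["type"] != spec["type"]:
                return False
        elif "default" not in spec:
            return False
    return True


class ToySchemaRegistry:
    """In-memory model that remembers the last accepted schema per topic."""

    def __init__(self):
        self.latest = {}  # topic -> last accepted schema

    def check(self, topic, schema):
        last = self.latest.get(topic)
        if last is not None and schema != last and not is_compatible(last, schema):
            raise IncompatibleSchemaError(f"schema change rejected for topic {topic!r}")
        self.latest[topic] = schema


registry = ToySchemaRegistry()
v1 = {"id": {"type": "int"}}
v2 = {"id": {"type": "int"}, "status": {"type": "string", "default": "NEW"}}
v3 = {"id": {"type": "string"}}

registry.check("orders", v1)  # first schema seen for the topic
registry.check("orders", v2)  # adds a defaulted field: accepted
try:
    registry.check("orders", v3)  # changes a field type: rejected
except IncompatibleSchemaError as err:
    print("produce would fail:", err)
```

The exception plays the role of the failure "that the application code can detect": the producer learns about the violation before the message reaches the topic.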

Development and connectivity features

Due to Kafka’s high throughput, fault tolerance, resilience, and scalability, there are numerous use cases across almost every industry, from banking and fraud detection to transportation and IoT. In this section, you create a Flink workspace and write queries against the users topic and other streaming data. Now that you have created some topics and produced message data to a topic (both manually and with auto-generated data), take another look at Control Center, this time to inspect the existing topics. You cannot use the kafka-storage command to update an existing cluster. If you make a mistake in configurations at that point, you must recreate the directories from scratch, and work through the steps again.

KafkaConsumer manages connection pooling and the network protocol just like KafkaProducer does, but there is a much bigger story on the read side than just the network plumbing. First of all, Kafka is different from legacy message queues in that reading a message does not destroy it; it is still there to be read by any other consumer that might be interested in it. In fact, it’s perfectly normal in Kafka for many consumers to read from one topic. This one small fact has a positively disproportionate impact on the kinds of software architectures that emerge around Kafka, which is a topic covered very well elsewhere. Having broken a topic up into partitions, we need a way of deciding which messages to write to which partitions.
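The non-destructive read described above can be modeled with a small sketch. `ToyLog` and `ToyConsumer` are hypothetical classes written for illustration; they are not part of any Kafka client library and ignore networking, partitions, and consumer groups.

```python
class ToyLog:
    """An append-only list standing in for one topic partition."""

    def __init__(self):
        self.records = []

    def append(self, value):
        self.records.append(value)
        return len(self.records) - 1  # offset of the new record


class ToyConsumer:
    """Tracks its own read position; reading never removes records."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self):
        # Return everything past this consumer's offset, then advance it.
        batch = self.log.records[self.offset:]
        self.offset = len(self.log.records)
        return batch


log = ToyLog()
for event in ("created", "paid", "shipped"):
    log.append(event)

billing = ToyConsumer(log)
analytics = ToyConsumer(log)
print(billing.poll())    # reads all three events
print(analytics.poll())  # reads the same three events independently
print(len(log.records))  # the log still holds every record
```

Each consumer only moves its own offset, so adding another reader costs the existing ones nothing.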

And if after all that you still can’t find a connector that does what you need, you can write your own using a fairly simple API. Internally, keys and values are just sequences of bytes, but externally in your programming language of choice, they are often structured objects represented in your language’s type system. Kafka famously calls the translation between language types and internal bytes serialization and deserialization. If you are ready to start working at the command line, skip to Kafka Commands Primer and try creating Kafka topics, working with producers and consumers, and so forth. An error-handling feature is available that will route all invalid records to a special topic and report the error.
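As a minimal sketch of the serialization and deserialization step mentioned above, the following converts a structured object to bytes and back. The `Order` type and the choice of JSON are assumptions made for illustration; real deployments often use formats such as Avro or Protobuf.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Order:
    """Hypothetical domain object used only for this example."""
    order_id: int
    status: str


def serialize(order):
    """Turn a language-level object into the bytes that would be sent."""
    return json.dumps(asdict(order)).encode("utf-8")


def deserialize(data):
    """Turn received bytes back into a language-level object."""
    return Order(**json.loads(data.decode("utf-8")))


payload = serialize(Order(order_id=7, status="NEW"))
print(type(payload).__name__)   # the wire form is plain bytes
print(deserialize(payload))     # the consumer side rebuilds the object
```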

Configure Control Center with REST endpoints and advertised listeners (Optional)

And if that plugin ecosystem happens not to have what you need, the open-source Connect framework makes it simple to build your own connector and inherit all the scalability and fault tolerance properties Connect offers. Since Kafka topics are logs, there is nothing inherently temporary about the data in them. Every topic can be configured to expire data after it has reached a certain age (or the topic overall has reached a certain size), from as short as seconds to as long as years or even to retain messages indefinitely. When you write an event to a topic, it is as durable as it would be if you had written it to any database you ever trusted. To bridge the gap between the developer environment quick starts and full-scale, multi-node deployments, you can start by pioneering multi-broker clusters and multi-cluster setups on a single machine, like your laptop. Confluent Cloud includes different types of server processes for streaming data in a production environment.
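The age-based and size-based expiry mentioned above can be sketched as a pure function. `apply_retention` and its parameters are hypothetical names for this illustration; real brokers apply retention to whole log segments rather than to individual records, so treat this as a conceptual model only.

```python
def apply_retention(records, now, max_age_seconds=None, max_bytes=None):
    """Return the records that survive retention.

    records: list of (timestamp_seconds, payload_bytes), oldest first.
    A limit of None means "retain indefinitely" for that dimension.
    """
    kept = list(records)
    if max_age_seconds is not None:
        # Drop anything older than the age limit.
        kept = [(ts, p) for ts, p in kept if now - ts <= max_age_seconds]
    if max_bytes is not None:
        # Drop the oldest records until the total size fits the limit.
        total = sum(len(p) for _, p in kept)
        while kept and total > max_bytes:
            _, payload = kept.pop(0)
            total -= len(payload)
    return kept


records = [(0, b"aaaa"), (50, b"bbbb"), (90, b"cccc")]
print(apply_retention(records, now=100, max_age_seconds=60))
print(apply_retention(records, now=100, max_bytes=8))
print(apply_retention(records, now=100))  # no limits: everything is kept
```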

Confluent Cloud is a fully managed data streaming platform, available on AWS, GCP, and Azure, with a cloud-native Apache Kafka® engine for elastic scaling, enterprise-grade security, stream processing, and governance. At a minimum, you will need ZooKeeper and the brokers (already started), and Kafka REST. However, it is useful to have all components running if you are just getting started with the platform and want to explore everything. This gives you a similar starting point to the one in Quick Start for Confluent Platform, and enables you to work through the examples in that Quick Start in addition to the Kafka command examples provided here.

As a developer using Kafka, the topic is the abstraction you probably think the most about. You create different topics to hold different kinds of events and different topics to hold filtered and transformed versions of the same kind of event. Performing real-time computations on event streams is a core competency of Kafka. From real-time data processing to dataflow programming, Kafka ingests, stores, and processes streams of data as it’s being generated, at any scale.

This rebalancing procedure is also used when connectors increase or decrease the number of tasks they require, or when a connector’s configuration is changed. When a task fails, no rebalance is triggered, as a task failure is considered an exceptional case. As such, failed tasks are not restarted by the framework and should be restarted using the REST API. Confluent Cloud offers pre-built, fully managed Kafka connectors that make it easy to instantly connect to popular data sources and sinks.
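Restarting a failed task through the REST API, as noted above, might look like the following sketch. It builds a request for the Kafka Connect REST endpoint `POST /connectors/{name}/tasks/{id}/restart` without sending it. The worker address `http://localhost:8083`, the connector name `my-sink`, and the helper `build_task_restart_request` are assumptions for illustration; substitute the values for your own Connect worker.

```python
import urllib.parse
import urllib.request


def build_task_restart_request(base_url, connector, task_id):
    """Build (but do not send) a POST request that restarts one connector task."""
    name = urllib.parse.quote(connector, safe="")
    url = f"{base_url.rstrip('/')}/connectors/{name}/tasks/{task_id}/restart"
    return urllib.request.Request(url, method="POST")


# Assumed worker address and connector name, for illustration only.
request = build_task_restart_request("http://localhost:8083", "my-sink", 0)
print(request.get_method(), request.full_url)

# To send it against a running Connect worker, uncomment:
# urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the example runnable without a live Connect worker.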

You can use Kafka to collect user activity data, system logs, application metrics, stock ticker data, and device instrumentation signals. Regardless of the use case, Confluent Platform lets you focus on how to derive business value from your data rather than worrying about the underlying mechanics, such as how data is being transported or integrated between disparate systems. Specifically, Confluent Platform simplifies connecting data sources to Kafka, building streaming applications, and securing, monitoring, and managing your Kafka infrastructure.