pyflink.datastream.connectors.kafka.KafkaSourceBuilder#
- class KafkaSourceBuilder[source]#
The builder class for
KafkaSourceto make it easier for the users to construct aKafkaSource.The following example shows the minimum setup to create a KafkaSource that reads the String values from a Kafka topic.
>>> source = KafkaSource.builder() \ ... .set_bootstrap_servers('MY_BOOTSTRAP_SERVERS') \ ... .set_topics('TOPIC1', 'TOPIC2') \ ... .set_value_only_deserializer(SimpleStringSchema()) \ ... .build()
The bootstrap servers, topics/partitions to consume, and the record deserializer are required fields that must be set.
To specify the starting offsets of the KafkaSource, one can call
set_starting_offsets().By default, the KafkaSource runs in an CONTINUOUS_UNBOUNDED mode and never stops until the Flink job is canceled or fails. To let the KafkaSource run in CONTINUOUS_UNBOUNDED but stops at some given offsets, one can call
set_stopping_offsets(). For example the following KafkaSource stops after it consumes up to the latest partition offsets at the point when the Flink started.>>> source = KafkaSource.builder() \ ... .set_bootstrap_servers('MY_BOOTSTRAP_SERVERS') \ ... .set_topics('TOPIC1', 'TOPIC2') \ ... .set_value_only_deserializer(SimpleStringSchema()) \ ... .set_unbounded(KafkaOffsetsInitializer.latest()) \ ... .build()
New in version 1.16.0.
Methods
build()set_bootstrap_servers(bootstrap_servers)Sets the bootstrap servers for the KafkaConsumer of the KafkaSource.
set_bounded(stopping_offsets_initializer)By default, the KafkaSource is set to run in CONTINUOUS_UNBOUNDED manner and thus never stops until the Flink job fails or is canceled.
set_client_id_prefix(prefix)Sets the client id prefix of this KafkaSource.
set_group_id(group_id)Sets the consumer group id of the KafkaSource.
set_partitions(partitions)Set a set of partitions to consume from.
set_properties(props)Set arbitrary properties for the KafkaSource and KafkaConsumer.
set_property(key, value)Set an arbitrary property for the KafkaSource and KafkaConsumer.
set_starting_offsets(...)Specify from which offsets the KafkaSource should start consume from by providing an
KafkaOffsetsInitializer.set_topic_pattern(topic_pattern)Set a topic pattern to consume from use the java Pattern.
set_topics(*topics)Set a list of topics the KafkaSource should consume from.
set_unbounded(stopping_offsets_initializer)By default, the KafkaSource is set to run in CONTINUOUS_UNBOUNDED manner and thus never stops until the Flink job fails or is canceled.
set_value_only_deserializer(...)Sets the
DeserializationSchemafor deserializing the value of Kafka's ConsumerRecord.