pyflink.datastream.connectors.kafka.KafkaSourceBuilder

class KafkaSourceBuilder

The builder class for KafkaSource, which makes it easier to construct a KafkaSource.

The following example shows the minimum setup needed to create a KafkaSource that reads String values from a Kafka topic.

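>>> from pyflink.common.serialization import SimpleStringSchema
>>> from pyflink.datastream.connectors.kafka import KafkaSource
>>> source = KafkaSource.builder() \
...     .set_bootstrap_servers('MY_BOOTSTRAP_SERVERS') \
...     .set_topics('TOPIC1', 'TOPIC2') \
...     .set_value_only_deserializer(SimpleStringSchema()) \
...     .build()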

The bootstrap servers, topics/partitions to consume, and the record deserializer are required fields that must be set.

To specify the starting offsets of the KafkaSource, one can call set_starting_offsets().
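The starting position is described by a KafkaOffsetsInitializer. The sketch below shows one way to choose an initializer from a plain mode name. The helper `starting_offsets`, the `STARTING_MODES` tuple, and the mode names are hypothetical and not part of PyFlink. The `KafkaOffsetsInitializer` and `KafkaOffsetResetStrategy` calls assume PyFlink 1.16 or later with the Kafka connector JAR available at runtime.

```python
STARTING_MODES = ("earliest", "latest", "committed", "timestamp")


def starting_offsets(mode, timestamp_ms=None):
    """Hypothetical helper: map a mode name to a KafkaOffsetsInitializer."""
    # Validate first, so bad input fails before PyFlink is even imported.
    if mode not in STARTING_MODES:
        raise ValueError(f"unknown starting mode: {mode!r}")
    if mode == "timestamp" and timestamp_ms is None:
        raise ValueError("timestamp mode requires timestamp_ms")

    # Imported lazily; assumes PyFlink 1.16+ is installed.
    from pyflink.datastream.connectors.kafka import (
        KafkaOffsetResetStrategy,
        KafkaOffsetsInitializer,
    )

    if mode == "earliest":
        return KafkaOffsetsInitializer.earliest()
    if mode == "latest":
        return KafkaOffsetsInitializer.latest()
    if mode == "committed":
        # Fall back to the earliest offset when the group has no committed offset.
        return KafkaOffsetsInitializer.committed_offsets(
            KafkaOffsetResetStrategy.EARLIEST)
    # Start from the first record whose timestamp is >= timestamp_ms (epoch millis).
    return KafkaOffsetsInitializer.timestamp(timestamp_ms)
```

The result would then be passed to the builder, for example `.set_starting_offsets(starting_offsets("earliest"))`.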

By default, the KafkaSource runs in CONTINUOUS_UNBOUNDED mode and never stops until the Flink job is canceled or fails. To keep the KafkaSource in CONTINUOUS_UNBOUNDED mode but have it stop at given offsets, call set_unbounded(). For example, the following KafkaSource stops after it has consumed up to the latest partition offsets as of the time the Flink job started.

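>>> from pyflink.common.serialization import SimpleStringSchema
>>> from pyflink.datastream.connectors.kafka import KafkaOffsetsInitializer, KafkaSource
>>> source = KafkaSource.builder() \
...     .set_bootstrap_servers('MY_BOOTSTRAP_SERVERS') \
...     .set_topics('TOPIC1', 'TOPIC2') \
...     .set_value_only_deserializer(SimpleStringSchema()) \
...     .set_unbounded(KafkaOffsetsInitializer.latest()) \
...     .build()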
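For comparison, a source built with set_bounded() can be attached to a job that finishes on its own. The following is a minimal sketch, not a definitive setup: the function `run_bounded_job`, the constants `SOURCE_NAME` and `JOB_NAME`, and the placeholder server and topic names are assumptions made for illustration. It also assumes the Kafka connector JAR is on the job's classpath.

```python
SOURCE_NAME = "Kafka Source"       # hypothetical operator name
JOB_NAME = "bounded-kafka-read"    # hypothetical job name


def run_bounded_job():
    """Read the topic up to the current latest offsets, print it, then finish."""
    # Imported lazily; assumes PyFlink 1.16+ and the Kafka connector JAR.
    from pyflink.common import WatermarkStrategy
    from pyflink.common.serialization import SimpleStringSchema
    from pyflink.datastream import StreamExecutionEnvironment
    from pyflink.datastream.connectors.kafka import (
        KafkaOffsetsInitializer,
        KafkaSource,
    )

    env = StreamExecutionEnvironment.get_execution_environment()

    source = KafkaSource.builder() \
        .set_bootstrap_servers('MY_BOOTSTRAP_SERVERS') \
        .set_topics('TOPIC1') \
        .set_starting_offsets(KafkaOffsetsInitializer.earliest()) \
        .set_bounded(KafkaOffsetsInitializer.latest()) \
        .set_value_only_deserializer(SimpleStringSchema()) \
        .build()

    # No event-time processing here, so no watermarks are generated.
    stream = env.from_source(source, WatermarkStrategy.no_watermarks(), SOURCE_NAME)
    stream.print()
    env.execute(JOB_NAME)
```

With set_bounded() the source is expected to exit once every partition has reached its stopping offset, so a job like this one ends without being canceled.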

New in version 1.16.0.

Methods

build()

set_bootstrap_servers(bootstrap_servers)

Sets the bootstrap servers for the KafkaConsumer of the KafkaSource.

set_bounded(stopping_offsets_initializer)

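By default, the KafkaSource runs in CONTINUOUS_UNBOUNDED mode and never stops until the Flink job fails or is canceled. Call set_bounded() with a KafkaOffsetsInitializer to run the source in BOUNDED mode and stop once every partition has reached its stopping offset.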

set_client_id_prefix(prefix)

Sets the client id prefix of this KafkaSource.

set_group_id(group_id)

Sets the consumer group id of the KafkaSource.

set_partitions(partitions)

Sets the partitions to consume from.

set_properties(props)

Set arbitrary properties for the KafkaSource and KafkaConsumer.

set_property(key, value)

Set an arbitrary property for the KafkaSource and KafkaConsumer.

set_starting_offsets(...)

Specifies the offsets from which the KafkaSource should start consuming, by providing a KafkaOffsetsInitializer.

set_topic_pattern(topic_pattern)

Sets a topic pattern to consume from, using Java Pattern (regular expression) syntax.

set_topics(*topics)

Set a list of topics the KafkaSource should consume from.

set_unbounded(stopping_offsets_initializer)

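By default, the KafkaSource runs in CONTINUOUS_UNBOUNDED mode and never stops until the Flink job fails or is canceled. Call set_unbounded() with a KafkaOffsetsInitializer to keep the source in CONTINUOUS_UNBOUNDED mode but still stop once every partition has reached its stopping offset.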

set_value_only_deserializer(...)

Sets the DeserializationSchema for deserializing the value of Kafka's ConsumerRecord.
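Because set_topic_pattern() takes a regular expression, it can help to preview which topic names a pattern would select before deploying a job. The sketch below uses Python's `re` module as a stand-in. It rests on two assumptions: the pattern uses only syntax shared by Java's Pattern and Python's `re`, and a topic is selected only when the whole name matches. The helper `preview_topic_matches` and the sample topic names are hypothetical.

```python
import re


def preview_topic_matches(topic_pattern, topic_names):
    """Return the topic names that the pattern matches in full."""
    compiled = re.compile(topic_pattern)
    # fullmatch mirrors the assumed whole-name matching of topic patterns.
    return [name for name in topic_names if compiled.fullmatch(name)]


topics = ["orders-eu", "orders-us", "orders", "payments-eu"]
print(preview_topic_matches(r"orders-.*", topics))
```

In this sketch "orders" is not selected, because the pattern requires the hyphen. The same pattern string would then be passed to `.set_topic_pattern("orders-.*")`.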

