pyflink.datastream.checkpoint_storage.JobManagerCheckpointStorage#

class JobManagerCheckpointStorage(checkpoint_path=None, max_state_size=None, j_jobmanager_checkpoint_storage=None)[source]#

This CheckpointStorage checkpoints state directly to the JobManager’s memory (hence the name); savepoints, however, are persisted to a file system.

This checkpoint storage is primarily for experimentation, quick local setups, or streaming applications with very small state. Because checkpoints must pass through the JobManager’s memory, larger state occupies a larger share of the JobManager’s main memory, which reduces operational stability. For any other setup, use the FileSystemCheckpointStorage, which checkpoints state directly to files rather than to the JobManager’s memory and therefore supports larger state sizes and more highly available recovery.

State Size Considerations

State checkpointing with this checkpoint storage is subject to the following conditions:

  • Each individual state must not exceed the configured maximum state size (see get_max_state_size()).

  • All state from one task (i.e., the sum of all operator states and keyed states from all chained operators of the task) must not exceed what the RPC system supports, which is by default < 10 MB. That limit can be raised, but doing so is typically not advised.

  • The sum of all states in the application times all retained checkpoints must comfortably fit into the JobManager’s JVM heap space.
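The three conditions above can be turned into a rough pre-flight check. The following is a minimal sketch in plain Python, not part of PyFlink: the names Limits, check_state_sizes and heap_fraction are hypothetical, and every number passed in is supplied by the caller rather than read from Flink. In particular, heap_fraction is only an assumed way of expressing "comfortably fit".

```python
from dataclasses import dataclass

MIB = 1024 * 1024


@dataclass
class Limits:
    """Illustrative limits; none of these numbers are read from Flink."""
    max_state_size: int          # per-state cap, cf. get_max_state_size()
    rpc_limit: int               # largest payload the RPC system accepts per task
    jobmanager_heap: int         # JVM heap of the JobManager, in bytes
    heap_fraction: float = 0.5   # hypothetical margin for "comfortably fit"


def check_state_sizes(task_states, retained_checkpoints, limits):
    """Return a list of human-readable violations (empty if all checks pass).

    task_states maps a task name to the sizes, in bytes, of its individual states.
    """
    problems = []
    for task, sizes in task_states.items():
        # Condition 1: each individual state stays within the per-state maximum.
        for size in sizes:
            if size > limits.max_state_size:
                problems.append(f"{task}: a state of {size} bytes exceeds the per-state maximum")
        # Condition 2: all state of one task stays below the RPC limit.
        if sum(sizes) >= limits.rpc_limit:
            problems.append(f"{task}: total task state reaches the RPC limit")
    # Condition 3: everything, times the retained checkpoints, fits in the heap.
    total = sum(sum(sizes) for sizes in task_states.values()) * retained_checkpoints
    if total > limits.jobmanager_heap * limits.heap_fraction:
        problems.append("retained checkpoints do not comfortably fit in the JobManager heap")
    return problems
```

Calling check_state_sizes({"map": [1 * MIB, 2 * MIB]}, 1, Limits(5 * MIB, 10 * MIB, 1024 * MIB)) returns an empty list. A non-empty result suggests that the FileSystemCheckpointStorage is the better fit.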

Persistence Guarantees

For use cases where the state sizes can be handled by this storage, it guarantees persistence for savepoints, externalized checkpoints (if configured), and checkpoints (when high-availability is configured).

Configuration

As with all checkpoint storage implementations, this type can be configured either within the application (by creating the storage with the respective constructor parameters and setting it on the execution environment) or by specifying it in the Flink configuration.

If the storage is specified in the application, it may pick up additional configuration parameters from the Flink configuration. For example, if the storage is configured in the application without a default savepoint directory, it will pick up a default savepoint directory specified in the Flink configuration of the running job/cluster. That behavior is implemented via the configure() method.
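In-application configuration might look like the sketch below. It assumes a PyFlink version in which CheckpointConfig.set_checkpoint_storage is available and a working Flink installation at call time. The helper name build_environment, the 10-second checkpoint interval and the 1 MiB default are illustrative choices, not values taken from this page.

```python
def build_environment(max_state_size_bytes=1024 * 1024):
    """Create an execution environment that checkpoints to JobManager memory."""
    # Imported inside the helper so that defining it does not require Flink.
    from pyflink.datastream import StreamExecutionEnvironment
    from pyflink.datastream.checkpoint_storage import JobManagerCheckpointStorage

    env = StreamExecutionEnvironment.get_execution_environment()
    env.enable_checkpointing(10_000)  # interval in milliseconds (illustrative)

    # Only max_state_size is set here; checkpoint_path is left at its default of None.
    storage = JobManagerCheckpointStorage(max_state_size=max_state_size_bytes)
    env.get_checkpoint_config().set_checkpoint_storage(storage)
    return env
```

Because no savepoint directory is set in this sketch, the storage would be expected to fall back to whatever default the Flink configuration of the running job/cluster provides, as described above.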

Methods

get_checkpoint_path()

Gets the base directory where all the checkpoints are stored.

get_max_state_size()

Gets the maximum size that an individual state can have, as configured in the constructor.

get_savepoint_path()

Gets the base directory where all the savepoints are stored.

Attributes

DEFAULT_MAX_STATE_SIZE
