Ctrl+K
Logo image Logo image

Site Navigation

  • API Reference
  • Examples

Site Navigation

  • API Reference
  • Examples

Section Navigation

  • PyFlink Table
  • PyFlink DataStream
    • StreamExecutionEnvironment
    • DataStream
    • Functions
    • State
    • Timer
    • Window
    • Checkpoint
    • Side Outputs
    • Connectors
    • Formats
  • PyFlink Common

pyflink.datastream.formats.parquet.ParquetColumnarRowInputFormat#

class ParquetColumnarRowInputFormat(row_type: RowType, hadoop_config: Optional[Configuration] = None, batch_size: int = 2048, is_utc_timestamp: bool = False, is_case_sensitive: bool = True)[source]#

A ParquetVectorizedInputFormat to provide RowData iterator. Using ColumnarRowData to provide a row view of column batch. Only primitive types are supported for a column, composite types such as array, map are not supported.

Example:

>>> row_type = DataTypes.ROW([
...     DataTypes.FIELD('a', DataTypes.INT()),
...     DataTypes.FIELD('b', DataTypes.STRING()),
... ])
>>> source = FileSource.for_bulk_file_format(ParquetColumnarRowInputFormat(
...     row_type=row_type,
...     hadoop_config=Configuration(),
...     batch_size=2048,
...     is_utc_timestamp=False,
...     is_case_sensitive=True,
... ), PARQUET_FILE_PATH).build()
>>> ds = env.from_source(source, WatermarkStrategy.no_watermarks(), "parquet-source")

New in version 1.16.0.

previous

pyflink.datastream.formats.parquet.AvroParquetWriters

next

pyflink.datastream.formats.parquet.ParquetBulkWriters

On this page
  • ParquetColumnarRowInputFormat
Show Source

Created using Sphinx 5.3.0.