pyflink.datastream.formats.parquet.ParquetColumnarRowInputFormat

class ParquetColumnarRowInputFormat(row_type: RowType, hadoop_config: Optional[Configuration] = None, batch_size: int = 2048, is_utc_timestamp: bool = False, is_case_sensitive: bool = True)

A ParquetVectorizedInputFormat that provides a RowData iterator, using ColumnarRowData to present a row view of each column batch. Only primitive column types are supported; composite types such as array and map are not.
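To make the "row view of a column batch" idea concrete, the following is a minimal conceptual sketch in plain Python. It is not Flink's implementation: `ColumnBatch`, `RowView`, and `iter_rows` are hypothetical names introduced only for illustration, and reusing a single view object across rows is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Any, Iterator, List


@dataclass
class ColumnBatch:
    """Hypothetical stand-in for a vectorized batch: one list per column."""
    columns: List[List[Any]]
    num_rows: int


class RowView:
    """Hypothetical row view: stores no field data, only a batch and a row index."""

    def __init__(self, batch: ColumnBatch) -> None:
        self._batch = batch
        self._row = 0

    def set_row(self, row: int) -> "RowView":
        # Reposition the view on another row of the same batch.
        self._row = row
        return self

    def get(self, pos: int) -> Any:
        # Read the field directly from the column vector; nothing is copied.
        return self._batch.columns[pos][self._row]


def iter_rows(batch: ColumnBatch) -> Iterator[RowView]:
    """Yield one reusable view, repositioned on each row of the batch in turn."""
    view = RowView(batch)
    for i in range(batch.num_rows):
        yield view.set_row(i)


# Two primitive columns, mirroring fields 'a' (INT) and 'b' (STRING).
batch = ColumnBatch(columns=[[1, 2, 3], ["x", "y", "z"]], num_rows=3)
rows = [(row.get(0), row.get(1)) for row in iter_rows(batch)]
```

Because the sketch yields the same view object for every row, a consumer that needs to keep a row must copy its fields out before advancing, as the list comprehension above does.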

Example:

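>>> from pyflink.common import Configuration, WatermarkStrategy
>>> from pyflink.datastream import StreamExecutionEnvironment
>>> from pyflink.datastream.connectors.file_system import FileSource
>>> from pyflink.datastream.formats.parquet import ParquetColumnarRowInputFormat
>>> from pyflink.table import DataTypes
>>> env = StreamExecutionEnvironment.get_execution_environment()
>>> # PARQUET_FILE_PATH is a placeholder for the path of the Parquet file(s) to read.
>>> row_type = DataTypes.ROW([
...     DataTypes.FIELD('a', DataTypes.INT()),
...     DataTypes.FIELD('b', DataTypes.STRING()),
... ])
>>> source = FileSource.for_bulk_file_format(ParquetColumnarRowInputFormat(
...     row_type=row_type,
...     hadoop_config=Configuration(),
...     batch_size=2048,
...     is_utc_timestamp=False,
...     is_case_sensitive=True,
... ), PARQUET_FILE_PATH).build()
>>> ds = env.from_source(source, WatermarkStrategy.no_watermarks(), "parquet-source")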

New in version 1.16.0.