Table

A Table object is the core abstraction of the Table API. Similar to how the DataStream API has DataStream, the Table API is built around Table.

A Table object describes a pipeline of data transformations. It does not contain the data itself. Instead, it describes how to read data from a table source and how to eventually write data to a table sink. The declared pipeline can be printed, optimized, and eventually executed in a cluster. The pipeline can work with bounded or unbounded streams, which enables both streaming and batch scenarios.

By this definition, a Table object can be considered a view in SQL terms.

The initial Table object is constructed by a TableEnvironment. For example, from_path() obtains a table from a catalog. Every Table object has a schema that is available through get_schema(). A Table object is always associated with its original table environment during programming.

Every transformation (e.g. select() or filter()) on a Table object leads to a new Table object.

Use execute() to execute the pipeline and retrieve the transformed data locally during development. Otherwise, use execute_insert() to write the data into a table sink.

Many methods of this class take one or more Expression objects as parameters. For fluent definition of expressions and easier readability, we recommend adding a star import:

Example:

>>> from pyflink.table.expressions import *

Check the documentation for more language-specific APIs.

The following example shows how to work with a Table object.

Example:

>>> from pyflink.table import EnvironmentSettings, TableEnvironment
>>> from pyflink.table.expressions import *
>>> env_settings = EnvironmentSettings.in_streaming_mode()
>>> t_env = TableEnvironment.create(env_settings)
>>> table = t_env.from_path("my_table").select(col("colA").trim(), col("colB") + 12)
>>> table.execute().print()

`Table.add_columns`(*fields)	Adds additional columns.
`Table.add_or_replace_columns`(*fields)	Adds additional columns, replacing existing columns that have the same name.
`Table.aggregate`(func)	Performs a global aggregate operation with an aggregate function.
`Table.alias`(field, *fields)	Renames the fields of the expression result.
`Table.distinct`()	Removes duplicate values and returns only distinct (different) values.
`Table.drop_columns`(*fields)	Drops existing columns.
`Table.execute`()	Collects the contents of the current table to the local client.
`Table.execute_insert`(table_path_or_descriptor)	Writes the table to the table sink identified by the given table path or table descriptor and executes the insert operation.
`Table.explain`(*extra_details)	Returns the AST of this table and the execution plan.
`Table.fetch`(fetch)	Limits a (possibly sorted) result to the first n rows.
`Table.filter`(predicate)	Filters out elements that don't pass the filter predicate.
`Table.flat_aggregate`(func)	Performs a global flat_aggregate without group_by.
`Table.flat_map`(func)	Performs a flatMap operation with a user-defined table function.
`Table.full_outer_join`(right, join_predicate)	Joins two `Table` objects, similar to a SQL full outer join.
`Table.get_schema`()	Returns the `TableSchema` of this table.
`Table.group_by`(*fields)	Groups the elements on some grouping keys.
`Table.intersect`(right)	Intersects two `Table` with duplicate records removed.
`Table.intersect_all`(right)	Intersects two `Table`, keeping duplicate records.
`Table.join`(right[, join_predicate])	Joins two `Table` objects, similar to a SQL (inner) join.
`Table.join_lateral`(table_function_call[, ...])	Joins this Table with a user-defined TableFunction.
`Table.left_outer_join`(right[, join_predicate])	Joins two `Table` objects, similar to a SQL left outer join.
`Table.left_outer_join_lateral`(...[, ...])	Joins this Table with a user-defined TableFunction, keeping left rows for which the function returns no result.
`Table.limit`(fetch[, offset])	Limits a (possibly sorted) result to the first n rows.
`Table.map`(func)	Performs a map operation with a user-defined scalar function.
`Table.minus`(right)	Minus of two `Table` with duplicate records removed.
`Table.minus_all`(right)	Minus of two `Table`, keeping duplicate records.
`Table.offset`(offset)	Limits a (possibly sorted) result from an offset position.
`Table.order_by`(*fields)	Sorts the given `Table`.
`Table.over_window`(*over_windows)	Defines over-windows on the records of a table.
`Table.print_schema`()	Prints the schema of this table to the console in a tree format.
`Table.rename_columns`(*fields)	Renames existing columns.
`Table.right_outer_join`(right, join_predicate)	Joins two `Table` objects, similar to a SQL right outer join.
`Table.select`(*fields)	Performs a selection operation.
`Table.to_pandas`()	Converts the table to a pandas DataFrame.
`Table.union`(right)	Unions two `Table` with duplicate records removed.
`Table.union_all`(right)	Unions two `Table`, keeping duplicate records.
`Table.where`(predicate)	Filters out elements that don't pass the filter predicate.
`Table.window`(window)	Defines group window on the records of a table.