Table#

A Table object is the core abstraction of the Table API. Similar to how the DataStream API has DataStream, the Table API is built around Table.

A Table object describes a pipeline of data transformations. It does not contain the data itself in any way. Instead, it describes how to read data from a table source, and how to eventually write data to a table sink. The declared pipeline can be printed, optimized, and eventually executed in a cluster. The pipeline can work with bounded or unbounded streams which enables both streaming and batch scenarios.

By the definition above, a Table object can actually be considered as a view in SQL terms.

The initial Table object is constructed by a TableEnvironment. For example, from_path() obtains a table from a catalog. Every Table object has a schema that is available through get_schema(). A Table object is always associated with its original table environment during programming.

Every transformation (i.e. select()} or filter() on a Table object leads to a new Table object.

Use execute() to execute the pipeline and retrieve the transformed data locally during development. Otherwise, use execute_insert() to write the data into a table sink.

Many methods of this class take one or more Expression as parameters. For fluent definition of expressions and easier readability, we recommend to add a star import:

Example:

>>> from pyflink.table.expressions import *

Check the documentation for more programming language specific APIs.

The following example shows how to work with a Table object.

Example:

>>> from pyflink.table import EnvironmentSettings, TableEnvironment
>>> from pyflink.table.expressions import *
>>> env_settings = EnvironmentSettings.in_streaming_mode()
>>> t_env = TableEnvironment.create(env_settings)
>>> table = t_env.from_path("my_table").select(col("colA").trim(), col("colB") + 12)
>>> table.execute().print()

Table.add_columns(*fields)

Adds additional columns.

Table.add_or_replace_columns(*fields)

Adds additional columns.

Table.aggregate(func)

Performs a global aggregate operation with an aggregate function.

Table.alias(field, *fields)

Renames the fields of the expression result.

Table.distinct()

Removes duplicate values and returns only distinct (different) values.

Table.drop_columns(*fields)

Drops existing columns.

Table.drop_columns(*fields)

Drops existing columns.

Table.execute()

Collects the contents of the current table local client.

Table.execute_insert(table_path_or_descriptor)

  1. When target_path_or_descriptor is a tale path:

Table.explain(*extra_details)

Returns the AST of this table and the execution plan.

Table.fetch(fetch)

Limits a (possibly sorted) result to the first n rows.

Table.filter(predicate)

Filters out elements that don't pass the filter predicate.

Table.flat_aggregate(func)

Perform a global flat_aggregate without group_by.

Table.flat_map(func)

Performs a flatMap operation with a user-defined table function.

Table.full_outer_join(right, join_predicate)

Joins two Table.

Table.get_schema()

Returns the TableSchema of this table.

Table.group_by(*fields)

Groups the elements on some grouping keys.

Table.intersect(right)

Intersects two Table with duplicate records removed.

Table.intersect_all(right)

Intersects two Table.

Table.join(right[, join_predicate])

Joins two Table.

Table.join_lateral(table_function_call[, ...])

Joins this Table with an user-defined TableFunction.

Table.left_outer_join(right[, join_predicate])

Joins two Table.

Table.left_outer_join_lateral(...[, ...])

Joins this Table with an user-defined TableFunction.

Table.limit(fetch[, offset])

Limits a (possibly sorted) result to the first n rows.

Table.map(func)

Performs a map operation with a user-defined scalar function.

Table.minus(right)

Minus of two Table with duplicate records removed.

Table.minus_all(right)

Minus of two Table.

Table.offset(offset)

Limits a (possibly sorted) result from an offset position.

Table.order_by(*fields)

Sorts the given Table.

Table.over_window(*over_windows)

Defines over-windows on the records of a table.

Table.print_schema()

Prints the schema of this table to the console in a tree format.

Table.rename_columns(*fields)

Renames existing columns.

Table.right_outer_join(right, join_predicate)

Joins two Table.

Table.select(*fields)

Performs a selection operation.

Table.to_pandas()

Converts the table to a pandas DataFrame.

Table.union(right)

Unions two Table with duplicate records removed.

Table.union_all(right)

Unions two Table.

Table.where(predicate)

Filters out elements that don't pass the filter predicate.

Table.window(window)

Defines group window on the records of a table.