Table#
A Table object is the core abstraction of the Table API.
Similar to how the DataStream API has DataStream,
the Table API is built around Table.
A Table object describes a pipeline of data transformations. It does not
contain the data itself in any way. Instead, it describes how to read data from a table source,
and how to eventually write data to a table sink. The declared pipeline can be
printed, optimized, and eventually executed in a cluster. The pipeline can work with bounded or
unbounded streams which enables both streaming and batch scenarios.
By the definition above, a Table object can actually be considered as
a view in SQL terms.
The initial Table object is constructed by a
TableEnvironment. For example,
from_path() obtains a table from a catalog.
Every Table object has a schema that is available through
get_schema(). A Table object is
always associated with its original table environment during programming.
Every transformation (i.e. select()} or
filter() on a Table object leads to a new
Table object.
Use execute() to execute the pipeline and retrieve the transformed
data locally during development. Otherwise, use execute_insert() to
write the data into a table sink.
Many methods of this class take one or more Expression as parameters.
For fluent definition of expressions and easier readability, we recommend to add a star import:
Example:
>>> from pyflink.table.expressions import *
Check the documentation for more programming language specific APIs.
The following example shows how to work with a Table object.
Example:
>>> from pyflink.table import EnvironmentSettings, TableEnvironment
>>> from pyflink.table.expressions import *
>>> env_settings = EnvironmentSettings.in_streaming_mode()
>>> t_env = TableEnvironment.create(env_settings)
>>> table = t_env.from_path("my_table").select(col("colA").trim(), col("colB") + 12)
>>> table.execute().print()
|
Adds additional columns. |
|
Adds additional columns. |
|
Performs a global aggregate operation with an aggregate function. |
|
Renames the fields of the expression result. |
Removes duplicate values and returns only distinct (different) values. |
|
|
Drops existing columns. |
|
Drops existing columns. |
Collects the contents of the current table local client. |
|
|
|
|
Returns the AST of this table and the execution plan. |
|
Limits a (possibly sorted) result to the first n rows. |
|
Filters out elements that don't pass the filter predicate. |
|
Perform a global flat_aggregate without group_by. |
|
Performs a flatMap operation with a user-defined table function. |
|
Joins two |
Returns the |
|
|
Groups the elements on some grouping keys. |
|
Intersects two |
|
Intersects two |
|
Joins two |
|
Joins this Table with an user-defined TableFunction. |
|
Joins two |
|
Joins this Table with an user-defined TableFunction. |
|
Limits a (possibly sorted) result to the first n rows. |
|
Performs a map operation with a user-defined scalar function. |
|
Minus of two |
|
Minus of two |
|
Limits a (possibly sorted) result from an offset position. |
|
Sorts the given |
|
Defines over-windows on the records of a table. |
Prints the schema of this table to the console in a tree format. |
|
|
Renames existing columns. |
|
Joins two |
|
Performs a selection operation. |
Converts the table to a pandas DataFrame. |
|
|
Unions two |
|
Unions two |
|
Filters out elements that don't pass the filter predicate. |
|
Defines group window on the records of a table. |