pyspark.sql.functions.window_time

pyspark.sql.functions.window_time(windowColumn)

Computes the event time from a window column. Window column values are produced by window aggregating operators and are of type STRUCT<start: TIMESTAMP, end: TIMESTAMP>, where start is inclusive and end is exclusive. The event time of a record produced by a window aggregating operator can be computed as window_time(window) and equals window.end - lit(1).alias("microsecond"), that is, the window end minus one microsecond (a microsecond being the minimal supported event time precision). The window column must be one produced by a window aggregating operator.
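The arithmetic described above can be sketched in plain Python, without Spark. This is only an illustration of the "end minus one microsecond" rule; the helper name window_event_time and the constant ONE_MICROSECOND are hypothetical and are not part of the PySpark API.

```python
from datetime import datetime, timedelta

# Smallest event-time step, per the description above (assumed name).
ONE_MICROSECOND = timedelta(microseconds=1)


def window_event_time(start: datetime, end: datetime) -> datetime:
    """Hypothetical helper: mimic window_time for a single (start, end) window."""
    if end <= start:
        raise ValueError("window end must be after window start")
    # The end bound is exclusive, so step back by one microsecond.
    return end - ONE_MICROSECOND


start = datetime(2016, 3, 11, 9, 0, 5)
end = datetime(2016, 3, 11, 9, 0, 10)
event_time = window_event_time(start, end)

# The result is the last representable instant inside [start, end).
assert start <= event_time < end
assert event_time + ONE_MICROSECOND == end
print(event_time)  # 2016-03-11 09:00:09.999999
```

Because the end bound is exclusive, subtracting one microsecond gives a timestamp that still belongs to the window it was derived from.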

New in version 3.4.0.

Parameters

windowColumn: Column or column name. The window column of a window aggregate record.

Returns

Column: the column holding the computed event time of each window.

See also

pyspark.sql.functions.window()
pyspark.sql.functions.session_window()

Examples

>>> import datetime
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(datetime.datetime(2016, 3, 11, 9, 0, 7), 1)], ['dt', 'v'])

Group the data into 5 second time windows and aggregate as sum.

>>> df2 = df.groupBy(sf.window('dt', '5 seconds')).agg(sf.sum('v'))

Extract the window event time using the window_time function.

>>> df2.select('*', sf.window_time('window')).show(truncate=False)
+------------------------------------------+------+--------------------------+
|window                                    |sum(v)|window_time(window)       |
+------------------------------------------+------+--------------------------+
|{2016-03-11 09:00:05, 2016-03-11 09:00:10}|1     |2016-03-11 09:00:09.999999|
+------------------------------------------+------+--------------------------+
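The row above can be checked by hand. The following plain-Python sketch assumes tumbling windows aligned to 1970-01-01 00:00:00 with no start offset, and it ignores time zones by using naive datetimes; the helper tumbling_window is hypothetical and only approximates how sf.window assigns a timestamp to a window.

```python
from datetime import datetime, timedelta

# Assumed alignment origin for tumbling windows (no startTime offset).
EPOCH = datetime(1970, 1, 1)


def tumbling_window(ts: datetime, width: timedelta):
    """Hypothetical helper: return the [start, end) window of size `width` containing ts."""
    # timedelta % timedelta yields how far ts lies past the window start.
    offset = (ts - EPOCH) % width
    window_start = ts - offset
    return window_start, window_start + width


dt = datetime(2016, 3, 11, 9, 0, 7)
window_start, window_end = tumbling_window(dt, timedelta(seconds=5))

# Event time of the window: exclusive end minus one microsecond.
window_event = window_end - timedelta(microseconds=1)

print(window_start)  # 2016-03-11 09:00:05
print(window_end)    # 2016-03-11 09:00:10
print(window_event)  # 2016-03-11 09:00:09.999999
```

The three printed values match the window start, the window end, and the window_time(window) column in the output above.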