pyspark.sql.functions.window_time
- pyspark.sql.functions.window_time(windowColumn)
Computes the event time from a window column. Window column values are produced by window aggregating operators and are of type STRUCT<start: TIMESTAMP, end: TIMESTAMP>, where start is inclusive and end is exclusive. The event time of records produced by window aggregating operators can be computed as window_time(window) and equals window.end - lit(1).alias("microsecond"), that is, one microsecond before the window end (microsecond is the minimal supported event time precision). The window column must be one produced by a window aggregating operator.
New in version 3.4.0.
- Parameters
- windowColumn : Column or column name
The window column of the window aggregate records.
- Returns
Column
The column of computed event times.
Examples
>>> import datetime
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(datetime.datetime(2016, 3, 11, 9, 0, 7), 1)], ['dt', 'v'])
Group the data into 5-second time windows and aggregate by sum.
>>> df2 = df.groupBy(sf.window('dt', '5 seconds')).agg(sf.sum('v'))
Extract the window event time using the window_time function.
>>> df2.select('*', sf.window_time('window')).show(truncate=False)
+------------------------------------------+------+--------------------------+
|window                                    |sum(v)|window_time(window)       |
+------------------------------------------+------+--------------------------+
|{2016-03-11 09:00:05, 2016-03-11 09:00:10}|1     |2016-03-11 09:00:09.999999|
+------------------------------------------+------+--------------------------+
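The arithmetic behind the result above can be reproduced without Spark. The following is a minimal plain-Python sketch, not Spark's implementation: the helper names `window_bounds` and `window_event_time` are hypothetical, and the sketch assumes tumbling windows aligned to the Unix epoch, ignoring time zones, slide durations, and start offsets.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)


def window_bounds(ts, width_seconds):
    """Return (start, end) of the tumbling window containing ts.

    Assumption: windows are aligned to the Unix epoch and ts is a naive
    datetime. start is inclusive, end is exclusive.
    """
    width = datetime.timedelta(seconds=width_seconds)
    offset = (ts - EPOCH) % width  # how far ts sits past the window start
    start = ts - offset
    return start, start + width


def window_event_time(end):
    """Event time of a window: one microsecond before its exclusive end."""
    return end - datetime.timedelta(microseconds=1)


# Same input as the example above: 09:00:07 falls into the 5-second
# window [09:00:05, 09:00:10).
start, end = window_bounds(datetime.datetime(2016, 3, 11, 9, 0, 7), 5)
event_time = window_event_time(end)

print(start)       # 2016-03-11 09:00:05
print(end)         # 2016-03-11 09:00:10
print(event_time)  # 2016-03-11 09:00:09.999999
```

Because end is exclusive, subtracting one microsecond yields the latest timestamp that still belongs to the window, which is why the event time is 09:00:09.999999 rather than 09:00:10.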