pyspark.sql.functions.string_agg_distinct
- pyspark.sql.functions.string_agg_distinct(col, delimiter=None)
Aggregate function: returns the concatenation of distinct non-null input values, separated by the delimiter.
An alias of listagg_distinct().
New in version 4.0.0.
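- Parameters
col : Column or column name
target column to compute on.
delimiter : Column, literal string or bytes, optional
the delimiter to separate the values. The default value is None.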
- Returns
Column
the column for computed results.
Examples
Example 1: Using string_agg_distinct function
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([('a',), ('b',), (None,), ('c',), ('b',)], ['strings'])
>>> df.select(sf.string_agg_distinct('strings')).show()
+----------------------------------+
|string_agg(DISTINCT strings, NULL)|
+----------------------------------+
|                               abc|
+----------------------------------+
Example 2: Using string_agg_distinct function with a delimiter
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([('a',), ('b',), (None,), ('c',), ('b',)], ['strings'])
>>> df.select(sf.string_agg_distinct('strings', ', ')).show()
+--------------------------------+
|string_agg(DISTINCT strings, , )|
+--------------------------------+
|                         a, b, c|
+--------------------------------+
Example 3: Using string_agg_distinct function with a binary column and delimiter
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(b'\x01',), (b'\x02',), (None,), (b'\x03',), (b'\x02',)],
...                            ['bytes'])
>>> df.select(sf.string_agg_distinct('bytes', b'\x42')).show()
+---------------------------------+
|string_agg(DISTINCT bytes, X'42')|
+---------------------------------+
|                 [01 42 02 42 03]|
+---------------------------------+
Example 4: Using string_agg_distinct function on a column with all None values
>>> from pyspark.sql import functions as sf
>>> from pyspark.sql.types import StructType, StructField, StringType
>>> schema = StructType([StructField("strings", StringType(), True)])
>>> df = spark.createDataFrame([(None,), (None,), (None,), (None,)], schema=schema)
>>> df.select(sf.string_agg_distinct('strings')).show()
+----------------------------------+
|string_agg(DISTINCT strings, NULL)|
+----------------------------------+
|                              NULL|
+----------------------------------+
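The behavior shown in the examples above (nulls dropped, duplicates removed, values joined by the delimiter, and a null result when no non-null value exists) can be modeled in plain Python. The following is only an illustrative sketch, not PySpark code: the name string_agg_distinct_model is hypothetical, and keeping values in first-seen order is an assumption of the sketch rather than a guarantee stated on this page.

```python
from typing import Iterable, Optional, Union

Value = Union[str, bytes]


def string_agg_distinct_model(
    values: Iterable[Optional[Value]],
    delimiter: Optional[Value] = None,
) -> Optional[Value]:
    """Hypothetical plain-Python model of the aggregation; not part of PySpark."""
    # Drop nulls and duplicates; first-seen order is an assumption of this sketch.
    distinct = list(dict.fromkeys(v for v in values if v is not None))
    if not distinct:
        # No non-null input at all: the result is null, as in Example 4.
        return None
    if delimiter is None:
        # Treat a missing delimiter as an empty one of the matching type.
        delimiter = b"" if isinstance(distinct[0], bytes) else ""
    return delimiter.join(distinct)


print(string_agg_distinct_model(["a", "b", None, "c", "b"]))        # abc
print(string_agg_distinct_model(["a", "b", None, "c", "b"], ", "))  # a, b, c
print(string_agg_distinct_model([None, None, None, None]))          # None
```

The three calls mirror Examples 1, 2, and 4. Passing bytes values with a bytes delimiter, as in Example 3, follows the same path through the sketch.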