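The explode function in PySpark (pyspark.sql.functions.explode) turns each element of an array column, or each entry of a map column, into its own row, repeating the values of the other columns for every new row. It is commonly used to flatten nested data, such as JSON arrays or other array columns stored inside a DataFrame.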
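1.) explode converts each array element or map entry into a separate row.
2.) If the array (or map) is empty or null, explode removes that row from the output.
3.) When applied to a map, explode produces one row per key-value pair, with two columns named key and value by default.
4.) To keep rows whose array is empty or null, use explode_outer, which returns a single row with a null value instead. posexplode does not keep them: it adds a pos column with each element's position but, like explode, drops empty arrays. posexplode_outer combines both behaviours.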
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, explode
spark = SparkSession.builder.appName("sangam_test_explode").getOrCreate()
# Sample data: id 3 has an empty fruits array
data = [
    (1, ["apple", "banana", "cherry"]),
    (2, ["grape", "orange"]),
    (3, [])
]
df = spark.createDataFrame(data, ["id", "fruits"])
df.show(truncate=False)
# Replace the fruits array with one row per fruit;
# id 3 disappears from the result because its array is empty
df_explode = df.withColumn("fruits", explode(col("fruits")))
df_explode.show(truncate=False)