โ๏ธ It is a SQL function in pyspark to ๐๐ฑ๐๐๐ฎ๐ญ๐ ๐๐๐-๐ฅ๐ข๐ค๐ ๐๐ฑ๐ฉ๐ซ๐๐ฌ๐ฌ๐ข๐จ๐ง๐ฌ.
๐ต ๐๐ฒ๐ง๐ญ๐๐ฑ:- ๐๐ฑ๐ฉ๐ซ(๐ฌ๐ญ๐ซ)
โ๏ธ It will take SQL expression as a ๐ฌ๐ญ๐ซ๐ข๐ง๐ ๐๐ซ๐ ๐ฎ๐ฆ๐๐ง๐ญ and performs the operations within the expression.
โ๏ธ It allows using SQL-like functions that are not present in PySpark Column type & pyspark.sql.functions API. Ex:- ๐๐๐๐ ๐๐๐๐
โ๏ธ We are allowed to use ๐๐๐ญ๐๐ ๐ซ๐๐ฆ๐ ๐๐จ๐ฅ๐ฎ๐ฆ๐ง๐ฌ in the expression.
Follow for more:- linkedin.com/in/kishanyadav Notebook Link:- github.com/kishanpython/PySpark-Notebooks/b..
# install pyspark
!pip install pyspark
# importing neccessary libs
from pyspark.sql import SparkSession
from pyspark.sql.functions import expr
# creating session
spark = SparkSession.builder.appName("practice").getOrCreate()
# create dataframe
data = [("Prashant","Banglore",25, 58, "2022-08-01", 1), ("Ankit","Banglore",26, 54, "2021-05-02", 2),
("Ramakant","Gurugram",24, 60, "2022-06-02", 3), ("Brijesh","Gazipur", 26, 75, "2022-07-04", 4),
("Devendra","Gurugram", 27, 62, "2022-04-03", 5), ("Ajay","Chandigarh", 25, 72, "2022-02-01", 6)]
columns= ["friends_name","location", "age", "weight", "meetup_date", "offset"]
df_friends = spark.createDataFrame(data = data, schema = columns)
df_friends.show()
# Output:-
+------------+----------+---+------+-----------+------+
|friends_name| location|age|weight|meetup_date|offset|
+------------+----------+---+------+-----------+------+
| Prashant| Banglore| 25| 58| 2022-08-01| 1|
| Ankit| Banglore| 26| 54| 2021-05-02| 2|
| Ramakant| Gurugram| 24| 60| 2022-06-02| 3|
| Brijesh| Gazipur| 26| 75| 2022-07-04| 4|
| Devendra| Gurugram| 27| 62| 2022-04-03| 5|
| Ajay|Chandigarh| 25| 72| 2022-02-01| 6|
+------------+----------+---+------+-----------+------+
# Concatenating One or More Columns
# concate friends name, age and location columns using expr()
df_concat = df_friends.withColumn("name-age-location", expr("friends_name|| '-'|| age || '-' || location"))
df_concat.show()
# Output:-
+------------+----------+---+------+-----------+------+--------------------+
|friends_name| location|age|weight|meetup_date|offset| name-age-location|
+------------+----------+---+------+-----------+------+--------------------+
| Prashant| Banglore| 25| 58| 2022-08-01| 1|Prashant-25-Banglore|
| Ankit| Banglore| 26| 54| 2021-05-02| 2| Ankit-26-Banglore|
| Ramakant| Gurugram| 24| 60| 2022-06-02| 3|Ramakant-24-Gurugram|
| Brijesh| Gazipur| 26| 75| 2022-07-04| 4| Brijesh-26-Gazipur|
| Devendra| Gurugram| 27| 62| 2022-04-03| 5|Devendra-27-Gurugram|
| Ajay|Chandigarh| 25| 72| 2022-02-01| 6| Ajay-25-Chandigarh|
+------------+----------+---+------+-----------+------+--------------------+
# Add a New Column Based on Conditions
# check if exercise needed based on weight
# if weight is more or equal to 60 -- Yes
# if weight is less than 55 -- No
# else "Enjoy"
df_condition = df_friends.withColumn("Exercise_Need", expr("CASE WHEN weight >= 60 THEN 'Yes' " + "WHEN weight < 55 THEN 'No' ELSE 'Enjoy' END"))
df_condition.show()
# Output:-
+------------+----------+---+------+-----------+------+-------------+
|friends_name| location|age|weight|meetup_date|offset|Exercise_Need|
+------------+----------+---+------+-----------+------+-------------+
| Prashant| Banglore| 25| 58| 2022-08-01| 1| Enjoy|
| Ankit| Banglore| 26| 54| 2021-05-02| 2| No|
| Ramakant| Gurugram| 24| 60| 2022-06-02| 3| Yes|
| Brijesh| Gazipur| 26| 75| 2022-07-04| 4| Yes|
| Devendra| Gurugram| 27| 62| 2022-04-03| 5| Yes|
| Ajay|Chandigarh| 25| 72| 2022-02-01| 6| Yes|
+------------+----------+---+------+-----------+------+-------------+
# Creating a new column using the existing column value inside the expression
# let increment the meetup month by number of offset
df_meetup = df_friends.withColumn("new_meetup_date", expr("add_months(meetup_date,offset)"))
df_meetup.show()
# Output:-
+------------+----------+---+------+-----------+------+---------------+
|friends_name| location|age|weight|meetup_date|offset|new_meetup_date|
+------------+----------+---+------+-----------+------+---------------+
| Prashant| Banglore| 25| 58| 2022-08-01| 1| 2022-09-01|
| Ankit| Banglore| 26| 54| 2021-05-02| 2| 2021-07-02|
| Ramakant| Gurugram| 24| 60| 2022-06-02| 3| 2022-09-02|
| Brijesh| Gazipur| 26| 75| 2022-07-04| 4| 2022-11-04|
| Devendra| Gurugram| 27| 62| 2022-04-03| 5| 2022-09-03|
| Ajay|Chandigarh| 25| 72| 2022-02-01| 6| 2022-08-01|
+------------+----------+---+------+-----------+------+---------------+
Thank You!!
Keep Learning!!!