PySpark Date Function Cheat Sheet (with Input-Output Types & Examples)

This one-pager covers the core PySpark date and timestamp functions, with the input and output types and an example for each. It is aimed at data engineers and interview prep.


🔄 Date Conversion & Parsing

| Function | Input | Output | Example |
|---|---|---|---|
| to_date(col, fmt) | String | Date | to_date('2025-06-14', 'yyyy-MM-dd') → 2025-06-14 |
| to_timestamp(col, fmt) | String | Timestamp | to_timestamp('2025-06-14 12:01', 'yyyy-MM-dd HH:mm') |
| unix_timestamp(col, fmt) | String | Long (seconds since epoch) | unix_timestamp('2025-06-14', 'yyyy-MM-dd') |
| from_unixtime(col) | Long | String (formatted time) | from_unixtime(1718342400) |

🕒 Date Extraction

| Function | Input | Output | Example |
|---|---|---|---|
| year(col) | Date/Timestamp | Int | year('2025-06-14') → 2025 |
| month(col) | Date/Timestamp | Int | month('2025-06-14') → 6 |
| dayofmonth(col) | Date/Timestamp | Int | dayofmonth('2025-06-14') → 14 |
| dayofweek(col) | Date/Timestamp | Int (1=Sun, 7=Sat) | dayofweek('2025-06-14') → 7 |
| dayofyear(col) | Date/Timestamp | Int | dayofyear('2025-06-14') → 165 |
| weekofyear(col) | Date/Timestamp | Int | weekofyear('2025-06-14') → 24 |
| quarter(col) | Date/Timestamp | Int | quarter('2025-06-14') → 2 |
| hour(col) | Timestamp | Int | hour('2025-06-14 09:30:00') → 9 |
| minute(col) | Timestamp | Int | minute('2025-06-14 09:30:00') → 30 |
| second(col) | Timestamp | Int | second('2025-06-14 09:30:25') → 25 |

➕ Date Arithmetic

| Function | Input | Output | Example |
|---|---|---|---|
| date_add(date, days) | Date | Date | date_add('2025-06-14', 10) → 2025-06-24 |
| date_sub(date, days) | Date | Date | date_sub('2025-06-14', 7) → 2025-06-07 |
| add_months(date, n) | Date | Date | add_months('2025-06-14', -1) → 2025-05-14 |
| months_between(date1, date2) | Dates | Double | months_between('2025-06-14', '2025-05-14') → 1.0 |
| datediff(end, start) | Dates | Int | datediff('2025-06-14', '2025-06-01') → 13 |
| next_day(date, 'day') | Date | Date | next_day('2025-06-14', 'Sunday') → 2025-06-15 |

⚖️ Truncation & Formatting

| Function | Input | Output | Example |
|---|---|---|---|
| trunc(date, 'MM' or 'YYYY') | Date | Date (truncated) | trunc('2025-06-14', 'MM') → 2025-06-01 |
| date_trunc('unit', ts) | Timestamp | Timestamp | date_trunc('hour', '2025-06-14 12:34:56') → 2025-06-14 12:00:00 |
| last_day(date) | Date | Date | last_day('2025-06-14') → 2025-06-30 |
| date_format(date, fmt) | Date/Timestamp | String | date_format('2025-06-14', 'MMM-yyyy') → 'Jun-2025' |

❓ Miscellaneous

| Function | Input | Output | Example |
|---|---|---|---|
| current_date() | None | Date | Returns today (e.g., 2025-06-14) |
| current_timestamp() | None | Timestamp | Returns now (e.g., 2025-06-14 12:34:56) |
| now() (alias) | None | Timestamp | Same as current_timestamp() |
| from_utc_timestamp(ts, tz) | Timestamp, TZ | Timestamp | from_utc_timestamp('2025-06-14 12:00', 'Asia/Kolkata') |
| to_utc_timestamp(ts, tz) | Timestamp, TZ | Timestamp | to_utc_timestamp('2025-06-14 17:30', 'Asia/Kolkata') |

⚠️ Notes

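  • Most functions expect DateType or TimestampType input. Spark implicitly casts ISO-formatted strings (yyyy-MM-dd), but strings in any other format must be parsed explicitly.
  • Use to_date() / to_timestamp() with an explicit format to convert string columns before applying date functions.
  • In the DataFrame API a bare Python string is read as a column name, so wrap literal values with lit(), e.g. to_date(lit("2025-06-14")). The table examples use bare literals only for brevity.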

Practical Use Case: Get First and Last Day of Previous Month

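from pyspark.sql import functions as F

# Step back one month from today, then snap to the last day of that month
prev_month_last = F.last_day(F.add_months(F.current_date(), -1))
# Truncate that date to its month to get the first day
prev_month_first = F.trunc(prev_month_last, 'MM')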
