Optimization in PySpark