HintsToday
Hints and Answers for Everything
recent posts
- Memory Management in PySpark- CPU Cores, executors, executor memory
- Memory Management in PySpark- Scenario 1, 2
- Develop and maintain CI/CD pipelines using GitHub for automated deployment, version control
- Complete guide to building and managing data workflows in Azure Data Factory (ADF)
- Complete guide to architecting and implementing data governance using Unity Catalog on Databricks
about
Author: lochan2014
Here’s an enhanced Spark SQL cheatsheet with additional details, covering join types, union types, and set operations like EXCEPT and INTERSECT, along with options for table management (DDL operations like UPDATE, INSERT, DELETE, etc.). This comprehensive sheet is designed to help with quick Spark SQL reference. Category Concept Syntax / Example Description Basic Statements SELECT SELECT col1, col2 FROM table WHERE…
Comparative overview of partitions, bucketing, segmentation, and broadcasting in PySpark, Spark SQL, and Hive QL in tabular form, along with examples Here’s a comparative overview of partitions, bucketing, segmentation, and broadcasting in PySpark, Spark SQL, and Hive QL in tabular form, along with examples: Concept PySpark Spark SQL Hive QL Partitions df.repartition(numPartitions, “column”) creates partitions based on specified column. CREATE TABLE table_name PARTITIONED BY (col1 STRING) allows data to be organized by partition. ALTER TABLE…
PIVOT Clause in Spark sql or Mysql or Oracle Pl sql or Hive QL The PIVOT clause is a powerful tool in SQL that allows you to rotate rows into columns, making it easier to analyze and report data. Here’s how to use the PIVOT clause in Spark SQL, MySQL, Oracle PL/SQL, and Hive QL:…