This post is a collection of handy tricks and snippets. Passing Parameters in Automation of Scripts using Python: Python provides several ways to pass parameters when automating scripts, mimicking SAS macro variables, macro modules, and macro scripting. Here are some…
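One way to pass parameters into a script is the standard-library `argparse` module, where each option plays a role loosely similar to a macro variable. This is a minimal sketch, not the post's own method: the option names (`--run-date`, `--env`) and the helpers `build_parser` and `main` are hypothetical and chosen only for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose options act like macro variables for the job."""
    parser = argparse.ArgumentParser(description="Parameterised job (illustrative)")
    # Hypothetical parameters; rename them to match your own script.
    parser.add_argument("--run-date", required=True, help="Processing date, e.g. 2024-01-15")
    parser.add_argument("--env", default="dev", choices=["dev", "test", "prod"])
    return parser


def main(argv=None) -> dict:
    """Parse the parameters and return them as a plain dict for the rest of the job."""
    args = build_parser().parse_args(argv)
    return {"run_date": args.run_date, "env": args.env}


# Passing an explicit list makes the sketch easy to try without a shell;
# with argv=None, argparse reads the real command line instead.
params = main(["--run-date", "2024-01-15", "--env", "prod"])
```

From a shell, the equivalent call would look like `python job.py --run-date 2024-01-15 --env prod`, where `job.py` is a placeholder file name.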
SQL
Useful Code Snippets in Python and Pyspark
Given a name of the form dbname.table_name, we want to save dbname and table_name in separate variables and then pass them as parameters in a PySpark/Python script.
# String containing dbname and table_name
full_table_name = "my_database.my_table"
# Split into dbname and table_name
dbname,…
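A small helper can make the split reusable and reject malformed names. This is a sketch under assumptions: the function name `split_table_name` is hypothetical, and it uses `str.partition`, which may differ from the approach in the full post.

```python
from typing import Tuple


def split_table_name(full_table_name: str) -> Tuple[str, str]:
    """Split 'dbname.table_name' into (dbname, table_name)."""
    # partition splits on the first dot only and never raises on its own.
    dbname, sep, table_name = full_table_name.partition(".")
    if not sep or not dbname or not table_name:
        raise ValueError(f"Expected 'dbname.table_name', got {full_table_name!r}")
    return dbname, table_name


dbname, table_name = split_table_name("my_database.my_table")

# The two parts can now be passed on as separate parameters,
# for example when building a query string.
query = f"SELECT * FROM {dbname}.{table_name}"
```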
What is indexing in SQL- Syntax, Types, Uses, Advantages, Disadvantages, and Scenarios
What is Indexing?
Indexing is a data structure technique that allows the database to quickly locate and access specific data. It’s similar to the index at the back of a book, which helps you find specific pages quickly.
How Indexing Works
Index Creation: The…
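The effect of an index can be observed by comparing query plans before and after creating one. The sketch below uses Python's built-in `sqlite3` module as a stand-in database so it runs anywhere; the table `employees`, the index `idx_employees_dept`, and the helper `query_plan` are hypothetical names, and the exact plan text varies by SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employees (name, dept) VALUES (?, ?)",
    [(f"emp_{i}", f"dept_{i % 10}") for i in range(1000)],
)


def query_plan(sql: str) -> str:
    """Return SQLite's plan description for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


lookup = "SELECT name FROM employees WHERE dept = 'dept_3'"

plan_before = query_plan(lookup)  # without an index: a scan of the whole table
conn.execute("CREATE INDEX idx_employees_dept ON employees (dept)")
plan_after = query_plan(lookup)   # with the index: a search that names the index
```

Printing `plan_before` and `plan_after` shows the planner switching from scanning every row to looking rows up through the index, which is the book-index analogy in practice.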
Spark SQL- operators Cheatsheet- Explanation with Usecases
Spark SQL Operators Cheatsheet
1. Arithmetic Operators
| Operator | Syntax | Description | Example |
|---|---|---|---|
| + | a + b | Adds two values | SELECT 5 + 3; |
| - | a - b | Subtracts one value from another | SELECT 5 - 3; |
| * | a * b | Multiplies two values | SELECT 5 * 3; |
| / | a / b | Divides one value by another | SELECT 6 / 2; |
| % | a %… |
How to Write Perfect Pseudocode- Syntax , Standards, Terms
Syntax Rules for Pseudocode
Natural Language: Use simple and clear natural language to describe steps.
Keywords: Use standard control flow keywords such as: IF, ELSE, ENDIF; FOR, WHILE, ENDWHILE; FUNCTION, CALL; INPUT, OUTPUT.
Indentation: Indent blocks within loops or…
Date and Time Functions- Pyspark Dataframes & Pyspark Sql Queries
A quick reference for date manipulation in PySpark:
| Function | Description | Works On | Example (Spark SQL) | Example (DataFrame API) |
|---|---|---|---|---|
| to_date | Converts string to date. | String | TO_DATE('2024-01-15', 'yyyy-MM-dd') | to_date(col("date_str"), "yyyy-MM-dd") |
| to_timestamp | Converts string to… |
Window functions in PySpark on Dataframe programming
Window functions in PySpark allow you to perform operations on a subset of your data using a “window” that defines a range of rows. These functions are similar to SQL window functions and are useful for tasks like ranking, cumulative sums, and moving…
Spark SQL windows Function and Best Usecases
For a better understanding of Spark SQL window functions and their best use cases, refer to our post "Window functions in Oracle Pl/Sql and Hive explained and compared with examples". Window functions in Spark SQL are powerful tools that allow you to perform calculations across a…
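The `OVER (PARTITION BY ... ORDER BY ...)` clause used for window functions is standard SQL. The sketch below runs it through Python's built-in `sqlite3` module as a stand-in, so no Spark cluster is needed. It assumes SQLite 3.25 or newer, and the table `sales` and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (dept TEXT, employee TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("A", "ann", 300),
        ("A", "bob", 200),
        ("A", "cid", 100),
        ("B", "dee", 500),
        ("B", "eve", 400),
    ],
)

# Rank employees within each department and keep a running total,
# without collapsing the rows the way GROUP BY would.
rows = conn.execute(
    """
    SELECT dept, employee, amount,
           RANK() OVER (PARTITION BY dept ORDER BY amount DESC) AS rnk,
           SUM(amount) OVER (
               PARTITION BY dept ORDER BY amount DESC
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_total
    FROM sales
    ORDER BY dept, rnk
    """
).fetchall()
```

Each input row appears once in `rows`, with its rank and running total computed over the other rows in the same department.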
PySpark architecture cheat sheet- How to Know Which parts of your PySpark ETL script are executed on the driver, master (YARN), or executors
PySpark Architecture Cheat Sheet
1. Core Components of PySpark
| Component | Description | Key Features |
|---|---|---|
| Spark Core | The foundational Spark component for scheduling, memory management, and fault tolerance. | Task scheduling, data partitioning, RDD APIs. |
| Spark SQL | Enables interaction… |
Scientists find a ‘Unique’ Black Hole that is hungrier than ever in the Universe
Yup! Scientists find a ‘Unique’ Black Hole that is hungrier than ever in the Universe! Scientists have observed a fascinating phenomenon involving a supermassive black hole, AT2022dsb, which appears to be devouring a star in a “tidal disruption event”…
Quick Spark SQL reference- Spark SQL cheatsheet for Revising in One Go
Here’s an enhanced Spark SQL cheatsheet with additional details, covering join types, union types, and set operations like EXCEPT and INTERSECT, along with options for table management (DML operations like UPDATE, INSERT, DELETE, etc.). This comprehensive sheet…
Functions in Spark SQL- Cheatsheets, Complex Examples
Here’s a categorized Spark SQL function reference, which organizes common Spark SQL functions by functionality. This can help with selecting the right function based on the operation you want to perform.
1. Aggregate Functions
| Function | Description | Example |
|---|---|---|
| avg() | Calculates… |