Spark SQL Join Types- Syntax examples, Comparision

by | Jun 26, 2024 | SQL | 0 comments

Spark SQL supports several types of joins, each suited to different use cases. Below is a detailed explanation of each join type, including syntax examples and comparisons.

Types of Joins in Spark SQL

  1. Inner Join
  2. Left (Outer) Join
  3. Right (Outer) Join
  4. Full (Outer) Join
  5. Left Semi Join
  6. Left Anti Join
  7. Cross Join

1. Inner Join

An inner join returns only the rows that have matching values in both tables.

Syntax:

SELECT a.*, b.*
FROM tableA a
INNER JOIN tableB b
ON a.id = b.id;

Example:

SELECT employees.emp_id, employees.emp_name, departments.dept_name
FROM employees
INNER JOIN departments
ON employees.dept_id = departments.dept_id;

2. Left (Outer) Join

A left join returns all rows from the left table and the matched rows from the right table. If no match is found, NULLs are returned for columns from the right table.

Syntax:

SELECT a.*, b.*
FROM tableA a
LEFT JOIN tableB b
ON a.id = b.id;

Example:

SELECT employees.emp_id, employees.emp_name, departments.dept_name
FROM employees
LEFT JOIN departments
ON employees.dept_id = departments.dept_id;

3. Right (Outer) Join

A right join returns all rows from the right table and the matched rows from the left table. If no match is found, NULLs are returned for columns from the left table.

Syntax:

SELECT a.*, b.*
FROM tableA a
RIGHT JOIN tableB b
ON a.id = b.id;

Example:

SELECT employees.emp_id, employees.emp_name, departments.dept_name
FROM employees
RIGHT JOIN departments
ON employees.dept_id = departments.dept_id;

4. Full (Outer) Join

A full outer join returns all rows when there is a match in either left or right table. Rows without a match in one of the tables will have NULLs in the columns of the non-matching table.

Syntax:

SELECT a.*, b.*
FROM tableA a
FULL OUTER JOIN tableB b
ON a.id = b.id;

Example:

SELECT employees.emp_id, employees.emp_name, departments.dept_name
FROM employees
FULL OUTER JOIN departments
ON employees.dept_id = departments.dept_id;

5. Left Semi Join

A left semi join returns only the rows from the left table for which there is a match in the right table. It is equivalent to using an IN clause.

Syntax:

SELECT a.*
FROM tableA a
LEFT SEMI JOIN tableB b
ON a.id = b.id;

Example:

SELECT employees.emp_id, employees.emp_name
FROM employees
LEFT SEMI JOIN departments
ON employees.dept_id = departments.dept_id;

6. Left Anti Join

A left anti join returns only the rows from the left table for which there is no match in the right table. It is equivalent to using a NOT IN clause.

Syntax:

SELECT a.*
FROM tableA a
LEFT ANTI JOIN tableB b
ON a.id = b.id;

Example:

SELECT employees.emp_id, employees.emp_name
FROM employees
LEFT ANTI JOIN departments
ON employees.dept_id = departments.dept_id;

7. Cross Join

A cross join returns the Cartesian product of the two tables, meaning every row from the left table is joined with every row from the right table.

Syntax:

SELECT a.*, b.*
FROM tableA a
CROSS JOIN tableB b;

Example:

SELECT employees.emp_id, employees.emp_name, departments.dept_name
FROM employees
CROSS JOIN departments;

Comparison

Join TypeReturns Rows from Left TableReturns Rows from Right TableReturns Matched RowsReturns Unmatched Rows with NULLs
Inner JoinYesYesYesNo
Left JoinYesNoYesYes (for left table)
Right JoinNoYesYesYes (for right table)
Full Outer JoinYesYesYesYes (for both tables)
Left Semi JoinYesNoYesNo
Left Anti JoinYesNoNoYes (only for non-matching rows)
Cross JoinYesYesN/AN/A

Examples From Official Doc

-- Use employee and department tables to demonstrate different type of joins.
SELECT * FROM employee;
+---+-----+------+
| id| name|deptno|
+---+-----+------+
|105|Chloe|     5|
|103| Paul|     3|
|101| John|     1|
|102| Lisa|     2|
|104| Evan|     4|
|106|  Amy|     6|
+---+-----+------+

SELECT * FROM department;
+------+-----------+
|deptno|   deptname|
+------+-----------+
|     3|Engineering|
|     2|      Sales|
|     1|  Marketing|
+------+-----------+

-- Use employee and department tables to demonstrate inner join.
SELECT id, name, employee.deptno, deptname
    FROM employee INNER JOIN department ON employee.deptno = department.deptno;
+---+-----+------+-----------|
| id| name|deptno|   deptname|
+---+-----+------+-----------|
|103| Paul|     3|Engineering|
|101| John|     1|  Marketing|
|102| Lisa|     2|      Sales|
+---+-----+------+-----------|

-- Use employee and department tables to demonstrate left join.
SELECT id, name, employee.deptno, deptname
    FROM employee LEFT JOIN department ON employee.deptno = department.deptno;
+---+-----+------+-----------|
| id| name|deptno|   deptname|
+---+-----+------+-----------|
|105|Chloe|     5|       NULL|
|103| Paul|     3|Engineering|
|101| John|     1|  Marketing|
|102| Lisa|     2|      Sales|
|104| Evan|     4|       NULL|
|106|  Amy|     6|       NULL|
+---+-----+------+-----------|

-- Use employee and department tables to demonstrate right join.
SELECT id, name, employee.deptno, deptname
    FROM employee RIGHT JOIN department ON employee.deptno = department.deptno;
+---+-----+------+-----------|
| id| name|deptno|   deptname|
+---+-----+------+-----------|
|103| Paul|     3|Engineering|
|101| John|     1|  Marketing|
|102| Lisa|     2|      Sales|
+---+-----+------+-----------|

-- Use employee and department tables to demonstrate full join.
SELECT id, name, employee.deptno, deptname
    FROM employee FULL JOIN department ON employee.deptno = department.deptno;
+---+-----+------+-----------|
| id| name|deptno|   deptname|
+---+-----+------+-----------|
|101| John|     1|  Marketing|
|106|  Amy|     6|       NULL|
|103| Paul|     3|Engineering|
|105|Chloe|     5|       NULL|
|104| Evan|     4|       NULL|
|102| Lisa|     2|      Sales|
+---+-----+------+-----------|

-- Use employee and department tables to demonstrate cross join.
SELECT id, name, employee.deptno, deptname FROM employee CROSS JOIN department;
+---+-----+------+-----------|
| id| name|deptno|   deptname|
+---+-----+------+-----------|
|105|Chloe|     5|Engineering|
|105|Chloe|     5|  Marketing|
|105|Chloe|     5|      Sales|
|103| Paul|     3|Engineering|
|103| Paul|     3|  Marketing|
|103| Paul|     3|      Sales|
|101| John|     1|Engineering|
|101| John|     1|  Marketing|
|101| John|     1|      Sales|
|102| Lisa|     2|Engineering|
|102| Lisa|     2|  Marketing|
|102| Lisa|     2|      Sales|
|104| Evan|     4|Engineering|
|104| Evan|     4|  Marketing|
|104| Evan|     4|      Sales|
|106|  Amy|     4|Engineering|
|106|  Amy|     4|  Marketing|
|106|  Amy|     4|      Sales|
+---+-----+------+-----------|

-- Use employee and department tables to demonstrate semi join.
SELECT * FROM employee SEMI JOIN department ON employee.deptno = department.deptno;
+---+-----+------+
| id| name|deptno|
+---+-----+------+
|103| Paul|     3|
|101| John|     1|
|102| Lisa|     2|
+---+-----+------+

-- Use employee and department tables to demonstrate anti join.
SELECT * FROM employee ANTI JOIN department ON employee.deptno = department.deptno;
+---+-----+------+
| id| name|deptno|
+---+-----+------+
|105|Chloe|     5|
|104| Evan|     4|
|106|  Amy|     6|
+---+-----+------+

Written by HintsToday Team

Related Posts

Temporary Functions in PL/Sql Vs Spark Sql

Temporary functions allow users to define functions that are session-specific and used to encapsulate reusable logic within a database session. While both PL/SQL and Spark SQL support the concept of user-defined functions, their implementation and usage differ...

Spark SQL windows Function and Best Usecases

For Better understanding on Spark SQL windows Function and Best Usecases do refer our post Window functions in Oracle Pl/Sql and Hive explained and compared with examples. Window functions in Spark SQL are powerful tools that allow you to perform calculations across a...

Date and Time manipulation in Oracle SQL, Apache Hive QL, Mysql

Date and Time manipulation in Oracle SQL In Oracle SQL, date and time manipulation is essential for many database operations, ranging from basic date arithmetic to complex formatting and extraction. Here's a guide covering various common operations you might need. 1....

String/Character Manipulation functions in Oracle PL/SQL, Apache Hive

Function NameDescriptionExample UsageResultCONCATConcatenates two strings.SELECT CONCAT('Oracle', 'PL/SQL') FROM dual;OraclePL/SQL`` (Concatenation)Concatenates two strings.LENGTHReturns the length of a string.SELECT LENGTH('Oracle');6LOWERConverts all characters in a...

Indexing in SQL- Explain with examples

Indexing in SQL is a technique used to improve the performance of queries by creating special data structures (indexes) that allow for faster data retrieval. Indexes are created on one or more columns of a table, and they store the values of those columns in a sorted...

Pattern matching in SQL- Like Operator

LIKE Operator: The LIKE operator is used to search for a specified pattern in a column. It allows the use of wildcards: % (percent sign): Matches zero or more characters. _ (underscore): Matches any single character. Examples: SELECT * FROM employees WHERE last_name...

Order of appearance and execution in the SELECT query?

For Appearance( So Few Jawans & Workers Go Home On Late) SELECT – FROM – JOIN – ON – WHERE – GROUP BY – HAVING – ORDER BY – LIMIT FROM – JOIN – ON – WHERE – GROUP BY – HAVING – SELECT – ORDER...

Functions in SQL- Examples

SQL provides various functions to perform operations on data stored in a database. Here are some commonly used SQL functions categorized based on their functionality: Aggregate Functions: COUNT(): Counts the number of rows. SUM(): Calculates the sum of values. AVG():...

Get the latest news

Subscribe to our Newsletter

0 Comments

Submit a Comment

Your email address will not be published. Required fields are marked *