Below are three Spark‑SQL‑friendly patterns for producing all distinct, unordered pairs from a single‑column table. Pick whichever feels most readable in your environment.


1️⃣ Self‑join with an inequality (the classic)

-- any table name is fine; keep only distinct values first
WITH base AS (
  SELECT DISTINCT a                         --  x, y, w, v
  FROM  your_table
)

SELECT  t1.a AS val1,
        t2.a AS val2
FROM    base t1
JOIN    base t2
        ON  t1.a < t2.a;                    -- guarantees val1 ≠ val2 and no duplicates

Why it works

  • t1.a < t2.a eliminates (x,x) pairs and keeps only the
    lexicographically “smaller‑first” orientation, so (x,y) survives but (y,x) never appears.

2️⃣ Row‑number window (if the data type isn’t naturally comparable)

WITH numbered AS (
  SELECT  a,
          ROW_NUMBER() OVER (ORDER BY a) AS rn
  FROM    (SELECT DISTINCT a FROM your_table)
)

SELECT  n1.a AS val1,
        n2.a AS val2
FROM    numbered n1
JOIN    numbered n2
        ON  n1.rn < n2.rn;                 -- rn1 < rn2 ensures uniqueness

This avoids relying on alphabetical ordering and works even if a is a complex
type (e.g., structs).


3️⃣ One‑liner with collect_set + combinations  (Spark 3.1+)

SELECT  c[0] AS val1,
        c[1] AS val2
FROM (
  SELECT  EXPLODE( combinations(
            SORT_ARRAY( collect_set(a) ), 2)      -- 2‑element combinations
          ) AS c
  FROM    your_table
);

Highlights

  • collect_set(a) → unique values
  • sort_array → deterministic order
  • combinations(arr, 2) → every 2‑item subset
  • explode flattens the array of pairs into rows.

🚀 Result for your sample values (x,y,w,v)

Whatever method you choose returns exactly:

val1val2
vw
vx
vy
wx
wy
xy

(Assuming simple string order v < w < x < y; actual order depends on your sort key.)


When to prefer which?

Requirement / ContextRecommended pattern
Small, readable SQL; strings/ints are fine#1 self‑join
Non‑comparable or multi‑column keys#2 row_number
“One shot” query inside Databricks SQL#3 combinations

All three avoid (x,x) and treat (x,y)(y,x).

Let’s break this down step by step so the expression becomes crystal clear:

You’re using:

SELECT EXPLODE(combinations(SORT_ARRAY(collect_set(a)), 2)) AS c
FROM your_table;

We’ll explain each part inside-out, step-by-step.


🧱 Step-by-step Breakdown

✅ Step 1: collect_set(a)

This aggregates all unique values of column a into an array.

Input table:

a
---
x
y
x
w
v

collect_set(a) ⇒ removes duplicates and gives:

["x", "y", "w", "v"]

It’s non-deterministic in order, so we fix it next.


✅ Step 2: SORT_ARRAY(...)

This makes the array deterministically sorted, so results like (x,y) or (y,x) are consistently ordered.

From:

["x", "y", "w", "v"]

To:

["v", "w", "x", "y"]

Now Spark always uses the same order → avoids (x,y) and (y,x) both appearing.


✅ Step 3: combinations(sorted_array, 2)

This generates all unique 2-element combinations, preserving order.

combinations(["v", "w", "x", "y"], 2)

Gives:

["v", "w"]
["v", "x"]
["v", "y"]
["w", "x"]
["w", "y"]
["x", "y"]

This is a built-in Spark SQL function (since Spark 3.1) that mimics the Python itertools.combinations().


✅ Step 4: EXPLODE(...)

Now that we have an array of arrays, we need to flatten it into rows.

Input:

[
 ["v", "w"],
 ["v", "x"],
 ["v", "y"],
 ["w", "x"],
 ["w", "y"],
 ["x", "y"]
]

EXPLODE(...) turns each of these into a row:

c
[v, w]
[v, x]
[v, y]
[w, x]
[w, y]
[x, y]

✅ Final SELECT

You can go one step further and split each pair into separate columns:

SELECT
  c[0] AS val1,
  c[1] AS val2
FROM (
  SELECT EXPLODE(combinations(SORT_ARRAY(collect_set(a)), 2)) AS c
  FROM your_table
);

🧠 Summary

FunctionPurpose
collect_set(a)Gathers unique values of a into an array
sort_array(...)Ensures consistent ordering (no duplicate unordered pairs)
combinations(..., 2)Generates all 2-element combinations (no repeats, no (x,x))
explode(...)Flattens array of arrays into rows

Result: You get all unique, unordered pairs (a, b) such that a ≠ b and no (b, a) duplicates exist.

Pages: 1 2 3


Discover more from HintsToday

Subscribe to get the latest posts sent to your email.

Posted in

Leave a Reply

Discover more from HintsToday

Subscribe now to keep reading and get access to the full archive.

Continue reading