Pivoting (turning row values into columns)
Here’s a quick breakdown depending on where you want to do it:
1️⃣ In SQL
Use PIVOT (available in SQL Server and Oracle) or conditional aggregation, which works in any SQL dialect.
Example table:

| product | month | sales |
|---------|-------|-------|
| A       | Jan   | 100   |
| A       | Feb   | 120   |
| B       | Jan   | 90    |
| B       | Feb   | 150   |
SQL (conditional aggregation):

```sql
SELECT
    product,
    SUM(CASE WHEN month = 'Jan' THEN sales END) AS Jan,
    SUM(CASE WHEN month = 'Feb' THEN sales END) AS Feb
FROM sales_table
GROUP BY product;
```
Result:

| product | Jan | Feb |
|---------|-----|-----|
| A       | 100 | 120 |
| B       | 90  | 150 |
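Because conditional aggregation is plain standard SQL, you can sanity-check the query anywhere, even in SQLite via Python's built-in `sqlite3` module (the table name `sales_table` matches the example above; an `ORDER BY` is added so the row order is deterministic):

```python
import sqlite3

# Build an in-memory database with the example table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_table (product TEXT, month TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO sales_table VALUES (?, ?, ?)",
    [("A", "Jan", 100), ("A", "Feb", 120), ("B", "Jan", 90), ("B", "Feb", 150)],
)

# The same conditional-aggregation pivot as above.
rows = conn.execute("""
    SELECT
        product,
        SUM(CASE WHEN month = 'Jan' THEN sales END) AS Jan,
        SUM(CASE WHEN month = 'Feb' THEN sales END) AS Feb
    FROM sales_table
    GROUP BY product
    ORDER BY product
""").fetchall()

print(rows)  # [('A', 100, 120), ('B', 90, 150)]
```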
2️⃣ In PySpark
Use groupBy() with pivot().
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

data = [
    ("A", "Jan", 100),
    ("A", "Feb", 120),
    ("B", "Jan", 90),
    ("B", "Feb", 150),
]
df = spark.createDataFrame(data, ["product", "month", "sales"])

pivot_df = df.groupBy("product").pivot("month").sum("sales")
pivot_df.show()
3️⃣ In Pandas (Python)
Use pivot_table().
```python
import pandas as pd

data = {
    "product": ["A", "A", "B", "B"],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "sales": [100, 120, 90, 150],
}
df = pd.DataFrame(data)

pivot_df = df.pivot_table(index="product", columns="month", values="sales", aggfunc="sum")
print(pivot_df)
```
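One practical detail: when a product has no row for some month, pivot_table() leaves a NaN in that cell. The fill_value parameter substitutes a default instead. A small sketch with a deliberately missing (B, Feb) combination:

```python
import pandas as pd

df = pd.DataFrame({
    "product": ["A", "A", "B"],   # note: no Feb row for product B
    "month": ["Jan", "Feb", "Jan"],
    "sales": [100, 120, 90],
})

# fill_value replaces the NaN that the missing (B, Feb)
# combination would otherwise produce.
pivot_df = df.pivot_table(
    index="product", columns="month", values="sales",
    aggfunc="sum", fill_value=0,
)
print(pivot_df)
```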