13 Reshape – Python for Data Science

1 Introduction

In data analysis, reshaping refers to transforming a dataset between:

Long ↔︎ Wide layouts, where repeated measures can be stacked as rows or spread across columns
Managing composite variable names by splitting them into multiple fields or combining fields into a single label

In Python (pandas), these tasks use well-known tools.

Operation	pandas tool
Wide → Long	`melt()`
Long → Wide	`pivot()` or `pivot_table()`
Split a column	`str.split(..., expand=True)`
Combine multiple columns	string concatenation (`+`)

Example Dataset

import pandas as pd

# A simple dataset with two subject-wise columns
df = pd.DataFrame({
    "id": [1, 2, 3],
    "math": [80, 85, 90],
    "english": [70, 78, 88]
})
df

	id	math	english
0	1	80	70
1	2	85	78
2	3	90	88

2 Long Format (Wide → Long)

Transform the subject-wise columns into key–value rows.

df_long = df.melt(
    id_vars="id",               # keep 'id' as identifier column
    value_vars=["math", "english"],  # columns to unpivot
    var_name="subject",         # new column storing original column names
    value_name="score"          # new column storing values
)
df_long

	id	subject	score
0	1	math	80
1	2	math	85
2	3	math	90
3	1	english	70
4	2	english	78
5	3	english	88

3 Wide Format (Long → Wide)

Starting from the long dataset:

df_long

	id	subject	score
0	1	math	80
1	2	math	85
2	3	math	90
3	1	english	70
4	2	english	78
5	3	english	88

df_wide = df_long.pivot(
    index="id",                 # one row per id
    columns="subject",          # create new columns from subject names
    values="score"              # fill cells with score values
).reset_index()                 # bring index back as a regular column
df_wide

subject	id	english	math
0	1	70	80
1	2	78	85
2	3	88	90

4 Splitting a Column into Multiple Fields

Suppose you have a dataset where people’s full names, separated by commas, are stored in a single column:

df = pd.DataFrame({
    "name": ["Md Rasel,Biswas", "Sara,Khan", "John,Doe"]
})
df

	name
0	Md Rasel,Biswas
1	Sara,Khan
2	John,Doe

Split the name column into two separate fields:

# split on the comma character; expand=True creates two new columns
df[["first", "last"]] = df["name"].str.split(",", expand=True)
df

	name	first	last
0	Md Rasel,Biswas	Md Rasel	Biswas
1	Sara,Khan	Sara	Khan
2	John,Doe	John	Doe

5 Concatenating the Fields Back Together

You can then combine the pieces again—for example, to create a compact label:

# concatenate first and last names with an underscore in between
df["combined"] = df["first"] + "_" + df["last"]
df

	name	first	last	combined
0	Md Rasel,Biswas	Md Rasel	Biswas	Md Rasel_Biswas
1	Sara,Khan	Sara	Khan	Sara_Khan
2	John,Doe	John	Doe	John_Doe

6 Summary

Task	tidyverse	pandas
Long → Wide	`pivot_wider()`	`pivot()` / `pivot_table()`
Wide → Long	`pivot_longer()`	`melt()`
Split columns	`separate()`	`str.split(expand=True)`
Combine columns	`unite()`	string concat (`+`)

7 Exercise

	year	species	n
0	2009	Adelie	52
1	2007	Adelie	50
2	2008	Adelie	50
3	2008	Gentoo	46
4	2009	Gentoo	44
5	2007	Gentoo	34
6	2007	Chinstrap	26
7	2009	Chinstrap	24
8	2008	Chinstrap	18

Convert this bi-variate frequency table into a wide layout like:

species	year	Adelie	Chinstrap	Gentoo
0	2007	50	26	34
1	2008	50	18	46
2	2009	52	24	44