13 Reshape

1 Introduction

In data analysis, reshaping refers to transforming a dataset between:

  • Long ↔︎ Wide layouts, where repeated measures can be stacked as rows or spread across columns
  • Managing composite variable names by splitting them into multiple fields or combining fields into a single label

In Python (pandas), these tasks use well-known tools.

Operation pandas tool
Wide → Long melt()
Long → Wide pivot() or pivot_table()
Split a column str.split(..., expand=True)
Combine multiple columns string concatenation (+)

Example Dataset

import pandas as pd

# A simple dataset with two subject-wise columns
df = pd.DataFrame({
    "id": [1, 2, 3],
    "math": [80, 85, 90],
    "english": [70, 78, 88]
})
df
id math english
0 1 80 70
1 2 85 78
2 3 90 88

2 Long Format (Wide → Long)

Transform the subject-wise columns into key–value rows.

df_long = df.melt(
    id_vars="id",               # keep 'id' as identifier column
    value_vars=["math", "english"],  # columns to unpivot
    var_name="subject",         # new column storing original column names
    value_name="score"          # new column storing values
)
df_long
id subject score
0 1 math 80
1 2 math 85
2 3 math 90
3 1 english 70
4 2 english 78
5 3 english 88

3 Wide Format (Long → Wide)

Starting from the long dataset:

df_long
id subject score
0 1 math 80
1 2 math 85
2 3 math 90
3 1 english 70
4 2 english 78
5 3 english 88
df_wide = df_long.pivot(
    index="id",                 # one row per id
    columns="subject",          # create new columns from subject names
    values="score"              # fill cells with score values
).reset_index()                 # bring index back as a regular column
df_wide
subject id english math
0 1 70 80
1 2 78 85
2 3 88 90

4 Splitting a Column into Multiple Fields

  • Suppose you have a dataset where people’s full names, separated by commas, are stored in a single column:
df = pd.DataFrame({
    "name": ["Md Rasel,Biswas", "Sara,Khan", "John,Doe"]
})
df
name
0 Md Rasel,Biswas
1 Sara,Khan
2 John,Doe
  • Split the name column into two separate fields:
# split on the comma character; expand=True creates two new columns
df[["first", "last"]] = df["name"].str.split(",", expand=True)
df
name first last
0 Md Rasel,Biswas Md Rasel Biswas
1 Sara,Khan Sara Khan
2 John,Doe John Doe

5 Concatenating the Fields Back Together

  • You can then combine the pieces again—for example, to create a compact label:
# concatenate first and last names with an underscore in between
df["combined"] = df["first"] + "_" + df["last"]
df
name first last combined
0 Md Rasel,Biswas Md Rasel Biswas Md Rasel_Biswas
1 Sara,Khan Sara Khan Sara_Khan
2 John,Doe John Doe John_Doe

6 Summary

Task tidyverse pandas
Long → Wide pivot_wider() pivot() / pivot_table()
Wide → Long pivot_longer() melt()
Split columns separate() str.split(expand=True)
Combine columns unite() string concat (+)

7 Exercise

year species n
0 2009 Adelie 52
1 2007 Adelie 50
2 2008 Adelie 50
3 2008 Gentoo 46
4 2009 Gentoo 44
5 2007 Gentoo 34
6 2007 Chinstrap 26
7 2009 Chinstrap 24
8 2008 Chinstrap 18

Convert this bi-variate frequency table into a wide layout like:

species year Adelie Chinstrap Gentoo
0 2007 50 26 34
1 2008 50 18 46
2 2009 52 24 44