14 Join/Merge

In R, data-merging is done through a family of join operations (e.g., left join, right join, etc.).
Python (specifically pandas) provides the same types of joins.
The general logic, terminology, and workflow closely mirror what R users are familiar with, but pandas uses a single versatile function: merge().

1 Two Main Approaches in Python

Approach 1: The `pd.merge()` function

The most explicit way to join two tables.
Clearly specifies left table, right table, join type, and keys.

Approach 2: The `.merge()` method

Example: df1.merge(df2, ...)
Same result; some find it more natural in method-chaining workflows.

2 Pandas Merge Syntax

pd.merge(
    left=df1,
    right=df2,
    how="inner",        # join type
    on="id"             # key variable(s)
)

Or:

df1.merge(df2, how="inner", on="id")

3 Types of Joins

R concept	pandas argument	explanation
inner	`how="inner"`	rows with matches in both
left	`how="left"`	keep all rows from the left
right	`how="right"`	keep all rows from the right
full	`how="outer"`	union of keys from both tables
semi	filtering after merge	keep matching rows from left
anti	filtering after merge	keep non-matching rows from left

4 Example Data

import pandas as pd
students = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["A", "B", "C", "D"]
})
students.head()

	id	name
0	1	A
1	2	B
2	3	C
3	4	D

scores = pd.DataFrame({
    "id": [2, 3, 3, 5],
    "score": [80, 90, 85, 70]
})

scores.head()

	id	score
0	2	80
1	3	90
2	3	85
3	5	70

5 Inner Join

students.merge(scores, how="inner", on="id")

	id	name	score
0	2	B	80
1	3	C	90
2	3	C	85

This returns rows where the key appears in both tables.

Note

If a key appears more than once in either table, pandas automatically performs a many-to-many join, producing all pairwise combinations of matches.

6 Left Join

students.merge(scores, how="left", on="id")

	id	name	score
0	1	A	NaN
1	2	B	80.0
2	3	C	90.0
3	3	C	85.0
4	4	D	NaN

Keeps all rows from students; unmatched scores show as NaN.

7 Right Join

students.merge(scores, how="right", on="id")

	id	name	score
0	2	B	80
1	3	C	90
2	3	C	85
3	5	NaN	70

Less commonly used but works the same way.

8 Full/Outer Join

students.merge(scores, how="outer", on="id")

	id	name	score
0	1	A	NaN
1	2	B	80.0
2	3	C	90.0
3	3	C	85.0
4	4	D	NaN
5	5	NaN	70.0

9 Joining by Different Column Names

Equivalent to R’s left_join(df1, df2, by = c("x" = "y"))

Python:

df1.merge(df2, left_on="x", right_on="y")

10 Joining by Multiple Keys

Equivalent to R’s left_join(df1, df2, by = c("id", "year"))

Python:

df1.merge(df2, on=["id", "year"])

11 Semi Join in Python

Keep only rows in df1 that have a match in df2:

df1[df1["id"].isin(df2["id"])]

Or:

df1.merge(df2[["id"]], on="id")

12 Anti Join in Python

Keep only rows in df1 that do not have a match in df2:

df1[~df1["id"].isin(df2["id"])]

13 Summary

Concept	tidyverse	pandas
Join function	`left_join()`	`.merge()`
Join type	`left`, `right`, `inner`, `full`	`how=`
Keys	`by=`	`on=`, `left_on=`, `right_on=`
Semi/anti	`semi_join()`, `anti_join()`	`isin()`, boolean filtering

1 Two Main Approaches in Python

Approach 1: The pd.merge() function

Approach 2: The .merge() method

2 Pandas Merge Syntax

3 Types of Joins

4 Example Data

5 Inner Join

6 Left Join

7 Right Join

8 Full/Outer Join

9 Joining by Different Column Names

10 Joining by Multiple Keys

11 Semi Join in Python

12 Anti Join in Python

13 Summary

Approach 1: The `pd.merge()` function

Approach 2: The `.merge()` method