🚀
Pandas
06 Combining Dataframes
++++
Data Science
May 2026×Notebook lesson

Notebook converted from Jupyter for blog publishing.

06-Combining-DataFrames

Driptanil Datta
Driptanil DattaSoftware Developer

Combining DataFrames

Full Official Guide (Lots of examples!)

https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html (opens in a new tab)



import numpy as np
import pandas as pd

Concatenation

Directly "glue" together dataframes.

data_one = {'A': ['A0', 'A1', 'A2', 'A3'],'B': ['B0', 'B1', 'B2', 'B3']}
data_two = {'C': ['C0', 'C1', 'C2', 'C3'], 'D': ['D0', 'D1', 'D2', 'D3']}
one = pd.DataFrame(data_one)
two = pd.DataFrame(data_two)
one
HTML
MORE
A
B
0
A0
B0
two
HTML
MORE
C
D
0
C0
D0

Axis = 0

Concatenate along rows

axis0 = pd.concat([one,two],axis=0)
axis0
HTML
MORE
A
B
C
D
0

Axis = 1

Concatenate along columns

axis1 = pd.concat([one,two],axis=1)
axis1
HTML
MORE
A
B
C
D
0

Axis 0 , but columns match up

In case you wanted this:

two.columns = one.columns
pd.concat([one,two])
HTML
MORE
A
B
0
A0
B0

Merge

Data Tables

registrations = pd.DataFrame({'reg_id':[1,2,3,4],'name':['Andrew','Bobo','Claire','David']})
logins = pd.DataFrame({'log_id':[1,2,3,4],'name':['Xavier','Andrew','Yolanda','Bobo']})
registrations
HTML
MORE
reg_id
name
0
1
Andrew
logins
HTML
MORE
log_id
name
0
1
Xavier

pd.merge()

Merge pandas DataFrames based on key columns, similar to a SQL join. Results based on the how parameter.

help(pd.merge)
STDOUT
MORE
Help on function merge in module pandas.core.reshape.merge:

merge(left, right, how: str = 'inner', on=None, left_on=None, right_on=None, left_index: bool = False, right_index: bool = False, sort: bool = False, suffixes=('_x', '_y'), copy: bool = True, indicator: bool = False, validate=None) -> 'DataFrame'
    Merge DataFrame or named Series objects with a database-style join.
    

Inner,Left, Right, and Outer Joins

Inner Join

Match up where the key is present in BOTH tables. There should be no NaNs due to the join, since by definition to be part of the Inner Join they need info in both tables. Only Andrew and Bobo both registered and logged in.

# Notice pd.merge doesn't take in a list like concat
pd.merge(registrations,logins,how='inner',on='name')
HTML
MORE
reg_id
name
log_id
0
1
# Pandas smart enough to figure out key column (on parameter) if only one column name matches up
pd.merge(registrations,logins,how='inner')
HTML
MORE
reg_id
name
log_id
0
1
# Pandas reports an error if "on" key column isn't in both dataframes
# pd.merge(registrations,logins,how='inner',on='reg_id')

Left Join

Match up AND include all rows from Left Table. Show everyone who registered on Left Table, if they don't have login info, then fill with NaN.

pd.merge(registrations,logins,how='left')
HTML
MORE
reg_id
name
log_id
0
1

Right Join

Match up AND include all rows from Right Table. Show everyone who logged in on the Right Table, if they don't have registration info, then fill with NaN.

pd.merge(registrations,logins,how='right')
HTML
MORE
reg_id
name
log_id
0
1.0

Outer Join

Match up on all info found in either Left or Right Table. Show everyone that's in the Log in table and the registrations table. Fill any missing info with NaN

pd.merge(registrations,logins,how='outer')
HTML
MORE
reg_id
name
log_id
0
1.0

Join on Index or Column

Use combinations of left_on,right_on,left_index,right_index to merge a column or index on each other

registrations
HTML
MORE
reg_id
name
0
1
Andrew
logins
HTML
MORE
log_id
name
0
1
Xavier
registrations = registrations.set_index("name")
registrations
HTML
MORE
reg_id
name
Andrew
1
Bobo
pd.merge(registrations,logins,left_index=True,right_on='name')
HTML
MORE
reg_id
log_id
name
1
1
pd.merge(logins,registrations,right_index=True,left_on='name')
HTML
MORE
log_id
name
reg_id
1
2

Dealing with differing key column names in joined tables

registrations = registrations.reset_index()
registrations
HTML
MORE
name
reg_id
0
Andrew
1
logins
HTML
MORE
log_id
name
0
1
Xavier
registrations.columns = ['reg_name','reg_id']
registrations
HTML
MORE
reg_name
reg_id
0
Andrew
1
# ERROR
# pd.merge(registrations,logins)
pd.merge(registrations,logins,left_on='reg_name',right_on='name')
HTML
MORE
reg_name
reg_id
log_id
name
0
pd.merge(registrations,logins,left_on='reg_name',right_on='name').drop('reg_name',axis=1)
HTML
MORE
reg_id
log_id
name
0
1

Pandas automatically tags duplicate columns

registrations.columns = ['name','id']
logins.columns = ['id','name']
registrations
HTML
MORE
name
id
0
Andrew
1
logins
HTML
MORE
id
name
0
1
Xavier
# _x is for left
# _y is for right
pd.merge(registrations,logins,on='name')
HTML
MORE
name
id_x
id_y
0
Andrew
pd.merge(registrations,logins,on='name',suffixes=('_reg','_log'))
HTML
MORE
name
id_reg
id_log
0
Andrew


Drip

Driptanil Datta

Software Developer

Building full-stack systems, one commit at a time. This blog is a centralized learning archive for developers.

Legal Notes
Disclaimer

The content provided on this blog is for educational and informational purposes only. While I strive for accuracy, all information is provided "as is" without any warranties of completeness, reliability, or accuracy. Any action you take upon the information found on this website is strictly at your own risk.

Copyright & IP

Certain technical content, interview questions, and datasets are curated from external educational sources to provide a centralized learning resource. Respect for original authorship is maintained; no copyright infringement is intended. All trademarks, logos, and brand names are the property of their respective owners.

System Operational

© 2026 Driptanil Datta. All rights reserved.