🚀
Pandas
05 Groupby Operations and Multiindex
++++
Data Science
May 2026×Notebook lesson

Notebook converted from Jupyter for blog publishing.

05-Groupby-Operations-and-MultiIndex

Driptanil Datta
Driptanil DattaSoftware Developer

Groupby Operations and Multi-level Index

import numpy as np
import pandas as pd

Data

df = pd.read_csv('mpg.csv')
df
HTML
MORE
mpg
cylinders
displacement
horsepower
weight

groupby() method

# Creates a groupby object waiting for an aggregate method
df.groupby('model_year')
RESULT
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x00000246790FEC88>

Adding an aggregate method call. To use a grouped object, you need to tell pandas how you want to aggregate the data.

Common Options:

mean(): Compute mean of groups sum(): Compute sum of group values size(): Compute group sizes count(): Compute count of group std(): Standard deviation of groups var(): Compute variance of groups sem(): Standard error of the mean of groups describe(): Generates descriptive statistics first(): Compute first of group values last(): Compute last of group values nth() : Take nth value, or a subset if n is a list min(): Compute min of group values max(): Compute max of group values

Full List at the Online Documentation: https://pandas.pydata.org/docs/reference/groupby.html (opens in a new tab)

# model_year becomes the index! It is NOT a column name,it is now the name of the index
df.groupby('model_year').mean()
HTML
MORE
mpg
cylinders
displacement
weight
acceleration
avg_year = df.groupby('model_year').mean()
avg_year.index
RESULT
Int64Index([70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82], dtype='int64', name='model_year')
avg_year.columns
RESULT
Index(['mpg', 'cylinders', 'displacement', 'weight', 'acceleration', 'origin'], dtype='object')
avg_year['mpg']
RESULT
MORE
model_year
70    17.689655
71    21.250000
72    18.714286
73    17.100000
df.groupby('model_year').mean()['mpg']
RESULT
MORE
model_year
70    17.689655
71    21.250000
72    18.714286
73    17.100000
df.groupby('model_year').describe()
HTML
MORE
mpg
cylinders
...
acceleration
origin
df.groupby('model_year').describe().transpose()
HTML
MORE
model_year
70
71
72
73

Groupby Multiple Columns

Let's explore average mpg per year per cylinder count

df.groupby(['model_year','cylinders']).mean()
HTML
MORE
mpg
displacement
weight
acceleration
origin
df.groupby(['model_year','cylinders']).mean().index
RESULT
MORE
MultiIndex([(70, 4),
            (70, 6),
            (70, 8),
            (71, 4),
            (71, 6),

MultiIndex

The MultiIndex Object

year_cyl = df.groupby(['model_year','cylinders']).mean()
year_cyl
HTML
MORE
mpg
displacement
weight
acceleration
origin
year_cyl.index
RESULT
MORE
MultiIndex([(70, 4),
            (70, 6),
            (70, 8),
            (71, 4),
            (71, 6),
year_cyl.index.levels
RESULT
FrozenList([[70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82], [3, 4, 5, 6, 8]])
year_cyl.index.names
RESULT
FrozenList(['model_year', 'cylinders'])

Indexing with the Hierarchical Index

Full Documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html (opens in a new tab)

year_cyl.head()
HTML
MORE
mpg
displacement
weight
acceleration
origin

Grab Based on Outside Index

year_cyl.loc[70]
HTML
MORE
mpg
displacement
weight
acceleration
origin
year_cyl.loc[[70,72]]
HTML
MORE
mpg
displacement
weight
acceleration
origin

Grab a Single Row

year_cyl.loc[(70,8)]
RESULT
MORE
mpg               14.111111
displacement     367.555556
weight          3940.055556
acceleration      11.194444
origin             1.000000

Grab Based on Cross-section with .xs()

This method takes a key&#123;:python&#125; argument to select data at a particular level of a MultiIndex.

Parameters

key : label or tuple of label Label contained in the index, or partially in a MultiIndex. axis : {0 or 'index', 1 or 'columns'}, default 0 Axis to retrieve cross-section on. level : object, defaults to first n levels (n=1 or len(key)) In case of a key partially contained in a MultiIndex, indicate which levels are used. Levels can be referred by label or position.

year_cyl.xs(key=70,axis=0,level='model_year')
HTML
MORE
mpg
displacement
weight
acceleration
origin
# Mean column values for 4 cylinders per year
year_cyl.xs(key=4,axis=0,level='cylinders')
HTML
MORE
mpg
displacement
weight
acceleration
origin

Careful note!

Keep in mind, its usually much easier to filter out values before running a groupby() call, so you should attempt to filter out any values/categories you don't want to use. For example, its much easier to remove 4 cylinder cars before the groupby() call, very difficult to this sort of thing after a group by.

df[df['cylinders'].isin([6,8])].groupby(['model_year','cylinders']).mean()
HTML
MORE
mpg
displacement
weight
acceleration
origin

Swap Levels

year_cyl.swaplevel().head()
HTML
MORE
mpg
displacement
weight
acceleration
origin

Sorting MultiIndex

year_cyl.sort_index(level='model_year',ascending=False)
HTML
MORE
mpg
displacement
weight
acceleration
origin
year_cyl.sort_index(level='cylinders',ascending=False)
HTML
MORE
mpg
displacement
weight
acceleration
origin

Advanced: agg() method

The agg() method allows you to customize what aggregate functions you want per category

df
HTML
MORE
mpg
cylinders
displacement
horsepower
weight

agg() on a DataFrame

# These strings need to match up with built-in method names
df.agg(['median','mean'])
HTML
MORE
mpg
cylinders
displacement
weight
acceleration
df.agg(['sum','mean'])[['mpg','weight']]
HTML
MORE
mpg
weight
sum
9358.800000
1.182229e+06

Specify aggregate methods per column

agg() is very powerful,allowing you to pass in a dictionary where the keys are the columns and the values are a list of aggregate methods.

df.agg({'mpg':['median','mean'],'weight':['mean','std']})
HTML
MORE
mpg
weight
mean
23.514573
2970.424623

agg() with groupby()

df.groupby('model_year').agg({'mpg':['median','mean'],'weight':['mean','std']})
HTML
MORE
mpg
weight
median
mean
mean
Drip

Driptanil Datta

Software Developer

Building full-stack systems, one commit at a time. This blog is a centralized learning archive for developers.

Legal Notes
Disclaimer

The content provided on this blog is for educational and informational purposes only. While I strive for accuracy, all information is provided "as is" without any warranties of completeness, reliability, or accuracy. Any action you take upon the information found on this website is strictly at your own risk.

Copyright & IP

Certain technical content, interview questions, and datasets are curated from external educational sources to provide a centralized learning resource. Respect for original authorship is maintained; no copyright infringement is intended. All trademarks, logos, and brand names are the property of their respective owners.

System Operational

© 2026 Driptanil Datta. All rights reserved.