Pandas
01 Dataframes
Data Science
May 2026 · Notebook lesson

Notebook converted from Jupyter for blog publishing.

01-DataFrames

Driptanil Datta, Software Developer

DataFrames

Throughout the course, most of our data exploration will be done with DataFrames. DataFrames are an extremely powerful tool and a natural extension of the Pandas Series. By definition:

A Pandas DataFrame consists of multiple Pandas Series that share index values.
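To make the definition concrete, here is a minimal sketch that builds a DataFrame from two Series sharing the same index labels. The variable names (jan, feb, sales) are chosen for illustration and are not part of the original notebook.

```python
import pandas as pd

# Two Series that share the same index labels
jan = pd.Series([95, 70, 75, 40], index=["CA", "NY", "AZ", "TX"])
feb = pd.Series([11, 63, 9, 4], index=["CA", "NY", "AZ", "TX"])

# Each Series becomes one column; the shared index becomes the row labels
sales = pd.DataFrame({"Jan": jan, "Feb": feb})
print(sales)

# Selecting a single column hands back a Series again
print(type(sales["Jan"]))
```

Going the other way, every column you pull out of a DataFrame is a Series carrying that same index.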

Imports

import numpy as np
import pandas as pd

Creating a DataFrame from Python Objects

# help(pd.DataFrame)
# Make sure the seed is in the same cell as the random call
# https://stackoverflow.com/questions/21494489/what-does-numpy-random-seed0-do
np.random.seed(101)
mydata = np.random.randint(0,101,(4,3))
mydata
RESULT
array([[95, 11, 81],
       [70, 63, 87],
       [75,  9, 77],
       [40,  4, 63]])
myindex = ['CA','NY','AZ','TX']
mycolumns = ['Jan','Feb','Mar']
df = pd.DataFrame(data=mydata)
df
HTML
    0   1   2
0  95  11  81
1  70  63  87
2  75   9  77
3  40   4  63
df = pd.DataFrame(data=mydata,index=myindex)
df
HTML
     0   1   2
CA  95  11  81
NY  70  63  87
AZ  75   9  77
TX  40   4  63
df = pd.DataFrame(data=mydata,index=myindex,columns=mycolumns)
df
HTML
    Jan  Feb  Mar
CA   95   11   81
NY   70   63   87
AZ   75    9   77
TX   40    4   63
df.info()
STDOUT
<class 'pandas.core.frame.DataFrame'>
Index: 4 entries, CA to TX
Data columns (total 3 columns):
Jan    4 non-null int32
Feb    4 non-null int32

Reading a .csv file for a DataFrame


NOTE: We will go over all kinds of data inputs and outputs (.html, .csv, .xlsx, etc.) later in the course! For now we just need to read in a simple .csv file.


CSV

Comma Separated Values files are text files that use commas as field delimiters.
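As a small illustration of the format, the sketch below parses CSV text held in memory instead of a file on disk. The rows are illustrative sample values, and csv_text and small are names made up for this example.

```python
import io
import pandas as pd

# A tiny in-memory CSV: the first line holds the column names,
# every following line is one row with comma-separated fields
csv_text = "total_bill,tip,day\n16.99,1.01,Sun\n10.34,1.66,Sun\n"

# read_csv accepts a file path or any file-like object
small = pd.read_csv(io.StringIO(csv_text))
print(small)
```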
Unless you're running the virtual environment included with the course, you may need to install xlrd and openpyxl. pandas uses these packages to read Excel files; reading a .csv file does not require them.
In your terminal/command prompt run:

conda install xlrd
conda install openpyxl

Then restart Jupyter Notebook. (Use pip install instead if you aren't using the Anaconda Distribution.)

Understanding File Paths

You have two options when reading a file with pandas:

  1. If your .py file or .ipynb notebook is located in the exact same folder location as the .csv file you want to read, simply pass in the file name as a string, for example:

    df = pd.read_csv('some_file.csv')

  2. Pass in the entire file path if you are located in a different directory. The file path must be 100% correct in order for this to work. For example:

    # Raw string (r"...") so that backslashes such as \U are not read as escape sequences
    df = pd.read_csv(r"C:\Users\myself\files\some_file.csv")
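The second option can be sketched with pathlib, which builds full paths without hand-written separators. This is only an illustration: the temporary folder and the file contents are created just for this example and are not part of the course files.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Write a small CSV into a temporary folder so the example is self-contained
folder = Path(tempfile.mkdtemp())
csv_path = folder / "some_file.csv"
csv_path.write_text("a,b\n1,2\n3,4\n")

# Pass the full path; read_csv accepts Path objects as well as strings
by_full_path = pd.read_csv(csv_path)
print(csv_path.is_absolute(), by_full_path.shape)
```

When typing a Windows path as a plain string, a raw string or forward slashes avoid problems with backslash escape sequences.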

Print your current directory file path with pwd

pwd
RESULT
'C:\\Users\\Marcial\\Pierian-Data-Courses\\Machine-Learning-MasterClass\\03-Pandas'

List the files in your current directory with ls

ls
STDOUT
 Volume in drive C has no label.
 Volume Serial Number is 3652-BD2F

 Directory of C:\Users\Marcial\Pierian-Data-Courses\Machine-Learning-MasterClass\03-Pandas
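pwd and ls are notebook conveniences. If you are working in a plain .py script, a rough equivalent using only the standard library might look like the sketch below (the variable names are illustrative).

```python
import os
from pathlib import Path

# Same information as pwd: the folder that relative paths are resolved against
current_dir = Path.cwd()
print(current_dir)

# Same idea as ls: the names of the entries in that folder
entries = sorted(os.listdir(current_dir))
print(entries)

# Keep only the CSV files, which are the candidates for pd.read_csv
csv_files = [name for name in entries if name.lower().endswith(".csv")]
print(csv_files)
```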
df = pd.read_csv('tips.csv')
df
HTML
[table output truncated; leading columns: total_bill, tip, sex, smoker, day]

About this DataSet (in case you are interested)

  • Description

    • One waiter recorded information about each tip he received over a period of a few months working in one restaurant. He collected several variables:
  • Format

    • A data frame with 244 rows and 7 variables
  • Details

    • tip in dollars,
    • bill in dollars,
    • sex of the bill payer,
    • whether there were smokers in the party,
    • day of the week,
    • time of day,
    • size of the party.

In all he recorded 244 tips. The data was reported in a collection of case studies for business statistics (Bryant & Smith 1995).

  • References
    • Bryant, P. G. and Smith, M. (1995). Practical Data Analysis: Case Studies in Business Statistics. Homewood, IL: Richard D. Irwin Publishing.
  • Note: We created some additional columns with Fake data, including Name, CC Number, and Payment ID.

DataFrames

Obtaining Basic Information About DataFrame

df.columns
RESULT
Index(['total_bill', 'tip', 'sex', 'smoker', 'day', 'time', 'size',
       'price_per_person', 'Payer Name', 'CC Number', 'Payment ID'],
      dtype='object')
df.index
RESULT
RangeIndex(start=0, stop=244, step=1)
df.head(3)
HTML
[table output truncated; leading columns: total_bill, tip, sex, smoker, day]
df.tail(3)
HTML
[table output truncated; leading columns: total_bill, tip, sex, smoker, day]
df.info()
STDOUT
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244 entries, 0 to 243
Data columns (total 11 columns):
total_bill          244 non-null float64
tip                 244 non-null float64
len(df)
RESULT
244
df.describe()
HTML
[table output truncated; leading columns: total_bill, tip, size, price_per_person, CC Number]
df.describe().transpose()
HTML
[table output truncated; leading columns: count, mean, std, min, 25%]
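If you don't have tips.csv at hand, the same inspection calls can be tried on a small hand-built DataFrame. The sketch below uses a few illustrative rows; tips_sample and summary are names invented for this example.

```python
import pandas as pd

# A tiny stand-in for the tips data (illustrative values only)
tips_sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68],
    "tip": [1.01, 1.66, 3.50, 3.31],
    "day": ["Sun", "Sun", "Sun", "Sun"],
})

print(tips_sample.columns.tolist())  # column names
print(tips_sample.index)             # row labels
print(len(tips_sample))              # number of rows

# describe() summarizes the numeric columns only;
# transpose() puts one row per column, which is easier to read when there are many columns
summary = tips_sample.describe().transpose()
print(summary)
```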

Selection and Indexing

Let's learn how to retrieve information from a DataFrame.

COLUMNS

We will begin by learning how to extract information based on the columns.

df.head()
HTML
[table output truncated; leading columns: total_bill, tip, sex, smoker, day]

Grab a Single Column

df['total_bill']
RESULT
0      16.99
1      10.34
2      21.01
3      23.68
4      24.59
type(df['total_bill'])
RESULT
pandas.core.series.Series

Grab Multiple Columns

# Note how it's a Python list of column names! Thus the double brackets.
df[['total_bill','tip']]
HTML
   total_bill   tip
0       16.99  1.01
[table output truncated]
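The difference between single and double brackets shows up in the type that comes back. A minimal sketch, using a small illustrative DataFrame (the name bills is invented here):

```python
import pandas as pd

bills = pd.DataFrame({
    "total_bill": [16.99, 10.34],
    "tip": [1.01, 1.66],
    "day": ["Sun", "Sun"],
})

# A single column name returns a Series
one_column = bills["total_bill"]

# A list of column names returns a DataFrame, even if the list has one entry
two_columns = bills[["total_bill", "tip"]]
still_a_frame = bills[["total_bill"]]

print(type(one_column), type(two_columns), type(still_a_frame))
```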

Create New Columns

df['tip_percentage'] = 100* df['tip'] / df['total_bill']
df.head()
HTML
[table output truncated; leading columns: total_bill, tip, sex, smoker, day]
df['price_per_person'] = df['total_bill'] / df['size']
df.head()
HTML
[table output truncated; leading columns: total_bill, tip, sex, smoker, day]
help(np.round)
STDOUT
Help on function round_ in module numpy:

round_(a, decimals=0, out=None)
    Round an array to the given number of decimals.
    

Adjust Existing Columns

# Because pandas is based on numpy, we get awesome capabilities with numpy's universal functions!
df['price_per_person'] = np.round(df['price_per_person'],2)
df.head()
HTML
[table output truncated; leading columns: total_bill, tip, sex, smoker, day]
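Creating a column and then overwriting it follow the same pattern: assign to df['name']. Here is a self-contained sketch with illustrative numbers (party and the values in it are made up for this example).

```python
import numpy as np
import pandas as pd

party = pd.DataFrame({
    "total_bill": [10.34, 21.01, 23.68],
    "size": [3, 3, 2],
})

# New column: element-wise division of two existing columns
party["price_per_person"] = party["total_bill"] / party["size"]

# Overwrite the same column with a rounded version;
# np.round works element-wise on a Series and returns a Series
party["price_per_person"] = np.round(party["price_per_person"], 2)
print(party)
```

The Series method party["price_per_person"].round(2) would give the same values.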

Remove Columns

# df.drop('tip_percentage',axis=1)
df = df.drop("tip_percentage",axis=1)
df.head()
HTML
[table output truncated; leading columns: total_bill, tip, sex, smoker, day]
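The reassignment matters because drop returns a new DataFrame and leaves the original untouched. A short sketch with an illustrative frame (the names and values are invented for this example):

```python
import pandas as pd

frame = pd.DataFrame({
    "total_bill": [16.99, 10.34],
    "tip": [1.01, 1.66],
    "tip_percentage": [5.94, 16.05],
})

# drop returns a copy without the column; frame itself still has it
dropped = frame.drop("tip_percentage", axis=1)
print("tip_percentage" in frame.columns, "tip_percentage" in dropped.columns)

# The columns= keyword is an equivalent spelling of axis=1
same_result = frame.drop(columns=["tip_percentage"])
print(dropped.equals(same_result))
```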

Index Basics

Before going over the same retrieval tasks for rows, let's build some basic understanding of the pandas DataFrame Index.

df.head()
HTML
[table output truncated; leading columns: total_bill, tip, sex, smoker, day]
df.index
RESULT
RangeIndex(start=0, stop=244, step=1)
# set_index returns a new DataFrame; df itself is not modified until you reassign it
df.set_index('Payment ID')
HTML
[table output truncated; leading columns: total_bill, tip, sex, smoker, day]
df.head()
HTML
[table output truncated; leading columns: total_bill, tip, sex, smoker, day]
df = df.set_index('Payment ID')
df.head()
HTML
[table output truncated; leading columns: total_bill, tip, sex, smoker, day]
df = df.reset_index()
df.head()
HTML
[table output truncated; leading columns: Payment ID, total_bill, tip, sex, smoker]
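The round trip between a column and the index can be sketched on a two-row frame. The pairing of Payment ID labels with bill amounts below is made up for illustration.

```python
import pandas as pd

payments = pd.DataFrame({
    "Payment ID": ["Sun2959", "Sun5260"],
    "total_bill": [16.99, 10.34],
})

# set_index returns a new DataFrame; payments keeps its default RangeIndex
indexed = payments.set_index("Payment ID")
print(payments.index)
print(indexed.index)

# reset_index moves the index back into a regular column
restored = indexed.reset_index()
print(restored.columns.tolist())
```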

ROWS

Let's now explore these same concepts but with Rows.

df.head()
HTML
[table output truncated; leading columns: Payment ID, total_bill, tip, sex, smoker]
df = df.set_index('Payment ID')
df.head()
HTML
[table output truncated; leading columns: total_bill, tip, sex, smoker, day]

Grab a Single Row

# Integer Based
df.iloc[0]
RESULT
total_bill                       16.99
tip                               1.01
sex                             Female
smoker                              No
day                                Sun
# Name Based
df.loc['Sun2959']
RESULT
total_bill                       16.99
tip                               1.01
sex                             Female
smoker                              No
day                                Sun

Grab Multiple Rows

df.iloc[0:4]
HTML
[table output truncated; leading columns: total_bill, tip, sex, smoker, day]
df.loc[['Sun2959','Sun5260']]
HTML
[table output truncated; leading columns: total_bill, tip, sex, smoker, day]

Remove Row

Typically our datasets will be large enough that we won't remove rows like this, since we won't know their row location for some specific condition. Instead, we drop rows based on conditions such as missing data or column values. The next lecture will cover this in a lot more detail.

df.head()
HTML
[table output truncated; leading columns: total_bill, tip, sex, smoker, day]
df.drop('Sun2959',axis=0).head()
HTML
[table output truncated; leading columns: total_bill, tip, sex, smoker, day]
# Error if you have a named index!
# df.drop(0,axis=0).head()
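To see why the integer version fails once the index is named, here is a small sketch on an illustrative two-row frame (values are made up). With string labels in the index, drop looks for a label equal to 0 and raises a KeyError because none exists.

```python
import pandas as pd

rows = pd.DataFrame(
    {"total_bill": [16.99, 10.34]},
    index=["Sun2959", "Sun5260"],
)

# Dropping by label works and returns a new DataFrame
without_first = rows.drop("Sun2959", axis=0)

# Dropping by the integer 0 fails: 0 is not one of the index labels
try:
    rows.drop(0, axis=0)
    integer_drop_failed = False
except KeyError:
    integer_drop_failed = True

print(list(without_first.index), integer_drop_failed)
```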

Insert a New Row

It is pretty rare to add a single row like this. Usually you use pd.concat() to add many rows at once. Older pandas versions also offered a DataFrame.append() method (deprecated in pandas 1.4 and removed in 2.0), but you won't see us add single rows this way with realistic real-world data.

one_row = df.iloc[0]
one_row
RESULT
total_bill                       16.99
tip                               1.01
sex                             Female
smoker                              No
day                                Sun
type(one_row)
RESULT
pandas.core.series.Series
df.tail()
HTML
[table output truncated; leading columns: total_bill, tip, sex, smoker, day]
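# DataFrame.append() was removed in pandas 2.0; on older versions this cell was df.append(one_row).tail()
pd.concat([df, one_row.to_frame().T]).tail()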
HTML
[table output truncated; leading columns: total_bill, tip, sex, smoker, day]
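For readers on pandas 2.0 or later, a row can be added with pd.concat. The following is a minimal sketch on an illustrative frame (base, extended, and the values are made up), not the notebook's own data.

```python
import pandas as pd

base = pd.DataFrame(
    {"total_bill": [16.99, 10.34], "tip": [1.01, 1.66]},
    index=["Sun2959", "Sun5260"],
)

# A single row comes back as a Series whose name is the row label
one_row = base.iloc[0]

# to_frame().T turns the Series into a one-row DataFrame,
# which pd.concat can stack underneath the original
extended = pd.concat([base, one_row.to_frame().T])
print(extended.tail())
```

Note that the appended row keeps its original label, so the index now contains a duplicate.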


Legal Notes
Disclaimer

The content provided on this blog is for educational and informational purposes only. While I strive for accuracy, all information is provided "as is" without any warranties of completeness, reliability, or accuracy. Any action you take upon the information found on this website is strictly at your own risk.

Copyright & IP

Certain technical content, interview questions, and datasets are curated from external educational sources to provide a centralized learning resource. Respect for original authorship is maintained; no copyright infringement is intended. All trademarks, logos, and brand names are the property of their respective owners.


© 2026 Driptanil Datta. All rights reserved.