May 2026 · Notebook lesson
Notebook converted from Jupyter for blog publishing.
07-Text-Methods
Driptanil Datta, Software Developer
Text Methods
A normal Python string has a variety of method calls available:
mystring = 'hello'
mystring.capitalize()
RESULT
'Hello'
mystring.isdigit()
RESULT
False
help(str)
STDOUT
Help on class str in module builtins:
class str(object)
 | str(object='') -> str
 | str(bytes_or_buffer[, encoding[, errors]]) -> str
Pandas and Text
Pandas can do a lot more than what we show here. Full online documentation on topics such as advanced string indexing and regular expressions with pandas can be found here: https://pandas.pydata.org/docs/user_guide/text.html
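One practical difference between the `.str` accessor and calling plain Python string methods element by element shows up with missing data. A minimal sketch, assuming a Series with one missing entry (the `None` value is an illustrative addition, not part of the lesson's data): `.str.capitalize()` leaves the missing entry missing, while `apply()` with a plain method call tries to call `.capitalize()` on it and fails.

```python
import pandas as pd

# Illustrative data (not from the lesson): two names plus one missing value
names = pd.Series(['andrew', 'bobo', None])

# The .str accessor capitalizes each string and leaves missing values missing
capitalized = names.str.capitalize()
print(capitalized)

# A plain method call through apply() is also invoked on the missing value
apply_failed = False
try:
    names.apply(lambda s: s.capitalize())
except AttributeError as err:
    # The missing value (None or NaN) has no .capitalize() method
    apply_failed = True
    print("apply() failed:", err)
```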
Text Methods on a Pandas String Column
import pandas as pd
names = pd.Series(['andrew','bobo','claire','david','4'])
names
RESULT
0 andrew
1 bobo
2 claire
3 david
4 4
names.str.capitalize()
RESULT
0 Andrew
1 Bobo
2 Claire
3 David
4 4
names.str.isdigit()
RESULT
0 False
1 False
2 False
3 False
4 True
Splitting, Grabbing, and Expanding
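The cells in this section split each comma-separated string into a list, grab one element with `.str[0]`, and expand the pieces into columns. A short sketch of two related variations on the same `tickers` data, assuming you want the first or last ticker or only a single split: `.str.get(i)` does the same job as `.str[i]`, negative positions count from the end as in Python list indexing, and the `n` parameter of `str.split` caps the number of splits.

```python
import pandas as pd

tickers = pd.Series(['GOOG,APPL,AMZN', 'JPM,BAC,GS'])
parts = tickers.str.split(',')

# .str.get(0) is equivalent to .str[0] on the lists returned by split
first = parts.str.get(0)
# Negative positions count from the end, as with Python lists
last = parts.str[-1]

# n=1 splits only at the first comma, so expand=True yields two columns
head_rest = tickers.str.split(',', n=1, expand=True)
print(first.tolist(), last.tolist())
print(head_rest)
```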
tech_finance = ['GOOG,APPL,AMZN','JPM,BAC,GS']
len(tech_finance)
RESULT
2
tickers = pd.Series(tech_finance)
tickers
RESULT
0 GOOG,APPL,AMZN
1 JPM,BAC,GS
dtype: object
tickers.str.split(',')
RESULT
0 [GOOG, APPL, AMZN]
1 [JPM, BAC, GS]
dtype: object
tickers.str.split(',').str[0]
RESULT
0 GOOG
1 JPM
dtype: object
tickers.str.split(',',expand=True)
HTML
      0     1     2
0  GOOG  APPL  AMZN
1   JPM   BAC    GS
Cleaning or Editing Strings
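The cells in this section chain a literal `replace`, `strip`, and `capitalize`. When the set of unwanted characters is broader than a single semicolon, one option is a single regular-expression replace. A minimal sketch, assuming names should keep only ASCII letters; note that this pattern also deletes interior spaces, so it would be wrong for names such as "mary ann". Since pandas 2.0, `str.replace` treats the pattern literally unless `regex=True` is passed.

```python
import pandas as pd

messy_names = pd.Series(["andrew ", "bo;bo", " claire "])

# Assumption: valid names contain only ASCII letters.
# Drop every other character in one regex pass, then capitalize.
cleaned = messy_names.str.replace(r"[^A-Za-z]", "", regex=True).str.capitalize()
print(cleaned)
```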
messy_names = pd.Series(["andrew ","bo;bo"," claire "])
# Notice the misalignment on the right-hand side, caused by the extra spaces in "andrew " and " claire "
messy_names
RESULT
0 andrew
1 bo;bo
2 claire
dtype: object
messy_names.str.replace(";","")
RESULT
0 andrew
1 bobo
2 claire
dtype: object
messy_names.str.strip()
RESULT
0 andrew
1 bo;bo
2 claire
dtype: object
messy_names.str.replace(";","").str.strip()
RESULT
0 andrew
1 bobo
2 claire
dtype: object
messy_names.str.replace(";","").str.strip().str.capitalize()
RESULT
0 Andrew
1 Bobo
2 Claire
dtype: object
Alternative with Custom apply() call
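Unlike the `.str` methods, a custom function passed to `apply()` is called on every element, including missing values. A sketch, assuming one `NaN` is added to the messy names for illustration: `Series.map` with `na_action='ignore'` passes missing values through without calling the function, so `cleanup()` itself needs no guard for them.

```python
import numpy as np
import pandas as pd

def cleanup(name):
    # Same steps as the lesson's cleanup(): drop ';', trim spaces, capitalize
    name = name.replace(";", "")
    name = name.strip()
    name = name.capitalize()
    return name

# Illustrative data (not from the lesson): the messy names plus one missing value
messy = pd.Series(["andrew ", "bo;bo", " claire ", np.nan])

# na_action='ignore' propagates NaN instead of calling cleanup(NaN)
cleaned = messy.map(cleanup, na_action="ignore")
print(cleaned)
```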
def cleanup(name):
    # Remove semicolons, trim surrounding whitespace, then capitalize
    name = name.replace(";","")
    name = name.strip()
    name = name.capitalize()
    return name
messy_names
RESULT
0 andrew
1 bo;bo
2 claire
dtype: object
messy_names.apply(cleanup)
RESULT
0 Andrew
1 Bobo
2 Claire
dtype: object
Which one is more efficient?
import timeit

# Code snippet to be executed only once
setup = '''
import pandas as pd
import numpy as np
messy_names = pd.Series(["andrew ","bo;bo"," claire "])
def cleanup(name):
    name = name.replace(";","")
    name = name.strip()
    name = name.capitalize()
    return name
'''

# Code snippets whose execution time is to be measured
stmt_pandas_str = '''
messy_names.str.replace(";","").str.strip().str.capitalize()
'''
stmt_pandas_apply = '''
messy_names.apply(cleanup)
'''
stmt_pandas_vectorize = '''
np.vectorize(cleanup)(messy_names)
'''
timeit.timeit(setup=setup,
              stmt=stmt_pandas_str,
              number=10000)
RESULT
3.931618999999955
timeit.timeit(setup=setup,
              stmt=stmt_pandas_apply,
              number=10000)
RESULT
1.2268500999999787
timeit.timeit(setup=setup,
              stmt=stmt_pandas_vectorize,
              number=10000)
RESULT
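0.28283379999993485
Wow! While the .str methods can be extremely convenient, on this small Series np.vectorize() ran about 14 times faster than the chained .str calls and about 4 times faster than apply(). When it comes to performance, don't forget about np.vectorize()! Review the "Useful Methods" lecture for a deeper discussion of np.vectorize().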
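The timings above were measured on a three-element Series, where fixed per-call overhead is likely to dominate, and the NumPy documentation notes that `np.vectorize` is provided primarily for convenience rather than performance (its implementation is essentially a for loop). A sketch for repeating the comparison on a larger Series, where the `benchmark` helper, the row count, and the repetition count are illustrative choices rather than part of the lesson. The ranking you see may differ from the numbers above depending on data size and library versions.

```python
import timeit

import numpy as np
import pandas as pd

def cleanup(name):
    # Same steps as the lesson's cleanup()
    return name.replace(";", "").strip().capitalize()

def benchmark(n_rows=100_000, number=5):
    """Hypothetical helper: time the three approaches on a Series of n_rows elements."""
    # Repeat the three messy names until the Series has n_rows elements
    base = ["andrew ", "bo;bo", " claire "]
    messy = pd.Series((base * (n_rows // len(base) + 1))[:n_rows])

    candidates = {
        "str": lambda: messy.str.replace(";", "").str.strip().str.capitalize(),
        "apply": lambda: messy.apply(cleanup),
        "vectorize": lambda: np.vectorize(cleanup)(messy),
    }

    # Sanity check: all three approaches produce the same values
    expected = candidates["str"]().tolist()
    for fn in candidates.values():
        assert list(fn()) == expected

    # timeit accepts a callable directly; returns total seconds for `number` runs
    return {name: timeit.timeit(fn, number=number) for name, fn in candidates.items()}

if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print(f"{name:>10}: {seconds:.3f} s")
```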