Data manipulation with Pandas

Author

Tony Duan

1 Set up Python:Using python 3.11 instead of 3.13

because siuba only work with python 3.11 for now

Code
Sys.setenv(RETICULATE_PYTHON = "/Library/Frameworks/Python.framework/Versions/3.11/bin/python3.11")
library(reticulate)
use_python("/Library/Frameworks/Python.framework/Versions/3.11/bin/python3.11")
Code
repl_python()
exit
Code
from platform import python_version
print(python_version())
3.11.9

pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.

Show package’s dependencies

Code
import os
#os.system('pip show pandas')
from pip._vendor import pkg_resources
_package_name = 'pandas'
_package = pkg_resources.working_set.by_key[_package_name]
print([str(r) for r in _package.requires()])  # retrieve deps from setup.py
['python-dateutil>=2.8.2', 'pytz>=2020.1', 'tzdata>=2022.1', 'numpy>=1.21.0; python_version >= "3.10"', 'numpy>=1.23.2; python_version >= "3.11"']

2 install pacakge

Code
!pip3.11 install siuba seaborn pandasql nbformat nbclient

3 load package

Code
import pandas as pd
from pandasql import sqldf
import numpy as np
import matplotlib.pylab as plt
import seaborn as sns

from siuba.siu import call
from siuba import _, mutate, filter, group_by, summarize,show_query
from siuba import *
from siuba.data import mtcars,penguins
Code
mtcars=mtcars
Code
mtcars=r.mtcars
Code
mtcars.head(5)
                    mpg  cyl   disp     hp  drat  ...   qsec   vs   am  gear  carb
Mazda RX4          21.0  6.0  160.0  110.0  3.90  ...  16.46  0.0  1.0   4.0   4.0
Mazda RX4 Wag      21.0  6.0  160.0  110.0  3.90  ...  17.02  0.0  1.0   4.0   4.0
Datsun 710         22.8  4.0  108.0   93.0  3.85  ...  18.61  1.0  1.0   4.0   1.0
Hornet 4 Drive     21.4  6.0  258.0  110.0  3.08  ...  19.44  1.0  0.0   3.0   1.0
Hornet Sportabout  18.7  8.0  360.0  175.0  3.15  ...  17.02  0.0  0.0   3.0   2.0

[5 rows x 11 columns]

row name to column

Code
mtcars.index.name = 'newhead'
mtcars.reset_index(inplace=True)
Code
mtcars.head(5)
             newhead   mpg  cyl   disp     hp  ...   qsec   vs   am  gear  carb
0          Mazda RX4  21.0  6.0  160.0  110.0  ...  16.46  0.0  1.0   4.0   4.0
1      Mazda RX4 Wag  21.0  6.0  160.0  110.0  ...  17.02  0.0  1.0   4.0   4.0
2         Datsun 710  22.8  4.0  108.0   93.0  ...  18.61  1.0  1.0   4.0   1.0
3     Hornet 4 Drive  21.4  6.0  258.0  110.0  ...  19.44  1.0  0.0   3.0   1.0
4  Hornet Sportabout  18.7  8.0  360.0  175.0  ...  17.02  0.0  0.0   3.0   2.0

[5 rows x 12 columns]
Code
small_mtcars = mtcars[["cyl", "mpg",'hp']]
small_mtcars=small_mtcars.head(5)
small_mtcars
   cyl   mpg     hp
0  6.0  21.0  110.0
1  6.0  21.0  110.0
2  4.0  22.8   93.0
3  6.0  21.4  110.0
4  8.0  18.7  175.0

3.1 get info on the data

Code
small_mtcars.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   cyl     5 non-null      float64
 1   mpg     5 non-null      float64
 2   hp      5 non-null      float64
dtypes: float64(3)
memory usage: 252.0 bytes

4 select column

4.1 get column names

Code
list(small_mtcars)
['cyl', 'mpg', 'hp']

4.2 select by name

Code
small_mtcars [["cyl", "mpg",'hp']]
   cyl   mpg     hp
0  6.0  21.0  110.0
1  6.0  21.0  110.0
2  4.0  22.8   93.0
3  6.0  21.4  110.0
4  8.0  18.7  175.0

other way:

Code
small_mtcars.filter(items=['cyl', 'mpg','hp'])
   cyl   mpg     hp
0  6.0  21.0  110.0
1  6.0  21.0  110.0
2  4.0  22.8   93.0
3  6.0  21.4  110.0
4  8.0  18.7  175.0
Code
small_mtcars >> select(_.cyl, _.mpg, _.hp)
   cyl   mpg     hp
0  6.0  21.0  110.0
1  6.0  21.0  110.0
2  4.0  22.8   93.0
3  6.0  21.4  110.0
4  8.0  18.7  175.0

4.2.1 select columns by name match with ‘p’

Code
small_mtcars.loc[:,small_mtcars.columns.str.contains("p")]
    mpg     hp
0  21.0  110.0
1  21.0  110.0
2  22.8   93.0
3  21.4  110.0
4  18.7  175.0

other way:

Code
small_mtcars.filter(regex='p.*', axis=1)
    mpg     hp
0  21.0  110.0
1  21.0  110.0
2  22.8   93.0
3  21.4  110.0
4  18.7  175.0

4.2.2 select columns by index

4.2.2.1 select first and 3rd columns

Code
small_mtcars.iloc[[0,2]]
   cyl   mpg     hp
0  6.0  21.0  110.0
2  4.0  22.8   93.0

4.2.2.2 select first to 3rd columns

Code
small_mtcars[0:3]
   cyl   mpg     hp
0  6.0  21.0  110.0
1  6.0  21.0  110.0
2  4.0  22.8   93.0

5 drop column

Code
small_mtcars.drop('cyl', axis=1)
    mpg     hp
0  21.0  110.0
1  21.0  110.0
2  22.8   93.0
3  21.4  110.0
4  18.7  175.0
Code
small_mtcars>>select(-_.cyl)
    mpg     hp
0  21.0  110.0
1  21.0  110.0
2  22.8   93.0
3  21.4  110.0
4  18.7  175.0

6 Renaming column

Code
small_mtcars.rename(columns={'mpg':"new_name_mpg", 'cyl':'new_name_cyl'})
   new_name_cyl  new_name_mpg     hp
0           6.0          21.0  110.0
1           6.0          21.0  110.0
2           4.0          22.8   93.0
3           6.0          21.4  110.0
4           8.0          18.7  175.0
Code
small_mtcars>> rename(new_name_mpg=_.mpg)
   cyl  new_name_mpg     hp
0  6.0          21.0  110.0
1  6.0          21.0  110.0
2  4.0          22.8   93.0
3  6.0          21.4  110.0
4  8.0          18.7  175.0

7 Create column

Code
small_mtcars['mpg2'] = small_mtcars['mpg']+1


small_mtcars['mpg3']  = np.where(small_mtcars['mpg']> 20, "long", "short")


small_mtcars['mpg4'] =np.where(small_mtcars["mpg"]<19, "short",
                   np.where(small_mtcars["mpg"]<=22, "Medium",
                   np.where(small_mtcars["mpg"]>22, "long","else")))


small_mtcars
   cyl   mpg     hp  mpg2   mpg3    mpg4
0  6.0  21.0  110.0  22.0   long  Medium
1  6.0  21.0  110.0  22.0   long  Medium
2  4.0  22.8   93.0  23.8   long    long
3  6.0  21.4  110.0  22.4   long  Medium
4  8.0  18.7  175.0  19.7  short   short
Code
small_mtcars >>  mutate(mpg2 = _.mpg+1
                      ,mpg3=if_else(_.mpg > 20, "long", "short")
                      )
   cyl   mpg     hp  mpg2   mpg3    mpg4
0  6.0  21.0  110.0  22.0   long  Medium
1  6.0  21.0  110.0  22.0   long  Medium
2  4.0  22.8   93.0  23.8   long    long
3  6.0  21.4  110.0  22.4   long  Medium
4  8.0  18.7  175.0  19.7  short   short

8 Transmute,create column and only keep this column

Code
small_mtcars['mpg2'] = small_mtcars['mpg']+1

new_data=small_mtcars[['mpg2']]

new_data
   mpg2
0  22.0
1  22.0
2  23.8
3  22.4
4  19.7

8.1 Filter rows

Code
mtcars[(mtcars['gear'] ==4)]
           newhead   mpg  cyl   disp     hp  ...   qsec   vs   am  gear  carb
0        Mazda RX4  21.0  6.0  160.0  110.0  ...  16.46  0.0  1.0   4.0   4.0
1    Mazda RX4 Wag  21.0  6.0  160.0  110.0  ...  17.02  0.0  1.0   4.0   4.0
2       Datsun 710  22.8  4.0  108.0   93.0  ...  18.61  1.0  1.0   4.0   1.0
7        Merc 240D  24.4  4.0  146.7   62.0  ...  20.00  1.0  0.0   4.0   2.0
8         Merc 230  22.8  4.0  140.8   95.0  ...  22.90  1.0  0.0   4.0   2.0
9         Merc 280  19.2  6.0  167.6  123.0  ...  18.30  1.0  0.0   4.0   4.0
10       Merc 280C  17.8  6.0  167.6  123.0  ...  18.90  1.0  0.0   4.0   4.0
17        Fiat 128  32.4  4.0   78.7   66.0  ...  19.47  1.0  1.0   4.0   1.0
18     Honda Civic  30.4  4.0   75.7   52.0  ...  18.52  1.0  1.0   4.0   2.0
19  Toyota Corolla  33.9  4.0   71.1   65.0  ...  19.90  1.0  1.0   4.0   1.0
25       Fiat X1-9  27.3  4.0   79.0   66.0  ...  18.90  1.0  1.0   4.0   1.0
31      Volvo 142E  21.4  4.0  121.0  109.0  ...  18.60  1.0  1.0   4.0   2.0

[12 rows x 12 columns]

other way: ::: {.cell}

Code
mtcars.query('gear==4')
           newhead   mpg  cyl   disp     hp  ...   qsec   vs   am  gear  carb
0        Mazda RX4  21.0  6.0  160.0  110.0  ...  16.46  0.0  1.0   4.0   4.0
1    Mazda RX4 Wag  21.0  6.0  160.0  110.0  ...  17.02  0.0  1.0   4.0   4.0
2       Datsun 710  22.8  4.0  108.0   93.0  ...  18.61  1.0  1.0   4.0   1.0
7        Merc 240D  24.4  4.0  146.7   62.0  ...  20.00  1.0  0.0   4.0   2.0
8         Merc 230  22.8  4.0  140.8   95.0  ...  22.90  1.0  0.0   4.0   2.0
9         Merc 280  19.2  6.0  167.6  123.0  ...  18.30  1.0  0.0   4.0   4.0
10       Merc 280C  17.8  6.0  167.6  123.0  ...  18.90  1.0  0.0   4.0   4.0
17        Fiat 128  32.4  4.0   78.7   66.0  ...  19.47  1.0  1.0   4.0   1.0
18     Honda Civic  30.4  4.0   75.7   52.0  ...  18.52  1.0  1.0   4.0   2.0
19  Toyota Corolla  33.9  4.0   71.1   65.0  ...  19.90  1.0  1.0   4.0   1.0
25       Fiat X1-9  27.3  4.0   79.0   66.0  ...  18.90  1.0  1.0   4.0   1.0
31      Volvo 142E  21.4  4.0  121.0  109.0  ...  18.60  1.0  1.0   4.0   2.0

[12 rows x 12 columns]

:::

8.1.1 Filters with AND conditions

Code
mtcars[(mtcars['cyl'] >4)&(mtcars['gear'] ==5) ]
           newhead   mpg  cyl   disp     hp  ...  qsec   vs   am  gear  carb
28  Ford Pantera L  15.8  8.0  351.0  264.0  ...  14.5  0.0  1.0   5.0   4.0
29    Ferrari Dino  19.7  6.0  145.0  175.0  ...  15.5  0.0  1.0   5.0   6.0
30   Maserati Bora  15.0  8.0  301.0  335.0  ...  14.6  0.0  1.0   5.0   8.0

[3 rows x 12 columns]

other way: ::: {.cell}

Code
mtcars.query('cyl>4 and gear==5')
           newhead   mpg  cyl   disp     hp  ...  qsec   vs   am  gear  carb
28  Ford Pantera L  15.8  8.0  351.0  264.0  ...  14.5  0.0  1.0   5.0   4.0
29    Ferrari Dino  19.7  6.0  145.0  175.0  ...  15.5  0.0  1.0   5.0   6.0
30   Maserati Bora  15.0  8.0  301.0  335.0  ...  14.6  0.0  1.0   5.0   8.0

[3 rows x 12 columns]

:::

8.1.2 Filters with OR conditions

Code
mtcars[(mtcars['cyl'] ==6) |(mtcars['gear'] ==5) ]
           newhead   mpg  cyl   disp     hp  ...   qsec   vs   am  gear  carb
0        Mazda RX4  21.0  6.0  160.0  110.0  ...  16.46  0.0  1.0   4.0   4.0
1    Mazda RX4 Wag  21.0  6.0  160.0  110.0  ...  17.02  0.0  1.0   4.0   4.0
3   Hornet 4 Drive  21.4  6.0  258.0  110.0  ...  19.44  1.0  0.0   3.0   1.0
5          Valiant  18.1  6.0  225.0  105.0  ...  20.22  1.0  0.0   3.0   1.0
9         Merc 280  19.2  6.0  167.6  123.0  ...  18.30  1.0  0.0   4.0   4.0
10       Merc 280C  17.8  6.0  167.6  123.0  ...  18.90  1.0  0.0   4.0   4.0
26   Porsche 914-2  26.0  4.0  120.3   91.0  ...  16.70  0.0  1.0   5.0   2.0
27    Lotus Europa  30.4  4.0   95.1  113.0  ...  16.90  1.0  1.0   5.0   2.0
28  Ford Pantera L  15.8  8.0  351.0  264.0  ...  14.50  0.0  1.0   5.0   4.0
29    Ferrari Dino  19.7  6.0  145.0  175.0  ...  15.50  0.0  1.0   5.0   6.0
30   Maserati Bora  15.0  8.0  301.0  335.0  ...  14.60  0.0  1.0   5.0   8.0

[11 rows x 12 columns]

other way: ::: {.cell}

Code
mtcars.query('cyl==6 or gear==5')
           newhead   mpg  cyl   disp     hp  ...   qsec   vs   am  gear  carb
0        Mazda RX4  21.0  6.0  160.0  110.0  ...  16.46  0.0  1.0   4.0   4.0
1    Mazda RX4 Wag  21.0  6.0  160.0  110.0  ...  17.02  0.0  1.0   4.0   4.0
3   Hornet 4 Drive  21.4  6.0  258.0  110.0  ...  19.44  1.0  0.0   3.0   1.0
5          Valiant  18.1  6.0  225.0  105.0  ...  20.22  1.0  0.0   3.0   1.0
9         Merc 280  19.2  6.0  167.6  123.0  ...  18.30  1.0  0.0   4.0   4.0
10       Merc 280C  17.8  6.0  167.6  123.0  ...  18.90  1.0  0.0   4.0   4.0
26   Porsche 914-2  26.0  4.0  120.3   91.0  ...  16.70  0.0  1.0   5.0   2.0
27    Lotus Europa  30.4  4.0   95.1  113.0  ...  16.90  1.0  1.0   5.0   2.0
28  Ford Pantera L  15.8  8.0  351.0  264.0  ...  14.50  0.0  1.0   5.0   4.0
29    Ferrari Dino  19.7  6.0  145.0  175.0  ...  15.50  0.0  1.0   5.0   6.0
30   Maserati Bora  15.0  8.0  301.0  335.0  ...  14.60  0.0  1.0   5.0   8.0

[11 rows x 12 columns]

:::

8.1.3 filter row with index

8.1.3.1 5th rows

Code
# not in siuba, in pandas
mtcars.iloc[[4]]
             newhead   mpg  cyl   disp     hp  ...   qsec   vs   am  gear  carb
4  Hornet Sportabout  18.7  8.0  360.0  175.0  ...  17.02  0.0  0.0   3.0   2.0

[1 rows x 12 columns]

8.1.3.2 1 and 5tj rows

Code
# not in siuba, in pandas
mtcars.iloc[[0,4]]
             newhead   mpg  cyl   disp     hp  ...   qsec   vs   am  gear  carb
0          Mazda RX4  21.0  6.0  160.0  110.0  ...  16.46  0.0  1.0   4.0   4.0
4  Hornet Sportabout  18.7  8.0  360.0  175.0  ...  17.02  0.0  0.0   3.0   2.0

[2 rows x 12 columns]

8.1.3.3 1 to 5th rows

Code
# not in siuba, in pandas
mtcars.iloc[0:4]
          newhead   mpg  cyl   disp     hp  ...   qsec   vs   am  gear  carb
0       Mazda RX4  21.0  6.0  160.0  110.0  ...  16.46  0.0  1.0   4.0   4.0
1   Mazda RX4 Wag  21.0  6.0  160.0  110.0  ...  17.02  0.0  1.0   4.0   4.0
2      Datsun 710  22.8  4.0  108.0   93.0  ...  18.61  1.0  1.0   4.0   1.0
3  Hornet 4 Drive  21.4  6.0  258.0  110.0  ...  19.44  1.0  0.0   3.0   1.0

[4 rows x 12 columns]

8.1.3.4 get ramdon 5 rows

Code
mtcars.sample(5, random_state=42)
                newhead   mpg  cyl   disp     hp  ...   qsec   vs   am  gear  carb
29         Ferrari Dino  19.7  6.0  145.0  175.0  ...  15.50  0.0  1.0   5.0   6.0
15  Lincoln Continental  10.4  8.0  460.0  215.0  ...  17.82  0.0  0.0   3.0   4.0
24     Pontiac Firebird  19.2  8.0  400.0  175.0  ...  17.05  0.0  0.0   3.0   2.0
17             Fiat 128  32.4  4.0   78.7   66.0  ...  19.47  1.0  1.0   4.0   1.0
8              Merc 230  22.8  4.0  140.8   95.0  ...  22.90  1.0  0.0   4.0   2.0

[5 rows x 12 columns]

8.2 Append

8.2.1 append by row

Code
# not available in siuba yet
#from siuba import bind_rows
Code
# using pandas

# get 1 to 4 rows
data1=mtcars.iloc[0:4]

# get 9 rows
data2=mtcars.iloc[10:11]

data3=pd.concat([data1, data2], ignore_index = True,axis=0)

data3
          newhead   mpg  cyl   disp     hp  ...   qsec   vs   am  gear  carb
0       Mazda RX4  21.0  6.0  160.0  110.0  ...  16.46  0.0  1.0   4.0   4.0
1   Mazda RX4 Wag  21.0  6.0  160.0  110.0  ...  17.02  0.0  1.0   4.0   4.0
2      Datsun 710  22.8  4.0  108.0   93.0  ...  18.61  1.0  1.0   4.0   1.0
3  Hornet 4 Drive  21.4  6.0  258.0  110.0  ...  19.44  1.0  0.0   3.0   1.0
4       Merc 280C  17.8  6.0  167.6  123.0  ...  18.90  1.0  0.0   4.0   4.0

[5 rows x 12 columns]

8.2.2 append by column

Code
# using pandas
data1=small_mtcars[["cyl", "mpg"]]

data2=small_mtcars[['hp']]

data3=pd.concat([data1, data2], axis=1).reindex(data2.index)

data3
   cyl   mpg     hp
0  6.0  21.0  110.0
1  6.0  21.0  110.0
2  4.0  22.8   93.0
3  6.0  21.4  110.0
4  8.0  18.7  175.0

8.2.3 Dropping NA values

8.2.4 keep NA values

8.3 group by

8.3.1 average,min,max,sum

Code
mtcars.groupby("cyl")["hp"].mean()
cyl
4.0     82.636364
6.0    122.285714
8.0    209.214286
Name: hp, dtype: float64
Code
mtcars.groupby("cyl")["hp"].min()
cyl
4.0     52.0
6.0    105.0
8.0    150.0
Name: hp, dtype: float64
Code
mtcars.groupby("cyl")["hp"].max()
cyl
4.0    113.0
6.0    175.0
8.0    335.0
Name: hp, dtype: float64
Code
mtcars.groupby("cyl")["hp"].sum()
cyl
4.0     909.0
6.0     856.0
8.0    2929.0
Name: hp, dtype: float64

8.3.2 count record and count distinct record

Code
mtcars.groupby("cyl")["hp"].count()
cyl
4.0    11
6.0     7
8.0    14
Name: hp, dtype: int64
Code
mtcars.groupby("cyl")["hp"].nunique()
cyl
4.0    10
6.0     4
8.0     9
Name: hp, dtype: int64

8.4 order rows

Code
small_mtcars.sort_values('hp')
   cyl   mpg     hp  mpg2   mpg3    mpg4
2  4.0  22.8   93.0  23.8   long    long
0  6.0  21.0  110.0  22.0   long  Medium
1  6.0  21.0  110.0  22.0   long  Medium
3  6.0  21.4  110.0  22.4   long  Medium
4  8.0  18.7  175.0  19.7  short   short

8.4.1 Sort in descending order

Code
small_mtcars.sort_values('hp',ascending=False)
   cyl   mpg     hp  mpg2   mpg3    mpg4
4  8.0  18.7  175.0  19.7  short   short
0  6.0  21.0  110.0  22.0   long  Medium
1  6.0  21.0  110.0  22.0   long  Medium
3  6.0  21.4  110.0  22.4   long  Medium
2  4.0  22.8   93.0  23.8   long    long

8.4.2 Arrange by multiple variables

Code
small_mtcars.sort_values(by=['cyl','mpg'])
   cyl   mpg     hp  mpg2   mpg3    mpg4
2  4.0  22.8   93.0  23.8   long    long
0  6.0  21.0  110.0  22.0   long  Medium
1  6.0  21.0  110.0  22.0   long  Medium
3  6.0  21.4  110.0  22.4   long  Medium
4  8.0  18.7  175.0  19.7  short   short

8.5 join

Code

lhs = pd.DataFrame({'id': [1,2,3], 'val': ['lhs.1', 'lhs.2', 'lhs.3']})
rhs = pd.DataFrame({'id': [1,2,4], 'val': ['rhs.1', 'rhs.2', 'rhs.3']})
Code
lhs
   id    val
0   1  lhs.1
1   2  lhs.2
2   3  lhs.3
Code
rhs
   id    val
0   1  rhs.1
1   2  rhs.2
2   4  rhs.3

8.5.1 inner_join

Code
result=pd.merge(lhs, rhs, on='id', how='inner')
result
   id  val_x  val_y
0   1  lhs.1  rhs.1
1   2  lhs.2  rhs.2

8.5.2 full join

Code
result=pd.merge(lhs, rhs, on='id', how='outer')
result
   id  val_x  val_y
0   1  lhs.1  rhs.1
1   2  lhs.2  rhs.2
2   3  lhs.3    NaN
3   4    NaN  rhs.3

8.5.3 left join

Code
result=pd.merge(lhs, rhs, on='id', how='left')
result
   id  val_x  val_y
0   1  lhs.1  rhs.1
1   2  lhs.2  rhs.2
2   3  lhs.3    NaN

8.6 Reshape tables

Code
costs = pd.DataFrame({
    'id': [1,2],
    'price_x': [.1, .2],
    'price_y': [.4, .5],
    'price_z': [.7, .8]
})

costs
   id  price_x  price_y  price_z
0   1      0.1      0.4      0.7
1   2      0.2      0.5      0.8

8.6.1 Gather data long(wide to long)

Code
# selecting each variable manually
long_date=pd.melt(costs,id_vars=['id'], value_vars=['price_x', 'price_y','price_z'])

long_date
   id variable  value
0   1  price_x    0.1
1   2  price_x    0.2
2   1  price_y    0.4
3   2  price_y    0.5
4   1  price_z    0.7
5   2  price_z    0.8
Code
#costs >> gather('measure', 'value', _.price_x, _.price_y, _.price_z)

8.6.2 Spread data wide (long to wide)

Code
long_date.pivot(index="id", columns="variable", values="value")
variable  price_x  price_y  price_z
id                                 
1             0.1      0.4      0.7
2             0.2      0.5      0.8

8.7 string

Code
df = pd.DataFrame({'text': ['abc', 'DDD','1243c','aeEe'], 'num': [3, 4,7,8]})

df
    text  num
0    abc    3
1    DDD    4
2  1243c    7
3   aeEe    8

8.7.1 upper case

8.7.2 lower case

8.7.3 match

8.7.4 concatenation

8.7.5 replace

8.7.6 extract

Use str.extract() with a regular expression to pull out a matching piece of text.

For example the regular expression “^(.*) ” contains the following pieces:

  • a matches the literal letter “a”

  • .* has a . which matches anything, and * which modifies it to apply 0 or more times.

8.8 date

Code
df_dates = pd.DataFrame({
    "dates": pd.to_datetime(["2021-01-02", "2021-02-03"]),
    "raw": ["2023-04-05 06:07:08", "2024-05-06 07:08:09"],
})
df_dates
       dates                  raw
0 2021-01-02  2023-04-05 06:07:08
1 2021-02-03  2024-05-06 07:08:09
Code
df_dates.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   dates   2 non-null      datetime64[ns]
 1   raw     2 non-null      object        
dtypes: datetime64[ns](1), object(1)
memory usage: 164.0+ bytes

8.9 dataframe to numpy array

Code
df_dates.to_numpy()
array([[Timestamp('2021-01-02 00:00:00'), '2023-04-05 06:07:08'],
       [Timestamp('2021-02-03 00:00:00'), '2024-05-06 07:08:09']],
      dtype=object)

9 Reference:

https://pandas.pydata.org/docs/user_guide

Back to top