Input & output in Python

Author

Tony Duan

Data input and output in Python

1 input

1.1 read CSV

Code
import pandas as pd
data=pd.read_csv('data/Book3.csv')
data
a b
0 1241 rhth
1 35235 rjyyj
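
Since `pd.read_csv` also accepts file-like objects, the call above can be tried without a file on disk. The following is a minimal sketch, assuming the two-column layout shown in the output above; the in-memory text stands in for `data/Book3.csv`, and the `usecols` and `dtype` choices are only illustrative.

```python
import io

import pandas as pd

# Illustrative stand-in for data/Book3.csv (same two columns as the output above).
csv_text = "a,b\n1241,rhth\n35235,rjyyj\n"

# read_csv accepts a path, a URL, or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))

# usecols limits which columns are parsed; dtype forces a column type.
only_a = pd.read_csv(io.StringIO(csv_text), usecols=["a"], dtype={"a": "int64"})

print(sample.shape)
print(only_a.dtypes)
```

Restricting columns at read time can save memory on wide files, since unused columns are never loaded.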

read CSV from a URL

Code
import pandas as pd
url='https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-02-11/hotels.csv'
hotels=pd.read_csv(url)

1.2 read excel

sheet_name=0 reads the first sheet.

sheet_name=1 reads the second sheet.

sheet_name='Sheet1' reads the sheet named 'Sheet1'.

Code
import pandas as pd
data_excel=pd.read_excel('data/Book1.xlsx',sheet_name=0)
data_excel
a b
0 1241 rhth
1 35235 rjyyj
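
The `sheet_name` options described above can be compared side by side on a small workbook. This is a sketch under assumptions: the workbook is written to a temporary directory, the sheet contents are made up for illustration, and an Excel engine such as openpyxl is assumed to be installed. It also shows `sheet_name=None`, which returns a dict of all sheets.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up data for two sheets (not the contents of data/Book1.xlsx).
first = pd.DataFrame({"a": [1241, 35235], "b": ["rhth", "rjyyj"]})
second = pd.DataFrame({"c": [1, 2, 3]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.xlsx"

    # Write both frames into one workbook, one sheet each.
    with pd.ExcelWriter(path) as writer:
        first.to_excel(writer, sheet_name="Sheet1", index=False)
        second.to_excel(writer, sheet_name="Sheet2", index=False)

    by_position = pd.read_excel(path, sheet_name=1)       # second sheet
    by_name = pd.read_excel(path, sheet_name="Sheet1")    # sheet by name
    all_sheets = pd.read_excel(path, sheet_name=None)     # dict: name -> DataFrame

print(list(all_sheets))
```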

1.3 read parquet

Parquet is a columnar, compressed file format that stores column types alongside the data, which makes it one of the best formats for data analytics.

Code
data= pd.read_parquet("data/df.parquet")
data.shape
(100, 62)
Code
data.head()
FlightDate Airline Origin Dest Cancelled Diverted CRSDepTime DepTime DepDelayMinutes DepDelay ... WheelsOn TaxiIn CRSArrTime ArrDelay ArrDel15 ArrivalDelayGroups ArrTimeBlk DistanceGroup DivAirportLandings __index_level_0__
0 2022-04-04 00:00:00+00:00 Commutair Aka Champlain Enterprises, Inc. GJT DEN False False 1133 1123.0 0.0 -10.0 ... 1220.0 8.0 1245 -17.0 0.0 -2.0 1200-1259 1 0 0
1 2022-04-04 00:00:00+00:00 Commutair Aka Champlain Enterprises, Inc. HRL IAH False False 732 728.0 0.0 -4.0 ... 839.0 9.0 849 -1.0 0.0 -1.0 0800-0859 2 0 1
2 2022-04-04 00:00:00+00:00 Commutair Aka Champlain Enterprises, Inc. DRO DEN False False 1529 1514.0 0.0 -15.0 ... 1622.0 14.0 1639 -3.0 0.0 -1.0 1600-1659 2 0 2
3 2022-04-04 00:00:00+00:00 Commutair Aka Champlain Enterprises, Inc. IAH GPT False False 1435 1430.0 0.0 -5.0 ... 1543.0 4.0 1605 -18.0 0.0 -2.0 1600-1659 2 0 3
4 2022-04-04 00:00:00+00:00 Commutair Aka Champlain Enterprises, Inc. DRO DEN False False 1135 1135.0 0.0 0.0 ... 1243.0 8.0 1245 6.0 0.0 0.0 1200-1259 2 0 4

5 rows × 62 columns

read gzip-compressed parquet

Code
data= pd.read_parquet("data/df.parquet.gzip")
data.shape
(100, 62)

1.4 read feather

Code
data=pd.read_feather("data/feather_file.feather")
data.head()
FlightDate Airline Origin Dest Cancelled Diverted CRSDepTime DepTime DepDelayMinutes DepDelay ... WheelsOn TaxiIn CRSArrTime ArrDelay ArrDel15 ArrivalDelayGroups ArrTimeBlk DistanceGroup DivAirportLandings __index_level_0__
0 2022-04-04 00:00:00+00:00 Commutair Aka Champlain Enterprises, Inc. GJT DEN False False 1133 1123.0 0.0 -10.0 ... 1220.0 8.0 1245 -17.0 0.0 -2.0 1200-1259 1 0 0
1 2022-04-04 00:00:00+00:00 Commutair Aka Champlain Enterprises, Inc. HRL IAH False False 732 728.0 0.0 -4.0 ... 839.0 9.0 849 -1.0 0.0 -1.0 0800-0859 2 0 1
2 2022-04-04 00:00:00+00:00 Commutair Aka Champlain Enterprises, Inc. DRO DEN False False 1529 1514.0 0.0 -15.0 ... 1622.0 14.0 1639 -3.0 0.0 -1.0 1600-1659 2 0 2
3 2022-04-04 00:00:00+00:00 Commutair Aka Champlain Enterprises, Inc. IAH GPT False False 1435 1430.0 0.0 -5.0 ... 1543.0 4.0 1605 -18.0 0.0 -2.0 1600-1659 2 0 3
4 2022-04-04 00:00:00+00:00 Commutair Aka Champlain Enterprises, Inc. DRO DEN False False 1135 1135.0 0.0 0.0 ... 1243.0 8.0 1245 6.0 0.0 0.0 1200-1259 2 0 4

5 rows × 62 columns

1.5 read txt

Code
# the with block closes the file automatically
with open("txt_example.txt", "r") as f:
    a = f.read()
print(a)
Testing

Testing,  Testing.
Testing
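
For larger text files it can be useful to read line by line rather than all at once. A minimal sketch using only the standard library; the temporary file stands in for `txt_example.txt`, and its contents mirror the output above.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for txt_example.txt, created here so the example is self-contained.
    path = Path(tmp) / "txt_example.txt"
    path.write_text("Testing\n\nTesting,  Testing.\nTesting\n", encoding="utf-8")

    # Iterating over the file object yields one line at a time.
    with open(path, "r", encoding="utf-8") as f:
        lines = [line.rstrip("\n") for line in f]

    # Path.read_text is a one-call alternative for small files.
    whole = path.read_text(encoding="utf-8")

# Drop blank lines.
non_empty = [line for line in lines if line]
print(len(lines), len(non_empty))
```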

2 output

2.1 write CSV

Code
data.head().to_csv('data/out.csv', index=False)  
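
The effect of `index=False` is easiest to see by comparing the header lines with and without it. A small sketch with a made-up frame; calling `to_csv` without a path returns the CSV text instead of writing a file.

```python
import pandas as pd

# Made-up frame for illustration.
df = pd.DataFrame({"a": [1241, 35235], "b": ["rhth", "rjyyj"]})

# With no path argument, to_csv returns the CSV content as a string.
with_index = df.to_csv().splitlines()
without_index = df.to_csv(index=False).splitlines()

print(with_index[0])      # header begins with an empty, unnamed index column
print(without_index[0])   # header contains only the data columns
```

Leaving the index in means a later `pd.read_csv` picks it up as an extra unnamed column, which is why `index=False` is a common choice when the index carries no information.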

2.2 write excel

Code
data_excel.to_excel('data/out.xlsx')

2.3 write parquet

Code
data.head(100).to_parquet('data/df.parquet') 

write with gzip compression

Code
data.head(100).to_parquet('data/df.parquet.gzip',
              compression='gzip')  

2.4 write feather

Code
data.head(100).to_feather("data/feather_file.feather")

2.5 write txt

Code
a_txt='''
Testing

Testing,  Testing.
Testing
'''

print(a_txt)

Testing

Testing,  Testing.
Testing
Code
# the with block closes the file automatically, even if write fails
with open("myfile.txt", "w") as f:
    f.write(a_txt)
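
Mode `"w"` overwrites an existing file, while mode `"a"` appends to it. The sketch below shows the difference using only the standard library; it writes to a temporary directory rather than `myfile.txt` so that nothing on disk is touched.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "myfile.txt"

    # "w" creates the file, or truncates it if it already exists.
    with open(path, "w", encoding="utf-8") as f:
        f.write("Testing\n")

    # "a" adds to the end of the file instead of overwriting it.
    with open(path, "a", encoding="utf-8") as f:
        f.write("Testing,  Testing.\n")

    content = path.read_text(encoding="utf-8")

print(content)
```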

3 Reference

https://medium.com/@gadhvirushiraj/the-best-file-format-for-data-science-ed756f937be8
