---
title: "input & output in Python"
author: "Tony Duan"
execute:
  warning: false
  error: false
format:
  html:
    toc: true
    toc-location: right
    code-fold: show
    code-tools: true
    number-sections: true
    code-block-bg: true
    code-block-border-left: "#31BAE9"
---
Data input and output in Python.
# input
## read CSV
```{python}
import pandas as pd
data=pd.read_csv('data/Book3.csv')
data
```
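`read_csv` accepts many optional arguments. The sketch below is not part of the original notebook: it parses a small in-memory CSV (the column names and values are made up for illustration) to show three commonly used options, `sep`, `usecols`, and `nrows`.

```{python}
import io
import pandas as pd

# Hypothetical semicolon-separated data, kept in memory so no file is needed.
csv_text = "id;name;score\n1;ann;3.5\n2;bob;4.0\n3;cat;2.5\n"

subset = pd.read_csv(
    io.StringIO(csv_text),
    sep=";",                  # field delimiter (the default is ",")
    usecols=["id", "score"],  # load only these columns
    nrows=2,                  # read only the first two data rows
)
subset
```

Limiting columns and rows this way can be handy for a quick look at a large file before loading it in full.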
Read a CSV file directly from a URL:
```{python}
#| eval: false
import pandas as pd
url='https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-02-11/hotels.csv'
hotels=pd.read_csv(url)
```
## read excel
- `sheet_name=0` reads the first sheet.
- `sheet_name=1` reads the second sheet.
- `sheet_name='Sheet1'` reads the sheet named `Sheet1`.
```{python}
import pandas as pd
data_excel=pd.read_excel('data/Book1.xlsx',sheet_name=0)
data_excel
```
## read parquet
Parquet is a columnar file format and one of the best choices for data analytics.
```{python}
data= pd.read_parquet("data/df.parquet")
data.shape
```
```{python}
data.head()
```
Read a gzip-compressed Parquet file:
```{python}
data= pd.read_parquet("data/df.parquet.gzip")
data.shape
```
## read feather
```{python}
data=pd.read_feather("data/feather_file.feather")
data.head()
```
## read txt
```{python}
# Open the file in a context manager so it is closed automatically.
with open("txt_example.txt", "r") as f:
    a = f.read()
print(a)
```
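For large text files it can help to read one line at a time instead of loading everything with `read()`. A minimal sketch, assuming a temporary file whose name (`demo.txt`) and contents are made up for illustration:

```{python}
import os
import tempfile

# Create a small temporary file so the example does not depend on txt_example.txt.
tmp_path = os.path.join(tempfile.mkdtemp(), "demo.txt")  # hypothetical file name
with open(tmp_path, "w", encoding="utf-8") as f:
    f.write("first line\nsecond line\nthird line\n")

# Iterating over the file object yields one line at a time.
lines = []
with open(tmp_path, "r", encoding="utf-8") as f:
    for line in f:
        lines.append(line.rstrip("\n"))  # drop the trailing newline

print(lines)
```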
# output
## write CSV
```{python}
data.head().to_csv('data/out.csv', index=False)
```
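The `index=False` argument matters when the file is read back. The sketch below uses a small made-up DataFrame (not the data from this notebook) and in-memory strings instead of files to compare the two cases.

```{python}
import io
import pandas as pd

# Hypothetical two-row DataFrame used only for this illustration.
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

with_index = df.to_csv()                # default: the index is written as an unnamed first column
without_index = df.to_csv(index=False)  # index omitted

# Reading the first version back produces an extra "Unnamed: 0" column.
round_trip = pd.read_csv(io.StringIO(with_index))
clean = pd.read_csv(io.StringIO(without_index))

print(list(round_trip.columns))
print(list(clean.columns))
```

Calling `to_csv()` without a path returns the CSV text as a string, which is what makes the in-memory round trip possible here.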
## write excel
```{python}
data_excel.to_excel('data/out.xlsx')
```
## write parquet
```{python}
data.head(100).to_parquet('data/df.parquet')
```
Write a gzip-compressed Parquet file:
```{python}
data.head(100).to_parquet('data/df.parquet.gzip',
                          compression='gzip')
```
## write feather
```{python}
data.head(100).to_feather("data/feather_file.feather")
```
## write txt
```{python}
a_txt='''
Testing
Testing, Testing.
Testing
'''
print(a_txt)
```
```{python}
# Write the text to a file; the context manager closes it when done.
with open("myfile.txt", "w") as f:
    f.write(a_txt)
```
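The mode passed to `open` decides whether existing content is kept. A minimal sketch, assuming a temporary file (`modes_demo.txt` is a made-up name) so that `myfile.txt` is left untouched:

```{python}
import os
import tempfile

tmp_path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")  # hypothetical file name

# "w" creates the file, or truncates it if it already exists.
with open(tmp_path, "w", encoding="utf-8") as f:
    f.write("Testing\n")

# "a" appends to the end of the existing content.
with open(tmp_path, "a", encoding="utf-8") as f:
    f.write("Testing, Testing.\n")

with open(tmp_path, "r", encoding="utf-8") as f:
    after_append = f.read()

# Opening with "w" again discards everything written so far.
with open(tmp_path, "w", encoding="utf-8") as f:
    f.write("Replaced\n")

with open(tmp_path, "r", encoding="utf-8") as f:
    after_overwrite = f.read()

print(after_append)
print(after_overwrite)
```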
# Reference
https://medium.com/@gadhvirushiraj/the-best-file-format-for-data-science-ed756f937be8