Pandas Basics

Pandas

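Pandas is a powerful toolkit for analyzing structured data, built on NumPy, which provides high-performance array and matrix operations. It can import data from many sources, including CSV, JSON, SQL, and Microsoft Excel. It supports operations such as merging, reshaping, and selecting data, as well as data cleaning and feature processing. Pandas is widely used for data analysis in academia, finance, statistics, and other fields.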
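As a minimal sketch of the import step, the snippet below reads CSV text with pd.read_csv. The in-memory csv_text string and its column names are made up for illustration; with a real file you would pass a path instead.

```python
import io

import pandas as pd

# Hypothetical CSV content, held in memory so the example runs without a file
csv_text = "fruit,num\napple,8\nbanana,2\n"

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```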

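The main Pandas data structures are Series (one-dimensional) and DataFrame (two-dimensional). Together they cover most typical use cases in finance, statistics, social science, engineering, and similar fields.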

Series

Creation

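import pandas as pd

# Build a Series from a dict: the keys become the index labels
fruits = {"orange": 2, "banana": 8}
print(pd.Series(fruits))

import pandas as pd

# Build a Series from a list of values and a list of index labels
index = ["apple", "dog", "cat", "pig", "orange"]
data = [8, 2, 1, 3, 2]
series = pd.Series(data, index=index)
print(series)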

Referencing data

import pandas as pd

index = ["apple", "dog", "cat", "pig", "orange"]
data = [8, 2, 1, 3, 2]
series = pd.Series(data, index=index)
# An integer slice selects by position: the first two elements
print(series[0:2])

import pandas as pd

index = ["apple", "dog", "cat", "pig", "orange"]
data = [8, 2, 1, 3, 2]
series = pd.Series(data, index=index)
# A list of labels selects the matching elements
print(series[["dog", "pig"]])
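Selection by position and selection by label can also be written explicitly with .iloc and .loc. A minimal sketch using the same sample data; the variable names first_two and dog_to_pig are only illustrative:

```python
import pandas as pd

index = ["apple", "dog", "cat", "pig", "orange"]
data = [8, 2, 1, 3, 2]
series = pd.Series(data, index=index)

# .iloc selects by integer position; the end of the slice is excluded
first_two = series.iloc[0:2]

# .loc selects by label; a label slice includes both endpoints
dog_to_pig = series.loc["dog":"pig"]

print(first_two)
print(dog_to_pig)
```

Spelling out .iloc or .loc makes it clear to the reader whether a key is meant as a position or as a label.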

Reading values and index

import pandas as pd

index = ["apple", "dog", "cat", "pig", "orange"]
data = [8, 2, 1, 3, 2]
series = pd.Series(data, index=index)
series_values = series.values  # the data as a NumPy array
series_index = series.index    # the index labels as an Index object
print(series_values, series_index)

Adding elements

When adding elements to a Series, the elements to add must themselves be a Series.

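import pandas as pd

index = ["apple", "dog", "cat", "pig", "orange"]
data = [8, 2, 1, 3, 2]
series = pd.Series(data, index=index)

# Series.append was removed in pandas 2.0; pd.concat is the replacement.
# Option 1: build the new Series inline and concatenate it
series = pd.concat([series, pd.Series([12], index=["goose"])])
series = pd.concat([series, pd.Series({"orange": 45})])  # duplicate labels are allowed

# Option 2: build the new Series first, then concatenate it
grape = pd.Series([1], index=["grape"])
series = pd.concat([series, grape])
print(series)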

Removing elements

An element is removed from a Series by passing its index label to drop.

import pandas as pd

index = ["apple", "dog", "cat", "pig", "orange"]
data = [8, 2, 1, 3, 2]
series = pd.Series(data, index=index)
# drop returns a new Series without the given label; the original is unchanged
series = series.drop("cat")
print(series)

Filtering

import pandas as pd

index = ["apple", "dog", "cat", "pig", "orange"]
data = [8, 2, 1, 3, 2]
series = pd.Series(data, index=index)
# A list of booleans keeps the elements where the value is True
conditions = [True, False, True, False, False]
print(series[conditions])

import pandas as pd

index = ["apple", "dog", "cat", "pig", "orange"]
data = [8, 2, 1, 3, 2]
series = pd.Series(data, index=index)
# A comparison on the Series builds the boolean mask: keep the even values
print(series[series % 2 == 0])
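Boolean masks can also be combined. The sketch below is an illustration on the same sample data, not part of the original examples; the names even_and_small and odd_or_large are hypothetical.

```python
import pandas as pd

index = ["apple", "dog", "cat", "pig", "orange"]
data = [8, 2, 1, 3, 2]
series = pd.Series(data, index=index)

# & is element-wise AND; each comparison needs its own parentheses
even_and_small = series[(series % 2 == 0) & (series < 5)]

# | is element-wise OR, and ~ negates a mask
odd_or_large = series[~(series % 2 == 0) | (series > 5)]

print(even_and_small)
print(odd_or_large)
```

The Python keywords and, or, and not do not work element-wise on a Series, which is why the operators &, |, and ~ are used here.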

Sorting

import pandas as pd

index = ["apple", "dog", "cat", "pig", "orange"]
data = [8, 2, 1, 3, 2]
series = pd.Series(data, index=index)
print(series.sort_values())  # ascending by value
print(series.sort_index())   # ascending by index label

DataFrame

A DataFrame is a two-dimensional data structure that works like several Series bundled together.
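One way to see this relationship is to build a DataFrame from two Series and then pull a column back out. This is a minimal sketch; the sample values and the names num, year, and column are chosen only for illustration.

```python
import pandas as pd

labels = ["apple", "orange", "banana"]

# Each Series becomes one column; they share the same index labels
num = pd.Series([1, 34, 23], index=labels)
year = pd.Series([2000, 2023, 2015], index=labels)
df = pd.DataFrame({"num": num, "year": year})

# Pulling one column back out returns a Series
column = df["num"]
print(type(column))
print(column)
```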

Creation

import pandas as pd

# Each key is a column name; each list holds that column's values
data = {
    "fruits": ["apple", "orange", "banana", "peach"],
    "num": [1, 34, 23, 54],
    "year": [2000, 2023, 2015, 2045],
}
df = pd.DataFrame(data)
print(df)

Setting the index and columns

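  • The index of a DataFrame variable df can be set by assigning a list whose length equals the number of rows to df.index.

  • The columns of df can be set by assigning a list whose length equals the number of columns to df.columns.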

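import pandas as pd

index = ["apple", "orange", "banana", "strawberry", "kiwifruit"]
data1 = [10, 5, 8, 12, 3]
data2 = [30, 25, 12, 10, 8]
series1 = pd.Series(data1, index=index)
series2 = pd.Series(data2, index=index)
# Each Series becomes one row; the Series labels become the columns
df = pd.DataFrame([series1, series2])
print(df)

# Replace the row labels; the list length must equal the number of rows
df.index = [1, 2]
print()
print(df)

# Replace the column labels; the list length must equal the number of columns
print()
df.columns = [1, 2, 3, 4, 5]
print(df)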

Adding rows

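To add new data to a DataFrame, append a Series as a new row. Before pandas 2.0 this was written df.append(series, ignore_index=True) on a DataFrame variable df. DataFrame.append was removed in pandas 2.0, so current code uses pd.concat with ignore_index=True instead.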
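A minimal sketch of adding a row with pd.concat, which takes the place of df.append on pandas 2.0 and later. The sample data and the name new_row are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({"fruits": ["apple", "orange"], "num": [1, 34]})

# The new row as a Series; its index labels match the column names
new_row = pd.Series({"fruits": "banana", "num": 23})

# to_frame().T turns the Series into a one-row DataFrame;
# ignore_index=True renumbers the rows 0, 1, 2
df = pd.concat([df, new_row.to_frame().T], ignore_index=True)
print(df)
```

Because the one-row frame holds mixed types, the resulting columns may come out with object dtype. Building the new row as pd.DataFrame([{"fruits": "banana", "num": 23}]) is an alternative that tends to keep numeric dtypes.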

Adding columns

To add a column to a DataFrame, assign the new values to df["new column"].
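A short sketch of column assignment; the column names year and double_num and the sample values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"fruits": ["apple", "orange", "banana"], "num": [1, 34, 23]})

# Assigning a list adds a column; its length must equal the number of rows
df["year"] = [2000, 2023, 2015]

# A column can also be computed from existing columns
df["double_num"] = df["num"] * 2
print(df)
```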

Referencing data