pandas Essentials (Basic Functionality)
2018-11-27 08:31:16 Source: cnblogs (博客园)
1. Reindexing
In [1]: import numpy as np

In [2]: from pandas import Series, DataFrame

In [3]: obj = Series([4.5, 7.2, -5.3, 3.6], index=["d", "b", "a", "c"])

In [4]: obj
Out[4]:
d    4.5
b    7.2
a   -5.3
c    3.6
dtype: float64

In [6]: obj2 = obj.reindex(["a", "b", "c", "d", "e"])

In [7]: obj2
Out[7]:
a   -5.3
b    7.2
c    3.6
d    4.5
e    NaN
dtype: float64
In [8]: obj3 = Series(["blue", "purple", "yellow"], index=[0, 2, 4])

In [9]: obj3.reindex(range(6), method="ffill")
Out[9]:
0      blue
1      blue
2    purple
3    purple
4    yellow
5    yellow
dtype: object
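The forward-fill behaviour above can be compared with the `fill_value` option of `reindex`. The following is a minimal sketch; the placeholder string `"missing"` and the variable names are illustrative assumptions, not part of the original post.

```python
import pandas as pd

obj3 = pd.Series(["blue", "purple", "yellow"], index=[0, 2, 4])

# method="ffill": each new label takes the last known value before it.
filled = obj3.reindex(range(6), method="ffill")

# fill_value: new labels get a constant instead of NaN
# ("missing" is an arbitrary placeholder chosen for this sketch).
placeholder = obj3.reindex(range(6), fill_value="missing")

print(filled.tolist())
print(placeholder.tolist())
```

Note that `method="ffill"` requires the existing index to be sorted (monotonic), which holds for `[0, 2, 4]`.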
In [12]: obj = Series(np.arange(5.), index=["a", "b", "c", "d", "e"])

In [13]: new_obj = obj.drop("c")

In [14]: new_obj
Out[14]:
a    0.0
b    1.0
d    3.0
e    4.0
dtype: float64

With a DataFrame, index values can be dropped from either axis.
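As a short illustration of dropping from either axis, the sketch below builds a hypothetical 4x4 frame (the labels and values are assumptions chosen for the example) and removes rows and columns. `drop` returns a new object and leaves the original untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame; labels and values are assumptions.
data = pd.DataFrame(
    np.arange(16).reshape((4, 4)),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

# Drop rows by label (axis=0 is the default).
rows_dropped = data.drop(["Colorado", "Ohio"])

# Drop columns by label; equivalent to data.drop(["two", "four"], axis=1).
cols_dropped = data.drop(columns=["two", "four"])

print(rows_dropped)
print(cols_dropped)
```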
In [4]: obj = Series(np.arange(4.), index=["a", "b", "c", "d"])

Out[6]:
a    0.0
b    1.0
dtype: float64

In [7]: obj[obj<2]
Out[7]:
a    0.0
b    1.0
dtype: float64
In [8]: obj["b":"c"]
Out[8]:
b    1.0
c    2.0
dtype: float64
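Slicing by label differs from ordinary Python slicing in that the end label is included. A minimal sketch of the contrast, using `.loc` and `.iloc` to make the intent explicit (the variable names are assumptions for the example):

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(4.0), index=["a", "b", "c", "d"])

# Label-based slice: the end label "c" is included.
by_label = obj.loc["b":"c"]

# Position-based slice: the end position 2 is excluded, as in plain Python.
by_position = obj.iloc[1:2]

# Assigning through a label slice sets every element in the range.
updated = obj.copy()
updated.loc["b":"c"] = 5.0

print(by_label.tolist(), by_position.tolist(), updated.tolist())
```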
In [10]: data
Out[10]:
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15

In [11]: data['two']
Out[11]:
Ohio         1
Colorado     5
Utah         9
New York    13
Name: two, dtype: int32

In [12]: data[:2]
Out[12]:
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
In [13]: data > 5
Out[13]:
            one    two  three   four
Ohio      False  False  False  False
Colorado  False  False   True   True
Utah       True   True   True   True
New York   True   True   True   True
In [18]: data.loc['Colorado', ['two', 'three']]
Out[18]:
two      5
three    6
Name: Colorado, dtype: int32

In [19]: data.loc[['Colorado', 'Utah'], data.columns[[3, 0, 1]]]
Out[19]:
          four  one  two
Colorado     7    4    5
Utah        11    8    9
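The `.ix` indexer found in older pandas code was deprecated in pandas 0.20 and removed in pandas 1.0. The sketch below shows the same kind of selection with `.loc` (labels) and `.iloc` (integer positions). The frame is rebuilt here as an assumption so that the example is self-contained.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of a 4x4 example frame.
data = pd.DataFrame(
    np.arange(16).reshape((4, 4)),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

# Pure label-based selection.
by_label = data.loc["Colorado", ["two", "three"]]

# Pure position-based selection: rows 1 and 2, columns 3, 0, 1.
by_position = data.iloc[[1, 2], [3, 0, 1]]

# Row labels combined with column positions: translate the positions
# into labels first, then use .loc.
mixed = data.loc[["Colorado", "Utah"], data.columns[[3, 0, 1]]]

print(by_label)
print(mixed)
```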
In [20]: s1 = Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd', 'e'])

In [21]: s2 = Series([-2.1, 3.6, -1.5, 4, 3.1], index=['a', 'c', 'e', 'f', 'g'])

In [22]: s1 + s2
Out[22]:
a    5.2
c    1.1
d    NaN
e    0.0
f    NaN
g    NaN
dtype: float64
In [26]: df1
Out[26]:
          b     d     e
Utah    0.0   1.0   2.0
Ohio    3.0   4.0   5.0
Texas   6.0   7.0   8.0
Oregon  9.0  10.0  11.0

In [27]: df2
Out[27]:
            b    c    d
Ohio      0.0  1.0  2.0
Texas     3.0  4.0  5.0
Colorado  6.0  7.0  8.0

In [28]: df1 + df2
Out[28]:
            b   c     d   e
Colorado  NaN NaN   NaN NaN
Ohio      3.0 NaN   6.0 NaN
Oregon    NaN NaN   NaN NaN
Texas     9.0 NaN  12.0 NaN
Utah      NaN NaN   NaN NaN
In [30]: df2.add(df1, fill_value=0)
Out[30]:
            b    c     d     e
Colorado  6.0  7.0   8.0   NaN
Ohio      3.0  1.0   6.0   5.0
Oregon    9.0  NaN  10.0  11.0
Texas     9.0  4.0  12.0   8.0
Utah      0.0  NaN   1.0   2.0
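The difference between `+` and `add(..., fill_value=0)` can be checked cell by cell. The sketch below rebuilds two frames with `np.arange`; this construction is an assumption, since the post only prints the frames. The key point is that `fill_value` only helps where exactly one side has a value; a cell missing from both frames stays NaN.

```python
import numpy as np
import pandas as pd

# Assumed constructions for two partially overlapping frames.
df1 = pd.DataFrame(np.arange(12.0).reshape((4, 3)), columns=list("bde"),
                   index=["Utah", "Ohio", "Texas", "Oregon"])
df2 = pd.DataFrame(np.arange(9.0).reshape((3, 3)), columns=list("bcd"),
                   index=["Ohio", "Texas", "Colorado"])

plain = df1 + df2                    # NaN wherever either side is missing
filled = df1.add(df2, fill_value=0)  # a missing side is treated as 0

print(plain)
print(filled)
```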
In [31]: arr = np.arange(12.).reshape((3, 4))

In [32]: arr
Out[32]:
array([[ 0.,  1.,  2.,  3.],
       [ 4.,  5.,  6.,  7.],
       [ 8.,  9., 10., 11.]])

In [33]: arr - arr[1]
Out[33]:
array([[-4., -4., -4., -4.],
       [ 0.,  0.,  0.,  0.],
       [ 4.,  4.,  4.,  4.]])
In [35]: frame = DataFrame(np.arange(12.).reshape((4, 3)), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])

In [39]: series = frame.iloc[0]

In [40]: frame
Out[40]:
          b     d     e
Utah    0.0   1.0   2.0
Ohio    3.0   4.0   5.0
Texas   6.0   7.0   8.0
Oregon  9.0  10.0  11.0

In [41]: series
Out[41]:
b    0.0
d    1.0
e    2.0
Name: Utah, dtype: float64

In [43]: frame - series
Out[43]:
          b    d    e
Utah    0.0  0.0  0.0
Ohio    3.0  3.0  3.0
Texas   6.0  6.0  6.0
Oregon  9.0  9.0  9.0
In [45]: frame + series2
Out[45]:
          b   d     e   f
Utah    0.0 NaN   3.0 NaN
Ohio    3.0 NaN   6.0 NaN
Texas   6.0 NaN   9.0 NaN
Oregon  9.0 NaN  12.0 NaN
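The definition of `series2` is not shown in the post. One definition that is consistent with the printed result is a Series indexed by `'b'`, `'e'` and `'f'` with values 0, 1 and 2. This is an assumption: the value at `'f'` cannot be recovered from the output, since that column is NaN either way. Under that assumption, the union-of-labels behaviour can be reproduced as follows.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12.0).reshape((4, 3)), columns=list("bde"),
                     index=["Utah", "Ohio", "Texas", "Oregon"])

# ASSUMPTION: series2 is not defined in the post; this is one definition
# consistent with the printed output.
series2 = pd.Series(range(3), index=["b", "e", "f"])

# The Series index is matched against the frame's columns; labels found
# on only one side ("d" and "f") produce all-NaN columns.
result = frame + series2
print(result)
```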
In [46]: series3 = frame['d']

In [47]: frame.sub(series3, axis=0)
Out[47]:
          b    d    e
Utah   -1.0  0.0  1.0
Ohio   -1.0  0.0  1.0
Texas  -1.0  0.0  1.0
Oregon -1.0  0.0  1.0
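The two broadcasting directions can be placed side by side. In this minimal sketch (the variable names are assumptions), the `-` operator matches the Series index against the frame's columns and repeats the operation for every row, while `sub(..., axis=0)` matches against the row index and repeats it for every column.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12.0).reshape((4, 3)), columns=list("bde"),
                     index=["Utah", "Ohio", "Texas", "Oregon"])

first_row = frame.iloc[0]  # indexed by column labels b, d, e
col_d = frame["d"]         # indexed by row labels

# Match on columns, broadcast down the rows.
by_row = frame - first_row

# Match on the row index, broadcast across the columns.
by_col = frame.sub(col_d, axis=0)

print(by_row)
print(by_col)
```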
In [49]: frame = DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])

In [50]: frame
Out[50]:
               b         d         e
Utah    0.913051 -1.289725 -0.590573
Ohio    1.417612 -1.835357 -0.010755
Texas   0.328839 -0.121878 -1.209583
Oregon  1.315330 -1.026557 -1.777427

In [51]: np.abs(frame)
Out[51]:
               b         d         e
Utah    0.913051  1.289725  0.590573
Ohio    1.417612  1.835357  0.010755
Texas   0.328839  0.121878  1.209583
Oregon  1.315330  1.026557  1.777427

DataFrame's apply method applies a function to the one-dimensional arrays formed by each column (the default) or each row (axis=1):

In [52]: f = lambda x: x.max() - x.min()

In [53]: frame.apply(f)
Out[53]:
b    1.088773
d    1.713479
e    1.766671
dtype: float64

In [54]: frame.apply(f, axis=1)
Out[54]:
Utah      2.202776
Ohio      3.252969
Texas     1.538421
Oregon    3.092757
dtype: float64
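The function passed to `apply` does not have to return a scalar. As a sketch, the hypothetical helper `min_max` below returns a Series, so `apply` assembles a small summary frame with one column per original column. A seeded generator replaces `np.random.randn` here so that the run is repeatable; the seed value is arbitrary.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only
frame = pd.DataFrame(rng.standard_normal((4, 3)), columns=list("bde"),
                     index=["Utah", "Ohio", "Texas", "Oregon"])

def min_max(x):
    # Hypothetical helper: returns two labelled values per column.
    return pd.Series([x.min(), x.max()], index=["min", "max"])

summary = frame.apply(min_max)  # rows: min, max; columns: b, d, e

spread = lambda x: x.max() - x.min()
spread_cols = frame.apply(spread)          # one value per column
spread_rows = frame.apply(spread, axis=1)  # one value per row

print(summary)
```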
In [57]: obj = Series(range(4), index=['d', 'a', 'b', 'c'])

In [58]: obj
Out[58]:
d    0
a    1
b    2
c    3
dtype: int64

In [59]: obj.sort_index()
Out[59]:
a    1
b    2
c    3
d    0
dtype: int64

In [62]: frame.sort_index()
Out[62]:
               b         d         e
Ohio    1.417612 -1.835357 -0.010755
Oregon  1.315330 -1.026557 -1.777427
Texas   0.328839 -0.121878 -1.209583
Utah    0.913051 -1.289725 -0.590573

In [63]: frame.sort_index(axis=1)
Out[63]:
               b         d         e
Utah    0.913051 -1.289725 -0.590573
Ohio    1.417612 -1.835357 -0.010755
Texas   0.328839 -0.121878 -1.209583
Oregon  1.315330 -1.026557 -1.777427
In [65]: frame.sort_index(axis=1, ascending=False)
Out[65]:
               e         d         b
Utah   -0.590573 -1.289725  0.913051
Ohio   -0.010755 -1.835357  1.417612
Texas  -1.209583 -0.121878  0.328839
Oregon -1.777427 -1.026557  1.315330
In [67]: frame.sort_values(by='b')
Out[67]:
               b         d         e
Texas   0.328839 -0.121878 -1.209583
Utah    0.913051 -1.289725 -0.590573
Oregon  1.315330 -1.026557 -1.777427
Ohio    1.417612 -1.835357 -0.010755
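`sort_values` also accepts a list of columns, in which case later columns break ties in earlier ones. A minimal sketch on a small hypothetical frame (the values are assumptions chosen so that the tie-breaking is visible):

```python
import pandas as pd

# Hypothetical data with repeated values in column "a".
frame = pd.DataFrame({"b": [4, 7, -3, 2], "a": [0, 1, 0, 1]})

# Sort by "a" first, then by "b" within each group of equal "a".
by_two_keys = frame.sort_values(by=["a", "b"])

# Descending order on a single column.
b_descending = frame.sort_values(by="b", ascending=False)

print(by_two_keys)
print(b_descending)
```

The original row labels travel with the rows, so the resulting index records where each row came from.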
In [70]: obj
Out[70]:
0    7
1   -5
2    7
3    4
4    2
5    0
6    4
dtype: int64

In [71]: obj.rank()
Out[71]:
0    6.5
1    1.0
2    6.5
3    4.5
4    3.0
5    2.0
6    4.5
dtype: float64
In [72]: obj.rank(method='first')
Out[72]:
0    6.0
1    1.0
2    7.0
3    4.0
4    3.0
5    2.0
6    5.0
dtype: float64
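The tie-breaking options of `rank` can be compared on the same data. The sketch below rebuilds the Series from the values printed above and applies several `method` settings; the variable names are assumptions for the example.

```python
import pandas as pd

obj = pd.Series([7, -5, 7, 4, 2, 0, 4])

average = obj.rank()               # default: tied values share the mean rank
first = obj.rank(method="first")   # ties ranked in order of appearance
lowest = obj.rank(method="min")    # ties all get the lowest rank in the group
max_desc = obj.rank(method="max", ascending=False)  # descending, highest rank for ties

print(pd.DataFrame({"average": average, "first": first,
                    "min": lowest, "max_desc": max_desc}))
```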
In [73]: obj = Series(range(5), index=['a', 'a', 'b', 'b', 'c'])

In [74]: obj
Out[74]:
a    0
a    1
b    2
b    3
c    4
dtype: int64

In [75]: obj.index.is_unique
Out[75]: False
In [76]: obj['a']
Out[76]:
a    0
a    1
dtype: int64

The same applies to a DataFrame: selecting a duplicated row label returns every matching row.
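A minimal sketch of the DataFrame case, using a hypothetical frame with a repeated row label (labels and values are assumptions). The type of the result depends on whether the label is duplicated, which is worth keeping in mind when writing code that indexes by label.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose row label "a" appears twice.
df = pd.DataFrame(np.arange(12).reshape((4, 3)),
                  index=["a", "a", "b", "c"],
                  columns=["x", "y", "z"])

dup = df.loc["a"]     # duplicated label: a DataFrame with both rows
single = df.loc["c"]  # unique label: a single row, returned as a Series

print(df.index.is_unique)
print(type(dup).__name__, type(single).__name__)
```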