Python scikit-learn (metrics): difference bet…
2018-10-26 05:29:24 Source: 博客园 (cnblogs)
I noticed that 'r2_score' and 'explained_variance_score' are both built-in sklearn.metrics methods for regression problems.
I was always under the impression that r2_score is the percent variance explained by the model. How is it different from 'explained_variance_score'?
When would you choose one over the other?
Thanks!
OK, look at this example:
import numpy as np
from sklearn import metrics

# data
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
print(metrics.explained_variance_score(y_true, y_pred))  # ≈ 0.957173447537
print(metrics.r2_score(y_true, y_pred))                  # ≈ 0.948608137045

# what explained_variance_score really is
print(1 - np.cov(np.array(y_true) - np.array(y_pred)) / np.cov(y_true))  # ≈ 0.957173447537

# what R^2 really is (n = 4 samples; .std() is the population standard deviation)
print(1 - ((np.array(y_true) - np.array(y_pred)) ** 2).sum() / (4 * np.array(y_true).std() ** 2))  # ≈ 0.948608137045

# notice that the mean residual is not 0
print((np.array(y_true) - np.array(y_pred)).mean())  # -0.25

# if the predicted values are different, such that the mean residual IS 0 ...
y_pred = [2.5, 0.0, 2, 7]
print((np.array(y_true) - np.array(y_pred)).mean())  # 0.0

# ... the two scores become the same
print(metrics.explained_variance_score(y_true, y_pred))  # ≈ 0.982869379015
print(metrics.r2_score(y_true, y_pred))                  # ≈ 0.982869379015
So, when the mean residual is 0, they are the same. Which one to choose depends on your needs, that is: is the mean residual supposed to be 0?
Most of the answers I found (including here) emphasize the difference between R² and the explained variance score, namely the mean residual (i.e. the mean error).
However, an important question is left behind: why on earth do I need to consider the mean error?
Refresher:
R²: the coefficient of determination, which measures the proportion of the variation in y that is explained by the (least-squares) linear regression.
You can also look at it from a different angle, for the purpose of evaluating the predicted values of y. For a least-squares linear regression with an intercept, evaluated on the data it was fitted to:
$\mathrm{Var}(y_{\text{actual}}) \times R^2 = \mathrm{Var}(y_{\text{predicted}})$
So, intuitively, the closer R² is to 1, the more actual_y and predicted_y will have the same variance (i.e. the same spread).
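A quick numerical check of this relationship, as a minimal sketch: it fits an ordinary least-squares line with intercept via `numpy.polyfit` on synthetic data (the data, seed, and variable names are made up for illustration) and compares R² times the variance of the actual values with the variance of the fitted values. The identity holds exactly only for an in-sample least-squares fit with an intercept; for other models or for held-out data it generally does not.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical synthetic data: a noisy linear relationship.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y_actual = 3.0 * x + 1.0 + rng.normal(scale=4.0, size=200)

# Ordinary least-squares line with intercept (degree-1 polynomial fit).
slope, intercept = np.polyfit(x, y_actual, deg=1)
y_predicted = slope * x + intercept

r2 = r2_score(y_actual, y_predicted)

# For an in-sample OLS fit with intercept: Var(y_actual) * R^2 == Var(y_predicted).
lhs = np.var(y_actual) * r2
rhs = np.var(y_predicted)
print(lhs, rhs, np.isclose(lhs, rhs))
```

Swapping in a model without an intercept, or evaluating on new data, is an easy way to see the two sides drift apart.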
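As previously mentioned, the main difference is the mean error, and the formulas confirm it:
$R^2 = 1 - \dfrac{\mathrm{SSR}/n}{\mathrm{Var}(y_{\text{actual}})}$
$\text{Explained Variance Score} = 1 - \dfrac{\mathrm{Var}(y_{\text{predicted}} - y_{\text{actual}})}{\mathrm{Var}(y_{\text{actual}})}$
in which SSR is the sum of squared residuals, $\mathrm{Var}$ is the population variance (dividing by $n$), and, writing $\bar{e}$ for the mean error (the mean of $y_{\text{predicted}} - y_{\text{actual}}$):
$\mathrm{Var}(y_{\text{predicted}} - y_{\text{actual}}) = \dfrac{\mathrm{SSR} - n\,\bar{e}^2}{n} = \dfrac{\mathrm{SSR}}{n} - \bar{e}^2$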
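The two formulas can be written out directly in NumPy and compared against scikit-learn, as a sketch. `r2_by_formula` and `explained_variance_by_formula` are hypothetical helper names, not library functions; the variance is the population variance (dividing by n), which is what `np.var` computes by default, and the data are the example values from the question. The script also checks that the variance of the errors equals (SSR - n * mean_error**2) / n.

```python
import numpy as np
from sklearn import metrics


def r2_by_formula(y_actual, y_predicted):
    """R^2 = 1 - (SSR / n) / Var(y_actual)."""
    y_actual = np.asarray(y_actual, dtype=float)
    y_predicted = np.asarray(y_predicted, dtype=float)
    ssr = np.sum((y_predicted - y_actual) ** 2)  # sum of squared residuals
    return 1 - (ssr / y_actual.size) / np.var(y_actual)


def explained_variance_by_formula(y_actual, y_predicted):
    """EV = 1 - Var(y_predicted - y_actual) / Var(y_actual)."""
    y_actual = np.asarray(y_actual, dtype=float)
    y_predicted = np.asarray(y_predicted, dtype=float)
    return 1 - np.var(y_predicted - y_actual) / np.var(y_actual)


y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]

errors = np.asarray(y_pred) - np.asarray(y_true)
n = errors.size
ssr = np.sum(errors ** 2)
mean_error = errors.mean()

# Variance of the errors = (SSR - n * mean_error**2) / n
print(np.isclose(np.var(errors), (ssr - n * mean_error ** 2) / n))

print(r2_by_formula(y_true, y_pred), metrics.r2_score(y_true, y_pred))
print(explained_variance_by_formula(y_true, y_pred),
      metrics.explained_variance_score(y_true, y_pred))
```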
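So the only difference is that the explained variance score subtracts the squared mean error ($\bar{e}^2$, where $\bar{e}$ is the mean of $y_{\text{predicted}} - y_{\text{actual}}$) from the mean squared residual before dividing by the variance of the actual values. But why?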
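Substituting the variance of the errors into the explained variance score (EV) makes the gap between the two scores explicit. A short derivation, with SSR the sum of squared residuals, $\mathrm{Var}$ the population variance, and $\bar{e}$ the mean of $y_{\text{predicted}} - y_{\text{actual}}$:

```latex
\begin{aligned}
\mathrm{EV} &= 1 - \frac{\mathrm{SSR}/n - \bar{e}^2}{\mathrm{Var}(y_{\text{actual}})}
            = \underbrace{1 - \frac{\mathrm{SSR}/n}{\mathrm{Var}(y_{\text{actual}})}}_{R^2}
              + \frac{\bar{e}^2}{\mathrm{Var}(y_{\text{actual}})} \\[4pt]
\mathrm{EV} - R^2 &= \frac{\bar{e}^2}{\mathrm{Var}(y_{\text{actual}})} \;\ge\; 0
\end{aligned}
```

So the explained variance score is never smaller than R², and the two coincide exactly when $\bar{e} = 0$. With the question's data, $\bar{e} = 0.25$ and $\mathrm{Var}(y_{\text{actual}}) = 7.296875$, giving $0.0625 / 7.296875 \approx 0.008565$, which matches $0.957173 - 0.948608$.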
When we compare the R² score with the explained variance score, we are basically checking the mean error: if R² equals the explained variance score, the mean error is zero.
The mean error reflects a systematic tendency of our estimator to overestimate or underestimate, that is: biased vs. unbiased estimation.
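To see why the mean error matters, here is a minimal sketch that shifts every prediction by the same amount (the offset of 2.0 is an arbitrary, hypothetical bias). The explained variance score is unchanged, because adding a constant does not change the variance of the errors, while R² drops because the squared residuals grow; only R² notices the systematic overestimation.

```python
import numpy as np
from sklearn import metrics

y_true = np.array([3, -0.5, 2, 7])
y_pred = np.array([2.5, 0.0, 2, 7])  # predictions whose mean error is 0

bias = 2.0  # hypothetical constant overestimation
y_pred_biased = y_pred + bias

for label, pred in [("unbiased", y_pred), ("biased", y_pred_biased)]:
    mean_error = (pred - y_true).mean()
    ev = metrics.explained_variance_score(y_true, pred)
    r2 = metrics.r2_score(y_true, pred)
    print(f"{label}: mean error={mean_error:.3f}, EV={ev:.6f}, R2={r2:.6f}")
```

With these numbers the explained variance score stays at about 0.982869 in both cases, while R² falls to about 0.4347 for the biased predictions.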
In Summary:
If you want an unbiased estimator, so that your model is not systematically underestimating or overestimating, consider taking the mean error into account.
Reference: https://stackoverflow.com/questions/24378176/python-sci-kit-learn-metrics-difference-between-r2-score-and-explained-varian