特征重要性评估及筛选

标签: 模板  python  特征重要性  特征选择  cumsum

feature_results = pd.DataFrame({'feature': list(train_features.columns),
                               'importance': model.feature_importances_})
feature_results = feature_results.sort_values('importance',ascending=False).reset_index(drop=True)

from IPython.core.pylabtools import figsize
figsize(12, 10)
plt.style.use('ggplot')

feature_results.loc[:9, :].plot(x = 'feature', y = 'importance', edgecolor = 'k',
                               kind = 'barh', color = 'blue')
plt.xlabel('Relative Importance', fontsize = 18); plt.ylabel('')
plt.title('Feature Importances from Random Forest', size = 26)

在这里插入图片描述

cumulative_importances = np.cumsum(feature_results['importance'])
plt.figure(figsize = (20, 6))
plt.plot(list(range(feature_results.shape[0])), cumulative_importances.values, 'b-')
plt.hlines(y=0.95, xmin=0, xmax=feature_results.shape[0], color='r', linestyles='dashed')
# plt.xticks(list(range(feature_results.shape[0])), feature_results.feature, rotation=60)
plt.xlabel('Feature', fontsize = 18)
plt.ylabel('Cumulative importance', fontsize = 18)
plt.title('Cumulative Importances', fontsize = 26)

在这里插入图片描述

most_num_importances = np.where(cumulative_importances > 0.95)[0][0] + 1
print('Number of features for 95% importance: ', most_num_importances)

Number of features for 95% importance: 13

  • 基于重要性来进行特征选择
most_important_features = feature_results['feature'][:13]
indices = [list(train_features.columns).index(x) for x in most_important_features]
X_reduced = X[:, indices]
X_test_reduced = X_test[:, indices]
print('Most import training features shape: ', X_reduced.shape)
print('Most import testing features shape: ', X_test_reduced.shape)

Most import training features shape: (6622, 13)
Most import testing features shape: (2839, 13)

版权声明:本文为sanjianjixiang原创文章,遵循 CC 4.0 BY-SA 版权协议,转载请附上原文出处链接和本声明。
本文链接:https://blog.csdn.net/sanjianjixiang/article/details/104657955