Best way to combine probabilistic classifiers in scikit-learn


I have a logistic regression and a random forest and I'd like to combine them (ensemble) for the final classification probability calculation by taking an average.


Is there a built-in way to do this in scikit-learn? Some way where I can use the ensemble of the two as a classifier itself? Or would I need to roll my own classifier?


3 Answers

#1


30  

NOTE: The scikit-learn Voting Classifier is probably the best way to do this now



OLD ANSWER:

For what it's worth I ended up doing this as follows:


import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin

class EnsembleClassifier(BaseEstimator, ClassifierMixin):
    """Ensemble that averages predict_proba across its member classifiers."""

    def __init__(self, classifiers=None):
        self.classifiers = classifiers

    def fit(self, X, y):
        # Fit every member classifier on the same training data
        for classifier in self.classifiers:
            classifier.fit(X, y)
        return self  # scikit-learn convention: fit returns self

    def predict_proba(self, X):
        # Average the per-class probability estimates of all members
        self.predictions_ = [clf.predict_proba(X) for clf in self.classifiers]
        return np.mean(self.predictions_, axis=0)
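For illustration, the same probability averaging can be done inline on a toy dataset (a minimal sketch; the dataset, models, and hyperparameters here are arbitrary choices, not from the original answer):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Toy binary classification problem (arbitrary example data)
X, y = make_classification(n_samples=200, random_state=0)

clfs = [LogisticRegression(max_iter=1000),
        RandomForestClassifier(random_state=0)]
for clf in clfs:
    clf.fit(X, y)

# Average the predicted probabilities, as the custom ensemble class does
avg_proba = np.mean([clf.predict_proba(X) for clf in clfs], axis=0)
pred = avg_proba.argmax(axis=1)  # final class = highest averaged probability
```

Because each classifier's probability rows sum to 1, the averaged rows do too, so the result is still a valid probability distribution over the classes.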

#2


3  

Given the same problem, I used a majority voting method. Combining probabilities/scores arbitrarily is very problematic, because the performance of your different classifiers can differ (for example, an SVM with two different kernels, plus a random forest, plus another classifier trained on a different training set).


One possible method to "weigh" the different classifiers might be to use their Jaccard score as a "weight". (But be warned: as I understand it, the different scores are not "all made equal". I know that a gradient boosting classifier in my ensemble gives all its scores as 0.97, 0.98, 1.00, or 0.41/0, i.e. it is very overconfident.)

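A sketch of that weighting idea, assuming each classifier's Jaccard score on a held-out validation split is used as its weight (the dataset, split, and models here are hypothetical, not from the answer):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import jaccard_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

clfs = [LogisticRegression(max_iter=1000),
        RandomForestClassifier(random_state=0)]
weights = []
for clf in clfs:
    clf.fit(X_tr, y_tr)
    # Jaccard score on the held-out split becomes this classifier's weight
    weights.append(jaccard_score(y_val, clf.predict(X_val)))

weights = np.array(weights) / np.sum(weights)  # normalize so weights sum to 1
proba = sum(w * clf.predict_proba(X_val) for w, clf in zip(weights, clfs))
```

Normalizing the weights keeps the combined output a valid probability distribution, but it does not fix the overconfidence problem the answer warns about; for that, calibrating each classifier first (e.g. with `CalibratedClassifierCV`) would be a more principled route.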

#3


3  

What about the sklearn.ensemble.VotingClassifier?


http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.VotingClassifier.html#sklearn.ensemble.VotingClassifier

Per the description:


The idea behind the voting classifier implementation is to combine conceptually different machine learning classifiers and use a majority vote or the average predicted probabilities (soft vote) to predict the class labels. Such a classifier can be useful for a set of equally well performing models in order to balance out their individual weaknesses.

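With `voting="soft"`, `VotingClassifier` does exactly the probability averaging the question asks for. A minimal sketch (the dataset and estimator settings are arbitrary illustration choices):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)

# Soft voting averages predict_proba across the named estimators
vote = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("rf", RandomForestClassifier(random_state=0))],
    voting="soft",
)
vote.fit(X, y)
proba = vote.predict_proba(X)  # averaged class probabilities
```

A `weights` parameter is also available if the estimators should not contribute equally to the average.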

