### Best way to combine probabilistic classifiers in scikit-learn

I have a logistic regression and a random forest and I'd like to combine them (ensemble) for the final classification probability calculation by taking an average.

Is there a built-in way to do this in sci-kit learn? Some way where I can use the ensemble of the two as a classifier itself? Or would I need to roll my own classifier?

## 3 Solutions

### #1 (score: 30)

NOTE: scikit-learn's `VotingClassifier` is probably the best way to do this now

For what it's worth I ended up doing this as follows:

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin

class EnsembleClassifier(BaseEstimator, ClassifierMixin):
    """Averages the predicted class probabilities of several classifiers."""

    def __init__(self, classifiers=None):
        self.classifiers = classifiers

    def fit(self, X, y):
        # Fit every underlying classifier on the same training data.
        for classifier in self.classifiers:
            classifier.fit(X, y)
        return self  # scikit-learn convention: fit returns self

    def predict_proba(self, X):
        # Collect each classifier's class probabilities and average them.
        self.predictions_ = [clf.predict_proba(X) for clf in self.classifiers]
        return np.mean(self.predictions_, axis=0)
```
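A minimal usage sketch for the class above, combining the two models from the question (the `X_train`/`y_train`/`X_test` variables are placeholders, not from the original answer):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Average the probabilities of a logistic regression and a random forest.
ensemble = EnsembleClassifier(
    classifiers=[LogisticRegression(max_iter=1000), RandomForestClassifier()]
)
ensemble.fit(X_train, y_train)            # placeholder training data
avg_proba = ensemble.predict_proba(X_test)
```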

### #2 (score: 3)

Given the same problem, I used a majority voting method. Combining probabilities/scores arbitrarily is very problematic, in that the performance of your different classifiers can differ (for example, an SVM with two different kernels, plus a random forest, plus another classifier trained on a different training set).

One possible method to "weigh" the different classifiers might be to use their Jaccard score as a "weight". (But be warned: as I understand it, the different scores are not "all made equal". I know that a gradient boosting classifier in my ensemble gives all its scores as 0.97, 0.98, 1.00 or 0.41/0, i.e. it is very overconfident.) A sketch of this weighting idea follows below.

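A minimal sketch of that Jaccard-weighted averaging, assuming a held-out validation set and binary labels; the function name and data variables are illustrative, not from the original answer:

```python
import numpy as np
from sklearn.metrics import jaccard_score

def weighted_average_proba(classifiers, X_val, y_val, X_test):
    # Weight each fitted classifier by its Jaccard score on validation data.
    weights = np.array(
        [jaccard_score(y_val, clf.predict(X_val)) for clf in classifiers]
    )
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    # Shape (n_classifiers, n_samples, n_classes)
    probas = np.array([clf.predict_proba(X_test) for clf in classifiers])
    # Weighted average over the classifier axis
    return np.tensordot(weights, probas, axes=1)
```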

### #3 (score: 3)

http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.VotingClassifier.html#sklearn.ensemble.VotingClassifier

Per the description:

> The idea behind the voting classifier implementation is to combine conceptually different machine learning classifiers and use a majority vote or the average predicted probabilities (soft vote) to predict the class labels. Such a classifier can be useful for a set of equally well performing models in order to balance out their individual weaknesses.
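A minimal sketch of how this applies to the question's two models; `voting="soft"` averages the predicted probabilities, which is exactly the averaging the question asks for (the hyperparameters shown are illustrative):

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression

# Soft voting averages predict_proba across the named estimators.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ],
    voting="soft",
)
# ensemble.fit(X_train, y_train)        # placeholder data
# proba = ensemble.predict_proba(X_test)
```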