Ensemble Modeling
We’ve learned that diversifying our trees can produce more accurate
predictions. But what if, instead of using several versions of the same
model, we used several different models? This is a common trick in machine
learning, known as ensemble modeling. By combining information from
multiple different types of algorithms, we can improve our model’s accuracy
and forecasting ability.
Ensemble modeling is all about a divide-and-conquer mindset. Different
models give us different insights about the data, insights that other
models might not recognize. By combining the information obtained from
different models, we can learn even more about the truth of our data.
Ensemble modeling also helps to minimize bias and variance in our
predictions. Individual models may have prediction errors, but the
aggregate of all our predictions tends to be more accurate.
There are a few different methods for combining predictions in an ensemble.
The first is to take the mode of your predictions, that is, the value which
occurs most frequently across the models. Whichever prediction has the
highest number of ‘votes’ is the prediction we choose.
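As a rough sketch, majority voting might look like this in Python with scikit-learn; the dataset and the three model types here are illustrative choices, not a recommendation:

# A minimal sketch of majority voting across three different model types.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = [LogisticRegression(max_iter=1000),
          DecisionTreeClassifier(random_state=0),
          KNeighborsClassifier()]

# Each row of `votes` holds one model's predictions for the test set.
votes = np.array([m.fit(X_train, y_train).predict(X_test) for m in models])

# For each sample, keep the class label that received the most votes.
final = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)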
We could also take the average of all the predictions, depending on what
kind of models we have; the average then becomes our final prediction. Our
ensemble can also account for the reliability of individual models by
giving their results different weights, so that predictions from more
reliable models count for more.
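Here is a small sketch of both ideas for regression-style predictions; the numbers and weights are made up for illustration:

import numpy as np

# Hypothetical predictions from three regression models, five samples each.
preds = np.array([
    [10.2, 11.1,  9.8, 12.0, 10.5],   # model A
    [10.0, 11.4,  9.5, 11.8, 10.9],   # model B
    [10.6, 10.9, 10.1, 12.3, 10.2],   # model C
])

# Simple average: every model counts equally.
simple_avg = preds.mean(axis=0)

# Weighted average: more reliable models get larger weights (summing to 1).
weights = np.array([0.5, 0.3, 0.2])
weighted_avg = weights @ preds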
How do we know which kinds of models we want to combine? We already
know from this book that there are several types of models to choose from,
each offering different possibilities and advantages.
A common pairing is a neural network with a decision tree. The neural
network can surface new information, and the decision tree provides a
check that we have not missed anything.
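One way to sketch this pairing is with scikit-learn’s soft-voting ensemble, which averages the two models’ predicted probabilities; the settings below are illustrative, not tuned:

from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pair a small neural network with a shallow decision tree.
pair = VotingClassifier(
    estimators=[("net", MLPClassifier(max_iter=1000, random_state=0)),
                ("tree", DecisionTreeClassifier(max_depth=5, random_state=0))],
    voting="soft",  # average the predicted class probabilities
)
pair.fit(X_train, y_train)
print(pair.score(X_test, y_test))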
In addition to the bootstrapping and bagging that we discussed earlier,
there are a few other ways of doing ensemble modeling. Data scientists use
what’s called a bucket of models: they train several different types of
models, evaluate each on held-out test data, and then choose the one that
did the best.
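A bucket of models might look like the following sketch, where each candidate is scored on a held-out split and the best one is kept; the three candidates are assumed for illustration:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

bucket = {
    "logistic": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(random_state=0),
    "knn": KNeighborsClassifier(),
}

# Fit every model, score it on the held-out split, and keep the winner.
scores = {name: m.fit(X_train, y_train).score(X_val, y_val)
          for name, m in bucket.items()}
best_model = bucket[max(scores, key=scores.get)]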
Another idea is called stacking. Stacking trains several different types of
models and then feeds all of their results into a final model, giving us a
prediction that is a combination of all of them.
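scikit-learn provides this directly through StackingClassifier, where the base models’ outputs become the inputs to a final ‘meta’ model; this is a minimal, illustrative sketch:

from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The tree's and k-NN's predictions are combined by a logistic regression.
stack = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier(random_state=0)),
                ("knn", KNeighborsClassifier())],
    final_estimator=LogisticRegression(max_iter=1000),
)
stack.fit(X_train, y_train)
print(stack.score(X_test, y_test))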
Data scientists like to
use ensemble modeling because, with a variety of
models, we can usually produce better predictions than with a single model
alone.
The drawback to ensemble modeling is that we lose some interpretability.
Having multiple models working together at once makes the results more
difficult to explain, especially when you want to share them with
stakeholders who don’t have data science knowledge.
We can also use several versions of the same model, much as a random
forest improves the forecast with multiple versions of a single tree. For
example, we might build neural networks with different sets of nodes, or
run a clustering algorithm with different values for k, the number of
clusters, to see how that changes the outcome of our prediction. This can
tell us whether there is an optimal value for k, or whether there are
groups or subgroups that we may have overlooked.
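For the clustering case, a sketch might run k-means over a range of k values and compare a quality measure such as the silhouette score; the dataset and range here are illustrative:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# Try several values for k and see which clustering scores best.
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, round(silhouette_score(X, labels), 3))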
Ensembling doesn’t do as much when we already have a strong model. But if
we combine a few models that have weaker forecasting abilities, it usually
improves the overall accuracy.
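As a rough demonstration of that last point, the sketch below bags one hundred shallow ‘weak’ trees, using the bagging idea from earlier, and compares the result with a single shallow tree; the exact gain will vary with the data:

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# A noisy dataset, so a single shallow tree stays deliberately weak.
X, y = make_classification(n_samples=1000, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

weak = DecisionTreeClassifier(max_depth=3, random_state=0)
bagged = BaggingClassifier(DecisionTreeClassifier(max_depth=3),
                           n_estimators=100, random_state=0)

print("single weak tree:", weak.fit(X_train, y_train).score(X_test, y_test))
print("bagged weak trees:", bagged.fit(X_train, y_train).score(X_test, y_test))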