Random Forest

  1. Random forest is a supervised learning algorithm.

  2. The forest it builds is an ensemble of decision trees, usually trained with the bagging method. The idea behind bagging is that combining several learning models improves the overall result. Each tree is trained on a random bootstrap sample of the data, drawn with replacement (see the sketch after this list).

  3. Random forest builds multiple decision trees and combines their predictions to obtain a more accurate and stable result.

  4. The final prediction is made not by a single decision tree but by aggregating the predictions of “k” decision trees, typically through a majority vote.

  5. Random forest adds additional randomness to the model. When building individual trees, it introduces randomness not only in the sampling of the dataset but also in the selection of features: at any given split, only a subset of all the features is considered. This produces a wide diversity among the trees, which generally leads to a better model.

  6. Only a random subset of the features is taken into consideration by the algorithm when splitting a node.

  7. The main limitation of random forest is that a large number of trees can make the algorithm too slow and ineffective for real-time predictions.

  8. Random forest is a predictive modeling tool, not a descriptive tool, meaning if you are looking for a description of relationships in your data, other approaches would be better.
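
As a rough illustration of points 2–6, here is a minimal sketch of the bagging idea: each tree is fit on a bootstrap sample of the data, each split only considers a random subset of the features, and the forest predicts by majority vote. It assumes scikit-learn and NumPy are available, uses scikit-learn's DecisionTreeClassifier as the base learner, and takes the Iris dataset purely as an example; the class name SimpleRandomForest and all parameter values are illustrative choices, not part of any library.

```python
import numpy as np
from collections import Counter
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


class SimpleRandomForest:
    """Bagging sketch: bootstrap samples + random feature subsets + majority vote."""

    def __init__(self, n_trees=25, max_features="sqrt", random_state=0):
        self.n_trees = n_trees
        self.max_features = max_features
        self.rng = np.random.default_rng(random_state)
        self.trees = []

    def fit(self, X, y):
        n_samples = X.shape[0]
        self.trees = []
        for _ in range(self.n_trees):
            # Bootstrap sample: draw n rows with replacement (the bagging step).
            idx = self.rng.integers(0, n_samples, size=n_samples)
            # max_features="sqrt" makes each split consider only a random
            # subset of the features, the extra randomness described above.
            tree = DecisionTreeClassifier(
                max_features=self.max_features,
                random_state=int(self.rng.integers(1_000_000)),
            )
            tree.fit(X[idx], y[idx])
            self.trees.append(tree)
        return self

    def predict(self, X):
        # Majority vote across the "k" trees for every sample.
        votes = np.array([tree.predict(X) for tree in self.trees])
        return np.array([Counter(col).most_common(1)[0][0] for col in votes.T])


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
forest = SimpleRandomForest(n_trees=25).fit(X_train, y_train)
print("test accuracy:", (forest.predict(X_test) == y_test).mean())
```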

Advantages of Random forest:

  1. The chance of overfitting decreases, since the predictions of several different decision trees are combined, which reduces the variance of the model.

  2. Random forest can be used for both classification and regression problems. For classification tasks, the prediction is made by majority voting: each tree votes and the most popular class is returned. For regression tasks, the forest averages the predictions of the individual trees (see the example after this list).

  3. Training of the individual trees can be parallelized, since each tree is built independently of the others.

  4. Unlike linear models, random forests are able to capture non-linear relationships between the input features and the target variable.
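
A short sketch of how these advantages look in practice with scikit-learn (assuming it is installed; the datasets and parameter values are arbitrary illustrative choices): RandomForestClassifier takes a majority vote over its trees, RandomForestRegressor averages their predictions, n_jobs=-1 trains the trees in parallel, and the regression example uses a non-linear (sine) target that a single linear model could not fit.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split

# Classification: every tree votes and the most popular class is returned.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)  # n_jobs=-1: train trees in parallel
clf.fit(X_train, y_train)
print("classification accuracy:", clf.score(X_test, y_test))

# Regression: the forest averages the trees' predictions. The sine target is
# non-linear, which a single linear model could not capture.
rng = np.random.default_rng(0)
X_reg = rng.uniform(0, 6, size=(500, 1))
y_reg = np.sin(X_reg[:, 0]) + rng.normal(scale=0.1, size=500)
reg = RandomForestRegressor(n_estimators=100, n_jobs=-1, random_state=0)
reg.fit(X_reg, y_reg)
print("regression R^2 on training data:", reg.score(X_reg, y_reg))
```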