October 15, 2021

Decision Tree Models - Explained

Decision trees are one of the most powerful tools in data analysis. Using decision trees, we can classify and predict outcomes in various scenarios. This article discusses how decision trees can be used to generate insight from historical data and apply it to new problems.

What are decision tree models? 

A decision tree model is a simple method that can be used to classify objects according to their features. For example, you might have a decision tree that tells you whether an object is an apple based on the following attributes: color, size, and weight. A decision tree works by moving down from the root node, through decision nodes, until it reaches a leaf. Decision nodes have branches that take us to either leaf nodes or more decision nodes. Leaf nodes are terminal nodes that, as their name suggests, present a final decision. 

You might ask what decision trees have to do with machine learning, so let's break this down before going any further. Decision trees work by branching objects out into different groups. In other words, they encode rules for how objects can be classified according to their features.

Most decision trees are binary: each decision node splits into exactly two branches, left and right. For example, say there is an object with certain features and you want to use a decision tree to classify it as apple or not apple. The tree works by first asking whether the color of the object is red or green (the root decision node). 

If it is red, we follow one branch to another decision node, where we ask about the size. If the object is large, we conclude it must be an apple, and the path ends at a leaf node that simply says "apple."

This method might seem simple, but when you add enough attributes, such as weight, decision trees can become very effective. These trees are typically built by a greedy method called recursive partitioning: at each node, the algorithm chooses the attribute and threshold that best separate the classes.
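As a rough sketch, the apple example could look like this in scikit-learn. The feature values, thresholds, and labels below are invented purely for illustration:

```python
# A toy apple-or-not classifier; the data here is made up for illustration.
from sklearn.tree import DecisionTreeClassifier

# Features per object: [is_red (0/1), size_cm, weight_g]
X = [
    [1, 8, 150],  # red, large, heavy   -> apple
    [1, 7, 140],  # red, large          -> apple
    [0, 8, 160],  # green, large        -> apple
    [0, 3, 40],   # green, small        -> not apple
    [1, 2, 20],   # red, small          -> not apple
]
y = ["apple", "apple", "apple", "not apple", "not apple"]

tree = DecisionTreeClassifier(random_state=0)
tree.fit(X, y)

# Classifying a new object walks it from the root down to a leaf:
print(tree.predict([[1, 7, 145]])[0])  # a red, large, heavy object
```

Each internal node the fitted tree learns corresponds to one of the "is it red?" / "is it large?" questions from the text; the prediction is simply the label of the leaf the object lands in.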

In plain English, every decision tree works by moving down from the root node, through decision nodes, until it reaches a leaf that gives the answer.

What are decision trees used for? 

Decision trees have many applications in machine learning, particularly in predictive modeling, with several advantages over other methods such as regression or neural networks.

Decision trees model complex relationships between variables by finding patterns in the training set and applying them to new data. They can be used to predict the probability of events and find optimal decision-making strategies for decision-makers. 

A decision tree can also be used predictively, finding patterns in training data and using those patterns to predict unobserved cases. An example might be a decision tree model that predicts whether a student would vote Democrat or Republican based on political views, education, and income.

What is random decision tree modeling? 

The random decision tree model uses decision trees as its basis, classifying complex objects by recursively breaking them down into smaller groups based on their features. As before, the tree moves down from the root node through decision nodes (also known as splitting nodes). Each decision node represents a point at which a test is made, with branches that lead to either leaf nodes or further decision nodes.

The danger is that the tree splits the training data too finely, resulting in "overfitting" of the model, where it predicts the training data well but generalizes poorly to test data. When you use decision trees for classification with scikit-learn, randomized construction methods can be used to reduce the chance of overfitting.

Random decision trees work like ordinary decision trees, but randomize which features are considered when splitting objects up into different groups. This process creates multiple decision tree models with slightly different trees, and their predictions are then averaged in order to reduce the chance of overfitting.
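One way to sketch a single randomized tree in scikit-learn is the `max_features` option, which restricts each split to a random subset of the features. The synthetic dataset below is just for illustration:

```python
# A randomized tree: each split considers only a random subset of features.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic 10-feature classification data, for illustration only.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# max_features="sqrt": only sqrt(10) ~ 3 randomly chosen features are
# evaluated at every split, instead of all 10.
tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
tree.fit(X, y)
print(tree.score(X, y))  # training accuracy of the randomized tree
```

Building many such trees with different random seeds and averaging their predictions is exactly the recipe the surrounding text describes.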

Another way to reduce decision tree overfitting is a boosting method called AdaBoost, which stands for "Adaptive Boosting." Rather than growing one large tree, AdaBoost fits a sequence of small trees (often single-split "stumps"), reweighting the training examples after each round so that the next tree concentrates on the examples the previous ones misclassified. The final prediction is a weighted vote of all the small trees, and the process stops after a fixed number of rounds or when accuracy stops improving.

In scikit-learn, this is available as AdaBoostClassifier in the ensemble module, which by default boosts depth-1 decision trees.
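A minimal sketch of AdaBoost in scikit-learn, again on synthetic data invented for illustration:

```python
# AdaBoost over shallow trees; by default AdaBoostClassifier boosts
# depth-1 decision trees ("stumps").
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=300, random_state=0)

# Each boosting round fits a stump on reweighted data, so later rounds
# focus on the examples earlier rounds got wrong.
clf = AdaBoostClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy of the boosted ensemble
```

Each individual stump is a weak learner; the boosted combination is what achieves the accuracy.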

Random decision forests are another type of decision tree model that uses multiple randomized decision trees to reduce the chance of overfitting. Each tree in the forest is trained on a bootstrap sample of the data, with a random subset of features considered at each split, and the forest averages (or takes a majority vote of) the individual trees' predictions. A single randomized decision tree, by contrast, commits to one particular set of random splits, which is why the forest's averaging makes it considerably more stable.
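The forest idea in scikit-learn form, using a synthetic dataset for illustration:

```python
# A random forest: many randomized trees whose predictions are combined.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, random_state=0)

# Each of the 100 trees sees a bootstrap sample of the rows and a random
# subset of features at every split; the forest averages the trees'
# class probabilities to make its prediction.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)
print(forest.score(X, y))  # training accuracy of the forest
```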

Categorization of Decision Trees

Decision trees can be classified into two types: continuous response and binary/multiclass response. Continuous response decision trees (regression trees) are used for problems where the output is an interval or continuous value, such as house prices, incomes, or ages, whereas binary/multiclass response decision trees (classification trees) are used when the output falls into two or more discrete categories, such as yes versus no, good versus bad, etc. The process for building decision trees is similar across both types.

Where a full decision tree chains many decision nodes together, a decision stump is a decision tree with exactly one decision node: the root tests a single feature and leads directly to leaf nodes. Despite their simplicity, stumps are widely used as the weak learners inside boosting methods such as AdaBoost.
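A stump can be built in scikit-learn simply by capping the depth at one; the tiny one-feature dataset below is invented for illustration:

```python
# A decision stump is a tree limited to a single split (depth 1).
from sklearn.tree import DecisionTreeClassifier

# Two well-separated clusters on one feature, for illustration.
X = [[1], [2], [3], [10], [11], [12]]
y = [0, 0, 0, 1, 1, 1]

stump = DecisionTreeClassifier(max_depth=1)
stump.fit(X, y)

print(stump.get_depth())           # 1: one decision node, two leaves
print(stump.predict([[2], [11]]))  # [0 1]
```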

How does the Decision Tree Model work?

In a decision tree, input data points are routed downward from the root through a hierarchy of decision nodes, with each node's attribute test acting as a decision boundary. At each decision node, one or more branches lead out to the next region, and the output is generated by the leaf region into which the input data point ultimately falls, according to its attribute values.

However, it should be kept in mind that regression decision trees hold numeric predictions in their leaves rather than class labels. Each leaf simply predicts the mean of the response variable over the training points that fall into that leaf's region; in other words, if we drew another random sample from the same population, the leaf's prediction is our best estimate of the response for points landing in that region.
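To make the leaf-mean idea concrete, here is a minimal sketch with made-up one-dimensional data, using scikit-learn's DecisionTreeRegressor:

```python
# A regression tree predicts the mean response of the training points
# that fall into each leaf.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Two clusters of x values with matching y values, for illustration.
X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])

reg = DecisionTreeRegressor(max_depth=1)
reg.fit(X, y)

# One split separates the clusters; each leaf predicts its cluster mean.
print(reg.predict([[2.0]]))   # mean(1, 2, 3)    = 2.0
print(reg.predict([[11.0]]))  # mean(10, 11, 12) = 11.0
```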

Advantages of Tree-Based Algorithms

Tree-based algorithms provide the following advantages:

  • These algorithms produce easily understandable models due to their graphical representation.
  • They allow non-linear separability: decision boundaries are built up from many axis-aligned splits, so they need not be a single straight line.
  • They are not sensitive to the scale of decision attributes (they work well for both categorical and continuous decision attribute types).

Disadvantages of Tree-Based Algorithms

Tree-based algorithms suffer from the following disadvantages:

  • Because these models build classification trees by recursive partitioning, they may generate very complex decision boundary regions. This may lead to overfitting in decision tree models. 
  • To overcome this disadvantage, pruning techniques like reduced-error pruning or cost-complexity pruning are used. Limiting tree depth and minimum leaf size are other common forms of regularization.
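As a sketch of two of these controls in scikit-learn, a depth limit and minimal cost-complexity pruning (the `ccp_alpha` parameter), on synthetic data chosen for illustration:

```python
# Comparing an unconstrained tree with a depth-limited, pruned tree.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)

full = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(max_depth=3, ccp_alpha=0.01,
                                random_state=0).fit(X, y)

# The pruned tree is shallower, trading training fit for generalization.
print(full.get_depth(), pruned.get_depth())
```

In practice the pruning strength would be chosen by cross-validation rather than fixed in advance.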

Wrapping Up

In this article, we have seen that the decision tree is a decision support tool that uses a greedy, recursive search over decision node attributes to produce decision tree models.

In its simplest form, it can be imagined as putting all of one's available choices on the edges of a diagram and then drawing lines between them in such a way that all "Yes" answers go down one side of the line while all "No" responses land on the other. 

At each decision node, one or more branches lead out, depending on the data. It may not be immediately obvious how this decision process works, but decision trees can be used to classify new data points based on their attribute values.


Written by

Harsh Gupta
