3️⃣ Decision Trees

A decision tree is a tree-based model that uses a set of decision rules to partition the input space and make predictions. The decision tree algorithm builds a tree from the training data, recursively partitioning the input space into smaller regions, called leaves, by making a series of binary decisions.

Each internal node of the tree represents a test on an input feature, each branch represents the outcome of the test, and each leaf node represents a class label or a prediction. The goal is to partition the input space into regions that are as pure as possible with respect to the target variable. This process is recursive and continues until a stopping criterion is met, such as a maximum tree depth or a minimum number of samples in a leaf.

Mathematically, the decision tree algorithm builds a tree by repeatedly selecting the feature that optimizes a certain criterion, such as information gain or Gini impurity. Information gain measures the decrease in entropy from the parent node to the child nodes, and Gini impurity measures the probability of misclassification at a given node. The feature with the highest information gain or lowest Gini impurity is chosen as the splitting feature. The process is then recursively applied to each child node until a stopping criterion is met.

For example, in a binary classification problem, a decision tree algorithm recursively partitions the input space by selecting a feature and a threshold and then splitting the data into two subsets: one subset where the feature is greater than the threshold, and another subset where the feature is less than or equal to the threshold. The process is then recursively applied to each subset until the stopping criterion is met.
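To make the splitting step concrete, the following minimal NumPy sketch (using a made-up one-feature dataset and an arbitrarily chosen threshold, purely for illustration) shows how one candidate split separates the data into two subsets:

import numpy as np

# Hypothetical toy data: one feature column and binary class labels
X = np.array([[2.0], [3.5], [1.0], [4.2], [0.5], [3.8]])
y = np.array([0, 1, 0, 1, 0, 1])

feature, threshold = 0, 2.5  # candidate split, chosen arbitrarily for illustration

# Left subset: feature value <= threshold; right subset: feature value > threshold
left_mask = X[:, feature] <= threshold
right_mask = ~left_mask

print("Left labels: ", y[left_mask])   # [0 0 0]
print("Right labels:", y[right_mask])  # [1 1 1]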

In a nutshell, the decision tree is a powerful method for classification and regression problems, and it is easy to understand, interpret, and visualize. It is based on a simple and intuitive concept that recursively partitions the input space into smaller regions, making it easy to understand the relationship between the input features and the target variable.

A decision tree algorithm builds a tree by repeatedly selecting the feature that optimizes a certain criterion, such as information gain or Gini impurity. Information gain is defined as the reduction in entropy from the parent node to its child nodes, and it can be calculated using the following equation:

Information Gain = Entropy(parent) - Sum(Entropy(child) * Proportion(child))

Where:

  • Entropy(parent) is the entropy of the parent node, which is a measure of the impurity of the data at that node. It can be calculated using the following equation:

Entropy(parent) = - ∑i P(i) log2 P(i)

Where P(i) is the proportion of the data that belongs to class i.

  • Entropy(child) is the entropy of a child node, which is a measure of the impurity of the data at that node. It can be calculated in the same way as the entropy of the parent node.

  • Proportion(child) is the proportion of the data that belongs to the child node.
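To make these definitions concrete, the following small NumPy sketch (with a made-up parent/child split, not part of any particular library API) computes the entropy of a node and the information gain of a split:

import numpy as np

def entropy(labels):
    # Entropy = - sum_i P(i) * log2(P(i)), over the classes present at the node
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, children):
    # Entropy(parent) minus the proportion-weighted entropies of the child nodes
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Made-up example: a split that separates the two classes fairly well
parent = np.array([0, 0, 0, 1, 1, 1])
left, right = np.array([0, 0, 0, 1]), np.array([1, 1])

print(information_gain(parent, [left, right]))  # roughly 0.46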

Another criterion used to select the splitting feature is Gini impurity. It is defined as the probability of misclassifying a randomly chosen element from the set, and it can be calculated using the following equation:

Gini impurity = 1 - ∑i P(i)^2

Where P(i) is the proportion of the data that belongs to class i.

For a binary classification problem, this measure ranges between 0 and 0.5 (more generally, between 0 and 1 - 1/k for k classes). A lower value of Gini impurity indicates that the data in that node is more pure and less likely to be misclassified.
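In the same spirit as the sketch above, Gini impurity can be computed directly from the class proportions; the labels below are made up for illustration:

import numpy as np

def gini_impurity(labels):
    # Gini impurity = 1 - sum_i P(i)^2, over the classes present at the node
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini_impurity([0, 0, 0, 0]))  # 0.0   -> perfectly pure node
print(gini_impurity([0, 0, 1, 1]))  # 0.5   -> maximally impure (binary case)
print(gini_impurity([0, 0, 0, 1]))  # 0.375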

It is also worth mentioning that the decision tree algorithm uses stopping criteria to decide when to stop the recursive partitioning process. These criteria can be set by the user, for example a maximum tree depth, a minimum number of samples in a leaf, or a minimum impurity decrease.
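In scikit-learn, for example, these stopping criteria correspond to constructor parameters of DecisionTreeClassifier; the values below are illustrative choices, not recommended defaults:

from sklearn.tree import DecisionTreeClassifier

# Illustrative stopping criteria (values chosen only for demonstration)
clf = DecisionTreeClassifier(
    max_depth=3,                 # maximum tree depth
    min_samples_leaf=5,          # minimum number of samples required in a leaf
    min_impurity_decrease=0.01,  # a split must reduce impurity by at least this amount
    criterion="gini",            # or "entropy" to use information gain
)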

Analogy:

A decision tree can be thought of as a flowchart that helps to make a decision by asking a series of questions. Each question corresponds to a test on an input feature, each answer corresponds to the outcome of the test, and the final decision corresponds to the leaf node.

Imagine you are trying to decide what to eat for dinner. You start by asking yourself, "Do I want something light or something heavy?". Depending on your answer, you might ask yourself another question like "Do I want something sweet or something savory?". You keep asking questions and following the answer until you reach the final decision of what to eat.

This process is similar to how a decision tree algorithm works. The algorithm starts with the root node, which represents the entire population or sample. It splits the population into two or more subsets based on the most significant differentiator/feature, which is like asking the question that maximizes the information gain or minimizes the Gini impurity. This process is continued recursively for each child node until it reaches the final decision.

In this analogy, the questions are the features, the answers are the decision rules, and the final decision is the leaf node. As noted above, this simple and intuitive structure is what makes the decision tree a powerful, easy-to-interpret tool for both classification and regression problems.

Example

In this example, the iris dataset is loaded and the data is split into training and test sets using the train_test_split function. Then, the DecisionTreeClassifier is imported and trained using the fit() method on the training data. Once the model is trained, it can be used to make predictions on new data using the predict() method. The code then prints the accuracy of the model.

It is important to note that this is a simple example. Real-world datasets are usually more complex, the features may be represented by many variables, and the target will not always be a class label but may instead be a continuous value, i.e. a regression problem; the idea behind the decision tree, however, remains the same, as the brief regression sketch below shows. Decision trees are a powerful method for classification and regression problems, and they are easy to understand, interpret, and visualize.
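For the regression case mentioned above, scikit-learn provides DecisionTreeRegressor, which follows the same fit/predict pattern as the classifier used in the iris example below; the sketch here uses the diabetes dataset purely as an illustration:

from sklearn import datasets
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split

# Load a regression dataset as an example
X, y = datasets.load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# Same idea as classification, but each leaf predicts a continuous value
reg = DecisionTreeRegressor(max_depth=4, random_state=1)
reg.fit(X_train, y_train)
print("R^2 on the test set:", reg.score(X_test, y_test))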

Python code

from sklearn import datasets
from sklearn import tree
from sklearn.model_selection import train_test_split

# load the iris dataset as an example
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# Train the decision tree classifier
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X_train, y_train)

# Make predictions on the test set
y_pred = clf.predict(X_test)

# Print the predictions
print("Predictions:", y_pred)

# Print the accuracy of the model
accuracy = (y_pred == y_test).sum() / len(y_test)
print("Accuracy:", accuracy)

Output

Predictions: [0 1 1 0 2 1 2 0 0 2 1 0 2 1 1 0 1 1 0 0 1 1 2 0 2 1 0 0 1 2 1 2 1 2 2 0 1
 0 1 2 2 0 1 2 1]
Accuracy: 0.9555555555555556
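Since decision trees are prized for their interpretability, one way to inspect the classifier trained in the example above is scikit-learn's export_text helper, which prints the learned decision rules as text (plot_tree would give a graphical version); this continues from the clf and iris variables defined above:

from sklearn.tree import export_text

# Print the decision rules learned by the classifier trained above
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)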

