2️⃣ Logistic Regression

Logistic regression is a supervised learning algorithm used for classification. It predicts a binary outcome (1 / 0, Yes / No, True / False) from a set of independent variables. The outcome is modeled using the logistic function, also called the sigmoid function, which maps any real-valued input to an output between 0 and 1.
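
To make the sigmoid concrete, here is a minimal NumPy sketch; the function name and the sample inputs are illustrative choices, not part of any library API:

import numpy as np

def sigmoid(z):
    # Map any real-valued input to the (0, 1) interval
    return 1 / (1 + np.exp(-z))

print(sigmoid(0))    # 0.5 -- the midpoint of the curve
print(sigmoid(4))    # ~0.982 -- large positive inputs approach 1
print(sigmoid(-4))   # ~0.018 -- large negative inputs approach 0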

The logistic regression model is based on the idea that there is a relationship between the input variables (x) and the log-odds of the output variable (y). The log-odds is the logarithm of the probability of the outcome being 1 (p) divided by the probability of the outcome being 0 (1-p). This relationship is represented by an equation of the form:

log(p / (1 - p)) = b0 + b1*x1 + b2*x2 + ... + bn*xn

where b0, b1, b2, ..., bn are the coefficients of the logistic equation and x1, x2, ..., xn are the input variables. The coefficients are estimated from the training data using a method called maximum likelihood estimation (MLE).
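
To make the equation concrete, here is a small hand-worked sketch; the coefficient values below are made up for illustration (in practice they would come from MLE on training data), and the code shows how the linear log-odds are converted back into a probability:

import numpy as np

# Illustrative coefficients (in practice these are estimated by MLE)
b0, b1, b2 = -1.0, 0.8, 0.5
x1, x2 = 2.0, 1.0

log_odds = b0 + b1 * x1 + b2 * x2           # log(p / (1 - p))
p = 1 / (1 + np.exp(-log_odds))             # invert with the sigmoid

print(f'log-odds: {log_odds}')              # 1.1
print(f'probability of class 1: {p:.3f}')   # ~0.750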

Once the coefficients are estimated, the logistic regression model can be used to make predictions on new data. For example, given a new input data point (x), the predicted probability of the outcome being 1 is obtained by computing the log-odds from the equation above and passing the result through the sigmoid function; if that probability exceeds a chosen threshold (commonly 0.5), the model predicts class 1, otherwise class 0.
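
A minimal sketch of this decision step follows; the probability value and the 0.5 threshold are illustrative, and the threshold can be tuned for the problem at hand:

# Turn an estimated probability into a class prediction
p = 0.73                    # predicted probability of class 1 for some input x
threshold = 0.5             # common default decision threshold

prediction = 1 if p >= threshold else 0
print(prediction)           # 1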

Analogy:

Imagine you have a basket of apples: some are good to eat and some are rotten. You want to use logistic regression to separate the good apples from the rotten ones.

The input variables, or features, could be the color, size, and shape of each apple. The output variable, or target, would be whether the apple is good or rotten.

A logistic regression model would take the input variables and use them to calculate the probability of the apple being good to eat. This probability is a value between 0 and 1; applying a cutoff to it acts like a filter that separates the good apples from the bad ones.

Just like the filter, the logistic regression model uses a set of rules to make a prediction. In this case, the rules are represented by the coefficients of the logistic equation, which are estimated from the training data.

So, for example, if the model finds that red apples are more likely to be good, the corresponding coefficient for color will be positive, and if it finds that small apples are more likely to be rotten, the corresponding coefficient for size will be negative.

This analogy conveys the core idea in a clear and simple way: logistic regression estimates the probability of an outcome from input variables and uses that probability to make predictions.

Example

The code below imports the necessary libraries and defines the input variable x as a 2-dimensional array of integers representing the features, and the output variable y as a 1-dimensional array of integers representing the target (0 for the negative class, 1 for the positive class).

Then a LogisticRegression model is created and fit to the data using the fit() method. The intercept and coefficients of the logistic equation are printed using the model's intercept_ and coef_ attributes.

Finally, the code makes predictions for new data points (x_new) using the predict() method and prints the predictions.

This is a simple example; in real-world applications the dataset will be more complex, the features may span many variables, and the target will not always be binary.

Python code

import numpy as np
from sklearn.linear_model import LogisticRegression

# Sample data
x = np.array([[1, 2], [2, 3], [3, 1], [4, 2], [5, 4]])
y = np.array([0, 0, 0, 1, 1])

# Create the logistic regression model
model = LogisticRegression()

# Fit the model to the data
model.fit(x, y)

# Print the coefficients
print(f'Intercept: {model.intercept_}')
print(f'Coefficients: {model.coef_}')

# Make predictions
x_new = np.array([[6, 2], [7, 4], [8, 1]])
y_pred = model.predict(x_new)
print(f'Predictions: {y_pred}')

Output

Intercept: [-4.29840804]
Coefficients: [[0.99998013 0.31431841]]
Predictions: [1 1 1]
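
Since logistic regression is fundamentally a probability model, it can also be useful to inspect the estimated probabilities rather than only the hard class labels. A short follow-up sketch using the model and x_new from the example above (predict_proba is a standard scikit-learn method; actual values will depend on the fitted coefficients):

# Inspect class probabilities instead of hard labels
probs = model.predict_proba(x_new)
print(probs)   # one row per sample: [P(class 0), P(class 1)]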

