1️⃣ Linear Regression

Linear regression is a supervised learning algorithm used to predict a continuous outcome variable (also called the dependent variable) from one or more predictor variables (also known as independent variables, features, or attributes). It models the relationship between these variables by fitting a linear equation to the observed data.

The basic idea behind linear regression is that there is a linear relationship between the input variables (x) and the output variable (y). This relationship is represented by an equation of the form:

y = b0 + b1*x1 + b2*x2 + ... + bn*xn

where b0, b1, b2, ..., bn are the coefficients of the linear equation and x1, x2, ..., xn are the input variables. The coefficients are estimated from the training data using a method called Ordinary Least Squares (OLS).
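
To make the OLS step concrete, here is a minimal NumPy sketch (not part of the original example; the data values are assumed purely for illustration) that estimates b0 and b1 for a single input variable by solving the least-squares problem directly:

import numpy as np

# Toy data: hours studied (x) and test scores (y), illustrative values only
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([3, 5, 7, 9, 11], dtype=float)

# Design matrix with a leading column of ones so the intercept b0 is estimated too
X = np.column_stack([np.ones_like(x), x])

# Ordinary Least Squares: find the coefficients that minimize the sum of squared residuals
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1 = coeffs
print(b0, b1)  # approximately 1.0 and 2.0 for this data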

Once the coefficients are estimated, the linear regression model can be used to make predictions on new data. For example, given a new input data point (x), the predicted output (y) can be computed by plugging the values of the input variables into the linear equation.
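
Continuing the sketch above with assumed coefficient values, making a prediction is just a matter of evaluating the equation:

# Assume b0 = 1.0 and b1 = 2.0 were estimated from the training data (illustrative values)
b0, b1 = 1.0, 2.0
x_new = 6  # a new data point, e.g. hours studied
y_pred = b0 + b1 * x_new
print(y_pred)  # 13.0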

Linear regression is simple, easy to understand, and highly interpretable. However, it is only appropriate when there is a linear relationship between the input and output variables. If the relationship is non-linear, a non-linear regression model has to be used.
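
One common way to handle a mildly non-linear relationship while still using linear regression is to expand the inputs into polynomial features. The sketch below uses scikit-learn's PolynomialFeatures for this, with made-up quadratic data purely for illustration:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Toy non-linear (quadratic) data, illustrative values only
x = np.array([[1], [2], [3], [4], [5]])
y = np.array([1, 4, 9, 16, 25])

# Expanding x into polynomial features lets the otherwise linear model capture the curve
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(x, y)
print(model.predict(np.array([[6]])))  # close to 36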

Analogy:

Imagine you have a scatter plot of points that represent the relationship between the number of hours studied and the test score. A linear regression algorithm will draw a straight line through these points that best fits the data. The line represents the linear equation that models the relationship between the hours studied and the test score.

The slope of the line represents the coefficient of the hours-studied variable in the linear equation, and the y-intercept represents the constant term. This line allows us to predict the test score for any given number of hours studied.

For example, if the line's equation is y = 3x + 10, it means that for every additional hour studied the predicted test score increases by 3 points, and the predicted score when no hours are studied is 10.
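
As a quick check of that equation, the prediction for any number of hours is plain arithmetic (values taken from the analogy above):

hours = 4
score = 3 * hours + 10  # y = 3x + 10 from the analogy
print(score)  # 22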

This analogy helps to convey the idea that linear regression is a method for finding the best-fitting line through a set of data points, and using that line to make predictions about new data points.

Example

The code below imports the necessary libraries and defines the input variable x as an array of integers representing the number of hours studied and the output variable y as an array of integers representing the test scores.

A LinearRegression model is then created and fit to the data using the fit() method. The coefficients of the linear equation (the intercept and the coefficient of the x variable) are printed using the model's intercept_ and coef_ attributes.

Finally, the code makes predictions for new data points (x_new) using the predict() method and prints the predictions.

Python Code

import numpy as np
from sklearn.linear_model import LinearRegression

# Sample data: x is a 2D array of shape (n_samples, n_features), y holds the target values
x = np.array([[1], [2], [3], [4], [5]])
y = np.array([3, 5, 7, 9, 11])

# Create the linear regression model
model = LinearRegression()

# Fit the model to the data
model.fit(x, y)

# Print the coefficients
print(f'Intercept: {model.intercept_}')
print(f'Coefficient: {model.coef_}')

# Make predictions
x_new = np.array([[6], [7], [8]])
y_pred = model.predict(x_new)
print(f'Predictions: {y_pred}')

Output:

Intercept: 1.0
Coefficient: [2.]
Predictions: [13. 15. 17.]
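
As an optional follow-up to the example above (not part of the original code), the quality of the fit can be checked with the model's score() method, which for LinearRegression returns the R² value; for this perfectly linear toy data it is 1.0:

# R^2 of the fitted model on the training data (optional check)
r_squared = model.score(x, y)
print(f'R-squared: {r_squared}')  # 1.0 for this perfectly linear data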

