5️⃣ Support Vector Machines

Support Vector Machine (SVM) is a supervised learning algorithm that can be used for classification or regression problems. The main idea behind SVM is to find the best boundary (or "hyperplane") that separates the different classes in the data. This boundary is chosen so that it maximizes the margin, the distance between the boundary and the closest points from each class; those closest points are known as support vectors.
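As a small sketch of the margin idea (using scikit-learn and synthetic blob data, which are not part of the example further below), the margin width of a fitted linear SVM can be computed directly from the learned weight vector w as 2 / ||w||:

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_blobs

# Two well-separated clusters, so a linear boundary exists
X, y = make_blobs(n_samples=60, centers=2, random_state=0, cluster_std=0.6)

clf = svm.SVC(kernel='linear', C=1)
clf.fit(X, y)

# For a linear SVM, the margin (distance between the two
# supporting hyperplanes) is 2 / ||w||
w = clf.coef_[0]
margin = 2 / np.linalg.norm(w)
print("Margin width:", margin)

# The support vectors are the points closest to the boundary
print("Support vectors per class:", clf.n_support_)
```

Only the support vectors determine the boundary; moving any other point (without crossing the margin) leaves the fitted model unchanged.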

SVMs are particularly useful when the data has many features and when the classes are not linearly separable, meaning that no straight line (or hyperplane in higher dimensions) can properly separate the classes. In such cases, SVMs can use a technique called the "kernel trick" to implicitly map the data into a higher-dimensional space where a linear boundary can be applied.
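A minimal sketch of the kernel trick in action (this dataset and comparison are assumptions for illustration, not part of the example below): scikit-learn's make_circles produces two concentric rings that no straight line can separate, and an RBF kernel handles them where a linear kernel cannot.

```python
from sklearn import svm
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split

# Concentric circles: no straight line separates the two classes
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A linear kernel struggles on this data
linear_clf = svm.SVC(kernel='linear').fit(X_train, y_train)

# An RBF kernel implicitly maps the points into a higher-dimensional
# space where the two rings become linearly separable
rbf_clf = svm.SVC(kernel='rbf').fit(X_train, y_train)

print("Linear kernel accuracy:", linear_clf.score(X_test, y_test))
print("RBF kernel accuracy:", rbf_clf.score(X_test, y_test))
```

The RBF model scores near-perfectly on this data, while the linear model hovers around chance level.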

Example

In the example below, we first generate some synthetic data using the make_classification function from the sklearn.datasets module. Next, we create an instance of SVC (support vector classifier) with a linear kernel and a regularization parameter of 1. We then fit the model to the data using the fit method and make predictions on new data using the predict method.

There are different kernels that can be used with SVMs, such as linear, polynomial, and radial basis function (RBF). The choice of kernel depends on the nature of the data and the problem.
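As a quick sketch of how these kernels can be compared in practice (the cross-validation setup here is an assumption, not part of the original example), each is just a different value of SVC's kernel parameter:

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

# Same synthetic data as in the example below
X, y = make_classification(n_features=4, n_informative=2,
                           n_redundant=0, random_state=0)

# Compare the common built-in kernels via 5-fold cross-validation
for kernel in ['linear', 'poly', 'rbf']:
    clf = svm.SVC(kernel=kernel, C=1)
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{kernel}: mean accuracy = {scores.mean():.3f}")
```

Cross-validated accuracy like this is a reasonable basis for choosing a kernel, since training accuracy alone would favor the most flexible one.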

SVM is a powerful algorithm that is widely used in industry, as it can handle high-dimensional data and can model non-linearity via the kernel trick. However, it's worth noting that an SVM model can overfit; this can be mitigated by tuning its hyperparameters (such as C and the kernel parameters) and by using techniques like cross-validation and regularization.
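The tuning step above can be sketched with scikit-learn's GridSearchCV (the parameter grid chosen here is an illustrative assumption): it searches over C and the RBF kernel's gamma, scoring each combination by cross-validation so that overfit settings are penalized.

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_features=4, n_informative=2,
                           n_redundant=0, random_state=0)

# Grid-search C (regularization strength) and gamma (RBF width)
# with 5-fold cross-validation to guard against overfitting
param_grid = {'C': [0.1, 1, 10], 'gamma': ['scale', 0.1, 1]}
search = GridSearchCV(svm.SVC(kernel='rbf'), param_grid, cv=5)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best cross-validated accuracy:", search.best_score_)
```

Larger C and larger gamma both make the model more flexible, so the grid search is effectively trading flexibility against cross-validated generalization.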

Python code


from sklearn import svm
from sklearn.datasets import make_classification

# Generate some synthetic data
X, y = make_classification(n_features=4, n_informative=2, n_redundant=0, random_state=0)

# Create the model
clf = svm.SVC(kernel='linear', C=1)

# Train the model
clf.fit(X, y)


# Generate new data for predictions
X_new, _ = make_classification(n_features=4, n_informative=2,
                           n_redundant=0, random_state=1)

# Predict on new data
y_pred = clf.predict(X_new)
print(y_pred)

Output

[1 0 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 0 0 1 1 1 1 1 1 0 1 0 0 1 1 0 1 0 1 0
 0 0 1 0 1 1 1 0 0 1 1 1 0 1 1 1 1 1 0 0 1 1 1 0 0 1 1 1 0 1 1 1 1 1 1 0 1
 0 1 0 1 0 0 1 0 0 0 0 1 1 1 1 0 0 0 0 1 0 0 1 0 1 1]

References

https://en.wikipedia.org/wiki/Support_vector_machine
