The demand for AI and Machine Learning (ML) professionals is high, and more and more people are looking to build a career in the field. While most beginners start with basic classification and regression algorithms, learning about SVM can help set a concrete foundation for your ML career. SVM stands for Support Vector Machine. If you want to pursue a career in Machine Learning, support vector machines must be a part of your arsenal. Let’s delve deeper and answer the question: what is the SVM algorithm?

  1. What are support vector machines?
  2. SVM use case example
  3. How does a support vector machine work?
  4. What is a linear SVM?
  5. What is a non-linear SVM?
  6. What are the different types of kernels?
  7. What are the other tuning parameters used with a kernel in support vector machines?
  8. Implementing the support vector machine algorithm in Python
  9. What are the applications of SVM algorithms?

1. What are support vector machines?

The SVM algorithm is a statistics-based Supervised Learning algorithm. You can use support vector machines for both regression and classification, although they are primarily used for classification.

The primary goal of support vector machines is to create a decision boundary (hyperplane) that separates an n-dimensional space (where n is the number of features) into classes. SVM can then use this boundary to classify new data points correctly.

SVM chooses some extreme data points to create the decision boundary. These extreme data points are termed support vectors, hence the name support vector machines. Let’s understand this concept better with an example.

2. SVM use case example

There are several similar-looking animal pairs, such as jaguar and leopard, seal and sea lion, and even cat and dog. Now, suppose you find a strange cat that has some dog-like features and want to classify it accurately; the SVM algorithm can help. You first need to train the algorithm on a large number of cat and dog images. From this data, the algorithm learns the distinguishing and shared features of dogs and cats.

Now, when you feed in the strange cat’s image, the algorithm will select some support vectors to create a hyperplane. With the hyperplane’s help, the support vector machine will classify the image as a cat and not a dog.

3. How does a support vector machine work?

Before getting into the SVM’s working mechanism, it is essential to understand some of the key terminologies and metrics it uses.

  • Hyperplane: Hyperplanes are decision boundaries that segregate data points into different classes. A hyperplane divides the space into sides, and a newly added data point is assigned to the class associated with the side on which it falls. Depending on the number of features, the hyperplane can have multiple dimensions.
  • Support Vectors: These are the extreme data points closest to the hyperplane; they directly determine its position.
  • Margin: The margin is the gap between the support vectors and the hyperplane. There can be many possible hyperplanes, but the SVM algorithm’s goal is to select the one with the maximum margin, because a larger margin generally yields better generalization on unseen data.

Now that you know the key terminologies, it’s time to delve into the SVM algorithm’s workflow.

The first thing the SVM algorithm does is read the training data to learn the different features or attributes of each class. It then uses the support vectors to create a hyperplane with the maximal margin.

Whenever a new data point is added, the SVM determines which side of the hyperplane the data falls on. It then assigns the data point to the class associated with that side.
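To make this workflow concrete, here is a minimal sketch using scikit-learn (the same library used in the implementation section below); the toy data points and labels are invented purely for illustration:

# Minimal sketch of the SVM workflow on invented 2-D toy data.
from sklearn import svm

X_train = [[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 8]]  # training points
y_train = [0, 0, 0, 1, 1, 1]                                # class labels

clf = svm.SVC(kernel='linear')
clf.fit(X_train, y_train)        # fits a maximal-margin hyperplane

print(clf.support_vectors_)      # the extreme points that fix the hyperplane
print(clf.predict([[4, 4]]))     # the side of the hyperplane decides the class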

  • Types of support vector machines

Depending on the dataset, there are two different support vector machine algorithms: linear and non-linear.

4. What is a linear SVM?

As the name suggests, a linear SVM is used on a linearly separable dataset, i.e., one whose classes can be divided by a straight line (or a flat hyperplane in higher dimensions). It uses a linear SVM classifier to separate the classes of a dataset. Take an example of a dataset with only a single feature: a person’s weight. Here, the support vector machine algorithm can easily create a hyperplane (in this one-dimensional case, a simple threshold) with the help of support vectors. Whenever new data is added, the SVM detects which side of the hyperplane it falls on to classify the person as obese or not obese.

Image source: commons.wikimedia.org
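As a minimal sketch of the weight example (the weight values, labels, and the 85 kg query are invented for illustration; this again assumes scikit-learn):

# Sketch: a linear SVM on a single feature (body weight in kg).
import numpy as np
from sklearn import svm

weights = np.array([[55], [60], [68], [72], [95], [102], [110], [120]])
obese = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # 1 = obese, 0 = not obese (invented labels)

clf = svm.SVC(kernel='linear')
clf.fit(weights, obese)
print(clf.predict([[85]]))  # which side of the learned threshold does 85 kg fall on?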

5. What is a non-linear SVM?

A dataset whose classes cannot be separated by a straight line requires a non-linear SVM for classification and regression. It uses a non-linear SVM classifier to separate the classes of the dataset. Because the classes are intertwined, you cannot split such a dataset linearly. Hence, SVM adds another dimension using a kernel to separate the classes. For example, you can define a new dimension z and calculate it as z = x² + y². This transforms the dataset into a linearly separable one, and the SVM can then create a hyperplane using the support vectors. Once mapped back to the original space, this hyperplane corresponds to a non-linear decision boundary.

Image source: commons.wikimedia.org
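Here is a hedged sketch of the z = x² + y² trick on invented ring-shaped toy data; adding z as a third feature makes the two rings linearly separable:

# Sketch: the z = x^2 + y^2 transform on invented circular toy data.
import numpy as np
from sklearn import svm

rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, 40)
inner = np.c_[np.cos(angles[:20]), np.sin(angles[:20])]      # ring of radius 1
outer = 3 * np.c_[np.cos(angles[20:]), np.sin(angles[20:])]  # ring of radius 3
X = np.vstack([inner, outer])
y = np.array([0] * 20 + [1] * 20)

z = (X ** 2).sum(axis=1)  # the new dimension: z = x^2 + y^2
X3 = np.c_[X, z]          # in (x, y, z) the two rings separate linearly

clf = svm.SVC(kernel='linear').fit(X3, y)
print(clf.score(X3, y))   # should print 1.0: z alone separates the rings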

  • What is a kernel in support vector machines?

A kernel is a function that implicitly maps low-dimensional inputs into a higher-dimensional space, enhancing the accuracy of the outputs. Kernels help support vector machines create a hyperplane for non-linear datasets.

6. What are the different types of kernels?

Depending on the approach to segregate data, there are three popular types of kernels, which are:

Linear kernel

Syntax: K(x, y) = sum(x*y)

x and y are vectors

It is used for linearly separable datasets with a huge number of features, for instance, text classification. Linear kernel training is the fastest among all kernels.

Polynomial kernel

Syntax: K(x, y) = (x^T * y + c)^d

x and y are vectors, c is a constant, and d is the order (degree) of the kernel. The developer sets the order manually. It is similar to the linear kernel but can also be used on non-linear datasets.

Radial basis function kernel

Syntax: K(x, y) = exp(-gamma * sum((x - y)^2))

x and y are vectors, and gamma is a tuning parameter (commonly set between 0 and 1). Gamma is supposed to be pre-defined by the developer.

Also referred to as the Gaussian kernel, it is the kernel most widely used in SVM classification algorithms. It can implicitly map a dataset into infinite dimensions and make it linearly separable. The radial basis function kernel usually provides the best accuracy but consumes the most training time.

You need to choose the kernel that best suits your needs. For instance, although the radial basis function kernel gives the best accuracy, you should select the linear or polynomial kernel if you don’t have enough time and resources to train the model.
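To make this trade-off concrete, here is a minimal sketch that trains one classifier per kernel on scikit-learn’s iris dataset (used again in the implementation section below) and compares test accuracy; the split ratio and random seed are arbitrary choices:

# Sketch: comparing the three kernel types on the iris dataset.
from sklearn import svm, datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

for kernel in ('linear', 'poly', 'rbf'):
    clf = svm.SVC(kernel=kernel, degree=3, gamma='scale')  # degree matters only for 'poly'
    clf.fit(X_train, y_train)
    print(kernel, clf.score(X_test, y_test))  # accuracy for each kernel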

7. What are the other tuning parameters used with a kernel in support vector machines?

There are two other tuning parameters used with the kernel in an SVM algorithm: C regularization and gamma.

  • C regularization: This parameter controls how much misclassification is tolerated in the training set. A higher value penalizes misclassification more strongly, resulting in a smaller-margin hyperplane; a smaller value tolerates more misclassification and produces a larger-margin hyperplane.
  • Gamma: Higher values make the SVM consider only the data points closest to the hyperplane when fitting the decision boundary, while lower values also take far-away points into account.
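Here is a minimal sketch of how these two parameters are passed to an RBF-kernel classifier, again on the iris dataset; the specific C and gamma values in the grid are arbitrary illustrations:

# Sketch: varying C and gamma for an RBF-kernel SVM.
from sklearn import svm, datasets
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()
for C in (0.1, 1.0, 100.0):
    for gamma in (0.01, 1.0):
        clf = svm.SVC(kernel='rbf', C=C, gamma=gamma)
        scores = cross_val_score(clf, iris.data, iris.target, cv=5)
        print(f"C={C}, gamma={gamma}: mean accuracy={scores.mean():.3f}")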

8. Implementing the support vector machine algorithm in Python

The easiest way to implement the support vector machine algorithm in Python is through the Scikit-Learn library. It is one of the most popular libraries for running ML algorithms, and the support vector machine algorithm is one of them. It also provides the iris dataset that we will use here.

The first thing to do is to import the necessary libraries, such as Pandas and NumPy, and the iris dataset.

import pandas as pd
import numpy as np
from sklearn import svm, datasets
import matplotlib.pyplot as plt

iris = datasets.load_iris()

The next thing is to take the first two features, like this:

X = iris.data[:, :2]
y = iris.target

After taking the features, plot the SVM boundaries.

x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
h = (x_max - x_min) / 100  # step size for the mesh grid
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
X_plot = np.c_[xx.ravel(), yy.ravel()]

The next thing is to set the regularization value C, which can be any positive number (we are assigning it 1.0).

C = 1.0

Now, it’s time to create an SVM classifier, or in other words, train the support vector machines algorithm to create the hyperplane.

svc_classifier = svm.SVC(kernel='linear', C=C).fit(X, y)

You can also replace the kernel attribute with any other kernel type to train the algorithm accordingly. After creating the SVM classifier, you can visualize it with the following code:

Z = svc_classifier.predict(X_plot)
Z = Z.reshape(xx.shape)

plt.figure(figsize=(15, 5))
plt.subplot(121)
plt.contourf(xx, yy, Z, cmap=plt.cm.tab10, alpha=0.3)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Set1)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.xlim(xx.min(), xx.max())
plt.title('Support Vector Classifier with linear kernel')
plt.show()

If you don’t want to use the Scikit-Learn library, you can train support vector machines by following the standard steps, which are:

  • Import the libraries and dataset: The first step is to import all the necessary libraries and the dataset you want to train the algorithm on.
  • Analyze the data: You can analyze your data through multiple methods, including checking its dimensions, examining summary statistics, and dividing the dataset into response and explanatory variables.
  • Preprocess the data: Check for incomplete or irrelevant data and exclude it. Next, divide the data into its different attributes.
  • Train the algorithm and execute: The last step is to train the algorithm and visualize the resulting classifier (a minimal from-scratch sketch follows this list).
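Since these steps avoid Scikit-Learn, here is a hedged from-scratch sketch of the training step: a linear SVM fitted by sub-gradient descent on the hinge loss with L2 regularization (the learning rate, regularization strength, epoch count, and toy data are all illustrative choices, not prescriptions):

# Sketch: a from-scratch linear SVM (hinge loss + L2, sub-gradient descent).
import numpy as np

def train_linear_svm(X, y, lr=0.001, lam=0.01, epochs=1000):
    # y must contain labels in {-1, +1}
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) >= 1:
                w -= lr * (2 * lam * w)            # correct side, outside the margin
            else:
                w -= lr * (2 * lam * w - yi * xi)  # inside the margin or misclassified
                b += lr * yi
    return w, b

# Invented, linearly separable toy data.
X = np.array([[1.0, 2.0], [2.0, 3.0], [6.0, 5.0], [7.0, 8.0]])
y = np.array([-1, -1, 1, 1])
w, b = train_linear_svm(X, y)
print(np.sign(X @ w + b))  # should reproduce the training labels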

9. What are the applications of SVM algorithms?

SVM provides several benefits: it is effective even on non-linear datasets, gives accurate results in both low- and high-dimensional spaces, and is relatively resistant to overfitting (only the support vectors influence the hyperplane). However, a primary question remains: where should you use it? What are the applications of support vector machines? You will find numerous SVM applications across classification and regression problems. Here are some of the most common use cases (a small text-classification sketch follows the list):

  • Text classification
  • Character recognition
  • Image classification
  • Satellite data classification (Synthetic-Aperture Radar)
  • Biological substances classification (Proteins)
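As a hedged sketch of the first use case, text classification, here is a minimal pipeline using scikit-learn’s TfidfVectorizer and LinearSVC; the tiny corpus and its labels are invented purely for illustration:

# Sketch: SVM text classification with TF-IDF features.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

texts = ["great match and a fantastic goal",
         "stock markets fell sharply today",
         "the striker scored twice",
         "investors worry about inflation"]
labels = ["sports", "finance", "sports", "finance"]

vec = TfidfVectorizer()
X = vec.fit_transform(texts)  # sparse TF-IDF feature matrix

clf = LinearSVC()             # the linear kernel suits many-feature text data
clf.fit(X, labels)
print(clf.predict(vec.transform(["the goalkeeper made a save"])))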

What’s next?

This guide describes the basics of support vector machines that everyone in the AI field should know. However, if you want to build a great career and reach new heights in the area, it is crucial to go beyond the basics. While you now know what an SVM is and how it works, several other questions remain, such as how the SVM chooses the best hyperplane and which other Python libraries work with SVM algorithms.

Eager to learn more about Logistic Regression and other Machine Learning tools, Deep Learning, and Artificial Intelligence? Check out our Postgraduate Certificate Program In Artificial Intelligence & Deep Learning. Designed in collaboration with Manipal Academy of Higher Education (MAHE), it is the only program in the field of Artificial Intelligence, Machine Learning (ML), and Deep Learning (DL) that provides access to Google Colab’s GPU-based cloud laboratories. This six-month postgraduate training program provides 100+ hours of intensive practical experience.
