Introduction

Sigmoid functions are widely used in neural networks and deep learning algorithms as activation functions. For example, they have been used to model the activation of biological neurons.

They are also used in machine learning applications where a real number must be mapped to the probability of an event, for example the probability that a tumour has spread given its size. In deep learning networks, they serve as activation functions between layers. They also form part of logistic regression models, which relate a real-valued input to the probability of a binary outcome expressed through the logistic function. For example: will a customer buy this product? So, let's study sigmoid functions!

  1. Sigmoid Function Formula
  2. Calculating the Sigmoid Function
  3. Sigmoid Function vs ReLU
  4. Applications of Sigmoid Function
  5. History

1. Sigmoid Function Formula

A sigmoid function is any mathematical function whose graph is an S-shaped, or sigmoid ("sigma-shaped"), curve. Common examples are the logistic function, the hyperbolic tangent, and the arctangent. In machine learning, the term usually refers to the logistic function, σ(x) = 1 / (1 + e^(-x)), which is also the function at the core of logistic regression.
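The logistic formula translates directly into code. The sketch below is a minimal standard-library Python version; the function name `sigmoid` is our own choice, not taken from any particular library. It branches on the sign of x so that `math.exp` is only ever called on a non-positive argument, which avoids overflow for large negative inputs.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-x)), evaluated without overflow."""
    if x >= 0:
        # For x >= 0, e^(-x) <= 1, so the direct formula is safe.
        return 1.0 / (1.0 + math.exp(-x))
    # For x < 0, use the equivalent form e^x / (1 + e^x),
    # so math.exp never receives a large positive argument.
    z = math.exp(x)
    return z / (1.0 + z)

for x in (-1000.0, -2.0, 0.0, 2.0, 1000.0):
    print(f"sigmoid({x}) = {sigmoid(x):.6f}")
```

A naive `1 / (1 + math.exp(-x))` raises `OverflowError` at x = -1000, because `math.exp(1000)` exceeds the range of a float; the branching version returns 0.0 instead.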

2. Calculating the Sigmoid Function

Sigmoid functions differ mainly in how quickly they approach their horizontal asymptotes: the logistic function and the hyperbolic tangent converge very fast (exponentially), whereas the arctangent converges very slowly. The logistic function is used to estimate probabilities in two-class problems because it maps any real input into the range between 0 and 1, so its output can be read as the probability that an event occurs. Sigmoid functions are monotonic, and their first derivative is bell-shaped.
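The monotonicity and the bell-shaped derivative can be checked numerically. This sketch relies on the standard identity σ'(x) = σ(x)(1 - σ(x)) for the logistic function; the helper names are hypothetical, and the sample grid from -6 to 6 in steps of 0.5 is an arbitrary choice.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-x)); safe for the small inputs used here."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x: float) -> float:
    """Derivative of the logistic sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

xs = [i / 2 for i in range(-12, 13)]  # sample points from -6 to 6 in steps of 0.5
values = [sigmoid(x) for x in xs]
slopes = [sigmoid_derivative(x) for x in xs]

# Monotonic: every sampled value is larger than the one before it.
print("increasing:", all(a < b for a, b in zip(values, values[1:])))

# Bell-shaped derivative: find where the slope is largest.
peak = max(range(len(xs)), key=lambda i: slopes[i])
print("peak at x =", xs[peak], "slope =", slopes[peak])
```

The derivative peaks at x = 0, where σ(0) = 0.5 gives σ'(0) = 0.25, and falls off symmetrically on both sides, which is the bell shape described above.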

The most common types of sigmoid function are:

  1. Logistic Sigmoid Function Formula: σ(x) = 1 / (1 + e^(-x)). The most commonly used sigmoid function in ML, it accepts any real-valued input and produces an output between 0 and 1.
  2. Hyperbolic Tangent Function Formula: tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)). It accepts any real-valued input and produces an output between -1 and 1.
  3. Arctangent Function Formula: arctan(x), the inverse of the tangent function, is also very popular. It accepts any real-valued input and produces an output between -π/2 and π/2.
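The three functions, and the difference in how fast they approach their upper asymptotes, can be compared with Python's standard `math` module (`math.exp`, `math.tanh`, `math.atan`). The wrapper names below are illustrative only.

```python
import math

def logistic(x: float) -> float:
    """Logistic sigmoid: output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def hyperbolic_tangent(x: float) -> float:
    """Hyperbolic tangent: output in (-1, 1)."""
    return math.tanh(x)

def arctangent(x: float) -> float:
    """Arctangent: output in (-pi/2, pi/2)."""
    return math.atan(x)

# Gap between each function and its upper asymptote at a few inputs.
for x in (2.0, 5.0, 10.0):
    gaps = {
        "logistic": 1.0 - logistic(x),
        "tanh": 1.0 - hyperbolic_tangent(x),
        "arctan": math.pi / 2 - arctangent(x),
    }
    print(x, {name: f"{gap:.2e}" for name, gap in gaps.items()})
```

At x = 10 the logistic function is within about 5e-5 of 1 and the hyperbolic tangent within about 4e-9, while the arctangent is still roughly 0.1 below π/2. The fast convergence of tanh also follows from the identity tanh(x) = 2σ(2x) - 1: the hyperbolic tangent is a rescaled, horizontally compressed logistic function.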

3. Sigmoid Function vs ReLU

ReLU, the Rectified Linear Unit, defined as f(x) = max(0, x), has largely replaced the calculation-intensive sigmoid functions as the default activation function in artificial neural networks. Its main advantage over the sigmoid function is speed: ReLU needs only a comparison, whereas the sigmoid requires computing an exponential. ReLU also outputs zero for any negative input, which resembles a biological neuron that stays inactive until its input is strong enough.

For positive values of x, the gradient of ReLU is constant and equal to 1. The gradient of a sigmoid function, by contrast, quickly approaches zero as the magnitude of x grows (the logistic sigmoid's gradient is never larger than 0.25), so networks that depend on sigmoids can train very slowly; this is called the vanishing gradient problem. ReLU avoids it for positive inputs because its gradient stays at 1, so learning is not slowed by diminishing gradient values. For negative inputs, however, the gradient of ReLU is zero, which causes a similar issue known as the zero-gradient (or "dying ReLU") problem. This is resolved by giving negative inputs a small linear slope, as in the Leaky ReLU, f(x) = max(αx, x) with a small α such as 0.01, so that the gradient stays nonzero for all input values.
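The gradient behaviour described above can be sketched in plain Python. The function names are hypothetical, the Leaky ReLU slope `alpha = 0.01` is an assumed, commonly used value, and the ReLU gradient at exactly x = 0 is set to 0 by convention (the derivative is undefined there). The last lines illustrate why deep sigmoid networks suffer: backpropagation multiplies one gradient factor per layer, and each sigmoid factor is at most 0.25.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x: float) -> float:
    """Gradient of the logistic sigmoid: s(x) * (1 - s(x)), at most 0.25."""
    s = sigmoid(x)
    return s * (1.0 - s)

def relu(x: float) -> float:
    """ReLU: max(0, x)."""
    return max(0.0, x)

def relu_grad(x: float) -> float:
    """ReLU gradient: 1 for x > 0, else 0 (0 at x == 0 by convention)."""
    return 1.0 if x > 0 else 0.0

def leaky_relu(x: float, alpha: float = 0.01) -> float:
    """Leaky ReLU: x for x > 0, alpha * x otherwise."""
    return x if x > 0 else alpha * x

def leaky_relu_grad(x: float, alpha: float = 0.01) -> float:
    """Leaky ReLU gradient: 1 for x > 0, alpha otherwise (never zero)."""
    return 1.0 if x > 0 else alpha

for x in (-5.0, 0.5, 5.0):
    print(f"x={x:5}: sigmoid'={sigmoid_grad(x):.4f}  "
          f"relu'={relu_grad(x)}  leaky'={leaky_relu_grad(x)}")

# Backpropagation multiplies per-layer gradient factors (weights ignored here).
layers = 10
print("best-case sigmoid factor after", layers, "layers:", 0.25 ** layers)
print("ReLU factor after", layers, "layers (positive inputs):", 1.0 ** layers)
```

Ignoring the weights, ten sigmoid layers shrink the gradient to at most 0.25^10, about 9.5e-7, while ReLU keeps the factor at 1 for positive inputs.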

4. Applications of Sigmoid Function

  • Logistic regression models for probability prediction: In machine learning, logistic regression uses the logistic sigmoid to estimate the probability of a binary event, producing an output between 0 and 1. The dependent variable is either 0 or 1, while the independent variables can take any real value. For example, take a dataset of diagnoses and tumour measurements where one needs to predict whether a tumour has spread based on its size in cm. A plot shows that, in general, larger tumours are more likely to have spread, and the two classes overlap for tumours between 2.5 and 3.5 cm. Logistic regression models the probability that the tumour status y is 1 (rather than 0) as σ(mx + b), where x is the tumour size (any real value); finding the best values of m and b stretches and shifts the sigmoid curve to fit the data. In that example, the fitted curve shows near-certain spread (y = 1) for tumours of 4 cm. Thus logistic sigmoid functions are very useful for modelling probabilities.
  • Artificial neural networks using a sigmoid function for activation: An artificial neural network consists of several layers stacked on top of each other. Each layer has weights, biases and an activation function, and a sigmoid activation function introduces non-linearity between the layers. In the past, sigmoid functions such as the arctangent, the logistic function and the hyperbolic tangent were widely used as activation functions, partly because they model the activation of biological neurons well. Today, variants of ReLU are generally used for activation instead of sigmoid functions.
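The tumour example can be sketched as logistic regression trained by batch gradient descent on the log-loss. The data below are hypothetical values invented for illustration (chosen so that the classes overlap between 2.5 and 3.5 cm, as in the description), not a real dataset; the learning rate, the number of epochs and the function name `fit_logistic_regression` are likewise assumptions. Inputs are centred on their mean during training, which does not change the fitted model but makes gradient descent converge much faster.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical toy data for illustration only: size in cm, spread status (1 = spread).
sizes  = [1.0, 1.5, 2.0, 2.5, 2.8, 3.0, 3.2, 3.5, 4.0, 4.5, 5.0]
spread = [0,   0,   0,   0,   1,   0,   1,   1,   1,   1,   1]

def fit_logistic_regression(xs, ys, lr=0.5, epochs=10000):
    """Fit p(y = 1 | x) = sigmoid(m * x + b) by batch gradient descent on the mean log-loss."""
    n = len(xs)
    x_mean = sum(xs) / n
    centred = [x - x_mean for x in xs]
    m, c = 0.0, 0.0  # model on centred inputs: sigmoid(m * (x - x_mean) + c)
    for _ in range(epochs):
        # Gradient of the mean log-loss is the average of (prediction - label) * input.
        errors = [sigmoid(m * x + c) - y for x, y in zip(centred, ys)]
        m -= lr * sum(e * x for e, x in zip(errors, centred)) / n
        c -= lr * sum(errors) / n
    # m * (x - x_mean) + c == m * x + (c - m * x_mean)
    return m, c - m * x_mean

m, b = fit_logistic_regression(sizes, spread)
for size in (1.0, 3.0, 4.0):
    print(f"size {size} cm -> estimated probability of spread {sigmoid(m * size + b):.3f}")
```

Gradient descent is only one way to find m and b; library implementations such as scikit-learn's `LogisticRegression` use other solvers and apply regularization by default.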

5. History

In 1798, Thomas Robert Malthus argued in his book An Essay on the Principle of Population that, because population grows in a geometric progression while food supplies grow only in an arithmetic progression, the gap between them would eventually lead to famine. In the 1830s, Pierre François Verhulst proposed the logistic function as an adjustment to this exponential growth model, to describe a population whose growth slows as it depletes its resources.

Over the following century, sigmoid functions became a standard tool for modelling population growth and the development of human civilizations, and their use grew accordingly. In 1943, Warren McCulloch and Walter Pitts developed an artificial neural network model whose activation function was a hard cutoff (a threshold). In 1972, Hugh Wilson and Jack Cowan built a computational model of biological neurons in which a neuron's activation in response to a stimulus was represented by a logistic sigmoid function. In 1998, Yann LeCun used the hyperbolic tangent as the activation function in a convolutional neural network that recognized handwritten digits accurately.

Conclusion

Artificial neural networks now generally prefer ReLU over sigmoid functions: sigmoid variants are computationally intensive and suffer from vanishing gradients, whereas ReLU is still nonlinear, computes quickly, and works well in deep networks.
