Why is sigmoid not a good activation function?

Two major problems with the sigmoid activation function are: (1) sigmoids saturate and kill gradients: the output of the sigmoid saturates (i.e. the curve becomes nearly parallel to the x-axis) for large positive or large negative inputs, so the gradient in these regions is almost zero; and (2) sigmoid outputs are not zero-centered, which slows down gradient-based learning in the layers that receive them.
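
A quick numerical sketch (plain NumPy; the helper names are mine, not from any library) of how the sigmoid gradient vanishes for large inputs:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # derivative of the sigmoid: sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    return s * (1.0 - s)

for x in [0.0, 2.0, 5.0, 10.0, -10.0]:
    print(f"x = {x:6.1f}  sigmoid = {sigmoid(x):.5f}  gradient = {sigmoid_grad(x):.2e}")

# At |x| = 10 the gradient is about 4.5e-05, so almost no signal flows
# back through a saturated sigmoid unit during backpropagation.
```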

Why is tanh activation used in LSTM?

In an LSTM network, the tanh activation function is used to determine the candidate cell state (internal state) values ( \tilde{C}_{t} ) and to update the hidden state ( h_{t} ).
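
For reference, assuming the standard LSTM formulation (W, U, b are learned weights and biases), the two places where tanh appears are:

```latex
\begin{aligned}
\tilde{C}_t &= \tanh\left(W_C x_t + U_C h_{t-1} + b_C\right) && \text{candidate cell state} \\
h_t &= o_t \odot \tanh\left(C_t\right) && \text{hidden state update}
\end{aligned}
```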

Why does an LSTM network need some sigmoid units?

Sigmoid belongs to the family of non-linear activation functions. It is used inside the gates. Unlike tanh, sigmoid keeps its output between 0 and 1, which lets the network decide how much information to update or forget.
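
A minimal sketch (NumPy, with made-up gate values, not a full LSTM) of why the (0, 1) range matters: the sigmoid output acts as a soft mask that scales how much of the previous cell state is kept:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

prev_cell_state = np.array([2.0, -1.5, 0.8])   # hypothetical C_{t-1}
forget_logits   = np.array([4.0, -4.0, 0.0])   # hypothetical forget-gate pre-activations

forget_gate = sigmoid(forget_logits)           # squashed into (0, 1)
kept_state  = forget_gate * prev_cell_state    # element-wise "how much to keep"

print(forget_gate)  # ~[0.982 0.018 0.5  ] -> keep, forget, keep half
print(kept_state)   # ~[1.964 -0.027 0.4 ]
```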

Why does tanh work better than sigmoid?

The tanh function is symmetric about the origin, so its inputs are effectively normalized and its outputs (which are the inputs to the next layer) are, on average, close to zero. These are the main reasons why tanh is preferred and performs better than sigmoid (logistic).
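
A quick sketch (NumPy, random zero-mean inputs) showing that tanh outputs are roughly zero-centered while sigmoid outputs are not:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=100_000)  # zero-mean pre-activations

sigmoid_out = 1.0 / (1.0 + np.exp(-x))
tanh_out = np.tanh(x)

print(f"mean sigmoid output: {sigmoid_out.mean():.3f}")  # ~0.5, always positive
print(f"mean tanh output:    {tanh_out.mean():.3f}")     # ~0.0, zero-centered
```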

Is the tanh function better than the sigmoid function for neural networks?

This makes the tanh function almost always better as an activation function (for hidden layers) than the sigmoid function. To check this myself (at least in a simple case), I coded a simple neural network with sigmoid, tanh, and ReLU as activation functions, then plotted how the error value evolved during training.
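
Since the original plot is not reproduced here, below is a minimal reconstruction of that kind of experiment: a tiny two-layer network trained on XOR with each activation in the hidden layer. The architecture, learning rate, and task are my assumptions, not the author's original setup.

```python
import numpy as np

rng = np.random.default_rng(42)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# (activation, derivative) pairs for the hidden layer
ACTIVATIONS = {
    "sigmoid": (sigmoid, lambda z: sigmoid(z) * (1 - sigmoid(z))),
    "tanh":    (np.tanh, lambda z: 1.0 - np.tanh(z) ** 2),
    "relu":    (lambda z: np.maximum(0.0, z), lambda z: (z > 0).astype(float)),
}

def train(act_name, hidden=8, lr=0.1, epochs=5000):
    act, act_grad = ACTIVATIONS[act_name]
    W1 = rng.normal(0, 0.5, (2, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, 1)); b2 = np.zeros(1)
    for _ in range(epochs):
        z1 = X @ W1 + b1
        a1 = act(z1)
        out = sigmoid(a1 @ W2 + b2)   # sigmoid output unit for the binary target
        delta_out = out - y           # output delta for sigmoid + cross-entropy loss
        dW2 = a1.T @ delta_out
        db2 = delta_out.sum(axis=0)
        delta_hidden = delta_out @ W2.T * act_grad(z1)
        dW1 = X.T @ delta_hidden
        db1 = delta_hidden.sum(axis=0)
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    return float(np.mean((out - y) ** 2))

for name in ACTIVATIONS:
    print(f"{name:7s} final MSE: {train(name):.4f}")
```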

Where is the sigmoid function used in an LSTM network?

Both the input gate ( i_{t} ) and the output gate ( o_{t} ) use the sigmoid function, as does the forget gate ( f_{t} ). In an LSTM network, the tanh activation function is used to determine the candidate cell state (internal state) values ( \tilde{C}_{t} ) and to update the hidden state ( h_{t} ).
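
For reference, assuming the standard LSTM formulation, the sigmoid-gated parts of the cell are:

```latex
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{forget gate} \\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{input gate} \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{output gate} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{cell state update}
\end{aligned}
```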

Why do we need tanh in LSTM?

As every neural network layer needs an activation function to introduce non-linearity, you will always need one. Tanh works better with LSTM because the tanh proposes which values to add to the state, and the sigmoid gate decides how much of them is actually added, as the sketch below shows.
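
A minimal NumPy sketch (made-up values, not a full LSTM implementation) of how the tanh candidate and the sigmoid gates interact in the cell-state update:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

C_prev        = np.array([ 1.0, -2.0, 0.5])   # hypothetical previous cell state C_{t-1}
cand_logits   = np.array([ 3.0, -1.0, 0.2])   # pre-activation of the candidate
forget_logits = np.array([ 5.0, -5.0, 0.0])   # pre-activation of the forget gate
input_logits  = np.array([-5.0,  5.0, 0.0])   # pre-activation of the input gate

candidate = np.tanh(cand_logits)   # tanh proposes signed values in (-1, 1)
f_t = sigmoid(forget_logits)       # sigmoid decides how much old state to keep
i_t = sigmoid(input_logits)        # sigmoid decides how much of the candidate to add

C_t = f_t * C_prev + i_t * candidate   # LSTM cell-state update
print(C_t)
```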

What is the difference between sigmoid and Tangens hyperbolicus?

Another widely used activation function is the tangens hyperbolicus, or hyperbolic tangent / tanh function. It works similarly to the sigmoid function, but with some differences. First, the change in output accelerates close to x = 0, which is similar to the sigmoid function.
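
For completeness, the two functions and the standard identity relating them (tanh is a scaled and shifted sigmoid):

```latex
\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad
\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} = 2\,\sigma(2x) - 1
```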