Is ReLU After Sigmoid Bad?

Recently we were analyzing how different activation functions interact among themselves, and we found that using ReLU after sigmoid in the last two layers worsens the performance of the model.

By Nishant Nikhil, IIT Kharagpur

There was a recent blog post on mental models for deep learning that drew parallels from optics [link]. We all have intuitions for a few models, but it is hard to put them into words; I believe we need to work collectively to build this mental model.

[Figure: sigmoid curve, from Wikipedia]

Recently, Rajasekhar and I (working on a KWoC project) were analyzing how different activation functions interact among themselves, and we found that using ReLU after sigmoid in the last two layers worsens the performance of the model. We used the MNIST dataset and a four-layer fully connected network: the first layer is the 784-dimensional input layer, the second is a hidden layer of 500 dimensions, followed by another hidden layer of 256 dimensions, and finally a 10-dimensional output layer.
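To make the setup concrete, here is a minimal sketch of that kind of network, assuming PyTorch; the class name `FourLayerNet` and the way the two hidden-layer activations are passed in are illustrative choices, not the authors' actual code.

```python
# Minimal sketch of a 784 -> 500 -> 256 -> 10 fully connected network on MNIST,
# with configurable activations after the two hidden layers (assumed PyTorch).
import torch
import torch.nn as nn

class FourLayerNet(nn.Module):
    def __init__(self, act1: nn.Module, act2: nn.Module):
        super().__init__()
        self.fc1 = nn.Linear(784, 500)   # input layer -> 500-dim hidden layer
        self.act1 = act1                 # activation after the 500-dim layer
        self.fc2 = nn.Linear(500, 256)   # 500-dim -> 256-dim hidden layer
        self.act2 = act2                 # activation after the 256-dim layer
        self.fc3 = nn.Linear(256, 10)    # output layer, one unit per class

    def forward(self, x):
        x = x.view(x.size(0), -1)        # flatten 28x28 MNIST images to 784
        x = self.act1(self.fc1(x))
        x = self.act2(self.fc2(x))
        return self.fc3(x)               # logits; pair with CrossEntropyLoss

# The combination under discussion: sigmoid followed by ReLU in the last two
# hidden layers, versus a plain ReLU/ReLU baseline for comparison.
sigmoid_then_relu = FourLayerNet(act1=nn.Sigmoid(), act2=nn.ReLU())
relu_then_relu = FourLayerNet(act1=nn.ReLU(), act2=nn.ReLU())
```

Swapping the two activation arguments is all that changes between the configurations being compared, which keeps the rest of the training setup identical across runs.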
