Understanding Internal Covariate Shift and Residual Connections: Beyond Activation Functions and Optimizers


By Nilavukkarasan R, via Dev.to

"No man ever steps in the same river twice, for it's not the same river and he's not the same man" - Heraclitus When Going Deeper Made Things Worse In my last post , we built CNNs that could see. Filters learned edges. Pooling built spatial tolerance. Stack enough layers and the network recognizes digits, faces, objects. So the obvious next move: go deeper. More layers, more capacity, more power. But there is a catch. Researchers took a 20-layer network and added 36 more layers. The 56-layer network should have been better. More parameters, more room to learn. Instead, it was worse . Not just on test data, But on training data as well. That's not overfitting. Overfitting means you're too good on training data. This was the opposite: a bigger network that couldn't even fit the data it was trained on. Two things were broken. And fixing them required two elegant ideas. The Noisy Room Problem Imagine you're at a loud party, trying to follow a conversation. The room is packed, music is blas

Continue reading on Dev.to

