My notes from Martin Görner's YouTube talk at Devoxx about neural networks and TensorFlow.
This 3-hour course (video + slides) offers developers a quick introduction to deep-learning fundamentals, with some TensorFlow thrown into the bargain. More info
Part 1 - slides
-
Softmax - a good activation function for multi-class logistic regression
Y = softmax(X * W + b)
b: biases; W: weights; X: a batch of images, each flattened into one row of the matrix; softmax applied row by row; Y: predictions
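A minimal sketch of this single softmax layer in tf.keras (not the exact code from the talk; the 28x28 input shape assumes MNIST-style images):

```python
import tensorflow as tf

# One-layer classifier: Y = softmax(X * W + b)
# Flatten turns each 28x28 image into a row of 784 pixel values (the X matrix)
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(10, activation="softmax"),  # W is 784x10, b has 10 entries
])
```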
Hidden layers: ReLU outperforms sigmoid
Activation functions - external link -
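A hedged sketch of the same classifier with ReLU hidden layers in front of the softmax output (the layer sizes are illustrative assumptions):

```python
import tensorflow as tf

# Hidden layers use ReLU; softmax stays only on the output layer
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(200, activation="relu"),   # hidden layer sizes are illustrative
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
```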
Loss function
For classification problems, cross-entropy works a bit better than a squared-error distance
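A small sketch of the cross-entropy computation on one-hot labels and softmax outputs (the values are made up):

```python
import tensorflow as tf

labels = tf.constant([[0., 0., 1.], [1., 0., 0.]])              # one-hot ground truth
predictions = tf.constant([[0.1, 0.2, 0.7], [0.8, 0.1, 0.1]])   # softmax outputs
# Cross-entropy per example: -sum(label * log(prediction))
cross_entropy = -tf.reduce_sum(labels * tf.math.log(predictions), axis=1)
print(cross_entropy.numpy())  # average over the batch to get the training loss
```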
-
Optimization
Gradient descent works better with mini-batches: the gradient computed on a batch points more reliably towards lower loss than one computed on a single example
If the accuracy curve is noisy, jumping around by 1%, the learning rate is too high
Start with a larger learning rate first, e.g. 0.003, and decrease it later to 0.0001 (sketched below)
Epoch - you see all your data (all batches) once -
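A hedged sketch of mini-batch training with the learning rate decayed from 0.003 towards 0.0001; the exponential decay shape, the Adam optimizer and the batch size of 100 are my assumptions:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(200, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Learning rate decays from 0.003 towards roughly 0.0001 over training
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.003, decay_steps=2000, decay_rate=0.5)

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=schedule),
              loss="categorical_crossentropy",   # cross-entropy, as above
              metrics=["accuracy"])
# model.fit(x_train, y_train, batch_size=100, epochs=10)  # one epoch = every batch seen once
```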
Overfitting
Overfitting happens when the network has too much freedom: with too many weights and biases it ends up storing the training data in them in some form
The model then works great on the training data but fails miserably on the test data
If the cross-entropy loss curve on test data looks strange and slowly starts to increase, that is a sign of overfitting
A good solution for overfitting is regularization, for example dropout
Dropout randomly drops a fraction of the neurons during training; pkeep = 0.75 means each neuron is kept with probability 0.75 -
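A sketch of dropout in tf.keras; note that the Dropout layer takes the drop probability, so pkeep = 0.75 corresponds to rate = 0.25:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(200, activation="relu"),
    tf.keras.layers.Dropout(0.25),  # drop 25% of activations, i.e. keep 75% (pkeep = 0.75)
    tf.keras.layers.Dense(10, activation="softmax"),
])
# Dropout is only active during training; it is automatically disabled at evaluation time
```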
CNN Convolutional Neural Networks
Good for 2D data such as images
In the previous example we flattened the image pixels into a 1D vector, losing the shape information
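A small convolutional sketch that keeps the 2D image shape instead of flattening it up front (filter counts and kernel sizes are illustrative, not necessarily the talk's exact values):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    # Convolutions see the image as 28x28x1, so neighbouring pixels keep their spatial relationship
    tf.keras.layers.Conv2D(4, kernel_size=5, padding="same",
                           activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(8, kernel_size=4, strides=2, padding="same", activation="relu"),
    tf.keras.layers.Conv2D(12, kernel_size=4, strides=2, padding="same", activation="relu"),
    tf.keras.layers.Flatten(),                       # flatten only at the very end
    tf.keras.layers.Dense(200, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
```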
Part 2 - slides
-
Batch normalization
The intention behind batch normalization is to make network training faster and more stable
The idea is to normalize the inputs of each layer so that they have a mean of zero and a standard deviation of one.
Batch normalization is applied before the activation function
When you use batch normalization, the bias is no longer needed (the learned offset in batch norm takes its role) -
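A sketch of a layer with batch normalization placed between the weighted sum and the activation, with the bias switched off:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(200, use_bias=False),      # no bias: batch norm's offset replaces it
    tf.keras.layers.BatchNormalization(),            # normalize before the activation
    tf.keras.layers.Activation("relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
```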
RNN Recurrent Neural Network
Good for long sequences, for example predicting the next word
Unrolled over time, RNNs are effectively very deep networks -
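A sketch of a basic RNN cell unrolled by hand over a short sequence, which is what makes the network so deep (all sizes are arbitrary):

```python
import tensorflow as tf

batch, seq_len, input_dim, state_dim = 2, 5, 8, 16
x = tf.random.normal((batch, seq_len, input_dim))     # a batch of short input sequences

cell = tf.keras.layers.SimpleRNNCell(state_dim)       # H_t = tanh(X_t.Wx + H_(t-1).Wh + b)
state = [tf.zeros((batch, state_dim))]                # initial hidden state
for t in range(seq_len):                              # each time step adds one more "layer"
    output, state = cell(x[:, t, :], state)
```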
LSTM Long Short Term Memory networks
Tracking long-term dependencies
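A sketch showing that an LSTM cell carries a separate memory cell c alongside the hidden state h (sizes are arbitrary):

```python
import tensorflow as tf

batch, input_dim, units = 2, 8, 32
x = tf.random.normal((batch, input_dim))   # one time step of input

cell = tf.keras.layers.LSTMCell(units)
h = tf.zeros((batch, units))               # hidden state (short-term)
c = tf.zeros((batch, units))               # memory cell the gates write to and erase from (long-term)
output, (h, c) = cell(x, (h, c))
```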
-
GRU Gated Recurrent Unit networks
GRUs are an improved version of the standard recurrent neural network
The special thing about them is that they can be trained to keep information from long ago without it being washed out through time, and to remove information that is irrelevant to the prediction
The GRU controls the flow of information like the LSTM unit, but without having to use a separate memory cell; it just exposes the full hidden state without any control
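A hedged sketch of a GRU-based next-word model; the vocabulary size, embedding size and hidden size are assumptions:

```python
import tensorflow as tf

vocab_size, embed_dim, hidden = 10000, 64, 128
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embed_dim),          # word ids -> vectors
    tf.keras.layers.GRU(hidden, return_sequences=True),        # gates decide what to keep or forget
    tf.keras.layers.Dense(vocab_size, activation="softmax"),   # next-word probabilities at each step
])
```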