Neural Networks

Ecole Nationale Supérieure de Cognitique

Baptiste Pesquet

GitHub logo


  • Introducing Neural Networks
  • Neural Network Tuning

Introducing Neural Networks

The origins

  • 1943 : first mathematical model of a biological neuron (McCulloch & Pitts)
  • 1949 : Hebb’s rule
  • 1958 : The perceptron (F. Rosenblatt)
  • 1969 : Limits of perceptrons (M. Minsky)

A biological inspiration


McCulloch & Pitts’ formal neuron

Formal neuron model

Hebb’s rule

Attempt to explain synaptic plasticity, the adaptation of brain neurons during the learning process.

“The general idea is an old one, that any two cells or systems of cells that are repeatedly active at the same time will tend to become ‘associated’ so that activity in one facilitates activity in the other.”

Franck Rosenblatt’s perceptron

The Perceptron

The perceptron learning algorithm

  1. Init randomly the $ \omega $ connection weights
  2. For each training sample $ x^{(i)} $ :
    1. Compute the perceptron output $ y'^{(i)} $
    2. Adjust weights : $ \omega_{next} = \omega + \eta (y^{(i)} - y'^{(i)}) x^{(i)} $

MultiLayer Perceptron (MLP)

MultiLayer Perceptron

Minsky’s critic

One perceptron cannot learn non-linearly separable functions.

XOR problem

At the time, no learning algorithm existed for training the hidden layers of a MLP.

Decisive breakthroughs

  • 1974 : Backpropagation theory (P. Werbos)
  • 1986 : Learning through backpropagation (Rumelhart, Hinton, Williams)
  • 1989 : Universal approximation theorem (Hornik, Stinchcombe, White)
  • 1989 : first researchs on deep neural nets (LeCun, Bengio)

The deep learning tsunami

  • 2012 : AlexNet (Krizhevsky, Sutskever, Hinton) wins the ImageNet competition
  • 2016 : AlphaGo (DeepMind) beats Go master Lee Sedol by 4 victories to 1
  • 2017 : AlphaZero reaches a surhuman level at Go and chess in less than 24h hours

Anatomy of a network

A neural network

Learning algorithm

Extract from the book Deep Learning with Python

Training and inference

Training and inference

Neuron output

Neuron output

Activation functions

They must be non-linear

Activation functions

Weights initialization

To facilitate training, they must be:

  • Non-zero
  • Random
  • Have small values

Several techniques exist.

Vectorization of computations

NN matrixes

Layer 1 output

Layer 1 output

Layer 2 output

Layer 2 output

Layer 3 output

Layer 3 output

Weights update

Objective: minimize the loss function

Method : gradient descent

$$\theta_{next} = \theta - \eta\nabla_{\theta}\mathcal{L}(\theta)$$


Objective: compute the loss function gradient wrt all its parameters (the network weights).

Method: apply the chain rule to compute partial derivatives backwards, starting from the current output.

Backprop explained visually

Demo time!

A Neural Network Playground (Simplified)

Neural Network Tuning

Hyperparameters choice

  • Number of layers
  • Number of neurons on hidden layers
  • activation functions
  • Learning rate
  • Mini-batch size

(Iterative process)

Gradient Descent optimization

Gradient Descent evolution map


Increase the descent speed in the direction of the minimum.

Image: a ball rolling down on a hill.


Other techniques

  • RMSprop (Root Mean Square Prop) : use the previous gradients to update the learning rate.
  • Adam (Adaptive Moment Estimation) : combines Momentum and RMSprop

An overview of gradient descent optimization algorithms


  • Tackle underfitting:

    • Use a more complex network
    • Train the network longer
  • Tackle overfitting:

    • Use more training data
    • Limit the network size
    • Introduce regularization
    • Introduce dropout


Limit weights values by adding a penalty to the loss function.

  • L1 : $ \frac{\lambda}{m} {\sum |{\theta_{ij}}|} $
  • L2 : $ \frac{\lambda}{m} {\sum {\theta_{ij}}^2} $

$ \lambda $ is called regularization rate.


During training, some weights are randomly set to 0. The network must adapt and become more generic.


Demo time!

A Neural Network Playground (Complete)