The network usually consists of convolutional layers, pooling layers, and fully-connected layers.

- Convolutional layer (input layer – filter – feature map)
- Filter parameters (hyperparameters): number of filters, filter size, stride, and amount of zero padding.
- Learnable parameters: weights and biases
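
To make the filter parameters concrete, here is a minimal sketch in plain Python of how one filter turns an input into a feature map. The function names `conv_output_size` and `conv2d_single` are illustrative, not from any library, and the sketch assumes a single-channel input and a square filter.

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Size of the feature map along one spatial dimension."""
    span = input_size + 2 * padding - filter_size
    if span < 0 or span % stride != 0:
        raise ValueError("filter does not fit the padded input evenly")
    return span // stride + 1


def conv2d_single(image, kernel, bias=0.0, stride=1, padding=0):
    """Slide one square filter over a 2-D input (list of lists).

    Assumption: single input channel, single filter. A real layer
    repeats this for every filter and sums over input channels.
    """
    h, w = len(image), len(image[0])
    k = len(kernel)

    # Zero padding: surround the input with a border of zeros.
    padded = [[0.0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for r in range(h):
        for c in range(w):
            padded[r + padding][c + padding] = image[r][c]

    out_h = conv_output_size(h, k, stride, padding)
    out_w = conv_output_size(w, k, stride, padding)

    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the window under the filter, plus the bias.
            total = bias
            for a in range(k):
                for b in range(k):
                    total += kernel[a][b] * padded[i * stride + a][j * stride + b]
            row.append(total)
        feature_map.append(row)
    return feature_map
```

For example, `conv_output_size(32, 5, stride=1, padding=2)` returns 32, so a 5-wide filter with padding 2 preserves the input width. The entries of `kernel` and the value of `bias` are the learnable parameters; the remaining arguments are fixed before training.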

- Activation functions
- Sigmoid: $\sigma(x) = \frac{1}{1 + e^{-x}}$
- Rectified linear unit (ReLU): $f(x) = \max(0, x)$
- Tanh: $\tanh(x) = 2\sigma(2x) - 1$
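
The three activation functions can be written directly from their definitions. The following is a small sketch using only Python's `math` module; the function names are chosen here for illustration. It also lets you check numerically that the tanh identity above agrees with the built-in `math.tanh`.

```python
import math


def sigmoid(x):
    """Logistic sigmoid: squashes any real number into (0, 1)."""
    # Note: math.exp overflows for very large negative x; this plain
    # form is fine for the moderate inputs used in this illustration.
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    """Rectified linear unit: passes positives, zeroes out negatives."""
    return max(0.0, x)


def tanh_via_sigmoid(x):
    """Tanh expressed as a scaled and shifted sigmoid."""
    return 2.0 * sigmoid(2.0 * x) - 1.0
```

Comparing `tanh_via_sigmoid(x)` with `math.tanh(x)` for a few values of `x` shows that they agree up to floating-point rounding.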

- Pooling layer
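- Pooling parameters: filter (window) size, stride (pooling ratio), and amount of zero padding. Pooling has no filters of its own to count: it is applied to each feature map separately, so the number of feature maps is unchanged.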
- Reduces sensitivity to small shifts and distortions in the input
- Common pooling methods are max, average, and sum.
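
As an illustration of the pooling methods listed above, here is a minimal sketch in plain Python. The function name `pool2d` and its arguments are assumptions made for this example, and zero padding is left out to keep it short.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Downsample a 2-D feature map (list of lists) window by window.

    Assumption: no padding; windows that would run past the edge
    are simply skipped.
    """
    reducers = {
        "max": max,
        "sum": sum,
        "average": lambda window: sum(window) / len(window),
    }
    reduce_window = reducers[mode]

    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for i in range(0, h - size + 1, stride):
        row = []
        for j in range(0, w - size + 1, stride):
            # Collect the values under the current window.
            window = [
                feature_map[i + a][j + b]
                for a in range(size)
                for b in range(size)
            ]
            row.append(reduce_window(window))
        pooled.append(row)
    return pooled
```

With a 2x2 window and stride 2, a 4x4 feature map becomes 2x2. Unlike a convolutional layer, nothing here is learned: the function has no weights or biases.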

- Loss function
- Softmax classifier: $L_i = -\log\left(\frac{e^{s_{y_i}}}{\sum_j e^{s_j}}\right)$

where $s_j$ is the score for class $j$ (the sum runs over all classes, including the correct one) and $s_{y_i}$ is the score of the correct class for example $i$.
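
The softmax loss for a single example can be sketched in a few lines of plain Python. The function name `softmax_loss` is illustrative. Subtracting the largest score before exponentiating is a common numerical-stability trick that is not part of the formula above; it does not change the result, because the shift cancels between numerator and denominator.

```python
import math


def softmax_loss(scores, correct_class):
    """Softmax (cross-entropy) loss for one example.

    scores: list of class scores s_j
    correct_class: index y_i of the correct class
    """
    # Shift by the maximum score so that exp() cannot overflow.
    shift = max(scores)
    log_sum = math.log(sum(math.exp(s - shift) for s in scores))
    # -log(e^{s_y} / sum_j e^{s_j}) = -(s_y - log(sum_j e^{s_j}))
    return -(scores[correct_class] - shift - log_sum)
```

When all scores are equal the loss is the logarithm of the number of classes, and it approaches zero as the correct class score grows far above the others.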

Training the network means minimizing the loss function. In other words, training adjusts the weights and biases so that the loss becomes as small as possible.

- Backpropagation: a technique for computing the gradients of the loss with respect to the weights and biases, using the chain rule
- Gradient descent: repeatedly evaluating the gradient and then performing a parameter update.
- Mini-batch Gradient Descent
- Stochastic Gradient Descent update
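
The training loop described above can be sketched on a deliberately tiny model. The example below is an assumption-laden illustration, not a CNN: it fits a one-weight linear model `w * x + b` with a squared-error loss, so the gradients can be written out by hand with the chain rule. The function name `sgd_linear_fit` and all default values are hypothetical.

```python
import random


def sgd_linear_fit(data, learning_rate=0.1, batch_size=4, epochs=200, seed=0):
    """Fit y = w * x + b by mini-batch gradient descent.

    data: list of (x, y) pairs
    Loss per example: 0.5 * (w * x + b - y) ** 2
    """
    rng = random.Random(seed)
    data = list(data)
    w, b = 0.0, 0.0

    for _ in range(epochs):
        rng.shuffle(data)  # visit the examples in a new order each epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]

            # Evaluate the gradient on this mini-batch (chain rule by hand).
            grad_w = grad_b = 0.0
            for x, y in batch:
                error = (w * x + b) - y  # d(loss) / d(prediction)
                grad_w += error * x      # d(prediction) / dw = x
                grad_b += error          # d(prediction) / db = 1

            # Parameter update: step against the averaged gradient.
            w -= learning_rate * grad_w / len(batch)
            b -= learning_rate * grad_b / len(batch)
    return w, b
```

Setting `batch_size=1` gives the stochastic variant that updates after every example, while setting it to `len(data)` gives full-batch gradient descent. In a real network the hand-written gradient lines are replaced by backpropagation through all layers.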