Introduction to Math Typesetting in NextJS
Authors: Andy Cao
Introduction
This neural network guide includes parsing and display of math equations. Math parsing is enabled by remark-math and rehype-katex. KaTeX and its associated font are included in _document.js, allowing mathematical expressions to be used anywhere in the documentation.[1]
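As a rough sketch of how these pieces fit together (the file names, plugin wiring, and CDN link below are illustrative assumptions; a Nextra-based site would register the same plugins through its theme configuration instead):

```js
// next.config.mjs — a minimal sketch assuming @next/mdx is used for MDX pages
import createMDX from '@next/mdx'
import remarkMath from 'remark-math'
import rehypeKatex from 'rehype-katex'

const withMDX = createMDX({
  options: {
    // remark-math parses $...$ and $$...$$; rehype-katex renders the parsed math with KaTeX
    remarkPlugins: [remarkMath],
    rehypePlugins: [rehypeKatex],
  },
})

export default withMDX({
  pageExtensions: ['js', 'jsx', 'md', 'mdx'],
})
```

The KaTeX stylesheet, which pulls in the KaTeX font, is then loaded once for every page:

```js
// pages/_document.js — loads the KaTeX stylesheet and font on every page (URL is illustrative)
import { Html, Head, Main, NextScript } from 'next/document'

export default function Document() {
  return (
    <Html lang="en">
      <Head>
        <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex/dist/katex.min.css" />
      </Head>
      <body>
        <Main />
        <NextScript />
      </body>
    </Html>
  )
}
```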
Inline math can be included by enclosing the expression between single $ symbols.
Math code blocks are denoted by $$.
If you need to use the $ sign for non-math purposes, you can escape it (\$) or specify the HTML entity (&#36;).[2]
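For example, an MDX source along these lines (the equations themselves are purely illustrative) exercises all three forms:

```mdx
Euler's identity $e^{i\pi} + 1 = 0$ renders inline, while

$$
\int_0^1 x^2 \, dx = \frac{1}{3}
$$

renders as a display block, and \$5 (or &#36;5) keeps its literal dollar sign.
```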
Inline or manually enumerated footnotes are also supported. Click on the links above to see them in action. To illustrate the use of math typesetting, we will discuss the architecture of neural networks and the key components of a neural network.
Neural Network Architecture
Neural networks are a cornerstone of artificial intelligence, simulating the interconnected neuron structure of the human brain to process data in complex ways.
At their core, neural networks consist of layers of interconnected nodes or 'neurons', each layer designed to perform specific transformations on the input data. These transformations are governed by a set of weights, adjusted during the network's training phase to minimise the difference between the predicted output and the actual target values.
Key Components of Neural Networks
- Input Layer: The initial layer that receives the input data.
- Hidden Layers: Intermediate layers where most of the computation is done, through a series of weighted connections.
- Output Layer: The final layer that produces the output of the model.
The efficiency and accuracy of a neural network heavily depend on its architecture, the quality of the training data, and the algorithm used for adjusting the weights (often through a process known as backpropagation).
Consider a neural network with $L$ layers, each layer $l$ having $n^{[l]}$ neurons. The activations of layer $l$ are represented as $\mathbf{a}^{[l]}$, which is an $n^{[l]} \times 1$ vector,
$$
\mathbf{a}^{[l]} = \left[\begin{array}{c}
a^{[l]}_1 \\
\vdots \\
a^{[l]}_{n^{[l]}}
\end{array}\right]
$$
The weight matrix for layer $l$ is denoted as $\mathbf{W}^{[l]}$, an $n^{[l]} \times n^{[l-1]}$ matrix,
$$
\mathbf{W}^{[l]} = \left[\begin{array}{ccc}
w^{[l]}_{11} & \cdots & w^{[l]}_{1n^{[l-1]}} \\
\vdots & \ddots & \vdots \\
w^{[l]}_{n^{[l]}1} & \cdots & w^{[l]}_{n^{[l]}n^{[l-1]}}
\end{array}\right]
$$
The bias for each layer $l$ is also a crucial component, represented as a vector $\mathbf{b}^{[l]}$, an $n^{[l]} \times 1$ matrix.
Training Neural Networks
Forward Propagation:
Each layer computes its activations from the previous layer's activations as follows:

$$
\mathbf{a}^{[l]} = g^{[l]}\left(\mathbf{W}^{[l]} \mathbf{a}^{[l-1]} + \mathbf{b}^{[l]}\right)
$$
Assumptions:
- Nonlinearity through the activation function $g^{[l]}$.
- Appropriate initialisation of $\mathbf{W}^{[l]}$ and $\mathbf{b}^{[l]}$.
- Adequate network depth and width to capture the complexity of the function being modeled.
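To make the forward pass above concrete, here is a small plain-JavaScript sketch; the sigmoid activation and the hard-coded layer sizes are illustrative assumptions, not anything prescribed by the text:

```js
// Minimal forward-propagation sketch: a^[l] = g(W^[l] a^[l-1] + b^[l])
// The sigmoid activation and the example layer sizes are illustrative assumptions.
const sigmoid = (x) => 1 / (1 + Math.exp(-x));

// One dense layer: W is n^[l] x n^[l-1], b and the returned activation have length n^[l].
function layerForward(W, b, aPrev) {
  return W.map((row, i) =>
    sigmoid(row.reduce((sum, w, j) => sum + w * aPrev[j], b[i]))
  );
}

// Chain the layers: the output of layer l-1 feeds layer l.
function forward(layers, input) {
  return layers.reduce((a, { W, b }) => layerForward(W, b, a), input);
}

// Example: 2 inputs -> 3 hidden units -> 1 output.
const network = [
  { W: [[0.1, 0.4], [0.2, -0.3], [0.5, 0.9]], b: [0, 0, 0] },
  { W: [[0.3, -0.2, 0.7]], b: [0.1] },
];
console.log(forward(network, [1.0, 0.5]));
```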
Objective:
Minimise the loss function $J$, typically the average of a per-example loss $\mathcal{L}$ over the $m$ training examples:

$$
J = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}\left(\hat{\mathbf{y}}^{(i)}, \mathbf{y}^{(i)}\right)
$$
Optimisation:
Gradient descent or variants (Adam, RMSprop) are typically used to update the parameters:
$$
\begin{aligned}
& \text{Repeat until convergence:} \\
& \quad \mathbf{W}^{[l]} := \mathbf{W}^{[l]} - \alpha \frac{\partial J}{\partial \mathbf{W}^{[l]}} \\
& \quad \mathbf{b}^{[l]} := \mathbf{b}^{[l]} - \alpha \frac{\partial J}{\partial \mathbf{b}^{[l]}}
\end{aligned}
$$
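In the same plain-JavaScript style as the earlier sketch, one update step of the rule above might look like the following; the gradients are assumed to have already been computed by backpropagation, and the learning rate value is illustrative:

```js
// One gradient-descent step: W := W - alpha * dJ/dW, b := b - alpha * dJ/db
// Gradients dW/db are assumed to come from backpropagation; alpha is illustrative.
function gradientDescentStep(layer, grads, alpha = 0.01) {
  return {
    W: layer.W.map((row, i) => row.map((w, j) => w - alpha * grads.dW[i][j])),
    b: layer.b.map((bi, i) => bi - alpha * grads.db[i]),
  };
}

// Applied to every layer, and repeated until the loss J stops decreasing.
function updateNetwork(layers, gradsPerLayer, alpha = 0.01) {
  return layers.map((layer, l) => gradientDescentStep(layer, gradsPerLayer[l], alpha));
}
```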
Footnotes
[1] For the full list of supported TeX functions, check out the KaTeX documentation.
[2] $1 and $2.