Feedforward Models
Feedforward models, also known as feedforward neural networks (FNNs) or multilayer perceptrons (MLPs), are among the most basic neural network architectures. In an FNN, information flows in one direction—from the input layer, through the hidden layers, to the output layer. There are no cycles or loops.
🧮 Formal Definition
Let the input be a vector $ \mathbf{x} \in \mathbb{R}^{d_0} $, and let the network have $ L $ layers (excluding the input). Each layer $ l \in \{1, 2, \dots, L\} $ is defined by:
- Weight matrix $ \mathbf{W}^{(l)} \in \mathbb{R}^{d_l \times d_{l-1}} $
- Bias vector $ \mathbf{b}^{(l)} \in \mathbb{R}^{d_l} $
- Activation function $ \sigma^{(l)} : \mathbb{R} \rightarrow \mathbb{R} $ (applied elementwise)
The output of layer $ l $, denoted as $ \mathbf{h}^{(l)} $, is recursively defined as:
$$ \mathbf{h}^{(0)} = \mathbf{x} $$
$$ \mathbf{h}^{(l)} = \sigma^{(l)}\left( \mathbf{W}^{(l)} \mathbf{h}^{(l-1)} + \mathbf{b}^{(l)} \right), \quad \text{for } l = 1, \dots, L-1 $$
$$ \mathbf{y} = f\left( \mathbf{W}^{(L)} \mathbf{h}^{(L-1)} + \mathbf{b}^{(L)} \right) $$
📘 Notation Summary
- $ \mathbf{x} \in \mathbb{R}^{d_0} $: input vector
- $ \mathbf{W}^{(l)} \in \mathbb{R}^{d_l \times d_{l-1}} $: weight matrix of layer $ l $
- $ \mathbf{b}^{(l)} \in \mathbb{R}^{d_l} $: bias vector of layer $ l $
- $ \sigma^{(l)} $: activation function at layer $ l $ (e.g., ReLU, tanh)
- $ f $: final output activation (e.g., softmax for classification)
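To make the recursion concrete, here is a minimal NumPy sketch of the forward pass. The helper names (`forward`, `relu`, `softmax`) and the layer sizes are illustrative choices, not part of the definition above:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    e = np.exp(z - z.max())   # subtract the max for numerical stability
    return e / e.sum()

def forward(x, weights, biases, sigma=relu, f=softmax):
    """h^(l) = sigma(W^(l) h^(l-1) + b^(l)), then y = f(W^(L) h^(L-1) + b^(L))."""
    h = x                                      # h^(0) = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = sigma(W @ h + b)                   # hidden layers l = 1, ..., L-1
    return f(weights[-1] @ h + biases[-1])     # output layer L

# Tiny example: d_0 = 4 inputs, one hidden layer of 5 units, 3 outputs.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(5, 4)), rng.normal(size=(3, 5))]
biases = [np.zeros(5), np.zeros(3)]
x = rng.normal(size=4)
print(forward(x, weights, biases))  # three class probabilities summing to 1
```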
In the figure above, the notation differs slightly from the formal definition to make it easier to follow. The table below maps one to the other:
Mapping Between Notations
| Concept | Formal Definition Notation | Image Notation | Description |
| --- | --- | --- | --- |
| Input vector | $ \mathbf{x} \in \mathbb{R}^{d_0} $ | $ x_1, x_2, \ldots, x_n $ | Vector of input features |
| Hidden layer output | $ \mathbf{h}^{(l)} \in \mathbb{R}^{d_l} $ | $ z_j, z_m $ | Neuron activations in layer $ l $ |
| Weight matrix | $ \mathbf{W}^{(l)} \in \mathbb{R}^{d_l \times d_{l-1}} $ | $ w_{ij}, w_{jm}, w_j $ | Weights mapping layer $ l-1 $ outputs to layer $ l $ |
| Bias vector | $ \mathbf{b}^{(l)} \in \mathbb{R}^{d_l} $ | $ b_j, b $ | Bias added to each neuron's pre-activation |
| Activation function | $ \sigma^{(l)}(\cdot) $ | $ \sigma(\cdot) $ | Nonlinearity (e.g., ReLU, tanh) |
| Output layer | $ \mathbf{y} = f(\mathbf{W}^{(L)} \mathbf{h}^{(L-1)} + \mathbf{b}^{(L)}) $ | $ \hat{y} = \sigma(\sum_j w_j z_j + b) $ | Final output (possibly with a different activation) |
🧠 Example: One Hidden Layer Network
For a single hidden layer FNN:
$$ \text{Output} = f\left( \mathbf{W}^{(2)} \cdot \sigma\left( \mathbf{W}^{(1)} \cdot \mathbf{x} + \mathbf{b}^{(1)} \right) + \mathbf{b}^{(2)} \right) $$
This matches the classic architecture used in MLPs (multilayer perceptrons).
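A minimal NumPy sketch of this one-hidden-layer case, written without loops to mirror the formula (the names `W1`, `b1`, etc. and the choice of tanh/identity activations are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
d0, d1, d2 = 3, 4, 1                     # input, hidden, output dimensions
W1, b1 = rng.normal(size=(d1, d0)), np.zeros(d1)
W2, b2 = rng.normal(size=(d2, d1)), np.zeros(d2)

x = np.array([1.2, -0.5, 3.8])
h = np.tanh(W1 @ x + b1)                 # sigma(W^(1) x + b^(1))
y = W2 @ h + b2                          # f is the identity here (regression)
print(y)
```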
🔹 Characteristics
- Deterministic: Output is computed in a single forward pass.
- Static: No memory of previous inputs (unlike RNNs).
- Universal approximators: With enough neurons, they can approximate any continuous function.
🔹 Use Cases
- Image classification (e.g., MNIST)
- Tabular data tasks
- Function approximation
- Simple regression and classification problems
🔹 Limitations
- Not ideal for sequential or temporal data
- Can require many neurons/layers to capture complex patterns
- Do not model internal memory or feedback
🔹 Learning by Doing
- This credit card fraud detection tutorial is a nice way to implement a feedforward model in practice. Link
👨🎓 Notes
📘 What does $ \mathbf{x} \in \mathbb{R}^{d_0} $ mean?
The notation $ \mathbf{x} \in \mathbb{R}^{d_0} $ means:
"$ \mathbf{x} $ is a vector in $ \mathbb{R}^{d_0} $",
or more concretely:
"$ \mathbf{x} $ is a real-valued vector with $ d_0 $ components."
🔍 Explanation
- $ \mathbf{x} $ is the input vector to the neural network.
- $ \mathbb{R}^{d_0} $ denotes the $ d_0 $-dimensional real vector space — that is, the set of all vectors with $ d_0 $ real-number entries.
- For example, if $ d_0 = 3 $, then $ \mathbf{x} \in \mathbb{R}^3 $ might look like:
$$ \mathbf{x} = [1.2,\ -0.5,\ 3.8]^\top $$
🧠 In the context of a feedforward neural network:
- $ \mathbf{x} $ could represent a feature vector or data sample.
- $ d_0 $ is the number of input features, e.g., pixels in an image, sensor values, or word embeddings.
For example, if you’re training a network on data with 100 features per input, you would write:
$$ \mathbf{x} \in \mathbb{R}^{100} $$
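In code, this just means the array has 100 entries. A quick NumPy illustration (the variable names are illustrative):

```python
import numpy as np

x = np.array([1.2, -0.5, 3.8])   # x in R^3: three real-valued components
x100 = np.zeros(100)             # x in R^100: 100 input features
print(x.shape, x100.shape)       # (3,) (100,)
```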
🔁 Why the Transpose $^\top$?
In the expression:
$$ \mathbf{x} = [1.2,\ -0.5,\ 3.8]^\top $$
the superscript $^\top$ denotes the transpose of a vector.
🧠 What Does That Mean?
- Vectors can be written as row vectors:
$$ \mathbf{x}_{\text{row}} = [1.2,\ -0.5,\ 3.8] $$
- Or as column vectors:
$$ \mathbf{x}_{\text{col}} = \left[ \begin{array}{c} 1.2 \\ -0.5 \\ 3.8 \end{array} \right] $$
The notation $[1.2,\ -0.5,\ 3.8]^\top$ means we’re transposing a row vector to become a column vector.
✅ Why Does This Matter?
In most neural network implementations (and linear algebra), vectors are treated as column vectors by default so they can be multiplied correctly with weight matrices.
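A short NumPy sketch of why the shapes matter (the sizes here are illustrative):

```python
import numpy as np

row = np.array([[1.2, -0.5, 3.8]])   # shape (1, 3): a row vector
col = row.T                          # shape (3, 1): its transpose, a column vector

W = np.ones((2, 3))                  # a 2x3 weight matrix (values are placeholders)
print(W @ col)                       # shape (2, 1): W x is well-defined for a column vector
# W @ row would raise a ValueError, since (2, 3) @ (1, 3) shapes don't align
```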