Feedforward Models
Feedforward models, also known as feedforward neural networks (FNNs) or multilayer perceptrons (MLPs), are among the most basic neural network architectures. In an FNN, information flows in one direction—from the input layer, through the hidden layers, to the output layer. There are no cycles or loops.
🧮 Formal Definition
Let the input be a vector $ \mathbf{x} \in \mathbb{R}^{d_0} $, and let the network have $ L $ layers (excluding the input). Each layer $ l \in \{1, 2, \dots, L\} $ is defined by:
- Weight matrix $ \mathbf{W}^{(l)} \in \mathbb{R}^{d_l \times d_{l-1}} $
- Bias vector $ \mathbf{b}^{(l)} \in \mathbb{R}^{d_l} $
- Activation function $ \sigma^{(l)} : \mathbb{R} \rightarrow \mathbb{R} $ (applied elementwise)
The output of layer $ l $, denoted as $ \mathbf{h}^{(l)} $, is recursively defined as:
$$ \mathbf{h}^{(0)} = \mathbf{x} $$
$$ \mathbf{h}^{(l)} = \sigma^{(l)}\left( \mathbf{W}^{(l)} \mathbf{h}^{(l-1)} + \mathbf{b}^{(l)} \right), \quad \text{for } l = 1, \dots, L-1 $$
$$ \mathbf{y} = f\left( \mathbf{W}^{(L)} \mathbf{h}^{(L-1)} + \mathbf{b}^{(L)} \right) $$
📘 Notation Summary
- $ \mathbf{x} \in \mathbb{R}^{d_0} $: input vector
- $ \mathbf{W}^{(l)} \in \mathbb{R}^{d_l \times d_{l-1}} $: weight matrix of layer $ l $
- $ \mathbf{b}^{(l)} \in \mathbb{R}^{d_l} $: bias vector of layer $ l $
- $ \sigma^{(l)} $: activation function at layer $ l $ (e.g., ReLU, tanh)
- $ f $: final output activation (e.g., softmax for classification)
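To make the recursion concrete, here is a minimal NumPy sketch of the forward pass. The helper names (`forward`, `relu`, `softmax`) and the layer sizes are illustrative choices, not part of the definition above:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    e = np.exp(z - z.max())   # subtract the max for numerical stability
    return e / e.sum()

def forward(x, weights, biases, sigma=relu, f=softmax):
    """h^(l) = sigma(W^(l) h^(l-1) + b^(l)), then y = f(W^(L) h^(L-1) + b^(L))."""
    h = x                                      # h^(0) = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = sigma(W @ h + b)                   # hidden layers l = 1, ..., L-1
    return f(weights[-1] @ h + biases[-1])     # output layer L

# Tiny example: d_0 = 4 inputs, one hidden layer of 5 units, 3 outputs.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(5, 4)), rng.normal(size=(3, 5))]
biases = [np.zeros(5), np.zeros(3)]
x = rng.normal(size=4)
print(forward(x, weights, biases))  # three class probabilities summing to 1
```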
In the figure above, the notation differs slightly from the formal definition to make it easier to follow. The table below maps one to the other:
Mapping Between Notations
| Concept | Formal Definition Notation | Image Notation | Description |
| --- | --- | --- | --- |
| Input vector | $ \mathbf{x} \in \mathbb{R}^{d_0} $ | $ x_1, x_2, \ldots, x_n $ | Vector of input features |
| Hidden layer output | $ \mathbf{h}^{(l)} \in \mathbb{R}^{d_l} $ | $ z_j, z_m $ | Neuron activations in layer $ l $ |
| Weight matrix | $ \mathbf{W}^{(l)} \in \mathbb{R}^{d_l \times d_{l-1}} $ | $ w_{ij}, w_{jm}, w_j $ | Weights mapping layer $ l-1 $ outputs to layer $ l $ |
| Bias vector | $ \mathbf{b}^{(l)} \in \mathbb{R}^{d_l} $ | $ b_j, b $ | Bias added to each neuron's pre-activation |
| Activation function | $ \sigma^{(l)}(\cdot) $ | $ \sigma(\cdot) $ | Nonlinearity (e.g., ReLU, tanh) |
| Output layer | $ \mathbf{y} = f(\mathbf{W}^{(L)} \mathbf{h}^{(L-1)} + \mathbf{b}^{(L)}) $ | $ \hat{y} = \sigma(\sum_j w_j z_j + b) $ | Final output (possibly with a different activation) |
🧠 Example: One Hidden Layer Network
For a single hidden layer FNN:
$$ \text{Output} = f\left( \mathbf{W}^{(2)} \cdot \sigma\left( \mathbf{W}^{(1)} \cdot \mathbf{x} + \mathbf{b}^{(1)} \right) + \mathbf{b}^{(2)} \right) $$
This matches the classic architecture used in MLPs (multilayer perceptrons).
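A minimal NumPy sketch of this one-hidden-layer case, written without loops to mirror the formula (the names `W1`, `b1`, etc. and the choice of tanh/identity activations are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
d0, d1, d2 = 3, 4, 1                     # input, hidden, output dimensions
W1, b1 = rng.normal(size=(d1, d0)), np.zeros(d1)
W2, b2 = rng.normal(size=(d2, d1)), np.zeros(d2)

x = np.array([1.2, -0.5, 3.8])
h = np.tanh(W1 @ x + b1)                 # sigma(W^(1) x + b^(1))
y = W2 @ h + b2                          # f is the identity here (regression)
print(y)
```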
🔹 Characteristics
- Deterministic: Output is computed in a single forward pass.
- Static: No memory of previous inputs (unlike RNNs).
- Universal approximators: With enough neurons, they can approximate any continuous function.
🔹 Use Cases
- Image classification (e.g., MNIST)
- Tabular data tasks
- Function approximation
- Simple regression and classification problems
🔹 Limitations
- Not ideal for sequential or temporal data
- Can require many neurons/layers to capture complex patterns
- Do not model internal memory or feedback
🔹 Learning by Doing
- This credit card fraud detection tutorial is a nice way to implement a feedforward model in practice. Link
👨🎓 Notes
📘 What does $ \mathbf{x} \in \mathbb{R}^{d_0} $ mean?
The notation $ \mathbf{x} \in \mathbb{R}^{d_0} $ means:
"$ \mathbf{x} $ is a vector in $ \mathbb{R}^{d_0} $",
or more concretely:
"$ \mathbf{x} $ is a real-valued vector with $ d_0 $ components."
🔍 Explanation
- $ \mathbf{x} $ is the input vector to the neural network.
- $ \mathbb{R}^{d_0} $ denotes the $ d_0 $-dimensional real vector space — that is, the set of all vectors with $ d_0 $ real-number entries.
- For example, if $ d_0 = 3 $, then $ \mathbf{x} \in \mathbb{R}^3 $ might look like:
$$ \mathbf{x} = [1.2,\ -0.5,\ 3.8]^\top $$
🧠 In the context of a feedforward neural network:
- $ \mathbf{x} $ could represent a feature vector or data sample.
- $ d_0 $ is the number of input features, e.g., pixels in an image, sensor values, or word embeddings.
For example, if you’re training a network on data with 100 features per input, you would write:
$$ \mathbf{x} \in \mathbb{R}^{100} $$
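In code, this just means the array has 100 entries. A quick NumPy illustration (the variable names are illustrative):

```python
import numpy as np

x = np.array([1.2, -0.5, 3.8])   # x in R^3: three real-valued components
x100 = np.zeros(100)             # x in R^100: 100 input features
print(x.shape, x100.shape)       # (3,) (100,)
```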
🔁 Why the Transpose $^\top$?
In the expression:
$$ \mathbf{x} = [1.2,\ -0.5,\ 3.8]^\top $$
the superscript $^\top$ denotes the transpose of a vector.
🧠 What Does That Mean?
- Vectors can be written as row vectors:
$$ \mathbf{x}_{\text{row}} = [1.2,\ -0.5,\ 3.8] $$
- Or as column vectors:
$$ \mathbf{x}_{\text{col}} = \left[ \begin{array}{c} 1.2 \\ -0.5 \\ 3.8 \end{array} \right] $$
The notation $[1.2,\ -0.5,\ 3.8]^\top$ means we’re transposing a row vector to become a column vector.
✅ Why Does This Matter?
In most neural network implementations (and linear algebra), vectors are treated as column vectors by default so they can be multiplied correctly with weight matrices.
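A short NumPy sketch of why the shapes matter (the sizes here are illustrative):

```python
import numpy as np

row = np.array([[1.2, -0.5, 3.8]])   # shape (1, 3): a row vector
col = row.T                          # shape (3, 1): its transpose, a column vector

W = np.ones((2, 3))                  # a 2x3 weight matrix (values are placeholders)
print(W @ col)                       # shape (2, 1): W x is well-defined for a column vector
# W @ row would raise a ValueError, since (2, 3) @ (1, 3) shapes don't align
```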