CortexCookie

Motivation: Why Neural Networks?

Artificial Neural Networks (ANNs) are inspired by the biological brain, which consists of billions of neurons connected through synapses. Each neuron receives signals, processes them, and passes the result forward.


Similarly, an artificial neural network is made up of artificial neurons that mimic a biological neuron, or nerve cell. Each artificial neuron can:

  • Receive inputs
  • Perform weighted computations
  • Produce outputs

The core motivation behind neural networks is simple:

We want machines to learn complex patterns directly from data, without explicitly programming rules.

Artificial Neuron vs Biological Neuron


A biological neuron consists of:

  • Dendrites β†’ receive signals
  • Cell body (nucleus) β†’ processes signals
  • Axon β†’ transmits output via synapses

An artificial neuron mimics this behavior mathematically:

\hat{y} = \sigma\left(\sum_i w_i x_i + b\right)

Where:

  • xix_is are inputs
  • wiw_is are weights
  • bb is the bias
  • f=Οƒ(β‹…)f=\sigma(\cdot) is an activation function

This abstraction allows us to stack neurons into layers, forming a neural network.
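The weighted-sum-plus-activation abstraction above can be sketched in a few lines of NumPy. This is a minimal illustration; the input and weight values here are arbitrary placeholders:

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    # y_hat = sigma(sum_i w_i * x_i + b)
    return sigmoid(np.dot(w, x) + b)

# Example with arbitrary values: three inputs, one scalar output
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.8, 0.2, -0.5])
b = 0.1
y_hat = neuron(x, w, b)   # a single value in (0, 1)
```

Stacking many such neurons side by side gives a layer; feeding one layer's outputs into the next gives a network.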


Why Traditional Machine Learning Is Not Enough

Traditional machine learning models such as:

  • Linear Regression
  • Logistic Regression
  • Support Vector Machines
  • Decision Trees

work well for simple, structured problems. However, they struggle in real-world scenarios.

Limitations of Traditional ML

  • ❌ Poor performance on unstructured data
    • Images
    • Audio
    • Text
    • Video
  • ❌ Does not scale well with large and complex datasets
  • ❌ Fails when relationships are highly non-linear

As data complexity grows, traditional models start to fail.

This gap is exactly why neural networks are needed.


Why Do We Need Hidden Layers?

A network with only:

  • an input layer
  • an output layer

can only learn linear decision boundaries.

Hidden layers allow neural networks to:

  • Learn non-linear patterns
  • Build hierarchical feature representations
  • Solve complex real-world problems

Most real-world problems are impossible to solve using only input and output layers.


Linear Separability and Logic Gates

This image illustrates how the logical functions AND, OR, NAND, and NOR can be represented in a two-dimensional input space and separated using a single linear decision boundary. Each plot shows the four possible input combinations (A,B), colored by their output class, with the dashed green line indicating the classifier’s decision boundary. Since a straight line is sufficient to separate the classes in all four cases, these functions are linearly separable and can be modeled using a single perceptron without any hidden layers.


That means:

  • A single straight line can separate the output classes
  • A single neuron (perceptron) can model them

XOR: The Breaking Point

The XOR function outputs:

  • 1 when inputs are different
  • 0 when inputs are the same

XOR is not linearly separable.


Unlike AND or OR, the positive and negative classes in XOR lie diagonally opposite each other, making it impossible to separate them with a single straight decision boundary (shown by the dashed green line). This visualization highlights why XOR is not linearly separable and why it cannot be solved with a single neuron.

This single example proves that single-layer models are insufficient.

Modeling AND and OR with a Single Neuron

AND Function Model

This diagram illustrates how the AND logical function can be implemented using a single artificial neuron (perceptron). The neuron receives two binary inputs, x_1 and x_2, each scaled by its corresponding weight, w_1 and w_2. These weighted inputs are summed together with a bias term b_1, forming the linear combination w_1 x_1 + w_2 x_2 + b_1. This value is then passed through an activation function \sigma(\cdot), typically a sigmoid or step function, to produce the final predicted output \hat{y}. We apply the sigmoid here because we want to convert the neuron's raw weighted sum into a bounded, interpretable output between 0 and 1.


By choosing appropriate values for the weights and bias, the neuron activates (outputs 1) only when both inputs are 1, and remains inactive (outputs 0) for all other input combinations. This demonstrates that the AND function is linearly separable and can be perfectly modeled using a single perceptron without any hidden layers.


Output behavior:

  • x1=1,x2=1β‡’y^=1x_1 = 1, x_2 = 1 \Rightarrow \hat{y} = 1
  • All other inputs β†’ y^=0\hat{y} = 0

OR Function Model

Similarly, by choosing appropriate values for the weights and bias, the neuron activates (outputs 1) when at least one input is 1, and remains inactive (outputs 0) when both inputs are 0.


Output behavior:

  • At least one input is 1 β†’ y^=1\hat{y} = 1

These examples show how:

  • Weights control feature importance
  • Bias shifts the decision boundary
  • Sigmoid approximates binary outputs

Why XOR Cannot Be Solved with a Single Neuron

For a single neuron:

\hat{y} = \sigma(w_1 x_1 + w_2 x_2 + b)

No choice of (w_1, w_2, b) can correctly model XOR.


This is a fundamental limitation of single-layer perceptrons.

Decomposing XOR into Simpler Functions

XOR can be rewritten as:

x_1 \oplus x_2 = (x_1 \land \neg x_2) \lor (\neg x_1 \land x_2)

Or equivalently:

x_1 \oplus x_2 = (x_1 \lor x_2) \land \neg(x_1 \land x_2)

0 \oplus 0 = (0 \lor 0) \land \neg(0 \land 0) = 0
0 \oplus 1 = (0 \lor 1) \land \neg(0 \land 1) = 1
1 \oplus 0 = (1 \lor 0) \land \neg(1 \land 0) = 1
1 \oplus 1 = (1 \lor 1) \land \neg(1 \land 1) = 0

This shows that XOR is:

  • A composition of simpler logical functions
  • Each of which is linearly separable

Solving XOR Using a Hidden Layer

To solve XOR, we add:

  • One hidden layer
  • Two hidden neurons

Hidden layer

h_1 = \sigma(20x_1 + 20x_2 - 10) \quad \text{(behaves like OR)}
h_2 = \sigma(20x_1 + 20x_2 - 30) \quad \text{(behaves like AND)}

Output layer

y = \sigma(20h_1 - 20h_2 - 10) \quad \text{(behaves like XOR)}

The four diagrams below illustrate how different input combinations (x_1, x_2) activate the hidden neurons and lead to the correct output, demonstrating why hidden layers are necessary to solve non-linearly separable problems like XOR.
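Wiring the equations above into code, using exactly the weights given in this section, shows the network computing XOR end to end:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def xor_net(x1, x2):
    h1 = sigmoid(20 * x1 + 20 * x2 - 10)   # hidden neuron: behaves like OR
    h2 = sigmoid(20 * x1 + 20 * x2 - 30)   # hidden neuron: behaves like AND
    y = sigmoid(20 * h1 - 20 * h2 - 10)    # output: OR AND NOT(AND) = XOR
    return y

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(xor_net(x1, x2)))
```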


This network:

  • Learns intermediate features
  • Combines them to produce XOR

Depth solves the problem.

From Logic Gates to Deep Learning

Modern neural networks extend this same idea. A neural network is a layered computational model composed of interconnected neurons that transform inputs into outputs through weighted connections and nonlinear activation functions.

It contains:

  • Input layer
  • Multiple hidden layers
  • Output layer
  • Weights wij(l)w_{ij}^{(l)}
  • Activations ai(l)a_i^{(l)}

Input Layer

The input layer receives the raw features:

x_1, x_2, \dots, x_n

These values are passed forward without modification and serve as the starting point of computation.

Hidden Layers

Each hidden layer consists of neurons that perform two operations:

  1. A linear transformation
     z_j^{(l)} = \sum_i w_{ij}^{(l)} a_i^{(l-1)} + b_j^{(l)}
  2. A nonlinear activation
     a_j^{(l)} = \sigma\left(z_j^{(l)}\right)

Here:

  • wij(l)w_{ij}^{(l)} is the weight connecting the ii-th neuron in layer (lβˆ’1)(l-1) to the jj-th neuron in layer ll
  • bj(l)b_j^{(l)} is the bias of the jj-th neuron in layer ll
  • ai(lβˆ’1)a_i^{(l-1)} is the activation from the previous layer

Hidden layers enable the network to learn hierarchical and non-linear representations of the input data.
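The two operations of a hidden layer can be expressed compactly as a matrix-vector product. This sketch uses random weights purely for illustration; the shapes (4 neurons receiving 3 activations) are an arbitrary example:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer_forward(a_prev, W, b):
    # Linear transformation: z_j = sum_i W[j, i] * a_prev[i] + b[j]
    z = W @ a_prev + b
    # Nonlinear activation: a_j = sigma(z_j)
    return sigmoid(z)

rng = np.random.default_rng(0)
a0 = np.array([1.0, 0.5, -0.5])   # activations from the previous layer
W1 = rng.normal(size=(4, 3))      # 4 neurons, each with 3 incoming weights
b1 = np.zeros(4)                  # one bias per neuron
a1 = layer_forward(a0, W1, b1)    # shape (4,), each value in (0, 1)
```

A full forward pass just chains `layer_forward` calls, feeding each layer's output into the next.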

Output Layer

The output layer applies the same computation to produce the final prediction:

\hat{y} = a^{(L)}

Depending on the task, the activation function in this layer may vary (e.g., sigmoid for binary classification, softmax for multi-class classification).
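The two standard output activations mentioned above look like this in code (the logit values are arbitrary examples):

```python
import numpy as np

def sigmoid(z):
    # Binary classification: a single probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Multi-class classification: a vector of probabilities summing to 1
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)   # three class probabilities that sum to 1
```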

Key Idea

By stacking multiple layers and learning the parameters w^{(l)} and b^{(l)}, neural networks can approximate complex functions and model intricate patterns in data that are difficult or impossible to capture with linear models.

Neural networks learn representations, not rules.

Why Depth Matters

Hidden layers allow neural networks to:

  • Learn intermediate concepts
  • Build reusable feature hierarchies
  • Approximate complex non-linear functions

Examples:

  • Vision: edges β†’ shapes β†’ objects
  • Language: characters β†’ words β†’ meaning
  • Audio: frequencies β†’ phonemes β†’ speech

Depth is not optional: it is what gives neural networks their expressive power.
