**Przemek**,

Below I present what may be the world's simplest, distilled-to-the-basics, explainable **neural network**. It features three characters: Amy, Bob and Clara. They are digital neurons, and each of them has a different opinion about what happened on the Titanic 🚢.

## Titanic passengers

We're going to make a neural network that predicts the survival chances of Titanic passengers.

To keep the example really simple, we're going to look at just two pieces of information about each passenger: their age and sex. Each will be a numeric value:

**age**: a number between 0.0 and 1.0. 1.0 represents the oldest age found among the Titanic passengers; everyone's age has been proportionally scaled to fit the range from 0.0 to 1.0.

**sex**: 0.0 for male and 1.0 for female. (Gender is not binary but the relevant data in the Titanic data set is.)
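As a sketch of how these two inputs might be prepared (the maximum age and the raw field layout are assumptions, not taken from the actual data set):

```python
# Minimal sketch of encoding the two inputs. Assumes a passenger is a
# dict with a raw "age" in years and "sex" as the string "male"/"female";
# MAX_AGE is an assumed value for the oldest passenger.
MAX_AGE = 80.0

def encode(passenger):
    age = passenger["age"] / MAX_AGE                    # scale to 0.0..1.0
    sex = 1.0 if passenger["sex"] == "female" else 0.0  # 0.0 male, 1.0 female
    return age, sex

print(encode({"age": 40, "sex": "female"}))  # (0.5, 1.0)
```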

## The panel of experts

The neural network will have three neurons: **Amy**, **Bob** and **Clara**.

You can think of them as a panel of experts we invite to predict survival outcomes for each passenger. We're asking each expert to assign a number close to 1.0 for passengers that are likely to survive, and 0.0 for passengers that are likely to perish.

**Amy** thinks that women are more likely to survive, regardless of their age. So she makes her predictions using the formula:

Amy's prediction: $(0 \times age) + (1 \times sex)$

This boils down to predicting 1.0 (survival) if the passenger is a woman, and 0.0 (demise) otherwise.

**Bob** is a hopeless optimist: he predicts that everyone will survive, regardless of the data. Just like the experts we see in the media, our experts are not necessarily very good :).

Bob's prediction: $(0 \times age) + (0 \times sex) + 1$

**Clara** thinks that children and women are more likely to survive and she uses both pieces of data in her predictions, giving them equal weight:

Clara's prediction: $(-0.5 \times age) + (0.5 \times sex) + 0.5$
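The three experts' formulas translate directly into code. A quick sketch, using a made-up young female passenger (age = 0.1, sex = 1.0) as input:

```python
# Each expert neuron is a formula over the two inputs.
def amy(age, sex):
    return (0 * age) + (1 * sex)            # only sex matters

def bob(age, sex):
    return (0 * age) + (0 * sex) + 1        # always predicts survival

def clara(age, sex):
    return (-0.5 * age) + (0.5 * sex) + 0.5  # younger and female -> higher

# A young woman: Amy and Bob say 1.0, Clara says 0.95.
print(amy(0.1, 1.0), bob(0.1, 1.0), clara(0.1, 1.0))
```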

## Prediction by committee

For the first passenger, Amy thinks he will perish, while Bob predicts survival. Now we're in a classic real-life situation: we have multiple experts and they don't agree 🤷. We can estimate how much we trust each expert and combine their opinions into a weighted average:

Combined prediction: $(1/3 \times amy) + (1/3 \times bob) + (1/3 \times clara)$

Values greater than 0.5 indicate a prediction of survival; values smaller than 0.5 indicate a prediction of demise. We note the resulting prediction in the "Outcome" column:
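Putting the three formulas and the weighted average together (the sample passenger's values below are made-up illustrations, not entries from the actual data set):

```python
# Combine the three experts' opinions with equal trust (1/3 each).
def combined(age, sex):
    a = (0 * age) + (1 * sex)                # Amy
    b = (0 * age) + (0 * sex) + 1            # Bob
    c = (-0.5 * age) + (0.5 * sex) + 0.5     # Clara
    return (1/3) * a + (1/3) * b + (1/3) * c

# An adult man (assumed values age=0.275, sex=0.0):
score = combined(0.275, 0.0)
outcome = "survives" if score > 0.5 else "perishes"
print(round(score, 3), outcome)  # below 0.5, so "perishes"
```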

## Neural network

What we just made is, in fact, a **neural network**. Amy, Bob and Clara are three digital neurons. They receive input data, apply a transformation to it and produce a value. At the end we combine their outputs into a single value and use it to make predictions:

Yes, this example is very simple. But now that we see how it works, we can better explain the more interesting aspects of neural networks.

## How to train a dragon

In our example, we completely made up the formulas that Amy, Bob and Clara use to make predictions, and also the final formula that combines their predictions.

Here's the first fun fact about neural networks: **it actually works like this in real life** 💫. Training the neural network starts with random values assigned to each parameter (like the numbers "0" and "1" in Amy's formula). Then, the network is "trained" on data to find better values.

What does it mean to "train" the neural network? Unlike the experts in the media, Amy, Bob and Clara are happy to change their opinions. To train the neural network, we calculate the outcome for data where we already know the right answer, and then tweak the parameters (i.e. the formulas that Amy, Bob and Clara use) to better match the expected outcome.
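A minimal sketch of one such tweak for a single neuron, using gradient descent on a squared error (the starting parameters, the learning rate and the passenger values are all illustrative assumptions):

```python
# One "tweak the parameters" step for a single neuron.
w_age, w_sex, bias = 0.2, -0.3, 0.5   # random-ish starting parameters
lr = 0.1                              # learning rate (assumed)

def predict(age, sex):
    return w_age * age + w_sex * sex + bias

# A passenger we already know survived (target 1.0).
age, sex, target = 0.1, 1.0, 1.0
error = predict(age, sex) - target    # how far off we are

# Nudge each parameter against its gradient of the squared error.
w_age -= lr * 2 * error * age
w_sex -= lr * 2 * error * sex
bias  -= lr * 2 * error * 1

# After the tweak, the prediction is closer to the known answer.
print(predict(age, sex))
```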

When training goes well, the quality of the predictions improves with time:

I trained Amy, Bob and Clara on 700 passenger records from the Titanic data set. The neural network quickly learns to just predict "Doesn't survive" for almost all passengers, which allows it to reach an accuracy of about 60% (most passengers indeed didn't survive).

## Size of the neural network

How do you get from a tiny network like this to something that can power ChatGPT?

For one thing, you're gonna need a bigger network. The size of a neural network is measured by its number of parameters. Remember the formulas each neuron used to predict survival outcomes?

Clara's prediction: $(-0.5 \times age) + (0.5 \times sex) + 0.5$

Each of those numeric values (-0.5, 0.5 and 0.5) is a parameter. In total, our example network has 12 of them: 3 for each neuron, plus 3 for the final output formula.
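The count is easy to verify directly:

```python
# Counting the parameters of our tiny network.
neurons = 3            # Amy, Bob and Clara
params_per_neuron = 3  # weight for age, weight for sex, and a constant
combining_params = 3   # the 1/3 trust weights in the final formula
total = neurons * params_per_neuron + combining_params
print(total)  # 12
```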

For comparison, a "small" 2023 state-of-the-art open source model by Mistral AI has 7 billion parameters (583 million times more 💫).

## Architecture

The structure of the network matters too. The famous "transformers" architecture that powers LLMs like the one behind ChatGPT looks like this when visualized:

The matrices represent data flowing through the artificial neurons.

## Conclusion

There's quite a lot of complexity that makes neural networks work that we did not cover in this post. It includes techniques that make it possible for the neural network to model the complexities of the problem we're solving (activation functions) and those that help us find the right way to tweak the parameters in training (gradient descent).

But the main point of this post is this: **you don't need to understand all these concepts to get the basic mechanism of what neural networks are**. It is just Amy, Bob and Clara and us, fiddling with the parameters of their formulas to best predict Titanic survival outcomes 💫.