Neural networks have a reputation for being hard to understand, train and deploy.
Some of this reputation is justified, but the fundamental concepts behind neural networks are not inherently complicated. And with the right high-level library, it’s feasible to prototype a working neural network without much training.
In this post we’re going to explain the basics of neural networks and build a full working example for the Titanic competition on Kaggle.
Minimal example
In the Titanic competition, we’re given a training file containing a description of some Titanic passengers along with their survival outcome (survived vs died). Based on this, we need to train a machine learning model that is later evaluated on another passenger file, where the survival outcomes are hidden (Kaggle knows them, but we don’t).
A tiny neural network for this problem could look like this:
Input
On the left we have two input nodes representing features of a Titanic passenger:
- age: this would typically be represented as a number between 0.0 and 1.0, with everyone’s age proportionally scaled to fit this range
- sex: 0.0 for male and 1.0 for female. (Gender is not binary but the relevant data in the Titanic data set is.)
Hidden layer
In the middle we have three “artificial neurons”. This part is sometimes called the “hidden” layer, because (as opposed to the input and output layers) we don’t get to see it when we’re using the neural network as a black box.
We can think of the neurons in this layer as experts who predict the survival outcomes based on the given inputs. Let’s say the experts are called Amy, Bob and Clara. Their example predictions could work as follows:
- Amy thinks that women are more likely to survive, regardless of their age. So she makes her predictions using the formula: $(0 \times age) + (1 \times sex)$. This boils down to predicting 1.0 (survival) if the passenger is a woman, and 0.0 (demise) otherwise.
- Bob is a hopeless optimist: he predicts that everyone would survive regardless of the data. Just like the experts we see in the media, our experts are not necessarily very good :). Bob’s prediction: $(0 \times age) + (0 \times sex) + 1$
- Clara thinks that children and women are more likely to survive, so she uses both pieces of data in her predictions, giving them equal weight. Clara’s prediction: $(-0.5 \times age) + (0.5 \times sex) + 0.5$
Output
At the end we want to have a single prediction for each passenger, so we add an output node. It could give an equal weight to each expert prediction, combining their individual forecasts as follows:
Combined prediction: $(1/3 \times amy) + (1/3 \times bob) + (1/3 \times clara)$
Values bigger than 0.5 indicate a prediction of survival; smaller values indicate a prediction of demise. We note the resulting prediction in the “Outcome” column:
That’s it, we have a simplified (but not so much) neural network that gives us (some) predictions for Titanic passengers!
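To make the arithmetic concrete, here is a minimal sketch of the three experts and the output node in plain Python. The two example passengers and their scaled ages are made up for illustration; they are not taken from the data set.

```python
def amy(age, sex):
    # Only looks at sex: weight 0 for age, weight 1 for sex
    return (0 * age) + (1 * sex)

def bob(age, sex):
    # Ignores both inputs and always predicts survival
    return (0 * age) + (0 * sex) + 1

def clara(age, sex):
    # Younger passengers and women get a higher score
    return (-0.5 * age) + (0.5 * sex) + 0.5

def predict(age, sex):
    # Output node: equal weight for each expert
    combined = (1 / 3) * amy(age, sex) + (1 / 3) * bob(age, sex) + (1 / 3) * clara(age, sex)
    return combined, combined > 0.5

# Hypothetical passengers: (description, age scaled to 0.0-1.0, sex)
passengers = [("young woman", 0.25, 1.0), ("old man", 0.75, 0.0)]
for label, age, sex in passengers:
    score, survived = predict(age, sex)
    print(f"{label}: score={score:.3f}, survived={survived}")
```

With these made-up inputs the young woman ends up above the 0.5 threshold and the old man below it.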
Parameters
The size of a neural network is measured in the number of parameters. Parameters are the numerical constants in the formulas we saw above: there are 3 per neuron (two weights and a constant term, counting Amy’s implicit constant of 0) and 3 in the final output formula. So our network has 12 parameters in total. (For comparison, a “small” 2023 state-of-the-art open source model by Mistral AI has 7 billion parameters.)
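The count can be written down as a small helper. This is only a sketch for our toy architecture: the function name is made up, and it assumes one constant term per hidden neuron and none in the output formula, as in the example above.

```python
def count_parameters(n_inputs, n_hidden):
    # Each hidden neuron: one weight per input plus one constant term
    hidden = n_hidden * (n_inputs + 1)
    # Output node: one weight per hidden neuron, no constant term
    output = n_hidden
    return hidden + output

print(count_parameters(n_inputs=2, n_hidden=3))  # 12
```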
You probably noticed that we completely made up the parameters used to make predictions. It actually works like this in real life 💫. Training a neural network starts with random values assigned to parameters. Then, the network is “trained” on data to find better values.
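Here is a rough sketch of what “finding better values” can look like for a single neuron. It uses plain gradient descent on the squared error, which is a simplification of what real libraries do, and the four data points are invented for illustration.

```python
import random

random.seed(0)

# Hypothetical toy data: (scaled age, sex) -> survived
data = [((0.2, 1.0), 1.0), ((0.7, 1.0), 1.0), ((0.3, 0.0), 0.0), ((0.8, 0.0), 0.0)]

# Start from random parameters: two weights and a constant term
params = [random.uniform(-1, 1) for _ in range(3)]

def predict(params, x):
    w_age, w_sex, b = params
    return w_age * x[0] + w_sex * x[1] + b

def loss(params):
    # Mean squared difference between predictions and known outcomes
    return sum((predict(params, x) - y) ** 2 for x, y in data) / len(data)

initial_loss = loss(params)
learning_rate = 0.1
for _ in range(500):
    # Gradient of the loss with respect to each parameter
    grads = [0.0, 0.0, 0.0]
    for x, y in data:
        error = predict(params, x) - y
        grads[0] += 2 * error * x[0] / len(data)
        grads[1] += 2 * error * x[1] / len(data)
        grads[2] += 2 * error / len(data)
    # Nudge every parameter a little in the direction that lowers the loss
    params = [p - learning_rate * g for p, g in zip(params, grads)]

final_loss = loss(params)
print(f"loss before: {initial_loss:.3f}, after: {final_loss:.3f}")
```

The parameters start out as random noise, and each pass over the data moves them towards values that fit the known outcomes better.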
fastai
Next, we’re going to look at a more realistic neural network that can work in practice for the Titanic problem. We’re going to use the fastai library.
fastai is great for quick prototyping: it comes with common architectures and preprocessing methods already built-in. It will help us to quickly put together the right type of neural network without getting too bogged down in the details.
Behind the scenes, fastai uses the popular PyTorch library. So overall our tech stack will look like this:
- 🤖 fastai: high-level deep learning library providing ready-to-use neural network architectures
- 🔥 PyTorch: low-level deep learning library. It provides an optimized framework for defining and running neural networks
- 💻 Hardware: the actual CPU and GPU on the computer
Tabular learner
fastai comes with ready-to-go neural network architectures for common classes of problems. One of these classes is for making predictions based on tabular data, which is exactly what we need for the Titanic problem.
To use the Tabular learner from fastai, we need to feed it the passenger data. The framework can handle two types of features:
- continuous: where the value of the feature is a number on some numeric scale. For example: age, ticket price
- categorical: where the values represent some abstract categories that are not part of a numeric scale. For example: port of embarkation, ticket class, sex
There’s no obvious way of handling opaque features such as ticket number, passenger name or cabin number, so we’re going to simply ignore them. Let’s see some code! The train.csv data file comes from Kaggle.
import pandas as pd
df = pd.read_csv('../input/titanic/train.csv')
# Drop the opaque features we're going to ignore
df = df.drop(['Name', 'Cabin', 'Ticket'], axis=1)
df.columns
This produces the list of the remaining features: ['PassengerId', 'Survived', 'Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked']. Let’s configure the tabular learner, indicating which features are continuous, which are categorical, and which variable we want to learn to predict.
from fastai.tabular.all import *
splits = RandomSplitter(valid_pct=0.2)(range_of(df))
to = TabularPandas(df, procs=[Categorify, FillMissing, Normalize],
                   cat_names=['Pclass', 'Sex', 'Embarked'],
                   cont_names=['Age', 'Fare', 'SibSp', 'Parch'],
                   y_names='Survived',
                   y_block=CategoryBlock,
                   splits=splits)
fastai takes care of preprocessing:
- FillMissing: replaces missing values of continuous features with a typical value for that feature (the median by default). This way we don’t have to discard an entire passenger if we’re missing an entry for one of their features
- Normalize: rescales continuous variables so that each has a mean of 0 and a standard deviation of 1. This helps the neural network train better (bigger numbers tend to grow too much when they’re multiplied together)
- Categorify: maps each category to an integer ID, which the network then turns into an embedding; more on this below
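To get a feel for what these three steps do, here is a rough sketch in plain Python. It is not fastai’s implementation: it assumes median filling and standardization to zero mean and unit variance, which is my understanding of the fastai defaults, and the mini data set is invented.

```python
from statistics import mean, median, pstdev

# Hypothetical mini data set; None marks a missing age
ages = [22.0, 38.0, None, 35.0]
ports = ["S", "C", "Q", "S"]

# FillMissing (sketch): replace missing continuous values with the median
known = [a for a in ages if a is not None]
fill_value = median(known)
age_missing = [a is None for a in ages]  # remember which entries were filled in
filled = [fill_value if a is None else a for a in ages]

# Normalize (sketch): subtract the mean and divide by the standard deviation
mu, sigma = mean(filled), pstdev(filled)
normalized = [(a - mu) / sigma for a in filled]

# Categorify (sketch): map each category to an integer ID, keeping 0 for unknown values
vocab = {category: i + 1 for i, category in enumerate(sorted(set(ports)))}
port_ids = [vocab.get(p, 0) for p in ports]

print(filled)      # [22.0, 38.0, 35.0, 35.0]
print(port_ids)    # [3, 1, 2, 3]
```

After these steps every feature is a plain number, which is the only kind of input a neural network can work with.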
Full neural network
We’re now ready for the main attraction: let’s see the neural network that fastai sets up to handle our data set. Note that the network architecture we see below is not a fixed template: parts of it are defined based on the shape of the data we configured above.
dls = to.dataloaders(bs=64)
learn = tabular_learner(dls, metrics=accuracy)
print(learn.model)
This prints out a detailed description of the underlying PyTorch neural network that fastai set up:
TabularModel(
  (embeds): ModuleList(
    (0): Embedding(4, 3)
    (1): Embedding(3, 3)
    (2): Embedding(4, 3)
    (3): Embedding(3, 3)
  )
  (emb_drop): Dropout(p=0.0, inplace=False)
  (bn_cont): BatchNorm1d(4, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (layers): Sequential(
    (0): LinBnDrop(
      (0): Linear(in_features=16, out_features=200, bias=False)
      (1): ReLU(inplace=True)
      (2): BatchNorm1d(200, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): LinBnDrop(
      (0): Linear(in_features=200, out_features=100, bias=False)
      (1): ReLU(inplace=True)
      (2): BatchNorm1d(100, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (2): LinBnDrop(
      (0): Linear(in_features=100, out_features=2, bias=True)
    )
  )
)
This may look a bit intimidating. Let’s look at the main pieces:
The four “embedding” modules. These are needed for each “categorical” variable in our input: sex, point of embarkation, class, etc. (We only listed three categorical features; the fourth embedding presumably belongs to the extra “age was missing” indicator column that FillMissing adds.) Embeddings represent each category as an abstract vector of numbers, allowing the model to learn hidden relations between category elements. For example, if people who embarked in Southampton had similar survival outcomes to those who embarked in Cherbourg, but different from those in Queenstown, the model will be able to learn that.
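Conceptually, an embedding is just a lookup table with one row of numbers per category. The sketch below illustrates the idea in plain Python; the category order and the random starting values are assumptions for illustration, not what fastai actually produced.

```python
import random

random.seed(0)

# Hypothetical category list for the port of embarkation, with a slot for missing values
ports = ["#na#", "C", "Q", "S"]

# One row of 3 numbers per category, matching the shape of Embedding(4, 3)
table = [[random.uniform(-1, 1) for _ in range(3)] for _ in ports]

def embed(port):
    # Look up the vector that currently represents this category
    return table[ports.index(port)]

print(embed("S"))
```

The rows start out random and are adjusted during training like any other parameter, so categories with similar outcomes can end up with similar vectors.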
Linear modules. Linear(in_features=16, out_features=200) is the core of the network. Here the 16 input attributes of each passenger are connected to 200 artificial neurons: mathematical formulas that will try to learn relations between data points and their survival outcomes. We also have a second layer of these, connecting 200 neurons in layer 1 with 100 neurons in layer 2.
Output. At the end, the last linear layer connects the 100 neurons in layer 2 to just 2 output features, corresponding to two survival outcomes: survived or perished.
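We can check where the number 16 comes from, and get a rough idea of the size of this network, with a bit of arithmetic. The shapes below are copied from the printout; the parameter count is a sketch that deliberately leaves out the batch normalization layers.

```python
# Embedding sizes as (number of categories, vector length), copied from the printout
embeddings = [(4, 3), (3, 3), (4, 3), (3, 3)]
n_continuous = 4  # Age, Fare, SibSp, Parch

# Each categorical value becomes a 3-number vector; continuous values are passed as they are
n_inputs = sum(dim for _, dim in embeddings) + n_continuous
print(n_inputs)  # 16

# Linear layers as (in_features, out_features, has_bias), copied from the printout
linear_layers = [(16, 200, False), (200, 100, False), (100, 2, True)]
linear_params = sum(i * o + (o if bias else 0) for i, o, bias in linear_layers)
embedding_params = sum(n * dim for n, dim in embeddings)
print(linear_params + embedding_params)  # 23444
```

So even this small tabular model has over twenty thousand parameters, compared to 12 in our toy example.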
With the neural network in place, we need just one more line of code to train it on the training data:
learn.fit_one_cycle(20)
Results
When submitted on Kaggle, the fastai neural network solution reaches an accuracy of 78% out of the box.
This is much better than the baseline 62% for a solution that simply predicts that everyone dies and a bit better than the 76% I got a few months back when experimenting with decision forests. Not bad!
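The “everyone dies” baseline is simply the share of the most common outcome. A minimal sketch of that calculation, using a made-up list of labels rather than the real Kaggle data:

```python
def majority_baseline_accuracy(labels):
    # Always predict the most common label and measure how often that is right
    most_common = max(set(labels), key=labels.count)
    return labels.count(most_common) / len(labels)

# Hypothetical labels (0 = died, 1 = survived)
labels = [0, 0, 0, 1, 1, 0, 1, 0]
print(majority_baseline_accuracy(labels))  # 0.625
```

Any model worth keeping should beat this number, since the baseline needs no passenger data at all.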
Conclusion
Neural networks have a fancy name, but in fact it’s “just” multiplying numbers and adding them together. With a high-level framework like fastai, it’s quite easy to build and train a neural network for a specific problem.