In the summer of 2012 I was spending a lot of time looking at photos of dishwasher capsules. I was working at a cool Polish startup developing factory inspection systems (using cameras to find broken products on the assembly line).
I loved working in this field: we were developing specialized algorithms for edge detection and shape analysis, and it was a lot of fun to see them applied to real-life problems and deployed in factories.
Good old-fashioned AI
I couldn’t have known it in 2012, but those were the last moments of the era of “good old-fashioned” Artificial Intelligence. We were writing specialized algorithms that taught the computer to perform steps that could conceivably be explained to a human.
Working in my little AI corner of shape-checking dishwasher capsules, I used different techniques than my colleagues who worked on reading barcodes. At other companies and universities around the world, researchers were developing similarly specialized techniques for other tasks: natural language understanding, speech synthesis, weather forecasting, etc.
“Good old-fashioned” Artificial Intelligence was fragmented: advances in one field rarely benefited the others.
That same year (2012), a paper came out that heralded the end of “good old-fashioned AI”. A team at the University of Toronto demonstrated a neural network that beat state-of-the-art conventional methods at classifying images in the ImageNet contest.
Neural networks are an example of what is called “statistical machine learning”. Rather than developing a step-by-step algorithm that teaches the computer how to perform a task (as I was doing with shape analysis of dishwasher capsules), we design a system that can learn the algorithm from example data.
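To make the contrast concrete, here is a minimal sketch in Python. It is a toy, not the method from the 2012 paper: the two features, the labels, and the data points are made up for illustration, and all function names are hypothetical. The first function is a rule a programmer writes by hand; the second trains a perceptron (a single artificial neuron, the simplest ancestor of today’s neural networks) that finds a comparable rule on its own from labeled examples.

```python
# Toy data (hypothetical): each sample has two made-up features scaled to 0..1,
# labeled 1 ("defective") or 0 ("ok").
SAMPLES = [(0.0, 0.0), (0.2, 0.3), (0.4, 0.1), (1.0, 1.0), (0.9, 0.8), (0.7, 0.9)]
LABELS = [0, 0, 0, 1, 1, 1]


def handwritten_rule(x, y):
    """'Good old-fashioned' approach: a human picks the decision rule."""
    return 1 if x + y > 1.0 else 0


def train_perceptron(samples, labels, epochs=20, lr=0.1):
    """Statistical approach: learn weights and a bias from labeled examples."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x, y), target in zip(samples, labels):
            prediction = 1 if w[0] * x + w[1] * y + b > 0 else 0
            error = target - prediction  # 0 if correct, +1 or -1 if wrong
            # Nudge the weights and bias toward the correct answer.
            w[0] += lr * error * x
            w[1] += lr * error * y
            b += lr * error
    return w, b


def predict(w, b, x, y):
    """Apply the learned linear rule to a new sample."""
    return 1 if w[0] * x + w[1] * y + b > 0 else 0


w, b = train_perceptron(SAMPLES, LABELS)
for x, y in [(0.1, 0.1), (0.95, 0.95)]:
    print((x, y), "hand-written:", handwritten_rule(x, y), "learned:", predict(w, b, x, y))
```

Both rules agree on the two new points; the difference is that nobody told the perceptron where the boundary between “ok” and “defective” lies. It worked that out from the six examples.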
This conceptually matches how we humans learn, especially as children: a child sees a few examples of cats and dogs and pretty quickly starts telling them apart. Children learn language from the bits and pieces of speech they’re exposed to.
In 2012, neither neural networks nor statistical machine learning in general were new. What was new was their successful, high-profile application.
It turns out that, in addition to careful design, neural networks benefit from scale and from the computational power available to train them. The University of Toronto paper makes a notable remark: “To make training faster, we used (..) a very efficient GPU implementation of the convolution operation.”
A Graphics Processing Unit (GPU) is a type of computer processor that differs from a traditional CPU:
- CPUs (Central Processing Units) can handle complex operations (often supporting hundreds of different instruction types), but can run only a limited number of them in parallel
- GPUs (Graphics Processing Units) perform simpler operations, but can execute many more of them at the same time
GPUs were originally created to render computer graphics. The University of Toronto team repurposed the GPU so that, instead of computing pixel colors, it trained a neural network. This implementation was highly efficient (in terms of speed and electricity use), allowing the team to train a much bigger model than a CPU implementation would have allowed on their budget.
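Why does convolution map so well onto a GPU? Here is a minimal sketch in plain Python (no GPU involved; the function name `conv2d_valid` and the tiny image are hypothetical) that shows the key property: every output pixel is computed from its own small window of the input, independently of all other output pixels. A CPU works through these pixels a few at a time, while a GPU can hand each one to a separate thread. As in most neural-network libraries, this “convolution” does not flip the kernel (strictly speaking, it is cross-correlation).

```python
def conv2d_valid(image, kernel):
    """Naive 2D convolution ('valid' mode: no padding, no kernel flip)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1

    def output_pixel(i, j):
        # Reads only the input window starting at (i, j). No output pixel
        # depends on another output pixel, so all of them could run at once.
        return sum(image[i + a][j + b] * kernel[a][b]
                   for a in range(kh) for b in range(kw))

    # Here the pixels are computed one after another; a GPU implementation
    # would instead assign each (i, j) to its own thread.
    return [[output_pixel(i, j) for j in range(out_w)] for i in range(out_h)]


# Hypothetical 3x4 image: dark on the left, bright on the right.
IMAGE = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# A horizontal-difference kernel that responds to vertical edges.
KERNEL = [[-1, 1]]

print(conv2d_valid(IMAGE, KERNEL))
```

The output marks the column where brightness jumps, the same kind of edge detection mentioned at the start of this post, except that in a neural network the kernel values are learned rather than written by hand.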
The big convergence 🚀
Neural networks drove the big convergence of the Artificial Intelligence field. While in the past each field of AI (image classification, voice recognition, industrial quality inspection, etc.) had its own specialized tools, today we use neural networks in all of them.
This is important not just as proof of the versatility and viability of neural networks. It also explains the rapid acceleration of AI development in the last decade: for the first time, the entire field shares similar methods and a common technology stack, so a breakthrough made for one application can now benefit many others.
The headline-grabbing feats of ChatGPT and Midjourney are just the latest examples. Before that, we had Google Translate nearing human-quality translation, AlphaGo defeating Lee Sedol, self-driving cars, ever-improving spam filters, etc. All of these achievements rely on neural networks trained on GPU-style processors.
It’s been 10 years since I left that cool Polish startup where I worked on image recognition. Checking their website today (the startup had been acquired and rebranded in the meantime 💫), I’m not surprised that the product now features capabilities based on neural networks. AI has converged.
After converging on:
- ✅ the method (neural networks), and
- ✅ the computing architecture (large multi-dimensional arrays called tensors, processed on GPU-style processors)
… the final step would be to converge on a single artificial intelligence model capable of solving a variety of complex problems, instead of training different models for different applications.
This is largely what the excitement about Large Language Models, such as the ones behind ChatGPT and Bard, is about. The versatility of those tools is dazzling and gives us a glimpse of what a future general artificial intelligence could look like. More on this in a future post :).
- 📝 ImageNet Classification with Deep Convolutional Neural Networks – the 2012 paper describing the deep neural network that won the ImageNet contest
- 📝 Image Analysis Techniques for Industrial Inspection Systems – a book on industrial inspection I wrote in 2012