We invited ChatGPT for a “job interview” to see if it could take over writing for this blog. Like the protagonist of the 2014 Sci-Fi movie 🎞️ Ex Machina, we set out to test the capabilities and the limits of an AI model in a friendly conversation. What could possibly go wrong? :)
The friendly robot responds eagerly:
We present prompts and responses in text form for readability. For illustration, this is what it looks like in the ChatGPT interface:
In the “interview”, we will ask ChatGPT to re-create three blog articles already posted on the site, to see if the AI model could have written them in the first place:
- 📝 The mythical man-month: takeaways from the great classic - a book summary
- 📝 Hash Code: 10 years of coding & pizza – history of a programming event
- 📝 Working from Freetown, Sierra Leone - trip report
Let’s start off easy! 📝 The mythical man-month is a frequently referenced, classic book on software engineering. ChatGPT clearly absorbed a lot of the material about the book, because the summary it produces is concise and accurate.
Verdict: pretty good! The response is well-structured and covers the most important ideas of the book. The single-sentence summary of the “Second-system effect” is right on :).
On the other hand, the style is dry and some of the detail is lost. In particular, the point of the “Surgical team” is not to assemble a small team of experts, but to have most of the coding work done by one person (the “surgeon”), while the rest of the team is in a supporting role.
Hash Code competition history
Let’s try a more arcane topic. 📝 Hash Code: 10 years of coding & pizza is a detailed post on the history of Hash Code, a coding competition I co-created that ran from 2014 to 2022.
Verdict: doesn’t seem right. This is very convincing prose and uses appropriate vocabulary, but almost none of the information is correct. The competition indeed started in 2014, but it was initially a local event for 200 participants. We didn’t have an online round, and the event took place in Paris, not in Ireland.
Finally, let’s try a “trip report”-style article: 📝 Working from Freetown, Sierra Leone.
Verdict: OK, but… I love the structure of the response, and it sounds pretty convincing. The technical details about Internet speeds and reliability don’t match my experience, though: for me the connection was reliable and speeds were an order of magnitude better, on two different networks.
Where does the discrepancy come from? Admittedly, I only had one data point: maybe ChatGPT absorbed more data points and presented a more representative summary? But I think it’s more likely that the response reflects what people tend to write about remote work from travel destinations in general, and is not representative of Freetown.
Who is testing whom?
Three articles in, we get a rough picture of what ChatGPT (version Mar23) can and cannot do. Large language models combine the role of a language model with that of an information store. They are much better at the former than at the latter: they’ll happily hallucinate convincing responses, regardless of whether the model actually holds the relevant correct information.
⚠️ In a good Sci-Fi movie, we’d expect some sort of plot twist at the end. Here it is: setting out on this exercise, we thought it would demonstrate the capabilities and the limits of LLMs. But knowing how these models work (with their information based on what’s available on the Internet), the results tell us more about the originality of the blog posts than about what ChatGPT can do.
Seen this way, here’s a summary of the experiment’s results:
- 📝 The mythical man-month: takeaways from the great classic - 👎 very unoriginal idea, the LLM can write an almost perfect post about it
- 📝 Hash Code: 10 years of coding & pizza – 👍 much better! The LLM cannot write about this so we may as well do it
- 📝 Working from Freetown, Sierra Leone - 👍 this works too! The LLM response was OK but not completely convincing, and hopefully readers place more trust in a report from someone who actually tried it :)
My new aspiration for this blog is to write articles that couldn’t have been written by ChatGPT :).
Let’s let our friendly robot have the last word!