DeepMind today unveiled a new multi-modal AI system capable of performing more than 600 different tasks.
Dubbed Gato, it’s arguably the most impressive all-in-one machine learning kit the world has seen yet.
According to a DeepMind blog post:
The agent, which we refer to as Gato, works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens.
And while it remains to be seen exactly how well it’ll do once researchers and users outside the DeepMind labs get their hands on it, Gato appears to be everything GPT-3 wishes it could be and more.
Here’s why that makes me sad: GPT-3 is a large language model (LLM) produced by OpenAI, the world’s best-funded artificial general intelligence (AGI) company.
Before we can compare GPT-3 and Gato, however, we need to understand where both OpenAI and DeepMind are coming from as businesses.
OpenAI is Elon Musk’s brainchild. It has billions in support from Microsoft, and the US government basically couldn’t care less what it’s doing when it comes to regulation and oversight.
Keeping in mind that OpenAI’s sole purpose is to develop and control an AGI (that’s an AI capable of doing and learning anything a human could, given the same access), it’s a bit scary that all the company has managed to produce is a really fancy LLM.
Don’t get me wrong, GPT-3 is impressive. In fact, it’s arguably just as impressive as DeepMind’s Gato, but that assessment requires some nuance.
OpenAI’s gone the LLM route on its path to AGI for a simple reason: nobody knows how to make AGI work.
Just as it took some time to get from the discovery of fire to the invention of the internal combustion engine, figuring out how to go from deep learning to AGI won’t happen overnight.
GPT-3 is an example of an AI that can at least do something that appears human: it generates text.
What DeepMind’s done with Gato is, well, pretty much the same thing. It’s taken something that works a lot like an LLM and turned it into an illusionist capable of more than 600 forms of prestidigitation.
As Mike Cook, of the Knives and Paintbrushes research collective, recently told TechCrunch’s Kyle Wiggers:
It sounds exciting that the AI is able to do all of these tasks that sound very different, because to us it sounds like writing text is very different to controlling a robot.
But in reality this isn’t all too different from GPT-3 understanding the difference between ordinary English text and Python code.
This isn’t to say this is easy, but to the outside observer this might sound like the AI can also make a cup of tea or easily learn another ten or fifty other tasks, and it can’t do that.
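Cook’s point is easier to see with a toy sketch. The Python below is purely illustrative and is not DeepMind’s tokenizer: the vocabulary sizes, offsets, button list, and function names are all made up for this example. It only shows the general idea that text, joint torques, and button presses can be flattened into one shared stream of integer tokens, at which point a sequence model treats them much as GPT-3 treats English and Python.

```python
# Toy illustration only. Every constant and function name here is
# hypothetical; none of it is taken from Gato's actual implementation.

TEXT_VOCAB = 256   # one token per byte of UTF-8 text
NUM_BINS = 1024    # number of bins for continuous values such as torques
BUTTONS = ["NOOP", "FIRE", "UP", "DOWN", "LEFT", "RIGHT"]

# Each modality gets its own non-overlapping slice of one shared vocabulary.
TEXT_OFFSET = 0
CONT_OFFSET = TEXT_OFFSET + TEXT_VOCAB    # 256
BUTTON_OFFSET = CONT_OFFSET + NUM_BINS    # 1280


def encode_text(s):
    """Map text to tokens, one per UTF-8 byte."""
    return [TEXT_OFFSET + b for b in s.encode("utf-8")]


def encode_continuous(values, low=-1.0, high=1.0):
    """Clip each value to [low, high] and map it to one of NUM_BINS bins."""
    tokens = []
    for v in values:
        clipped = min(max(v, low), high)
        bin_index = int((clipped - low) / (high - low) * (NUM_BINS - 1))
        tokens.append(CONT_OFFSET + bin_index)
    return tokens


def encode_buttons(names):
    """Map button names to tokens in the button slice."""
    return [BUTTON_OFFSET + BUTTONS.index(n) for n in names]


def modality_of(token):
    """Recover which modality a token belongs to from its range."""
    if token < CONT_OFFSET:
        return "text"
    if token < BUTTON_OFFSET:
        return "continuous"
    return "button"


# One flat sequence mixing all three kinds of data.
sequence = (
    encode_text("hi")
    + encode_continuous([-1.0, 0.0, 1.0])
    + encode_buttons(["FIRE"])
)
print(sequence)
print([modality_of(t) for t in sequence])
```

Once everything is a token, "deciding whether to output text, joint torques, or button presses" comes down to predicting which range the next token falls in. That is clever engineering, but it is a long way from learning to make a cup of tea.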
Basically, Gato and GPT-3 are both robust AI systems, but neither of them is capable of general intelligence.
Here’s my problem: Unless you’re gambling on AGI emerging as the result of some random stroke of luck (the movie Short Circuit comes to mind), it’s probably time for everyone to reassess their timelines on AGI.
I wouldn’t say “never,” because that’s one of science’s only cursed words. But this does make it seem like AGI won’t be happening in our lifetimes.
DeepMind’s been working on AGI for over a decade, and OpenAI since 2015. And neither has been able to address the very first problem on the way to solving AGI: building an AI that can learn new things without training.
I believe Gato could be the world’s most advanced multi-modal AI system. But I also think DeepMind’s taken the same dead-end-for-AGI concept that OpenAI has and merely made it more marketable.
Final thoughts: What DeepMind’s done is remarkable and will probably pan out to make the company a lot of money.
If I’m the CEO of Alphabet (DeepMind’s parent company), I’m either spinning Gato out as a pure product, or I’m pushing DeepMind into more development than research.
Gato has the potential to be more lucrative on the consumer market than Alexa, Siri, or Google Assistant (with the right marketing and applicable use cases).
But Gato and GPT-3 are no more viable entry points for AGI than the above-mentioned virtual assistants.
Gato’s ability to perform multiple tasks is more like a video game console that can store 600 different games than a game you can play 600 different ways. It’s not a general AI; it’s a bunch of pre-trained, narrow models bundled neatly.
That’s not a bad thing, if that’s what you’re looking for. But there’s simply nothing in Gato’s accompanying research paper to indicate this is even a glance in the right direction for AGI, much less a stepping stone.
At some point, the goodwill and capital that companies such as DeepMind and OpenAI have generated through their steely-eyed insistence that AGI was just around the corner will have to show even the tiniest of dividends.