Disclaimer: This is not a jab at the authors of these (for the most part) excellent papers; I understand the incentives for authors to make their papers reach wider audiences. This blog tries to raise awareness about the importance of the underlying hardware required to run modern LLMs.
I recently rediscovered a three-year-old diagram created by [1] that, not without a sense of humour, depicts the overuse of the moniker "X is all you need" in research paper titles.
For those in the know, you know. But for those who don't, a little context: ever since the publication of Google's seminal paper on transformers, Attention Is All You Need [3], we've witnessed a string of hundreds of papers borrowing the moniker "X is all you need" to achieve notoriety. I get it; even researchers and scientists have fallen prey to clickbait trends.
Someone even made a program that pulls arXiv titles containing those words daily [2]. I couldn't resist modifying it to generate a word cloud, so we know what it is that we really need as of 15 October 2024:
Attention, data, learning, graph, transformer... that's a lot of things to need.
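For the curious, here is a minimal sketch of how such a word cloud can be built. This is not the tool from [2], just an illustration that assumes the `requests`, `feedparser` and `wordcloud` Python packages are installed:

```python
import requests
import feedparser
from wordcloud import WordCloud

# Query the public arXiv API for titles containing the phrase.
params = {
    "search_query": 'ti:"all you need"',
    "start": 0,
    "max_results": 200,
}
response = requests.get("http://export.arxiv.org/api/query", params=params)
feed = feedparser.parse(response.text)

# Strip the meme itself so only the "X" part feeds the cloud.
text = " ".join(
    entry.title.lower().replace("is all you need", "")
    for entry in feed.entries
)

# Render and save the word cloud.
WordCloud(width=800, height=400, background_color="white") \
    .generate(text) \
    .to_file("all_you_need_cloud.png")
```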
What is it that you really need?
Jokes aside, I find it curious that none of these words names the one thing that any LLM, any model architecture and, for that matter, any computer scientist really needs: hardware to run their code on. Though the point may seem pedantic, it is key to understanding current trends in AI. Models are growing [6] at a faster pace than GPUs are improving their capacity [5], which pushes hardware requirements through the roof. We now routinely see models with hundreds of billions of parameters and training runs that require tens of thousands of GPUs [4].
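To get a feel for the gap, a quick back-of-envelope calculation helps. The numbers below are illustrative assumptions (a ~405B-parameter model, 80 GB accelerators), not measurements of any particular system:

```python
# Back-of-envelope sketch: can a frontier-scale model even fit on one GPU?
# Illustrative assumptions, not a rigorous capacity model.

params = 405e9        # assume a ~405B-parameter model
gpu_memory_gb = 80    # one H100/A100-class accelerator

# Inference: fp16/bf16 weights take 2 bytes per parameter.
weights_gb = params * 2 / 1e9
print(f"fp16 weights: {weights_gb:,.0f} GB "
      f"(~{weights_gb / gpu_memory_gb:.0f} GPUs just to hold them)")

# Training: mixed-precision Adam keeps fp16 weights and gradients plus
# fp32 master weights, momentum and variance -- roughly 16 bytes per
# parameter, before activation memory is even counted.
train_gb = params * 16 / 1e9
print(f"Training state: {train_gb:,.0f} GB "
      f"(~{train_gb / gpu_memory_gb:.0f} GPUs before activations)")
```

And that is memory alone; achieving useful training throughput on top of it is what drives cluster sizes into the tens of thousands of GPUs.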
This often-overlooked requirement is what separates those who can do AI from those who cannot. Thus, we conclude that hardware is, most definitely, all you need.
References
[1] https://github.com/vinayprabhu/X-is-all-you-need
[2] https://github.com/KentoNishi/awesome-all-you-need-papers
[3] https://arxiv.org/pdf/1706.03762
[5] https://epochai.org/blog/trends-in-machine-learning-hardware
[6] https://epochai.org/blog/training-compute-of-frontier-ai-models-grows-by-4-5x-per-year
Kalavai for accessible distributed computing
We believe distributed computing is not only the present of AI but also the future of computing in general. We want to pave the way to make sure everyone gets access to effective compute, the key resource for AI.
Try our free, open-source platform now.