The architecture itself is three years old: https://arxiv.org/abs/1706.03762. It is no exaggeration to say that GPT-3's architecture amounts to "take that 2017 paper and make three numbers (width, number of layers, number of heads) much bigger." That there has been no architectural improvement in three years is quite telling.
In the paper itself, the authors say clearly that they are close to the fundamental limits of what it is feasible to train with an architecture like this. GPT-3 is not a starting point; it is an end-point.
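A rough sanity check of the "just make three numbers bigger" claim, using the commonly cited hyperparameters (the 2017 paper's base model: 6 layers, d_model 512; GPT-3: 96 layers, d_model 12288) and a standard back-of-envelope formula that counts only the ~12·d² weights per transformer layer, ignoring embeddings, biases, and layer norms:

```python
def approx_params(n_layers: int, d_model: int) -> int:
    """Crude transformer parameter estimate: each layer carries
    ~4*d^2 attention weights + ~8*d^2 feed-forward weights = 12*d^2.
    The number of heads only changes how d_model is split, so it
    does not appear in the count."""
    return 12 * n_layers * d_model ** 2

# 2017 "Attention Is All You Need" base model vs. GPT-3 as reported.
transformer_2017 = approx_params(n_layers=6, d_model=512)
gpt3 = approx_params(n_layers=96, d_model=12288)

print(f"2017 Transformer base: ~{transformer_2017 / 1e6:.0f}M weights")
print(f"GPT-3:                 ~{gpt3 / 1e9:.0f}B weights")
```

The estimate for GPT-3 lands around 174B, close to the reported 175B: essentially all of the growth comes from scaling those few numbers.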
In 1991 a group of countercultural visionaries built an enormous replica of Earth's ecosystem called Biosphere 2. Their epic adventure is a cautionary tale, but also a testament to the power of small groups reimagining the world.