The Monthly Blip: June 2023
SEMICONDUCTOR PICKS AND SHOVELS
It’s of strategic importance for countries to have access to cutting-edge semiconductors. Semiconductors and the software that runs on them are the picks and shovels of the modern economy.
This technology is increasingly dominated by Asia, particularly Taiwan. It’s of strategic interest for the rest of the world to stop falling behind and build leading semiconductor companies.
Resource Allocation By Nations
Nations are incentivising semiconductor production with billions. Meanwhile, the market is valuing the technology leaders (Nvidia, Taiwan Semiconductor) in trillions.
It is very expensive to compete at this point of the technology curve. I question whether these government subsidies are achieving much.
To build a microchip company… you don’t.
For private enterprise, it’s largely seen as a losing battle to build a competing semiconductor company. It’s akin to taking on Google in search by taking the same indexing approach (as opposed to a radically new approach). The market does not reward those who copy the leader. At the very least, the reward is not big enough to justify the investment of billions required to catch up.
For entrepreneurs, there is the tempting question of whether to build a regional leader in semiconductors (or in large language models, nuclear, biotechnology or any other cutting-edge field). However, that makes both the business and the entrepreneur dependent on protectionism from their own nation(s).
If you do want to start a semiconductor company, perhaps the answer is that you don’t. There are two ways of not starting one:
You find a way to do computation that is not with silicon.
You realise that to build semiconductors, you first need to build the software for using the semiconductors. This is my takeaway from listening to a number of videos now with George Hotz of Tiny Corp.
In short, even if you had a chip competitive with Nvidia’s, nobody could use it without software to run on it. That software is not trivial. Therefore:
Step 1 is to write a software stack that is performant on Nvidia.
Step 2 is to build a better chip than Nvidia.
LARGE AND SMALL LANGUAGE MODELS AND GUITARS
To tune a guitar you have six strings to get right. To tune OpenAI’s best language model you have to dial in over 200 billion knobs.
That dialling-in is called “training”. During training, you might feed one billion words into the model, and every word involves a calculation for every knob. So, hundreds of billions of billions (100,000,000,000,000,000,000s) of calculations are required for training.
Once tuned, you can play the guitar. Once trained, you can get predictions from a language model, which is called inference. Inference takes lots of calculations (maybe think of it as 200,000,000,000 calculations per word produced by a 200 billion parameter model). That’s a lot of calculation but also a lot less than training.
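The arithmetic above can be checked in a few lines. This is only a back-of-the-envelope sketch, assuming (as in the text) roughly one calculation per parameter per word. The constant and function names are made up for illustration, and real training runs involve more words and more calculations per parameter than this.

```python
# Back-of-the-envelope count, assuming one calculation per knob per word.
PARAMETERS = 200_000_000_000      # knobs in a 200 billion parameter model
TRAINING_WORDS = 1_000_000_000    # words fed in during training (illustrative)


def inference_calculations(words_out: int) -> int:
    """Calculations needed to produce `words_out` words of output."""
    return PARAMETERS * words_out


def training_calculations() -> int:
    """Calculations for one pass over the training words."""
    return PARAMETERS * TRAINING_WORDS


if __name__ == "__main__":
    print(f"per word of output: {inference_calculations(1):.1e}")
    print(f"100-word answer:    {inference_calculations(100):.1e}")
    print(f"training:           {training_calculations():.1e}")
```

Under these assumptions, training costs about as much as generating a billion words of output, which is why training needs huge computers while inference can squeeze onto much smaller ones.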
Today, you can only train good models on huge computers. However, you can just about get answers (do inference) on an Apple Mac.
Here’s a rough breakdown:
The best model today is GPT-4 (from OpenAI). It is thought to have 200+ billion parameters (OpenAI has not published the figure). You train it on hundreds of large computers. You run it on a few large computers.
Llama (from Meta) is not as good but still pretty good. Llama comes in a few sizes. You can get a 65 billion parameter model that is pretty good. Then there are slimmed-down ones of 7B and 13B parameters. On those you can run inference (i.e. get answers) on a Mac – but only using conversion software like ggml (by Gerganov). The 7B and 13B models make a lot of mistakes, though, and they are slow to generate answers on a Mac (although it’s getting better).
So one vision – and I think it will come true – is that we run language models on our laptops (not the training, but the getting answers). This isn’t great right now, but should be good enough soon. The benefit is that data doesn’t have to leave your device. This is the ggml (Gerganov) vision: simplify some of the big models so they run on a Mac.
You can take an open source model like Llama (well, commercial limits aside) and run it on an Amazon or Microsoft server. Then, at least your data is all on a server you control(ish).
Another approach is to buy your own server – like what Tiny Corp are offering – a $15,000 server with 100 GB of RAM that you can plug into a wall outlet. You could run Llama on that and not be restricted to the smaller models. Most likely, though, companies will use cloud servers instead.
The very best models (like OpenAI’s) are closed – i.e. they don’t tell you the tuning of the guitar strings. You can get OpenAI or Microsoft to run a dedicated server for you, but they need enough control to make sure you don’t steal the tunings (called weights in a language model).
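To see why the smaller Llama models fit on a laptop while the larger ones want a server, it helps to estimate the memory that the weights alone take up. Here is a rough sketch, assuming 2 bytes per parameter at 16-bit precision and half a byte at 4-bit precision (the kind of shrinking that tools such as ggml can apply). The function name and the 16 GB laptop figure are assumptions for illustration, and real usage is higher because the model also needs working memory.

```python
def weight_memory_gb(parameters_billion: float, bits_per_parameter: int) -> float:
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes)."""
    bytes_total = parameters_billion * 1e9 * bits_per_parameter / 8
    return bytes_total / 1e9


LAPTOP_RAM_GB = 16    # assumed laptop memory, for illustration only
SERVER_RAM_GB = 100   # the server size mentioned in the text


def where_it_fits(gigabytes: float) -> str:
    """Smallest of the two example machines that can hold the weights."""
    if gigabytes < LAPTOP_RAM_GB:
        return "laptop"
    if gigabytes < SERVER_RAM_GB:
        return "100 GB server"
    return "something bigger"


if __name__ == "__main__":
    for size in (7, 13, 65):          # Llama sizes, in billions of parameters
        for bits in (16, 4):
            gb = weight_memory_gb(size, bits)
            print(f"{size}B at {bits}-bit: {gb:.1f} GB -> {where_it_fits(gb)}")
```

On these numbers a 7B model squeezes onto a laptop even at 16-bit, while the largest Llama needs the 4-bit treatment just to fit in 100 GB with room to spare.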
Zeros: Room for Improvement
Language model parameters are handled by computer software in grids (matrices). These matrices, after tuning, are mostly zeros (or very close to zero). It’s a bit like having a guitar and only using a few of the strings. Why? Well, that’s what the tuning process gives. It suggests there are better ways to build these models – much better and more efficient ways – but we haven’t found them yet.
Building projects using language models with the help of language models is recursive fun. I have built four and learned a lot, but I wonder where a startup can maintain an advantage:
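To make the idea concrete, here is a toy sketch that measures how much of a grid is zero and how much smaller it gets if you only store the entries that are not. The grid is made up for illustration and is not taken from any real model; the function names are hypothetical too.

```python
def zero_fraction(matrix: list[list[float]], tolerance: float = 0.0) -> float:
    """Fraction of entries whose absolute value is at most `tolerance`."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for value in row if abs(value) <= tolerance)
    return zeros / total


def to_sparse(matrix: list[list[float]], tolerance: float = 0.0) -> dict:
    """Keep only the non-zero entries, stored as {(row, column): value}."""
    return {
        (r, c): value
        for r, row in enumerate(matrix)
        for c, value in enumerate(row)
        if abs(value) > tolerance
    }


# Made-up 4x4 grid of weights: 13 of the 16 entries are zero.
TOY_WEIGHTS = [
    [0.0, 0.0, 0.7, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [-1.2, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.3],
]

if __name__ == "__main__":
    print(f"zero fraction: {zero_fraction(TOY_WEIGHTS):.0%}")
    print(f"entries kept:  {len(to_sparse(TOY_WEIGHTS))} of 16")
```

Storing three numbers instead of sixteen is the easy part. The hard part, and the open question, is building and training models that never need the other thirteen in the first place.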
Big tech companies have huge customer bases they can easily advertise new products to. And, developing the customer facing layer on top of language models is not that hard.
Startups will probably have to find very focused niches, such as patent search. But, even there, there is the challenge that the more general the language model, the better it is at specific tasks. In other words, the best model is not achieved solely by specialising, but rather by being general (training on code, plus books, plus patents). So, specialising (unless you already have existing customers in a space) is maybe not enough.
Here go the projects:
Summarise-Me.com – You can already ask ChatGPT to summarise, but the limit there is that you can only input about 3000 words into the free version. Summarise-Me uses a model by Anthropic and can take in up to 60,000 words.
Patent-Me.com – this recursively searches for patents, analyses the patents and generates a report. It’s hard to use a chat model recursively because it runs into errors too much. So, if you want to do something recursive, the scope has to be very narrow and you have to tweak the steps carefully to keep the model on track – which is a good application for patents.
Research Buddy (an early demo is available at Research.Trelis.com) – Combines some of the above features, but is focused on allowing chats to be saved, deleted and reanalysed. Also includes Bing search. The main differences from ChatGPT are that 1) you do everything by typing (e.g. ‘delete this file’, ‘give me the highlights of the file I uploaded about potatoes’), and 2) you can have long chats – up to 12,000 words – compared with about 3000 if using the standard ChatGPT interface.
That’s it for June, cheerio, Ronan