GPT-4 Launch Details
Yesterday, OpenAI released the GPT-4 Technical Report and gave a live developer demo showing the new model in action. Here is a short summary of the main launch details.
Multimodal
GPT-4 accepts both text and image inputs, but its output is text only.
New LLMs are increasingly moving in this direction. I had expected GPT-4 to be capable of image output as well as text, but it seems we will need to wait a bit longer for that.
The live developer demo showed some cool use cases for this, for example taking a photo of a hand-drawn UI wireframe and having GPT-4 generate the code for a simple application from it.
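As a rough illustration of how such a multimodal request might look, here is a minimal sketch using the OpenAI Python SDK. Note that image input was not generally available through the API at launch, so the model name, the mixed text-and-image message format, and the file name are assumptions for illustration rather than confirmed launch details.

```python
import base64
from openai import OpenAI  # assumes the OpenAI Python SDK (v1.x)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Load the hand-drawn wireframe photo and base64-encode it for the request.
with open("wireframe_sketch.png", "rb") as f:  # hypothetical file name
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# Send one user message that mixes a text part and an image part.
response = client.chat.completions.create(
    model="gpt-4",  # placeholder; image input was not exposed via the API at launch
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Turn this wireframe into a simple HTML/CSS page."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)  # the generated code, returned as plain text
```

If the call succeeds, the response is plain text containing the generated code, matching the text-only output described above.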
Longer Context
A big improvement over earlier models is that GPT-4 can now handle inputs of up to 25,000 words in a single context.
This makes it far more capable at summarising large documents and letting the user query information within them.
The live demo again showed a new use case: the presenter pasted a lengthy technical documentation page into GPT-4 and asked it to help resolve a bug in their code using the documentation provided.
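To make that long-context workflow more concrete, here is a minimal sketch of the same pattern: pasting a documentation page and the buggy code into one prompt and asking for a fix. It assumes the OpenAI Python SDK; the file names and the large-context model name are assumptions, not confirmed launch details.

```python
from openai import OpenAI  # assumes the OpenAI Python SDK (v1.x)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical local copies of the documentation page and the buggy code.
docs = open("api_documentation.txt", encoding="utf-8").read()
buggy_code = open("my_script.py", encoding="utf-8").read()

response = client.chat.completions.create(
    model="gpt-4-32k",  # assumed large-context variant; availability varied at launch
    messages=[
        {"role": "system", "content": "You are a helpful programming assistant."},
        {
            "role": "user",
            "content": (
                "Here is the library documentation:\n\n"
                f"{docs}\n\n"
                "And here is my code, which raises an error:\n\n"
                f"{buggy_code}\n\n"
                "Using the documentation above, explain the bug and suggest a fix."
            ),
        },
    ],
)

print(response.choices[0].message.content)
```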
Human-Level Performance on Some Standardised Tests
OpenAI evaluated the new model on a number of standardised tests, such as the bar exam.
Impressively, GPT-4 scored in the top 10% of test takers on the bar exam, whereas ChatGPT scored in the bottom 10%. Other tests show similar improvements over ChatGPT.
Interestingly, GPT-4 with image input gave even better results on some tests.
Architecture & Training Data
In a departure from previous AI papers, where it has been normal to describe model architecture in detail, OpenAI chose not to give any such details this time, citing the “competitive landscape” and safety concerns.
The only information given is that GPT-4 is a “Transformer-style model” pre-trained on publicly available data and data licensed from third parties, and that it was then fine-tuned using Reinforcement Learning from Human Feedback (RLHF), in the same way ChatGPT was aligned.
Unfortunately, I suspect this lack of architectural detail will become the new normal as AI competition continues to heat up.
Limitations
OpenAI states that GPT-4 is much better than ChatGPT at returning factual responses and blocking undesirable content, making it more aligned.
It still ‘hallucinates’ at times, though. Could the better accuracy actually be more dangerous? The model is still not 100% factual, and the concern is that users may trust it more precisely because it makes errors less frequently.
GPT-4 and Bing
On another note: as rumoured, Microsoft has confirmed on its blog that Bing has been running on an early preview version of GPT-4, so you may have already interacted with GPT-4 over the last few weeks.