How will AI transform the way we interact with software, the web and each other?
New Generative AI interfaces will replace many of the software applications and websites that we currently use. Could they also replace some of the interactions we have with other humans?
Large Language Models (LLMs) such as GPT have already started to transform the way we interact with software applications, websites and online services. This transformation is happening now, and over the next few months we will see software updates to many applications introducing new user experiences based upon conversational text or voice interfaces.
This is just the first stage of how Generative AI and Chatbots will change the way we interact. As the technology is adopted by the Operating Systems we use (such as Android and iOS), there will be a second shift where AI assistants will interact with software applications, websites and services on our behalf. This will be done via the APIs that many of these services already make available to developers.
This could take a few years to play out, but as these models improve further and the cost of running them falls, we will see millions of instances of powerful Generative AIs being instantiated and deployed.
At this point the transformation will still be in the early stages. The next stages of the transformation might change the way we interact with each other, with coworkers and specialists such as doctors and lawyers.
The three stages of AI interfaces:
Interacting with your apps using natural language conversations
AI Assistants that interact with apps and the web on your behalf
Interacting with human-level AI assistants & chatbots
Stage 1 | Happening Now
Interacting with your apps using natural language conversations
The impressive natural language capabilities of Generative AI powered by the latest Large Language Models (LLMs) have ushered in a powerful new way for us to interact with software and the web. In many use cases we will see graphical user interfaces (GUIs) replaced or improved by voice and text conversations. In some use cases we will also see co-pilots boosting user productivity.
AI Conversational interfaces for current websites and apps
With the hype that ChatGPT has created, we can assume that most software companies are currently building out chat or voice interfaces for their products and websites. The first iterations of these new interfaces are being released now, and over the next year many of the applications and websites we use will introduce a new way of interacting with them.
Most will use text-based chat, while many will also offer advanced voice conversations. As the response speed and performance of today’s LLMs improve, this interaction will feel more natural.
Most developers will leverage and build upon closed model APIs (such as OpenAI’s GPT) to offer these interfaces to their users. Some larger software developers will train their own models or fine-tune smaller open source models.
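To make this concrete, here is a minimal sketch of how such a conversational interface is typically wired up on top of a closed model API, using OpenAI’s Python SDK as the example backend. The product name and system prompt are hypothetical placeholders, and the model name is illustrative:

```python
# A minimal sketch of a chat interface built on a hosted LLM API.
# Assumes the OpenAI Python SDK (`pip install openai`) and an
# OPENAI_API_KEY environment variable; the product details are made up.
from openai import OpenAI

client = OpenAI()

# Anchor the assistant to the product so replies stay on topic.
messages = [{
    "role": "system",
    "content": "You are the in-app assistant for AcmeCalendar. "
               "Help users manage their events using plain language.",
}]

def chat(user_input: str) -> str:
    """Send one user turn and return the assistant's reply."""
    messages.append({"role": "user", "content": user_input})
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any hosted chat model works here
        messages=messages,
    )
    reply = response.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    return reply

print(chat("Move my 3pm meeting to Friday morning."))
```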
If a graphical user interface (GUI) is not necessary for the user to interact with your software, then it will be replaced by natural language prompts and responses.
For these new interfaces to materialise there are challenges which need to be overcome:
Inference and API costs for these models are high and not yet feasible for most software applications. Rapid advancements in open source models will reduce these costs, but inference will remain prohibitively expensive for many use cases.
The models cannot be hosted locally on devices or bundled within a software application due to their size and compute requirements, so an internet connection is necessary. New techniques for reaching similar performance with smaller open source models may offer a solution soon.
Privacy concerns will limit the use in certain applications, especially when using a closed source third party model.
Accuracy and hallucination issues will make many software developers cautious and hesitant to release these updates. The key here will be to properly ground the AI models in the application’s or website’s data to reduce these risks, as sketched below.
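In its simplest form, grounding means retrieving the relevant application data at question time, placing it in the prompt, and instructing the model to answer only from that context. A minimal sketch, with a toy keyword search standing in for real retrieval over the application’s or website’s own data:

```python
# A minimal grounding sketch: answer only from retrieved application
# data to reduce hallucinations. The documents and retrieval are toy
# stand-ins; a real app would query its own content store.
from openai import OpenAI

client = OpenAI()

DOCS = [
    "Refunds are processed within 5 business days of approval.",
    "Premium plans include priority support and offline mode.",
]

def search_docs(question: str, top_k: int = 3) -> list[str]:
    # Toy keyword retrieval; real apps would use full-text or vector search.
    words = set(question.lower().split())
    ranked = sorted(DOCS, key=lambda d: -len(words & set(d.lower().split())))
    return ranked[:top_k]

def grounded_answer(question: str) -> str:
    context = "\n\n".join(search_docs(question))
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Answer ONLY using the context below. If the answer "
                        "is not in the context, say you don't know.\n\n"
                        f"Context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(grounded_answer("How long do refunds take?"))
```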
AI co-pilots for boosting productivity within apps
Conversational chatbots and voice assistants will not only be introduced as a way for users to instruct and interact with their software applications and websites. We are also seeing proactive assistants (or co-pilots) being introduced into software to do some of the user’s work for them.
While basic conversational interfaces (designed to replace a GUI) will simply let users control their software using natural language, assistants and co-pilots will boost user productivity. They will proactively control parts of the software without requiring user instruction, and will collect and process data for the user. More importantly, they will also generate content and build reports automatically for the user.
Examples include AI co-pilots that:
Generate or auto-complete source code for a developer.
Manipulate or query databases and spreadsheets for an accountant.
Draft contracts or summarise documents for a lawyer.
Prepare first drafts in a word processor and even generate clip art and stock images.
These types of AI assistants are more complex to design and build, so at first we will only see them introduced by the larger software developers. Over time, new frameworks (e.g. LangChain) and platforms will be introduced to facilitate the development of these AI assistants, allowing these features to appear in more applications and websites.
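The pattern behind most of these co-pilot actions is the same, whether hand-rolled or built with a framework: the application invokes the model with its own data and a fixed, task-specific instruction, rather than waiting for a free-form user prompt. A hedged sketch of a “draft a summary” action, with the document text standing in for real application data:

```python
# A minimal co-pilot action: the app calls the model with its own data
# and a fixed instruction, no user prompt required. The OpenAI SDK is
# used as an example backend; the focus parameter is illustrative.
from openai import OpenAI

client = OpenAI()

def draft_summary(document_text: str, focus: str = "key obligations") -> str:
    """Generate a first-draft summary the user can then review and edit."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": f"Summarise the document for a busy professional, "
                        f"focusing on {focus}. Use short bullet points."},
            {"role": "user", "content": document_text},
        ],
    )
    return response.choices[0].message.content
```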
The same challenges of conversational interfaces apply here too. Model costs and hallucinations will slow down development in this area, but techniques to address these issues will become more sophisticated.
Stage 2 | Probable | In 1 year
AI Assistants that interact with apps and the web on your behalf
The second stage in this transformation would be more advanced AIs that interact with multiple apps, services and websites on your behalf. At this stage we are no longer discussing innovative new interfaces and co-pilots being developed within apps. We are looking at AI assistants with a scope that spans multiple data sources, sites and applications.
Personal AI assistant
During this stage we might see the voice and chat assistants built into the operating systems we use become much more capable. These advanced AI assistants will become the new interface between users and third party applications, services and websites.
For many use cases, we will no longer install apps or visit a website. Instead we will chat with our device’s built-in assistant (via voice or text) and explain what we require. Perhaps we are looking for a recipe, planning a trip or purchasing a product; the assistant will do what is necessary in the background to collect the information we are looking for and then even go ahead with an action if we ask it to (e.g. book a flight via an API).
The LLM AIs powering our device assistant will be able to search the web and interact with third party APIs. They will also be able to write source code and transform data, allowing them to chain results and responses from one API to another. App builders and website designers will design their APIs to ensure that they are discoverable by these AI models and that they contain all the metadata required for the model to learn how to interact with them. Most APIs already use standard formats such as JSON, which makes it straightforward to use in-context learning to teach an LLM how to call them.
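Today’s function-calling interfaces already hint at how this would work: the assistant is shown machine-readable descriptions of the available APIs and responds with a structured call rather than prose. A sketch using OpenAI’s tool-calling interface, where `search_flights` and its schema are hypothetical stand-ins for a real travel API:

```python
# Sketch of an assistant choosing and calling a third-party API.
# `search_flights` and its schema are hypothetical; a real assistant
# would execute the call and feed the result back to the model.
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "search_flights",
        "description": "Search for flights between two airports on a date.",
        "parameters": {
            "type": "object",
            "properties": {
                "origin": {"type": "string", "description": "IATA code, e.g. LHR"},
                "destination": {"type": "string"},
                "date": {"type": "string", "description": "YYYY-MM-DD"},
            },
            "required": ["origin", "destination", "date"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user",
               "content": "Find me a flight from London to Malta on 2025-06-06."}],
    tools=tools,
)

# The model replies with a structured call instead of prose (a full
# assistant would also handle the case where no tool call is returned).
call = response.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))
```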
These device assistants will be personalised to a certain extent by having access to some of your device’s data. They might also have access to limited context about you, such as your location and calendar. Eventually they could gain access to your device wallet for payments and your password manager to log in to services on your behalf.
Once again a number of challenges will mean that this type of assistant will be limited at first. Privacy concerns, inference costs and limitations to running a full model locally on a user’s device will be the main blockers.
The major transformation triggered by these early personal assistants will be that they will become the intermediaries between us and the apps that we use.
Local and personalised Corporate AI models
Today companies are using third party Large Language Models and building their applications and systems on top of them. This limits the use cases due to high costs, privacy concerns and limited personalisation.
The next stage could see companies adopting smaller open source models and hosting them on their own servers or cloud infrastructure.
These models can then be given access to all of the company’s APIs, documents, emails and databases. They can be fine-tuned on this data, allowing the models to be trained on up-to-date and relevant information. Using grounding and in-context learning, the models can have access to the latest company events and communications as they occur.
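As a sketch of the grounding half of this, the snippet below embeds internal documents once and retrieves the closest matches into the model’s context at question time. OpenAI’s embeddings API stands in for a self-hosted open source model, and the documents are placeholders:

```python
# Grounding a corporate assistant: embed internal documents once, then
# retrieve the closest matches into the prompt at question time.
# The documents are placeholders; a company would index its real data.
import numpy as np
from openai import OpenAI

client = OpenAI()

docs = [
    "Q3 all-hands is on October 12th in the main auditorium.",
    "Expense reports must be filed within 30 days via the finance portal.",
    "The VPN will be migrated to the new provider over the weekend.",
]

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vecs = embed(docs)  # computed once, refreshed as documents change

def retrieve(question: str, k: int = 2) -> list[str]:
    q = embed([question])[0]
    # Cosine similarity between the question and every document.
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    return [docs[i] for i in np.argsort(-sims)[:k]]

print(retrieve("When is the company meeting?"))
```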
The result will be that these models will augment existing employees. Each employee will have access to a corporate assistant chatbot which will assist them with their work and make them more productive. Certain roles may also be fully replaced over time.
Larger companies will probably have multiple models deployed and available to their employees, each fine-tuned or trained for particular capabilities: for example, a code generation model to augment developers, or a customer service model to augment support agents and handle tier-1 support. Some will also offer customer facing AI chatbots.
Stage 3 | Speculative | In 10 years +
Interacting with Human-Level AI Assistants & Chatbots
Here we imagine the final part of the transformation, where human-level AI performance creates scenarios in which we would interact with AI instead of humans. This would take two forms. The first would be communication with specialist AI chatbots instead of human specialists. The second would be a personalised AI assistant, trained on our data to act like us and capable of joining meetings and replying to messages on our behalf.
Important Note: AI researchers are divided on whether we can achieve human-level general AI at all. Some researchers believe it could arrive within years; others think it will take decades. It is also unknown whether we can get there just by scaling the current transformer-based large language model architectures, or whether new breakthrough techniques in Machine Learning & Deep Learning are required first.
Proprietary Specialist Chatbots
Will voice and text conversational agents replace specialists such as doctors, lawyers and educators?
It is possible that technology companies with access to certain types of specialised data (e.g. health records or legal contracts) could build advanced AI models offering services that we rely on human specialists for today.
A company could release a Doctor AI service that would allow users to have conversations with the AI and upload blood tests, photos, prescriptions and so on. The AI service would guide the user to provide all relevant data, then make a diagnosis, order tests or write a prescription.
In reality, even if the Doctor AI were capable of doing this, I imagine that in the background a team of human doctors would need to review the outputs of the model and approve them before they are sent to the user. The human doctor might request an in-person appointment with the patient to carry out tests and follow up on the AI’s diagnosis. Perhaps the AI will be blocked from actually outputting a prescription or diagnosis before it is approved by a human doctor.
This would mean that users could have cheap access to AI doctors and hold lengthy conversations to understand their health better. Time is not an issue, so if a patient needs to spend hours discussing their symptoms, lifestyle and eating habits with an AI doctor, they can do so. Perhaps our wearables and devices could also be allowed to upload health data to these AI doctors proactively. This would deliver huge benefits to users, especially those without access to good personalised health advice.
Human doctors can then efficiently review reports generated by the AI, without needing to spend time collecting information from their patients. At scale they can then approve or modify AI ‘decisions’ for hundreds of patients during each shift.
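The workflow described above amounts to a human-in-the-loop approval gate: the model’s output is held in a review queue and only released once a specialist signs it off. An entirely hypothetical, minimal sketch of that gate:

```python
# A minimal human-in-the-loop gate: AI drafts are queued and only
# released to the patient after a human specialist approves them.
# Entirely hypothetical; the AI's draft text would come from the model.
from dataclasses import dataclass

@dataclass
class Draft:
    patient_id: str
    text: str
    approved: bool = False
    reviewer_notes: str = ""

review_queue: list[Draft] = []

def submit_for_review(patient_id: str, ai_output: str) -> None:
    # The AI is never allowed to send output directly to the patient.
    review_queue.append(Draft(patient_id, ai_output))

def approve(draft: Draft, reviewer: str, notes: str = "") -> str:
    draft.approved = True
    draft.reviewer_notes = f"Reviewed by {reviewer}. {notes}"
    return draft.text  # only now is the text released to the patient
```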
The example above can be applied to many specialists: an AI gives legal advice to a large number of users cheaply and efficiently, then escalates to a human lawyer when necessary. The same approach could work for educators, software developers, accountants, engineers etc.
It is unlikely that these specialists will be replaced by AI. Instead we might see Specialist AI Chatbots make this type of specialist advice and interaction available to many more people at scale. There will then be AI augmented human specialists and in-person appointments available at key points during the interaction.
Personalised human-level AI Assistant
What would a human-level personal AI assistant be capable of?
In a few years’ time, you might have an advanced personalised LLM-based assistant running locally on your devices. This assistant would have full access to your data: real-time health data from your wearables (e.g. smart watch or earbuds), as well as your location, messages, calendars, documents and social media profiles.
The data it has access to would automatically be used to fine-tune and further train the model to your requirements and context. It could learn how to talk and sound like you, and it would be aware of all your communications and objectives.
It will be able to take voice calls or join video meetings when you are not available and summarise them for you. It can also respond to emails and messages, notifying you if there is something important for you to action or know about.
This type of personal assistant will also be proactive: informing you about health data that needs attention, alerting you to issues with your payments or accounts, and recommending new products and articles for you. It might even be able to edit, generate or personalise video content, music and podcasts based upon your requirements, mood and personality.
Imagine telling your assistant that you want to watch a documentary about a particular topic and you only have 15 minutes to watch it. The assistant might curate parts of other documentaries from streaming platforms and social media covering the topic and then stitch them together into the format you requested.
You might have a report or book you need to read, but instead you can just ask your assistant for a summary with a focus on a certain topic, or you can even have a conversation with your assistant where you would ask questions about the content.
If such assistants materialise, our interactions with applications, the web and services will be exclusively through our assistant (perhaps apart from VR, games, streaming etc.). The benefits are clear, but they would also introduce a large number of concerns:
Would it be evident whether you are communicating with a human or with their AI assistant?
If the AI model is owned and controlled by the operating system developer, would the information and services curated for the user be influenced by the developer?
Should the AI assistant disclose information about the user if they are unwell, or doing something dangerous or illegal?
How would we ensure that the AI assistant is aligned with its user and working in their best interests?
What would happen if only some of us have access to an advanced AI assistant?
Conclusion
Generative AI has already started to transform the way that we interact with software and each other.
As the AI technology powering these models continues to advance, we will interact more with AI assistants and less with traditional applications and websites. We will start off by using natural language as a user interface, and then transition to instructing powerful AI assistants to discover and use applications for us.
If these AI models become good enough, they might also replace some of the interactions we have with each other. It might become normal to communicate with AI doctors, AI teachers and AI lawyers whether we like it or not.
With advanced personal AI assistants we might also get to a point where we won’t know if we are talking to a human or their AI assistant whenever we join a remote meeting, reply to an email or answer the phone.
We are now in the first stage of the transformation. The second stage already seems probable based upon the AI models available today. Only time will tell whether neural network breakthroughs and scaling can continue at the current rate and deliver human-level AI performance. If it does, then the speculative stage 3 might arrive sooner than we thought possible. Whether this transformation would be beneficial to humanity is less clear.