At an event in San Francisco in November, Sam Altman, chief executive of the artificial intelligence company OpenAI, was asked what surprises the field would bring in 2024.
Online chatbots like OpenAI’s ChatGPT “will take a leap that no one expected,” Mr. Altman quickly responded.
Sitting beside him, James Manyika, a Google executive, nodded and said, "Plus one to that."
The AI industry this year is set to be defined by one main characteristic: remarkably rapid improvement, with advances building on one another as the technology learns to generate new kinds of media, mimic human reasoning in new ways, and enter the physical world through a new breed of robots.
In the coming months, AI-powered image generators like DALL-E and Midjourney will instantly deliver videos as well as still images. And they will gradually merge with chatbots like ChatGPT.
That means chatbots will expand well beyond digital text, handling photos, videos, diagrams, charts, and other media. They will exhibit behavior that looks more like human reasoning, tackling increasingly complex tasks in fields like mathematics and science. And as the technology moves into robots, it will also help solve problems beyond the digital world.
Many of these developments are already beginning to emerge in top research laboratories and technology products. But in 2024, the power of these products will increase significantly and will be used by far more people.
“The rapid progress of AI will continue,” said David Luan, chief executive of Adept, an AI start-up. “It is inevitable.”
OpenAI, Google and other tech companies are advancing AI far more quickly than other technologies because of the way the underlying systems are built.
Most software apps are created by engineers one line of computer code at a time, which is typically a slow and difficult process. Companies are improving AI more rapidly because the technology relies on neural networks, mathematical systems that can learn skills by analyzing digital data. By pinpointing patterns in data such as Wikipedia articles, books, and digital text culled from the internet, a neural network can learn to generate text on its own.
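The core idea — learn patterns from text, then use those patterns to produce new text — can be sketched with a toy word-prediction model. Real systems use neural networks with billions of parameters trained on vast datasets; this bigram counter is only an illustration of the principle, and the tiny corpus here is invented for the example.

```python
from collections import defaultdict
import random

# Toy illustration of the idea behind neural text generators:
# learn which word tends to follow which, then sample new text.
# (Real systems use neural networks with billions of parameters;
# this simple bigram counter is only a sketch of the principle.)

corpus = "the cat sat on the mat and the cat slept".split()

# Count which words follow each word in the training text.
follows = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev].append(nxt)

def generate(start: str, length: int = 5) -> list[str]:
    """Generate text by repeatedly predicting a likely next word."""
    words = [start]
    for _ in range(length):
        options = follows.get(words[-1])
        if not options:
            break
        words.append(random.choice(options))
    return words

print(" ".join(generate("the")))
```

Because the model only ever predicts "a likely next word," everything it produces is a statistical echo of its training data — which is also why these systems can be improved by simply feeding them more data.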
This year, tech companies plan to provide AI systems with more data – including images, sounds and more text – than people can wrap their heads around. As these systems learn the relationships between these different types of data, they will learn to solve increasingly complex problems, preparing them for life in the physical world.
(The New York Times sued OpenAI and Microsoft last month for copyright infringement over the use of news content in AI systems.)
None of this means AI will be able to match the human brain any time soon. While AI companies and entrepreneurs aim to create what they call “artificial general intelligence” — a machine that can do anything the human brain can do — that remains a daunting task. For all its rapid gains, AI remains in its early stages.
Here’s a guide to how AI is set to make a difference this year, starting with near-term progress that will lead to further advancements in its capabilities.
Until now, AI-powered applications mostly generated text and static images in response to prompts. DALL-E, for example, can create photorealistic images within seconds of requests like “a rhinoceros diving off the Golden Gate Bridge.”
But this year, companies like OpenAI, Google, Meta and New York-based Runway are likely to deploy image generators that also allow people to create videos. These companies have already created prototypes of tools that can instantly create videos from short text prompts.
Tech companies are likely to fold the powers of image and video generators into chatbots, making the chatbots more powerful.
Chatbots and image generators, which were originally developed as separate tools, are slowly merging. When OpenAI introduced a new version of ChatGPT last year, the chatbot could generate images as well as text.
AI companies are building “multimodal” systems, meaning the AI can handle multiple types of media. These systems learn skills by analyzing photos, text, and potentially other kinds of media, including diagrams, charts, sound, and video, so that they can then generate their own text, images, and sounds.
And because these systems are also learning the relationships between different types of media, they will be able to understand one type of media and respond with another. In other words, someone could feed an image into a chatbot and it could respond with text.
“The technology will become smarter and more useful,” said Ahmed Al-Dahle, who leads the Generative AI group at Meta. “It’ll do more.”
Multimodal chatbots will make mistakes, just as text-only chatbots do. Tech companies are working to reduce those errors as they strive to build chatbots that can reason like humans.
When Mr. Altman talks about leaps forward in AI, he is referring to chatbots that are better at “reasoning,” so they can take on more complex tasks, such as solving intricate math problems and writing detailed computer programs.
The aim is to build systems that can carefully and logically solve a problem through a series of distinct steps, each step building on the next. This is how humans reason, at least in some cases.
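The step-by-step pattern the companies are chasing can be illustrated with a simple worked example: a problem broken into distinct stages, each using the previous stage's result. This is a hypothetical sketch of the idea, not any company's actual system.

```python
# Sketch of "step-by-step" problem solving: break a problem into
# small stages where each stage builds on the previous result.
# (A hypothetical example of the pattern, not any company's system.)

def solve_step_by_step(apples: int, price: float, discount: float) -> list[str]:
    """Work out a total cost, recording each intermediate step."""
    steps = []
    subtotal = apples * price
    steps.append(f"Step 1: {apples} apples x ${price:.2f} = ${subtotal:.2f}")
    saved = subtotal * discount
    steps.append(f"Step 2: a discount of {discount:.0%} saves ${saved:.2f}")
    total = subtotal - saved
    steps.append(f"Step 3: total = ${subtotal:.2f} - ${saved:.2f} = ${total:.2f}")
    return steps

for line in solve_step_by_step(apples=12, price=0.50, discount=0.10):
    print(line)
```

The appeal of this structure is that each intermediate step can be checked on its own — which is exactly what makes step-by-step answers easier to trust than a single, unexplained result.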
Leading scientists disagree over whether chatbots can actually reason this way. Some argue that these systems appear logical only because they replicate behavior observed in Internet data. But OpenAI and others are building systems that can more reliably answer complex questions involving subjects like mathematics, computer programming, physics and other sciences.
“As systems become more reliable, they will become more popular,” said Nick Frosst, a former Google researcher who helped found the AI start-up Cohere.
If chatbots get better at reasoning, they could turn into “AI agents.”
As companies teach AI systems how to tackle complex problems one step at a time, they can also improve chatbots’ ability to use software apps and websites on your behalf.
Researchers are essentially turning chatbots into a new kind of autonomous system called an AI agent. That means chatbots can use software apps, websites, and other online tools, including spreadsheets, online calendars, and travel sites. People could then offload tedious office work to chatbots. But these agents could also take over jobs entirely.
Chatbots already work as agents in small ways. They can schedule meetings, edit files, analyze data, and create bar charts. But these tools don’t always work as well as they need to. Agents break down completely when applied to more complex tasks.
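The agent pattern described above — a system that receives a request, chooses a software tool, and invokes it — can be sketched in a few lines. The tools and the keyword-based router here are hypothetical stand-ins; real agents use a language model to choose tools and fill in their arguments.

```python
# Minimal sketch of an "AI agent" loop: a controller that maps a
# request to one of several software tools and invokes it.
# (Hypothetical tools and a keyword router stand in for the language
# model that does the choosing in real agent systems.)

def schedule_meeting(topic: str) -> str:
    return f"Meeting scheduled: {topic}"

def make_bar_chart(data: dict) -> str:
    return "\n".join(f"{k}: {'#' * v}" for k, v in data.items())

TOOLS = {"schedule": schedule_meeting, "chart": make_bar_chart}

def agent(request: str, payload):
    """Route a request to the right tool (stand-in for a model's choice)."""
    for keyword, tool in TOOLS.items():
        if keyword in request.lower():
            return tool(payload)
    return "No suitable tool found."

print(agent("Please schedule a sync", "Q2 planning"))
print(agent("Chart these sales", {"Jan": 3, "Feb": 5}))
```

The fragility the article describes lives in the routing step: when a request doesn't match any tool the agent knows, the whole loop stalls — which is why today's agents break down on tasks their builders didn't anticipate.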
This year, AI companies are set to unveil agents that are more reliable. “You should be able to delegate any tedious, everyday computer task to an agent,” Mr. Luan said.
This might include keeping track of expenses in an app like QuickBooks or logging vacation days in an app like Workday. In the long run, it will expand beyond software and Internet services to the world of robotics.
In the past, robots were programmed to perform the same tasks repeatedly, such as picking up boxes that were always the same size and shape. But using the same kind of technology that underpins chatbots, researchers are giving robots the power to handle more complex tasks — including ones they’ve never seen before.
Just as chatbots can learn to predict the next word in a sentence by analyzing vast amounts of digital text, a robot can learn to predict what will happen in the physical world by analyzing countless videos of objects being prodded, lifted, and moved.
“These technologies can absorb tremendous amounts of data. And as they absorb the data, they can learn how the world works, how physics works, how you interact with objects,” said Peter Chen, a former OpenAI researcher who helps run Covariant, a robotics start-up.
This year, AI will supercharge robots that work behind the scenes, like mechanical arms that fold shirts at a laundromat or sort piles of goods inside a warehouse. Tech moguls like Elon Musk are also working to move humanoid robots into people’s homes.