The most capable open source AI model with visual abilities yet could see more developers, researchers, and startups build AI agents that can carry out useful chores on your computer for you.
Released today by the Allen Institute for AI (Ai2), the Multimodal Open Language Model, or Molmo, can interpret images as well as converse through a chat interface. That means it can make sense of a computer screen, potentially helping an AI agent carry out tasks such as browsing the web, navigating through file directories, and drafting documents.
“With this release, many more people can deploy a multimodal model,” says Ali Farhadi, CEO of Ai2, a research organization based in Seattle, Washington, and a computer scientist at the University of Washington. “It should be an enabler for next-generation apps.”
So-called AI agents are being widely touted as the next big thing in AI, with OpenAI, Google, and others racing to develop them. Agents have become a buzzword of late, but the grand vision is for AI to go well beyond chatting to reliably take complex and sophisticated actions on computers when given a command. This capability has yet to materialize at any kind of scale.
Some powerful AI models already have visual abilities, including GPT-4 from OpenAI, Claude from Anthropic, and Gemini from Google DeepMind. These models can be used to power some experimental AI agents, but they are hidden from view and accessible only via a paid application programming interface, or API.
Meta has released a family of AI models called Llama under a license that restricts their commercial use, but it has yet to provide developers with a multimodal version. Meta is expected to announce several new products, perhaps including new Llama AI models, at its Connect event today.
“Having an open source, multimodal model means that any startup or researcher that has an idea can try to do it,” says Ofir Press, a postdoc at Princeton University who works on AI agents.
Press says that because Molmo is open source, developers will be able to fine-tune their agents for specific tasks, such as working with spreadsheets, by supplying additional training data. Models like GPT-4 can only be fine-tuned to a limited degree through their APIs, whereas a fully open model can be modified extensively. “When you have an open source model like this, then you have many more options,” Press says.
Ai2 is releasing several sizes of Molmo today, including a 70-billion-parameter model and a 1-billion-parameter one that is small enough to run on a mobile device. A model’s parameter count refers to the number of units it contains for storing and manipulating data, and roughly corresponds to its capabilities.
Ai2 says Molmo is as capable as considerably larger commercial models despite its relatively small size, because it was carefully trained on high-quality data. The new model is also fully open source in that, unlike Meta’s Llama, there are no restrictions on its use. Ai2 is also releasing the training data used to create the model, giving researchers more detail about its workings.
Releasing powerful models is not without risk. Such models can more easily be adapted for nefarious ends; we may someday, for example, see the emergence of malicious AI agents designed to automate unauthorized access to computer systems.
Farhadi of Ai2 argues that the efficiency and portability of Molmo will allow developers to build more powerful software agents that run natively on smartphones and other portable devices. “The billion-parameter model is now performing at the level of, or in the league of, models that are at least 10 times bigger,” he says.
Building useful AI agents may depend on more than just more efficient multimodal models, however. A key challenge is making the models work more reliably. That may well require further breakthroughs in AI’s reasoning abilities, something that OpenAI has sought to tackle with its latest model, o1, which demonstrates step-by-step reasoning skills. The next step may well be giving multimodal models such reasoning abilities.
For now, the release of Molmo means that AI agents are closer than ever, and could soon be useful even outside of the giants that dominate the world of AI.