Google’s Project Genie Is Not for You

Google has a whole new world for people to play in, but only for a minute. This week, the company released Project Genie, which it calls a “general-purpose world model” capable of generating interactive environments. First unveiled to a small group of invite-only testers back in August of last year, the model, known as Genie 3, is now rolling out to Google AI Ultra subscribers in the US, so you can get your hands on it for the low, low price of $250 per month.

The fact that Google is showing off a world model is interesting on its own. Large language models (LLMs), the underlying technology behind most consumer-facing AI tools, including Google’s own Gemini, use the vast amounts of training data they are given to predict the most likely next part of a sequence. World models, by contrast, are trained on the dynamics of the real world, including physics and spatial properties, so they can simulate how physical environments operate.

World models are the approach to AI favored by Yann LeCun, the former chief scientist of Meta AI. LeCun believes (probably correctly) that LLMs will never be able to achieve artificial general intelligence, the point at which AI is able to match or exceed human capabilities across all domains. Instead, he believes world models can chart a path to that end goal, and he’s recently joined a startup that is going all in on that bet. It’s an oversimplification, but the idea is essentially that LLMs can only recognize patterns, whereas world models would allow AI to run tons of simulations to understand how the world works and extrapolate new conclusions.

Google playing in this world certainly lends some legitimacy to the idea that world models offer something LLMs can’t, and there is no denying that the preview videos from Project Genie’s early days are visually impressive, albeit short. Google is capping users at 60 seconds’ worth of generated world, which the company also warns “might not look completely true-to-life or always adhere closely to prompts or images, or real-world physics.” Which is to say, it might not work. Outputs are currently 720p videos rendered at 24 frames per second, per Ars Technica, and users have complained that it can be quite laggy in practice.

That’s fine for something in beta, though it does speak to the limitations of the company’s model, suggesting the world might be smaller than you’d imagine. While users have been hyping up the feature as if it’s about to put video game developers out of business, it’s probably worth pumping the brakes on that concern for the time being.

Google’s Genie 3 also takes a different approach to world models than the one LeCun has imagined. The model, available through Project Genie, essentially creates a continuous video-based world. Users can navigate it like a video game, but in theory, AI agents could also endlessly run through those worlds to learn how things work. LeCun’s idea when he was at Meta was to create a Joint Embedding Predictive Architecture (JEPA), which embeds a model of the outside world in an AI agent.
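For the curious, here is a minimal numerical sketch of the JEPA idea, assuming nothing about Meta’s actual implementation (the linear “encoder” and “predictor” are toy stand-ins): rather than generating the next frame pixel by pixel, the model predicts the embedding of a future observation and is scored entirely in that latent space.

```python
# Minimal sketch of the JEPA idea; a toy stand-in, not Meta's code.
import numpy as np

rng = np.random.default_rng(0)
D_OBS, D_LATENT = 32, 8

W_enc = rng.normal(size=(D_OBS, D_LATENT))      # shared encoder (toy: linear)
W_pred = rng.normal(size=(D_LATENT, D_LATENT))  # predictor over latents

def encode(obs):
    return obs @ W_enc

def jepa_loss(obs_now, obs_future):
    z_now = encode(obs_now)
    z_future = encode(obs_future)   # target embedding
    z_hat = z_now @ W_pred          # predicted embedding
    # The loss compares embeddings, never raw pixels -- the defining
    # trait of a joint embedding predictive architecture.
    return np.mean((z_hat - z_future) ** 2)

obs_t = rng.normal(size=D_OBS)
obs_t1 = rng.normal(size=D_OBS)
print(jepa_loss(obs_t, obs_t1))
```

The design point is that predicting in latent space lets the model ignore unpredictable pixel-level detail, whereas a video-based world like Genie’s has to render every frame.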


But again, the fact that Google is pushing a world model says something. Yes, the company is going to run into all the same issues that have dogged other image and video generation models like OpenAI’s Sora 2, which was widely used to generate likely copyright-infringing content. Early Project Genie outputs are reliably replicating Nintendo worlds, for instance, and that’s probably going to cause some issues. But it also suggests that even the biggest players in the AI space recognize that LLMs may eventually hit a wall.

That said, there’s a reason Google has put a hard cap on Project Genie for the time being. If you think it costs a lot to train and operate a text-based model, just imagine what creating a fully generated simulation of the world requires. It needs tons of high-dimensional data to understand everything from how a world looks to how physics works, and requires lots of processing power to run. That’s why, for now, the worlds might look vast but are being kept quite small in practice.
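Some rough, back-of-envelope arithmetic (my numbers, not Google’s) shows why even a 60-second cap is generous: at the reported 720p and 24 frames per second, a single session means generating 1,440 frames, roughly 4 GB of raw pixels, each one conditioned on the user’s latest input.

```python
# Back-of-envelope arithmetic (illustrative, not Google's figures) for
# the raw pixel output of one capped Project Genie session.
FPS, SECONDS = 24, 60
WIDTH, HEIGHT, BYTES_PER_PIXEL = 1280, 720, 3  # 720p, uncompressed RGB

frames = FPS * SECONDS
bytes_per_frame = WIDTH * HEIGHT * BYTES_PER_PIXEL
total_gb = frames * bytes_per_frame / 1e9

print(f"{frames} frames, {total_gb:.1f} GB of raw pixels per session")
# -> 1440 frames, 4.0 GB -- and every frame has to be synthesized on
#    the fly, in near real time, to stay interactive.
```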

