June 18, 2024 General World Models

Deeper Learning

Hey there! We’ve got some new video and audio models to discuss this week. But before we get into it, heads up that we’ve got another AI demo night coming up on Monday, June 24 in San Francisco. So if you’re an AI maker, make sure to check it out.

Now, let’s go deeper.

ONE BIG READ

Runway’s newest video model

Image: Runway

Runway, the company behind one of the first commercial text-to-video models, just released its latest model, Gen-3 Alpha.

Gen-3 Alpha offers a major improvement in speed and fidelity over Gen-2. According to Runway’s cofounder, the new model excels at human characters and their actions, gestures, and emotions. The downsides? It struggles with interactions and is limited to 5- and 10-second clips, which take 45 seconds and 90 seconds to generate, respectively.

The release comes a week after the launch of another text-to-video model, Dream Machine from Luma AI, an a16z- and Nvidia-backed startup. Dream Machine can generate up to 120 frames of video in around 120 seconds, outperforming OpenAI’s Sora, which produces up to a minute of video but takes 10 minutes to an hour, depending on who you ask. In our own testing, we noticed Dream Machine can get a little funky when it comes to movement. Take this UFO-inspired video, for example: the child’s movements have a certain stiffness to them.
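For a rough sense of how those figures compare, here is a back-of-the-envelope sketch in Python. It only divides the numbers quoted above; the function name is hypothetical, Dream Machine stays in frames because no frame rate is given here, and the Sora range assumes the quoted wait applies to a full one-minute clip.

```python
def wait_per_unit(generation_seconds: float, output_units: float) -> float:
    """Seconds of generation time per unit of output (a second of video, or a frame)."""
    return generation_seconds / output_units


# Gen-3 Alpha: 5-second clip in 45 seconds, 10-second clip in 90 seconds.
gen3_short = wait_per_unit(45, 5)
gen3_long = wait_per_unit(90, 10)

# Dream Machine: up to 120 frames in around 120 seconds (unit here is a frame).
dream_machine_per_frame = wait_per_unit(120, 120)

# Sora: up to 60 seconds of video in 10 minutes to an hour (assumed full-length clip).
sora_best = wait_per_unit(10 * 60, 60)
sora_worst = wait_per_unit(60 * 60, 60)

print(f"Gen-3 Alpha: {gen3_short:.0f}s of waiting per second of video")
print(f"Dream Machine: {dream_machine_per_frame:.0f}s of waiting per frame")
print(f"Sora: {sora_best:.0f}s to {sora_worst:.0f}s of waiting per second of video")
```

By this measure Gen-3 Alpha's wait scales evenly with clip length, at about 9 seconds per second of video for both clip sizes.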

General World Models

Back in December, Runway “introduced” the idea of General World Models (the underlying concept dates back to 2018). The term is a buzzword for AI that understands its environment and can anticipate or teach itself new things based on that understanding, as opposed to AI that only predicts the next word in a sentence mathematically.

Runway says the release of Gen-3 Alpha is a step towards building General World Models, which is ultimately a step towards AGI. “To build general world models, there are several open research challenges that we’re working on,” the team explains on its website. “For one, those models will need to generate consistent maps of the environment, and the ability to navigate and interact in those environments.”

In other words, building good video models offers a potential path to building AI that successfully simulates the physical world. Think about it — a great video model would be able to simulate a wide range of real-life situations and interactions, and have an understanding of things like physics and motion. These are the things we need to get to AGI.

OpenAI looks at Sora the same way: “Our largest model, Sora, is capable of generating a minute of high-fidelity video,” explains a blog post on the topic. “Our results suggest that scaling video generation models is a promising path towards building general purpose simulators of the physical world.”

This is all still research, of course. World Models have only been narrowly applied to things like video games or autonomous vehicles so far. I just personally like knowing that all of this work on video models extends a lot further than a 5-second clip.

PRODUCT HIGHLIGHT

Remember more information with this AI tool

The internet is a double-edged sword. You have instant access to a limitless amount of information, but there’s so much of it that you’ll likely forget what you consumed within a week. That’s the exact problem Paul Richards, co-founder of Recall, was dealing with.

Recall is an AI-powered, web-based tool that aims to help you remember more of what you consume by using categorization, contextual recollection, and even some gamification.

When you stumble upon something interesting, click the Recall icon in your browser bar. From there, Recall will get to work summarizing the piece based on key points. Once done, it will add it to your knowledge base and categorize it based on what it mentions. If it’s an article about GPT-5 rumors, it will likely mention OpenAI, Sam Altman, AI, and large language models. This is to make it easier for users to resurface content. Rather than relying on remembering the title or publication, you can just remember what it was about.

It gets really interesting as you’re actively consuming content. Say you’re reading an article about Elon Musk. You might have a piece about colonizing Mars saved (one of Musk’s and SpaceX’s main goals). Recall will remember this and resurface it to you in the right context.
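To make the idea concrete, here is a minimal Python sketch of a topic-indexed knowledge base. It is not how Recall is actually built, which isn't described here; the `KnowledgeBase` class and its methods are hypothetical, and topics are passed in by hand where the real tool would extract them with AI.

```python
from collections import defaultdict


class KnowledgeBase:
    """Toy store that files saved pieces under the topics they mention."""

    def __init__(self):
        # topic (lowercased) -> titles of saved pieces that mention it
        self._by_topic = defaultdict(set)

    def save(self, title, topics):
        for topic in topics:
            self._by_topic[topic.lower()].add(title)

    def find_by_topic(self, topic):
        """Look a piece up by what it was about, not by its title."""
        return sorted(self._by_topic.get(topic.lower(), set()))

    def resurface(self, current_topics):
        """Saved pieces sharing a topic with the page being read, most overlap first."""
        overlap = {}
        for topic in {t.lower() for t in current_topics}:
            for title in self._by_topic.get(topic, ()):
                overlap[title] = overlap.get(title, 0) + 1
        return sorted(overlap, key=lambda title: (-overlap[title], title))


kb = KnowledgeBase()
kb.save("GPT-5 rumors", ["OpenAI", "Sam Altman", "AI", "large language models"])
kb.save("Colonizing Mars", ["Elon Musk", "SpaceX", "Mars"])

print(kb.find_by_topic("openai"))           # lookup by subject
print(kb.resurface(["Elon Musk", "Tesla"]))  # context: reading about Elon Musk
```

The inverted index (topic to titles) is the design choice that lets you find a piece by remembering its subject, and the same index drives the contextual resurfacing.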

Recall also gamifies information retention with features like AI-generated quizzes, which should ideally help you retain more of what you’ve read.

FROM PRODUCT HUNT DEV

How well can LLMs find syntactic bugs in large Python codebases? The folks over at Hamming.ai built a new benchmark to find out. Their results showed that models react differently depending on where the bug sits within the source code. The GPT-4 series was the least sensitive to bug placement. Check out more of their results on the Product Hunt blog.
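For a feel of what "placement of the bug" could mean in practice, here is a small Python sketch that plants a syntax error at a chosen relative depth in a source file. This is an illustration under our own assumptions, not Hamming's actual benchmark code; the function names and the choice of bug (a dropped colon on a `def` line) are hypothetical.

```python
import ast
from typing import Tuple


def inject_syntax_bug(source: str, relative_position: float) -> Tuple[str, int]:
    """Drop the colon from the `def` line nearest to relative_position.

    relative_position runs from 0.0 (top of file) to 1.0 (bottom).
    Returns the buggy source and the 1-based line number of the bug.
    """
    lines = source.splitlines()
    candidates = [
        i for i, line in enumerate(lines)
        if line.lstrip().startswith("def ") and line.rstrip().endswith(":")
    ]
    if not candidates:
        raise ValueError("no function definitions to break")
    target = round(relative_position * (len(lines) - 1))
    index = min(candidates, key=lambda i: abs(i - target))
    lines[index] = lines[index].rstrip()[:-1]  # remove the trailing colon
    return "\n".join(lines) + "\n", index + 1


def has_syntax_error(source: str) -> bool:
    """True if Python's own parser rejects the source."""
    try:
        ast.parse(source)
    except SyntaxError:
        return True
    return False


SAMPLE = (
    "def first():\n    return 1\n\n"
    "def second():\n    return 2\n\n"
    "def third():\n    return 3\n"
)

for position in (0.0, 0.5, 1.0):
    buggy, line_number = inject_syntax_bug(SAMPLE, position)
    print(position, line_number, has_syntax_error(buggy))
```

A benchmark along these lines would then ask a model to locate the bug in each variant and compare accuracy across positions.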

About the team: Sumanyu Sharma is the Co-Founder & CEO of Hamming, a platform for building AI products with a focus on reliability and trustworthy AI. Before that, he was the Head of Data at Citizen and helped lead an AI-powered sales program at Tesla to hundreds of millions in revenue. Hokyung (Andy) Lee is an AI Researcher and student at the University of Waterloo.

Product Hunt Dev: A weekly digest of developer tools and interesting stories from the world of engineering

MORE TOOLS

For productivity

  • Chatty uses an open-source ChatGPT-like interface to run open-source models locally in the browser using WebGPU.

  • Omi consolidates your contracts for collaboration in one place.

  • Summit is an AI-powered tool that helps you organize and track your goals, stay accountable to those goals, and be supported 24/7.

  • Inbox Zero uses AI to show only what is important in your inbox and block out irrelevant and spam emails.

  • Choosy Cat chooses the best answer from ChatGPT, Gemini, and Claude.

For creating

  • MARS5 TTS from CAMB.AI is an open-source, text-to-speech model for extremely tough prosodic scenarios (i.e. it can replicate speech rhythm and intonation patterns).

  • TwoShot lets you create and remix music with voice, descriptions, or humming.

  • Magic Publish researches YouTube for optimized titles, tags, and descriptions.

  • Icons8 made an AI image generator for generating multiple images in the same style, trained on proprietary data, not by scraping the internet.

For fun

For developers

Thanks for going deeper with us!

Here via forward? Subscribe here.
