Highlights:
- Runway detailed the technology behind the model in a research paper published earlier this year.
- The company trained the model on a dataset of 240 million images and 6.4 million video clips.
Startup Runway AI Inc. has unveiled Gen-2, an artificial intelligence model that can generate short video clips from text prompts.
New York-based Runway develops AI models that help creative professionals edit images and videos. The company collaborated on the development of the well-known Stable Diffusion generative AI model last year. In December, it reportedly raised a USD 50 million Series C funding round at a USD 500 million valuation.
Gen-2, the startup's new video generation model, is an enhanced version of the Gen-1 neural network that debuted in February. According to the startup, Gen-2 produces clips with greater accuracy than its predecessor and gives users more customization options.
Runway's original Gen-1 neural network takes as input an existing video and a text prompt describing the desired edits. A user might, for instance, give Gen-1 a video of a yellow car and the instruction "paint the car red." The model then applies the requested changes automatically.
Gen-1 can also restyle a video to match the appearance of a reference image supplied by the user. Gen-2 introduces a third way of producing clips: users can generate videos from a text prompt alone, without a source video or reference image.
Runway detailed the technology behind the model in a research paper published earlier this year. The company says its algorithm generates videos using an AI technique called diffusion.
With diffusion, researchers gradually add Gaussian noise, a form of random error, to a file, then train a neural network to remove that noise and restore the original. By repeating this process many times, the network learns to transform noisy input into a new file that matches the user's specifications.
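The paper does not include code, but the core idea can be sketched in a few lines. The following is a minimal, illustrative PyTorch example of a diffusion training step: the timestep count, noise schedule, and tiny denoiser network are hypothetical stand-ins, and a real video model like Gen-2 would condition on text and operate on far larger spatio-temporal tensors.

```python
# Minimal sketch of one diffusion training step (illustrative, not Runway's code).
import torch
import torch.nn as nn

T = 1000                                    # number of noise steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)       # noise schedule (assumed)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

# Stand-in denoiser; real video models use large spatio-temporal networks.
denoiser = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 64))
optimizer = torch.optim.Adam(denoiser.parameters(), lr=1e-4)

def training_step(clean_batch: torch.Tensor) -> float:
    """Corrupt clean data with Gaussian noise, then train the network to predict that noise."""
    t = torch.randint(0, T, (clean_batch.shape[0],))            # random timestep per sample
    a = alphas_cumprod[t].unsqueeze(-1)                          # cumulative signal level
    noise = torch.randn_like(clean_batch)                        # the Gaussian noise to add
    noisy = a.sqrt() * clean_batch + (1 - a).sqrt() * noise      # corrupted sample
    pred = denoiser(noisy)                                       # network predicts the noise
    loss = nn.functional.mse_loss(pred, noise)                   # learn to remove it
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example usage on a batch of flattened 64-dimensional toy "frames".
print(training_step(torch.randn(8, 64)))
```

After training, generation runs the process in reverse: starting from pure noise, the network is applied step by step to strip the noise away, yielding a new sample rather than a reconstruction of an existing one.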
The company trained the model on a dataset of 240 million images and 6.4 million video clips. It then ran several user studies to evaluate Gen-2's performance and found that it considerably outperformed the two most advanced AI models in the same category.
Runway is not the only company building AI models that generate video. Last year, researchers at Meta Platforms Inc. described a comparable model, Make-A-Video, which, like Gen-2, produces clips from text prompts.