Highlights:
- SA2, as it’s called, is a “segmentation model,” a specialized type of computer vision model that picks out the individual objects in an image and identifies exactly which pixels belong to each one.
- The main difference between SA1 and SA2 is that SA2 extends its capabilities to videos, not just images, marking a major advancement in the field of computer vision.
Meta Platforms Inc.’s artificial intelligence research team has introduced a follow-up to last summer’s popular Segment Anything machine learning model.
At SIGGRAPH 2024, Meta Chief Executive Mark Zuckerberg announced Segment Anything 2 during a wide-ranging fireside chat with Nvidia Corp. CEO Jensen Huang. The new version builds on the original model, which was designed to identify specific objects and elements within an image, and extends that capability to video.
SA2, as it’s called, is a “segmentation model,” a specialized type of computer vision model that can pick out the individual objects in an image and trace exactly which pixels belong to each one. For instance, it can isolate a dog partially hidden behind a tree or a bucket collecting rainwater from a leaky roof.
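To make that concrete, here is a minimal sketch of how a promptable segmentation model like SA2 can be used on a single image: prompt it with one click and get back pixel-level masks. The module names, config file, checkpoint path, and example image below are assumptions based on the usage shown in Meta's public repository at launch and may differ in the current release.

```python
import numpy as np
import torch
from PIL import Image

# Import paths assumed from Meta's public segment-anything-2 repository;
# treat them as illustrative, not a definitive spec.
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Hypothetical local paths to a downloaded model config and checkpoint.
model_cfg = "sam2_hiera_l.yaml"
checkpoint = "checkpoints/sam2_hiera_large.pt"

predictor = SAM2ImagePredictor(build_sam2(model_cfg, checkpoint))

# Hypothetical example image.
image = np.array(Image.open("dog_behind_tree.jpg").convert("RGB"))

with torch.inference_mode():
    predictor.set_image(image)
    # Prompt with a single foreground point on the dog; the model returns
    # candidate pixel masks plus a confidence score for each.
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[420, 310]]),  # (x, y) in image coordinates
        point_labels=np.array([1]),           # 1 = foreground point
    )

print(masks.shape, scores)  # e.g. 3 candidate masks of shape (H, W) with scores
```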
The main distinction between SA1 and SA2 is that SA2 can be applied to videos as well as images, a major step forward in computer vision technology.
Zuckerberg mentioned that scientists frequently use these types of models to study subjects such as coral reefs and natural habitats. He said, “But being able to do this in video and have it be zero shot and tell it what you want, it’s pretty cool.” (“Zero shot” means the model can segment objects it was never explicitly trained on, guided only by the user’s prompt.)
Zuckerberg highlighted that SA2’s ability to perform this task for videos showcases the advancements in the AI industry, especially in processing power. He noted that just a year ago, applying image segmentation to video would have been impossible.
The SA2 model is open source and available for download on GitHub, and a free demo is available as well.
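For video, the repository documents a predictor that takes a prompt on one frame and tracks that object through the rest of the clip. The sketch below assumes that interface (names such as `build_sam2_video_predictor`, `init_state`, `add_new_points`, and `propagate_in_video` come from the launch-era README and may have since changed), and it assumes the video has already been extracted into a folder of JPEG frames.

```python
import numpy as np
import torch

# Assumed import path from the public segment-anything-2 repository.
from sam2.build_sam import build_sam2_video_predictor

model_cfg = "sam2_hiera_l.yaml"                 # hypothetical config path
checkpoint = "checkpoints/sam2_hiera_large.pt"  # hypothetical checkpoint path

predictor = build_sam2_video_predictor(model_cfg, checkpoint)

with torch.inference_mode():
    # The video is assumed to be pre-extracted into a directory of JPEG frames.
    state = predictor.init_state(video_path="./video_frames")

    # Prompt once, on frame 0, with a single click on the object to track.
    predictor.add_new_points(
        state,
        frame_idx=0,
        obj_id=1,
        points=np.array([[210, 350]], dtype=np.float32),
        labels=np.array([1], dtype=np.int32),   # 1 = foreground point
    )

    # The model then propagates that mask through the remaining frames.
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        masks = (mask_logits > 0.0).cpu().numpy()  # boolean mask per tracked object
        # ...save or overlay `masks` for this frame...
```

The single-prompt, propagate-everywhere workflow is what makes the video extension notable: the object is specified once and the model follows it across frames without per-frame annotation.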
Zuckerberg stated that the model was trained on an extensive amount of data, with the company releasing an annotated database of about 50,000 videos specifically created for SA2’s training. Additionally, the model was trained on a second database containing over 100,000 videos, though this one is not being made public. While Zuckerberg did not provide a reason, it is reasonable to assume that these videos are likely user-generated content from Facebook and Instagram.
During the chat, Zuckerberg acknowledged to Huang that while most of the company’s AI research is open-source, they still maintain commercial interests.
He said, “We’re not doing this because we’re altruistic people, even though I think that this is going to be helpful for the ecosystem — we’re doing it because we think that this is going to make the thing that we’re building the best.”
Digital Twins for Influencers
In the discussion, Zuckerberg also shared his vision of a future where Facebook and Instagram could create AI replicas of social media influencers and content creators, functioning as “an agent or assistant that their community can interact with.”
He explained that some creators don’t have enough time to engage with their followers as much as they would like. By using a digital twin, influencers could interact directly with their followers through messaging, he noted.
Zuckerberg said that when creators cannot interact with all of their followers directly, “the next best thing is to enable people to build digital agents trained on material that represents them in the way they want.”
Meta’s goal is to pull in all of a user’s content so it can quickly set up a business agent that can “interact with customers, handle sales, and provide customer support,” he added.