Stability AI introduced its latest innovation, the Stable Video 3D (SV3D) model, on Monday, marking another milestone for the company in artificial intelligence (AI) technology. Unlike text-driven video generators such as OpenAI’s Sora, Runway AI, and Pika 1.0, SV3D does not rely on text prompts; instead, its primary function is to transform a single 2D image into an orbital video of a 3D model. The company has made the model available for both commercial and non-commercial use.
The official announcement, shared by Stability AI on X (formerly known as Twitter), described Stable Video 3D as a generative model built on Stable Video Diffusion that delivers greatly improved quality and multi-view consistency over the company's earlier 3D models. Notably, the release follows closely on the heels of Stability AI’s unveiling of Stable Diffusion 3, which is aimed at improving performance on multi-subject prompts.
The Stable Video 3D AI model comes in two distinct variants: SV3D_u and SV3D_p. The former generates orbital videos from a single image input without camera conditioning: it converts a 2D image into a 3D render, but the camera circles the subject along a fixed, predefined orbit rather than a user-controlled path. SV3D_p, the more advanced variant, offers greater versatility by accommodating both single images and orbital views, enabling it to produce fully rendered 3D videos along user-specified camera paths.
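To make the distinction between the two variants concrete, here is a minimal, hypothetical Python sketch of the kind of camera-path input each one implies. The helper names (`make_static_orbit`, `make_custom_orbit`), the frame count, and the angle conventions are illustrative assumptions, not Stability AI's actual API.

```python
import numpy as np

# Hypothetical illustration of the camera conditioning that separates the
# two SV3D variants: SV3D_u follows a fixed, uniform orbit, while SV3D_p
# can be conditioned on an arbitrary user-specified camera path.

NUM_FRAMES = 21  # assumption: length of the generated orbital video

def make_static_orbit(elevation_deg: float = 10.0) -> np.ndarray:
    """SV3D_u-style orbit: constant elevation, azimuth swept 0-360 degrees."""
    azimuths = np.linspace(0.0, 360.0, NUM_FRAMES, endpoint=False)
    elevations = np.full(NUM_FRAMES, elevation_deg)
    return np.stack([azimuths, elevations], axis=1)  # shape: (frames, 2)

def make_custom_orbit(az_keyframes, el_keyframes) -> np.ndarray:
    """SV3D_p-style orbit: azimuth/elevation interpolated through keyframes."""
    t = np.linspace(0.0, 1.0, NUM_FRAMES)
    k = np.linspace(0.0, 1.0, len(az_keyframes))
    azimuths = np.interp(t, k, az_keyframes)
    elevations = np.interp(t, k, el_keyframes)
    return np.stack([azimuths, elevations], axis=1)

# SV3D_u: the orbit is fixed; no camera conditioning is passed to the model.
static_path = make_static_orbit()

# SV3D_p: a dynamic orbit that rises and dips while circling the subject.
dynamic_path = make_custom_orbit(
    az_keyframes=[0.0, 120.0, 240.0, 360.0],
    el_keyframes=[0.0, 25.0, -15.0, 0.0],
)
print(static_path.shape, dynamic_path.shape)  # (21, 2) (21, 2)
```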
According to Stability AI, the SV3D model addresses the multi-view inconsistency seen in older-generation models like Stable Zero123. Its outputs are used to optimize Neural Radiance Field (NeRF) and mesh representations, which significantly improves the quality and consistency of the rendered 3D assets. Furthermore, to mitigate the challenge of baked-in lighting, Stable Video 3D incorporates a disentangled illumination model that is jointly optimized with the 3D shape and texture, as outlined in a blog post by Stability AI.
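As a rough illustration of what "disentangled illumination" means in practice, the sketch below separates a surface's intrinsic color (albedo) from a learned shading term computed from surface normals. The low-order spherical-harmonics formulation is a common choice in inverse rendering and is assumed here for illustration; it is not a detail confirmed by Stability AI's post.

```python
import numpy as np

# Sketch of a disentangled illumination model, assuming (as is common in
# inverse-rendering work) a low-order spherical-harmonics (SH) lighting term.
# Color is factored as albedo * shading, so lighting is never "baked in" to
# the texture; the albedo, the SH coefficients, and the 3D shape can then
# be jointly optimized.

def sh_basis(normals: np.ndarray) -> np.ndarray:
    """First two SH bands (4 terms) evaluated at unit surface normals (N, 3)."""
    x, y, z = normals[:, 0], normals[:, 1], normals[:, 2]
    c0 = 0.282095  # constant band
    c1 = 0.488603  # linear band
    return np.stack([np.full_like(x, c0), c1 * y, c1 * z, c1 * x], axis=1)

def render_color(albedo: np.ndarray, normals: np.ndarray,
                 sh_coeffs: np.ndarray) -> np.ndarray:
    """albedo: (N, 3) intrinsic color; sh_coeffs: (4,) learned lighting."""
    shading = sh_basis(normals) @ sh_coeffs         # (N,) scalar shading
    shading = np.clip(shading, 0.0, None)[:, None]  # disallow negative light
    return albedo * shading                         # lighting stays separate

# Toy usage: one upward-facing surface point, lit mostly from above.
albedo = np.array([[0.8, 0.2, 0.2]])
normals = np.array([[0.0, 0.0, 1.0]])
sh_coeffs = np.array([1.0, 0.0, 0.5, 0.0])
print(render_color(albedo, normals, sh_coeffs))
```

Because the shading term depends only on geometry and the lighting coefficients, the optimized texture stays lighting-free, which is the point of disentangling illumination from appearance.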