Why I Chose the Problem of 4D Scene Reconstruction from RGB Videos

I envision a future where people can watch the 2030 World Cup live as holograms through augmented reality, transforming their living rooms into the stadium. Imagine seeing Mbappe playing right on the floor of your home, seamlessly blending the excitement of live sports with the immersive experience of AR.

In tackling any significant problem, I believe in setting objective and measurable goals. Inspired by the principles in The One Thing by Gary W. Keller and Jay Papasan, I use time-blocking to target, track, and amend the path to my goals. This disciplined approach allows me to systematically work towards achieving my ambitions.

Building a World Model

By world model, I mean a model that can predict future frames from past frames and decompose what it observes into actions on its own.

I aim to create an AI model that learns from temporal inputs of the world and can solve complex vision tasks. The model needs to be expansive, to decompose actions on its own, and, most importantly, to predict future frames from past frames. It must learn the underlying reality of the world, grounded in its temporal sequence. This approach, though technically demanding, is essential for developing a robust AI that understands and interacts with the world as we do.
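To make "predicting future frames from past frames" concrete, here is a minimal sketch of the objective. It is not a world model: it uses a naive constant-velocity baseline in place of a learned predictor, and it scores the guess against the real next frame with mean squared error. The names `predict_next_frame` and `mse` and the toy `video` are hypothetical, introduced only for illustration.

```python
from typing import List

# Assumption: a frame is a small grayscale image stored as rows of floats.
Frame = List[List[float]]


def predict_next_frame(prev: Frame, curr: Frame) -> Frame:
    """Naive baseline: assume each pixel keeps changing at the same rate."""
    return [
        [2.0 * c - p for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def mse(a: Frame, b: Frame) -> float:
    """Mean squared error between two frames of the same shape."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += (x - y) ** 2
            count += 1
    return total / count


# Toy 2x2 "video" in which two pixels brighten linearly over time.
video = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[0.1, 0.0], [0.0, 0.2]],
    [[0.2, 0.0], [0.0, 0.4]],
]

# Predict frame 2 from frames 0 and 1, then compare with the true frame 2.
predicted = predict_next_frame(video[0], video[1])
error = mse(predicted, video[2])

# Reference point: simply repeating the last seen frame as the prediction.
copy_last_error = mse(video[1], video[2])
```

A learned world model would replace `predict_next_frame` with a network trained to minimize this kind of error over many videos. The baseline only shows what is being predicted and how a prediction can be scored.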

Broad problems, by their nature, lack clear deadlines and measurable progress indicators. They are open-domain research challenges. To make meaningful progress, I focus on a more defined sub-problem: reconstructing 4D scenes from RGB videos to enable a hologram experience of the 2030 World Cup.

The Importance of Reconstruction in 4D Scene Modeling

Reconstructing 4D scenes from sparse RGB video inputs involves both reconstruction of what the cameras see and completion of what they do not. A model for this task should not only address sparse-view 4D reconstruction but also learn the fundamental realities of the world, enabling a deeper understanding and representation of reality.

Long-Term Vision: AI Grounded in Reality

The development of a world model extends beyond 4D reconstruction. It lays the foundation for creating AI systems grounded in reality, capable of generating outputs based on the true nature of the world. My life’s goal is to work on AI models that can teach any concept to any person through any modality, making education and knowledge dissemination universally accessible and effective.

Impact and Reward

Contributing to a technological breakthrough that transforms how millions of people capture, distribute, and experience reality is a profoundly motivating long-term reward. The potential impact of my research on everyday lives drives my commitment to this problem.