Department Talks

Generative Rendering and Beyond

Talk

02 May 2024 • 17:00—18:00

Shengqu Cai

Hybrid

Traditional 3D content creation tools empower users to bring their imagination to life by giving them direct control over a scene's geometry, appearance, motion, and camera path. Creating computer-generated videos, however, is a tedious manual process, which can be automated by emerging text-to-video diffusion models (e.g., Sora). Despite great promise, video diffusion models are difficult to control, hindering users' creativity rather than amplifying it. In this talk, we present a novel approach called Generative Rendering that combines the controllability of dynamic 3D meshes with the expressivity and editability of emerging diffusion models. Our approach takes an animated, low-fidelity rendered mesh as input and injects the ground truth correspondence information obtained from the dynamic mesh into various stages of a pre-trained text-to-image generation model to output high-quality and temporally consistent frames. Going beyond, we will discuss the various challenges and goals towards achieving controllability in video diffusion models, and conclude with a preview of our ongoing consensus video generation efforts.

Organizers: Shrisha Bharadwaj, Michael Black

Archived Talks

Modeling and Reconstructing Garments with Sewing Patterns

Talk

04 April 2024 • 14:00—15:00

Maria Korosteleva

N3.022

The problems of creating new garments (modeling) or reproducing existing ones (reconstruction) appear in various fields, from fashion production to digital human modeling for the metaverse. The talk introduces approaches to a novel garment creation paradigm: programming-based parametric sewing pattern construction and its application to generating rich synthetic datasets of garments with sewing patterns. We will then discuss how the availability of ground truth sewing patterns allows posing the learning-based garment reconstruction problem as sewing pattern recovery. This reformulation enables obtaining high-quality 3D garment models from sparse point clouds with effective design generalization while simultaneously providing a designer-friendly garment representation for further use in traditional garment processing pipelines.

Organizers: Yao Feng, Michael Black

Geometric Regularizations for 3D Shape Generation

Talk

13 March 2024 • 15:00—16:00

Qixing Huang

N3.022

Generative models, which map a latent parameter space to instances in an ambient space, enjoy various applications in 3D Vision and related domains. A standard scheme of these models is probabilistic, which aligns the induced ambient distribution of a generative model from a prior distribution of the latent space with the empirical ambient distribution of training instances. While this paradigm has proven to be quite successful on images, its current applications in 3D generation encounter fundamental challenges in the limited training data and generalization behavior. The key difference between image generation and shape generation is that 3D shapes possess various priors in geometry, topology, and physical properties. Existing probabilistic 3D generative approaches do not preserve these desired properties, resulting in synthesized shapes with various types of distortions. In this talk, I will discuss recent work that seeks to establish a novel geometric framework for learning shape generators. The key idea is to model various geometric, physical, and topological priors of 3D shapes as suitable regularization losses by developing computational tools in differential geometry and computational topology. We will discuss the applications in deformable shape generation, latent space design, joint shape matching, and 3D man-made shape generation.

Organizers: Yuliang Xiu

Mining Visual Knowledge from Large Pre-trained Models

Talk

18 January 2024 • 15:00—16:00

Luming Tang

N3.022

Computer vision has made huge progress in the past decade with the dominant supervised learning paradigm, that is, training large-scale neural networks on each task with ever larger datasets. However, in many cases, scalable data or annotation collection is intractable. In contrast, humans can easily adapt to new vision tasks with very little data or labels. To bridge this gap, we found that rich visual knowledge exists in large pre-trained models, i.e., models trained on internet-scale images with either self-supervised or generative objectives. We proposed different techniques to extract this implicit knowledge and use it to accomplish specific downstream tasks where data is constrained, including recognition, dense prediction, and generation. Specifically, I’ll mainly present the following three works. Firstly, I will introduce an efficient and effective way to adapt pre-trained vision transformers to a variety of low-shot downstream tasks while tuning less than 1 percent of the model parameters. Secondly, I will show that accurate visual correspondences emerge from a strong generative model (i.e., a diffusion model) without any supervision. Following that, I will demonstrate that an adapted diffusion model is able to complete a photo with true scene contents using only a few casually captured reference images.

Organizers: Yuliang Xiu, Yandong Wen

RAVEN: Rethinking Adversarial Video generation with Efficient tri-plane Networks

Talk

30 November 2023 • 10:00 am—11:00 am

Partha Ghosh

N3.022 Aquarium and Zoom

We present a novel unconditional video generative model designed to address long-term spatial and temporal dependencies. To capture these dependencies, our approach incorporates a hybrid explicit-implicit tri-plane representation, inspired by 3D-aware generative frameworks developed for three-dimensional object representation, and employs a single latent code to model an entire video sequence. Individual video frames are then synthesized from an intermediate tri-plane representation, which itself is derived from the primary latent code. This novel strategy reduces computational complexity by a factor of 2 as measured in FLOPs. Consequently, our approach facilitates the efficient and temporally coherent generation of videos. Moreover, our joint frame modeling approach, in contrast to autoregressive methods, mitigates the generation of visual artifacts. We further enhance the model's capabilities by integrating an optical flow-based module within our Generative Adversarial Network (GAN) based generator architecture, thereby compensating for the constraints imposed by a smaller generator size. As a result, our model is capable of synthesizing high-fidelity video clips at a resolution of 256×256 pixels, with durations extending to more than 5 seconds at a frame rate of 30 fps. The efficacy and versatility of our approach are empirically validated through qualitative and quantitative assessments across three different datasets comprising both synthetic and real video clips.

Organizers: Yandong Wen

Orthogonal Butterfly: Parameter-Efficient Orthogonal Adaptation of Foundation Models via Butterfly Factorization

Talk

19 October 2023 • 10:00 am—11:00 am

Weiyang Liu

N3.022 Aquarium and Zoom

Large foundation models are becoming ubiquitous, but training them from scratch is prohibitively expensive. Thus, efficiently adapting these powerful models to downstream tasks is increasingly important. In this work, we study a principled finetuning paradigm, Orthogonal Finetuning (OFT), for downstream task adaptation. Despite demonstrating good generalizability, OFT still uses a fairly large number of trainable parameters due to the high dimensionality of orthogonal matrices. To address this, we start by examining OFT from an information transmission perspective, and then identify a few key desiderata that enable better parameter-efficiency. Inspired by how the Cooley-Tukey fast Fourier transform algorithm enables efficient information transmission, we propose an efficient orthogonal parameterization using butterfly structures. We apply this parameterization to OFT, creating a novel parameter-efficient finetuning method, called Orthogonal Butterfly (BOFT). By subsuming OFT as a special case, BOFT introduces a generalized orthogonal finetuning framework. Finally, we conduct an extensive empirical study of adapting large vision transformers, large language models, and text-to-image diffusion models to various downstream tasks in computer vision and natural language. The results validate the effectiveness of BOFT as a generic finetuning method.
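The butterfly idea in the abstract can be illustrated with a small sketch. This is not the speaker's implementation: it assumes the simplest case, where each butterfly factor is built from 2×2 rotations, and the actual BOFT block parameterization may differ. The function names `butterfly_factor` and `butterfly_orthogonal` are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def butterfly_factor(d, stride, angles):
    """One butterfly factor: 2x2 rotations mixing index i with index i + stride."""
    B = np.zeros((d, d))
    k = 0
    for start in range(0, d, 2 * stride):
        for offset in range(stride):
            i = start + offset
            j = i + stride
            c, s = np.cos(angles[k]), np.sin(angles[k])
            B[i, i], B[i, j] = c, -s
            B[j, i], B[j, j] = s, c
            k += 1
    return B

def butterfly_orthogonal(d, angles):
    """Multiply log2(d) butterfly factors; angles has shape (log2(d), d // 2)."""
    R = np.eye(d)
    stride = 1
    for level_angles in angles:
        R = butterfly_factor(d, stride, level_angles) @ R
        stride *= 2  # same doubling pattern as the Cooley-Tukey FFT
    return R

d = 8  # must be a power of two in this sketch
levels = int(np.log2(d))
rng = np.random.default_rng(0)
angles = rng.normal(scale=0.1, size=(levels, d // 2))  # trainable parameters

R = butterfly_orthogonal(d, angles)
W_pretrained = rng.normal(size=(d, d))  # stand-in for a frozen weight matrix
W_adapted = R @ W_pretrained            # orthogonal finetuning: rotate the weights
```

In this sketch the orthogonal matrix uses `levels * d // 2` angles (12 for `d = 8`), compared with `d * (d - 1) // 2` (28) degrees of freedom for an unconstrained orthogonal matrix of the same size, and setting all angles to zero recovers the pretrained weights unchanged.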

Organizers: Yandong Wen

Ghost on the Shell: An Expressive Representation of General 3D Shapes

Talk

12 October 2023 • 10:00 am—11:00 am

Zhen Liu

Hybrid

The creation of photorealistic virtual worlds requires the accurate modeling of 3D surface geometry for a wide range of objects. For this, meshes are appealing since they 1) enable fast physics-based rendering with realistic material and lighting, 2) support physical simulation, and 3) are memory-efficient for modern graphics pipelines. Recent work on reconstructing and statistically modeling 3D shape, however, has critiqued meshes as being topologically inflexible. To capture a wide range of object shapes, any 3D representation must be able to model solid, watertight shapes as well as thin, open surfaces. Recent work has focused on the former, and methods for reconstructing open surfaces do not support fast reconstruction with material and lighting or unconditional generative modelling. Inspired by the observation that open surfaces can be seen as islands floating on watertight surfaces, we parametrize open surfaces by defining a manifold signed distance field on watertight templates. With this parametrization, we further develop a grid-based and differentiable representation that parametrizes both watertight and non-watertight meshes of arbitrary topology. Our new representation, called Ghost-on-the-Shell (G-Shell), enables two important applications: differentiable rasterization-based reconstruction from multiview images and generative modelling of non-watertight meshes. We empirically demonstrate that G-Shell achieves state-of-the-art performance on non-watertight mesh reconstruction and generation tasks, while also performing effectively for watertight meshes.

Organizers: Yandong Wen

Scaling up 3D content generation via 3D grounding for representation, data and algorithm

Talk

07 September 2023 • 10:00—11:00

Jun Gao

Virtual (Zoom)

Creating 3D virtual worlds will require generating diverse and high-quality 3D content that mimics the intricacies of the real 3D world. While machine learning has achieved significant success in image and video generation, its application in 3D content generation encounters fundamental challenges in the scarcity of 3D training data and the increased complexities inherent in three dimensions. We approach the problem of 3D content generation by revisiting the 3D grounding for the representation, data and algorithms. First, we introduce a differentiable 3D representation that bridges neural fields with meshes via differentiable isosurfacing. This enables us not only to generate 3D meshes with varying topologies but also to regularize neural fields through the mesh. Second, we exploit 2D data priors to facilitate text-to-3D generation with a coarse-to-fine generation recipe. Specifically, we use our differentiable isosurfacing to extract 3D meshes and differentiably render high-resolution images, which enables the generation of high-frequency details in geometry and textures from the text. Lastly, we develop a 3D generative algorithm that can generate high-quality meshes with textures by enforcing a 3D bottleneck in the generation process while supervising 2D images through differentiable rendering.

Organizers: Yao Feng

SunStage: Portrait Reconstruction and Relighting using the Sun as a Light Stage

Talk

24 August 2023 • 10:00—11:00

Yifan Wang

Virtual (Zoom)

A light stage acquires the shape and material properties of a face in high detail using a series of images captured under synchronized cameras and lights. This captured information can be used to synthesize novel images of the subject under arbitrary lighting conditions or from arbitrary viewpoints. This process enables a number of visual effects, such as creating digital replicas of actors that can be used in movies or high-quality postproduction relighting. In many cases, however, it is infeasible to get access to a light stage for capturing a particular subject, because light stages are not easy to find: they are expensive and require significant technical expertise (often teams of people) to build and operate. In this talk, we will delve into a lightweight alternative to a light stage that captures comparable data using only a smartphone camera and the sun, which we dub SunStage. Our method only requires the user to capture a selfie video outdoors, rotating in place, and uses the varying angles between the sun and the face as guidance in joint reconstruction of facial geometry, reflectance, camera pose, and lighting parameters. Despite the in-the-wild uncalibrated setting, SunStage is able to reconstruct detailed facial appearance and geometry, enabling compelling effects such as relighting, novel view synthesis, and reflectance editing.

Organizers: Yao Feng

Face Exploration - Capture all Degrees of Freedom of the Face

Talk

17 August 2023 • 10:00 am—11:00 am

Claudia Gallatz

N3.022 Aquarium and Zoom

High-quality data capture is decisive for your scientific work. As a member of the data team, it is a core task of my daily routine to ensure good quality standards in this field. My talk will shed light on the background of this work, starting from scanner set-up and the corresponding data outcome, with a focus on the Face Scanner. This is work that every scientist can benefit from in their own projects. I will take the occasion to present our most recent face capture study, named FACE EXPLORATION, of which Timo Bolkart is the leading scientist. A selection of representative sequences including facial movements and expressions will be demonstrated, along with general information on the protocol and participating subjects. Further, I would like to point out some parallels between actors and computer scientists by approaching the topic of facial expression, based on my experience as an actress prior to my work at PS.

Organizers: Yandong Wen

Full-body avatars from single images and textual guidance

Talk

13 July 2023 • 10:00—11:00

Yangyi Huang

The reconstruction of the full-body appearance of clothed humans from single-view RGB images is a crucial yet challenging task, primarily due to depth ambiguities and the absence of observations from unseen regions. While existing methods have shown impressive results, they still suffer from limitations such as over-smooth surfaces and blurry textures, particularly lacking details at the backside of the avatar. In this talk, I will delve into how we have addressed these limitations by leveraging text guidance and pretrained text-image models, introducing two novel methods. Firstly, I will present ELICIT, a data-efficient approach that utilizes a SMPL-based human body prior and a CLIP-based semantic prior to create an animatable human NeRF from a single image. This method tackles the challenge of creating a detailed back-side appearance using a CLIP embedding loss. Secondly, I will introduce TeCH, our latest project for reconstructing high-fidelity 3D clothed humans with consistent texture maps and detailed geometry. This approach employs a hybrid mesh representation and pretrained 2D text-to-image diffusion models to achieve remarkable results. Through these advancements, we aim to push the boundaries of creating digital humans, bridging the gap between single-image inputs and the creation of fully textured and realistic 3D avatars.

Organizers: Hongwei Yi
