Hello, I'm Sifei Liu

I am a Senior Research Scientist (staff level) at NVIDIA, where I am part of the Learning and Perception Research (LPR) team led by Jan Kautz. My work focuses on generalizable visual representation learning for images, videos, and 3D content. Before joining NVIDIA, I received my Ph.D. from the VLLAB, advised by Ming-Hsuan Yang.

I have been fortunate to receive several awards and recognitions: the Baidu Graduate Fellowship in 2013, the NVIDIA Pioneering Research Award in 2017, selection as an EECS Rising Star in 2019, and a nomination for the VentureBeat Women in AI Award in 2020.


News

  • May 2023: We’ve released the code for Affordance Diffusion.
  • Apr 2023: We will host the tutorial “Learning with Noisy and Unlabeled Data for Large Models beyond Categorization” at ICCV 2023. See you in Paris!
  • Mar 2023: Four papers accepted at CVPR 2023 on representation learning: (i) affordance generation, (ii) an open-vocabulary discriminative network (highlight, top 2.5% of papers), (iii) zero-shot pose transfer, and (iv) self-supervised 3D reconstruction.

Recent Research

A full list of publications is available on my Google Scholar profile and in my CV.

TUVF: Learning Generalizable Texture UV Radiance Fields

arXiv 2023

The paper introduces TUVF, which learns texture fields in a canonical UV space shared across a category of shapes; disentangling texture from geometry in this way lets the model synthesize realistic textures that generalize across different 3D shapes of the category.
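
As a rough illustration, here is a minimal PyTorch-style sketch, not the released TUVF code: a texture field conditioned on a per-instance latent and queried at canonical UV-sphere coordinates, so the same field can texture any shape mapped onto that canonical surface. All names below are hypothetical.

```python
import torch
import torch.nn as nn

class CanonicalTextureField(nn.Module):
    """Colors points on a canonical unit sphere, conditioned on a texture code."""
    def __init__(self, code_dim=64, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + code_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid())  # RGB in [0, 1]

    def forward(self, uv_points, texture_code):
        # uv_points: (N, 3) samples on the canonical sphere
        # texture_code: (code_dim,) latent for one texture instance
        code = texture_code.expand(uv_points.shape[0], -1)
        return self.mlp(torch.cat([uv_points, code], dim=-1))

field = CanonicalTextureField()
pts = torch.nn.functional.normalize(torch.randn(1024, 3), dim=-1)
rgb = field(pts, torch.randn(64))  # (1024, 3) colors on the canonical surface
```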

Affordance Diffusion: Synthesizing Hand-Object Interactions

CVPR 2023

Given an image of an isolated object, the paper synthesizes plausible hand-object interactions with diffusion models, building on the classic idea of disentangling where to interact (layout) from how to interact (content).
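
The where/how split can be pictured as a two-stage pipeline. In the paper both stages are diffusion models; in this hypothetical PyTorch sketch, plain networks stand in for them and all module names are made up.

```python
import torch
import torch.nn as nn

class LayoutNet(nn.Module):
    """Predicts *where* to interact: a coarse hand layout
    (here, a 2D palm location and a scale) from the object image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 3))

    def forward(self, object_image):
        return self.net(object_image)  # (B, 3): x, y, scale

class ContentNet(nn.Module):
    """Synthesizes *how* to interact: hand appearance conditioned on
    the object image and the predicted layout."""
    def __init__(self):
        super().__init__()
        self.fuse = nn.Conv2d(3 + 3, 3, 3, padding=1)

    def forward(self, object_image, layout):
        b, _, h, w = object_image.shape
        layout_map = layout.view(b, 3, 1, 1).expand(b, 3, h, w)
        return self.fuse(torch.cat([object_image, layout_map], dim=1))

obj = torch.randn(1, 3, 64, 64)
layout = LayoutNet()(obj)
hoi_image = ContentNet()(obj, layout)  # (1, 3, 64, 64) synthesized interaction
```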

Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models

CVPR 2023

We present ODISE: Open-vocabulary DIffusion-based panoptic SEgmentation, which unifies pre-trained text-image diffusion and discriminative models to perform open-vocabulary panoptic segmentation.
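
One way to picture the open-vocabulary step: pool dense features inside each predicted mask and match the pooled embedding against text embeddings of arbitrary category names. The sketch below is a hypothetical simplification in PyTorch; in ODISE the dense features come from a frozen text-to-image diffusion UNet and the category embeddings from a pre-trained text encoder.

```python
import torch

def classify_masks(features, masks, text_embeds):
    # features: (C, H, W) dense image features
    # masks: (M, H, W) binary mask proposals
    # text_embeds: (K, C) one embedding per category name
    flat = features.flatten(1)                   # (C, H*W)
    w = masks.flatten(1).float()                 # (M, H*W)
    pooled = w @ flat.t() / w.sum(1, keepdim=True).clamp(min=1)  # (M, C)
    pooled = torch.nn.functional.normalize(pooled, dim=-1)
    text = torch.nn.functional.normalize(text_embeds, dim=-1)
    return (pooled @ text.t()).argmax(dim=-1)    # (M,) category index per mask

feats = torch.randn(256, 32, 32)
masks = torch.rand(5, 32, 32) > 0.5
labels = classify_masks(feats, masks, torch.randn(10, 256))
```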

CoordGAN: Self-Supervised Dense Correspondences Emerge from GANs

CVPR 2022

This work introduces Coordinate GAN (CoordGAN), a structure-texture disentangled GAN that learns a dense correspondence map for each generated image.
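
The core idea can be sketched as a coordinate-warping network: the structure code deforms a shared canonical coordinate grid into an image-specific correspondence map, so pixels of two generated images that land on the same canonical coordinate correspond. A hypothetical, simplified PyTorch sketch:

```python
import torch
import torch.nn as nn

class CoordWarp(nn.Module):
    """Warps a canonical coordinate grid according to a structure code."""
    def __init__(self, code_dim=32, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 + code_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2), nn.Tanh())

    def forward(self, structure_code, h, w):
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
        grid = torch.stack([xs, ys], dim=-1).view(-1, 2)  # canonical coords
        code = structure_code.expand(grid.shape[0], -1)
        return self.mlp(torch.cat([grid, code], dim=-1)).view(h, w, 2)

warp = CoordWarp()
corr_map = warp(torch.randn(32), 16, 16)  # (16, 16, 2) correspondence map
```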

GroupViT: Semantic Segmentation Emerges from Text Supervision

CVPR 2022

This paper proposes a hierarchical Grouping Vision Transformer (GroupViT), which learns to group image regions into progressively larger arbitrary-shaped segments.
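
A single grouping stage can be pictured as learnable group tokens that attend to image tokens and absorb them into fewer segment tokens. The sketch below is a hypothetical simplification in PyTorch (GroupViT additionally uses hard, differentiable assignment via Gumbel-softmax):

```python
import torch
import torch.nn as nn

class GroupingBlock(nn.Module):
    """Merges N image tokens into G segment tokens via soft assignment."""
    def __init__(self, dim=64, num_groups=8):
        super().__init__()
        self.group_tokens = nn.Parameter(torch.randn(num_groups, dim))
        self.q, self.k = nn.Linear(dim, dim), nn.Linear(dim, dim)

    def forward(self, tokens):
        # tokens: (N, dim) image tokens -> (num_groups, dim) segment tokens
        attn = self.q(self.group_tokens) @ self.k(tokens).t()  # (G, N)
        assign = attn.softmax(dim=0)  # each token distributed over groups
        assign = assign / assign.sum(dim=1, keepdim=True).clamp(min=1e-6)
        return assign @ tokens        # weighted average of tokens per group

block = GroupingBlock()
segments = block(torch.randn(196, 64))  # 196 tokens -> 8 segment tokens
```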

Autoregressive 3D Shape Generation via Canonical Mapping

ECCV 2022

The paper presents a transformer-based approach to 3D point cloud generation. The key idea is to decompose a point cloud into a sequence of semantically meaningful shape compositions, which are then modeled by an autoregressive transformer to generate new shapes.
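
The generation step can be pictured as standard autoregressive sequence modeling over discrete shape tokens. A hypothetical PyTorch sketch, with the canonical mapping and vector quantization abstracted away into a pre-tokenized sequence:

```python
import torch
import torch.nn as nn

vocab, seq_len, dim = 512, 128, 64
embed = nn.Embedding(vocab, dim)
layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
prior = nn.TransformerEncoder(layer, num_layers=2)
head = nn.Linear(dim, vocab)

tokens = torch.randint(0, vocab, (1, seq_len))   # quantized shape-composition codes
mask = nn.Transformer.generate_square_subsequent_mask(seq_len)
hidden = prior(embed(tokens), mask=mask)         # causal (autoregressive) attention
logits = head(hidden)                            # next-token prediction
loss = nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, vocab), tokens[:, 1:].reshape(-1))
```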

Learning Continuous Image Representation with Local Implicit Image Function

CVPR 2021

The paper presents LIIF, a local implicit image function that represents an image as a continuous function: a decoder predicts the RGB value at any continuous query coordinate from nearby deep features, enabling, for example, image super-resolution at arbitrary scales.
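
In sketch form: an MLP takes a feature vector from an encoder's feature map plus the offset from the query coordinate to that feature's position, and predicts RGB at the query. The toy below uses nearest-feature lookup for brevity (the paper interpolates among local features); it is an illustration, not the released code.

```python
import torch
import torch.nn as nn

class LocalImplicitFunction(nn.Module):
    """Decodes RGB at continuous coordinates from a discrete feature map."""
    def __init__(self, feat_dim=64, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 2, hidden), nn.ReLU(),
            nn.Linear(hidden, 3))

    def forward(self, feat_map, coords):
        # feat_map: (C, H, W) from any encoder; coords: (N, 2) in [-1, 1]
        c, h, w = feat_map.shape
        ix = ((coords[:, 0] + 1) / 2 * (w - 1)).round().long().clamp(0, w - 1)
        iy = ((coords[:, 1] + 1) / 2 * (h - 1)).round().long().clamp(0, h - 1)
        feats = feat_map[:, iy, ix].t()              # (N, C) nearest feature
        centers = torch.stack(
            [ix / (w - 1) * 2 - 1, iy / (h - 1) * 2 - 1], dim=-1)
        return self.mlp(torch.cat([feats, coords - centers], dim=-1))

liif = LocalImplicitFunction()
rgb = liif(torch.randn(64, 16, 16), torch.rand(1000, 2) * 2 - 1)  # (1000, 3)
```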

Video Autoencoder: self-supervised disentanglement of static 3D structure and motion

ICCV 2021

This paper presents a video autoencoder for learning disentangled representations of 3D structure and camera pose from videos in a self-supervised manner.
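
The disentanglement can be pictured as a voxel bottleneck: one pathway encodes a frame into a 3D feature volume (static structure), another predicts relative camera pose, and a new view is re-rendered by rigidly transforming the volume. A hypothetical PyTorch sketch of the transform-and-project step (the crude depth-averaging projection is a placeholder for a learned decoder):

```python
import math
import torch
import torch.nn.functional as F

def render_from_voxels(voxels, rotation):
    # voxels: (B, C, D, H, W) scene structure; rotation: (B, 3, 3) relative pose
    b = voxels.shape[0]
    theta = torch.cat([rotation, torch.zeros(b, 3, 1)], dim=-1)  # (B, 3, 4)
    grid = F.affine_grid(theta, voxels.shape, align_corners=False)
    warped = F.grid_sample(voxels, grid, align_corners=False)
    return warped.mean(dim=2)  # project along depth -> (B, C, H, W)

voxels = torch.randn(1, 8, 16, 16, 16)
c, s = math.cos(0.3), math.sin(0.3)
rot = torch.tensor([[[c, -s, 0.], [s, c, 0.], [0., 0., 1.]]])
image_feat = render_from_voxels(voxels, rot)  # features for a 2D decoder
```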

Learning 3D Dense Correspondence via Canonical Point Autoencoder

NeurIPS 2021

The paper presents a canonical point autoencoder that maps the points of shapes within a category into a shared canonical space and reconstructs them back; dense correspondences across shapes emerge from the learned canonical mapping, without correspondence annotations.
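
In sketch form: an autoencoder maps every input point onto a shared canonical shape (here a unit sphere) and decodes it back; since all shapes of a category share the canonical space, points that land at the same canonical location are in correspondence. A hypothetical, simplified PyTorch sketch:

```python
import torch
import torch.nn as nn

class CanonicalPointAE(nn.Module):
    """Maps points to a canonical sphere and reconstructs the input shape."""
    def __init__(self, hidden=64):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(3, hidden), nn.ReLU(),
                                    nn.Linear(hidden, 3))
        self.decode = nn.Sequential(nn.Linear(3, hidden), nn.ReLU(),
                                    nn.Linear(hidden, 3))

    def forward(self, points):
        # points: (N, 3) -> canonical sphere coords and reconstruction
        canonical = torch.nn.functional.normalize(self.encode(points), dim=-1)
        return canonical, self.decode(canonical)

ae = CanonicalPointAE()
pts = torch.randn(2048, 3)
canonical, recon = ae(pts)
loss = (recon - pts).pow(2).mean()  # self-reconstruction supervises the mapping
```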

Joint-task self-supervised learning for temporal correspondence

NeurIPS 2019

This paper proposes to learn reliable dense correspondence from unlabeled videos in a self-supervised manner, by jointly training coarse region-level and fine-grained pixel-level matching tasks so that the two levels regularize each other.
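
A common way such features are used at test time is affinity-based label propagation: a soft affinity between two frames' feature maps carries labels (e.g., a segmentation mask) from one frame to the next. A hypothetical PyTorch sketch of that inference step:

```python
import torch

def propagate_labels(feat_a, feat_b, labels_a, temperature=0.07):
    # feat_a, feat_b: (C, N) per-pixel features of two frames
    # labels_a: (K, N) per-pixel label scores for frame a
    feat_a = torch.nn.functional.normalize(feat_a, dim=0)
    feat_b = torch.nn.functional.normalize(feat_b, dim=0)
    affinity = (feat_b.t() @ feat_a / temperature).softmax(dim=-1)  # (N, N)
    return labels_a @ affinity.t()  # (K, N) labels carried to frame b

fa, fb = torch.randn(128, 400), torch.randn(128, 400)
mask = torch.zeros(2, 400)
mask[0, :200] = 1  # toy two-class mask for frame a
mask[1, 200:] = 1
propagated = propagate_labels(fa, fb, mask)
```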