University of California, Merced
Ph.D. in EECS, advised by Ming-Hsuan Yang
Fall 2012 - Fall 2017
University of Science and Technology of China
M.S. in Electronic Engineering and Information Sciences, advised by Stan Z. Li.
Fall 2008 - Spring 2011


Sr. Research Scientist, NVIDIA Research
Nov 2017 - Present
Santa Clara, CA
Research Intern with NVIDIA Research.
March 2017 - Aug 2017
Santa Clara, CA
The Chinese University of Hong Kong
Visiting Scholar at MMLab.
Jul 2016 - Dec 2016
Baidu Inc.
Applied Scientist Intern at IDL. Worked on face parsing and beautification apps.
May 2013 - Jan 2016
Beijing, China

Workshop and tutorial organization

New Frontiers for Learning with Limited Labels or Data
ECCV 2020. (Co-organizer and Speaker)
Learning Representations via Graph-structured Networks Tutorial
CVPR 2019 and 2020. (Co-organizer and Speaker)

(Co)-Mentees at NVIDIA Research

Recent Publications

TUVF: Learning Generalizable Texture UV Radiance Fields
A. Cheng, X. Li, S. Liu, X. Wang
The paper introduces TUVF, a method for learning generalizable texture UV radiance fields.
arXiv, 2023
Affordance Diffusion: Synthesizing Hand-Object Interactions
The paper proposes a diffusion-based method for synthesizing hand-object interactions, building on the classic idea of disentangling where to interact (layout) from how to interact (content).
CVPR, 2023
Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models
We present ODISE (Open-vocabulary DIffusion-based panoptic SEgmentation), which unifies pre-trained text-image diffusion and discriminative models to perform open-vocabulary panoptic segmentation.
CVPR, 2023
CoordGAN: Self-Supervised Dense Correspondences Emerge from GANs
This work introduces Coordinate GAN (CoordGAN), a structure-texture disentangled GAN that learns a dense correspondence map for each generated image.
CVPR, 2022
GroupViT: Semantic Segmentation Emerges from Text Supervision
This paper proposes a hierarchical Grouping Vision Transformer (GroupViT), which learns to group image regions into progressively larger arbitrary-shaped segments.
CVPR, 2022
Autoregressive 3D Shape Generation via Canonical Mapping
The paper presents a transformer-based approach to 3D point cloud generation. The key idea is to decompose a point cloud into a sequence of semantically meaningful shape compositions, which are further encoded by an autoregressive model for point cloud generation.
ECCV, 2022
Learning Continuous Image Representation with Local Implicit Image Function
Y. Chen, S. Liu, X. Wang
The paper presents a method for learning continuous image representation with local implicit image function.
CVPR, 2021
Video Autoencoder: self-supervised disentanglement of static 3D structure and motion
Z. Lai, S. Liu, A. Efros, X. Wang
This paper presents a video autoencoder for learning disentangled representations of 3D structure and camera pose from videos in a self-supervised manner.
ICCV, 2021
Learning 3D Dense Correspondence via Canonical Point Autoencoder
The paper presents a method for learning 3D dense correspondence using a canonical point autoencoder.
NeurIPS, 2021
Joint-Task Self-Supervised Learning for Temporal Correspondence
This paper proposes to learn reliable dense correspondence from videos in a self-supervised manner.
NeurIPS, 2019