Research
I'm broadly interested in computer vision, multimodal learning, and robotics. Currently, I'm mainly
working on video understanding, with a focus on leveraging foundation models (LLMs, VLMs, etc.) to
solve multiple video understanding tasks. I'm also interested in offline decision making, especially
learning from videos. I believe the commonsense knowledge encoded in foundation models can help
solve robotic tasks faster and more robustly.
Goal-Conditioned Predictive Coding as an Implicit Planner for Offline Reinforcement Learning
Zilai Zeng, Ce Zhang, Shijie Wang, Chen Sun
NeurIPS, 2023
We investigate whether sequence modeling can condense trajectories into useful representations that contribute to policy learning. GCPC achieves competitive performance on AntMaze, FrankaKitchen, and Locomotion.
AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?
Qi Zhao*, Ce Zhang*, Shijie Wang, Changcheng Fu, Nakul Agarwal, Kwonjoon Lee, Chen Sun
arXiv, 2023
We use discretized action labels to represent videos, then feed these representations to LLMs for long-term action anticipation. Results on Ego4D, EK-55, and Gaze show that this simple approach is surprisingly effective.
Object-centric Video Representation for Long-term Action Anticipation
Changcheng Fu*, Ce Zhang*, Shijie Wang, Nakul Agarwal, Kwonjoon Lee, Chiho Choi, Chen Sun
In submission
This webpage is adapted from Jon Barron's page.