VLMNM Workshop @ ICRA 2024

Final Schedule

Time (JST)

Event

Description

Time (PDT)
(May 16)

8:30 - 8:50

Coffee and Pastries
Poster presenters set up posters

16:30 - 16:50

8:50 - 9:00

Introduction

16:50 - 17:00

9:10 - 9:35

LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks [Recording]

Prof. Subbarao Kambhampati | Arizona State University

17:10 - 17:35

9:35 - 10:00

LLM-based Task and Motion Planning for Robots [Recording]

Prof. Chuchu Fan | Massachusetts Institute of Technology

17:35 - 18:00

10:00 - 10:20

On the Challenges and Opportunities of Policy Learning for Mobile Manipulation [Recording]

Prof. Jeannette Bohg | Stanford University

18:00 - 18:20

10:25 - 10:45

Coffee Break and Poster Session

20 Mins

18:25 - 18:45

11:00 - 11:20

LLM-State: Adaptive State Representation for Long-Horizon Task Planning in the Open World [Recording]

Prof. David Hsu | National University of Singapore

19:00 - 19:20

11:20 - 11:40

LLMs for System 1 Generalization [Recording]

Prof. Yuke Zhu | University of Texas at Austin

19:20 - 19:40

11:40 - 12:00

Panel: Bridging the Gap between Research & Industry
Moderator: Naoki Wake, Microsoft Research

Chris Paxton
| Hello Robot

Takafumi Watanabe
| Preferred Robotics Inc.

Dr. Mohit Shridhar
| Dyson Robot Learning Lab

Prof. Lerrel Pinto
| NYU Courant

19:40 - 20:00

12:00 - 12:20

Demo: a Chat with Kachaka, a Home Robot

Takafumi Watanabe, Kenichi Hidai
| Preferred Robotics Inc.

20:00 - 20:20

12:40 - 13:30

Lunch Break

50 min

20:40 - 21:30

13:30 - 13:50

Language as Bridge for Sim2Real [Recording]

Prof. Roberto Martín-Martín | University of Texas at Austin

21:30 - 21:50

13:50 - 14:10

Foundation Models of and for Navigation [Recording]

Dhruv Shah | University of California, Berkeley

21:50 - 22:10

14:15 - 14:35

BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation [Recording]

Dr. Ruohan Zhang | Stanford University

22:15 - 22:35

14:35 - 15:05

Spotlight Talks (six)

RoboEXP: Action-Conditioned Scene Graph via Interactive Exploration for Robotic Manipulation
Ruihai Wu | Peking University
MOSAIC: A Modular System for Assistive and Interactive Cooking
Kushal Kedia | Cornell University
CoPa: General Robotic Manipulation through Spatial Constraints of Parts with Foundation Models
Haoxu Huang | Shanghai Qizhi Institute
Deploying and Evaluating LLMs to Program Service Mobile Robots
Zichao Hu | UT Austin
Language-Grounded Dynamic Scene Graphs for Interactive Object Search with Mobile Manipulation
Daniel Honerkamp | University of Freiburg
Language Models as Zero-Shot Trajectory Generators
Norman Di Palo | Imperial College London

[Recording]

22:35 - 23:05

15:10 - 16:00

Coffee Break and a Longer Poster Session

1 Hour

23:10 - 24:00

16:00 - 16:40

Debate: Are Large Foundation Models the most important research topic for the next 5 years? And various other questions.
Moderator: Nur Muhammad Mahi Shafiullah, New York University

Prof. Roberto Martín-Martín
| UT Austin

Dhruv Shah
| Berkeley

Ted Xiao
| Google DeepMind

Lin Shao
| NUS

GPT-4o*
| OpenAI

*The organizers may or may not be serious about this special guest.

00:00 - 00:40

16:40 - 16:55

Moderated Open Discussion: What’s Down the Horizon? / The 1 Billion Dollar Proposal

All in-person attendees

00:40 - 00:55

16:55 - 17:00

Best Paper Awards Ceremony and Closing Remarks

00:55 - 01:00


Call for Papers

We invite submissions including but not limited to the following topics:

Applications:

Integration of VLM/LLMs for manipulation and navigation
VLM/LLMs for perception/scene understanding/state estimation
VLM/LLMs for control/skill learning/motion generation
VLM/LLMs for decision-making/reasoning/planning
VLM/LLMs as world models
VLMs/LLMs for multimodal task specifications
VLMs/LLMs for human-robot/robot-robot interactions
VLMs/LLMs for scene and task generation

New Capabilities:

Open-vocabulary perception/navigation/manipulation
Commonsense reasoning with VLM/LLMs
Generalization to unseen object categories, environments, and tasks
Bootstrapping learning from scarce data
Natural language interaction with everyday users

Datasets/Benchmarks:

Internet-scale data for training robotics foundation models
Mobile manipulation benchmarks for VLM/LLM-based systems

Limitations:

Failure modes of VLM/LLMs
Robustness of VLM/LLMs
Certifiabilities of VLM/LLMs

Submissions should contain up to 4 pages of technical content, with no limit on references or appendices. We suggest following the ICRA double-column format, using the template available here. We encourage authors to upload videos, code, or data as supplementary material (due on the same day as the paper). Following the main conference, our workshop will use a single-blind review process. We welcome both unpublished, original contributions and recently published relevant works. Accepted papers will be presented as posters or orals and, with the authors’ consent, made public via the workshop’s OpenReview page. We strongly encourage at least one author of each accepted paper to present on-site during the workshop. Our workshop will feature a Best Paper Award.

Important Dates:

Submission portal opens: ~~January 29, 2024~~
Paper submission deadline: ~~March 11, Monday, 2024 (AoE)~~
Notification of acceptance: ~~March 29, 2024 (Results viewable on OpenReview)~~ April 1, 2024 (Announcing Spotlights)
Camera-ready deadline: ~~April 26, 2024~~
Workshop @ ICRA 2024: May 17, 2024

Chris Paxton | FAIR, Meta
Fei Xia | Google DeepMind
Karmesh Yadav | Georgia Tech
Nur Muhammad Mahi Shafiullah | New York University
Naoki Wake | Microsoft Research
Weiyu Liu | Stanford University
Yujin Tang | Sakana AI
Zhutian Yang | MIT, NVIDIA Research

Vision-Language Models for Navigation and Manipulation (VLMNM)

Full-day hybrid workshop at ICRA 2024, Room 315, Yokohama (Japan)

Friday, May 17, 9 am - 5 pm (JST)

Recordings of the invited talks can be found on our YouTube channel.
