Vision-Language Models for Navigation and Manipulation (VLMNM)


Full-day hybrid workshop at ICRA 2024

Friday, May 17, 2024 (Japan Standard Time, JST), Yokohama, Japan


The room number will be announced 1-2 weeks before the conference starts.



Introduction


With the rising capabilities of LLMs and VLMs, the past two years have seen a surge in research using VLMs for navigation and manipulation. By fusing visual interpretation with natural language understanding, these models are poised to redefine how robotic systems interact with both their environment and human counterparts. The relevance of this topic is hard to overstate: as the frontier of human-robot interaction expands, so does the need for robots to understand and operate within complex environments using naturalistic instructions. Our workshop reflects the state of the art in this domain by featuring a diverse set of speakers: senior academics and early-career researchers, industry researchers and companies producing mobile manipulation platforms, and researchers who are enthusiastic about using VLMs for robotics as well as those who have reservations about it. We aim for this event to be a catalyst for originality and diversity at ICRA 2024. We believe that, amidst a sea of workshops, ours will provide unique perspectives that push the boundaries of what is achievable in robot navigation and manipulation.

All accepted workshop papers are listed on OpenReview. Please bring a poster; we will provide double-sided A0 portrait poster boards.

FAQ


Are you going to record the talks and post them later on YouTube?
We will post recordings on YouTube for those speakers who give us permission. We will NOT post recordings of the panel discussion, the debate, or the open discussion at the end.
Can I present remotely if my paper is accepted as a poster or a spotlight talk?
We will play a pre-recorded video of your spotlight talk, and we strongly encourage you to ask a colleague attending in person to present your poster.

Final Schedule


Time (JST) | Event / Description | Time (PDT, one day earlier)
8:30 - 8:50 Coffee and Pastries
Poster presenters set up posters
15:30 - 15:50
8:50 - 9:00 Introduction by the Organizing Committee 15:50 - 16:00
9:00 - 9:20 LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks

Prof. Subbarao Kambhampati | Arizona State University
16:00 - 16:20
9:20 - 9:40 LLM-based Task and Motion Planning for Robots

Prof. Chuchu Fan | Massachusetts Institute of Technology
16:20 - 16:40
9:40 - 10:00 BEHAVIOR Benchmark

Dr. Ruohan Zhang | Stanford University
16:40 - 17:00
10:00 - 10:30 Coffee Break and Poster Session 30 Mins 17:00 - 17:30
10:30 - 10:50 LLM-State: Adaptive State Representation for Long-Horizon Task Planning in the Open World

Prof. David Hsu | National University of Singapore
17:30 - 17:50
10:50 - 11:10 LLM for Task Generation and Feedback

Prof. Yuke Zhu | University of Texas at Austin
17:50 - 18:10
11:10 - 11:30 Real robot demo

Takafumi Watanabe | Preferred Networks Robotics
18:10 - 18:30
11:30 - 12:00 Panel: Bridging the Gap between Research & Industry
Moderator: Naoki Wake, Microsoft Research
Panelists:

Chris Paxton | Hello Robot

Takafumi Watanabe | Preferred Networks Robotics

Dr. Mohit Shridhar | Dyson Robot Learning Lab
18:30 - 19:00
12:00 - 13:30 Lunch Break 1.5 Hours 19:00 - 20:30
13:30 - 13:50 Language as Bridge for Sim2Real

Prof. Roberto Martín-Martín | University of Texas at Austin
20:30 - 20:50
13:50 - 14:10 Foundation Models of and for Navigation

Dhruv Shah | University of California, Berkeley
20:50 - 21:10
14:10 - 14:30 Mobile Manipulation, Multi-Agent Coordination, Long Horizon Tasks

Prof. Jeannette Bohg | Stanford University
21:10 - 21:30
14:30 - 15:00 Spotlight Talks (six papers)
  • RoboEXP: Action-Conditioned Scene Graph via Interactive Exploration for Robotic Manipulation
  • MOSAIC: A Modular System for Assistive and Interactive Cooking
  • CoPa: General Robotic Manipulation through Spatial Constraints of Parts with Foundation Models
  • Deploying and Evaluating LLMs to Program Service Mobile Robots
  • Language-Grounded Dynamic Scene Graphs for Interactive Object Search with Mobile Manipulation
  • Language Models as Zero-Shot Trajectory Generators
21:30 - 22:00
15:00 - 16:00 Coffee Break and a Longer Poster Session 60 Mins 22:00 - 23:00
16:00 - 16:40 Debate: various topics
Panelists: To be announced
Moderator: Nur Muhammad Mahi Shafiullah, New York University
23:00 - 23:40
16:40 - 16:55 Moderated Open Discussion: What’s Down the Horizon? / The 1 Billion Dollar Proposal All in-person attendees 23:40 - 23:55
16:55 - 17:00 Best Paper Awards Ceremony and Closing Remarks by the Organizing Committee 23:55 - 00:00

Call for Papers


We invite submissions on topics related to vision-language models for navigation and manipulation.

Submission guidelines:
  • Submissions may have up to 4 pages of technical content, with no limit on references or appendices.
  • Submissions are suggested to follow the ICRA double-column format, with the template available here.
  • We encourage authors to upload videos, code, or data as supplementary material (due on the same day as the paper).
  • In line with the main conference, our workshop will use a single-blind review process.
  • We welcome both unpublished, original contributions and recently published relevant works.
  • Accepted papers will be presented as posters or spotlight talks and made public via the workshop’s OpenReview page with the authors’ consent.
  • We strongly encourage at least one of the authors to present on-site during the workshop.
  • Our workshop will feature a Best Paper Award.
Important Dates:

Organizers




Chris Paxton
FAIR, Meta


Fei Xia
Google Deepmind


Karmesh Yadav
Georgia Tech


Nur Muhammad Mahi Shafiullah
New York University


Naoki Wake
Microsoft Research


Weiyu Liu
Stanford University


Yujin Tang
Sakana AI


Zhutian Yang
MIT, NVIDIA Research

Contact


For further information or questions, please contact vlm-navigation-manipulation-workshop [AT] googlegroups [DOT] com

Access Map