MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge

Speaker

Linxi “Jim” Fan is a research scientist at NVIDIA AI. His mission is to build embodied general intelligence. To tackle this grand challenge, his research efforts span foundation models, policy learning, robotics, multimodal learning, and large-scale systems. His latest work, “MineDojo,” won the Outstanding Paper Award at NeurIPS 2022. He obtained his Ph.D. in Computer Science from Stanford University, advised by Prof. Fei-Fei Li. Previously, Jim did research internships at OpenAI, Google AI, and MILA-Quebec AI Institute. He graduated summa cum laude with a Bachelor’s degree in Computer Science from Columbia University, where he was the valedictorian of the Class of 2016 and a recipient of the Illig Medal.

Homepage: https://jimfan.me

Abstract

Autonomous agents have made great strides in specialist domains like Atari games and Go. However, they typically learn tabula rasa in isolated environments with limited objectives, thus failing to generalize across a wide spectrum of tasks and capabilities. Inspired by how humans continually learn and adapt in the open world, we advocate a trinity of ingredients for building generalist agents: 1) an environment that supports an infinite variety of tasks and goals, 2) a large-scale database of multimodal knowledge, and 3) a flexible and scalable agent architecture. We introduce MineDojo, a new framework built on the popular Minecraft game that features a simulation suite with thousands of diverse open-ended tasks and an internet-scale knowledge base with YouTube videos, Wiki pages, and Reddit posts. Using MineDojo’s data, we propose a novel agent capable of solving Minecraft tasks specified in free-form language without any manually designed reward. We look forward to seeing how MineDojo empowers the community to make progress on the grand challenge of open-ended agent learning.

Video