One doc tagged with "multimodal"

Vision Language Models for Robotics

Vision Language Models (VLAs) combine visual and linguistic understanding to enable robots to understand and execute natural language instructions.