Robotics AI Will Be Built on Specialized Data, Programs Building That Data Now Have a Real Advantage

Investment in embodied AI and robotics surged past $1 billion annually by 2024, and specialized training data has become the primary bottleneck: foundation models struggle to transfer from simulated or internet-based datasets to real-world physical tasks. Experts warn that millions of hours of annotated, multi-sensor datasets, captured under deployment conditions like variable lighting or material deformation, are needed to bridge this gap, and that synthetic data alone cannot replicate real-world variability.
Investment in general-purpose robotics and embodied AI grew fivefold between 2022 and 2024, exceeding $1 billion annually, according to McKinsey’s report on embodied AI. Where hardware limitations such as sensor costs and computational power once hindered progress, the new barrier is training data. Unlike language or vision models, which rely on internet-scale datasets, robots require billions of physical interaction examples to function reliably in real-world environments like warehouses or hospitals.

The lack of standardized, large-scale robotics datasets is a critical challenge, because embodied AI cannot scrape its data from the web the way other AI fields can. Steve Nemzer, Senior Director of AI Research at TELUS Digital, notes that only a fraction of the required data exists today: millions of hours of annotated, multi-sensor datasets, captured from a robot’s perspective, are still needed. These datasets must include synchronized inputs such as cameras, LiDAR, radar, touch, and audio to handle tasks like manipulating sheet metal or navigating occluded spaces.

Researchers from 21 institutions recently pooled data across 22 robot platforms, covering 527 skills and 160,266 tasks, to develop the RT-X model. Cross-platform training proved effective, but the dataset still represents a small fraction of what production deployments demand.

Nemzer emphasizes that synthetic data can supplement gaps but cannot replace real-world data, which is essential for teaching robots to handle sensor artifacts or adversarial conditions. ‘Specialized’ data in robotics differs from other AI fields in requiring egocentric, multi-sensor inputs, including force and torque feedback, for precise tasks like plugging in cables or peeling labels. Unlike web-based datasets, robotics data must account for real-world variables such as lighting changes, partial occlusion, and unpredictable material interactions.
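To make the synchronization requirement concrete, here is a minimal sketch of what a single multi-sensor timestep in such a dataset might look like. The field names, shapes, and the 5 ms tolerance are illustrative assumptions for this article, not a description of RT-X or any real dataset format.

```python
from dataclasses import dataclass, field

# Hypothetical schema for one synchronized timestep of egocentric robot data.
# Field names, shapes, and units are illustrative assumptions only.
@dataclass
class SensorTimestep:
    timestamp_ns: int      # shared clock across all modalities, nanoseconds
    rgb: list              # H x W x 3 camera frame (nested lists for simplicity)
    lidar_points: list     # list of (x, y, z, intensity) tuples
    wrist_force_n: tuple   # (fx, fy, fz) force at the end effector, newtons
    wrist_torque_nm: tuple # (tx, ty, tz) torque, newton-meters
    audio_chunk: list      # raw PCM samples captured since the previous step
    annotations: dict = field(default_factory=dict)  # e.g. {"task": "peel_label"}

def is_synchronized(steps, tolerance_ns=5_000_000):
    """Sanity-check that consecutive timesteps are evenly spaced to within a
    tolerance (5 ms here) before training on a multi-sensor log."""
    gaps = [b.timestamp_ns - a.timestamp_ns for a, b in zip(steps, steps[1:])]
    if not gaps:
        return True
    return max(gaps) - min(gaps) <= tolerance_ns
```

The point of the sketch is that every modality hangs off one shared timestamp, which is what makes cross-sensor annotation (and checks like `is_synchronized`) possible at all.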
McKinsey’s report highlights that foundation models trained on internet data fail when applied to physical tasks, underscoring the need for deployment-specific datasets to ensure reliability.