Robotics AI Will Be Built on Specialized Data, Programs Building That Data Now Have a Real Advantage

Investment in embodied AI and robotics surged past $1 billion annually by 2024, and specialized training data has become the primary bottleneck: foundation models struggle to transfer from simulated or internet-based datasets to real-world physical tasks. Experts warn that millions of hours of annotated, multi-sensor datasets, captured under deployment conditions like variable lighting or material deformation, are needed to bridge this gap, and that synthetic data alone cannot replicate real-world variability.
Investment in general-purpose robotics and embodied AI grew fivefold between 2022 and 2024, exceeding $1 billion annually, according to McKinsey’s report on embodied AI. Where hardware limitations such as sensor costs and computational power once hindered progress, the new barrier is training data. Unlike language or vision models, which rely on internet-scale datasets, robots require billions of physical interaction examples to function reliably in real-world environments like warehouses or hospitals.

The lack of standardized, large-scale robotics datasets is a critical challenge, because embodied AI cannot scrape its data from the web the way other AI fields can. Steve Nemzer, Senior Director of AI Research at TELUS Digital, notes that only a fraction of the required data exists today: millions of hours of annotated, multi-sensor datasets, captured from a robot’s perspective, are still needed. These datasets must include synchronized inputs such as cameras, LiDAR, radar, touch, and audio to handle tasks like manipulating sheet metal or navigating occluded spaces.

Researchers from 21 institutions recently pooled data across 22 robot platforms, covering 527 skills and 160,266 tasks, to develop the RT-X model. Cross-platform training proved effective, but the dataset still represents a small fraction of what production deployments demand.

Nemzer emphasizes that synthetic data can supplement gaps but cannot replace real-world data, which is essential for teaching robots to handle sensor artifacts or adversarial conditions. ‘Specialized’ data in robotics differs from other AI fields in requiring egocentric, multi-sensor inputs, including force and torque feedback, for precise tasks like plugging in cables or peeling labels. Unlike web-based datasets, robotics data must account for real-world variables such as lighting changes, partial occlusion, and unpredictable material interactions.
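To make the synchronization requirement concrete, here is a minimal sketch of what a single multi-sensor timestep in such a dataset might look like. The field names, shapes, and the 5 ms tolerance are illustrative assumptions for this article, not a description of RT-X or any real dataset format.

```python
from dataclasses import dataclass, field

# Hypothetical schema for one synchronized timestep of egocentric robot data.
# Field names, shapes, and units are illustrative assumptions only.
@dataclass
class SensorTimestep:
    timestamp_ns: int      # shared clock across all modalities, nanoseconds
    rgb: list              # H x W x 3 camera frame (nested lists for simplicity)
    lidar_points: list     # list of (x, y, z, intensity) tuples
    wrist_force_n: tuple   # (fx, fy, fz) force at the end effector, newtons
    wrist_torque_nm: tuple # (tx, ty, tz) torque, newton-meters
    audio_chunk: list      # raw PCM samples captured since the previous step
    annotations: dict = field(default_factory=dict)  # e.g. {"task": "peel_label"}

def is_synchronized(steps, tolerance_ns=5_000_000):
    """Sanity-check that consecutive timesteps are evenly spaced to within a
    tolerance (5 ms here) before training on a multi-sensor log."""
    gaps = [b.timestamp_ns - a.timestamp_ns for a, b in zip(steps, steps[1:])]
    if not gaps:
        return True
    return max(gaps) - min(gaps) <= tolerance_ns
```

The point of the sketch is that every modality hangs off one shared timestamp, which is what makes cross-sensor annotation (and checks like `is_synchronized`) possible at all.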
McKinsey’s report highlights that foundation models trained on internet data fail when applied to physical tasks, underscoring the need for deployment-specific datasets to ensure reliability.