Exploration Without Maps
Simulation and Real-World Results
Methodology for Training and Deployment
The DRL model was trained in simulation on a constrained racetrack at the vehicle's physical limits and transferred zero-shot to the real world, where it generalized out-of-distribution to new racetrack layouts, exploration of unstructured terrain, and dynamic obstacle avoidance.
Operating Autonomous Mobile Robots (AMRs) of all forms, including wheeled ground vehicles, quadrupeds, and humanoids, in dynamic real-world environments without prior maps, many of which are GPS-denied, such as caves and lava tubes on Mars, unknown buildings, contested military regions, and areas affected by natural disasters such as fires or earthquakes, remains an unsolved problem. Solving it has the potential to transform the economy and vastly expand humanity's capabilities, with improvements to agriculture, manufacturing, disaster response, military operations, and extraterrestrial planetary exploration.
Conventional AMR automation approaches are modularized into perception, motion planning, and control. This design is computationally inefficient and requires explicit feature extraction and engineering, which inhibits generalization and deployment at scale. Learning representations for end-to-end AMR navigation offers several benefits, including computational efficiency and generalization across tasks; however, it is challenging due to the large training sample requirement, the difficulty of converging to useful policies, and the need to accurately bridge the simulation-to-reality gap in order to leverage accelerated training in simulation [1].
Toward realizing the benefits of end-to-end Artificial Intelligence (AI) models for AMR navigation, a novel Deep Reinforcement Learning (DRL) method was developed [2]. The policy was trained in simulation in a constrained racetrack environment at the vehicle's physical limits, then transferred zero-shot, with no additional training, to the real world for cognitive end-to-end navigation in new, unknown, GPS-denied environments using only onboard sensors. Simulation [3] and hardware [4] platforms developed in-house using open-source tools were used for training and evaluation.
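As a rough sketch of how such a training pipeline can be assembled, the snippet below trains a policy on a stand-in racetrack environment and saves it for zero-shot transfer. The RacetrackSimEnv class, the choice of PPO via Stable-Baselines3, and the observation and action dimensions are illustrative assumptions, not the exact algorithm or AutoVRL [3] interface used in [2].

import gymnasium as gym
import numpy as np
from gymnasium import spaces
from stable_baselines3 import PPO


class RacetrackSimEnv(gym.Env):
    # Stand-in for the simulated racetrack environment: LiDAR-like
    # observations in, continuous steering/throttle actions out.
    def __init__(self):
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(64,), dtype=np.float32)
        self.action_space = spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        return np.zeros(64, dtype=np.float32), {}

    def step(self, action):
        # Placeholder dynamics and reward; the real simulator models the
        # vehicle at its physical limits on the constrained racetrack.
        obs = np.zeros(64, dtype=np.float32)
        return obs, 0.0, False, False, {}


# Compact policy: two fully connected hidden layers of 64 nodes each.
model = PPO("MlpPolicy", RacetrackSimEnv(), policy_kwargs={"net_arch": [64, 64]})

# Train for 20 million simulation steps, then save the learned policy
# for zero-shot deployment with no additional training.
model.learn(total_timesteps=20_000_000)
model.save("racetrack_policy")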
Training was performed in a handcrafted multidirectional racetrack with varying cornering radii, using an Intel Core i9 13900KF CPU and an NVIDIA GeForce RTX 4090 GPU, for 20,000,000 steps corresponding to 15,747 training episodes, accelerated to 30 times real-time speed over 48 wall-clock hours. The representation, learned in a compact parameter space of 2 fully connected layers with 64 nodes each, demonstrated emergent behavior and out-of-distribution generalization to map-free navigation in new environments, including unstructured forest terrain and dynamic obstacle avoidance. The learned policy outperforms conventional navigation algorithms while consuming a fraction of the computational resources, enabling execution on a range of AMR forms with varying embedded-computer payloads.
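The same saved policy can then, in principle, run unchanged on the vehicle's embedded computer. The sketch below illustrates such a zero-shot inference loop; get_lidar_scan() and send_drive_command() are hypothetical placeholders for the platform's sensor and actuation interfaces, not the XTENTH-CAR [4] API.

import numpy as np
from stable_baselines3 import PPO


def get_lidar_scan() -> np.ndarray:
    # Hypothetical placeholder: return the latest onboard sensor observation.
    return np.zeros(64, dtype=np.float32)


def send_drive_command(action: np.ndarray) -> None:
    # Hypothetical placeholder: forward steering/throttle commands to the vehicle.
    print("command:", action)


def deploy(policy_path: str = "racetrack_policy") -> None:
    # Load the simulation-trained policy with no additional training.
    model = PPO.load(policy_path)
    while True:
        observation = get_lidar_scan()
        # Deterministic forward pass through the compact 2x64 network;
        # cheap enough for modest embedded computer payloads.
        action, _ = model.predict(observation, deterministic=True)
        send_drive_command(action)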
References
[1] S. Sivashangaran and A. Eskandarian, "Deep Reinforcement Learning for Autonomous Ground Vehicle Exploration Without A-Priori Maps," Advances in Artificial Intelligence and Machine Learning, vol. 3, no. 2, pp. 1198-1219, Jun. 2023.
[2] S. Sivashangaran, A. Khairnar and A. Eskandarian, "Exploration Without Maps via Zero-Shot Out-of-Distribution Deep Reinforcement Learning," arXiv preprint arXiv:2402.05066, Feb. 2024.
[3] S. Sivashangaran, A. Khairnar and A. Eskandarian, "AutoVRL: A High Fidelity Autonomous Ground Vehicle Simulator for Sim-to-Real Deep Reinforcement Learning," IFAC-PapersOnLine, vol. 56, no. 3, pp. 475-480, Dec. 2023.
[4] S. Sivashangaran and A. Eskandarian, "XTENTH-CAR: A Proportionally Scaled Experimental Vehicle Platform for Connected Autonomy and All-Terrain Research," Proceedings of the ASME 2023 International Mechanical Engineering Congress and Exposition, Volume 6: Dynamics, Vibration, and Control, New Orleans, LA, USA, Oct. 29-Nov. 2, 2023, V006T07A068, American Society of Mechanical Engineers.