|dc.description.abstract||In recent years, advances in autonomous robotics have begun to transform how we work and live. Unmanned Aerial Vehicles (UAVs) and Unmanned Ground Vehicles (UGVs) are helping us deliver goods, survey construction sites, and perform search and rescue alongside first responders. However, designing robots with this level of autonomy is often challenging because real-world environments are complex. Multi-sensory perception is a critical component for addressing this challenge and developing robust autonomous robotic systems. By combining inputs from multiple sensors, a system can eliminate single points of failure caused by sensor degradation and, by integrating information from different sensor modalities, generate new insights that support better decisions. Recent breakthroughs in machine learning, especially perception pipelines based on Deep Neural Networks (DNNs), have proven effective in a number of robot perception tasks. However, the significant computational cost of DNNs hinders their deployment on robot systems with limited power budgets and real-time performance requirements. Bridging this gap through optimization is essential for deploying state-of-the-art machine learning models on real-world robot systems. This work investigates the viability of developing robust multi-sensory robot perception systems enhanced by machine learning models, in three chapters. First, I explore the effectiveness of DNN perception pipelines in object detection and semantic segmentation tasks, then experiment with various model optimization techniques to improve the efficiency of these perception models, achieving real-time performance on a robot system with a limited power budget.
Second, I describe the design and implementation of a thermal-sensing robot system that fuses data from a thermal camera and an RGB-Depth (RGB-D) camera to automatically track building occupants and measure their forehead temperatures, providing fine-grained information for better decision-making in intelligent Air Conditioning (AC) systems. Finally, I explore camera pose estimation using rectangular-to-spherical image matching, which enables a robot to quickly capture a scene with a spherical camera and allows other robots to localize themselves within that scene by matching images from their rectangular sensors to the spherical image.