This intermediate course equips computer vision specialists and roboticists to design, implement, and deploy full-stack perception systems for autonomous robots and vehicles. You will master the pipeline from image formation and camera modeling to real-time inference, multi-sensor fusion, and robust decision-making under uncertainty. Blending classical vision with modern deep learning, the course emphasizes the practical constraints of robotics: latency, reliability, safety, and deployment on edge hardware.
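To give a flavor of the image-formation and camera-modeling portion, here is a minimal pinhole projection sketch in NumPy; the intrinsics, pose, and 3D points are illustrative placeholders, not course data.

```python
import numpy as np

# Illustrative pinhole intrinsics: focal lengths fx, fy and principal point (cx, cy).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Illustrative extrinsics: identity rotation, camera offset 0.5 m along the optical axis.
R = np.eye(3)
t = np.array([0.0, 0.0, 0.5])

def project(points_world: np.ndarray) -> np.ndarray:
    """Project Nx3 world points to Nx2 pixel coordinates with a pinhole camera model."""
    cam = points_world @ R.T + t      # world frame -> camera frame
    uv = cam @ K.T                    # apply intrinsics
    return uv[:, :2] / uv[:, 2:3]     # perspective divide

pts = np.array([[0.1, -0.05, 2.0], [0.0, 0.0, 1.0]])
print(project(pts))  # pixel coordinates of each 3D point
```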
Designed for perception engineers, roboticists building autonomous systems, autonomous vehicle developers, ML engineers specializing in robotic vision, and systems architects, the curriculum assumes strong Python skills and working knowledge of neural networks. Using OpenCV and PyTorch as core tools—along with TensorRT, TensorFlow Lite, ONNX Runtime, and NVIDIA Jetson for deployment—you will build detection, segmentation, tracking, pose estimation, and 3D perception components that integrate seamlessly with robotic control loops.
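As a sketch of the deployment path exercised throughout the course, the example below exports a PyTorch backbone to ONNX and runs it with ONNX Runtime; the model choice, file name, and input shape are placeholders chosen for illustration rather than a prescribed course pipeline.

```python
import numpy as np
import torch
import torchvision
import onnxruntime as ort

# Placeholder backbone; in practice you would export your trained detector or segmenter.
model = torchvision.models.mobilenet_v3_small(weights=None).eval()
dummy = torch.randn(1, 3, 224, 224)

# Export to ONNX so the same graph can target ONNX Runtime, TensorRT, or other runtimes.
torch.onnx.export(model, dummy, "backbone.onnx",
                  input_names=["input"], output_names=["logits"],
                  opset_version=17)

# Run the exported graph on CPU here; an embedded target such as Jetson would
# typically select a GPU or TensorRT execution provider instead.
session = ort.InferenceSession("backbone.onnx", providers=["CPUExecutionProvider"])
logits = session.run(None, {"input": dummy.numpy().astype(np.float32)})[0]
print(logits.shape)  # (1, 1000)
```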
Across eight sections, you will: (1) ground your understanding in camera geometry, color spaces, convolutional filtering, and feature detection; (2) implement object detection pipelines ranging from classical methods to YOLO, Faster R-CNN, and SSD MobileNet; (3) engineer semantic, instance, and panoptic segmentation systems with encoder–decoder architectures and real-time constraints; (4) estimate human and object pose, including 6DoF localization, bundle adjustment, and uncertainty quantification; (5) build 3D perception via stereo, epipolar geometry, monocular depth, LiDAR, and camera–LiDAR fusion; (6) design multi-object tracking with Kalman filters, optical flow, SORT/DeepSORT, and trajectory prediction; (7) optimize models for embedded processors through quantization, pruning, distillation, NAS, and runtime selection; and (8) integrate perception with control for visual servoing, manipulation, navigation, SLAM, AV perception stacks, HRI, failure recovery, and end-to-end testing.
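To make one of these building blocks concrete, here is a minimal sketch of the constant-velocity Kalman filter underlying SORT-style trackers, applied to a 2D box center; the noise values, state layout, and frame rate are illustrative assumptions, not the course's reference implementation.

```python
import numpy as np

dt = 1.0 / 30.0  # assume a 30 FPS camera stream
# State [x, y, vx, vy]; measurement [x, y] = detected box center.
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1,  0],
              [0, 0, 0,  1]])
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)
Q = np.eye(4) * 1e-2   # process noise (illustrative)
R = np.eye(2) * 1.0    # measurement noise (illustrative)

def predict(x, P):
    """Propagate state and covariance one frame under constant velocity."""
    return F @ x, F @ P @ F.T + Q

def update(x, P, z):
    """Fuse a new detection z = [cx, cy] into the state estimate."""
    y = z - H @ x                      # innovation
    S = H @ P @ H.T + R                # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    return x + K @ y, (np.eye(4) - K @ H) @ P

x, P = np.zeros(4), np.eye(4) * 10.0   # uninformative initial state
x, P = predict(x, P)
x, P = update(x, P, np.array([320.0, 240.0]))  # detection at pixel (320, 240)
print(x[:2])                            # filtered center estimate
```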
Suggested hands-on labs prioritize reproducible, real-time solutions: profiling latency, measuring mAP and IoU for detection and segmentation, benchmarking identity switches in tracking, and validating pose accuracy with calibration targets and synthetic data. You will leverage uncertainty-aware outputs and fail-safe triggers to meet safety requirements, and you will practice domain adaptation techniques for challenging conditions such as motion blur, low light, occlusion, and sensor dropouts.
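As one example of the lab metrics, here is a minimal IoU sketch for axis-aligned boxes in (x1, y1, x2, y2) form; the sample detection and ground-truth boxes are made up for illustration.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle (zero area if the boxes do not intersect).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

pred = (48, 52, 148, 160)   # hypothetical detection
gt   = (50, 50, 150, 150)   # hypothetical ground truth
print(f"IoU = {iou(pred, gt):.3f}")  # counts as a match at the common 0.5 threshold if >= 0.5
```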
By the end of the course, you will be able to design robust perception stacks that power autonomous navigation, object manipulation, and human–robot interaction. You will understand when to favor classical methods over deep learning (and vice versa), how to fuse complementary sensors, how to tune models for edge deployment, and how to validate performance with principled metrics, datasets, and test protocols. Graduates will be ready to ship production-grade perception for robots and autonomous systems operating in the real world.
