Monocular Visual Odometry — Leonard Wagner

cat README.md

# Monocular Visual Odometry

A from-scratch **monocular visual odometry (VO)** pipeline that recovers a camera's trajectory from a single image sequence: no stereo, no depth sensor. Built for the *Vision Algorithms for Mobile Robotics* course at UZH Zürich and evaluated on **KITTI, Malaga, Parking, and a custom handheld recording**.

## Why

Monocular VO is hard: with one camera you get bearings but no absolute scale, and every estimate drifts. The goal was a single pipeline robust enough to span wildly different motion regimes (slow handheld curves, high-speed highway driving, low-texture parking lots) through a continuous landmark-management strategy and per-dataset parameter tuning, rather than one brittle setting.

## How

- **Initialization**: Shi-Tomasi corners tracked across two bootstrap frames, Essential Matrix via RANSAC, pose recovery (R, t), and linear triangulation of the first 3D landmarks
- **Feature tracking**: pyramidal Lucas-Kanade (KLT) optical flow with **forward-backward consistency** checks and **Median Absolute Deviation (MAD)** filtering to reject dynamic objects and bad tracks
- **Pose estimation**: **two-stage PnP-RANSAC** with a strict pass, a loose recovery pass for motion-blurred frames, and a constant-velocity fallback, plus "survival logic" that keeps all low-reprojection-error landmarks (not just RANSAC inliers) to stop the map from flickering
- **Landmark management**: a candidate pool continuously triangulated and promoted on baseline / bearing-angle / depth criteria, with a "panic mode" that aggressively re-detects features under starvation
- **Calibration**: checkerboard intrinsics extracted from a calibration video to support a self-recorded smartphone dataset (4K/60fps → downsampled)
- **Visualization**: a live dashboard with tracked features overlaid on the frame, landmark-count history, full XY trajectory, and a top-down local map

## Results

The pipeline produces consistent local trajectories across all four datasets and recovers quickly from challenging frames (motion blur on KITTI, lens flare on Malaga) thanks to continuous triangulation. As expected for monocular VO, scale drift remains, most visibly along Z on the Parking sequence, pointing toward loop closure and IMU/GPS scale recovery as next steps.

## Stack

- **Geometry & vision**: OpenCV (KLT, Essential Matrix, solvePnPRansac, triangulation), NumPy
- **Datasets**: KITTI, Malaga, Parking, and a custom handheld sequence, each with automatic dataset-specific parameter tuning
- **Tooling**: uv / conda environments, Matplotlib visualization, Typst report
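
The MAD filtering mentioned under feature tracking can be illustrated with a small NumPy sketch. It is not the project's code: it assumes the filter is applied to the per-track forward-backward round-trip error in pixels, and the function name `mad_inlier_mask` and the cutoff `k=3.0` are hypothetical choices made for this example.

```python
import numpy as np


def mad_inlier_mask(errors, k=3.0):
    """Boolean mask of tracks whose error is at most k robust sigmas above the median.

    `errors` is assumed to hold one forward-backward error per track (pixels).
    `k` is a hypothetical cutoff, not a value taken from the project.
    """
    errors = np.asarray(errors, dtype=float)
    median = np.median(errors)
    mad = np.median(np.abs(errors - median))
    # 1.4826 rescales the MAD so it matches the standard deviation for Gaussian data.
    robust_sigma = 1.4826 * mad
    # One-sided test: only unusually large errors are rejected.
    return errors <= median + k * robust_sigma


fb_errors = [0.10, 0.20, 0.15, 0.12, 5.00, 0.18]
mask = mad_inlier_mask(fb_errors)
```

Because the median and the MAD are both insensitive to a few extreme values, a single track that lands on a moving car does not inflate the threshold the way a mean and standard deviation would.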
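
The constant-velocity fallback listed under pose estimation could be sketched as follows. This is an illustration under assumptions, not the author's implementation: poses are taken to be 4x4 camera-to-world homogeneous matrices, and `predict_constant_velocity` is a hypothetical helper name.

```python
import numpy as np


def predict_constant_velocity(T_prev2, T_prev1):
    """Predict the next camera-to-world pose by repeating the last relative motion.

    T_prev2 and T_prev1 are assumed to be the 4x4 poses of frames k-2 and k-1.
    """
    # Relative motion from frame k-2 to frame k-1, expressed in the k-2 camera frame.
    delta = np.linalg.inv(T_prev2) @ T_prev1
    # Apply the same relative motion once more, starting from frame k-1.
    return T_prev1 @ delta


T_a = np.eye(4)
T_b = np.eye(4)
T_b[2, 3] = 1.0  # camera moved one unit forward along its Z axis
T_pred = predict_constant_velocity(T_a, T_b)
```

A prediction like this only bridges frames where PnP fails; it carries no new measurement, so it would normally be replaced as soon as a PnP solution is available again.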
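
The bearing-angle criterion used when promoting candidates can be made concrete with a short sketch. Everything here is an assumption for illustration: the function name `bearing_angle_deg`, the pinhole intrinsics `K`, and the convention that `R_first_to_now` rotates vectors from the first-observation camera frame into the current camera frame.

```python
import numpy as np


def bearing_angle_deg(K, px_first, px_now, R_first_to_now):
    """Angle in degrees between the viewing rays of one feature in two frames."""
    K_inv = np.linalg.inv(K)
    # Back-project both pixel observations to rays in their own camera frames.
    ray_first = K_inv @ np.array([px_first[0], px_first[1], 1.0])
    ray_now = K_inv @ np.array([px_now[0], px_now[1], 1.0])
    # Rotate the first ray into the current camera frame so both share one orientation.
    ray_first = R_first_to_now @ ray_first
    cos_angle = ray_first @ ray_now / (np.linalg.norm(ray_first) * np.linalg.norm(ray_now))
    # Clip guards against values slightly outside [-1, 1] from rounding.
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))


K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
angle = bearing_angle_deg(K, (50.0, 50.0), (150.0, 50.0), np.eye(3))
```

A candidate would then be triangulated and promoted only once this angle exceeds some minimum, since rays that are nearly parallel give a poorly conditioned depth estimate.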

ls -l

tech Python, OpenCV, NumPy, Matplotlib, Typst, uv