# Monocular Visual Odometry
A from-scratch **monocular visual odometry (VO)** pipeline that recovers a
camera's trajectory from a single image sequence — no stereo, no depth sensor.
Built for the *Vision Algorithms for Mobile Robotics* course at the University of Zurich (UZH) and
evaluated on **KITTI, Malaga, Parking, and a custom handheld recording**.
## Why
Monocular VO is hard: a single camera gives bearings but no absolute scale,
and small pose errors accumulate into drift. The goal was one pipeline robust
enough to span very different motion regimes (slow handheld curves, high-speed
highway driving, low-texture parking lots) by combining continuous landmark
management with per-dataset parameter tuning instead of a single brittle
parameter set.
## How
- **Initialization** — Shi-Tomasi corners tracked across two bootstrap frames,
Essential Matrix via RANSAC, pose recovery (R, t), and linear triangulation
of the first 3D landmarks
- **Feature tracking** — pyramidal Lucas-Kanade (KLT) optical flow with
**forward-backward consistency** checks and **Median Absolute Deviation (MAD)**
filtering to reject dynamic objects and bad tracks
- **Pose estimation** — **two-stage PnP-RANSAC**: a strict pass followed by a
loose recovery pass for motion-blurred frames, with a constant-velocity
fallback, plus "survival logic" that keeps every landmark with low reprojection
error (not just the RANSAC inliers) so the map does not flicker
- **Landmark management** — a candidate pool continuously triangulated and
promoted on baseline / bearing-angle / depth criteria, with a "panic mode"
that aggressively re-detects features under starvation
- **Calibration** — camera intrinsics estimated from a checkerboard calibration
video to support a self-recorded smartphone dataset (recorded at 4K/60 fps,
then downsampled)
- **Visualization** — a live dashboard: tracked features overlaid on the frame,
landmark-count history, full XY trajectory, and a top-down local map
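
The track-rejection and landmark-promotion steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the project's actual code: the function names, the thresholds (`fb_thresh`, `mad_k`), the choice to apply MAD to flow magnitude, and the camera-to-world rotation convention are all assumptions made for the example.

```python
import numpy as np


def fb_mad_mask(p0, p1, p0_back, fb_thresh=1.0, mad_k=3.0):
    """Boolean mask of tracks that pass the forward-backward and MAD checks.

    p0      : (N, 2) points in the previous frame
    p1      : (N, 2) the same points tracked forward into the current frame
    p0_back : (N, 2) p1 tracked backward into the previous frame
    Thresholds are illustrative defaults (assumed, in pixels / MAD units).
    """
    p0, p1, p0_back = (
        np.asarray(a, dtype=float).reshape(-1, 2) for a in (p0, p1, p0_back)
    )

    # Forward-backward consistency: a good track returns close to its start.
    fb_err = np.linalg.norm(p0 - p0_back, axis=1)
    keep = fb_err < fb_thresh
    if not keep.any():
        return keep

    # MAD filter on flow magnitude (assumed statistic), with the median and
    # MAD computed only from tracks that survived the first check.
    flow_mag = np.linalg.norm(p1 - p0, axis=1)
    med = np.median(flow_mag[keep])
    mad = np.median(np.abs(flow_mag[keep] - med))
    # 1.4826 rescales MAD to a standard-deviation estimate for Gaussian data;
    # the epsilon keeps floating-point noise from being flagged when MAD ~ 0.
    sigma = 1.4826 * mad + 1e-6
    keep &= np.abs(flow_mag - med) <= mad_k * sigma
    return keep


def bearing_angle_deg(K, R_first, px_first, R_now, px_now):
    """Angle in degrees between the two viewing rays of one candidate.

    K is the 3x3 intrinsic matrix; R_first and R_now are camera-to-world
    rotations (assumed convention) at the first and current observation.
    """
    K_inv = np.linalg.inv(K)

    def ray(R_wc, px):
        # Back-project the pixel to a unit bearing vector in the world frame.
        d = R_wc @ (K_inv @ np.array([px[0], px[1], 1.0]))
        return d / np.linalg.norm(d)

    cos_a = np.clip(ray(R_first, px_first) @ ray(R_now, px_now), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_a)))
```

In an OpenCV pipeline, `p1` and `p0_back` would typically come from two calls to `cv2.calcOpticalFlowPyrLK` (previous to current frame, then current back to previous). A candidate would then be triangulated and promoted only once its bearing angle exceeds some minimum, since rays that are nearly parallel give poorly conditioned depth.
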
## Results
The pipeline produces consistent local trajectories across all four datasets and
recovers quickly from challenging frames (motion blur on KITTI, lens flare on
Malaga) thanks to continuous triangulation. As expected for monocular VO, scale
drift remains — most visibly along Z on the Parking sequence — pointing toward
loop closure and IMU/GPS scale recovery as next steps.
## Stack
- **Geometry & vision** — OpenCV (KLT, Essential Matrix, solvePnPRansac,
triangulation), NumPy
- **Datasets** — KITTI, Malaga, Parking, and a custom handheld sequence, each
with automatic dataset-specific parameter tuning
- **Tooling** — uv / conda environments, Matplotlib visualization, Typst report