Sunday, December 28, 2025

Marker Visual Loc

Marker Based Visual Localization

Goal of the Project

The goal of this project is to estimate the 2D pose (x, y, yaw) of a mobile robot using AprilTag-based visual localization.

The robot must:

  • Detect AprilTags using its onboard camera.

  • Compute the relative pose between the camera and each detected tag using PnP.

  • Transform this pose into the global reference frame using the known tag positions.

  • Filter unstable visual measurements caused by noise.

  • Maintain a continuous pose estimate using odometry when no tags are visible.

The result is a localization system capable of correcting odometry drift using vision while remaining stable during temporary visual loss.

Project Overview

The localization implemented in this project follows a vision-first approach with odometry fallback.

Whenever a tag is detected, the robot estimates its pose using a full geometric transformation chain. If visual data is not available, the robot propagates the last valid visual pose using odometry increments.

The main challenges addressed are:

  • Correct handling of coordinate frames.

  • Robust pose estimation using PnP.

  • Noise filtering to avoid large localization jumps. 

AprilTag Detection

AprilTags are markers that encode a unique ID and allow accurate pose estimation from a single camera.

Each control cycle:

  • The robot captures an image from the camera.

  • The image is converted to grayscale.

  • The AprilTag detector extracts tag IDs and the pixel coordinates of the four tag corners.

The detected tag borders are drawn on the image and displayed in the WebGUI to verify detection quality and orientation.
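The per-cycle detection step can be sketched as below. This is a minimal illustration, not the project's exact code: the grayscale conversion uses the standard BGR luminance weights, and the detector object is assumed to follow the common AprilTag-library interface (e.g. pupil_apriltags.Detector), where each detection exposes a tag_id and a 4x2 array of corner pixel coordinates.

```python
import numpy as np

def to_grayscale(bgr):
    # Standard luminance weights for a BGR image (H, W, 3) -> gray (H, W).
    return (0.114 * bgr[..., 0] + 0.587 * bgr[..., 1]
            + 0.299 * bgr[..., 2]).astype(np.uint8)

def detect_tags(gray, detector):
    # Assumed detector interface: detect() returns objects with a
    # .tag_id and .corners (4x2 pixel coordinates of the tag corners).
    return [(d.tag_id, np.asarray(d.corners, dtype=np.float32))
            for d in detector.detect(gray)]
```

The (id, corners) pairs returned here are exactly what the PnP step in the next section consumes.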

Camera Model and PnP Pose Estimation

To compute the relative pose between the camera and a detected tag, I use Perspective-n-Point (PnP).

The tag is modeled as a planar object with known dimensions (0.24 × 0.24 m). Its four corner points are defined in the tag reference frame (Z = 0). These 3D points are matched with their corresponding 2D image projections.

Using the camera matrix and assuming zero distortion, OpenCV's solvePnP function computes:

  • A rotation vector rvec.

  • A translation vector tvec.

These describe the pose of the tag with respect to the camera frame.

The SOLVEPNP_IPPE_SQUARE method is used because it is optimized for square planar markers and provides stable results for AprilTags.

Homogeneous Transformations

The rotation vector returned by PnP is converted into a rotation matrix using Rodrigues' formula (cv2.Rodrigues). The rotation and translation are then combined into a homogeneous transformation matrix.

From PnP, the transformation obtained is:

  • Camera -> Tag

To compute the robot pose, this transformation is inverted to obtain:

  • Tag -> Camera

Homogeneous matrices are used throughout the system to allow chaining of rotations and translations in a consistent way. 
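These operations are standard; a pure-NumPy sketch (equivalent to cv2.Rodrigues plus 4x4 packing, and the closed-form rigid-transform inverse) looks like this:

```python
import numpy as np

def rodrigues(rvec):
    # Rodrigues' formula: rotation vector -> 3x3 rotation matrix.
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = np.asarray(rvec, dtype=float).ravel() / theta
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def homogeneous(R, t):
    # Pack rotation R (3x3) and translation t (3,) into a 4x4 matrix.
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = np.ravel(t)
    return T

def invert(T):
    # Closed-form inverse of a rigid transform: [R | t]^-1 = [R.T | -R.T t].
    R, t = T[:3, :3], T[:3, 3]
    return homogeneous(R.T, -R.T @ t)
```

The closed-form inverse avoids a general 4x4 matrix inversion and is numerically better behaved for rigid transforms.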

Coordinate Frame Alignment and Fixed Rotations

The coordinate frame returned by OpenCV does not match the robot’s coordinate frame.

Specifically:

  • OpenCV uses a camera frame where Z points forward, X to the right, and Y downward.

  • The robot frame assumes X forward, Y left, and Z upward.

To align these frames correctly, two fixed rotations are applied:

  • A rotation around the X axis to correct the vertical axis orientation. 
  • A rotation around the Z axis to correct the horizontal axis alignment.

These rotations are applied in a specific order using matrix multiplication, ensuring the camera pose is properly expressed in the robot reference frame.

Without this alignment, the estimated yaw and position would be inconsistent with the robot’s real motion.
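One consistent choice of those two fixed rotations is sketched below; the exact angles and multiplication order in the project may differ, but this pair maps the OpenCV optical axes (X right, Y down, Z forward) onto the robot axes (X forward, Y left, Z up):

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

# Rotate about X first, then about Z (applied right-to-left as matrices).
# Maps: camera forward (Z) -> robot forward (X),
#       camera right (X)   -> robot -Y (i.e. right),
#       camera down (Y)    -> robot -Z (i.e. down).
CAM_TO_ROBOT = rot_z(-np.pi / 2) @ rot_x(-np.pi / 2)
```

A quick sanity check is to push the three camera basis vectors through the matrix and confirm each lands on the expected robot axis, which is also how frame bugs are usually diagnosed.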

Transformation to World Coordinates

Each AprilTag has a known pose in the world frame, loaded from a YAML configuration file.

For a detected tag:

  • A world-to-tag transformation is built using the tag’s global position and yaw.

  • This transformation is multiplied with the tag-to-camera transformation to obtain:

World -> Camera

From this final matrix:

  • The robot position is extracted from the translation component.

  • The robot yaw is computed from the forward axis of the rotation matrix.

This yields a complete (x, y, yaw) estimate in world coordinates. 
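The chain above can be sketched as follows. This is a planar simplification (tag height and tilt omitted); the function names are illustrative, with the convention that A_T_B maps points from frame B into frame A, so world_T_cam = world_T_tag @ tag_T_cam:

```python
import numpy as np

def world_T_tag(x, y, yaw):
    # Tag pose in the world: planar position plus yaw about Z.
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0], [s, c, 0], [0, 0, 1]]
    T[0, 3], T[1, 3] = x, y
    return T

def pose_from_matrix(world_T_cam):
    # Position comes from the translation column; yaw from the direction
    # of the rotated forward (X) axis expressed in world coordinates.
    x, y = world_T_cam[0, 3], world_T_cam[1, 3]
    fwd = world_T_cam[:3, 0]
    yaw = np.arctan2(fwd[1], fwd[0])
    return x, y, yaw
```

Extracting yaw from the forward axis via arctan2, rather than from individual matrix entries, stays correct across all four quadrants.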

Visual Pose Selection and Noise Filtering

When multiple tags are visible, the detection with the smallest camera distance is selected, as it generally provides better accuracy.

However, visual pose estimation is inherently noisy, especially at larger distances. To address this, I implemented a jump rejection mechanism:

  • A maximum allowed position change is defined.

  • If the new visual pose differs too much from the previous one and the tag is far away, the measurement is discarded.

This prevents sudden unrealistic jumps while still allowing the system to correct accumulated drift when the robot legitimately moves.
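The rejection rule reduces to a few lines; the two thresholds below are placeholder values, not the ones tuned in the project:

```python
import numpy as np

MAX_JUMP = 0.5   # assumed max plausible position change between updates (m)
FAR_TAG = 2.0    # assumed distance beyond which estimates get noisy (m)

def accept_measurement(new_xy, last_xy, tag_distance):
    # Reject only when BOTH conditions hold: the pose jumped too far
    # AND the tag was far away (hence likely noisy).
    jump = np.hypot(new_xy[0] - last_xy[0], new_xy[1] - last_xy[1])
    return not (jump > MAX_JUMP and tag_distance > FAR_TAG)
```

Requiring both conditions is what lets a nearby, trustworthy tag still correct a large accumulated drift, while a distant tag cannot.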

Odometry-Based Pose Propagation

When no valid visual measurements are available, the robot continues estimating its pose using odometry.

The approach is incremental:

  • The odometry pose at the last visual update is stored.

  • Current odometry is compared against it to compute relative motion.

  • This relative motion is applied to the last visual pose.

This ensures smooth pose continuity and allows the robot to recover seamlessly once a tag becomes visible again.
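The incremental update can be sketched as below. One detail worth making explicit, and an assumption about how the project handles it: the odometry increment must be rotated from the odometry heading at the last fix into the heading of the last visual pose before being added, since the two frames generally disagree by the accumulated drift.

```python
import numpy as np

def propagate(last_visual, odom_at_visual, odom_now):
    # Each pose is (x, y, yaw). Compute the odometry increment since the
    # last visual fix and apply it to that fix.
    dx = odom_now[0] - odom_at_visual[0]
    dy = odom_now[1] - odom_at_visual[1]
    dyaw = odom_now[2] - odom_at_visual[2]
    # Rotate the increment into the last visual heading.
    a = last_visual[2] - odom_at_visual[2]
    c, s = np.cos(a), np.sin(a)
    return (last_visual[0] + c * dx - s * dy,
            last_visual[1] + s * dx + c * dy,
            last_visual[2] + dyaw)
```

Because only the increment since the last fix is used, re-acquiring a tag simply resets both stored poses, with no discontinuity to blend away.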

Robot Motion and Safety

To test localization under motion, the robot moves forward at a constant speed.

A simple laser-based safety check is implemented:

  • If an obstacle is detected in front of the robot, forward motion stops.

  • The robot rotates until the path is clear.

This allows continuous localization testing while avoiding collisions.
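The safety behavior is a small reactive rule; the sector width, stop distance, and speeds below are illustrative placeholders rather than the project's tuned values:

```python
def safe_forward(laser_ranges, stop_dist=0.6):
    # Check the middle third of the scan (a rough frontal sector).
    n = len(laser_ranges)
    front = laser_ranges[n // 3: 2 * n // 3]
    if min(front) < stop_dist:
        return 0.0, 0.5   # (v, w): stop and rotate until clear
    return 0.3, 0.0       # cruise forward
```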

Problems and Solutions

Unstable visual estimates at long distances

Far tags produced noisy pose estimates.

Solution: introduce distance-based filtering and reject large pose jumps. 

Pose discontinuities when switching from odometry to vision

Sudden jumps occurred when visual data became available again.

Solution: compare new visual estimates with the last valid pose and discard inconsistent measurements. 

 

Video



