ECCV 2022 Tutorial on "Localization and Mapping for Augmented Reality"

October 24th 2022, 14:00-18:00

ECCV 2022, Tel Aviv, Israel, room D&S

This tutorial covers large-scale localization and mapping for Augmented Reality (AR). Placing virtual content in the physical 3D world, persisting it over time, and sharing it with other users are typical AR scenarios. To reliably overlay virtual content on the real world with pixel-level precision, these scenarios require AR devices to accurately determine their pose (3D position and orientation) at any point in time. While visual localization and mapping is one of the most studied problems in computer vision, its use for AR entails specific challenges and opportunities: devices capture temporal streams from multiple sensors besides cameras, they exhibit distinctive motion patterns, and they provide data crowdsourced from multiple users and device types, which can be mined to build large-scale maps.

As the academic community has mainly been driven by benchmarks that are disconnected from the specifics of AR, we introduce the large-scale LaMAR dataset, captured using hand-held and head-mounted AR devices in diverse and changing environments, with various sensor modalities and accurate ground-truth poses. The dataset covers several locations of interest for AR applications, such as university buildings, office spaces, and a historical city center. This tutorial offers a first hands-on experience with the new dataset, which will serve as the basis for a future benchmark providing novel insights for research and tracking the progress of localization and mapping for AR.

In contrast to existing benchmarks, which mostly provide single-image or single-rig localization queries, we focus on the specificities of AR devices. We consider the use of radio signals (WiFi / Bluetooth) to improve image retrieval, as well as sequence localization, which exploits the temporal aspect of sensor streams to estimate a more accurate pose. This corresponds to the natural scenario of a user launching an AR application that streams sensor data and localizes against a previously built map. Finally, we propose a novel evaluation metric for localization that better correlates with the quality of a user's experience in the context of AR.
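The AR-centric metric itself is introduced in the tutorial; for context, a minimal sketch of the conventional absolute pose errors that most localization benchmarks report (translation error between camera centers, rotation error as the angle of the relative rotation), with illustrative variable names chosen here:

```python
import numpy as np

def pose_errors(R_est, t_est, R_gt, t_gt):
    """Conventional absolute pose errors for a single localized image.

    R_est, R_gt: 3x3 rotation matrices; t_est, t_gt: 3D camera centers.
    Returns (translation error in map units, rotation error in degrees).
    """
    # Translation error: Euclidean distance between the camera centers.
    dt = np.linalg.norm(t_est - t_gt)
    # Rotation error: angle of the relative rotation R_est * R_gt^T.
    cos_angle = (np.trace(R_est @ R_gt.T) - 1.0) / 2.0
    dR = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return dt, dR
```

Benchmarks then typically report the fraction of queries localized within thresholds such as (0.25 m, 2°); the tutorial discusses why such thresholds correlate imperfectly with perceived AR quality.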

To obtain the ground truth, we developed a pipeline for accurate, automatic registration of AR trajectories against large 3D laser scans. The pipeline requires no manual labeling or custom infrastructure (e.g., fiducial markers). The system robustly handles crowd-sourced data captured by heterogeneous devices over long periods of time and can easily be extended to support future devices.
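The actual pipeline combines visual matching against the laser scans with robust optimization; purely as an illustrative sketch, the core least-squares rigid-alignment step that registers corresponding 3D points of a trajectory to a reference frame (the classical Kabsch algorithm, not the tutorial's full method) could look like:

```python
import numpy as np

def rigid_align(src, dst):
    """Least-squares rigid transform (R, t) such that dst ~= src @ R.T + t.

    src, dst: (N, 3) arrays of corresponding 3D points (N >= 3,
    non-collinear). Solved in closed form via SVD (Kabsch algorithm).
    """
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection in the solution.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_dst - R @ c_src
    return R, t
```

In practice the real registration must also handle scale, outliers, and drift along the trajectory, which is why the pipeline relies on robust estimation rather than a single closed-form fit.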

Schedule & material

  1. Introduction and motivation [slides]
    1. Introduction to visual localization and mapping
    2. Augmented Reality systems: constraints and opportunities
    3. Benchmarking, existing datasets, and their limitations
  2. The LaMAR dataset [slides]
    1. Overview: scale, modalities, comparison to previous datasets, creation process
    2. Ground truth generation
    3. Use cases: localization & mapping and beyond

Coffee break [15min]

  3. Benchmarking localization and mapping [slides]
    1. Approaches to multimodal localization: multi-camera, radios, sequences
    2. Metrics and evaluation tracks
    3. Analysis of the results, limitations, open problems
  4. Practical guide and conclusion [slides]
    1. How to run your own approach: code walk-through with the SDK
    2. Details on the data release: data format, modalities, etc.
    3. Open source tools
    4. Conclusion: lessons learned


Presenters

Paul-Edouard Sarlin, ETH Zurich
Johannes L. Schönberger, Microsoft
Ondrej Miksik, Microsoft
Mihai Dusmanu, ETH Zurich
Viktor Larsson, Lund University
Marc Pollefeys, Microsoft & ETH Zurich