Publications | STARS Lab

The majority of our publications are available from open repositories, primarily arXiv. We also maintain a blog/site with additional details about some of our publications – see https://papers.starslab.ca/ or follow the ‘site’ links below. If you would like to request an electronic copy of a journal article or conference paper, please contact us.

2025

Watch Your Steps: Local Image and Scene Editing by Text Instructions

A. Mirzaei, T. Aumentado-Armstrong, M. A. Brubaker, J. Kelly, A. Levinshtein, K. G. Derpanis, and I. Gilitschenski

in Computer Vision — ECCV 2024 , A. Leonardis, E. Ricci, S. Roth, O. Russakovsky, T. Sattler, and Gül. Varol, Eds., Cham, Switzerland: Springer Nature Switzerland, 2025, vol. 15096, pp. 111-129.

DOI | Bibtex | Abstract

@incollection{2024_Mirzaei_Watch,
  abstract = {The success of denoising diffusion models in generating and editing images has sparked interest in using diffusion models for editing 3D scenes represented via neural radiance fields (NeRFs). However, current 3D editing methods lack a way to both pinpoint the edit location and limit changes to the desired volumetric region. Consequently, these methods often over-edit, altering irrelevant parts of the scene. We introduce a new task, 3D edit localization, to automatically identify the relevant region for an editing task and restrict the edit accordingly. To achieve this goal, we initially tackle 2D edit localization, and then lift it to multiple views to address the 3D localization challenge. For 2D localization, we leverage InstructPix2Pix (IP2P) and identify the discrepancy between IP2P predictions with and without the instruction. We refer to this discrepancy as the relevance map. The relevance map conveys the importance of changing each pixel to achieve an edit, and guides downstream modifications, ensuring that pixels irrelevant to the edit remain unchanged. With the relevance maps of multiview posed images, we can define the relevance field, defining the 3D region within which modifications should be made. This enables us to improve the quality of text-guided 3D NeRF scene editing, by performing iterative updates on the training views, guided by renders from the relevance field. Our method achieves state-of-the-art performance on both NeRF and image editing tasks. We will make the code available.},
  address = {Cham, Switzerland},
  author = {Ashkan Mirzaei and Tristan Aumentado-Armstrong and Marcus A. Brubaker and Jonathan Kelly and Alex Levinshtein and Konstantinos G. Derpanis and Igor Gilitschenski},
  booktitle = {Computer Vision -- ECCV 2024},
  doi = {10.1007/978-3-031-72920-1_7},
  editor = {Ale{\v{s}} Leonardis and Elisa Ricci and Stefan Roth and Olga Russakovsky and Torsten Sattler and G\"{u}l Varol},
  isbn = {978-3-031-72920-1},
  pages = {111--129},
  publisher = {Springer Nature Switzerland},
  series = {Lecture Notes in Computer Science},
  title = {Watch Your Steps: Local Image and Scene Editing by Text Instructions},
  volume = {15096},
  year = {2025}
}

The success of denoising diffusion models in generating and editing images has sparked interest in using diffusion models for editing 3D scenes represented via neural radiance fields (NeRFs). However, current 3D editing methods lack a way to both pinpoint the edit location and limit changes to the desired volumetric region. Consequently, these methods often over-edit, altering irrelevant parts of the scene. We introduce a new task, 3D edit localization, to automatically identify the relevant region for an editing task and restrict the edit accordingly. To achieve this goal, we initially tackle 2D edit localization, and then lift it to multiple views to address the 3D localization challenge. For 2D localization, we leverage InstructPix2Pix (IP2P) and identify the discrepancy between IP2P predictions with and without the instruction. We refer to this discrepancy as the relevance map. The relevance map conveys the importance of changing each pixel to achieve an edit, and guides downstream modifications, ensuring that pixels irrelevant to the edit remain unchanged. With the relevance maps of multiview posed images, we can define the relevance field, defining the 3D region within which modifications should be made. This enables us to improve the quality of text-guided 3D NeRF scene editing, by performing iterative updates on the training views, guided by renders from the relevance field. Our method achieves state-of-the-art performance on both NeRF and image editing tasks. We will make the code available.

Addressing Distribution Shift in Robotic Imitation Learning

T. Ablett

PhD Thesis , University of Toronto, Toronto, Ontario, Canada, 2025.

Bibtex | Abstract | PDF

@phdthesis{2025_Ablett_Addressing,
  abstract = {The widespread use of robotic systems remains out of reach for a variety of reasons. Even assuming perfect sensing and modelling, determining the optimal actions to perform to complete tasks is challenging, particularly as task complexity rises. An appealing alternative to explicit modelling is imitation learning (IL), where control policies are learned directly from expert data containing raw observations and actions. Unfortunately, IL is subject to distribution shift: when a robot encounters data not adequately covered by its training dataset, it can fail, sometimes catastrophically. This thesis presents approaches to resolve distribution shift in robotic IL. The first two methods presented directly modify the training dataset as it is collected: one allows better coverage of the full distribution encountered in a mobile manipulation setting, and the next modifies individual trajectories to address an inevitable shift resulting from the use of kinesthetic teaching, a common approach to generating
demonstrations. Our third approach directly detects distribution shift and allows the expert to intervene and provide corrective data in response. Finally, we investigate inverse reinforcement learning (IRL), which ostensibly resolves distribution shift, but in practice can suffer from deceptive learned rewards causing locally maximal but globally poor behaviour. In our fourth approach, we learn policies from demonstrations of auxiliary tasks, in addition to the main task, to break out of these local maxima in IRL. Our fifth approach extends this exploration improvement to the more difficult case of merely having examples of completed tasks and auxiliary tasks, improving performance while simultaneously significantly reducing the collection burden on the expert. These approaches are demonstrated on both simulated and real robotic manipulators, and we provide open source code to reproduce and build upon our results.},
  address = {Toronto, Ontario, Canada},
  author = {Trevor Ablett},
  institution = {University of Toronto},
  month = {June},
  school = {University of Toronto},
  title = {Addressing Distribution Shift in Robotic Imitation Learning},
  year = {2025}
}

The widespread use of robotic systems remains out of reach for a variety of reasons. Even assuming perfect sensing and modelling, determining the optimal actions to perform to complete tasks is challenging, particularly as task complexity rises. An appealing alternative to explicit modelling is imitation learning (IL), where control policies are learned directly from expert data containing raw observations and actions. Unfortunately, IL is subject to distribution shift: when a robot encounters data not adequately covered by its training dataset, it can fail, sometimes catastrophically. This thesis presents approaches to resolve distribution shift in robotic IL. The first two methods presented directly modify the training dataset as it is collected: one allows better coverage of the full distribution encountered in a mobile manipulation setting, and the next modifies individual trajectories to address an inevitable shift resulting from the use of kinesthetic teaching, a common approach to generating demonstrations. Our third approach directly detects distribution shift and allows the expert to intervene and provide corrective data in response. Finally, we investigate inverse reinforcement learning (IRL), which ostensibly resolves distribution shift, but in practice can suffer from deceptive learned rewards causing locally maximal but globally poor behaviour. In our fourth approach, we learn policies from demonstrations of auxiliary tasks, in addition to the main task, to break out of these local maxima in IRL. Our fifth approach extends this exploration improvement to the more difficult case of merely having examples of completed tasks and auxiliary tasks, improving performance while simultaneously significantly reducing the collection burden on the expert. These approaches are demonstrated on both simulated and real robotic manipulators, and we provide open source code to reproduce and build upon our results.

Efficient Imitation Without Demonstrations via Value-Penalized Auxiliary Control from Examples

T. Ablett, B. Chan, H. Wang, and J. Kelly

Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Atlanta, Georgia, USA, May 19–23, 2025.

Bibtex | Abstract | arXiv | Site

@inproceedings{2025_Ablett_Efficient,
  abstract = {Learning from examples of success is an appealing approach to reinforcement learning but it presents a challenging exploration problem, especially for complex or long-horizon tasks. This work introduces value-penalized auxiliary control from examples (VPACE), an algorithm that significantly improves exploration in example-based control by adding examples of simple auxiliary tasks. For instance, a manipulation task may have auxiliary examples of an object being reached for, grasped, or lifted. We show that the naive application of scheduled auxiliary control to example-based learning can lead to value overestimation and poor performance. We resolve the problem with an above-success-level value penalty. Across both simulated and real robotic environments, we show that our approach substantially improves learning efficiency for challenging tasks, while maintaining bounded value estimates. We compare with existing approaches to example-based learning, inverse reinforcement learning, and an exploration bonus. Preliminary results also suggest that VPACE may learn more efficiently than the more common approaches of using full trajectories or true sparse rewards.},
  address = {Atlanta, Georgia, USA},
  author = {Trevor Ablett and Bryan Chan and Haoran Wang and Jonathan Kelly},
  booktitle = {Proceedings of the {IEEE} International Conference on Robotics and Automation {(ICRA)}},
  date = {2025-05-19/2025-05-23},
  month = {May 19--23},
  site = {https://papers.starslab.ca/vpace/},
  title = {Efficient Imitation Without Demonstrations via Value-Penalized Auxiliary Control from Examples},
  url = {https://arxiv.org/abs/2407.03311},
  year = {2025}
}

Learning from examples of success is an appealing approach to reinforcement learning but it presents a challenging exploration problem, especially for complex or long-horizon tasks. This work introduces value-penalized auxiliary control from examples (VPACE), an algorithm that significantly improves exploration in example-based control by adding examples of simple auxiliary tasks. For instance, a manipulation task may have auxiliary examples of an object being reached for, grasped, or lifted. We show that the naive application of scheduled auxiliary control to example-based learning can lead to value overestimation and poor performance. We resolve the problem with an above-success-level value penalty. Across both simulated and real robotic environments, we show that our approach substantially improves learning efficiency for challenging tasks, while maintaining bounded value estimates. We compare with existing approaches to example-based learning, inverse reinforcement learning, and an exploration bonus. Preliminary results also suggest that VPACE may learn more efficiently than the more common approaches of using full trajectories or true sparse rewards.

Multimodal and Force-Matched Imitation Learning with a See-Through Visuotactile Sensor

T. Ablett, O. Limoyo, A. Sigal, A. Jilani, J. Kelly, K. Siddiqi, F. Hogan, and G. Dudek

IEEE Transactions on Robotics, vol. 41, pp. 946-959, 2025.

@article{2025_Ablett_Multimodal,
  abstract = {Contact-rich tasks continue to present many challenges for robotic manipulation. In this work, we leverage a multimodal visuotactile sensor within the framework of imitation learning (IL) to perform contact-rich tasks that involve relative motion (e.g., slipping and sliding) between the end-effector and the manipulated object. We introduce two algorithmic contributions, tactile force matching and learned mode switching, as complimentary methods for improving IL. Tactile force matching enhances kinesthetic teaching by reading approximate forces during the demonstration and generating an adapted robot trajectory that recreates the recorded forces. Learned mode switching uses IL to couple visual and tactile sensor modes with the learned motion policy, simplifying the transition from reaching to contacting. We perform robotic manipulation experiments on four door-opening tasks with a variety of observation and algorithm configurations to study the utility of multimodal visuotactile sensing and our proposed improvements. Our results show that the inclusion of force matching raises average policy success rates by 62.5\%, visuotactile mode switching by 30.3\%, and visuotactile data as a policy input by 42.5\%, emphasizing the value of see-through tactile sensing for IL, both for data collection to allow force matching, and for policy execution to enable accurate task feedback.},
  author = {Trevor Ablett and Oliver Limoyo and Adam Sigal and Affan Jilani and Jonathan Kelly and Kaleem Siddiqi and Francois Hogan and Gregory Dudek},
  doi = {10.1109/TRO.2024.3521864},
  journal = {{IEEE} Transactions on Robotics},
  pages = {946--959},
  site = {https://papers.starslab.ca/sts-il/},
  title = {Multimodal and Force-Matched Imitation Learning with a See-Through Visuotactile Sensor},
  url = {https://arxiv.org/abs/2311.01248},
  video1 = {https://www.youtube.com/watch?v=1BgS78-5_vA&ab_channel=UTIASSTARSLaboratory},
  volume = {41},
  year = {2025}
}

Contact-rich tasks continue to present many challenges for robotic manipulation. In this work, we leverage a multimodal visuotactile sensor within the framework of imitation learning (IL) to perform contact-rich tasks that involve relative motion (e.g., slipping and sliding) between the end-effector and the manipulated object. We introduce two algorithmic contributions, tactile force matching and learned mode switching, as complimentary methods for improving IL. Tactile force matching enhances kinesthetic teaching by reading approximate forces during the demonstration and generating an adapted robot trajectory that recreates the recorded forces. Learned mode switching uses IL to couple visual and tactile sensor modes with the learned motion policy, simplifying the transition from reaching to contacting. We perform robotic manipulation experiments on four door-opening tasks with a variety of observation and algorithm configurations to study the utility of multimodal visuotactile sensing and our proposed improvements. Our results show that the inclusion of force matching raises average policy success rates by 62.5\%, visuotactile mode switching by 30.3\%, and visuotactile data as a policy input by 42.5\%, emphasizing the value of see-through tactile sensing for IL, both for data collection to allow force matching, and for policy execution to enable accurate task feedback.

Structured Pneumatic Fingerpads for Actively Tunable Grip Friction

K. Allison, J. Kelly, and B. Hatton

Proceedings of the IEEE International Conference on Soft Robotics (RoboSoft), Lausanne, Switzerland, Apr. 23–26, 2025.

DOI | Bibtex | Abstract | arXiv

@inproceedings{2025_Allison_Structured,
  abstract = {Grip surfaces with tunable friction can actively modify contact conditions, enabling transitions between higher- and lower-friction states for grasp adjustment. Friction can be increased to grip securely and then decreased to gently release (e.g., for handovers) or manipulate in-hand. Recent friction-tuning surface designs using soft pneumatic chambers show good control over grip friction; however, most require complex fabrication processes and/or custom gripper hardware. We present a practical structured fingerpad design for friction tuning that uses less than $1 USD of materials, takes only seconds to repair, and is easily adapted to existing grippers. Our design uses surface morphology changes to tune friction. The fingerpad is actuated by pressurizing its internal chambers, thereby deflecting its flexible grip surface out from or into these chambers. We characterize the friction-tuning capabilities of our design by measuring the shear force required to pull an object from a gripper equipped with two independently actuated fingerpads. Our results show that varying actuation pressure and timing changes the magnitude of friction forces on a gripped object by up to a factor of 2.8. We demonstrate additional features including macro-scale interlocking behaviour and pressure-based object detection.},
  address = {Lausanne, Switzerland},
  author = {Katherine Allison and Jonathan Kelly and Benjamin Hatton},
  booktitle = {Proceedings of the {IEEE} International Conference on Soft Robotics {(RoboSoft)}},
  date = {2025-04-23/2025-04-26},
  doi = {10.1109/RoboSoft63089.2025.11020883},
  month = {Apr. 23--26},
  title = {Structured Pneumatic Fingerpads for Actively Tunable Grip Friction},
  url = {https://arxiv.org/abs/2502.00926},
  year = {2025}
}

Grip surfaces with tunable friction can actively modify contact conditions, enabling transitions between higher- and lower-friction states for grasp adjustment. Friction can be increased to grip securely and then decreased to gently release (e.g., for handovers) or manipulate in-hand. Recent friction-tuning surface designs using soft pneumatic chambers show good control over grip friction; however, most require complex fabrication processes and/or custom gripper hardware. We present a practical structured fingerpad design for friction tuning that uses less than $1 USD of materials, takes only seconds to repair, and is easily adapted to existing grippers. Our design uses surface morphology changes to tune friction. The fingerpad is actuated by pressurizing its internal chambers, thereby deflecting its flexible grip surface out from or into these chambers. We characterize the friction-tuning capabilities of our design by measuring the shear force required to pull an object from a gripper equipped with two independently actuated fingerpads. Our results show that varying actuation pressure and timing changes the magnitude of friction forces on a gripped object by up to a factor of 2.8. We demonstrate additional features including macro-scale interlocking behaviour and pressure-based object detection.

Automated Planning Domain Inference for Task and Motion Planning

J. Huang, A. Tao, R. Marco, M. Bogdanovic, J. Kelly, and F. Shkurti

Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Atlanta, Georgia, USA, May 19–23, 2025.

Bibtex | Abstract

@inproceedings{2025_Huang_Automatic,
  abstract = {Task and motion planning (TAMP) frameworks address long and complex planning problems by integrating high-level task planners with low-level motion planners. However, existing TAMP methods rely heavily on the manual design of planning domains that specify the preconditions and postconditions of all high-level actions. This paper proposes a method to automate planning domain inference from a handful of test-time trajectory demonstrations, reducing the reliance on human design. Our approach incorporates a deep learning-based estimator that predicts the appropriate components of a domain for a new task and a search algorithm that refines this prediction, reducing the size and ensuring the utility of the inferred domain. Our method is able to generate new domains from minimal demonstrations at test time, enabling robots to handle complex tasks more efficiently. We demonstrate that our approach outperforms behavior cloning baselines, which directly imitate planner behavior, in terms of planning performance and generalization across a variety of tasks. Additionally, our method reduces computational costs and data amount requirements at test time for inferring new planning domains.},
  address = {Atlanta, Georgia, USA},
  author = {Jinbang Huang and Allen Tao and Rozilyn Marco and Miroslav Bogdanovic and Jonathan Kelly and Florian Shkurti},
  booktitle = {Proceedings of the {IEEE} International Conference on Robotics and Automation {(ICRA})},
  date = {2025-05-19/2025-05-23},
  month = {May 19--23},
  title = {Automated Planning Domain Inference for Task and Motion Planning},
  year = {2025}
}

Task and motion planning (TAMP) frameworks address long and complex planning problems by integrating high-level task planners with low-level motion planners. However, existing TAMP methods rely heavily on the manual design of planning domains that specify the preconditions and postconditions of all high-level actions. This paper proposes a method to automate planning domain inference from a handful of test-time trajectory demonstrations, reducing the reliance on human design. Our approach incorporates a deep learning-based estimator that predicts the appropriate components of a domain for a new task and a search algorithm that refines this prediction, reducing the size and ensuring the utility of the inferred domain. Our method is able to generate new domains from minimal demonstrations at test time, enabling robots to handle complex tasks more efficiently. We demonstrate that our approach outperforms behavior cloning baselines, which directly imitate planner behavior, in terms of planning performance and generalization across a variety of tasks. Additionally, our method reduces computational costs and data amount requirements at test time for inferring new planning domains.

Risk-Averse Traversal of Graphs with Stochastic and Correlated Edge Costs for Safe Global Planetary Mobility

O. Lamarre and J. Kelly

Autonomous Robots, 2025.

Bibtex | Abstract | arXiv

@article{2025_Lamarre_Risk-Averse,
  abstract = {In robotic planetary surface exploration, strategic mobility planning is an important task that involves finding candidate long-distance routes on orbital maps and identifying segments with uncertain traversability. Then, expert human operators establish safe, adaptive traverse plans based on the actual navigation difficulties encountered in these uncertain areas. In this paper, we formalize this challenge as a new, risk-averse variant of the Canadian Traveller Problem (CTP) tailored to global planetary mobility. The objective is to find a traverse policy minimizing a conditional value-at-risk (CVaR) criterion, which is a risk measure with an intuitive interpretation. We propose a novel search algorithm that finds exact CVaR-optimal policies. Our approach leverages well-established optimal AND-OR search techniques intended for (risk-agnostic) expectation minimization and extends these methods to the risk-averse domain. We validate our approach through simulated long-distance planetary surface traverses; we employ real orbital maps of the Martian surface to construct problem instances and use terrain maps to express traversal probabilities in uncertain regions. Our results illustrate different adaptive decision-making schemes depending on the level of risk aversion. Additionally, our problem setup allows accounting for traversability correlations between similar areas of the environment. In such a case, we empirically demonstrate how information-seeking detours can mitigate risk.},
  author = {Olivier Lamarre and Jonathan Kelly},
  journal = {Autonomous Robots},
  note = {Submitted},
  title = {Risk-Averse Traversal of Graphs with Stochastic and Correlated Edge Costs for Safe Global Planetary Mobility},
  url = {https://arxiv.org/abs/2505.13674},
  year = {2025}
}

In robotic planetary surface exploration, strategic mobility planning is an important task that involves finding candidate long-distance routes on orbital maps and identifying segments with uncertain traversability. Then, expert human operators establish safe, adaptive traverse plans based on the actual navigation difficulties encountered in these uncertain areas. In this paper, we formalize this challenge as a new, risk-averse variant of the Canadian Traveller Problem (CTP) tailored to global planetary mobility. The objective is to find a traverse policy minimizing a conditional value-at-risk (CVaR) criterion, which is a risk measure with an intuitive interpretation. We propose a novel search algorithm that finds exact CVaR-optimal policies. Our approach leverages well-established optimal AND-OR search techniques intended for (risk-agnostic) expectation minimization and extends these methods to the risk-averse domain. We validate our approach through simulated long-distance planetary surface traverses; we employ real orbital maps of the Martian surface to construct problem instances and use terrain maps to express traversal probabilities in uncertain regions. Our results illustrate different adaptive decision-making schemes depending on the level of risk aversion. Additionally, our problem setup allows accounting for traversability correlations between similar areas of the environment. In such a case, we empirically demonstrate how information-seeking detours can mitigate risk.

Submitted

Generative Graphical Inverse Kinematics

O. Limoyo, F. Mari’c, M. Giamou, P. Alexson, I. Petrovi’c, and J. Kelly

IEEE Transactions on Robotics, vol. 41, pp. 1002-1018, 2025.

DOI | Bibtex | Abstract | arXiv

@article{2025_Limoyo_Generative,
  abstract = {Quickly and reliably finding accurate inverse kinematics (IK) solutions remains a challenging problem for many robot manipulators. Existing numerical solvers are broadly applicable but typically only produce a single solution and rely on local search techniques to minimize nonconvex objective functions. Recent learning-based approaches that approximate the entire feasible set of solutions have shown promise in generating multiple fast and accurate IK results in parallel. However, existing learning-based techniques have a significant drawback: each robot of interest requires a specialized model that must be trained from scratch. To address this key shortcoming, we propose a novel distance-geometric robot representation coupled with a graph structure that allows us to leverage the generalizability of graph neural networks (GNNs). Our approach, which we call GGIK (for generative graphical inverse kinematics), is the first learned IK solver that is able to efficiently yield a large number of diverse solutions in parallel while also displaying the ability to generalize---a single learned model can be used to produce IK solutions for a variety of different robots. When compared to several other learned IK methods, GGIK provides more accurate solutions with the same amount of training data. GGIK can also generalize reasonably well to robot manipulators unseen during training. Additionally, GGIK is able to learn a constrained distribution that encodes joint limits and scales well with the number of robot joints and sampled solutions. Finally, GGIK can be used to complement local IK solvers by providing a reliable initialization for the local optimization process.},
  author = {Oliver Limoyo and Filip Mari\'{c} and Matthew Giamou and Petra Alexson and Ivan Petrovi\'{c} and Jonathan Kelly},
  doi = {10.1109/TRO.2024.3521862},
  journal = {{IEEE} Transactions on Robotics},
  pages = {1002--1018},
  title = {Generative Graphical Inverse Kinematics},
  url = {http://arxiv.org/abs/2209.08812},
  volume = {41},
  year = {2025}
}

Quickly and reliably finding accurate inverse kinematics (IK) solutions remains a challenging problem for many robot manipulators. Existing numerical solvers are broadly applicable but typically only produce a single solution and rely on local search techniques to minimize nonconvex objective functions. Recent learning-based approaches that approximate the entire feasible set of solutions have shown promise in generating multiple fast and accurate IK results in parallel. However, existing learning-based techniques have a significant drawback: each robot of interest requires a specialized model that must be trained from scratch. To address this key shortcoming, we propose a novel distance-geometric robot representation coupled with a graph structure that allows us to leverage the generalizability of graph neural networks (GNNs). Our approach, which we call GGIK (for generative graphical inverse kinematics), is the first learned IK solver that is able to efficiently yield a large number of diverse solutions in parallel while also displaying the ability to generalize---a single learned model can be used to produce IK solutions for a variety of different robots. When compared to several other learned IK methods, GGIK provides more accurate solutions with the same amount of training data. GGIK can also generalize reasonably well to robot manipulators unseen during training. Additionally, GGIK is able to learn a constrained distribution that encodes joint limits and scales well with the number of robot joints and sampled solutions. Finally, GGIK can be used to complement local IK solvers by providing a reliable initialization for the local optimization process.

Stable Object Placement Planning from Contact Point Robustness

P. Nadeau and J. Kelly

IEEE Transactions on Robotics, vol. 41, pp. 3669-3683, 2025.

DOI | Bibtex | Abstract | arXiv

@article{2025_Nadeau_Stable,
  abstract = {We introduce a planner designed to guide robot manipulators in stably placing objects within complex scenes. Our proposed method reverses the traditional approach to object placement: our planner selects contact points first and then determines a placement pose that solicits the selected points. This is instead of sampling poses, identifying contact points, and evaluating pose quality. Our algorithm facilitates stability-aware object placement planning, imposing no restrictions on object shape, convexity, or mass density homogeneity, while avoiding combinatorial computational complexity. Our proposed stability heuristic enables our planner to find a solution about 20 times faster when compared to the same algorithm not making use of the heuristic and eight times faster than a state-of-the-art method that uses the traditional sample-and-evaluate approach. The proposed planner is also more successful in finding stable placements than the five other benchmarked algorithms. Derived from first principles and validated in ten real robot experiments, our approach provides a general and scalable solution to the problem of rigid object placement planning.},
  author = {Philippe Nadeau and Jonathan Kelly},
  doi = {10.1109/TRO.2025.3577049},
  journal = {{IEEE} Transactions on Robotics},
  pages = {3669--3683},
  title = {Stable Object Placement Planning from Contact Point Robustness},
  url = {https://arxiv.org/abs/2410.12483},
  volume = {41},
  year = {2025}
}

We introduce a planner designed to guide robot manipulators in stably placing objects within complex scenes. Our proposed method reverses the traditional approach to object placement: our planner selects contact points first and then determines a placement pose that solicits the selected points. This is instead of sampling poses, identifying contact points, and evaluating pose quality. Our algorithm facilitates stability-aware object placement planning, imposing no restrictions on object shape, convexity, or mass density homogeneity, while avoiding combinatorial computational complexity. Our proposed stability heuristic enables our planner to find a solution about 20 times faster when compared to the same algorithm not making use of the heuristic and eight times faster than a state-of-the-art method that uses the traditional sample-and-evaluate approach. The proposed planner is also more successful in finding stable placements than the five other benchmarked algorithms. Derived from first principles and validated in ten real robot experiments, our approach provides a general and scalable solution to the problem of rigid object placement planning.

FaVoR: Features via Voxel Rendering for Camera Relocalization

V. Polizzi, M. Cannici, D. Scaramuzza, and J. Kelly

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Tucson, Arizona, USA, Feb. 28–Mar. 4, 2025.

@inproceedings{2025_Polizzi_FaVoR,
  abstract = {Camera relocalization methods range from dense image alignment to direct camera pose regression from a query image. Among these, sparse feature matching stands out as an efficient, versatile, and generally lightweight approach with numerous applications. However, feature-based methods often struggle with significant viewpoint and appearance changes, leading to matching failures and inaccurate
pose estimates. To overcome these limitations, we propose a novel approach that leverages a globally sparse but locally dense 3D representation of 2D features. By tracking and triangulating landmarks over a sequence of frames, we construct a sparse voxel map optimized to render image patch descriptors observed during tracking. Given an initial pose estimate, we first synthesize descriptors from the voxels using volumetric rendering and then perform feature matching to estimate the camera pose. This method enables the generation of descriptors for unseen views, enhancing robustness to viewpoint changes. We evaluate our method on the 7-Scenes and Cambridge Landmarks datasets. Our results show that our approach significantly outperforms existing state-of-the-art feature representation techniques in indoor environments, achieving up to a 39\% improvement in median translation error. Additionally, our approach yields comparable results to other methods for outdoor scenes but with lower computational and memory footprints.},
  address = {Tucson, Arizona, USA},
  author = {Vincenzo Polizzi and Marco Cannici and Davide Scaramuzza and Jonathan Kelly},
  booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision {(WACV)}},
  date = {2025-02-28/2025-03-04},
  doi = {10.1109/WACV61041.2025.00015},
  month = {Feb. 28--Mar. 4},
  site = {https://papers.starslab.ca/favor/},
  title = {{FaVoR}: Features via Voxel Rendering for Camera Relocalization},
  url = {https://arxiv.org/abs/2409.07571},
  video1 = {https://www.youtube.com/watch?v=rgV8TZzX7qc&ab_channel=UTIASSTARSLaboratory},
  year = {2025}
}

Camera relocalization methods range from dense image alignment to direct camera pose regression from a query image. Among these, sparse feature matching stands out as an efficient, versatile, and generally lightweight approach with numerous applications. However, feature-based methods often struggle with significant viewpoint and appearance changes, leading to matching failures and inaccurate pose estimates. To overcome these limitations, we propose a novel approach that leverages a globally sparse but locally dense 3D representation of 2D features. By tracking and triangulating landmarks over a sequence of frames, we construct a sparse voxel map optimized to render image patch descriptors observed during tracking. Given an initial pose estimate, we first synthesize descriptors from the voxels using volumetric rendering and then perform feature matching to estimate the camera pose. This method enables the generation of descriptors for unseen views, enhancing robustness to viewpoint changes. We evaluate our method on the 7-Scenes and Cambridge Landmarks datasets. Our results show that our approach significantly outperforms existing state-of-the-art feature representation techniques in indoor environments, achieving up to a 39\% improvement in median translation error. Additionally, our approach yields comparable results to other methods for outdoor scenes but with lower computational and memory footprints.

Learning Cross-Spectral Point Features for Visual Navigation with Task-Oriented Training

M. Thomas

Master Thesis , University of Toronto, Toronto, Ontario, Canada, 2025.

Bibtex | Abstract

@mastersthesis{2025_Thomas_Learning_A,
  abstract = {Unmanned aerial vehicles (UAVs) enable operations in remote and hazardous environments, yet the visible-spectrum, camera-based navigation systems relied upon by UAVs struggle in low-visibility conditions. Thermal cameras, which capture long-wave infrared radiation, are able to function effectively in darkness and smoke, where visible-light cameras fail. This thesis explores learned cross-spectral (thermal-visible) point features as a means to integrate thermal imagery into established camera-based navigation systems. While existing approaches focus training on image regions with limited appearance variation across spectra, we take a task-oriented approach to supervision. Our feature network feeds into a differentiable pipeline that performs homography estimation between thermal and visible-spectrum images. We compare different task-based loss formulations using the matching and registration outputs of this pipeline. Our selected feature model, trained with a matching-based loss, outperforms baseline methods on cross-spectral image matching and registration, both with a specialized homography estimation pipeline
and its classical counterpart.},
  address = {Toronto, Ontario, Canada},
  annote = {Unmanned aerial vehicles (UAVs) enable operations in remote and hazardous environments, yet the visible-spectrum, camera-based navigation systems relied upon by UAVs struggle in low-visibility conditions. Thermal cameras, which capture long-wave infrared radiation, are able to function effectively in darkness and smoke, where visible-light cameras fail. This thesis explores learned cross-spectral (thermal-visible) point features as a means to integrate thermal imagery into established camera-based navigation systems. While existing approaches focus training on image regions with limited appearance variation across spectra, we take a task-oriented approach to supervision. Our feature network feeds into a differentiable pipeline that performs homography estimation between thermal and visible-spectrum images. We compare different task-based loss formulations using the matching and registration outputs of this pipeline. Our selected feature model, trained with a matching-based loss, outperforms baseline methods on cross-spectral image matching and registration, both with a specialized homography estimation pipeline and its classical counterpart.},
  author = {Mia Thomas},
  month = {April},
  school = {University of Toronto},
  title = {Learning Cross-Spectral Point Features for Visual Navigation with Task-Oriented Training},
  year = {2025}
}

Unmanned aerial vehicles (UAVs) enable operations in remote and hazardous environments, yet the visible-spectrum, camera-based navigation systems relied upon by UAVs struggle in low-visibility conditions. Thermal cameras, which capture long-wave infrared radiation, are able to function effectively in darkness and smoke, where visible-light cameras fail. This thesis explores learned cross-spectral (thermal-visible) point features as a means to integrate thermal imagery into established camera-based navigation systems. While existing approaches focus training on image regions with limited appearance variation across spectra, we take a task-oriented approach to supervision. Our feature network feeds into a differentiable pipeline that performs homography estimation between thermal and visible-spectrum images. We compare different task-based loss formulations using the matching and registration outputs of this pipeline. Our selected feature model, trained with a matching-based loss, outperforms baseline methods on cross-spectral image matching and registration, both with a specialized homography estimation pipeline and its classical counterpart.

Learning Cross-Spectral Point Features with Task-Oriented Training

M. Thomas, T. Ablett, and J. Kelly

Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) Thermal Infrared in Robotics (TIRO) Workshop, Atlanta, Georgia, USA, May 19, 2025.

DOI | Bibtex | Abstract | arXiv

@inproceedings{2025_Thomas_Learning_B,
  abstract = {Unmanned aerial vehicles (UAVs) enable operations in remote and hazardous environments, yet the visible-spectrum, camera-based navigation systems often relied upon by UAVs struggle in low-visibility conditions. Thermal cameras, which capture long-wave infrared radiation, are able to function effectively in darkness and smoke, where visible-light cameras fail. This work explores learned cross-spectral (thermal-visible) point features as a means to integrate thermal imagery into established camera-based navigation systems. Existing methods typically train a feature network's detection and description outputs directly, which often focuses training on image regions where thermal and visible-spectrum images exhibit similar appearance. Aiming to more fully utilize the available data, we propose a method to train the feature network on the tasks of matching and registration. We run our feature network on thermal-visible image pairs, then feed the network response into a differentiable registration pipeline. Losses are applied to the matching and registration estimates of this pipeline. Our selected model, trained on the task of matching, achieves a registration error (corner error) below 10 pixels for more than 75\% of estimates on the MultiPoint dataset. We further demonstrate that our model can also be used with a classical pipeline for matching and registration.},
  address = {Atlanta, Georgia, USA},
  author = {Mia Thomas and Trevor Ablett and Jonathan Kelly},
  booktitle = {Proceedings of the {IEEE} International Conference on Robotics and Automation {(ICRA)} Thermal Infrared in Robotics (TIRO) Workshop},
  date = {2025-05-19},
  doi = {10.48550/arXiv.2505.12593},
  month = {May 19},
  title = {Learning Cross-Spectral Point Features with Task-Oriented Training},
  url = {https://arxiv.org/abs/2505.12593},
  year = {2025}
}

Unmanned aerial vehicles (UAVs) enable operations in remote and hazardous environments, yet the visible-spectrum, camera-based navigation systems often relied upon by UAVs struggle in low-visibility conditions. Thermal cameras, which capture long-wave infrared radiation, are able to function effectively in darkness and smoke, where visible-light cameras fail. This work explores learned cross-spectral (thermal-visible) point features as a means to integrate thermal imagery into established camera-based navigation systems. Existing methods typically train a feature network's detection and description outputs directly, which often focuses training on image regions where thermal and visible-spectrum images exhibit similar appearance. Aiming to more fully utilize the available data, we propose a method to train the feature network on the tasks of matching and registration. We run our feature network on thermal-visible image pairs, then feed the network response into a differentiable registration pipeline. Losses are applied to the matching and registration estimates of this pipeline. Our selected model, trained on the task of matching, achieves a registration error (corner error) below 10 pixels for more than 75\% of estimates on the MultiPoint dataset. We further demonstrate that our model can also be used with a classical pipeline for matching and registration.

A Certifably Correct Algorithm for Generalized Robot-World and Hand-Eye Calibration

E. Wise, P. Kaveti, Q. Chen, W. Wang, H. Singh, J. Kelly, D. M. Rosen, and M. Giamou

The International Journal of Robotics Research, 2025.

Bibtex | Abstract | arXiv

@article{2025_Wise_Certifably,
  abstract = {Automatic extrinsic sensor calibration is a fundamental problem for multi-sensor platforms. Reliable and general-purpose solutions should be computationally efficient, require few assumptions about the structure of the sensing environment, and demand little effort from human operators. Since the engineering effort required to obtain accurate calibration parameters increases with the number of sensors deployed, robotics researchers have pursued methods requiring few assumptions about the sensing environment and minimal effort from human operators. In this work, we introduce a fast and certifiably globally optimal algorithm for solving a generalized formulation of the robot-world and hand-eye calibration (RWHEC) problem. The formulation of RWHEC presented is ``generalized'' in that it supports the simultaneous estimation of multiple sensor and target poses, and permits the use of monocular cameras that, alone, are unable to measure the scale of their environments. In addition to demonstrating our method's superior performance over existing solutions, we derive novel identifiability criteria and establish a priori guarantees of global optimality for problem instances with bounded measurement errors. We also introduce a complementary Lie-algebraic local solver for RWHEC and compare its performance with our global method and prior art. Finally, we provide a free and open-source implementation of our algorithms and experiments.},
  author = {Emmett Wise and Pushyami Kaveti and Qilong Chen and Wenhao Wang and Hanumant Singh and Jonathan Kelly and David M. Rosen and Matthew Giamou},
  journal = {The International Journal of Robotics Research},
  note = {Submitted},
  title = {A Certifably Correct Algorithm for Generalized Robot-World and Hand-Eye Calibration},
  url = {https://arxiv.org/abs/2507.23045},
  year = {2025}
}

Automatic extrinsic sensor calibration is a fundamental problem for multi-sensor platforms. Reliable and general-purpose solutions should be computationally efficient, require few assumptions about the structure of the sensing environment, and demand little effort from human operators. Since the engineering effort required to obtain accurate calibration parameters increases with the number of sensors deployed, robotics researchers have pursued methods requiring few assumptions about the sensing environment and minimal effort from human operators. In this work, we introduce a fast and certifiably globally optimal algorithm for solving a generalized formulation of the robot-world and hand-eye calibration (RWHEC) problem. The formulation of RWHEC presented is ``generalized'' in that it supports the simultaneous estimation of multiple sensor and target poses, and permits the use of monocular cameras that, alone, are unable to measure the scale of their environments. In addition to demonstrating our method's superior performance over existing solutions, we derive novel identifiability criteria and establish a priori guarantees of global optimality for problem instances with bounded measurement errors. We also introduce a complementary Lie-algebraic local solver for RWHEC and compare its performance with our global method and prior art. Finally, we provide a free and open-source implementation of our algorithms and experiments.

Submitted

Streamlining Calibration: Eliminating Infrastructure and Certifying Optimality

E. Wise

PhD Thesis , University of Toronto, Toronto, Ontario, Canada, 2025.

Bibtex | Abstract | PDF

@phdthesis{2025_Wise_Streamlining,
  abstract = {Autonomous systems often fuse data from multiple sensors and sensing modalities to improve robustness when operating in adverse conditions. Correct sensor fusion requires knowledge of the transformations between the sensor reference frames as well as the temporal offsets between the sensor measurement times. Due to modifications and wear-and-tear, estimates of these parameters may become inaccurate over time, and end-users of autonomous systems in turn require processes to estimate these parameters. The process of estimating the spatial transformation is known as extrinsic calibration, while the process of estimating the temporal offset is known as temporal calibration. Jointly estimating both sets of parameters is known as spatiotemporal calibration. Many state-of-the-art calibration methods require specialized targets and rough initial guesses for the parameters of interest. These requirements limit potential sensor configurations and calibration venues (e.g., the environment must contain a target and all sensors must view that singular target). In this thesis, we seek to lift these restrictions and streamline the calibration process for end-users. Initially, we explore a method to eliminate specialized targets in spatiotemporal calibration where one sensor is a radar. We then develop a targetless extrinsic calibration method for pairs of radars. Additionally, we apply recent results from convex optimization to two classic calibration problems, the hand-eye calibration problem and the hand-eye-robot-world calibration problem, yielding solutions for the optimal calibration parameters (for a dataset) without prior knowledge. Through simulation studies and real-world experiments, we demonstrate that our methods achieve estimation accuracy similar to or better than other calibration methods that require initialization or specialized targets.},
  address = {Toronto, Ontario, Canada},
  author = {Emmett Wise},
  institution = {University of Toronto},
  month = {January},
  school = {University of Toronto},
  title = {Streamlining Calibration: Eliminating Infrastructure and Certifying Optimality},
  year = {2025}
}

Autonomous systems often fuse data from multiple sensors and sensing modalities to improve robustness when operating in adverse conditions. Correct sensor fusion requires knowledge of the transformations between the sensor reference frames as well as the temporal offsets between the sensor measurement times. Due to modifications and wear-and-tear, estimates of these parameters may become inaccurate over time, and end-users of autonomous systems in turn require processes to estimate these parameters. The process of estimating the spatial transformation is known as extrinsic calibration, while the process of estimating the temporal offset is known as temporal calibration. Jointly estimating both sets of parameters is known as spatiotemporal calibration. Many state-of-the-art calibration methods require specialized targets and rough initial guesses for the parameters of interest. These requirements limit potential sensor configurations and calibration venues (e.g., the environment must contain a target and all sensors must view that singular target). In this thesis, we seek to lift these restrictions and streamline the calibration process for end-users. Initially, we explore a method to eliminate specialized targets in spatiotemporal calibration where one sensor is a radar. We then develop a targetless extrinsic calibration method for pairs of radars. Additionally, we apply recent results from convex optimization to two classic calibration problems, the hand-eye calibration problem and the hand-eye-robot-world calibration problem, yielding solutions for the optimal calibration parameters (for a dataset) without prior knowledge. Through simulation studies and real-world experiments, we demonstrate that our methods achieve estimation accuracy similar to or better than other calibration methods that require initialization or specialized targets.

2024

Fast Reinforcement Learning without Rewards or Demonstrations via Auxiliary Task Examples

T. Ablett, B. Chan, J. H. Wang, and J. Kelly

Proceedings of the Conference on Robot Learning (CoRL) Workshop on Mastering Robot Manipulation in a World of Abundant Data, Munich, Germany, Nov. 9, 2024.

Bibtex | Abstract | arXiv

@inproceedings{2024_Ablett_Fast,
  abstract = {An alternative to using hand-crafted rewards and full-trajectory demonstrations in reinforcement learning is the use of examples of completed tasks, but such an approach can be extremely sample inefficient. We introduce value-penalized auxiliary control from examples (VPACE), an algorithm that significantly improves exploration in example-based control by adding examples of simple auxiliary tasks and an above-success-level value penalty. Across both simulated and real robotic environments, we show that our approach substantially improves learning efficiency for challenging tasks, while maintaining bounded value estimates. Preliminary results also suggest that VPACE may learn more efficiently than the more common approaches of using full trajectories or true sparse rewards. Project site: https://papers.starslab.ca/vpace/.},
  address = {Munich, Germany},
  author = {Trevor Ablett and Bryan Chan and Jayce Haoran Wang and Jonathan Kelly},
  booktitle = {Proceedings of the Conference on Robot Learning {(CoRL)} Workshop on Mastering Robot Manipulation in a World of Abundant Data},
  date = {2024-11-09},
  month = {Nov. 9},
  title = {Fast Reinforcement Learning without Rewards or Demonstrations via Auxiliary Task Examples},
  url = {https://openreview.net/forum?id=5gRB5Z4QIg},
  year = {2024}
}

An alternative to using hand-crafted rewards and full-trajectory demonstrations in reinforcement learning is the use of examples of completed tasks, but such an approach can be extremely sample inefficient. We introduce value-penalized auxiliary control from examples (VPACE), an algorithm that significantly improves exploration in example-based control by adding examples of simple auxiliary tasks and an above-success-level value penalty. Across both simulated and real robotic environments, we show that our approach substantially improves learning efficiency for challenging tasks, while maintaining bounded value estimates. Preliminary results also suggest that VPACE may learn more efficiently than the more common approaches of using full trajectories or true sparse rewards. Project site: https://papers.starslab.ca/vpace/.

Fingerpads with Adaptive Surface Properties for Robotic Gripping

K. Allison

Master Thesis , University of Toronto, Toronto, Ontario, Canada, 2024.

Bibtex | Abstract | PDF

@mastersthesis{2024_Allison_Fingerpads,
  abstract = {Recent advances in materials science have created surfaces with controllable adhesion, friction, and compliance. These ``adaptive surfaces'' enable a versatile control paradigm in robotic gripping, where control over contact properties improves performance across widely varying target surfaces. Using actively adaptive grip surfaces, friction can be increased to grip securely and then decreased to gently release or manipulate in-hand. Existing adaptive surface designs show good friction control; however, most require specialized fabrication processes and are impractical for gripping. This thesis presents a practical and inexpensive multi-level adaptive fingerpad design that is easily adapted to existing grippers. Actuating the fingerpad deflects its contact surface in structured active regions, tuning friction by combining microscale effects from microtopography with mesoscale effects from surface morphology changes and macroscale effects from grip articulation. The fingerpad's friction-tuning capabilities are characterized through shear force testing and additional features are demonstrated including macroscale interlocking behaviour and pressure-based object detection.},
  address = {Toronto, Ontario, Canada},
  author = {Katherine Allison},
  month = {December},
  school = {University of Toronto},
  title = {Fingerpads with Adaptive Surface Properties for Robotic Gripping},
  year = {2024}
}

Recent advances in materials science have created surfaces with controllable adhesion, friction, and compliance. These ``adaptive surfaces'' enable a versatile control paradigm in robotic gripping, where control over contact properties improves performance across widely varying target surfaces. Using actively adaptive grip surfaces, friction can be increased to grip securely and then decreased to gently release or manipulate in-hand. Existing adaptive surface designs show good friction control; however, most require specialized fabrication processes and are impractical for gripping. This thesis presents a practical and inexpensive multi-level adaptive fingerpad design that is easily adapted to existing grippers. Actuating the fingerpad deflects its contact surface in structured active regions, tuning friction by combining microscale effects from microtopography with mesoscale effects from surface morphology changes and macroscale effects from grip articulation. The fingerpad's friction-tuning capabilities are characterized through shear force testing and additional features are demonstrated including macroscale interlocking behaviour and pressure-based object detection.

Automated Planning Domain Inference for Robot Task and Motion Planning

J. Huang

Master Thesis , University of Toronto, Toronto, Ontario, Canada, 2024.

Bibtex | Abstract | PDF

@mastersthesis{2024_Huang_Automated,
  abstract = {Robots excel at completing short-term tasks within structured environments but struggle with longer-horizon tasks in dynamic, unstructured settings, due to the limitations of current motion planning algorithms. Task and motion planning (TAMP) frameworks address this problem by integrating high-level task planning with low-level motion planning. However, existing TAMP methods rely heavily on the manual design of planning domains that specify the preconditions and effects of all high-level actions. This the- sis proposes a novel method to automate planning domain inference, reducing the reliance on human design. Our approach incorporates a deep learning-based estimator that predicts the appropriate domain for a new task, and a search algorithm that refines this prediction. Our method is able to generate new domains from minimal demonstrations at test time, enabling robots to handle complex tasks more efficiently. We demonstrate that our approach achieves superior performance and generalization on a variety of tasks compared to behavior cloning baselines.},
  address = {Toronto, Ontario, Canada},
  author = {Jinbang Huang},
  month = {September},
  school = {University of Toronto},
  title = {Automated Planning Domain Inference for Robot Task and Motion Planning},
  year = {2024}
}

Robots excel at completing short-term tasks within structured environments but struggle with longer-horizon tasks in dynamic, unstructured settings, due to the limitations of current motion planning algorithms. Task and motion planning (TAMP) frameworks address this problem by integrating high-level task planning with low-level motion planning. However, existing TAMP methods rely heavily on the manual design of planning domains that specify the preconditions and effects of all high-level actions. This the- sis proposes a novel method to automate planning domain inference, reducing the reliance on human design. Our approach incorporates a deep learning-based estimator that predicts the appropriate domain for a new task, and a search algorithm that refines this prediction. Our method is able to generate new domains from minimal demonstrations at test time, enabling robots to handle complex tasks more efficiently. We demonstrate that our approach achieves superior performance and generalization on a variety of tasks compared to behavior cloning baselines.

Making Space for Time: The Special Galilean Group and Its Application to Some Robotics Problems

J. Kelly

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) Workshop From Geometry to General Autonomy of Robotic Systems, Abu Dhabi, United Arab Emirates, Oct. 15, 2024.

DOI | Bibtex | Abstract | arXiv

@inproceedings{2024_Kelly_Making,
  abstract = {The special Galilean group, usually denoted SGal(3), is a 10-dimensional Lie group whose important subgroups include the special orthogonal group, the special Euclidean group, and the group of extended poses. We briefly describe SGal(3) and its Lie algebra and show how the group structure supports a unified representation of uncertainty in space and time. Our aim is to highlight the potential usefulness of this group for several robotics problems.},
  address = {Abu Dhabi, United Arab Emirates},
  author = {Jonathan Kelly},
  booktitle = {Proceedings of the {IEEE/RSJ} International Conference on Intelligent Robots and Systems {(IROS)} Workshop From Geometry to General Autonomy of Robotic Systems},
  date = {2024-10-15},
  doi = {10.48550/arXiv.2409.14276},
  month = {Oct. 15},
  title = {Making Space for Time: The Special Galilean Group and Its Application to Some Robotics Problems},
  url = {http://arxiv.org/abs/2409.14276},
  year = {2024}
}

The special Galilean group, usually denoted SGal(3), is a 10-dimensional Lie group whose important subgroups include the special orthogonal group, the special Euclidean group, and the group of extended poses. We briefly describe SGal(3) and its Lie algebra and show how the group structure supports a unified representation of uncertainty in space and time. Our aim is to highlight the potential usefulness of this group for several robotics problems.

The Importance of Adaptive Decision-Making for Autonomous Long-Range Planetary Surface Mobility

O. Lamarre and J. Kelly

Proceedings of the International Symposium on Artificial Intelligence, Robotics and Automation in Space (i-SAIRAS), Brisbane, Queensland, Australia, Nov. 19–21, 2024.

DOI | Bibtex | Abstract | PDF | arXiv

@inproceedings{2024_Lamarre_Importance,
  abstract = {Long-distance driving is an important component of planetary surface exploration. Unforeseen events often require human operators to adjust mobility plans, but this approach does not scale and will be insufficient for future missions. Interest in self-reliant rovers is increasing, however the research community has not yet given significant attention to autonomous, adaptive decision-making. In this paper, we look back at specific planetary mobility operations where human-guided adaptive planning played an important role in mission safety and productivity. Inspired by the abilities of human experts, we identify shortcomings of existing autonomous mobility algorithms for robots operating in off-road environments like planetary surfaces. We advocate for adaptive decision-making capabilities such as unassisted learning from past experiences and more reliance on stochastic world models. The aim of this work is to highlight promising research avenues to enhance ground planning tools and, ultimately, long-range autonomy algorithms on board planetary rovers.},
  address = {Brisbane, Queensland, Australia},
  author = {Olivier Lamarre and Jonathan Kelly},
  booktitle = {Proceedings of the International Symposium on Artificial Intelligence, Robotics and Automation in Space {(i-SAIRAS)}},
  date = {2024-11-19/2024-11-21},
  doi = {10.48550/arXiv.2409.19455},
  month = {Nov. 19--21},
  title = {The Importance of Adaptive Decision-Making for Autonomous Long-Range Planetary Surface Mobility},
  url = {https://arxiv.org/abs/2409.19455},
  year = {2024}
}

Long-distance driving is an important component of planetary surface exploration. Unforeseen events often require human operators to adjust mobility plans, but this approach does not scale and will be insufficient for future missions. Interest in self-reliant rovers is increasing, however the research community has not yet given significant attention to autonomous, adaptive decision-making. In this paper, we look back at specific planetary mobility operations where human-guided adaptive planning played an important role in mission safety and productivity. Inspired by the abilities of human experts, we identify shortcomings of existing autonomous mobility algorithms for robots operating in off-road environments like planetary surfaces. We advocate for adaptive decision-making capabilities such as unassisted learning from past experiences and more reliance on stochastic world models. The aim of this work is to highlight promising research avenues to enhance ground planning tools and, ultimately, long-range autonomy algorithms on board planetary rovers.

Safe Mission-Level Path Planning for Exploration of Lunar Shadowed Regions by a Solar-Powered Rover

O. Lamarre, S. Malhotra, and J. Kelly

Proceedings of the IEEE Aerospace Conference, Big Sky, Montana, USA, Mar. 2–9, 2024, pp. 1-14.

DOI | Bibtex | Abstract | arXiv

@inproceedings{2024_Lamarre_Safe,
  abstract = {Exploration of the lunar south pole with a solar- powered rover is challenging due to the highly dynamic solar illumination conditions and the presence of permanently shadowed regions (PSRs). In turn, careful planning in space and time is essential. Mission-level path planning is a global, spatiotemporal paradigm that addresses this challenge, taking into account rover resources and mission requirements. However, existing approaches do not proactively account for random disturbances, such as recurring faults, that may temporarily delay rover traverse progress. In this paper, we formulate a chance-constrained mission-level planning problem for the exploration of PSRs by a solar-powered rover affected by random faults. The objective is to find a policy that visits as many waypoints of scientific interest as possible while respecting an upper bound on the probability of mission failure.

Our approach assumes that faults occur randomly, but at a known, constant average rate. Each fault is resolved within a fixed time, simulating the recovery period of an autonomous system or the time required for a team of human operators to intervene. Unlike solutions based upon dynamic programming alone, our method breaks the chance-constrained optimization problem into smaller offline and online subtasks to make the problem computationally tractable. Specifically, our solution combines existing mission-level path planning techniques with a stochastic reachability analysis component. We find mission plans that remain within reach of safety throughout large state spaces. To empirically validate our algorithm, we simulate mission scenarios using orbital terrain and illumination maps of Cabeus Crater. Results from simulations of multi-day, long-range drives in the LCROSS impact region are also presented.},
  address = {Big Sky, Montana, USA},
  author = {Olivier Lamarre and Shantanu Malhotra and Jonathan Kelly},
  booktitle = {Proceedings of the {IEEE} Aerospace Conference},
  date = {2024-03-02/2024-03-09},
  doi = {10.1109/AERO58975.2024.10521136},
  month = {Mar. 2--9},
  pages = {1--14},
  title = {Safe Mission-Level Path Planning for Exploration of Lunar Shadowed Regions by a Solar-Powered Rover},
  url = {https://arxiv.org/abs/2401.08558},
  year = {2024}
}

Exploration of the lunar south pole with a solar- powered rover is challenging due to the highly dynamic solar illumination conditions and the presence of permanently shadowed regions (PSRs). In turn, careful planning in space and time is essential. Mission-level path planning is a global, spatiotemporal paradigm that addresses this challenge, taking into account rover resources and mission requirements. However, existing approaches do not proactively account for random disturbances, such as recurring faults, that may temporarily delay rover traverse progress. In this paper, we formulate a chance-constrained mission-level planning problem for the exploration of PSRs by a solar-powered rover affected by random faults. The objective is to find a policy that visits as many waypoints of scientific interest as possible while respecting an upper bound on the probability of mission failure. Our approach assumes that faults occur randomly, but at a known, constant average rate. Each fault is resolved within a fixed time, simulating the recovery period of an autonomous system or the time required for a team of human operators to intervene. Unlike solutions based upon dynamic programming alone, our method breaks the chance-constrained optimization problem into smaller offline and online subtasks to make the problem computationally tractable. Specifically, our solution combines existing mission-level path planning techniques with a stochastic reachability analysis component. We find mission plans that remain within reach of safety throughout large state spaces. To empirically validate our algorithm, we simulate mission scenarios using orbital terrain and illumination maps of Cabeus Crater. Results from simulations of multi-day, long-range drives in the LCROSS impact region are also presented.

Automated Visual Anomaly Detection for Proximity Operations in the Space Domain

S. Leveugle

Master Thesis , University of Toronto, Toronto, Ontario, Canada, 2024.

Bibtex | Abstract | PDF

@mastersthesis{2024_Leveugle_Automated,
  abstract = {The demand for autonomy on the Canadarm3 presents several new challenges, including the need for the arm to use its inspection cameras to perform autonomous anomaly detection, that is, to identify hazards within its operating environment. In this thesis, we introduce the ALLO dataset, a novel resource for developing and testing anomaly detection algorithms in the space domain. The ALLO dataset is used to evaluate the performance of state-of-the-art anomaly detection algorithms, demonstrating how current methods struggle to generalize to the complex lighting and scenery of space. We then present MRAD, a novel, shallow anomaly detection algorithm designed specifically for space applications. By leveraging the known pose of the Canadarm3 inspection camera, MRAD reformulates the anomaly detection problem and outperforms existing methods. Given the low tolerance for risk in space operations, this research provides essential tools and a potential solution for visual anomaly detection in lunar orbit.},
  address = {Toronto, Ontario, Canada},
  author = {Selina Leveugle},
  month = {September},
  school = {University of Toronto},
  title = {Automated Visual Anomaly Detection for Proximity Operations in the Space Domain},
  year = {2024}
}

The demand for autonomy on the Canadarm3 presents several new challenges, including the need for the arm to use its inspection cameras to perform autonomous anomaly detection, that is, to identify hazards within its operating environment. In this thesis, we introduce the ALLO dataset, a novel resource for developing and testing anomaly detection algorithms in the space domain. The ALLO dataset is used to evaluate the performance of state-of-the-art anomaly detection algorithms, demonstrating how current methods struggle to generalize to the complex lighting and scenery of space. We then present MRAD, a novel, shallow anomaly detection algorithm designed specifically for space applications. By leveraging the known pose of the Canadarm3 inspection camera, MRAD reformulates the anomaly detection problem and outperforms existing methods. Given the low tolerance for risk in space operations, this research provides essential tools and a potential solution for visual anomaly detection in lunar orbit.

Generative and Self-Supervised Learning for Robotics Problems

O. Limoyo

PhD Thesis , University of Toronto, Toronto, Ontario, Canada, 2024.

Bibtex | Abstract | PDF

@phdthesis{2024_Limoyo_Generative,
  abstract = {There is a tremendous amount of information available in the form of unlabelled data from robotic systems. Without proper models and algorithms, tapping into this trove of data is not possible. In this dissertation, we apply generative and self-supervised learning as a framework to solve robotics-specific problems. Traditionally, supervised learning relies on human-annotated labels as signal for the training process. Generative, self-supervised learning is a promising approach that works with unlabelled data directly. A model that is trained to generate a large amount of unlabelled but structured data with a significantly smaller number of parameters learns to acquire an efficient internal representation of the data itself. We present models and algorithms that leverage generative self-supervision in multiple robotic contexts. First, we solve the classical robotic problems of system identification and inverse kinematics by formulating them as generative modelling problems. Second, we leverage self-supervision in combination with grasp planning, impedance control, and tactile sensing to automatically generate robotic manipulation placement demonstration data in contact-constrained environments. Finally, we directly integrate generative models of vision and language, trained solely via self-supervision, with the perspective-n-point algorithm to produce a language-guided robotic photography framework. We demonstrate our approaches on simulated and real-world datasets as well as various real-world robotic manipulation tasks.},
  address = {Toronto, Ontario, Canada},
  author = {Oliver Limoyo},
  institution = {University of Toronto},
  month = {November},
  school = {University of Toronto},
  title = {Generative and Self-Supervised Learning for Robotics Problems},
  year = {2024}
}

There is a tremendous amount of information available in the form of unlabelled data from robotic systems. Without proper models and algorithms, tapping into this trove of data is not possible. In this dissertation, we apply generative and self-supervised learning as a framework to solve robotics-specific problems. Traditionally, supervised learning relies on human-annotated labels as signal for the training process. Generative, self-supervised learning is a promising approach that works with unlabelled data directly. A model that is trained to generate a large amount of unlabelled but structured data with a significantly smaller number of parameters learns to acquire an efficient internal representation of the data itself. We present models and algorithms that leverage generative self-supervision in multiple robotic contexts. First, we solve the classical robotic problems of system identification and inverse kinematics by formulating them as generative modelling problems. Second, we leverage self-supervision in combination with grasp planning, impedance control, and tactile sensing to automatically generate robotic manipulation placement demonstration data in contact-constrained environments. Finally, we directly integrate generative models of vision and language, trained solely via self-supervision, with the perspective-n-point algorithm to produce a language-guided robotic photography framework. We demonstrate our approaches on simulated and real-world datasets as well as various real-world robotic manipulation tasks.

PhotoBot: Reference-Guided Interactive Photography via Natural Language

O. Limoyo, J. Li, D. Rivkin, J. Kelly, and G. Dudek

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Abu Dhabi, United Arab Emirates, Oct. 14–18, 2024.

DOI | Bibtex | Abstract | arXiv

@inproceedings{2024_Limoyo_PhotoBot,
  abstract = {We introduce PhotoBot, a framework for fully automated photo acquisition based on an interplay between high-level human language guidance and a robot photographer. We propose to communicate photography suggestions to the user via reference images that are selected from a curated gallery. We leverage a visual language model (VLM) and an object detector to characterize the reference images via textual descriptions and then use a large language model (LLM) to retrieve relevant reference images based on a user's language query through text-based reasoning. To correspond the reference image and the observed scene, we exploit pre-trained features from a vision transformer capable of capturing semantic similarity across marked appearance variations. Using these features, we compute suggested pose adjustments for an RGB-D camera by solving a perspective-n-point (PnP) problem. We demonstrate our approach using a manipulator equipped with a wrist camera. Our user studies show that photos taken by PhotoBot are often more aesthetically pleasing than those taken by users themselves, as measured by human feedback. We also show that PhotoBot can generalize to other reference sources such as paintings.},
  address = {Abu Dhabi, United Arab Emirates},
  author = {Oliver Limoyo and Jimmy Li and Dmitriy Rivkin and Jonathan Kelly and Gregory Dudek},
  booktitle = {Proceedings of the {IEEE/RSJ} International Conference on Intelligent Robots and Systems {(IROS)}},
  date = {2024-10-14/2024-10-18},
  doi = {10.1109/IROS58592.2024.10801790},
  month = {Oct. 14--18},
  title = {PhotoBot: Reference-Guided Interactive Photography via Natural Language},
  url = {https://arxiv.org/abs/2401.11061},
  year = {2024}
}

We introduce PhotoBot, a framework for fully automated photo acquisition based on an interplay between high-level human language guidance and a robot photographer. We propose to communicate photography suggestions to the user via reference images that are selected from a curated gallery. We leverage a visual language model (VLM) and an object detector to characterize the reference images via textual descriptions and then use a large language model (LLM) to retrieve relevant reference images based on a user's language query through text-based reasoning. To correspond the reference image and the observed scene, we exploit pre-trained features from a vision transformer capable of capturing semantic similarity across marked appearance variations. Using these features, we compute suggested pose adjustments for an RGB-D camera by solving a perspective-n-point (PnP) problem. We demonstrate our approach using a manipulator equipped with a wrist camera. Our user studies show that photos taken by PhotoBot are often more aesthetically pleasing than those taken by users themselves, as measured by human feedback. We also show that PhotoBot can generalize to other reference sources such as paintings.

Reference-Guided Robotic Photography Through Natural Language Interaction

O. Limoyo, J. Li, D. Rivkhin, J. Kelly, and G. Dudek

Proceedings of the ACM/IEEE International Conference on Human Robot Interaction (HRI) Workshop on Human-Large Language Model Interaction: The Dawn of a New Era or the End of it All?, Boulder, Colorado, USA, Mar. 11, 2024.

Bibtex | Abstract | PDF

@inproceedings{2024_Limoyo_Reference-Guided,
  abstract = {We introduce PhotoBot, a framework for automated photo acquisition based on an interplay between high-level human language guidance and a robot photographer. We propose to communicate photography suggestions to the user via a reference picture that is retrieved from a curated gallery. We leverage a visual language model (VLM) and an object detector to characterize reference pictures via textual descriptions and use a large language model (LLM) to retrieve relevant reference pictures based on a user's language query through text-based reasoning. To correspond the reference picture and the observed scene, we exploit pre-trained features from a vision transformer capable of capturing semantic similarity across widely varied images. Using these features, we compute pose adjustments for an RGB-D camera by solving a Perspective-n-Point (PnP) problem, enabling the fully automatic capture of well-composed and compelling photographs.},
  address = {Boulder, Colorado, USA},
  author = {Oliver Limoyo and Jimmy Li and Dmitriy Rivkhin and Jonathan Kelly and Gregory Dudek},
  booktitle = {Proceedings of the {ACM/IEEE} International Conference on Human Robot Interaction {(HRI)} Workshop on Human-Large Language Model Interaction: The Dawn of a New Era or the End of it All?},
  date = {2024-03-11},
  month = {Mar. 11},
  title = {Reference-Guided Robotic Photography Through Natural Language Interaction},
  year = {2024}
}

We introduce PhotoBot, a framework for automated photo acquisition based on an interplay between high-level human language guidance and a robot photographer. We propose to communicate photography suggestions to the user via a reference picture that is retrieved from a curated gallery. We leverage a visual language model (VLM) and an object detector to characterize reference pictures via textual descriptions and use a large language model (LLM) to retrieve relevant reference pictures based on a user's language query through text-based reasoning. To correspond the reference picture and the observed scene, we exploit pre-trained features from a vision transformer capable of capturing semantic similarity across widely varied images. Using these features, we compute pose adjustments for an RGB-D camera by solving a Perspective-n-Point (PnP) problem, enabling the fully automatic capture of well-composed and compelling photographs.

Working Backwards: Learning to Place by Picking

O. Limoyo, A. Konar, T. Ablett, J. Kelly, F. R. Hogan, and G. Dudek

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Abu Dhabi, United Arab Emirates, Oct. 14–18, 2024.

DOI | Bibtex | Abstract | arXiv

@inproceedings{2024_Limoyo_Working,
  abstract = {We present placing via picking (PvP), a method to autonomously collect real-world demonstrations for a family of placing tasks in which objects must be manipulated to specific contact-constrained locations. With PvP, we approach the collection of robotic object placement demonstrations by reversing the grasping process and exploiting the inherent symmetry of the pick and place problems. Specifically, we obtain placing demonstrations from a set of grasp sequences of objects initially located at their target placement locations. Our system can collect hundreds of demonstrations in contact- constrained environments without human intervention by combining two modules: tactile regrasping and compliant control for grasps. We train a policy directly from visual observations through behavioral cloning, using the autonomously-collected demonstrations. By doing so, the policy can generalize to object placement scenarios outside of the training environment without privileged information (e.g., placing a plate picked up from a table). We validate our approach in home robotic scenarios that include dishwasher loading and table setting. Our approach yields robotic placing policies that outperform policies trained with kinesthetic teaching, both in terms of performance and data efficiency, while requiring no human supervision.},
  address = {Abu Dhabi, United Arab Emirates},
  author = {Oliver Limoyo and Abhisek Konar and Trevor Ablett and Jonathan Kelly and Francois R. Hogan and Gregory Dudek},
  booktitle = {Proceedings of the {IEEE/RSJ} International Conference on Intelligent Robots and Systems {(IROS)}},
  date = {2024-10-14/2024-10-18},
  doi = {10.1109/IROS58592.2024.10802173},
  month = {Oct. 14--18},
  title = {Working Backwards: Learning to Place by Picking},
  url = {https://arxiv.org/abs/2312.02352},
  year = {2024}
}

We present placing via picking (PvP), a method to autonomously collect real-world demonstrations for a family of placing tasks in which objects must be manipulated to specific contact-constrained locations. With PvP, we approach the collection of robotic object placement demonstrations by reversing the grasping process and exploiting the inherent symmetry of the pick and place problems. Specifically, we obtain placing demonstrations from a set of grasp sequences of objects initially located at their target placement locations. Our system can collect hundreds of demonstrations in contact- constrained environments without human intervention by combining two modules: tactile regrasping and compliant control for grasps. We train a policy directly from visual observations through behavioral cloning, using the autonomously-collected demonstrations. By doing so, the policy can generalize to object placement scenarios outside of the training environment without privileged information (e.g., placing a plate picked up from a table). We validate our approach in home robotic scenarios that include dishwasher loading and table setting. Our approach yields robotic placing policies that outperform policies trained with kinesthetic teaching, both in terms of performance and data efficiency, while requiring no human supervision.

2023

Learning from Guided Play: Improving Exploration for Adversarial Imitation Learning with Simple Auxiliary Tasks

T. Ablett, B. Chan, and J. Kelly

IEEE Robotics and Automation Letters, vol. 8, iss. 3, pp. 1263-1270, 2023.

@article{2023_Ablett_Learning,
  abstract = {Adversarial imitation learning (AIL) has become a popular alternative to supervised imitation learning that reduces the distribution shift suffered by the latter. However, AIL requires effective exploration during an online reinforcement learning phase. In this work, we show that the standard, naive approach to exploration can manifest as a suboptimal local maximum if a policy learned with AIL sufficiently matches the expert distribution without fully learning the desired task. This can be particularly catastrophic for manipulation tasks, where the difference between an expert and a non-expert state-action pair is often subtle. We present Learning from Guided Play (LfGP), a framework in which we leverage expert demonstrations of multiple exploratory, auxiliary tasks in addition to a main task. The addition of these auxiliary tasks forces the agent to explore states and actions that standard AIL may learn to ignore. Additionally, this particular formulation allows for the reusability of expert data between main tasks. Our experimental results in a challenging multitask robotic manipulation domain indicate that LfGP significantly outperforms both AIL and behaviour cloning, while also being more expert sample efficient than these baselines. To explain this performance gap, we provide further analysis of a toy problem that highlights the coupling between a local maximum and poor exploration, and also visualize the differences between the learned models from AIL and LfGP.},
  author = {Trevor Ablett and Bryan Chan and Jonathan Kelly},
  code = {https://github.com/utiasSTARS/lfgp},
  doi = {10.1109/LRA.2023.3236882},
  journal = {{IEEE} Robotics and Automation Letters},
  month = {March},
  number = {3},
  pages = {1263--1270},
  site = {https://papers.starslab.ca/lfgp/},
  title = {Learning from Guided Play: Improving Exploration for Adversarial Imitation Learning with Simple Auxiliary Tasks},
  url = {https://arxiv.org/abs/2301.00051},
  volume = {8},
  year = {2023}
}

Adversarial imitation learning (AIL) has become a popular alternative to supervised imitation learning that reduces the distribution shift suffered by the latter. However, AIL requires effective exploration during an online reinforcement learning phase. In this work, we show that the standard, naive approach to exploration can manifest as a suboptimal local maximum if a policy learned with AIL sufficiently matches the expert distribution without fully learning the desired task. This can be particularly catastrophic for manipulation tasks, where the difference between an expert and a non-expert state-action pair is often subtle. We present Learning from Guided Play (LfGP), a framework in which we leverage expert demonstrations of multiple exploratory, auxiliary tasks in addition to a main task. The addition of these auxiliary tasks forces the agent to explore states and actions that standard AIL may learn to ignore. Additionally, this particular formulation allows for the reusability of expert data between main tasks. Our experimental results in a challenging multitask robotic manipulation domain indicate that LfGP significantly outperforms both AIL and behaviour cloning, while also being more expert sample efficient than these baselines. To explain this performance gap, we provide further analysis of a toy problem that highlights the coupling between a local maximum and poor exploration, and also visualize the differences between the learned models from AIL and LfGP.

Extrinsic Calibration of 2D Millimetre-Wavelength Radar Pairs Using Ego-Velocity Estimates

Q. Cheng, E. Wise, and J. Kelly

Proceedings of the IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM), Seattle, Washington, USA, Jun. 27–Jul. 1, 2023, pp. 559-565.

DOI | Bibtex | Abstract | arXiv | Video

@inproceedings{2023_Cheng_Extrinsic,
  abstract = {Correct radar data fusion depends on knowledge of the spatial transform between sensor pairs. Current methods for determining this transform operate by aligning identifiable features in different radar scans, or by relying on measurements from another, more accurate sensor. Feature-based alignment requires the sensors to have overlapping fields of view or necessitates the construction of an environment map. Several existing techniques require bespoke retroreflective radar targets. These requirements limit both where and how calibration can be performed. In this paper, we take a different approach: instead of attempting to track targets or features, we rely on ego-velocity estimates from each radar to perform calibration. Our method enables calibration of a subset of the transform parameters, including the yaw and the axis of translation between the radar pair, without the need for a shared field of view or for specialized targets. In general, the yaw and the axis of translation are the most important parameters for data fusion, the most likely to vary over time, and the most difficult to calibrate manually. We formulate calibration as a batch optimization problem, show that the radar-radar system is identifiable, and specify the platform excitation requirements. Through simulation studies and real-world experiments, we establish that our method is more reliable and accurate than state-of-the-art methods. Finally, we demonstrate that the full rigid body transform can be recovered if relatively coarse information about the platform rotation rate is available.},
  address = {Seattle, Washington, USA},
  author = {Qilong Cheng and Emmett Wise and Jonathan Kelly},
  booktitle = {Proceedings of the {IEEE/ASME} International Conference on Advanced Intelligent Mechatronics {(AIM)}},
  date = {2023-06-27/2023-07-01},
  doi = {10.1109/AIM46323.2023.10196187},
  month = {Jun. 27--Jul. 1},
  pages = {559--565},
  title = {Extrinsic Calibration of {2D} Millimetre-Wavelength Radar Pairs Using Ego-Velocity Estimates},
  url = {https://arxiv.org/abs/2302.00660},
  video1 = {https://www.youtube.com/watch?v=hfXdG6j3sjE},
  year = {2023}
}

Correct radar data fusion depends on knowledge of the spatial transform between sensor pairs. Current methods for determining this transform operate by aligning identifiable features in different radar scans, or by relying on measurements from another, more accurate sensor. Feature-based alignment requires the sensors to have overlapping fields of view or necessitates the construction of an environment map. Several existing techniques require bespoke retroreflective radar targets. These requirements limit both where and how calibration can be performed. In this paper, we take a different approach: instead of attempting to track targets or features, we rely on ego-velocity estimates from each radar to perform calibration. Our method enables calibration of a subset of the transform parameters, including the yaw and the axis of translation between the radar pair, without the need for a shared field of view or for specialized targets. In general, the yaw and the axis of translation are the most important parameters for data fusion, the most likely to vary over time, and the most difficult to calibrate manually. We formulate calibration as a batch optimization problem, show that the radar-radar system is identifiable, and specify the platform excitation requirements. Through simulation studies and real-world experiments, we establish that our method is more reliable and accurate than state-of-the-art methods. Finally, we demonstrate that the full rigid body transform can be recovered if relatively coarse information about the platform rotation rate is available.

Contrastive Learning for Self-Supervised Pre-Training of Point Cloud Segmentation Networks With Image Data

A. Janda, B. Wagstaff, E. G. Ng, and J. Kelly

Proceedings of the Conference on Robots and Vision (CRV), Montréal, Québec, Canada, Jun. 6–8, 2023, pp. 145-152.

DOI | Bibtex | Abstract | arXiv

@inproceedings{2023_Janda_Contrastive,
  abstract = {Reducing the quantity of annotations required for supervised training is vital when labels are scarce and costly. This reduction is particularly important for semantic segmentation tasks involving 3D datasets, which are often significantly smaller and more challenging to annotate than their image-based counterparts. Self-supervised pre-training on unlabelled data is one way to reduce the amount of manual annotations needed. Previous work has focused on pre-training with point clouds exclusively. While useful, this approach often requires two or more registered views. In the present work, we combine image and point cloud modalities by first learning self-supervised image features and then using these features to train a 3D model. By incorporating image data, which is often included in many 3D datasets, our pre-training method only requires a single scan of a scene and can be applied to cases where localization information is unavailable. We demonstrate that our pre-training approach, despite using single scans, achieves comparable performance to other multi-scan, point cloud-only methods.},
  address = {Montr\'{e}al, Qu\'{e}bec, Canada},
  author = {Andrej Janda and Brandon Wagstaff and Edwin G. Ng and Jonathan Kelly},
  booktitle = {Proceedings of the Conference on Robots and Vision {(CRV)}},
  date = {2023-06-06/2023-06-08},
  doi = {10.1109/CRV60082.2023.00026},
  month = {Jun. 6--8},
  pages = {145--152},
  title = {Contrastive Learning for Self-Supervised Pre-Training of Point Cloud Segmentation Networks With Image Data},
  url = {https://arxiv.org/abs/2301.07283},
  year = {2023}
}

Reducing the quantity of annotations required for supervised training is vital when labels are scarce and costly. This reduction is particularly important for semantic segmentation tasks involving 3D datasets, which are often significantly smaller and more challenging to annotate than their image-based counterparts. Self-supervised pre-training on unlabelled data is one way to reduce the amount of manual annotations needed. Previous work has focused on pre-training with point clouds exclusively. While useful, this approach often requires two or more registered views. In the present work, we combine image and point cloud modalities by first learning self-supervised image features and then using these features to train a 3D model. By incorporating image data, which is often included in many 3D datasets, our pre-training method only requires a single scan of a scene and can be applied to cases where localization information is unavailable. We demonstrate that our pre-training approach, despite using single scans, achieves comparable performance to other multi-scan, point cloud-only methods.

Living in a Material World: Learning Material Properties from Full-Waveform Flash Lidar Data for Semantic Segmentation

A. Janda, P. Merriaux, P. Olivier, and J. Kelly

Proceedings of the Conference on Robots and Vision (CRV), Montréal, Québec, Canada, Jun. 6–8, 2023, pp. 202-207.

DOI | Bibtex | Abstract | arXiv

@inproceedings{2023_Janda_Living,
  abstract = {Advances in lidar technology have made the collection of 3D point clouds fast and easy. While most lidar sensors return per-point intensity (or reflectance) values along with range measurements, Dash lidar sensors are able to provide information about the shape of the return pulse. The shape of the return waveform is affected by many factors, including the distance that the light pulse travels and the angle of incidence with a surface. Importantly, the shape of the return waveform also depends on the material properties of the reflecting surface. In this paper, we investigate whether the material type or class can be determined from the full-waveform response. First, as a proof of concept, we demonstrate that the extra information about material class, if known accurately, can improve performance on scene understanding tasks such as semantic segmentation. Next, we learn two different full- waveform material classifiers: a random forest classifier and a temporal convolutional neural network (TCN) classifier. We And that, in some cases, material types can be distinguished, and that the TCN generally performs better across a wider range of materials. However, factors such as angle of incidence, material colour, and material similarity may hinder overall performance.},
  address = {Montr\'{e}al, Qu\'{e}bec, Canada},
  author = {Andrej Janda and Pierre Merriaux and Pierre Olivier and Jonathan Kelly},
  booktitle = {Proceedings of the Conference on Robots and Vision {(CRV)}},
  date = {2023-06-06/2023-06-08},
  doi = {10.1109/CRV60082.2023.00033},
  month = {Jun. 6--8},
  pages = {202--207},
  title = {Living in a Material World: Learning Material Properties from Full-Waveform Flash Lidar Data for Semantic Segmentation},
  url = {https://arxiv.org/abs/2305.04334},
  year = {2023}
}

Advances in lidar technology have made the collection of 3D point clouds fast and easy. While most lidar sensors return per-point intensity (or reflectance) values along with range measurements, Dash lidar sensors are able to provide information about the shape of the return pulse. The shape of the return waveform is affected by many factors, including the distance that the light pulse travels and the angle of incidence with a surface. Importantly, the shape of the return waveform also depends on the material properties of the reflecting surface. In this paper, we investigate whether the material type or class can be determined from the full-waveform response. First, as a proof of concept, we demonstrate that the extra information about material class, if known accurately, can improve performance on scene understanding tasks such as semantic segmentation. Next, we learn two different full- waveform material classifiers: a random forest classifier and a temporal convolutional neural network (TCN) classifier. We And that, in some cases, material types can be distinguished, and that the TCN generally performs better across a wider range of materials. However, factors such as angle of incidence, material colour, and material similarity may hinder overall performance.

All About the Galilean Group SGal(3)

J. Kelly

Toronto, Ontario, Canada, Tech. Rep. STARS-2023-001, Nov. 26, 2023.

DOI | Bibtex | Abstract | arXiv

@techreport{2023_Kelly_Galilean,
  abstract = {We consider the Galilean group of transformations that preserve spatial distances and absolute time intervals between events in spacetime. The special Galilean group SGal(3) is a 10-dimensional Lie group; we examine the structure of the group and its Lie algebra and discuss the representation of uncertainty on the group manifold. Along the way, we mention several other groups, including the special orthogonal group, the special Euclidean group, and the group of extended poses, all of which are proper subgroups of the Galilean group. We describe the role of time in Galilean relativity and touch on the relationship between temporal and spatial uncertainty.},
  address = {Toronto, Ontario, Canada},
  author = {Jonathan Kelly},
  date = {2023-11-26},
  doi = {10.48550/arXiv.2312.07555},
  institution = {University of Toronto},
  month = {Nov. 26},
  number = {STARS-2023-001},
  title = {All About the Galilean Group SGal(3)},
  url = {https://arxiv.org/abs/2312.07555},
  year = {2023}
}

We consider the Galilean group of transformations that preserve spatial distances and absolute time intervals between events in spacetime. The special Galilean group SGal(3) is a 10-dimensional Lie group; we examine the structure of the group and its Lie algebra and discuss the representation of uncertainty on the group manifold. Along the way, we mention several other groups, including the special orthogonal group, the special Euclidean group, and the group of extended poses, all of which are proper subgroups of the Galilean group. We describe the role of time in Galilean relativity and touch on the relationship between temporal and spatial uncertainty.

Recovery Policies for Safe Exploration of Lunar Permanently Shadowed Regions by a Solar-Powered Rover

O. Lamarre, S. Malhotra, and J. Kelly

Acta Astronautica, vol. 213, pp. 706-724, 2023.

DOI | Bibtex | Abstract | arXiv | Site

@article{2023_Lamarre_Recovery,
  abstract = {The success of a multi-kilometre drive by a solar-powered rover at the lunar south pole depends upon careful planning in space and time due to highly dynamic solar illumination conditions. An additional challenge is that the rover may be subject to random faults that can temporarily delay long-range traverses. The majority of existing global spatiotemporal planners assume a deterministic rover-environment model and do not account for random faults. In this paper, we consider a random fault profile with a known, average spatial fault rate. We introduce a methodology to compute recovery policies that maximize the probability of survival of a solar-powered rover from different start states. A recovery policy defines a set of recourse actions to reach a safe location with sufficient battery energy remaining, given the local solar illumination conditions. We solve a stochastic reach-avoid problem using dynamic programming to find an optimal recovery policy. Our focus, in part, is on the implications of state space discretization, which is required in practical implementations. We propose a modified dynamic programming algorithm that conservatively accounts for approximation errors. To demonstrate the benefits of our approach, we compare against existing methods in scenarios where a solar-powered rover seeks to safely exit from permanently shadowed regions in the Cabeus area at the lunar south pole. We also highlight the relevance of our methodology for mission formulation and trade safety analysis by comparing different rover mobility models in simulated recovery drives from the LCROSS impact region.},
  author = {Olivier Lamarre and Shantanu Malhotra and Jonathan Kelly},
  doi = {10.1016/j.actaastro.2023.09.028},
  journal = {Acta Astronautica},
  month = {December},
  pages = {706--724},
  site = {https://papers.starslab.ca/recovery-policies-psr-exploration/},
  title = {Recovery Policies for Safe Exploration of Lunar Permanently Shadowed Regions by a Solar-Powered Rover},
  url = {https://arxiv.org/abs/2307.16786},
  volume = {213},
  year = {2023}
}

The success of a multi-kilometre drive by a solar-powered rover at the lunar south pole depends upon careful planning in space and time due to highly dynamic solar illumination conditions. An additional challenge is that the rover may be subject to random faults that can temporarily delay long-range traverses. The majority of existing global spatiotemporal planners assume a deterministic rover-environment model and do not account for random faults. In this paper, we consider a random fault profile with a known, average spatial fault rate. We introduce a methodology to compute recovery policies that maximize the probability of survival of a solar-powered rover from different start states. A recovery policy defines a set of recourse actions to reach a safe location with sufficient battery energy remaining, given the local solar illumination conditions. We solve a stochastic reach-avoid problem using dynamic programming to find an optimal recovery policy. Our focus, in part, is on the implications of state space discretization, which is required in practical implementations. We propose a modified dynamic programming algorithm that conservatively accounts for approximation errors. To demonstrate the benefits of our approach, we compare against existing methods in scenarios where a solar-powered rover seeks to safely exit from permanently shadowed regions in the Cabeus area at the lunar south pole. We also highlight the relevance of our methodology for mission formulation and trade safety analysis by comparing different rover mobility models in simulated recovery drives from the LCROSS impact region.

Euclidean Equivariant Models for Generative Graphical Inverse Kinematics

O. Limoyo, F. Mari’c, M. Giamou, P. Alexson, I. Petrovi’c, and J. Kelly

Proceedings of the Robotics: Science and Systems (RSS) Workshop on Symmetries in Robot Learning, Daegu, Republic of Korea, Jul. 10, 2023.

Bibtex | Abstract | arXiv

@inproceedings{2023_Limoyo_Equivariant,
  abstract = {Quickly and reliably finding accurate inverse kinematics (IK) solutions remains a challenging problem for robotic manipulation. Existing numerical solvers typically produce a single solution only and rely on local search techniques to minimize a highly nonconvex objective function. Recently, learning-based approaches that approximate the entire feasible set of solutions have shown promise as a means to generate multiple fast and accurate IK results in parallel. However, existing learning-based techniques have a significant drawback: each robot of interest requires a specialized model that must be trained from scratch. To address this shortcoming, we investigate a novel distance- geometric robot representation coupled with a graph structure that allows us to leverage the flexibility of graph neural networks (GNNs). We use this approach to train a generative graphical inverse kinematics solver (GGIK) that is able to produce a large number of diverse solutions in parallel while also generalizing well - a single learned model can be used to produce IK solutions for a variety of different robots. The graphical formulation elegantly exposes the symmetry and Euclidean equivariance of the IK problem that stems from the spatial nature of robot manipulators. We exploit this symmetry by encoding it into the architecture of our learned model, yielding a flexible solver that is able to produce sets of IK solutions for multiple robots.},
  address = {Daegu, Republic of Korea},
  author = {Oliver Limoyo and Filip Mari\'{c} and Matthew Giamou and Petra Alexson and Ivan Petrovi\'{c} and Jonathan Kelly},
  booktitle = {Proceedings of the Robotics: Science and Systems {(RSS)} Workshop on Symmetries in Robot Learning},
  date = {2023-07-10},
  month = {Jul. 10},
  title = {Euclidean Equivariant Models for Generative Graphical Inverse Kinematics},
  url = {https://arxiv.org/abs/2307.01902},
  year = {2023}
}

Quickly and reliably finding accurate inverse kinematics (IK) solutions remains a challenging problem for robotic manipulation. Existing numerical solvers typically produce a single solution only and rely on local search techniques to minimize a highly nonconvex objective function. Recently, learning-based approaches that approximate the entire feasible set of solutions have shown promise as a means to generate multiple fast and accurate IK results in parallel. However, existing learning-based techniques have a significant drawback: each robot of interest requires a specialized model that must be trained from scratch. To address this shortcoming, we investigate a novel distance- geometric robot representation coupled with a graph structure that allows us to leverage the flexibility of graph neural networks (GNNs). We use this approach to train a generative graphical inverse kinematics solver (GGIK) that is able to produce a large number of diverse solutions in parallel while also generalizing well - a single learned model can be used to produce IK solutions for a variety of different robots. The graphical formulation elegantly exposes the symmetry and Euclidean equivariance of the IK problem that stems from the spatial nature of robot manipulators. We exploit this symmetry by encoding it into the architecture of our learned model, yielding a flexible solver that is able to produce sets of IK solutions for multiple robots.

Learning Sequential Latent Variable Models from Multimodal Time Series Data

O. Limoyo, T. Ablett, and J. Kelly

Intelligent Autonomous Systems 17, Cham, , 2023, pp. 511-528.

DOI | Bibtex | Abstract | arXiv

@inproceedings{2023_Limoyo_Learning,
  abstract = {Sequential modelling of high-dimensional data is an important problem that appears in many domains including model-based reinforcement learning and dynamics identification for control. Latent variable models applied to sequential data (i.e., latent dynamics models) have been shown to be a particularly effective probabilistic approach to solve this problem, especially when dealing with images. However, in many application areas (e.g., robotics), information from multiple sensing modalities is available---existing latent dynamics methods have not yet been extended to effectively make use of such multimodal sequential data. Multimodal sensor streams can be correlated in a useful manner and often contain complementary information across modalities. In this work, we present a self-supervised generative modelling framework to jointly learn a probabilistic latent state representation of multimodal data and the respective dynamics. Using synthetic and real-world datasets from a multimodal robotic planar pushing task, we demonstrate that our approach leads to significant improvements in prediction and representation quality.Furthermore, we compare to the common learning baseline of concatenating each modality in the latent space and show that our principled probabilistic formulation performs better. Finally, despite being fully self-supervised, we demonstrate that our method is nearly as effective as an existing supervised approach that relies on ground truth labels.},
  address = {Cham},
  author = {Oliver Limoyo and Trevor Ablett and Jonathan Kelly},
  booktitle = {Intelligent Autonomous Systems 17},
  doi = {10.1007/978-3-031-22216-0_35},
  editor = {Ivan Petrovi\'{c} and Emanuele Menegatti and Ivan Markovi\'{c}},
  isbn = {978-3-031-22216-0},
  note = {Best Paper Finalist},
  pages = {511--528},
  publisher = {Springer Nature Switzerland},
  series = {Lecture Notes in Networks and Systems},
  title = {Learning Sequential Latent Variable Models from Multimodal Time Series Data},
  url = {https://arxiv.org/abs/2204.10419},
  volume = {577},
  year = {2023}
}

Sequential modelling of high-dimensional data is an important problem that appears in many domains including model-based reinforcement learning and dynamics identification for control. Latent variable models applied to sequential data (i.e., latent dynamics models) have been shown to be a particularly effective probabilistic approach to solve this problem, especially when dealing with images. However, in many application areas (e.g., robotics), information from multiple sensing modalities is available---existing latent dynamics methods have not yet been extended to effectively make use of such multimodal sequential data. Multimodal sensor streams can be correlated in a useful manner and often contain complementary information across modalities. In this work, we present a self-supervised generative modelling framework to jointly learn a probabilistic latent state representation of multimodal data and the respective dynamics. Using synthetic and real-world datasets from a multimodal robotic planar pushing task, we demonstrate that our approach leads to significant improvements in prediction and representation quality.Furthermore, we compare to the common learning baseline of concatenating each modality in the latent space and show that our principled probabilistic formulation performs better. Finally, despite being fully self-supervised, we demonstrate that our method is nearly as effective as an existing supervised approach that relies on ground truth labels.

Best Paper Finalist

A Geometric Approach for Generating Feasible Configurations of Robotic Manipulators

F. Mari’c

PhD Thesis , University of Toronto, Toronto, Ontario, Canada, 2023.

Bibtex | Abstract | PDF

@phdthesis{2023_Maric_Geometric,
  abstract = {Most robotic manipulators, and especially those designed with autonomous operation in mind, consist of a series of joints that rotate about a single axis, also known as revolute joints. These mechanisms give robotic manipulators the degrees of freedom and versatility similar to that of the human arm, which they are designed to outperform. However, this results in a geometry of motion or kinematics that makes all aspects of robotic manipulation challenging from a computational perspective. A major part of this challenge lies in the fact that computing joint configurations adhering to a specific set of constraints (i.e., gripper pose) is a non-trivial problem. The procedure of finding feasible joint configurations and the mathematical problem associated with it are known as inverse kinematics --- a core part of motion planning, trajectory optimization, calibration and other important challenges in successfully performing robotic manipulation. In recent years, the overall decrease of computation time required to perceive and process environmental and proprioceptive information has helped realize the potential of robotic manipulation in dynamic environments. Concurrently, a new standard in manipulator design has emerged, where additional degrees of freedom are added in order to increase their overall dexterity and capacity for motion. These two developments have vastly increased the requirements for inverse kinematics algorithms, which are now expected to deal with infinite solution spaces and difficult, nonlinear constraints. On the other hand, the addition of degrees of freedom in recent robot designs has enabled algorithms to search for locally optimal configurations with respect to some performance criteria in an infinitely large solution space. This property has motivated approaches that leverage non-Euclidean geometries to re- place conventional constraints and optimization criteria, thereby overcoming computational bottlenecks and common failure modes. The contributions presented in this thesis propose three such approaches, that aim to develop new ways of looking at the problems associated with inverse kinematics through the use of geometric representations that are not widely utilized in robotic manipulation.},
  address = {Toronto, Ontario, Canada},
  author = {Filip Mari\'{c}},
  institution = {University of Toronto},
  month = {January},
  school = {University of Toronto},
  title = {A Geometric Approach for Generating Feasible Configurations of Robotic Manipulators},
  year = {2023}
}

Most robotic manipulators, and especially those designed with autonomous operation in mind, consist of a series of joints that rotate about a single axis, also known as revolute joints. These mechanisms give robotic manipulators the degrees of freedom and versatility similar to that of the human arm, which they are designed to outperform. However, this results in a geometry of motion or kinematics that makes all aspects of robotic manipulation challenging from a computational perspective. A major part of this challenge lies in the fact that computing joint configurations adhering to a specific set of constraints (i.e., gripper pose) is a non-trivial problem. The procedure of finding feasible joint configurations and the mathematical problem associated with it are known as inverse kinematics --- a core part of motion planning, trajectory optimization, calibration and other important challenges in successfully performing robotic manipulation. In recent years, the overall decrease of computation time required to perceive and process environmental and proprioceptive information has helped realize the potential of robotic manipulation in dynamic environments. Concurrently, a new standard in manipulator design has emerged, where additional degrees of freedom are added in order to increase their overall dexterity and capacity for motion. These two developments have vastly increased the requirements for inverse kinematics algorithms, which are now expected to deal with infinite solution spaces and difficult, nonlinear constraints. On the other hand, the addition of degrees of freedom in recent robot designs has enabled algorithms to search for locally optimal configurations with respect to some performance criteria in an infinitely large solution space. This property has motivated approaches that leverage non-Euclidean geometries to re- place conventional constraints and optimization criteria, thereby overcoming computational bottlenecks and common failure modes. The contributions presented in this thesis propose three such approaches, that aim to develop new ways of looking at the problems associated with inverse kinematics through the use of geometric representations that are not widely utilized in robotic manipulation.

Reference-guided Controllable Inpainting of Neural Radiance Fields

A. Mirzaei, T. Aumentado-Armstrong, M. A. Brubaker, J. Kelly, A. Levinshtein, K. G. Derpanis, and I. Gilitschenski

Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, Oct. 2–6, 2023, pp. 17769-17779.

DOI | Bibtex | Abstract | arXiv | Site

@inproceedings{2023_Mirzaei_Reference-guided,
  abstract = {The popularity of Neural Radiance Fields (NeRFs) for view synthesis has led to a desire for NeRF editing tools. Here, we focus on inpainting regions in a view-consistent and controllable manner. In addition to the typical NeRF inputs and masks delineating the unwanted region in each view, we require only a single inpainted view of the scene, i.e., a reference view. We use monocular depth estimators to back-project the inpainted view to the correct 3D positions. Then, via a novel rendering technique, a bilateral solver can construct view-dependent effects in non-reference views, making the inpainted region appear consistent from any view. For non-reference disoccluded regions, which cannot be supervised by the single reference view, we devise a method based on image inpainters to guide both the geometry and appearance. Our approach shows superior performance to NeRF inpainting baselines, with the additional advantage that a user can control the generated scene via a single inpainted image.},
  address = {Paris, France},
  author = {Ashkan Mirzaei and Tristan Aumentado-Armstrong and Marcus A. Brubaker and Jonathan Kelly and Alex Levinshtein and Konstantinos G. Derpanis and Igor Gilitschenski},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision {(ICCV)}},
  date = {2023-10-02/2023-10-06},
  doi = {10.1109/ICCV51070.2023.01633},
  month = {Oct. 2--6},
  pages = {17769--17779},
  site = {https://ashmrz.github.io/reference-guided-3d/},
  title = {Reference-guided Controllable Inpainting of Neural Radiance Fields},
  url = {https://arxiv.org/abs/2304.09677},
  year = {2023}
}

The popularity of Neural Radiance Fields (NeRFs) for view synthesis has led to a desire for NeRF editing tools. Here, we focus on inpainting regions in a view-consistent and controllable manner. In addition to the typical NeRF inputs and masks delineating the unwanted region in each view, we require only a single inpainted view of the scene, i.e., a reference view. We use monocular depth estimators to back-project the inpainted view to the correct 3D positions. Then, via a novel rendering technique, a bilateral solver can construct view-dependent effects in non-reference views, making the inpainted region appear consistent from any view. For non-reference disoccluded regions, which cannot be supervised by the single reference view, we devise a method based on image inpainters to guide both the geometry and appearance. Our approach shows superior performance to NeRF inpainting baselines, with the additional advantage that a user can control the generated scene via a single inpainted image.

SPIn-NeRF: Multiview Segmentation and Perceptual Inpainting with Neural Radiance Fields

A. Mirzaei, T. Aumentado-Armstrong, K. G. Derpanis, J. Kelly, M. A. Brubaker, I. Gilitschenski, and A. Levinshtein

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, British Columbia, Canada, Jun. 18–22, 2023, pp. 20669-20679.

DOI | Bibtex | Abstract | arXiv | Site

@inproceedings{2023_Mirzaei_SPIn-NeRF,
  abstract = {Neural Radiance Fields (NeRFs) have emerged as a popular approach for novel view synthesis. While NeRFs are quickly being adapted for a wider set of applications, intuitively editing NeRF scenes is still an open challenge. One important editing task is the removal of unwanted objects from a 3D scene, such that the replaced region is visually plausible and consistent with its context. We refer to this task as 3D inpainting. In 3D, solutions must be both consistent across multiple views and geometrically valid. In this paper, we propose a novel 3D inpainting method that addresses these challenges. Given a small set of posed images and sparse annotations in a single input image, our framework first rapidly obtains a 3D segmentation mask for a target object. Using the mask, a perceptual optimization-based approach is then introduced that leverages learned 2D image inpainters, distilling their information into 3D space, while ensuring view consistency. We also address the lack of a diverse benchmark for evaluating 3D scene inpainting methods by introducing a dataset comprised of challenging real-world scenes. In particular, our dataset contains views of the same scene with and without a target object, enabling more principled benchmarking of the 3D inpainting task. We first demonstrate the superiority of our approach on multiview segmentation, comparing to NeRF-based methods and 2D segmentation approaches. We then evaluate on the task of 3D inpainting, establishing state-of-the-art performance against other NeRF manipulation algorithms, as well as a strong 2D image inpainter baseline.},
  address = {Vancouver, British Columbia, Canada},
  author = {Ashkan Mirzaei and Tristan Aumentado-Armstrong and Konstantinos G. Derpanis and Jonathan Kelly and Marcus A. Brubaker and Igor Gilitschenski and Alex Levinshtein},
  booktitle = {Proceedings of the {IEEE/CVF} Conference on Computer Vision and Pattern Recognition {(CVPR)}},
  date = {2023-06-18/2023-06-22},
  doi = {10.1109/CVPR52729.2023.01980},
  month = {Jun. 18--22},
  pages = {20669--20679},
  site = {https://spinnerf3d.github.io/},
  title = {{SPIn-NeRF}: Multiview Segmentation and Perceptual Inpainting with Neural Radiance Fields},
  url = {https://arxiv.org/abs/2211.12254},
  year = {2023}
}

Neural Radiance Fields (NeRFs) have emerged as a popular approach for novel view synthesis. While NeRFs are quickly being adapted for a wider set of applications, intuitively editing NeRF scenes is still an open challenge. One important editing task is the removal of unwanted objects from a 3D scene, such that the replaced region is visually plausible and consistent with its context. We refer to this task as 3D inpainting. In 3D, solutions must be both consistent across multiple views and geometrically valid. In this paper, we propose a novel 3D inpainting method that addresses these challenges. Given a small set of posed images and sparse annotations in a single input image, our framework first rapidly obtains a 3D segmentation mask for a target object. Using the mask, a perceptual optimization-based approach is then introduced that leverages learned 2D image inpainters, distilling their information into 3D space, while ensuring view consistency. We also address the lack of a diverse benchmark for evaluating 3D scene inpainting methods by introducing a dataset comprised of challenging real-world scenes. In particular, our dataset contains views of the same scene with and without a target object, enabling more principled benchmarking of the 3D inpainting task. We first demonstrate the superiority of our approach on multiview segmentation, comparing to NeRF-based methods and 2D segmentation approaches. We then evaluate on the task of 3D inpainting, establishing state-of-the-art performance against other NeRF manipulation algorithms, as well as a strong 2D image inpainter baseline.

The Sum of Its Parts: Visual Part Segmentation for Inertial Parameter Identification of Manipulated Objects

P. Nadeau, M. Giamou, and J. Kelly

Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), London, United Kingdom, May 29–Jun. 2, 2023, pp. 3779-3785.

@inproceedings{2023_Nadeau_Sum,
  abstract = {To operate safely and efficiently alongside human workers, collaborative robots (cobots) require the ability to quickly understand the dynamics of manipulated objects. How- ever, traditional methods for estimating the inertial parameters of novel objects rely on motions that are necessarily fast and unsafe (to achieve a sufficient signal-to-noise ratio). In this work, we follow an alternative approach: by combining visual and force-torque measurements, we develop an inertial parameter identification algorithm that requires slow or ``stop- and-go'' motions only, and hence is ideally tailored for use around humans. Our technique, called Homogeneous Part Segmentation (HPS), leverages the observation that man-made objects are typically composed of distinct, homogeneous parts. We combine a surface-based point clustering method with a volumetric shape segmentation algorithm to quickly produce a part-level segmentation of a manipulated object; the segmented representation is then used by HPS to accurately estimate the object's inertial parameters. To benchmark our algorithm, we create and utilize a novel dataset consisting of realistic meshes, segmented point clouds, and inertial parameters for 20 common workshop tools. Finally, we demonstrate the real- world performance and accuracy of HPS by performing an intricate `hammer balancing act' autonomously and online with a low-cost collaborative robotic arm. Our code and dataset are open source and freely available.},
  address = {London, United Kingdom},
  author = {Philippe Nadeau and Matthew Giamou and Jonathan Kelly},
  booktitle = {Proceedings of the {IEEE} International Conference on Robotics and Automation {(ICRA)}},
  code = {https://github.com/utiasSTARS/inertial-identification-with-part-segmentation},
  date = {2023-05-29/2023-06-02},
  doi = {10.1109/ICRA48891.2023.10160394},
  month = {May 29--Jun. 2},
  pages = {3779--3785},
  site = {https://papers.starslab.ca/part-segmentation-for-inertial-identification/},
  title = {The Sum of Its Parts: Visual Part Segmentation for Inertial Parameter Identification of Manipulated Objects},
  url = {https://arxiv.org/abs/2302.06685},
  year = {2023}
}

To operate safely and efficiently alongside human workers, collaborative robots (cobots) require the ability to quickly understand the dynamics of manipulated objects. How- ever, traditional methods for estimating the inertial parameters of novel objects rely on motions that are necessarily fast and unsafe (to achieve a sufficient signal-to-noise ratio). In this work, we follow an alternative approach: by combining visual and force-torque measurements, we develop an inertial parameter identification algorithm that requires slow or ``stop- and-go'' motions only, and hence is ideally tailored for use around humans. Our technique, called Homogeneous Part Segmentation (HPS), leverages the observation that man-made objects are typically composed of distinct, homogeneous parts. We combine a surface-based point clustering method with a volumetric shape segmentation algorithm to quickly produce a part-level segmentation of a manipulated object; the segmented representation is then used by HPS to accurately estimate the object's inertial parameters. To benchmark our algorithm, we create and utilize a novel dataset consisting of realistic meshes, segmented point clouds, and inertial parameters for 20 common workshop tools. Finally, we demonstrate the real- world performance and accuracy of HPS by performing an intricate `hammer balancing act' autonomously and online with a low-cost collaborative robotic arm. Our code and dataset are open source and freely available.

Cross-Modal Data Fusion for 3D Instance Segmentation of Indoor Scenes

E. G. Ng

Master Thesis , University of Toronto, Toronto, Ontario, Canada, 2023.

Bibtex | Abstract | PDF

@mastersthesis{2023_Ng_Cross-Modal,
  abstract = {Tremendous strides have been made in image-based scene understanding over the past decade, thanks to larger datasets and enhanced model capacity. However, 3D-based understanding still struggles, in part because 3D data are so costly to annotate. Top 3D instance segmentation models, trained exclusively on 3D data, outperform models using both 2D and 3D data, suggesting untapped potential in merging 2D data to enrich 3D pipelines. Interestingly, while instance segmentation has not yet benefited from 2D-3D data fusion, the sequential fusion of outputs from 2D and 3D models that are trained separately does improve object detection. This thesis applies sequential fusion to instance segmentation and investigates what and where to fuse. We demonstrate that current 2D models do not perform well enough compared to 3D models to enhance instance segmentation results, but that future, higher-performing 2D models should show performance gains using the sequential fusion method.},
  address = {Toronto, Ontario, Canada},
  author = {Edwin G. Ng},
  month = {September},
  school = {University of Toronto},
  title = {Cross-Modal Data Fusion for 3D Instance Segmentation of Indoor Scenes},
  year = {2023}
}

Tremendous strides have been made in image-based scene understanding over the past decade, thanks to larger datasets and enhanced model capacity. However, 3D-based understanding still struggles, in part because 3D data are so costly to annotate. Top 3D instance segmentation models, trained exclusively on 3D data, outperform models using both 2D and 3D data, suggesting untapped potential in merging 2D data to enrich 3D pipelines. Interestingly, while instance segmentation has not yet benefited from 2D-3D data fusion, the sequential fusion of outputs from 2D and 3D models that are trained separately does improve object detection. This thesis applies sequential fusion to instance segmentation and investigates what and where to fuse. We demonstrate that current 2D models do not perform well enough compared to 3D models to enhance instance segmentation results, but that future, higher-performing 2D models should show performance gains using the sequential fusion method.

Spatiotemporal Calibration of 3D Millimetre-Wavelength Radar-Camera Pairs

E. Wise, Q. Cheng, and J. Kelly

IEEE Transactions on Robotics, vol. 39, iss. 6, pp. 4552-4566, 2023.

DOI | Bibtex | Abstract | arXiv

@article{2023_Wise_Spatiotemporal,
  abstract = {Autonomous vehicles (AVs) fuse data from multiple sensors and sensing modalities to impart a measure of robustness when operating in adverse conditions. Radars and cameras are popular choices for use in sensor fusion; although radar measurements are sparse in comparison to camera images, radar scans penetrate fog, rain, and snow. However, accurate sensor fusion depends upon knowledge of the spatial transform between the sensors and any temporal misalignment that exists in their measurement times. During the life cycle of an AV, these calibration parameters may change, so the ability to perform in-situ spatiotemporal calibration is essential to ensure reliable long-term operation. State-of-the-art 3D radar-camera spatiotemporal calibration algorithms require bespoke calibration targets that are not readily available in the field. In this paper, we describe an algorithm for targetless spatiotemporal calibration that does not require specialized infrastructure. Our approach leverages the ability of the radar unit to measure its own ego-velocity relative to a fixed, external reference frame. We analyze the identifiability of the spatiotemporal calibration problem and determine the motions necessary for calibration. Through a series of simulation studies, we characterize the sensitivity of our algorithm to measurement noise. Finally, we demonstrate accurate calibration for three real-world systems, including a handheld sensor rig and a vehicle-mounted sensor array. Our results show that we are able to match the performance of an existing, target-based method, while calibrating in arbitrary, infrastructure-free environments.},
  author = {Emmett Wise and Qilong Cheng and Jonathan Kelly},
  doi = {10.1109/TRO.2023.3311680},
  journal = {{IEEE} Transactions on Robotics},
  month = {December},
  number = {6},
  pages = {4552--4566},
  title = {Spatiotemporal Calibration of {3D} Millimetre-Wavelength Radar-Camera Pairs},
  url = {https://arxiv.org/abs/2211.01871},
  volume = {39},
  year = {2023}
}

Autonomous vehicles (AVs) fuse data from multiple sensors and sensing modalities to impart a measure of robustness when operating in adverse conditions. Radars and cameras are popular choices for use in sensor fusion; although radar measurements are sparse in comparison to camera images, radar scans penetrate fog, rain, and snow. However, accurate sensor fusion depends upon knowledge of the spatial transform between the sensors and any temporal misalignment that exists in their measurement times. During the life cycle of an AV, these calibration parameters may change, so the ability to perform in-situ spatiotemporal calibration is essential to ensure reliable long-term operation. State-of-the-art 3D radar-camera spatiotemporal calibration algorithms require bespoke calibration targets that are not readily available in the field. In this paper, we describe an algorithm for targetless spatiotemporal calibration that does not require specialized infrastructure. Our approach leverages the ability of the radar unit to measure its own ego-velocity relative to a fixed, external reference frame. We analyze the identifiability of the spatiotemporal calibration problem and determine the motions necessary for calibration. Through a series of simulation studies, we characterize the sensitivity of our algorithm to measurement noise. Finally, we demonstrate accurate calibration for three real-world systems, including a handheld sensor rig and a vehicle-mounted sensor array. Our results show that we are able to match the performance of an existing, target-based method, while calibrating in arbitrary, infrastructure-free environments.

aUToLights: A Robust Multi-Camera Traffic Light Detection and Tracking System

S. Wu, N. Amenta, J. Zhou, S. Papais, and J. Kelly

Proceedings of the Conference on Robots and Vision (CRV), Montréal, Québec, Canada, Jun. 6–8, 2023, pp. 89-96.

DOI | Bibtex | Abstract | arXiv

@inproceedings{2023_Wu_aUToLights,
  abstract = {Following four successful years in the SAE AutoDrive Challenge Series I, the University of Toronto is participating in the Series II competition to develop a Level 4 autonomous passenger vehicle capable of handling various urban driving scenarios by 2025. Accurate detection of traffic lights and correct identification of their states is essential for safe autonomous operation in cities. Herein, we describe our recently-redesigned traffic light perception system for autonomous vehicles like the University of Toronto's self- driving car, Artemis. Similar to most traffic light perception systems, we rely primarily on camera-based object detectors. We deploy the YOLOv5 detector for bounding box regression and traffic light classification across multiple cameras and fuse the observations. To improve robustness, we incorporate priors from high-definition (HD) semantic maps and perform state filtering using hidden Markov models (HMMs). We demonstrate a multi-camera, real time-capable traffic light perception pipeline that handles complex situations including multiple visible intersections, traffic light variations, temporary occlusion, and flashing light states. To validate our system, we collected and annotated a varied dataset incorporating flashing states and a range of occlusion types. Our results show superior performance in challenging real-world scenarios compared to single-frame, single-camera object detection.},
  address = {Montr\'{e}al, Qu\'{e}bec, Canada},
  author = {Sean Wu and Nicole Amenta and Jiachen Zhou and Sandro Papais and Jonathan Kelly},
  booktitle = {Proceedings of the Conference on Robots and Vision {(CRV)}},
  date = {2023-06-06/2023-06-08},
  doi = {10.1109/CRV60082.2023.00019},
  month = {Jun. 6--8},
  pages = {89--96},
  title = {{aUToLights}: A Robust Multi-Camera Traffic Light Detection and Tracking System},
  url = {http://arxiv.org/abs/2305.08673},
  year = {2023}
}

Following four successful years in the SAE AutoDrive Challenge Series I, the University of Toronto is participating in the Series II competition to develop a Level 4 autonomous passenger vehicle capable of handling various urban driving scenarios by 2025. Accurate detection of traffic lights and correct identification of their states is essential for safe autonomous operation in cities. Herein, we describe our recently-redesigned traffic light perception system for autonomous vehicles like the University of Toronto's self- driving car, Artemis. Similar to most traffic light perception systems, we rely primarily on camera-based object detectors. We deploy the YOLOv5 detector for bounding box regression and traffic light classification across multiple cameras and fuse the observations. To improve robustness, we incorporate priors from high-definition (HD) semantic maps and perform state filtering using hidden Markov models (HMMs). We demonstrate a multi-camera, real time-capable traffic light perception pipeline that handles complex situations including multiple visible intersections, traffic light variations, temporary occlusion, and flashing light states. To validate our system, we collected and annotated a varied dataset incorporating flashing states and a range of occlusion types. Our results show superior performance in challenging real-world scenarios compared to single-frame, single-camera object detection.

CIDGIKc: Distance-Geometric Inverse Kinematics for Continuum Robots

H. J. Zhang, M. Giamou, F. Mari’c, J. Kelly, and J. Burgner-Kahrs

IEEE Robotics and Automation Letters, vol. 8, iss. 11, pp. 7679-7686, 2023.

DOI | Bibtex | Abstract | arXiv

@article{2023_Zhang_CIDGIKc,
  abstract = {The small size, high dexterity, and intrinsic compliance of continuum robots (CRs) make them well suited for constrained environments. Solving the inverse kinematics (IK), that is finding robot joint configurations that satisfy desired position or pose queries, is a fundamental challenge in motion planning, control, and calibration for any robot structure. For CRs, the need to avoid obstacles in tightly confined workspaces greatly complicates the search for feasible IK solutions. Without an accurate initialization or multiple re-starts, existing algorithms often fail to find a solution. We present CIDGIKc (Convex Iteration for Distance-Geometric Inverse Kinematics for Continuum Robots), an algorithm that solves these nonconvex feasibility problems with a sequence of semidefinite programs whose objectives are designed to encourage low-rank minimizers. CIDGIKc is enabled by a novel distance-geometric parameterization of constant curvature segment geometry for CRs with extensible segments. The resulting IK formulation involves only quadratic expressions and can efficiently incorporate a large number of collision avoidance constraints. Our experimental results demonstrate >98\% solve success rates within complex, highly cluttered environments which existing algorithms cannot account for.},
  author = {Hanna Jiamei Zhang and Matthew Giamou and Filip Mari\'{c} and Jonathan Kelly and Jessica Burgner-Kahrs},
  doi = {10.1109/LRA.2023.3322078},
  journal = {{IEEE} Robotics and Automation Letters},
  month = {November},
  number = {11},
  pages = {7679--7686},
  title = {CIDGIKc: Distance-Geometric Inverse Kinematics for Continuum Robots},
  url = {https://arxiv.org/abs/2306.13617},
  volume = {8},
  year = {2023}
}

The small size, high dexterity, and intrinsic compliance of continuum robots (CRs) make them well suited for constrained environments. Solving the inverse kinematics (IK), that is finding robot joint configurations that satisfy desired position or pose queries, is a fundamental challenge in motion planning, control, and calibration for any robot structure. For CRs, the need to avoid obstacles in tightly confined workspaces greatly complicates the search for feasible IK solutions. Without an accurate initialization or multiple re-starts, existing algorithms often fail to find a solution. We present CIDGIKc (Convex Iteration for Distance-Geometric Inverse Kinematics for Continuum Robots), an algorithm that solves these nonconvex feasibility problems with a sequence of semidefinite programs whose objectives are designed to encourage low-rank minimizers. CIDGIKc is enabled by a novel distance-geometric parameterization of constant curvature segment geometry for CRs with extensible segments. The resulting IK formulation involves only quadratic expressions and can efficiently incorporate a large number of collision avoidance constraints. Our experimental results demonstrate >98\% solve success rates within complex, highly cluttered environments which existing algorithms cannot account for.

2022

Convex Iteration for Distance-Geometric Inverse Kinematics

M. Giamou, F. Mari’c, D. M. Rosen, V. Peretroukhin, N. Roy, I. Petrovi’c, and J. Kelly

IEEE Robotics and Automation Letters, vol. 7, iss. 2, pp. 1952-1959, 2022.

@article{2022_Giamou_Convex,
  abstract = {Inverse kinematics (IK) is the problem of finding robot joint configurations that satisfy constraints on the position or pose of one or more end-effectors. For robots with redundant degrees of freedom, there is often an infinite, nonconvex set of solutions. The IK problem is further complicated when collision avoidance constraints are imposed by obstacles in the workspace. In general, closed-form expressions yielding feasible configurations do not exist, motivating the use of numerical solution methods. However, these approaches rely on local optimization of nonconvex problems, often requiring an accurate initialization or numerous re-initializations to converge to a valid solution. In this work, we first formulate inverse kinematics with complex workspace constraints as a convex feasibility problem whose low-rank feasible points provide exact IK solutions. We then present CIDGIK (Convex Iteration for Distance-Geometric Inverse Kinematics), an algorithm that solves this feasibility problem with a sequence of semidefinite programs whose objectives are designed to encourage low-rank minimizers. Our problem formulation elegantly unifies the configuration space and workspace constraints of a robot: intrinsic robot geometry and obstacle avoidance are both expressed as simple linear matrix equations and inequalities. Our experimental results for a variety of popular manipulator models demonstrate faster and more accurate convergence than a conventional nonlinear optimization-based approach, especially in environments with many obstacles.},
  author = {Matthew Giamou and Filip Mari\'{c} and David M. Rosen and Valentin Peretroukhin and Nicholas Roy and Ivan Petrovi\'{c} and Jonathan Kelly},
  code = {https://github.com/utiasSTARS/graphIK},
  doi = {10.1109/LRA.2022.3141763},
  journal = {{IEEE} Robotics and Automation Letters},
  month = {April},
  number = {2},
  pages = {1952--1959},
  title = {Convex Iteration for Distance-Geometric Inverse Kinematics},
  url = {https://arxiv.org/abs/2109.03374},
  video1 = {https://www.youtube.com/watch?v=kja7H8zQ0H8},
  volume = {7},
  year = {2022}
}

Inverse kinematics (IK) is the problem of finding robot joint configurations that satisfy constraints on the position or pose of one or more end-effectors. For robots with redundant degrees of freedom, there is often an infinite, nonconvex set of solutions. The IK problem is further complicated when collision avoidance constraints are imposed by obstacles in the workspace. In general, closed-form expressions yielding feasible configurations do not exist, motivating the use of numerical solution methods. However, these approaches rely on local optimization of nonconvex problems, often requiring an accurate initialization or numerous re-initializations to converge to a valid solution. In this work, we first formulate inverse kinematics with complex workspace constraints as a convex feasibility problem whose low-rank feasible points provide exact IK solutions. We then present CIDGIK (Convex Iteration for Distance-Geometric Inverse Kinematics), an algorithm that solves this feasibility problem with a sequence of semidefinite programs whose objectives are designed to encourage low-rank minimizers. Our problem formulation elegantly unifies the configuration space and workspace constraints of a robot: intrinsic robot geometry and obstacle avoidance are both expressed as simple linear matrix equations and inequalities. Our experimental results for a variety of popular manipulator models demonstrate faster and more accurate convergence than a conventional nonlinear optimization-based approach, especially in environments with many obstacles.

Semidefinite Relaxations for Geometric Problems in Robotics

M. Giamou

PhD Thesis , University of Toronto, Toronto, Ontario, Canada, 2022.

Bibtex | Abstract | PDF

@phdthesis{2022_Giamou_Semidefinite,
  abstract = {Mobile robots perceive and move through the three-dimensional space of the approximately Euclidean world we share with them. In order to safely and accurately accomplish their goals, they must be able to reason about the nonlinear and nonconvex geometry of the manifold of rotations. Without this ability, tracking the poses of objects from noisy measurements and avoiding obstacles in their environment becomes impossible. Traditional approaches use local information and structure to estimate and optimize rotations of interest, making them susceptible to suboptimal performance. In this dissertation, we apply recent advancements in global convex optimization to two fundamental geometric problems in robotics: extrinsic sensor calibration and inverse kinematics in cluttered workspaces. We begin with a summary and extension of the semidefinite relaxation machinery that we apply to both problems. This machinery is used to develop fast and accurate extrinsic calibration algorithms with novel performance guarantees provided by certificates of global optimality. We proceed to develop a novel perspective of inverse kinematics inspired by noisy state estimation problems, leading to fast and accurate algorithms appropriate for a variety of challenging scenarios. We provide free and open source implementations of our algorithms, and demonstrate their superiority over conventional approaches on a variety of simulated and real-world data.},
  address = {Toronto, Ontario, Canada},
  author = {Matthew Giamou},
  institution = {University of Toronto},
  month = {December},
  school = {University of Toronto},
  title = {Semidefinite Relaxations for Geometric Problems in Robotics},
  year = {2022}
}

Mobile robots perceive and move through the three-dimensional space of the approximately Euclidean world we share with them. In order to safely and accurately accomplish their goals, they must be able to reason about the nonlinear and nonconvex geometry of the manifold of rotations. Without this ability, tracking the poses of objects from noisy measurements and avoiding obstacles in their environment becomes impossible. Traditional approaches use local information and structure to estimate and optimize rotations of interest, making them susceptible to suboptimal performance. In this dissertation, we apply recent advancements in global convex optimization to two fundamental geometric problems in robotics: extrinsic sensor calibration and inverse kinematics in cluttered workspaces. We begin with a summary and extension of the semidefinite relaxation machinery that we apply to both problems. This machinery is used to develop fast and accurate extrinsic calibration algorithms with novel performance guarantees provided by certificates of global optimality. We proceed to develop a novel perspective of inverse kinematics inspired by noisy state estimation problems, leading to fast and accurate algorithms appropriate for a variety of challenging scenarios. We provide free and open source implementations of our algorithms, and demonstrate their superiority over conventional approaches on a variety of simulated and real-world data.

A Study of Observability-Aware Trajectory Optimization

C. Grebe

Master Thesis , University of Toronto, Toronto, Ontario, Canada, 2022.

Bibtex | Abstract

@mastersthesis{2022_Grebe_Study,
  abstract = {Ideally, robots should move in ways that maximize knowledge gained about the state of both their internal system and external operating environment. Recently, observability- based metrics have been proposed to find trajectories that enable rapid and accurate estimation. A system is observable, roughly, if relevant states and parameters can be recovered from measurements over finite time. Degree of observability has been applied as a metric to optimize motion to produce more observable trajectories that yield better estimation accuracy. The viability of methods for observability-aware trajectory optimization are not yet well understood in the literature. In this thesis, we compare two state-of-the-art methods for trajectory optimization and seek to add important theoretical clarifications and valuable discussion about their effectiveness. For evaluation, we examine the representative task of sensor-to-sensor extrinsic self-calibration using a realistic physics simulator. We also study the sensitivity of these algorithms to changes in information content of exteroceptive sensor measurements. },
  address = {Toronto, Ontario, Canada},
  author = {Christopher Grebe},
  month = {January},
  school = {University of Toronto},
  title = {A Study of Observability-Aware Trajectory Optimization},
  year = {2022}
}

Ideally, robots should move in ways that maximize knowledge gained about the state of both their internal system and external operating environment. Recently, observability- based metrics have been proposed to find trajectories that enable rapid and accurate estimation. A system is observable, roughly, if relevant states and parameters can be recovered from measurements over finite time. Degree of observability has been applied as a metric to optimize motion to produce more observable trajectories that yield better estimation accuracy. The viability of methods for observability-aware trajectory optimization are not yet well understood in the literature. In this thesis, we compare two state-of-the-art methods for trajectory optimization and seek to add important theoretical clarifications and valuable discussion about their effectiveness. For evaluation, we examine the representative task of sensor-to-sensor extrinsic self-calibration using a realistic physics simulator. We also study the sensitivity of these algorithms to changes in information content of exteroceptive sensor measurements.

Learning to Detect Slip with Barometric Tactile Sensors and a Temporal Convolutional Neural Network

A. Grover, P. Nadeau, C. Grebe, and J. Kelly

Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Philadelphia, Pennsylvania, USA, May 23–27, 2022, pp. 570-576.

DOI | Bibtex | Abstract | arXiv | Video

@inproceedings{2022_Grover_Learning,
  abstract = {The ability to perceive object slip via tactile feedback enables humans to accomplish complex manipulation tasks including maintaining a stable grasp. Despite the utility of tactile information for many applications, tactile sensors have yet to be widely deployed in industrial robotics settings; part of the challenge lies in identifying slip and other events from the tactile data stream. In this paper, we present a learning-based method to detect slip using barometric tactile sensors. These sensors have many desirable properties including high durability and reliability, and are built from inexpensive, off-the-shelf components. We train a temporal convolution neural network to detect slip, achieving high detection accuracies while displaying robustness to the speed and direction of the slip motion. Further, we test our detector on two manipulation tasks involving a variety of common objects and demonstrate successful generalization to real-world scenarios not seen during training. We argue that barometric tactile sensing technology, combined with data-driven learning, is suitable for many manipulation tasks such as slip compensation.},
  address = {Philadelphia, Pennsylvania, USA},
  author = {Abhinav Grover and Philippe Nadeau and Christopher Grebe and Jonathan Kelly},
  booktitle = {Proceedings of the {IEEE} International Conference on Robotics and Automation {(ICRA)}},
  date = {2022-05-23/2022-05-27},
  doi = {10.1109/ICRA46639.2022.9811592},
  month = {May 23--27},
  pages = {570--576},
  title = {Learning to Detect Slip with Barometric Tactile Sensors and a Temporal Convolutional Neural Network},
  url = {https://arxiv.org/abs/2202.09549},
  video1 = {https://www.youtube.com/watch?v=N9MWBpkIJPM},
  year = {2022}
}

The ability to perceive object slip via tactile feedback enables humans to accomplish complex manipulation tasks including maintaining a stable grasp. Despite the utility of tactile information for many applications, tactile sensors have yet to be widely deployed in industrial robotics settings; part of the challenge lies in identifying slip and other events from the tactile data stream. In this paper, we present a learning-based method to detect slip using barometric tactile sensors. These sensors have many desirable properties including high durability and reliability, and are built from inexpensive, off-the-shelf components. We train a temporal convolution neural network to detect slip, achieving high detection accuracies while displaying robustness to the speed and direction of the slip motion. Further, we test our detector on two manipulation tasks involving a variety of common objects and demonstrate successful generalization to real-world scenarios not seen during training. We argue that barometric tactile sensing technology, combined with data-driven learning, is suitable for many manipulation tasks such as slip compensation.

A Contrastive Learning Framework for Self-Supervised Pre-Training of 3D Point Cloud Networks with Visual Data

A. Janda

Master Thesis , University of Toronto, Toronto, Ontario, Canada, 2022.

Bibtex | Abstract | PDF

@mastersthesis{2022_Janda_Contrastive,
  abstract = {Reducing the quantity of annotations required during supervised training is vital when labels are scarce and costly. This reduction is especially important for segmentation tasks involving 3D datasets, which are often significantly smaller, and more challenging to annotate, than their image-based counterparts. Self-supervised pre-training on large unlabeled datasets is one way to reduce the amount of manual annotations needed. Previous work has focused on pre-training with point cloud data exclusively; this approach often requires two or more registered views. In this thesis, we combine image and point modalities, by first learning self-supervised image features and then using these features to train a 3D model. By incorporating visual data, which is often included in many 3D datasets, our pre-training method requires a single scan of a scene only. We demonstrate that our pre-training approach, despite using single scans, achieves comparable performance to other multi-scan, point cloud-only methods.},
  address = {Toronto, Ontario, Canada},
  author = {Andrej Janda},
  month = {September},
  school = {University of Toronto},
  title = {A Contrastive Learning Framework for Self-Supervised Pre-Training of 3D Point Cloud Networks with Visual Data},
  year = {2022}
}

Reducing the quantity of annotations required during supervised training is vital when labels are scarce and costly. This reduction is especially important for segmentation tasks involving 3D datasets, which are often significantly smaller, and more challenging to annotate, than their image-based counterparts. Self-supervised pre-training on large unlabeled datasets is one way to reduce the amount of manual annotations needed. Previous work has focused on pre-training with point cloud data exclusively; this approach often requires two or more registered views. In this thesis, we combine image and point modalities, by first learning self-supervised image features and then using these features to train a 3D model. By incorporating visual data, which is often included in many 3D datasets, our pre-training method requires a single scan of a scene only. We demonstrate that our pre-training approach, despite using single scans, achieves comparable performance to other multi-scan, point cloud-only methods.

Self-Supervised Pre-Training of 3D Point Cloud Networks with Image Data

A. Janda, B. Wagstaff, E. G. Ng, and J. Kelly

Proceedings of the Conference on Robot Learning (CoRL) Workshop on Pre-Training Robot Learning, Auckland, New Zealand, Dec. 15, 2022.

Bibtex | Abstract | arXiv

@inproceedings{2022_Janda_Self-Supervised,
  abstract = {Reducing the quantity of annotations required for supervised training is vital when labels are scarce and costly. This reduction is especially important for semantic segmentation tasks involving 3D datasets that are often significantly smaller and more challenging to annotate than their image-based counterparts. Self-supervised pre-training on large unlabelled datasets is one way to reduce the amount of manual annotations needed. Previous work has focused on pre-training with point cloud data exclusively; this approach often requires two or more registered views. In the present work, we combine image and point cloud modalities, by first learning self-supervised image features and then using these features to train a 3D model. By incorporating image data, which is often included in many 3D datasets, our pre-training method only requires a single scan of a scene. We demonstrate that our pre-training approach, despite using single scans, achieves comparable performance to other multi-scan, point cloud-only methods.},
  address = {Auckland, New Zealand},
  author = {Andrej Janda and Brandon Wagstaff and Edwin G. Ng and Jonathan Kelly},
  booktitle = {Proceedings of the Conference on Robot Learning {(CoRL)} Workshop on Pre-Training Robot Learning},
  date = {2022-12-15},
  month = {Dec. 15},
  note = {Dyson Best Paper Award},
  title = {Self-Supervised Pre-Training of 3D Point Cloud Networks with Image Data},
  url = {https://arxiv.org/abs/2211.11801},
  year = {2022}
}

Reducing the quantity of annotations required for supervised training is vital when labels are scarce and costly. This reduction is especially important for semantic segmentation tasks involving 3D datasets that are often significantly smaller and more challenging to annotate than their image-based counterparts. Self-supervised pre-training on large unlabelled datasets is one way to reduce the amount of manual annotations needed. Previous work has focused on pre-training with point cloud data exclusively; this approach often requires two or more registered views. In the present work, we combine image and point cloud modalities, by first learning self-supervised image features and then using these features to train a 3D model. By incorporating image data, which is often included in many 3D datasets, our pre-training method only requires a single scan of a scene. We demonstrate that our pre-training approach, despite using single scans, achieves comparable performance to other multi-scan, point cloud-only methods.

Dyson Best Paper Award

Capsule robot pose and mechanism state detection in ultrasound using attention-based hierarchical deep learning

X. Liu, D. Esser, B. Wagstaff, A. Zavodni, N. Matsuura, J. Kelly, and E. Diller

Scientific Reports, vol. 12, iss. 1, p. 21130, 2022.

DOI | Bibtex | Abstract

@article{2022_Liu_Capsule,
  abstract = {Ingestible robotic capsules with locomotion capabilities and on-board sampling mechanism have great potential for non-invasive diagnostic and interventional use in the gastrointestinal tract. Real-time tracking of capsule location and operational state is necessary for clinical application, yet remains a significant challenge. To this end, we propose an approach that can simultaneously determine the mechanism state and in-plane 2D pose of millimeter capsule robots in an anatomically representative environment using ultrasound imaging. Our work proposes an attention-based hierarchical deep learning approach and adapts the success of transfer learning towards solving the multi-task tracking problem with limited dataset. To train the neural networks, we generate a representative dataset of a robotic capsule within ex-vivo porcine stomachs. Experimental results show that the accuracy of capsule state classification is 97\%, and the mean estimation errors for orientation and centroid position are 2.0 degrees and 0.24 mm (1.7\% of the capsule's body length) on the hold-out test set. Accurate detection of the capsule while manipulated by an external magnet in a porcine stomach and colon is also demonstrated. The results suggest our proposed method has the potential for advancing the wireless capsule-based technologies by providing accurate detection of capsule robots in clinical scenarios.},
  author = {Xiaoyun Liu and Daniel Esser and Brandon Wagstaff and Anna Zavodni and Naomi Matsuura and Jonathan Kelly and Eric Diller},
  doi = {10.1038/s41598-022-25572-w},
  journal = {Scientific Reports},
  number = {1},
  pages = {21130},
  title = {Capsule robot pose and mechanism state detection in ultrasound using attention-based hierarchical deep learning},
  volume = {12},
  year = {2022}
}

Ingestible robotic capsules with locomotion capabilities and on-board sampling mechanism have great potential for non-invasive diagnostic and interventional use in the gastrointestinal tract. Real-time tracking of capsule location and operational state is necessary for clinical application, yet remains a significant challenge. To this end, we propose an approach that can simultaneously determine the mechanism state and in-plane 2D pose of millimeter capsule robots in an anatomically representative environment using ultrasound imaging. Our work proposes an attention-based hierarchical deep learning approach and adapts the success of transfer learning towards solving the multi-task tracking problem with limited dataset. To train the neural networks, we generate a representative dataset of a robotic capsule within ex-vivo porcine stomachs. Experimental results show that the accuracy of capsule state classification is 97\%, and the mean estimation errors for orientation and centroid position are 2.0 degrees and 0.24 mm (1.7\% of the capsule's body length) on the hold-out test set. Accurate detection of the capsule while manipulated by an external magnet in a porcine stomach and colon is also demonstrated. The results suggest our proposed method has the potential for advancing the wireless capsule-based technologies by providing accurate detection of capsule robots in clinical scenarios.

Riemannian Optimization for Distance-Geometric Inverse Kinematics

F. Mari’c, M. Giamou, A. W. Hall, S. Khoubyarian, I. Petrovi’c, and J. Kelly

IEEE Transactions on Robotics, vol. 38, iss. 3, pp. 1703-1722, 2022.

@article{2022_Maric_Riemannian,
  abstract = {Solving the inverse kinematics problem is a fundamental challenge in motion planning, control, and calibration for articulated robots. Kinematic models for these robots are typically parametrized by joint angles, generating a complicated mapping between the robot configuration and the end-effector pose. Alternatively, the kinematic model and task constraints can be represented using invariant distances between points attached to the robot. In this paper, we formalize the equivalence of distance-based inverse kinematics and the distance geometry problem for a large class of articulated robots and task constraints. Unlike previous approaches, we use the connection between distance geometry and low-rank matrix completion to find inverse kinematics solutions by completing a partial Euclidean distance matrix through local optimization. Furthermore, we parametrize the space of Euclidean distance matrices with the Riemannian manifold of fixed-rank Gram matrices, allowing us to leverage a variety of mature Riemannian optimization methods. Finally, we show that bound smoothing can be used to generate informed initializations without significant computational overhead, improving convergence. We demonstrate that our inverse kinematics solver achieves higher success rates than traditional techniques, and substantially outperforms them on problems that involve many workspace constraints.},
  author = {Filip Mari\'{c} and Matthew Giamou and Adam W. Hall and Soroush Khoubyarian and Ivan Petrovi\'{c} and Jonathan Kelly},
  code = {https://github.com/utiasSTARS/graphIK},
  doi = {10.1109/TRO.2021.3123841},
  journal = {{IEEE} Transactions on Robotics},
  month = {June},
  number = {3},
  pages = {1703--1722},
  title = {Riemannian Optimization for Distance-Geometric Inverse Kinematics},
  url = {https://arxiv.org/abs/2108.13720},
  video1 = {https://www.youtube.com/watch?v=mQ01Nqa7yHI},
  volume = {38},
  year = {2022}
}

Solving the inverse kinematics problem is a fundamental challenge in motion planning, control, and calibration for articulated robots. Kinematic models for these robots are typically parametrized by joint angles, generating a complicated mapping between the robot configuration and the end-effector pose. Alternatively, the kinematic model and task constraints can be represented using invariant distances between points attached to the robot. In this paper, we formalize the equivalence of distance-based inverse kinematics and the distance geometry problem for a large class of articulated robots and task constraints. Unlike previous approaches, we use the connection between distance geometry and low-rank matrix completion to find inverse kinematics solutions by completing a partial Euclidean distance matrix through local optimization. Furthermore, we parametrize the space of Euclidean distance matrices with the Riemannian manifold of fixed-rank Gram matrices, allowing us to leverage a variety of mature Riemannian optimization methods. Finally, we show that bound smoothing can be used to generate informed initializations without significant computational overhead, improving convergence. We demonstrate that our inverse kinematics solver achieves higher success rates than traditional techniques, and substantially outperforms them on problems that involve many workspace constraints.

LaTeRF: Label and Text Driven Object Radiance Fields

A. Mirzaei, Y. Kant, J. Kelly, and I. Gilitschenski

in Computer Vision — ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part III , S. Avidan, G. Brostow, M. Cissé, G. M. Farinella, and T. Hassner, Eds., Cham: Springer Nature Switzerland, 2022, vol. 13663, pp. 20-36.

DOI | Bibtex | Abstract | arXiv

@incollection{2022_Mirzaei_LaTeRF,
  abstract = {Obtaining 3D object representations is important for creating photo-realistic simulations and for collecting AR and VR assets. Neural fields have shown their effectiveness in learning a continuous volumetric representation of a scene from 2D images, but acquiring object representations from these models with weak supervision remains an open challenge. In this paper we introduce LaTeRF, a method for extracting an object of interest from a scene given 2D images of the entire scene, known camera poses, a natural language description of the object, and a set of point-labels of object and non-object points in the input images. To faithfully extract the object from the scene, LaTeRF extends the NeRF formulation with an additional `objectness' probability at each 3D point. Additionally, we leverage the rich latent space of a pre-trained CLIP model combined with our differentiable object renderer, to inpaint the occluded parts of the object. We demonstrate high-fidelity object extraction on both synthetic and real-world datasets and justify our design choices through an extensive ablation study.},
  address = {Cham},
  author = {Ashkan Mirzaei and Yash Kant and Jonathan Kelly and Igor Gilitschenski},
  booktitle = {Computer Vision -- {ECCV 2022}: 17th European Conference, Tel Aviv, Israel, October 23--27, 2022, Proceedings, Part III},
  doi = {10.1007/978-3-031-20062-5_2},
  editor = {Shai Avidan and Gabriel Brostow and Moustapha Ciss\'{e} and Giovanni Maria Farinella and Tal Hassner},
  isbn = {978-3-031-20062-5},
  pages = {20--36},
  publisher = {Springer Nature Switzerland},
  series = {Lecture Notes in Computer Science},
  title = {{LaTeRF}: Label and Text Driven Object Radiance Fields},
  url = {https://arxiv.org/abs/2207.01583},
  volume = {13663},
  year = {2022}
}

Obtaining 3D object representations is important for creating photo-realistic simulations and for collecting AR and VR assets. Neural fields have shown their effectiveness in learning a continuous volumetric representation of a scene from 2D images, but acquiring object representations from these models with weak supervision remains an open challenge. In this paper we introduce LaTeRF, a method for extracting an object of interest from a scene given 2D images of the entire scene, known camera poses, a natural language description of the object, and a set of point-labels of object and non-object points in the input images. To faithfully extract the object from the scene, LaTeRF extends the NeRF formulation with an additional `objectness' probability at each 3D point. Additionally, we leverage the rich latent space of a pre-trained CLIP model combined with our differentiable object renderer, to inpaint the occluded parts of the object. We demonstrate high-fidelity object extraction on both synthetic and real-world datasets and justify our design choices through an extensive ablation study.

Fast Object Inertial Parameter Identification for Collaborative Robots

P. Nadeau, M. Giamou, and J. Kelly

Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Philadelphia, Pennsylvania, USA, May 23–27, 2022, pp. 3560-3566.

DOI | Bibtex | Abstract | arXiv | Video

@inproceedings{2022_Nadeau_Fast,
  abstract = {Collaborative robots (cobots) are machines designed to work safely alongside people in human-centric environments. Providing cobots with the ability to quickly infer the inertial parameters of manipulated objects will improve their flexibility and enable greater usage in manufacturing and other areas. To ensure safety, cobots are subject to kinematic limits that result in low signal-to-noise ratios (SNR) for velocity, acceleration, and force-torque data. This renders existing inertial parameter identification algorithms prohibitively slow and inaccurate. Motivated by the desire for faster model acquisition, we investigate the use of an approximation of rigid body dynamics to improve the SNR. Additionally, we introduce a mass discretization method that can make use of shape information to quickly identify plausible inertial parameters for a manipulated object. We present extensive simulation studies and real-world experiments demonstrating that our approach complements existing inertial parameter identification methods by specifically targeting the typical cobot operating regime.},
  address = {Philadelphia, Pennsylvania, USA},
  author = {Philippe Nadeau and Matthew Giamou and Jonathan Kelly},
  booktitle = {Proceedings of the {IEEE} International Conference on Robotics and Automation {(ICRA)}},
  date = {2022-05-23/2022-05-27},
  doi = {10.1109/ICRA46639.2022.9916213},
  month = {May 23--27},
  pages = {3560--3566},
  title = {Fast Object Inertial Parameter Identification for Collaborative Robots},
  url = {https://arxiv.org/abs/2203.00830},
  video1 = {https://www.youtube.com/watch?v=wB7FrtRpLNI},
  year = {2022}
}

Collaborative robots (cobots) are machines designed to work safely alongside people in human-centric environments. Providing cobots with the ability to quickly infer the inertial parameters of manipulated objects will improve their flexibility and enable greater usage in manufacturing and other areas. To ensure safety, cobots are subject to kinematic limits that result in low signal-to-noise ratios (SNR) for velocity, acceleration, and force-torque data. This renders existing inertial parameter identification algorithms prohibitively slow and inaccurate. Motivated by the desire for faster model acquisition, we investigate the use of an approximation of rigid body dynamics to improve the SNR. Additionally, we introduce a mass discretization method that can make use of shape information to quickly identify plausible inertial parameters for a manipulated object. We present extensive simulation studies and real-world experiments demonstrating that our approach complements existing inertial parameter identification methods by specifically targeting the typical cobot operating regime.

On the Coupling of Depth and Egomotion Networks for Self-Supervised Structure from Motion

B. Wagstaff, V. Peretroukhin, and J. Kelly

IEEE Robotics and Automation Letters, vol. 7, iss. 3, pp. 6766-6773, 2022.

@article{2022_Wagstaff_Coupling,
  abstract = {Structure from motion (SfM) has recently been formulated as a self-supervised learning problem, where neural network models of depth and egomotion are learned jointly through view synthesis. Herein, we address the open problem of how to best couple, or link, the depth and egomotion network components, so that information such as a common scale factor can be shared between the networks. Towards this end, we introduce several notions of coupling, categorize existing approaches, and present a novel tightly-coupled approach that leverages the interdependence of depth and egomotion at training time and at test time. Our approach uses iterative view synthesis to recursively update the egomotion network input, permitting contextual information to be passed between the components. We demonstrate through substantial experiments that our approach promotes consistency between the depth and egomotion predictions at test time, improves generalization, and leads to state-of-the-art accuracy on indoor and outdoor depth and egomotion evaluation benchmarks.},
  author = {Brandon Wagstaff and Valentin Peretroukhin and Jonathan Kelly},
  code = {https://github.com/utiasSTARS/tightly-coupled-SfM},
  doi = {10.1109/LRA.2022.3176087},
  journal = {{IEEE} Robotics and Automation Letters},
  month = {July},
  number = {3},
  pages = {6766--6773},
  title = {On the Coupling of Depth and Egomotion Networks for Self-Supervised Structure from Motion},
  url = {https://arxiv.org/abs/2106.04007},
  video1 = {https://www.youtube.com/watch?v=6QEDCooyUjE},
  volume = {7},
  year = {2022}
}

Structure from motion (SfM) has recently been formulated as a self-supervised learning problem, where neural network models of depth and egomotion are learned jointly through view synthesis. Herein, we address the open problem of how to best couple, or link, the depth and egomotion network components, so that information such as a common scale factor can be shared between the networks. Towards this end, we introduce several notions of coupling, categorize existing approaches, and present a novel tightly-coupled approach that leverages the interdependence of depth and egomotion at training time and at test time. Our approach uses iterative view synthesis to recursively update the egomotion network input, permitting contextual information to be passed between the components. We demonstrate through substantial experiments that our approach promotes consistency between the depth and egomotion predictions at test time, improves generalization, and leads to state-of-the-art accuracy on indoor and outdoor depth and egomotion evaluation benchmarks.

Data-Driven Models for Robust Egomotion Estimation

B. Wagstaff

PhD Thesis , University of Toronto, Toronto, Ontario, Canada, 2022.

Bibtex | Abstract | PDF

@phdthesis{2022_Wagstaff_Data-Driven,
  abstract = {In many modern autonomy applications, robots are required to operate safely and reliably within complex environments, alongside other dynamic agents such as humans. To meet these requirements, localization algorithms for robots and humans must be developed that can maintain accurate pose estimates, despite being subjected to a range of adverse operating conditions. Further, the development of self-localization algorithms that enable mobile agents to maintain an estimate of their own pose is particularly important for improved autonomy. At the heart of self-localization is egomotion estimation, which is the process of determining the motion of a mobile agent over time using a stream of body-mounted sensor measurements. Body-mounted sensors such as cameras and inertial measurement units are self-contained, lightweight, and inexpensive, making them ideal candidates for self-localization. Traditional approaches to egomotion estimation are based on handcrafted models that achieve a high degree of accuracy while operating under a range of nominal conditions, but are prone to failure when the assumptions no longer hold. In this dissertation, we investigate how data-driven, or learned, models can be leveraged within the egomotion estimation pipeline to improve upon existing classical approaches. In particular, we develop a number of hybrid and end-to-end systems for inertial and visual egomotion estimation. The hybrid systems replace brittle components of classical egomotion estimators with data-driven models, while the end-to-end systems solely use neural networks that are trained to directly map from sensor data to egomotion predictions. We employ these data-driven systems for self-localization in pedestrian navigation, urban driving, and unmanned aerial vehicle applications. In these domains, we benchmark our systems on several real-world datasets, including a pedestrian navigation dataset that we collected at the University of Toronto. Our experiments demonstrate that, in challenging environments where classical estimation frameworks fail, data-driven systems are viable candidates for maintaining self-localization accuracy.},
  address = {Toronto, Ontario, Canada},
  author = {Brandon Wagstaff},
  institution = {University of Toronto},
  month = {November},
  school = {University of Toronto},
  title = {Data-Driven Models for Robust Egomotion Estimation},
  year = {2022}
}

In many modern autonomy applications, robots are required to operate safely and reliably within complex environments, alongside other dynamic agents such as humans. To meet these requirements, localization algorithms for robots and humans must be developed that can maintain accurate pose estimates, despite being subjected to a range of adverse operating conditions. Further, the development of self-localization algorithms that enable mobile agents to maintain an estimate of their own pose is particularly important for improved autonomy. At the heart of self-localization is egomotion estimation, which is the process of determining the motion of a mobile agent over time using a stream of body-mounted sensor measurements. Body-mounted sensors such as cameras and inertial measurement units are self-contained, lightweight, and inexpensive, making them ideal candidates for self-localization. Traditional approaches to egomotion estimation are based on handcrafted models that achieve a high degree of accuracy while operating under a range of nominal conditions, but are prone to failure when the assumptions no longer hold. In this dissertation, we investigate how data-driven, or learned, models can be leveraged within the egomotion estimation pipeline to improve upon existing classical approaches. In particular, we develop a number of hybrid and end-to-end systems for inertial and visual egomotion estimation. The hybrid systems replace brittle components of classical egomotion estimators with data-driven models, while the end-to-end systems solely use neural networks that are trained to directly map from sensor data to egomotion predictions. We employ these data-driven systems for self-localization in pedestrian navigation, urban driving, and unmanned aerial vehicle applications. In these domains, we benchmark our systems on several real-world datasets, including a pedestrian navigation dataset that we collected at the University of Toronto. Our experiments demonstrate that, in challenging environments where classical estimation frameworks fail, data-driven systems are viable candidates for maintaining self-localization accuracy.

A Self-Supervised, Differentiable Kalman Filter for Uncertainty-Aware Visual-Inertial Odometry

B. Wagstaff, E. Wise, and J. Kelly

Proceedings of the IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM), Sapporo, Japan, Jul. 11–15, 2022, pp. 1388-1395.

DOI | Bibtex | Abstract | arXiv

@inproceedings{2022_Wagstaff_Self-Supervised,
  abstract = {Visual-inertial odometry (VIO) systems traditionally rely on filtering or optimization-based techniques for egomotion estimation. While these methods are accurate under nominal conditions, they are prone to failure during severe illumination changes, rapid camera motions, or on low-texture image sequences. Learning-based systems have the potential to out- perform classical implementations in challenging environments, but, currently, do not perform as well as classical methods in nominal settings. Herein, we introduce a framework for training a hybrid VIO system that leverages the advantages of learning and standard filtering-based state estimation. Our approach is built upon a differentiable Kalman filter, with an IMU-driven process model and a robust, neural network-derived relative pose measurement model. The use of the Kalman filter framework enables the principled treatment of uncertainty at training time and at test time. We show that our self-supervised loss formulation outperforms a similar, supervised method, while also enabling online retraining. We evaluate our system on a visually degraded version of the EuRoC dataset and find that our estimator operates without a significant reduction in accuracy in cases where classical estimators consistently diverge. Finally, by properly utilizing the metric information contained in the IMU measurements, our system is able to recover metric scene scale, while other self-supervised monocular VIO approaches cannot.},
  address = {Sapporo, Japan},
  author = {Brandon Wagstaff and Emmett Wise and Jonathan Kelly},
  booktitle = {Proceedings of the {IEEE/ASME} International Conference on Advanced Intelligent Mechatronics {(AIM)}},
  date = {2022-07-11/2022-07-15},
  doi = {10.1109/AIM52237.2022.9863270},
  month = {Jul. 11--15},
  pages = {1388--1395},
  title = {A Self-Supervised, Differentiable {Kalman} Filter for Uncertainty-Aware Visual-Inertial Odometry},
  url = {https://arxiv.org/abs/2203.07207},
  year = {2022}
}

Visual-inertial odometry (VIO) systems traditionally rely on filtering or optimization-based techniques for egomotion estimation. While these methods are accurate under nominal conditions, they are prone to failure during severe illumination changes, rapid camera motions, or on low-texture image sequences. Learning-based systems have the potential to out- perform classical implementations in challenging environments, but, currently, do not perform as well as classical methods in nominal settings. Herein, we introduce a framework for training a hybrid VIO system that leverages the advantages of learning and standard filtering-based state estimation. Our approach is built upon a differentiable Kalman filter, with an IMU-driven process model and a robust, neural network-derived relative pose measurement model. The use of the Kalman filter framework enables the principled treatment of uncertainty at training time and at test time. We show that our self-supervised loss formulation outperforms a similar, supervised method, while also enabling online retraining. We evaluate our system on a visually degraded version of the EuRoC dataset and find that our estimator operates without a significant reduction in accuracy in cases where classical estimators consistently diverge. Finally, by properly utilizing the metric information contained in the IMU measurements, our system is able to recover metric scene scale, while other self-supervised monocular VIO approaches cannot.

2021

Learning from Guided Play: A Scheduled Hierarchical Approach for Improving Exploration in Adversarial Imitation Learning

T. Ablett, B. Chan, and J. Kelly

Proceedings of the Neural Information Processing Systems (NeurIPS) Deep Reinforcement Learning Workshop, Dec. 13, 2021.

@inproceedings{2021_Ablett_Learning,
  abstract = {Effective exploration continues to be a significant challenge that prevents the deployment of reinforcement learning for many physical systems. This is particularly true for systems with continuous and high-dimensional state and action spaces, such as robotic manipulators. The challenge is accentuated in the sparse rewards setting, where the low-level state information required for the design of dense rewards is unavailable. Adversarial imitation learning (AIL) can partially overcome this barrier by leveraging expert-generated demonstrations of optimal behaviour and providing, essentially, a replacement for dense reward information. Unfortunately, the availability of expert demonstrations does not necessarily improve an agent's capability to explore effectively and, as we empirically show, can lead to inefficient or stagnated learning. We present Learning from Guided Play (LfGP), a framework in which we leverage expert demonstrations of, in addition to a main task, multiple auxiliary tasks. Subsequently, a hierarchical model is used to learn each task reward and policy through a modified AIL procedure, in which exploration of all tasks is enforced via a scheduler composing different tasks together. This affords many benefits: learning efficiency is improved for main tasks with challenging bottleneck transitions, expert data becomes reusable between tasks, and transfer learning through the reuse of learned auxiliary task models becomes possible. Our experimental results in a challenging multitask robotic manipulation domain indicate that our method compares favourably to supervised imitation learning and to a state-of-the-art AIL method. Code is available at https://github.com/utiasSTARS/lfgp.},
  author = {Trevor Ablett and Bryan Chan and Jonathan Kelly},
  booktitle = {Proceedings of the Neural Information Processing Systems {(NeurIPS)} Deep Reinforcement Learning Workshop},
  code = {https://github.com/utiasSTARS/lfgp},
  date = {2021-12-13},
  month = {Dec. 13},
  site = {https://papers.starslab.ca/lfgp/},
  title = {Learning from Guided Play: A Scheduled Hierarchical Approach for Improving Exploration in Adversarial Imitation Learning},
  url = {https://arxiv.org/abs/2112.08932},
  video1 = {https://slideslive.com/38971121/learning-from-guided-play-a-scheduled-hierarchical-approach-for-improving-exploration-in-adversarial-imitation-learning},
  year = {2021}
}

Effective exploration continues to be a significant challenge that prevents the deployment of reinforcement learning for many physical systems. This is particularly true for systems with continuous and high-dimensional state and action spaces, such as robotic manipulators. The challenge is accentuated in the sparse rewards setting, where the low-level state information required for the design of dense rewards is unavailable. Adversarial imitation learning (AIL) can partially overcome this barrier by leveraging expert-generated demonstrations of optimal behaviour and providing, essentially, a replacement for dense reward information. Unfortunately, the availability of expert demonstrations does not necessarily improve an agent's capability to explore effectively and, as we empirically show, can lead to inefficient or stagnated learning. We present Learning from Guided Play (LfGP), a framework in which we leverage expert demonstrations of, in addition to a main task, multiple auxiliary tasks. Subsequently, a hierarchical model is used to learn each task reward and policy through a modified AIL procedure, in which exploration of all tasks is enforced via a scheduler composing different tasks together. This affords many benefits: learning efficiency is improved for main tasks with challenging bottleneck transitions, expert data becomes reusable between tasks, and transfer learning through the reuse of learned auxiliary task models becomes possible. Our experimental results in a challenging multitask robotic manipulation domain indicate that our method compares favourably to supervised imitation learning and to a state-of-the-art AIL method. Code is available at https://github.com/utiasSTARS/lfgp.

Seeing All the Angles: Learning Multiview Manipulation Policies for Contact-Rich Tasks from Demonstrations

T. Ablett, Y. Zhai, and J. Kelly

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, Sep. 27–Oct. 1, 2021, pp. 7843-7850.

@inproceedings{2021_Ablett_Seeing,
  abstract = {Learned visuomotor policies have shown considerable success as an alternative to traditional, hand-crafted frameworks for robotic manipulation. Surprisingly, an extension of these methods to the multiview domain is relatively unexplored. A successful multiview policy could be deployed on a mobile manipulation platform, allowing the robot to complete a task regardless of its view of the scene. In this work, we demonstrate that a multiview policy can be found through imitation learning by collecting data from a variety of viewpoints. We illustrate the general applicability of the method by learning to complete several challenging multi-stage and contact-rich tasks, from numerous viewpoints, both in a simulated environment and on a real mobile manipulation platform. Furthermore, we analyze our policies to determine the benefits of learning from multiview data compared to learning with data collected from a fixed perspective. We show that learning from multiview data results in little, if any, penalty to performance for a fixed-view task compared to learning with an equivalent amount of fixed-view data. Finally, we examine the visual features learned by the multiview and fixed-view policies. Our results indicate that multiview policies implicitly learn to identify spatially correlated features.},
  address = {Prague, Czech Republic},
  author = {Trevor Ablett and Yifan Zhai and Jonathan Kelly},
  booktitle = {Proceedings of the {IEEE/RSJ} International Conference on Intelligent Robots and Systems {(IROS)}},
  code = {https://github.com/utiasSTARS/multiview-manipulation},
  date = {2021-09-27/2021-10-01},
  doi = {10.1109/IROS51168.2021.9636440},
  month = {Sep. 27--Oct. 1},
  pages = {7843--7850},
  site = {https://papers.starslab.ca/multiview-manipulation/},
  title = {Seeing All the Angles: Learning Multiview Manipulation Policies for Contact-Rich Tasks from Demonstrations},
  url = {http://arxiv.org/abs/2104.13907},
  video1 = {https://www.youtube.com/watch?v=oh0JMeyoswg},
  year = {2021}
}

Learned visuomotor policies have shown considerable success as an alternative to traditional, hand-crafted frameworks for robotic manipulation. Surprisingly, an extension of these methods to the multiview domain is relatively unexplored. A successful multiview policy could be deployed on a mobile manipulation platform, allowing the robot to complete a task regardless of its view of the scene. In this work, we demonstrate that a multiview policy can be found through imitation learning by collecting data from a variety of viewpoints. We illustrate the general applicability of the method by learning to complete several challenging multi-stage and contact-rich tasks, from numerous viewpoints, both in a simulated environment and on a real mobile manipulation platform. Furthermore, we analyze our policies to determine the benefits of learning from multiview data compared to learning with data collected from a fixed perspective. We show that learning from multiview data results in little, if any, penalty to performance for a fixed-view task compared to learning with an equivalent amount of fixed-view data. Finally, we examine the visual features learned by the multiview and fixed-view policies. Our results indicate that multiview policies implicitly learn to identify spatially correlated features.

Observability-Aware Trajectory Optimization: Theory, Viability, and State of the Art

C. Grebe, E. Wise, and J. Kelly

Proceedings of the IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI), Karlsruhe, Germany, Sep. 23–25, 2021.

DOI | Bibtex | Abstract | arXiv

@inproceedings{2021_Grebe_Observability-Aware,
  abstract = {Ideally, robots should move in ways that maximize the knowledge gained about the state of both their internal system and the external operating environment. Trajectory design is a challenging problem that has been investigated from a variety of perspectives, ranging from information-theoretic analyses to leaning-based approaches. Recently, observability-based metrics have been proposed to find trajectories that enable rapid and accurate state and parameter estimation. The viability and efficacy of these methods is not yet well understood in the literature. In this paper, we compare two state-of-the-art methods for observability-aware trajectory optimization and seek to add important theoretical clarifications and valuable discussion about their overall effectiveness. For evaluation, we examine the representative task of sensor-to-sensor extrinsic self-calibration using a realistic physics simulator. We also study the sensitivity of these algorithms to changes in the information content of the exteroceptive sensor measurements.},
  address = {Karlsruhe, Germany},
  author = {Christopher Grebe and Emmett Wise and Jonathan Kelly},
  booktitle = {Proceedings of the {IEEE} International Conference on Multisensor Fusion and Integration for Intelligent Systems {(MFI)}},
  date = {2021-09-23/2021-09-25},
  doi = {10.1109/MFI52462.2021.9591177},
  month = {Sep. 23--25},
  title = {Observability-Aware Trajectory Optimization: Theory, Viability, and State of the Art},
  url = {https://arxiv.org/abs/2109.09007},
  year = {2021}
}

Ideally, robots should move in ways that maximize the knowledge gained about the state of both their internal system and the external operating environment. Trajectory design is a challenging problem that has been investigated from a variety of perspectives, ranging from information-theoretic analyses to leaning-based approaches. Recently, observability-based metrics have been proposed to find trajectories that enable rapid and accurate state and parameter estimation. The viability and efficacy of these methods is not yet well understood in the literature. In this paper, we compare two state-of-the-art methods for observability-aware trajectory optimization and seek to add important theoretical clarifications and valuable discussion about their overall effectiveness. For evaluation, we examine the representative task of sensor-to-sensor extrinsic self-calibration using a realistic physics simulator. We also study the sensitivity of these algorithms to changes in the information content of the exteroceptive sensor measurements.

Learning to Detect Slip Using Barometric Tactile Sensors

A. Grover

Master Thesis , University of Toronto, Toronto, Ontario, Canada, 2021.

Bibtex | Abstract | PDF

@mastersthesis{2021_Grover_Learning,
  abstract = {The ability to perceive object slip through tactile feedback allows humans to accomplish complex manipulation tasks. For robots, however, detecting key events such as slip from tactile information is a challenge. This work explores a learning- based method to detect slip using barometric tactile sensors that have many desirable properties; they are durable, highly reliable, and built from inexpensive components. We collect a novel dataset specifically targeted for robustness, and train a TCN to detect slip. The trained detector achieves an accuracy of greater than 91\% on test data while displaying robustness to the speed and direction of the slip motion. When tested on two robot manipulation tasks involving a variety of common objects, our detector demonstrates generalization to previously unseen objects. This is the first time that barometric tactile-sensing technology, combined with data-driven learning, has been used for a manipulation task like slip detection.},
  address = {Toronto, Ontario, Canada},
  author = {Abhinav Grover},
  month = {September},
  school = {University of Toronto},
  title = {Learning to Detect Slip Using Barometric Tactile Sensors},
  year = {2021}
}

The ability to perceive object slip through tactile feedback allows humans to accomplish complex manipulation tasks. For robots, however, detecting key events such as slip from tactile information is a challenge. This work explores a learning- based method to detect slip using barometric tactile sensors that have many desirable properties; they are durable, highly reliable, and built from inexpensive components. We collect a novel dataset specifically targeted for robustness, and train a TCN to detect slip. The trained detector achieves an accuracy of greater than 91\% on test data while displaying robustness to the speed and direction of the slip motion. When tested on two robot manipulation tasks involving a variety of common objects, our detector demonstrates generalization to previously unseen objects. This is the first time that barometric tactile-sensing technology, combined with data-driven learning, has been used for a manipulation task like slip detection.

Under Pressure: Learning to Detect Slip with Barometric Tactile Sensors

A. Grover, C. Grebe, P. Nadeau, and J. Kelly

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) RoboTac Workshop: New Advances in Tactile Sensation, Interactive Perception, Control, and Learning, Prague, Czech Republic, Sep. 27, 2021.

Bibtex | Abstract | arXiv

@inproceedings{2021_Grover_Under,
  abstract = {Despite the utility of tactile information, tactile sensors have yet to be widely deployed in industrial robotics settings. Part of the challenge lies in identifying slip and other key events from the tactile data stream. In this paper, we present a learning-based method to detect slip using barometric tactile sensors. Although these sensors have a low resolution, they have many other desirable properties including high reliability and durability, a very slim profile, and a low cost. We are able to achieve slip detection accuracies of greater than 91\% while being robust to the speed and direction of the slip motion. Further, we test our detector on two robot manipulation tasks involving common household objects and demonstrate successful generalization to real-world scenarios not seen during training. We show that barometric tactile sensing technology, combined with data-driven learning, is potentially suitable for complex manipulation tasks such as slip compensation.},
  address = {Prague, Czech Republic},
  author = {Abhinav Grover and Christopher Grebe and Philippe Nadeau and Jonathan Kelly},
  booktitle = {Proceedings of the {IEEE/RSJ} International Conference on Intelligent Robots and Systems {(IROS)} RoboTac Workshop: New Advances in Tactile Sensation, Interactive Perception, Control, and Learning},
  date = {2021-09-27},
  month = {Sep. 27},
  title = {Under Pressure: Learning to Detect Slip with Barometric Tactile Sensors},
  url = {https://arxiv.org/abs/2103.13460},
  year = {2021}
}

Despite the utility of tactile information, tactile sensors have yet to be widely deployed in industrial robotics settings. Part of the challenge lies in identifying slip and other key events from the tactile data stream. In this paper, we present a learning-based method to detect slip using barometric tactile sensors. Although these sensors have a low resolution, they have many other desirable properties including high reliability and durability, a very slim profile, and a low cost. We are able to achieve slip detection accuracies of greater than 91\% while being robust to the speed and direction of the slip motion. Further, we test our detector on two robot manipulation tasks involving common household objects and demonstrate successful generalization to real-world scenarios not seen during training. We show that barometric tactile sensing technology, combined with data-driven learning, is potentially suitable for complex manipulation tasks such as slip compensation.

A Question of Time: Revisiting the Use of Recursive Filtering for Temporal Calibration of Multisensor Systems

J. Kelly, C. Grebe, and M. Giamou

Proceedings of the IEEE International Conference on Multisensor Fusion and Integration (MFI), Karlsruhe, Germany, Sep. 23–25, 2021.

DOI | Bibtex | Abstract | arXiv | Video

@inproceedings{2021_Kelly_Question,
  abstract = {We examine the problem of time delay estimation, or temporal calibration, in the context of multisensor data fusion. Differences in processing intervals and other factors typically lead to a relative delay between measurement updates from disparate sensors. Correct (optimal) data fusion demands that the relative delay must either be known in advance or identified online. There have been several recent proposals in the literature to determine the delay using recursive, causal filters such as the extended Kalman filter (EKF). We carefully review this formulation and show that there are fundamental issues with the structure of the EKF (and related algorithms) when the delay is included in the filter state vector as a parameter to be estimated. These structural issues, in turn, leave recursive filters prone to bias and inconsistency. Our theoretical analysis is supported by simulation studies that demonstrate the implications in terms of filter performance; although tuning of the filter noise variances may reduce the chance of inconsistency or divergence, the underlying structural concerns remain. We offer brief suggestions for ways to maintain the computational efficiency of recursive filtering for temporal calibration while avoiding the drawbacks of the standard filtering algorithms.},
  address = {Karlsruhe, Germany},
  author = {Jonathan Kelly and Christopher Grebe and Matthew Giamou},
  booktitle = {Proceedings of the {IEEE} International Conference on Multisensor Fusion and Integration {(MFI)}},
  date = {2021-09-23/2021-09-25},
  doi = {10.1109/MFI52462.2021.9591176},
  month = {Sep. 23--25},
  note = {Best Paper Award 1st Runner Up},
  title = {A Question of Time: Revisiting the Use of Recursive Filtering for Temporal Calibration of Multisensor Systems},
  url = {https://arxiv.org/abs/2106.00391},
  video1 = {https://www.youtube.com/watch?v=_GJFUA-hMJ0},
  year = {2021}
}

We examine the problem of time delay estimation, or temporal calibration, in the context of multisensor data fusion. Differences in processing intervals and other factors typically lead to a relative delay between measurement updates from disparate sensors. Correct (optimal) data fusion demands that the relative delay must either be known in advance or identified online. There have been several recent proposals in the literature to determine the delay using recursive, causal filters such as the extended Kalman filter (EKF). We carefully review this formulation and show that there are fundamental issues with the structure of the EKF (and related algorithms) when the delay is included in the filter state vector as a parameter to be estimated. These structural issues, in turn, leave recursive filters prone to bias and inconsistency. Our theoretical analysis is supported by simulation studies that demonstrate the implications in terms of filter performance; although tuning of the filter noise variances may reduce the chance of inconsistency or divergence, the underlying structural concerns remain. We offer brief suggestions for ways to maintain the computational efficiency of recursive filtering for temporal calibration while avoiding the drawbacks of the standard filtering algorithms.

Best Paper Award 1st Runner Up

A Riemannian metric for geometry-aware singularity avoidance by articulated robots

F. Mari’c, L. Petrovi’c, M. Guberina, J. Kelly, and I. Petrovi’c

Robotics and Autonomous Systems, vol. 145, p. 103865, 2021.

DOI | Bibtex | Abstract | arXiv

@article{2021_Maric_Riemannian,
  abstract = {Articulated robots such as manipulators increasingly must operate in uncertain and dynamic environments where interaction (with human coworkers, for example) is necessary. In these situations, the capacity to quickly adapt to unexpected changes in operational space constraints is essential. At certain points in a manipulator's configuration space, termed singularities, the robot loses one or more degrees of freedom (DoF) and is unable to move in specific operational space directions. The inability to move in arbitrary directions in operational space compromises adaptivity and, potentially, safety. We introduce a geometry-aware singularity index, defined using a Riemannian metric on the manifold of symmetric positive definite matrices, to provide a measure of proximity to singular configurations. We demonstrate that our index avoids some of the failure modes and difficulties inherent to other common indices. Further, we show that our index can be differentiated easily, making it compatible with local optimization approaches used for operational space control. Our experimental results establish that, for reaching and path following tasks, optimization based on our index outperforms a common manipulability maximization technique and ensures singularity-robust motions.},
  author = {Filip Mari\'{c} and Luka Petrovi\'{c} and Marko Guberina and Jonathan Kelly and Ivan Petrovi\'{c}},
  doi = {10.1016/j.robot.2021.103865},
  journal = {Robotics and Autonomous Systems},
  month = {November},
  pages = {103865},
  title = {A Riemannian metric for geometry-aware singularity avoidance by articulated robots},
  url = {https://arxiv.org/abs/2103.05362},
  volume = {145},
  year = {2021}
}

Articulated robots such as manipulators increasingly must operate in uncertain and dynamic environments where interaction (with human coworkers, for example) is necessary. In these situations, the capacity to quickly adapt to unexpected changes in operational space constraints is essential. At certain points in a manipulator's configuration space, termed singularities, the robot loses one or more degrees of freedom (DoF) and is unable to move in specific operational space directions. The inability to move in arbitrary directions in operational space compromises adaptivity and, potentially, safety. We introduce a geometry-aware singularity index, defined using a Riemannian metric on the manifold of symmetric positive definite matrices, to provide a measure of proximity to singular configurations. We demonstrate that our index avoids some of the failure modes and difficulties inherent to other common indices. Further, we show that our index can be differentiated easily, making it compatible with local optimization approaches used for operational space control. Our experimental results establish that, for reaching and path following tasks, optimization based on our index outperforms a common manipulability maximization technique and ensures singularity-robust motions.

Trajectory Optimization with Geometry-Aware Singularity Avoidance for Robot Motion Planning

L. Petrovi’c, F. Mari’c, I. Markovi’c, J. Kelly, and I. Petrovi’c

Proceedings of the 21st International Conference on Control, Automation and Systems (ICCAS), Jeju Island, Republic of Korea, Oct. 12–15, 2021.

DOI | Bibtex | Abstract

@inproceedings{2021_Petrovic_Trajectory,
  abstract = {Oneoftheprincipalchallengesinrobotarmmotionplanningistoensurerobot'sagilityincaseofencountering unforeseeable changes during task execution. It is thus crucial to preserve the ability to move in every direction in task space, which is achieved by avoiding singularities, i.e. states of configuration space where a degree of freedom is lost. To aid in singularity avoidance, existing methods mostly rely on manipulability or dexterity indices to provide a measure of proximity to singular configurations. Recently, a novel geometry-aware singularity index was proposed that circumvents some of the failure modes inherent to manipulability and dexterity. In this paper, we propose a cost function based on this index and integrate it within a stochastic trajectory optimization framework for efficient motion planning with singularity avoidance. We compare the proposed method with existing singularity-aware motion planning techniques, demonstrating improvement in common indices such as manipulability and dexterity and showcasing the ability of the proposed method to handle collision avoidance while retaining agility of the robot arm.},
  address = {Jeju Island, Republic of Korea},
  author = {Luka Petrovi\'{c} and Filip Mari\'{c} and Ivan Markovi\'{c} and Jonathan Kelly and Ivan Petrovi\'{c}},
  booktitle = {Proceedings of the 21st International Conference on Control, Automation and Systems {(ICCAS)}},
  date = {2021-10-12/2021-10-15},
  doi = {10.23919/ICCAS52745.2021.9650039},
  month = {Oct. 12--15},
  title = {Trajectory Optimization with Geometry-Aware Singularity Avoidance for Robot Motion Planning},
  year = {2021}
}

Oneoftheprincipalchallengesinrobotarmmotionplanningistoensurerobot'sagilityincaseofencountering unforeseeable changes during task execution. It is thus crucial to preserve the ability to move in every direction in task space, which is achieved by avoiding singularities, i.e. states of configuration space where a degree of freedom is lost. To aid in singularity avoidance, existing methods mostly rely on manipulability or dexterity indices to provide a measure of proximity to singular configurations. Recently, a novel geometry-aware singularity index was proposed that circumvents some of the failure modes inherent to manipulability and dexterity. In this paper, we propose a cost function based on this index and integrate it within a stochastic trajectory optimization framework for efficient motion planning with singularity avoidance. We compare the proposed method with existing singularity-aware motion planning techniques, demonstrating improvement in common indices such as manipulability and dexterity and showcasing the ability of the proposed method to handle collision avoidance while retaining agility of the robot arm.

Learned Camera Gain and Exposure Control for Improved Visual Feature Detection and Matching

J. Tomasi, B. Wagstaff, S. Waslander, and J. Kelly

IEEE Robotics and Automation Letters, vol. 6, iss. 2, pp. 2028-2035, 2021.

DOI | Bibtex | Abstract | arXiv

@article{2021_Tomasi_Learned,
  abstract = {Successful visual navigation depends upon capturing images that contain sufficient useful information. In this letter, we explore a data-driven approach to account for environmental lighting changes, improving the quality of images for use in visual odometry (VO) or visual simultaneous localization and mapping (SLAM). We train a deep convolutional neural network model to predictively adjust camera gain and exposure time parameters such that consecutive images contain a maximal number of matchable features. The training process is fully self-supervised: our training signal is derived from an underlying VO or SLAM pipeline and, as a result, the model is optimized to perform well with that specific pipeline. We demonstrate through extensive real-world experiments that our network can anticipate and compensate for dramatic lighting changes (e.g., transitions into and out of road tunnels), maintaining a substantially higher number of inlier feature matches than competing camera parameter control algorithms.},
  author = {Justin Tomasi and Brandon Wagstaff and Steven Waslander and Jonathan Kelly},
  doi = {10.1109/LRA.2021.3058909},
  journal = {{IEEE} Robotics and Automation Letters},
  month = {April},
  number = {2},
  pages = {2028--2035},
  title = {Learned Camera Gain and Exposure Control for Improved Visual Feature Detection and Matching},
  url = {https://arxiv.org/abs/2102.04341},
  volume = {6},
  year = {2021}
}

Successful visual navigation depends upon capturing images that contain sufficient useful information. In this letter, we explore a data-driven approach to account for environmental lighting changes, improving the quality of images for use in visual odometry (VO) or visual simultaneous localization and mapping (SLAM). We train a deep convolutional neural network model to predictively adjust camera gain and exposure time parameters such that consecutive images contain a maximal number of matchable features. The training process is fully self-supervised: our training signal is derived from an underlying VO or SLAM pipeline and, as a result, the model is optimized to perform well with that specific pipeline. We demonstrate through extensive real-world experiments that our network can anticipate and compensate for dramatic lighting changes (e.g., transitions into and out of road tunnels), maintaining a substantially higher number of inlier feature matches than competing camera parameter control algorithms.

Self-Supervised Scale Recovery for Monocular Depth and Egomotion Estimation

B. Wagstaff and J. Kelly

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, Sep. 27–Oct. 1, 2021.

@inproceedings{2021_Wagstaff_Self-Supervised,
  abstract = {The self-supervised loss formulation for jointly training depth and egomotion neural networks with monocular images is well studied and has demonstrated state-of-the-art accuracy. One of the main limitations of this approach, however, is that the depth and egomotion estimates are only determined up to an unknown scale. In this paper, we present a novel scale recovery loss that enforces consistency between a known camera height and the estimated camera height, generating metric (scaled) depth and egomotion predictions. We show that our proposed method is competitive with other scale recovery techniques that require more information. Further, we demonstrate that our method facilitates network retraining within new environments, whereas other scale-resolving approaches are incapable of doing so. Notably, our egomotion network is able to produce more accurate estimates than a similar method which recovers scale at test time only.},
  address = {Prague, Czech Republic},
  author = {Brandon Wagstaff and Jonathan Kelly},
  booktitle = {Proceedings of the {IEEE/RSJ} International Conference on Intelligent Robots and Systems {(IROS)}},
  code = {https://github.com/utiasSTARS/learned_scale_recovery},
  date = {2021-09-27/2021-10-01},
  doi = {10.1109/IROS51168.2021.9635938},
  month = {Sep. 27--Oct. 1},
  title = {Self-Supervised Scale Recovery for Monocular Depth and Egomotion Estimation},
  url = {https://arxiv.org/abs/2009.03787},
  video1 = {https://www.youtube.com/watch?v=5Z16Wl4FOwE},
  year = {2021}
}

The self-supervised loss formulation for jointly training depth and egomotion neural networks with monocular images is well studied and has demonstrated state-of-the-art accuracy. One of the main limitations of this approach, however, is that the depth and egomotion estimates are only determined up to an unknown scale. In this paper, we present a novel scale recovery loss that enforces consistency between a known camera height and the estimated camera height, generating metric (scaled) depth and egomotion predictions. We show that our proposed method is competitive with other scale recovery techniques that require more information. Further, we demonstrate that our method facilitates network retraining within new environments, whereas other scale-resolving approaches are incapable of doing so. Notably, our egomotion network is able to produce more accurate estimates than a similar method which recovers scale at test time only.

A Continuous-Time Approach for 3D Radar-to-Camera Extrinsic Calibration

E. Wise, J. Pervsi’c, C. Grebe, I. Petrovi’c, and J. Kelly

Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, May 30–Jun. 5, 2021, pp. 13164-13170.

DOI | Bibtex | Abstract | arXiv | Video

@inproceedings{2021_Wise_Continuous-Time,
  abstract = {Reliable operation in inclement weather is essential to the deployment of safe autonomous vehicles (AVs). Robustness and reliability can be achieved by fusing data from the standard AV sensor suite (i.e., lidars, cameras) with weather robust sensors, such as millimetre-wavelength radar. Critically, accurate sensor data fusion requires knowledge of the rigid-body transform between sensor pairs, which can be determined through the process of extrinsic calibration. A number of extrinsic calibration algorithms have been designed for 2D (planar) radar sensors---however, recently-developed, low-cost 3D millimetre-wavelength radars are set to displace their 2D counterparts in many applications. In this paper, we present a continuous-time 3D radar-to-camera extrinsic calibration algorithm that utilizes radar velocity measurements and, unlike the majority of existing techniques, does not require specialized radar retroreflectors to be present in the environment. We derive the observability properties of our formulation and demonstrate the efficacy of our algorithm through synthetic and real-world experiments.},
  address = {Xi'an, China},
  author = {Emmett Wise and Juraj Per\v{s}i\'{c} and Christopher Grebe and Ivan Petrovi\'{c} and Jonathan Kelly},
  booktitle = {Proceedings of the {IEEE} International Conference on Robotics and Automation {(ICRA)}},
  date = {2021-05-30/2021-06-05},
  doi = {10.1109/ICRA48506.2021.9561938},
  month = {May 30--Jun. 5},
  pages = {13164--13170},
  title = {A Continuous-Time Approach for 3D Radar-to-Camera Extrinsic Calibration},
  url = {https://arxiv.org/abs/2103.07505},
  video1 = {https://www.youtube.com/watch?v=cmUJE9yZOcc},
  year = {2021}
}

Reliable operation in inclement weather is essential to the deployment of safe autonomous vehicles (AVs). Robustness and reliability can be achieved by fusing data from the standard AV sensor suite (i.e., lidars, cameras) with weather robust sensors, such as millimetre-wavelength radar. Critically, accurate sensor data fusion requires knowledge of the rigid-body transform between sensor pairs, which can be determined through the process of extrinsic calibration. A number of extrinsic calibration algorithms have been designed for 2D (planar) radar sensors---however, recently-developed, low-cost 3D millimetre-wavelength radars are set to displace their 2D counterparts in many applications. In this paper, we present a continuous-time 3D radar-to-camera extrinsic calibration algorithm that utilizes radar velocity measurements and, unlike the majority of existing techniques, does not require specialized radar retroreflectors to be present in the environment. We derive the observability properties of our formulation and demonstrate the efficacy of our algorithm through synthetic and real-world experiments.

2020

Fighting Failures with FIRE: Failure Identification to Reduce Expert Burden in Intervention-Based Learning

T. Ablett, F. Mari’c, and J. Kelly

Toronto, Ontario, Canada, Tech. Rep. STARS-2020-001, Aug. 10, 2020.

Bibtex | Abstract | arXiv

@techreport{2020_Ablett_Fighting,
  abstract = {Supervised imitation learning, also known as behavioral cloning, suffers from distribution drift leading to failures during policy execution. One approach to mitigate this issue is to allow an expert to correct the agent's actions during task execution, based on the expert's determination that the agent has reached a `point of no return.' The agent's policy is then retrained using this new corrective data. This approach alone can enable high-performance agents to be learned, but at a substantial cost: the expert must vigilantly observe execution until the policy reaches a specified level of success, and even at that point, there is no guarantee that the policy will always succeed. To address these limitations, we present FIRE (Failure Identification to Reduce Expert burden), a system that can predict when a running policy will fail, halt its execution, and request a correction from the expert. Unlike existing approaches that learn only from expert data, our approach learns from both expert and non-expert data, akin to adversarial learning. We demonstrate experimentally for a series of challenging manipulation tasks that our method is able to recognize state-action pairs that lead to failures. This permits seamless integration into an intervention-based learning system, where we show an order-of-magnitude gain in sample efficiency compared with a state-of-the-art inverse reinforcement learning method and dramatically improved performance over an equivalent amount of data learned with behavioral cloning.},
  address = {Toronto, Ontario, Canada},
  author = {Trevor Ablett and Filip Mari\'{c} and Jonathan Kelly},
  date = {2020-08-10},
  institution = {University of Toronto},
  month = {Aug. 10},
  number = {STARS-2020-001},
  title = {Fighting Failures with {FIRE}: Failure Identification to Reduce Expert Burden in Intervention-Based Learning},
  url = {https://arxiv.org/abs/2007.00245},
  year = {2020}
}

Supervised imitation learning, also known as behavioral cloning, suffers from distribution drift leading to failures during policy execution. One approach to mitigate this issue is to allow an expert to correct the agent's actions during task execution, based on the expert's determination that the agent has reached a `point of no return.' The agent's policy is then retrained using this new corrective data. This approach alone can enable high-performance agents to be learned, but at a substantial cost: the expert must vigilantly observe execution until the policy reaches a specified level of success, and even at that point, there is no guarantee that the policy will always succeed. To address these limitations, we present FIRE (Failure Identification to Reduce Expert burden), a system that can predict when a running policy will fail, halt its execution, and request a correction from the expert. Unlike existing approaches that learn only from expert data, our approach learns from both expert and non-expert data, akin to adversarial learning. We demonstrate experimentally for a series of challenging manipulation tasks that our method is able to recognize state-action pairs that lead to failures. This permits seamless integration into an intervention-based learning system, where we show an order-of-magnitude gain in sample efficiency compared with a state-of-the-art inverse reinforcement learning method and dramatically improved performance over an equivalent amount of data learned with behavioral cloning.

T. D. Barfoot, J. Burgner-Kahrs, E. Diller, A. Garg, A. Goldenberg, J. Kelly, X. Liu, H. E. Naguib, G. Nejat, A. P. Schoellig, F. Shkurti, H. Siegel, Y. Sun, and S. L. Waslander, Making Sense of the Robotized Pandemic Response: A Comparison of Global and Canadian Robot Deployments and Success Factors, 2020.

Bibtex | Abstract | arXiv

@unpublished{2020_Barfoot_Making,
  abstract = {From disinfection and remote triage, to logistics and delivery, countries around the world are making use of robots to address the unique challenges presented by the COVID-19 pandemic. Robots are being used to manage the pandemic in Canada too, but relative to other regions, we have been more cautious in our adoption -- this despite the important role that robots of Canadian origin are now playing on the global stage. This white paper discusses why this is the case, and argues that strategic investment and support for the Canadian robotics industry are urgently needed to bring the benefits of robotics home, where we have more control in shaping the future of this game-changing technology. Such investments will not only serve to support Canada's current pandemic response and post pandemic recovery, but will also prepare this country to weather future crises. Without such support, Canada risks falling behind other developed nations that are investing heavily in hardware automation at this time.},
  author = {Timothy D. Barfoot and Jessica Burgner-Kahrs and Eric Diller and Animesh Garg and Andrew Goldenberg and Jonathan Kelly and Xinyu Liu and Hani E. Naguib and Goldie Nejat and Angela P. Schoellig and Florian Shkurti and Hallie Siegel and Yu Sun and Steven L. Waslander},
  month = {September},
  note = {University of Toronto Robotics Institute},
  title = {Making Sense of the Robotized Pandemic Response: A Comparison of Global and Canadian Robot Deployments and Success Factors},
  url = {https://arxiv.org/abs/2009.08577},
  year = {2020}
}

From disinfection and remote triage, to logistics and delivery, countries around the world are making use of robots to address the unique challenges presented by the COVID-19 pandemic. Robots are being used to manage the pandemic in Canada too, but relative to other regions, we have been more cautious in our adoption -- this despite the important role that robots of Canadian origin are now playing on the global stage. This white paper discusses why this is the case, and argues that strategic investment and support for the Canadian robotics industry are urgently needed to bring the benefits of robotics home, where we have more control in shaping the future of this game-changing technology. Such investments will not only serve to support Canada's current pandemic response and post pandemic recovery, but will also prepare this country to weather future crises. Without such support, Canada risks falling behind other developed nations that are investing heavily in hardware automation at this time.

University of Toronto Robotics Institute

Learning Matchable Image Transformations for Long-term Visual Localization

L. Clement, M. Gridseth, J. Tomasi, and J. Kelly

IEEE Robotics and Automation Letters, vol. 5, iss. 2, pp. 1492-1499, 2020.

@article{2020_Clement_Learning_A,
  abstract = {Long-term metric self-localization is an essential capability of autonomous mobile robots, but remains challenging for vision-based systems due to appearance changes caused by lighting, weather, or seasonal variations. While experience-based mapping has proven to be an effective technique for bridging the `appearance gap,' the number of experiences required for reliable metric localization over days or months can be very large, and methods for reducing the necessary number of experiences are needed for this approach to scale. Taking inspiration from color constancy theory, we learn a nonlinear RGB-to-grayscale mapping that explicitly maximizes the number of inlier feature matches for images captured under different lighting and weather conditions, and use it as a pre-processing step in a conventional single-experience localization pipeline to improve its robustness to appearance change. We train this mapping by approximating the target non-differentiable localization pipeline with a deep neural network, and find that incorporating a learned low-dimensional context feature can further improve cross-appearance feature matching. Using synthetic and real-world datasets, we demonstrate substantial improvements in localization performance across day-night cycles, enabling continuous metric localization over a 30-hour period using a single mapping experience, and allowing experience-based localization to scale to long deployments with dramatically reduced data requirements.},
  author = {Lee Clement and Mona Gridseth and Justin Tomasi and Jonathan Kelly},
  code = {https://github.com/utiasSTARS/matchable-image-transforms},
  doi = {10.1109/LRA.2020.2967659},
  journal = {{IEEE} Robotics and Automation Letters},
  month = {April},
  number = {2},
  pages = {1492--1499},
  title = {Learning Matchable Image Transformations for Long-term Visual Localization},
  url = {https://arxiv.org/abs/1904.01080},
  video1 = {https://www.youtube.com/watch?v=WrxaSpHKxE8},
  volume = {5},
  year = {2020}
}

Long-term metric self-localization is an essential capability of autonomous mobile robots, but remains challenging for vision-based systems due to appearance changes caused by lighting, weather, or seasonal variations. While experience-based mapping has proven to be an effective technique for bridging the `appearance gap,' the number of experiences required for reliable metric localization over days or months can be very large, and methods for reducing the necessary number of experiences are needed for this approach to scale. Taking inspiration from color constancy theory, we learn a nonlinear RGB-to-grayscale mapping that explicitly maximizes the number of inlier feature matches for images captured under different lighting and weather conditions, and use it as a pre-processing step in a conventional single-experience localization pipeline to improve its robustness to appearance change. We train this mapping by approximating the target non-differentiable localization pipeline with a deep neural network, and find that incorporating a learned low-dimensional context feature can further improve cross-appearance feature matching. Using synthetic and real-world datasets, we demonstrate substantial improvements in localization performance across day-night cycles, enabling continuous metric localization over a 30-hour period using a single mapping experience, and allowing experience-based localization to scale to long deployments with dramatically reduced data requirements.

On Learning Models of Appearance for Robust Long-term Visual Navigation

L. E. Clement

PhD Thesis , University of Toronto, Toronto, Ontario, Canada, 2020.

Bibtex | Abstract | PDF

@phdthesis{2020_Clement_Learning_B,
  abstract = {Simultaneous localization and mapping (SLAM) is a class of techniques that allow robots to navigate unknown environments using onboard sensors. With inexpensive commercial cameras as the primary sensor, visual SLAM has become an important and widely used approach to enabling mobile robot autonomy. However, traditional visual SLAM algorithms make use of only a fraction of the information available from conventional cameras: in addition to the basic geometric cues typically used in visual SLAM, colour images encode a wealth of information about the camera, environmental illumination, surface materials, vehicle motion, and other factors influencing the image formation process. Moreover, visual localization performance degrades quickly in long-term deployments due to environmental appearance changes caused by lighting, weather, or seasonal effects. This is especially problematic when continuous metric localization is required to drive vision-in-the-loop systems such as autonomous route following. This thesis explores several novel approaches to exploiting additional information from vision sensors in order to improve the accuracy and reliability of metric visual SLAM algorithms in short- and long-term deployments. First, we develop a technique for reducing drift error in visual odometry (VO) by estimating the position of a known light source such as the sun using indirect illumination cues available from existing image streams. We build and evaluate hand-engineered and learned models for single-image sun detection and achieve significant reductions in drift error over 30 km of driving in urban and planetary analogue environments. Second, we explore deep image-to-image translation as a means of improving metric visual localization under time-varying illumination. Using images captured under different illumination conditions in a common environment, we demonstrate that localization accuracy and reliability can be substantially improved by learning a many-to-one mapping to a user-selected canonical appearance condition. Finally, we develop a self-supervised method for learning a canonical appearance optimized for high-quality localization. By defining a differentiable surrogate loss function related to the performance of a non-differentiable localization pipeline, we train an optimal RGB-to-grayscale mapping for a given environment, sensor, and pipeline. Using synthetic and real-world long-term vision datasets, we demonstrate significant improvements in localization performance compared to standard grayscale images, enabling continuous metric localization over day-night cycles using a single mapping experience.},
  address = {Toronto, Ontario, Canada},
  author = {Lee Eric Clement},
  institution = {University of Toronto},
  month = {January},
  school = {University of Toronto},
  title = {On Learning Models of Appearance for Robust Long-term Visual Navigation},
  year = {2020}
}

Simultaneous localization and mapping (SLAM) is a class of techniques that allow robots to navigate unknown environments using onboard sensors. With inexpensive commercial cameras as the primary sensor, visual SLAM has become an important and widely used approach to enabling mobile robot autonomy. However, traditional visual SLAM algorithms make use of only a fraction of the information available from conventional cameras: in addition to the basic geometric cues typically used in visual SLAM, colour images encode a wealth of information about the camera, environmental illumination, surface materials, vehicle motion, and other factors influencing the image formation process. Moreover, visual localization performance degrades quickly in long-term deployments due to environmental appearance changes caused by lighting, weather, or seasonal effects. This is especially problematic when continuous metric localization is required to drive vision-in-the-loop systems such as autonomous route following. This thesis explores several novel approaches to exploiting additional information from vision sensors in order to improve the accuracy and reliability of metric visual SLAM algorithms in short- and long-term deployments. First, we develop a technique for reducing drift error in visual odometry (VO) by estimating the position of a known light source such as the sun using indirect illumination cues available from existing image streams. We build and evaluate hand-engineered and learned models for single-image sun detection and achieve significant reductions in drift error over 30 km of driving in urban and planetary analogue environments. Second, we explore deep image-to-image translation as a means of improving metric visual localization under time-varying illumination. Using images captured under different illumination conditions in a common environment, we demonstrate that localization accuracy and reliability can be substantially improved by learning a many-to-one mapping to a user-selected canonical appearance condition. Finally, we develop a self-supervised method for learning a canonical appearance optimized for high-quality localization. By defining a differentiable surrogate loss function related to the performance of a non-differentiable localization pipeline, we train an optimal RGB-to-grayscale mapping for a given environment, sensor, and pipeline. Using synthetic and real-world long-term vision datasets, we demonstrate significant improvements in localization performance compared to standard grayscale images, enabling continuous metric localization over day-night cycles using a single mapping experience.

The Canadian Planetary Emulation Terrain Energy-Aware Rover Navigation Dataset

O. Lamarre, O. Limoyo, F. Mari’c, and J. Kelly

The International Journal of Robotics Research, vol. 39, iss. 6, pp. 641-650, 2020.

DOI | Bibtex | Abstract | Code

@article{2020_Lamarre_Canadian,
  abstract = {Future exploratory missions to the Moon and to Mars will involve solar-powered rovers; careful vehicle energy management is critical to the success of such missions. This article describes a unique dataset gathered by a small, four-wheeled rover at a planetary analog test facility in Canada. The rover was equipped with a suite of sensors designed to enable the study of energy-aware navigation and path planning algorithms. The sensors included a colour omnidirectional stereo camera, a monocular camera, an inertial measurement unit, a pyranometer, drive power consumption monitors, wheel encoders, and a GPS receiver. In total, the rover drove more than 1.2 km over varied terrain at the analog test site. All data is presented in human-readable text files and as standard-format images; additional Robot Operating System (ROS) parsing tools and several georeferenced aerial maps of the test environment are also included. A series of potential research use cases is described.},
  author = {Olivier Lamarre and Oliver Limoyo and Filip Mari\'{c} and Jonathan Kelly},
  code = {https://github.com/utiasSTARS/enav-planetary-dataset},
  doi = {10.1177/0278364920908922},
  journal = {The International Journal of Robotics Research},
  month = {May},
  number = {6},
  pages = {641--650},
  title = {The Canadian Planetary Emulation Terrain Energy-Aware Rover Navigation Dataset},
  volume = {39},
  year = {2020}
}

Future exploratory missions to the Moon and to Mars will involve solar-powered rovers; careful vehicle energy management is critical to the success of such missions. This article describes a unique dataset gathered by a small, four-wheeled rover at a planetary analog test facility in Canada. The rover was equipped with a suite of sensors designed to enable the study of energy-aware navigation and path planning algorithms. The sensors included a colour omnidirectional stereo camera, a monocular camera, an inertial measurement unit, a pyranometer, drive power consumption monitors, wheel encoders, and a GPS receiver. In total, the rover drove more than 1.2 km over varied terrain at the analog test site. All data is presented in human-readable text files and as standard-format images; additional Robot Operating System (ROS) parsing tools and several georeferenced aerial maps of the test environment are also included. A series of potential research use cases is described.

Impact of Traversability Uncertainty on Global Navigation Planning in Planetary Environments

O. Lamarre, A. B. Asghar, and J. Kelly

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) Workshop on Planetary Exploration Robots, Las Vegas, Nevada, USA, Oct. 29, 2020.

DOI | Bibtex

@inproceedings{2020_Lamarre_Impact,
  address = {Las Vegas, Nevada, USA},
  author = {Olivier Lamarre and Ahmad Bilal Asghar and Jonathan Kelly},
  booktitle = {Proceedings of the {IEEE/RSJ} International Conference on Intelligent Robots and Systems {(IROS)} Workshop on Planetary Exploration Robots},
  date = {2020-10-29},
  doi = {10.3929/ethz-b-000450119},
  month = {Oct. 29},
  note = {Moog Workshop Poster Competition First Prize},
  title = {Impact of Traversability Uncertainty on Global Navigation Planning in Planetary Environments},
  year = {2020}
}

Moog Workshop Poster Competition First Prize

Heteroscedastic Uncertainty for Robust Generative Latent Dynamics

O. Limoyo, B. Chan, F. Mari’c, B. Wagstaff, R. Mahmood, and J. Kelly

IEEE Robotics and Automation Letters, vol. 5, iss. 4, pp. 6654-6661, 2020.

DOI | Bibtex | Abstract | arXiv | Video

@article{2020_Limoyo_Heteroscedastic,
  abstract = {Learning or identifying dynamics from a sequence of high-dimensional observations is a difficult challenge in many domains, including reinforcement learning and control. The problem has recently been studied from a generative perspective through latent dynamics: high-dimensional observations are embedded into a lower-dimensional space in which the dynamics can be learned. Despite some successes, latent dynamics models have not yet been applied to real-world robotic systems where learned representations must be robust to a variety of perceptual confounds and noise sources not seen during training. In this paper, we present a method to jointly learn a latent state representation and the associated dynamics that is amenable for long-term planning and closed-loop control under perceptually difficult conditions. As our main contribution, we describe how our representation is able to capture a notion of heteroscedastic or input-specific uncertainty at test time by detecting novel or out-of-distribution (OOD) inputs. We present results from prediction and control experiments on two image-based tasks: a simulated pendulum balancing task and a real-world robotic manipulator reaching task. We demonstrate that our model produces significantly more accurate predictions and exhibits improved control performance, compared to a model that assumes homoscedastic uncertainty only, in the presence of varying degrees of input degradation.},
  author = {Oliver Limoyo and Bryan Chan and Filip Mari\'{c} and Brandon Wagstaff and Rupam Mahmood and Jonathan Kelly},
  doi = {10.1109/LRA.2020.3015449},
  journal = {{IEEE} Robotics and Automation Letters},
  month = {October},
  number = {4},
  pages = {6654--6661},
  title = {Heteroscedastic Uncertainty for Robust Generative Latent Dynamics},
  url = {https://arxiv.org/abs/2008.08157},
  video1 = {https://www.youtube.com/watch?v=tPLUqhobVzw},
  volume = {5},
  year = {2020}
}

Learning or identifying dynamics from a sequence of high-dimensional observations is a difficult challenge in many domains, including reinforcement learning and control. The problem has recently been studied from a generative perspective through latent dynamics: high-dimensional observations are embedded into a lower-dimensional space in which the dynamics can be learned. Despite some successes, latent dynamics models have not yet been applied to real-world robotic systems where learned representations must be robust to a variety of perceptual confounds and noise sources not seen during training. In this paper, we present a method to jointly learn a latent state representation and the associated dynamics that is amenable for long-term planning and closed-loop control under perceptually difficult conditions. As our main contribution, we describe how our representation is able to capture a notion of heteroscedastic or input-specific uncertainty at test time by detecting novel or out-of-distribution (OOD) inputs. We present results from prediction and control experiments on two image-based tasks: a simulated pendulum balancing task and a real-world robotic manipulator reaching task. We demonstrate that our model produces significantly more accurate predictions and exhibits improved control performance, compared to a model that assumes homoscedastic uncertainty only, in the presence of varying degrees of input degradation.

Inverse Kinematics for Serial Kinematic Chains via Sum of Squares Optimization

F. Mari’c, M. Giamou, S. Khoubyarian, I. Petrovi’c, and J. Kelly

Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Paris, France, May 31–Jun. 4, 2020, pp. 7101-7107.

@inproceedings{2020_Maric_Inverse_A,
  abstract = {Inverse kinematics is a fundamental challenge for articulated robots: fast and accurate algorithms are needed for translating task-related workspace constraints and goals into feasible joint configurations. In general, inverse kinematics for serial kinematic chains is a difficult nonlinear problem, for which closed form solutions cannot easily be obtained. Therefore, computationally efficient numerical methods that can be adapted to a general class of manipulators are of great importance. In this paper, we use convex optimization techniques to solve the inverse kinematics problem with joint limit constraints for highly redundant serial kinematic chains with spherical joints in two and three dimensions. This is accomplished through a novel formulation of inverse kinematics as a nearest point problem, and with a fast sum of squares solver that exploits the sparsity of kinematic constraints for serial manipulators. Our method has the advantages of post-hoc certification of global optimality and a runtime that scales polynomially with the number of degrees of freedom. Additionally, we prove that our convex relaxation leads to a globally optimal solution when certain conditions are met, and demonstrate empirically that these conditions are common and represent many practical instances. Finally, we provide an open source implementation of our algorithm.},
  address = {Paris, France},
  author = {Filip Mari\'{c} and Matthew Giamou and Soroush Khoubyarian and Ivan Petrovi\'{c} and Jonathan Kelly},
  booktitle = {Proceedings of the {IEEE} International Conference on Robotics and Automation {(ICRA})},
  code = {https://github.com/utiasSTARS/sos-ik},
  date = {2020-05-31/2020-06-04},
  doi = {10.1109/ICRA40945.2020.9196704},
  month = {May 31--Jun. 4},
  pages = {7101--7107},
  title = {Inverse Kinematics for Serial Kinematic Chains via Sum of Squares Optimization},
  url = {http://arxiv.org/abs/1909.09318},
  video1 = {https://www.youtube.com/watch?v=AdPze8cTUuE},
  year = {2020}
}

Inverse kinematics is a fundamental challenge for articulated robots: fast and accurate algorithms are needed for translating task-related workspace constraints and goals into feasible joint configurations. In general, inverse kinematics for serial kinematic chains is a difficult nonlinear problem, for which closed form solutions cannot easily be obtained. Therefore, computationally efficient numerical methods that can be adapted to a general class of manipulators are of great importance. In this paper, we use convex optimization techniques to solve the inverse kinematics problem with joint limit constraints for highly redundant serial kinematic chains with spherical joints in two and three dimensions. This is accomplished through a novel formulation of inverse kinematics as a nearest point problem, and with a fast sum of squares solver that exploits the sparsity of kinematic constraints for serial manipulators. Our method has the advantages of post-hoc certification of global optimality and a runtime that scales polynomially with the number of degrees of freedom. Additionally, we prove that our convex relaxation leads to a globally optimal solution when certain conditions are met, and demonstrate empirically that these conditions are common and represent many practical instances. Finally, we provide an open source implementation of our algorithm.

Inverse Kinematics as Low-Rank Euclidean Distance Matrix Completion

F. Mari’c, M. Giamou, I. Petrovi’c, and J. Kelly

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) Workshop on Bringing Geometric Methods to Robot Learning, Optimization and Control, Las Vegas, Nevada, USA, Oct. 29, 2020.

Bibtex | Abstract | arXiv | Video

@inproceedings{2020_Maric_Inverse_B,
  abstract = {The majority of inverse kinematics (IK) algorithms search for solutions in a configuration space defined by joint angles. However, the kinematics of many robots can also be described in terms of distances between rigidly-attached points, which collectively form a Euclidean distance matrix. This alternative geometric description of the kinematics reveals an elegant equivalence between IK and the problem of low-rank matrix completion. We use this connection to implement a novel Riemannian optimization-based solution to IK for various articulated robots with symmetric joint angle constraints.},
  address = {Las Vegas, Nevada, USA},
  author = {Filip Mari\'{c} and Matthew Giamou and Ivan Petrovi\'{c} and Jonathan Kelly},
  booktitle = {Proceedings of the {IEEE/RSJ} International Conference on Intelligent Robots and Systems {(IROS)} Workshop on Bringing Geometric Methods to Robot Learning, Optimization and Control},
  date = {2020-10-29},
  month = {Oct. 29},
  note = {Bosch Center for Artificial Intelligence Best Workshop Contribution Award},
  title = {Inverse Kinematics as Low-Rank Euclidean Distance Matrix Completion},
  url = {https://arxiv.org/abs/2011.04850},
  video1 = {https://www.youtube.com/watch?v=wO0_w2Gw5jk},
  year = {2020}
}

The majority of inverse kinematics (IK) algorithms search for solutions in a configuration space defined by joint angles. However, the kinematics of many robots can also be described in terms of distances between rigidly-attached points, which collectively form a Euclidean distance matrix. This alternative geometric description of the kinematics reveals an elegant equivalence between IK and the problem of low-rank matrix completion. We use this connection to implement a novel Riemannian optimization-based solution to IK for various articulated robots with symmetric joint angle constraints.

Bosch Center for Artificial Intelligence Best Workshop Contribution Award

Unified Spatiotemporal Calibration of Monocular Cameras and Planar Lidars

J. Marr and J. Kelly

in Proceedings of the 2018 International Symposium on Experimental Robotics , J. Xiao, T. Kröger, and O. Khatib, Eds., Cham: Springer International Publishing AG, 2020, vol. 11, pp. 781-790.

DOI | Bibtex | Abstract

@incollection{2020_Marr_Unified,
  abstract = {Monocular cameras and planar lidar sensors are complementary. While monocular visual odometry (VO) is a relatively low-drift method for measuring platform egomotion, it suffers from a scale ambiguity. A planar lidar scanner, in contrast, is able to provide precise distance information with known scale. In combination, a monocular camera-2D lidar pair can be used as a performance 3D scanner, at a much lower cost than existing 3D lidar units. However, for accurate scan acquisition, the two sensors must be spatially and temporally calibrated. In this paper, we extend recent work on a calibration technique based on R ́enyi's quadratic entropy (RQE) to the unified spatiotemporal calibration of monocular cameras and 2D lidars. We present simulation results indicating that calibration errors of less than 5 mm, 0.1 degrees, and 0.15 ms in translation, rotation, and time delay, respectively, are readily achievable. Using real-world data, in the absence of reliable ground truth, we demonstrate high repeatability given sufficient platform motion. Unlike existing techniques, we are able to calibrate in arbitrary, target-free environments and without the need for overlapping sensor fields of view.},
  address = {Cham},
  author = {Jordan Marr and Jonathan Kelly},
  booktitle = {Proceedings of the 2018 International Symposium on Experimental Robotics},
  doi = {10.1007/978-3-030-33950-0_67},
  editor = {Jing Xiao and Torsten Kr\"{o}ger and Oussama Khatib},
  isbn = {978-3-030-33949-4},
  pages = {781--790},
  publisher = {Springer International Publishing AG},
  series = {Springer Proceedings in Advanced Robotics},
  title = {Unified Spatiotemporal Calibration of Monocular Cameras and Planar Lidars},
  volume = {11},
  year = {2020}
}

Monocular cameras and planar lidar sensors are complementary. While monocular visual odometry (VO) is a relatively low-drift method for measuring platform egomotion, it suffers from a scale ambiguity. A planar lidar scanner, in contrast, is able to provide precise distance information with known scale. In combination, a monocular camera-2D lidar pair can be used as a performance 3D scanner, at a much lower cost than existing 3D lidar units. However, for accurate scan acquisition, the two sensors must be spatially and temporally calibrated. In this paper, we extend recent work on a calibration technique based on R ́enyi's quadratic entropy (RQE) to the unified spatiotemporal calibration of monocular cameras and 2D lidars. We present simulation results indicating that calibration errors of less than 5 mm, 0.1 degrees, and 0.15 ms in translation, rotation, and time delay, respectively, are readily achievable. Using real-world data, in the absence of reliable ground truth, we demonstrate high repeatability given sufficient platform motion. Unlike existing techniques, we are able to calibrate in arbitrary, target-free environments and without the need for overlapping sensor fields of view.

Towards a Policy-as-a-Service Framework to Enable Compliant, Trustworthy AI and HRI Systems in the Wild

A. Morris, H. Siegel, and J. Kelly

Proceedings of the AAAI Fall Symposium on Artificial Intelligence for Human-Robot Interaction: Trust & Explainability in Artificial Intelligence for Human-Robot Interaction (AI-HRI), Arlington, Virginia, USA, Nov. 13–14, 2020.

Bibtex | Abstract | arXiv

@inproceedings{2020_Morris_Towards,
  abstract = {Building trustworthy autonomous systems is challenging for many reasons beyond simply trying to engineer agents that 'always do the right thing.' There is a broader context that is often not considered within AI and HRI: that the problem of trustworthiness is inherently socio-technical and ultimately involves a broad set of complex human factors and multidimensional relationships that can arise between agents, humans, organizations, and even governments and legal institutions, each with their own understanding and definitions of trust. This complexity presents a significant barrier to the development of trustworthy AI and HRI systems---while systems developers may desire to have their systems 'always do the right thing,' they generally lack the practical tools and expertise in law, regulation, policy and ethics to ensure this outcome. In this paper, we emphasize the "fuzzy" socio-technical aspects of trustworthiness and the need for their careful consideration during both design and deployment. We hope to contribute to the discussion of trustworthy engineering in AI and HRI by i) describing the policy landscape that must be considered when addressing trustworthy computing and the need for usable trust models, ii) highlighting an opportunity for trustworthy-by-design intervention within the systems engineering process, and iii) introducing the concept of a "policy-as-a-service" (PaaS) framework that can be readily applied by AI systems engineers to address the fuzzy problem of trust during the development and (eventually) runtime process. We envision that the PaaS approach, which offloads the development of policy design parameters and maintenance of policy standards to policy experts, will enable runtime trust capabilities intelligent systems in the wild.},
  address = {Arlington, Virginia, USA},
  author = {Alexis Morris and Hallie Siegel and Jonathan Kelly},
  booktitle = {Proceedings of the {AAAI} Fall Symposium on Artificial Intelligence for Human-Robot Interaction: Trust \& Explainability in Artificial Intelligence for Human-Robot Interaction {(AI-HRI)}},
  date = {2020-11-13/2020-11-14},
  month = {Nov. 13--14},
  title = {Towards a Policy-as-a-Service Framework to Enable Compliant, Trustworthy AI and HRI Systems in the Wild},
  url = {https://arxiv.org/abs/2010.07022},
  year = {2020}
}

Building trustworthy autonomous systems is challenging for many reasons beyond simply trying to engineer agents that 'always do the right thing.' There is a broader context that is often not considered within AI and HRI: that the problem of trustworthiness is inherently socio-technical and ultimately involves a broad set of complex human factors and multidimensional relationships that can arise between agents, humans, organizations, and even governments and legal institutions, each with their own understanding and definitions of trust. This complexity presents a significant barrier to the development of trustworthy AI and HRI systems---while systems developers may desire to have their systems 'always do the right thing,' they generally lack the practical tools and expertise in law, regulation, policy and ethics to ensure this outcome. In this paper, we emphasize the "fuzzy" socio-technical aspects of trustworthiness and the need for their careful consideration during both design and deployment. We hope to contribute to the discussion of trustworthy engineering in AI and HRI by i) describing the policy landscape that must be considered when addressing trustworthy computing and the need for usable trust models, ii) highlighting an opportunity for trustworthy-by-design intervention within the systems engineering process, and iii) introducing the concept of a "policy-as-a-service" (PaaS) framework that can be readily applied by AI systems engineers to address the fuzzy problem of trust during the development and (eventually) runtime process. We envision that the PaaS approach, which offloads the development of policy design parameters and maintenance of policy standards to policy experts, will enable runtime trust capabilities intelligent systems in the wild.

Learned Improvements to the Visual Egomotion Pipeline

V. Peretroukhin

PhD Thesis , University of Toronto, Toronto, Ontario, Canada, 2020.

Bibtex | Abstract | PDF

@phdthesis{2020_Peretroukhin_Learned,
  abstract = {The ability to estimate egomotion is at the heart of safe and reliable mobile autonomy. By inferring pose changes from sequential sensor measurements, egomotion estimation forms the basis of mapping and navigation pipelines, and permits mobile robots to self-localize within environments where external localization information may be intermittent or unavailable. Visual egomotion estimation, also known as visual odometry, has become ubiquitous in mobile robotics due to the availability of high-quality, compact, and inexpensive cameras that capture rich representations of the world. Classical visual odometry pipelines make simplifying assumptions that, while permitting reliable operation in ideal conditions, often lead to systematic error. In this dissertation, we present four ways in which conventional pipelines can be improved through the addition of a learned hyper-parametric model. By combining traditional pipelines with learning, we retain the performance of conventional techniques in nominal conditions while leveraging modern high-capacity data-driven models to improve uncertainty quantification, correct for systematic bias, and improve robustness to deleterious effects by extracting latent information in existing visual data. We demonstrate the improvements derived from our approach on data collected in sundry settings such as urban roads, indoor labs, and planetary analogue sites in the Canadian High Arctic.},
  address = {Toronto, Ontario, Canada},
  author = {Valentin Peretroukhin},
  institution = {University of Toronto},
  month = {March},
  note = {G. N. Patterson Award for Best Ph.D. Thesis},
  school = {University of Toronto},
  title = {Learned Improvements to the Visual Egomotion Pipeline},
  year = {2020}
}

The ability to estimate egomotion is at the heart of safe and reliable mobile autonomy. By inferring pose changes from sequential sensor measurements, egomotion estimation forms the basis of mapping and navigation pipelines, and permits mobile robots to self-localize within environments where external localization information may be intermittent or unavailable. Visual egomotion estimation, also known as visual odometry, has become ubiquitous in mobile robotics due to the availability of high-quality, compact, and inexpensive cameras that capture rich representations of the world. Classical visual odometry pipelines make simplifying assumptions that, while permitting reliable operation in ideal conditions, often lead to systematic error. In this dissertation, we present four ways in which conventional pipelines can be improved through the addition of a learned hyper-parametric model. By combining traditional pipelines with learning, we retain the performance of conventional techniques in nominal conditions while leveraging modern high-capacity data-driven models to improve uncertainty quantification, correct for systematic bias, and improve robustness to deleterious effects by extracting latent information in existing visual data. We demonstrate the improvements derived from our approach on data collected in sundry settings such as urban roads, indoor labs, and planetary analogue sites in the Canadian High Arctic.

G. N. Patterson Award for Best Ph.D. Thesis

A Smooth Representation of Belief over SO(3) for Deep Rotation Learning with Uncertainty

V. Peretroukhin, M. Giamou, D. Rosen, N. W. Greene, N. Roy, and J. Kelly

Proceedings of Robotics: Science and Systems (RSS), Corvallis, Oregon, USA, Jul. 12–16, 2020.

@inproceedings{2020_Peretroukhin_Smooth,
  abstract = {Accurate rotation estimation is at the heart of robot perception tasks such as visual odometry and object pose estimation. Deep neural networks have provided a new way to perform these tasks, and the choice of rotation representation is an important part of network design. In this work, we present a novel symmetric matrix representation of the 3D rotation group, SO(3), with two important properties that make it particularly suitable for learned models: (1) it satisfies a smoothness property that improves convergence and generalization when regressing large rotation targets, and (2) it encodes a symmetric Bingham belief over the space of unit quaternions, permitting the training of uncertainty-aware models. We empirically validate the benefits of our formulation by training deep neural rotation regressors on two data modalities. First, we use synthetic point-cloud data to show that our representation leads to superior predictive accuracy over existing representations for arbitrary rotation targets. Second, we use image data collected onboard ground and aerial vehicles to demonstrate that our representation is amenable to an effective out-of-distribution (OOD) rejection technique that significantly improves the robustness of rotation estimates to unseen environmental effects and corrupted input images, without requiring the use of an explicit likelihood loss, stochastic sampling, or an auxiliary classifier. This capability is key for safety-critical applications where detecting novel inputs can prevent catastrophic failure of learned models.},
  address = {Corvallis, Oregon, USA},
  author = {Valentin Peretroukhin and Matthew Giamou and David Rosen and W. Nicholas Greene and Nicholas Roy and Jonathan Kelly},
  booktitle = {Proceedings of Robotics: Science and Systems {(RSS)}},
  code = {https://github.com/utiasSTARS/bingham-rotation-learning},
  date = {2020-07-12/2020-07-16},
  doi = {10.15607/RSS.2020.XVI.007},
  month = {Jul. 12--16},
  note = {Best Student Paper Award},
  site = {https://papers.starslab.ca/bingham-rotation-learning/},
  title = {A Smooth Representation of Belief over SO(3) for Deep Rotation Learning with Uncertainty},
  url = {https://arxiv.org/abs/2006.01031},
  video1 = {https://www.youtube.com/watch?v=8QMcNmCPYR0},
  year = {2020}
}

Accurate rotation estimation is at the heart of robot perception tasks such as visual odometry and object pose estimation. Deep neural networks have provided a new way to perform these tasks, and the choice of rotation representation is an important part of network design. In this work, we present a novel symmetric matrix representation of the 3D rotation group, SO(3), with two important properties that make it particularly suitable for learned models: (1) it satisfies a smoothness property that improves convergence and generalization when regressing large rotation targets, and (2) it encodes a symmetric Bingham belief over the space of unit quaternions, permitting the training of uncertainty-aware models. We empirically validate the benefits of our formulation by training deep neural rotation regressors on two data modalities. First, we use synthetic point-cloud data to show that our representation leads to superior predictive accuracy over existing representations for arbitrary rotation targets. Second, we use image data collected onboard ground and aerial vehicles to demonstrate that our representation is amenable to an effective out-of-distribution (OOD) rejection technique that significantly improves the robustness of rotation estimates to unseen environmental effects and corrupted input images, without requiring the use of an explicit likelihood loss, stochastic sampling, or an auxiliary classifier. This capability is key for safety-critical applications where detecting novel inputs can prevent catastrophic failure of learned models.

Best Student Paper Award

Learned Adjustment of Camera Gain and Exposure Time for Improved Visual Feature Detection and Matching

J. L. Tomasi

Master Thesis , University of Toronto, Toronto, Ontario, Canada, 2020.

Bibtex | Abstract | PDF

@mastersthesis{2020_Tomasi_Learned,
  abstract = {Ensuring that captured images contain useful information is paramount to successful visual navigation. In this thesis, we explore a data-driven approach to account for environmental lighting changes, improving the quality of images for use in visual odometry (VO). We investigate what qualities of an image are desirable for navigation through an empirical analysis of the outputs of the VO front end. Based on this analysis, we build and train a deep convolutional neural network model to predictively adjust camera gain and exposure time parameters such that consecutive images contain a maximal number of matchable features. Our training method leverages several novel datasets consisting of images captured with varied gain and exposure time settings in diverse environments. Through real-world experiments, we demonstrate that our network is able to anticipate and compensate for lighting changes and maintain a higher number of inlier feature matches compared with competing camera parameter control algorithms.},
  address = {Toronto, Ontario, Canada},
  author = {Justin Louis Tomasi},
  month = {September},
  school = {University of Toronto},
  title = {Learned Adjustment of Camera Gain and Exposure Time for Improved Visual Feature Detection and Matching},
  year = {2020}
}

Ensuring that captured images contain useful information is paramount to successful visual navigation. In this thesis, we explore a data-driven approach to account for environmental lighting changes, improving the quality of images for use in visual odometry (VO). We investigate what qualities of an image are desirable for navigation through an empirical analysis of the outputs of the VO front end. Based on this analysis, we build and train a deep convolutional neural network model to predictively adjust camera gain and exposure time parameters such that consecutive images contain a maximal number of matchable features. Our training method leverages several novel datasets consisting of images captured with varied gain and exposure time settings in diverse environments. Through real-world experiments, we demonstrate that our network is able to anticipate and compensate for lighting changes and maintain a higher number of inlier feature matches compared with competing camera parameter control algorithms.

Robust Data-Driven Zero-Velocity Detection for Foot-Mounted Inertial Navigation

B. Wagstaff, V. Peretroukhin, and J. Kelly

IEEE Sensors Journal, vol. 20, iss. 2, pp. 957-967, 2020.

DOI | Bibtex | Abstract | arXiv | Code

@article{2020_Wagstaff_Robust,
  abstract = {We present two novel techniques for detecting zero-velocity events to improve foot-mounted inertial navigation. Our first technique augments a classical zero-velocity detector by incorporating a motion classifier that adaptively updates the detector's threshold parameter. Our second technique uses a long short-term memory (LSTM) recurrent neural network to classify zero-velocity events from raw inertial data, in contrast to the majority of zero-velocity detection methods that rely on basic statistical hypothesis testing. We demonstrate that both of our proposed detectors achieve higher accuracies than existing detectors for trajectories including walking, running, and stair-climbing motions. Additionally, we present a straightforward data augmentation method that is able to extend the LSTM-based model to different inertial sensors without the need to collect new training data.},
  author = {Brandon Wagstaff and Valentin Peretroukhin and Jonathan Kelly},
  code = {https://github.com/utiasSTARS/pyshoe},
  doi = {10.1109/JSEN.2019.2944412},
  journal = {{IEEE} Sensors Journal},
  month = {January},
  number = {2},
  pages = {957--967},
  title = {Robust Data-Driven Zero-Velocity Detection for Foot-Mounted Inertial Navigation},
  url = {http://arxiv.org/abs/1910.00529},
  volume = {20},
  year = {2020}
}

We present two novel techniques for detecting zero-velocity events to improve foot-mounted inertial navigation. Our first technique augments a classical zero-velocity detector by incorporating a motion classifier that adaptively updates the detector's threshold parameter. Our second technique uses a long short-term memory (LSTM) recurrent neural network to classify zero-velocity events from raw inertial data, in contrast to the majority of zero-velocity detection methods that rely on basic statistical hypothesis testing. We demonstrate that both of our proposed detectors achieve higher accuracies than existing detectors for trajectories including walking, running, and stair-climbing motions. Additionally, we present a straightforward data augmentation method that is able to extend the LSTM-based model to different inertial sensors without the need to collect new training data.

Self-Supervised Deep Pose Corrections for Robust Visual Odometry

B. Wagstaff, V. Peretroukhin, and J. Kelly

Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Paris, France, May 31–Jun. 4, 2020, pp. 2331-2337.

@inproceedings{2020_Wagstaff_Self-Supervised,
  abstract = {We present a self-supervised deep pose correction (DPC) network that applies pose corrections to a visual odometry estimator to improve its accuracy. Instead of regressing inter-frame pose changes directly, we build on prior work that uses data-driven learning to regress pose corrections that account for systematic errors due to violations of modelling assumptions. Our self-supervised formulation removes any requirement for six-degrees-of-freedom ground truth and, in contrast to expectations, often improves overall navigation accuracy compared to a supervised approach. Through extensive experiments, we show that our self-supervised DPC network can significantly enhance the performance of classical monocular and stereo odometry estimators and substantially out-performs state-of-the-art learning-only approaches.},
  address = {Paris, France},
  author = {Brandon Wagstaff and Valentin Peretroukhin and Jonathan Kelly},
  booktitle = {Proceedings of the {IEEE} International Conference on Robotics and Automation {(ICRA})},
  code = {https://github.com/utiasSTARS/ss-dpc-net},
  date = {2020-05-31/2020-06-04},
  doi = {10.1109/ICRA40945.2020.9197562},
  month = {May 31--Jun. 4},
  pages = {2331--2337},
  title = {Self-Supervised Deep Pose Corrections for Robust Visual Odometry},
  url = {https://arxiv.org/abs/2002.12339},
  video1 = {https://www.youtube.com/watch?v=AvNBUK4lTMo},
  year = {2020}
}

We present a self-supervised deep pose correction (DPC) network that applies pose corrections to a visual odometry estimator to improve its accuracy. Instead of regressing inter-frame pose changes directly, we build on prior work that uses data-driven learning to regress pose corrections that account for systematic errors due to violations of modelling assumptions. Our self-supervised formulation removes any requirement for six-degrees-of-freedom ground truth and, in contrast to expectations, often improves overall navigation accuracy compared to a supervised approach. Through extensive experiments, we show that our self-supervised DPC network can significantly enhance the performance of classical monocular and stereo odometry estimators and substantially out-performs state-of-the-art learning-only approaches.

Certifiably Optimal Monocular Hand-Eye Calibration

E. Wise, M. Giamou, S. Khoubyarian, A. Grover, and J. Kelly

Proceedings of the IEEE International Conference on Multisensor Fusion and Integration (MFI), Karlsruhe, Germany, Sep. 14–16, 2020.

@inproceedings{2020_Wise_Certifiably,
  abstract = {Correct fusion of data from two sensors requires an accurate estimate of their relative pose, which can be determined through the process of extrinsic calibration. When the sensors are capable of producing their own egomotion estimates (i.e., measurements of their trajectories through an environment), the `hand-eye' formulation of extrinsic calibration can be employed. In this paper, we extend our recent work on a convex optimization approach for hand-eye calibration to the case where one of the sensors cannot observe the scale of its translational motion (e.g., a monocular camera observing an unmapped environment). We prove that our technique is able to provide a certifiably globally optimal solution to both the known- and unknown-scale variants of hand-eye calibration, provided that the measurement noise is bounded. Herein, we focus on the theoretical aspects of the problem, show the tightness and stability of our convex relaxation, and demonstrate the optimality and speed of our algorithm through experiments with synthetic data.},
  address = {Karlsruhe, Germany},
  author = {Emmett Wise and Matthew Giamou and Soroush Khoubyarian and Abhinav Grover and Jonathan Kelly},
  booktitle = {Proceedings of the {IEEE} International Conference on Multisensor Fusion and Integration {(MFI)}},
  code = {https://github.com/utiasSTARS/certifiable-calibration},
  date = {2020-09-14/2020-09-16},
  doi = {10.1109/MFI49285.2020.9235219},
  month = {Sep. 14--16},
  title = {Certifiably Optimal Monocular Hand-Eye Calibration},
  url = {https://arxiv.org/abs/2005.08298},
  video1 = {https://www.youtube.com/watch?v=BdjGBvuaqVo},
  year = {2020}
}

Correct fusion of data from two sensors requires an accurate estimate of their relative pose, which can be determined through the process of extrinsic calibration. When the sensors are capable of producing their own egomotion estimates (i.e., measurements of their trajectories through an environment), the `hand-eye' formulation of extrinsic calibration can be employed. In this paper, we extend our recent work on a convex optimization approach for hand-eye calibration to the case where one of the sensors cannot observe the scale of its translational motion (e.g., a monocular camera observing an unmapped environment). We prove that our technique is able to provide a certifiably globally optimal solution to both the known- and unknown-scale variants of hand-eye calibration, provided that the measurement noise is bounded. Herein, we focus on the theoretical aspects of the problem, show the tightness and stability of our convex relaxation, and demonstrate the optimality and speed of our algorithm through experiments with synthetic data.

2019

Matchable Image Transformations for Long-term Metric Visual Localization

L. Clement, M. Gridseth, J. Tomasi, and J. Kelly

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshop on Image Matching: Local Features & Beyond, Long Beach, California, USA, Jun. 16–20, 2019.

Bibtex | PDF

@inproceedings{2019_Clement_Matchable,
  address = {Long Beach, California, USA},
  author = {Lee Clement and Mona Gridseth and Justin Tomasi and Jonathan Kelly},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition {(CVPR)} Workshop on Image Matching: Local Features \& Beyond},
  date = {2019-06-16/2019-06-20},
  month = {Jun. 16--20},
  title = {Matchable Image Transformations for Long-term Metric Visual Localization},
  year = {2019}
}

Where Do We Go From Here? Debates on the Future of Robotics Research at ICRA 2019

L. Clement, V. Peretroukhin, M. Giamou, J. Leonard, H. Kress-Gazit, J. How, M. Milford, O. Brock, R. Gariepy, A. P. Schoellig, N. Roy, H. Siegel, L. Righetti, A. Billard, and J. Kelly

IEEE Robotics & Automation Magazine, vol. 26, iss. 3, pp. 7-10, 2019.

DOI | Bibtex | PDF

@article{2019_Clement_Where,
  author = {Lee Clement and Valentin Peretroukhin and Matthew Giamou and John Leonard and Hadas Kress-Gazit and Jonathan How and Michael Milford and Oliver Brock and Ryan Gariepy and Angela P. Schoellig and Nicholas Roy and Hallie Siegel and Ludovic Righetti and Aude Billard and Jonathan Kelly},
  doi = {10.1109/MRA.2019.2926934},
  journal = {{IEEE} Robotics \& Automation Magazine},
  month = {September},
  number = {3},
  pages = {7--10},
  title = {Where Do We Go From Here? Debates on the Future of Robotics Research at ICRA 2019},
  volume = {26},
  year = {2019}
}

Certifiably Globally Optimal Extrinsic Calibration from Per-Sensor Egomotion

M. Giamou, Z. Ma, V. Peretroukhin, and J. Kelly

IEEE Robotics and Automation Letters, vol. 4, iss. 2, pp. 367-374, 2019.

DOI | Bibtex | Abstract | arXiv | Code

@article{2019_Giamou_Certifiably,
  abstract = {We present a certifiably globally optimal algorithm for determining the extrinsic calibration between two sensors that are capable of producing independent egomotion estimates. This problem has been previously solved using a variety of techniques, including local optimization approaches that have no formal global optimality guarantees. We use a quadratic objective function to formulate calibration as a quadratically constrained quadratic program (QCQP). By leveraging recent advances in the optimization of QCQPs, we are able to use existing semidefinite program (SDP) solvers to obtain a certifiably global optimum via the Lagrangian dual problem. Our problem formulation can be globally optimized by existing general-purpose solvers in less than a second, regardless of the number of measurements available and the noise level. This enables a variety of robotic platforms to rapidly and robustly compute and certify a globally optimal set of calibration parameters without a prior estimate or operator intervention. We compare the performance of our approach with a local solver on extensive simulations and multiple real datasets. Finally, we present necessary observability conditions that connect our approach to recent theoretical results and analytically support the empirical performance of our system.},
  author = {Matthew Giamou and Ziye Ma and Valentin Peretroukhin and Jonathan Kelly},
  code = {https://github.com/utiasSTARS/certifiable-calibration},
  doi = {10.1109/LRA.2018.2890444},
  journal = {{IEEE} Robotics and Automation Letters},
  month = {April},
  number = {2},
  pages = {367--374},
  title = {Certifiably Globally Optimal Extrinsic Calibration from Per-Sensor Egomotion},
  url = {https://arxiv.org/abs/1809.03554},
  volume = {4},
  year = {2019}
}

We present a certifiably globally optimal algorithm for determining the extrinsic calibration between two sensors that are capable of producing independent egomotion estimates. This problem has been previously solved using a variety of techniques, including local optimization approaches that have no formal global optimality guarantees. We use a quadratic objective function to formulate calibration as a quadratically constrained quadratic program (QCQP). By leveraging recent advances in the optimization of QCQPs, we are able to use existing semidefinite program (SDP) solvers to obtain a certifiably global optimum via the Lagrangian dual problem. Our problem formulation can be globally optimized by existing general-purpose solvers in less than a second, regardless of the number of measurements available and the noise level. This enables a variety of robotic platforms to rapidly and robustly compute and certify a globally optimal set of calibration parameters without a prior estimate or operator intervention. We compare the performance of our approach with a local solver on extensive simulations and multiple real datasets. Finally, we present necessary observability conditions that connect our approach to recent theoretical results and analytically support the empirical performance of our system.

Leveraging Robotics Education to Improve Prosperity in Developing Nations: An Early Case Study in Myanmar

J. Kelly, H. Htet, and J. Dutra

Proceedings of the Do Good Robotics Symposium (DGRS), College Park, Maryland, USA, Oct. 3–4, 2019.

Bibtex | Abstract | PDF

@inproceedings{2019_Kelly_Leveraging,
  abstract = {Robotics can be a powerful educational tool: the topic is exciting, timely, and highly engaging.  Research has shown that robotics courses can drive students' interest in science, technology, engineering, and mathematics (STEM) careers. While many successful outreach and introductory programs exist in developed countries, an open question is how best to leverage the appeal of robotics to improve educational outcomes (and, ultimately, prosperity) in developing countries. What material is most relevant? How should that material be presented to engage with students? And how do we measure the impact of such initiatives? In this paper, we report on the design and delivery of a short course on self-driving vehicles for a group of students in the developing nation of Myanmar. The pilot program was facilitated through cooperation with Phandeeyar, a unique innovation hub and startup accelerator based in Yangon. We discuss the motivation for the program, the choice of topic, and the student experience. We close by offering some preliminary thoughts about quantifying the value of this type of robotics outreach effort and of robotics education, both in Myanmar and beyond.},
  address = {College Park, Maryland, USA},
  author = {Jonathan Kelly and Htoo Htet and Jo\~{a}o Dutra},
  booktitle = {Proceedings of the Do Good Robotics Symposium {(DGRS)}},
  date = {2019-10-03/2019-10-04},
  month = {Oct. 3--4},
  title = {Leveraging Robotics Education to Improve Prosperity in Developing Nations: An Early Case Study in Myanmar},
  year = {2019}
}

Robotics can be a powerful educational tool: the topic is exciting, timely, and highly engaging.  Research has shown that robotics courses can drive students' interest in science, technology, engineering, and mathematics (STEM) careers. While many successful outreach and introductory programs exist in developed countries, an open question is how best to leverage the appeal of robotics to improve educational outcomes (and, ultimately, prosperity) in developing countries. What material is most relevant? How should that material be presented to engage with students? And how do we measure the impact of such initiatives? In this paper, we report on the design and delivery of a short course on self-driving vehicles for a group of students in the developing nation of Myanmar. The pilot program was facilitated through cooperation with Phandeeyar, a unique innovation hub and startup accelerator based in Yangon. We discuss the motivation for the program, the choice of topic, and the student experience. We close by offering some preliminary thoughts about quantifying the value of this type of robotics outreach effort and of robotics education, both in Myanmar and beyond.

Fast Manipulability Maximization Using Continuous-Time Trajectory Optimization

F. Mari’c, O. Limoyo, L. Petrovi’c, T. Ablett, I. Petrovi’c, and J. Kelly

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, Nov. 4–8, 2019, pp. 8258-8264.

DOI | Bibtex | Abstract | arXiv | Video

@inproceedings{2019_Maric_Fast,
  abstract = {A significant challenge in manipulation motion planning is to ensure agility in the face of unpredictable changes during task execution. This requires the identification and possible modification of suitable joint-space trajectories, since the joint velocities required to achieve a specific end-effector motion vary with manipulator configuration. For a given manipulator configuration, the joint space-to-task space velocity mapping is characterized by a quantity known as the manipulability index. In contrast to previous control-based approaches, we examine the maximization of manipulability during planning as a way of achieving adaptable and safe joint space-to-task space motion mappings in various scenarios. By representing the manipulator trajectory as a continuous-time Gaussian process (GP), we are able to leverage recent advances in trajectory optimization to maximize the manipulability index during trajectory generation. Moreover, the sparsity of our chosen representation reduces the typically large computational cost associated with maximizing manipulability when additional constraints exist. Results from simulation studies and experiments with a real manipulator demonstrate increases in manipulability, while maintaining smooth trajectories with more dexterous (and therefore more agile) arm configurations.},
  address = {Macau, China},
  author = {Filip Mari\'{c} and Oliver Limoyo and Luka Petrovi\'{c} and Trevor Ablett and Ivan Petrovi\'{c} and Jonathan Kelly},
  booktitle = {Proceedings of the {IEEE/RSJ} International Conference on Intelligent Robots and Systems {(IROS)}},
  date = {2019-11-04/2019-11-08},
  doi = {10.1109/IROS40897.2019.8968441},
  month = {Nov. 4--8},
  pages = {8258--8264},
  title = {Fast Manipulability Maximization Using Continuous-Time Trajectory Optimization},
  url = {https://arxiv.org/abs/1908.02963},
  video1 = {https://www.youtube.com/watch?v=tB34VfDrF84},
  year = {2019}
}

A significant challenge in manipulation motion planning is to ensure agility in the face of unpredictable changes during task execution. This requires the identification and possible modification of suitable joint-space trajectories, since the joint velocities required to achieve a specific end-effector motion vary with manipulator configuration. For a given manipulator configuration, the joint space-to-task space velocity mapping is characterized by a quantity known as the manipulability index. In contrast to previous control-based approaches, we examine the maximization of manipulability during planning as a way of achieving adaptable and safe joint space-to-task space motion mappings in various scenarios. By representing the manipulator trajectory as a continuous-time Gaussian process (GP), we are able to leverage recent advances in trajectory optimization to maximize the manipulability index during trajectory generation. Moreover, the sparsity of our chosen representation reduces the typically large computational cost associated with maximizing manipulability when additional constraints exist. Results from simulation studies and experiments with a real manipulator demonstrate increases in manipulability, while maintaining smooth trajectories with more dexterous (and therefore more agile) arm configurations.

An Observability Based Approach to Flight Path Reconstruction of Uninformative Coupled Aircraft Trajectories: A Case Study Considering Stall Maneuvers for Aircraft Certification

G. Moszczynski, M. Giamou, J. Leung, J. Kelly, and P. Grant

AIAA Science and Technology Forum and Exposition (AIAA SciTech), San Diego, California, USA, Jan. 7–11, 2019.

Bibtex | Abstract

@inproceedings{2019_Moszczynski_Observability,
  abstract = {Based on the demonstrated efficacy of observability metrics in the realm of informative trajectory optimization for sensor calibration, the application of such metrics within the context of flight path reconstruction is investigated. The minimum singular value of the observability Gramian is adopted to describe flight test information content, and used to mathematically characterize parameter estimation difficulties discussed throughout the body of literature on flight path reconstruction. A metric for total information content of a set of flight test experiments is then presented and used to motivate FPR based on multiple flight test experiments. A highly efficient maximum a posteriori trajectory estimation scheme accommodating the use of multiple flight test experiments is then presented. The finalization of this work will present the application of the adopted information metric and developed estimation scheme to a case study concerning reconstruction of stall maneuver data with poor information content collected for aircraft certification purposes.},
  address = {San Diego, California, USA},
  author = {Gregory Moszczynski and Matthew Giamou and Jordan Leung and Jonathan Kelly and Peter Grant},
  booktitle = {{AIAA} Science and Technology Forum and Exposition {(AIAA SciTech)}},
  date = {2019-01-07/2019-01-11},
  month = {Jan. 7--11},
  title = {An Observability Based Approach to Flight Path Reconstruction of Uninformative Coupled Aircraft Trajectories: A Case Study Considering Stall Maneuvers for Aircraft Certification},
  year = {2019}
}

Based on the demonstrated efficacy of observability metrics in the realm of informative trajectory optimization for sensor calibration, the application of such metrics within the context of flight path reconstruction is investigated. The minimum singular value of the observability Gramian is adopted to describe flight test information content, and used to mathematically characterize parameter estimation difficulties discussed throughout the body of literature on flight path reconstruction. A metric for total information content of a set of flight test experiments is then presented and used to motivate FPR based on multiple flight test experiments. A highly efficient maximum a posteriori trajectory estimation scheme accommodating the use of multiple flight test experiments is then presented. The finalization of this work will present the application of the adopted information metric and developed estimation scheme to a case study concerning reconstruction of stall maneuver data with poor information content collected for aircraft certification purposes.

Deep Probabilistic Regression of Elements of SO(3) using Quaternion Averaging and Uncertainty Injection

V. Peretroukhin, B. Wagstaff, and J. Kelly

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshop on Uncertainty and Robustness in Deep Visual Learning, Long Beach, California, USA, Jun. 16–20, 2019, pp. 83-86.

Bibtex | Abstract | arXiv | Code

@inproceedings{2019_Peretroukhin_Deep,
  abstract = {Consistent estimates of rotation are crucial to vision- based motion estimation in augmented reality and robotics. In this work, we present a method to extract probabilistic estimates of rotation from deep regression models. First, we build on prior work and develop a multi-headed network structure we name HydraNet that can account for both aleatoric and epistemic uncertainty. Second, we extend HydraNet to targets that belong to the rotation group, SO(3), by regressing unit quaternions and using the tools of rotation averaging and uncertainty injection onto the manifold to produce three-dimensional covariances. Finally, we present results and analysis on a synthetic dataset, learn consistent orientation estimates on the 7-Scenes dataset, and show how we can use our learned covariances to fuse deep estimates of relative orientation with classical stereo visual odometry to improve localization on the KITTI dataset.},
  address = {Long Beach, California, USA},
  author = {Valentin Peretroukhin and Brandon Wagstaff and Jonathan Kelly},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition {(CVPR)} Workshop on Uncertainty and Robustness in Deep Visual Learning},
  code = {https://github.com/utiasSTARS/so3_learning},
  date = {2019-06-16/2019-06-20},
  longurl = {http://openaccess.thecvf.com/content_CVPRW_2019/papers/Uncertainty%20and%20Robustness%20in%20Deep%20Visual%20Learning/Peretroukhin_Deep_Probabilistic_Regression_of_Elements_of_SO3_using_Quaternion_Averaging_CVPRW_2019_paper.pdf},
  month = {Jun. 16--20},
  pages = {83--86},
  title = {Deep Probabilistic Regression of Elements of SO(3) using Quaternion Averaging and Uncertainty Injection},
  url = {https://arxiv.org/abs/1904.03182},
  year = {2019}
}

Consistent estimates of rotation are crucial to vision- based motion estimation in augmented reality and robotics. In this work, we present a method to extract probabilistic estimates of rotation from deep regression models. First, we build on prior work and develop a multi-headed network structure we name HydraNet that can account for both aleatoric and epistemic uncertainty. Second, we extend HydraNet to targets that belong to the rotation group, SO(3), by regressing unit quaternions and using the tools of rotation averaging and uncertainty injection onto the manifold to produce three-dimensional covariances. Finally, we present results and analysis on a synthetic dataset, learn consistent orientation estimates on the 7-Scenes dataset, and show how we can use our learned covariances to fuse deep estimates of relative orientation with classical stereo visual odometry to improve localization on the KITTI dataset.

The Phoenix Drone: An Open-Source Dual-Rotor Tail-Sitter Platform for Research and Education

Y. Wu, X. Du, R. Duivenvoorden, and J. Kelly

Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Montréal, Québec, Canada, May 20–24, 2019, pp. 5330-5336.

@inproceedings{2019_Wu_Phoenix,
  abstract = {In this paper, we introduce the Phoenix drone: the first completely open-source tail-sitter micro aerial vehicle (MAV) platform. The vehicle has a highly versatile, dual-rotor design and is  engineered to be low-cost and easily extensible/modifiable. Our open-source release includes all of the design documents, software resources, and simulation tools needed to build and fly a high-performance tail-sitter for research and educational purposes.

The drone has been developed for precision flight with a high degree of control authority. Our design methodology included extensive testing and characterization of the aerodynamic properties of the vehicle. The platform incorporates many off-the-shelf components and 3D-printed parts, in order to keep the cost down. Nonetheless, the paper includes results from flight trials which demonstrate that the vehicle is capable of very stable hovering and accurate trajectory tracking.

Our hope is that the open-source Phoenix reference design will be useful to both researchers and educators. In particular, the details in this paper and the available open-source materials should enable learners to gain an understanding of aerodynamics, flight control, state estimation, software design, and simulation, while experimenting with a unique aerial robot.},
  address = {Montr\'{e}al, Qu\'{e}bec, Canada},
  author = {Yilun Wu and Xintong Du and Rikky Duivenvoorden and Jonathan Kelly},
  booktitle = {Proceedings of the {IEEE} International Conference on Robotics and Automation {(ICRA})},
  code = {https://github.com/utiasSTARS/PhoenixDrone},
  date = {2019-05-20/2019-05-24},
  doi = {10.1109/ICRA.2019.8794433},
  month = {May 20--24},
  pages = {5330--5336},
  title = {The Phoenix Drone: An Open-Source Dual-Rotor Tail-Sitter Platform for Research and Education},
  url = {https://arxiv.org/abs/1810.03196},
  video1 = {https://www.youtube.com/watch?v=VSAk3Z0G08Q},
  year = {2019}
}

In this paper, we introduce the Phoenix drone: the first completely open-source tail-sitter micro aerial vehicle (MAV) platform. The vehicle has a highly versatile, dual-rotor design and is  engineered to be low-cost and easily extensible/modifiable. Our open-source release includes all of the design documents, software resources, and simulation tools needed to build and fly a high-performance tail-sitter for research and educational purposes. The drone has been developed for precision flight with a high degree of control authority. Our design methodology included extensive testing and characterization of the aerodynamic properties of the vehicle. The platform incorporates many off-the-shelf components and 3D-printed parts, in order to keep the cost down. Nonetheless, the paper includes results from flight trials which demonstrate that the vehicle is capable of very stable hovering and accurate trajectory tracking. Our hope is that the open-source Phoenix reference design will be useful to both researchers and educators. In particular, the details in this paper and the available open-source materials should enable learners to gain an understanding of aerodynamics, flight control, state estimation, software design, and simulation, while experimenting with a unique aerial robot.

2018

How to Train a CAT: Learning Canonical Appearance Transformations for Robust Direct Localization Under Illumination Change

L. Clement and J. Kelly

IEEE Robotics and Automation Letters, vol. 3, iss. 3, pp. 2447-2454, 2018.

@article{2018_Clement_Learning,
  abstract = {Direct visual localization has recently enjoyed a resurgence in popularity with the increasing availability of cheap mobile computing power. The competitive accuracy and robustness of these algorithms compared to state-of-the-art feature-based methods, as well as their natural ability to yield dense maps, makes them an appealing choice for a variety of mobile robotics applications. However, direct methods remain brittle in the face of appearance change due to their underlying assumption of photometric consistency, which is commonly violated in practice. In this paper, we propose to mitigate this problem by training deep convolutional encoder-decoder models to transform images of a scene such that they correspond to a chosen canonical appearance such as static diffuse illumination. We validate our method in multiple environments and illumination conditions using high-fidelity synthetic RGB-D datasets, and integrate the trained models into a direct visual localization pipeline, yielding improvements in visual odometry (VO) accuracy through time-varying illumination conditions, as well as improved relocalization performance under illumination change, where conventional methods normally fail.},
  author = {Lee Clement and Jonathan Kelly},
  code = {https://github.com/utiasSTARS/cat-net},
  doi = {10.1109/LRA.2018.2799741},
  journal = {{IEEE} Robotics and Automation Letters},
  month = {July},
  number = {3},
  pages = {2447--2454},
  title = {How to Train a {CAT}: Learning Canonical Appearance Transformations for Robust Direct Localization Under Illumination Change},
  url = {https://arxiv.org/abs/1709.03009},
  video1 = {https://www.youtube.com/watch?v=ej6VNBq3dDE},
  volume = {3},
  year = {2018}
}

Direct visual localization has recently enjoyed a resurgence in popularity with the increasing availability of cheap mobile computing power. The competitive accuracy and robustness of these algorithms compared to state-of-the-art feature-based methods, as well as their natural ability to yield dense maps, makes them an appealing choice for a variety of mobile robotics applications. However, direct methods remain brittle in the face of appearance change due to their underlying assumption of photometric consistency, which is commonly violated in practice. In this paper, we propose to mitigate this problem by training deep convolutional encoder-decoder models to transform images of a scene such that they correspond to a chosen canonical appearance such as static diffuse illumination. We validate our method in multiple environments and illumination conditions using high-fidelity synthetic RGB-D datasets, and integrate the trained models into a direct visual localization pipeline, yielding improvements in visual odometry (VO) accuracy through time-varying illumination conditions, as well as improved relocalization performance under illumination change, where conventional methods normally fail.

Overcoming the Challenges of Solar Rover Autonomy: Enabling Long-Duration Planetary Navigation

O. Lamarre and J. Kelly

Proceedings of the International Symposium on Artificial Intelligence, Robotics and Automation in Space (i-SAIRAS), Madrid, Spain, Jun. 4–6, 2018.

DOI | Bibtex | Abstract | arXiv

@inproceedings{2018_Lamarre_Overcoming,
  abstract = {The successes of previous and current Mars rovers have encouraged space agencies worldwide to pursue additional planetary exploration missions with more ambitious navigation goals. For example, NASA's planned Mars Sample Return mission will be a multi-year undertaking that will require a solar-powered rover to drive over 150 metres per sol for approximately three months. This paper reviews the mobility planning framework used by current rovers and surveys the major challenges involved in continuous long-distance navigation on the Red Planet. It also discusses recent work related to environment-aware and energy-aware navigation, and provides a perspective on how such work may eventually allow a solar-powered rover to achieve autonomous long-distance navigation on Mars.},
  address = {Madrid, Spain},
  author = {Olivier Lamarre and Jonathan Kelly},
  booktitle = {Proceedings of the International Symposium on Artificial Intelligence, Robotics and Automation in Space {(i-SAIRAS)}},
  date = {2018-06-04/2018-06-06},
  doi = {10.48550/arXiv.1805.05451},
  month = {Jun. 4--6},
  title = {Overcoming the Challenges of Solar Rover Autonomy: Enabling Long-Duration Planetary Navigation},
  url = {https://arxiv.org/abs/1805.05451},
  year = {2018}
}

The successes of previous and current Mars rovers have encouraged space agencies worldwide to pursue additional planetary exploration missions with more ambitious navigation goals. For example, NASA's planned Mars Sample Return mission will be a multi-year undertaking that will require a solar-powered rover to drive over 150 metres per sol for approximately three months. This paper reviews the mobility planning framework used by current rovers and surveys the major challenges involved in continuous long-distance navigation on the Red Planet. It also discusses recent work related to environment-aware and energy-aware navigation, and provides a perspective on how such work may eventually allow a solar-powered rover to achieve autonomous long-distance navigation on Mars.

Self-Calibration of Mobile Manipulator Kinematic and Sensor Extrinsic Parameters Through Contact-Based Interaction

O. Limoyo, T. Ablett, F. Mari’c, L. Volpatti, and J. Kelly

Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Queensland, Australia, May 21–25, 2018.

DOI | Bibtex | Abstract | arXiv | Video

@inproceedings{2018_Limoyo_Self-Calibration,
  abstract = {We present a novel approach for mobile manipulator self-calibration using contact information. Our method, based on point cloud registration, is applied to estimate the extrinsic transform between a fixed vision sensor mounted on a mobile base and an end effector. Beyond sensor calibration, we demonstrate that the method can be extended to include manipulator kinematic model parameters, which involves a non-rigid registration process. Our procedure uses on-board sensing exclusively and does not rely on any external measurement devices, fiducial markers, or calibration rigs. Further, it is fully automatic in the general case. We experimentally validate the proposed method on a custom mobile manipulator platform, and demonstrate centimetre-level post-calibration accuracy in positioning of the end effector using visual guidance only. We also discuss the stability properties of the registration algorithm, in order to determine the conditions under which calibration is possible.},
  address = {Brisbane, Queensland, Australia},
  author = {Oliver Limoyo and Trevor Ablett and Filip Mari\'{c} and Luke Volpatti and Jonathan Kelly},
  booktitle = {Proceedings of the {IEEE} International Conference on Robotics and Automation {(ICRA})},
  date = {2018-05-21/2018-05-25},
  doi = {10.1109/ICRA.2018.8460658},
  month = {May 21--25},
  title = {Self-Calibration of Mobile Manipulator Kinematic and Sensor Extrinsic Parameters Through Contact-Based Interaction},
  url = {https://arxiv.org/abs/1803.06406},
  video1 = {https://www.youtube.com/watch?v=cz9UB-BcGA0},
  year = {2018}
}

We present a novel approach for mobile manipulator self-calibration using contact information. Our method, based on point cloud registration, is applied to estimate the extrinsic transform between a fixed vision sensor mounted on a mobile base and an end effector. Beyond sensor calibration, we demonstrate that the method can be extended to include manipulator kinematic model parameters, which involves a non-rigid registration process. Our procedure uses on-board sensing exclusively and does not rely on any external measurement devices, fiducial markers, or calibration rigs. Further, it is fully automatic in the general case. We experimentally validate the proposed method on a custom mobile manipulator platform, and demonstrate centimetre-level post-calibration accuracy in positioning of the end effector using visual guidance only. We also discuss the stability properties of the registration algorithm, in order to determine the conditions under which calibration is possible.

Manipulability Maximization Using Continuous-Time Gaussian Processes

F. Mari’c, O. Limoyo, L. Petrovi’c, I. Petrovi’c, and J. Kelly

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) Workshop Towards Robots that Exhibit Manipulation Intelligence, Madrid, Spain, Oct. 1, 2018.

Bibtex | Abstract | arXiv

@inproceedings{2018_Maric_Manipulability,
  abstract = {A significant challenge in motion planning is to avoid being in or near singular configurations (singularities), that is, joint configurations that result in the loss of the ability to move in certain directions in task space. A robotic system's capacity for motion is reduced even in regions that are in close proximity to (i.e., neighbouring) a singularity. In this work we examine singularity avoidance in a motion planning context, finding trajectories which minimize proximity to singular regions, subject to constraints. We define a manipulability-based likelihood associated with singularity avoidance over a continuous trajectory representation, which we then maximize using a maximum a posteriori (MAP) estimator. Viewing the MAP problem as inference on a factor graph, we use gradient information from interpolated states to maximize the trajectory's overall manipulability. Both qualitative and quantitative analyses of experimental data show increases in manipulability that result in smooth trajectories with visibly more dexterous arm configurations.},
  address = {Madrid, Spain},
  author = {Filip Mari\'{c} and Oliver Limoyo and Luka Petrovi\'{c} and Ivan Petrovi\'{c} and Jonathan Kelly},
  booktitle = {Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems {(IROS)} Workshop Towards Robots that Exhibit Manipulation Intelligence},
  date = {2018-10-01},
  month = {Oct. 1},
  title = {Manipulability Maximization Using Continuous-Time Gaussian Processes},
  url = {https://arxiv.org/abs/1803.09493},
  year = {2018}
}

A significant challenge in motion planning is to avoid being in or near singular configurations (singularities), that is, joint configurations that result in the loss of the ability to move in certain directions in task space. A robotic system's capacity for motion is reduced even in regions that are in close proximity to (i.e., neighbouring) a singularity. In this work we examine singularity avoidance in a motion planning context, finding trajectories which minimize proximity to singular regions, subject to constraints. We define a manipulability-based likelihood associated with singularity avoidance over a continuous trajectory representation, which we then maximize using a maximum a posteriori (MAP) estimator. Viewing the MAP problem as inference on a factor graph, we use gradient information from interpolated states to maximize the trajectory's overall manipulability. Both qualitative and quantitative analyses of experimental data show increases in manipulability that result in smooth trajectories with visibly more dexterous arm configurations.

Unified Spatiotemporal Calibration of Egomotion Sensors and 2D Lidars in Arbitrary Environments

J. Marr

Master Thesis , University of Toronto, Toronto, Ontario, Canada, 2018.

Bibtex | Abstract | PDF

@mastersthesis{2018_Marr_Unified,
  abstract = {This thesis aims to develop an automatic spatiotemporal calibration routine for lidars and egomotion sensors that relaxes many common requirements, such as the need for overlapping sensor fields of view, or calibration targets with known dimensions. In particular, a set of entropy-based calibration algorithms are extended to allow estimation of sensor clock time offsets in tandem with sensor-to-sensor spatial transformations. A novel Bayesian optimization routine is developed to address the non-smooth behaviour observed in the entropy cost function at small scales. The routine is tested on both simulation and real world data. Simulation results show that, given a set of lidar data taken from many different viewpoints, the calibration can be constrained to within less than 5 mm, 0.1 degrees, and 0.15 ms in the translational, rotational, and time-delay parameters respectively. For real-world data, in the absence of a reliable ground truth, we present results that show a repeatability of +- 4 mm, 1 degree, and 0.1 ms. When a monocular camera is used as the egomotion sensor, the routine is able to resolve the scale of the trajectory. A very brief analysis of the applicability of the method to Inertial Measurement Unit (IMU) to lidar calibration is presented.},
  address = {Toronto, Ontario, Canada},
  author = {Jordan Marr},
  month = {September},
  school = {University of Toronto},
  title = {Unified Spatiotemporal Calibration of Egomotion Sensors and {2D} Lidars in Arbitrary Environments},
  year = {2018}
}

This thesis aims to develop an automatic spatiotemporal calibration routine for lidars and egomotion sensors that relaxes many common requirements, such as the need for overlapping sensor fields of view, or calibration targets with known dimensions. In particular, a set of entropy-based calibration algorithms are extended to allow estimation of sensor clock time offsets in tandem with sensor-to-sensor spatial transformations. A novel Bayesian optimization routine is developed to address the non-smooth behaviour observed in the entropy cost function at small scales. The routine is tested on both simulation and real world data. Simulation results show that, given a set of lidar data taken from many different viewpoints, the calibration can be constrained to within less than 5 mm, 0.1 degrees, and 0.15 ms in the translational, rotational, and time-delay parameters respectively. For real-world data, in the absence of a reliable ground truth, we present results that show a repeatability of +- 4 mm, 1 degree, and 0.1 ms. When a monocular camera is used as the egomotion sensor, the routine is able to resolve the scale of the trajectory. A very brief analysis of the applicability of the method to Inertial Measurement Unit (IMU) to lidar calibration is presented.

DPC-Net: Deep Pose Correction for Visual Localization

V. Peretroukhin and J. Kelly

IEEE Robotics and Automation Letters, vol. 3, iss. 3, pp. 2424-2431, 2018.

@article{2018_Peretroukhin_Deep,
  abstract = {We present a novel method to fuse the power of deep networks with the computational efficiency of geometric and probabilistic localization algorithms. In contrast to other methods that completely replace a classical visual estimator with a deep network, we propose an approach that uses a convolutional neural network to learn difficult-to-model \textit{corrections} to the estimator from ground-truth training data. To this end, we derive a novel loss function for learning SE{3} corrections based on a matrix Lie groups approach, with a natural formulation for balancing translation and rotation errors. We use this loss to train  a Deep Pose Correction network (DPC-Net) that learns to predict corrections for a particular estimator, sensor and environment. Using the KITTI odometry dataset, we demonstrate significant improvements to the accuracy of a computationally-efficient sparse stereo visual odometry pipeline, that render it as accurate as a modern computationally-intensive dense estimator. Further, we show how DPC-Net can be used to mitigate the effect of poorly calibrated lens distortion parameters.},
  author = {Valentin Peretroukhin and Jonathan Kelly},
  code = {https://github.com/utiasSTARS/dpc-net},
  doi = {10.1109/LRA.2017.2778765},
  journal = {{IEEE} Robotics and Automation Letters},
  month = {July},
  number = {3},
  pages = {2424--2431},
  title = {{DPC-Net}: Deep Pose Correction for Visual Localization},
  url = {https://arxiv.org/abs/1709.03128},
  video1 = {https://www.youtube.com/watch?v=j9jnLldUAkc},
  volume = {3},
  year = {2018}
}

We present a novel method to fuse the power of deep networks with the computational efficiency of geometric and probabilistic localization algorithms. In contrast to other methods that completely replace a classical visual estimator with a deep network, we propose an approach that uses a convolutional neural network to learn difficult-to-model \textitcorrections to the estimator from ground-truth training data. To this end, we derive a novel loss function for learning SE3 corrections based on a matrix Lie groups approach, with a natural formulation for balancing translation and rotation errors. We use this loss to train  a Deep Pose Correction network (DPC-Net) that learns to predict corrections for a particular estimator, sensor and environment. Using the KITTI odometry dataset, we demonstrate significant improvements to the accuracy of a computationally-efficient sparse stereo visual odometry pipeline, that render it as accurate as a modern computationally-intensive dense estimator. Further, we show how DPC-Net can be used to mitigate the effect of poorly calibrated lens distortion parameters.

Inferring Sun Direction to Improve Visual Odometry: A Deep Learning Approach

V. Peretroukhin, L. Clement, and J. Kelly

The International Journal of Robotics Research, vol. 37, iss. 9, pp. 996-1016, 2018.

DOI | Bibtex | Abstract | Code

@article{2018_Peretroukhin_Inferring,
  abstract = {We present a method to incorporate global orientation information from the sun into a visual odometry pipeline using only the existing image stream, in which the sun is typically not visible. We leverage recent advances in Bayesian convolutional neural networks (BCNNs) to train and implement a sun detection model (dubbed Sun-BCNN) that infers a 3D sun direction vector from a single RGB image. Crucially, our method also computes a principled uncertainty associated with each prediction, using a Monte Carlo dropout scheme. We incorporate this uncertainty into a sliding window stereo visual odometry pipeline where accurate uncertainty estimates are critical for optimal data fusion. We evaluate our method on 21.6 km of urban driving data from the KITTI odometry benchmark where it achieves a median error of approximately 12 degrees and yields improvements of up to 42\% in translational average root mean squared error (ARMSE) and 32\% in rotational ARMSE compared with standard visual odometry. We further evaluate our method on an additional 10 km of visual navigation data from the Devon Island Rover Navigation dataset, achieving a median error of less than 8 degrees and yielding similar improvements in estimation error. In addition to reporting on the accuracy of Sun-BCNN and its impact on visual odometry, we analyze the sensitivity of our model to cloud cover, investigate the possibility of model transfer between urban and planetary analogue environments, and examine the impact of different methods for computing the mean and covariance of a norm-constrained vector on the accuracy and consistency of the estimated sun directions. Finally, we release Sun-BCNN as open-source software.},
  author = {Valentin Peretroukhin and Lee Clement and Jonathan Kelly},
  code = {https://github.com/utiasSTARS/sun-bcnn},
  doi = {10.1177/0278364917749732},
  journal = {The International Journal of Robotics Research},
  month = {August},
  number = {9},
  pages = {996--1016},
  title = {Inferring Sun Direction to Improve Visual Odometry: A Deep Learning Approach},
  volume = {37},
  year = {2018}
}

We present a method to incorporate global orientation information from the sun into a visual odometry pipeline using only the existing image stream, in which the sun is typically not visible. We leverage recent advances in Bayesian convolutional neural networks (BCNNs) to train and implement a sun detection model (dubbed Sun-BCNN) that infers a 3D sun direction vector from a single RGB image. Crucially, our method also computes a principled uncertainty associated with each prediction, using a Monte Carlo dropout scheme. We incorporate this uncertainty into a sliding window stereo visual odometry pipeline where accurate uncertainty estimates are critical for optimal data fusion. We evaluate our method on 21.6 km of urban driving data from the KITTI odometry benchmark where it achieves a median error of approximately 12 degrees and yields improvements of up to 42\% in translational average root mean squared error (ARMSE) and 32\% in rotational ARMSE compared with standard visual odometry. We further evaluate our method on an additional 10 km of visual navigation data from the Devon Island Rover Navigation dataset, achieving a median error of less than 8 degrees and yielding similar improvements in estimation error. In addition to reporting on the accuracy of Sun-BCNN and its impact on visual odometry, we analyze the sensitivity of our model to cloud cover, investigate the possibility of model transfer between urban and planetary analogue environments, and examine the impact of different methods for computing the mean and covariance of a norm-constrained vector on the accuracy and consistency of the estimated sun directions. Finally, we release Sun-BCNN as open-source software.

Near-Optimal Budgeted Data Exchange for Distributed Loop Closure Detection

Y. Tian, K. Khosoussi, M. Giamou, J. Kelly, and J. How

Proceedings of Robotics: Science and Systems (RSS), Pittsburgh, Pennsylvania, USA, Jun. 26–28, 2018.

Bibtex | Abstract | arXiv | Code

@inproceedings{2018_Tian_Near-Optimal,
  abstract = {Inter-robot loop closure detection is a core problem in collaborative SLAM (CSLAM). Establishing inter-robot loop closures is a resource-demanding process, during which robots must consume a substantial amount of mission-critical resources (e.g., battery and bandwidth) to exchange sensory data. However, even with the most resource-efficient techniques, the resources available onboard may be insufficient for verifying every potential loop closure. This work addresses this critical challenge by proposing a resource-adaptive framework for distributed loop closure detection. We seek to maximize task-oriented objectives subject to a budget constraint on total data transmission. This problem is in general NP-hard. We approach this problem from different perspectives and leverage existing results on monotone submodular maximization to provide efficient approximation algorithms with performance guarantees. The proposed approach is extensively evaluated using the KITTI odometry benchmark dataset and synthetic Manhattan-like datasets.},
  address = {Pittsburgh, Pennsylvania, USA},
  author = {Yulun Tian and Kasra Khosoussi and Matthew Giamou and Jonathan Kelly and Jonathan How},
  booktitle = {Proceedings of Robotics: Science and Systems {(RSS)}},
  code = {https://github.com/utiasSTARS/cslam-resource},
  date = {2018-06-26/2018-06-28},
  month = {Jun. 26--28},
  title = {Near-Optimal Budgeted Data Exchange for Distributed Loop Closure Detection},
  url = {http://www.roboticsproceedings.org/rss14/p71.pdf},
  year = {2018}
}

Inter-robot loop closure detection is a core problem in collaborative SLAM (CSLAM). Establishing inter-robot loop closures is a resource-demanding process, during which robots must consume a substantial amount of mission-critical resources (e.g., battery and bandwidth) to exchange sensory data. However, even with the most resource-efficient techniques, the resources available onboard may be insufficient for verifying every potential loop closure. This work addresses this critical challenge by proposing a resource-adaptive framework for distributed loop closure detection. We seek to maximize task-oriented objectives subject to a budget constraint on total data transmission. This problem is in general NP-hard. We approach this problem from different perspectives and leverage existing results on monotone submodular maximization to provide efficient approximation algorithms with performance guarantees. The proposed approach is extensively evaluated using the KITTI odometry benchmark dataset and synthetic Manhattan-like datasets.

Load Sharing — Obstacle Avoidance and Admittance Control on a Mobile Manipulator

T. Ulrich

Master Thesis , Swiss Federal Institute of Technology Zürich, Zürich, Switzerland, 2018.

Bibtex | Abstract

@mastersthesis{2018_Ulrich_Load,
  abstract = {We present an implementation of a load-sharing algorithm between a human and a robot partner, designed to jointly carry an object in an indoor cluttered environment, of which the robot has no prior. We review the state of human-robot interaction in general and deploy cooperation, using information exchange through forces and torque applied to the jointly handled object. The work is set within the master-slave paradigm and combines an admittance controller with obstacle avoidance to ensure pro-active behaviour on the robot side and collision free trajectories at all times. We derive the implementation from existing literature and validate the working algorithm on a mobile manipulator, consisting of a Clearpath Ridgeback platform, a Universal Robot 10 and a Robotiq three finger gripper.},
  address = {Z\"{u}rich, Switzerland},
  author = {Tobias Ulrich},
  month = {September},
  school = {Swiss Federal Institute of Technology Z\"{u}rich},
  title = {Load Sharing -- Obstacle Avoidance and Admittance Control on a Mobile Manipulator},
  year = {2018}
}

We present an implementation of a load-sharing algorithm between a human and a robot partner, designed to jointly carry an object in an indoor cluttered environment, of which the robot has no prior. We review the state of human-robot interaction in general and deploy cooperation, using information exchange through forces and torque applied to the jointly handled object. The work is set within the master-slave paradigm and combines an admittance controller with obstacle avoidance to ensure pro-active behaviour on the robot side and collision free trajectories at all times. We derive the implementation from existing literature and validate the working algorithm on a mobile manipulator, consisting of a Clearpath Ridgeback platform, a Universal Robot 10 and a Robotiq three finger gripper.

LSTM-Based Zero-Velocity Detection for Robust Inertial Navigation

B. Wagstaff and J. Kelly

Proceedings of the International Conference on Indoor Positioning and Indoor Navigation (IPIN), Nantes, France, Sep. 24–27, 2018.

DOI | Bibtex | Abstract | arXiv | Video

@inproceedings{2018_Wagstaff_LSTM-Based,
  abstract = {We present a method to improve the accuracy of a zero-velocity-aided inertial navigation system (INS) by replacing the standard zero-velocity detector with a long short-term memory (LSTM) neural network.  While existing threshold-based zero-velocity detectors are not robust to varying motion types, our learned model accurately detects stationary periods of the inertial measurement unit (IMU) despite changes in the motion of the user. Upon detection, zero-velocity pseudo-measurements are fused with a dead reckoning motion model in an extended Kalman filter (EKF). We demonstrate that our LSTM-based zero-velocity detector, used within a zero-velocity-aided INS, improves zero-velocity detection during human localization tasks. Consequently, localization accuracy is also improved.

Our system is evaluated on more than 7.5 km of indoor pedestrian locomotion data, acquired from five different subjects. We show that 3D positioning error is reduced by over 34\% compared to existing fixed-threshold zero-velocity detectors for walking, running, and stair climbing motions. Additionally, we demonstrate how our learned zero-velocity detector operates effectively during crawling and ladder climbing. Our system is calibration-free (no careful threshold-tuning is required) and operates consistently with differing users, IMU placements, and shoe types, while being compatible with any generic zero-velocity-aided INS.},
  address = {Nantes, France},
  author = {Brandon Wagstaff and Jonathan Kelly},
  booktitle = {Proceedings of the International Conference on Indoor Positioning and Indoor Navigation {(IPIN)}},
  date = {2018-09-24/2018-09-27},
  doi = {10.1109/IPIN.2018.8533770},
  month = {Sep. 24--27},
  note = {Best Student Paper Runner-Up},
  title = {LSTM-Based Zero-Velocity Detection for Robust Inertial Navigation},
  url = {http://arxiv.org/abs/1807.05275},
  video1 = {https://www.youtube.com/watch?v=PhmZ8NMoh2s},
  year = {2018}
}

We present a method to improve the accuracy of a zero-velocity-aided inertial navigation system (INS) by replacing the standard zero-velocity detector with a long short-term memory (LSTM) neural network.  While existing threshold-based zero-velocity detectors are not robust to varying motion types, our learned model accurately detects stationary periods of the inertial measurement unit (IMU) despite changes in the motion of the user. Upon detection, zero-velocity pseudo-measurements are fused with a dead reckoning motion model in an extended Kalman filter (EKF). We demonstrate that our LSTM-based zero-velocity detector, used within a zero-velocity-aided INS, improves zero-velocity detection during human localization tasks. Consequently, localization accuracy is also improved. Our system is evaluated on more than 7.5 km of indoor pedestrian locomotion data, acquired from five different subjects. We show that 3D positioning error is reduced by over 34\% compared to existing fixed-threshold zero-velocity detectors for walking, running, and stair climbing motions. Additionally, we demonstrate how our learned zero-velocity detector operates effectively during crawling and ladder climbing. Our system is calibration-free (no careful threshold-tuning is required) and operates consistently with differing users, IMU placements, and shoe types, while being compatible with any generic zero-velocity-aided INS.

Best Student Paper Runner-Up

2017

Cheap or Robust? The Practical Realization of Self-Driving Wheelchair Technology

M. Burhanpurkar, M. Labbé, C. Guan, F. Michaud, and J. Kelly

Proceedings of the IEEE International Conference on Rehabilitation Robotics (ICORR), London, United Kingdom, Jul. 17–20, 2017, pp. 1079-1086.

DOI | Bibtex | Abstract | arXiv

@inproceedings{2017_Burhanpurkar_Cheap,
  abstract = {To date, self-driving experimental wheelchair tech- nologies have been either inexpensive or robust, but not both. Yet, in order to achieve real-world acceptance, both qualities are fundamentally essential. We present a unique approach to achieve inexpensive and robust autonomous and semi-autonomous assistive navigation for existing fielded wheelchairs, of which there are approximately 5 million units in Canada and United States alone. Our prototype wheelchair platform is capable of localization and mapping, as well as robust obstacle avoidance, using only a commodity RGB-D sensor and wheel odometry. As a specific example of the navigation capabilities, we focus on the single most common navigation problem: the traversal of narrow doorways in arbitrary environments. The software we have developed is generalizable to corridor following, desk docking, and other navigation tasks that are either extremely difficult or impossible for people with upper-body mobility impairments.},
  address = {London, United Kingdom},
  author = {Maya Burhanpurkar and Mathieu Labb\'{e} and Charlie Guan and Fran\c{c}ois Michaud and Jonathan Kelly},
  booktitle = {Proceedings of the {IEEE} International Conference on Rehabilitation Robotics {(ICORR)}},
  date = {2017-07-17/2017-07-20},
  doi = {10.1109/ICORR.2017.8009393},
  month = {Jul. 17--20},
  pages = {1079--1086},
  title = {Cheap or Robust? {The} Practical Realization of Self-Driving Wheelchair Technology},
  url = {https://arxiv.org/abs/1707.05301},
  year = {2017}
}

To date, self-driving experimental wheelchair tech- nologies have been either inexpensive or robust, but not both. Yet, in order to achieve real-world acceptance, both qualities are fundamentally essential. We present a unique approach to achieve inexpensive and robust autonomous and semi-autonomous assistive navigation for existing fielded wheelchairs, of which there are approximately 5 million units in Canada and United States alone. Our prototype wheelchair platform is capable of localization and mapping, as well as robust obstacle avoidance, using only a commodity RGB-D sensor and wheel odometry. As a specific example of the navigation capabilities, we focus on the single most common navigation problem: the traversal of narrow doorways in arbitrary environments. The software we have developed is generalizable to corridor following, desk docking, and other navigation tasks that are either extremely difficult or impossible for people with upper-body mobility impairments.

Improving the Accuracy of Stereo Visual Odometry Using Visual Illumination Estimation

L. Clement, V. Peretroukhin, and J. Kelly

in 2016 International Symposium on Experimental Robotics , D. Kuli’c, Y. Nakamura, O. Khatib, and G. Venture, Eds., Cham: Springer International Publishing AG, 2017, vol. 1, pp. 409-419.

DOI | Bibtex | Abstract | arXiv

@incollection{2017_Clement_Improving,
  abstract = {In the absence of reliable and accurate GPS, visual odometry (VO) has emerged as an effective means of estimating the egomotion of robotic vehicles. Like any dead-reckoning technique, VO suffers from unbounded accumulation of drift error over time, but this accumulation can be limited by incorporating absolute orientation information from, for example, a sun sensor. In this paper, we leverage recent work on visual outdoor illumination estimation to show that estimation error in a stereo VO pipeline can be reduced by inferring the sun position from the same image stream used to compute VO, thereby gaining the benefits of sun sensing without requiring a dedicated sun sensor or the sun to be visible to the camera. We compare sun estimation methods based on hand-crafted visual cues and Convolutional Neural Networks (CNNs) and demonstrate our approach on a combined 7.8 km of urban driving from the popular KITTI dataset, achieving up to a 43\% reduction in translational average root mean squared error (ARMSE) and a 59\% reduction in final translational drift error compared to pure VO alone.},
  address = {Cham},
  author = {Lee Clement and Valentin Peretroukhin and Jonathan Kelly},
  booktitle = {2016 International Symposium on Experimental Robotics},
  doi = {https://doi.org/10.1007/978-3-319-50115-4_36},
  editor = {Dana Kuli\'{c} and Yoshihiko Nakamura and Oussama Khatib and Gentiane Venture},
  isbn = {978-3-319-50114-7},
  pages = {409--419},
  publisher = {Springer International Publishing AG},
  series = {Springer Proceedings in Advanced Robotics},
  title = {Improving the Accuracy of Stereo Visual Odometry Using Visual Illumination Estimation},
  url = {https://arxiv.org/abs/1609.04705},
  volume = {1},
  year = {2017}
}

In the absence of reliable and accurate GPS, visual odometry (VO) has emerged as an effective means of estimating the egomotion of robotic vehicles. Like any dead-reckoning technique, VO suffers from unbounded accumulation of drift error over time, but this accumulation can be limited by incorporating absolute orientation information from, for example, a sun sensor. In this paper, we leverage recent work on visual outdoor illumination estimation to show that estimation error in a stereo VO pipeline can be reduced by inferring the sun position from the same image stream used to compute VO, thereby gaining the benefits of sun sensing without requiring a dedicated sun sensor or the sun to be visible to the camera. We compare sun estimation methods based on hand-crafted visual cues and Convolutional Neural Networks (CNNs) and demonstrate our approach on a combined 7.8 km of urban driving from the popular KITTI dataset, achieving up to a 43\% reduction in translational average root mean squared error (ARMSE) and a 59\% reduction in final translational drift error compared to pure VO alone.

Robust Monocular Visual Teach and Repeat Aided by Local Ground Planarity and Colour-Constant Imagery

L. Clement, J. Kelly, and T. D. Barfoot

Journal of Field Robotics, vol. 34, iss. 1, pp. 74-97, 2017.

DOI | Bibtex | Abstract

@article{2017_Clement_Robust,
  abstract = {Visual Teach and Repeat (VT&R) allows an autonomous vehicle to accurately repeat a previously traversed route using only vision sensors. Most VT&R systems rely on natively 3D sensors such as stereo cameras for mapping and localization, but many existing mobile robots are equipped with only 2D monocular vision, typically for teleoperation. In this paper, we extend VT&R to the most basic sensor configuration -- a single monocular camera. We show that kilometer-scale route repetition can be achieved with centimeter-level accuracy by approximating the local ground surface near the vehicle as a plane with some uncertainty. This allows our system to recover absolute scale from the known position and orientation of the camera relative to the vehicle, which simplifies threshold-based outlier rejection and the estimation and control of lateral path-tracking error --- essential components of high-accuracy route repetition. We enhance the robustness of our monocular VT&R system to common failure cases through the use of color-constant imagery, which provides it with a degree of resistance to lighting changes and moving shadows where keypoint matching on standard grey images tends to struggle. Through extensive testing on a combined 30km of autonomous navigation data collected on multiple vehicles in a variety of highly non-planar terrestrial and planetary-analogue environments, we demonstrate that our system is capable of achieving route-repetition accuracy on par with its stereo counterpart, with only a modest trade-off in robustness.},
  author = {Lee Clement and Jonathan Kelly and Timothy D. Barfoot},
  doi = {10.1002/rob.21655},
  journal = {Journal of Field Robotics},
  month = {January},
  number = {1},
  pages = {74--97},
  title = {Robust Monocular Visual Teach and Repeat Aided by Local Ground Planarity and Colour-Constant Imagery},
  volume = {34},
  year = {2017}
}

Visual Teach and Repeat (VT&R) allows an autonomous vehicle to accurately repeat a previously traversed route using only vision sensors. Most VT&R systems rely on natively 3D sensors such as stereo cameras for mapping and localization, but many existing mobile robots are equipped with only 2D monocular vision, typically for teleoperation. In this paper, we extend VT&R to the most basic sensor configuration -- a single monocular camera. We show that kilometer-scale route repetition can be achieved with centimeter-level accuracy by approximating the local ground surface near the vehicle as a plane with some uncertainty. This allows our system to recover absolute scale from the known position and orientation of the camera relative to the vehicle, which simplifies threshold-based outlier rejection and the estimation and control of lateral path-tracking error --- essential components of high-accuracy route repetition. We enhance the robustness of our monocular VT&R system to common failure cases through the use of color-constant imagery, which provides it with a degree of resistance to lighting changes and moving shadows where keypoint matching on standard grey images tends to struggle. Through extensive testing on a combined 30km of autonomous navigation data collected on multiple vehicles in a variety of highly non-planar terrestrial and planetary-analogue environments, we demonstrate that our system is capable of achieving route-repetition accuracy on par with its stereo counterpart, with only a modest trade-off in robustness.

Automatic and Featureless Sim(3) Calibration of Planar Lidars to Egomotion Sensors

J. Lambert

Master Thesis , University of Toronto, Toronto, Ontario, Canada, 2017.

Bibtex | Abstract

@mastersthesis{2017_Lambert_Automatic,
  abstract = {Autonomous and mobile robots often rely on the fusion of information from different sensors to accomplish important tasks. The prerequisite for successful data fusion is an accurate estimate of the coordinate transformation between the sensors. This thesis aims at generalizing the process of extrinsically calibrating two rigidly attached sensors on a mobile robot. An entropy-based, point cloud reconstruction technique is developed to calibrate a planar lidar to a sensor capable of providing egomotion information. Recent work in this area is revisited and its theory extended to the problem of recovering the Sim(3) transformation between a planar lidar and a monocular camera, where the scale of the camera trajectory is not known a priori. An efficient algorithm with only a single tuning parameter is implemented and studied. An experimental analysis of the algorithm demonstrates this parameter provides a trade-off between computational efficiency and cost function accuracy. The robustness of the approach is tested on realistic simula- tions in multiple environments, as well as on data collected from a hand-held sensor rig. Results show that, given a non-degenerate trajectory and a sufficient number of lidar measurements, the calibration procedure achieves millimetre-scale and sub-degree accuracy. Moreover, the method relaxes the need for specific scene geometry, fiducial markers, and overlapping sensor fields of view, which had previously limited similar techniques.},
  address = {Toronto, Ontario, Canada},
  author = {Jacob Lambert},
  month = {January},
  school = {University of Toronto},
  title = {Automatic and Featureless Sim(3) Calibration of Planar Lidars to Egomotion Sensors},
  year = {2017}
}

Autonomous and mobile robots often rely on the fusion of information from different sensors to accomplish important tasks. The prerequisite for successful data fusion is an accurate estimate of the coordinate transformation between the sensors. This thesis aims at generalizing the process of extrinsically calibrating two rigidly attached sensors on a mobile robot. An entropy-based, point cloud reconstruction technique is developed to calibrate a planar lidar to a sensor capable of providing egomotion information. Recent work in this area is revisited and its theory extended to the problem of recovering the Sim(3) transformation between a planar lidar and a monocular camera, where the scale of the camera trajectory is not known a priori. An efficient algorithm with only a single tuning parameter is implemented and studied. An experimental analysis of the algorithm demonstrates this parameter provides a trade-off between computational efficiency and cost function accuracy. The robustness of the approach is tested on realistic simula- tions in multiple environments, as well as on data collected from a hand-held sensor rig. Results show that, given a non-degenerate trajectory and a sufficient number of lidar measurements, the calibration procedure achieves millimetre-scale and sub-degree accuracy. Moreover, the method relaxes the need for specific scene geometry, fiducial markers, and overlapping sensor fields of view, which had previously limited similar techniques.

From Global to Local: Maintaining Accurate Mobile Manipulator State Estimates Over Long Trajectories

F. Mari’c

Master Thesis , University of Zagreb, Zagreb, Croatia, 2017.

Bibtex | Abstract | PDF

@mastersthesis{2017_Maric_Maintaining,
  abstract = {In this thesis the problem of performing a long trajectory while maintaining an accurate state estimate is explored in the case of a mobile manipulator. The mobile manipulator used consists of a 6 degree-of-freedom manipulator and a omni-directional platform. State estimation is performed using a probabilistic framework, fusing multiple velocity and position estimates. Two approaches are explored for motion planning, the classical task priority approach and the more contemporary sequential convex optimization. Software implementation details are presented and tests are performed on both the simulation and real robot. The results show satisfactory trajectory following performance using local state estimates and motion planning.},
  address = {Zagreb, Croatia},
  author = {Filip Mari\'{c}},
  month = {September},
  school = {University of Zagreb},
  title = {From Global to Local: Maintaining Accurate Mobile Manipulator State Estimates Over Long Trajectories},
  year = {2017}
}

In this thesis the problem of performing a long trajectory while maintaining an accurate state estimate is explored in the case of a mobile manipulator. The mobile manipulator used consists of a 6 degree-of-freedom manipulator and a omni-directional platform. State estimation is performed using a probabilistic framework, fusing multiple velocity and position estimates. Two approaches are explored for motion planning, the classical task priority approach and the more contemporary sequential convex optimization. Software implementation details are presented and tests are performed on both the simulation and real robot. The results show satisfactory trajectory following performance using local state estimates and motion planning.

Reducing Drift in Visual Odometry by Inferring Sun Direction Using a Bayesian Convolutional Neural Network

V. Peretroukhin, L. Clement, and J. Kelly

Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Singapore, May 29–Jun. 3, 2017, pp. 2035-2042.

@inproceedings{2017_Peretroukhin_Reducing,
  abstract = {We present a method to incorporate global orientation information from the sun into a visual odometry pipeline using the existing image stream only. We leverage recent advances in Bayesian Convolutional Neural Networks to train and implement a sun detection model that infers a three-dimensional sun direction vector from a single RGB image (where the sun is typically not visible). Crucially, our method also computes a principled uncertainty associated with each prediction, using a Monte-Carlo dropout scheme. We incorporate this uncertainty into a sliding window stereo visual odometry pipeline where accurate uncertainty estimates are critical for optimal data fusion. Our Bayesian sun detection model achieves median errors of less than 10 degrees on the KITTI odometry benchmark training set, and yields improvements of up to 37\% in translational ARMSE and 32\% in rotational ARMSE compared to standard VO. An implementation of our Bayesian CNN sun estimator (Sun-BCNN) is available as open-source code at https://github.com/utiasSTARS/sun-bcnn-vo.},
  address = {Singapore},
  author = {Valentin Peretroukhin and Lee Clement and Jonathan Kelly},
  booktitle = {Proceedings of the {IEEE} International Conference on Robotics and Automation {(ICRA})},
  code = {https://github. com/utiasSTARS/sun-bcnn-vo},
  date = {2017-05-29/2017-06-03},
  doi = {10.1109/ICRA.2017.7989235},
  month = {May 29--Jun. 3},
  pages = {2035--2042},
  title = {Reducing Drift in Visual Odometry by Inferring Sun Direction Using a Bayesian Convolutional Neural Network},
  url = {https://arxiv.org/abs/1609.05993},
  video1 = {https://www.youtube.com/watch?v=c5XTrq3a2tE},
  year = {2017}
}

We present a method to incorporate global orientation information from the sun into a visual odometry pipeline using the existing image stream only. We leverage recent advances in Bayesian Convolutional Neural Networks to train and implement a sun detection model that infers a three-dimensional sun direction vector from a single RGB image (where the sun is typically not visible). Crucially, our method also computes a principled uncertainty associated with each prediction, using a Monte-Carlo dropout scheme. We incorporate this uncertainty into a sliding window stereo visual odometry pipeline where accurate uncertainty estimates are critical for optimal data fusion. Our Bayesian sun detection model achieves median errors of less than 10 degrees on the KITTI odometry benchmark training set, and yields improvements of up to 37\% in translational ARMSE and 32\% in rotational ARMSE compared to standard VO. An implementation of our Bayesian CNN sun estimator (Sun-BCNN) is available as open-source code at https://github.com/utiasSTARS/sun-bcnn-vo.

Editorial: Special Issue on Field and Service Robotics

F. Pomerleau and J. Kelly

Journal of Field Robotics, vol. 34, iss. 1, pp. 3-4, 2017.

DOI | Bibtex | PDF

@article{2017_Pomerleau_FSR,
  author = {Fran\c{c}ois Pomerleau and Jonathan Kelly},
  doi = {10.1002/rob.21703},
  journal = {Journal of Field Robotics},
  month = {January},
  number = {1},
  pages = {3--4},
  title = {Editorial: Special Issue on Field and Service Robotics},
  volume = {34},
  year = {2017}
}

Increasing Persistent Navigation Capabilities for Underwater Vehicles with Augmented Terrain-Based Navigation

G. M. Reis, M. Fitzpatrick, J. Anderson, J. Kelly, L. Bobadilla, and R. N. Smith

Proceedings of the MTS/IEEE Oceans Conference (OCEANS), Aberdeen, United Kingdom, Jun. 19–22, 2017.

DOI | Bibtex | Abstract | PDF

@inproceedings{2017_Reis_Increasing,
  abstract = {Accurate and energy-efficient navigation and localization methods for autonomous underwater vehicles continues to be an active area of research. As interesting as they are important, ocean processes are spatiotemporally dynamic and their study requires vehicles that can maneuver and sample intelligently while underwater for extended durations. In this paper, we present a new technique for augmenting terrain-based navigation with physical water data to enhance the utility of traditional methods for navigation and localization. We examine the construct of this augmentation method over a range of deployment regions, e.g., ocean and freshwater lake. Data from field trials are presented and analyzed for multiple deployments of an autonomous underwater vehicle.},
  address = {Aberdeen, United Kingdom},
  author = {Gregory Murad Reis and Michael Fitzpatrick and Jacob Anderson and Jonathan Kelly and Leonardo Bobadilla and Ryan N. Smith},
  booktitle = {Proceedings of the {MTS/IEEE} Oceans Conference {(OCEANS)}},
  date = {2017-06-19/2017-06-22},
  doi = {10.1109/OCEANSE.2017.8084815},
  month = {Jun. 19--22},
  note = {Best Student Paper Finalist},
  title = {Increasing Persistent Navigation Capabilities for Underwater Vehicles with Augmented Terrain-Based Navigation},
  year = {2017}
}

Accurate and energy-efficient navigation and localization methods for autonomous underwater vehicles continues to be an active area of research. As interesting as they are important, ocean processes are spatiotemporally dynamic and their study requires vehicles that can maneuver and sample intelligently while underwater for extended durations. In this paper, we present a new technique for augmenting terrain-based navigation with physical water data to enhance the utility of traditional methods for navigation and localization. We examine the construct of this augmentation method over a range of deployment regions, e.g., ocean and freshwater lake. Data from field trials are presented and analyzed for multiple deployments of an autonomous underwater vehicle.

Best Student Paper Finalist

Improving Foot-Mounted Inertial Navigation Through Real-Time Motion Classification

B. Wagstaff, V. Peretroukhin, and J. Kelly

Proceedings of the International Conference on Indoor Positioning and Indoor Navigation (IPIN), Sapporo, Japan, Sep. 18–21, 2017.

DOI | Bibtex | Abstract | arXiv | Video

@inproceedings{2017_Wagstaff_Improving,
  abstract = {We present a method to improve the accuracy of a foot-mounted, zero-velocity-aided inertial navigation system (INS) by varying estimator parameters based on a real-time classification of motion type. We train a support vector machine (SVM) classifier using inertial data recorded by a single foot-mounted sensor to differentiate between six motion types (walking, jogging, running, sprinting, crouch-walking, and ladder-climbing) and report mean test classification accuracy of over 90\% on a dataset with five different subjects.

From these motion types, we select two of the most common (walking and running), and describe a method to compute optimal zero-velocity detection parameters tailored to both a specific user and motion type by maximizing the detector F-score. By combining the motion classifier with a set of optimal detection parameters, we show how we can reduce INS position error during mixed walking and running motion. We evaluate our adaptive system on a total of 5.9 km of indoor pedestrian navigation performed by five different subjects moving along a 130 m path with surveyed ground truth markers.},
  address = {Sapporo, Japan},
  author = {Brandon Wagstaff and Valentin Peretroukhin and Jonathan Kelly},
  booktitle = {Proceedings of the International Conference on Indoor Positioning and Indoor Navigation {(IPIN)}},
  date = {2017-09-18/2017-09-21},
  doi = {10.1109/IPIN.2017.8115947},
  month = {Sep. 18--21},
  title = {Improving Foot-Mounted Inertial Navigation Through Real-Time Motion Classification},
  url = {http://arxiv.org/abs/1707.01152},
  video1 = {https://www.youtube.com/watch?v=Jiqj6j9E8dI},
  year = {2017}
}

We present a method to improve the accuracy of a foot-mounted, zero-velocity-aided inertial navigation system (INS) by varying estimator parameters based on a real-time classification of motion type. We train a support vector machine (SVM) classifier using inertial data recorded by a single foot-mounted sensor to differentiate between six motion types (walking, jogging, running, sprinting, crouch-walking, and ladder-climbing) and report mean test classification accuracy of over 90\% on a dataset with five different subjects. From these motion types, we select two of the most common (walking and running), and describe a method to compute optimal zero-velocity detection parameters tailored to both a specific user and motion type by maximizing the detector F-score. By combining the motion classifier with a set of optimal detection parameters, we show how we can reduce INS position error during mixed walking and running motion. We evaluate our adaptive system on a total of 5.9 km of indoor pedestrian navigation performed by five different subjects moving along a 130 m path with surveyed ground truth markers.

2016

Monocular Visual Teach and Repeat Aided by Local Ground Planarity

L. Clement, J. Kelly, and T. D. Barfoot

in Field and Service Robotics: Results of the 10th International Conference , D. S. Wettergreen and T. D. Barfoot, Eds., Cham: Springer International Publishing AG, 2016, vol. 113, pp. 547-561.

DOI | Bibtex | Abstract | arXiv | Video

@incollection{2016_Clement_Monocular,
  abstract = {Visual Teach and Repeat (VT&R) allows an autonomous vehicle to repeat a previously traversed route without a global positioning system. Existing implementations of VT&R typically rely on 3D sensors such as stereo cameras for mapping and localization, but many mobile robots are equipped with only 2D monocular vision for tasks such as teleoperated bomb disposal. While simultaneous localization and mapping (SLAM) algorithms exist that can recover 3D structure and motion from monocular images, the scale ambiguity inherent in these methods complicates the estimation and control of lateral path-tracking error, which is essential for achieving high-accuracy path following. In this paper, we propose a monocular vision pipeline that enables kilometre-scale route repetition with centimetre-level accuracy by approximating the ground surface near the vehicle as planar (with some uncertainty) and recovering absolute scale from the known position and orientation of the camera relative to the vehicle. This system provides added value to many existing robots by allowing for high-accuracy autonomous route repetition with a simple software upgrade and no additional sensors. We validate our system over 4.3 km of autonomous navigation and demonstrate accuracy on par with the conventional stereo pipeline, even in highly non-planar terrain.},
  address = {Cham},
  author = {Lee Clement and Jonathan Kelly and Timothy D. Barfoot},
  booktitle = {Field and Service Robotics: Results of the 10th International Conference},
  doi = {10.1007/978-3-319-27702-8_36},
  editor = {David S. Wettergreen and Timothy D. Barfoot},
  isbn = {978-3-319-27700-4},
  pages = {547--561},
  publisher = {Springer International Publishing AG},
  series = {Springer Tracts in Advanced Robotics},
  title = {Monocular Visual Teach and Repeat Aided by Local Ground Planarity},
  url = {https://arxiv.org/abs/1707.08989},
  video1 = {https://www.youtube.com/watch?v=FU6KeWgwrZ4},
  volume = {113},
  year = {2016}
}

Visual Teach and Repeat (VT&R) allows an autonomous vehicle to repeat a previously traversed route without a global positioning system. Existing implementations of VT&R typically rely on 3D sensors such as stereo cameras for mapping and localization, but many mobile robots are equipped with only 2D monocular vision for tasks such as teleoperated bomb disposal. While simultaneous localization and mapping (SLAM) algorithms exist that can recover 3D structure and motion from monocular images, the scale ambiguity inherent in these methods complicates the estimation and control of lateral path-tracking error, which is essential for achieving high-accuracy path following. In this paper, we propose a monocular vision pipeline that enables kilometre-scale route repetition with centimetre-level accuracy by approximating the ground surface near the vehicle as planar (with some uncertainty) and recovering absolute scale from the known position and orientation of the camera relative to the vehicle. This system provides added value to many existing robots by allowing for high-accuracy autonomous route repetition with a simple software upgrade and no additional sensors. We validate our system over 4.3 km of autonomous navigation and demonstrate accuracy on par with the conventional stereo pipeline, even in highly non-planar terrain.

Entropy-Based Sim(3) Calibration of 2D Lidars to Egomotion Sensors

J. Lambert, L. Clement, M. Giamou, and J. Kelly

Proceedings of the IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI), Baden-Baden, Germany, Sep. 19–21, 2016, pp. 455-461.

DOI | Bibtex | Abstract | arXiv

@inproceedings{2016_Lambert_Entropy-Based,
  abstract = {This paper explores the use of an entropy-based technique for point cloud reconstruction with the goal of calibrating a lidar to a sensor capable of providing egomotion information. We extend recent work in this area to the problem of recovering the Sim(3) transformation between a 2D lidar and a rigidly attached monocular camera, where the scale of the camera trajectory is not known a priori. We demonstrate the robustness of our approach on realistic simulations in multiple environments, as well as on data collected from a hand-held sensor rig. Given a non-degenerate trajectory and a sufficient number of lidar measurements, our calibration procedure achieves millimetre-scale and sub-degree accuracy. Moreover, our method relaxes the need for specific scene geometry, fiducial markers, or overlapping sensor fields of view, which had previously limited similar techniques.},
  address = {Baden-Baden, Germany},
  author = {Jacob Lambert and Lee Clement and Matthew Giamou and Jonathan Kelly},
  booktitle = {Proceedings of the {IEEE} International Conference on Multisensor Fusion and Integration for Intelligent Systems {(MFI)}},
  date = {2016-09-19/2016-09-21},
  doi = {10.1109/MFI.2016.7849530},
  month = {Sep. 19--21},
  note = {Best Student Paper Award},
  pages = {455--461},
  title = {Entropy-Based Sim(3) Calibration of {2D} Lidars to Egomotion Sensors},
  url = {https://arxiv.org/abs/1707.08680},
  year = {2016}
}

This paper explores the use of an entropy-based technique for point cloud reconstruction with the goal of calibrating a lidar to a sensor capable of providing egomotion information. We extend recent work in this area to the problem of recovering the Sim(3) transformation between a 2D lidar and a rigidly attached monocular camera, where the scale of the camera trajectory is not known a priori. We demonstrate the robustness of our approach on realistic simulations in multiple environments, as well as on data collected from a hand-held sensor rig. Given a non-degenerate trajectory and a sufficient number of lidar measurements, our calibration procedure achieves millimetre-scale and sub-degree accuracy. Moreover, our method relaxes the need for specific scene geometry, fiducial markers, or overlapping sensor fields of view, which had previously limited similar techniques.

Best Student Paper Award

PROBE-GK: Predictive Robust Estimation using Generalized Kernels

V. Peretroukhin, W. Vega-Brown, N. Roy, and J. Kelly

Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Stockholm, Sweden, May 16–21, 2016, pp. 817-824.

DOI | Bibtex | Abstract | PDF | arXiv

@inproceedings{2016_Peretroukhin_PROBE-GK,
  abstract = {Many algorithms in computer vision and robotics make strong assumptions about uncertainty, and rely on the validity of these assumptions to produce accurate and consistent state estimates. In practice, dynamic environments may degrade sensor performance in predictable ways that cannot be captured with static uncertainty parameters. In this paper, we employ fast nonparametric Bayesian inference techniques to more accurately model sensor uncertainty. By setting a prior on observation uncertainty, we derive a predictive robust estimator, and show how our model can be learned from sample images, both with and without knowledge of the motion used to generate the data. We validate our approach through Monte Carlo simulations, and report significant improvements in localization accuracy relative to a fixed noise model in several settings, including on synthetic data, the KITTI dataset, and our own experimental platform.},
  address = {Stockholm, Sweden},
  author = {Valentin Peretroukhin and William Vega-Brown and Nicholas Roy and Jonathan Kelly},
  booktitle = {Proceedings of the {IEEE} International Conference on Robotics and Automation {(ICRA})},
  date = {2016-05-16/2016-05-21},
  doi = {10.1109/ICRA.2016.7487212},
  month = {May 16--21},
  pages = {817--824},
  title = {{PROBE-GK}: Predictive Robust Estimation using Generalized Kernels},
  url = {https://arxiv.org/abs/1708.00171},
  year = {2016}
}

Many algorithms in computer vision and robotics make strong assumptions about uncertainty, and rely on the validity of these assumptions to produce accurate and consistent state estimates. In practice, dynamic environments may degrade sensor performance in predictable ways that cannot be captured with static uncertainty parameters. In this paper, we employ fast nonparametric Bayesian inference techniques to more accurately model sensor uncertainty. By setting a prior on observation uncertainty, we derive a predictive robust estimator, and show how our model can be learned from sample images, both with and without knowledge of the motion used to generate the data. We validate our approach through Monte Carlo simulations, and report significant improvements in localization accuracy relative to a fixed noise model in several settings, including on synthetic data, the KITTI dataset, and our own experimental platform.

Enabling Persistent Autonomy for Underwater Gliders with Ocean Model Predictions and Terrain Based Navigation

A. Stuntz, J. Kelly, and R. N. Smith

Frontiers in Robotics and AI, vol. 3, iss. 23, 2016.

DOI | Bibtex | Abstract | PDF

@article{2016_Stuntz_Enabling,
  abstract = {Effective study of ocean processes requires sampling over the duration of long (weeks to months) oscillation patterns. Such sampling requires persistent, autonomous underwater vehicles that have a similarly, long deployment duration. The spatiotemporal dynamics of the ocean environment, coupled with limited communication capabilities, make navigation and localization difficult, especially in coastal regions where the majority of interesting phenomena occur. In this paper, we consider the combination of two methods for reducing navigation and localization error: a predictive approach based on ocean model predictions and a prior information approach derived from terrain-based navigation. The motivation for this work is not only for real-time state estimation but also for accurately reconstructing the actual path that the vehicle traversed to contextualize the gathered data, with respect to the science question at hand. We present an application for the practical use of priors and predictions for large-scale ocean sampling. This combined approach builds upon previous works by the authors and accurately localizes the traversed path of an underwater glider over long-duration, ocean deployments. The proposed method takes advantage of the reliable, short-term predictions of an ocean model, and the utility of priors used in terrain-based navigation over areas of significant bathymetric relief to bound uncertainty error in dead-reckoning navigation. This method improves upon our previously published works by (1) demonstrating the utility of our terrain-based navigation method with multiple field trials and (2) presenting a hybrid algorithm that combines both approaches to bound navigational error and uncertainty for long-term deployments of underwater vehicles. We demonstrate the approach by examining data from actual field trials with autonomous underwater gliders and demonstrate an ability to estimate geographical location of an underwater glider to < 100 m over paths of length > 2 km. Utilizing the combined algorithm, we are able to prescribe an uncertainty bound for navigation and instruct the glider to surface if that bound is exceeded during a given mission.},
  author = {Andrew Stuntz and Jonathan Kelly and Ryan N. Smith},
  doi = {10.3389/frobt.2016.00023},
  journal = {Frontiers in Robotics and AI},
  month = {April},
  number = {23},
  title = {Enabling Persistent Autonomy for Underwater Gliders with Ocean Model Predictions and Terrain Based Navigation},
  volume = {3},
  year = {2016}
}

Effective study of ocean processes requires sampling over the duration of long (weeks to months) oscillation patterns. Such sampling requires persistent, autonomous underwater vehicles that have a similarly, long deployment duration. The spatiotemporal dynamics of the ocean environment, coupled with limited communication capabilities, make navigation and localization difficult, especially in coastal regions where the majority of interesting phenomena occur. In this paper, we consider the combination of two methods for reducing navigation and localization error: a predictive approach based on ocean model predictions and a prior information approach derived from terrain-based navigation. The motivation for this work is not only for real-time state estimation but also for accurately reconstructing the actual path that the vehicle traversed to contextualize the gathered data, with respect to the science question at hand. We present an application for the practical use of priors and predictions for large-scale ocean sampling. This combined approach builds upon previous works by the authors and accurately localizes the traversed path of an underwater glider over long-duration, ocean deployments. The proposed method takes advantage of the reliable, short-term predictions of an ocean model, and the utility of priors used in terrain-based navigation over areas of significant bathymetric relief to bound uncertainty error in dead-reckoning navigation. This method improves upon our previously published works by (1) demonstrating the utility of our terrain-based navigation method with multiple field trials and (2) presenting a hybrid algorithm that combines both approaches to bound navigational error and uncertainty for long-term deployments of underwater vehicles. We demonstrate the approach by examining data from actual field trials with autonomous underwater gliders and demonstrate an ability to estimate geographical location of an underwater glider to < 100 m over paths of length > 2 km. Utilizing the combined algorithm, we are able to prescribe an uncertainty bound for navigation and instruct the glider to surface if that bound is exceeded during a given mission.

2015

The Battle for Filter Supremacy: A Comparative Study of the Multi-State Constraint Kalman Filter and the Sliding Window Filter

L. Clement, V. Peretroukhin, J. Lambert, and J. Kelly

Proceedings of the 12th Conference on Computer and Robot Vision (CRV), Halifax, Nova Scotia, Canada, Jun. 3–5, 2015, pp. 23-30.

DOI | Bibtex | Abstract | PDF | Code

@inproceedings{2015_Clement_Battle,
  abstract = {Accurate and consistent ego motion estimation is a critical component of autonomous navigation. For this task, the combination of visual and inertial sensors is an inexpensive, compact, and complementary hardware suite that can be used on many types of vehicles. In this work, we compare two modern approaches to ego motion estimation: the Multi-State Constraint Kalman Filter (MSCKF) and the Sliding Window Filter (SWF). Both filters use an Inertial Measurement Unit (IMU) to estimate the motion of a vehicle and then correct this estimate with observations of salient features from a monocular camera. While the SWF estimates feature positions as part of the filter state itself, the MSCKF optimizes feature positions in a separate procedure without including them in the filter state. We present experimental characterizations and comparisons of the MSCKF and SWF on data from a moving hand-held sensor rig, as well as several traverses from the KITTI dataset. In particular, we compare the accuracy and consistency of the two filters, and analyze the effect of feature track length and feature density on the performance of each filter. In general, our results show the SWF to be more accurate and less sensitive to tuning parameters than the MSCKF. However, the MSCKF is computationally cheaper, has good consistency properties, and improves in accuracy as more features are tracked.},
  address = {Halifax, Nova Scotia, Canada},
  author = {Lee Clement and Valentin Peretroukhin and Jacob Lambert and Jonathan Kelly},
  booktitle = {Proceedings of the 12th Conference on Computer and Robot Vision {(CRV)}},
  code = {https://github.com/utiasSTARS/msckf-swf-comparison},
  date = {2015-06-03/2015-06-05},
  doi = {10.1109/CRV.2015.11},
  month = {Jun. 3--5},
  pages = {23--30},
  title = {The Battle for Filter Supremacy: A Comparative Study of the Multi-State Constraint Kalman Filter and the Sliding Window Filter},
  year = {2015}
}

Accurate and consistent ego motion estimation is a critical component of autonomous navigation. For this task, the combination of visual and inertial sensors is an inexpensive, compact, and complementary hardware suite that can be used on many types of vehicles. In this work, we compare two modern approaches to ego motion estimation: the Multi-State Constraint Kalman Filter (MSCKF) and the Sliding Window Filter (SWF). Both filters use an Inertial Measurement Unit (IMU) to estimate the motion of a vehicle and then correct this estimate with observations of salient features from a monocular camera. While the SWF estimates feature positions as part of the filter state itself, the MSCKF optimizes feature positions in a separate procedure without including them in the filter state. We present experimental characterizations and comparisons of the MSCKF and SWF on data from a moving hand-held sensor rig, as well as several traverses from the KITTI dataset. In particular, we compare the accuracy and consistency of the two filters, and analyze the effect of feature track length and feature density on the performance of each filter. In general, our results show the SWF to be more accurate and less sensitive to tuning parameters than the MSCKF. However, the MSCKF is computationally cheaper, has good consistency properties, and improves in accuracy as more features are tracked.

Get to the Point: Active Covariance Scaling for Feature Tracking Through Motion Blur

V. Peretroukhin, L. Clement, and J. Kelly

Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) Workshop on Scaling Up Active Perception, Seattle, Washington, USA, May 30, 2015.

Bibtex | PDF

@inproceedings{2015_Peretroukhin_Get,
  address = {Seattle, Washington, USA},
  author = {Valentin Peretroukhin and Lee Clement and Jonathan Kelly},
  booktitle = {Proceedings of the {IEEE} International Conference on Robotics and Automation {(ICRA)} Workshop on Scaling Up Active Perception},
  date = {2015-05-30},
  month = {May 30},
  title = {Get to the Point: Active Covariance Scaling for Feature Tracking Through Motion Blur},
  year = {2015}
}

PROBE: Predictive Robust Estimation for Visual-Inertial Navigation

V. Peretroukhin, L. Clement, M. Giamou, and J. Kelly

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS’15), Hamburg, Germany, Sep. 28–Oct. 2, 2015, pp. 3668-3675.

@inproceedings{2015_Peretroukhin_PROBE,
  abstract = {Navigation in unknown, chaotic environments continues to present a significant challenge for the robotics community. Lighting changes, self-similar textures, motion blur, and moving objects are all considerable stumbling blocks for state-of-the-art vision-based navigation algorithms. In this paper we present a novel technique for improving localization accuracy within a visual-inertial navigation system (VINS). We make use of training data to learn a model for the quality of visual features with respect to localization error in a given environment. This model maps each visual observation from a predefined prediction space of visual-inertial predictors onto a scalar weight, which is then used to scale the observation covariance matrix. In this way, our model can adjust the influence of each observation according to its quality. We discuss our choice of predictors and report substantial reductions in localization error on 4 km of data from the KITTI dataset, as well as on experimental datasets consisting of 700 m of indoor and outdoor driving on a small ground rover equipped with a Skybotix VI-Sensor.},
  address = {Hamburg, Germany},
  author = {Valentin Peretroukhin and Lee Clement and Matthew Giamou and Jonathan Kelly},
  booktitle = {Proceedings of the {IEEE/RSJ} International Conference on Intelligent Robots and Systems {(IROS'15)}},
  date = {2015-09-28/2015-10-02},
  doi = {10.1109/IROS.2015.7353890},
  month = {Sep. 28--Oct. 2},
  pages = {3668--3675},
  title = {{PROBE}: Predictive Robust Estimation for Visual-Inertial Navigation},
  url = {https://arxiv.org/abs/1708.00174},
  video1 = {https://www.youtube.com/watch?v=0YmdVJ0Be3Q},
  year = {2015}
}

Navigation in unknown, chaotic environments continues to present a significant challenge for the robotics community. Lighting changes, self-similar textures, motion blur, and moving objects are all considerable stumbling blocks for state-of-the-art vision-based navigation algorithms. In this paper we present a novel technique for improving localization accuracy within a visual-inertial navigation system (VINS). We make use of training data to learn a model for the quality of visual features with respect to localization error in a given environment. This model maps each visual observation from a predefined prediction space of visual-inertial predictors onto a scalar weight, which is then used to scale the observation covariance matrix. In this way, our model can adjust the influence of each observation according to its quality. We discuss our choice of predictors and report substantial reductions in localization error on 4 km of data from the KITTI dataset, as well as on experimental datasets consisting of 700 m of indoor and outdoor driving on a small ground rover equipped with a Skybotix VI-Sensor.

Vision-based Collision Avoidance for Personal Aerial Vehicles using Dynamic Potential Fields

F. Rehmatullah and J. Kelly

Proceedings of the 12th Conference on Computer and Robot Vision (CRV), Halifax, Nova Scotia, Canada, Jun. 3–5, 2015, pp. 297-304.

DOI | Bibtex | Abstract | PDF | Video

@inproceedings{2015_Rehmatullah_Vision,
  abstract = {In this paper we present a prototype system that aids the operator of a Personal Air Vehicle (PAV) by actively monitoring vehicle surroundings and providing autonomous control inputs for obstacle avoidance. The prototype is developed for a Personal Air Transportation System (PATS) that will enable human operators with low level of technical knowledge to use aerial vehicles for a day-to-day commute. While most collision avoidance systems used on human controlled vehicles override operator input, our proposed system allows the operator to be in control of the vehicle at all times. Our approach uses a dynamic potential field to generate pseudo repulsive forces that, when converted into control inputs, force the vehicle on a trajectory around the obstacle. By allowing the vehicle control input to be the sum of operator controls and collision avoidance controls, the system ensures that the operator is in control of the vehicle at all times. We first present a dynamic repulsive potential function and then provide a generic control architecture required to implement the collision avoidance system on a mobile platform. Further, extensive computer simulations of the proposed algorithm are performed on a quad copter model, followed by hardware experiments on a stereo vision sensor. The proposed collision avoidance system is computationally inexpensive and can be used with any sensor that can produce a point cloud for obstacle detection.},
  address = {Halifax, Nova Scotia, Canada},
  author = {Faizan Rehmatullah and Jonathan Kelly},
  booktitle = {Proceedings of the 12th Conference on Computer and Robot Vision {(CRV)}},
  date = {2015-06-03/2015-06-05},
  doi = {10.1109/CRV.2015.46},
  month = {Jun. 3--5},
  pages = {297--304},
  title = {Vision-based Collision Avoidance for Personal Aerial Vehicles using Dynamic Potential Fields},
  video1 = {https://www.youtube.com/watch?v=X0E9wxb1afE},
  year = {2015}
}

In this paper we present a prototype system that aids the operator of a Personal Air Vehicle (PAV) by actively monitoring vehicle surroundings and providing autonomous control inputs for obstacle avoidance. The prototype is developed for a Personal Air Transportation System (PATS) that will enable human operators with low level of technical knowledge to use aerial vehicles for a day-to-day commute. While most collision avoidance systems used on human controlled vehicles override operator input, our proposed system allows the operator to be in control of the vehicle at all times. Our approach uses a dynamic potential field to generate pseudo repulsive forces that, when converted into control inputs, force the vehicle on a trajectory around the obstacle. By allowing the vehicle control input to be the sum of operator controls and collision avoidance controls, the system ensures that the operator is in control of the vehicle at all times. We first present a dynamic repulsive potential function and then provide a generic control architecture required to implement the collision avoidance system on a mobile platform. Further, extensive computer simulations of the proposed algorithm are performed on a quad copter model, followed by hardware experiments on a stereo vision sensor. The proposed collision avoidance system is computationally inexpensive and can be used with any sensor that can produce a point cloud for obstacle detection.

2014

Determining the Time Delay Between Inertial and Visual Sensor Measurements

J. Kelly, N. Roy, and G. S. Sukhatme

IEEE Transactions on Robotics, vol. 30, iss. 6, pp. 1514-1523, 2014.

DOI | Bibtex | Abstract

@article{2014_Kelly_Determining,
  abstract = {We examine the problem of determining the relative time delay between IMU and camera data streams. The primary difficulty is that the correspondences between measurements from the sensors are not initially known, and hence, the time delay cannot be computed directly. We instead formulate time delay calibration as a registration problem, and introduce a calibration algorithm that operates by aligning curves in a three-dimensional orientation space. Results from simulation studies and from experiments with real hardware demonstrate that the delay can be accurately calibrated.},
  author = {Jonathan Kelly and Nicholas Roy and Gaurav S. Sukhatme},
  doi = {10.1109/TRO.2014.2343073},
  journal = {{IEEE} Transactions on Robotics},
  month = {December},
  number = {6},
  pages = {1514--1523},
  title = {Determining the Time Delay Between Inertial and Visual Sensor Measurements},
  volume = {30},
  year = {2014}
}

We examine the problem of determining the relative time delay between IMU and camera data streams. The primary difficulty is that the correspondences between measurements from the sensors are not initially known, and hence, the time delay cannot be computed directly. We instead formulate time delay calibration as a registration problem, and introduce a calibration algorithm that operates by aligning curves in a three-dimensional orientation space. Results from simulation studies and from experiments with real hardware demonstrate that the delay can be accurately calibrated.

A General Framework for Temporal Calibration of Multiple Proprioceptive and Exteroceptive Sensors

J. Kelly and G. S. Sukhatme

in Experimental Robotics: The 12th International Symposium on Experimental Robotics , O. Khatib, V. Kumar, and G. S. Sukhatme, Eds., Berlin, Heidelberg: Springer, 2014, vol. 79, pp. 195-209.

DOI | Bibtex | Abstract | PDF

@incollection{2014_Kelly_General,
  abstract = {Fusion of data from multiple sensors can enable robust navigation in varied environments. However, for optimal performance, the sensors must be calibrated relative to one another. Full sensor-to-sensor calibration is a spatiotemporal problem: we require an accurate estimate of the relative timing of measurements for each pair of sensors, in addition to the 6-DOF sensor-to-sensor transform. In this paper, we examine the problem of determining the time delays between multiple proprioceptive and exteroceptive sensor data streams. The primary difficultly is that the correspondences between measurements from different sensors are unknown, and hence the delays cannot be computed directly. We instead formulate temporal calibration as a registration task. Our algorithm operates by aligning curves in a three-dimensional orientation space, and, as such, can be considered as a variant of Iterative Closest Point (ICP). We present results from simulation studies and from experiments with a PR2 robot, which demonstrate accurate calibration of the time delays between measurements from multiple, heterogeneous sensors.},
  address = {Berlin, Heidelberg},
  author = {Jonathan Kelly and Gaurav S. Sukhatme},
  booktitle = {Experimental Robotics: The 12th International Symposium on Experimental Robotics},
  doi = {10.1007/978-3-642-28572-1_14},
  editor = {Oussama Khatib and Vijay Kumar and Gaurav S. Sukhatme},
  isbn = {978-3-642-28571-4},
  pages = {195--209},
  publisher = {Springer},
  series = {Springer Tracts in Advanced Robotics},
  title = {A General Framework for Temporal Calibration of Multiple Proprioceptive and Exteroceptive Sensors},
  volume = {79},
  year = {2014}
}

Fusion of data from multiple sensors can enable robust navigation in varied environments. However, for optimal performance, the sensors must be calibrated relative to one another. Full sensor-to-sensor calibration is a spatiotemporal problem: we require an accurate estimate of the relative timing of measurements for each pair of sensors, in addition to the 6-DOF sensor-to-sensor transform. In this paper, we examine the problem of determining the time delays between multiple proprioceptive and exteroceptive sensor data streams. The primary difficultly is that the correspondences between measurements from different sensors are unknown, and hence the delays cannot be computed directly. We instead formulate temporal calibration as a registration task. Our algorithm operates by aligning curves in a three-dimensional orientation space, and, as such, can be considered as a variant of Iterative Closest Point (ICP). We present results from simulation studies and from experiments with a PR2 robot, which demonstrate accurate calibration of the time delays between measurements from multiple, heterogeneous sensors.

Optimizing Camera Perspective for Stereo Visual Odometry

V. Peretroukhin, J. Kelly, and T. D. Barfoot

Proceedings of the Canadian Conference on Computer and Robot Vision (CRV), Montréal, Québec, Canada, May 7–9, 2014, pp. 1-7.

DOI | Bibtex | Abstract | PDF

@inproceedings{2014_Peretroukhin_Optimizing,
  abstract = {Visual Odometry (VO) is an integral part of many navigation techniques in mobile robotics. In this work, we investigate how the orientation of the camera affects the overall position estimates recovered from stereo VO. Through simulations and experimental work, we demonstrate that this error can be significantly reduced by changing the perspective of the stereo camera in relation to the moving platform. Specifically, we show that orienting the camera at an oblique angle to the direction of travel can reduce VO error by up to 82\% in simulations and up to 59\% in experimental data. A variety of parameters are investigated for their effects on this trend including frequency of captured images and camera resolution.},
  address = {Montr\'{e}al, Qu\'{e}bec, Canada},
  author = {Valentin Peretroukhin and Jonathan Kelly and Timothy D. Barfoot},
  booktitle = {Proceedings of the Canadian Conference on Computer and Robot Vision {(CRV)}},
  date = {2014-05-07/2014-05-09},
  doi = {10.1109/CRV.2014.9},
  month = {May 7--9},
  pages = {1--7},
  title = {Optimizing Camera Perspective for Stereo Visual Odometry},
  year = {2014}
}

Visual Odometry (VO) is an integral part of many navigation techniques in mobile robotics. In this work, we investigate how the orientation of the camera affects the overall position estimates recovered from stereo VO. Through simulations and experimental work, we demonstrate that this error can be significantly reduced by changing the perspective of the stereo camera in relation to the moving platform. Specifically, we show that orienting the camera at an oblique angle to the direction of travel can reduce VO error by up to 82\% in simulations and up to 59\% in experimental data. A variety of parameters are investigated for their effects on this trend including frequency of captured images and camera resolution.

An autonomous manipulation system based on force control and optimization

L. Righetti, M. Kalakrishnan, P. Pastor, J. Binney, J. Kelly, R. C. Voorhies, G. S. Sukhatme, and S. Schaal

Autonomous Robots, vol. 36, iss. 1–2, pp. 11-30, 2014.

DOI | Bibtex | Abstract

@article{2014_Righetti_Autonomous,
  abstract = {In this paper we present an architecture for autonomous manipulation. Our approach is based on the belief that contact interactions during manipulation should be exploited to improve dexterity and that optimizing motion plans is useful to create more robust and repeatable manipulation behaviors. We therefore propose an architecture where state of the art force/torque control and optimization-based motion planning are the core components of the system. We give a detailed description of the modules that constitute the complete system and discuss the challenges inherent to creat- ing such a system. We present experimental results for several grasping and manipulation tasks to demonstrate the performance and robustness of our approach.},
  author = {Ludovic Righetti and Mrinal Kalakrishnan and Peter Pastor and Jonathan Binney and Jonathan Kelly and Randolph C. Voorhies and Gaurav S. Sukhatme and Stefan Schaal},
  doi = {10.1007/s10514-013-9365-9},
  journal = {Autonomous Robots},
  month = {January},
  number = {1--2},
  pages = {11--30},
  title = {An autonomous manipulation system based on force control and optimization},
  volume = {36},
  year = {2014}
}

In this paper we present an architecture for autonomous manipulation. Our approach is based on the belief that contact interactions during manipulation should be exploited to improve dexterity and that optimizing motion plans is useful to create more robust and repeatable manipulation behaviors. We therefore propose an architecture where state of the art force/torque control and optimization-based motion planning are the core components of the system. We give a detailed description of the modules that constitute the complete system and discuss the challenges inherent to creat- ing such a system. We present experimental results for several grasping and manipulation tasks to demonstrate the performance and robustness of our approach.

2013

Editorial: Special Issue on Long-Term Autonomy

T. Barfoot, J. Kelly, and G. Sibley

The International Journal of Robotics Research, vol. 32, iss. 14, pp. 1609-1610, 2013.

DOI | Bibtex | PDF

@article{2013_Barfoot_Long-Term,
  author = {Tim Barfoot and Jonathan Kelly and Gabe Sibley},
  doi = {10.1177/0278364913511182},
  journal = {The International Journal of Robotics Research},
  month = {December},
  number = {14},
  pages = {1609--1610},
  title = {Editorial: Special Issue on Long-Term Autonomy},
  volume = {32},
  year = {2013}
}

Learning Task Error Models for Manipulation

P. Pastor, M. Kalakrishnan, J. Binney, J. Kelly, L. Righetti, G. S. Sukhatme, and S. Schaal

Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Karlsruhe, Germany, May 6–10, 2013, pp. 2612-2618.

DOI | Bibtex | Abstract

@inproceedings{2013_Pastor_Learning,
  abstract = {Precise kinematic forward models are important for robots to successfully perform dexterous grasping and manipulation tasks, especially when visual servoing is rendered infeasible due to occlusions. A lot of research has been conducted to estimate geometric and non-geometric parameters of kinematic chains to minimize reconstruction errors. However, kinematic chains can include non-linearities, e.g. due to cable stretch and motor-side encoders, that result in significantly different errors for different parts of the state space. Previous work either does not consider such non-linearities or proposes to estimate non-geometric parameters of carefully engineered models that are robot specific. We propose a data-driven approach that learns task error models that account for such unmodeled non-linearities. We argue that in the context of grasping and manipulation, it is sufficient to achieve high accuracy in the task relevant state space. We identify this relevant state space using previously executed joint configurations and learn error corrections for those. Therefore, our system is developed to generate subsequent executions that are similar to previous ones. The experiments show that our method successfully captures the non-linearities in the head kinematic chain (due to a counter-balancing spring) and the arm kinematic chains (due to cable stretch) of the considered experimental platform, see Fig. 1. The feasibility of the presented error learning approach has also been evaluated in independent DARPA ARM-S testing contributing to successfully complete 67 out of 72 grasping and manipulation tasks.},
  address = {Karlsruhe, Germany},
  author = {Peter Pastor and Mrinal Kalakrishnan and Jonathan Binney and Jonathan Kelly and Ludovic Righetti and Gaurav S. Sukhatme and Stefan Schaal},
  booktitle = {Proceedings of the {IEEE} International Conference on Robotics and Automation {(ICRA)}},
  date = {2013-05-06/2013-05-10},
  doi = {10.1109/ICRA.2013.6630935},
  month = {May 6--10},
  pages = {2612--2618},
  title = {Learning Task Error Models for Manipulation},
  year = {2013}
}

Precise kinematic forward models are important for robots to successfully perform dexterous grasping and manipulation tasks, especially when visual servoing is rendered infeasible due to occlusions. A lot of research has been conducted to estimate geometric and non-geometric parameters of kinematic chains to minimize reconstruction errors. However, kinematic chains can include non-linearities, e.g. due to cable stretch and motor-side encoders, that result in significantly different errors for different parts of the state space. Previous work either does not consider such non-linearities or proposes to estimate non-geometric parameters of carefully engineered models that are robot specific. We propose a data-driven approach that learns task error models that account for such unmodeled non-linearities. We argue that in the context of grasping and manipulation, it is sufficient to achieve high accuracy in the task relevant state space. We identify this relevant state space using previously executed joint configurations and learn error corrections for those. Therefore, our system is developed to generate subsequent executions that are similar to previous ones. The experiments show that our method successfully captures the non-linearities in the head kinematic chain (due to a counter-balancing spring) and the arm kinematic chains (due to cable stretch) of the considered experimental platform, see Fig. 1. The feasibility of the presented error learning approach has also been evaluated in independent DARPA ARM-S testing contributing to successfully complete 67 out of 72 grasping and manipulation tasks.

An Investigation on the Accuracy of Regional Ocean Models Through Field Trials

R. N. Smith, J. Kelly, K. Nazarzadeh, and G. S. Sukhatme

Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Karlsruhe, Germany, May 6–10, 2013, pp. 3436-3442.

DOI | Bibtex | Abstract

@inproceedings{2013_Smith_Investigation,
  abstract = {Recent efforts in mission planning for underwater vehicles have utilised predictive models to aid in navigation, optimal path planning and drive opportunistic sampling. Although these models provide information at a unprecedented resolutions and have proven to increase accuracy and effectiveness in multiple campaigns, most are deterministic in nature. Thus, predictions cannot be incorporated into probabilistic planning frameworks, nor do they provide any metric on the variance or confidence of the output variables. In this paper, we provide an initial investigation into determining the confidence of ocean model predictions based on the results of multiple field deployments of two autonomous underwater vehicles.

For multiple missions of two autonomous gliders conducted over a two-month period in 2011, we compare actual vehicle executions to simulations of the same missions through the Regional Ocean Modeling System in an ocean region off the coast of southern California. This comparison provides a qualitative analysis of the current velocity predictions for areas within the selected deployment region. Ultimately, we present a spatial heat-map of the correlation between the ocean model predictions and the actual mission executions. Knowing where the model provides unreliable predictions can be incorporated into planners to increase the utility and application of the deterministic estimations.},
  address = {Karlsruhe, Germany},
  author = {Ryan N. Smith and Jonathan Kelly and Kimia Nazarzadeh and Gaurav S. Sukhatme},
  booktitle = {Proceedings of the {IEEE} International Conference on Robotics and Automation {(ICRA)}},
  date = {2013-05-06/2013-05-10},
  doi = {10.1109/ICRA.2013.6631057},
  month = {May 6--10},
  pages = {3436--3442},
  title = {An Investigation on the Accuracy of Regional Ocean Models Through Field Trials},
  year = {2013}
}

Recent efforts in mission planning for underwater vehicles have utilised predictive models to aid in navigation, optimal path planning and drive opportunistic sampling. Although these models provide information at a unprecedented resolutions and have proven to increase accuracy and effectiveness in multiple campaigns, most are deterministic in nature. Thus, predictions cannot be incorporated into probabilistic planning frameworks, nor do they provide any metric on the variance or confidence of the output variables. In this paper, we provide an initial investigation into determining the confidence of ocean model predictions based on the results of multiple field deployments of two autonomous underwater vehicles. For multiple missions of two autonomous gliders conducted over a two-month period in 2011, we compare actual vehicle executions to simulations of the same missions through the Regional Ocean Modeling System in an ocean region off the coast of southern California. This comparison provides a qualitative analysis of the current velocity predictions for areas within the selected deployment region. Ultimately, we present a spatial heat-map of the correlation between the ocean model predictions and the actual mission executions. Knowing where the model provides unreliable predictions can be incorporated into planners to increase the utility and application of the deterministic estimations.

CELLO: A Fast Algorithm for Covariance Estimation

W. Vega-Brown, A. Bachrach, A. Bry, J. Kelly, and N. Roy

Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Karlsruhe, Germany, May 6–10, 2013, pp. 3160-3167.

DOI | Bibtex | Abstract

@inproceedings{2013_Vega-Brown_CELLO,
  abstract = {We present CELLO (Covariance Estimation and Learning through Likelihood Optimization), an algorithm for predicting the covariances of measurements based on any available informative features. This algorithm is intended to improve the accuracy and reliability of on-line state estimation by providing a principled way to extend the conventional fixed-covariance Gaussian measurement model. We show that in ex- periments, CELLO learns to predict measurement covariances that agree with empirical covariances obtained by manually annotating sensor regimes. We also show that using the learned covariances during filtering provides substantial quantitative improvement to the overall state estimate.},
  address = {Karlsruhe, Germany},
  author = {William Vega-Brown and Abraham Bachrach and Adam Bry and Jonathan Kelly and Nicholas Roy},
  booktitle = {Proceedings of the {IEEE} International Conference on Robotics and Automation {(ICRA)}},
  date = {2013-05-06/2013-05-10},
  doi = {10.1109/ICRA.2013.6631017},
  month = {May 6--10},
  pages = {3160--3167},
  title = {{CELLO}: A Fast Algorithm for Covariance Estimation},
  year = {2013}
}

We present CELLO (Covariance Estimation and Learning through Likelihood Optimization), an algorithm for predicting the covariances of measurements based on any available informative features. This algorithm is intended to improve the accuracy and reliability of on-line state estimation by providing a principled way to extend the conventional fixed-covariance Gaussian measurement model. We show that in ex- periments, CELLO learns to predict measurement covariances that agree with empirical covariances obtained by manually annotating sensor regimes. We also show that using the learned covariances during filtering provides substantial quantitative improvement to the overall state estimate.

2012

Taking the Long View: A Report on Two Recent Workshops on Long-Term Autonomy

J. Kelly, G. Sibley, T. Barfoot, and P. Newman

IEEE Robotics & Automation Magazine, vol. 19, iss. 1, pp. 109-111, 2012.

DOI | Bibtex | PDF

@article{2012_Kelly_Taking,
  author = {Jonathan Kelly and Gabe Sibley and Tim Barfoot and Paul Newman},
  doi = {10.1109/MRA.2011.2181792},
  journal = {{IEEE} Robotics \& Automation Magazine},
  month = {March},
  number = {1},
  pages = {109--111},
  title = {Taking the Long View: A Report on Two Recent Workshops on Long-Term Autonomy},
  volume = {19},
  year = {2012}
}

Towards Improving Mission Execution for Autonomous Gliders with an Ocean Model and Kalman Filter

R. N. Smith, J. Kelly, and G. S. Sukhatme

Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Saint Paul, Minnesota, USA, May 14–18, 2012, pp. 4870-4877.

DOI | Bibtex | Abstract | PDF

@inproceedings{2012_Smith_Towards,
  abstract = {Effective execution of a planned path by an underwater vehicle is important for proper analysis of the gathered science data, as well as to ensure the safety of the vehicle during the mission. Here, we propose the use of an unscented Kalman filter to aid in determining how the planned mission is executed. Given a set of waypoints that define a planned path and a dicretization of the ocean currents from a regional ocean model, we present an approach to determine the time interval at which the glider should surface to maintain a prescribed tracking error, while also limiting its time on the ocean surface. We assume practical mission parameters provided from previous field trials for the problem set up, and provide the simulated results of the Kalman filter mission planning approach. The results are initially compared to data from prior field experiments in which an autonomous glider executed the same path without pre-planning. Then, the results are validated through field trials with multiple autonomous gliders implementing different surfacing intervals simultaneously while following the same path.},
  address = {Saint Paul, Minnesota, USA},
  author = {Ryan N. Smith and Jonathan Kelly and Gaurav S. Sukhatme},
  booktitle = {Proceedings of the {IEEE} International Conference on Robotics and Automation {(ICRA)}},
  date = {2012-05-14/2012-05-18},
  doi = {10.1109/ICRA.2012.6224609},
  month = {May 14--18},
  pages = {4870--4877},
  title = {Towards Improving Mission Execution for Autonomous Gliders with an Ocean Model and Kalman Filter},
  year = {2012}
}

Effective execution of a planned path by an underwater vehicle is important for proper analysis of the gathered science data, as well as to ensure the safety of the vehicle during the mission. Here, we propose the use of an unscented Kalman filter to aid in determining how the planned mission is executed. Given a set of waypoints that define a planned path and a dicretization of the ocean currents from a regional ocean model, we present an approach to determine the time interval at which the glider should surface to maintain a prescribed tracking error, while also limiting its time on the ocean surface. We assume practical mission parameters provided from previous field trials for the problem set up, and provide the simulated results of the Kalman filter mission planning approach. The results are initially compared to data from prior field experiments in which an autonomous glider executed the same path without pre-planning. Then, the results are validated through field trials with multiple autonomous gliders implementing different surfacing intervals simultaneously while following the same path.

Autonomous Mapping of Factory Floors Using a Quadrotor MAV

W. Vega-Brown, J. Kelly, A. Bachrach, A. Bry, S. Prentice, and N. Roy

Proceedings of Robotics: Science and Systems (RSS) Workshop on Integration of Perception with Control and Navigation for Resource-Limited, Highly Dynamic, Autonomous Systems, Sydney, Australia, Jul. 9–10, 2012.

Bibtex | Abstract | PDF

@inproceedings{2012_Vega-Brown_Autonomous,
  abstract = {We are developing a quadrotor-based system for autonomous mapping of factory floors. Information from a monocular camera, a laser rangefinder, and an IMU on-board the vehicle is fused to generate a 3D point cloud and a 2D image mosaic. These data products can then be used by the factory operators for logistics planning, equipment management, and related tasks.},
  address = {Sydney, Australia},
  author = {William Vega-Brown and Jonathan Kelly and Abraham Bachrach and Adam Bry and Samuel Prentice and Nick Roy},
  booktitle = {Proceedings of Robotics: Science and Systems {(RSS)} Workshop on Integration of Perception with Control and Navigation for Resource-Limited, Highly Dynamic, Autonomous Systems},
  date = {2012-07-09/2012-07-10},
  month = {Jul. 9--10},
  title = {Autonomous Mapping of Factory Floors Using a Quadrotor {MAV}},
  year = {2012}
}

We are developing a quadrotor-based system for autonomous mapping of factory floors. Information from a monocular camera, a laser rangefinder, and an IMU on-board the vehicle is fused to generate a 3D point cloud and a 2D image mosaic. These data products can then be used by the factory operators for logistics planning, equipment management, and related tasks.

2011

Simultaneous Mapping and Stereo Extrinsic Parameter Calibration Using GPS Measurements

J. Kelly, L. H. Matthies, and G. S. Sukhatme

Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China, May 9–13, 2011, pp. 279-286.

DOI | Bibtex | Abstract | PDF

@inproceedings{2011_Kelly_Simultaneous,
  abstract = {Stereo vision is useful for a variety of robotics tasks, such as navigation and obstacle avoidance. However, recovery of valid range data from stereo depends on accurate calibration of the extrinsic parameters of the stereo rig, i.e., the 6-DOF transform between the left and right cameras. Stereo self-calibration is possible, but, without additional information, the absolute scale of the stereo baseline cannot be determined. In this paper, we formulate stereo extrinsic parameter calibration as a batch maximum likelihood estimation problem, and use GPS measurements to establish the scale of both the scene and the stereo baseline. Our approach is similar to photogrammetric bundle adjustment, and closely related to many structure from motion algorithms. We present results from simulation experiments using a range of GPS accuracy levels; these accuracies are achievable by varying grades of commercially-available receivers. We then validate the algorithm using stereo and GPS data acquired from a moving vehicle. Our results indicate that the approach is promising.},
  address = {Shanghai, China},
  author = {Jonathan Kelly and Larry H. Matthies and Gaurav S. Sukhatme},
  booktitle = {Proceedings of the {IEEE} International Conference on Robotics and Automation {(ICRA)}},
  date = {2011-05-09/2011-05-13},
  doi = {10.1109/ICRA.2011.5980443},
  month = {May 9--13},
  pages = {279--286},
  title = {Simultaneous Mapping and Stereo Extrinsic Parameter Calibration Using {GPS} Measurements},
  year = {2011}
}

Stereo vision is useful for a variety of robotics tasks, such as navigation and obstacle avoidance. However, recovery of valid range data from stereo depends on accurate calibration of the extrinsic parameters of the stereo rig, i.e., the 6-DOF transform between the left and right cameras. Stereo self-calibration is possible, but, without additional information, the absolute scale of the stereo baseline cannot be determined. In this paper, we formulate stereo extrinsic parameter calibration as a batch maximum likelihood estimation problem, and use GPS measurements to establish the scale of both the scene and the stereo baseline. Our approach is similar to photogrammetric bundle adjustment, and closely related to many structure from motion algorithms. We present results from simulation experiments using a range of GPS accuracy levels; these accuracies are achievable by varying grades of commercially-available receivers. We then validate the algorithm using stereo and GPS data acquired from a moving vehicle. Our results indicate that the approach is promising.

On Temporal and Spatial Calibration for High Accuracy Visual-Inertial Motion Estimation

J. Kelly

PhD Thesis , University of Southern California, Los Angeles, California, USA, 2011.

Bibtex | Abstract

@phdthesis{2011_Kelly_Temporal,
  abstract = {The majority of future autonomous robots will be mobile, and will need to navigate reliably in unknown and dynamic environments.  Visual and inertial sensors, together, are able to supply accurate motion estimates and are well-suited for use in many robot navigation tasks. Beyond egomotion estimation, fusing high-rate inertial sensing with monocular vision enables other capabilities, such as independent motion segmentation and tracking, moving obstacle detection and ranging, and dense metric 3D mapping, all from a mobile platform.

A fundamental requirement in any multisensor system is precision calibration. To ensure optimal performance, the sensors must be properly calibrated, both intrinsically and relative to one another. In a visual-inertial system, the camera and the inertial measurement unit (IMU) require both temporal and spatial calibration --- estimates of the relative timing of measurements from each sensor and of the six degrees-of-freedom transform between the sensors are needed. Obtaining this calibration information is typically difficult and time-consuming, however. Ideally, we would like to build power-on-and-go robots that are able to operate for long periods without the usual requisite  manual sensor (re-) calibration.

This dissertation describes work on combining visual and inertial sensing for navigation applications, with an emphasis on the ability to temporally and spatially (self-) calibrate a camera and an IMU. Self-calibration refers to the use of data exclusively from the sensors themselves to improve estimates of related system parameters.

The primary difficultly in temporal calibration is that the correspondences between measurements from the different sensors are initially unknown, and hence the relative time delay between the data streams cannot be computed directly. We instead formulate temporal calibration as a registration problem, and introduce an algorithm called Time Delay Iterative Closest Point (TD-ICP) as a novel solution.  The algorithm operates by aligning curves in a three-dimensional orientation space, and incorporates in a principled way the uncertainty in the camera and IMU measurements.

We then develop a sequential filtering approach for calibration of the spatial transform between the sensors. We estimate the transform parameters using a sigma point Kalman filter (SPKF). Our formulation rests on a differential geometric analysis of the observability of the camera-IMU system; this analysis shows for the first time that the IMU pose and velocity, the gyroscope and accelerometer biases, the gravity vector, the metric scene structure, and the sensor-to-sensor transform, can be recovered from camera and IMU measurements alone. While calibrating the transform we simultaneously localize the IMU and build a map of the surroundings. No additional hardware or prior knowledge about the environment in which a robot is operating is necessary.

Results from extensive simulation studies and from laboratory experiments are presented, which demonstrate accurate camera-IMU temporal and spatial calibration. Further, our results indicate that calibration substantially improves motion estimates, and that the local scene structure can be recovered with high fidelity.

Together, these contributions represent a step towards developing fully autonomous robotic systems that are capable of long-term operation without the need for manual calibration.},
  address = {Los Angeles, California, USA},
  author = {Jonathan Kelly},
  institution = {University of Southern California},
  month = {December},
  school = {University of Southern California},
  title = {On Temporal and Spatial Calibration for High Accuracy Visual-Inertial Motion Estimation},
  year = {2011}
}

The majority of future autonomous robots will be mobile, and will need to navigate reliably in unknown and dynamic environments.  Visual and inertial sensors, together, are able to supply accurate motion estimates and are well-suited for use in many robot navigation tasks. Beyond egomotion estimation, fusing high-rate inertial sensing with monocular vision enables other capabilities, such as independent motion segmentation and tracking, moving obstacle detection and ranging, and dense metric 3D mapping, all from a mobile platform. A fundamental requirement in any multisensor system is precision calibration. To ensure optimal performance, the sensors must be properly calibrated, both intrinsically and relative to one another. In a visual-inertial system, the camera and the inertial measurement unit (IMU) require both temporal and spatial calibration --- estimates of the relative timing of measurements from each sensor and of the six degrees-of-freedom transform between the sensors are needed. Obtaining this calibration information is typically difficult and time-consuming, however. Ideally, we would like to build power-on-and-go robots that are able to operate for long periods without the usual requisite  manual sensor (re-) calibration. This dissertation describes work on combining visual and inertial sensing for navigation applications, with an emphasis on the ability to temporally and spatially (self-) calibrate a camera and an IMU. Self-calibration refers to the use of data exclusively from the sensors themselves to improve estimates of related system parameters. The primary difficultly in temporal calibration is that the correspondences between measurements from the different sensors are initially unknown, and hence the relative time delay between the data streams cannot be computed directly. We instead formulate temporal calibration as a registration problem, and introduce an algorithm called Time Delay Iterative Closest Point (TD-ICP) as a novel solution.  The algorithm operates by aligning curves in a three-dimensional orientation space, and incorporates in a principled way the uncertainty in the camera and IMU measurements. We then develop a sequential filtering approach for calibration of the spatial transform between the sensors. We estimate the transform parameters using a sigma point Kalman filter (SPKF). Our formulation rests on a differential geometric analysis of the observability of the camera-IMU system; this analysis shows for the first time that the IMU pose and velocity, the gyroscope and accelerometer biases, the gravity vector, the metric scene structure, and the sensor-to-sensor transform, can be recovered from camera and IMU measurements alone. While calibrating the transform we simultaneously localize the IMU and build a map of the surroundings. No additional hardware or prior knowledge about the environment in which a robot is operating is necessary. Results from extensive simulation studies and from laboratory experiments are presented, which demonstrate accurate camera-IMU temporal and spatial calibration. Further, our results indicate that calibration substantially improves motion estimates, and that the local scene structure can be recovered with high fidelity. Together, these contributions represent a step towards developing fully autonomous robotic systems that are capable of long-term operation without the need for manual calibration.

Visual-Inertial Sensor Fusion: Localization, Mapping and Sensor-to-Sensor Self-Calibration

J. Kelly and G. S. Sukhatme

The International Journal of Robotics Research, vol. 30, iss. 1, pp. 56-79, 2011.

DOI | Bibtex | Abstract

@article{2011_Kelly_Visual,
  abstract = {Visual and inertial sensors, in combination, are able to provide accurate motion estimates and are well suited for use in many robot navigation tasks. However, correct data fusion, and hence overall performance, depends on careful calibration of the rigid body transform between the sensors. Obtaining this calibration information is typically difficult and time-consuming, and normally requires additional equipment. In this paper we describe an algorithm, based on the unscented Kalman filter, for self-calibration of the transform between a camera and an inertial measurement unit (IMU). Our formulation rests on a differential geometric analysis of the observability of the camera--IMU system; this analysis shows that the sensor-to-sensor transform, the IMU gyroscope and accelerometer biases, the local gravity vector, and the metric scene structure can be recovered from camera and IMU measurements alone. While calibrating the transform we simultaneously localize the IMU and build a map of the surroundings, all without additional hardware or prior knowledge about the environment in which a robot is operating. We present results from simulation studies and from experiments with a monocular camera and a low-cost IMU, which demonstrate accurate estimation of both the calibration parameters and the local scene structure.},
  author = {Jonathan Kelly and Gaurav S. Sukhatme},
  doi = {10.1177/0278364910382802},
  journal = {The International Journal of Robotics Research},
  month = {January},
  number = {1},
  pages = {56--79},
  title = {Visual-Inertial Sensor Fusion: Localization, Mapping and Sensor-to-Sensor Self-Calibration},
  volume = {30},
  year = {2011}
}

Visual and inertial sensors, in combination, are able to provide accurate motion estimates and are well suited for use in many robot navigation tasks. However, correct data fusion, and hence overall performance, depends on careful calibration of the rigid body transform between the sensors. Obtaining this calibration information is typically difficult and time-consuming, and normally requires additional equipment. In this paper we describe an algorithm, based on the unscented Kalman filter, for self-calibration of the transform between a camera and an inertial measurement unit (IMU). Our formulation rests on a differential geometric analysis of the observability of the camera--IMU system; this analysis shows that the sensor-to-sensor transform, the IMU gyroscope and accelerometer biases, the local gravity vector, and the metric scene structure can be recovered from camera and IMU measurements alone. While calibrating the transform we simultaneously localize the IMU and build a map of the surroundings, all without additional hardware or prior knowledge about the environment in which a robot is operating. We present results from simulation studies and from experiments with a monocular camera and a low-cost IMU, which demonstrate accurate estimation of both the calibration parameters and the local scene structure.

2010

Self-Calibration of Inertial and Omnidirectional Visual Sensors for Navigation and Mapping

J. Kelly and G. S. Sukhatme

Proceedings of the IEEE International Conference on Robotics and Automation Workshop on Omnidirectional Robot Vision, Anchorage, Alaska, USA, May 7, 2010, pp. 1-6.

Bibtex | Abstract | PDF

@inproceedings{2010_Kelly_Self,
  abstract = {Omnidirectional cameras are versatile sensors that are able to provide a full 360-degree view of the environment. When combined with inertial sensing, omnidirectional vision offers a potentially robust navigation solution. However, to correctly fuse the data from an omnidirectional camera and an inertial measurement unit (IMU) into a single navigation frame, the 6-DOF transform between the sensors must be accurately known. In this paper we describe an algorithm, based on the unscented Kalman filter, for self-calibration of the transform between an omnidirectional camera and an IMU. We show that the IMU biases, the local gravity vector, and the metric scene structure can also be recovered from camera and IMU measurements. Further, our approach does not require any additional hardware or prior knowledge about the environment in which a robot is operating. We present results from calibration experiments with an omnidirectional camera and a low-cost IMU, which demonstrate accurate self- calibration of the 6-DOF sensor-to-sensor transform.},
  address = {Anchorage, Alaska, USA},
  author = {Jonathan Kelly and Gaurav S. Sukhatme},
  booktitle = {Proceedings of the {IEEE} International Conference on Robotics and Automation Workshop on Omnidirectional Robot Vision},
  date = {2010-05-07},
  month = {May 7},
  pages = {1--6},
  title = {Self-Calibration of Inertial and Omnidirectional Visual Sensors for Navigation and Mapping},
  year = {2010}
}

Omnidirectional cameras are versatile sensors that are able to provide a full 360-degree view of the environment. When combined with inertial sensing, omnidirectional vision offers a potentially robust navigation solution. However, to correctly fuse the data from an omnidirectional camera and an inertial measurement unit (IMU) into a single navigation frame, the 6-DOF transform between the sensors must be accurately known. In this paper we describe an algorithm, based on the unscented Kalman filter, for self-calibration of the transform between an omnidirectional camera and an IMU. We show that the IMU biases, the local gravity vector, and the metric scene structure can also be recovered from camera and IMU measurements. Further, our approach does not require any additional hardware or prior knowledge about the environment in which a robot is operating. We present results from calibration experiments with an omnidirectional camera and a low-cost IMU, which demonstrate accurate self- calibration of the 6-DOF sensor-to-sensor transform.

Towards the Improvement of Autonomous Glider Navigational Accuracy Through the Use of Regional Ocean Models

R. N. Smith, J. Kelly, Y. Chao, B. H. Jones, and G. S. Sukhatme

Proceedings of the ASME 29th International Conference on Ocean, Offshore and Arctic Engineering (OMAE), Shanghai, China, Jun. 6–11, 2010, pp. 597-606.

DOI | Bibtex | Abstract | PDF

@inproceedings{2010_Smith_Towards,
  abstract = {Autonomous underwater gliders are robust and widely-used ocean sampling platforms that are characterized by their endurance, and are one of the best approaches to gather subsurface data at the appropriate spatial resolution to advance our knowledge of the ocean environment. Gliders generally do not employ sophisticated sensors for underwater localization, but instead dead-reckon between set waypoints. Thus, these vehicles are subject to large positional errors between prescribed and actual surfacing locations. Here, we investigate the implementation of a large-scale, regional ocean model into the trajectory design for autonomous gliders to improve their navigational accuracy. We compute the dead-reckoning error for our Slocum gliders, and compare this to the average positional error recorded from multiple deployments conducted over the past year. We then compare trajectory plans computed on-board the vehicle during recent deployments to our prediction-based trajectory plans for 140 surfacing occurrences.},
  address = {Shanghai, China},
  author = {Ryan N. Smith and Jonathan Kelly and Yi Chao and Burton H. Jones and Gaurav S. Sukhatme},
  booktitle = {Proceedings of the {ASME} 29th International Conference on Ocean, Offshore and Arctic Engineering {(OMAE)}},
  date = {2010-06-06/2010-06-11},
  doi = {10.1115/OMAE2010-21015},
  month = {Jun. 6--11},
  pages = {597--606},
  title = {Towards the Improvement of Autonomous Glider Navigational Accuracy Through the Use of Regional Ocean Models},
  year = {2010}
}

Autonomous underwater gliders are robust and widely-used ocean sampling platforms that are characterized by their endurance, and are one of the best approaches to gather subsurface data at the appropriate spatial resolution to advance our knowledge of the ocean environment. Gliders generally do not employ sophisticated sensors for underwater localization, but instead dead-reckon between set waypoints. Thus, these vehicles are subject to large positional errors between prescribed and actual surfacing locations. Here, we investigate the implementation of a large-scale, regional ocean model into the trajectory design for autonomous gliders to improve their navigational accuracy. We compute the dead-reckoning error for our Slocum gliders, and compare this to the average positional error recorded from multiple deployments conducted over the past year. We then compare trajectory plans computed on-board the vehicle during recent deployments to our prediction-based trajectory plans for 140 surfacing occurrences.

2009

A survey and evaluation of promising approaches for automatic image-based defect detection of bridge structures

M. R. Jahanshahi, J. Kelly, S. F. Masri, and G. S. Sukhatme

Structure and Infrastructure Engineering, vol. 5, iss. 6, pp. 455-486, 2009.

DOI | Bibtex | Abstract | PDF

@article{2009_Jahanshahi_Survey,
  abstract = {Automatic health monitoring and maintenance of civil infrastructure systems is a challenging area of research. Nondestructive evaluation techniques, such as digital image processing, are innovative approaches for structural health monitoring. Current structure inspection standards require an inspector to travel to the structure site and visually assess the structure conditions. A less time consuming and inexpensive alternative to current monitoring methods is to use a robotic system that could inspect structures more frequently. Among several possible techniques is the use of optical instrumentation (e.g. digital cameras) that relies on image processing. The feasibility of using image processing techniques to detect deterioration in structures has been acknowledged by leading experts in the field. A survey and evaluation of relevant studies that appear promising and practical for this purpose is presented in this study. Several image processing techniques, including enhancement, noise removal, registration, edge detection, line detection, morphological functions, colour analysis, texture detection, wavelet transform, segmentation, clustering and pattern recognition, are key pieces that could be merged to solve this problem. Missing or deformed structural members, cracks and corrosion are main deterioration measures that are found in structures, and they are the main examples of structural deterioration considered here. This paper provides a survey and an evaluation of some of the promising vision-based approaches for automatic detection of missing (deformed) structural members, cracks and corrosion in civil infrastructure systems. Several examples (based on laboratory studies by the authors) are presented in the paper to illustrate the utility, as well as the limitations, of the leading approaches.},
  author = {Mohammad R. Jahanshahi and Jonathan Kelly and Sami F. Masri and Gaurav S. Sukhatme},
  doi = {10.1080/15732470801945930},
  journal = {Structure and Infrastructure Engineering},
  month = {December},
  number = {6},
  pages = {455--486},
  title = {A survey and evaluation of promising approaches for automatic image-based defect detection of bridge structures},
  volume = {5},
  year = {2009}
}

Automatic health monitoring and maintenance of civil infrastructure systems is a challenging area of research. Nondestructive evaluation techniques, such as digital image processing, are innovative approaches for structural health monitoring. Current structure inspection standards require an inspector to travel to the structure site and visually assess the structure conditions. A less time consuming and inexpensive alternative to current monitoring methods is to use a robotic system that could inspect structures more frequently. Among several possible techniques is the use of optical instrumentation (e.g. digital cameras) that relies on image processing. The feasibility of using image processing techniques to detect deterioration in structures has been acknowledged by leading experts in the field. A survey and evaluation of relevant studies that appear promising and practical for this purpose is presented in this study. Several image processing techniques, including enhancement, noise removal, registration, edge detection, line detection, morphological functions, colour analysis, texture detection, wavelet transform, segmentation, clustering and pattern recognition, are key pieces that could be merged to solve this problem. Missing or deformed structural members, cracks and corrosion are main deterioration measures that are found in structures, and they are the main examples of structural deterioration considered here. This paper provides a survey and an evaluation of some of the promising vision-based approaches for automatic detection of missing (deformed) structural members, cracks and corrosion in civil infrastructure systems. Several examples (based on laboratory studies by the authors) are presented in the paper to illustrate the utility, as well as the limitations, of the leading approaches.

Coordinated Three-Dimensional Robotic Self-Assembly

J. Kelly and H. Zhang

Proceedings of the IEEE International Conference on Robotics and Biomimetics (ROBIO), Bangkok, Thailand, Feb. 21–26, 2009, pp. 172-178.

DOI | Bibtex | Abstract | PDF

@inproceedings{2009_Kelly_Coordinated,
  abstract = {Nature has demonstrated that geometrically interesting and functionally useful structures can be built in an entirely distributed fashion. We present a biologically-inspired model and several algorithms for three-dimensional self-assembly, suitable for implementation by very simple reactive robots. The robots, which we call assembly components, have limited local sensing capabilities and operate without centralized control. We consider the problem of maintaining coordination of the assembly process over time, and introduce the concept of an assembly ordering to describe constraints on the sequence in which components may attach to a growing structure. We prove the sufficient properties of such an ordering to guarantee production of a desired end result. The set of ordering constraints can be expressed as a directed acyclic graph; we develop a graph algorithm that is able to generate ordering constraints for a wide variety of structures. We then give a procedure for encoding the graph in a set of local assembly rules. Finally, we show that our previous results for the optimization of rule sets for two-dimensional structures can be readily extended to three dimensions.},
  address = {Bangkok, Thailand},
  author = {Jonathan Kelly and Hong Zhang},
  booktitle = {Proceedings of the {IEEE} International Conference on Robotics and Biomimetics {(ROBIO)}},
  date = {2009-02-21/2009-02-26},
  doi = {10.1109/ROBIO.2009.4912999},
  month = {Feb. 21--26},
  note = {Best Student Paper Award},
  pages = {172--178},
  title = {Coordinated Three-Dimensional Robotic Self-Assembly},
  year = {2009}
}

Nature has demonstrated that geometrically interesting and functionally useful structures can be built in an entirely distributed fashion. We present a biologically-inspired model and several algorithms for three-dimensional self-assembly, suitable for implementation by very simple reactive robots. The robots, which we call assembly components, have limited local sensing capabilities and operate without centralized control. We consider the problem of maintaining coordination of the assembly process over time, and introduce the concept of an assembly ordering to describe constraints on the sequence in which components may attach to a growing structure. We prove the sufficient properties of such an ordering to guarantee production of a desired end result. The set of ordering constraints can be expressed as a directed acyclic graph; we develop a graph algorithm that is able to generate ordering constraints for a wide variety of structures. We then give a procedure for encoding the graph in a set of local assembly rules. Finally, we show that our previous results for the optimization of rule sets for two-dimensional structures can be readily extended to three dimensions.

Best Student Paper Award

Fast Relative Pose Calibration for Visual and Inertial Sensors

J. Kelly and G. S. Sukhatme

in Experimental Robotics: The Eleventh International Symposium , O. Khatib, V. Kumar, and G. J. Pappas, Eds., Berlin, Heidelberg: Springer, 2009, vol. 54, pp. 515-524.

DOI | Bibtex | Abstract | PDF

@incollection{2009_Kelly_Fast,
  abstract = {Accurate vision-aided inertial navigation depends on proper calibration of the relative pose of the camera and the inertial measurement unit (IMU). Calibration errors introduce bias in the overall motion estimate, degrading navigation performance - sometimes dramatically. However, existing camera-IMU calibration techniques are difficult, time-consuming and often require additional complex apparatus. In this paper, we formulate the camera-IMU relative pose calibration problem in a filtering framework, and propose a calibration algorithm which requires only a planar camera calibration target. The algorithm uses an unscented Kalman filter to estimate the pose of the IMU in a global reference frame and the 6-DoF transform between the camera and the IMU. Results from simulations and experiments with a low-cost solid-state IMU demonstrate the accuracy of the approach.},
  address = {Berlin, Heidelberg},
  author = {Jonathan Kelly and Gaurav S. Sukhatme},
  booktitle = {Experimental Robotics: The Eleventh International Symposium},
  doi = {10.1007/978-3-642-00196-3_59},
  editor = {Oussama Khatib and Vijay Kumar and George J. Pappas},
  isbn = {978-3-642-00195-6},
  pages = {515--524},
  publisher = {Springer},
  series = {Springer Tracts in Advanced Robotics},
  title = {Fast Relative Pose Calibration for Visual and Inertial Sensors},
  volume = {54},
  year = {2009}
}

Accurate vision-aided inertial navigation depends on proper calibration of the relative pose of the camera and the inertial measurement unit (IMU). Calibration errors introduce bias in the overall motion estimate, degrading navigation performance - sometimes dramatically. However, existing camera-IMU calibration techniques are difficult, time-consuming and often require additional complex apparatus. In this paper, we formulate the camera-IMU relative pose calibration problem in a filtering framework, and propose a calibration algorithm which requires only a planar camera calibration target. The algorithm uses an unscented Kalman filter to estimate the pose of the IMU in a global reference frame and the 6-DoF transform between the camera and the IMU. Results from simulations and experiments with a low-cost solid-state IMU demonstrate the accuracy of the approach.

Visual-Inertial Simultaneous Localization, Mapping and Sensor-to-Sensor Self-Calibration

J. Kelly and G. S. Sukhatme

Proceedings of the IEEE International Symposium on Computational Intelligence in Robotics and Automation (CIRA), Daejeon, Korea, Dec. 15–18, 2009, pp. 360-368.

DOI | Bibtex | Abstract | PDF

@inproceedings{2009_Kelly_Visual,
  abstract = {Visual and inertial sensors, in combination, are well-suited for many robot navigation and mapping tasks. However, correct data fusion, and hence overall system performance, depends on accurate calibration of the 6-DOF transform between the sensors (one or more camera(s) and an inertial measurement unit). Obtaining this calibration information is typically difficult and time-consuming. In this paper, we describe an algorithm, based on the unscented Kalman filter (UKF), for camera-IMU simultaneous localization, mapping and sensor relative pose self-calibration. We show that the sensor-to-sensor transform, the IMU gyroscope and accelerometer biases, the local gravity vector, and the metric scene structure can all be recovered from camera and IMU measurements alone. This is possible without any prior knowledge about the environment in which the robot is operating. We present results from experiments with a monocular camera and a low-cost solid-state IMU, which demonstrate accurate estimation of the calibration parameters and the local scene structure.},
  address = {Daejeon, Korea},
  author = {Jonathan Kelly and Gaurav S. Sukhatme},
  booktitle = {Proceedings of the {IEEE} International Symposium on Computational Intelligence in Robotics and Automation {(CIRA)}},
  date = {2009-12-15/2009-12-18},
  doi = {10.1109/CIRA.2009.5423178},
  month = {Dec. 15--18},
  pages = {360--368},
  rating = {0},
  title = {Visual-Inertial Simultaneous Localization, Mapping and Sensor-to-Sensor Self-Calibration},
  year = {2009}
}

Visual and inertial sensors, in combination, are well-suited for many robot navigation and mapping tasks. However, correct data fusion, and hence overall system performance, depends on accurate calibration of the 6-DOF transform between the sensors (one or more camera(s) and an inertial measurement unit). Obtaining this calibration information is typically difficult and time-consuming. In this paper, we describe an algorithm, based on the unscented Kalman filter (UKF), for camera-IMU simultaneous localization, mapping and sensor relative pose self-calibration. We show that the sensor-to-sensor transform, the IMU gyroscope and accelerometer biases, the local gravity vector, and the metric scene structure can all be recovered from camera and IMU measurements alone. This is possible without any prior knowledge about the environment in which the robot is operating. We present results from experiments with a monocular camera and a low-cost solid-state IMU, which demonstrate accurate estimation of the calibration parameters and the local scene structure.

2008

Algorithmic Distributed Assembly

J. Kelly

Master Thesis , University of Alberta, Edmonton, Alberta, Canada, 2008.

Bibtex | Abstract | PDF

@mastersthesis{2008_Kelly_Algorithmic,
  abstract = {This thesis describes a model for planar distributed assembly, in which unit-square assembly components move randomly and independently on a two-dimensional grid, binding together to form a desired target structure. The components are simple reactive agents, with limited capabilities including short-range sensing and rule-based control only, and operate in an entirely decentralized manner.

Using the model, we investigate two primary issues, coordination and sensing, from an algorithmic perspective. Our goal is to determine how a group of components can be reliably programmed to produce a global result (structure) from purely local interactions.  Towards this end, we define the local spatiotemporal ordering constraints that must be satisfied for assembly to progress in a coordinated manner, and give a procedure for encoding these constraints in a rule set. When executed by the components, this rule set is guaranteed to produce the target structure, despite the random actions of group members. We then introduce an optimization algorithm which is able to significantly reduce the number of distinct environmental states that components must recognize in order to assemble into a structure. Experiments show that our optimization algorithm outperforms existing approaches.},
  address = {Edmonton, Alberta, Canada},
  author = {Jonathan Kelly},
  institution = {University of Alberta},
  month = {April},
  school = {University of Alberta},
  title = {Algorithmic Distributed Assembly},
  year = {2008}
}

This thesis describes a model for planar distributed assembly, in which unit-square assembly components move randomly and independently on a two-dimensional grid, binding together to form a desired target structure. The components are simple reactive agents, with limited capabilities including short-range sensing and rule-based control only, and operate in an entirely decentralized manner. Using the model, we investigate two primary issues, coordination and sensing, from an algorithmic perspective. Our goal is to determine how a group of components can be reliably programmed to produce a global result (structure) from purely local interactions.  Towards this end, we define the local spatiotemporal ordering constraints that must be satisfied for assembly to progress in a coordinated manner, and give a procedure for encoding these constraints in a rule set. When executed by the components, this rule set is guaranteed to produce the target structure, despite the random actions of group members. We then introduce an optimization algorithm which is able to significantly reduce the number of distinct environmental states that components must recognize in order to assemble into a structure. Experiments show that our optimization algorithm outperforms existing approaches.

Combined Visual and Inertial Navigation for an Unmanned Aerial Vehicle

J. Kelly, S. Saripalli, and G. S. Sukhatme

in Field and Service Robotics: Results of the 6th International Conference , C. Laugier and R. Siegwart, Eds., Berlin Heidelberg: Springer, 2008, vol. 42/2008, pp. 255-264.

DOI | Bibtex | Abstract | PDF

@incollection{2008_Kelly_Combined,
  abstract = {We describe an UAV navigation system which combines stereo visual odometry with inertial measurements from an IMU. Our approach fuses the motion estimates from both sensors in an extended Kalman filter to determine vehicle position and attitude. We present results using data from a robotic helicopter, in which the visual and inertial system produced a final position estimate within 1\% of the measured GPS position, over a flight distance of more than 400 meters. Our results show that the combination of visual and inertial sensing reduced overall positioning error by nearly an order of magnitude compared to visual odometry alone.},
  address = {Berlin Heidelberg},
  author = {Jonathan Kelly and Srikanth Saripalli and Gaurav S. Sukhatme},
  booktitle = {Field and Service Robotics: Results of the 6th International Conference},
  doi = {10.1007/978-3-540-75404-6_24},
  editor = {Christian Laugier and Roland Siegwart},
  pages = {255--264},
  publisher = {Springer},
  series = {Springer Tracts in Advanced Robotics},
  title = {Combined Visual and Inertial Navigation for an Unmanned Aerial Vehicle},
  volume = {42/2008},
  year = {2008}
}

We describe an UAV navigation system which combines stereo visual odometry with inertial measurements from an IMU. Our approach fuses the motion estimates from both sensors in an extended Kalman filter to determine vehicle position and attitude. We present results using data from a robotic helicopter, in which the visual and inertial system produced a final position estimate within 1\% of the measured GPS position, over a flight distance of more than 400 meters. Our results show that the combination of visual and inertial sensing reduced overall positioning error by nearly an order of magnitude compared to visual odometry alone.

On the Observability and Self-Calibration of Visual-Inertial Navigation Systems

J. Kelly

Los Angeles, California, USA, Tech. Rep. CRES-08-005, November, 2008.

Bibtex | Abstract

@techreport{2008_Kelly_Observability,
  abstract = {We examine the observability properties of visual-inertial navigation systems, with an emphasis on self-calibration of the six degrees-of-freedom rigid body transform between a camera and an inertial measurement unit (IMU). Our analysis depends on a differential geometric formulation of the calibration problem, and on an algebraic test for the observability rank condition, originally defined by Hermann and Krener.

We demonstrate that self-calibration of the camera-IMU transform is possible, under a variety of conditions. In contrast with previous work, we show that, in the presence of a known calibration target, both the local gravity vector and the IMU gyroscope and accelerometer biases are simultaneously observable (given sufficient excitation of the system). Further and more generally, we show that for a moving monocular camera and IMU, the absolute scene scale, gravity vector, and the IMU biases are all simultaneously observable. This result implies that full self-calibration is possible, without the need for any prior knowledge about the environment in which the system is operating.},
  address = {Los Angeles, California, USA},
  author = {Jonathan Kelly},
  institution = {University of Southern California},
  month = {November},
  number = {CRES-08-005},
  title = {On the Observability and Self-Calibration of Visual-Inertial Navigation Systems},
  year = {2008}
}

We examine the observability properties of visual-inertial navigation systems, with an emphasis on self-calibration of the six degrees-of-freedom rigid body transform between a camera and an inertial measurement unit (IMU). Our analysis depends on a differential geometric formulation of the calibration problem, and on an algebraic test for the observability rank condition, originally defined by Hermann and Krener. We demonstrate that self-calibration of the camera-IMU transform is possible, under a variety of conditions. In contrast with previous work, we show that, in the presence of a known calibration target, both the local gravity vector and the IMU gyroscope and accelerometer biases are simultaneously observable (given sufficient excitation of the system). Further and more generally, we show that for a moving monocular camera and IMU, the absolute scene scale, gravity vector, and the IMU biases are all simultaneously observable. This result implies that full self-calibration is possible, without the need for any prior knowledge about the environment in which the system is operating.

A Note on Unscented Filtering and State Propagation in Nonlinear Systems

J. Kelly

Los Angeles, California, USA, Tech. Rep. CRES-08-004, November, 2008.

Bibtex | Abstract

@techreport{2008_Kelly_Unscented,
  abstract = {In this note, we illustrate the effect of nonlinear state propagation in the unscented Kalman filter (UKF). We consider a simple nonlinear system, consisting of a two-axis inertial measurement unit. Our intent is to show that the propagation of a set of sigma points through a nonlinear process model in the UKF can produce a counterintuitive (but correct) updated state estimate. We compare the results from the UKF with those from the well-known extended Kalman filter (EKF), to highlight how the UKF and the EKF differ.},
  address = {Los Angeles, California, USA},
  author = {Jonathan Kelly},
  institution = {University of Southern California},
  month = {November},
  number = {CRES-08-004},
  title = {A Note on Unscented Filtering and State Propagation in Nonlinear Systems},
  year = {2008}
}

In this note, we illustrate the effect of nonlinear state propagation in the unscented Kalman filter (UKF). We consider a simple nonlinear system, consisting of a two-axis inertial measurement unit. Our intent is to show that the propagation of a set of sigma points through a nonlinear process model in the UKF can produce a counterintuitive (but correct) updated state estimate. We compare the results from the UKF with those from the well-known extended Kalman filter (EKF), to highlight how the UKF and the EKF differ.

Just Add Wheels: Leveraging Commodity Laptop Hardware for Robotics and AI Education

J. Kelly, J. Binney, A. Pereira, O. Khan, and G. S. Sukhatme

in AAAI Technical Report WS-08-02: Proceedings of the AAAI 2008 AI Education Colloquium , Z. Dodds, H. Hirsh, and K. Wagstaff, Eds., Menlo Park, California, USA: AAAI Press, 2008, pp. 50-55.

Bibtex | Abstract | PDF

@incollection{2008_Kelly_Wheels,
  abstract = {Along with steady gains in processing power, commodity laptops are increasingly becoming sensor-rich devices. This trend, driven by consumer demand and enabled by improvements in solid-state sensor technology, offers an ideal opportunity to integrate robotics into K--12 and undergraduate education. By adding wheels, motors and a motor control board, a modern laptop can be transformed into a capable robot platform, for relatively little additional cost. We propose designing software and curricula around such platforms, leveraging hardware that many students already have in hand.

In this paper, we motivate our laptop-centric approach, and demonstrate a proof-of-concept laptop robot based on an Apple MacBook laptop and an iRobot Create mobile base. The MacBook is equipped with a built-in camera and a three-axis accelerometer unit -- we use the camera for monocular simultaneous localization and mapping (SLAM), and the accelerometer for 360 degree collision detection. The paper closes with some suggestions for ways in which to foster more work in this direction.},
  address = {Menlo Park, California, USA},
  author = {Jonathan Kelly and Jonathan Binney and Arvind Pereira and Omair Khan and Gaurav S. Sukhatme},
  booktitle = {{AAAI} Technical Report WS-08-02: Proceedings of the {AAAI} 2008 AI Education Colloquium},
  date = {2008-07-13},
  editor = {Zachary Dodds and Haym Hirsh and Kiri Wagstaff},
  isbn = {978-1-57735-370-6},
  month = {Jul. 13},
  pages = {50--55},
  publisher = {AAAI Press},
  title = {Just Add Wheels: Leveraging Commodity Laptop Hardware for Robotics and {AI} Education},
  year = {2008}
}

Along with steady gains in processing power, commodity laptops are increasingly becoming sensor-rich devices. This trend, driven by consumer demand and enabled by improvements in solid-state sensor technology, offers an ideal opportunity to integrate robotics into K--12 and undergraduate education. By adding wheels, motors and a motor control board, a modern laptop can be transformed into a capable robot platform, for relatively little additional cost. We propose designing software and curricula around such platforms, leveraging hardware that many students already have in hand. In this paper, we motivate our laptop-centric approach, and demonstrate a proof-of-concept laptop robot based on an Apple MacBook laptop and an iRobot Create mobile base. The MacBook is equipped with a built-in camera and a three-axis accelerometer unit -- we use the camera for monocular simultaneous localization and mapping (SLAM), and the accelerometer for 360 degree collision detection. The paper closes with some suggestions for ways in which to foster more work in this direction.

2007

An Experimental Study of Aerial Stereo Visual Odometry

J. Kelly and G. S. Sukhatme

Proceedings of the 6th IFAC Symposium on Intelligent Autonomous Vehicles (IAV), Toulouse, France, Sep. 3–5, 2007, pp. 197-202.

DOI | Bibtex | Abstract | PDF

@inproceedings{2007_Kelly_Experimental,
  abstract = {Unmanned aerial vehicles normally rely on GPS to provide pose information for navigation. In this work, we examine stereo visual odometry (SVO) as an alternative pose estimation method for situations in which GPS in unavailable. SVO is an incremental procedure that determines ego-motion by identifying and tracking visual landmarks in the environment, using cameras mounted on-board the vehicle. We present experiments demonstrating how SVO performance varies with camera pointing angle, for a robotic helicopter platform. Our results show that an oblique camera pointing angle produces better motion estimates than a nadir view angle, and that reliable navigation over distances of more than 200 meters is possible using visual information alone.},
  address = {Toulouse, France},
  author = {Jonathan Kelly and Gaurav S. Sukhatme},
  booktitle = {Proceedings of the 6th {IFAC} Symposium on Intelligent Autonomous Vehicles {(IAV)}},
  date = {2007-09-03/2007-09-05},
  doi = {10.3182/20070903-3-FR-2921.00036},
  month = {Sep. 3--5},
  pages = {197--202},
  title = {An Experimental Study of Aerial Stereo Visual Odometry},
  year = {2007}
}

Unmanned aerial vehicles normally rely on GPS to provide pose information for navigation. In this work, we examine stereo visual odometry (SVO) as an alternative pose estimation method for situations in which GPS in unavailable. SVO is an incremental procedure that determines ego-motion by identifying and tracking visual landmarks in the environment, using cameras mounted on-board the vehicle. We present experiments demonstrating how SVO performance varies with camera pointing angle, for a robotic helicopter platform. Our results show that an oblique camera pointing angle produces better motion estimates than a nadir view angle, and that reliable navigation over distances of more than 200 meters is possible using visual information alone.

2006

A High-Level Nanomanipulation Control Framework

D. J. Arbuckle, J. Kelly, and A. A. G. Requicha

Proceedings of the IARP-IEEE/RAS-EURON Joint Workshop on Micro and Nanorobotics, Paris, France, Oct. 23–24, 2006.

Bibtex | Abstract | PDF

@inproceedings{2006_Arbuckle_High-Level,
  abstract = {Control systems for Atomic Force Microscopes (AFMs) tend to be specific to a particular model of device, and further have a tendency to require that they be written to target an inconvenient execution environment. This paper addresses these problems by describing a high-level programming system for an AFM in which the device-specific low level code has been separated into a different process accessible across the network. This frees the bulk of the code from the assorted constraints imposed by the specific device, and also allows for the insertion of an abstraction layer between the high level control code and the device itself, making it possible to write device independent control code.},
  address = {Paris, France},
  author = {Daniel J. Arbuckle and Jonathan Kelly and Aristides A. G. Requicha},
  booktitle = {Proceedings of the {IARP-IEEE/RAS-EURON} Joint Workshop on Micro and Nanorobotics},
  date = {2006-10-23/2006-10-24},
  month = {Oct. 23--24},
  title = {A High-Level Nanomanipulation Control Framework},
  year = {2006}
}

Control systems for Atomic Force Microscopes (AFMs) tend to be specific to a particular model of device, and further have a tendency to require that they be written to target an inconvenient execution environment. This paper addresses these problems by describing a high-level programming system for an AFM in which the device-specific low level code has been separated into a different process accessible across the network. This frees the bulk of the code from the assorted constraints imposed by the specific device, and also allows for the insertion of an abstraction layer between the high level control code and the device itself, making it possible to write device independent control code.

Combinatorial Optimization of Sensing for Rule-Based Planar Distributed Assembly

J. Kelly and H. Zhang

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Beijing, China, Oct. 9–15, 2006, pp. 3728-3734.

DOI | Bibtex | Abstract | PDF

@inproceedings{2006_Kelly_Combinatorial,
  abstract = {We describe a model for planar distributed assembly, in which agents move randomly and independently on a two-dimensional grid, joining square blocks together to form a desired target structure. The agents have limited capabilities, including local sensing and rule-based reactive control only, and operate without centralized coordination. We define the spatiotemporal constraints necessary for the ordered assembly of a structure and give a procedure for encoding these constraints in a rule set, such that production of the desired structure is guaranteed. Our main contribution is a stochastic optimization algorithm which is able to significantly reduce the number of environmental features that an agent must recognize to build a structure. Experiments show that our optimization algorithm outperforms existing techniques.},
  address = {Beijing, China},
  author = {Jonathan Kelly and Hong Zhang},
  booktitle = {Proceedings of the {IEEE/RSJ} International Conference on Intelligent Robots and Systems {(IROS)}},
  date = {2006-10-09/2006-10-15},
  doi = {10.1109/IROS.2006.281754},
  month = {Oct. 9--15},
  pages = {3728--3734},
  title = {Combinatorial Optimization of Sensing for Rule-Based Planar Distributed Assembly},
  year = {2006}
}

We describe a model for planar distributed assembly, in which agents move randomly and independently on a two-dimensional grid, joining square blocks together to form a desired target structure. The agents have limited capabilities, including local sensing and rule-based reactive control only, and operate without centralized coordination. We define the spatiotemporal constraints necessary for the ordered assembly of a structure and give a procedure for encoding these constraints in a rule set, such that production of the desired structure is guaranteed. Our main contribution is a stochastic optimization algorithm which is able to significantly reduce the number of environmental features that an agent must recognize to build a structure. Experiments show that our optimization algorithm outperforms existing techniques.

2003

Development of a Transformable Mobile Robot Composed of Homogeneous Gear-Type Units

H. Tokashiki, H. Amagai, S. Endo, K. Yamada, and J. Kelly

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, Nevada, USA, Oct. 27–31, 2003, pp. 1602-1607.

DOI | Bibtex | Abstract | PDF

@inproceedings{2003_Tokashiki_Development,
  abstract = {Recently, there has been significant research interest in homogeneous modular robots that can transform (i.e. reconfigure their overall shape). However, many of the proposed transformation mechanisms are too expensive and complex to be practical. The transformation process is also normally slow, and therefore the mechanisms are not suitable for sifuations where frequent, quick reconfiguration is required. To solve these problems, we have studied a transformable mobile robot composed of multiple homogeneous gear-type units. Each unit has only one actuator and cannot move independently. But when engaged in a swarm configuration, units are able to move rapidly by rotating around one another. The most important problem encountered when developing our multi-module robot was determining how units should join together. We designed a passive attachment mechanism that employs a single, six-pole magnet curried by each unit. Motion principles for the swarm were confirmed in simulation, and based on these results we constructed a series of hardware protofypes. In our teleoperation experiments we verified that a powered unit can easily transfer from one stationary unit to another, and that the swarm can move quickly in any direction while transforming.},
  address = {Las Vegas, Nevada, USA},
  author = {Hiroki Tokashiki and Hisaya Amagai and Satoshi Endo and Koji Yamada and Jonathan Kelly},
  booktitle = {Proceedings of the {IEEE/RSJ} International Conference on Intelligent Robots and Systems {(IROS)}},
  date = {2003-10-27/2003-10-31},
  doi = {10.1109/IROS.2003.1248873},
  month = {Oct. 27--31},
  pages = {1602--1607},
  title = {Development of a Transformable Mobile Robot Composed of Homogeneous Gear-Type Units},
  volume = {2},
  year = {2003}
}

Recently, there has been significant research interest in homogeneous modular robots that can transform (i.e. reconfigure their overall shape). However, many of the proposed transformation mechanisms are too expensive and complex to be practical. The transformation process is also normally slow, and therefore the mechanisms are not suitable for sifuations where frequent, quick reconfiguration is required. To solve these problems, we have studied a transformable mobile robot composed of multiple homogeneous gear-type units. Each unit has only one actuator and cannot move independently. But when engaged in a swarm configuration, units are able to move rapidly by rotating around one another. The most important problem encountered when developing our multi-module robot was determining how units should join together. We designed a passive attachment mechanism that employs a single, six-pole magnet curried by each unit. Motion principles for the swarm were confirmed in simulation, and based on these results we constructed a series of hardware protofypes. In our teleoperation experiments we verified that a powered unit can easily transfer from one stationary unit to another, and that the swarm can move quickly in any direction while transforming.

2002

Learning Bayesian networks from data: An information-theory based approach

J. Cheng, R. Greiner, J. Kelly, D. Bell, and W. Liu

Artificial Intelligence, vol. 137, iss. 1–2, pp. 43-90, 2002.

DOI | Bibtex | Abstract

@article{2002_Cheng_Learning,
  abstract = {This paper provides algorithms that use an information-theoretic analysis to learn Bayesian network structures from data. Based on our three-phase learning framework, we develop efficient algorithms that can effectively learn Bayesian networks, requiring only polynomial numbers of conditional independence (CI) tests in typical cases. We provide precise conditions that specify when these algorithms are guaranteed to be correct as well as empirical evidence (from real world applications and simulation tests) that demonstrates that these systems work efficiently and reliably in practice.},
  author = {Jie Cheng and Russ Greiner and Jonathan Kelly and David Bell and Weiru Liu},
  doi = {10.1016/S0004-3702(02)00191-1},
  journal = {Artificial Intelligence},
  month = {May},
  number = {1--2},
  pages = {43--90},
  title = {Learning Bayesian networks from data: An information-theory based approach},
  volume = {137},
  year = {2002}
}

This paper provides algorithms that use an information-theoretic analysis to learn Bayesian network structures from data. Based on our three-phase learning framework, we develop efficient algorithms that can effectively learn Bayesian networks, requiring only polynomial numbers of conditional independence (CI) tests in typical cases. We provide precise conditions that specify when these algorithms are guaranteed to be correct as well as empirical evidence (from real world applications and simulation tests) that demonstrates that these systems work efficiently and reliably in practice.