The key tasks of robot perception and current mapping algorithms
Key tasks and current state of the art
Generally speaking, there are four main components of the robot perception pipeline, although the literature differs slightly between authors. Siegwart et al. (2011) describe these as sensing, signal treatment, feature extraction and scene interpretation. The course material, however, lists the four components as sensing, feature extraction, scene representation and scene interpretation, while Premebida et al. (2018) include sensing, environment representation, scene interpretation and planning. Given the overlaps and differences, I’ll focus on sensing, feature extraction and scene interpretation.
Sensing is the act of gathering data from the environment. Data can be collected in any number of ways, depending on the sensors used. For perception, the default is usually visual, with camera video or still images as the main input, but it could equally be audio, WiFi or even temperature data. Regardless of the sensor, sensing simply describes the first step of a robot attempting to make sense of the world around it. Gathering these data can be straightforward, but combining them via sensor fusion is more complex.
Accuracy at the sensing stage matters. If the readings are inaccurate, the robot is working with flawed data and cannot define its world or its own position properly. The knock-on effect is that the robot may end up making poor decisions based on this poorly collected sensing data.
Feature extraction, as described by Siegwart et al. (2011), is the process of informing the robot’s model, and possibly influencing its actions, through a higher-level percept generated from the sensor readings collected in the previous step. This step is not always needed, but when it is required it can be considered an essential part of the process. In driving aids, for example, being able to distinguish genuine obstacles from harmless objects is critical to keeping cars moving and preventing accidents. Features in images are often first identified by their edges, and these edge responses are often visualised as different shades or colours in the early layers of convolutional neural networks.
In images, features can be described as high level or low level: high-level features are objects, whereas low-level features are geometric primitives such as edges, lines and corners. Features need to be defined before they can be extracted. Combining high- and low-level features reduces the volume of data by dropping poor or unnecessary measurements, while increasing the distinctiveness of each feature and building up an accurate scene.
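To make the low-level case concrete, the short Python sketch below extracts edges from a grayscale image using a pair of Sobel kernels. It is only a minimal illustration of edge extraction; the threshold value and the synthetic test image are assumptions made for the example.

```python
import numpy as np

def sobel_edges(gray: np.ndarray, threshold: float = 0.25) -> np.ndarray:
    """Return a binary edge map from a grayscale image with values in [0, 1]."""
    # Sobel kernels approximate the horizontal and vertical intensity gradients.
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T

    h, w = gray.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    # Naive 3x3 convolution without padding - sufficient for a demonstration.
    for i in range(h - 2):
        for j in range(w - 2):
            patch = gray[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(patch * kx)
            gy[i, j] = np.sum(patch * ky)

    magnitude = np.hypot(gx, gy)
    magnitude /= magnitude.max() + 1e-9   # normalise to [0, 1]
    return magnitude > threshold          # binary edge map

# Example: a synthetic image containing a bright square produces edges at its border.
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0
edges = sobel_edges(img)
print(edges.sum(), "edge pixels detected")
```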
Scene interpretation often follows feature extraction, and it too can become an essential part of the pipeline, particularly when there is a long-term perceptual task to carry out. This is the process by which the robot begins to build a semantic understanding of the scene it is in. Xu et al. (2017) describe scene interpretation as answering the “what” question in a given image. Machine learning, both supervised and unsupervised, is often used to make this interpretation possible.
In the same paper, Xu et al. (2017) go on to describe a Tower of Knowledge (ToK) approach to scene interpretation, which works through four levels of understanding before arriving at a probabilistic answer about an object.
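Whatever the specific framework, a toy illustration of the supervised route is sketched below: a nearest-centroid classifier assigns a semantic label to an extracted feature vector. The feature values and class names are invented for the example and do not come from any of the cited works.

```python
import numpy as np

# Hypothetical training data: 2D feature vectors (e.g. size, elongation)
# produced by the feature extraction step, with semantic labels attached.
features = np.array([[0.9, 0.1], [0.8, 0.2],   # "wall" examples
                     [0.2, 0.9], [0.3, 0.8]])  # "chair" examples
labels = np.array(["wall", "wall", "chair", "chair"])

# Nearest-centroid classifier: one prototype vector per class.
classes = np.unique(labels)
centroids = np.array([features[labels == c].mean(axis=0) for c in classes])

def interpret(feature_vec: np.ndarray) -> str:
    """Answer the 'what' question for a single extracted feature vector."""
    distances = np.linalg.norm(centroids - feature_vec, axis=1)
    return str(classes[np.argmin(distances)])

print(interpret(np.array([0.85, 0.15])))  # -> "wall"
```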
Limitations and future directions
Robotic perception is not an easy task: environments can be unpredictable, partially observable and dynamic, and sensors are often noisy (Russell and Norvig, 2009). These challenges make it particularly difficult to build accurate representations of the robot’s environment. Russell and Norvig go on to explain that good representations have three properties: they contain enough information to make good decisions, they are structured so that they can be updated efficiently, and their internal variables correspond to natural state variables in the real world.
As highlighted in the course material, another limiting factor for robotic perception is the available computational power. Robots need to interpret their environment locally, and this becomes problematic where there is not enough processing power available. Visual data are often processed with the aid of GPUs, so if such hardware is not available on board, the data may not be processed quickly enough.
Davison’s FutureMapping paper (2018) describes a future vision of Richard Newcombe’s in which Spatial AI is connected to the cloud: machines all around the world collaborate to create a global perception map that can be accessed as and when needed. Cloud computing can indeed offer ways to work around hardware limitations, as long as devices and robots can access it.
Popular sensors for mapping
Creating a map of the environment involves generating an accurate picture of the scene that the robot is in, and depending on the application various sensors may be involved. Most commonly (Petit, 2020), from autonomous cars to smaller robots, cameras are used to gather high-quality visual data from which a 3D view of the environment can be built. Infra-red and LiDAR sensors are also used regularly, as these provide further geometric data such as the range and position of objects in the scene. In the case of robot vacuum cleaners (Ansaldo, 2018), cliff sensors detect the edges of floors or steps, and bump sensors confirm through physical contact that an object is truly immovable. Some robots and devices also employ GPS and gyroscopes, which help to determine location and orientation more accurately.
Commonly used sensors such as cameras and LiDAR are popular because of their reasonably low cost and high-quality output. With the right equipment, they provide consistently good data in ever-changing environments. It should be noted that a minimal selection of sensors can succeed in some cases: Tesla (Dickson, 2021), for example, shuns LiDAR and relies solely on video data to achieve automated driving. With the number of options available, however, perception and mapping systems can use sensor fusion to blend data from sensors with overlapping functions. This involves taking measurements of the same quantity from multiple sensors, then using a weighted average or a probabilistic model to determine which outcome is most likely. Fusion means the robot avoids situations where part of the environment is blocked to its only sensor, since it can fall back on alternative or supporting sensors. The trade-off is an increase in computational requirements, which can also affect how quickly the data are processed.
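A minimal sketch of the weighted-average idea is shown below, assuming two range sensors (say, an ultrasonic sensor and a LiDAR beam) measuring the same distance with known noise variances; the numerical values are purely illustrative.

```python
def fuse_measurements(measurements, variances):
    """Inverse-variance weighted fusion of redundant range measurements.

    More trustworthy sensors (lower variance) receive higher weight,
    and the fused variance is lower than any individual one.
    """
    weights = [1.0 / v for v in variances]
    fused_value = sum(w * m for w, m in zip(weights, measurements)) / sum(weights)
    fused_variance = 1.0 / sum(weights)
    return fused_value, fused_variance

# Illustrative values: ultrasonic reads 2.10 m (noisy), LiDAR reads 2.02 m (precise).
value, var = fuse_measurements([2.10, 2.02], [0.04, 0.005])
print(f"fused range: {value:.3f} m, variance: {var:.4f}")
```

The fused estimate sits much closer to the LiDAR reading because its variance is lower, which is exactly the behaviour a probabilistic fusion scheme is meant to produce.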
Technical Challenges
The challenges in robotic mapping go beyond environments that are complex, dynamic and ever-changing. Thrun (2002) highlights a number of problems in mapping, such as measurement noise and data correspondence.
Handling measurement noise is critical in SLAM algorithms. Because the task is continuous, the noise in different measurements is statistically dependent: as Thrun explains, a small rotational error at the start of a corridor will result in a much larger error by the end of it. Accommodating these errors is not easy, and the resulting algorithms are both mathematically and practically complex. As an example, Choi et al. (2009) present a solution that builds on Kalman filter-based SLAM. Their work estimates the measurement noise distribution by predicting the noise covariance matrices. These matrices are usually unknown, with no prior knowledge available, so a method for estimating them helps to reduce the impact of measurement noise.
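To see why the noise covariance matters, the sketch below runs a single one-dimensional Kalman filter measurement update. It is not the Choi et al. estimator itself; the noise variance R is simply assumed, and the point is that a larger R makes the filter trust the measurement less.

```python
def kalman_update(x_pred, P_pred, z, R):
    """One scalar Kalman filter measurement update.

    x_pred, P_pred : predicted state and its variance
    z, R           : measurement and its noise variance (the 1D covariance)
    """
    K = P_pred / (P_pred + R)          # Kalman gain balances prediction vs measurement
    x_new = x_pred + K * (z - x_pred)  # corrected state estimate
    P_new = (1.0 - K) * P_pred         # reduced uncertainty after the update
    return x_new, P_new

# Same prediction and measurement, two different assumed noise levels.
print(kalman_update(x_pred=1.0, P_pred=0.5, z=1.4, R=0.1))  # trusts the measurement more
print(kalman_update(x_pred=1.0, P_pred=0.5, z=1.4, R=2.0))  # trusts the prediction more
```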
Data correspondence is described by Thrun simply as being able to associate measurement data accurately with the corresponding physical-world objects. Problems arise when positional errors within a scene accumulate: any error in the robot’s location relative to its scene can cause it to misinterpret its sensor measurements and associate them with the wrong objects. Ensuring that there are more objects or features in the scene provides more landmarks for the SLAM algorithm to orientate itself by and to anchor the robot to a more specific location, as shown by Jiang et al. (2022).
Ravankar et al. (2019) also point out that multi-robot systems producing different types of map of the same environment are more susceptible to the correspondence problem: differences in scale and rotation may arise between the contributions of different robots, which makes correlating the spatial information difficult. Probabilistic models can be used, as described by Thrun (2002), to help determine the likelihood of certain aspects of the scene, such as obstacles or position.
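One common form of such a probabilistic model is an occupancy grid, where each cell stores the log-odds of being occupied and repeated observations accumulate evidence. The sketch below is a generic illustration under assumed update weights, not any particular author's implementation.

```python
import numpy as np

class OccupancyGrid:
    """Tiny log-odds occupancy grid: cells accumulate evidence over time."""

    def __init__(self, shape=(10, 10)):
        self.log_odds = np.zeros(shape)   # 0.0 means "unknown" (p = 0.5)

    def update(self, cell, hit, l_occ=0.85, l_free=-0.4):
        """Add evidence: a 'hit' makes the cell more likely to be occupied."""
        self.log_odds[cell] += l_occ if hit else l_free

    def probability(self, cell):
        """Convert log-odds back to an occupancy probability."""
        return 1.0 - 1.0 / (1.0 + np.exp(self.log_odds[cell]))

grid = OccupancyGrid()
for _ in range(3):                 # three range readings hit the same cell
    grid.update((4, 5), hit=True)
print(f"P(occupied) = {grid.probability((4, 5)):.2f}")   # approaches 1 with evidence
```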
The selection of sensors can also affect the performance of robot mapping. Taking cameras versus LiDAR as the source of vision data, for example, robot vacuum cleaners already show very different behaviour. A LiDAR sensor will detect a curtain and treat it as if it were a solid wall, so the robot does not clean beyond it (Vacuum Wars, 2019). A camera, however, may recognise that the curtain is movable, allowing the robot to go under or through it and clean a larger area. Camera-based robots such as Dyson’s 360 Heurist mount the camera on top of the robot, pointing upwards, so that it uses the ceiling rather than the floor as its guide when building a map of the scene; as it gets closer to a wall it then begins to detect obstacles in its field of vision. Because of this technique, such a robot could not be used in an outdoor setting, where there may be no overhead cover from which to accurately build a map.
However, Rogers et al. (2013) describe a solution that can handle outdoor and mixed indoor environments. Their OmniMapper SLAM solution was designed specifically for tactical use in military situations, where environments can be unpredictable and change at short notice. The team’s robots can be used with a variety of sensors, mostly cameras, including units on pan-tilt mounts. The setup differs from the vacuum cleaners described above in that it employs both cameras and lasers, fusing the data to create a more accurate map.
Feasibility of ML and deep learning in mapping
As Tesla’s work has shown, building maps of the environment from video data combined with machine learning is absolutely feasible (Dickson, 2021). Their vehicles, which offer autonomous driving features, rely on deep learning to quickly interpret the surrounding environment. In such cases it is critical that the results are highly accurate, as emergency situations require decisions to be taken in a split second.
Faust (2019) presented how a Google team used reinforcement learning to train a robot called Fetch to navigate a busy pantry and carry out tasks such as bringing dishes and plates to a sink. Although the approach is computationally expensive and took days to train, their results show that Fetch can complete its tasks in new environments once fully trained. The solution involved tuning the rewards first and then tuning the neural network model, and the team also experimented with waypoints as a way of speeding up learning. Incorporating a final planning step improved overall performance by improving navigation and reducing zig-zag motions, giving the robot a smoother journey.
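The exact reward used in that work is not reproduced here, but the sketch below shows the general idea of reward tuning with waypoint bonuses: progress towards the goal is rewarded, collisions are penalised, and intermediate waypoints provide a denser learning signal. Every constant and function name is an assumption made for illustration.

```python
import math

def shaped_reward(pos, prev_pos, goal, waypoints, collided):
    """Hypothetical shaped reward for a navigation policy.

    pos, prev_pos, goal : (x, y) positions in metres
    waypoints           : list of (x, y) intermediate targets not yet reached
    collided            : whether the robot hit an obstacle this step
    """
    dist = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])

    reward = 0.0
    reward += 5.0 * (dist(prev_pos, goal) - dist(pos, goal))  # reward progress to goal
    reward -= 0.05                                            # small per-step cost
    if collided:
        reward -= 10.0                                        # discourage collisions
    # Bonus for reaching any outstanding waypoint (speeds up learning).
    for wp in list(waypoints):
        if dist(pos, wp) < 0.3:
            reward += 2.0
            waypoints.remove(wp)
    return reward

print(shaped_reward((1.0, 0.0), (0.5, 0.0), (5.0, 0.0), [(1.0, 0.0)], collided=False))
```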
Dobrevski and Skocaj (2021) present a paper that uses deep reinforcement learning to navigate a robot without a map, treating navigation as a Markov decision process. They rely on simulating large volumes of unlabelled samples and show that the learned action policies transfer to new situations. Training must take place offline, whether physical or simulated, but the work is a clear demonstration of how machine learning can be applied not just to mapping but even to replace it, removing the need to store and continuously update maps.
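Their precise formulation is not reproduced here, but the skeleton below shows, under assumed names and types, how map-less navigation can be framed as a Markov decision process: the state combines the current laser scan with the goal expressed in the robot's own frame, the action is a velocity command, and a policy (learned in the real system, a trivial heuristic here) maps one to the other.

```python
from dataclasses import dataclass
from typing import List, Tuple
import random

@dataclass
class State:
    scan: List[float]              # raw laser ranges - no map is required
    goal: Tuple[float, float]      # goal position in the robot's own frame

@dataclass
class Action:
    linear: float                  # forward velocity (m/s)
    angular: float                 # turn rate (rad/s)

def policy(state: State) -> Action:
    """Stand-in for a learned policy network (here just a simple heuristic)."""
    gx, gy = state.goal
    turn = max(-1.0, min(1.0, gy))                   # steer roughly towards the goal
    speed = 0.5 if min(state.scan) > 0.5 else 0.0    # stop if an obstacle is close
    return Action(linear=speed, angular=turn)

# One step of the (simulated) decision process.
s = State(scan=[random.uniform(0.4, 5.0) for _ in range(8)], goal=(2.0, -0.5))
print(policy(s))
```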
Discussion and Future of Mapping
Mapping in robotics is clearly a critical element, as it effectively enables a robot to make sense of its position within a given scene. Solutions such as the one described by Dobrevski and Skocaj (2021) show how deep learning can be used to replace mapping altogether, which is a genuinely interesting development in the field, and further research should yield additional gains.
An alternative step towards navigation without maps is proposed by Wang et al. (2022). Their wayfinding solution uses visual landmarks to help determine the route: the system first observes 3D images to locate landmarks in sequence, then passes them through an instruction generator to output navigational directions. The team believes this work can support the development of conversational navigation and may even allow a robot to answer the question “where did you go?”. Their goal is to produce instructions that people can follow, and the results show they are not far off, achieving 71% accuracy with their synthetic navigation instructions compared with 75% for human-written instructions.
Ouabi et al. (2021) present a very novel approach that achieves strong results in a specific set of circumstances. Their problem is geared towards helping robots navigate metal plates, and their FastSLAM-based solution generates more accurate and robust results while reducing algorithmic complexity. The team integrates beamforming maps into the solution and relies on acoustic measurements of ultrasonic guided waves to perceive the overall scene. Reducing the complexity of the algorithm has the obvious benefit of making it usable on less powerful hardware.
In their review of SLAM research, Lluvia et al. (2021) suggest that improvements in pose and trajectory optimisation will bring benefits: being able to better determine a robot’s pose should allow tasks to be completed more quickly. The team also goes into detail on the importance of reducing computational cost, particularly for 3D mapping, because as the size of the environment and the map resolution increase, the complexity of building an accurate 3D representation grows rapidly. Their review of the literature highlights a solution by Maurovic et al. (2014) that switches between 2D and 3D exploration as needed to reduce this complexity.
Final thoughts
Robot perception and mapping are key areas in the application of robots, which are increasingly visible in domestic life, and there is clearly room for further improvement in the space. The variety of methods for performing perception has also led to a wide range of ideas on how to improve performance, ranging from better hardware, through improved algorithmic efficiency, to better estimation of a robot’s pose and orientation within its environment.
References
Siegwart, R., Nourbakhsh, I.R. and Scaramuzza, D., 2011. Introduction to autonomous mobile robots. MIT press.
Premebida, C., Ambrus, R. and Marton, Z.C., 2018. Intelligent Robotic Perception Systems. In: E. G. Hurtado (ed.), Applications of Mobile Robots. London: IntechOpen.
Xu, M., Ren, J. and Wang, Z., 2017. Chapter Five - Component Identification and Interpretation: A Perspective on Tower of Knowledge. Advances in Imaging and Electron Physics, 199, pp. 237-301.
Russell, S.J. and Norvig, P., 2009. Artificial Intelligence: A Modern Approach. 3rd ed. Harlow: Pearson Education Limited.
Davison, A.J., 2018. FutureMapping: The computational structure of spatial AI systems. arXiv preprint arXiv:1803.11288.
Petit, F., 2020. Sensor fusion – key components for autonomous driving [Online]. Available from: https://www.blickfeld.com/blog/sensor-fusion-for-autonomous-driving/ [Accessed 22 June 2022].
Ansaldo, M., 2018. How a robot vacuum navigates your home [Online]. Available from: https://www.techhive.com/article/583321/how-a-robot-vacuum-navigates-your-home.html [Accessed 22 June 2022].
Dickson, B., 2021. Tesla AI chief explains why self-driving cars don’t need lidar [Online]. Available from: https://bdtechtalks.com/2021/06/28/tesla-computer-vision-autonomous-driving/ [Accessed 22 June 2022].
Thrun, S., 2002. Robotic Mapping: A Survey. Exploring artificial intelligence in the new millennium, 1(1-35), pp.1-24.
Choi, W.S., Kang, J.G. and Oh, S.Y., 2009. Measurement Noise Estimator Assisted Extended Kalman Filter for SLAM Problem. The 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, 11-15 October 2009, St. Louis, USA. pp. 2077-2082
Jiang, S., Wang, S., Yi, Z., Zhang, M. and Lv, X., 2022. Autonomous Navigation System of Greenhouse Mobile Robot Based on 3D Lidar and 2D Lidar SLAM. Frontiers in Plant Science, 13.
Ravankar, A., Ravankar, A.A., Hoshino, Y. and Kobayashi, Y., 2019. On Sharing Spatial Data with Uncertainty Integration Amongst Multiple Robots Having Different Maps. Applied Sciences, 9(13), 2753.
Vacuum Wars, 2019. Lidar vs Vslam (cameras vs lasers) For Robot Vacuums - Which One is Best? [Online]. Available from: https://youtu.be/5O8VmDiab3w [Accessed 22 June 2022].
Rogers, J.G., Young, S., Gregory, J., Nieto-Granda, C. and Christensen, H.I., 2013. Robot mapping in large-scale mixed indoor and outdoor environments. SPIE Defense, Security, and Sensing, 2013 Unmanned Systems Technology XV, 17 May 2013, Baltimore, Maryland, United States.
Faust, A., 2019. Deep learning for robot navigation - Kirkland ML Summit ’19 [Online]. Available from: https://youtu.be/EmH1XMZZGwc [Accessed 22 June 2022].
Dobrevski, M. and Skocaj, D., 2021. Deep reinforcement learning for map-less goal-driven robot navigation. International Journal of Advanced Robotic Systems, January-February 2021, pp. 1–13.
Wang, S., Montgomery, C., Orbay, J., Birodkar, V., Faust, A., Gur, I., Jaques, N., Waters, A., Baldridge, J. and Anderson, P., 2022. Less is More: Generating Grounded Navigation Instructions from Landmarks. https://arxiv.org/abs/2111.12872
Ouabi, O.L., Pomarede, P., Geist, M., Declercq, N. and Pradalier, C., 2021. A FastSLAM Approach Integrating Beamforming Maps for Ultrasound-based Robotic Inspection of Metal Structures. IEEE Robotics and Automation Letters, IEEE 2021.
Lluvia, I., Lazkano, E. and Ansuategi, A., 2021. Active Mapping and Robot Exploration: A Survey. Sensors, 21, 2445.
Maurovic, I., Dakulovic, M. and Petrovic, I., 2014. Autonomous Exploration of Large Unknown Indoor Environments for Dense 3D Model Building. Proceedings of the 19th World Congress The International Federation of Automatic Control, 24-29 August 2014, Cape Town, South Africa.