# Literature Review

# Application of Machine Learning and Computer Vision for Weld Recognition and Path Alignment of a Climbing Robot for Weld Scanning

# Resources

Article Databases

Research Services

Massey Proxy for research

TensorRT

Best venue for publication:

ICCV 2021: IEEE/CVF International Conference on Computer Vision (submission deadline: Wednesday 17 Mar 2021)

# Articles

All the articles are available at this repository.

The articles are grouped by literature review topic, introduction and conclusion.

# Articles - Introduction

# 1. A Survey of Wall Climbing Robots: Recent Advances and Challenges

| Published in | Conference | Location | Paper Citations | Authors | Link |
| --- | --- | --- | --- | --- | --- |
| 2016 | Robotics, 5, 14 | - | 31 | Nansai, S., et al. | link |

# Citation

Nansai, S., & Mohan, R. E. (2016). A Survey of Wall Climbing Robots: Recent Advances and Challenges. Robotics, 5, 14. doi:10.3390/robotics5030014

# Abstract

In recent decades, skyscrapers, as represented by the Burj Khalifa in Dubai and Shanghai Tower in Shanghai, have been built due to the improvements of construction technologies. Even in such newfangled skyscrapers, the façades are generally cleaned by humans. Wall climbing robots, which are capable of climbing up vertical surfaces, ceilings and roofs, are expected to replace the manual workforce in façade cleaning works, which is both hazardous and laborious work. Such tasks require these robotic platforms to possess high levels of adaptability and flexibility. This paper presents a detailed review of wall climbing robots categorizing them into six distinct classes based on the adhesive mechanism that they use. This paper concludes by expanding beyond adhesive mechanisms by discussing a set of desirable design attributes of an ideal glass façade cleaning robot towards facilitating targeted future research with clear technical goals and well-defined design trade-off boundaries.

# Review

Explores opportunities for robots outside industry, such as cleaning skyscrapers in Dubai, and discusses accidents that have occurred during manual façade cleaning.

# Interesting quotes from the article:

"Robots have been advancing exponentially over the last three decades, moving beyond the traditional bounds of industrial applications into service missions sharing social spaces with humans."

"Frey and Osborne have estimated that 47% of total U.S. employment will be replaced by robots and/or artificial intelligence (AI) in the near future"

"Façade cleaning of high-rise buildings and skyscrapers offers enormous opportunities for the use of robots."

"Numerous incidents of accidents have been reported even with the use of gondolas in façade cleaning jobs. One such situation involved a gondola being uncontrollable due to a gust of wind at Shanghai World Financial Center [10]. In another instance, the gondola became suspended in mid-air at a 240-m height at One World Trade Center in New York [11]. Robotic solutions offer enormous potential in significantly minimizing risk to humans, as well improving productivity in façade cleaning jobs."

"This paper presents a detailed review of such adhesive mechanisms for wall climbing robots, categorizing them into six distinct classes."

"Magnetic Adhesion: This method is often adopted for walls that have high levels of magnetic permeability. Since most of the prior work in this area uses permanent magnets, this eliminates the need for any additional devices, such as power supply. This results in improved payload capacity."

# 2. Inspection Robots in Oil and Gas Industry: a Review of Current Solutions and Future Trends

| Published in | Conference | Location | Paper Citations | Authors | Link |
| --- | --- | --- | --- | --- | --- |
| 2019 | 25th International Conference on Automation and Computing (ICAC) | Lancaster, United Kingdom | - | Leijian Yu; Erfu Yang; Peng Ren; Cai Luo; Gordon Dobie; Dongbing Gu; Xiutian Yan | link |

# Citation

L. Yu et al., "Inspection Robots in Oil and Gas Industry: a Review of Current Solutions and Future Trends," 2019 25th International Conference on Automation and Computing (ICAC), Lancaster, United Kingdom, 2019, pp. 1-6, doi: 10.23919/IConAC.2019.8895089.

# Abstract

With the increasing demands for energy, oil and gas companies have a demand to improve their efficiency, productivity and safety. Any potential corrosions and cracks on their production, storage or transportation facilities could cause disasters to both human society and the natural environment. Since many oil and gas assets are located in the extreme environment, there is an ongoing demand for robots to perform inspection tasks, which will be more cost-effective and safer. This paper provides a state of art review of inspection robots used in the oil and gas industry which including remotely operated vehicles (ROVs), autonomous underwater vehicles (AUVs), unmanned ground vehicles (UGVs) and unmanned aerial vehicles (UAVs). Different kinds of inspection robots are designed for inspecting different asset structures. The outcome of the review suggests that the reliable autonomous inspection UAVs and AUVs will gain interest among these robots and reliable autonomous localisation, environment mapping, intelligent control strategies, path planning and Non-Destructive Testing (NDT) technology will be the primary areas of research.

# Review

This article is recent (2019) and gives a general overview of inspection robots in the oil and gas industry. It explores different types of NDT and also discusses computer vision and deep learning methods that support vision-based inspections, which is exactly the subject of this research.

# Interesting quotes from the article:

"In the 2018 Energy Outlook, British Petroleum (BP) predicted that the absolute consumption of oil and gas would have steady growth to 2040"

"the demand of the inspection robots will steadily increase in the foreseeable future, and in 2015, the global capital expenditure of these robots will reach 2.85 billion dollars."



"These robots have different mechanisms and structures for different inspection tasks. Some of them are focused on inspecting oil storage tanks, while some of them are designed for pipeline inspection. Nevertheless, most of them need experienced engineers to manipulate them to conduct the inspection process."

"the most commonly applied inspection technologies can be roughly divided into four classes, i.e., visual inspection, ultrasonic inspection, magnetic inspection and eddy current inspection."

"For inspecting the vertical structures, wall climbing robots have gained great interests. The climbing technologies are the main difference between these robots. At the same time, the most important task in the design and development of a climbing robot is to develop an appropriate mechanism to ensure that the robot adheres to different types of walls and surfaces reliably without sacrificing its mobility."

"Vision-based localisation and navigation systems have become popular these days. They are based on the images captured by the onboard camera, and VSLAM algorithms can locate robots’ position, estimate its state and simultaneously build the map of the surrounding environment."

"The autonomous inspection will also come true with the help of advanced computer vision algorithms, especially with the deep learning method."

# 3. Vision for Mobile Robot Navigation: A Survey

| Published in | Conference | Location | Paper Citations | Authors | Link |
| --- | --- | --- | --- | --- | --- |
| 2002 | IEEE Transactions on Pattern Analysis and Machine Intelligence | - | 764 | G. N. Desouza and A. C. Kak | link |

# Citation

G. N. Desouza and A. C. Kak, "Vision for mobile robot navigation: a survey," in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 2, pp. 237-267, Feb. 2002, doi: 10.1109/34.982903.

# Abstract

This paper surveys the developments of the last 20 years in the area of vision for mobile robot navigation. Two major components of the paper deal with indoor navigation and outdoor navigation. For each component, we have further subdivided our treatment of the subject on the basis of structured and unstructured environments. For indoor robots in structured environments, we have dealt separately with the cases of geometrical and topological models of space. For unstructured environments, we have discussed the cases of navigation using optical flows, using methods from the appearance-based paradigm, and by recognition of specific objects in the environment.

# Review

A good survey; unfortunately, it is from 2002 and may be of limited use for the literature review.

# Interesting quotes from the article:

"The progress made in the last two decades has been on two separate fronts: vision-based navigation for indoor robots and vision-based navigation for outdoor robots. We believe that the strides made in both these areas have been significant."

"The reason for that is that a human operator can use the internal map representation of a structured environment to conveniently specify different destination points for the robot. But, for the mapless case, using the appearance-based approaches mentioned so far, in most cases the robot only has access to a few sequences of images that help it to get to its destination, or a few predefined images of target goals that it can use to track and pursue."

"Compared to 20 years ago,we have a much better sense today of the problems we face as we try to endow mobile robots with sensory intelligence. We are much more cognizant of the importance of prior knowledge and the various forms in which it can be modeled."

# Articles - Mobile Climbing Robots in heavy industry

# 1. Climbing Robots for Commercial Applications – a Survey

| Published in | Conference | Location | Paper Citations | Authors | Link |
| --- | --- | --- | --- | --- | --- |
| 2003 | 6th International Conference on Climbing and Walking Robots (CLAWAR) | - | ? | Berns, K., et al. | link |

# Citation

K. Berns, C. Hillenbrand, and T. Luksch, "Climbing robots for commercial applications–a survey," in Proceedings of the 6th International Conference on Climbing and Walking Robots CLAWAR, 2003, pp. 17-19.

# Abstract

In this paper a short survey on the research of climbing machine is given with a strong focus on industrial applications. Based on a classification of different types of climbing machines examples of robots are presented, which are prototypically developed for industrial and commercial use. Considering the application environment the system requirements of climbing machines will be presented. At the end of the paper a climbing machine used for the inspection of concrete bridges is presented as an example with a huge commercial potential.

# Review

A short survey with some insights into the beginnings of climbing robot applications. It points out that climbing robots have been under development since the 1980s.

# Interesting quotes from the article:

"Since the end of the 80ties climbing robots are examined for different types of application scenarios all over the world."

"at the end of the 80ties and the begin of the 90ties in Japan several national projects concerning climbing robots for specific application scenarios have been developed. These include cleaning robots for glass walls, ship hull cleaning robots, rescue robots for fire brigades, inspection robots for steal tanks and wall. MMost of the developments were stopped because there still exists adhesion problems. Also the cost for the development of such machines were to high."

"At the end of the 90ties mainly in Europe several different prototype machine have been developed for different types of applications like the inspection of pipes and ducts in the petrochemical industry, maintenance and inspection work in the construction and nuclear industry or cleaning robots for huge class walls."

"The inspection of the big concrete walls which is frequently necessary by law for stacks, bridges or dams etc. is the application area for our climbing robot. The climbing principle and the inspection subsystem was successfully tested."

# 2. Windmill Climbing Robot

| Published in | Conference | Location | Paper Citations | Authors | Link |
| --- | --- | --- | --- | --- | --- |
| 2018 | International Conference on Computer and Applications (ICCA) | Beirut, Lebanon | 0 | Khaldoun Hatoum et al. | link |

# Citation

K. Hatoum, R. Alkhatib, N. Jaber, M. Sabbah and M. O. Diab, "Windmill Climbing Robot," 2018 International Conference on Computer and Applications (ICCA), Beirut, 2018, pp. 394-398, doi: 10.1109/COMAPP.2018.8460435.

# Abstract

Having the windmill main components at the top of its high tower made its maintenance a very risky job for workers, and with an increase of use of wind turbines, the risk of accidents occurring is proportionally increasing too. For this, engineers started working on climbing robots and mechanisms to replace risking human lives. This research works on adding to this research topic, by proposing a full study of a new mechanism enhanced with calculations and modeling for it to be applicable and reliable. The proposed robot mechanism can circumference the tower of the wind mill and climb upwards through means of rubber chains.

# Review

An interesting application for climbing robots: windmill climbing.

# Interesting quotes from the article:

"A Windmill is a combination of several complex parts synchronized while working, aiming to extract the maximum of passing wind currents. Having high towers and its main components such as the nacelle, blades, and gearbox located on the top, a mill’s problem falls in its maintenance procedure where its tower is considered to be very slippery, thus climbing such structure tends to be a very hazardous job."

"The paper introduced a new suction method to be applied into hook-like claws which tend to be of sharp edges."

"Another paper proposed what is called Tankbot. It represents a tank-like climbing robot utilized by adhesive treads of soft elastomer. Having the characteristics of a wheeled robot design gives Tankbot a strong attachment to both smooth and rough terrains with minimal vibrations. This tank-like structure design study was focused on the peeling force to have its normal component maximized with respect to the surface in order to maximize the climbing stability."

"These design standards make tankbot a super climbing robot on any terrain condition, and on any slope angle, vertically or laterally. Upon study and derivation of mechanical models, Tankbot was translated into prototypes of several dimensions and tested to reach the optimal and viable Tankbot design which reached the optimal tread tension range for maximum pealing force, having the best performance [2]."

"Tackling common research topics, a paper presented the design of a four-legged robot, each leg having four degrees of freedom. The design was a combination of two climbing techniques, one adapted from rock climbing with four limbs while the other from cats climbing with its claws. So, the actual design was having four legs with each leg utilized with a claw providing the robot to move in all directions while climbing up a wall. Moreover, a gripping device made of twelve hooks are installed at the end of each leg."

"Climbing robots have been, are still challenging. Engineers are discovering day by day much advanced and reliable techniques, and this research aims to provide one of these new techniques which deliver a new concept other than the recently known ones such as magnetic units or suction methods."

"The robot needs to climb a tower with a diameter varying from mill to mill, and within the same mill there is also variation due to canonical shape. Based on such condition, the proposed design is to have one main chain to push the whole mechanism upward or downward."



"This paper suggests a design of a robot that can circumference the tower of the wind mill and climb upwards through means of rubber chains."

# 3. Design and Development of Semi-Autonomous Wall Climbing Robot Mechanism

| Published in | Conference | Location | Paper Citations | Authors | Link |
| --- | --- | --- | --- | --- | --- |
| 2018 | 2nd International Conference on Micro-Electronics and Telecommunication Engineering (ICMETE) | Ghaziabad, India | 0 | Rohith M. et al. | link |

# Citation

R. M., S. Vigneshwaran and H. Mattoo, "Design and Development of Semi-Autonomous Wall Climbing Robot Mechanism," 2018 2nd International Conference on Micro-Electronics and Telecommunication Engineering (ICMETE), Ghaziabad, India, 2018, pp. 314-317, doi: 10.1109/ICMETE.2018.00075.

# Abstract

The purpose of this research paper is to present a wall climbing robot mechanism which can be easily scaled up and applied in various fields like, search and rescue, surveillance etc. The objective is to make a four wheeled robot which can climb and manoeuver surfaces at any inclination be it perpendicular, acute or obtuse. This robot will be made by adding two upward and downward thrusters to a four wheeled robot. On approaching an inclination, the sensor mounted in the center of the robot will calculate the angle of inclination autonomously. The amount of upward thrust required to lift the front wheels of the robot is calculated and the ESCs ensures that the required thrust is generated by the propeller in the necessary angle. Thus, the front wheels have a greater area of contact on the wall and therefore can help it climb up. The motors will be connected to a beam which is connected to two servo motors on either side, thus allowing us to control the angle at which the motor is fixed, the angle is autonomously calculated by the robot, thus enabling the mechanism to manoeuver terrains with inclinations of any magnitude. This mechanism is semi-autonomous in the sense that the operator has to manually control the robots movement but the propeller set up autonomously aligns itself based on the inclination of the wall.

# Review

An interesting cement climbing robot using fans to generate negative pressure.

# Interesting quotes from the article:

"The versatility of Wall Climbing Robots is expanding drastically and thus a viable easily modifiable and scalable mechanism is extremely necessary. A robot with the ability to function on any structure irrespective of the surface and terrain can be valuable for several purposes like surfacebased operations such as cleaning, painting, and inspection, surveillance etc."

"The Wall Climbing Robots developed till date use one of four basic mechanisms such as, 1. Magnetic Attraction, 2. Claws and grippers, 3. Vacuum Suction and 4. Pneumatics."

"The main factors to be taken into consideration while designing a wall climbing robot is, its adhesion method and locomotion mechanism. With respect to the adhesion method used, we can categorize the existing Wall Climbing Robot into 4 main types, vacuum or suction[3][7][8], magnetic[2][5], grippers[9] and electrostatic[4]."

"Crawlers[5] have a good speed, wheeled robots are the fastest of all but they face trouble when they have to maneuver obstacles."

"Its simplicity in design and intelligent electronics and controls setup allow it to be easily operated on any sort of surface or terrain"



"The propellers will be positioned marginally offset from the geometric center of the robot to reduce the chances of toppling an to counter the action of gravity better."

# 4. A Modular Biped Wall-Climbing Robot With High Mobility and Manipulating Function

| Published in | Conference | Location | Paper Citations | Authors | Link |
| --- | --- | --- | --- | --- | --- |
| Dec. 2013 | IEEE/ASME Transactions on Mechatronics (Volume: 18, Issue: 6), IEEE | - | 63 | Guan, Y., et al. | link |

# Citation

Y. Guan et al., "A Modular Biped Wall-Climbing Robot With High Mobility and Manipulating Function," in IEEE/ASME Transactions on Mechatronics, vol. 18, no. 6, pp. 1787-1798, Dec. 2013, doi: 10.1109/TMECH.2012.2213303.

# Abstract

High-rise tasks such as cleaning, painting, inspection, and maintenance on walls of large buildings or other structures require robots with climbing and manipulating skills. Motivated by these potential applications and inspired by the climbing motion of inchworms, we have developed a biped wall-climbing robot-W-Climbot. Built with a modular approach, the robot consists of five joint modules connected in series and two suction modules mounted at the two ends. With this configuration and biped climbing mode, W-Climbot not only has superior mobility on smooth walls, but also has the function of attaching to and manipulating objects equivalent to a “mobile manipulator.” In this paper, we address several fundamental issues with this novel wall-climbing robot, including system development, analysis of suction force, basic climbing gaits, overcoming obstacles, and transiting among walls. A series of comprehensive and challenging experiments with the robot climbing on walls and performing a manipulation task have been conducted to demonstrate its superior climbing ability and manipulation function. The analytical and experimental results have shown that W-Climbot represents a significant advancement in the development of wall-climbing robots.

# Review

A good example of a robot imitating animal locomotion and using dual suction adhesion.

# Interesting quotes from the article:

"High-rise tasks such as cleaning, painting, inspection, and maintenance on walls of large buildings or other structures require robots with climbing and manipulating skills. Motivated by these potential applications and inspired by the climbing motion of inchworms, we have developed a biped wall-climbing robot"

"IN the past several decades, a great variety of wall-climbing robots have been developed, in order to release humans from high-altitude, high-risk, and high-intensity operations such as cleaning, painting, inspection, and maintenance on walls of skyscrapers or other large structures."

"However, they still face several major challenges. The following capabilities and functionalities are desired for a wall-climbing robotic system in real applications:

  1. attaching reliably to walls or surfaces, the most basic function for a wall-climbing robots;
  2. overcoming obstacles on or gaps between walls, and adapting to various conditions on nonseamless surfaces;
  3. climbing omnidirectionally on a surface;
  4. making transitions between walls, a key feature for mobility lacking in many wall-climbing robots;
  5. manipulating function, an important skill for performing tasks on walls."

"The last desired functionality of a climbing robot, i.e., the ability to manipulate objects, is basic yet often neglected among wall-climbing robots in the literature."

"For a robot to climb on a wall with vacuum suction, the suction modules must generate a sufficiently large suction force to support the whole robot, preventing it from slipping or falling down."

"For high-rise tasks on high buildings and huge structures, we have developed a biped modular wall-climbing robot—WClimbot. The main body of the robot is actually an arm with sufficient degrees of freedom, and the dual suction modules at the two ends may serve as adhesive tools and end-effectors."



# 5. A Survey of Technologies for Climbing Robots Adhesion to Surfaces

| Published in | Conference | Location | Paper Citations | Authors | Link |
| --- | --- | --- | --- | --- | --- |
| 2008 | IEEE International Conference on Computational Cybernetics | Stara Lesna, Slovakia | 20 | Silva, M. F., et al. | link |

# Citation

M. F. Silva, J. A. T. Machado and J. K. Tar, "A Survey of Technologies for Climbing Robots Adhesion to Surfaces," 2008 IEEE International Conference on Computational Cybernetics, Stara Lesna, 2008, pp. 127-132, doi: 10.1109/ICCCYB.2008.4721392.

# Abstract

Climbing robots are being developed for applications ranging from cleaning to inspection of difficult to reach constructions. These machines should be capable of travelling on different types of surfaces (such as floors, walls, ceilings) and to walk between such surfaces. Furthermore, these machines should adapt and reconfigure for various environment conditions and should be self-contained. Regarding the adhesion to the surface, they should be able to produce a secure gripping force using a light-weight mechanism. Bearing these facts in mind, this paper presents a survey of different technologies used for climbing robots adhesion to surfaces.

# Review

A comprehensive study on climbing robot adhesion.

# Interesting quotes from the article:

"Climbing robots are useful devices that can be adopted in a variety of applications like maintenance, building, inspection and safety in the process and construction industries. These systems are mainly adopted in places where direct access by a human operator is very expensive, because of the need for scaffolding, or very dangerous, due to the presence of an hostile environment."

"A wall climbing robot should be light and allow a large payload, reducing excessive adhesion forces and carrying instrumentations during navigation."

"The two major issues in the design of wall climbing robots are their locomotion and the adhesion methods."

"With respect to the locomotion type, three types are often considered: the crawler, the wheeled and the legged types. According to the adhesion method, these robots are generally classified into three groups: vacuum or suction cups, magnetic, and gripping to the surface. Recently, new methods for assuring the adhesion, based in biological findings, have also been proposed."

"In the last decades, different applications have been envisioned for these robots, mainly in the technical inspection, maintenance and failure, or breakdown, diagnosis in dangerous environments. These tasks are necessary in bridges [1], [2], nuclear power plants [3] or pipelines [4], for scanning the external surfaces of gas or oil tanks [4], [5] and offshore platforms [2], for performing non-destructive tests in industrial structures [6], [7], and also in planes [8], [1], [9] and ships [1], [10]. Furthermore, they have been applied in civil construction repair and maintenance [2], in anti-terrorist actions [11], in cleaning operations in sky-scrapers [12], [13], [14], [15], for cleaning the walls and ceilings of restaurants, community kitchens and food preparation industrial environments [16], in the transport of loads inside buildings [17] and for reconnaissance in urban environments [18]. Their application was also proposed in the human care [2] and education [19] areas."

"With respect to the locomotion type, the simpler alternatives usually make use of sliding segments, with suction cups or magnets that grab to surfaces, in order to move [8], [3], [6], [12], [13], [15], [16] (Figure 1). Although the crawler type is able to move relatively faster, it is not adequate to be applied in rough environments, being its main disadvantage the difculty in crossing cracks and obstacles."

"Another principle adopted for creating the adhesion force is magnetic adhesion. Magnetic attachment can be highly desirable due to its inherent reliability; furthermore, the method is fast but it implies the adoption of heavy actuators. Despite that, magnetic attachment is useful only in specic environments where the surface is ferromagnetic and, therefore, for most applications it is an unsuitable choice [30]."

"The most frequent solution is the use of electromagnets [25], [10]. Another possibility is the use of permanent magnets to adhere to the surface, combined with wheels or tracks to move along it. The advantage of this last solution is that there is not the need to spend energy for the adhesion process [19]. A third solution is to use magnetic wheels that allow to implement the locomotion and the adhesion at the same time [4]."

"NEW ADHESION PRINCIPLES:

Climbing Robots Using Gecko Inspired Synthetic Dry Adhesives

Climbing Robots Using Micro-structured Polymer Feet

Climbing Robots Using Microspines"

# Conclusion

"During the two last decades, the interest in climbing robotic systems has grown steadily. Their main intended applications range from cleaning to inspection of difcult to reach constructions. This paper presented a survey of several crawling, wheeled and legged climbing robots, adopting different technologies for the adhesion to the surfaces."

# 6. A Performance Oriented Novel Design of Hexapod Robots

| Published in | Conference | Location | Paper Citations | Authors | Link |
| --- | --- | --- | --- | --- | --- |
| June 2017 | IEEE/ASME Transactions on Mechatronics (Volume: 22, Issue: 3) | - | 8 | Guoliang Zhong et al. | link |

# Citation

G. Zhong, L. Chen and H. Deng, "A Performance Oriented Novel Design of Hexapod Robots," in IEEE/ASME Transactions on Mechatronics, vol. 22, no. 3, pp. 1435-1443, June 2017, doi: 10.1109/TMECH.2017.2681722.

# Abstract

This paper presents a novel hexapod robot with legs radially free distributing around the body. Compared with radial symmetric or rectangular symmetric robots, the legs of a radially free distributed hexapod robot can rotate around the body of the robot and redistribute their positions thanks to the proposed radially free distribution mechanism. To investigate the performance of the designed robot, the kinematic reachable workspace is analyzed in the three-dimensional coordinate system and enlarged due to the redistribution mechanism, and then an analysis on the fault tolerance and stability of the robot is addressed to verify the effectiveness of proposed radially free distribution mechanism in theory. Further, several comparative experiments are conducted to examine the superior performance of the proposed hexapod robot in stability, climbing, and energy consumption. The results show that the hexapod robot with radially free distributed legs can improve various performance indexes when it lifts weight or walks on slope.

# Review

A good example of a hexapod robot.

# Interesting quotes from the article:

"THE ARTICULATED multilegged robots possess powerful potential forwalking in the complex environment and dealingwith unreachable or dangerous tasks. Moreover, multilegged robots have distinctive advantages, such as walking on uneven or irregular terrain and having more edge on static stability and mobility than wheeled robots [1] or tracked robots [2]."

"In nature, a large number of arthropod insects have six legs so that they can maintain static stability when they are walking. Moreover, in most cases, there are no speed superiorities in having more than six legs, and hexapod robots also show excellent flexibility and stability when moving or executing specific tasks."

"Over the past decades, hexapod robots have been a research hotspot in the domain of mobile robots, where a large number of researches have been carried out to study a variety of hexapod robots such as subsea hexapod robots [10], wall-climbing hexapod robots [11], rescue hexapod robots [12], and so on."



# Conclusion

"This paper has developed a novel hexapod robot, whose legs can radially free distribute around the robot body and switch to manipulators when it executes tasks. The proposed radially free distribution mechanism enlarges the workspace, improves the fault tolerance, and enhances the slope climbing capability. At the same time, a stability measure criterion has been obtained based on the ZMP theory. The developed robot can maintain better stability degree when it lifts weight or climbs slope after redistributing its legs. Further, various experiments have been conducted to verify the stability, slope climbing capability, and energy consumption through comparing to the conventional hexapod robot with radial symmetric legs. The obtained results indicate that the energy consumption of the developed robot has superior performance benefitting from the proposed novel mechanism. Future work will be planned to investigate optimal control of the robotic legs distribution to improve the operation performance of RFDHR."

# 7. Gecko Inspired Surface Climbing Robots

| Published in | Conference | Location | Paper Citations | Authors | Link |
| --- | --- | --- | --- | --- | --- |
| 2004 | IEEE International Conference on Robotics and Biomimetics | Shenyang, China | 87 | C. Menon et al. | link |

# Citation

C. Menon, M. Murphy and M. Sitti, "Gecko Inspired Surface Climbing Robots," 2004 IEEE International Conference on Robotics and Biomimetics, Shenyang, 2004, pp. 431-436, doi: 10.1109/ROBIO.2004.1521817.

# Abstract

Many applications call for robots to perform tasks in workspaces where traditional vehicles cannot reach. Using robots to perform these tasks can afford better human safety as well as lower cost operations. This paper focuses on the development of gecko inspired synthetic dry adhesives for wall climbing robots which can scale vertical walls. Many applications are of great interest for this kind of robot such as inspection, repair, cleaning, and exploration. The fabrication of synthetic dry adhesives inspired by nature is discussed as well as the design of prototype wall climbing robots. Results are presented and discussed to show the feasibility of novel gecko inspired robots

# Review

This article explores the gecko's climbing capability by developing a gecko-inspired robot. It also briefly discusses magnetic adhesion.

# Interesting quotes from the article:

"Some wall climbing robots are in use in industry today cleaning high-rise buildings, and performing inspections in dangerous environments such as storage tanks for petroleum industries and nuclear power plants"

"The most common type is suction adhesion [2,3,4] where the robot carries an onboard pump to create a vacuum inside cups which are pressed against the wall or ceiling. This type of attachment has some major drawbacks associated with it. The suction adhesion mechanism requires time to develop enough vacuum to generate sufficient adhesion force. This delay may reduce the speed at which the robot can locomote. Another issue associated with suction adhesion is that any gap in the seal can cause the robot to fall. This drawback limits the suction cup adhesion mechanism to relatively smooth, nonporous, non-cracked surfaces. Lastly, the suction adhesion mechanism relies on the ambient pressure to stick to a wall, and therefore is not useful in space applications as the ambient pressure in space is essentially zero."

"Another common type of adhesion mechanism is magnetic adhesion [5,6]. Magnetic adhesion has been implemented in wall climbing robots for specific applications such as nuclear facilities inspection. In specific cases where the surface allows, magnetic attachment can be highly desirable for its inherent reliability. Despite that, magnetic attachment is useful only in specific environments where the surface is ferromagnetic, so for most applications it is an unsuitable choice."

"Geckos’ ability to climb surfaces, whether wet or dry, smooth or rough, has attracted scientists attention for decades."

"The gecko’s ability to stick to surfaces lies in its feet, specifically the very fine hairs on its toes. There are billions of these tiny fibers which make contact with the surface and create a significant collective surface area of contact. The hairs have physical properties which let them bend and conform to a wide variety of surface roughness, meaning that the adhesion arises from the structure of these hairs themselves"



# Conclusion

"The importance of realizing mechanisms able to climb every kind of surface without contaminating the surrounding environment has driven the research to focus on the ability of animals to climb vertical walls. In this paper, a new climb ing robot strategy was presented. Also a new technique for fabricating synthetic microfibers for use as dry adhesives and the results of this process were presented. Two robotic prototypes, equipped with conventional adhesives, were built, analyzed, and successfully tested. The prototypes were able to climb vertical smooth surfaces, demonstrating the feasibility of the novel robot designs. Future work includes improving the synthetic hair fabrication and the implementation of this material in more agile and robust climbing robots."

# 8. Smooth Vertical Surface Climbing With Directional Adhesion

| Published in | Conference | Location | Paper Citations | Authors | Link |
| --- | --- | --- | --- | --- | --- |
| Feb. 2008 | IEEE Transactions on Robotics (Volume: 24, Issue: 1), IEEE | - | 303 | Sangbae Kim et al. | link |

# Citation

S. Kim, M. Spenko, S. Trujillo, B. Heyneman, D. Santos and M. R. Cutkosky, "Smooth Vertical Surface Climbing With Directional Adhesion," in IEEE Transactions on Robotics, vol. 24, no. 1, pp. 65-74, Feb. 2008, doi: 10.1109/TRO.2007.909786.

# Abstract

Stickybot is a bioinspired robot that climbs smooth vertical surfaces such as glass, plastic, and ceramic tile at 4 cm/s. The robot employs several design principles adapted from the gecko including a hierarchy of compliant structures, directional adhesion, and control of tangential contact forces to achieve control of adhesion. We describe the design and fabrication methods used to create underactuated, multimaterial structures that conform to surfaces over a range of length scales from centimeters to micrometers. At the finest scale, the undersides of Stickybot's toes are covered with arrays of small, angled polymer stalks. Like the directional adhesive structures used by geckos, they readily adhere when pulled tangentially from the tips of the toes toward the ankles; when pulled in the opposite direction, they release. Working in combination with the compliant structures and directional adhesion is a force control strategy that balances forces among the feet and promotes smooth attachment and detachment of the toes.

# Review

Another gecko-inspired robot, this time with characteristics much closer to those of the real gecko.

# Interesting quotes from the article:

"When two surfaces are brought together, adhesion is created via van der Waals forces. Since van der Waals forces scale as 1/d3 , where d is the local separation between two flat surfaces [17], it is critical for the surfaces to be within an order of 10 s of nanometers of each other."

"For climbing robots, higher adhesion in tangential direction loading is significant; therefore, the tip geometry need to be designed for the distribution of tangential load."



# 9. LEMUR 3: A Limbed Climbing Robot for Extreme Terrain Mobility in Space

| Published in | Conference | Location | Paper Citations | Authors | Link |
| --- | --- | --- | --- | --- | --- |
| 2017 | IEEE International Conference on Robotics and Automation (ICRA) | Singapore | 14 | Aaron Parness et al. | link |

# Citation

A. Parness, N. Abcouwer, C. Fuller, N. Wiltsie, J. Nash and B. Kennedy, "LEMUR 3: A limbed climbing robot for extreme terrain mobility in space," 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 2017, pp. 5467-5473, doi: 10.1109/ICRA.2017.7989643.

# Abstract

This paper introduces a new four-limbed robot, LEMUR 3, that has demonstrated climbing on cliff faces and smooth glass. Each limb on the robot consists of seven identical actuators in a serial chain. Each limb terminates in a single axis force sensor that allows various end effectors to be mounted and connected to the robot's power and communication system. Microspine grippers were used for climbing the rocky surface and gecko adhesive grippers were used for the glass solar panels. All other hardware and much of the software was common for the two demonstrations. The robot's mechanical, electrical, and software systems, various gripping devices, and field demonstrations are described. Limbed mobility is of interest to JPL and NASA because of its potential to access extreme terrain, including that on Mars and in microgravity environments.

# Review

More advanced climbing robots involve multiple actuators and complex gripping mechanisms. With these mechanisms the robot can access uneven surfaces, overcome steep slopes and handle other specific situations.

# Interesting quotes from the article:

"Three generations of wheeled robots have explored Mars [1]–[3] and three types of wheeled robots have driven on the Moon [4]–[6]. The objectives of these missions were enabled by the mobility of these vehicles. However, several terrain types with high scientific value are not accessible to wheeled systems, and several incidents have shown the vulnerabilities of wheeled architectures."

"Despite the efforts of the rover drivers, Opportunity could not reach the layering to deploy its scientific instruments because the slopes were too steep [7]. The twin Spirit rover had the unfortunate fate of getting stuck in loose sand and was unable to free itself, due in part to the limited degrees of freedom of its wheeled architecture [8]. Most recently, the Curiosity rover has suffered from punctures in its wheels that have slowed its progress and limited terrain types that could be traversed [9]."

"When equipped with gripping end effectors, limbed robots also have the potential to enter and explore caves (including the cave ceilings)"

"The biggest drawback to limbed robots is their complexity. This leads to higher cost and higher risk of failure in some cases, although the redundancy and adaptability of limbs can overcome failures that simpler architectures may suffer."



# 10. The design of permanent-magnetic wheeled wall-climbing robot

| Published in | Conference | Location | Paper Citations | Authors | Link |
| --- | --- | --- | --- | --- | --- |
| 2017 | IEEE International Conference on Information and Automation (ICIA) | - | 2 | Jiannan Cai et al. | link |

# Citation

J. Cai, K. He, H. Fang, H. Chen, S. Hu and W. Zhou, "The design of permanent-magnetic wheeled wall-climbing robot," 2017 IEEE International Conference on Information and Automation (ICIA), Macau, 2017, pp. 604-608, doi: 10.1109/ICInfA.2017.8078979.

# Abstract

In this paper, a new type of magnetic wheeled wallclimbing robot is proposed in order to overcome the shortcomings of existing climbing robots. It takes advantage of SOLIDWORKS to establish a three-dimensional model. Also, the paper introduces the design idea of wall-climbing robot, uses MATLAB to do the numerical analysis and make magnetic simulation through ANSOFT MAXWELL.

# Review

An application of wall climbing robots for cleaning ship hulls. An analysis in ANSOFT MAXWELL is also presented, and the case for using two ring magnets rather than one is made.

# Interesting quotes from the article:

"The wall-climbing robot equipped with cleaning tools, attracts on the ship surface and carries out cleaning jobs. In this way, it operates simply and cleans quickly. Since most of the hull surfaces are not smooth, wall-climbing robots are required to have these qualities: stable adsorption properties, good steering performance and surface adaptability. The robot should also have obstacle crossing ability because of many obstacles attracting to the ship body [2]. The distance between the nozzle and the wall surface needs to be controlled, but the marine fouling on the ship body is usually thick and uneven, which requires the cleaning mechanism could freely adjust the distance from the nozzle to the surface of ship."

WARNING

"Compared with a whole piece of the ring magnet, these two ring magnets cause less magnetic flux leakage. So it requires smaller size of the magnet."

"The arm cleaning mechanism designed in this paper is composed of motor, rotary shaft, support arm, nozzle rotating disk, high pressure nozzle and so on. The servo motor controls the rotation of the rotary shaft, and the support arm can move up and down along the rotation axis under the control of the motor. The high pressure nozzle is mounted on the rotating disc at an angle and the rotating disk is rotated by the recoil force of the high pressure water jet to form a circular cleaning area."



# Conclusion

"This paper proposed a new kind of magnetic wheeled climbing robot which overcomes the existing climbing wall robot structure’s shortcoming, such as cumbersome, inflexible walking and low adaptability to the wall. The calculation formula of the allowable adsorption force of the magnetic disk is obtained through the static analysis. What’s more, we use MATLAB to do numerical simulation and the ANSOFT MAXWELL to do magnetic field simulation. The simulation results show that the design of the magnetic structure satisfies the requirements of the adsorption force, and the robot could have reliable adsorption on the ship's wall without the risk of downward slippage and overturning."

# 11. Analysis of the Wheel-wall Gap and Its Influence on Magnetic Force for Wheeled Wall-climbing Robot Adsorbed on the Cylindrical Tank

| Published in | Conference | Location | Paper Citations | Authors | Link |
| --- | --- | --- | --- | --- | --- |
| 2020 | IEEE 4th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC) | Chongqing, China | 0 | Taoyu Han; Ruiming Qian | link |

# Citation

T. Han and R. Qian, "Analysis of the Wheel-wall Gap and Its Influence on Magnetic Force for Wheeled Wall-climbing Robot Adsorbed on the Cylindrical Tank," 2020 IEEE 4th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), Chongqing, China, 2020, pp. 889-893, doi: 10.1109/ITNEC48623.2020.9084740.

# Abstract

For the permanent magnet adsorption and four-wheel wall-climbing robot adsorbed on the surface of the cylindrical tank, only three wheels contact the wall, and one wheel is non-contact in general. The solution method of the contact point position on the contact wheels is put forward in this paper. The wheel-wall gap equation of the wheels of the robot is derived. The coordinate transformation matrix is established, which can be used to describe the position and orientation between the robot and the tank. Combined the robot parameters, the distribution curves and maximal values of the wheel-wall gaps of three contact wheels and one non-contact wheel are calculated. The influence of wheel-wall gap on the magnetic adsorption force is obtained by ANSOFT simulation. The analysis and calculation methods can be used to design the magnetic wheel and control of the wheeled wall-climbing robot.

# Review

This article studies the influence of the wheel-wall gap on the magnetic force for mobile climbing robots on cylindrical tanks.

# Interesting quotes from the article:

"the permanent magnet wheeled wall-climbing robot has a good application prospect in the tank wall detection due to its compact structure, simple control and agile motion, so many researchers have developed the permanent magnet wheeled wall-climbing robot"

"In the above studies, tank walls were approximated as planes, but in the actual tank inspection, the surfaces of the tank are curved. The magnetic wheels are only in point contact with the wall. Apart from the contact point, there is a gap between the wheel and the wall, which will reduce the magnetic adsorption force."

"The research object of this article is a permanent magnet wheeled wall-climbing robot with four wheels and the cylindrical tank. In order to get more accurate adsorption force that the robot stably adsorbs on the tank wall, we first need to solve the gap between the magnetic wheel and the tank wall."

"In this paper, we set up different coordinate systems, and use the coordinate transformation to get the coordinates of the contact points between the magnetic wheels and the tank wall, and then the distance equation is established to solve the gap."

"This paper proposed a method to calculate the wheel-wall gap when the permanent magnet wheeled wall-climbing robot is adsorbed on the tank wall. Then we use the method to calculate the value of the gaps between the magnetic wheels and the tank wall by MATLAB, and obtain the change law of wheel-wall gap under different situations."



# Conclusion

"This paper proposed a method to calculate the wheel-wall gap when the permanent magnet wheeled wall-climbing robot is adsorbed on the tank wall. Then we use the method to calculate the value of the gaps between the magnetic wheels and the tank wall by MATLAB, and obtain the change law of wheel-wall gap under different situations. Along the width direction of the wheels from inside to outside, the gaps between the wheels and the tank wall gradually increases and tend to linear change. When the robot is adsorbed on the tank wall with the innermost points at the bottom of the contact wheels, with the increases of the stagger angle β, the gaps of the contacts gradually decrease, while the gap of the noncontact gradually increases, and the increase speed is fast. In the end, we perform the magnetic simulation by ANSFOT based on the value of the gap, and get the influence curve of the gap on the magnetic absorption force. With the increase of the gap, the magnetic adsorption force will decrease exponentially, so it is necessary to solve the wheel-wall gap for the more accurate magnetic adsorption force. The calculation method of wheel-wall gap studied in this paper can provide evidence for establishing a more accurate model for magnetic force analysis and simulation. And we can get the more reliable magnetic adsorption force, which is of great significance for the robot to work safely and stably on the tank wall."

# 12. Magnetic Omnidirectional Wheels for Climbing Robots

| Published in | Conference | Location | Paper Citations | Authors | Link |
| --- | --- | --- | --- | --- | --- |
| 2013 | IEEE/RSJ International Conference on Intelligent Robots and Systems | Tokyo, Japan | 10 | Mahmoud Tavakoli et al. | link |

# Citation

M. Tavakoli, C. Viegas, L. Marques, J. N. Pires and A. T. de Almeida, "Magnetic omnidirectional wheels for climbing robots," 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, Tokyo, 2013, pp. 266-271, doi: 10.1109/IROS.2013.6696363.

# Abstract

This paper describes design and development of omnidirectional magnetic climbing robots with high maneuverability for inspection of ferromagnetic 3D human made structures. The main focus of this article is design, analysis and implementation of magnetic omnidirectional wheels for climbing robots. We discuss the effect of the associated problems of such wheels, e.g. vibration, on climbing robots. This paper also describes the evolution of magnetic omnidirectional wheels throughout the design and development of several solutions, resulting in lighter and smaller wheels which have less vibration and adapt better to smaller radius structures. These wheels are installed on a chassis which adapts passively to flat and curved structures, enabling the robot to climb and navigate on such structures.

# Review

Discusses the development of climbing robots over the past two decades, the desire for high maneuverability, and limitations such as not being able to rotate around a pole, or the high energy cost of doing so.

It also touches on the difficulty omnidirectional wheels have in producing smooth movement.

Another aspect is the discontinuous contact with the ground caused by the gaps between successive rollers; this can be mitigated but not removed.

# Interesting quotes from the article:

"Climbing robots have been developed during the past two decades in order to facilitate some jobs such as periodical inspections for detection of cracks, corrosion, material degradation and welding defects on tanks and piping. Other applications of interest include ship hull grooming, cleaning, and painting of such structures. Gas and oil tanks, Wind turbines, pipelines and marine vessels are examples of the structures which are target of this research work."

"Some of the applications e.g. painting or cleaning or periodical inspection need the robot be able to scan the whole structure or to reach to a pose on the structure rapidly and then perform in situ maintenance"

"Furthermore, in all cases high maneuverability is desired. One of the most important limitations of many pole climbing robots is that they can not rotate around the pole [13], or in order to rotate around the pole, they have high energy and time costs [14], [15], [16], while rotating around the pole is necessary for being able to scan the whole structure. Another important aspect is adaptability of the robot to various structures."

"The main objective of this research is to implement a robot which is able to climb and navigate over ferromagnetic structures considering:

  • High maneuverability.
  • High speed.
  • Adaptability to a reasonable range of curvature.
  • Adaptability to a reasonable range of structure’s ferromagnetic materials and thickness.
  • Simplicity."

"Climbing robots based on permanent magnets have also been developed for different purposes. They were based on a magnetic caterpillar[17], magnetic array wheel [18] or permanent magnetic wheels [6]. Yet there is a lot of space for improvements on many aspects of magnetic wheel based climbing robots."

"In the current research we tried to concentrate on aspects of climbing robots such as maneuverability and adaptability to various structures."

"However due to the discontinuous movement nature of omnidirectional wheels, the robot could not achieve a smooth movement. We will discuss how a climbing robot based on omnidirectional wheels suffers more from a chattering in movement than a terrestrial robot, and will try to improve such problem."

"As discussed in [21], the classic type of omnidirectional wheels (figure 2-a) makes discontinuous contacts with the ground due to the gaps between the successive rollers, which causes vertical vibration. To minimize this gap, Mecanum wheels, double row wheels, alternate wheels and half wheels were developed. In none of these wheels the gap is totally removed."

"The magnetic force has an inverse relation with the cubic order of the distance from surface (see equation 1). Therefore as we experienced with OmniClimber- I, small vibrations changes the normal magnetic force on each wheel which causes undesired vibrations in the direction normal to the surface and thus results in a non smooth movement. Furthermore a difference on normal magnetic force causes a difference on each wheel’s traction, resulting in a low trajectory following accuracy."



# Conclusion

"In this paper we presented evolution of omnidirectional magnetic wheels, which resulted in a lighter and smaller wheels which suffers from less vibration, and adapt better to curved structures. Future works includes development of a fully round omnidirectional magnetic wheels and integration of exteroceptive sensors to compensate the odometry errors, and a vision system for inspection of the structure."

# Articles - Non-Destructive Inspection using robots

# 1. Current and Future Research Focus on Inspection of Vertical Structures in Oil and Gas Industry

| Published in | Conference | Location | Paper Citations | Authors | Link |
| --- | --- | --- | --- | --- | --- |
| 2018 | 18th International Conference on Control, Automation and Systems (ICCAS) | - | 191 | Sudevan, V., et al. | link |

# Citation

V. Sudevan, A. Shukla and H. Karki, "Current and Future Research Focus on Inspection of Vertical Structures in Oil and Gas Industry," 2018 18th International Conference on Control, Automation and Systems (ICCAS), Daegwallyeong, 2018, pp. 144-149.

# Abstract

Regular inspection of oil and gas installations are vital for production, maintenance, safety and environmental impact assessment. Due to complex set-up and hazardous operation environment, the inspection, maintenance and repair (IMR) operations as considered as an inevitable task in oil and gas industries. Integrating technologies from the field of robotics, sensing and process control will be a decisive step in digitalization of oil and gas industry. The traditional vertical structures inspection system uses rope access, scaffolds, telescopic elevation platforms supported by cranes and manned helicopters. The challenges faced by conventional techniques are the construction of scaffolding, sending inspectors into dangerous and fatal environments, shutdown of plant operations etc. that has financial burden on the plant operating cost. Introduction of robotic technologies such as wall-climbing robots, Unmanned Aerial Vehicle (UAV) and wall-climbing drones, provides a possible solution for these challenges by increasing the efficiency, reducing the risks and lowering the cost of IMR tasks. The current research focus in this field is to automate and improve the inspection and testing capabilities of these systems. This paper presents the state of art of current scenario and future research focus on vertical structure inspection in oil and gas industry.

# Review

A great article, full of interesting information, especially about methods for inspecting vertical structures, the limitations of wall climbing robots, and industry insights.

# Interesting quotes from the article:

"With increase in urbanization and industrialization, demand for oil and gas is expected to grow at a relentless pace in forthcoming years."

"When focusing on the sustainability and resource efficiency in the oil and gas industry, closer attention should be paid to the reliability and serviceability of existing structures to extend their lifetime. This requires more sophisticated and effective methods for inspection and monitoring of structures, in order to assess their structural state and thus trigger repair and rehabilitation efforts."

"The planning and execution of Inspection, Maintenance and Repair (IMR) activities in oil and gas industries are costly and time consuming due to its complex set-up and hazardous environment."

"There are continual requirement of close visual inspection of plant and other structures in both onshore and offshore oil and gas industries. The routine inspection, maintenance and cleaning of these infrastructures involves number of manual operations which are dangerous for skilled workers. Rope access inspection, construction of scaffolds and usage of manned helicopters are some of the conventional methods employed to inspect and monitor the high-rise structures like flare stacks, storage tanks, cooling towers or live systems in an oil and gas industry."

"The various challenges in performing conventional manual inspection methods are (i) Working personnel has to climb the high-rise or live systems to conduct the inspection tasks. (ii) Inspection sensors need to be carried in hand by the operator while accessing the structures at height. (iii) Close visual inspection is not always possible on live systems. (iv) The manual inspection methods are time consuming, labor intensive, inefficient, requires high skilled operators and involves high cost and risk factors."

"The oil and gas industries are now focusing on the development of an autonomous inspection/testing method to carry out the inspection tasks in high risk and hazardous areas."

"The manual inspection activity involves detecting defects and making a judgment based on the type of defect, whether to accept, reject or rework the part."

  • Installation of scaffoldings around the structure
  • Rope access techniques
  • Elevation platforms supported by cranes
  • Manned and remote-controlled helicopters

"The manual inspection can be prohibitively time consuming and expensive, forcing the oil and gas companies to complete their facility inspection only when required by law. The major benefits with the robotic inspection method is that inspections can be conducted more frequently due to their low operation cost. The robotic inspections can be conducted either in remoteoperated or as fully autonomous manner."

"In recent years the maturity and stability of climbing technologies have resulted in increasing number of climbing robots in industrial applications"

"The researchers have developed several robots for wall crawling, yet there is no guaranteed solution. One of the critical reasons why existing wall-crawling robots have not been available in the field is the risk of accidental fall due to operational failure from the harsh environment, like strong wind and the surface’s unpredictable condition."

"The limitation of wall climbing robots is safety, localization on the structure and limited payload capacity."

"Now, the researchers are focusing towards the development of an autonomous drone that can perform inspection tasks by identifying the structure to be inspected, navigate autonomously along the identified structure, collect and analyze the data collected and locate the fault locations autonomously."

"Approaches using an RGB-D sensor instead of a laser scanner [21] do not share the same constraints on the structure of the environment, but are only applicable to indoor operations. The advanced image processing algorithms such as Scale-Invariant Feature Transform (SIFT), Speeded up Robust Features (SURF), Simultaneous Localization and Mapping (SLAM), Parallel Tracking and Mapping (PTAM) etc., are widely used by the researches for the accurate target identification and tracking in outdoor operations. An autonomous features based target identification method was introduced using SURF detector [22]. Autonomous pipeline detection and tracking algorithm [23] was successfully implemented using canny edge detector and Probabilistic Hough Transformation for horizontal onshore oil and gas pipeline structures [24-25]. These algorithms can be modified and used for autonomous vertical structure inspection in oil and gas industry. It is clearly seen from the review that even though many attempts are continuing to automate the vertical structure inspection, no fully autonomous vision based inspection system that is capable of detecting the faults has been successfully developed yet."

"The challenges to be faced during the development of autonomous inspection system for oil and gas industry may include the regulation around the machine learning systems used for automated flight, payload limitations, manage the transition towards effective solutions regarding contingency and failure management, cyber resilience in order to mitigate against deliberate use of drones."

# Conclusion

"This paper presents a review of current inspection scenario and the future research focus on vertical structure inspection in oil and gas industry. The challenges to be faced during the development of autonomous inspection system for oil and gas industry may include the regulation around the machine learning systems used for automated flight, payload limitations, manage the transition towards effective solutions regarding contingency and failure management, cyber resilience in order to mitigate against deliberate use of drones."

# 2. Differential-Drive In-Pipe Robot for Moving Inside Urban Gas Pipelines

Published in Conference Location Paper Citations Authors Link
IEEE Transactions on Robotics (Volume: 21, Issue: 1, Feb. 2005) IEEE 235 Se-gon Roh et al. link

# Citation

Se-gon Roh and Hyouk Ryeol Choi, "Differential-drive in-pipe robot for moving inside urban gas pipelines," in IEEE Transactions on Robotics, vol. 21, no. 1, pp. 1-17, Feb. 2005, doi: 10.1109/TRO.2004.838000.

# Abstract

Pipelines for the urban gas-supply system require a robot possessing outstanding mobility and advanced control algorithms, since they are configured with various pipeline elements, such as straight pipelines, elbows, and branches. In this paper, we present a comprehensive work for moving inside underground urban gas pipelines with a miniature differential-drive in-pipe robot, called the Multifunctional Robot for IN-pipe inSPECTion (MRINSPECT) IV. MRINSPECT IV has been developed for the inspection of urban gas pipelines with a nominal 4-in inside diameter. The mechanism for steering with differential-drive wheels, arranged three-dimensionally, allows it to easily adapt to most of the existing configurations of pipelines, as well as providing excellent mobility during navigation. After carrying out analysis for fittings in pipelines, mathematical descriptions of their geometries are presented, which make it possible to estimate the movement patterns of the robot while passing through the fittings. Also, we propose a method of controlling the robot by modulating speeds of driving wheels that is applicable without sophisticated sensory information. To confirm the effectiveness of the proposed method, experiments are performed, and supplementary considerations on the design of the in-pipe robot are discussed.

# Review

A very good example of an in-pipe robot for visual inspection, despite being from 2005; particularly notable are the wall-press type used to climb vertical pipelines and the steering accomplished by modulating the wheel velocities.

# Interesting quotes from the article:

"Currently, the applications of robots for the maintenance of the pipeline utilities are considered as one of the most attractive solutions available. In-pipe robots, which have a long history of development in robotics, can be classified into several elementary forms according to movement patterns, as shown in Fig. 1, although most of them have been designed depending upon specific applications."



"the wall-press type, which has a number of advantages in climbing vertical pipelines, corresponds to the robot with a flexible mechanism for pressing the wall with whatever means they apply"

"For successful navigation, however, in-pipe robots are strongly demanded to have the ability of negotiating elbows and branches, because urban gas pipelines are configured with a number of special fittings, such as elbows, branches, and their combinations."

"When mobile robots navigate on plain surfaces, such as indoor environments, steering is accomplished by modulating the speeds of wheels according to the desired movement direction."

# Conclusion

"In this paper, the issues of the mechanical construction of MRINSPECT IV and its control, mainly focused on the movement in fittings such as the elbow and the branch, were discussed. According to the experiments, MRINSPECT IV could navigate almost all kinds of pipeline configurations, regardless of the effect of gravity, its postures, and the direction of movement. Though the algorithms were described based on MRINSPECT IV, the ideas can be generalized to other robots. However, according to our experiences on this work, the mechanism of the in-pipe robot should be adaptable to the characteristic condition of the pipelines, and it is the preliminary requirement for successful movement. The use of a general-purpose robot may not be possible in in-pipe applications. For that means, MRINSPECT IV has the possibility of being used in practical applications, although it is still under improvement through testing in field conditions."

# 3. Micro Inspection Robot for 1-in Pipes

Published in Conference Location Paper Citations Authors Link
IEEE/ASME Transactions on Mechatronics (Volume: 4, Issue: 3, Sep 1999) IEEE 130 K. Suzumori et al. link

# Citation

K. Suzumori, T. Miyagawa, M. Kimura and Y. Hasegawa, "Micro inspection robot for 1-in pipes," in IEEE/ASME Transactions on Mechatronics, vol. 4, no. 3, pp. 286-292, Sept. 1999, doi: 10.1109/3516.789686.

# Abstract

A micro inspection robot for 1-in pipes has been developed. The robot is 23 mm in diameter and 110 mm in length and is equipped with a high-quality micro charge-coupled device (CCD) camera and a dual hand for manipulating small objects in pipes. It can travel through both vertical pipes and curved sections, making possible inspections that would be difficult with conventional endoscopes. Its rate of travel is 6 mm/s and it has a load-pulling power of 1 N. To realize this microrobot, the authors have specially designed and developed several micro devices and micromechanisms: a novel micromechanism called a planetary wheel mechanism for robot drive; a micro electromagnetic motor with a micro planetary reduction gear to drive the planetary wheel mechanism; a micro pneumatic rubber actuator that acts as a hand; a micro CCD camera with high resolution; and a pneumatic wobble motor for rotating the camera and hands. In the paper, the design and performance of these micro devices are reported, the performance of the robot as a whole is described, and an application example is given.

# Review

A micro robot for visual inspection of 1-inch pipes, equipped with a micro manipulation hand and a micro camera.

# Interesting quotes from the article:

"THE need to carry out inspections inside small pipelines has grown recently [1]–[5], with particular demand relating to the 1-in pipelines often found in chemical plants, heat exchangers, and gas or water supply systems. Some basic research on mobile mechanisms for use in pipes with smaller than 1-in inner diameter has been reported."

"such microrobots for small pipes have low pulling force and have difficulty negotiating curved pipes or vertical pipes. Further, commercial charge-coupled device (CCD) cameras are too big to mount on these robots."

"Images with 470 TV lines in horizontal and 350 TV lines in vertical resolution are obtained, and micro cracks with 25- m width on pipe surface are easily recognized."



# 4. A Wall Climbing Robotic System For Non Destructive Inspection Of Above Ground Tanks

Published in Conference Location Paper Citations Authors Link
2006 Canadian Conference on Electrical and Computer Engineering Ottawa, Ont., Canada 11 Love P. Kalra ; Weimin Shen ; Jason Gu link

# Citation

L. P. Kalra, W. Shen and J. Gu, "A Wall Climbing Robotic System for Non Destructive Inspection of Above Ground Tanks," 2006 Canadian Conference on Electrical and Computer Engineering, Ottawa, Ont., 2006, pp. 402-405, doi: 10.1109/CCECE.2006.277523.

# Abstract

The inspection and maintenance of above ground storage tanks (AST) can be very time consuming and dangerous when performed manually. The motivation behind this paper is to simplify the task of nondestructive testing of above ground storage tanks in oil refineries and other industrial applications. The proposed robotic system consists of an autonomous mobile platform that can move on the vertical walls of the tanks carrying the testing probes, a ground station where the sensor data can be monitored for faults or internal cracks in the tank walls and the wireless communication link. In this paper we are presenting the proposed mechanical design for the robotic vehicle, the control system and a coverage algorithm to autonomously scan the cylindrical surface of the oil storage tank with the mounted sensor for fault detection. The control system consists of hierarchal control architecture with four different layers viz: task layer, behavior layer, control layer, and physical layer.

# Review

This article is an example of a tracked robot with batteries; the paper focuses mainly on the mechanical architecture and the control system of the robot.

# Interesting quotes from the article:

"demands an adhesion mechanism that does not require any external power. Permanent magnet makes a great candidate for such a requirement."

"Since the attraction force of the magnets decreases exponentially with the increase in air gap, an experiment was performed to know the exact relation between the decreasing magnetic force with the increase in distance. After observing an exponential decrease in attraction force of the magnets with distance, it was decided to keep the magnets directly attached to the surface by designing a magnet holder shown in figure 2."



# 5. A Wall Climbing Robot for Oil Tank Inspection

Published in Conference Location Paper Citations Authors Link
2006 IEEE International Conference on Robotics and Biomimetics Kunming, China 39 Love P. Kalra ; Jason Gu ; Max Meng link

# Citation

L. P. Kalra, J. Gu and M. Meng, "A Wall Climbing Robot for Oil Tank Inspection," 2006 IEEE International Conference on Robotics and Biomimetics, Kunming, 2006, pp. 1523-1528, doi: 10.1109/ROBIO.2006.340155.

# Abstract

Thousands of storage tanks in oil refineries have to be inspected manually to prevent leakage and/or any other potential catastrophe. A wall climbing robot with permanent magnet adhesion mechanism equipped with nondestructive sensor has been designed. The robot can be operated autonomously or manually. In autonomous mode the robot uses an ingenious coverage algorithm based on distance transform function to navigate itself over the tank surface in a back and forth motion to scan the external wall for the possible faults using sensors without any human intervention. In manual mode the robot can be navigated wirelessly from the ground station to any location of interest. Preliminary experiment has been carried out to test the prototype.

# Review

This article presents a specific application for oil tank inspection and can be considered complementary to the article above.

# Interesting quotes from the article:

"Robots with the ability to adhere to the surface of an iron structure could be useful in many types of facilities, such as oil reservoirs, spherical gas tanks and the steam drum of nuclear power plants for performing several tasks, e.g. inspections, short-blasting or painting."



# 6. Wireless Climbing Robots for Industrial Inspection

Published in Conference Location Paper Citations Authors Link
IEEE ISR 2013 Seoul, South Korea 1 Hernando Leon-Rodriguez ; Tariq Sattar ; Jong-Oh Park link

# Citation

# Abstract

Climbing robots have proven their abilities and enormous potential for industrial inspection tasks. These automated systems can climb, work, perform different actions in hazardous environments, changing between different types of surfaces and navigating through narrow spaces with difficult accessibility. Recently advances in mechanical engineering design and materials such as composites, have resulted in components and structures with complex geometries that need to be inspected more rigorously with more robust devices and novel techniques. The paper describes a survey on climbing robots for different environments along with adhesion techniques that operate in critical industrial environments, such as aerospace, transportation, pipelines, petro-chemical processing and power generation. The application for these robots are mainly for surveillance and inspection rather than executing none destructive tests.

# Review

This article focuses on climbing robots with wireless control, which will not be the case here, although it might be a good point to acknowledge the benefits.

It includes an example of an interesting robot that carries the Phased Array equipment on board.



Some other types of adhesion are also explored in this article, such as friction or gripper adhesion and multiple-surface adhesion.

# Interesting quotes from the article:

"In many cases e.g. in power plants, pipelines, aerospace, storage tanks in the petro-chemical and food processing industries, etc., the inspection has to be performed during an outage by shutting down a plant [1]. There is an enormous pressure to reduce the outage time by performing the inspections as efficiently and quickly as possible to provide a rapid turnaround."

"Current research is developing mobile none destructive testing robots that: navigate inside petro-chemical storage tanks (while full of product) to inspect floors for pitting and corrosion; climb on the hulls of steel ships to inspect hundreds of kilometres of weld; inspect the walls of petro-chemical storage tanks for corrosion and weld integrity;"

"Magnetic adhesion is the most common industrial technique for climbing robots and NDT. This method depends dramatically on the position, direction and type of the magnetic field. The simplest form of magnetic adhesion uses permanent magnets to obtain constant forces."

"The Neodymium and Samarium-Cobalt magnets have proved to be most useful in climbing robot applications, these magnets in especial arrange and with flux concentrator have been performed in a 200 kg wall climbing robot [8][9]."

"Robotic access both speeds up the inspection and reduce costs by eliminating the expensive and lengthy erection of scaffolding or the preparation of the site before humans can manually perform the inspection."

The robot follows the weld using infrared localization.

"The NDT robot is able to follow the welding robot by using infrared distance measuring sensors and by sensing the hot welding point with a thermal array sensor of eight thermopiles arranged in a row. It can measure the temperature of eight adjacent points simultaneously. The sensor reads infra-red in the 2 um to 22 um range, which is the radiant heat wavelength [8]."

# Conclusion

"The wireless climbing robots presented here are a very brief review of several available in research and commercial applications, designed to provide access to inspection sites on very large structures and/or test sites located in hazardous environments. The robots can deploy sensors to implement an appropriate technique from the almost all NDT techniques to find defects such as cracks, inclusions, lamination debonding and the extent of corrosion on steel structures. Robotic access both speeds up the inspection and reduce costs by eliminating the expensive and lengthy erection of scaffolding or the preparation of the site before humans can manually perform the inspection. Thus outage turnaround can be reduced or an outage prevented where the robotic inspection can be performed while the plant is in service. Robotic deployment of NDT is the only means of performing testing where the test site is located in hazardous and dangerous environments."

# 7. Bilateral laser vision tracking synchronous inspection robotic system

Published in Conference Location Paper Citations Authors Link
2017 2017 Far East NDT New Technology & Application Forum (FENDT) 0 Chun-Iei TU et al. link

# Citation

C. TU, X. LI, J. LI, X. Wang and S. Sun, "Bilateral Laser Vision Tracking Synchronous Inspection Robotic System," 2017 Far East NDT New Technology & Application Forum (FENDT), Xi'an, 2017, pp. 207-215, doi: 10.1109/FENDT.2017.8584573.

# Abstract

For inspecting welding seams of large-scale equipment such as storage tanks and spherical tanks, it usually cost much manpower and material, while automated testing robot can achieve fast and accurate detection. Because X-ray Flat Panel Detector is dependent on specialized automated equipment, it can greatly enhance X-ray inspection technology in large storage tanks that applying the Mecanum Omnidirectional Mobile Robot into automated weld detection. In this paper, an X-ray Flat Panel Detector based wall-climbing robotic system is developed for intelligent detecting of welding seams. The robot system consists of two Mecanum vehicles equipped with either a Flat Panel Detector or an X-ray generator and climbing on both side of the tank wall. Inspection robot can carry detector stably with reliable suction force and adapt to different surfaces. To let the X-ray Flat Panel Detector work properly, laser vision tracking system is used to ensure synchronous operation of the two robots. Some experiment was conducted and reported.

# Review

In this application, two robots have to be synchronized in order to generate a radiograph of the weld. This is costly and can only be achieved with an empty, clean tank or sphere. The same analysis of the weld can be done using a Phased Array/ToFD system, which is permitted by the standard.

# Interesting quotes from the article:

"The robot system consists of two Mecanum vehicles equipped with either a Flat Panel Detector or an X-ray generator and climbing on both side of the tank wall."



"Inspection robot can carry detector stably with reliable suction force and adapt to different surfaces. To let the X-ray Flat Panel Detector work properly, laser vision tracking system is used to ensure synchronous operation of the two robots."

"We make some experiments and find out the influence of gravity is quite large."

# Conclusion

"Climbing robot with Mecanum wheel can achieve Omni - directional movement, and the three-axis adjustable absorption mechanism we designed ensure wheels tangent to the work surface and provide sufficient adsorption force. Through the robotic force and stability analysis, the robot can make safe and reliable operation. We make some experiments and find out the influence of gravity is quite large. Through compensation to accelerate, the robot motion is more stable. More work will be performed to make the robot motion more accurate."

# 8. Design of Inspection Robot for Spherical Tank Based on Mecanum Wheel

Published in Conference Location Paper Citations Authors Link
2019 Far East NDT New Technology & Application Forum (FENDT) China 0 Jie LI et al. link

# Citation

J. LI, H. Feng, C. Tu, S. JIN and X. WANG, "Design of Inspection Robot for Spherical Tank Based on Mecanum Wheel," 2019 Far East NDT New Technology & Application Forum (FENDT), Qingdao, Shandong province, China, 2019, pp. 218-224, doi: 10.1109/FENDT47723.2019.8962531.

# Abstract

Traditional manual weld inspection of large spherical tank has high cost, low efficiency and high degree of danger. It is of great significance to design an automated weld flaw detection system. This paper designs a wheeled wall-climbing robot that can crawl on the outer wall of the spherical tank. The Mecanum wheeled trolley is used as a mobile platform for ultrasonic testing with ultrasonic testing equipment. Firstly, the key structure of the robot is introduced, and the independent suspension structure and spherical adjustment mechanism are designed. The reliability of magnetic adsorption was analyzed by static calculation, and the rationality of the structure was verified. The ultrasonic probe fixture is designed according to the ultrasonic flaw detection process. The defect size calculated according to the flaw detection principle. Through the experiment, the motion effect of the wall-climbing robot on the curved surface is analyzed, and the rationality and application value of the wall-detecting robot flaw detection scheme are verified.

# Review

# Interesting quotes from the article:

"Spherical tanks, as a common pressure vessel, are widely used for the storage of various dangerous goods media. Due to the special nature of its storage media, its security performance needs to be checked regularly. The weld is a weak part of the spherical tank, which is prone to defects and poses great safety hazards."

"The Mecanum wheel is a four-wheel drive. It requires four wheels to make good contact with the wall when climbing the wall. Therefore, the robot design requires an adjustment mechanism to adapt to the spherical creep of different curvatures. The suspension is designed to adjust the independent suspension structure, and the angle between the same row of wheels can be adjusted by adjusting the top knob. The relative posture of the front and rear suspensions can be adjusted by an intermediate rotation mechanism. Suspension adjustment and adjustment of the intermediate rotating mechanism ultimately allows the car to adapt to the surface of the can. The shock absorber is used for surface adaptation."

"The Mecanum car needs four wheels to cling to the ground when moving. The suspension enables the car to overcome obstacles such as welds on the surface of the tank, reducing the vibration of the robot during exercise. When an obstacle is encountered, the independent suspension is deformed so that the wheel can overcome the obstacle alone, thereby reducing the impact on other drive wheels."

"As the adsorption force, the magnetic force needs to be large enough to meet the safety adsorption. However, the excessive magnetic force will increase the resistance of the robot to crawl. Therefore, the magnetic adsorption force needs to be analyzed, so that the magnet can reduce the redundancy that hinders the movement of the car on the basis of stable adsorption. The remaining suction minimizes the power consumption of the robot."

"The suspension enables the car to overcome obstacles such as welds on the surface of the tank, reducing the vibration of the robot during exercise. When an obstacle is encountered, the independent suspension is deformed so that the wheel can overcome the obstacle alone, thereby reducing the impact on other drive wheels."

"In this paper, a wall climbing robot is designed for the detection of spherical tank welds. The robot was analyzed for the external structure of the spherical tank. According to the experiment, the robot can better adapt to the spherical surface with different curvatures, and can overcome obstacles such as welds on the surface."





# Conclusion

"In this paper, a wall climbing robot is designed for the detection of spherical tank welds. The robot was analyzed for the external structure of the spherical tank. According to the experiment, the robot can better adapt to the spherical surface with different curvatures, and can overcome obstacles such as welds on the surface. This paper verifies the feasibility of automatic testing of wall-climbing robots on spherical tanks."

# 9. Environment identification and path planning for autonomous NDT inspection of spherical storage tanks

Published in Conference Location Paper Citations Authors Link
2016 2016 XIII Latin American Robotics Symposium and IV Brazilian Robotics Symposium - Marco Antonio Simoes Teixeira, Higor Barbosa Santos, Andre Schneider de Oliveira, Lucia Valeria Ramos de Arruda and Flavio Neves-Jr link

# Citation

# Abstract

This paper presents a novel approach to inspection planning in spherical storage tanks by an autonomous climbing robot. The objective is the automatic extraction of some environment characteristics, by robot, to predict the tank dimensions and robot localization. Three distinct perception sources (long range laser rangefinder, light detection and ranging; and depth camera) are used to predict a 3D occupancy grid wrapping calculated tank. From this grid, a path for tank inspection is computed that ensuring a complete scan at the entire tank surface. This scanning must consider kinematic constraints of magnetic wheels and NDT standard. The approach is evaluated in four LPG’s spherical tanks virtually designed with same characteristics that real tank projects.

# Review

This article has a good approach for an autonomous robot and gives reasons for inspecting LPG spheres. It discusses odometry and also references some interesting articles worth following up. A fairly complex perception system with three inputs is also explored.

# Interesting quotes from the article:

"Mobile robots are a powerful tool to inspect inaccessible and hazardous environments and guarantee repetition and precision that this task requires. Inspection robots should navigate with precision over the whole surface to be inspected to detect any failure or defect."

"Inspection task traditionally exposes the technicians to unhealthy and often hazardous environments. Mobile robots can be used to reduce or avoid these risks."

"Inspection robots have the task of navigating through the entire inner and outer surface of storage tank searching for structural failures in the steel plates or weld beads. However, the inspection task cannot be planned if the robot does not know the environment characteristics."

"The odometry of mobile robots can be improved if environment parameters are known, ensuring the pure rolling without skidding and avoiding lateral motion."

"In inspection tasks, the path planning is the main requirement, where the robot must cover all surfaces (not only for navigation), but for an accurate inspection."

"This work proposes a novel approach to inspection planning of unknown spherical storage tanks, where an autonomous mobile robot extracts several characteristics from environment to predict tank’s dimensions and its localization based on three distinct perception sources as a result an occupancy grid is also predicted allow in the inspection planning."

"Robot’s perception systems are composed by three main sources. A long-range laser finder (until 70 meters with precision of millimeters) is applied to measure the relative distance between robot and environment, allowing the estimation of tank parameters. A Light Detection And Ranging (LIDAR) is applied to detect any obstacle during navigation. These two perception sources are mounted on a mobile base that can be rotated in roll axis. Finally, a fixed depth camera is used for environment mapping and obstacle detection. The fusion of these sources also promotes a high precision odometry system, as discussed in [14]."


Look for these articles:

A. de Oliveira, L. de Arruda, F. Neves-Jr, R. Espinoza, and J. Nadas, “Adhesion force control and active gravitational compensation for autonomous inspection in lpg storage spheres,” Robotics Symposium and Latin American Robotics Symposium (SBR-LARS), 2012 Brazilian, pp. 232–238, Oct 2012.


R. Veiga, A. S. de Oliveira, L. V. R. Arruda, and F. Neves-Jr, “Localization and navigation of a climbing robot inside a lpg spherical tank based on dual-lidar scanning of weld beads,” in Springer Book on Robot Operating System (ROS): The Complete Reference. New York: Springer, 2015.


R. V. Espinoza, A. S. de Oliveira, L. V. R. de Arruda, and F. Neves- Jr, “Navigation’s stabilization system of a magnetic adherence-based climbing robot,” Journal of Intelligent & Robotic Systems, pp. 1–17, 2014.

# 10. Rigorous Tracking of Weld Beads for the Autonomous Inspection with a Climbing Robot

Published in Conference Location Paper Citations Authors Link
2019 2019 Latin American Robotics Symposium (LARS), 2019 Brazilian Symposium on Robotics (SBR) and 2019 Workshop on Robotics in Education (WRE) 0 Vinicius de Vargas Terres et al. link

# Citation

V. de Vargas Terres et al., "Rigorous Tracking of Weld Beads for the Autonomous Inspection with a Climbing Robot," 2019 Latin American Robotics Symposium (LARS), 2019 Brazilian Symposium on Robotics (SBR) and 2019 Workshop on Robotics in Education (WRE), Rio Grande, Brazil, 2019, pp. 252-257, doi: 10.1109/LARS-SBR-WRE48964.2019.00051.

# Abstract

This article presents a novel strategy for an autonomous inspection robot for oil industry tanks. The goal is to identify and track the external weld beads of a tank by an inspection robot. The chosen robot is an autonomous climbing robot with differential magnetic wheels. The detection of the weld bead is made with a line profile sensor. An algorithm is proposed to identify the weld bead position from the robot. The control is designed with the Fuzzy technique. The approach is evaluated on a real plate and a cylindrical tank virtually designed with the same characteristics that real tank projects.

# Review

This article uses a fuzzy controller to control the trajectory of the robot. It also compares some characteristics of computer vision versus laser sensing. The control was implemented with ROS and SciKit-Fuzzy.

Despite its success in tracking welds, the robot showed limited controller accuracy, the main reason being its restricted maneuverability.

# Interesting quotes from the article:

"The application of mobile robotics in this industry is aimed at creating a product that mitigates the time demanded and, consequently, operational costs."

"The storage tanks require frequent inspections to avoid or minimize the risk of failures and leaks. The inspection is traditionally realized through the Non-Destructive Testing (NDT) techniques, like ultrasound."

"In these three articles, it is possible to observe that the vision-based method is highly influenced by the ambient light condition. Other disadvantages, when compared to laser sensors, are the measurement accuracy and the effort of processing the information received."



"All the approaches chosen in the works analyzed assumed a deep knowledge in the robot’s kinematics. With a Fuzzy approach, a more simple design is possible. Furthermore, the integration of multiple inputs and outputs variables raises the complexity of the systems. In this case, the Fuzzy is an excellent option to model the controller, as shown in [7], [8]."



"Initially, the weld bead limits must be determined. The input data is differentiated with respect to the x-axis. Then, the location of the maximum and minimum values of the derivative function is calculated."

"To determine if these values are the weld bead border, they are compared with a threshold value. If there are not two borders, it means that the robot is on a weld bead joint. In this case, the orientation error and the weld bead joint type is set. If there are two borders in the data, the data measured is linearized."

"A digital low-pass filter is proposed for reducing the noise in the signal received. The filter implemented is a FIR with a Kaiser window."

"Initially, the weld bead limits must be determined. The input data is differentiated with respect to the x-axis. Then, the location of the maximum and minimum values of the derivative function is calculated. To determine if these values are the weld bead border, they are compared with a threshold value. If there are not two borders, it means that the robot is on a weld bead joint. In this case, the orientation error and the weld bead joint type is set. If there are two borders in the data, the data measured is linearized. The reason for this procedure is the reduction of the influence of tank curvature."

"The trajectory controller is designed applying the Fuzzy control technique. It is chosen because of the reduction of the robot’s kinematics influence in the design of the controller. Another reason is its ease of design and application."

"The data filtering, weld bead recognition, and control were implemented in python and ROS (Robot Operating System). ROS is a middleware, and its a collection of tools and libraries to simplify the programming in robotics [16]."

"However, the accuracy of the controller at turns was not at the expected level. The main restriction of the robot is its maneuverability. When compared to other climbing robots, like [8], the AIR- 1 lacks positioning precisely the robot. Its reason is the difference in degrees of freedom. Furthermore, The limitation of the robot for making small adjustments, caused by the magnetic wheels, leads to a steady-state error. Although the satisfactory results, other variables have to be considered to achieve a more realistic system and improved the obtained results, for example, the slip of the robot."

Look for these articles:

[1] L. Zhang, J. Jiao, Q. Ye, Z. Han, and W. Yang, “Robust weld line detection with cross structured light and hidden markov model,” in 2012 IEEE International Conference on Mechatronics and Automation. IEEE, 2012, pp. 1411–1416.

[2] T. Eiammanussakul, J. Taoprayoon, and V. Sangveraphunsiri, “Weld bead tracking control of a magnetic wheel wall climbing robot using a laser-vision system,” in Applied Mechanics and Materials, vol. 619. Trans Tech Publ, 2014, pp. 219–223.

[3] K. J. Kim, H. W. Roh, H. K. Leem, R. S. Leem, G. Changwon, and S. J. Lee, “Application of a robot to grinding welding-beads remained in removal of working pieces for shipbuilding,” WMSCI08, 2008.

This one is a PID controller to guide the mobile robot:

[6] Z. Gui, Y. Deng, Z. Sheng, T. Xiao, Y. Li, F. Zhang, N. Dong, and J. Wu, “Design and experimental verification of an intelligent wallclimbing welding robot system,” Industrial Robot: An International Journal, vol. 41, no. 6, pp. 500–507, 2014.

This one shows better maneuverability and uses Fuzzy logic as well:

[8] H. B. Santos, M. A. S. Teixeira, A. S. de Oliveira, L. V. R. de Arruda, and F. Neves-Jr., “Quasi-Omnidirectional Fuzzy Control of a Climbing Robot for Inspection Tasks,” Journal of Intelligent & Robotic Systems, vol. 91, no. 2, pp. 333–347, Aug 2018.

# 11. Automated Inspection of Pressure Vessels through a Climbing Robot with Sliding Autonomy

Published in Conference Location Paper Citations Authors Link
2019 Latin American Robotics Symposium (LARS), 2019 Brazilian Symposium on Robotics (SBR) and 2019 Workshop on Robotics in Education (WRE) Rio Grande, Brazil, 0 Piatan Sfair Palar et al. link

# Citation

P. Sfair Palar, A. Schneider de Oliveira, V. de Vargas Terres and J. Endress Ramos, "Automated Inspection of Pressure Vessels through a Climbing Robot with Sliding Autonomy," 2019 Latin American Robotics Symposium (LARS), 2019 Brazilian Symposium on Robotics (SBR) and 2019 Workshop on Robotics in Education (WRE), Rio Grande, Brazil, 2019, pp. 287-292, doi: 10.1109/LARS-SBR-WRE48964.2019.00057.

# Abstract

AIR is a climbing robot designed for nondestructive testing and inspection of weld seams in tanks and vessels of the oil and gas industry. In this paper a Fuzzy control system is proposed to a optimal selection of Levels of Autonomy (LoA) at each moment, mixing joystick inputs from an operator and sensor information from the environment to ensure that the movement of tracking weld seams is being achieved while allowing the operator to have full control of the robot's movements without having to control each maneuver manually. Experiments were executed in a simulator environment and are presented along with future ideas for research.

# Review

This article touches on the exact point that the inspector needs to directly control the robot while also analysing the sensor information during maneuvering. Some level of autonomy would therefore help the operator focus on more important tasks, such as the analysis.

It discusses levels of autonomy and makes the interesting point that full autonomy might not be the answer; instead, an assisted autonomy, where the robot helps the operator stay on the line, may be preferable. The robot stays on the weld line while the autonomy level is enabled and can be maneuvered freely when the operator turns it off.

# Interesting quotes from the article:

"In the petrochemical industry, the integrity of structures like pipes and vessels requires strict quality control and proper inspection to prevent hazardous accidents. Monitoring these materials can be a dangerous task because of harsh environments and difficult to access areas. For weld seams inspection, non-destructive testing can be achieved utilizing ultrasonic sensors."

"In oil refineries, the tanks and vessels that hold the fluids can be as large as an 80 meters diameter cylinder, hindering the inspection task for humans [1]. Climbing robots are a valid substitute for this task, presenting a more reliable and safe method of inspection."

"In inspection tasks, usually direct control is needed because the operator has to analyze sensor information while maneuvering the robot. A more intelligent approach is the robot having different Levels of Autonomy (LoA), so the operator can - in moments of stress, high workload or anxiety - assign some tasks to the robot while being concentrated on others"

"The primary motivation of this work is that the change in LoA to be invisible to the operator. While the operator maneuvers the robot, the robot corrects where the operator is doing wrong and also let the operator controls the velocity and the path that the robot will follow."

"With the advancement of technology, more and more communication and cooperation between robots and human operators are necessary, through what is called Human-Robot Interaction (HRI)"

"In the industry, HRI is observed in processes like collaboration for handling movements in assembly, automatic orientation for cranes and riveting. Technologies such as human and object detection in 3D environments and gesture recognition can be utilized to help in accomplishing these goals"

"One definition of autonomy, considering the social feature of an agent, is the notion that an agent is autonomous when it can choose act in a way that contradicts the decisions of other agents"

"An agent can possess several autonomy levels, determined by how much their decisions are affected by other agents. Another definition of autonomy for robots is for how much time they can be ignored. The more a robot can operate without human interruption, more autonomy that the robot has [6]."

"In several applications, changing the autonomy level on the fly is needed. A fully autonomous robot could need some help in some situations, and different tasks can require different autonomy levels. While being fully autonomous dismisses the attention of an operator, unwanted situations can occur and also endanger or threaten humans."

"TABLE I: Autonomy levels in decisions and action selection [12] 10 The computer decides everything and acts autonomously 9 Informs the human only if it, the computer, decides to 8 Informs the human only if asked 7 Executes automatically then necessarily informs the human 6 Allows the human a restricted time to veto before execution 5 Executes a given suggestion if the human approves 4 Suggests one alternative 3 Narrows the selection down to a few 2 The computer offers a set of decision/action alternatives 1 Human must take all the decisions with no assistance"

"Algorithms were developed for the robot to follow the weld seams and are part of the autonomous mode. There is also a laser, shown in Fig. 2, for visual reference only for the operator to better visualize the orientation of the robot."



"The four levels of autonomy organized for selection when operating the robot are summarized in Table II. The first mode is when the operator has full control of robot movements, controlling linear and angular velocities. The only action that the robot can take in this mode is to stop a movement requested by the user that can induce a collision. The second mode is Shared Control, where the robot stays in the weld bead by controlling angular velocities, and the user controls how fast the robot will move forward. The third level of autonomy is Supervisory Control. In this mode, the robot will move at a safe speed staying in the weld bead and waits for the user to choose a direction when an intersection is found. The fourth and final mode is Full Autonomous, where the user defines a set point, and the robot will determine the best route to reach that destination. The setpoint can be global by locating a point in the tank or local, by establishing a distance related to the actual position of the robot. In this work, the Full Autonomous mode setpoint is always defined at the endpoint of the experiment."

"The Fuzzy system takes in information from the back and front sensors, linear and angular velocities input by the operator on the joystick and the position of the weld seam relative to the center of the robot and outputs the most fitting LoA with the available information."

"The membership function for linear and angular velocities input by the operator and the weld position detected by the profile sensor are displayed in Fig. 3. The Fuzzy sets for linear velocities are Negative Low, Negative Medium, Negative High, Zero, Positive Low, Positive Medium, and Positive High. For angular velocities and weld seam position, the sets are both:: Left High, Left Medium, Left Low, Center, Right Low, Right Medium and Right High."



"In this work a Fuzzy system was presented for better controlling a climbing robot capable of inspecting gas and oil tanks and vessels without jeopardizing the operator’s attention while maintaining full control of the robot’s movements."

# Articles - Computer Vision and Machine Learning

# 1. Human vs computer in scene and object recognition

Published in Conference Location Paper Citations Authors Link
2014 IEEE Conference on Computer Vision and Pattern Recognition Columbus, OH, USA 18 Ali Borji et al. link

# Citation

A. Borji and L. Itti, "Human vs. Computer in Scene and Object Recognition," 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, 2014, pp. 113-120, doi: 10.1109/CVPR.2014.22.

# Abstract

Several decades of research in computer and primate vision have resulted in many models (some specialized for one problem, others more general) and invaluable experimental data. Here, to help focus research efforts onto the hardest unsolved problems, and bridge computer and human vision, we define a battery of 5 tests that measure the gap between human and machine performances in several dimensions (generalization across scene categories, generalization from images to edge maps and line drawings, invariance to rotation and scaling, local/global information with jumbled images, and object recognition performance). We measure model accuracy and the correlation between model and human error patterns. Experimenting over 7 datasets, where human data is available, and gauging 14 well-established models, we find that none fully resembles humans in all aspects, and we learn from each test which models and features are more promising in approaching humans in the tested dimension. Across all tests, we find that models based on local edge histograms consistently resemble humans more, while several scene statistics or "gist" models do perform well with both scenes and objects. While computer vision has long been inspired by human vision, we believe systematic efforts, such as this, will help better identify shortcomings of models and find new paths forward.

# Review

Highlights some situations where computer vision has surpassed human vision.

# Interesting quotes from the article:

"Across all tests, we find that models based on local edge histograms consistently resemble humans more, while several scene statistics or “gist” models do perform well with both scenes and objects. While computer vision has long been inspired by human vision, we believe systematic efforts, such as this, will help better identify shortcomings of models and find new paths forward."

"The computer vision community has made rapid advances in several areas recently. In some restricted cases (e.g., where variability is low), computers even outperform humans for tasks such as frontal-view face recognition, fingerprint recognition, change detection, etc."

"Models outperform humans in rapid categorization tasks, indicating that discriminative information is in place but humans do not have enough time to extract it [19]. Models outperform humans on jumbled images and score relatively high in absence of (less) global information. Explicit addition of opportunistic local discriminative features, that humans often use, may enhance accuracy of models."

"We find that some models and edge detection methods are more efficient on line drawings and edge maps. Our analysis helps objectively assess the power of edge detection algorithms to extract meaningful structural features for classification, which hints toward two new directions."

"While models are far from human performance over object and scene recognition on natural scenes, even classic models show high performance and correlation with humans on sketches."

"Consistent with the literature, we find that some models (e.g., HOG, SSIM, geo/texton, and GIST) perform well. We find that they also resemble humans better"

# Conclusion

"We learn that: 1) Models outperform humans in rapid categorization tasks, indicating that discriminative information is in place but humans do not have enough time to extract it [19]. Models outperform humans on jumbled images and score relatively high in absence of (less) global information. Explicit addition of opportunistic local discriminative features, that humans often use, may enhance accuracy of models. 2) We find that some models and edge detection methods are more efficient on line drawings and edge maps. Our analysis helps objectively assess the power of edge detection algorithms to extract meaningful structural features for classification, which hints toward two new directions. First, it provides another objective metric (in addition to conventional F-measure) for evaluating edge detection methods (i.e., an edge detection method serving better classification accuracy is favored). Second, it will help study which structural components of scenes are more important. For example, the fact that long contours are more informative [25] can be used to build better feature detec- tors. 3) While models are far from human performance over object and scene recognition on natural scenes, even classic models show high performance and correlation with humans on sketches. The simplicity of sketches is a great opportunity to transcend models and discover mechanisms of biological object recognition. Another direction in this regard is to augment color, line, and spatial information for building better gist models (e.g., similar to geo map). 4) Consistent with the literature, we find that some models (e.g., HOG, SSIM, geo/texton, and GIST) perform well. We find that they also resemble humans better. GIST, a model of scene recognition works better than many models over both Caltech-256 and Sketch datasets. HMAX has the 2nd best correlation on sketches and achieves a high accuracy. 5) Invariance analysis shows that only sparseSIFT and geo color are invariant to in-plane rotation with the former having higher accuracy (our 3rd test). On test 4, LBP has the highest d 0 and is the most similar model to humans over original images but it fails on rotated images. We argued that both accuracy and confusion matrices are important in evaluating models. On the one hand, high performing models may not show good correlation with humans which warrants further inspection. One could propose other alternatives to highly correlated CMs, e.g., looking at which exemplars are difficult to classify (instead of looking at misses at the category level). On the other hand, highly correlated CMs could occur even when the absolute performance (e.g., classification accuracy) is quite different. Contrasting humans and machines, although helpful, has its own challenges for two reasons. First, there exist many models (some we did not consider here e.g., new deep learning methods [35]), with several parameters (e.g., normalization, pooling sizes, kernels), sometimes yielding to quite different scores. Second, similarly human studies have been designed for specific purposes and hypotheses (with different settings) and it is not trivial to directly use them for model evaluation. This calls for extensive collaboration among experimental and computational vision researchers."

# 2. Machine-Vision-Based Human-Oriented Mobile Robots - A Review

Published in Conference Location Paper Citations Authors Link
2017 Strojniški vestnik - Journal of Mechanical Engineering 9 Finzgar, M., et al. link

# Citation

Finžgar, M., & Podržaj, P. (2017). Machine-Vision-Based Human-Oriented Mobile Robots: A Review. Strojniški vestnik - Journal of Mechanical Engineering, 63, 331-348. doi:10.5545/sv-jme.2017.4324

# Abstract

In this paper we present a study of vision-based, human-recognition solutions in human-oriented, mobile-robot applications. Human recognition is composed of detection, tracking and identification. Here, we provide an analysis of each step. The applied vision systems can be conventional 2D, stereo or omnidirectional. The camera sensor can be designed to detect light in the visible or infrared parts of the electromagnetic spectrum. Regardless of the method or the type of sensor chosen, the best results in human recognition can be obtained by using a multimodal solution. In this case, the vision system is enhanced with other forms of sensory information. The most common sensors are laser range finders, microphones and sonars. As medicine is expected to be one of the main fields of application for mobile robots, we give it special emphasis. An overview of current applications and proposal of potential future applications are given. Without doubt, properly controlled mobile robots will play an ever-increasing role in the future of medicine.

# Review

Human-oriented mobile robots; the article shows many applications of interaction between robots and humans. It is a little out of context for industrial applications, but it may be a field worth mentioning if the narrative allows.

A very extensive review.

# Interesting quotes from the article:

"Human-oriented mobile robots are becoming increasingly important since the need for health-based assistance is increasing for the growing number of elderly and/or chronically ill people. Mobile robots could offer assistance and reliable health monitoring and therefore improve people’s quality of life."

"More recent mobilerobot applications are dealing with the assistance of elderly people [5], support for autism diagnosis and intervention [6] and assessments of people’s physiological state [7]."

"One of the most important is human-robot interaction (HRI). In order to make it as natural as possible and to ensure reliable execution of the mobile robots’ tasks these systems should be autonomous, robust, fast, non-contact and, most importantly, safe. These characteristics are needed for unbiased, realtime measurements in different situations (occlusions, varying illumination, etc.). Moreover, it is necessary for the mobile robots to provide only the tasks for which they were built and not to keep people under surveillance and/or disturb their privacy [8]."

"The crucial characteristic of mobile robots needing to work in a human environment is the ability to recognise people. This is important for safety reasons, the successful performance of the mobile robots’ tasks and a natural HRI. Human recognition consists of 3 basic steps [12]: detection, tracking (localisation) and identification."

"Vision-only-based mobile-robot systems are composed of colour vision, thermal vision or a combination of the two. In colour-vision systems, the following approaches have been implemented: conventional 2D vision, stereo vision and omnidirectional vision."

# Conclusion

"The reviewed literature reveals that there is no universal solution for a human-oriented mobile robot. Different hardware and software solutions have their pros and cons in different environmental settings and situations, which makes us believe that mobile robots with multiple sensor modalities will be the most studied in the near future. An interesting solution for indoor environments, such as households or clinics, is iSpace, which could offer the implementation of mobile robots with different tasks in the same environment. From the perspective of human recognition, the human-identification step seems to be the most challenging. We believe that a lot of effort will be put into it, since correct identification is crucial in healthcare, where misidentification could result in the wrong treatment, having potentially fatal consequences. Because vision systems are already being widely used in medicine, the use of thermal vision and colour vision in mobile robots for diagnostic/screening purposes is promising. In the future, to implement mobile robots in as many healthcare applications as possible, the focus will need to be put on HRI, human identification, robustness of the mobile robots’ task performance and quality, together with security of the measured data."

# 3. Computer Vision Startups Tackle AI

Published in Conference Location Paper Citations Authors Link
IEEE MultiMedia ( Volume: 23 , Issue: 4 , Oct.-Dec. 2016 ) IEEE 1 Alex Jaimes link

# Citation

A. Jaimes, "Computer Vision Startups Tackle AI," in IEEE MultiMedia, vol. 23, no. 4, pp. 94-96, Oct.-Dec. 2016, doi: 10.1109/MMUL.2016.62.

# Abstract

With the visual revolution upon us, significant opportunities exist for new applications requiring computer vision and multimedia technologies. Learn how the revolution began, why the computer vision field is ripe for innovation, and what helps determine whether a startup will succeed.

# Review

Good insights into startups and low-cost applications of computer vision using low-cost hardware.

# Interesting quotes from the article:

"One could easily argue that the visual revolution first started back in 2004 with the introduction of Flickr, which changed the consumption of photography for the masses."

"According to Mary Meeker’s annual Internet Trends report, in 2014, people uploaded an average of 1.8 billion digital images every single day.1"

"There’s no question that all of the buzz around artificial intelligence (AI) has its roots in progress made with images. The research community has been obsessed with the challenge of automatic “image understanding,” almost as a basic pillar of reaching the AI dream. And there has been some good progress, particularly with the application of deep-learning techniques."

"The confluence of the factors I described earlier— lower hardware costs, better cameras, a shift of photography from hardware to software, the implicit monetary value of images, and the visibility of better performance of computer vision (outside of academic circles)—are creating a unique moment in the fields of computer vision and multimedia. Finally, after many years, commercial interest in visual content and algorithmic progress aremerging."

"On the other hand, younger startups are trying to tap into the business of recognizing objects, events, and scenes in images."

"The visual revolution, however, is not limited to the Internet. Lower costs of hardware and the availability of low-cost and open source hardware platforms, such as Arduino and Raspberry Pi, have also created significant opportunities by allowing the production of low-cost hardware devices."

"This has facilitated new applications and the emergence of startups, ranging from those that aim to build devices for the home to those that offer low-cost security systems and even robots."

"At the same time, some industries are on a path to becoming completely revolutionized, with computer vision at the core. This includes transportation, for example, not only in self-driving cars but also in the use of technologies tomonitor driver fatigue."

"Computer vision has been around for a long time, and it’s been used in industrial applications for many years, but we’re definitely reaching a new stage, both in terms of its intersection with big data and its impact in the consumer space."

"It’s never been more visible to consumers, and that will translate to a number of failures, but also big successes."

# Conclusion

"At the end of the day, in general, whether startups in the computer vision and multimedia fields succeed will depend highly on execution, and a large part of that implies taking a human-centered approach to research and innovation. This, in turn, will require taking an interdisciplinary approach involving big data analysis, interaction, and human issues. In particular, the use of context will be critical, and although many of the applications I’ve discussed here focus mainly on using the visual components (the images or videos), the big opportunity for multimedia still lies in leveraging the integration of multiple types of data to better exploit context when developing computer vision technology and applications."

# 4. A Computational Approach to Edge Detection

Published in Conference Location Paper Citations Authors Link
IEEE Transactions on Pattern Analysis and Machine Intelligence ( Volume: PAMI-8 , Issue: 6 , Nov. 1986 ) IEEE 14669 John Canny link

# Citation

J. Canny, "A Computational Approach to Edge Detection," in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-8, no. 6, pp. 679-698, Nov. 1986, doi: 10.1109/TPAMI.1986.4767851.

# Abstract

This paper describes a computational approach to edge detection. The success of the approach depends on the definition of a comprehensive set of goals for the computation of edge points. These goals must be precise enough to delimit the desired behavior of the detector while making minimal assumptions about the form of the solution. We define detection and localization criteria for a class of edges, and present mathematical forms for these criteria as functionals on the operator impulse response. A third criterion is then added to ensure that the detector has only one response to a single edge. We use the criteria in numerical optimization to derive detectors for several common image features, including step edges. On specializing the analysis to step edges, we find that there is a natural uncertainty principle between detection and localization performance, which are the two main goals. With this principle we derive a single operator shape which is optimal at any scale. The optimal detector has a simple approximate implementation in which edges are marked at maxima in gradient magnitude of a Gaussian-smoothed image. We extend this simple detector using operators of several widths to cope with different signal-to-noise ratios in the image. We present a general method, called feature synthesis, for the fine-to-coarse integration of information from operators at different scales. Finally we show that step edge detector performance improves considerably as the operator point spread function is extended along the edge.

# Review

Edge detection and its benefits for the industry.

# Interesting quotes from the article:

"EDGE detectors of some kind, particularly step edge detectors, have been an essential part of many computer vision systems. The edge detection process serves to simplify the analysis of images by drastically reducing the amount of data to be processed, while at the same time preserving useful structural information about object boundaries."

"There is certainly a great deal of diversity in the applications of edge detection, but it is felt that many applications share a common set of requirements. These requirements yield an abstract edge detection problem, the solution of which can be applied in any of the original problem domains."

"In all of these examples there are common criteria relevant to edge detector performance. The first and most obvious is low error rate. It is important that edges that occur in the image should not be missed and that there be no spurious responses. In all the above cases, system performance will be hampered by edge detector errors."

"The second criterion is that the edge points be well localized. That is, the distance between the points marked by the detector and the "center" of the true edge should be minimized. This is particularly true of stereo and shape from motion, where small disparities are measured between left and right images"

"In this paper we will develop a mathematical form for these two criteria which can be used to design detectors for arbitrary edges. We will also discover that the first two criteria are not "tight" enough, and that it is necessary to add a third criterion to circumvent the possibility of multiple responses to a single edge."

# 5. Deep Learning With TensorFlow: A Review

Published in Conference Location Paper Citations Authors Link
September 10, 2019 Journal of Educational and Behavioral Statistics 2 Pang, B., Nijkamp, E. and Wu, Y. N. (2020) link

# Citation

Pang, B., Nijkamp, E. and Wu, Y. N. (2020) ‘Deep Learning With TensorFlow: A Review’, Journal of Educational and Behavioral Statistics, 45(2), pp. 227–248. doi: 10.3102/1076998619872761.

# Abstract

This review covers the core concepts and design decisions of TensorFlow. TensorFlow, originally created by researchers at Google, is the most popular one among the plethora of deep learning libraries. In the field of deep learning, neural networks have achieved tremendous success and gained wide popularity in various areas. This family of models also has tremendous potential to promote data analysis and modeling for various problems in educational and behavioral sciences given its flexibility and scalability. We give the reader an overview of the basics of neural network models such as the multilayer perceptron, the convolutional neural network, and stochastic gradient descent, the most commonly used optimization method for neural network models. However, the implementation of these models and optimization algorithms is timeconsuming and error-prone. Fortunately, TensorFlow greatly eases and accelerates the research and application of neural network models. We review several core concepts of TensorFlow such as graph construction functions, graph execution tools, and TensorFlow’s visualization tool, TensorBoard. Then, we apply these concepts to build and train a convolutional neural network model to classify handwritten digits. This review is concluded by a comparison of low- and high-level application programming interfaces and a discussion of graphical processing unit support, distributed training, and probabilistic modeling with TensorFlow Probability library.

# Review

A good review of TensorFlow; it discusses machine learning together with computer vision and covers TensorBoard and GPU acceleration with CUDA.

# Interesting quotes from the article:

"In the field of machine learning, neural networks have shown remarkable success in a wide range of areas such as computer vision (Krizhevsky, Sutskever, & Hinton, 2012), natural language processing (Collobert & Weston, 2008), and bioinformatics (Min, Lee, & Yoon, 2017)."

"TensorFlow is a flexible and scalable software library for numerical computations using dataflow graphs. This library and related tools enable users to efficiently program and train neural network and other machine learning models and deploy them to production. Core algorithms of TensorFlow is written in highly optimized Cþþ and CUDA (Compute Unified Device Architecture), a parallel computing platform and API created by NVIDIA."

"Neural network models are often large, convoluted structures and may lead to confusion. TensorBoard is a visualization tool that makes it easier to understand and debug TensorFlow programs. Users can use TensorBoard to easily visualize the computation graph of a model, training metrics, and parameter values."

"TensorFlow is a popular and flexible machine learning library. It has both low-level APIs and high-level APIs (Keras and Estimators) and supports multiple languages interfaces besides Python. Users can easily visualize model and training metrics with TensorBoard, distribute training with tf.distribute. Strategy, and combine probabilistic methods with deep neural networks using TFP library."

"Neural network models and some other machine learning models heavily involve matrix multiplication that are simple computations and highly parallelizable. The architecture of GPU is ideal for this type of computation. GPU can be several 100 times (or even more, depending on the particular hardware) faster than CPU on neural network model training. TensorFlow code is optimized to run on GPU by utilizing CUDA and cuDNN (CUDA Deep Neural Network library), a deep neural network library based on CUDA."

"A CNN consists of multiple layers of convolutional and fully connected layers, with max pooling and average pooling between the layers."

"Max pooling is performed after each convolutional layer, so that the feature maps at the higher layers are smaller than those at the lower layer."

# 6. Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning

Published in Conference Location Paper Citations Authors Link
IEEE Transactions on Medical Imaging ( Volume: 35 , Issue: 5 , May 2016 ) IEEE 1436 H. Shin et al. link

# Citation

H. Shin et al., "Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning," in IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1285-1298, May 2016, doi: 10.1109/TMI.2016.2528162.

# Abstract

Remarkable progress has been made in image recognition, primarily due to the availability of large-scale annotated datasets and deep convolutional neural networks (CNNs). CNNs enable learning data-driven, highly representative, hierarchical image features from sufficient training data. However, obtaining datasets as comprehensively annotated as ImageNet in the medical imaging domain remains a challenge. There are currently three major techniques that successfully employ CNNs to medical image classification: training the CNN from scratch, using off-the-shelf pre-trained CNN features, and conducting unsupervised CNN pre-training with supervised fine-tuning. Another effective method is transfer learning, i.e., fine-tuning CNN models pre-trained from natural image dataset to medical image tasks. In this paper, we exploit three important, but previously understudied factors of employing deep convolutional neural networks to computer-aided detection problems. We first explore and evaluate different CNN architectures. The studied models contain 5 thousand to 160 million parameters, and vary in numbers of layers. We then evaluate the influence of dataset scale and spatial image context on performance. Finally, we examine when and why transfer learning from pre-trained ImageNet (via fine-tuning) can be useful. We study two specific computer-aided detection (CADe) problems, namely thoraco-abdominal lymph node (LN) detection and interstitial lung disease (ILD) classification. We achieve the state-of-the-art performance on the mediastinal LN detection, and report the first five-fold cross-validation classification results on predicting axial CT slices with ILD categories. Our extensive empirical evaluation, CNN model analysis and valuable insights can be extended to the design of high performance CAD systems for other medical imaging tasks.

# Review

An application of CNNs in medical imaging (computer-aided detection). Useful here for its comparison of CNN architectures, its analysis of dataset characteristics, and its evaluation of transfer learning from ImageNet pre-trained models.

# Interesting quotes from the article:

"Unlike previous image datasets used in computer vision, ImageNet [1] offers a very comprehensive database of more than 1.2 million categorized natural images of 1000+ classes. The CNN models trained upon this database serve as the backbone for significantly improving many object detection and image segmentation problems using other datasets [6], [7], e.g., PASCAL [8] and medical image categorization [9]–[12]. However, there exists no large-scale annotated medical image dataset comparable to ImageNet, as data acquisition is difficult, and quality annotation is costly."

"There are currently three major techniques that successfully employ CNNs to medical image classification: 1) training the “CNN from scratch” [13]–[17]; 2) using “off-the-shelf CNN” features (without retraining the CNN) as complementary information channels to existing hand-crafted image features, for chest X-rays [10] and CT lung nodule identification [9], [12]; and 3) performing unsupervised pre-training on natural or medical images and fine-tuning on medical target images using CNN or other types of deep learning models [18]–[21]. A decompositional 2.5D view resampling and an aggregation of random view classification scores are used to eliminate the “curse-of-dimensionality” issue in [22], in order to acquire a sufficient number of training image samples."

"Recently, ImageNet pre-trained CNNs have been used for chest pathology identification and detection in X-ray and CT modalities They have yielded the best performance results by integrating low-level image features"

"We employ CNNs (with the characteristics defined above) to thoraco-abdominal lymph node (LN) detection (evaluated separately on the mediastinal and abdominal regions) and interstitial lung disease (ILD) detection."

A. Thoracoabdominal Lymph Node Datasets

"Transforming axial, coronal, and sagittal representations to RGB also facilitates transfer learning from CNN models trained on ImageNet."

"In summary, the assumption that there are or must be pixel-wise spatial correlations among input channels does not apply to the CNN model representation. For other medical imaging problems, such as pulmonary embolism detection [29], in which orientation can be constrained along the attached vessel axis, vessel-aligned multi-planar image representation (MPR) is more effective than randomly aligned MPR."

B. Interstitial Lung Disease Dataset

"We utilize the publicly available dataset of [37]. It contains 905 image slices from 120 patients, with six lung tissue types annotations containing at least one of the following: healthy (NM), emphysema (EM), ground glass (GG), fibrosis (FB), micronodules (MN) and consolidation (CD)"

"Due to the high precision of CNN based image processing, highly accurate lung segmentation is not necessary. The localization of ILD regions within the lung is simultaneously learned through selectively weighted CNN reception fields in the deepest convolutional layers during the classification based CNN training [49], [50]. Some areas outside of the lung appear in both healthy or diseased images. CNN training learns to ignore them by setting very small filter weights around the corresponding regions (Fig. 13). This observation is validated by [40]."


"We mainly explore three convolutional neural network architectures (CifarNet [5], [22], AlexNet [4] and GoogLeNet [33]) with different model training parameter values."



"CifarNet: CifarNet, introduced in [5], was the state-ofthe- art model for object recognition on the Cifar10 dataset, which consists of 32 32 images of 10 object classes. The objects are normally centered in the images. Some example images and class categories from the Cifar10 dataset are shown in Fig. 7. CifarNet has three convolution layers, three pooling layers, and one fully-connected layer. This CNN architecture, also used in [22] has about 0.15 million free parameters. We adopt it as a baseline model for the LN detection."

"AlexNet: The AlexNet architecture was published in [4], achieved significantly improved performance over the other non-deep learning methods for ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012. This success has revived the interest in CNNs [3] in computer vision. ImageNet consists of 1.2 million 256 256 images belonging to 1000 categories. At times, the objects in the image are small and obscure, and thus pose more challenges for learning a successful classification model. More details about the ImageNet dataset will be discussed in Section III-B. AlexNet has five convolution layers, three pooling layers, and two fully-connected layers with approximately 60 million free parameters. AlexNet is our default CNN architecture for evaluation and analysis in the remainder of the paper."

"GoogLeNet: The GoogLeNet model proposed in [33], is significantly more complex and deep than all previous CNN architectures. More importantly, it also introduces a new module called “Inception”, which concatenates filters of different sizes and dimensions into a single new filter (refer to Fig. 6). Overall, GoogLeNet has two convolution layers, two pooling layers, and nine “Inception” layers. Each “Inception” layer consists of six convolution layers and one pooling layer. An illustration of an “Inception” layer from GoogLeNet is shown in Fig. 6. GoogLeNet is the current state-of-the-art CNN architecture for the ILSVRC challenge, where it achieved 5.5% top-5 classification error on the ImageNet challenge, compared to AlexNet's 15.3% top-5 classification error."

# Conclusion

"In this paper, we exploit and extensively evaluate three important, previously under-studied factors on deep convolutional neural networks (CNN) architecture, dataset characteristics, and transfer learning. We evaluate CNN performance on two different computer-aided diagnosis applications: thoraco-abdominal lymph node detection and interstitial lung disease classification. The empirical evaluation, CNN model visualization, CNN performance analysis, and conclusive insights can be generalized to the design of high performance CAD systems for other medical imaging tasks."

# 7. Going Deeper with Convolutions

Published in Conference Location Paper Citations Authors Link
2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Boston, MA, USA 9114 Christian Szegedy et al. link

# Citation

C. Szegedy et al., "Going deeper with convolutions," 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, 2015, pp. 1-9, doi: 10.1109/CVPR.2015.7298594.

# Abstract

We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.

# Review

Describes the state of the art of convolutional neural networks and the Inception architecture, whose GoogLeNet incarnation won the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).

GoogLeNet was introduced here. More importantly, the paper introduces a new module called “Inception”, which concatenates filters of different sizes and dimensions into a single new filter (see the sketch below).
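
A simplified Keras sketch of such an Inception-style block follows; the filter counts are illustrative, and the 1×1 dimension-reduction convolutions of the real GoogLeNet module are omitted for brevity.

```python
# Simplified Inception-style block: parallel 1x1, 3x3 and 5x5 convolutions
# plus a pooling branch, concatenated along the channel axis.
import tensorflow as tf
from tensorflow.keras import layers

def inception_block(x, f1=64, f3=128, f5=32, fpool=32):
    b1 = layers.Conv2D(f1, 1, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(f3, 3, padding="same", activation="relu")(x)
    b5 = layers.Conv2D(f5, 5, padding="same", activation="relu")(x)
    bp = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    bp = layers.Conv2D(fpool, 1, padding="same", activation="relu")(bp)
    return layers.Concatenate()([b1, b3, b5, bp])  # concatenate filters of different sizes

inputs = tf.keras.Input(shape=(224, 224, 3))
outputs = inception_block(inputs)
tf.keras.Model(inputs, outputs).summary()
```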

# Interesting quotes from the article:

"One encouraging news is that most of this progress is not just the result of more powerful hardware, larger datasets and bigger models, but mainly a consequence of new ideas, algorithms and improved network architectures."

"For most of the experiments, the models were designed to keep a computational budget of 1.5 billion multiply-adds at inference time, so that the they do not end up to be a purely academic curiosity, but could be put to real world use, even on large datasets, at a reasonable cost."

"In our case, the word “deep” is used in two different meanings: first of all, in the sense that we introduce a new level of organization in the form of the “Inception module” and also in the more direct sense of increased network depth."

"we increased the depth and width of the network while keeping the compu- tational budget constant."

"One particular in-carnation used in our submission for ILSVRC14 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection."

"In the last three years, our object classification and detection capabilities have dramatically improved due to advances in deep learning and convolutional networks [10]. One encouraging news is that most of this progress is not just the result of more powerful hardware, larger datasets and bigger models, but mainly a consequence of new ideas, algorithms and improved network architectures."

"Another notable factor is that with the ongoing traction of mobile and embedded computing, the efficiency of our algorithms – especially their power and memory use – gains importance."

"Starting with LeNet-5 [10], convolutional neural networks (CNN) have typically had a standard structure – stacked convolutional layers (optionally followed by con- trast normalization and max-pooling) are followed by one or more fully-connected layers. Variants of this basic design are prevalent in the image classification literature and have yielded the best results to-date on MNIST, CIFAR and most notably on the ImageNet classification challenge [9, 21]. For larger datasets such as Imagenet, the recent trend has been to increase the number of layers [12] and layer size [21, 14], while using dropout [7] to address the problem of overfitting."

"Despite concerns that max-pooling layers result in loss of accurate spatial information, the same convolutional network architecture as [9] has also been successfully employed for localization [9, 14], object detection [6, 14, 18, 5] and human pose estimation [19]."

"Finally, the current state of the art for object detection is the Regions with Convolutional Neural Networks (R-CNN) method by Girshick et al. [6]. R-CNN decomposes the overall detection problem into two subproblems: utilizing lowlevel cues such as color and texture in order to generate object location proposals in a category-agnostic fashion and using CNN classifiers to identify object categories at those locations. Such a two stage approach leverages the accuracy of bounding box segmentation with low-level cues, as well as the highly powerful classification power of state-ofthe- art CNNs."

Check this article later:

[6] R. B. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Computer Vision and Pattern Recognition, 2014. CVPR 2014. IEEE Conference on, 2014.

"Current state-of-the-art architectures for computer vision have uniform structure. The large number of filters and greater batch size allows for the efficient use of dense computation."

"The main idea of the Inception architecture is to consider how an optimal local sparse structure of a convolutional vision network can be approximated and covered by readily available dense components. Note that assuming translation invariance means that our network will be built from convolutional building blocks. All we need is to find the optimal local construction and to repeat it spatially."

"By the“GoogLeNet” name we refer to the particular incarnation of the Inception architecture used in our submission for the ILSVRC 2014 competition."



"Our results yield a solid evidence that approximating the expected optimal sparse structure by readily available dense building blocks is a viable method for improving neural networks for computer vision. The main advantage of this method is a significant quality gain at a modest increase of computational requirements compared to shallower and narrower architectures."

# Conclusions

"Our results yield a solid evidence that approximating the expected optimal sparse structure by readily available dense building blocks is a viable method for improving neural networks for computer vision. The main advantage of this method is a significant quality gain at a modest increase of computational requirements compared to shallower and narrower architectures. Our object detection work was competitive despite not utilizing context nor performing bounding box regression, suggesting yet further evidence of the strengths of the Inception architecture. For both classification and detection, it is expected that similar quality of result can be achieved by much more expensive non-Inception-type networks of similar depth and width. Still, our approach yields solid evidence that moving to sparser architectures is feasible and useful idea in general. This suggest future work towards creating sparser and more refined structures in automated ways on the basis of [2], as well as on applying the insights of the Inception architecture to other domains."

# Classification of CNN's architectures



# 8. Deep Residual Learning for Image Recognition

Published in Conference Location Paper Citations Authors Link
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Las Vegas, NV, USA 19366 Kaiming He et al. link

# Citation

K. He, X. Zhang, S. Ren and J. Sun, "Deep Residual Learning for Image Recognition," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, 2016, pp. 770-778, doi: 10.1109/CVPR.2016.90.

# Abstract

Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions1, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

# Review

The article presents the breakthroughs in image classification achieved with deep residual nets, which also yielded a 28% relative improvement on COCO's standard object detection metric.

# Interesting quotes from the article:

"Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset."

"Deep convolutional neural networks [22, 21] have led to a series of breakthroughs for image classification [21, 49, 39]."

"Recent evidence [40, 43] reveals that network depth is of crucial importance, and the leading results [40, 43, 12, 16] on the challenging ImageNet dataset [35] all exploit “very deep” [40] models, with a depth of sixteen [40] to thirty Many other nontrivial visual recognition tasks [7, 11, 6, 32, 27] have also greatly benefited from very deep models."

"When deeper networks are able to start converging, a degradation problem has been exposed: with the network depth increasing, accuracy gets saturated (which might be unsurprising) and then degrades rapidly. Unexpectedly, such degradation is not caused by overfitting, and adding more layers to a suitably deep model leads to higher training error, as reported in [10, 41] and thoroughly verified by our experiments."

"In this paper, we address the degradation problem by introducing a deep residual learning framework. Instead of hoping each few stacked layers directly fit a desired underlying mapping, we explicitly let these layers fit a residual mapping."

"To the extreme, if an identity mapping were optimal, it would be easier to push the residual to zero than to fit an identity mapping by a stack of nonlinear layers."

"The formulation of F(x)+x can be realized by feedforward neural networks with “shortcut connections” (Fig. 2)."

"Shortcut connections [2, 33, 48] are those skipping one or more layers. In our case, the shortcut connections simply perform identity mapping, and their outputs are added to the outputs of the stacked layers (Fig. 2)."



"Identity shortcut connections add neither extra parameter nor computational complexity. The entire network can still be trained end-to-end by SGD with backpropagation, and can be easily implemented using common libraries (e.g., Caffe [19]) without modifying the solvers."

"We show by experiments (Fig. 7) that the learned residual functions in general have small responses, suggesting that identity mappings provide reasonable preconditioning."

"Most remarkably, on the challenging COCO dataset we obtain a 6.0% increase in COCO’s standard metric (mAP@[.5, .95]), which is a 28% relative improvement. This gain is solely due to the learned representations. Based on deep residual nets, we won the 1st places in several tracks in ILSVRC & COCO 2015 competitions: ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation. The details are in the appendix."



# Conclusions

"Based on deep residual nets, we won the 1st places in several tracks in ILSVRC & COCO 2015 competitions: ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation. The details are in the appendix."

# 9. R-CNN - Rich feature hierarchies for accurate object detection and semantic segmentation

Published in Conference Location Paper Citations Authors Link
2014 IEEE Conference on Computer Vision and Pattern Recognition Columbus, OH, USA 6186 Girshick, R. link

# Citation

R. Girshick, J. Donahue, T. Darrell and J. Malik, "Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation," 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, 2014, pp. 580-587, doi: 10.1109/CVPR.2014.81.

# Abstract

Object detection performance, as measured on the canonical PASCAL VOC dataset, has plateaued in the last few years. The best-performing methods are complex ensemble systems that typically combine multiple low-level image features with high-level context. In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 -- achieving a mAP of 53.3%. Our approach combines two key insights: (1) one can apply high-capacity convolutional neural networks (CNNs) to bottom-up region proposals in order to localize and segment objects and (2) when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost. Since we combine region proposals with CNNs, we call our method R-CNN: Regions with CNN features. We also present experiments that provide insight into what the network learns, revealing a rich hierarchy of image features. Source code for the complete system is available at http://www.cs.berkeley.edu.ezproxy.massey.ac.nz/~rbg/rcnn.

# Review

R-CNN improved object detection mean average precision (mAP) by more than 30% relative to the previous best result on PASCAL VOC 2012. It applies high-capacity convolutional neural networks (CNNs) to bottom-up region proposals in order to localize and segment objects; basically, region proposals are combined with CNNs. According to the Faster R-CNN article (entry 11 below), R-CNN mainly plays the role of a classifier and does not predict object bounds, except for refining them by bounding-box regression.

# Interesting quotes from the article:

"Our approach combines two key insights: (1) one can apply high-capacity convolutional neural networks (CNNs) to bottom-up region proposals in order to localize and segment objects and (2) when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost. Since we combine region proposals with CNNs, we call our method R-CNN: Regions with CNN features."

"it is generally acknowledged that progress has been slow during 2010-2012, with small gains obtained by building ensemble systems and employing minor variants of successful methods."

"But we also know that recognition occurs several stages downstream, which suggests that there might be hierarchical, multi-stage processes for computing features that are even more informative for visual recognition."

"Fukushima’s “neocognitron” [17], a biologicallyinspired hierarchical and shift-invariant model for pattern recognition, was an early attempt at just such a process. The neocognitron, however, lacked a supervised training algorithm"

"Building on Rumelhart et al. [30], LeCun et al. [24] showed that stochastic gradient descent via backpropagation was effective for training convolutional neural networks (CNNs), a class of models that extend the neocognitron."

"This paper is the first to show that a CNN can lead to dramatically higher object detection performance on PASCAL VOC as compared to systems based on simpler HOG-like features."

"To achieve this result, we focused on two problems: localizing objects with a deep network and training a high-capacity model with only a small quantity of annotated detection data."

"Unlike image classification, detection requires localizing (likely many) objects within an image. One approach frames localization as a regression problem. However, work from Szegedy et al. [33], concurrent with our own, indicates that this strategy may not fare well in practice (they report a mAP of 30.5% on VOC 2007 compared to the 58.5% achieved by our method).An alternative is to build a sliding-window detector. CNNs have been used in this way for at least two decades, typically on constrained object categories, such as faces [29, 35] and pedestrians [31]. In order to maintain high spatial resolution, these CNNs typically only have two convolutional and pooling layers."

"We also considered adopting a sliding-window approach. However, units high up in our network, which has five convolutional layers, have very large receptive fields (195 × 195 pixels) and strides (32×32 pixels) in the input image, which makes precise localization within the sliding-window paradigm an open technical challenge."

"Instead, we solve the CNN localization problem by operating within the “recognition using regions” paradigm [19], which has been successful for both object detection [34] and semantic segmentation [5]."

"At test time, our method generates around 2000 category-independent region proposals for the input image, extracts a fixed-length feature vector from each proposal using a CNN, and then classifies each region with category-specific linear SVMs."

"We use a simple technique (affine image warping) to compute a fixed-size CNN input from each region proposal, regardless of the region’s shape. Figure 1 presents an overview of our method and highlights some of our results. Since our system combines region proposals with CNNs, we dub the method R-CNN: Regions with CNN features."



"A second challenge faced in detection is that labeled data is scarce and the amount currently available is insufficient for training a large CNN. The conventional solution to this problem is to use unsupervised pre-training, followed by supervised fine-tuning (e.g., [31])."

"The second principle contribution of this paper is to show that supervised pre-training on a large auxiliary dataset (ILSVRC), followed by domainspecific fine-tuning on a small dataset (PASCAL), is an effective paradigm for learning high-capacity CNNs when data is scarce."

"Region proposals. A variety of recent papers offer methods for generating category-independent region proposals. Examples include: objectness [1], selective search [34], category-independent object proposals [12], constrained parametric min-cuts (CPMC) [5], multi-scale combinatorial grouping [3], and Cires¸an et al. [6], who detect mitotic cells by applying a CNN to regularly-spaced square crops, which are a special case of region proposals. While R-CNN is agnostic to the particular region proposal method, we use selective search to enable a controlled comparison with prior detection work (e.g., [34, 36])."

"Feature extraction. We extract a 4096-dimensional feature vector from each region proposal using the Caffe [22] implementation of the CNN described by Krizhevsky et al. [23]. Features are computed by forward propagating a mean-subtracted 227 × 227 RGB image through five convolutional layers and two fully connected layers. We refer readers to [22, 23] for more network architecture details."

"Bounding box regression Based on the error analysis, we implemented a simple method to reduce localization errors. Inspired by the bounding box regression employed in DPM [15], we train a linear regression model to predict a new detection window given the pool5 features for a selective search region proposal. Full details are given in the supplementary material. Results in Table 1, Table 2, and Figure 4 show that this simple approach fixes a large number of mislocalized detections, boosting mAP by 3 to 4 points."

# Conclusion

In recent years, object detection performance had stagnated. The best performing systems were complex ensembles combining multiple low-level image features with high-level context from object detectors and scene classifiers. This paper presents a simple and scalable object detection algorithm that gives a 30% relative improvement over the best previous results on PASCAL VOC 2012.

We achieved this performance through two insights. The first is to apply high-capacity convolutional neural networks to bottom-up region proposals in order to localize and segment objects. The second is a paradigm for training large CNNs when labeled training data is scarce. We show that it is highly effective to pre-train the network, with supervision, for an auxiliary task with abundant data (image classification) and then to fine-tune the network for the target task where data is scarce (detection). We conjecture that the “supervised pre-training/domain-specific fine-tuning” paradigm will be highly effective for a variety of data-scarce vision problems.

We conclude by noting that it is significant that we achieved these results by using a combination of classical tools from computer vision and deep learning (bottom-up region proposals and convolutional neural networks). Rather than opposing lines of scientific inquiry, the two are natural and inevitable partners.

# 10. Fast R-CNN

Published in Conference Location Paper Citations Authors Link
2015 IEEE International Conference on Computer Vision (ICCV) Santiago, Chile 4702 R. Girshick link

# Citation

R. Girshick, "Fast R-CNN," 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, 2015, pp. 1440-1448, doi: 10.1109/ICCV.2015.169.

# Abstract

This paper proposes a Fast Region-based Convolutional Network method (Fast R-CNN) for object detection. Fast R-CNN builds on previous work to efficiently classify object proposals using deep convolutional networks. Compared to previous work, Fast R-CNN employs several innovations to improve training and testing speed while also increasing detection accuracy. Fast R-CNN trains the very deep VGG16 network 9x faster than R-CNN, is 213x faster at test-time, and achieves a higher mAP on PASCAL VOC 2012. Compared to SPPnet, Fast R-CNN trains VGG16 3x faster, tests 10x faster, and is more accurate. Fast R-CNN is implemented in Python and C++ (using Caffe) and is available under the open-source MIT License at https://github.com/rbgirshick/fast-rcnn.

# Review

In this article, the author exposes a drawback of R-CNN: it performs a ConvNet forward pass for each object proposal without sharing computation. Fast R-CNN addresses this drawback by building on the SPPnet idea of computing a convolutional feature map once for the entire input image and then classifying each object proposal from that shared feature map.

# Interesting quotes from the article:

"In this paper, we streamline the training process for stateof- the-art ConvNet-based object detectors [9, 11]. We propose a single-stage training algorithm that jointly learns to classify object proposals and refine their spatial locations."

"R-CNN, however, has notable drawbacks:

  1. Training is a multi-stage pipeline. R-CNN first finetunes a ConvNet on object proposals using log loss. Then, it fits SVMs to ConvNet features. These SVMs act as object detectors, replacing the softmax classifier learnt by fine-tuning. In the third training stage, bounding-box regressors are learned.
  2. Training is expensive in space and time. For SVM and bounding-box regressor training, features are extracted from each object proposal in each image and written to disk. With very deep networks, such as VGG16, this process takes 2.5 GPU-days for the 5k images of the VOC07 trainval set. These features require hundreds of gigabytes of storage.
  3. Object detection is slow. At test-time, features are extracted from each object proposal in each test image. Detection with VGG16 takes 47s / image (on a GPU)."

" R-CNN is slow because it performs a ConvNet forward pass for each object proposal, without sharing computation. Spatial pyramid pooling networks (SPPnets) [11] were proposed to speed up R-CNN by sharing computation."

"The SPPnet method computes a convolutional feature map for the entire input image and then classifies each object proposal using a feature vector extracted from the shared feature map. Features are extracted for a proposal by maxpooling the portion of the feature map inside the proposal into a fixed-size output (e.g., 6 × 6). Multiple output sizes are pooled and then concatenated as in spatial pyramid pooling [15]. SPPnet accelerates R-CNN by 10 to 100× at test time. Training time is also reduced by 3× due to faster proposal feature extraction."

"SPPnet also has notable drawbacks. Like R-CNN, training is a multi-stage pipeline that involves extracting features, fine-tuning a network with log loss, training SVMs, and finally fitting bounding-box regressors. Features are also written to disk. But unlike R-CNN, the fine-tuning algorithm proposed in [11] cannot update the convolutional layers that precede the spatial pyramid pooling. Unsurprisingly, this limitation (fixed convolutional layers) limits the accuracy of very deep networks."

"We call this method Fast R-CNN because it’s comparatively fast to train and test. The Fast RCNN method has several advantages:

  1. Higher detection quality (mAP) than R-CNN, SPPnet
  2. Training is single-stage, using a multi-task loss
  3. Training can update all network layers
  4. No disk storage is required for feature caching"

"A Fast R-CNN network takes as input an entire image and a set of object proposals. The network first processes the whole image with several convolutional (conv) and max pooling layers to produce a conv feature map. Then, for each object proposal a region of interest (RoI) pooling layer extracts a fixed-length feature vector from the feature map. Each feature vector is fed into a sequence of fully connected (fc) layers that finally branch into two sibling output layers: one that produces softmax probability estimates over K object classes plus a catch-all “background” class and another layer that outputs four real-valued numbers for each of theK object classes. Each set of 4 values encodes refined bounding-box positions for one of the K classes."



"Fast R-CNN detection Once a Fast R-CNN network is fine-tuned, detection amounts to little more than running a forward pass (assuming object proposals are pre-computed). The network takes as input an image (or an image pyramid, encoded as a list of images) and a list of R object proposals to score. At test-time, R is typically around 2000, although we will consider cases in which it is larger (≈ 45k). When using an image pyramid, each RoI is assigned to the scale such that the scaled RoI is closest to 2242 pixels in area [11]. For each test RoI r, the forward pass outputs a class posterior probability distribution p and a set of predicted bounding-box offsets relative to r (each of the K classes gets its own refined bounding-box prediction). We assign a detection confidence to r for each object class k using the estimated probability Pr(class = k | r) Δ= pk. We then perform non-maximum suppression independently for each class using the algorithm and settings from R-CNN [9]."

# Conclusion

This paper proposes Fast R-CNN, a clean and fast update to R-CNN and SPPnet. In addition to reporting state-of-the-art detection results, we present detailed experiments that we hope provide new insights. Of particular note, sparse object proposals appear to improve detector quality. This issue was too costly (in time) to probe in the past, but becomes practical with Fast R-CNN. Of course, there may exist yet undiscovered techniques that allow dense boxes to perform as well as sparse proposals. Such methods, if developed, may help further accelerate object detection.

# 11. Faster R-CNN - 2017

Published in Conference Location Paper Citations Authors Link
IEEE Transactions on Pattern Analysis and Machine Intelligence ( Volume: 39 , Issue: 6 , June 1 2017 ) IEEE 3714 Ren, S., et al. link

# Citation

S. Ren, K. He, R. Girshick and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137-1149, 1 June 2017, doi: 10.1109/TPAMI.2016.2577031.

# Abstract

State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet [1] and Fast R-CNN [2] have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network(RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The RPN is trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. We further merge RPN and Fast R-CNN into a single network by sharing their convolutional features-using the recently popular terminology of neural networks with 'attention' mechanisms, the RPN component tells the unified network where to look. For the very deep VGG-16 model [3], our detection system has a frame rate of 5 fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007, 2012, and MS COCO datasets with only 300 proposals per image. In ILSVRC and COCO 2015 competitions, Faster R-CNN and RPN are the foundations of the 1st-place winning entries in several tracks. Code has been made publicly available.

# Review

Faster R-CNN exposes region proposal computation as the bottleneck of object detection and introduces the novel Region Proposal Network (RPN), enabling nearly cost-free region proposals. In its two stages it first proposes candidate object bounding boxes (the RPN) and then extracts features with RoIPool from each candidate to perform classification and bounding-box regression, as in Fast R-CNN (this two-stage summary follows the Mask R-CNN paper).

# Interesting quotes from the article:

"Advances like SPPnet [1] and Fast R-CNN [2] have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck."

"An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The RPN is trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. We further merge RPN and Fast R-CNN into a single network by sharing their convolutional features"

"Although region-based CNNs were computationally expensive as originally developed in [5], their cost has been drastically reduced thanks to sharing convolutions across proposals [1], [2]. The latest incarnation, Fast R-CNN [2], achieves near real-time rates using very deep networks [3], when ignoring the time spent on region proposals. Now, proposals are the test-time computational bottleneck in stateof- the-art detection systems"

"Onemay note that fast region-basedCNNs take advantage of GPUs, while the region proposal methods used in research are implemented on the CPU, making such runtime comparisons inequitable. An obvious way to accelerate proposal computation is to re-implement it for the GPU. This may be an effective engineering solution, but re-implementation ignores the down-stream detection network and therefore misses important opportunities for sharing computation."

"In this paper, we show that an algorithmic change—computing proposals with a deep convolutional neural network— leads to an elegant and effective solution where proposal computation is nearly cost-free given the detection network’s computation."

"To this end, we introduce novel Region Proposal Networks (RPNs) that share convolutional layers with state-of-the-art object detection networks [1], [2]. By sharing convolutions at test-time, the marginal cost for computing proposals is small (e.g., 10 ms per image)."

"Our observation is that the convolutional feature maps used by region-based detectors, like Fast R-CNN, can also be used for generating region proposals. On top of these convolutional features, we construct an RPN by adding a few additional convolutional layers that simultaneously regress region bounds and objectness scores at each location on a regular grid. The RPN is thus a kind of fully convolutional network (FCN) [7] and can be trained end-to-end specifically for the task for generating detection proposals."

"Our scheme can be thought of as a pyramid of regression references (Fig. 1c), which avoids enumerating images or filters of multiple scales or aspect ratios. This model performs well when trained and tested using single-scale images and thus benefits running speed."

"To unify RPNs with Fast R-CNN [2] object detection networks, we propose a training scheme that alternates between fine-tuning for the region proposal task and then fine-tuning for object detection, while keeping the proposals fixed. This scheme converges quickly and produces a unified network with convolutional features that are shared between both tasks.1"

"The R-CNN method [5] trains CNNs end-to-end to classify the proposal regions into object categories or background. R-CNN mainly plays as a classifier, and it does not predict object bounds (except for refining by bounding box regression). Its accuracy depends on the performance of the region proposal module (see comparisons in [20])."

"In the OverFeat method [9], a fully-connected layer is trained to predict the box coordinates for the localization task that assumes a single object. The fully-connected layer is then turned into a convolutional layer for detecting multiple class-specific objects."

"FASTER R-CNN Our object detection system, called Faster R-CNN, is composed of two modules. The first module is a deep fully convolutional network that proposes regions, and the second module is the Fast R-CNN detector [2] that uses the proposed regions. The entire system is a single, unified network for object detection (Fig. 2). Using the recently popular terminology of neural networks with ‘attention’ [31] mechanisms, the RPN module tells the Fast R-CNN module where to look. In Section 3.1 we introduce the designs and properties of the network for region proposal. In Section 3.2 we develop algorithms for training both modules with features shared."



"Region Proposal Networks A Region Proposal Network takes an image (of any size) as input and outputs a set of rectangular object proposals, each with an objectness score.3 We model this process with a fully convolutional network [7], which we describe in this section. Because our ultimate goal is to share computation with a Fast R-CNN object detection network [2], we assume that both nets share a common set of convolutional layers. In our experiments, we investigate the Zeiler and Fergus model [32] (ZF), which has five shareable convolutional layers and the Simonyan and Zisserman model [3] (VGG-16), which has 13 shareable convolutional layers."



"Nevertheless, our method achieves bounding-box regression by a different manner from previous RoI-based (Region of Interest) methods [1], [2]. In [1], [2], bounding-box regression is performed on features pooled from arbitrarily sized RoIs, and the regression weights are shared by all region sizes. In our formulation, the features used for regression are of the same spatial size (3  3) on the feature maps. To account for varying sizes, a set of k bounding-box regressors are learned. Each regressor is responsible for one scale and one aspect ratio, and the k regressors do not share weights. As such, it is still possible to predict boxes of various sizes even though the features are of a fixed size/scale, thanks to the design of anchors."

# Conclusion

We have presented RPNs for efficient and accurate region proposal generation. By sharing convolutional features with the down-stream detection network, the region proposal step is nearly cost-free. Our method enables a unified, deep-learning-based object detection system to run at 5-17 fps. The learned RPN also improves region proposal quality and thus the overall object detection accuracy.

# 12. Mask R-CNN - 2017

Published in Conference Location Paper Citations Authors Link
2017 IEEE International Conference on Computer Vision (ICCV) Venice, Italy 2075 He, K., et al. link

# Citation

K. He, G. Gkioxari, P. Dollár and R. Girshick, "Mask R-CNN," 2017 IEEE International Conference on Computer Vision (ICCV), Venice, 2017, pp. 2980-2988, doi: 10.1109/ICCV.2017.322.

# Abstract

We present a conceptually simple, flexible, and general framework for object instance segmentation. Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. Mask R-CNN is simple to train and adds only a small overhead to Faster R-CNN, running at 5 fps. Moreover, Mask R-CNN is easy to generalize to other tasks, e.g., allowing us to estimate human poses in the same framework. We show top results in all three tracks of the COCO suite of challenges, including instance segmentation, bounding-box object detection, and person keypoint detection. Without tricks, Mask R-CNN outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners. We hope our simple and effective approach will serve as a solid baseline and help ease future research in instance-level recognition. Code will be made available.

# Review

Mask R-CNN extends Faster R-CNN by adding a branch that predicts segmentation masks on each RoI, in parallel with the existing branch for classification and bounding box regression. The authors note that Faster R-CNN was not designed for pixel-to-pixel alignment between network inputs and outputs.

# Interesting quotes from the article:

"The vision community has rapidly improved object detection and semantic segmentation results over a short period of time. In large part, these advances have been driven by powerful baseline systems, such as the Fast/Faster RCNN [9, 29] and Fully Convolutional Network (FCN) [24] frameworks for object detection and semantic segmentation, respectively."

"Instance segmentation is challenging because it requires the correct detection of all objects in an image while also precisely segmenting each instance. It therefore combines elements from the classical computer vision tasks of object detection, where the goal is to classify individual objects and localize each using a bounding box, and semantic segmentation, where the goal is to classify each pixel into a fixed set of categories without differentiating object instances. 1 Given this, one might expect a complex method is required to achieve good results. However, we show that a surprisingly simple, flexible, and fast system can surpass prior state-of-the-art instance segmentation results."

"Our method, called Mask R-CNN, extends Faster R-CNN [29] by adding a branch for predicting segmentation masks on each Region of Interest (RoI), in parallel with the existing branch for classification and bounding box regression (Figure 1)."

"The mask branch is a small FCN applied to each RoI, predicting a segmentation mask in a pixel-topixel manner. Mask R-CNN is simple to implement and train given the Faster R-CNN framework, which facilitates a wide range of flexible architecture designs. Additionally, the mask branch only adds a small computational overhead, enabling a fast system and rapid experimentation."



"In principle Mask R-CNN is an intuitive extension of Faster R-CNN, yet constructing the mask branch properly is critical for good results. Most importantly, Faster R-CNN was not designed for pixel-to-pixel alignment between network inputs and outputs."

"To fix the misalignment, we propose a simple, quantization-free layer, called RoIAlign, that faithfully preserves exact spatial locations."

"Faster R-CNN: We begin by briefly reviewing the Faster R-CNN detector [29]. Faster R-CNN consists of two stages. The first stage, called a Region Proposal Network (RPN), proposes candidate object bounding boxes. The second stage, which is in essence Fast R-CNN [9], extracts features using RoIPool from each candidate box and performs classification and bounding-box regression. The features used by both stages can be shared for faster inference. We refer readers to [17] for latest, comprehensive comparisons between Faster R-CNN and other frameworks."

"Given the effectiveness of Mask R-CNN for extracting object bounding boxes, masks, and keypoints, we expect it be an effective framework for other instance-level tasks."

"Implementation Details: We make minor modifications to the segmentation system when adapting it for keypoints."

# 13. Mask R-CNN - 2020

Youtube Video

Published in Conference Location Paper Citations Authors Link
IEEE Transactions on Pattern Analysis and Machine Intelligence ( Volume: 42 , Issue: 2 , Feb. 1 2020 ) IEEE K. He et al. link

# Citation

K. He, G. Gkioxari, P. Dollár and R. Girshick, "Mask R-CNN," in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 42, no. 2, pp. 386-397, 1 Feb. 2020, doi: 10.1109/TPAMI.2018.2844175.

# Abstract

We present a conceptually simple, flexible, and general framework for object instance segmentation. Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. Mask R-CNN is simple to train and adds only a small overhead to Faster R-CNN, running at 5 fps. Moreover, Mask R-CNN is easy to generalize to other tasks, e.g., allowing us to estimate human poses in the same framework. We show top results in all three tracks of the COCO suite of challenges, including instance segmentation, bounding-box object detection, and person keypoint detection. Without bells and whistles, Mask R-CNN outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners. We hope our simple and effective approach will serve as a solid baseline and help ease future research in instance-level recognition. Code has been made available at: https://github.com/facebookresearch/Detectron.

# Review

State-of-the-art method; some speed/accuracy trade-offs have to be taken into consideration.

# Interesting quotes from the article:

"This manuscript also describes some techniques that improve over our original results published in [7]."

"R-CNN. The Region-based CNN (R-CNN) approach [8] to bounding-box object detection is to attend to a manageable number of candidate object regions [9], [10] and evaluate convolutional networks [11], [12] independently on each RoI."

Reference [8] here is the original R-CNN article.


"Mask R-CNN is conceptually simple: Faster R-CNN has two outputs for each candidate object, a class label and a bounding-box offset; to this we add a third branch that outputs the object mask.Mask R-CNNis thus a natural and intuitive idea. But the additional mask output is distinct from the class and box outputs, requiring extraction of much finer spatial layout of an object. Next,we introduce the key elements of Mask R-CNN, including pixel-to-pixel alignment, which is the main missing piece of Fast/Faster R-CNN."

"Our approach follows the spirit of Fast R-CNN [1] that applies bounding-box classification and regression in parallel (which turned out to largely simplify the multi-stage pipeline of original R-CNN [8])."



"We compare Mask R-CNN to the state-of-the-art methods in instance segmentation in Table 1. All instantiations of our model outperform baseline variants of previous state-ofthe- art models. This includes MNC [23] and FCIS [24], the winners of the COCO 2015 and 2016 segmentation challenges, respectively. Without bells and whistles, Mask RCNN with ResNet-101-FPN backbone outperforms FCIS++

  • [24], which includes multi-scale train/test, horizontal flip test, and online hard example mining (OHEM) [13]. While outside the scope of this work, we expect many such improvements to be applicable to ours."

"Although Mask R-CNN is fast, we note that our design is not optimized for speed, and better speed/accuracy tradeoffs could be achieved [15], e.g., by varying image sizes and proposal numbers, which is beyond the scope of this paper."

"We have presented a simple and effective framework for instance segmentation, which also shows good results on bounding box detection and can be extended to pose estimation. We hope the simplicity and generality of this framework will facilitate future research on these and other instance-level visual recognition tasks."

"We further report instance segmentation results on the Cityscapes [46] dataset. This dataset has fine annotations for 2,975 training images, 500 validation images, and 1,525 test images. It has 20k coarse training images without instance annotations, which we do not use. All images are at a fixed resolution of 2048  1024 pixels. The instance segmentation task involves 8 object categories, whose numbers of instances on the fine training set are: Instance segmentation performance on this task is measured by the COCO-style mask AP (averaged over IoU thresholds); AP50 (i.e., mask AP at an IoU of 0.5) is also reported."

"Implementation We apply our Mask R-CNN models with the ResNet- FPN-50 backbone; we have tested the 101-layer counterpart and found it performs similarly due to the small dataset size. We train with image scale (shorter side) randomly sampled from [800, 1024], which reduces overfitting; inference is on a single scale of 1,024 pixels. We use a mini-batch size of 1 image per GPU (so effectively 8 on 8 GPUs) and train the model for 24k iterations, starting from a learning rate of 0.01 and reducing it to 0.001 at 18k iterations. It takes 4 hours of training on a single 8-GPU machine under this setting. Other implementation details are identical as in Section 3.1."

# 14. Deep Convolutional Neural Networks for Automated Characterization of Arctic Ice-Wedge Polygons in Very High Spatial Resolution Aerial Imagery

Published in Conference Location Paper Citations Authors Link
2018 Remote Sensing, 10(9). - 18 Zhang, W., et al. link

# Citation

Zhang, W., Witharana, C., Liljedahl, A. K., & Kanevskiy, M. (2018). Deep Convolutional Neural Networks for Automated Characterization of Arctic Ice-Wedge Polygons in Very High Spatial Resolution Aerial Imagery. Remote Sensing, 10(9). doi:10.3390/rs10091487

# Abstract

The microtopography associated with ice-wedge polygons governs many aspects of Arctic ecosystem, permafrost, and hydrologic dynamics from local to regional scales owing to the linkages between microtopography and the flow and storage of water, vegetation succession, and permafrost dynamics. Wide-spread ice-wedge degradation is transforming low-centered polygons into high-centered polygons at an alarming rate. Accurate data on spatial distribution of ice-wedge polygons at a pan-Arctic scale are not yet available, despite the availability of sub-meter-scale remote sensing imagery. This is because the necessary spatial detail quickly produces data volumes that hamper both manual and semi-automated mapping approaches across large geographical extents. Accordingly, transforming big imagery into ‘science-ready’ insightful analytics demands novel image-to-assessment pipelines that are fueled by advanced machine learning techniques and high-performance computational resources. In this exploratory study, we tasked a deep-learning driven object instance segmentation method (i.e., the Mask R-CNN) with delineating and classifying ice-wedge polygons in very high spatial resolution aerial orthoimagery. We conducted a systematic experiment to gauge the performances and interoperability of the Mask R-CNN across spatial resolutions (0.15 m to 1 m) and image scene contents (a total of 134 km2) near Nuiqsut, Northern Alaska. The trained Mask R-CNN reported mean average precisions of 0.70 and 0.60 at thresholds of 0.50 and 0.75, respectively. Manual validations showed that approximately 95% of individual ice-wedge polygons were correctly delineated and classified, with an overall classification accuracy of 79%. Our findings show that the Mask R-CNN is a robust method to automatically identify ice-wedge polygons from fine-resolution optical imagery. Overall, this automated imagery-enabled intense mapping approach can provide a foundational framework that may propel future pan-Arctic studies of permafrost thaw, tundra landscape evolution, and the role of high latitudes in the global climate system.

# Review

A very good example of the workflow for a practical application of Mask R-CNN.

# Interesting quotes from the article:

"Accordingly, transforming big imagery into ‘science-ready’ insightful analytics demands novel image-to-assessment pipelines that are fueled by advanced machine learning techniques and high-performance computational resources."

"Our findings show that the Mask R-CNN is a robust method to automatically identify ice-wedge polygons from fine-resolution optical imagery."

"Annotated Data We selected 340 subsets, each with the dimension of 600  600 pixels (i.e., 90 m  90 m), from the false-color composite of the Nuiqsut image (Figure 2) for manual annotation purposes. We chose the dimension of annotated data as 600  600 pixels based on two considerations: (1) to maximize the number of IWPs per subset; and (2) to minimize the error from manual annotation process as a too small or too large subset can be difficult for annotation. To avoid a class balancing problem, we roughly annotated an even number of objects for HCP and LCP polygons (3728 and 3764 polygonal objects). We used the “VGG Image Annotator” web tool (http://www.robots.ox.ac.uk/~vgg/software/ via/via.html) to annotate training samples for object instance segmentation and saved the training data in the format of JavaScript Object Notation (.json), which is also the data format being used in some other machine learning training data collection, such as COCO (http://cocodataset.org/#home). Finally, we randomly split the annotated 340 subsets into three sub datasets based on an 80:10:10 split, which entailed: training dataset (272 subsets (Figure 2a)), validation dataset for minimizing overfitting (33 subsets (Figure 2b)), and test dataset for evaluating the performance of the trained DL algorithm (35 subsets (Figure 2c))."

"General Workflow Our automated imagery-based ice-wedge polygon extraction workflow rests on four key steps (Figure 3): (1) division of VHSR imagery into overlapping patches; (2) object instance segmentation of input patches; (3) mask-to-polygon conversion; (4) eliminate duplicate polygons and compose unique polygons. Two block sizes of 600 × 600 pixels and 360 × 360 pixels were used to partition the VHSR scenes of Nuiqsut and Crea Creek with an overlap of 20% using the Geospatial Data Abstraction Library (GDAL, http://www.gdal.org/). It is worth noting that the two block sizes were selected separately in order to match the scale of the annotated data (i.e., 90 × 90 m). The DL algorithm performed the object instance segmentation with outputs as predicted binary mask with classification information. Finally, we cleaned (e.g., remove duplicates) the output polygons using scikit-image (http://scikit-image.org/) and"



"Deep Learning Algorithm We chose the state-of-the-art Mask R-CNN method [86] to implement the object instance segmentation due to its simplicity and effectiveness [94]. The Mask R-CNN method is an extended method for object instance segmentation and is built on the Faster R-CNN (a fast and effective algorithm for object detection [95]) by including a function for predicting masks for distinct objects [86]. Methodologically, the Mask R-CNN is a two-stage algorithm: (1) the Mask R-CNN generates proposals (i.e., candidate object bounding boxes) after scanning the image; (2) the Mask R-CNN predicts the class, bounding box, and binary mask for each region of interest (RoI) [86]. In terms of structure (Figure 4), the Mask R-CNN mainly consists of (1) backbone architecture Residual Learning network (ResNet) [96] for feature extraction; (2) Feature Pyramid Network (FPN) [97] for improving representation of objects at multiple scales; (3) Region Proposal Network (RPN) for generating RoI; (4) RoI Classifier for class prediction of each RoI; (5) Bounding Box Regressor (BBR) for refining RoI; (6) FCN [98] with RoIAlign [86] and bilinear interpolation for predicting pixel-accurate mask. A deeper discussion on the Mask R-CNN algorithm is beyond the scope of this study; thus, we refer readers to He et al. [86] for a detailed discussion on the mathematical basis of the algorithm."



"Accuracy Assessment We conducted a three-step assessment for the trained Mask R-CNN. First, we assessed the mean average precision (mAP: the mean of average precision values of each class) of the trained Mask R-CNN with the hold-out test dataset. Second, we randomly selected additional 30 subsets (Figure 5) for each study site (including non-IWP samples and excluding previous selected 340 subsets for Nuiqsut) and manually validate the accuracies of detection, delineation, and classification. We evaluated the accuracies of detection, delineation, and classification based on the following criteria: a positive detection indicates a IWP is correctly detected by the Mask R-CNN, and a negative detection indicates a IWP is not correctly detected by the Mask R-CNN; likewise, a positive delineation indicates the Mask R-CNN successfully inferences outline of an IWP based on interpreter’s judgement; a positive classification indicates the Mask R-CNN inferences the correct type of IWP of a detected IWP. Our manual validation included the following steps: in Step (1) we created 30 random square polygons (100  100 m) for the each study area with 12 attributes in shapefile database (true-positive, false-positive, true-negative, and false-negative for each of detection, delineation, and classification respectively) to assess the commission and omission errors; In Step (2) we manually counted all IWPs within or crossing the boundaries of the validation square polygons; and in Step (3) we filled the number for the each attribute based on the manual counting. Third, we corroborated qualitative assessments with detailed visual inspections by coupling imagery and LiDAR DTMs."

"Implementation We implemented the Mask R-CNN method using an open-source package built on Keras and Tensorflow developed by the team of Mask-RCNN on Github [99]. The codes are available on Github (https://github.com/matterport/Mask_RCNN). We conducted experiments on a customized server equipped with an Intel i5 CPU, 16 GB RAM, a GeForce GTX 970 graphic card, and a GeForce GTX 1080ti graphic card. In the training process, a graphics processing unit (GPU) (GeForce GTX 1080ti graphic card) was used to train the adopted ResNet-101 (101 layers) backbone Mask R-CNN in the package with a mini-batch size of 2 images, 272 steps per epoch, learning rate of 0.001, learning momentum of 0.9, and weight decay of 0.0001. The accessibility of this open-source package was the main reason we chose the ResNet-101 rather than ResNet-152 backbone Mask R-NN. We modified the loading dataset function for our customized training data and left other parameters in default settings. For the full description of parameter settings of the used submodels of the Mask R-CNN (e.g., ResNet, RPN, BBR etc.), we refer readers to the model part of the Mask R-CNN repository on Github (https://github.com/matterport/Mask_RCNN). Instead of building the Mask R-CNN from scratch, we trained our model using the pre-trained weights of the Mask R-CNN for COCO because COCO has a large amount of training data for the Mask R-CNN to learn common and discriminative features (known as transfer learning). To minimize overfitting, we used the validation dataset to decide the most generalized Mask-RCNN. Additionally, random horizontal flips augmentation was used to introduce variety in the training data. In the inference process, we set a detection confidence threshold as 70% (i.e., detections with confidence less than 70% were ignored). Two GPUs were used to accelerate the process of inferencing input patches with 20% overlapping by using a data processing queue. As a result, the aerial imagery (63,103  62,584 pixels) of Nuiqsut with a spatial resolution of 0.15 m, a total of 8134 patches (600  600 pixels), was able to infer approximately to within 21 min (~6.3 fps)."

"Mask R-CNN was a successful method in characterizing the Arctic ice-wedge polygonal landscape from VHSR images (Figure 12). The algorithm successfully utilized the multi-level feature representations learned from the training data for detection, delineation, and classification of targets of interest (Figures 13 and 14). However, there is a negative correlation between the overall accuracies of detection and (the coarser) spatial resolution of input imagery. This is likely to be due to the resolution dependency of the training dataset (0.15 m) in the Mask R-CNN approach. This means that the consistency of spatial resolution of both training data and predicting data could significantly impact the Mask R-CNN’s performance. Sub-meter resolution commercial satellite imagery (<0.25 m) is available across the pan-Artic domain. Thus, a Mask R-CNN derived product, with a relaxation to spatial resolution, could potentially be of high value for science applications that can benefit from a pan-Arctic ice-wedge polygon map."

# Conclusion

"Imagery-based IWP mapping is still largely constrained to time- and labor-intensive human-augmented workflows. The rapid influx of sub-meter resolution commercial satellite imagery to the polar science community demands high-performance image analysis workflows to meet the ever-increasing demand for imagery-enabled science products. We applied the Mask R-CNN to automatically detect and delineate ice-wedge polygons and identify their type. Four case studies from two study sites were used to assess the performance and transferability of the Mask R-CNN for the mission of characterizing the tundra ice-wedge polygon landscape. Our results report that: (1) the Mask R-CNN can detect up to 79% of IWPs in study sites with a VHSR imagery pixel resolution of 0.15 m and around 72% of IWPs with the imagery with a pixel resolution of 0.25 m; (2) besides promising performance in detection, the Mask R-CNN can delineate and classify detected IWPs accurately; (3) the pressure test of the Mask R-CNN on resampled imagery shows the flexibility and potential in automatically mapping IWPs in coarser RS images. Findings of this study provide an extensible framework for imagery-enabled intense mapping of ice-wedge polygons at extensive spatial and temporal scales. While the Mask R-CNN presents promising ability in automatically characterizing IWPs, further studies are necessary to fully understand the use of deep learning-driven object instance segmentation in characterizing IWPs. Our future work will focus on four main directions: (1) a comprehensive comparison study on how well the Mask R-CNN performs compared to other methods (e.g., OBIA, PANet, and MaskLab etc.); (2) a full analysis on model optimization procedures by examining a variety of combinations of hyperparameters; (3) a detailed analysis on the minimum requirements and quality of the training datasets, which intend to use in the Mask R-CNN training purposes; (4) a stress analysis by coupling spatial artifacts with spectral, radiometric, and structural artifacts."

# 15. Speed/accuracy trade-offs for modern convolutional object detectors

Published in Conference Location Paper Citations Authors Link
2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Honolulu, HI, USA 343 J. Huang et al. link

# Citation

J. Huang et al., "Speed/Accuracy Trade-Offs for Modern Convolutional Object Detectors," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, 2017, pp. 3296-3297, doi: 10.1109/CVPR.2017.351.

# Abstract

The goal of this paper is to serve as a guide for selecting a detection architecture that achieves the right speed/memory/accuracy balance for a given application and platform. To this end, we investigate various ways to trade accuracy for speed and memory usage in modern convolutional object detection systems. A number of successful systems have been proposed in recent years, but apples-to-apples comparisons are difficult due to different base feature extractors (e.g., VGG, Residual Networks), different default image resolutions, as well as different hardware and software platforms. We present a unified implementation of the Faster R-CNN [30], R-FCN [6] and SSD [25] systems, which we view as meta-architectures and trace out the speed/accuracy trade-off curve created by using alternative feature extractors and varying other critical parameters such as image size within each of these meta-architectures. On one extreme end of this spectrum where speed and memory are critical, we present a detector that achieves real time speeds and can be deployed on a mobile device. On the opposite end in which accuracy is critical, we present a detector that achieves state-of-the-art performance measured on the COCO detection task.

# Review

# Interesting quotes from the article:

"A lot of progress has been made in recent years on object detection due to the use of convolutional neural networks (CNNs). Modern object detectors based on these networks — such as Faster R-CNN [30], R-FCN [6], Multibox [39], SSD [25] and YOLO [28] — are now good enough to be deployed in consumer products (e.g., Google Photos, Pinterest Visual Search) and some have been shown to be fast enough to be run on mobile devices."

"However, it can be difficult for practitioners to decide what architecture is best suited to their application. Standard metrics, such as mean average precision (mAP), do not tell the entire story, since for real deployments of computer vision systems, running time and memory usage are also critical."

"In this paper, we seek to explore the speed/accuracy trade-off of modern detection systems in an exhaustive and fair way."

"Our findings show that using fewer proposals for Faster R-CNN can speed it up significantly without a big loss in accuracy, making it competitive with its faster cousins, SSD and RFCN."

"Single Shot Detector (SSD). Though the SSD paper was published only recently (Liu et al., [25]), we use the term SSD to refer broadly to architectures that use a single feedforward convolutional network to directly predict classes and anchor offsets without requiring a second stage perproposal classification operation (Figure 1(a)). Under this definition, the SSD meta-architecture has been explored in a number of precursors to [25]. Both Multibox and the Region Proposal Network (RPN) stage of Faster RCNN [39, 30] use this approach to predict class-agnostic box proposals. [32, 28, 29, 9] use SSD-like architectures to predict final (1 of K) class labels. And Poirson et al., [27] extended this idea to predict boxes, classes and pose."

"Faster R-CNN. In the Faster R-CNN setting, detection happens in two stages (Figure 1(b)). In the first stage, called the region proposal network (RPN), images are processedby a feature extractor (e.g., VGG-16), and features at some selected intermediate level (e.g., “conv5”) are used to predict class-agnostic box proposals. The loss function for this first stage takes the form of Equation 1 using a grid of anchors tiled in space, scale and aspect ratio."

"R-FCN. While Faster R-CNN is an order of magnitude faster than Fast R-CNN, the fact that the region-specific component must be applied several hundred times per image led Dai et al. [6] to propose the R-FCN (Region-based Fully Convolutional Networks) method which is like Faster R-CNN, but instead of cropping features from the same layer where region proposals are predicted, crops are taken from the last layer of features prior to prediction (Figure 1(c)). This approach of pushing cropping to the last layer minimizes the amount of per-region computation that must be done. Dai et al. argue that the object detection task needs localization representations that respect translation variance and thus propose a position-sensitive cropping mechanism that is used instead of the more standard ROI pooling operations used in [10, 30] and the differentiable crop mechanism of [5]. They show that the R-FCN model (using Resnet 101) could achieve comparable accuracy to Faster R-CNN often at faster running times. Recently, the R-FCN model was also adapted to do instance segmentation in the recent TA-FCN model [21], which won the 2016 COCO instance segmentation challenge."

"Critical points on the optimality frontier. (Fastest: SSD w/MobileNet): On the fastest end of this optimality frontier, we see that SSD models with Inception v2 and Mobilenet are most accurate of the fastest models. Note that if we ignore postprocessing, Mobilenet seems to be roughly twice as fast as Inception v2 while being slightly worse in accuracy. (Sweet Spot: R-FCN w/Resnet or Faster RCNN w/Resnet and only 50 proposals): There is an “elbow” in the middle of the optimality frontier occupied by R-FCN models using Residual Network feature extractors which seem to strike the best balance between speed and accuracy among our model configurations. As we discuss below, Faster R-CNN w/Resnet models can attain similar speeds if we limit the number of proposals to 50. (Most Accurate: Faster R-CNN w/Inception Resnet at stride 8): Finally Faster R-CNN with dense output Inception Resnet models attain the best possible accuracy on our optimality frontier, achieving (at time of submission) the state-of-theart single model performance. However these models are slow, requiring nearly a second of processing time."



# Conclusion

"We have performed an experimental comparison of some of the main aspects that influence the speed and accuracy of modern object detectors. We hope this will help practitioners choose an appropriate method when deploying object detection in the real world. We have also identified some new techniques for improving speed without sacrificing much accuracy, such as using many fewer proposals than is usual for Faster R-CNN."

# 16. GPU acceleration design method for driver's seatbelt detection

Published in Conference Location Paper Citations Authors Link
2019 14th IEEE International Conference on Electronic Measurement & Instruments (ICEMI) Changsha, China, China 0 Jing Yongquan et al. link

# Citation

J. Yongquan, W. Tianshu, L. Jin, Z. Zhijia and G. Chao, "GPU acceleration design method for driver’s seatbelt detection," 2019 14th IEEE International Conference on Electronic Measurement & Instruments (ICEMI), Changsha, China, 2019, pp. 949-953, doi: 10.1109/ICEMI46757.2019.9101821.

# Abstract

With the development and maturity of deep learning algorithms, CNN have emerged in the field of computer vision. Image recognition is one of the important research directions in the field of computer vision. The traditional image recognition method is to extract features by constructing feature descriptors and then classify them by classifiers, such as gradient direction histogram and support vector machine. These methods generally have the problems of poor robustness and insufficient ability to extract features in complex application scenarios. At the same time, convolutional neural network has not been well applied in image recognition due to its large amount of computation and slow speed. With the development of GPU, the parallel computing capability has been greatly improved. This paper designs a GPU acceleration method for the driver's seatbelt detection system based on CNN. The system is based on the Deconv-SSD target detection algorithm for vehicle detection, the Squeeze-YOLO algorithm for vehicle front windshield location, and the semantic segmentation for seat belt detection. Based on the characteristics of GPU, through the off-line merging bath normlization and convolution layer, Tensorrt model conversion technology to realize the GPU optimization speed. The results show that the proposed acceleration method can effectively improve the detection efficiency.

# Review

An article that discusses GPU acceleration of CNNs with TensorRT. It describes CNNs as the most accurate algorithms for image classification, target detection, and semantic segmentation.

It attributes the slow speed of CNNs to the large amount of convolution computation, which motivates GPU acceleration.

Using TensorRT, the authors report an improvement of 40% for some tasks.

# Interesting quotes from the article:

"The excellent performance of in-depth learning in the field of image recognition makes CNN widely used in various fields of computer vision.In the fields of image classification, target detection and semantics segmentation, CNN has become the most accurate algorithm[1]."

"But the convolution neural network relies on a lot of convolution computation, and the algorithm is slow.With the continuous development of computer technology, especially the development of GPU (Graphics Processing Unit),parallel computing technology is becoming more and more popular. Parallel computing can effectively solve the problem of slow computing speed of cnn."

"The widely used NVIDIA GPU(Graphics Processing Unit) is a general-purpose GPU, that is, GPGPU. General GPU has SIMD(Single Instruction Multiple Data) instruction sets, which can carry out the parallel computation of multiple data streams with one instruction at the same time, so it is suitable for parallel computing of parallel data such as CNN and image[2]."

"According to the GPU hardware characteristics, the Tensorrt acceleration library is used to reduce the data interaction between CPU and GPU, so as to further achieve the acceleration effect."

"Tensorrt is an accelerated computing library developed by NVIDIA, it can automatically analyze the trained model, merge the computing layer without CPU and GPU interaction, no longer send data back to CPU and memory, and complete the calculation in GPU and display memory. while supporting the storage of the merged model as a Tensorrt format to facilitate off-line calls."

"Tensorrt supports the conversion of weights of various mainstream deep learning development frameworks, which avoids the redundancy of deploying multiple deep learning development frameworks in system integration and achieves unification on the system deployment side."

"Tensorrt can also be used for off-line quantization, using float 16-bit semi-precision floating-point calculations instead of float 32-bit single-precision floating-point calculations. Semi-precision floating-point Numbers have less bit width than single-precision floating-point Numbers, and can theoretically perform the same calculations in half the clock cycle as float 32-bit full-precision floating-point Numbers. However, in practical use, since the NVIDIA GPU architecture is different, the semi-precision floating-point SIMD computing power is different, and the final acceleration effect is dependent on the hardware itself architecture. Because of the redundancy and the robustness of the CNN itself, the semi-precision floating-point number cannot be reduced by the method of off-line quantization."

# 17. Computer Vision System for Welding Inspection of Liquefied Petroleum Gas Pressure Vessels Based on Combined Digital Image Processing and Deep Learning Techniques

Published in Conference Location Paper Citations Authors Link
12 August 2020 Sensors 20:4505 Cruz, Y.J.; Rivas, M.; Quiza, R.; Beruvides, G.; Haber, R.E. link

# Citation

Cruz, Y.J.; Rivas, M.; Quiza, R.; Beruvides, G.; Haber, R.E. Computer Vision System for Welding Inspection of Liquefied Petroleum Gas Pressure Vessels Based on Combined Digital Image Processing and Deep Learning Techniques. Sensors 2020, 20, 4505.


# Abstract

One of the most important operations during the manufacturing process of a pressure vessel is welding. The result of this operation has a great impact on the vessel integrity; thus, welding inspection procedures must detect defects that could lead to an accident. This paper introduces a computer vision system based on structured light for welding inspection of liquefied petroleum gas (LPG) pressure vessels by using combined digital image processing and deep learning techniques. The inspection procedure applied prior to the welding operation was based on a convolutional neural network (CNN), and it correctly detected the misalignment of the parts to be welded in 97.7% of the cases during the method testing. The post-welding inspection procedure was based on a laser triangulation method, and it estimated the weld bead height and width, with average relative errors of 2.7% and 3.4%, respectively, during the method testing. This post-welding inspection procedure allows us to detect geometrical nonconformities that compromise the weld bead integrity. By using this system, the quality index of the process was improved from 95.0% to 99.5% during practical validation in an industrial environment, demonstrating its robustness.

# Review

A very good article; we will definitely use the derivative approach to calculate the edges and tip of the laser line on the weld bead profile.

# Interesting quotes from the article:

"the application of more ecient monitoring strategies on the production lines at critical stages is required to avoid the generation or propagation of defects"

"The use of non-destructive testing (NDT) techniques can contribute to the early detection of these defects, allowing the deployment of cost-e ective line monitoring and control systems that reduce expensive o -line measure-rework-assess loops. NDT techniques, such as radiography, ultrasonic testing, penetrant liquid testing, magnetic particle testing, phased arrays, time-of-flight di raction, and multi-elements eddy current, are more and more extensively applied. Tomography, acoustic emissions, ultrasonic guided waves, and laser ultrasonic techniques continue to be strong topics of interest [5]."

"Lately, NDT techniques based on computer vision are becoming integral parts of many production systems, due to the increasing computing power, hyper connectivity, and easy installation of digital cameras in production lines."

"Nowadays, computer vision technologies have demonstrated unprecedent benefits in the industry; they allow detection of defects unnoticeable to human operators, automate extremely tedious measuring tasks, perform visual inspection in risky environments, substitute costly end-of-line product quality inspection procedures for multi-stage inspection systems, etc."

"Wang et al. [9] developed a robust weld seam recognition method under heavy noise based on structured light vision. By using this algorithm, butt joints, T joints, and lap joints were accurately recognized."

"Fan et al. [12] introduced a method based on digital image processing for initial point alignment in narrow robotic welding. This method allowed detection of the weld seam center point when a laser stripe line was projected over a junction, and it was used later by Fan et al. [13] to develop a weld seam tracking system."

"Du et al. [16] proposed a convolutional neural network (CNN) to perform the feature area recognition and weld seam identification. This CNN analyzed the projection of a 659 nm laser stripe over di erent types of junctions, obtaining a validation accuracy of 98.0% under strong noise."

"Pinto-Lopera et al. [17] proposed a system for measuring weld bead geometry by using a single high-speed camera and a long-pass optical filter."

"Although several researches on pre-welding and post-welding inspection exist, they are usually only focused on one of these tasks. There is a lack of studies on the combined use of computer vision techniques to provide an integral welding inspection under practical shop floor conditions, which may include variation in illumination, presence of fume, and mechanical vibrations. For dealing with the aforementioned shortcomings, this paper proposes an integrated system for pre-welding and post-welding inspection, based on computer vision techniques, for industrial applications."

"CNNcapabilities for image classification are remarkable. However, the processing of post-welding images implies a di erent task, to accurately estimate the weld bead dimensions. For this reason, another approach based on the examination of the laser profile was proposed. In order to determine where the weld bead is located in the image, the first and second derivative of the one-pixel width laser profile are calculated using the numeric approximations described in the following equations, respectively: The minimum value of the second derivative proved to be a representative feature of the weld bead center. The maximum values of the second derivative on both sides of the weld bead center proved to be representative features of the weld bead edges. Figure 6 shows the mentioned points in the second derivative graphic representation, while Figure 7 shows the same points over the RGB image."





"in the pre-welding inspection, the computer vision system showed a better performance, not only rejecting a higher number of nonconforming items (96.3%) than the human-based system (47.5%), but also wrongly rejecting a lower number of conforming items (0.4% vs. 1.6%). As the welding of misaligned parts produced dimensionally incorrect weld joints, the fraction of nonconforming items after the welding operation was also higher for the human-based inspection system (11.1% vs. 7.3%). Finally, in the post-welding inspection, the ratio of true positives was remarkably higher for the computer vision system (94.0% vs. 59.0%), while presenting a slightly lower false negatives ratio (1.6% vs. 3.3%)."

"From an overall analysis of the whole system, it can be noted that introducing the computer vision inspection increased the process quality index (i.e., the ratio between the conforming items and the total produced items) from 95.0% to 99.5%."

"On the other hand, the laser triangulation approach used in the post-welding inspection estimated the weld bead dimensions with an average relative error of 3.4% for the weld bead width and 2.7% for the height during the method testing."

# Conclusion

"As the main outcome of the paper, a computer vision inspection system, for detecting joint misalignment and geometrical defects in a welding process, was designed and implemented. In spite of its low cost, the used hardware was shown to be effective for achieving the proposed goal, and demonstrated a robust performance under the shop floor conditions where it was tested. In the pre-welding inspection, the used CNN was capable of detecting misalignment in 97.7% of the cases during the method testing. On the other hand, the laser triangulation approach used in the post-welding inspection estimated the weld bead dimensions with an average relative error of 3.4% for the weld bead width and 2.7% for the height during the method testing. The improvement of the overall quality index of the process during practical validation, from 95.0% to 99.5%, supported the technical feasibility of the industrial introduction of the proposed system. As future development of the present work, it might be considered the incorporation of connectivity capabilities to the implemented modules, through the concepts of the Industrial Internet of Things. This addition will be an important step toward the integration of the considered welding process to a modern manufacturing environment with interconnected production stations and lines."

# 18. A robust weld seam recognition method under heavy noise based on structured-light vision

Published in Conference Location Paper Citations Authors Link
2020 Journal - Robotics and Computer Integrated Manufacturing 61 (2020) China 2 link

# Citation

Nianfeng Wang, Kaifan Zhong, Xiaodong Shi, Xianmin Zhang, A robust weld seam recognition method under heavy noise based on structured-light vision, Robotics and Computer-Integrated Manufacturing, Volume 61, 2020, 101821, ISSN 0736-5845, https://doi.org/10.1016/j.rcim.2019.101821.

# Abstract

Structured-light vision systems are widely used in robotic welding. The key to improving the robotic visual servo performance and weld quality is the weld seam recognition accuracy. Common detection algorithms are likely to be disturbed by the noise of spatter and arc during the welding process. In this paper, a weld seam recognition algorithm is proposed based on structured light vision to overcome this challenge. The core of this method is fully utilizing information of previous frames to process the current frame, which can make weld seam extraction both more robust and effective. The algorithm can be divided into three steps: initial laser center line recognition, online laser center line detection, and weld feature extraction. A Laplacian of Gaussian filter is used for recognizing the laser center line in the first frame. Afterwards, an algorithm based on the NURBS-snake model detects the laser center line online in a dynamic region of interest (abbreviated ROI). The center line obtained from first step is set as the initial contour of the NURBS-snake model. Using the line obtained from the previous step, feature points are determined by segmentation and straight-line fitting, while the position of the weld seam can be calculated according to the feature points. The accuracy, efficiency and robustness of the recognition algorithm are verified by experiments.

# Review

This article develops a method for weld seam recognition. The method is designed for laser-based structured-light vision systems and consists of three steps: initial laser center line recognition, online laser center line detection, and weld feature extraction.

It also cites several references on laser stripe shapes for line-structured-light active vision systems, including single-line and multi-line stripes.

# Interesting quotes from the article:

"The position of the weld seam should be detected to adjust the pose of the weld torch in real time to reduce error, which requires a robust and efficient seam recognition algorithm."

"However, images captured directly at the welding position are commonly disturbed by arc, smoke, and spatter, which increase the error of visual recognition."

"Active vision systems generally use line-structured light or encoded structured light, and they use a laser projector or an encoded-pattern light source, respectively."

"The line-structured light emitted by a laser projector is widely used in robotic welding, and there are a variety of shapes of the laser stripe, such as a line [6] [7], multiple lines [8,9], a cross [10], a triangle [11], or a circle [12,13]."

"Surface reflection, weld spatter, arc and smoke appearing in the image need to be eliminated, as shown in Fig. 2. The median filter [18] and the Gaussian filter [6] are often used for image denoising. The median filter can remove the “salt and pepper” noise in the image and preserve the edge details of the image. The Gaussian filter is mainly used to suppress high-frequency noise in the image. In addition, some scholars use the morphological filter to improve the quality of welding images [19]."

"However, these traditional filtering methods are difficult to apply to welding images because of disturbances in complex unstructured welding environments. Therefore, some scholars have adopted the method of processing multiframe continuous images [20,21], which is mainly based on the temporality of noise, such as arc spatter during welding. Some scholars directly use thresholding to reduce the noise, such as by transforming images into binary images [22]."

"To further improve the detection accuracy, some subpixel extraction methods, such as the Gaussian approximation method [29] and the center of gravity method [29,30], are also used to extract the position of the laser center line. Compared with the Gaussian approximation method, the center of gravity method combined with the maximum pixel intensity can obtain better accuracy [31].

However, the calculation of the center position in each image is unavoidable and excessively time-consuming, which results in a delayed adjustment. In addition, the edge detection method can be used for the extraction of the laser center line [32]."

"A technique that obtains an initial laser center line in the first frame and utilizes the laser stripe in the previous frame to extract that in the current one by employing the NURBS-snake model. This method uses the previous frame information, which suppresses the sudden interference and smoothens the detected weld trajectory. Furthermore, the method only needs to perform preprocessing in the first frame, which saves time during the subsequent image processing."

"The initial laser center line extraction that uses the Laplacian of Gaussian filter and takes into account the variable width of the laser stripe. Additionally, subpixel processing utilizing the grayscale centroid method is performed to further improve the precision of laser position recognition."

"Calculating the laser center position in each frame obviously requires too much time. To improve efficiency, the initial center line is utilized to assist the laser stripe recognition iteratively. As such, the algorithm of the active contour model (ACM) is adopted, and the initial center line is used as the starting contour. Applying the ACM method, also called the snake model, the target boundary can be represented by a continuous curve. This method plays an important role in computer vision; therefore, numerous published studies [36–40] have sought to improve it. In this paper, the NURBS-snake model is adopted to delineate the center line of the laser stripe, and the contour of the model is represented by the NURBS curve."

"To obtain an accurate initial laser center line, a LOG filter is designed to scan the first frame image with minimal welding noise. To improve the accuracy of laser center line detection during welding, an algorithm is proposed based on the NURBS-snake model, and a dynamic ROI is created depending on the laser center line to reduce the time required for the subsequent calculation."



WARNING

Research these articles on single-line and multi-line laser stripes:

[6] W. Huang, R. Kovacevic, Development of a real-time laser-based machine vision system to monitor and control welding processes, Int. J. Adv. Manuf. Technol. 63 (1–4) (2012) 235–248, https://doi.org/10.1007/s00170-012-3902-0.

[7] A. Caggiano, L. Nele, E. Sarno, R. Teti, 3D digital reconfiguration of an automated welding system for a railway manufacturing application, Procedia CIRP, vol. 25, pp. 39–45.

[8] Z. Xiao, Research on a trilines laser vision sensor for seam tracking in welding, Robotic Welding, Intelligence and Automation, RWIA’2010, Lecture Notes in Electrical Engineering vol. 88, Springer Verlag, 2011, pp. 139–144.

[9] W.J. Shao, Y. Huang, Y. Zhang, A novel weld seam detection method for space weld seam of narrow butt joint in laser welding, Opt. Laser Technol. 99 (2018) 39–51, https://doi.org/10.1016/j.optlastec.2017.09.037.

# 19. An Initial Point Alignment and Seam-Tracking System for Narrow Weld

Published in Conference Location Paper Citations Authors Link
Feb 2020 IEEE Transactions on Industrial Informatics 2 J. Fan et al. link

# Citation

J. Fan et al., "An Initial Point Alignment and Seam-Tracking System for Narrow Weld," in IEEE Transactions on Industrial Informatics, vol. 16, no. 2, pp. 877-886, Feb. 2020, doi: 10.1109/TII.2019.2919658.

# Abstract

Recently, laser vision sensors are widely applied in initial point alignment and seam tracking to improve the level of intelligent welding because of good characteristics. However, since the deformation of laser stripe is unobvious at the narrow weld with 0.2 mm width, these methods are not applicable for the narrow weld. Moreover, there are rare researches that could achieve initial point alignment and seam tracking of narrow weld simultaneously. Therefore, an initial point alignment and seam tracking system for narrow weld is proposed in this paper. At first, a laser vision sensor with extra light emitting diode light is used to obtain laser and weld seam image. Besides, the seam feature point is extracted and three-dimensional coordinates can be obtained with vision model. In addition, three controllers including decision controller, initial point alignment controller, and seam-tracking controller are proposed to achieve initial point alignment and seam tracking control in X- and Z-axis directions. Moreover, feature verification, Kalman filter, and output pulse verification are designed to improve the accuracy and stability of this system. Finally, many initial point alignment and seam-tracking experiments of narrow weld are conducted. Experimental results demonstrate that proposed system can well achieve initial point alignment and seam tracking of planar and curved surface narrow weld.

# Review

Makes a good point about initial point alignment being affected by variation in ambient light; proposes formulas for the weld tracking system; and determines the initial position from previously known data (i.e., prior knowledge of the position of the welding work-piece).

# Interesting quotes from the article:

"In order to improve welding efficiency and welding quality, many different sensors are used for initial point alignment and seam tracking process, including ultrasonic sensor [2], acoustic sensor [3], audible sensor [4], inductive sensor [5], laser displacement sensor [6], magneto optical sensor [7], and arc sensor [8]."

"Recently, vision sensors get much attention because of their characteristics of abundant information and noncontact [9], [10]. The vision sensor can be separated into passive vision sensor and laser vision sensor."

"Since laser vision sensors have characteristics of high accuracy and robustness, they are widely used for initial point alignment [11], [12] and seam tracking [13], [14]."

"As mentioned above, traditional laser vision sensors cannot be used for initial point alignment and seam tracking of narrow weld, because narrow weld could not be detected due to unobvious deformation of laser stripe at the narrow weld. In order to realize initial point alignment of narrow weld, some passive vision-based methods are proposed."

"However, since, the initial point alignment errors of passive vision-based methods are usually large and easily affected by the variation of ambient light, and the premise of accurate seam tracking is that the welding torch is aligned with the initial point accurately; there are rare research that could achieve accurate initial point alignment and seam tracking of narrow weld simultaneously."

"The laser vision sensor used in this paper mainly consists of an industrial camera, a diode laser, an optical filter, and an LED light"

WARNING

Consider an optical filter for red light.

"Initial point searching: Based on the prior knowledge of the position of welding work-piece, the welding robot moves in opposite direction of welding and the detection starts. The seam feature point could be extracted, and its 3-D coordinates could be determined according to vision model."

"In this paper, median filter is also adopted to filter out some salt-and-pepper noises in captured welding image, which blurs image less than the widely used mean filter. After image preprocessing, most of the noises could be filtered out and weld seam is still distinct, as shown in Fig. 9(a)."



"The proposed control system mainly consists of feature verification, decision controller, initial point alignment controller, and seam-tracking controller, as shown in Fig. 10."



"The decision controller is used to decide the working stage. The working stage contains two stages including initial point alignment and welding stage. The purpose of initial point alignment is to make weld torch arrive at initial point.Welding stage consists of reference feature setting and seam tracking. The reference feature setting is used to get the reference feature and the seam tracking is utilized to guide the weld torch along the weld path during entire welding process."

Take a look at this article for the formulas for motor control

"Output Pulse Verification: Stability is very important in seam-tracking control to ensure welding quality. In order to ensure the stability of the welding system, the seam-tracking control should be smooth on the premise of achieving deviation correction. Therefore, the output pulse should be verified before they are sent to motor [26]. Actually, the output pulses at each sample cycle should be less than a threshold value nl to keep the welding steady."

"The seam-tracking accuracy of the proposed system is mainly influenced by laser vision sensor measurement error and tracking control method. Laser vision sensor measurement error is affected by calibration method and image processing method. Measurement precision verification experiments results show that measurement errors of the laser vision sensor are less than 0.10 and 0.15mmin X- and Z-axis directions. Seam-tracking experiment results showthat the seam tracking accuracy is 0.3 mm, which is much smaller than the width of the formed weld bead, so seam tracking accuracy could satisfy the welding demand of narrow weld."

# 20. Computer Vision System for Weld Bead Analysis

Published in Conference Location Paper Citations Authors Link
April 2018 SAC '18: Proceedings of the 33rd Annual ACM Symposium on Applied Computing 0 link

# Citation

Luciane B. Soares, Átila A. Weis, Bruna de V. Guterres, Ricardo N. Rodrigues, and Silvia S. da C. Botelho. 2018. Computer vision system for weld bead geometric analysis. In Proceedings of the 33rd Annual ACM Symposium on Applied Computing (SAC '18). Association for Computing Machinery, New York, NY, USA, 292–299. DOI:https://doi.org/10.1145/3167132.3167159

# Abstract

Welding processes are very important in different industries and require precision and attention in the steps that will be performed. This article proposes the use of an autonomous weld bead geometric analysis system in order to verify the presence of geometric failures that may compromise the weld integrity. Using a vision system attached to a linear welding robot, images of pre-welded and post-welded metal plates are captured and compared, and metrics are applied for evaluation. The proposed method uses a Hidden Markov Model (HMM) to identify the weld bead edges and calculates several evaluation metrics to detect geometric failures such as misalignment, lack or excess of fusion, among others.

# Review

A very interesting article because of its edge recognition and its mention of training and testing stages for the recognition model. There is no mention of deep learning or TensorFlow.

# Interesting quotes from the article:

"The works of Schreiber et al. and Sun et al. use training techniques and pattern recognition, so that the algorithm learns the good and bad characteristics of a weld bead. Schreiber et al. presents a system that consists of two distinct phases: training and verification. In the training phase the system learns the quality criterion required from the training performed through images of weld beads considered to be well welded, examined by a welding engineer. This step should be performed only once before using the robot and consists of extracting measurements from the images of the training weld beads. These measures are combined in order to obtain reference values and tolerance ranges."

"The method can be divided on two stages: training and testing. In the training stage several weld bead images are presented in order to make the system learn to identify not only good quality edges, but also discontinuities. The testing stage consists on the pos-weld verification on wich the weld bead edges are highlighted and the metrics, present in the section 4, are applied. Through this stage it is possible to assist not only the welder, but also the inspector to identify discontinuities that may cause the weld bead invalidation"

"At a first moment, the edges of three weld bead images of 3288 pixels height by 4608 pixels wide were highlighted. Thus we have a total of 19728 annotated lines since each image has a 3288 pixels height and two annotated edges. Therefore, each image provides 6576 samples and 19728 edge points samples, called edges profile, since each annotated point is located on a single line. For each point along the edge an edge profile is extracted consisting on b pixels on the right and b pixels on the left of the annoted edge. The edge profile represents each pixel intensity and its concept is exemplified on Figure 4. After extraction of all edge profiles, there is a set of vectors that will be used to learn the edge model. Thus, b = 80 is used. Applying the PCA technique in the set of edge profiles possible overlaps are reduced thus obtaining a reduction in the data without significant loss of information. It is used 85 % of the total variance of the analyzed components resulting on a PCA size of nineteen."



# 21. A structured light vision sensor for on-line weld bead measurement and weld quality inspection

Published in Conference Location Paper Citations Authors Link
2020 Int J Adv Manuf Technol 106, 2065–2078 (2020) 2 Han, Y., et al. link

# Citation

Han, Y., Fan, J. & Yang, X. A structured light vision sensor for on-line weld bead measurement and weld quality inspection. Int J Adv Manuf Technol 106, 2065–2078 (2020). https://doi.org/10.1007/s00170-019-04450-2

# Abstract

Weld bead measurement and weld quality inspection are important parts in industrial welding. In this paper, a structured light vision sensor is developed to achieve on-line weld bead measurement and weld quality inspection. Firstly, a structured light vision sensor with a narrow-band optical filter is developed to reduce welding noises such as arc lights and splashes. Secondly, the weld bead type identification algorithm including image pre-processing, baseline extraction, and weld bead classification is proposed to classify filling weld bead and capping weld bead. Thirdly, feature extraction algorithms of filling weld bead and capping weld bead are presented to obtain corresponding feature points. Combining the image coordinates of feature points with structured light vision model, the weld bead size could be obtained and the weld quality could be evaluated. Finally, many weld bead measurement and weld quality inspection experiments are conducted. Experimental results demonstrate that the developed structured light vision sensor and proposed methods could achieve satisfactory performance for weld quality inspection.

# Review

This article notes that ultrasonic inspection is widely used. It also points out the high cost of laser-based systems and the need to reduce the cost of such systems. It presents an elegant 3D reconstruction of the weld bead that could be reused here.

To achieve accurate extraction of the laser stripe profile, they first use a narrow-band filter.

# Interesting quotes from the article:

"Ultrasonic inspection is widely used in the weld bead defect detection because of its features of low cost and flexibility [11, 12], but its lack of visual record is easily affected by the environmental conditions such as the temperature change [13]."

"Vision sensors have been applied for weld quality inspection because of their characteristics of non-contact and large information [15, 16]. Vision sensors mainly comprise of two types including passive vision sensors and structured light vision sensors."

"However, these researches achieve weld quality inspection based on passive vision, which is easily disturbed by ambient lights noises, so weld quality inspection results are not robust. Structured light vision sensor is widely used for weld bead defect detection owing to its features of high accuracy and robustness."

"However, the costs of these systems are high due to expensive hardware and software, which limits their application in the industrial welding. Therefore, it is very important to develop a simple and inexpensive structured light vision sensor for weld bead measurement and weld quality inspection."

"Firstly, this developed structured light vision sensor could achieve on-line weld bead measurement and weld quality inspection, which are the prerequisites for real-time weld quality control. Secondly, the developed structured light vision sensor could automatically identify the weld bead type such as filling weld bead and capping weld bead, and achieve corresponding weld bead dimensional measurement, three-dimensional reconstruction, and weld defect detection."

"the structured light vision sensor developed in this paper mainly comprises an industrial camera, a stripe laser, a narrow-band optical filter, and a baffle plate. An industrial camera MER-131-75GM with 16- mm focal length is adopted to acquire a gray image with pixel number of 1280×1024."

"The developed structured light vision sensor could get the size of the weld bead such as the groove width, weld bead width, reinforcement height, and plate displacement. These parameters could be used for detecting weld defects such as weld bead misalignment, mismatch, large height of reinforcement, and undercut."



"During the welding process, there exist some noises such as arc lights and splashes. To achieve accurate extraction of the laser stripe profile, welding noises should be eliminated. Firstly, a baffle plate and a narrow-band filter are added to the developed structure light vision sensor. In this way, most of the arc lights could be blocked out of the image,"

"The procedure of the weld bead identification of filling weld bead. a The original structured light image. b The filtered image. c ROI of the laser stripe. d Center profile of the laser stripe. e Two baselines of the laser stripe. f Two border points of the laser stripe"



"Since the laser stripe will be almost flat at the capping weld bead, the second derivative of the center profile of the laser stripe is not suitable for feature point extraction. Thus, a feature point extraction of capping weld bead is presented in this paper. Firstly, two border points b1 and b2 are extracted using the method described in section 3.1.3. Secondly, searching for center points of the laser stripe from the left border point b1 to the right border point b2 and calculating the distance between them and baseline. The center point, which has the maximum positive distance to the baseline, is the top point b3."

# 22. Automatic Detection of Welding Defects using Deep Neural Network

Published in Conference Location Paper Citations Authors Link
2018 Journal of Physics: Conference Series. 18 Hou, Wenhui et al. link

# Citation

Hou, Wenhui & Wei, Ye & Jie, Guo & Jin, Yi & Zhu, Chang’an. (2018). Automatic Detection of Welding Defects using Deep Neural Network. Journal of Physics: Conference Series. 933. 012006. 10.1088/1742-6596/933/1/012006.

# Abstract

In this paper, we propose an automatic detection schema including three stages for weld defects in x-ray images. Firstly, the preprocessing procedure for the image is implemented to locate the weld region; Then a classification model which is trained and tested by the patches cropped from x-ray images is constructed based on deep neural network. And this model can learn the intrinsic feature of images without extra calculation; Finally, the sliding-window approach is utilized to detect the whole images based on the trained model. In order to evaluate the performance of the model, we carry out several experiments. The results demonstrate that the classification model we proposed is effective in the detection of welded joints quality.

# Review

In this paper, automatic detection of defects is explored in a more controlled environment, namely radiography. However, defect identification could be implemented in the future using machine vision.

# Interesting quotes from the article:

"With the increasing requirement for the quality of equipment, radiographic testing as one of the oldest non-destructive testing (NDT) is commonly used in the detection of welded joints quality in many industrial fields such as the nuclear, chemical and aeronautical [1]. The real-time detection and automatic identification for welding defects from the digitized gray images become the focus of NDT research."

"These systems mainly rely on three steps : digital image processing, feature extraction, and pattern recognition."

"When the value is more than an appointed threshold, the pixel is considered as defect. This process is called threshold judgement. The probability map for each pixel judged as a defect is shown in figure 7(a). The original image with the defect region marked is shown in figure 7(b). The results of other images are shown in figure 8."




# MODEL SLOT FOR LITERATURE REVIEW

Published in Conference Location Paper Citations Authors Link
link

# Citation

# Abstract

# Review

# Interesting quotes from the article:

# Conclusion








# Latest articles reviewed

Published in Conference Location Paper Citations Authors Link
2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR) Niagara Falls, NY, USA 2 Y. Zheng link

# Citation

Y. Zheng, B. K. Iwana and S. Uchida, "Discovering Class-Wise Trends of Max-Pooling in Subspace," 2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR), Niagara Falls, NY, 2018, pp. 98-103, doi: 10.1109/ICFHR-2018.2018.00026.

# Abstract

The traditional max-pooling operation in Convolutional Neural Networks (CNNs) only obtains the maximal value from a pooling window. However, it discards the information about the precise position of the maximal value. In this paper, we extract the location of the maximal value in a pooling window and transform it into "displacement feature". We analyze and discover the class-wise trend of the displacement features in many ways. The experimental results and discussion demonstrate that the displacement features have beneficial behaviors for solving the problems in max-pooling.

# Review

# Interesting quotes from the article:

# Conclusion

In this paper, we extract the displacement features that record the location information of the maximal values in pooling windows. Then, we discover the class-wise trend of the displacement features in many ways. Through the analysis and discussion, the displacement features may improve the performance of some specific tasks. For the future work, we plan to adopt some other techniques to enhance the displacement features and combine the displacement features with pooling features in some ways.
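A small PyTorch sketch of how such displacement features could be recovered, using `max_pool2d` with `return_indices=True`; this is my reading of the abstract, not the authors' code:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 8, 8)  # dummy feature map
k = 2                        # pooling window size

# Standard max-pooling that also returns the flat index of each maximum.
pooled, idx = F.max_pool2d(x, kernel_size=k, return_indices=True)

# Convert flat indices into (row, col) positions in the input feature map...
rows = torch.div(idx, x.shape[-1], rounding_mode="floor")
cols = idx % x.shape[-1]

# ...and into displacements relative to the top-left corner of each pooling window.
win_rows = torch.arange(pooled.shape[-2]).view(1, 1, -1, 1) * k
win_cols = torch.arange(pooled.shape[-1]).view(1, 1, 1, -1) * k
displacement = torch.stack([rows - win_rows, cols - win_cols], dim=-1)

print(pooled.shape, displacement.shape)  # [1, 1, 4, 4] and [1, 1, 4, 4, 2]
```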

# Fully convolutional networks for semantic segmentation

Published in Conference Location Paper Citations Authors Link
link

# Citation

J. Long, E. Shelhamer and T. Darrell, "Fully convolutional networks for semantic segmentation," 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, 2015, pp. 3431-3440, doi: 10.1109/CVPR.2015.7298965.

# Abstract

Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, exceed the state-of-the-art in semantic segmentation. Our key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet [20], the VGG net [31], and GoogLeNet [32]) into fully convolutional networks and transfer their learned representations by fine-tuning [3] to the segmentation task. We then define a skip architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations. Our fully convolutional network achieves state-of-the-art segmentation of PASCAL VOC (20% relative improvement to 62.2% mean IU on 2012), NYUDv2, and SIFT Flow, while inference takes less than one fifth of a second for a typical image.

# Review

FCNs perform dense prediction, i.e. per-pixel class labeling.

# Interesting quotes from the article:

"Fully convolutional networks can efficiently learn to make dense predictions for per-pixel tasks like semantic segmentation."

"Convolutional networks are driving advances in recognition. Convnets are not only improving for whole-image classification [20, 31, 32], but also making progress on local tasks with structured output. These include advances in bounding box object detection [29, 10, 17], part and keypoint prediction [39, 24], and local correspondence [24, 8]."

"The natural next step in the progression from coarse to fine inference is to make a prediction at every pixel. Prior approaches have used convnets for semantic segmentation [27, 2, 7, 28, 15, 13, 9], in which each pixel is labeled with the class of its enclosing object or region, but with shortcomings that this work addresses."

"We show that a fully convolutional network (FCN) trained end-to-end, pixels-to-pixels on semantic segmentation exceeds the state-of-the-art without further machinery."

"To our knowledge, this is the first work to train FCNs end-to-end (1) for pixelwise prediction and (2) from supervised pre-training."

"Fully convolutional versions of existing networks predict dense outputs from arbitrary-sized inputs. Both learning and inference are performed whole-image-ata- time by dense feedforward computation and backpropagation. In-network upsampling layers enable pixelwise prediction and learning in nets with subsampled pooling."

"This method is efficient, both asymptotically and absolutely, and precludes the need for the complications in other works. Patchwise training is common [27, 2, 7, 28, 9], but lacks the efficiency of fully convolutional training. Our approach does not make use of pre- and post-processing complications, including superpixels [7, 15], proposals [15, 13], or post-hoc refinement by random fields or local classifiers [7, 15]."

"Our model transfers recent success in classification [20, 31, 32] to dense prediction by reinterpreting classification nets as fully convolutional and fine-tuning from their learned representations. In contrast, previous works have applied small convnets without supervised pre-training [7, 28, 27]."

"Semantic segmentation faces an inherent tension between semantics and location: global information resolves what while local information resolves where. Deep feature hierarchies encode location and semantics in a nonlinear local-to-global pyramid. We define a skip architecture to take advantage of this feature spectrum that combines deep, coarse, semantic information and shallow, fine, appearance information in Section 4.2 (see Figure 3)."

"We fuse features across layers to define a nonlinear localto- global representation that we tune end-to-end."

"Each layer of data in a convnet is a three-dimensional array of size h × w × d, where h and w are spatial dimensions, and d is the feature or channel dimension. The first layer is the image, with pixel size h × w, and d color channels. Locations in higher layers correspond to the locations in the image they are path-connected to, which are called their receptive fields."

"Convnets are built on translation invariance. Their basic components (convolution, pooling, and activation functions) operate on local input regions, and depend only on relative spatial coordinates."

"An FCN naturally operates on an input of any size, and produces an output of corresponding (possibly resampled) spatial dimensions. A real-valued loss function composed with an FCN defines a task."

"Fully convolutional networks are a rich class of models, of which modern classification convnets are a special case."

# Conclusion

Fully convolutional networks are a rich class of models, of which modern classification convnets are a special case. Recognizing this, extending these classification nets to segmentation, and improving the architecture with multi-resolution layer combinations dramatically improves the state-of-the-art, while simultaneously simplifying and speeding up learning and inference.

# U-Net: New Articles

# UNet++: Redesigning Skip Connections to Exploit Multiscale Features in Image Segmentation

Published in Conference Location Paper Citations Authors Link
13 December 2019 IEEE Transactions on Medical Imaging 26 Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh and J. Liang link

# Citation

Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh and J. Liang, "UNet++: Redesigning Skip Connections to Exploit Multiscale Features in Image Segmentation," in IEEE Transactions on Medical Imaging, vol. 39, no. 6, pp. 1856-1867, June 2020, doi: 10.1109/TMI.2019.2959609.

# Abstract

The state-of-the-art models for medical image segmentation are variants of U-Net and fully convolutional networks (FCN). Despite their success, these models have two limitations: (1) their optimal depth is apriori unknown, requiring extensive architecture search or inefficient ensemble of models of varying depths; and (2) their skip connections impose an unnecessarily restrictive fusion scheme, forcing aggregation only at the same-scale feature maps of the encoder and decoder sub-networks. To overcome these two limitations, we propose UNet++, a new neural architecture for semantic and instance segmentation, by (1) alleviating the unknown network depth with an efficient ensemble of U-Nets of varying depths, which partially share an encoder and co-learn simultaneously using deep supervision; (2) redesigning skip connections to aggregate features of varying semantic scales at the decoder sub-networks, leading to a highly flexible feature fusion scheme; and (3) devising a pruning scheme to accelerate the inference speed of UNet++. We have evaluated UNet++ using six different medical image segmentation datasets, covering multiple imaging modalities such as computed tomography (CT), magnetic resonance imaging (MRI), and electron microscopy (EM), and demonstrating that (1) UNet++ consistently outperforms the baseline models for the task of semantic segmentation across different datasets and backbone architectures; (2) UNet++ enhances segmentation quality of varying-size objects-an improvement over the fixed-depth U-Net; (3) Mask RCNN++ (Mask R-CNN with UNet++ design) outperforms the original Mask R-CNN for the task of instance segmentation; and (4) pruned UNet++ models achieve significant speedup while showing only modest performance degradation. Our implementation and pre-trained models are available at https://github.com/MrGiovanni/UNetPlusPlus.

# Review

Very useful graphs explaining the UNet evolution up to the UNet++.

# Interesting quotes from the article:

# Conclusion

We have presented a novel architecture, named UNet++, for more accurate image segmentation. The improved performance by our UNet++ is attributed to its nested structure and re-designed skip connections, which aim to address two key challenges of the U-Net: 1) unknown depth of the optimal architecture and 2) the unnecessarily restrictive design of skip connections. We have evaluated UNet++ using six distinct biomedical imaging applications and demonstrated consistent performance improvement over various state-of-the-art backbones for semantic segmentation and meta framework for instance segmentation.

# Attention Unet++: A Nested Attention-Aware U-Net for Liver CT Image Segmentation

Published in Conference Location Paper Citations Authors Link
link

# Citation

# Abstract

# Review

# Interesting quotes from the article:

# Conclusion

# Mask-RCNN and U-Net Ensembled for Nuclei Segmentation

Published in Conference Location Paper Citations Authors Link
2019 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019) link

# Citation

# Abstract

Nuclei segmentation is both an important and in some ways ideal task for modern computer vision methods, e.g. convolutional neural networks. While recent developments in theory and open-source software have made these tools easier to implement, expert knowledge is still required to choose the right model architecture and training setup. We compare two popular segmentation frameworks, U-Net and Mask-RCNN in the nuclei segmentation task and find that they have different strengths and failures. To get the best of both worlds, we develop an ensemble model to combine their predictions that can outperform both models by a significant margin and should be considered when aiming for best nuclei segmentation performance.

# Review

# Interesting quotes from the article:

"Mask-RCNN’s and U-Net’s differences are highlighted by these results. The mAP of U-Net and Mask-RCNN is quite similar, but the large differences in other metrics reveal their strengths and shortcomings. U-Net’s good performance on the Dice score indicates that it is able to create accurate segmentation masks, although with the cost of increased amount of detection errors. Mask-RCNN had worse Dice score, but better recall and precision, indicating that it can detect nuclei more accurately but struggles to predict a good segmentation mask. Another interesting observation is the amount of underand oversegmentation, where Mask-RCNN had much better performance compared to U-Net. It seems that Mask-RCNN can better detect individual nuclei from a cluster while U-Net has a tendency to clump them into one big nucleus. This is a problem that stems from U-Net’s border segmentation output channel, as a very accurate border segmentation with just a few erroneous pixels could result in merged masks. The chosen IoU threshold of 0.7 affects U-Net’s recall and precision metric adversely, as its performance was worst in that range."

# Conclusion

Despite achieving quite similar overall performance on the nuclei segmentation task, the U-Net and Mask-RCNN models made different errors and combining their predictive power using the ensemble model proved to produce better results than either alone. The results indicate that using ensemble model in a nuclei segmentation task improves results, which leads to the question if using it would be beneficial in other instance segmentation tasks as well. Similar instance segmentation tasks in biomedical or medical imaging might see improved performance if an ensemble model would be trained on top of current state of the art solutions. Future work is needed to find if this hypothesis is correct.

# Literature Review

# Application of Machine Learning and computer vision for weld recognition and path alignment of a climbing Robot for weld scanning.

# Table of Contents

1. Introduction

2.1. Mobile Climbing Robots in heavy industry

2.2. Machine Vision and Deep Learning

2.3. Non-Destructive Inspection using robots

3. Conclusion

4. References

# Introduction

# Mobile Climbing Robots in heavy industry

# Computer Vision and Machine Learning

# Non-Destructive Inspection using robots

# Conclusion

There is no existing solution for a mobile climbing robot for weld inspection that uses Mask R-CNN (a state-of-the-art technology) to guide the robot along an autonomous path following the weld bead. The rationale for using such cutting-edge technology is to maximize weld detection and to expand the robot's capabilities in the future to detect and report defects.

# References