In this article, we are giving you insights into one of our core technologies here at sewts – our Computer Vision. It is an integral part of making our automation solutions possible. We want to lift the curtain of secrecy a bit and give you an overview of how it works, what it does, and what some of the biggest challenges in developing it were. Don’t worry, you don’t need to be a developer yourself to understand the article, we’re breaking it down to be comprehensible for everybody.
What does the Computer Vision do?
First things first: What does the Computer Vision actually do? It analyzes materials and recognizes special features that are needed for handling that material. In the case of VELUM, our automation solution for industrial laundries, those features are the seams of towels, but for another textile, the key features could look or be different. Our Computer Vision recognizes the key features not just of one separate textile, but of all the textiles in a pile, and provides the deciding factors for which textile to pick up first.
And here’s how it works!
Step by Step
Alright, let’s look at this in more detail. We will split the process into three main components: Pre-processing, Seam Detection and smart Post-processing.
In step 1, Pre-processing, we start out with data from two types of cameras, a 2D camera and a 3D camera. The pictures are prepared for the following steps and cropped to cover only the relevant area. They are then compressed and cut into smaller pieces that can be processed in parallel, ensuring a high performance of the process.
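For the curious, the pre-processing step can be sketched in a few lines of Python. This is only an illustrative sketch: the function name, the crop box, the downscaling factor and the tile size are our assumptions here, not the actual VELUM parameters.

```python
import numpy as np

def preprocess(image, crop_box, scale=2, tile=64):
    """Illustrative sketch of step 1: crop the picture to the relevant
    area, downscale it (a crude stand-in for compression), and cut it
    into smaller tiles that could be processed in parallel."""
    x0, y0, x1, y1 = crop_box
    roi = image[y0:y1, x0:x1]           # keep only the relevant area
    roi = roi[::scale, ::scale]         # reduce the amount of data
    h, w = roi.shape[:2]
    tiles = [roi[r:r + tile, c:c + tile]
             for r in range(0, h, tile)
             for c in range(0, w, tile)]
    return tiles                        # each tile can be handled in parallel
```

Cutting the picture into tiles is what makes the parallel processing, and thus the speed of the whole pipeline, possible.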
Step 2 is the actual Seam Detection. We use different Machine Learning methods (Object Detection, Semantic Segmentation, etc.) on the level of individual pixels of the picture to recognize features. Based on the textures observed in the picture, the pixels are split into three classes: Background, foreground aka towel and – you guessed it – seams! How much of which class appears is of course different every time.
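The pixel-level decision in step 2 can be sketched as follows. In the real system the per-pixel scores come from a trained segmentation network; here they are simply an input, so this only illustrates the final classification step, with assumed names throughout.

```python
import numpy as np

CLASSES = ("background", "towel", "seam")  # the three pixel classes

def classify_pixels(scores):
    """For each pixel, pick the class with the highest score.
    `scores` has shape (H, W, 3); in production these scores would be
    produced by a trained network -- this is only a sketch."""
    return np.argmax(scores, axis=-1)       # (H, W) map of class indices

def class_shares(label_map):
    """How much of each class appears (different for every picture)."""
    counts = np.bincount(label_map.ravel(), minlength=len(CLASSES))
    return dict(zip(CLASSES, counts / label_map.size))
```

The `class_shares` helper mirrors the observation above: the mix of background, towel and seam pixels changes with every grip.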
Afterwards, we’re already at step 3, the Post-processing. The image pieces are reassembled into a full picture and voilà, we have identified all possible seams for gripping. The 10 best seams are selected based on an analysis of the topology of the pile as well as the appearance of the seams themselves. So if you were wondering about the need for the 3D camera so far, this is where it comes into play. Next, our top 10 candidates need to survive a few filters: every seam that is outside the reach of the robot, or that has already been targeted without success, is eliminated. And finally, a scoring involving the height differences within the pile determines where the robot will grip.
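The filtering and scoring at the end of step 3 can be sketched like this. The candidate fields, the reach check and the scoring rule are simplified assumptions for illustration, not the actual VELUM logic.

```python
def choose_grip(candidates, robot_reach, failed_ids):
    """Illustrative sketch of the post-processing filters: drop seams
    outside the robot's reach or already targeted without success,
    then score the remaining ones by height difference within the pile."""
    reachable = [c for c in candidates
                 if c["distance"] <= robot_reach and c["id"] not in failed_ids]
    if not reachable:
        return None  # no viable seam survived the filters
    # Prefer seams that stick out of the pile, i.e. the largest height difference.
    return max(reachable, key=lambda c: c["height_diff"])
```

Remembering previously failed grip attempts (`failed_ids` here) is what keeps the robot from trying the same unreachable seam twice.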
This may seem like a long process, but it actually takes less than 1 second – and it starts all over again for the next grip of the robot, since the pile of textiles changes constantly.
Challenges in developing the VELUM vision
And there you have it! This is how our Computer Vision works! Of course, the chosen gripping point still has to be translated into a real-world coordinate for the robot. But that’s a different story!
It also goes without saying that we had to try a lot of things to get to this process. For example, we found that running the segmentation on the whole picture was just way too slow, that only separating the towels from the background did not work, and much more. Of course, the complexity largely stems from the fact that we’re working with deformable materials, but that’s a given – we didn’t choose our slogan “Automating complexity” for nothing after all.
We hope that we could give you a good, insightful overview of our VELUM vision. If you have any questions, feel free to message our Head of Software Ernst! His e-mail is firstname.lastname@example.org.