Vision-based robots have been widely used for pick-and-place operations because of their ability to estimate object poses and their high repeatability. With the development of visual servoing and machine learning techniques, various vision-based autonomous pick-and-place approaches have been introduced, and as these approaches progress toward handling a variety of unaligned objects, more flexible and lightweight operations have been studied. However, in contrast to the cluttered bins found in manufacturing environments, many machine-learning-based pick-and-place methods have been studied in less crowded environments and rely on large training datasets, and little research has addressed reducing the human intervention needed to prepare such datasets. This research proposes two pick-and-place methods that require minimal human intervention.
The first method is a self-training bin-picking platform, which starts from an initial Convolutional Neural Network (CNN) model trained on human-labeled data and is then continuously retrained by the robot itself to improve its accuracy. The method consists of a human part and an autonomous part. In the human part, a user clicks the pickable and non-pickable objects in a depth image and selects a partial 3D point cloud model for the Iterative Closest Point (ICP) algorithm; an initial CNN model is then trained on this human-labeled data. In the autonomous part, the robot performs grasping operations with the initial CNN model and iteratively retrains the model on the data it collects, as sketched below. In the experiments, the initial CNN model trained on the human-labeled data achieved a 74% grasp success rate, which increased to 87% after retraining with 2,000 self-collected samples.
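The self-training loop can be illustrated with a minimal sketch. The code below is a hypothetical illustration, not the platform's actual implementation: the model, `robot`, `camera`, the grid-based patch extraction, and all thresholds are assumed placeholders. The loop alternates between executing the highest-scoring grasp and refitting the pickability classifier on the outcomes the robot observes.

```python
import torch
import torch.nn as nn

def extract_candidates(depth, size=64, stride=32):
    """Crop a grid of depth patches as hypothetical grasp candidates."""
    patches, pixels = [], []
    h, w = depth.shape
    for r in range(0, h - size, stride):
        for c in range(0, w - size, stride):
            patches.append(depth[r:r + size, c:c + size])
            pixels.append((r + size // 2, c + size // 2))
    return torch.stack(patches).unsqueeze(1), pixels  # N x 1 x size x size

def self_training_loop(model, optimizer, robot, camera, rounds=2000):
    """Refine a binary pickability CNN with robot-collected grasp outcomes."""
    criterion = nn.BCEWithLogitsLoss()
    buffer = []  # (depth patch, grasp success label) pairs
    for _ in range(rounds):
        depth = camera.capture_depth()                # 2D float tensor
        patches, pixels = extract_candidates(depth)
        with torch.no_grad():
            scores = torch.sigmoid(model(patches))    # pickability scores
        best = int(scores.argmax())
        success = robot.try_grasp(pixels[best])       # execute, observe outcome
        buffer.append((patches[best], float(success)))
        if len(buffer) % 100 == 0:                    # periodic retraining
            x = torch.stack([p for p, _ in buffer])
            y = torch.tensor([l for _, l in buffer]).unsqueeze(1)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
    return model
```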
The second method is an autonomous robotic bin-picking platform that combines human demonstration with a collaborative robot, for flexibility, and a You Only Look Once (YOLO) neural network model, for fast object localization without prior CAD models or a training dataset. After a simple human demonstration of which target object to pick and where to place it, the raw color and depth data are refined, and the image of the top of the bin is used to create synthetic images and annotations for a YOLOv5 model. To pick up the target object, its point cloud is lifted using the depth data corresponding to the trained YOLOv5 model's detection, and the object pose is estimated with the ICP algorithm; sketches of both steps follow. After picking up the target object, the robot places it where the user placed it during the human demonstration stage. In experiments with four types of objects and four human demonstrations, recognizing the target object and estimating its pose took a total of 0.496 seconds, the object detection success rate was 95.6%, and all detected objects were successfully picked up.
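The synthetic data generation step admits a simple sketch. This is a hedged illustration under assumptions not stated above: it assumes the demonstrated object is cropped from the bin-top color image and composited at random rotations and positions onto an empty-bin background, with YOLO-format labels written alongside; the masking heuristic and all parameters are hypothetical.

```python
import os
import random
import numpy as np
import cv2

def synthesize(background, object_crop, n_images=200, out_dir="synth"):
    """Paste the demonstrated object crop onto the background at random
    poses and emit YOLO-format annotations (class cx cy w h, normalized)."""
    os.makedirs(out_dir, exist_ok=True)
    bh, bw = background.shape[:2]
    oh, ow = object_crop.shape[:2]
    for i in range(n_images):
        img = background.copy()
        angle = random.uniform(0, 360)
        M = cv2.getRotationMatrix2D((ow / 2, oh / 2), angle, 1.0)
        rotated = cv2.warpAffine(object_crop, M, (ow, oh))
        x = random.randint(0, bw - ow)
        y = random.randint(0, bh - oh)
        mask = rotated.sum(axis=2) > 0          # crude non-black object mask
        roi = img[y:y + oh, x:x + ow]
        roi[mask] = rotated[mask]
        cv2.imwrite(f"{out_dir}/{i}.png", img)
        with open(f"{out_dir}/{i}.txt", "w") as f:
            f.write(f"0 {(x + ow / 2) / bw:.6f} {(y + oh / 2) / bh:.6f} "
                    f"{ow / bw:.6f} {oh / bh:.6f}\n")
```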
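Similarly, the detection-to-pose step can be sketched with Open3D's point-to-point ICP. The back-projection intrinsics (`fx`, `fy`, `cx`, `cy`), the depth scale, and the correspondence distance threshold are assumptions for illustration; `model_pcd` stands for the partial model point cloud captured during the demonstration stage.

```python
import numpy as np
import open3d as o3d

def estimate_pose(depth, bbox, model_pcd, fx, fy, cx, cy, depth_scale=0.001):
    """Back-project depth pixels inside the YOLO bounding box into a point
    cloud, then align the partial model to it with point-to-point ICP."""
    x0, y0, x1, y1 = bbox                        # detection box in pixels
    vs, us = np.mgrid[y0:y1, x0:x1]
    z = depth[y0:y1, x0:x1].astype(np.float64) * depth_scale
    valid = z > 0
    pts = np.stack([(us[valid] - cx) * z[valid] / fx,
                    (vs[valid] - cy) * z[valid] / fy,
                    z[valid]], axis=1)
    scene = o3d.geometry.PointCloud()
    scene.points = o3d.utility.Vector3dVector(pts)
    result = o3d.pipelines.registration.registration_icp(
        model_pcd, scene, max_correspondence_distance=0.01,
        init=np.eye(4),
        estimation_method=o3d.pipelines.registration
            .TransformationEstimationPointToPoint())
    return result.transformation                 # 4x4 model-to-scene pose
```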
These methods demonstrate the feasibility of building an autonomous robotic bin-picking system that reduces the human intervention needed to prepare the labeled datasets used to train deep learning models, without any reference models or professional workers for 2D and 3D data processing.