Robopicker: Problem Statement
The agricultural sector faces significant challenges in ensuring that fresh produce is grown and picked correctly while minimizing food loss before sale. In particular, the stone fruit industry (peaches, plums, etc.) grapples with unique obstacles, including the delicate nature of the fruit, labor shortages, and the time-sensitive nature of harvesting. Research suggests that nearly 40% of fruits and vegetables are lost during growing and harvesting due to factors such as improper harvesting techniques, inadequate handling, and a lack of efficient tools. The urgency of developing innovative solutions to mitigate food waste has never been more apparent, particularly as global populations rise and demand for fresh produce increases. Given the team's close proximity to peach orchards and the recent massive loss of the peach crop, the team decided to tackle the food waste and yield loss problem from the stone fruit, and specifically peach, perspective.
A peach picking device equipped with a vision system will enhance worker productivity while reducing food loss during harvesting. Current harvesting methods rely predominantly on manual labor, which can lead to inconsistent picking and damage to the fruit, ultimately resulting in substantial financial losses for growers. A technology-driven solution addresses these issues, benefiting not only fruit pickers and growers but also consumers who seek high-quality, fresh peaches. With a more efficient picking solution, growers will see higher profits from their crops, and consumers will be able to purchase produce at a lower price due to increased supply. Even produce pickers, who in industry are typically paid by the amount of fruit they pick, will benefit from a more efficient picking solution, earning a higher average wage.
The Solution
Our peach picking tool, combined with a novel, custom-built vision system, represents a significant advancement over existing methods. The device combines safe picking methods with a real-time computer vision evaluation system. Using color, size, and shape recognition, the vision system identifies peach ripeness and assigns a ripeness rating, accurately determining which peaches are ripe and ready for harvest and significantly reducing the chances of picking under-ripe or overripe fruit. Because the system can be trained on a wide variety of data, its fundamental ripeness detection approach can work for other types of produce, not just peaches. Furthermore, with additional testing, the ripeness rating could be extrapolated to predict when a peach will reach maximum ripeness, ensuring that pickers and growers can harvest at the time most optimal for sales.
The device is equipped with a gentle pneumatic picking mechanism to minimize damage to the fruit, addressing one of the primary concerns for growers and pickers. We do not want our solution to displace a large job industry; we want to enhance the existing workforce and deliver a solution that benefits all members of the industry, especially pickers. With a more delicate picking mechanism, pickers will be able to supply higher-quality fruit on average and therefore receive higher wages as a result.
Vision System
Peaches are unique in how they achieve “ideal ripeness,” because they continue to ripen off the branch after being picked. In practice, they are picked almost exclusively based on color: peaches gain their desirable deep red color from sun exposure, and the longer they are exposed to sunlight, the more color they gain. Thus, for the vision model, it is sufficient to grade ripeness based solely on color, as color is the primary criterion pickers use.
My main contribution to this project was the vision system, which is composed of three steps: Identify, Segment, and Grade.

The identify step uses DINO (Self-Distillation with No Labels), a Meta model whose weights are pretrained on millions of images scraped from the internet. Given an input image and a query for a specific item (like a peach), it identifies the item within the image and outputs a bounding box around it.

The segment step uses SAM2, another Meta model trained on millions of images. SAM2 identifies the boundaries of items within a bounding box and outputs a mask of that item's pixels. The mask is used to isolate those pixels, which are then passed into the grade portion of the model.

The grading step is where this model is most novel. We used a ResNet-18 model pretrained on ImageNet (a dataset of over 14 million annotated images) and fine-tuned it on the NinePeach dataset, which consists of thousands of annotated images of peaches labeled by ripeness stage (unripe, semi-ripe, and ripe). ResNet-18 is a convolutional neural network (CNN) that builds feature representations of images through a series of convolutional layers, batch normalization, ReLU activations, and shortcut (residual) connections that help prevent vanishing gradients during training. For fine-tuning on NinePeach, the model's final fully connected layer was replaced with a new output layer matching the ripeness grading categories (sketched below). During training, the model learned to recognize color, texture, and shape features associated with different ripeness levels: early layers of the network detect basic patterns such as edges and color gradients, while deeper layers combine these features into higher-level concepts like skin softness or bruising. The final output of the model is a predicted ripeness label for each peach image, allowing it to automatically grade new peach samples based solely on visual cues.

The vision system runs entirely through cloud computing on AWS. Onboard the device, a Raspberry Pi 5 connects to an EC2 GPU instance via AWS IoT (Internet of Things). The Pi acts as the onboard communicator, taking images from the camera and relaying them to the EC2 instance. Once the EC2 instance receives an image, it runs it through the model and outputs an annotated image (like the rightmost one in Figure 19). The annotated image is then sent back to the Pi, which relays it to the touch screen for the user to view. The full round trip takes around 1 s, allowing for a fairly seamless user experience. Live video was considered, but latency with AWS and the overhead of saving annotated frames back to the Pi made it an overall worse solution: the user would have to keep the peaches within the frame the entire time they were trying to pick, leading to a clunkier and more challenging picking experience. Local computing on hardware similar to a Jetson Nano was also considered.
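To make the head-replacement step concrete, here is a minimal sketch using torchvision's pretrained ResNet-18. The three-way output matches the NinePeach ripeness stages; the dataset loading and training loop are omitted.

```python
import torch.nn as nn
from torchvision import models

# Load ResNet-18 with ImageNet-pretrained weights.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Replace the final fully connected layer with a new 3-way output
# head for the NinePeach ripeness stages: unripe, semi-ripe, ripe.
NUM_RIPENESS_CLASSES = 3
model.fc = nn.Linear(model.fc.in_features, NUM_RIPENESS_CLASSES)

# During fine-tuning, all layers stay trainable so the pretrained
# color/texture features can adapt to peach skin; only the new head
# starts from random weights.
```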
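The hand-off from the segment step to the grader can be sketched as follows, assuming the segmenter returns a boolean pixel mask. The function and variable names here are illustrative rather than the project's actual code, and the preprocessing mirrors the standard ImageNet pipeline used with ResNet-18.

```python
import numpy as np
import torch
from torchvision import transforms

# Standard ImageNet preprocessing, matching the ResNet-18 backbone.
preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

RIPENESS_LABELS = ["unripe", "semi-ripe", "ripe"]

def grade_peach(image: np.ndarray, mask: np.ndarray,
                model: torch.nn.Module) -> str:
    """Isolate a peach's pixels with its segmentation mask, then grade it.

    image: HxWx3 uint8 RGB frame; mask: HxW boolean mask from the
    segment step; model: the fine-tuned ResNet-18 in eval mode.
    """
    # Zero out everything outside the mask so only peach pixels remain.
    isolated = np.where(mask[..., None], image, 0).astype(np.uint8)

    # Crop to the mask's bounding box so the peach fills the input.
    ys, xs = np.nonzero(mask)
    crop = isolated[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

    with torch.no_grad():
        logits = model(preprocess(crop).unsqueeze(0))
    return RIPENESS_LABELS[int(logits.argmax(dim=1))]
```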
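One plausible shape for the Pi-side relay, using the AWS IoT Device SDK v2 for Python (awsiotsdk). The topic names, file paths, and endpoint are placeholders, and the project's actual wiring may differ; this sketch only shows the publish/subscribe pattern described above.

```python
import base64
from awscrt import mqtt
from awsiot import mqtt_connection_builder

# Placeholder endpoint and credentials; real values come from AWS IoT Core.
connection = mqtt_connection_builder.mtls_from_path(
    endpoint="example-ats.iot.us-east-1.amazonaws.com",
    cert_filepath="device.pem.crt",
    pri_key_filepath="private.pem.key",
    ca_filepath="AmazonRootCA1.pem",
    client_id="robopicker-pi",
)
connection.connect().result()

def on_annotated_image(topic, payload, **kwargs):
    # The EC2 instance publishes the annotated JPEG back on this topic;
    # decode it and hand it to the touch-screen UI (not shown here).
    with open("annotated.jpg", "wb") as f:
        f.write(base64.b64decode(payload))

connection.subscribe(
    topic="robopicker/annotated",
    qos=mqtt.QoS.AT_LEAST_ONCE,
    callback=on_annotated_image,
)[0].result()

# Publish a camera frame (JPEG bytes) for the EC2 model to grade.
with open("frame.jpg", "rb") as f:
    connection.publish(
        topic="robopicker/frames",
        payload=base64.b64encode(f.read()),
        qos=mqtt.QoS.AT_LEAST_ONCE,
    )
```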
A test dataset was constructed from around 400 images from the NinePeach dataset. Each image was run through the entire model 20 times (8,000 total runs) to gather robust, accurate testing data while also confirming the repeatability and consistency of the model's predictions. Testing shows that the vision system achieves an accuracy of ~92% on ripe peaches, as seen in Figure 39, and is 98% accurate in identifying and segmenting peaches in a given image frame, as seen in Figure 40. The system is also fast, thanks to how lightweight the model is: it grades at a rate of ~0.14 s per peach, as seen in Figure 41.
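A sketch of how such a repeated-run evaluation can be structured; the test-set loader and grading call stand in for the project's actual harness.

```python
from collections import Counter

RUNS_PER_IMAGE = 20  # each test image is graded repeatedly

def evaluate(test_set, grade_fn):
    """test_set: iterable of (image, true_label); grade_fn: image -> label."""
    correct = total = inconsistent = 0
    for image, true_label in test_set:
        predictions = [grade_fn(image) for _ in range(RUNS_PER_IMAGE)]
        counts = Counter(predictions)
        correct += counts[true_label]
        total += RUNS_PER_IMAGE
        # Repeatability check: flag images whose repeated runs disagree.
        if len(counts) > 1:
            inconsistent += 1
    return correct / total, inconsistent
```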