Scalable learning for bridging the species gap in image-based plant phenotyping

Authors:

Highlights:

Abstract

The traditional deep learning paradigm – collect, annotate and train on data – is poorly suited to image-based plant phenotyping. Data collection involves growing many physical samples, imaging them at multiple growth stages and manually annotating each image. This process is error-prone, expensive, time-consuming and often requires specialised equipment. Almost 400,000 plant species exist across the world, each varying greatly in appearance, geometry and structure, so a species gap separates the domain of each plant species. As a result, model performance does not generalise and may not transfer to images of an unseen plant species. Given the cost of data collection and the number of plant species, applying deep learning to automate plant phenotyping measurements is intractable. Training on synthetic data is therefore attractive, as data collection and annotation come at no cost. We investigate the use of synthetic data for image-based plant phenotyping. Our conclusions and released data are applicable to the measurement of phenotypic traits including plant area, leaf count, leaf area and shape. In this paper, we validate our proposed approach on leaf instance segmentation for the measurement of leaf area. We study multiple synthetic-data training regimes using Mask R-CNN when few or no annotated real data are available. We also present UPGen: a Universal Plant Generator for bridging the species gap. UPGen leverages domain randomisation to produce widely distributed data samples and models stochastic biological variation. A model trained on our synthetic dataset traverses both the domain and species gaps. In validating UPGen, we investigate the relationship between different data parameters and their effects on leaf segmentation performance.
Imitating a plant phenotyping facility processing a new plant species, our methods outperform standard practices, such as transfer learning from publicly available plant data, by 26.6% and 51.46% on two unseen plant species respectively. We benchmark UPGen by using it to compete in the CVPPP Leaf Segmentation Challenge. Generalising across multiple plant species, our method achieves state-of-the-art performance, scoring a mean of 88% across the A1-4 test datasets. Our synthetic dataset and pretrained model are available at https://csiro-robotics.github.io/UPGen-Webpage/.
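The domain randomisation the abstract describes amounts to sampling each synthetic plant's parameters from wide distributions so that the rendered data covers many species' appearances. The sketch below is purely illustrative; the parameter names and ranges are assumptions for demonstration and are not the distributions used by UPGen.

```python
import random

def sample_plant_parameters(rng: random.Random) -> dict:
    """Draw one randomised plant configuration for synthetic rendering.

    Every range here is a hypothetical placeholder, not the paper's
    actual distributions: domain randomisation only requires that each
    scene parameter be drawn from a deliberately wide distribution.
    """
    return {
        "leaf_count": rng.randint(4, 20),            # leaves per rosette
        "leaf_scale": rng.uniform(0.5, 1.5),         # relative leaf size
        "stem_rotation_deg": rng.uniform(0.0, 360.0),# phyllotactic offset
        "texture_id": rng.randrange(100),            # random leaf texture
        "background_id": rng.randrange(50),          # random soil/tray image
    }

# Generate a batch of randomised plant configurations.
rng = random.Random(0)
dataset = [sample_plant_parameters(rng) for _ in range(1000)]
```

Because each sample is drawn independently from broad ranges, the real-world appearance of any one species is likely to fall inside the synthetic distribution, which is the intuition behind a single model bridging the species gap.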

Keywords:

Article history: Received 18 September 2019, Revised 16 March 2020, Accepted 1 June 2020, Available online 10 June 2020, Version of Record 17 June 2020.

DOI: https://doi.org/10.1016/j.cviu.2020.103009