Augmenting Plant Leaf Segmentation tasks using Generative AI

Published Tuesday, 02 September 2025 • 8 min read

So over the summer of this year I was involved in a research internship with my university looking at artificial intelligence. Given the progress made with generative AI (transformers, e.g. ChatGPT; diffusion models; Gaussian splatting; etc.), we wanted to see whether it could be used to augment computer vision workloads, not too dissimilar to how data augmentation is already performed. Plant phenotyping and image segmentation were chosen because this field is one of the major focuses of the Computer Vision Lab at the University of Nottingham, and because not much research had been done on this topic in particular.

Motivation

Plant phenotyping tasks are vital to the study of phenotypic traits. Tasks such as leaf segmentation aid researchers in learning more about leaf-level traits such as leaf area, count, stress, and development phases, all of which much of the food and agriculture industry relies on heavily.

A common challenge which often hinders such tasks is the amount of quality training data available. Manual annotation of plant images is both time-consuming and prone to inconsistency, making it difficult to construct sufficiently large datasets. Furthermore, external challenges such as overlap between leaves and environmental conditions, including brightness, shadows, and blurriness due to wind, often reduce image clarity and model performance [1] [2].

One approach to this problem is to generate synthetic data based on known plant traits to help supplement training; research such as PlantDreamer has made substantial progress in generating synthetic 3D objects and 2D images using AI, although little further research has applied this in the context of said tasks [3]. This project aims to explore the benefits and limitations of using synthetic data in the context of plant phenotyping tasks.

Related Work

Significant progress has been made in the computer vision space for semantic and instance segmentation algorithms, with deep learning techniques outperforming conventional methods and showing great potential in addressing plant phenotyping tasks. This project considers several well-established and state-of-the-art (SOTA) models, namely UNET++, DeepLabV3+, and SegFormer, along with PlantDreamer, a diffusion-guided 3D Gaussian Splatting framework for generating realistic plant models.

UNET++

UNET++ is a variant of the original UNET architecture, presented as a nested, densely-connected design that bridges the semantic gap between encoder and decoder features by interconnecting intermediate layers via redesigned skip pathways [4] [5]. This structure improves multi-scale feature aggregation and segmentation quality across objects of varying sizes, as shown in medical imaging benchmarks. Importantly, in the domain of plant phenotyping, UNET++ and its derivatives have been effectively adapted to tackle leaf segmentation tasks. For instance, Eff-UNET++ implements an EfficientNet backbone to reduce parameter count while maintaining high fidelity in boundary delineation and improving segmentation accuracy for plant leaves [1]. Moreover, AC-UNET, an improved UNet variant incorporating attention mechanisms and multi-scale modules, demonstrates superior performance in segmenting stems and leaves of Betula luminifera, outperforming standard UNET and DeepLabV3 in terms of mIoU and pixel accuracy [6].

DeepLabV3+

DeepLabV3+ is a segmentation model which extends the DeepLab family by combining Atrous Spatial Pyramid Pooling (ASPP) with a lightweight decoder module, enabling refined boundary recovery and multi-scale context modelling [7]. It has seen numerous adaptations in plant-related segmentation. A recent application integrates YOLOv8 for leaf detection, followed by DeepLabV3+ (augmented with DenseASPP and a spatial pyramid mechanism) to achieve significant improvements in mIoU and mPA over classic models like UNET, FCN, and DeepLabV3 [8]. Similarly, HAB-DeepLabV3+ enhances disease spot segmentation on sweetgum leaves under complex backgrounds, yielding a 12.1% increase in IoU and 6.8% in pixel accuracy over standard DeepLabV3+ [9].

SegFormer

SegFormer presents a transformer-based model architecture for semantic segmentation characterised by a hierarchical transformer encoder that produces multi-scale features without needing positional encoding, thereby offering robustness across varying image resolutions. Its MLP-based decoder efficiently fuses local and global attention, resulting in state-of-the-art performance and computational efficiency on benchmarks like ADE20K and Cityscapes [10]. For agricultural applications, SegFormer has been tailored, with enhancements such as Efficient Channel Attention (ECA) and Feature Pyramid Network (FPN) modules, to better segment cucumber leaf disease spots. This variant (ECA-SegFormer) achieved an mIoU of 60.86 % and mean pixel accuracy of 38.03 %, marking notable improvements over the baseline [11].

PlantDreamer

Recent advances in synthetic data generation have shown promise for agricultural vision tasks, particularly when annotated real-world datasets are limited or expensive to obtain. PlantDreamer is a diffusion-guided 3D Gaussian Splatting framework specifically designed to generate realistic 3D plant models with high-fidelity leaf morphology and texture. By rendering these models under varied camera poses, lighting conditions, and backgrounds, PlantDreamer can produce large-scale synthetic image collections paired with perfect segmentation masks [3].

For plant leaf segmentation, such synthetic datasets can be highly valuable: they introduce morphological diversity (e.g., leaf size, shape, and arrangement), address challenges like occlusion and overlap, and allow systematic domain randomisation to improve model robustness.

Methodology

The aim of the project is to augment the performance achievable with real-world training data and potentially replace the need for real datasets altogether. To address this, we develop a semantic segmentation pipeline for plant leaf analysis combining state-of-the-art (SOTA) deep learning architectures with synthetic and real data. Three segmentation architectures (UNET++, DeepLabV3+, and SegFormer) along with two encoder backbones, ResNet152 [12] and EfficientNet-B4 [13], were implemented and compared.
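To illustrate how these pair up, here is a minimal sketch of instantiating the models; the post does not state which library was used, so this assumes segmentation_models_pytorch for the convolutional architectures and Hugging Face transformers for SegFormer:

```python
import segmentation_models_pytorch as smp

# UNET++ and DeepLabV3+ paired with the two encoder backbones,
# assuming a single-channel leaf-vs-background output
unetpp = smp.UnetPlusPlus(
    encoder_name="resnet152",
    encoder_weights="imagenet",
    in_channels=3,
    classes=1,
)
deeplab = smp.DeepLabV3Plus(
    encoder_name="efficientnet-b4",
    encoder_weights="imagenet",
    in_channels=3,
    classes=1,
)

# SegFormer ships its own hierarchical transformer encoder
from transformers import SegformerForSemanticSegmentation

segformer = SegformerForSemanticSegmentation.from_pretrained(
    "nvidia/mit-b0",  # assumed variant; the post does not specify b0-b5
    num_labels=1,
)
```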

Additionally, PlantDreamer was leveraged to address the challenge of limited annotated datasets. It was integrated into the framework and used to generate around 250 image/mask pairs per variation of bean plant, covering varying poses and lighting conditions. This was supplemented with 80 image/mask pairs of annotated real bean plant data.
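The real and synthetic pairs then need to be combined at a chosen ratio. A minimal sketch using PyTorch's dataset utilities, where RealBeanDataset and SyntheticBeanDataset are hypothetical loaders standing in for whatever returns (image, mask) tensors:

```python
import torch
from torch.utils.data import ConcatDataset, Subset

def mix_datasets(real_ds, synth_ds, synth_per_real=3, seed=0):
    """Combine real and synthetic pairs at a real:synthetic ratio,
    e.g. synth_per_real=3 gives 1:3 (80 real : 240 synthetic)."""
    n_synth = min(len(synth_ds), len(real_ds) * synth_per_real)
    g = torch.Generator().manual_seed(seed)
    idx = torch.randperm(len(synth_ds), generator=g)[:n_synth].tolist()
    return ConcatDataset([real_ds, Subset(synth_ds, idx)])

# Hypothetical usage:
# train_ds = mix_datasets(RealBeanDataset(...), SyntheticBeanDataset(...))
```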

As part of the training strategy, all images were resized to 256 by 256 and normalised. No other data augmentation (such as random rotation or flipping) was applied, as this could be reproduced by generating new sets of synthetic data. The models were implemented using PyTorch for training and evaluation. They were trained on a setup consisting of 4 CPU cores, 8GB of RAM, and an RTX A4000, incorporating hyperparameter tuning, early stopping (both to speed up convergence and prevent overfitting), and Dice Cross Entropy as the loss function. Models were also first pre-trained on synthetic data to learn general leaf morphology, followed by fine-tuning on real annotated data for domain adaptation.
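The post does not show the exact loss implementation; a minimal sketch of a combined Dice and cross-entropy loss for a binary leaf/background task (logits of shape [B, 1, H, W], with the equal weighting being an assumption) might look like:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiceCELoss(nn.Module):
    """Soft Dice + binary cross-entropy; targets are float masks in {0, 1}
    with the same shape as the logits. Weighting is assumed, not stated."""
    def __init__(self, smooth=1.0, dice_weight=0.5):
        super().__init__()
        self.smooth = smooth
        self.dice_weight = dice_weight

    def forward(self, logits, targets):
        bce = F.binary_cross_entropy_with_logits(logits, targets)
        probs = torch.sigmoid(logits)
        # Soft Dice over the spatial dimensions, averaged over the batch
        intersection = (probs * targets).sum(dim=(2, 3))
        union = probs.sum(dim=(2, 3)) + targets.sum(dim=(2, 3))
        dice = (2 * intersection + self.smooth) / (union + self.smooth)
        dice_loss = 1 - dice.mean()
        return self.dice_weight * dice_loss + (1 - self.dice_weight) * bce
```

The same loss serves both stages; the pre-train/fine-tune split only changes which dataset feeds the training loop (and, typically, the learning rate used for fine-tuning).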

Performance was evaluated based on three metrics:

\text{mIoU} = \frac{1}{C} \sum_{c=1}^{C} \frac{\text{TP}_c}{\text{TP}_c + \text{FP}_c + \text{FN}_c}

Mean Intersection over Union (mIoU) measures the average overlap between the predicted and ground truth regions for each class, highlighting how well each class is segmented. In the context of this project, this helps to evaluate overall leaf vs. background separation.

\text{mPA} = \frac{1}{C} \sum_{c=1}^{C} \frac{\text{TP}_c}{\text{TP}_c + \text{FN}_c}

Mean Pixel Accuracy (mPA) measures the average percentage of correctly classified pixels per class (i.e. per-class recall). In this project, it indicates whether full leaf shapes are captured.

\text{mDice} = \frac{1}{C} \sum_{c=1}^{C} \frac{2 \cdot \text{TP}_c}{2 \cdot \text{TP}_c + \text{FP}_c + \text{FN}_c}

Mean Dice Coefficient (mDice) quantifies the average similarity between predicted and ground truth regions per class, emphasising precise shape and boundary matching. This is helpful in assessing clean boundaries and small leaves.
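All three metrics fall out of the same per-class TP/FP/FN counts; a minimal NumPy sketch, assuming integer label maps for the prediction and the ground truth:

```python
import numpy as np

def segmentation_metrics(pred, target, num_classes=2, eps=1e-8):
    """Compute mIoU, mPA, and mDice from integer label maps,
    following the per-class definitions above."""
    ious, pas, dices = [], [], []
    for c in range(num_classes):
        tp = np.sum((pred == c) & (target == c))
        fp = np.sum((pred == c) & (target != c))
        fn = np.sum((pred != c) & (target == c))
        ious.append(tp / (tp + fp + fn + eps))
        pas.append(tp / (tp + fn + eps))
        dices.append(2 * tp / (2 * tp + fp + fn + eps))
    return np.mean(ious), np.mean(pas), np.mean(dices)
```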

Additionally, both quantitative results and qualitative visualisations, such as segmentation overlays, are analysed to compare model performance across architectures and dataset compositions.

Results

Overall, results varied between the different model/dataset configurations after training concluded:

The complexity of the encoder had only a minor impact on each model's performance; the choice of encoder backbone, however, did improve performance in some configurations. For instance, EfficientNet was generally more performant than ResNet.

More notably, the quantity and type of training data mattered far more, heavily affecting the performance of each trained model. Taking training solely on real data (1:0) as the baseline, models trained on a 1:3 (80:240) real-to-synthetic ratio generally saw the best performance across all models, with an average 10% increase across all metrics. Increasing the amount of synthetic data to a ratio of 1:25 (80:2000) saw a marginal decrease in performance of around 0.08-0.09, whilst using purely synthetic data (0:1) saw complete degradation in performance, with scores typically ranging between 0.01 and 0.2.

Conclusion

PlantDreamer, and by extension synthetic data, has proven to be a good tool for augmenting the performance of existing plant segmentation models in situations where obtaining data is difficult. As seen, performance often increases by 10% or more when the right split of real to synthetic data is found. However, in its current state, PlantDreamer is not a reliable substitute for real data.

Models tend to overfit on artefacts and traits that are unique to the synthetic data produced by PlantDreamer, which causes performance to degrade when evaluated on real-world workloads. Future work for the project would include improving PlantDreamer's texturing, morphological generation, and domain adaptation techniques in order to reduce the chance of overfitting on unrelated features; refining deep learning algorithms to better support plant phenotyping tasks; and integrating instance segmentation for distinguishing individual leaves.

Future Work

One of the goals we wanted to achieve was instance segmentation, using SOTA models such as Mask R-CNN and SAM2; however, this wasn't realised due to a lack of time (which was also made worse thanks to the university's GPU cluster imploding mid-project 🙃).

Another avenue for improvement would be with PlantDreamer itself. One thing it doesn't do well at the moment is domain adaptation; the synthetic data itself looks plausible, however it can be considered a visual domain of its own, distinct enough from real imagery that models latch onto synthetic-specific features.

References

[1]
S. Bhagat, M. Kokare, V. Haswani, P. Hambarde, and R. Kamble, “Eff-UNet++: A novel architecture for plant leaf segmentation and counting,” Ecological informatics, vol. 68, p. 101583, 2022.
[2]
D. Ward, P. Moghadam, and N. Hudson, “Deep Leaf Segmentation Using Synthetic Data.” 2019. [Online]. Available: https://arxiv.org/abs/1807.10931
[3]
Z. K. Hartley, L. A. Stuart, A. P. French, and M. P. Pound, “PlantDreamer: Achieving Realistic 3D Plant Models with Diffusion-Guided Gaussian Splatting,” arXiv preprint arXiv:2505.15528, 2025.
[4]
Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh, and J. Liang, “UNet++: Redesigning Skip Connections to Exploit Multiscale Features in Image Segmentation.” 2020. [Online]. Available: https://arxiv.org/abs/1912.05074
[5]
Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh, and J. Liang, “UNet++: A Nested U-Net Architecture for Medical Image Segmentation.” 2018. [Online]. Available: https://arxiv.org/abs/1807.10165
[6]
X. Yi et al., “AC-UNet: an improved UNet-based method for stem and leaf segmentation in Betula luminifera,” Frontiers in Plant Science, vol. 14, p. 1268098, 2023.
[7]
L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, “Encoder-decoder with atrous separable convolution for semantic image segmentation,” in Proceedings of the European conference on computer vision (ECCV), 2018, pp. 801–818.
[8]
T. Yang, S. Zhou, A. Xu, J. Ye, and J. Yin, “An approach for plant leaf image segmentation based on YOLOV8 and the improved DEEPLABV3+,” Plants, vol. 12, no. 19, p. 3438, 2023.
[9]
P. Wu et al., “Sweetgum leaf spot image segmentation and grading detection based on an improved deeplabv3+ network,” Forests, vol. 14, no. 8, p. 1547, 2023.
[10]
E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, and P. Luo, “SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers.” 2021. [Online]. Available: https://arxiv.org/abs/2105.15203
[11]
R. Yang, Y. Guo, Z. Hu, R. Gao, and H. Yang, “Semantic segmentation of cucumber leaf disease spots based on ECA-SegFormer,” Agriculture, vol. 13, no. 8, p. 1513, 2023.
[12]
K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition.” 2015. [Online]. Available: https://arxiv.org/abs/1512.03385
[13]
M. Tan and Q. V. Le, “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.” 2020. [Online]. Available: https://arxiv.org/abs/1905.11946