Abstract: Plant species identification aims at the automatic recognition of plants of specific species. Recognizing plants with traditional methods is time-consuming. Modern technologies such as machine learning (ML), deep learning and neural networks help in the automatic recognition of plants. A single set of features is insufficient for recognizing plants. Hence plants are grouped into categories according to the features used for classification: for example, in the flower category the colour, shape and texture of the flower petals are used, whereas in other categories the leaf, stem, etc. are used. Once the plant image is captured, the system performs pre-processing and classification, then recognizes the plant and produces the output. Classifiers such as the Random Forest Classifier, Neural Networks, Linear Regression and K-NN are used for classification and recognition. In this paper, we explain various technologies that are used for plant recognition and present a comparative analysis of them.
Keywords: Feature extraction; simple and compound leaves; colour features; shape features; AdaBoost; Plant recognition; Random Forest Classifier; K-NN;
I. INTRODUCTION
As the world turns towards artificial intelligence, people depend more on machines than on humans. Machine learning (ML) is a subset of artificial intelligence: it is the study of computer algorithms that build a model from sample data, known as training data. Example applications of machine learning include image recognition, traffic prediction, speech recognition, product recommendation, self-driving cars, email spam and malware filtering, virtual personal assistants and online fraud detection. These are some important applications where machine learning makes the work easier. Most machine learning engineers use Python or Java as the programming language; some use Python for problems such as sentiment analysis, and some use Java for other machine learning applications such as security and threat detection.
Agriculture plays an indispensable role in the world's economy, especially in India, where it is considered the backbone of the economy. The main problem farmers face is a lack of knowledge about the right methods to increase crop yield. Many farmers are not aware of new farming techniques. Various factors are involved in increasing crop quality and quantity. There are thousands of plant species, and different plants within each species. Each plant has different nutrient and fertilizer requirements. Farmers often cannot identify the species themselves; they need to approach botanists and soil experts to identify the plant species and the required type of soil, nutrients and fertilizers. Agriculture is usually practised in remote areas, and the experts may not be reachable from all rural areas. Contacting the experts and asking them to visit the agricultural land can turn out to be an expensive and tedious process. This may discourage farmers, who then tend to stick to conventional farming methods, which are sometimes not very profitable. Artificial intelligence can play an important role here. Digital farming is gaining importance and has given rise to new scientific fields that use precision farming to increase crop yield.
A plant species recognition system can help solve this problem. Using machine learning, a model can be built that identifies plant species more accurately. Instead of approaching experts to recognise a plant species, farmers can use a plant species recognition system on their smartphones, which saves time and money. Various plant species recognition systems have been developed. The methods developed so far considered various datasets, each containing different species. Most of the models or techniques consider the morphological features of the plant, especially the leaves, as these are available in almost all seasons. Different datasets are available, and new datasets can also be created, on which the machine can be trained and tested. The training and testing phases involve several steps: data acquisition (or data collection), feature extraction and finally classification. Data acquisition creates a dataset by collecting samples of the required plant species.
Different feature extraction techniques are available. The main objective of this step is to obtain the base features of the leaf, such as its shape, colour, perimeter and area, that help in distinguishing between plant species. In the classification step, different algorithms and techniques are applied to these datasets. The accuracy of each technique is obtained and the techniques are compared. The best of the techniques applied is chosen to obtain accurate results.
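As an illustration of the base features named above, the following minimal Python sketch computes area, perimeter and bounding-box dimensions from a binary leaf mask. It is not taken from any of the surveyed systems: the function name `leaf_shape_features`, the mask representation (a list of rows of 0/1) and the 4-neighbour boundary definition are all assumptions made for this example.

```python
def leaf_shape_features(mask):
    """Compute simple shape features from a binary mask (list of rows of 0/1).

    Assumption for this sketch: 1 marks a leaf pixel, 0 marks background.
    """
    rows, cols = len(mask), len(mask[0])
    area = 0
    perimeter = 0
    ys, xs = [], []
    for y in range(rows):
        for x in range(cols):
            if not mask[y][x]:
                continue
            area += 1
            ys.append(y)
            xs.append(x)
            # A leaf pixel counts towards the perimeter if any of its four
            # neighbours is background or lies outside the image.
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                outside = not (0 <= ny < rows and 0 <= nx < cols)
                if outside or not mask[ny][nx]:
                    perimeter += 1
                    break
    if area == 0:
        return {"area": 0, "perimeter": 0, "length": 0, "width": 0,
                "aspect_ratio": 0.0}
    length = max(ys) - min(ys) + 1   # bounding-box height in pixels
    width = max(xs) - min(xs) + 1    # bounding-box width in pixels
    return {"area": area, "perimeter": perimeter, "length": length,
            "width": width, "aspect_ratio": length / width}


mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
features = leaf_shape_features(mask)
```

Feature vectors of this kind would then be passed to whichever classifier is being evaluated.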
In plant species recognition, machine learning is mainly used for pattern recognition. The techniques that give the most accurate results for plant species identification include neural networks, CNNs, deep learning, K-NN, the multilayer perceptron and AdaBoost. Automated plant species identification is also advantageous for farming, ecological systems and ecosystems.
In this paper, Section II gives an overview of existing plant species recognition techniques, Section III presents a comparative analysis of the different methodologies, and Section IV concludes the paper.
II. LITERATURE SURVEY
In the layered approach, the pre-processing layer is the top-most layer and is used to capture a plant image. To standardize the extracted attributes, the image is binarized and sent down to the colour layer for feature computation. The colour layer is the first layer for feature recognition, as shown in Fig. 1. In this layer, green leaves are segregated from non-green leaves through the processing steps of colour discrimination, non-green leaf and green leaf. Green leaves are difficult to categorize, so they are sent down to the shape layer for further processing. In the shape layer, simple leaves are separated from compound leaves through the processing steps of shape discrimination, simple leaf and compound leaf, and are then sent to the shape classifiers for classification. Fig. 2 shows the accuracy obtained using the neuro-fuzzy classifier.
Fig. 1 Layered approach architecture
Fig. 2 Accuracy obtained by using the neuro-fuzzy classifier
In the feature decision-making system, the leaf image is obtained using a scanner. The colour image is converted to grayscale and, in the pre-processing step, to a binary image. The ant colony optimisation (ACO) algorithm is then used to discriminate among the extracted features. The extracted features are finally passed to a support vector machine (SVM), which is trained on the dataset and classifies the species. The efficiency of the system is tested on around 2050 leaves from two databases, FCA and Flavia. The average accuracy obtained with the ACO method is 95.53%. Fig. 3 depicts the steps involved in plant recognition.
Fig. 3 Plant species recognition steps
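The grayscale and binarization steps described above can be sketched as follows. This is a minimal example and not the surveyed authors' implementation: the luma weights (ITU-R BT.601), the fixed threshold of 128 and the assumption that the leaf is darker than the scanner background are choices made for this illustration, and the function names are hypothetical.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples (0-255) to rows of grey levels."""
    # Weighted sum using the ITU-R BT.601 luma coefficients.
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]


def binarize(gray_image, threshold=128):
    """Mark pixels darker than the threshold as leaf (1), others as 0.

    Assumption: the leaf is darker than the (white) scanner background.
    """
    return [[1 if value < threshold else 0 for value in row]
            for row in gray_image]


# One white background pixel followed by one green leaf pixel.
image = [[(255, 255, 255), (30, 120, 40)]]
gray = to_grayscale(image)
binary = binarize(gray)
```

In practice the threshold would be chosen per image rather than fixed in advance.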
In machine learning approaches, different mechanisms are used for plant recognition depending on the features captured. Plants are grouped into different categories: for the flower category, for example, the colour and appearance of the flower are used for classification, whereas for other categories the leaf, root, stalk, etc. are used, as shown in Fig. 4. In this approach both local and global features are used for classification. Global features are affected by illumination effects, noise, viewing angle, etc. SIFT, the mechanism used for feature extraction, is not affected by illumination effects or noise. Each SIFT descriptor is a 128-dimensional feature vector, so it is passed to sparse coding for vector compression and quantization. Before the SIFT vectors are sent to the SVM classifier, the sparse vector of each SIFT vector is pooled using spatial pyramid matching.
Fig. 4 Framework for plant species detection
In leaf-based plant recognition, as shown in Fig. 5, a digital colour image of the plant leaf is acquired with a scanner, and the background of the leaf is blurred. Once the leaf image is acquired, pre-processing is done to obtain the contour of the leaf, which is extracted using the histogram of the grey image of the leaf. Digital morphological features, which include geometrical configurations and invariant characteristics, are then extracted. An MMC hypersphere classifier is used for classification; it saves storage space and reduces classification time.
Fig. 5 Schematic representation of leaf shape-based recognition
In the CNN-based D-Leaf method, various techniques were applied sequentially, as shown in Fig. 6, to obtain the final result. The process began with sampling, where leaf samples were photographed with a DSLR camera against a white sheet under fluorescent light to get a clear image. In the image processing step, the raw images were converted to the required format using image reconstruction and morphometric measurements. The important features of the leaves were then extracted using a Convolutional Neural Network (a deep learning algorithm), for which a CNN model named D-Leaf was developed in MATLAB. An accuracy of 94.88% was obtained with the D-Leaf model. Fig. 7 shows the D-Leaf architecture.
Fig. 6 Steps in D-Leaf
Fig. 7 D-Leaf Architecture
In the adaptive boosting methodology, plant species recognition was done in stages, as shown in Fig. 8. In data acquisition, samples of a variety of leaves were collected and digital images were produced. In the next step, the digital images were converted to grayscale. The feature extraction step extracted morphological features of the leaf, such as the major axis, minor axis and perimeter, using different formulas. The fourth step was the training and testing phase, where the model was trained and tested on the collected samples. For classification, two algorithms, K-NN and Multilayer Perceptron, were applied. The precision of the final model was improved using the AdaBoost methodology.
Fig. 8 Stages in Plant Species Recognition
In plant species recognition using k-NN classifiers, a dataset called Folio was created by collecting 32 different species and 20 pictures of each species. The methodology had two sides, client and server, as shown in Fig. 9. Various pre-processing techniques were applied to convert each image to the format required for further processing. Feature extraction was done in several steps. First, a convex hull was formed from the boundary points. Then the morphological information was obtained. The information extracted from the distance map was used to create ratios for the pattern matcher. A colour histogram was computed for the cropped part of the image. The final step was the matcher, whose algorithm consisted of two stages of K-NN. In the first stage, the Euclidean distance between the new leaf and each leaf in the existing dataset was calculated, and the three closest results were returned. In the second stage, the correlation coefficient was calculated by comparing the colour histograms, and the K-NN algorithm was used to determine the closest match. An accuracy of 83.5% was obtained in the first stage.
Fig. 9 Methodology
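The two-stage matcher can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the published implementation: the dictionary keys `shape`, `hist` and `label`, the function names and the use of the Pearson correlation coefficient for the histogram comparison are all assumptions made for this example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson(a, b):
    """Pearson correlation coefficient between two histograms."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat histogram carries no correlation information
    return cov / math.sqrt(var_a * var_b)


def match_leaf(query, dataset, k=3):
    """Stage 1: keep the k nearest leaves by shape distance.
    Stage 2: among those, pick the best colour-histogram correlation."""
    candidates = sorted(
        dataset, key=lambda e: euclidean(query["shape"], e["shape"]))[:k]
    best = max(candidates, key=lambda e: pearson(query["hist"], e["hist"]))
    return best["label"], [c["label"] for c in candidates]


# Hypothetical toy data: two shape ratios and a 3-bin colour histogram.
dataset = [
    {"label": "A", "shape": [1.0, 2.0], "hist": [10, 0, 0]},
    {"label": "B", "shape": [1.1, 2.1], "hist": [0, 10, 0]},
    {"label": "C", "shape": [1.2, 1.9], "hist": [0, 0, 10]},
    {"label": "D", "shape": [5.0, 5.0], "hist": [0, 10, 0]},
]
query = {"shape": [1.0, 2.0], "hist": [1, 9, 0]}
label, shortlist = match_leaf(query, dataset)
```

Here leaf D has a matching histogram but is rejected in the first stage because its shape is too far from the query, which is the point of running the shape filter before the colour comparison.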
In automatic recognition of medicinal plants, a database was created by collecting 24 species from the tropical island of Mauritius. The images were taken with a smartphone, with the leaf placed on a white background. In the automatic pre-processing stage, shadows were removed by converting the JPEG images to HSV format; the images were then transformed into binary format. The base features, namely the perimeter, length, width, area, hull area, hull perimeter, number of vertices, vertical and horizontal maps, area of the bounding box, 45-degree radial map and original RGB values of each pixel, were extracted in the feature extraction process. Classification was done using five machine learning classifiers: Random Forest Classifier, Multilayer Perceptron Neural Network, Support Vector Machine, Naïve Bayes and K-NN. The accuracies are shown in Table 1. The best accuracy was obtained with the Random Forest Classifier.
Classifier Accuracy (%)
Random forest (numTrees=100)
Multilayer Perceptron Neural Network
Support Vector Machine (PolyKernel and c=4.0)
k-Nearest Neighbour (k=1) 82.5
Table 1. Performance of machine learning classifiers
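One way in which an HSV conversion can help with shadows on a white background is sketched below. The surveyed work only states that the images were converted to HSV; thresholding on the saturation channel, the threshold value 0.25 and the function name `leaf_mask_from_hsv` are assumptions made for this illustration. The idea is that a grey shadow on white paper has low saturation, whereas a coloured leaf does not.

```python
import colorsys


def leaf_mask_from_hsv(rgb_image, min_saturation=0.25):
    """Return a binary mask: 1 where saturation is high (leaf), else 0.

    rgb_image is a list of rows of (r, g, b) tuples with values 0-255.
    """
    mask = []
    for row in rgb_image:
        mask_row = []
        for r, g, b in row:
            # colorsys expects channel values in the range [0, 1].
            _, s, _ = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            mask_row.append(1 if s >= min_saturation else 0)
        mask.append(mask_row)
    return mask


# White paper, grey shadow, green leaf.
row = [(250, 250, 250), (120, 120, 120), (40, 140, 50)]
mask = leaf_mask_from_hsv([row])
```

A plain brightness threshold would classify the grey shadow pixel as leaf; the saturation test does not.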
B. Yanikoglu et al. proposed a system that identifies a plant from a given image or a newly taken photo, using shape, texture and colour. It addresses two sub-problems: identification from an image of an isolated leaf and identification from an unconstrained photograph of the plant. The authors summarize the system that participated in the ImageCLEF 2012 campaign. Both sub-problems follow the same stages, namely segmentation, pre-processing and recognition, with more emphasis on segmentation and pre-processing for unconstrained photographs. Segmentation is easy for isolated leaves and difficult for unconstrained photographs; it is done with techniques such as Otsu's adaptive threshold. Automatic segmentation mainly depends on the image acquisition method.
The steps of automatic segmentation are shown in Fig. 10. First, constrained connectivity is used to create colour quasi-flat zones (Fig. 10a). Next, the morphological colour gradient is computed, taking into account both chromatic and achromatic variations (Fig. 10b). A watershed transform is applied and small basins are removed (Fig. 10c). The greyscale distance and the greyscale mask are then computed (Fig. 10d, 10e), followed by the hue distance and the hue mask (Fig. 10f). The final object mask is the intersection of the masks (Fig. 10h), and after pre-processing the mask is superimposed on the original image (Fig. 10i). The main drawback of this method is that it depends on the dominant colour; if other colours dominate the image, the entire process can fail.
For human-assisted segmentation, a semi-automatic system based on mathematical morphology was developed to study the effect of human assistance. This type of segmentation uses a marker-based watershed transform, which is a robust and fast segmentation tool. The human-assisted pipeline consists of pre-processing, feature extraction and classification; feature extraction covers texture, shape and colour features. Table 2 gives the average rank for each feature group.
An accuracy of 61% was obtained on the classification of isolated leaf images among 126 plants, increasing to 80.69% when the top-5 classifications are considered. For foliage images, the accuracy was 8.49%, and 22.15% for the top-5 classifications.
Fig. 10 Steps of Automatic Segmentations 
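Otsu's threshold, mentioned above as a segmentation technique, picks the grey level that maximises the between-class variance of the two resulting pixel groups. The sketch below is a generic pure-Python version of that idea and is not the code used in the surveyed system; the function name and the flat list input are assumptions made for this example.

```python
def otsu_threshold(gray_values, levels=256):
    """Return the threshold t maximising between-class variance.

    Pixels with value <= t form one class and pixels > t form the other.
    gray_values is a flat list of integer grey levels in [0, levels).
    """
    hist = [0] * levels
    for value in gray_values:
        hist[value] += 1

    total = len(gray_values)
    sum_all = sum(level * count for level, count in enumerate(hist))

    weight_low = 0      # number of pixels in the low class so far
    sum_low = 0.0       # sum of grey levels in the low class so far
    best_t, best_variance = 0, -1.0
    for t in range(levels):
        weight_low += hist[t]
        if weight_low == 0:
            continue            # low class still empty
        weight_high = total - weight_low
        if weight_high == 0:
            break               # high class empty from here on
        sum_low += t * hist[t]
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        variance = weight_low * weight_high * (mean_low - mean_high) ** 2
        if variance > best_variance:
            best_variance, best_t = variance, t
    return best_t


# A bimodal toy image: 50 dark pixels and 50 bright pixels.
pixels = [10] * 50 + [200] * 50
t = otsu_threshold(pixels)
```

On such a clearly bimodal histogram any threshold between the two modes separates the classes; this sketch returns the lowest such value because it only replaces the best threshold on a strict improvement.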
Feature group Length Top-1 Top-2 Top-5 Average inverse rank
Shape + colour
Shape + texture
Shape + texture + colour 95
Table 2. Isolated images classification accuracies with feature description 
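The top-k accuracies and the average inverse rank used as column headings in Table 2 can be computed as in the following sketch. The function names and the toy rankings are hypothetical; each ranking is assumed to be the classifier's list of candidate species ordered from most to least likely.

```python
def top_k_accuracy(rankings, truths, k):
    """Fraction of samples whose true label is among the first k guesses."""
    hits = sum(1 for ranked, truth in zip(rankings, truths)
               if truth in ranked[:k])
    return hits / len(truths)


def average_inverse_rank(rankings, truths):
    """Mean of 1/rank of the true label (0 if the label is not ranked)."""
    total = 0.0
    for ranked, truth in zip(rankings, truths):
        if truth in ranked:
            total += 1.0 / (ranked.index(truth) + 1)
    return total / len(truths)


# Four hypothetical test images whose true species is always "a".
rankings = [
    ["a", "b", "c"],   # correct at rank 1
    ["b", "a", "c"],   # correct at rank 2
    ["c", "b", "a"],   # correct at rank 3
    ["b", "c", "a"],   # correct at rank 3
]
truths = ["a", "a", "a", "a"]
top1 = top_k_accuracy(rankings, truths, 1)
top2 = top_k_accuracy(rankings, truths, 2)
air = average_inverse_rank(rankings, truths)
```

This is why the top-5 figure is always at least as high as the top-1 figure: each increase in k can only add hits.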
Lei Zhang et al. proposed a system that focuses on extracting stable features that distinguish one plant from others. The system works in two phases, a training phase and an identification phase. Its steps are leaf image acquisition and pre-processing, feature extraction, and identification with a SOM neural network. In leaf image acquisition and pre-processing, the leaf image is captured with a scanner or camera and converted from RGB colour to grey to avoid colour disturbance. Fig. 11 shows the steps of the plant species identification system across the training and identification stages.
Fig. 11 Plant species identification system
Feature extraction is of two types: geometric feature extraction and texture feature extraction. In geometric feature extraction, the photo of the leaf is converted to grey and the contour of the leaf is obtained using a border tracing algorithm. Fig. 12 shows the original leaf image and Fig. 13 shows the contour of that image. The algorithm considers the ratio of leaf length to width, and the leaf stalk must be removed first because it can distort this ratio.
Fig. 12 Original leaf image. Fig. 13 Contour of the image
For texture feature extraction, the discrete wavelet transform (DWT) is used and statistical moments are applied to extract the texture features. The DWT decomposes a signal S1(n) at resolution l into two components. The SOM neural network is a technique used in the field of pattern recognition; it is a two-layer network. In the training process, the SOM follows a competitive learning rule: the competitive learning algorithm corrects the weight vector of the matching neuron. The maximum accuracy of the SOM neural network is 95.83%, and the average correct identification rate of the proposed system is 88.4%.
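The competitive learning rule can be illustrated with a single training step. The sketch below is a deliberately reduced winner-take-all version: it finds the neuron whose weight vector is closest to the input and moves only that weight vector towards the input. A full SOM usually also updates the winner's neighbours and decays the learning rate over time; both are omitted here, and the function name and learning rate are assumptions made for this example.

```python
def som_train_step(weights, x, learning_rate=0.5):
    """Perform one competitive-learning update in place.

    weights: list of weight vectors, one per output neuron.
    x: input feature vector.
    Returns the index of the winning (best matching) neuron.
    """
    def squared_distance(w):
        return sum((wi - xi) ** 2 for wi, xi in zip(w, x))

    # Competition: the neuron closest to the input wins.
    winner = min(range(len(weights)),
                 key=lambda i: squared_distance(weights[i]))

    # Correction: move the winner's weights towards the input.
    weights[winner] = [wi + learning_rate * (xi - wi)
                       for wi, xi in zip(weights[winner], x)]
    return winner


weights = [[0.0, 0.0], [1.0, 1.0]]
winner = som_train_step(weights, [0.8, 1.0])
```

After repeated steps over the training set, each neuron's weight vector drifts towards the group of inputs it wins most often, which is what lets the trained map assign a new leaf to a species.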
Jana Waldchen et al. describe a system with two phases, a training phase and an application phase. In the training phase, images that have already been identified are analysed; in the application phase, the trained classifier is exposed to new images. This approach was applied to 32 species and achieved an identification accuracy of 90%. For automated plant species identification, a CNN approach is used; with this approach it is not necessary to separate the feature detection and feature extraction steps, as both become part of the iterative training step. Fig. 14 shows the fundamental steps of supervised machine learning. Table 3 lists the accuracies of the different approaches.
Fig. 14 Steps of supervised machine learning
Table 3. Accuracies of the model-based approach, model-free approach and deep learning on different datasets
Data set | Model-based approach | Model-free approach | Deep learning
Swedish leaf | 82.0% | 93.7% | 99.8%
Flavia | 90.3% | 95.9% | 99.7%
Leafsnap | 73.0% | 72.6% | 97.6%
ICL | 83.8% | 91.3% | 93.9%
Oxford flower 17 | - | 91.8% | 96.6%
Oxford flower 102 | - | 90.2% | 96.6%
Jana Waldchen and Patrick Mader consider the image-based approach a very good approach for plant species identification. Image classification has the following steps: image acquisition, pre-processing, feature extraction and description, and classification. Fig. 15 shows the stages of the image-based classification process. Image acquisition consists of taking an image of the leaf or the part of the plant that is to be identified. The pre-processing step removes unwanted features from the image and retains the desired region; this step mainly concentrates on segmentation. The feature extraction step extracts features such as geometry, texture, colour and shape, and classification is done on the basis of the extracted features.
Fig. 15 Steps followed in image-based classification
An Artificial Neural Network (ANN) is a computational model composed of a set of artificial neurons, i.e. processing units that are interconnected with other neurons, modelled on the functioning of the human brain. Each such network is composed of three layers: the input layer, the hidden layer and the output layer. The input layer contains neurons which transfer data via synapses to the next layer, the hidden layer. Similarly, the hidden layer transfers the data to the last layer, the output layer.
Multilayer feed-forward ANNs allow signals to travel one way only, from input to output. The output of any layer does not affect that same layer.
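The one-way flow from input layer to hidden layer to output layer can be written as a short forward pass. This is a generic sketch, not a model from any surveyed paper: the sigmoid activation, the layer sizes and the function names are assumptions made for this example.

```python
import math


def sigmoid(z):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """Compute one layer: each neuron takes a weighted sum of all inputs,
    adds its bias and applies the activation function."""
    return [sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
            for neuron_weights, b in zip(weights, biases)]


def feed_forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Input layer -> hidden layer -> output layer, with no feedback."""
    hidden = layer_forward(inputs, hidden_w, hidden_b)
    return layer_forward(hidden, output_w, output_b)


# Two inputs, two hidden neurons, one output neuron (hypothetical weights).
outputs = feed_forward(
    inputs=[1.0, 2.0],
    hidden_w=[[0.5, -0.25], [0.1, 0.3]],
    hidden_b=[0.0, -0.1],
    output_w=[[1.0, -1.0]],
    output_b=[0.2],
)
```

Each layer reads only the previous layer's outputs, so no layer's output feeds back into itself.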
In the convolution operation, different features of the input are extracted. Low-level features such as corners, edges and lines are extracted in the first convolution layer, whereas higher-level features are extracted by the higher-level layers. Neural networks in general, and CNNs in particular, rely on a non-linear 'trigger' function in the non-linear layers to signal distinct identification of likely features on each hidden layer.
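How a first-layer filter responds to a low-level feature such as an edge can be shown with a small example. The sketch below slides a hand-written 3x3 vertical-edge kernel over an image and then applies ReLU as the non-linear function. In a trained CNN the kernel values are learned rather than hand-written; the kernel, the function names and the choice of ReLU are assumptions made for this illustration.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding.

    As in most CNN libraries, the kernel is not flipped, so strictly
    speaking this is a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(out_w)]
            for y in range(out_h)]


def relu(feature_map):
    """Non-linear 'trigger': keep positive responses, zero out the rest."""
    return [[max(0, value) for value in row] for row in feature_map]


# Responds to a dark-to-bright transition from left to right.
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

dark_to_bright = [[0, 0, 0, 1, 1]] * 3
bright_to_dark = [[1, 1, 0, 0, 0]] * 3

response = relu(conv2d_valid(dark_to_bright, vertical_edge))
no_response = relu(conv2d_valid(bright_to_dark, vertical_edge))
```

The filter fires only where its pattern is present, and the non-linearity suppresses the negative response to the opposite transition.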
Table 4 summarises the training performed using labelled datasets, which consist of a wide range of representative input patterns tagged with their expected output responses.
Table 4. Training performed on different data sets.
The goal of deep learning architectures is to yield more abstract and discriminative representations. Hence, they are formed by multiple linear and non-linear transformations of the input data. Deep learning has gradually improved the state of the art in speech recognition, object detection and visual object recognition, and in many other domains such as genomics and drug discovery.
III. COMPARISON ANALYSIS
Table 5. Comparison of methods and accuracies based on the Flavia and Folio datasets
Dataset | Method | Features | Accuracy
Flavia | ACO algorithm | Shape, morphology texture, colour | 96.25%
Flavia | I. Model-free approach; II. Model-based approach; III. Deep learning | - | 95.9%
Flavia | K-NN | Shape | 76.96%
Flavia | K-NN | Vein morphometric | 91.62% (80-20*); 91.4% (3-fold CV); 92.19% (5-fold CV)
Flavia | Decision Tree | Vein morphometric | 76.22% (80-20*); 81.37% (3-fold CV); 86.28% (5-fold CV)
Flavia | ML Perceptron | Vein morphometric | 94.89% (80-20*); 94.11% (3-fold CV); 95.38% (5-fold CV)
Flavia | AdaBoost with MLP | Vein morphometric | 95.08% (80-20*); 94.85% (3-fold CV); 95.42% (5-fold CV)
Flavia | CNN | Shape, vein | 95.5%
Flavia | K-NN, DT | Colour, shape | 91.30%
Folio | K-NN | Shape, colour histogram | 87.3%
Folio | CNN architecture | Shape, colour histogram | 97.7%
FCA | ACO algorithm | Shape, morphology texture, colour | 94.81%
Non-green plants (simple leaves) | Layered approach (NFC-based) | Colour + leaf shape | 91%
Green plants (simple leaves) | Layered approach (NFC-based) | Colour + leaf shape | 98.7%
Compound leaves | Layered approach (NFC-based) | Leaf shape | 96%
BJFU100 | ResNet26 | - | 91.78%
Swedish | K-NN | Shape | 95.73%
Swedish | K-NN | Shape | 96.53%
Swedish | CNN | Shape, vein | 98.22%
Swedish | Fuzzy K-NN | Shape, texture | 99.25%
In the ACO approach, the databases used are the Flavia and FCA datasets. From the Flavia dataset of 1600 samples, 960 were used for training, 320 for validation and 320 for final testing, and the accuracy increased from 93.12% to 96.25%. The Flavia dataset was also used with various approaches to training, testing and classification. In the 80-20 approach, 80% of the dataset is used for training and the remaining 20% for testing. The second approach was 3-fold and 5-fold cross-validation. In both cases the dataset is divided into folds; all folds but one are used for training and the remaining fold is used for testing. Four different classifiers were used, as shown in Table 5, of which AdaBoost with MLP achieved the best accuracy.
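The difference between the 80-20 approach and k-fold cross-validation can be made concrete with index splits. The sketch below is a generic illustration with hypothetical function names; shuffling the samples before splitting, which is normally done in practice, is omitted to keep the output easy to follow.

```python
def holdout_split(n_samples, train_fraction=0.8):
    """80-20 style split: the first part trains, the rest tests."""
    cut = int(n_samples * train_fraction)
    indices = list(range(n_samples))
    return indices[:cut], indices[cut:]


def k_fold_splits(n_samples, k):
    """Return k (train, test) index pairs.

    Each fold is used for testing exactly once while the remaining
    k - 1 folds are used for training.
    """
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by <= 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    splits = []
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, test))
        start += size
    return splits


train, test = holdout_split(10)
three_fold = k_fold_splits(9, 3)
five_fold = k_fold_splits(10, 5)
```

With 3 folds each model is trained on two thirds of the data, and with 5 folds on four fifths. The accuracy reported for k-fold cross-validation is usually the average over the k test folds.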
A dataset called Folio was created by collecting 32 different species and 20 pictures of each species. It was observed that the accuracy decreased as the number of species increased: an accuracy of 97.0% was achieved with eight species, and it fell to 87.2% with 32 species.
In the FCA dataset, out of 450 samples, 225 were used for training, 90 for validation and 135 for final testing, and an accuracy of 93.33% was obtained.
In the layered approach, NFC-based classification is used, which helps in characterizing heterogeneous leaf types and offers scope for customization. Table 5 above shows the accuracies for non-green and green leaves.
IV. CONCLUSION
This survey paper has described various efficient plant species recognition systems that use morphological features and a layered method for feature extraction. An automated recognition system using a multilayer feed-forward neural network for plant leaf images was also discussed.
Various solutions, such as image processing combined with a shallow architecture or a deep architecture, have been proposed for leaf classification. Classification can also be done with K-NN, decision trees, the multilayer perceptron, the AdaBoost algorithm, the layered approach, the SOM neural network and ant colony optimisation to obtain improved precision. Of the approaches mentioned in this paper, the results shown in Table 5 compete closely with the latest approaches to differentiating leaf features using CNN-based leaf classification.
Among the models surveyed in this paper, ResNet26 achieves 91.78% accuracy in testing, which demonstrates the potential of deep learning for large-scale plant classification in the natural environment.
These are the various approaches that can be chosen to obtain accurate recognition of different plant species.