Automatic Detection of Flavonoids from Spectroscopic Images by Fusion of Two-Dimensional Convolution Product with Multi-Scale Textural Descriptors
1. Introduction
In Côte d’Ivoire, limited access to healthcare and medicines, due to high costs and poor infrastructure, is pushing a growing number of households to turn to traditional medicine as an alternative [1]. This practice exploits compounds found in plants, such as flavonoids. These compounds are known for their antioxidant, anti-inflammatory, antiviral, anticancer, and neuroprotective properties, and they help prevent certain cancers and heart attacks and strengthen the immune system [2]. However, this seemingly safe practice carries major risks, such as overdose, unexpected side effects, and drug interactions [3]. Moreover, current analytical methods still rely on manual interpretation, which makes them error-prone and their results variable. This includes thin-layer chromatography (TLC), which is widely used to analyze plant extracts and is described in detail by [4] as a simple, effective method that can be adapted to many types of compounds. Given these limitations, digital tools, particularly those derived from image processing, offer an opportunity to automate chromatogram analysis and make it more reliable. In particular, image segmentation allows the automatic extraction of areas of interest (spots) on TLC plates, thus enabling rapid, objective, and reproducible identification of flavonoids [5]. This raises the following question: how can a method that integrates digital technologies such as image segmentation be designed and implemented to automate flavonoid identification?
The main objective is to build a system capable of automatically detecting, classifying, and quantifying the flavonoids present on a chromatographic plate, with greater precision and reproducibility than traditional interpretation methods. Specifically, this research involves three steps. First, flavonoids are extracted and separated by thin-layer chromatography. Second, the chromatograms are processed digitally through segmentation. Third, the performance of the digital approach is compared with that of chemical analysis.
This article is structured as follows. Section 2 describes the materials and methods, Section 3 presents and discusses the results, and Section 4 concludes.
2. Materials and Methods
2.1. Materials
2.1.1. Chemical Reagents and Experimental Design
In our study of flavonoids, we used several devices.
For compound extraction, we used an AGROLAB M2-A analog heating magnetic stirrer and a stirring water bath (Figure 1), as well as a magnetic mill and a balance (Figure 2).
Figure 1. (a) An AGROLAB M2-A analog heating magnetic stirrer; (b) A stirring water bath.
Figure 2. (a) A magnetic crusher; (b) A balance.
For the thin-layer study, we used:
A TLC chromatography plate,
An eluent or mobile phase,
A chromatography cuvette,
A 254 nm and 366 nm ultraviolet (UV) lamp (see Figure 3).
Figure 3. A 366 nm and 254 nm UV lamp.
For image processing, we used:
An HP ProBook 440 G6 laptop with the following specifications:
An Intel Core i5-8265U CPU 1.60 GHz 1.80 GHz,
16 GB of RAM,
Windows 11 Professional operating system version 23H2.
MATLAB software: version 9.13.0.2049777 (R2022b).
The rear cameras of an iPhone 12:
A 12-megapixel wide-angle camera (equivalent to 26 mm in 35 mm format),
A 12-megapixel ultra-wide-angle camera with a 120° viewing angle (equivalent to 13 mm in 35 mm format).
2.1.2. Plant Material
A survey of medicinal plant sellers identified two plants. The first, Paullinia pinnata L., is a species widespread in secondary forest regions undergoing reforestation and along riverbanks in savannahs. It is never found in overly dry areas. It occurs in all vegetation formations of Côte d’Ivoire under the names Ahé-biébun (Akyé), Trondi (Baoulé), Mlan nomon (Malinké), and Gbessagbébroh (Bété). This plant is shown in Figure 4.
Figure 4. Paullinia pinnata.
This plant has many therapeutic uses. A decoction of the leaves is used to treat oral candidiasis in infants and can be administered orally or rectally. Pieces of root macerated in palm wine make an aphrodisiac drink. We also used Morinda lucida, a species of grassy savannahs, thickets, and forests. It is common from the forest region to the savannah and is known as Lélik’n in Adioukrou. It is shown in Figure 5.
Figure 5. Morinda lucida.
Like Paullinia pinnata, Morinda lucida also possesses numerous therapeutic effects. Decoctions of the roots, bark, and leaves are recognized remedies for various types of fever, including yellow fever, malaria, and fever during childbirth. It is also used to treat diabetes, hypertension, dysentery, and stomach aches.
2.2. Methods
2.2.1. Extraction of Phytocompounds
1) Hydromethanolic Maceration
Each plant sample was ground, and 15 g of each of the two resulting powders was macerated in 100 mL of 80% (v/v) MeOH for 24 hours with constant stirring. The process was repeated twice on the same plant residue with fresh solvent. After filtration through a Büchner funnel, the combined hydromethanolic macerates were concentrated under reduced pressure at 75 °C using a water bath. This gave two crude hydromethanolic extracts: one from Morinda lucida leaves (E1) and one from Paullinia pinnata leaves (E2).
2) Preparation of Selective Extracts
Extracts E1 and E2 were treated successively with hexane, chloroform, and ethyl acetate (3 × 20 mL of each solvent). The organic fractions were concentrated under reduced pressure in a water bath and then stored in the refrigerator. This gave the selective chloroform and ethyl acetate extracts, which were used for the phytochemical screening tests.
3) Phytochemical Screening
The screening was carried out following the analytical procedures described in the work of [6].
Two drops of each selective extract were deposited with a capillary tube at points 2.5 cm apart on a baseline drawn 1 cm from the bottom of the TLC plates. The deposits were allowed to dry briefly before the plates were placed in the chromatography tank containing the migration solvent (eluent). After development and drying, the chromatograms were revealed with reagents specific to the targeted groups of phytocompounds. They were then observed first under visible light and then under a UV lamp at 365 nm.
The migration solvents used were:
CHCl3/AcOEt/hexane (10:10:5; v/v/v) and CHCl3/(CH3)2CO/(Et)2NH (10:8:2; v/v/v) for the chloroform fractions.
CHCl3/AcOEt/CH3CO2H (12:10:1; v/v/v) and AcOEt/CH3OH/H2O/CHCl3 (18:2, 4:2, 1:6; v/v/v/v) for the ethyl acetate fractions.
The developers used are:
Aluminum chloride (AlCl3) which reveals flavonoids as a yellow spot in the visible spectrum and as a blue to brown spot under UV at 365 nm.
Potassium hydroxide (KOH) which reveals coumarins as yellow spots in the visible spectrum and as a spot whose color varies from intense yellow to blue or green under UV.
Ammonia which reveals flavonoids with a green color in the visible spectrum and under UV. It also reveals anthocyanins with a blue or violet color under UV.
Aluminum chloride + acetone (as a substitute for Neu’s reagent) which reveals flavonoids as yellow and brown spots immediately or after 15 minutes in the visible spectrum and as orange, red, yellow, blue, and green spots under UV [7].
2.2.2. Image Processing Method
1) Convolution Product
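2D discrete convolution (Equation (1)) is applied to each image to enhance the contours of the segmented regions.
We implemented the image processing in MATLAB using the Sobel kernel, a classic edge-detection filter. A color image is represented as a three-dimensional matrix with one plane per color channel. The Sobel kernel is therefore applied through the following convolution:
$$G(x,y) = (I * K)(x,y) = \sum_{i=-1}^{1}\sum_{j=-1}^{1} I(x-i,\,y-j)\,K(i,j) \qquad (1)$$
where I is the image (one color plane), K is the 3 × 3 Sobel kernel, for example $K_x = \begin{pmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{pmatrix}$ for horizontal gradients and its transpose $K_y$ for vertical gradients, and G is the filtered image.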
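The Sobel filtering of Equation (1) can be sketched in plain Python for illustration. The authors worked in MATLAB, and this is not their code. The sketch makes several assumptions:

* The image is stored as nested lists.
* It uses the standard 3 × 3 Sobel kernels.
* The horizontal and vertical responses are combined into a gradient magnitude.
* Border pixels are left at zero, which is a simplifying choice.

All function names are hypothetical.

```python
import math

# Standard 3x3 Sobel kernels, indexed K[i + 1][j + 1] for i, j in {-1, 0, 1}
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]


def convolve2d(image, kernel):
    """Eq. (1): (I * K)(x, y) = sum_i sum_j I(x - i, y - j) K(i, j).

    Only interior pixels are computed; border pixels stay at 0.0
    (a simplifying assumption of this sketch).
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for x in range(1, rows - 1):
        for y in range(1, cols - 1):
            acc = 0.0
            for i in (-1, 0, 1):
                for j in (-1, 0, 1):
                    # True convolution: the image index is flipped relative to the kernel
                    acc += image[x - i][y - j] * kernel[i + 1][j + 1]
            out[x][y] = acc
    return out


def sobel_magnitude(plane):
    """Gradient magnitude sqrt(Gx^2 + Gy^2) of one image plane."""
    gx = convolve2d(plane, SOBEL_X)
    gy = convolve2d(plane, SOBEL_Y)
    return [[math.hypot(a, b) for a, b in zip(rx, ry)] for rx, ry in zip(gx, gy)]


def sobel_rgb(rgb):
    """Apply the filter separately to each plane of an H x W x 3 color image."""
    planes = [[[px[c] for px in row] for row in rgb] for c in range(3)]
    return [sobel_magnitude(p) for p in planes]
```

Consider a 5 × 5 image with a vertical step from 0 to 10 between its second and third columns. On it, `sobel_magnitude` returns 40 on the interior pixels next to the step and 0 on the other interior pixels, so the filter highlights only the contour.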
2) Image Segmentation
Image analysis was performed using a color-based region segmentation algorithm. Each chromatographic spot is treated as a region of interest (ROI). Several parameters were extracted from each ROI, including the average pixel intensity of the segmented region, given by:
$$\bar{I} = \frac{1}{MN}\sum_{x=1}^{M}\sum_{y=1}^{N} I(x,y) \qquad (2)$$
where I(x, y) is the intensity of the pixel at position (x, y), and M and N are the image dimensions (MN is the total number of pixels) [8].
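The following is a minimal sketch of Equation (2). Without a mask, the function averages over all M × N pixels, as in the formula. The optional `mask` argument is an assumption added for illustration: it restricts the average to the pixels of one segmented spot. The function name is hypothetical.

```python
def mean_intensity(image, mask=None):
    """Eq. (2): average pixel intensity.

    image: 2D nested list of gray levels.
    mask:  optional 2D nested list of 0/1 flags (hypothetical ROI mask);
           when given, only pixels with a truthy flag are averaged.
    """
    values = [
        v
        for r, row in enumerate(image)
        for c, v in enumerate(row)
        if mask is None or mask[r][c]
    ]
    if not values:
        raise ValueError("the region contains no pixels")
    return sum(values) / len(values)
```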
The entropy is given by:
$$E = -\sum_{i=0}^{L-1} p(i)\,\log_2 p(i) \qquad (3)$$
where p(i) denotes the probability of occurrence of gray level i and L is the total number of possible gray levels (256) [9].
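Equation (3) can be sketched as follows. The sketch assumes 8-bit gray levels (L = 256) and a base-2 logarithm, so the entropy is expressed in bits. The base of the logarithm is an assumption, since the text does not state it. Gray levels that never occur have p(i) = 0 and contribute nothing to the sum. The function name is hypothetical.

```python
import math
from collections import Counter


def gray_entropy(image, levels=256):
    """Eq. (3): Shannon entropy (bits) of the gray-level histogram of an image."""
    pixels = [v for row in image for v in row]
    if not pixels:
        raise ValueError("empty image")
    if any(not 0 <= v < levels for v in pixels):
        raise ValueError("gray levels must lie in [0, levels)")
    counts = Counter(pixels)
    n = len(pixels)
    # p(i) = count_i / n for each gray level actually present
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

A uniform image has zero entropy. An image split evenly between two gray levels has an entropy of exactly 1 bit.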
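The eccentricity measures the elongation of the region. It equals 0 for a circle and approaches 1 as the region becomes more elongated. It is given by:
$$e = \sqrt{1 - \frac{b^2}{a^2}} \qquad (4)$$
where a and b denote the lengths of the major and minor semi-axes, respectively [10]. The standard deviation is defined by:
$$\sigma = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2} \qquad (5)$$
where $\bar{x}$ denotes the mean of the variables $x_i$, defined by:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad (6)$$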
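Equations (5) and (6) translate directly into code. This sketch uses the n − 1 (sample) denominator. We assume this convention because it reproduces the error-on-the-mean values reported in Table 3 and Table 4. The function names are hypothetical.

```python
import math


def sample_mean(xs):
    """Eq. (6): arithmetic mean of the values x_i."""
    if not xs:
        raise ValueError("need at least one value")
    return sum(xs) / len(xs)


def sample_std(xs):
    """Eq. (5): standard deviation with the n - 1 (sample) denominator."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    m = sample_mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
```

As an example, take the three values of parameter at in Table 1 (118.11, 112.62, 68.59). They give a mean of about 99.7733 and a standard deviation of about 27.1447.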
We also use ct, the intensity-weighted centroid of the pixels, which locates the luminous center of each spot, and et, which represents the orientation of the region.
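The weighted centroid, the orientation, and the eccentricity of Equation (4) can all be derived from the moments of a binary region. This is a hedged illustration, not the authors' MATLAB pipeline. It fits the ellipse that has the same second-order central moments as the region. Its values may therefore differ slightly from those of MATLAB's `regionprops`, which applies its own conventions. The function name and return layout are hypothetical.

```python
import math


def region_shape(mask, image):
    """Shape descriptors of one binary region (mask[r][c] truthy = inside).

    Returns (weighted_centroid, orientation_deg, eccentricity):
    - weighted_centroid: intensity-weighted (row, col), the "luminous center";
    - orientation_deg: angle of the major axis, counter-clockwise from the
      column axis, with the vertical axis pointing up (y = -row);
    - eccentricity: Eq. (4), e = sqrt(1 - b^2 / a^2) of the equivalent ellipse.
    """
    pts = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not pts:
        raise ValueError("empty region")
    # Intensity-weighted centroid
    w = [image[r][c] for r, c in pts]
    total = sum(w)
    if total == 0:
        raise ValueError("region has zero total intensity")
    wc_r = sum(wi * r for wi, (r, _) in zip(w, pts)) / total
    wc_c = sum(wi * c for wi, (_, c) in zip(w, pts)) / total
    # Unweighted second-order central moments (x = column, y = -row)
    n = len(pts)
    mr = sum(r for r, _ in pts) / n
    mc = sum(c for _, c in pts) / n
    uxx = sum((c - mc) ** 2 for _, c in pts) / n
    uyy = sum((r - mr) ** 2 for r, _ in pts) / n
    uxy = sum(-(r - mr) * (c - mc) for r, c in pts) / n
    # Eigenvalues of the moment matrix are proportional to a^2 and b^2
    common = math.sqrt((uxx - uyy) ** 2 + 4 * uxy ** 2)
    lam1 = (uxx + uyy + common) / 2
    lam2 = (uxx + uyy - common) / 2
    ecc = math.sqrt(max(0.0, 1 - lam2 / lam1)) if lam1 > 0 else 0.0
    theta = 0.5 * math.degrees(math.atan2(2 * uxy, uxx - uyy))
    return (wc_r, wc_c), theta, ecc
```

The sketch behaves as expected on simple shapes. A one-pixel-thick horizontal line gives an orientation of 0° and an eccentricity of 1. A filled square gives an eccentricity of 0. A diagonal running up and to the right gives 45°.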
Once calculated, these parameters are used to build a database for the automatic identification of flavonoids on TLC plates. The identification itself relies on normalized Euclidean distances.
The normalized Euclidean distance is a variant of the classical Euclidean distance that adds a normalization step before the distance is computed. Normalization is necessary when variables have different scales, which would otherwise bias the contribution of each dimension to the overall distance [11]. Variable normalization is defined by:
$$z_i = \frac{x_i - \mu_i}{\sigma_i} \qquad (7)$$
where x_i is the variable to be normalized, μ_i is the mean of the variable x_i, and σ_i is its standard deviation [12].
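The following sketches the column-wise normalization of Equation (7) over a list of feature vectors, such as rows of (at, bt, ct, et, S, E). The helper name is hypothetical. σ uses the n − 1 denominator to match Equation (5), which is an assumption about the authors' convention.

```python
import math


def zscore_columns(rows):
    """Eq. (7): z = (x - mu) / sigma, computed independently for each column."""
    n = len(rows)
    if n < 2:
        raise ValueError("need at least two vectors")
    dims = len(rows[0])
    mus = [sum(r[k] for r in rows) / n for k in range(dims)]
    sigmas = [
        math.sqrt(sum((r[k] - mus[k]) ** 2 for r in rows) / (n - 1))
        for k in range(dims)
    ]
    if any(s == 0 for s in sigmas):
        raise ValueError("a column is constant; its z-score is undefined")
    return [[(r[k] - mus[k]) / sigmas[k] for k in range(dims)] for r in rows]
```

After normalization, a column in the tens and a column in units contribute on the same scale. For example, the rows [1, 10], [2, 20], and [3, 30] both become −1, 0, and 1.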
We used the normalized Euclidean distance to compare the characteristic vectors of the regions (spots). It is calculated according to the following formula:
$$d_{norm}(a,b) = \sqrt{\sum_{i=1}^{n}\left(\frac{a_i - b_i}{\sigma_i}\right)^2} \qquad (8)$$
where:
a_i and b_i are the values of parameter i in the two compared regions, respectively;
σ_i is the standard deviation of variable i, used to normalize the distance and avoid bias.
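Equation (8) can be sketched as follows. The σ_i values are passed in by the caller; one option is to estimate them from the control measurements. How they are obtained is an assumption here, and the sketch is not intended to reproduce the exact values in Table 6 or Table 8. The function name is hypothetical.

```python
import math


def normalized_euclidean(a, b, sigma):
    """Eq. (8): sqrt(sum(((a_i - b_i) / sigma_i) ** 2)) over all parameters."""
    if not (len(a) == len(b) == len(sigma)):
        raise ValueError("vectors and sigma must have the same length")
    if any(s <= 0 for s in sigma):
        raise ValueError("standard deviations must be positive")
    return math.sqrt(sum(((ai - bi) / si) ** 2 for ai, bi, si in zip(a, b, sigma)))
```

Dividing by σ_i means that a parameter with a large natural spread, such as a centroid coordinate in pixels, does not dominate a parameter bounded between 0 and 1.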
3. Results and Discussions
3.1. Results
3.1.1. Presentation of the Extracts and Developers Used
1) Nature of the Extracts and Choice of Developers
For this study, we used chloroform and ethyl acetate extracts. These two solvents were chosen because their difference in polarity allows complementary extraction. Chloroform, being weakly polar, extracts less hydroxylated or non-glycosylated flavonoids. Ethyl acetate, being more polar, extracts more hydroxylated or less methylated flavonoids [13].
2) Initial Qualitative Observations
After development, the chromatograms show spots of varying colors and intensities depending on the developers and extracts. Some spots are distinct and well separated, while others overlap. These preliminary results confirm the presence of phenolic compounds. The chromatogram images are shown in Figure 6, where each letter indicates the developer used or its absence.
Similarly, the chromatograms of the ethyl acetate extracts are shown in Figure 7.
Figure 6. Chloroform fractions.
Figure 7. Ethyl acetate extracts.
3) Observation with the convolution product
After development with the various reagents, each image was convolved with the Sobel kernel. This kernel was chosen for its sensitivity to region contours and for its computational speed [14]. The resulting images are shown in Figures 8-10.
Figure 8. AlCl3 and KOH extracts treated by convolution.
3.1.2. Extraction and Processing of Image Parameters
1) Segmentation Method and Visualization of Segmented Images
Image analysis was performed using an algorithm based on pixel color and intensity. This choice led us to use region-based segmentation. This method allows us to segment grayscale images in order to extract various parameters from them, as shown in Figure 11 and Figure 12.
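The segmentation algorithm is not specified in detail. As a stand-in, the following sketch shows one simple form of region-based segmentation: a global intensity threshold followed by 4-connected component labeling, which turns each spot into a separate ROI mask. Several elements are hypothetical:

* the threshold value;
* the assumption that spots are brighter than the background (invert the comparison for dark spots);
* the function names.

```python
from collections import deque


def label_regions(gray, threshold):
    """Label 4-connected regions of pixels with intensity >= threshold.

    Returns (labels, count): labels[r][c] is 0 for background and
    1..count for the pixels of each detected region (spot).
    """
    rows, cols = len(gray), len(gray[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if gray[r0][c0] < threshold or labels[r0][c0]:
                continue
            # Start a new region and flood-fill it breadth-first
            count += 1
            labels[r0][c0] = count
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not labels[nr][nc] and gray[nr][nc] >= threshold):
                        labels[nr][nc] = count
                        queue.append((nr, nc))
    return labels, count


def region_masks(labels, count):
    """One binary 0/1 mask per labelled region, usable as an ROI mask."""
    return [[[1 if v == k else 0 for v in row] for row in labels]
            for k in range(1, count + 1)]
```

Each mask can then be passed to feature extractors such as those for mean intensity, entropy, or shape.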
Figure 9. AlCl3 and AlCl3 + acetone extracts treated by convolution.
Figure 10. KOH and developer-free extracts processed by convolution.
Figure 11. Chloroform fractions (gray level).
Figure 12. Ethyl acetate fractions (gray level).
From the segmented images, we determined the characteristic parameters. Their values are listed in Table 1 and Table 2 for each fraction and developer. In these tables, the parameters are defined as follows:
at: the standard deviation;
bt: the average pixel intensity;
ct: the weighted centroid;
et: the orientation;
S: the equidistance;
E: the entropy.
Table 1. Parameter values of the chloroform fraction.
Developer | at | bt | ct | et | S | E
AlCl3 | 118.11 | 0.6695 | 635.8027 | 73.1913 | 5.2097 | 9.2495
KOH | 112.62 | 0.8423 | 650.7392 | 73.4299 | 4.7954 | 7.6464
Without developer | 68.59 | 0.7584 | 591.5805 | 23.5993 | 2.4102 | 3.2901
Table 2. Values of the parameters of the ethyl acetate fraction.
Developer | at | bt | ct | et | S | E
AlCl3 + acetone | 74.13 | 0.7546 | 330.9494 | 17.1366 | 4.481 | 4.1661
KOH | 74.28 | 0.854 | 329.71765 | 17.6278 | 2.6497 | 3.3746
Without developer | 76.9 | 0.929 | 281.02295 | 20.838 | 2.8393 | 3.53
AlCl3 | 73.91 | 0.855 | 322.9612 | 17.6667 | 3.288 | 3.2206
3.1.3. Visualization by Histograms
The extracted parameters were plotted as grouped bar histograms, one per fraction. Each bar represents a parameter, and each group of bars represents a developer. Figure 13 and Figure 14 thus show the parameters as a function of the developer.
Figure 13. Chloroform fraction.
Figure 14. Ethyl acetate fraction.
The histograms show that the parameters at, bt, ct, and et have approximately the same height regardless of the developer used. Analysis of these histograms allows us to conclude that a single developer may be sufficient to identify flavonoids.
3.1.4. Statistical Analysis of Parameters
1) Calculation of Mean Parameter Values and Standard Deviation
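Mean parameter values were calculated for each fraction using Equation (6). They give an overall picture of the behavior of each extract and reflect the general trend of the parameters. The associated uncertainty was quantified by the error on the mean, $\Delta\bar{x}$, given by:
$$\Delta\bar{x} = \frac{\sigma}{\sqrt{n}} \qquad (9)$$
where σ is the standard deviation defined in Equation (5) and n is the number of measurements (here, the number of developers).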
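As a check on Equation (9), the following sketch recomputes the mean and the error on the mean of parameter at from the per-developer values in Table 1 and Table 2. It assumes the n − 1 standard deviation of Equation (5), and it recovers the corresponding entries of Table 3 and Table 4. The function name is hypothetical.

```python
import math


def error_on_mean(xs):
    """Return (mean, sigma / sqrt(n)) with sigma from Eq. (5) (n - 1 denominator)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean = sum(xs) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return mean, sigma / math.sqrt(n)


# Parameter at, chloroform fraction (Table 1: AlCl3, KOH, without developer)
mean_at, err_at = error_on_mean([118.11, 112.62, 68.59])
# mean_at ≈ 99.7733, err_at ≈ 15.6720 (Table 3)

# Parameter at, ethyl acetate fraction (Table 2: four developer conditions)
mean_at_ea, err_at_ea = error_on_mean([74.13, 74.28, 76.9, 73.91])
# mean_at_ea ≈ 74.805, err_at_ea ≈ 0.7025 (Table 4)
```

The agreement with both tables is also what suggests that the n − 1 convention was used.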
The various results are shown in Table 3 and Table 4. These tables present the mean values of the characteristic parameters of flavonoids according to the fractions.
Table 3. Mean parameters (Chloroform).
Developer | at | bt | ct | et | S | E
AlCl3 | 118.11 | 0.6695 | 635.8027 | 73.1913 | 5.2097 | 9.2495
KOH | 112.62 | 0.8423 | 650.7392 | 73.4299 | 4.7954 | 7.6464
Without developer | 68.59 | 0.7584 | 591.5805 | 23.5993 | 2.4102 | 3.2901
Mean | 99.7733333 | 0.75673333 | 626.040833 | 56.7401667 | 4.13843333 | 6.72866667
Error on the mean | 15.6720051 | 0.04989002 | 17.7614816 | 16.5705765 | 0.8723539 | 1.78047634
Table 4. Parameter means (ethyl acetate).
Developer | at | bt | ct | et | S | E
AlCl3 + acetone | 74.13 | 0.7546 | 330.9494 | 17.1366 | 4.481 | 4.1661
KOH | 74.28 | 0.854 | 329.71765 | 17.6278 | 2.6497 | 3.3746
Without developer | 76.9 | 0.929 | 281.02295 | 20.838 | 2.8393 | 3.53
AlCl3 | 73.91 | 0.855 | 322.9612 | 17.6667 | 3.288 | 3.2206
Mean | 74.805 | 0.84815 | 316.1628 | 18.317275 | 3.3145 | 3.572825
Error on the mean | 0.70245403 | 0.03578811 | 11.8441434 | 0.84885568 | 0.41121756 | 0.20759833
The various average parameter values obtained constitute control vectors for flavonoid identification. We created a database from these values.
2) Flavonoid Identification
Figure 15 and Figure 17 present the segmented regions of the chloroform and ethyl acetate fractions, respectively. These regions contain the secondary metabolites to be identified. Regions R1, R2, and R3 in Figure 15 are the regions to be characterized in the chloroform fraction. Regions R1, R2, and R3 in Figure 17 are the regions to be characterized in the ethyl acetate fraction.
Figure 15. Chloroform fraction (test) and its segmented regions.
Figure 16. Segmented regions in gray level (chloroform extract).
Figure 16 and Figure 18 show the segmented regions of the chloroform and ethyl acetate fractions, respectively, in gray level. The characteristic parameter values are obtained after segmentation and binarization of these regions.
Figure 17. Ethyl acetate extract and its segmented regions.
Figure 18. Segmented regions in grayscale (ethyl acetate).
The different parameter values for each fraction constitute characteristic vectors of these regions and the secondary metabolites they contain. These different vectors were used to characterize the regions by calculating the normalized Euclidean distance.
To identify secondary metabolites on the TLC plates, we calculated the normalized Euclidean distances between the vectors obtained. The comparison pairs each control vector with the vector of each region (spot) to be identified. We chose 0.05 as the distance threshold, a value used in [15] to indicate strong similarity between two elements. Two elements are therefore considered to match if their normalized Euclidean distance is less than or equal to 0.05. Table 5 and Table 7 list the control vectors and the vectors of the regions to be identified for the chloroform and ethyl acetate fractions, respectively. Table 6 and Table 8 give the normalized Euclidean distances between the control vectors and these region vectors.
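The decision rule takes only a few lines. The sketch below applies the 0.05 threshold to the distances reported in Table 6 and Table 8. The dictionary layout and the function name are hypothetical.

```python
THRESHOLD = 0.05  # similarity threshold adopted from [15]


def classify_regions(distances, threshold=THRESHOLD):
    """Map each region to True if d_norm(Vt, VR) <= threshold (match with controls)."""
    return {region: d <= threshold for region, d in distances.items()}


# d_norm(Vt, VR) values reported in Table 6 (chloroform) and Table 8 (ethyl acetate)
chloroform = {"R1": 4.88480437, "R2": 4.3225579, "R3": 0.04024532}
ethyl_acetate = {"R1": 2.45303378, "R2": 5.23192175, "R3": 0.03934469}
```

Both fractions yield `{'R1': False, 'R2': False, 'R3': True}`, meaning that only R3 falls below the threshold. Because the comparison is inclusive (`<=`), a distance of exactly 0.05 counts as a match.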
Table 5. Normalized vectors (chloroform).
Vector | at | bt | ct | et | S | E
Vt | 99.7733333 | 0.75673333 | 626.040833 | 56.7401667 | 4.13843333 | 6.72866667
VR1 | 116.13 | 0.4419 | 590.7339 | 51.4255 | 2.3818 | 3.198
VR2 | 138.05 | 0.7324 | 93.57425 | 53.8183 | 3.2629 | 3.5242
VR3 | 99.65 | 0.7523 | 626.12105 | 56.7319 | 4.1264 | 6.7081
Table 6. Normalized Euclidean distance (chloroform).
Vector pair | (Vt, VR1) | (Vt, VR2) | (Vt, VR3)
dnorm | 4.88480437 | 4.3225579 | 0.04024532
Table 6 presents the normalized Euclidean distance between the control vectors and the vectors of the regions to be characterized for the chloroform fraction. The values of the distances dnorm (Vt, VR1) and dnorm (Vt, VR2) are greater than the threshold of 0.05. These values indicate that these regions do not contain flavonoids. However, the value of the distance dnorm (Vt, VR3) is less than 0.05. This result demonstrates the presence of flavonoids in this region.
Table 7. Normalized vectors (acetate).
Vector | at | bt | ct | et | S | E
Vt | 74.805 | 0.8481 | 316.1628 | 18.317 | 3.3145 | 3.5728
VR1 | 73.25 | 0.8042 | 310.6527 | 17.3467 | 3.2176 | 4.0374
VR2 | 149.68 | 0.8446 | 183.61485 | 89.638 | 12.3589 | 5.2353
VR3 | 74.83 | 0.8487 | 315.8241 | 18.313 | 3.4001 | 3.573
Table 8. Normalized Euclidean distance (ethyl acetate).
Vector pair | (Vt, VR1) | (Vt, VR2) | (Vt, VR3)
dnorm | 2.45303378 | 5.23192175 | 0.03934469
Table 8 presents the normalized Euclidean distance between the control vectors and the vectors of the regions to be characterized for the ethyl acetate fraction. The values of the distances dnorm (Vt, VR1) and dnorm (Vt, VR2) are greater than the threshold 0.05. These values show that these regions do not contain flavonoids. On the other hand, the value of the distance dnorm (Vt, VR3) is less than 0.05. This result shows the presence of flavonoids in this region for the ethyl acetate fraction.
3.2. Discussions
Characterizing digital images of thin-layer chromatograms carrying secondary metabolites allowed us to detect flavonoids. Several parameters were calculated and plotted as histograms. These histograms allowed us to conclude that the chloroform and ethyl acetate fractions can be characterized using a single developer.
The results indicate that the R3 regions of the chloroform and ethyl acetate fractions contain flavonoids. For both fractions, the normalized Euclidean distance between R3 and the controls is below 0.05, which indicates strong similarity. This is not the case for the R1 and R2 regions. These results, verified by [16], confirm the reliability of the implemented segmentation-based method. Because it relies on measurable parameters such as average pixel intensity or entropy, the method eliminates biases related to subjective interpretation. These results are consistent with the observations of [17] and [18], which highlight the value of automated chromatogram reading through image analysis. Finally, the normalized Euclidean distances are very low: 0.040 for the chloroform fraction and 0.039 for the ethyl acetate fraction. They demonstrate strong consistency between our numerical measurements and the results expected from conventional chemical methods, thus validating the robustness of our method.
Conventional analysis methods present several well-known difficulties, including spot overlap and sometimes insufficient resolution; these limitations are even more pronounced when working with complex mixtures. In this context, the work of [19] has shown that manual identification of flavonoids in polyphenol-rich extracts can lead to false positives, particularly due to interference from other compounds present in the extracts. Faced with these shortcomings, our image segmentation approach provides a concrete solution. Using segmentation algorithms based on pixel color and intensity, we were able to accurately isolate chromatographic spots, even in cases where contrast was low. This represents an advancement compared to simple observation with the naked eye.
Despite the generally satisfactory performance of our method, some technical constraints remain and should be addressed to optimize the results.

The first limitation concerns the quality of the captured images, particularly with a smartphone (an iPhone 12 in our case). Although this device offers good resolution, it sometimes introduces variations in the extracted parameters. This is shown by the relatively high error on the mean of parameter at in the chloroform fraction (15.67, Table 3). Such fluctuations can affect reproducibility, particularly when building a stable and reliable database for automated recognition.

A second challenge is the overlap of certain spots on the chromatograms, often caused by imperfect migration or high concentrations. In such cases, the algorithm can merge several spots that the human eye still perceives as distinct. To overcome this limitation, more advanced approaches would allow better separation of the areas of interest. Examples include active contours (snakes) and deep-learning-based methods such as [20].

Finally, processing extracts without a developer proved more complex, because the spots are often faint and their detection is less reliable. Adaptive contrast preprocessing or a more sensitive universal developer would improve initial detection, particularly for compounds present in trace amounts.
4. Conclusion
This work has demonstrated that the identification of flavonoids on thin-layer chromatography (TLC) plates can be automated by integrating digital tools such as image segmentation. This approach offers a fast, reliable, and reproducible alternative to traditional manual methods, which are often time-consuming and subject to interpretation. This solution is particularly relevant in a context such as that of Côte d’Ivoire, where more and more people are turning to traditional medicine. It meets a concrete need for better oversight of medicinal plant analysis using modern tools, without requiring heavy infrastructure or expensive resources. The results showed good agreement with traditional visual and chemical observations, while providing significant gains in accuracy and speed. The normalized Euclidean distances between the test and control profiles were very low, and the method was more sensitive to weakly revealed spots. Together, these results demonstrate the real potential of this method.