1. Introduction
Wood identification can be vital for designing, monitoring, and establishing sustainable wood product value chains and for ensuring legality under laws and policies governed by international treaties (e.g., the Convention on International Trade in Endangered Species of Wild Fauna and Flora) as well as national laws and policies (e.g., the United States’ Lacey Act and Australia’s Illegal Logging Prohibition Act 2012). Wood identification is traditionally performed by wood anatomy experts in a laboratory setting and relies on their ability to recognize and differentiate anatomical features. Recently, to tackle the paucity of traditional wood identification expertise (
Wiedenhoeft et al. 2019), computer vision wood identification (CVWID) systems have been applied both in the laboratory and in the field to address the challenge of identifying wood without a trained expert’s eye (
Khalid et al. 2008;
Martins et al. 2013;
Filho et al. 2014;
Figueroa-Mata et al. 2018;
Ravindran et al. 2018,
2019,
2021;
Damayanti et al. 2019;
de Andrade et al. 2020;
Ravindran and Wiedenhoeft 2020;
Souza et al. 2020). The open-source XyloTron platform (
Ravindran et al. 2020,
2021) has shown potential for real-time, field-deployable, screening-level wood identification (
Ravindran et al. 2019,
2021;
Ravindran and Wiedenhoeft 2020;
Arévalo et al. 2021), and with the XyloPhone (
Wiedenhoeft 2020), it is possible to move from laptop-based devices to smartphones for field deployment. Both the XyloTron and XyloPhone platforms provide imaging systems that enable the capture of macroscopic features (
Miller et al. 2002;
Ruffinatto et al. 2015) suitable for wood identification.
Designing high-performing, scalable CVWID systems requires understanding wood anatomy and how that anatomy influences the training, performance, and deployability of convolutional neural networks (CNNs) (
Ravindran et al. 2022) or other machine-learning-based models (
de Geus et al. 2021).
Hwang and Sugiyama (2021) report the classification accuracy of numerous CNN models used in wood identification studies, with most prior works demonstrating a high
in silico accuracy of 90% or better, with similar performance across different architectures. Most of those studies, however, do not report any subsequent model testing on new, unique specimens, so their real-world applicability is unknown. For CVWID, the number of classes, the number of training images (coverage of anatomical variation), the quality of specimen surface preparation (visibility of anatomical features), the quality of images (clarity of anatomical features), the size of the area imaged relative to the scale of diagnostic anatomical features, and the degree of anatomical similarity among the classes are all likely to be important factors for CNN architecture design and the eventual field performance of trained models. For this reason, it is important to evaluate how wood anatomy at a range of scales affects imaging and CVWID model performance.
Ravindran et al. (2022) estimated that approximately 40 classes of North American hardwoods need to be included in a field-deployable computer vision model for the North American market, a number substantially greater than anything previously published for this region, either in terms of macroscopic images (
Lopes et al. 2020, 10 classes) or at the naked-eye level (
Wu et al. 2021, 11 classes). As noted in
Ravindran et al. (2022), the influence of class number on CVWID models is unknown, especially for North American hardwoods, where there are, broadly speaking, two anatomically distinct groups of woods: the diffuse-porous woods and the ring-porous woods. They therefore used a fundamental domain-specific factor, porosity, to inform taxa selection and label space design. In general, diffuse-porous woods show less wood anatomical spatial heterogeneity with regard to radial growth rate, growth ring domains (earlywood vs. latewood), and physiological age of the wood (
Ravindran et al. 2022). Diffuse-porous woods of North America also show comparatively lower overall wood anatomical variability (e.g., in axial parenchyma patterns, vessel arrangement, and ray width and frequency) than, for example, diffuse-porous tropical woods (e.g.,
de Andrade et al. 2020;
Arévalo et al. 2021) or the latewood of ring-porous North American woods (
Fig. 1).
Ravindran et al. (2022) therefore separated the North American hardwoods into two groups: the diffuse-porous woods of the earlier work and the ring-porous woods addressed herein.
Unlike diffuse-porous hardwoods, ring-porous hardwoods, by definition, show dramatic differences between earlywood and latewood within a growth ring and among species (
Fig. 1). Due to the spatial heterogeneity displayed by ring-porous woods, it is possible, depending on the area of tissue captured and the respective sizes of the earlywood and latewood regions, to obtain an image that does not exhibit all the anatomical characteristics that typify the wood. Fast radial growth can result in images that show only latewood (
Fig. 1C), that is, only the later-formed portion of a single growth ring. Tangentially varying features (e.g., broad rays in
Quercus;
Fig. 1B) may be absent in some images. Slow radial growth can produce an image that is primarily earlywood (
Fig. 1D). The impact of such spatial heterogeneity as reflected in test images is unknown and unexplored. An initial work purporting to use CVWID to classify ten ring-porous North American hardwoods did not appear to consider spatial heterogeneity related to wood anatomy (
Lopes et al. 2020). Furthermore, the apparently subpar image quality of that dataset was first questioned (
Wiedenhoeft 2020), and later the machine-learning analysis and underlying dataset were shown to be inherently flawed with respect to data hygiene for CVWID inference (
Ravindran and Wiedenhoeft 2022).
In this study, we develop a CVWID model to identify 17 classes of North American ring-porous woods using the XyloTron platform and a CNN. In addition to evaluating accuracy and examining model misclassifications from a domain-informed perspective, we investigate the influence of the wood anatomical spatial heterogeneity of ring-porous woods on specimen-level model predictions and discuss how other forms of wood anatomical heterogeneity could influence model performance in field deployment settings. Finally, we propose a path for future research toward a robust, highly accurate, field-deployable, unified North American hardwood model.
3. Results
The top-1 prediction accuracy for the specimen-level cross-validation model was 98.0%. When tested on the PACw + MSUtw dataset, the top-1 and top-2 cross-validation accuracies were 91.9% and 98.3%, respectively. The field model top-1 accuracy was 91.4%, and the top-2 accuracy was 100%.
Table 5 summarizes the prediction accuracies of the cross-validation models (accumulated over the five folds) and the field model. Confusion matrices for the cross-validation and field models are shown in
Figs. 2 and
3, respectively.
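As an illustration of how top-1 and top-2 accuracies of this kind can be computed, the following is a minimal sketch, assuming each specimen-level prediction is available as a mapping from class label to score. The function name `top_k_accuracy` and the scores are hypothetical and are not data or code from this study; only the class labels are borrowed from the text.

```python
def top_k_accuracy(scores, true_labels, k):
    """Fraction of specimens whose true class is among the k highest-scoring classes."""
    hits = 0
    for class_scores, truth in zip(scores, true_labels):
        # Rank class names by score, highest first, and keep the first k.
        ranked = sorted(class_scores, key=class_scores.get, reverse=True)[:k]
        if truth in ranked:
            hits += 1
    return hits / len(true_labels)


# Hypothetical specimen-level scores (illustrative only, not results from this study).
scores = [
    {"QuercusR": 0.70, "QuercusW": 0.25, "Fraxinus": 0.05},
    {"QuercusR": 0.55, "QuercusW": 0.40, "Fraxinus": 0.05},
    {"QuercusR": 0.10, "QuercusW": 0.15, "Fraxinus": 0.75},
    {"QuercusR": 0.20, "QuercusW": 0.30, "Fraxinus": 0.50},
]
true_labels = ["QuercusR", "QuercusW", "Fraxinus", "QuercusR"]

top1 = top_k_accuracy(scores, true_labels, k=1)  # 2 of 4 specimens correct
top2 = top_k_accuracy(scores, true_labels, k=2)  # 3 of 4 specimens correct
print(f"top-1: {top1:.1%}, top-2: {top2:.1%}")
```

In this toy example the second specimen is a top-1 error but a top-2 hit, which is the kind of case that separates the two metrics.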
Example images of Type 1 and Type 3 misclassifications from the field model’s confusion matrix (
Fig. 3) are shown in
Fig. 4. A summary of misclassification data for the field model is presented in
Table 6.
For the field model’s top-1 predictions on the PACw + MSUtw dataset, 11 classes showed zero source misclassifications: Asimina, Carya, Castanea, Catalpa, Celtis, Fraxinus, Gymnocladus, Morus, Robinia, Sassafras, and UlmusH. The remaining six classes each showed at least one source misclassification (
Fig. 3), for a total of 17 misclassified specimens out of 198 test specimens. The misclassified specimens from these six source classes were attributed to five sink classes: Gymnocladus, QuercusR, QuercusW, Robinia, and UlmusH. Eight classes showed neither source nor sink misclassifications: Asimina, Carya, Castanea, Catalpa, Celtis, Fraxinus, Morus, and Sassafras.
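The source and sink bookkeeping can be sketched from a confusion matrix as follows. This is a minimal illustration, assuming that a "source" class is one whose specimens were predicted as another class (an off-diagonal count in its row) and a "sink" class is one that received such predictions (an off-diagonal count in its column). The function name and the counts are hypothetical; the counts are not taken from Fig. 3.

```python
def source_and_sink_classes(confusion):
    """Return (sources, sinks, neither) for a nested-dict confusion matrix.

    confusion[true_cls][pred_cls] is the number of specimens of class
    `true_cls` that the model predicted as `pred_cls`.
    """
    classes = set(confusion)
    sources, sinks = set(), set()
    for true_cls, row in confusion.items():
        for pred_cls, count in row.items():
            classes.add(pred_cls)
            if true_cls != pred_cls and count > 0:
                sources.add(true_cls)  # this class lost a specimen
                sinks.add(pred_cls)    # this class absorbed a wrong prediction
    neither = classes - sources - sinks
    return sources, sinks, neither


# Hypothetical counts for five of the class labels (illustrative only).
confusion = {
    "Carya":    {"Carya": 10},
    "UlmusS":   {"UlmusS": 8, "UlmusH": 2},
    "UlmusH":   {"UlmusH": 9},
    "QuercusW": {"QuercusW": 11, "QuercusR": 1},
    "QuercusR": {"QuercusR": 12, "QuercusW": 2},
}

sources, sinks, neither = source_and_sink_classes(confusion)
print(sorted(sources), sorted(sinks), sorted(neither))
```

A class can be both a source and a sink (QuercusR and QuercusW in this toy matrix), which is why the counts of source classes, sink classes, and unaffected classes need not sum to the total number of classes.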
Table 6 summarizes the number and proportions of misclassification types. Fifteen of the 17 misclassifications (88.2%) were Type 1, two (11.8%) were Type 3, and none were Type 2.
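The reported percentages follow directly from the specimen counts given in the text. A short arithmetic check, using only those counts (the helper name `percent` is introduced here for illustration):

```python
def percent(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


total_specimens = 198              # field model test specimens
misclassified = 17                 # top-1 misclassifications
type1, type2, type3 = 15, 0, 2     # counts by misclassification type

# The three types should account for every misclassification.
assert type1 + type2 + type3 == misclassified

field_top1 = percent(total_specimens - misclassified, total_specimens)
type1_share = percent(type1, misclassified)
type3_share = percent(type3, misclassified)

print(field_top1)   # 91.4
print(type1_share)  # 88.2
print(type3_share)  # 11.8
```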
When tested against the three spatial heterogeneity datasets, the prediction accuracy of the field model remained nearly unchanged at 91.3% for the Slow-Growth dataset but fell by 11.4% for the Fast-Growth dataset and 8.3% for the Broad Rays Absent (QuercusW) dataset. In the Slow-Growth dataset, one specimen from the class Cladrastis was predicted as Gymnocladus and one specimen from the class QuercusW was predicted as QuercusR. In the Fast-Growth dataset, two specimens from the class UlmusS were predicted as UlmusH. In the Broad Rays Absent dataset, which consisted only of the class QuercusW, one of six specimens was misclassified as QuercusR.
Table 7 summarizes the accuracies for the three spatial heterogeneity datasets when tested with the field model. For each spatial heterogeneity dataset, a comparison of the test specimen and an example image of the predicted class is shown in
Fig. 5.