Effects of numbers of observations and predictors for various model types on the performance of forest inventory with airborne laser scanning

Publication: Canadian Journal of Forest Research
18 November 2021

Abstract

Semi- and nonparametric models are popular in the area-based approach (ABA) using airborne laser scanning. It is unclear, however, how many predictors and training plots are needed to provide accurate predictions without overfitting. This work aims to explore these limits for various approaches: ordinary least squares regression (OLS), generalized additive models (GAM), least absolute shrinkage and selection operator (LASSO), random forest (RF), support vector machine (SVM), and Gaussian process regression (GPR). We modeled timber volume (m³·ha⁻¹) for four boreal sites using ABA with 2–39 predictors and 20–500 training plots. OLS, GAM, LASSO, and SVM overfitted as the number of predictors approached the number of training plots. They required ≥15 plots per predictor to provide accurate predictions (RMSE ≤30%). GAM required ≥250 plots regardless of the number of predictors. The number of predictors only mildly affected RF and GPR, but they required ≥200 and ≥250 training plots, respectively. RF did not overfit in any circumstances, whereas GPR overfitted even with 500 training plots. Overall, using up to 39 predictors did not generally result in overfitting, and for most model types, it resulted in better accuracy for sufficiently large datasets (≥250 plots).
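The thresholds reported above can be read as a rough sample-size check. The following is a minimal sketch, not the authors' method: the function names are hypothetical, and combining the per-predictor rule with the fixed minimum for GAM by taking the larger of the two is an assumption. The figures come from a study covering 2–39 predictors and 20–500 training plots on four boreal sites, so values outside that range are extrapolations.

```python
# Thresholds as reported in the abstract (accuracy criterion: RMSE <= 30%).
MIN_PLOTS_PER_PREDICTOR = 15                           # OLS, GAM, LASSO, SVM
RATIO_BOUND_MODELS = {"OLS", "GAM", "LASSO", "SVM"}
MIN_TOTAL_PLOTS = {"GAM": 250, "RF": 200, "GPR": 250}  # fixed minimums


def min_training_plots(model: str, n_predictors: int) -> int:
    """Smallest plot count consistent with the abstract's thresholds.

    Assumption: when a model has both a per-predictor rule and a fixed
    minimum (GAM), the larger of the two applies.
    """
    model = model.upper()
    if model not in RATIO_BOUND_MODELS and model not in MIN_TOTAL_PLOTS:
        raise ValueError(f"unknown model type: {model}")
    required = MIN_TOTAL_PLOTS.get(model, 0)
    if model in RATIO_BOUND_MODELS:
        required = max(required, MIN_PLOTS_PER_PREDICTOR * n_predictors)
    return required


def meets_thresholds(model: str, n_plots: int, n_predictors: int) -> bool:
    """True if n_plots reaches the reported minimum for this model type."""
    return n_plots >= min_training_plots(model, n_predictors)


if __name__ == "__main__":
    for model, p in [("OLS", 10), ("GAM", 10), ("RF", 39), ("GPR", 5)]:
        print(model, p, min_training_plots(model, p))
```

Under these assumptions, OLS with 10 predictors needs 150 plots, while RF needs 200 plots whether 2 or 39 predictors are used. Meeting a threshold does not rule out overfitting: the abstract notes that GPR overfitted even with 500 training plots.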

Résumé

Semi- and nonparametric models are popular in the area-based approach (ABA), which uses airborne laser scanning. It is not known, however, how many predictors and training plots are needed to produce accurate predictions without overfitting. This article aims to explore these limits for several approaches: ordinary least squares regression (OLS), generalized additive models (GAM), least absolute shrinkage and selection operator (LASSO), random forest (RF), support vector machine (SVM), and Gaussian process regression (GPR). We modeled timber volume (m³·ha⁻¹) at four boreal sites using ABA with 2 to 39 predictors and 20 to 500 training plots. The OLS, GAM, LASSO, and SVM approaches overfitted when the number of predictors approached the number of training plots. They required at least 15 plots per predictor to produce accurate predictions (RMSE ≤ 30%). The GAM approach required at least 250 plots regardless of the number of predictors. The number of predictors only slightly affected the RF and GPR approaches, but they required at least 200 and 250 plots, respectively. The RF approach did not overfit under any circumstances, whereas the GPR approach overfitted even with 500 training plots. Overall, using 39 predictors did not generally cause overfitting and, for most model types, it led to better accuracy for sufficiently large datasets (at least 250 plots). [French résumé translated by the journal's editorial office]

Supplementary Material

Supplementary data (cjfr-2021-0192suppla.docx)

Information & Authors

Information

Published In

Canadian Journal of Forest Research
Volume 52, Number 3, March 2022
Pages: 385–395

History

Received: 5 July 2021
Accepted: 10 November 2021
Accepted manuscript online: 18 November 2021
Version of record online: 18 November 2021

Key Words

  1. LiDAR
  2. machine learning
  3. remote sensing
  4. area-based approach
  5. sampling size

Authors

Affiliations

Diogo N. Cosenza
Forest Research Centre, School of Agriculture, University of Lisbon, Tapada da Ajuda, Lisbon 1349-017, Portugal.
Petteri Packalen
Natural Resources Institute Finland (Luke), Latokartanonkaari 9, Helsinki FI-00790, Finland.
Matti Maltamo
School of Forest Sciences, University of Eastern Finland, Joensuu 80101, Finland.
Petri Varvia
School of Forest Sciences, University of Eastern Finland, Joensuu 80101, Finland.
Janne Räty
Norwegian Institute of Bioeconomy Research (NIBIO), Division of Forest and Forest Resources, National Forest Inventory, Høgskoleveien 8, Ås 1433, Norway.
Paula Soares
Forest Research Centre, School of Agriculture, University of Lisbon, Tapada da Ajuda, Lisbon 1349-017, Portugal.
Margarida Tomé
Forest Research Centre, School of Agriculture, University of Lisbon, Tapada da Ajuda, Lisbon 1349-017, Portugal.
Jacob L. Strunk
USDA Forest Service, Pacific Northwest Research Station, 3625 93rd Avenue SW, Olympia, WA 98512, USA.
Lauri Korhonen
School of Forest Sciences, University of Eastern Finland, Joensuu 80101, Finland.

Funding Information

This research was funded by the Forest Research Centre, a research unit funded by Fundação para a Ciência e a Tecnologia I.P. (FCT), Portugal (grant No. UIDB/00239/2020). The research activities of Diogo N. Cosenza were supported by Fundação para a Ciência e a Tecnologia I.P. (FCT) (grant No. PD/BD/128489/2017). Petteri Packalen was supported by the Academy of Finland through the project Unmanned Aerial Vehicles in Forest Remote Sensing (grant No. 323484) under the UNITE flagship ecosystem (grant No. 337655).

Cited by

1. Deep learning algorithms for addressing overfitting and biological realism in tree taper and volume predictions
2. Effects of model-overfit on model-assisted forest inventory in boreal forests with remote sensing data
3. Corrosion behavior prediction for hull steels under dynamic marine environments by jointly utilizing LSTM network and PSO-RF model
4. Remote sensing in forestry: current challenges, considerations and directions
5. Comparison of Semi-Physical and Empirical Models in the Estimation of Boreal Forest Leaf Area Index and Clumping With Airborne Laser Scanning Data
6. A systematic review of remote sensing and machine learning approaches for accurate carbon storage estimation in natural forests
7. Improving crop modeling in saline soils by predicting root length density dynamics with machine learning algorithms
8. Unmanned aerial vehicle (UAV) imaging and machine learning applications for plant phenotyping
9. Novel Features of Canopy Height Distribution for Aboveground Biomass Estimation Using Machine Learning: A Case Study in Natural Secondary Forests
10. Stand validation of lidar forest inventory modeling for a managed southern pine forest
11. Enhancing Forest Attribute Prediction by Considering Terrain and Scan Angles From Lidar Point Clouds: A Neural Network Approach
12. Silvicultural experiment assessment using lidar data collected from an unmanned aerial vehicle
13. Silvicultural Experiment Assessment Using Lidar Data Collected from Unmanned Aerial Vehicle
