This text was copied from Chapter 18 of:
- The World of Zero-Inflated Models. Volume 3: Using GLLVM. Zuur AF and Ieno EN. Exclusively available at: www.highstat.com
First encounter
This blog introduces generalised linear latent variable models (GLLVMs) for analysing multiple response variables within a single model. GLLVMs combine the principles of generalised linear models (GLMs) or generalised linear mixed-effects models (GLMMs) with classical multivariate techniques such as principal component analysis, factor analysis, and redundancy analysis. Through latent variables shared by all responses, the method explicitly models dependence between the response variables.
Figure 1 illustrates the four key components of a basic GLLVM.
Suppose that we are studying the distribution and interaction of squirrel species across 100 forest patches. In each patch, we record the abundance of five squirrel species: the Eastern Gray Squirrel, the American Red Squirrel, the Fox Squirrel, the Flying Squirrel, and the Ground Squirrel. The resulting \(100 \times 5\) abundance data matrix is shown in the top left of Figure 1.
At a basic level, we could analyse each species separately using simple GLMs. However, nature is complex. These species do not exist in isolation: they might interact, compete for resources, or respond to shared environmental factors. A more comprehensive approach is the GLLVM, whose components are outlined below (and shown in Figure 1):
- Unique patch characteristics (random effects): Some forest patches might naturally support more squirrels. Perhaps they offer a perfect mix of resources or are safe havens with fewer predators. This variability between patches can be captured using a random effect, which accounts for unmeasured characteristics specific to each patch that influence squirrel abundance.
- Species-specific baselines (intercepts): Each species has its own baseline abundance, represented by the intercept in the model. For instance, the nocturnal Flying Squirrel might generally be less abundant.
- Environmental effects (covariates): Environmental factors such as tree density, rainfall, or proximity to roads can influence squirrel abundance. Some species may be more sensitive to these covariates than others, and the model allows for these species-specific responses.
- Shared variation and correlation (latent factors): This is the defining feature of GLLVMs. Latent factors represent unmeasured variables that affect multiple species simultaneously and, in doing so, introduce correlations between species. For example, the presence of a predator might lead to reduced coexistence of Gray and Red Squirrels, creating a negative correlation between their abundances. There could be one latent factor, two, or more, depending on the complexity of the data. Each species responds to each latent factor with its own weight (loading), so the factors can capture subtle interdependencies.
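One way to write the four components as a single equation is sketched below. It assumes Poisson-distributed counts with a log link (a negative binomial distribution is a common alternative for overdispersed counts), and the symbols are our own choice rather than taken from Figure 1: \(i\) indexes the 100 patches, \(j\) the five species, \(\mathbf{x}_i\) holds the patch covariates (for example, tree density and rainfall), and \(\mathbf{u}_i\) holds the \(d\) latent variables for patch \(i\).

```latex
\begin{aligned}
y_{ij} &\sim \text{Poisson}(\mu_{ij}), \qquad i = 1,\dots,100,\quad j = 1,\dots,5,\\
\log(\mu_{ij}) &= \underbrace{\alpha_i}_{\text{patch effect}}
  + \underbrace{\beta_{0j}}_{\text{species intercept}}
  + \underbrace{\mathbf{x}_i^\top \boldsymbol{\beta}_j}_{\text{covariate effects}}
  + \underbrace{\mathbf{u}_i^\top \boldsymbol{\theta}_j}_{\text{latent factors}},\\
\alpha_i &\sim N(0, \sigma^2_\alpha), \qquad
\mathbf{u}_i \sim N(\mathbf{0}, \mathbf{I}_d), \qquad \alpha_i \text{ and } \mathbf{u}_i \text{ independent}.
\end{aligned}
```

Conditional on the covariates, the linear predictors of species \(j\) and \(k\) in the same patch have covariance \(\sigma^2_\alpha + \boldsymbol{\theta}_j^\top \boldsymbol{\theta}_k\). The patch effect adds the same positive amount to every pair of species, whereas the loading term can be negative, which is how two species with loadings of opposite sign, such as the Gray and Red Squirrels in the predator example, can end up negatively correlated.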
Using a GLLVM provides more than just individual species analyses. It offers a holistic view of the squirrel community across the forest patches. The model not only estimates covariate effects but also uncovers associations between species. For instance, Fox Squirrels might thrive in patches where Ground Squirrels are scarce, or Gray and Flying Squirrels might frequently coexist. By applying a GLLVM, we gain insights into not just where squirrels are found, but also why they are there and which species co-occur more or less often than the measured covariates alone would predict. Such associations may reflect interactions between species, but they can also arise from shared responses to unmeasured conditions.
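To see how a shared latent factor produces correlation between species, the minimal sketch below simulates counts for 100 patches and five species from a one-factor Poisson GLLVM with a random patch effect, species intercepts, and one covariate. Everything here is a hypothetical choice for illustration: the Poisson distribution, the standardised "tree density" covariate, all parameter values, and the helper names `simulate`, `rpois` and `pearson`. The loadings give Gray and Red Squirrels opposite signs, Gray and Flying Squirrels the same sign, and Fox and Ground Squirrels opposite signs, mirroring the examples in the text. Only the Python standard library is used.

```python
import math
import random

# Hypothetical parameter values, chosen only for illustration.
SPECIES = ["Gray", "Red", "Fox", "Flying", "Ground"]
INTERCEPTS = [1.5, 1.2, 1.0, 0.3, 0.8]   # species-specific baselines (log scale)
SLOPES = [0.4, 0.3, -0.2, 0.5, -0.4]     # species-specific responses to tree density
LOADINGS = [0.8, -0.8, 0.6, 0.6, -0.6]   # loadings on a single latent factor
PATCH_SD = 0.3                           # SD of the random patch effect


def rpois(lam, rng):
    """Draw one Poisson variate with Knuth's multiplication method (fine for modest lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate(n_patches=100, seed=1):
    """Simulate a patches x species count matrix from a one-factor Poisson GLLVM."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_patches):
        tree_density = rng.gauss(0.0, 1.0)       # standardised covariate for this patch
        patch_effect = rng.gauss(0.0, PATCH_SD)  # random effect shared by all species
        latent = rng.gauss(0.0, 1.0)             # latent factor score for this patch
        row = []
        for b0, b1, theta in zip(INTERCEPTS, SLOPES, LOADINGS):
            eta = patch_effect + b0 + b1 * tree_density + theta * latent  # log-scale predictor
            row.append(rpois(math.exp(eta), rng))
        counts.append(row)
    return counts


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


if __name__ == "__main__":
    y = simulate()
    gray = [row[0] for row in y]
    red = [row[1] for row in y]
    print(f"Gray-Red correlation: {pearson(gray, red):.2f}")
```

The patch effect and the covariate push all species up or down together, yet the Gray-Red correlation is expected to come out negative because the opposite-signed loadings dominate. A fitted GLLVM would try to recover this structure through its estimated loadings, something a separate GLM per species cannot do.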
A GLLVM is essentially a GLM(M) that incorporates not only covariates but also latent variables, similar to the factors in factor analysis. The software estimates both the regression parameters and the latent variables (along with their loadings). These types of models have been around for quite some time. For ecological applications, see, for example, Kooijman (1977), Gauch (1982), and ter Braak (1985). The latter paper demonstrates how methods such as correspondence analysis and canonical correspondence analysis approximate certain parameters of these models. More recent references on GLLVMs in ecology include Warton et al. (2015), Hui et al. (2015), Niku et al. (2019), and van der Veen et al. (2023).
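Once the loadings are estimated, they summarise how species are associated. In the standard formulation the latent variables are independent with unit variance, so the covariance they induce among species on the linear-predictor scale is \(\Sigma = \Theta\Theta^\top\), where \(\Theta\) is the species-by-factor loading matrix. The minimal sketch below converts a hypothetical two-factor loading matrix into that covariance and the corresponding correlation matrix. The loading values and function names are assumptions for illustration; the gllvm package provides `getResidualCor()` for fitted models, and this sketch is not that function's implementation (it ignores, for example, the random patch effect and any family-specific variance terms).

```python
import math

SPECIES = ["Gray", "Red", "Fox", "Flying", "Ground"]
# Hypothetical loadings: rows are species, columns are two latent factors.
THETA = [
    [0.8, 0.1],
    [-0.7, 0.3],
    [0.5, -0.6],
    [0.6, 0.2],
    [-0.4, 0.7],
]


def latent_covariance(theta):
    """Sigma = Theta Theta^T: covariance induced by independent, unit-variance latent factors."""
    return [[sum(a * b for a, b in zip(row_j, row_k)) for row_k in theta] for row_j in theta]


def to_correlation(sigma):
    """Rescale a covariance matrix to a correlation matrix."""
    n = len(sigma)
    sd = [math.sqrt(sigma[j][j]) for j in range(n)]
    return [[sigma[j][k] / (sd[j] * sd[k]) for k in range(n)] for j in range(n)]


if __name__ == "__main__":
    corr = to_correlation(latent_covariance(THETA))
    for name, row in zip(SPECIES, corr):
        print(f"{name:>7}", " ".join(f"{r:6.2f}" for r in row))
```

With these loadings, Gray and Red Squirrels (and Fox and Ground Squirrels) are negatively correlated, while Gray and Flying Squirrels are positively correlated. Two factors allow richer patterns than one: with a single factor, every correlation would be exactly plus or minus one on this scale.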
The widespread application of GLLVMs was historically limited by the lack of user-friendly and efficient software. However, recent advancements have introduced software capable of fitting these models rapidly. In this book, we will primarily use the gllvm package (Niku et al. 2024) to implement GLLVMs. This package is discussed in detail in Niku et al. (2019). We also recommend exploring the GitHub repository of the first author of that paper (https://github.com/JenniNiku/gllvm) for examples and discussions. Please note that the development of the gllvm package is ongoing, and updates to the code may lead to changes in functionality or results over time. Users are encouraged to regularly check the repository for the latest updates and documentation.
Dynamic factor analysis
It is worth noting that the lead author of this book has published three papers on a related method called dynamic factor analysis (A. F. Zuur et al. 2003; A. F. Zuur, Tuck, and Bailey 2003; A. F. Zuur and Pierce 2004). These models estimate covariate effects alongside latent variables, with the added constraint that the latent variables are autocorrelated over time. However, modern approaches to parameter estimation are far superior to those used in these earlier applications.
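The minimal sketch below illustrates the structure of such a model by simulating it. It assumes, as is common in dynamic factor analysis, a single latent trend that follows a random walk, so each year's value is the previous value plus a random step; three observed series load on that trend and respond to one covariate. The function name `simulate_dfa`, the loadings, covariate effects, and noise level are hypothetical choices for illustration. Estimating the loadings and trend from real data requires state-space methods and is not shown.

```python
import random


def simulate_dfa(n_years=50, loadings=(1.0, -0.5, 0.8),
                 covariate_effects=(0.3, 0.0, -0.2), noise_sd=0.5, seed=1):
    """Simulate y_t = loadings * trend_t + covariate_effects * x_t + noise_t.

    The latent trend is a random walk (trend_t = trend_{t-1} + step_t), which makes it
    autocorrelated over time. All names and default values are hypothetical.
    """
    rng = random.Random(seed)
    trend, trends, series = 0.0, [], []
    for _ in range(n_years):
        trend += rng.gauss(0.0, 1.0)  # random-walk step for the shared latent trend
        x = rng.gauss(0.0, 1.0)       # hypothetical covariate, e.g. a temperature anomaly
        trends.append(trend)
        series.append([z * trend + d * x + rng.gauss(0.0, noise_sd)
                       for z, d in zip(loadings, covariate_effects)])
    return trends, series


if __name__ == "__main__":
    trends, series = simulate_dfa()
    for year in range(5):
        print(year, round(trends[year], 2), [round(v, 2) for v in series[year]])
```

Because each trend value builds on the previous one, neighbouring years are strongly correlated. This time-series constraint is what sets dynamic factor analysis apart from the basic GLLVM described above, in which the latent variables of different sites are assumed to be independent.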
References
Braak, Cajo J. F. ter. 1985. “Correspondence Analysis of Incidence and Abundance Data: Properties in Terms of a Unimodal Response Model.” Biometrics 41 (4): 859–73. https://doi.org/10.2307/2530959.
Gauch, Hugh G. 1982. Multivariate Analysis in Community Ecology. Cambridge Studies in Ecology. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511623332.
Hui, Francis K. C., Sara Taskinen, Shirley Pledger, Scott D. Foster, and David I. Warton. 2015. “Model-Based Approaches to Unconstrained Ordination.” Methods in Ecology and Evolution 6 (4): 399–411. https://doi.org/10.1111/2041-210X.12236.
Kooijman, S. A. L. M. 1977. “Species Abundance with Optimum Relations to Environmental Factors.” In Annals of Systems Research: Volume 6, 1977, edited by B. Van Rootselaar and H. Koppelaar, 123–38. Annals of Systems Research. Boston, MA: Springer US. https://doi.org/10.1007/978-1-4613-4074-4_7.
Niku, Jenni, Wesley Brooks, Riki Herliansyah, Francis K. C. Hui, Pekka Korhonen, Sara Taskinen, Bert van der Veen, and David I. Warton. 2024. gllvm: Generalized Linear Latent Variable Models. R package. https://jenniniku.github.io/gllvm/.
Niku, Jenni, Francis K. C. Hui, Sara Taskinen, and David I. Warton. 2019. “gllvm: Fast Analysis of Multivariate Abundance Data with Generalized Linear Latent Variable Models in R.” Methods in Ecology and Evolution 10 (12): 2173–82. https://doi.org/10.1111/2041-210X.13303.
Veen, Bert van der, Francis K. C. Hui, Knut A. Hovstad, and Robert B. O’Hara. 2023. “Concurrent Ordination: Simultaneous Unconstrained and Constrained Latent Variable Modelling.” Methods in Ecology and Evolution 14 (2): 683–95. https://doi.org/10.1111/2041-210X.14035.
Warton, David I., F. Guillaume Blanchet, Robert B. O’Hara, Otso Ovaskainen, Sara Taskinen, Steven C. Walker, and Francis K. C. Hui. 2015. “So Many Variables: Joint Modeling in Community Ecology.” Trends in Ecology & Evolution 30 (12): 766–79. https://doi.org/10.1016/j.tree.2015.09.007.
Zuur, A. F., I. D. Tuck, and N. Bailey. 2003. “Dynamic Factor Analysis to Estimate Common Trends in Fisheries Time Series.” Canadian Journal of Fisheries and Aquatic Sciences 60 (5): 542–52. https://doi.org/10.1139/f03-030.
Zuur, A. F., R. J. Fryer, I. T. Jolliffe, R. Dekker, and J. J. Beukema. 2003. “Estimating Common Trends in Multivariate Time Series Using Dynamic Factor Analysis.” Environmetrics 14 (7): 665–85. https://doi.org/10.1002/env.611.
Zuur, A. F., and G. J. Pierce. 2004. “Common Trends in Northeast Atlantic Squid Time Series.” Journal of Sea Research 52 (1): 57–72. https://doi.org/10.1016/j.seares.2003.08.008.