Modelling total diversity with betta

This function tests for heterogeneity of total diversity (observed plus unobserved) across multiple sites. It can account or test for fixed effects that may explain diversity. It returns the significance of the covariates in explaining diversity and a hypothesis test for heterogeneity.

betta(
  chats = NULL,
  ses,
  X = NULL,
  initial_est = NULL,
  formula = NULL,
  data = NULL
)

Arguments

chats	A vector of estimates of total diversity at different sampling locations. `breakaway` estimates are suggested in the high-diversity case but not enforced.
ses	The standard errors in `chats`, the diversity estimates. This can either be a vector of standard errors (with the arguments `chats` and `X`), or the name of the variable in the dataframe `data` that contains the standard errors (with the arguments `formula` and `data`).
X	A numeric matrix of covariates. If not supplied, an intercept-only model will be fit. This is optional with the `chats` argument.
initial_est	(Optional) A vector of length 1 + ncol(X) giving the starting values for the likelihood maximisation search. The first element is the starting estimate for sigma^2_u, and the remaining elements are the starting elements for beta. Defaults to NULL, in which case the starting values outlined in the paper are used.
formula	A formula object of the form \(y ~ x \| group\). Required with the `data` argument.
data	A dataframe containing the response, response standard errors, covariates, and grouping variable. Required with the `formula` argument.

Value

table

A coefficient table for the model parameters. The columns give the parameter estimates, standard errors, and p-values, respectively. This model is only as effective as your diversity estimation procedure; for this reason please confirm that your estimates are appropriate and that your model is not misspecified. betta_pic may be useful for this purpose.

cov

Estimated covariance matrix of the parameter estimates.

ssq_u

The estimate of the heterogeneity variance.

homogeneity

The test statistic and p-value for the test of homogeneity.

global

The test statistic and p-value for the test of model explanatory power.

blups

The conditional expected values of the diversity estimates (conditional on the random effects). The authors propose that if the practitioner believes that information from one diversity estimator may inform the others, then using the condfits as estimators of total diversity rather than Chats may reduce variance of diversity estimates by ``sharing strength'' across the samples.

blupses

The estimated standard deviation (standard errors) in the blups.

loglikelihood

The log likelihood of the fitted model.

aic

The Akaike information criterion for the fitted model.

aicc

The finite sample correction of the Akaike information criterion for the fitted model.

r_squared_wls

The weighted R^2 statistic, appropriate for heteroskedastic linear models.

function.args

A list containing values initially passed to betta_random.

Note

Ecologists who are interested in the way species richness varies with covariate information often run a regression-type analysis on the observed diversity using their covariate information as predictors. However, in many settings (especially microbial), rare and unobserved taxa play a hugely important role in explaining the subtleties of the ecosystem, however, a regression analysis on the observed diversity level fails to account for these unobserved taxa. By predicting the total level of diversity (for example, via breakaway) and estimating the standard error in the estimate, one can take account of these unobserved, but important, taxa. In order to account for the estimated nature of the response, a mixed model approach is taken, whereby the varying levels of confidence in the estimates contributes to a diagonal but heteroscedastic covariance matrix. Given covariates constitute the fixed effects in the mixed model, and significance of the random effect term ``sigsq_u'' reflects heterogeneity in the sample, that is, variability that cannot be explained by only the covariates. The authors believe this to be the first attempt at modelling total diversity in a way that accounts for its estimated nature.

References

Willis, A., Bunge, J., and Whitman, T. (2015). Inference for changes in biodiversity. arXiv preprint.

Willis, A. and Bunge, J. (2015). Estimating diversity via frequency ratios. Biometrics.

Author

Amy Willis

Examples


















df <- data.frame(chats = c(2000, 3000, 4000, 3000), ses = c(100, 200, 150, 180),
                 Cont_var = c(100, 150, 100, 50))

# formula notation
betta(formula = chats ~ Cont_var, ses = ses, data = df)
#> $table
#>                Estimates Standard Errors p-values
#> (Intercept) 2.995149e+03      416.145952        0
#> Cont_var    1.730997e-03        3.931275        1
#> 
#> $cov
#>             (Intercept)    Cont_var
#> (Intercept)  1575149.61 -14038.4223
#> Cont_var      -14038.42    140.5715
#> 
#> $ssq_u
#> [1] 666667.8
#> 
#> $homogeneity
#> [1] 143.9291   0.0000
#> 
#> $global
#> [1] 5.180787e+01 6.120660e-13
#> 
#> $blups
#> [1] 2014.709 2999.740 3967.199 2999.779
#> 
#> $blupses
#> [1]  99.44868 198.55147 148.15536 178.94475
#> 
#> $loglikelihood
#> [1] -32.03712
#> 
#> $aic
#> [1] 70.07423
#> 
#> $aicc
#> [1] Inf
#> 
#> $r_squared_wls
#> [1] -4.376681e-05
#> 
#> $function.args
#> $function.args$chats
#>    1    2    3    4 
#> 2000 3000 4000 3000 
#> 
#> $function.args$ses
#> [1] 100 200 150 180
#> 
#> $function.args$X
#>   (Intercept) Cont_var
#> 1           1      100
#> 2           1      150
#> 3           1      100
#> 4           1       50
#> attr(,"assign")
#> [1] 0 1
#> 
#> $function.args$initial_est
#> [1] 6.666667e+05 3.000000e+03 7.105427e-15
#> 
#> $function.args$formula
#> chats ~ Cont_var
#> <environment: 0x7f87bd4ae270>
#> 
#> $function.args$data
#>   chats ses Cont_var
#> 1  2000 100      100
#> 2  3000 200      150
#> 3  4000 150      100
#> 4  3000 180       50
#> 
#> $function.args$model_type
#> [1] "fixed"
#> 
#> 

# direct input
betta(c(2000, 3000, 4000, 3000), c(100, 200, 150, 180), cbind(1, c(100, 150, 100,
    50)))
#> $table
#>         Estimates Standard Errors p-values
#> [1,] 2.995149e+03      416.145952        0
#> [2,] 1.730997e-03        3.931275        1
#> 
#> $cov
#>            [,1]        [,2]
#> [1,] 1575149.61 -14038.4223
#> [2,]  -14038.42    140.5715
#> 
#> $ssq_u
#> [1] 666667.8
#> 
#> $homogeneity
#> [1] 143.9291   0.0000
#> 
#> $global
#> [1] 5.180787e+01 6.120660e-13
#> 
#> $blups
#> [1] 2014.709 2999.740 3967.199 2999.779
#> 
#> $blupses
#> [1]  99.44868 198.55147 148.15536 178.94475
#> 
#> $loglikelihood
#> [1] -32.03712
#> 
#> $aic
#> [1] 70.07423
#> 
#> $aicc
#> [1] Inf
#> 
#> $r_squared_wls
#> [1] -4.376681e-05
#> 
#> $function.args
#> $function.args$chats
#> [1] 2000 3000 4000 3000
#> 
#> $function.args$ses
#> [1] 100 200 150 180
#> 
#> $function.args$X
#>      [,1] [,2]
#> [1,]    1  100
#> [2,]    1  150
#> [3,]    1  100
#> [4,]    1   50
#> 
#> $function.args$initial_est
#> [1] 6.666667e+05 3.000000e+03 7.105427e-15
#> 
#> $function.args$formula
#> NULL
#> 
#> $function.args$data
#> NULL
#> 
#> $function.args$model_type
#> [1] "fixed"
#> 
#> 

## handles missing data
betta(c(2000, 3000, 4000, 3000), c(100, 200, 150, NA))
#> Warning: At least one of your standard errors is 0 or NA. Any observation with
#>             a standard error of 0 or NA has been dropped from the analysis.
#> $table
#>      Estimates Standard Errors p-values
#> [1,]      3000        584.2429        0
#> 
#> $cov
#>          [,1]
#> [1,] 341339.7
#> 
#> $ssq_u
#> [1] 1e+06
#> 
#> $homogeneity
#> [1] 144.4444   0.0000
#> 
#> $global
#> [1] 26.36669  0.00000
#> 
#> $blups
#> [1] 2009.901 3000.000 3977.995       NA
#> 
#> $blupses
#> [1]  99.67172 197.39929 148.89651        NA
#> 
#> $loglikelihood
#> [1] -24.49984
#> 
#> $aic
#> [1] 52.99968
#> 
#> $aicc
#> [1] Inf
#> 
#> $r_squared_wls
#> [1] -2.220446e-16
#> 
#> $function.args
#> $function.args$chats
#> [1] 2000 3000 4000 3000
#> 
#> $function.args$ses
#> [1] 100 200 150  NA
#> 
#> $function.args$X
#>      [,1]
#> [1,]    1
#> [2,]    1
#> [3,]    1
#> [4,]    1
#> 
#> $function.args$initial_est
#> [1] 1e+06 3e+03
#> 
#> $function.args$formula
#> NULL
#> 
#> $function.args$data
#> NULL
#> 
#> $function.args$model_type
#> [1] "fixed"
#> 
#> 

## A test for heterogeneity of apples diversity estimates vs butterfly estimates
betta(c(1552, 1500, 884), c(305, 675, 205), cbind(1, c(0, 0, 1)))
#> $table
#>      Estimates Standard Errors p-values
#> [1,]      1526        294.0600     0.00
#> [2,]      -642        424.3689     0.13
#> 
#> $cov
#>           [,1]      [,2]
#> [1,]  166341.7 -166341.7
#> [2,] -166341.7  346430.7
#> 
#> $ssq_u
#> [1] 138064
#> 
#> $homogeneity
#> [1] 0.008750509 0.925471271
#> 
#> $global
#> [1] 1.833863e+01 1.849194e-05
#> 
#> $blups
#> [1] 1541.534 1519.954  884.000
#> 
#> $blupses
#> [1] 287.2854 451.5839 205.0000
#> 
#> $loglikelihood
#> [1] -21.63179
#> 
#> $aic
#> [1] 49.26357
#> 
#> $aicc
#> [1] 25.26357
#> 
#> $r_squared_wls
#> [1] 0.9951037
#> 
#> $function.args
#> $function.args$chats
#> [1] 1552 1500  884
#> 
#> $function.args$ses
#> [1] 305 675 205
#> 
#> $function.args$X
#>      [,1] [,2]
#> [1,]    1    0
#> [2,]    1    0
#> [3,]    1    1
#> 
#> $function.args$initial_est
#> [1] 138064   1526   -642
#> 
#> $function.args$formula
#> NULL
#> 
#> $function.args$data
#> NULL
#> 
#> $function.args$model_type
#> [1] "fixed"
#> 
#>