Estimation of effective sample size for catch-at-age and catch-at-length data using simulated data from the Dirichlet-multinomial distribution
The incorporation of ‘effective sample size’ (ESS) in integrated assessments is an approximate but simple way of modelling the distribution of catch-at-age or catch-at-length frequencies using a multinomial likelihood when there is extra-multinomial heterogeneity. Accurate estimation of ESS for the catch-frequency data of each fishery and fishing year is important for such assessments, and is studied here using simulation. Between-haul heterogeneity within a fishing year was simulated using samples from the Dirichlet-multinomial (D-M) distribution, with marginal class probabilities generated using a simple age-structured model incorporating fishing selectivity. Four methods of estimating ESS were compared using this simulation model and its variants. One method is based on the lack of fit of predictions of class probabilities using aggregate year-level frequencies. The other three use the haul-level frequencies. One of these is based on an approximate profile maximum likelihood estimate (PMLE) of the D-M dispersion parameter. The remaining two are derived from models for the empirical coefficient of variation (CV) in the proportions: one is based on an existing CV model used for CCAMLR fisheries and the other is a new method. The methods that use haul-level frequencies gave accurate estimates of an ESS that is appropriate for haul-level heterogeneity, with accuracy increasing in the following order: (i) the estimator based on the existing CV model; (ii) that based on the new CV model; and (iii) that based on the PMLE. The year-level method gave very inaccurate estimates of this ESS, with a relative mean square error two orders of magnitude worse than that of the best haul-level method.
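As a minimal sketch of the simulation idea described above, the following Python code draws haul-level frequencies from a D-M distribution and compares a simple moment-based ESS with the value implied by the standard D-M variance inflation factor, (n + theta) / (1 + theta), for hauls of equal size n. The function names, the parameterisation alpha = theta * p, and the moment estimator itself are assumptions made for illustration. The estimator is not one of the four methods compared in this study, and the fixed class probabilities stand in for those generated by the age-structured model.

```python
import numpy as np

def simulate_dm_hauls(p, theta, n_hauls, haul_size, rng):
    """Draw haul-level class counts from a Dirichlet-multinomial.

    Assumed parameterisation: Dirichlet parameters alpha = theta * p, so
    smaller theta means stronger between-haul heterogeneity.
    """
    alpha = theta * np.asarray(p, dtype=float)
    # One vector of class probabilities per haul.
    haul_probs = rng.dirichlet(alpha, size=n_hauls)
    # Multinomial counts within each haul, given that haul's probabilities.
    return np.array([rng.multinomial(haul_size, q) for q in haul_probs])

def theoretical_ess(theta, n_hauls, haul_size):
    """ESS of the pooled sample under the D-M model with equal haul sizes.

    Each haul's counts have multinomial variance inflated by
    (haul_size + theta) / (1 + theta), so the pooled sample of
    n_hauls * haul_size fish is worth this many independent fish.
    """
    return n_hauls * haul_size * (1.0 + theta) / (haul_size + theta)

def moment_ess(counts):
    """Illustrative moment estimator of ESS from haul-level counts.

    Ratio of the multinomial variance sum for a single observation to the
    estimated variance sum of the pooled proportions (equal haul sizes).
    """
    props = counts / counts.sum(axis=1, keepdims=True)
    p_bar = props.mean(axis=0)
    # Variance of the mean proportion across hauls, per class.
    var_p_bar = props.var(axis=0, ddof=1) / counts.shape[0]
    return np.sum(p_bar * (1.0 - p_bar)) / np.sum(var_p_bar)

rng = np.random.default_rng(1)
counts = simulate_dm_hauls([0.4, 0.3, 0.2, 0.1], theta=20.0,
                           n_hauls=2000, haul_size=50, rng=rng)
print(theoretical_ess(20.0, 2000, 50), moment_ess(counts))
```

With many hauls the two printed values should be close. With the small number of hauls typical of a single fishery and fishing year, the moment estimate would be much noisier, which is the kind of behaviour a simulation comparison of estimators is designed to quantify.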
To account for process error in the calculation of the ESS, the lack of fit of the age-structured model in predicting class (bin) by year frequencies is used to obtain a single, across-years over-dispersion parameter. The ESS is then rescaled by dividing by the over-dispersion parameter and the model is refitted, giving a two-step iterative procedure. The ESS will be over-corrected if there is a systematic component to the lack of fit. A simple generic model of systematic lack of fit (SLOF) is presented, and its performance, in terms of providing unbiased estimates of ESS when SLOF is either present or absent, is studied using perturbations of the age-structured model. These perturbations consisted of either systematic or random variation across years in one of the selectivity function parameters, and similarly in the mortality rate parameter when combined with systematic or random variation in recruitment. The SLOF model substantially reduced the bias when SLOF was present, and is useful when the source of the SLOF is not clear or cannot be rectified by changing the underlying age-structured assessment model.
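The rescaling step of the two-step procedure can be sketched as follows. The text above does not give the form of the over-dispersion parameter, so this sketch assumes a Pearson chi-square statistic divided by its degrees of freedom. The function names and the `n_params` argument are hypothetical, and the refit of the age-structured model between the two steps is not shown because it depends on the assessment software.

```python
import numpy as np

def pearson_overdispersion(obs_props, pred_probs, ess, n_params=0):
    """Single across-years over-dispersion parameter (assumed Pearson form).

    obs_props, pred_probs: arrays of shape (n_years, n_classes) holding
    observed proportions and model-predicted class probabilities.
    ess: array of length n_years with the current ESS for each year.
    n_params: number of fitted parameters charged against the degrees
    of freedom (hypothetical argument; 0 ignores them).
    """
    obs = np.asarray(obs_props, dtype=float)
    pred = np.asarray(pred_probs, dtype=float)
    ess = np.asarray(ess, dtype=float)
    n_years, n_classes = obs.shape
    # Pearson chi-square summed over years and classes, weighted by ESS.
    chi2 = np.sum(ess[:, None] * (obs - pred) ** 2 / pred)
    df = n_years * (n_classes - 1) - n_params
    return chi2 / df

def rescale_ess(ess, phi):
    """Divide each year's ESS by the over-dispersion parameter."""
    return np.asarray(ess, dtype=float) / phi

# Toy check: proportions really come from 100 independent fish per year,
# but the starting ESS is 400, so phi should come out near 4.
rng = np.random.default_rng(7)
p = np.array([0.3, 0.25, 0.2, 0.15, 0.07, 0.03])
pred = np.tile(p, (200, 1))
obs = rng.multinomial(100, p, size=200) / 100.0
ess = np.full(200, 400.0)
phi = pearson_overdispersion(obs, pred, ess)
print(phi, rescale_ess(ess, phi)[0])
```

In this toy check the predicted probabilities are the true ones, so all lack of fit is random. If the predictions were systematically wrong, the same statistic would absorb that misfit as well and the rescaled ESS would be too small, which is the over-correction the SLOF model is intended to address.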