Addressing the Uncertainty Due to Random Measurement Errors in Quantitative Analysis of Microorganism and Discrete Particle Enumeration Data

The concentration of microorganisms (or discrete particles) in water is often evaluated by enumeration: the count obtained from a sample of specified volume is used to estimate the concentration. There are, however, several sources of random variability associated with the process of collecting enumeration data that can cause the count per unit volume to be a biased concentration estimate and that will make it an imprecise estimate. The actual concentration that is estimated using the available data is, therefore, uncertain. The error in concentration estimates is described as measurement error because the concentration cannot be measured exactly. Measurement error may include both unavoidable random errors in sample collection (e.g. randomness in the number of microorganisms contained in a sample) and analytical errors in counting the microorganisms in the sample (e.g. imperfect analytical recovery due to losses during sample processing or counting errors). To calibrate concentration estimates to actual microorganism concentrations, the count per unit volume must be divided by either the mean analytical recovery of the enumeration method or a sample-specific recovery estimate. Accordingly, appropriate analysis of enumeration data is dependent upon information about the analytical recovery of the enumeration method that is used.
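As a minimal sketch of the calibration described above, the following Python function divides the count per unit volume by an analytical recovery value. The function name, argument names, and the example numbers are illustrative assumptions, not notation or data from this thesis, and the sketch yields only a point estimate without addressing its uncertainty.

```python
def concentration_estimate(count, volume_l, recovery):
    """Recovery-corrected point estimate of concentration (organisms per litre).

    count    -- number of microorganisms counted in the sample
    volume_l -- sample volume in litres (hypothetical unit choice)
    recovery -- mean or sample-specific analytical recovery, as a fraction
    """
    if volume_l <= 0:
        raise ValueError("sample volume must be positive")
    if recovery <= 0:
        raise ValueError("analytical recovery must be positive")
    # Count per unit volume, divided by the analytical recovery.
    return count / (volume_l * recovery)


# Hypothetical example: 12 organisms counted in 10 L with 40% recovery.
print(concentration_estimate(12, 10.0, 0.4))
```

Without the recovery correction, the same hypothetical data would give 1.2 organisms per litre, which illustrates how imperfect recovery biases the uncorrected count per unit volume downward.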

The analytical recovery of an enumeration method is evaluated by seeding samples with known quantities of microorganisms or particles and then comparing the number that are observed in the enumerated sample to the number that were seeded. The fraction of seeded microorganisms that are observed, however, is an imprecise estimate of analytical recovery (particularly if the number of microorganisms seeded into the sample is not precisely known). It is demonstrated in this thesis that the standard deviation of such recovery estimates will be greater than the standard deviation of analytical recovery itself because of measurement error in the recovery estimates. Accordingly, the effect of the seed dose (and the precision thereof) upon the precision of the recovery estimates must be addressed in experiments that are used to quantify analytical recovery (and the variability therein). Additionally, approaches that are used to analyze recovery data must appropriately address the measurement error associated with recovery estimates. Probabilistic models are developed herein to describe the variability in recovery estimates as a function of the seed dose and the variability in analytical recovery itself. These models are used to facilitate analysis of alternative recovery experiment designs so that experiments can be designed to yield adequately precise estimates of analytical recovery (or the mean and standard deviation thereof). Additionally, the probabilistic models are used to develop statistical tools that enable analysis of recovery data with appropriate regard for measurement errors. Direct use of conventional hypothesis tests and confidence intervals to analyze recovery estimates is often inappropriate because recovery estimates are often not normally distributed and may have non-constant error. Moreover, it is demonstrated in this thesis that such conventional tools will yield biased estimates of the standard deviation of analytical recovery (due to the effects of measurement error), and that this bias will reduce the power of hypothesis tests to detect a statistically significant departure of a mean (or of a difference between two means) from the value specified by the null hypothesis. It is imperative to use statistical tools that enable appropriate analysis of the available recovery data because proper analysis of microorganism concentration data depends upon appropriate quantification of analytical recovery.
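The inflation of the standard deviation of recovery estimates can be illustrated with a simple variance decomposition. The sketch below assumes, purely for illustration, that the seed count is known exactly and that each seeded microorganism is observed independently with probability equal to the (randomly varying) analytical recovery, so that the observed count is binomial given the recovery. Under that assumed model, the law of total variance gives Var(estimate) = Var(recovery) + E[recovery × (1 − recovery)] / n for a seed count n. The function name and the example values are hypothetical; the models developed in this thesis may differ.

```python
import math


def recovery_estimate_sd(mean_recovery, sd_recovery, seed_count):
    """SD of the observed fraction recovered under an assumed binomial model.

    Assumes an exactly known seed count and a binomial count given recovery.
    """
    var_recovery = sd_recovery ** 2
    # E[p(1 - p)] = mean(1 - mean) - Var(p) for a random recovery p.
    expected_binomial_term = mean_recovery * (1 - mean_recovery) - var_recovery
    if expected_binomial_term < 0:
        raise ValueError("sd_recovery is too large for a fraction in [0, 1]")
    # Law of total variance: variability in recovery plus counting error.
    return math.sqrt(var_recovery + expected_binomial_term / seed_count)


# Hypothetical recovery with mean 0.4 and standard deviation 0.1.
for n in (10, 100, 1000):
    print(n, round(recovery_estimate_sd(0.4, 0.1, n), 4))
```

In this assumed model the standard deviation of the estimates always exceeds the standard deviation of recovery itself, and the excess shrinks as the seed dose increases, which is why the seed dose matters in the design of recovery experiments.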

In this thesis, a statistical framework (using probabilistic modelling and Bayes’ theorem) is developed to enable appropriate analysis of microorganism concentration estimates given information about analytical recovery and knowledge of how various random errors in the enumeration process affect count data. This framework is used to address several problems: (1) estimation of a single concentration value and quantification of the uncertainty therein from single or replicate data (possibly including non-detect samples), (2) estimation of the log-reduction of a treatment process (and the uncertainty therein) that is estimated by comparing pre- and post-treatment concentrations, (3) quantification of random concentration variability over time from temporally distributed enumeration data, and (4) estimation of the sensitivity (i.e. probability that microorganisms will be detected) of enumeration processes given knowledge about the associated measurement errors and analytical recovery. Each of these problems is of interest in drinking water treatment and research, and in Quantitative Microbial Risk Assessment (QMRA).
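To give a sense of how Bayes' theorem can combine count data with recovery information for problem (1), the sketch below uses a deliberately simplified model that is an assumption of this example and not necessarily the framework developed in this thesis: each count is Poisson distributed with mean equal to concentration × volume × recovery, the recovery is a known constant, and the concentration has a gamma prior. Under those assumptions the posterior is also a gamma distribution. All names and numbers are hypothetical.

```python
def gamma_posterior(samples, recovery, prior_shape=1.0, prior_rate=1.0):
    """Posterior (shape, rate) for concentration under an assumed model.

    samples  -- list of (count, volume) pairs; a count of 0 is a non-detect
    recovery -- analytical recovery, assumed known and constant here
    Assumed model: count ~ Poisson(concentration * volume * recovery),
    concentration ~ Gamma(prior_shape, rate=prior_rate).
    """
    total_count = sum(count for count, _ in samples)
    total_effective_volume = recovery * sum(volume for _, volume in samples)
    # Gamma-Poisson conjugacy: add counts to the shape, exposure to the rate.
    return prior_shape + total_count, prior_rate + total_effective_volume


# Hypothetical replicate data, including one non-detect, with 40% recovery.
shape, rate = gamma_posterior([(12, 10.0), (0, 10.0), (5, 10.0)], recovery=0.4)
print("posterior mean concentration:", shape / rate)
```

Non-detect samples contribute no counts but still add to the effective volume analyzed, so they lower the estimated concentration without being discarded or substituted with an arbitrary value.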

Investigation of the contemporary strategies that are used to analyze temporally variable pathogen concentrations in Monte Carlo QMRA has revealed that measurement errors in concentration estimates and the analytical recovery of the enumeration method (if addressed at all) are often addressed improperly and in ways that will result in bias (e.g. over-estimated risks). In contrast, the Bayesian framework that is developed within this thesis is a robust and appropriate strategy to address variability in pathogen concentrations (and the measurement errors therein) in Monte Carlo QMRA. Estimation of the sensitivity of an enumeration-based detection method is useful in the context of water treatment, but it is also particularly important in the analysis of errors in medical and epidemiological diagnoses. A statistical approach is developed herein that uses information about the analytical recovery of the enumeration method (and not just the relative frequency of non-detects) to rigorously analyze sensitivity.

Probabilistic models that describe the sources of random error in the enumeration process are not only useful to develop appropriate quantitative analysis approaches; they can also be used to evaluate the design of experiments. Simple probabilistic models and variance decomposition are used herein to develop experimental design guidelines for recovery experiments and for collecting more reliable microorganism concentration estimates. In the latter case, it is demonstrated that sample volumes should be chosen such that samples will typically contain at least 10 microorganisms in order to obtain acceptably reliable concentration estimates. It is also demonstrated that improving the analytical recovery of enumeration methods (e.g. reducing losses or the variability in analytical recovery) does not always have an appreciable effect upon the precision of associated concentration estimates. Therefore, method development should focus on providing inexpensive, efficient, and convenient methods that enable enumeration of large sample volumes rather than upon small improvements in analytical recovery.
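The effect of sample volume on precision can be sketched under a simple assumption that is made here only for illustration: if the number of microorganisms in a sample is Poisson distributed, the relative standard deviation of the count is 1/sqrt(expected count). The function names and the example concentration are hypothetical, and the sketch ignores analytical recovery and other error sources, so it does not reproduce the derivation behind the guideline of at least 10 microorganisms per sample.

```python
import math


def poisson_relative_sd(expected_count):
    """Relative standard deviation of an assumed Poisson-distributed count."""
    if expected_count <= 0:
        raise ValueError("expected count must be positive")
    return 1.0 / math.sqrt(expected_count)


def min_volume_for_expected_count(concentration, target_count=10):
    """Sample volume at which the expected count reaches target_count."""
    if concentration <= 0:
        raise ValueError("concentration must be positive")
    return target_count / concentration


# Relative precision for several expected counts per sample.
for expected in (1, 10, 100):
    print(expected, round(poisson_relative_sd(expected), 3))

# Hypothetical concentration of 0.5 organisms per litre.
print(min_volume_for_expected_count(0.5))
```

Because the relative standard deviation falls only with the square root of the expected count, processing larger sample volumes can improve precision more than modest gains in analytical recovery.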

This research demonstrates that probabilistic modelling that addresses random measurement errors in the enumeration process is a powerful tool to facilitate appropriate quantitative analyses in many different applications that are important in the water treatment industry. It also enables evaluation of the design of experiments so that more informative data can be obtained using the available resources.