PARAMETER ESTIMATION FOR AN ALLOMETRIC FOOD WEB MODEL

: The application of mechanistic models to natural systems is of interest to ecological researchers. We use the mechanistic Allometric Trophic Network (ATN) model, which is well-studied for controlled and theoretical systems, to describe the dynamics of the aphid Rhopalosiphum padi in an agricultural ﬁeld. We diagnose problems that arise in a ﬁrst attempt at a least squares parameter estimation on this system, including formulation of the model for the inverse problem and information content present in the data. We seek to establish whether the ﬁeld data, as it is currently collected, can support parameter estimation for the ATN model.


Introduction
The effects of increased biological diversity on species and ecosystems has long been a central area of focus in ecology.Early biodiversity-stability debates [21,30,36] have given rise to more general explorations of how biodiversity affects ecosystem functioning and services [27,1,31,29].One critical ecosystem service, biological control, has been the subject of intense scrutiny for decades, though often in a simplified way that focuses on single species or a handful of species interactions within only one trophic level [18,28,34].A broader understanding of how predator richness affects pest suppression remains elusive [26]; models of multi-species interactions in a food-web context across several trophic levels indicate that biological control can be both positively and negatively affected by biological diversity [35].We present here a predictive, mechanistic approach to modelling multiple species interactions that allows us to investigate the link between biological control and biodiversity.We use an Allometric Trophic Network (ATN) model, a dynamic food web model which assumes a general relationship between trophic interaction strengths and the body size of predator and prey.The strength of the ATN model, particularly compared to standard Lotka-Volterra and matrix models, is that in parameterizing interactions by body mass, we reduce the number of parameters which must be identified to compute a model solution.The ATN model is well-studied in theoretical ecology [14,32,11] and has been applied to controlled microcosms [25,33].The model has also been used to describe the average seasonal behaviors of an aquatic food web [12].Multivariate autoregressive (MAR) models [19,23] have also been successfully applied to aquatic food webs [24,22] and can be used with model comparison tests to identify the dominant interactions in a foodweb.Despite having a strong statistical framework which accommodates process-driven deviation from the model, these MAR models only support linear species interactions.In comparison, the ATN model is formulated with a Holling-type functional response [33] and therefore supports dynamic species interactions which can change with availability of alternative prey.In order to test the robustness of our approach, we validate the model by fitting field data to our model.
We consider the use of the ATN model to describe the dynamics of the aphid Rhopalosiphum padi, an agricultural pest, subject to population loss by the naturally-occurring community of predators in a single-season barley field.In initial efforts at solving a least squares inverse problem for this model using observational data, we found that the model's complexity, when paired with the sparsity of available data, led to numerical difficulties and poor fits to data [2].
Here, we investigate the limitations present in this first parameter estimation.We explore the statistical model for observational error in the data, formulation of the mathematical model used to describe the system, and the information content necessary to estimate the parameters in the ATN model.Revisiting the underlying statistical and mathematical models used in the formulation of the inverse problem is an expected step in the iterative modelling process [10] and allows us to understand the conditions under which the parameters for the study system and model can be estimated with available field data.

ATN Model
We have data for the arthropod populations of ten barley fields in Uppland, Sweden, for six weeks from late May to early July, 2011.The arthropods are divided into 15 groups, which we will call nodes, to form the basis for the food web in the barley fields.See [2] for a description of the experimental design and assumptions used in the formulation of the food web.
We attempt to model the dynamics of the bird cherry-oat aphid, Rhopalosiphum padi, with an Allometric Trophic Network (ATN) model.The ATN model assumes Lotka-Volterra dynamics with a Holling Type-II response, where parameters are assumed to depend on the body masses of interacting species [33].The dynamics of the aphid population, N 1 , are given by where N i is the population density of node i, C i is the set of consumers for node i, R i is the set of prey for node i, a ij is the attack rate of node j on node i, c j is the coefficient of intraspecific interference competition for node j, h ij is the handling time for one individual of node j to catch and consume one individual of node i.Here, r(T (t)) is the temperature-dependent growth rate of the aphid population, r(T (t)) = .024T(t) − .089.
The time-varying temperature, T (t), is given by observational data.For all j = 1, the density N j is an input to the system from observational data.
The body mass-dependent parameters are , where W i is the body mass of node i, R opt is the optimal body mass ratio between predator and prey, φ scales the dependency of a successful attack on predator-prey body mass ratio, and a 0 , h 0 , and c 0 are normalizing constants.The body masses W i are known, but we must estimate the parameter set q = [a 0 , h 0 , c 0 , R opt , φ].We note that the parameters a 0 , h 0 , c 0 , and φ do not have a simple, physical meaning; experimentally determining empirical values for these parameters would be challenging, in particular for realistic predator-prey species compositions.Additionally, we expect that some of these parameters will vary between study systems; for example, the optimal predator-prey body mass ratio has been found to vary between predator types [13], so we can expect R opt to vary between systems.For theoretical studies, the parameters for the ATN model have historically been generalized from metabolic theory and known allometric relationships across many species and ecosystems [14,15,37].In empirical studies, parameters are hand-selected by choosing the best-fitting parameters to a data set over many simulations [33,25].We use a least squares parameter estimation to find the parameters for the system.

Formulation of the Inverse Problem
Following [9,10], we assume a relative error statistical or observational model for Y j the observation of our system at a point t j and θ 0 = [q 0 , y 0 ] T , the parameter set that we must estimate.Here f (t j ; θ 0 ) is the solution N 1 (t j ; q 0 , y 0 ) of (1) at point t j with nominal parameters q 0 and initial condition y 0 , γ is a constant weight that specifies the parameters of the error's distribution, and E j is independent, identically distributed random noise assumed to be N (0, 1).
For an inverse problem with n observations, the iterative weighted least squares (IWLS) (also referred to as generalized least squares (GLS)) cost functional gives an estimator Θ of the parameter θ 0 defined by where Ω is the space of admissible parameters.Our experiences [4,3,5,6] with other ecological and population modeling efforts involving population counts have suggested that some type of error process depending on the size of the population is likely to be most appropriate.This motivates our choice of statistical model in (2).The collected data N j 1 is a realization of Y j , for ǫ j a realization of E j .Then a realization of the minimizing parameter is given by θ = arg min The weight γ is unknown and can generally be identified using the method of residual plots, outlined in [10,9].From the statistical model, we know that E j = [Y j − f (t j ; θ 0 )]/f γ (t j ; θ 0 ) and that the realizations of E j should be independent and identically distributed as random noise.That is, we expect a plot of the modified residuals will be randomly distributed for an appropriate choice of γ.
However, the use of residual plots to determine the statistical model tacitly assumes that our mathematical model accurately describes the system dynamics.Initial treatment of the inverse problem in [2] suggested that the ATN model cannot capture some aspects of aphid population dynamics.Therefore, we first use difference-based pseudo-measurement errors, as described in [7], to test our statistical model without tacitly assuming a mathematical model.We compute estimates of the measurement error, f γ (t j ; θ)ǫ j , from the observed N j 1 .We employ the second-order central differencing scheme for pseudo-measurement errors To find γ for use in our statistical model, we compute for different values of γ until we find the value that results in a random distribution of η j,γ when plotted against observations t j .We assume this resulting value of γ to facilitate the use of model comparison tests for our system.After assuming an appropriate mathematical model, we may return to the traditional method of residual plots to verify that our choice of γ is accurate.

Model Comparison
We will refer to the model given by the equation in (1) as Model 1.We will also consider the restriction of (1) to a parameter space where h 0 = c 0 = 0.This results in a linear functional response In this restriction, we note that there are only three constants, [a 0 , R opt , φ], and the initial condition N 0 1 to be estimated.We denote this as Model 2. The aphid population data exhibits unexpected population declines that cannot reasonably be attributed to the predators in the ATN model.Therefore, we also consider the addition of an as yet unknown source of mortality to the model.We assume that the mortality occurs over a full day and after the last data point before an unexpected population decline.That is, if the aphid population decreases by 25% or more between the observation collected at t j and the observation collected at t j+1 , then we denote the time T k = t j .We assume that the aphids suffer some extrinsic mortality factor µ over the interval We denote the set of all such times as M = k M k and add this discrete mortality term to ( 5) where χ M is the indicator function for the set M. We refer to this as Model 3, and it requires the specification of the constants [a 0 , R opt , φ, µ] as well as the initial condition N 0 1 .Note that Model 2 is a restriction of Model 3 to a parameter space where µ = 0.
Parameter estimation on Model 2 yields values of φ ∈ [.3, 2], so we also consider the restriction of (5) to a parameter space where φ = 1 This is equivalent to assuming that the sensitivity of a successful attack to predator-prey body mass ratio does not need to be tuned for the system.We will refer to this as Model 4, where the only parameters to estimate are the constants [a 0 , R opt ] and the initial condition N 1 0 .To test if the fit obtained by a particular model is a significant improvement over ones of its restrictions, we follow the methods developed in [8] and carefully summarized in [9,10].We denote J(Y, θ) to be the cost functional associated with a parameter θ and observations Y , that is, We let Θ 1 be the estimator of the nominal parameter θ 0 , as defined in (3), drawn from an admissible parameter space Ω 1 .That is, For any Ω 2 ⊂ Ω 1 , we let Θ 2 be the estimator of θ 0 drawn from an admissible parameter space Ω 2 , i.e., Θ 2 = arg min We test the null hypothesis that the nominal parameter exists in the constrained parameter space, or, that θ 0 ∈ Ω 2 .We define the test statistic Here, n is the number of data points used in the inverse problem.For realizations of Y j , denoted N1 = {N j 1 } n 1 , we let θ1 and θ2 be corresponding realizations of Θ 1 and Θ 2 , as defined in (4).Then we have a realization of the test statistic given by Ûn ( N1 ) = n J( N1 , θ1 ) − J( N1 , θ2 ) Given a realization Ûn ( N1 ), we reject the null hypothesis with (1 − α) • 100% certainty if Ûn ( N1 ) > τ α .Here τ α is specified by the critical value of the χ 2 (r) distribution, for r the additional degrees of freedom that the model associated with Ω 1 has over the restriction given by Ω 2 .We set the threshold for model rejection at 75%, since we are dealing with rather sparse experimental data sets.

Selecting γ
We compute pseudo-measurement errors for the aphid population densities in the ten fields where data was collected.We note that the data sets for all fields were sparse, with five or six observations for each field.After central differencing, we are left with three or four estimates of the pseudo-measurement error, and it is difficult to identify randomness in so small a set of points.However, the sparsity of the data is problematic even when using residuals from the model.As in [2], we cannot compare the mathematical model with data when crop quality deteriorates late in the season, and only four or five data points from each field can be used in residual analysis.
For some fields, we successfully identify the appropriate statistical model using pseudo-measurement errors.The values of η j,γ for Field JC are plotted in Figure 1, where we can see that γ = 0 gives a distribution that appears random, while γ = 1 and γ = 2 do not perform as well.Similar results were obtained in Fields JO, KC, KO, and OO, plotted in Figures 5, 6, 7, and 11 in the appendix.In the remaining fields, the data sparsity makes it difficult to choose an appropriate value of γ.For example, we plot the results from Field SO in Figure 2. No choice of γ appears to give a random distribution of η j,γ , so we cannot choose a value of γ for the statistical model.We reach the same conclusion for Fields MC, MO, OC, and SC (see Figures 8, 9, 10, and 12).
For simplicity, and having insufficient data to assume otherwise, we take γ = 0 and plot the solutions to the inverse problem with absolute error formulation in Figure 3.Although we do not necessarily expect that γ will be the same for all fields, we require additional data to identify the correct value for the statistical model.In summary, we find that γ = 0 is an appropriate choice for some fields, and do not have sufficient information to select a different appropriate value of γ in the remaining fields.

Model comparison test
We first compare the performance of Model 1, the full ATN model, to its restriction to a linear functional response in Model 2, which assumes that h 0 = c 0 = 0.
The results of the model comparison test are given in Table 1, while the model solutions are plotted in Figure 3.The least squares minimization for the full model is to take h 0 = c 0 = 0 in almost all fields, and so there is little variation between the solutions to Models 1 and 2. As expected, we see in Table 1 a failure to reject the null hypothesis, that the solution is contained in the restriction given by Model 2.
Although we fail to show that the nominal parameters do not satisfy h 0 = c 0 = 0, we cannot readily conclude from this test that the functional response in the ATN model should be linear.Our data set is sparse, and there may be unobserved dynamics which require specification of h 0 , c 0 = 0.In other words, the current data do not support with strong statistical significance the inclusion of the more complicated dynamics of Model 1 over the simpler dynamics of Model 2. For these data sets, we take h 0 = c 0 = 0, having no statistical motivation to do otherwise, but we must revisit the nonlinear formulation of the functional response if we acquire new, hopefully more extensive, data sets.
We consider the addition of a mortality term by asserting the null hypothesis that the nominal parameters are contained in the restriction given by Model 2 instead of the model with mortality, Model 3. The results from the comparison test, for the six fields that satisfy the requirement that a 25% or greater  population decline occur between observations, are given in Table 2.We reject the model without mortality with 99.9% confidence in all but Field OC.We conclude that in Fields JC, JO, KO, OO, and SC, the data supports the estimation of some additional mortality event in the aphid population.We last assert the null hypothesis that the nominal parameters are contained in the restriction given by Model 4, that is φ = 1, instead of Model 2. The results from the model comparison test are given in Table 3, and we reject Model 4 with 75% or higher certainty in fields except JC and JO.In these two exceptions, the estimated value of φ is close to 1, and so it is not surprising that the comparison test fails.We conclude that there is sufficient information Figure 3: Least squares solutions for Models 1-4 with γ = 0. Data is plotted with a star marker with error bars to indicate the standard deviation in the data and the horizontal line indicates the cutoff date for data to be used in the inverse problem.See Table 5 in the appendix for a list of the parameters used in each field.in the data set to justify the estimation of φ in the ATN model.

ATN solution with additional data
For all fields, we find that the available data supports the estimation of the ATN model with a linear functional response; the inclusion of h 0 and c 0 as unknown parameters in the inverse problem seldom adds value to the estimates.Additionally, the numerical minimization necessary for the inverse problem using the full model is sensitive to the initial iterate supplied to the program.We only find that h 0 = c 0 = 0 by starting sufficiently close to the solution, with initial iterates for each parameter on the order of 10 −4 or 10 −3 across fields.The   physically admissible range for each parameter is h 0 ∈ [0, 0.42] and c 0 ∈ [0, 0.24] when converted to the units for our system [25].With the current data, the inverse problem for the full ATN model is too sensitive to initial conditions to be tractable.We conjecture that with data collected at a higher sampling rate, the problem could be solved with less sensitivity to the initial conditions of the numerical minimizer.
We consider this assumption for Field MC, where the aphid population does not experience a mortality event and is well-fit by the inverse problem using the reduced ATN model with initial iterate a 0 = .8,R opt = 150, and φ = 1, yielding parameters a 0 = 0.4207, R opt = 160, and φ = 0.8570.We construct a synthetic data set with these parameters and solve the inverse problem for the full model, using initial iterate a 0 = .4,R opt = 160, and φ = 0.8, with an increasing number of data points added between each existing observation.The resulting solutions are plotted in Figure 4 and the parameters used to generate the synthetic data, initial iterate for the inverse problem using the synthetic data, and resulting parameter estimates are given in Table 4 The inverse problem for the full ATN model in Field MC requires five additional observations between each existing point to visually fit the data, but the solution does not give the same parameters as we used to generate the synthetic data.This rate of sampling would require population measurements approximately three times a week, where a single measurement of the aphid population requires a sweepnet count of the aphids on 100 barley tillers.This would be a significant undertaking, and even then we do not claim that this rate of data collection would support parameter estimation for the full ATN model in future experiments.We only demonstrate that the current limitations of the inverse problem on the full ATN model can be remedied by increased collection of data; we can not readily generalize these results to a statement about the amount of data necessary to estimate the parameters for the full ATN model in a given study system.

Conclusion
The results presented here are an important first step in developing more accurate models for describing the dynamics of species interactions in food webs.Despite recent calls for better integration of food web ecology and ecosystem functioning and services [20], there has been little theory developed to date [20,16,17].Agroecosystems, with their simplified dynamics and species diversity, are an ideal system for testing these models especially for identifying and establishing the conditions under which the parameters for the dynamic models can be estimated with some degree of confidence.The present results show that although the inverse problem was not tractable as originally formulated for the ATN model, this can be ameliorated by reducing the number of parameters to be estimated or increasing data available for least squares

Figure 1 :
Figure 1: Values of η j,γ are plotted against time for γ = 0, 1, and 2 for data from field JC.

Figure 2 :
Figure 2: Values of η j,γ are plotted against time for γ = 0, 1, and 2 for data from field SO.

Figure 3 :
Figure 3: (Continuation) Least squares solutions for Models 1-4 with γ = 0. Data is plotted with a star marker with error bars to indicate the standard deviation in the data and the horizontal line indicates the cutoff date for data to be used in the inverse problem.See Table5in the appendix for a list of the parameters used in each field.

Figure 4 :Figure 5 :Figure 6 :Figure 7 :
Figure 4: The solution to the inverse problem using the full ATN model with the original data (left) and a synthetic data set with increased sampling (right) is plotted with a solid line.The data used in the inverse problem is plotted with a star marker.

Figure 8 :
Figure 8: Values of η j,γ are plotted against time for γ = 0, 1, and 2 for data from field MC.

Figure 9 :
Figure 9: Values of η j,γ are plotted against time for γ = 0, 1, and 2 for data from field MO.

Figure 10 :
Figure 10: Values of η j,γ are plotted against time for γ = 0, 1, and 2 for data from field OC.

Figure 11 :
Figure 11: Values of η j,γ are plotted against time for γ = 0, 1, and 2 for data from field OO.

Figure 12 :
Figure 12: Values of η j,γ are plotted against time for γ = 0, 1, and 2 for data from field SC.

Table 1 :
Realizations of the test statistic for Model 2 compared to Model 1.

Table 2 :
Realizations of the test statistic for Model 2 compared to Model 3.
Table 5 in the appendix for a list of the parameters used in each field.

Table 3 :
Realizations of the test statistic for Model 4 compared to Model 2.

Table 4 :
. Parameter values and initial iterates used in the inverse problem on synthetic data.