EVALUATING THE IMPORTANCE OF MITOTIC ASYMMETRY IN CYTON-BASED MODELS FOR FSE-BASED FLOW CYTOMETRY DATA

We carry out computational inverse problem investigations to determine the importance of allowing for unequal label allocation to daughter cells during mitosis. Parameter estimations are performed using previously developed label-structured partial differential equation models that utilize cytons to account for variability in times between cell divisions as well as times until cell death. These models are augmented to allow for asymmetric cell divisions through inclusion of an additional model parameter. We employ well-known model comparison tests to analyze CFSE-based flow cytometry data with respect to the importance of asymmetric division during proliferation of human CD4+ and CD8+ T cells.


Introduction
The adaptive immune response constitutes a major component of the mammalian immune system's defense against invading pathogens.B cells and T cells, which are the two classes of lymphocytes that comprise the adaptive immune system, recognize invaders by specific cell surface receptors and exert their responses through humoral and cellular effector mechanisms (i.e., through antibodies and cytotoxic T cells).As the success of the adaptive immune system in overcoming a perceived threat depends on the capacity of lymphocytes to proliferate, the ability to accurately predict cell proliferation dynamics in the presence of specific environmental stimuli has important implications for human health research in areas such as the treatment and prevention of infectious disease and immunosuppression for patients receiving organ and tissue transplants.Our recent efforts [2,3,4,6,7] have focused on estimating cell proliferation parameters and quantifying uncertainty in such estimates, but thus far we have always utilized mathematical models that are based upon the assumption of symmetric cell division.Here we consider how the incorporation of asymmetric cell division, which has been described as "almost an axiom of cell biology" [22], into our division-and label-structured partial differential equation models affects their ability to accurately describe data from CFSE dilution assays.
Over the last half-century, many mathematical models have been proposed that attempt to describe the dynamics of a proliferating population of lymphocytes, but until fairly recently it has been a challenge to validate such models.The discovery of the intracellular dye carboxyfluorescein succinimidyl ester (CFSE) was a major milestone in overcoming this obstacle.CFSE was originally developed as a tool for labeling lymphocytes so that their movements within animal subjects could be tracked over many months [24], but subsequently researchers determined that the dye could also be used to monitor lymphocyte proliferation [19].Also, through the use of fluorescently labeled antibodies specific to various lymphocyte surface markers, it is now possible to follow the proliferative behavior of specific types of lymphocytes [18].
For the interested reader, Kapraun provides a detailed explanation of how CFSE and flow cytometry can be used to produce cell proliferation data [16], but the basic idea behind flow cytometry cell proliferation experiments can be described as follows.Suppose we are able to label all cells in a population with a "dye" that enters the cells and becomes covalently linked to molecules within their cytoplasm.If we then stimulate the cells to divide, mitosis causes each "mother" cell to produce two "daughter" cells.Naturally, each of the daughter cells receives a portion of the cytoplasm that originally belonged to the mother, and therefore each of the daughter cells also receives a portion of the dye that was contained in the mother.Thus, as cell proliferation continues, those cells that have undergone more divisions tend to have less dye than those that have undergone fewer divisions.This allows one to use the total dye content of a cell as an indicator of the number of divisions it has undergone.In CFSE-based flow cytometry experiments, CFSE is the "dye" used to label the cells and flow cytometry is the means by which one can measure the total dye content of individual cells.
A flow cytometer is an instrument that is capable of quickly measuring various characteristics of large numbers of individual cells.Using hydrodynamic focusing, the instrument records the fluorescent intensity of individual cells (and possibly other particles, such as fluorescent count-scaling beads) into a single file.The cells (and in our case also some of the aforementioned beads) then pass one at a time through an interrogation point, where they are excited with a laser.During this process, the flow cytometer measures the fluorescence intensity (FI) at various wavelengths for each cell in the sample and counts the number of beads in the sample.FI observed at wavelengths in the range 515 to 545 nm corresponds to light emitted by CFSE [18,24], while FI observed at other wavelengths can indicate the presence of another labeling agent and can therefore be used to identify the type of cell.Because FI induced by CFSE varies directly with the mass of CFSE within a cell, the FI observed in the relevant range of wavelengths can be used as a surrogate for CFSE mass contained in a particular cell [18,24].
The output of a CFSE-based flow cytometry experiment can be summarized using histograms such as those depicted in Figure 1.To construct these histograms, cells are placed into bins defined by ranges of CFSE FI.Because the measured FI numbers tend to vary over orders of magnitude during the course of typical multi-day experiment, it is common to use the base 10 logarithm of FI as in the figure.Note that each curve in the figure provides a summary of the information collected on a given day, and the abscissa and ordinate for each point on the curve correspond to the lower limit of a histogram bin and the number of cells belonging to that bin, respectively.Therefore, the total number of cells present in a well at a given point in time can be visualized as the total area under the curve corresponding to that time.
In previous publications [2,3,4], the authors have described methods for using CFSE-based flow cytometry data to determine T cell division and death rates (for cells having completed various numbers of divisions).In this manuscript, however, we focus on a new model in which the apportioning of CFSE during mitosis is not assumed to be even and consider whether such a model can describe one particular collection of data sets more accurately than a model based on symmetric (50-50) cell division.These data sets are comprised of CFSE-base flow cytometry information collected for two donors and two types of T cell (CD4+ and CD8+).In all cases, a special dye known as ViViD is utilized to exclude dead-but-not-disintegrated cells from cell counts.Data were collected at five time points corresponding to approximately 24, 48, 72, 96, and 120 hours after stimulation of the cells using a non-specific T cell mitogen.Full details of the experimental protocol and a precise explanation of the data we consider here are provided by Kapraun [16,Chapter 3].

Mathematical Model
Here, we summarize a new class of cell proliferation models that incorporates aspects of the cyton model of Hawkins et al. [14,15] and the separable PDE models of Schittler et al. [20] and Hasenauer et al. [13], as well as the possibility of asymmetric cell divisions as proposed by Bocharov et al. [9] and Luzyanina et al. [17].These models extend the cyton-based division-and label-structured PDE models previously used by Banks et al. [2,3,4] by including an additional parameter m that describes the degree to which cytoplasm apportionment during mitosis is asymmetric.

Basic Asymmetric Division-and Label-Structured Model for Cell Densities
Let n i (t, x) be a structured density (in cells per unit FI), where i is a whole number representing the number of divisions completed for a specific "generation" of cells, t denotes the time elapsed (in hr) since some arbitrary starting time, and x denotes FI induced by CFSE.Also, let {α i (t)}, {β i (t)}, and v(t) denote exponential division rates, exponential death rates, and the CFSE exponential decay rate, respectively (all in hr −1 ).Finally, let m ∈ (0, 1  2 ] be an "asymmetric division parameter" so that the division of each mother cell results in two daughter cells that receive proportions m and 1 − m, respectively, of the cytoplasm (and CFSE) that was contained in the mother cell.Then the dynamics of a population of cells are described by ) where x ≥ 0 and the "recruitment" terms are given by for i ≥ 1.Note that when m = 1 2 , the expression for R i (t, x) given here simplifies such that the recruitment terms match those presented previously for the symmetric division models of Banks et al. [2,3,4].The initial conditions are given by where t 0 indicates the time of the first observation and Φ(x) is the structured density for cells in the initial (undivided) population at that time.Kapraun [16] provides a detailed explanation of this model as well as a derivation from basic conservation principles.We remark here that t 0 typically coincides with the time at which the cells were stimulated to divide, but for the experimental data described in Section 1, the first observation actually occurred approximately 24 hours after stimulation.As in the model of Schittler et al. [20], the solutions to the partial differential equations (PDEs) given in (2.1) can be factored as where N i (t) indicates the number of cells having completed i divisions at time t and ni (t, x) describes the distribution of CFSE FI within that generation of cells at time t; that is, ni (t, x) is a probability density function (pdf) in the variable x, so that for any fixed t, ni (t, x) ≥ 0 for all x and ∞ 0 ni (t, x) dx = 1.
We offer the following formal proposition of this factorability of solutions below.Kapraun [16] provides a proof of this proposition following the general approach of Schittler et al. [20].
Proposition 2.1.Let {N i (t)} ∞ i=0 be a set of functions satisfying the system of weakly coupled ordinary differential equations (ODEs) given by and the initial conditions given by Also, let {n i (t, x)} ∞ i=0 be a set of functions such that each ni satisfies the PDE and the initial condition for all x ≥ 0. Then the solution to (2.1) and (2.3) is given by (2.4) for i ∈ {0, 1, 2, . ..}.

Autofluorescence
Thus far, we have described a model that accounts only for FI induced by CFSE, but as noted by Thompson [23], the experimentally measured FI of a cell is actually the sum of CFSE-induced FI and the cell's natural "autofluorescence".Therefore, following the work of Hasenauer et al. [13], we let ñi (t, x) be a structured density (in cells per unit FI), where i again denotes a specific generation of cells, t denotes time elapsed (in hr), and x denotes measured FI.Here where x and x a represent the FI due to CFSE content and cellular autofluorescence, respectively.If we assume solutions n i (t, x) to (2.1) and (2.3) have already been computed and that x a is a realization of a random variable X a with pdf f Xa (x a ; t), then the densities ñi (t, x) can be computed using the convolution formula [10] as proposed by Hasenauer, Schittler, et al. [13,20].These authors demonstrate that, under certain assumptions, this convolution can be computed quickly and efficiently [13].

Division-and Label-Structured Cyton Model for Cell Densities
The cyton model [14,15] is an alternative to (2.5) that arises from two simple assumptions.The first, which is self-evident, is that any given cell must eventually either divide or die.The second, which is based upon experimental evidence, is that the processes of cell division and death operate independently of one another [15].Thus, we can assume that the destiny of any particular cell is governed by two fixed numbers: a "time until division" and a "time until death".In particular, the actual fate of the cell (division or death) can be determined by observing which of these two numbers is smaller.For an individual cell within a population of cells sharing similar characteristics (e.g., cells of the same type having undergone the same number of divisions), it is reasonable to assume that the "time until division" and "time until death" are realizations of two independent random variables.These random variables are described by probability distributions, and so the cyton model requires parameters that can be used to uniquely determine the probability distributions for times until division and death of cells in a given population (e.g., CD4+ T cells having undergone 1 division).Hawkins et al. chose the term "cyton" to denote the "combination of independent cellular machines governing times to divide and die" and represented a cyton mathematically using a pair of probability density functions [15] .For example, if φ i and ψ i are the pdfs for time until division and time until death, respectively, of cells in generation i, then the cyton for generation i can be denoted (φ i , ψ i ).One additional consideration is that, in reality, not all cells in a given population will divide if they avoid death (at least not within the time frame of a typical in vitro cell culturing experiment) [15].Therefore, the cyton model includes the notion of "progressor fraction": for a given generation of cells, only a certain proportion have the potential to "progress" to the next generation via cell division.
In the cyton model, n div i (t) and n die i (t) are rates (in cells/hr) at which cells in generation i divide and die, respectively.The cytons {(φ i , ψ i )} are used to compute {n div i (t)} and {n die i (t)}, which are in turn used to compute {N i (t)}, the numbers of cells present in each generation i at any given time t.Banks et al. [2,3,4] and Kapraun [16] provide details of the cyton model, and use the same notation and symbols we use here.
As in earlier work [2,3,4], we incorporate the cyton model for cell numbers into the division-and label-structured model framework described previously by replacing the sink and source terms in the right-hand sides of (2.1) with terms involving the cyton-based rates {n div i (t)} and {n die i (t)} to obtain (2.10) Solutions of this system are then given by n i (t, x) = N i (t)n i (t, x), where the N i (t)'s satisfy the cyton model equations [16, Equation 2.11] and initial conditions (2.6) and each ni (t, x) satisfies (2.7) and (2.8) as before.We state this claim as a proposition.As with the previous proposition, Kapraun [16] provides a proof of this proposition following the general approach of Schittler et al. [20].
i=0 be a set of functions satisfying the cyton model [16,Equation 2.11], where the initial condition N 0 is given by the relation shown in (2.6).Also, let {n i (t, x)} ∞ i=0 be a set of functions such that each ni satisfies the PDE (2.7) and the initial condition (2.8) for all x ≥ 0. Then the solution to (2.10) with initial conditions (2.3) is given by (2.4) for i ∈ {0, 1, 2, . ..}.
It is worth noting that cells will only divide a finite number of times in the time frame of a typical in vitro cell culturing experiment.Therefore, one typically computes solutions to (2.10) only for i ∈ {0, 1, . . ., i max }, where i max is the largest number of divisions we expect a cell from the initial population to undergo during the period of observation.For a five-day experiment such as that described in Section 1, it is rare for cells to undergo more than 12 divisions.Because of the compounding expense of the recursive function calls that are required when computing numerical solutions for the asymmetric division model [16, Appendix B], we only perform computations for cells that have divided 8 or fewer times; i.e., we let i max = 8.Further explanation and justification of this choice are provided by Kapraun [16,Chapter 5].
Finally, we remark that (2.10) actually describes an entire class of models.In order to specify a particular model for further investigation, we must decide on forms for the distribution of the autofluorescence X a , the (exponential) label decay rate v(t), the cytons {(φ i (t), ψ i (t))}, and the progressor fractions {F i }.

Assumptions and Parameterization for a Specific Mathematical Model
Here, we list the assumptions for the specific cyton-based mathematical model we consider and describe the parameters that we use to designate this model.All of the parameters for our specific mathematical model are provided in Table 1.
First, we assume that the random variable X a is time-independent and has a lognormal distribution with mean and variance denoted E [X a ] and Var [X a ], respectively.Although experiments indicate that the distribution of autofluorescence does, in fact, vary somewhat with time, ignoring this time-dependence greatly reduces the number of parameters required to designate the model and still allows for a reasonable fit to summary histogram data [4].We therefore have two parameters that completely describe the distribution of autofluorescence: E [X a ] and SD [X a ] = (Var [X a ]) 1/2 , which are listed as parameters 1 and 2, respectively, in Table 1.
Next, we assume that the rate of decay in CFSE-induced FI is given by v(t) = c, where c > 0 is some constant.This follows the convention established in our earlier work [4], in which we assume that an exponential decay model is sufficient to describe decay of CFSE in experiments for which the first data collection time occurs after approximately 24 hours.Therefore, we require only one parameter to completely describe the decay of CFSE: c, which is listed as parameter 3 in the table.
As previously asserted, we assume that m ∈ (0, 1  2 ] is a constant such that the division of each mother cell results in two daughter cells that receive proportions m and 1 − m, respectively, of the cytoplasm (and CFSE) that was contained in the mother cell.This is listed as parameter 4.
We assume that each T div i has a lognormal distribution with mean and variance denoted E T div i and Var T div i , respectively.We further assume that, for i ≥ 1, all T div i are independent and identically distributed (i.i.d.).Such assumptions are consistent with earlier work using the cyton model [4,14,15], as well as experimental evidence [12].We therefore have four parameters that completely describe These are listed as parameters 5 through 8, respectively.
We also assume that the random variables T die i for i ≥ 1 are i.i.d. with a lognormal distribution, in this case with mean and variance denoted E T die and Var T die , respectively.Again such assumptions are consistent with earlier modeling work [4,14,15].We further assume that there is no death for undivided cells (i.e., those cells in generation i = 0).In reality, there tends to be a large die-off of cells following stimulation with PHA [7], but we reiterate that for the data analyzed in this manuscript the first measurements were made approximately 24 hours post-stimulation.As a result, the initial conditions for our mathematical model only reflect those cells that have not died in the first 24 hours after stimulation.Therefore, as in our earlier work [4], we assume that the cells in our "initial population" (consisting of those undivided cells that are still alive 24 hours after stimulation) that do not go on to divide experience essentially no death for the duration of the experiment.Hence, we have two parameters that completely describe which are listed as parameters 9 and 10, respectively.
The only remaining parameters that are required to characterize our model are the progressor fractions {F i }.We allow F 0 to be one of our required model parameters and assume that each progressor fraction F i with i ≥ 1 is uniquely determined by the mean and standard deviation of a "discrete normal distribution", denoted D µ and D σ , respectively.This is consistent with the "division destiny" approach for determining progressor fractions that has been employed by Hawkins et al. [15] and Banks et al. [4], and we refer the interested reader to the latter reference for a complete discussion of the method by which the progressor fractions are computed.We therefore have three parameters that can be used to determine all the progressor fractions: F 0 , D µ , and D σ , which are listed as parameters 11 through 13, respectively.
Thus, our specific model depends on exactly 13 parameters.We remark that parameters 1 through 3, while important for describing the data, are not considered to be "biologically relevant" parameters in the sense that they do not have any bearing on the proliferative behavior of a population of cells.Note that the specific cyton-based model described here closely resembles the model denoted "Model 6" (with exponential label decay) in our previous work [4], except that it allows for asymmetric division.Kapraun [16] provides precise details of the numerical methods we use for computing solutions for this model in his Appendix B.

Parameter Estimation
In order to estimate the parameters in our specific mathematical model, we must first describe a statistical model that relates observable data to the mathematical model.As was previously explained, CFSE-based flow cytometry data are typically summarized in the form of histograms, and furthermore, measured FI is commonly represented on a logarithmic scale (cf. Figure 1).Therefore, we begin by describing how our model can be used to obtain information on cell numbers in a form that can be compared directly with such summarized experimental data.
If we define the structured densities ñi (t, x) (in terms of measured FI) as in Section 2.2, then the structured density for the entire population of cells is Now, because CFSE histogram data are most commonly reported using a base 10 logarithmic scale, we make the change of variables z = log 10 (x) to obtain n(t, z) = 10 z log(10) ñ(t, 10 z ) as the structured density in cells per base 10 log unit FI.
In the discussion that follows, we let q 0 denote a hypothetical "true" parameter vector (so that, in the case of our specific mathematical model, q 0 ∈ R 13 ) and let denote the total number of cells with log (base 10) FI in the interval [z k , z k+1 ] at time t j .Also, we let B denote the (fixed) total number of beads in each sample tube and b j denote the number of beads counted for the sample measured at time t j .Now, let N j k be a random variable representing the number of cells with log FI in the interval [z k , z k+1 ) measured at time t j .Then it has been argued [4] that i.e., each N j k is normally distributed with mean I[n](t j , z k ; q 0 ) and variance B b j I[n](t j , z k ; q 0 ).Note that this does not lead to either (1) a constant variance model or (2) a constant coefficient of variance model.Although these latter two types of statistical models are commonly assumed to underly data-collection processes [8,11,21], modified residual plots indicate that (3.2) is a better choice in this case [5,23].Define the generalized least squares (GLS) parameter estimator [1,11] q GLS = argmin q∈Q J( q; {N j k }), where Q denotes the set of allowable parameter vectors, and the weights (selected to match the variance of the N j k 's) are given by The value of I * is positive to prevent division by zero, and in practice it is selected such that the modified residual plots produce uniform random patterns.We once again follow the convention established in our earlier work [4] and set I * = 200.
If we consider the measured data to be a set of realizations {n j k } of the random variables {N j k }, we can obtain the GLS parameter estimate qGLS = argmin q∈Q J( q; {n j k }). (3.5) Note that to compute the weights {w j k } we need q 0 , but to estimate q 0 we need the weights.In order to overcome this obstacle, we use a conventional generalized least squares iterative estimation procedure [1,8] as described in Algorithm 3.1.In this algorithm, note that ε is a threshold tolerance that allows the user to specify a termination criterion, q typ is a vector with elements that reflect the relative sizes of the parameters to be estimated, and "./" denotes element-wise division.Kapraun [16] provides specific details of the implementation we employ for this algorithm in his Appendix B. Algorithm 3.1 Parameter Estimation Procedure 1. Obtain initial estimate q(0) using (3.5) with w j k = 1 for all j, k.
3. Initialize the iteration counter ℓ with the value 1.
4. Do each of the following: • Compute q(ℓ) using (3.5) with current weights w j k .• Update the weights using (3.4) with q 0 replaced by q(ℓ) .

Results and Discussion
We are ultimately interested in determining whether or not refining our cell proliferation model to allow for asymmetric division is worthwhile.That is, we wish to learn whether the asymmetric division model given by (2.1) -(2.3) produces better fits with values of m < 0.5 than with m = 0.5.To make this determination, we perform parameter estimations as described in Section 3. As we mentioned in Section 2.3, the algorithm for generating solutions to the 13-parameter asymmetric model is considerably more computationally complex than that previously used by the authors to generate solutions to the 12parameter symmetric division model [2,3,4].In fact, parameter estimations for the asymmetric division model require an average of 851.3 minutes per inverse problem even when only considering cells that have divided i max = 8 or fewer times, whereas parameter estimations for the symmetric division model require and average of only 6.41 minutes per inverse problem when using i max = 16 [4].(Timings provided here are based upon runs using MATLAB Release 2012a on a Dell Optiplex 990 with eight (8) 3.4 GHz Intel Core i7-2600 processors and 8 GB of 1333 MHz memory.) Because of the computational time required, we choose to perform parameter estimations only for a select subset of the data sets that were presented and analyzed previously by Kapraun [16].Whereas he considered 4 × 162 = 648 data sets for Donor 1 and 4 × 243 = 972 data sets for Donor 2, we narrow the list of data sets by identifying the three lowest cost and three highest cost data sets for each donor and cell type.Note that the "cost" of a data set is determined by the value of the minimized GLS cost functional similar to that given in (3.3), but based on a symmetric division model.The 24 data sets satisfying these criteria are listed in Table 2.
Applying our parameter estimation scheme to the the 24 data sets listed in Table 2 leads to the results shown in Table 3.Since we are primarily interested in the asymmetric division parameter m, the table only shows estimates for that parameter.We see that in 8 of 24 cases (33.3% of the data sets), the parameter estimation scheme returns a value of precisely 0.5 for the parameter m.Recall that a value of m = 0.5 in the asymmetric division model yields model output equivalent to that of a symmetric division model.Moreover, in another 6 cases the value of m returned does not differ from 0.5 by more than 4 percent.That is, in 8+6 = 14 cases (58.3% of the data sets), the parameter estimation scheme returns a value of m in the interval [0.48, 0.5].This immediately suggests that asymmetry does not play an important role in the majority of the data sets considered.
To assess the degree to which the parameter m is influential in describing the 16 data sets for which m is not precisely 0.5, we perform statistically based model comparison tests.The approach we use is based on analysis of variance (ANOVA) hypothesis testing as outlined by Banks and Tran in their text on mathematical modeling [8].We consider two distinct mathematical models, both of which can be evaluated using the cost functional J given in (3.3).The first is the 13-parameter model described in Section 2.4 and the second is the nested model that results when the parameter m in that model is fixed at the value 0.5 (which corresponds to perfectly symmetric cell divisions).
To formulate our statistical hypotheses, we let Q ⊂ R 13 denote the set of all admissible parameters for the 13-parameter model and Q H = { q ∈ Q : H q = c} ⊂ Q be the set of admissible parameters for the nested model, where H ∈ R 1×13 and c ∈ R. Using the parameter ordering suggested by Table 1 and setting H = 0 0 0 1 0 0 0 0 0 0 0 0 and c = 0.5 results in the nested model just described.We wish to test the null hypothesis that the "true" parameter vector q 0 is in the restricted set Q H ; i.e., Therefore, we follow the usual inferential statistics practice of defining a test statistic.Let {N j k } be a set of random variables as in (3.2) with corresponding realizations {n j k } constituting observed data so that we can define the GLS estimators q GLS = argmin q∈Q J( q; {N j k }) and q H GLS = argmin We note here that the inequality J(q H GLS ; {n j k }) ≥ J(q GLS ; {n j k }) should always hold because the estimate qH GLS is obtained by optimizing over a subset of Q, while qGLS is obtained by optimizing over all of Q.Using the GLS estimators and estimates, we can define the test statistic with corresponding realization where is the number of observations in a data set.We remark that, for the five-day time series data sets considered in this study, we use 1024 bins and 5−1 = 4 time points (because the first time point is used to construct an initial condition and is not considered to be a data point), so n = 1024 × 4 = 5096.As discussed by Banks and Tran [8], the test statistic U converges in distribution to a χ 2 distribution with r = 1 degree of freedom (where r is the number of constraints defined by the system H q = c) as n → ∞.We use this statistic to test our null hypothesis.
As shown in Table 4, the costs associated with the optimal parameter vector qH GLS on the restricted set Q H tend to be significantly greater than those associated with with the optimal parameter vector qGLS on the set Q. Note that the results shown in the table were obtained only for those data sets from Table 3 for which m was not equal to 0.5.We remark that for 3 of the selected data sets J(q H GLS ; {n j k }) < J(q GLS ; {n j k }), which leads to a negative value for Û ({n j k }).While this theoretically should not occur, we point out that all estimates are obtained through numerical optimization (cf.Section 3).The optimization algorithm involved in parameter estimation apparently failed to find a global minimum on Q because it first arrived at a local minimum in these cases.Based on the very low p-values (in some cases these were below the machine precision of approximately 10 −16 and therefore appear as zeros) resulting from the model comparison tests in the majority of cases, we should reject the null hypothesis and infer that the asymmetric division parameter m is important for describing the behavior of this population of proliferating T cells.
Thus, we have arrived at two seemingly contradictory results concerning the importance of incorporating asymmetric division into our model.In one third of the data sets we examined, our parameter estimation algorithm returned a value of m = 0.5, which indicates perfectly symmetric division.For the remaining data sets, however, our parameter estimation algorithm returned a value of m less than 0.5.Furthermore, if we fix m at 0.5 for these data sets, the GLS cost tends to be significantly greater.So, for two thirds of the data sets examined, allowing for asymmetric division does appear to lead to better agreement between the model and the data.We hypothesize that the apparent importance of incorporating asymmetry that is implied by the model comparison experiments might be due to a type of experimental confounding.Recall that the statistical model we use to relate experimental data to our mathematical model is not a constant coefficient of variance model (cf.(3.2)).In fact, data points (histogram bins) with higher cell counts tend to contribute more to the cost functional given by (3.3), even despite the moderating effect of the weights.Also, the cell counts in the various bins tend to be larger in the later days of the experiment after the cells have had the opportunity to multiply their numbers through cell division.Therefore, our parameter estimation scheme tends to favor parameter sets that give good model fits (i.e., low residuals) for observations at later days (e.g., Days 4 and 5) over those that give good model fits for observations at earlier days (e.g., Days 2 and 3).As supporting evidence for this hypothesis, consider Figures 2  and 3.These figures show data from Data Set 12 of Table 3 along with the asymmetric model fits obtained (1) when m is free to take on any admissible value and (2) when m is fixed at 0.5.(Note that the latter model fit is equivalent to a symmetric division model fit.)In Figure 2, we clearly see that the symmetric division ("m fixed") model does a better job of capturing all the peaks in the data for the earlier days than does the asymmetric division ("m free") model.In examining Figure 3, it might even be argued that the symmetric division model captures the peaks in the data for the later days just as well as the asymmetric division model, but clearly the residuals produced by the asymmetric division model are smaller.Thus, despite the better overall "peak capturing" performance of the symmetric division model, the asymmetric division model is selected by the model comparison tests because it produces smaller residuals for the later days of observations.It is conceivable that the introduction of the additional parameter m allows one to obtain better fits using the asymmetric model simply because this additional degree of freedom allows one to obtain reasonably low residuals for the early days of observations while simultaneously obtaining very low residuals for the later days of observations.Perhaps applying an analysis of residuals (as described by Banks and Tran [8] and Banks et al. [1]) to the asymmetric division model would lead to a different and better statistical model upon which future parameter estimations could be based.
It should also be pointed out that, in some cases, neither the asymmetric nor the symmetric division model seem to do an adequate job of fitting CFSE FI data of the type used in this study.Figure 4, which shows data from Data Set 22 of Table 3 along with the asymmetric ("m free") and symmetric ("m fixed") model fits, illustrates this point.Notice that the fits obtained with both models are particularly poor at Days 2 and 3.While such poor fits could occur because of imperfections in the parameter estimation algorithm (e.g., optimization routines identifying local rather than global minima) or simply because some of our data sets contain unidentified experimental errors, it is also feasible that our models have yet to account for some important parameters governing T cell proliferation.

Conclusion
We have presented a new cell proliferation model that incorporates aspects of the cyton model of Hawkins et al. [14,15], the separable PDE models of Schittler     3 and optimized model fits for Days 1 through 5. et al. [20], and the asymmetric division parameter m proposed by Bocharov et al. [9] and Luzyanina et al. [17].These latter authors [9,17] consider asymmetry as well as a time delay in their analysis of the division process of mouse hepatitis virus specific CD8+ T cells.They use a Wiener filter with a smooth decreasing low-pass filter function to pretreat (reduce the noise) in the CFSE histogram data.In our models, the use of cytons accounts for variability in division times and time lags in the onset of cell divisions.Moreover, we use unfiltered data from PHA-stimulated human CD4+ and CD8+ T cell experiments.
The model comparisons performed here do not provide conclusive evidence that including a provision for asymmetric division in our models consistently leads to improved fits for the CFSE-based flow cytometry data sets we considered.In some cases, including the parameter m allowed for model fits with significantly lower costs (as evaluated using (3.3)), but in the majority of cases the estimated value of m was not substantially different from 0.5 (which corresponds to perfectly symmetric cell divisions).Upon visual inspection of some of the model fits obtained in Section 4, it was observed that the selection of the asymmetric model over the symmetric model might be due to the form of the statistical model underlying the cost functional (3.3).Thus, it is unclear as to whether the incorporation of asymmetric division into previously studied cyton-based division-and label-structure cell proliferation models is worth the associated increase in computational cost.

Figure 1 :
Figure 1: Histograms summarizing CFSE FI data at various time points.

Figure 2 :
Figure 2: Plots of data from Data Set 12 of Table 3 and optimized model fits at (a) Day 2 and (b) Day 3.

Figure 3 :
Figure 3: Plots of data from Data Set 12 of Table 3 and optimized model fits at (a) Day 4 and (b) Day 5.

Figure 4 :
Figure 4: Plots of data from Data Set 22 of Table3and optimized model fits for Days 1 through 5.

Table 1 :
Parameters for specific mathematical model with asymmetric division.
σ std.dev. of discrete normal distribution (used to compute F i for i ≥ 1)

Table 2 :
Data sets selected for determining the importance of the model parameter m.All data sets include data from five time points as described in Section 1.In all cases, the actual times for data collection coincided with 23.5, 46.0, 67.5, 94.5, and 117.5 hours after stimulation with PHA.

Table 3 :
Parameter estimates for m for 24 selected data sets.Data sets for which m = 0.5 are emphasized in boldface.The actual times for data collection coincided with 23.5, 46.0, 67.5, 94.5, and 117.5 hours after stimulation with PHA.

Table 4 :
Results of the model comparison test described in Section 4.