Determining the control networks regulating stem cell lineages in colonic crypts

The question of stem cell control is at the center of our understanding of tissue functioning, both in healthy and cancerous conditions. It is well accepted that cellular fate decisions (such as divisions, differentiation, apoptosis) are orchestrated by a network of regulatory signals emitted by different cell pop- ulations in the lineage and the surrounding tissue. The exact regulatory network that governs stem cell lineages in a given tissue is usually unknown. Here we propose an algorithm to identify a set of candi- date control networks that are compatible with (a) measured means and variances of cell populations in different com partments, (b) qualitative information on cell population dynamics, such as the existence of local controls and oscillatory reaction of the system to population size perturbations, and (c) statistics of correlations between cell numbers in different compartments. Using the example of human colon crypts, where lineages are comprised of stem cells, transit amplifying cells, and differentiated cells, we start with a theoretically known set of 32 smallest control networks compatible with tissue stability. Utilizing near-equilibrium stochastic calculus of stem cells developed earlier, we apply a series of tests, where we compare the networks’ expected behavior with the observations. This allows us to exclude most of the networks, until only three, very similar, candidate networks remain, which are most compatible with the measurements. This work demonstrates how theoretical analysis of control networks combined with only static biological data can shed light onto the inner workings of stem cell lineages, in the absence of direct experimental assessment of regulatory signaling mechanisms. The resulting candidate networks are dom- inated by negative control loops and possess the following properties: (1) stem cell division decisions are negatively controlled by the stem cell population, (2) stem cell differentiation decisions are negatively controlled by the transit amplifying cell population.


Introduction
The stem cell lineage is a basic unit of hierarchical tissues and as such it has attracted the attention of both experimental biologists, and mathematical/computational modelers. The question of stem cell control is at the center of our understanding of tissue (mal-) functioning. Every day, cells in hierarchical tissues perform their specific functions and die to be replaced by new cell divisions. This process is stochastic in nature (see e.g. Arai, 2016;Wagers et al., 2002 ), and involves very large numbers of cellular events. The cells involved in the functioning and renewal of an organ differ from each other by their division and apoptosis capabil-ities, as well as the types of signals they send and the kinds of cell fate decisions they make.
In Komarova (2013) we developed a framework of reasoning about stem cell signaling networks. Let us suppose that there are three compartments in a set cell lineage: stem cells (SCs), transit amplifying cells (TACs) and differentiated cells (DCs). In the colon and intestinal crypts, as well as other structures, these are linearly ordered with SC at the bottom, DC cells at the top, and TAC in between. In order to maintain the number of each cell type, the rate of removal of the DCs from the top is balanced by division and differentiation of the SCs and TACs below.
Each cellular compartment may receive signals from other compartments (as well as from its own cells), which influence the rate of differentiation and proliferation. For example, having too few DCs may call for the need of faster differentiation of TACs into DCs. This faster differentiation of TACs may be achieved by relieving the negative control on the probability of TACs to differenti-ate exerted by DCs. Many molecules produced by these cells have been detected ( Gregorieff et al., 2005;Hsieh, 2012;Kosinski et al., 2007 ), but their role in each step of the regulatory network is yet to be completely determined ( Crosnier et al., 2006;Medema and Vermeulen, 2011 ).
Theoretically, each cell population may influence (in a negative or positive way) each of the processes that happen in the system, which gives rise to a very large number of networks of cellular control. In Komarova (2013) we examined such controls from the (linear) stability point of view. We called the stable networks with the smallest possible number of loops the "minimal networks". In a three-compartment system, assume that the following five processes can be controlled: divisions of SCs and TACs, differentiations of SCs and TACs, and death of DCs. It turns out that in this case, the smallest number of control loops is three, and there are exactly 32 different three-loop control networks that are stable. These 32 stable networks have different topologies and different signs of control loops (positive or negative).
In the present paper we are interested in determining the most likely control network(s) that govern the regulation of human colon crypt stem cell lineages. While the analysis of Komarova (2013) restricts the total number of possibilities to 20 (plus additional 12 networks with non constant DC death terms), the analysis did not indicate which of the 32 were most likely to describe the regulation in a real biological system.
We have now determined the most likely regulatory network(s), among the 32 three-compartment stable networks, by using actual measurements of the number of SCs, TACs, and DCs in biopsies of human colon crypts, and additional mathematical methodology. Bravo and Axelrod (2013) performed detailed measurements of the number of each of the three cell types (SCs, TACs, DCs) in 49 colon crypts in human biopsy specimens. In this paper we will examine each of the 32 possible networks (including the 20 networks explicitly presented in Komarova (2013) plus 12 additional ones) to determine whether it can produce the correct measured means and variances of cell population numbers. In addition to this static information we have also used data on the dynamics of injury recovery, as well as experimentally obtained intracrypt correlations. Using these criteria, a selection algorithm was devised that identified three of the 32 possible control networks as most likely the ones corresponding to the regulation of homeostasis of human colon crypts.

Data Description
We have measured the number and location of dividing cells (Ki-67 positively stained cells) and non-dividing cells (Ki-67 nonstained cells) in 49 colon crypts in human biopsy specimens. The non-dividing cells at the bottom of the crypt are considered qui-escent stem cells, the non-dividing cells in the top two-thirds of the crypt are considered differentiated cells. The dividing cells near the bottom third of the crypt are considered to consist of transient amplifying cells and active stem cells ( Li and Clevers, 2010 ). The experimental details of the source of the specimens, measurement of each cell type, and determination of reliability of measurements, have previously been described ( Bravo and Axelrod, 2013 ). The measurements are included in the supplement to this article.
For our model we need an approximate distribution of active cells into active stem cells and transit amplifying cells. This can be done by using experimental observations, as described below. Li and Clevers (2010) have reviewed evidence for the existence of quiescent stem cells at the bottom of the crypt and, in addition, of active stem cells among the dividing cells above the bottom. The existence of a quiescent stem cell population is consistent with the observation that mTert-expressing slowly cycling cells are resistant to intestinal injury and function in intestinal regeneration ( Montgomery et al., 2011 ). And the existence of an active stem cell population is consistent with the observation that rapidly cycling Lgr5 + cells are highly sensitive to intestinal damage ( Barker et al., 2007 ). We will denote the fraction of dividing cells that are active stem cells as W , and the fraction of active cells that are transient amplifying cells as (1 − W ) . The value of W can be estimated using the following considerations.
W , the proportion of active stem cells among all of the dividing cells in human colon crypts, can be determined from data available about human colon crypts stained with the stem cell marker Musashi-1, and stained separately with the proliferating cell marker Ki-67.
The percentages of Musashi-1 positively staining cells at different positions in the human colon crypt were reported in Nishimura et al. (2003) Fig. 4 . According to this study, 69% of all positively staining cells are in positions 1-7 at the bottom of the crypt, and 31% of all positively staining cells are in positions 8 and above. Bravo and Axelrod (2013) have reported, in Table 1 , the number of Ki-67 positively and negatively staining cells at different positions of the human colon crypt. The average number of negatively staining cells in positions 1-7, is 35.7 ± 36.3 s.d., 1 and the average number of positively stained cells in positions 8 and above is 623.9 ± 234.1 s.d. The negatively stained cells at the bottom of the crypt are considered quiescent stem cells. The positively stained cells are considered to comprise all of the dividing cells, including both active stem cells and transient amplifying cells.
The probability distribution for the values of W , as well as the lower and upper bounds of W , can be determined. Using the notations defined in Table 1 , the number of active stem cells is given by and the parameter W can be calculated as follows: (1) 1 The considerable variation in the number of quiescent stem cells has previously been noted ( Table 1, Bravo and Axelrod, 2013 ). The method of measuring cell numbers and the measurement reliability has been described in detail in (Additional File 5, Bravo and Axelrod, 2013 ). The experimental error in the measurement of the number of stem cells was determined by 49 repeated measurements of one crypt, C.V. = 7.7%. Since the experimental error in repeated measurements of one crypt was much less than the measured variation between 49 adjacent crypts, C.V. = 102%, it is likely that the measured variation between adjacent crypts is really indicative of a large variation between crypts, and not just due to experimental error.

Table 1
Notations used to calculate W , the fraction of active SCs.

% SCq
The percentage of all of the stem cells that are at the bottom of the crypt, e.g. quiescent stem cells % SCa The percentage of all of the stem cells that are above the bottom of the crypt, e.g. active stem cells in the region of dividing cells # SCq The number of quiescent stem cells at the bottom of the crypt # SCa The number of active stem cells above the bottom of the crypt in the region of dividing cells # Ki 67+ The total number of dividing cells The first quotient in the right hand side of this expression can be calculated from the data of Bravo and Axelrod (2013) , and the second quotient from Nishimura et al. (2003) . The probability distribution of the estimated values of W are shown in Fig. 1 (a). It was approximated numerically using all the realizations of the cell numbers measured experimentally in Bravo and Axelrod (2013) . The mean of this distribution corresponds to W = 0 . 03 , the value used in the calculations presented here, unless otherwise noted. In Fig. 1 (b) we show the frequency histograms of the three cell types obtained from the data by Bravo and Axelrod (2013) using W = 0 . 03 ; again, this was created using all the experimentally obtained values of the cell numbers. We also investigated the effect of W = 0 , as discussed in the last paragraph of Section 3 .

Stochastic model formulation
Consider a three-compartment model consisting of stem cells (SCs), transient amplifying cells (TACs), and differentiated cells (DCs). We will refer to the number of stem cells as I 1 , the number of transient amplifying cells as I 2 , and the number of differentiated cells as I 3 . We assume only symmetric divisions of stem cells, see section Results and Discussion, and employ a Poisson process to describe the dynamics (Poisson processes, and a related birthdeath process, are conventionally used to describe cellular processes, see e.g. Nowak, 2006;Wodarz and Komarova, 2014 ). The cells are subject to the following changes during an infinitesimally small time-increment, t : • With probability L 1 ( I 1 , I 2 , I 3 ) t a stem cell (SC) divides.
• With probability D ( I 1 , I 2 , I 3 ) t , a differentiated cell dies, (I 1 , I 2 , I 3 ) → (I 1 , I 2 , I 3 − 1) . A deterministic model that captures these events can be expressed as the following system of ordinary differential equations: The equilibrium of this system, ( Ī 1 , Ī 2 , Ī 3 ) , can be obtained by solving Eqs.

Stochastic analysis
There are five distinct processes that can take place in this system: differentiation divisions of SCs ( Q 1 ), proliferation divisions of SCs ( Q 2 ), differentiation divisions of TACs ( Q 3 ), proliferation divisions of TACs ( Q 4 ), and death ( Q 5 ). The rates of these processes are given by: The stochastic description in terms of the Kolmogorov forward equation is given by the following equation for the variable ϕ I 1 ,I 2 ,I 3 , the probability to find the system in state ( I 1 , I 2 , I 3 ) at time t: where the processes of the right hand side are presented in the same order as they appear in Section 2.2 . The methodology presented here was developed in Komarova (2013) and Yang et al. (2015b ) and is related to the linear noise approximation of Van Kampen (1992) . A detailed derivation and justification can be found in Sun et al. (2016) . Let us use the symbol H I 1 ,I 2 ,I 3 to denote any of the functions L 1 ( I 1 , I 2 , I 3 ), L 2 ( I 1 , I 2 , I 3 ), P 1 ( I 1 , I 2 , I 3 ), P 2 ( I 1 , I 2 , I 3 ), and D ( I 1 , I 2 , I 3 ). Suppose that we can represent the functions H I 1 ,I 2 ,I 3 near the equilibrium as H I 1 ,I 2 ,I 3 = H( I 1 , I 2 , I 3 ) , where the parameter 1 defines the weakness of the dependence of these rates on the populations that control them. Note that in this methodology, the peak of the probability distribution of the number of cells is assumed to be located near population sizes of the order 1/ and has a width of the order of 1/ 1/2 , see the derivation in Komarova (2013) . While the validity of this approach has been studied extensively, see e.g. Gardiner (2004) and Wallace et al. (2012) , in our context it is important to note that typical fluctuations (of size 1/ 1/2 ) must remain sufficiently small compared with the typical population size ( ∼ 1/ ), such that the system will remain near the equilibrium and stochastic extinction is an unlikely event (for a time-duration which grows with 1/ ). These are conditions of homeostasis in a biological system; our approximation will technically break down outside the homeostatic regime.
It is convenient to denote the continuous variables and further shift the cell counts to be equal to zero at the equilibrium: We can expand the functions H I 1 ,I 2 ,I 3 around the equilibrium Taylor where the subscripts x 1 , x 2 , and x 3 denote the partial derivative of the function with respect to its argument, evaluated at ( Ī 1 , Ī 2 , Ī 3 ) .
Terms of the order of 2 will be dropped.
To obtain the equations for the means and variances, we will follow the stochastic calculus of stem cells methodology developed in Sun et al. (2016) . The stochastic processes defined in Section 2.2 can be characterized by the following cell number changes: Q 1 : Differentiation of SCs, 1 I 1 = −1 , 1 I 2 = 2 , 1 I 3 = 0 , Q 2 : Proliferation of SCs, 2 I 1 = 1 , 2 I 2 = 0 , 2 I 3 = 0 , Q 3 : Differentiation of TACs , 3 I 1 = 0 , Then Eq. (8) can be expressed as: (11) can be rewritten as: Here we use a standard technique to derive equations for the moments. Let us adopt the following notations for the first moments and the second moments of cell numbers: We multiply both sides of Kolmogorov forward Eq. (12) by i m and by i p i q , and sum over the indices i 1 , i 2 , i 3 , to obtain: where m = 1 , 2 , 3 and p, q = 1 , 2 , 3 . The right hand side of the equations is zero because we consider the equilibrium state and the time-derivatives in the Kolmogorov forward equation are zero. Next, we use expansions (10) in Eqs. (13) -(14) , and truncate the expressions by keeping terms of order and 2 in equations for the first and second moments respectively. This results in the following moment equations of the cell numbers: where a m j = 5 Because of definition (9) , the means y m are all zero, and solving Eq. (16) , we can obtain the expressions for the second moments, which are equal to the cell number variances, V ar[ I m ] = y mm , m = 1 , 2 , 3 .

Selection Algorithm
In Komarova (2013) , we have identified 20 different 3compartment minimal control networks that are compatible with stable homeostatic control, see Fig. 2 . These networks are characterized by constant death terms. In addition, there are 12 minimal control networks with non-constant death terms, see Fig. 3 . Horizontal arrows indicate the cell fate decisions: div1 and div2 the division process of SCs and TACs; diff1 and diff2 are the probability of the division to be a differentiation, as opposed to proliferation, for SCs and TACs respectively. The curved positive and negative arrows indicate control. The point of the arrow corresponds to the process that is being controlled, and the base of the arrow corresponds to the cell type controlling the process. Regulated death terms occur in different contexts and have been modelled in the past ( Fuertinger et al., 2012;Mahaffy et al., 1998;Stiehl et al., 2014aStiehl et al., , 2014b. All of these networks have exactly three controls (it was shown that this is the minimal number of controls compatible with stability), and include the 5 processes described in Section (2.2) . We will use a selection algorithm to determine the most likely minimal control network that matches the distribution of the measured data, see Fig. 4 . This algorithm allows us to use biological criteria to exclude many of the possible control networks depicted in Figs. 2 and 3 . It uses the data generated for the distribution of the cell numbers (see Supplement) as well as other considerations from the literature. The algorithm is demonstrated here using the 20 constant death rate networks of Fig. 2 . The 12 networks of Fig. 3 are considered in Appendix C . The following is the step-by-step procedure used.
Choose networks with local controls. Cell-cell communication may occur by direct mechanical contact or by dispersal of molecules from a source cell. Mechanisms for the dispersal of molecules include transport through cell membranes, extracellular Brownian motion, or transport on the outer cell surface ( Schier and Needleman, 2009 ). In each situation the effect of one cell is greatest on an adjacent cell and decreases on cells further away. Taking into account the spatial organization of the crypt, we therefore assume that SCs can only control SCs and TACs; that TACs can only control TACs, SCs, and DCs; and that DCs can only control DCs and TACs. It follows that 11 out of the 20 control networks have local controls, see Fig. 4 . We used Fig. 2 to select local controls, and the results are independent of the values of W .
Choose networks with stable solution and measured means and variances. Let us assume that all the control functions are linear (or consider the linearization of nonlinear controls, see Appendix A ), and the death rate of DCs is constant in our analysis (this assumption is relaxed in Appendix C ). Using Eqs. (5) -(6) as constraints, we define linearized control functions as where coefficients a, b , and c with the appropriate subscripts are constants. Functions ( 17 -20 ) are the most general linear functions compatible with identities ( 5 -6 ). We however are interested in "minimal controls", which is a restricted subset of such functions. It was shown in Komarova (2013) that for stability of a threecompartment system, it is necessary to have at least three control loops, and all three populations must be involved in the control. There are exactly 20 systems with minimal control (that is, only 3 control loops) with constant death terms, see Fig. 2 (the nonconstant death terms are included in networks of Fig. 3 and analyzed in Appendix C ). Each of these control networks is characterized by exactly one nonzero coefficient a , one nonzero coefficient b , and one nonzero coefficient c in system ( 17 -20 ). For example, the topmost network in the left column of Fig. 2 contains control of SC divisions by SCs, control of SC differentiation probabilities by TACs, and control of TAC divisions by DCs. This means that the corresponding linear system of controls, Eqs. (17) -(20) , contains nonzero coefficients with the rest of coefficients being zero. For each minimal control network, there are 5 unknown constants: the equilibrium values L 0 , D 0 , and the nonzero controls ( a, b, c ). Let us denote where q is the ratio between the division rate of the SCs and the death rate of DCs at equilibrium. Eqs. (5) and (6) imply that q ∈ (0, 1/2) (since the probability P 2 ≤ 1). Further, by rescaling the time unit, we can set D 0 = 1 . Therefore, only four unknown coefficients remain: a, b, c, q.  and solve the linear algebraic system of equations given by (16) . In particular, we can obtain the expressions for the three variances y 11 , y 22 , and y 33 . For each control network, these expressions depend on the unknowns q, a, b , and c . Using the data on the numbers of dividing and non-dividing cells for a given value of W , we can compute the numerical distributions of SCs, TACs, and DCs, and measure their means and variances. In this study we will focus on the case: W = 0 . 03 (most of the dividing cells are TACs).
Let us pick a control network, and also fix a q value; in our simulations we took q = 0 . 1 , 0 . 2 , 0 . 4 , 0 . 5 . For each q value, we have a system of equations where the left hand sides are functions of coefficients ( a, b, c ) (under fixed control network and the q value), and the right hand sides are numerically measured values of the cell number variances (under the fixed value of W , the fraction of active SCs among all dividing cells). This linear system of three equations with three unknowns can be solved to find the unknown controls a, b, c , and only the networks that have real stable solutions for at least one q value will be considered. A Mathematica file is provided in the Supplement that accomplishes this task.
In conclusion, networks #6, 17, 20 do not give stable solutions, and therefore are eliminated. The other 8 listed in Fig. 4 give realistic stable solutions.
To confirm the theoretical results, for each network, we then run numerical simulations with the coefficients obtained as described above. Note that the analysis presented here is local, in the sense that only the derivatives of the control at the equilibrium can be determined. We do not have any information on the global shapes of the control functions L 1 ( I 1 , I 2 , I 3 ), L 2 ( I 1 , I 2 , I 3 ), etc. A numerical simulation requires further assumptions on the actual functional form for all the four control functions. The simplest way is to use linear functions, Eqs. (17-20) , for controls for all values of the arguments (cell numbers). This assumption works for many control networks, but sometimes it was observed that stochastic deviations of cell numbers from the mean forced the linear control function to take values outside the realistic range (e.g. a division rate may become negative). In such cases we used nonlinear functions which have the correct values of the derivatives at the equilibrium, but are defined in the biologically relevant range, see Appendix A . A typical simulation of the control dynamics for network #1 is presented in Fig. 5 . The variance of the number of each cell type, is similar to the measured values reported in Bravo and Axelrod (2013) .
Choose networks with appropriate dynamics of recovery from perturbation. Next, we perform the eigenvalue analysis of the networks to study the injury recovery dynamics. The recovery dynamics of the measured data are oscillatory, see e.g. Paulus et al. (1992) . Theoretically, we study the eigenvalues of the linearized system around the equilibrium for each network. Using the deterministic  5. A typical simulation of network #1. Simulation starts at the experimentally measured means and finishes when the number of time steps reaches 2 · 10 7 . Here we used q = 0 . 1 and W = 0 . 03 ; a set of control coefficients which produces means and variances similar to those measured in human crypts was determined by solving system (16) . For the nonlinear control function see Appendix A . Eqs.
(2) -(4) , we can compute the Jacobian of each network (evaluating at the steady state) and its eigenvalues, thus obtain the condition of stability and oscillatory behavior, see Appendix B for details. Complex eigenvalues indicate robust oscillations. The results are listed as follows (see Eqs. (17) -(20) for the notations): • system 1: The system is always stable, and it is oscillatory if a L 1 < b P 1 4 . • system 2: The system is always stable, and it is oscillatory if a L 1 < b P 1 4 . • system 8: The system is always stable but not oscillatory. • system 9: The system is always stable but not oscillatory. • system 12: The system is always stable but not oscillatory. • system 13: The system is always stable, and it is oscillatory if From this analysis we conclude that networks #8, 9 and 12 do not have appropriate oscillatory dynamics and are therefore eliminated.

Choose networks with observed intracrypt correlations of cell types.
From the measurements, the sum of the number of SC's (total stem cells) and TACs is not correlated with the number of DCs, see Fig. 6 (a). For each candidate network, we evaluate the absence of this correlation. We do not need to investigate correlations between TACs and SCs, because these numbers are strongly correlated due to the assumption that a fraction W of dividing cells is SCs and (1 − W ) is TACs. We use W = 0 . 03 , from Section 2.1 .
From the previous step, the remaining networks are: 1, 2, 5, 13, and 18. For each of these networks, we perform simulations of the dynamics in homeostatic conditions, Fig. 6 (b-f). Each simulation starts at the experimentally measured mean and finishes when time steps reach 2 · 10 7 . Each data point in Fig. 6 is collected every 4 · 10 5 time steps: the x -value is the sum of SC and TAC population numbers, and y-value is the DC population number (that is, for each panel in 6 , we plotted 50 data points). Using the statistical package R, we can check the correlation for each network: we fit a linear regression model of the simulated DCs against the sum of simulated SCs and TACs. We then perform hypothesis testing on the linear relation: suppose the model is expressed as y = αx + β, then the null hypothesis is H 0 : α = 0 , and the alternative hypothesis is H a : α = 0. P-values of α can be obtained from R. A p-value less than 0.05 indicates that a linear correlation was unlikely due to chance, and a p-value greater than 0.05 indicates that the correlation could have been due to chance. The result is presented in Fig. 6 , and p-values are given in the caption of the figure.
From this part of the analysis we conclude that networks #5, 13, and 18 have significant correlations between the number of DCs and sum of SCs and DCs, unlike the observed data, and therefore are eliminated. However, for networks #1 and #2, the intracrypt correlations are not significant, as are the observed data not significant, and they are retained.
Revisit injury recovery dynamics. Besides the eigenvalue analysis, we also look at the actual trajectories of the cell numbers following in injury. Injury recovery measurements are available in the literature, see e.g. Paulus et al. (1992) , where the cell numbers in the mouse small intestine were measured following a perturbation of homeostasis of cell dynamics by a dose of radiation. Further, Hua et al. (2012) show very similar results, see Fig. 6(b) in their paper that depicts oscillations of DNA synthesizing cells in mouse intestinal crypts after irradiation.
We observe that oscillation trajectories in recovery dynamics feature a certain overshoot followed by diminishing oscillations around the mean number of the cells. In particular, Fig. 1(c) in Paulus et al. (1992) shows recovery oscillation of (clonogenic) stem cells, and Fig. 1(d) in the same paper shows the recovery of total cells per crypt. Even though direct measurements of oscillatory crypt recovery dynamics are only available for murine crypts, we expect that similar behavior will be observed in human crypts, see also simulations of the human colon crypt in Bravo and Axelrod (2013) , where Fig. 5 shows oscillatory behavior similar to that of mouse crypts. Therefore, we argue that a reasonable control network should exhibit oscillatory behavior both in the numbers of stem cells and in the total number of cells.
In the previous stage of the algorithm, we have used the eigenvalue analysis to study the oscillatory behavior of each candidate network, and now we check if each remaining network produces a recovery trajectory that is qualitatively similar to the measurements. For each network, we start a numerical simulation at the state of equilibrium. We reduce the total cell number to 0.8 of the unperturbed value, run the simulation when the time steps reach 2 × 10 6 , and then plot the number of cells over the time course. The numerical results for network #1 are presented in Fig. 7 (for network #2, the dynamics look very similar and are not shown). We observe that networks #1 and #2, each produce recovery dynamics of total cell numbers similar to the experimentally observed recovery dynamics reported in Paulus et al. (1992) and Hua et al. (2012) . In particular, both the number of SCs and the total number of cells are characterized by oscillatory recovery trajectories consistent with the experiments. In contrast with that, other (non-oscillatory) networks exhibit qualitatively different behavior, which is illustrated by the example of network #12, see Fig. 8 .
A note on the variance of SCs. The calculated mean number of SCs in the crypts was only slightly larger than the standard deviation. As long as parameter W (the fraction of dividing cells that are active SC) was not too much lower than its calculated mean, the system under the minimal controls studied here was able to maintain robust homeostasis. If we used W = 0 , however, we observed that under the parameter values that produced the experimentally measured variance, the SCs were subject to relatively frequent extinction events. This observation allows several interpretations. (i) If we were to interpret the inter-crypt variation as an indicator of temporal intra-crypt variation, and assumed that W = 0 , the minimal control networks studied here are not enough to explain the system behavior, and additional processes such as TAC de-differentiation activated by SCs falling below a certain level would have to be included. (ii) Alternatively, it is possible that the relatively high inter-crypt variance of the SC numbers is a consequence of inter-crypt parameter variation, and the actual temporal homeostatic variability of crypts in lower than this. (iii) The value of W does not fall much below the measured mean of W = 0 . 03 , in which case no further model modifications are necessary.

Summary of findings.
Of the 20 possible constant-death control networks ( Fig. 2 ), only networks #1 and #2 have stable dynamics that reproduce the measured mean and variance of each cell type, exhibit the correct intra-crypt correlation patterns, and show realistic oscillatory recovery dynamics. Further, as demonstrated in Appendix C , out of the 12 additional, non-constant death control networks ( Fig. 3 ), only network #27 satisfies the same criteria. The three networks, #1,#2, and#27, are shown in Fig. 9 .

Results and discussion
Investigating possible regulation of stem cell dynamics in colon and intestinal crypts has been a popular subject for computational and mathematical modeling ( Carulli et al., 2014;De Matteis et al., 2013;Kershaw et al., 2013;Van Leeuwen et al., 2006 ). This is because crypts have a few distinguishable cell types organized in a hierarchy of fewer than 2500 cells that maintain homeostasis, and can recover after perturbation. It is the kind of dynamical system that lends itself to formulating possible regulatory models, and testing the model behaviors by comparing simulation results to experimental observations of real biological crypts.
In this study we investigated the 32 theoretically possible minimal control networks for a three-compartment system consisting of SCs, TACs, and DCs. We used the data obtained on 49 human colonic crypts, where the numbers of dividing and non-dividing cells were measured. From this information assuming that the fraction W of all dividing cells were active SCs, and fraction 1 − W were TACs, we obtained the distributions of the three cell types. Using this information as well as observations of crypt recovery from injury and intra-crypt correlations, we devised an algorithm which allowed us to test all 32 networks. All but three were excluded based on their inconsistency with the measured data. In particular we found that control networks #1, #2, and #27 are the best systems that describe the measured data, see Fig. 9 .  A conceptually surprising outcome is that an argument about the interactions among the compartments of a stem cell lineage can be made based on only a very limited set of measurements, which does not contain any direct assessment of signaling mechanisms. The biological input consists of quantitative static measurements (the sample means and the variances of the cell numbers together with their correlations) and qualitative dynamic measurements (the existence of oscillations in tissue recovery process). Based on these pieces of evidence, and on our analysis of mathematically possible networks, we were able to restrict the number of possible regulatory networks to only three.
The resulting candidate networks #1 and #2 ( Fig. 2 ) are among only three networks (among the 32 networks) that consist entirely of negative loops (network #6 is the third such network, and it was eliminated at the first step of the analysis because it failed to produce a stable root with the means and variances matching the observations, see Fig. 4 ). Network #27 is one of only two networks (among those where death of DCs is controlled, Fig. 3 ) that contain two negative control loops (non-local network #29 is the second one). In general, negative feedback controls are common in biological systems at many levels, from repressor protein effects on transcription of the lac operon in DNA to the effect of insulin on glucose in the blood. Negative controls have also been invoked to model other cell lineages ( Lo et al., 2009;Mangel and Bonsall, 2008 ). Novák and Tyson (2008) emphasize the important role that negative signaling loops play in oscillatory behavior in a wide variety of biological systems.
We further notice that the candidate control networks identified by our algorithm are all very similar: out of three control loops, two occur in all the three networks. These are: (1) the negative regulation of SC division rate by the cells in the SC compartment, and (2) the negative regulation of SC differentiation probability by the TAC compartment. These control loops have a simple explanation through the mechanism of crowding: having too many SCs prevents them from further divisions, and too many TACs restricts differentiation divisions in favor of proliferation divisions. The DCs in the three candidate networks perform different tasks: they exhibit negative control of divisions (#1) and differentiations (#2) of TACs, and they regulate their own death (#27). Our current algorithm cannot distinguish between these three possibilities; further biological information may further narrow the set of candidate control networks.
Understanding the design principles of control networks is a fascinating research direction, to which this work can contribute. Given the results of our analysis, we hypothesize that perhaps the pattern of two negative control loops that occurs in all the three networks selected by our algorithm may be important for effective control of hierarchically organized tissues such as crypts.
This work should be considered more of a demonstration of principle than a final result. For example, we cannot conclude that one of the networks #1, #2 and #27 is definitely the one that acts in human colonic crypts. We have only tested the minimal (3control) regulatory networks. It is still possible that a more complicated network with more than three essential control mechanisms is in place. All we can say is that we found the two simplest (in the sense of having the minimal number of loops) control networks that are compatible with (both dynamic and static) observations in colonic crypts. The same holds for additional cellular processes that were not included in the present model, such as cell de-differentiation. The methods proposed here however can be extended to such systems, see Sun et al. (2016) . The process of de-differentiation can be especially relevant for the system in question, as the measured inter-crypt standard deviation of the SC numbers is not much smaller than their mean, and in certain regimes it may be necessary to include TAC de-differentiation to compensate for stochastic loss of SCs.
In principle, SCs are capable of three types of division: (1) asymmetric divisions, where one of the daughter cells retains the stemness property while the other is more differentiated; (2) symmetric proliferation, where both offspring are stem cells, and (3) symmetric differentiation, where both offspring have higher degree of differentiation compared to the dividing cell. A large number of studies has been devoted to understanding the symmetry of SC divisions, and it appears that in some organisms SCs divide mostly asymmetrically, and in others both division types happen, depending on the specific context. The prevalence of symmetric divisions depends on the tissue. For example, in the mouse epidermis, it has been reported that about 20% of SC divisions are symmetric in the ear and tail epidermis, while 40% of SC divisions are symmetric in the paw epidermis. In the epidermis it has been argued in the recent years that SCs divide predominantly asymmetrically ( Clayton et al., 2007;Doupé et al., 2010;Lim et al., 2013;Mascré et al., 2012 ). In crypts, however, SC symmetrical divisions play an important role ( Barker, 2014;Lopez-Garcia et al., 2010;Simons and Clevers, 2011;Snippert et al., 2010 ). Therefore, in the present paper we have used the model with symmetric divisions (types (2) and (3) above). Again, asymmetric divisions can be added by using the present methodology, see Yang et al. (2015a ).
In this paper we used the measurements of Ki-67, where the positively stained cells were identified as dividing, and the nonstained cells as non-dividing, to study cellular control networks. Different cell types (such as SCs, TACs, DCs) control various cell fate decisions. In order to convert our measurements into information on the numbers of SCs, TACs, and DCs, we used the assumption that some stem cells are quiescent and others are active. Li and Clevers (2010) has reviewed evidence for this understanding of stem cells. They describe evidence for the co-existence of quiescent and active stem cell populations in the hair follicle, small intestine, and bone marrow. The model includes interconversion between the two types of stem cells, with quiescent stem cells replenishing damaged active stem cell population, and the possibility of active cells converting to quiescent stem cells. The number of quiescent cells are regulated by negative feedback from the active stem cells and/or their progeny. This model extends the previous models that have described each cell in the stem cell population as having a probability of not dividing, or dividing symmetrically to produce more stem cells or asymmetrically to produce a stem cell and a transient amplifying cell ( Humphries and Wright, 2008 ).
Our model only includes spatial considerations in the most rudimentary sense. We exclude the networks with nonlocal control, bearing in mind the geometry of the crypts. A more detailed, spatial model can be designed to test if our conclusions still hold.
A limitation of such numerical studies is that complex spatial models do not allow for analytical solutions, and a comprehensive parameter study of the kind presented in our algorithm cannot be implemented. The advantage of our approach is that we obtained analytical expressions for the means and the variances of numbers of each cell type, and this allowed us to recover the values for the controls, given the experimentally obtained cell population measurements.
In conclusion, we have compared the simulated behavior of several possible regulatory networks with the numbers of cell types measured in human biopsy specimens, and have determined that the most likely form of regulation is by local control of stem cells on their own division, transient amplifying cells on the differentiation of stem cells, and differentiated cells on the division and differentiation of transient amplifying cells.

Acknowledgments
D.E.A. was supported by the Human Genetics Institute of New Jersey, the New Jersey Breast Cancer Research Fund, and the Rutgers Cancer Institute of New Jersey ( PA30CA072720 ). He thanks Dr. Steven Schiff of the R-CINJ for provided biopsy specimens, and Dr. Michael de Frietas for informative discussions about modeling crypt cell dynamics. N.L.K gratefully acknowledges the support of NIH grant 1 U01 CA187956-01 . We thank an anonymous reviewer for suggesting that we analyze networks with non-constant death.

Appendix A. Nonlinear control functions
For some parameter values it was possible to assume that the controls are described by functions ( 17 -20 ), as long as the probability values (functions P 1 and P 2 ) are within the interval [0, 1], and the rate functions ( L 1 and L 2 ) are nonnegative. If in the course of stochastic dynamics the functions given by Eqs. (17) -(20) fall outside these bounds, we implemented a rule where the functions were replaced by zeros or ones (that is, values inside the boundaries). For a set of parameters (specifically, for relatively small values of W ), this rule resulted in a system failure, as depicted in Fig. 10 for network #1, W = 0 . 03 . As observed, SCs stop dividing after around some time; that is L 1 ≤ 0. In the following example we explore what happens. Using Eq. (17) , one can see that the abnormal situation is due to the form of function L 1 . We have for control system #1: where i 1 , i 2 and i 3 are given by Eq. (9) . We observe that L 1 drops below zero when the SC number is sufficiently large. In the case of this control system, the dynamics often drives the population of stem cells toward such values where L 1 drops below zero. In other words, linear function L 1 attains biologically unrealistic values within the range of normal fluctuations of x . Therefore, we need to replace the linear function L 1 with a nonlinear function, which remains biologically relevant within the range of x that is attained regularly by the stochastic system. Notice that such alternative, nonlinear function L s 1 should have the following properties: • L s 1 depends only on i 1 , and it is decreasing; For other networks, we perform a similar procedure whenever the assumption of linearity is violated.

Appendix B. Eigenvalues Analysis
We first illustrate the eigenvalues analysis by computing eigenvalues on system 1. Using Fig. 2 and the notations in Eqs. (17) -(20) , system 1 can be characterized as: The Jacobian of the system (evaluating at the steady state) can be obtained using the deterministic Eqs.
(2) -(4) : By solving the roots of the characteristic polynomial | J − λI| , we obtain the eigenvalues: It follows that system 1 is always stable (all eigenvalues have negative real part), and it is oscillatory if L 0 b P 1 − 4 L 0 a L 1 > 0 (complex eigenvalues indicate robust oscillations); that is, a L 1 < b P 1 4 . We then performed the same analysis for system 2, 8, 9, 12, 13 and 18. The results are summarized below: • system 2: 2 .
-The system is always stable but not oscillatory.
• system 9: -The system is always stable but not oscillatory.
• system 12: -The system is always stable but not oscillatory.  -Eigenvalues are 2 .
-The system is always stable, and it is oscillatory if a P 2 > − L 0 b P 1 4 D 0 .
Appendix C. Analysis of non-constant death rate minimal networks of Figure 3 We followed the steps of the algorithm described in the main text to analyze the 12 minimal networks depicted in Fig. 3 . The consecutive elimination steps are shown in Fig. 11 . First, we eliminated the non-local networks, which excluded networks 29-31. Out of the remaining networks 21-28, all eight allow for control coefficients compatible with the measured means and variances. The stochastic analysis is based on the linearization of the control functions given by Eqs. (17) -(20) and the equation for the death rate, Analysis of eigenvalues shows that networks #21, #22, and #23 are non-oscillatory. Networks #24 and #25 have complex eigenvalues, but the solution for the stem cell component is non-oscillatory. Similarly, network #26 has oscillatory eigenvalues but the total number of cells is non-oscillatory. Finally, networks #27 and #28 both have complex eigenvalues, and both the stem cells and total numbers of cells have oscillatory recovery dynamics. Of these two remaining networks, #28 has statistically significant intracrypt correlations, while #27 does not, see Fig. 12 . Further, network #27 exhibits oscillatory recovery dynamics similar to that depicted in Fig. 7 . Therefore, we conclude that the only remaining candidate network is #27.

Supplementary material
Supplementary material associated with this article can be found, in the online version, at doi: 10.1016/j.jtbi.2017.06.033 .