# A dispersal-limited sampling theory for species and alleles

## Comments

## Transcription


Ecology Letters, (2005) 8: 1147–1156 doi: 10.1111/j.1461-0248.2005.00817.x

LETTER

A dispersal-limited sampling theory for species and alleles

Rampal S. Etienne¹* and David Alonso²

¹ Community and Conservation Ecology Group, University of Groningen, PO Box 14, 9750 AA Haren, The Netherlands

² Ecology and Evolutionary Biology, University of Michigan, 830 North University Av, Ann Arbor, MI 48109-1048, USA

*Correspondence: E-mail: [email protected]

Abstract

The importance of dispersal for biodiversity has long been recognized. However, it has never been advertised as vigorously as it was by Stephen Hubbell in the context of his neutral community theory. Since his book appeared in 2001, several scientists have sought and found analytical expressions for the effect of dispersal limitation on community composition, still in the neutral context. This has been done along two relatively independent lines of research that take different mathematical approaches and focus on different, yet related, types of results. Here, we study both types in a new framework that makes use of the sampling nature of the theory. We present sampling distributions that contain binomial or hypergeometric sampling on the one hand, and dispersal limitation on the other, and thus view dispersal limitation as being as ubiquitous as sampling effects. Further, we express the results of one line of research in terms of the other and vice versa, using the concept of subsamples. A consequence of our findings is that metacommunity size does not independently affect the outcome of neutral models, in contrast to a previous assertion (Ecol. Lett., 7, 2004, p. 904) based on an incorrect formula (Phys. Rev. E, 68, 2003, p. 061902, eqns 11–14). Our framework provides the basis for development of a dispersal-limited non-neutral community theory and applies in population genetics as well, where alleles and mutation play the roles of species and speciation respectively.
Keywords: Binomial sampling, biodiversity, community, dispersal-limited sampling, Ewens sampling formula, hypergeometric sampling, neutral model, random sampling.

### INTRODUCTION

The importance of dispersal in ecology has long been recognized (e.g. Grinnell 1922; MacArthur & Wilson 1967; Levins & Culver 1971; Brown & Kodric-Brown 1977; Hanski 1983; Tilman 1994; Loreau & Mouquet 1999). Yet, seldom has a more vigorous (quantitative) case been made than by Hubbell (1997, 2001), who presented a comprehensible suite of stochastic neutral models of community structure based on the fundamental processes of speciation, extinction and dispersal. In the most often cited of these models, the local community consists of $J$ individuals of different species whose offspring compete for sites that are left open after an individual dies. They compete not only with one another, but also with immigrants from outside the local community: there is a probability $m$ that an open site is colonized by an immigrant. If $m < 1$ the local community is called dispersal-limited. With probability $1 - m$, the open site is colonized by offspring of a local individual. Each individual in the local community, regardless of species, has an equal chance of colonizing the open site (the neutrality assumption). Each open site is immediately recolonized, so community size remains constant (the zero-sum assumption). The immigrants come from a regional species pool (the metacommunity; Hubbell 2001) that is in a stochastic balance between speciation and extinction. This balance is characterized by the parameter $\theta$, a composite of the speciation rate $\nu$ and metacommunity size $J_M$. Speciation in this model occurs by 'point mutation' [in other models Hubbell (2001) uses 'random fission' speciation, which is a first step towards modelling allopatric speciation].
This model resembles the continent-island infinite alleles model with Moran (1962)-like reproduction in population genetics (Wright 1931; Moran 1962; Ewens 1972); the difference with Moran (1962) reproduction is that the individual that dies does not produce any offspring that could replace it. We note that the terminology 'continent-island' is only historical; the theory also applies to a local sample from a continuous landscape.

© 2005 Blackwell Publishing Ltd/CNRS

Hubbell's (2001) model has been heavily criticized, mostly because of its neutrality assumption. But even if this assumption turns out to be untenable, we should not reject the theory completely, as this would be throwing out the baby with the bath water. It is now realized that the neutral model is the appropriate null model with which other models containing more processes should be compared. Hubbell (2001) thus effectively introduced Ockham's razor to community ecology, i.e. the maxim that science should aim at finding the minimal set of processes that can satisfactorily explain observed phenomena. However, less attention has been given to the fact that Hubbell (2001) put dispersal at the top of this minimal set. In the present study, we argue that dispersal is just as ubiquitous as sampling effects and can even be framed in the same mathematical setting.

While Hubbell (2001) presented analytical results for his model without dispersal limitation ($m = 1$) because these were already known in population genetics (Ewens 1972; Karlin & McGregor 1972), he provided only simulation results for the biologically more interesting case with dispersal limitation ($m < 1$). This made it difficult to test accurately whether the neutral model can explain observed diversity patterns, such as the species-abundance distribution, better or worse than other community models (McGill 2003).
Recently, however, analytical results for the case $m < 1$ have been found, along two distinct lines of research. These lines study the problem from the two perspectives that result from the duality of the theory (Etienne & Olff 2004b) with respect to time: forwards and backwards in time.

The forwards-in-time perspective uses a master equation approach with a Markovian description of states and transitions (McKane et al. 2000, 2004; Vallade & Houchmandzadeh 2003; Volkov et al. 2003; Alonso & McKane 2004). This has resulted in exact analytical expressions and various approximations for the 'expected number of species with a certain abundance' in a sample of $J$ individuals from a dispersal-limited local community: if $n$ is the abundance, then $E[S_n \mid \theta, m, J]$ denotes the expected number of species with this abundance in this sample. Vallade & Houchmandzadeh (2003) and subsequent studies used the shorthand notation $\langle \phi_n \rangle$ or $S(n)$ for this expectation, but we employ the longer notation to emphasize that this is an expectation that follows from the model, in contrast to the actually observed number of species with abundance $n$, which we will denote by $\Phi_n$ as in Etienne (2005). The expected number of species with a certain abundance is the classical approach to studying commonness and rarity in community ecology and also a very useful tool for exploring the behaviour of community models. However, it cannot be used to obtain accurate estimates of the model parameters.

The backwards-in-time perspective takes a genealogical, coalescent-type approach where community members are traced back to the ancestors that once immigrated into the community (Etienne & Olff 2004a,b; Etienne 2005). This line has resulted in an analytical expression for the 'joint multivariate probability of observing $S$ species with abundances' $n_1, n_2, \ldots, n_S$ in a sample of $J$ individuals from the local community. Let us denote this collection by $\vec{D}$, i.e. $\vec{D} = (n_1, n_2, \ldots, n_S)$. The joint multivariate probability is thus the likelihood $P[\vec{D} \mid \theta, m, J]$, which can be used in maximum likelihood estimation of model parameters from species-abundance data (Etienne 2005) or in other likelihood-based methods (Etienne & Olff 2005), but is less useful for studying the behaviour of the model.

Because both lines of research work on the same model and have provided exact analytical results, they must somehow be related, but until now the common framework has not been made explicit. In the present study, after presenting the basic results of the two lines of research, we build such a framework. Its most important property is the sampling nature of the theory and the role that dispersal plays in it. We introduce new distributions, called the dispersal-limited binomial and dispersal-limited hypergeometric distributions, from which the results of both lines of research arise naturally. As a result we find that the expression for $E[S_n \mid \theta, m, J]$ for finite metacommunity size, as reported by Vallade & Houchmandzadeh (2003), is incorrect. An important consequence is that it is not possible to estimate metacommunity size, and hence the speciation rate, from species-abundance data, as was suggested on the basis of this formula (Alonso & McKane 2004, p. 904). Next, we link the two lines of research by expressing results of one line in terms of the other and vice versa, making use of the concept of subsamples. Most of our results are summarized in Table 1. We end with a discussion of our results that tries to open new doors to further development of neutral as well as non-neutral theories in community ecology and population genetics.
### RESULTS OF THE TWO LINES OF RESEARCH

#### No dispersal limitation

Without dispersal limitation ($m = 1$), $E[S_n \mid \theta, J]$ is given by (Moran 1958; Watterson 1974; Vallade & Houchmandzadeh 2003):

$$E[S_n \mid \theta, J] = \frac{\theta}{n}\,\frac{\Gamma(J+1)}{\Gamma(J+1-n)}\,\frac{\Gamma(J+\theta-n)}{\Gamma(J+\theta)} \tag{1}$$

The multivariate probability distribution is given by the Ewens sampling formula (Ewens 1972)

$$P[\vec{D} \mid \theta, J] = \frac{J!}{\prod_{i=1}^{S} n_i \prod_{j=1}^{J} \Phi_j!}\,\frac{\theta^S}{(\theta)_J} \tag{2}$$

where $\Phi_j$ is the observed number of species with abundance $j$, as we noted above, and $(\theta)_J$ is the Pochhammer symbol defined as

$$(\theta)_J = \frac{\Gamma(\theta+J)}{\Gamma(\theta)} = \prod_{i=1}^{J} (\theta+i-1) = \sum_{j=1}^{J} s(J,j)\,\theta^j \tag{3}$$

where $\Gamma(x)$ is the gamma function and $s(j,k)$ is the so-called unsigned Stirling number of the first kind. We will frequently use the last two equalities in our formulas below. We also note that $s(j,1) = \Gamma(j) = (j-1)!$. Below we will also frequently use the definition of the beta function:

$$B(a,b) = \frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a+b)} = \int_0^1 x^{a-1}(1-x)^{b-1}\,dx \tag{4}$$

In Pochhammer notation, eqn 1 becomes even more compact:

$$E[S_n \mid \theta, J] = \frac{\theta}{n}\,\frac{(J+1-n)_n}{(J+\theta-n)_n} \tag{5}$$

Note that $J_M$ does not enter eqns 1 and 2, except through its role in $\theta$. Below, we make this more explicit.

Table 1 Overview of the analytical results for the species-abundance distribution of a local sample in neutral community theory

| | Quantity | $J_M \to \infty$ | $J_M < \infty$ |
|---|---|---|---|
| $m = 1$ | $E[S_n \mid \theta, J]$ | $\int_0^1 P_{\mathrm{bin}}[n \mid x, J]\,\Omega(x)\,dx$ | $\sum_{j=1}^{J_M} P_{\mathrm{hyp}}[n \mid j, J_M, J]\,E[S_j \mid \theta, J_M]$ |
| $m = 1$ | $P[\vec{D} \mid \theta, J]$ | $\dfrac{\prod_{i=1}^{S} \int_0^1 P_{\mathrm{bin}}[n_i \mid x, J_i]\,\Omega(x)\,dx}{\prod_{j=1}^{J} \Phi_j!}$ | $\dfrac{\prod_{i=1}^{S} \sum_{j=1}^{J_M} P_{\mathrm{hyp}}[n_i \mid j, J_M, J_i]\,E[S_j \mid \theta, J_M]}{\prod_{j=1}^{J} \Phi_j!}$ |
| $m < 1$ | $E[S_n \mid \theta, m, J_M, J]$ | $\int_0^1 P^{DL}_{\mathrm{bin}}[n \mid m, x, J]\,\Omega(x)\,dx$ | $\sum_{j=1}^{J_M} P^{DL}_{\mathrm{hyp}}[n \mid m, j, J_M, J]\,E[S_j \mid \theta, J_M]$ |
| $m < 1$ | $P[\vec{D} \mid \theta, m, J_M, J]$ | $\dfrac{\prod_{i=1}^{S} \int_0^1 P^{DL}_{\mathrm{bin}}[n_i \mid m, x, J_i]\,\hat{\Omega}[x \mid \theta, m, \vec{D}_{i+1}]\,dx}{\prod_{j=1}^{J} \Phi_j!}$ | $\dfrac{\prod_{i=1}^{S} \int_0^1 P^{DL}_{\mathrm{bin}}[n_i \mid m, x, J_i]\,\hat{\Omega}[x \mid \theta, m, \vec{D}_{i+1}]\,dx}{\prod_{j=1}^{J} \Phi_j!}$ |

Within each row the expressions in the two columns are equal. Let the entire metacommunity consist of $J_M$ individuals and let the sample consist of $J$ individuals of $S$ different species with abundances $n_1, n_2, \ldots, n_S$. Let us denote this sample by $\vec{D}$, i.e. $\vec{D} = (n_1, n_2, \ldots, n_S)$; $\Phi_j$ is the number of species in the sample that have abundance $j$. The model parameters are the fundamental biodiversity number $\theta$, which is a measure of the regional diversity, and the fundamental dispersal number $I$. The immigration probability $m$ is a function of $I$, see eqn 8: $m = I/(I+J-1)$. The quantities $E[S_n \mid \theta, J]$ and $E[S_n \mid \theta, m, J_M, J]$ represent the expected number of species with abundance $n$ in the cases without dispersal limitation ($I = \infty$, i.e. $m = 1$) and with dispersal limitation ($I < \infty$, i.e. $m < 1$) respectively, according to the neutral model. $\Omega(x)\,dx$, where $\Omega(x)$ is given by eqn 21, is the number of species with relative abundance between $x$ and $x + dx$ in the metacommunity (regional species pool); $\hat{\Omega}[x \mid \theta, m, \vec{D}_{i+1}]\,dx$ is a modified version of that, see eqn 39. The probabilities $P[\vec{D} \mid \theta, J]$ and $P[\vec{D} \mid \theta, m, J]$ represent the joint multivariate probability of observing $S$ species with abundances $n_1, n_2, \ldots, n_S$ in a sample of $J$ individuals, again for the cases without and with dispersal limitation respectively. $P_{\mathrm{bin}}[n \mid x, J]$, $P_{\mathrm{hyp}}[n \mid j, J_M, J]$, $P^{DL}_{\mathrm{bin}}[n \mid m, x, J]$ and $P^{DL}_{\mathrm{hyp}}[n \mid m, j, J_M, J]$ are the binomial, hypergeometric, dispersal-limited binomial and dispersal-limited hypergeometric distributions respectively, given in eqns 20, 15, 24 and 28. These four distributions are the distributions by which the expressions for the regional species-abundance distribution must be weighted to obtain the expressions for the local sample. The binomial distribution $P_{\mathrm{bin}}[n \mid x, J]$ and the hypergeometric distribution $P_{\mathrm{hyp}}[n \mid j, J_M, J]$ are the limits of the dispersal-limited hypergeometric distribution $P^{DL}_{\mathrm{hyp}}[n \mid m, j, J_M, J]$ for $m \to 1$ in the cases $J_M \to \infty$ and $J_M < \infty$ respectively.

#### Dispersal limitation

With dispersal limitation ($m < 1$) and metacommunity size $J_M$ tending to infinity, $E[S_n \mid \theta, m, J]$ is given by Vallade & Houchmandzadeh (2003) and Alonso & McKane (2004):

$$E[S_n \mid \theta, m, J] = \binom{J}{n} \int_0^1 \frac{(Ix)_n\,[I(1-x)]_{J-n}}{(I)_J}\,\frac{\theta\,(1-x)^{\theta-1}}{x}\,dx \tag{6}$$
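As a numerical illustration of eqns 1 and 5, the following minimal Python sketch evaluates the expected species-abundance distribution without dispersal limitation. The function names and the parameter values ($\theta = 5$, $J = 50$) are illustrative assumptions, not taken from the paper. Two standard properties of the Ewens model serve as checks: the expected abundances add up to the sample size $J$, and the expected number of species equals $\sum_{i=0}^{J-1} \theta/(\theta+i)$.

```python
from math import exp, lgamma


def expected_species(n: int, theta: float, J: int) -> float:
    """Eqn 1: expected number of species with abundance n in a sample of J (m = 1)."""
    # Work with log-gamma values to avoid overflow for large J.
    log_ratio = (lgamma(J + 1) - lgamma(J + 1 - n)
                 + lgamma(J + theta - n) - lgamma(J + theta))
    return theta / n * exp(log_ratio)


def expected_species_pochhammer(n: int, theta: float, J: int) -> float:
    """Eqn 5: the same quantity written with Pochhammer symbols (a)_n."""
    numerator = 1.0    # (J + 1 - n)_n
    denominator = 1.0  # (J + theta - n)_n
    for k in range(n):
        numerator *= J + 1 - n + k
        denominator *= J + theta - n + k
    return theta / n * numerator / denominator


# Illustrative parameter values (assumptions, not from the paper).
theta, J = 5.0, 50
sad = [expected_species(n, theta, J) for n in range(1, J + 1)]
total_individuals = sum(n * s for n, s in zip(range(1, J + 1), sad))
total_species = sum(sad)
print(total_individuals, total_species)
```

Only the composite parameter $\theta$ and the sample size $J$ appear as arguments, in line with the remark that $J_M$ enters eqns 1 and 2 only through $\theta$.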
In eqn 6 we used the notation of Etienne (2005) for later comparison. Here, $\binom{J}{n}$ is the usual binomial coefficient,

$$\binom{J}{n} = \frac{J!}{n!\,(J-n)!} \tag{7}$$

and $I$ is a transformed immigration parameter,

$$I = \frac{m}{1-m}\,(J-1) \tag{8}$$

The parameter $I$ is called $\mu$ in Vallade & Houchmandzadeh (2003) and $\gamma$ in Alonso & McKane (2004), while $Ix$ is called $\lambda$ in Volkov et al. (2003). $I$ is related to the immigration probability $m$ and local community size $J$ as the fundamental biodiversity number $\theta$ is related to the speciation probability $\nu$ and metacommunity size $J_M$ (Vallade & Houchmandzadeh 2003; Alonso & McKane 2004; Etienne 2005),

$$\theta = \frac{\nu}{1-\nu}\,(J_M-1) \tag{9}$$

In analogy to $\theta$, we will call $I$ the 'fundamental dispersal number'.

Vallade & Houchmandzadeh (2003) derived a different expression for $E[S_n \mid \theta, m, J_M, J]$ for finite metacommunity size $J_M$:

$$E^{*}[S_n \mid \theta, m, J_M, J] = \binom{J}{n} \sum_{j=1}^{J_M} \frac{\left(I\frac{j}{J_M}\right)_n \left[I\left(1-\frac{j}{J_M}\right)\right]_{J-n}}{(I)_J}\,E[S_j \mid \theta, J_M] \tag{10}$$

We will show below that this expression is incorrect (hence the *), and that the expression for $E[S_n \mid \theta, m, J_M, J]$ for finite $J_M$ is also given by eqn 6. This important finding that $J_M$ only enters the formulae through $\theta$, see eqn 9, will be discussed later.

The joint multivariate probability distribution for $m < 1$ is given by a new sampling formula (Etienne 2005)

$$P[\vec{D} \mid \theta, m, J] = \frac{J!}{\prod_{i=1}^{S} n_i \prod_{j=1}^{J} \Phi_j!}\,\frac{\theta^S}{(I)_J} \sum_{A=S}^{J} K(\vec{D}, A)\,\frac{I^A}{(\theta)_A} \tag{11}$$

Here, the $K(\vec{D}, A)$ for $A = S, \ldots, J$ are coefficients fully determined by the data, being defined as

$$K(\vec{D}, A) = \sum_{\{a_1, \ldots, a_S \,\mid\, \sum_{i=1}^{S} a_i = A\}} \prod_{i=1}^{S} \frac{s(n_i, a_i)\,s(a_i, 1)}{s(n_i, 1)} \tag{12}$$

In Appendix A (see Supplementary Material) we show that eqn 11 can also be written in integral notation

$$P[\vec{D} \mid \theta, m, J] = \frac{J!}{\prod_{i=1}^{S} n_i! \prod_{j=1}^{J} \Phi_j!}\,\frac{\theta^S}{(I)_J} \int_0^1 \cdots \int_0^1 \prod_{i=1}^{S} \left[\frac{(1-x_i)^{\theta-1}}{x_i}\,(I_i x_i)_{n_i}\right] dx_1 \ldots dx_S \tag{13}$$

where

$$I_i = I \prod_{k=1}^{i-1} (1-x_k) \tag{14}$$

Equation 13 provides a way to avoid Stirling numbers in computing the multivariate probability, e.g. by Monte Carlo integration. This will, however, be very computationally intensive for a large number of species $S$. We also note that eqns 2 and 11 must be multiplied by $\prod_{j=1}^{J} \Phi_j! / S!$ if the species are labelled in some way because their identity matters (Johnson et al. 1997, chapter 41).

### THE SAMPLING NATURE OF THE NEUTRAL THEORY

The essential difference between the actual distribution of species abundances in the whole community and the observed abundance distribution in samples was already recognized by Fisher et al. (1943), and addressed by using Poisson random sampling (Pielou 1969; Bulmer 1974) and, more recently and in a fully exact way, by using hypergeometric random sampling (Dewdney 1998). In population genetics, it was immediately acknowledged that the Ewens sampling formula represents a theory where such sampling effects are fully taken into account (hence the name). However, it has not been emphasized enough in community ecology that this is also true for Hubbell's (2001) extension of the theory that includes dispersal limitation. In this section, we emphasize this by building a single sampling framework that contains the previous expressions that come from the two separate lines of research.

A particular property of our model formulation is the invariance of the formulae under hypergeometric sampling (drawing without replacement), i.e. if we take a subsample of size $J_2$ from a sample of size $J_1$ ($J_1 > J_2$), then the formulae for the subsample are identical to those for the sample when we simply substitute $J_2$ for $J_1$. The mathematical formulation is as follows. We first define the hypergeometric distribution as

$$P_{\mathrm{hyp}}[n \mid j, J_1, J_2] = \frac{\binom{j}{n}\binom{J_1-j}{J_2-n}}{\binom{J_1}{J_2}} \tag{15}$$

which is the probability of sampling $n$ individuals of a species in a subsample of size $J_2$ given that there are $j$ individuals of this species in the sample of size $J_1$. More generally, given a sample of size $J_1$ that contains $S_1$ species with abundances $j_1, \ldots, j_{S_1}$, the probability of drawing a subsample of size $J_2$ with abundances $n_1, \ldots, n_{S_1}$ (some of which may equal 0) is given by

$$P_{\mathrm{hyp}}[\vec{D}_2 \mid \vec{D}_1, J_1, J_2] = \frac{\prod_{i=1}^{S_1} \binom{j_i}{n_i}}{\binom{J_1}{J_2}} \tag{16}$$

where $\vec{D}_1 = (j_1, \ldots, j_{S_1})$ and $\vec{D}_2 = (n_1, \ldots, n_{S_1})$, with some of the $n_i$ equalling 0 if $S_2 < S_1$. Invariance under sampling then means

$$E[S_n \mid \theta, m, J_2] = \sum_{j=n}^{J_1} P_{\mathrm{hyp}}[n \mid j, J_1, J_2]\,E[S_j \mid \theta, m, J_1] \tag{17a}$$

$$P[\vec{D}_2 \mid \theta, m, J_2] = \sum_{\vec{D}_1} P_{\mathrm{hyp}}[\vec{D}_2 \mid \vec{D}_1, J_1, J_2]\,P[\vec{D}_1 \mid \theta, m, J_1] \tag{17b}$$

where the sum in the second line is over all distinct data sets $\vec{D}_1$ that have size $J_1$.

#### No dispersal limitation

When there is no dispersal limitation, a local community is a simple sample from the metacommunity. Then we have eqn 17a with $J_1 = J_M$ and $J_2 = J$; hence

$$E[S_n \mid \theta, J] = \sum_{j=1}^{J_M} P_{\mathrm{hyp}}[n \mid j, J_M, J]\,E[S_j \mid \theta, J_M] \tag{18}$$

For infinite metacommunity size $J_M$ this can also be written as

$$E[S_n \mid \theta, J] = \int_0^1 P_{\mathrm{bin}}[n \mid x, J]\,\Omega(x)\,dx \tag{19}$$

where $P_{\mathrm{bin}}[n \mid x, J]$ is the binomial distribution (drawing with replacement),

$$P_{\mathrm{bin}}[n \mid x, J] = \binom{J}{n} x^n (1-x)^{J-n} \tag{20}$$

and

$$\Omega(x) = \frac{\theta\,(1-x)^{\theta-1}}{x} \tag{21}$$

is the abundance distribution in the infinite metacommunity (Ewens 1972; Alonso & McKane 2004; see also Table 1). We remark that the binomial distribution is the limit of the hypergeometric distribution for infinite metacommunity size (in which case there is no difference between sampling with and without replacement). Equations 18 and 19 are identical for finite $J_M$ as well: they both lead to eqn 1, the former due to the sampling nature of the theory expressed in eqn 17a, the latter by recognizing the beta distribution in the integrand and writing factorials as gamma functions:

$$
\begin{aligned}
E[S_n \mid \theta, J] &= \int_0^1 \binom{J}{n} x^n (1-x)^{J-n}\,\frac{\theta\,(1-x)^{\theta-1}}{x}\,dx \\
&= \theta\,\frac{\Gamma(J+1)}{\Gamma(n+1)\,\Gamma(J-n+1)}\,\frac{\Gamma(n)\,\Gamma(\theta+J-n)}{\Gamma(\theta+J)} \\
&= \frac{\theta}{n}\,\frac{\Gamma(J+1)}{\Gamma(J-n+1)}\,\frac{\Gamma(\theta+J-n)}{\Gamma(\theta+J)}
\end{aligned}
\tag{22}
$$

#### Dispersal limitation

With dispersal limitation, the local community is no longer a simple hypergeometric sample from the metacommunity. It is a dispersal-limited hypergeometric sample (which is dispersal-limited binomial for infinite $J_M$). We will derive an expression for the corresponding distribution. We first consider a metacommunity of infinite size. Let us write eqn 6 as (see also Table 1)

$$E[S_n \mid \theta, m, J] = \int_0^1 P^{DL}_{\mathrm{bin}}[n \mid m, x, J]\,\Omega(x)\,dx \tag{23}$$

where

$$P^{DL}_{\mathrm{bin}}[n \mid m, x, J] = \binom{J}{n} \frac{(Ix)_n\,(I(1-x))_{J-n}}{(I)_J} \tag{24}$$

and $\Omega(x)$ is given by eqn 21. Equation 24 was first calculated in the context of a stochastic model of community dynamics based on the community matrix (McKane et al. 2000; Solé et al. 2000), and then applied to the context of neutral community ecology (Volkov et al. 2003; McKane et al. 2004). It also appears in a similar model in population genetics (Wakeley & Takahashi 2004). Mathematically, it is known as the negative hypergeometric distribution, which is a special case of the Pólya-Eggenberger distribution, which in turn is a special case of the unified hypergeometric distribution (Johnson et al. 1997, chapters 39 and 40). In eqn 23, $P^{DL}_{\mathrm{bin}}[n \mid m, x, J]$ must be interpreted as the probability for a dispersal-limited species of relative abundance $x$ in the metacommunity (with infinite size) to be represented by exactly $n$ individuals in a sample of size $J$ (McKane et al. 2004). Our notation $P^{DL}_{\mathrm{bin}}[n \mid m, x, J]$ refers to the fact that eqn 24 is the dispersal-limited binomial distribution; it becomes the binomial distribution (eqn 20) as $m \to 1$ (Alonso & McKane 2004). We can generalize eqn 24 to

$$P^{DL}_{\mathrm{bin}}[\vec{D}_1 \mid m, \vec{D}_2, J] = \frac{J!}{n_1! \ldots n_S!}\,\frac{\prod_{i=1}^{S} (I_i x_i)_{n_i}}{(I)_J} \tag{25}$$

where $I_i$ is given by eqn 14 and $\vec{D}_2$ is a vector of relative abundances $x_i$.
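The dispersal-limited binomial distribution of eqn 24, together with the transformation of eqn 8, can be sketched numerically as follows. This is a minimal Python illustration rather than the authors' implementation; the function names and the parameter values are assumptions chosen for the example. It evaluates the Pochhammer symbols through log-gamma functions and assumes $0 < x < 1$, $0 < m < 1$ and $J \geq 2$.

```python
from math import comb, exp, lgamma


def fundamental_dispersal_number(m: float, J: int) -> float:
    """Eqn 8: I = m / (1 - m) * (J - 1), for 0 < m < 1."""
    return m / (1.0 - m) * (J - 1)


def log_pochhammer(a: float, k: int) -> float:
    """Logarithm of (a)_k = Gamma(a + k) / Gamma(a), for a > 0."""
    return lgamma(a + k) - lgamma(a)


def dl_binomial_pmf(n: int, m: float, x: float, J: int) -> float:
    """Eqn 24: dispersal-limited binomial probability of n out of J individuals."""
    I = fundamental_dispersal_number(m, J)
    log_p = (log_pochhammer(I * x, n)
             + log_pochhammer(I * (1.0 - x), J - n)
             - log_pochhammer(I, J))
    return comb(J, n) * exp(log_p)


def binomial_pmf(n: int, x: float, J: int) -> float:
    """Eqn 20: ordinary binomial probability, the m -> 1 limit of eqn 24."""
    return comb(J, n) * x**n * (1.0 - x) ** (J - n)


# Illustrative values (assumptions): strong dispersal limitation vs. almost none.
J, x = 10, 0.3
limited = [dl_binomial_pmf(n, 0.1, x, J) for n in range(J + 1)]
nearly_free = [dl_binomial_pmf(n, 0.9999, x, J) for n in range(J + 1)]
plain = [binomial_pmf(n, x, J) for n in range(J + 1)]
```

In this sketch the probabilities sum to one and the mean stays at $Jx$ for any $m$, while the spread around the mean widens as $m$ decreases; for $m$ close to 1 the values approach those of eqn 20.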
Equation 25 provides an alternative derivation of eqn 13; this is most easily done with the 'labelled-species' form of eqn 11.

For finite metacommunity size the analogue of the dispersal-limited binomial distribution $P^{DL}_{\mathrm{bin}}$ will be called the dispersal-limited hypergeometric distribution $P^{DL}_{\mathrm{hyp}}$. Here, we derive an expression for this distribution. We follow the second line of research in tracing back individuals in a sample from the local community to their ancestors that once immigrated into that local community (Etienne & Olff 2004b). These ancestors represent a sample from the metacommunity and thus obey all the formulae we have presented for the case $m = 1$. We only need to establish the link between the current sample and this sample of ancestors.

Let the sample of ancestors contain $A$ ancestors. Its probability distribution is also governed by the Ewens sampling formula, with parameter $I$ (Etienne & Olff 2004b; see Wakeley 1998 for a similar equation in population genetics):

$$P[A \mid m(I), J] = s(J, A)\,\frac{I^A}{(I)_J} \tag{26}$$

Let there be $a$ ancestors of the species under consideration. The probability of finding $a$ ancestors of this species, given that there are $j$ individuals of this species in the metacommunity, is the hypergeometric distribution $P_{\mathrm{hyp}}[a \mid j, J_M, A]$ of eqn 15. The probability that $a$ ancestors have $n$ descendants among the $J$ individuals in our dispersal-limited sample is computed as follows. From combinatorics it is known that there are $s(J, A)$ partitions of $J$ individuals into $A$ groups (each group containing at least one individual). For example, if $J = 4$ and $A = 3$, the possible partitions are (a, b, cd), (a, bc, d), (ab, c, d), (ac, b, d), (ad, b, c) and (a, bd, c). Likewise there are $s(n, a)$ partitions of $n$ individuals into $a$ groups and $s(J-n, A-a)$ partitions of the remaining $J-n$ individuals into $A-a$ groups. There are $\binom{J}{n}$ ways of choosing $n$ out of $J$ individuals. Likewise, there are $\binom{A}{a}$ ways of choosing $a$ out of $A$ ancestors. The probability $P[n \mid a, A, J]$ that $n$ individuals in our local community sample descend from exactly $a$ ancestors in our metacommunity sample is given by Wakeley (1999)

$$P[n \mid a, A, J] = \frac{\binom{J}{n}}{\binom{A}{a}}\,\frac{s(n, a)\,s(J-n, A-a)}{s(J, A)} \tag{27}$$

The dispersal-limited hypergeometric distribution is therefore a sum of the product of the three probabilities given in eqns 15, 26 and 27 over all possible values of $A$ and $a$:

$$
\begin{aligned}
P^{DL}_{\mathrm{hyp}}[n \mid m, j, J_M, J] &= \sum_{A=1}^{J} \sum_{a=1}^{n} P[n \mid a, A, J]\,P_{\mathrm{hyp}}[a \mid j, J_M, A]\,P[A \mid m(I), J] \\
&= \binom{J}{n} \sum_{A=1}^{J} \sum_{a=1}^{n} s(n, a)\,s(J-n, A-a)\,\frac{I^A}{(I)_J}\,\frac{1}{\binom{A}{a}}\,P_{\mathrm{hyp}}[a \mid j, J_M, A]
\end{aligned}
\tag{28}
$$

For $m \to 1$, $I$ becomes infinite and only the term with $A = J$ and $a = n$ contributes to the sum, so eqn 28 becomes $P_{\mathrm{hyp}}[n \mid j, J_M, J]$, because $s(n, n) = 1$. For $J_M \to \infty$, the hypergeometric distribution $P_{\mathrm{hyp}}[a \mid j, J_M, A]$ becomes the binomial with parameter $x = j/J_M$ and the remaining sums in terms of Stirling numbers and powers of $x$ can be written as Pochhammer symbols, resulting in eqn 24. So, the new dispersal-limited hypergeometric distribution has the right limit behaviour. For any value of $J_M$, when $m$ tends to 1, it tends to the random hypergeometric sampling distribution. When $J_M$ tends to infinity, for any value of $m$, it tends to the dispersal-limited binomial distribution. With the new distribution (eqn 28), we can write the analogue of eqn 23 for finite $J_M$ (see also Table 1):

$$E[S_n \mid \theta, m, J_M, J] = \sum_{j=1}^{J_M} P^{DL}_{\mathrm{hyp}}[n \mid m, j, J_M, J]\,E[S_j \mid \theta, J_M] \tag{29}$$

When we compare this to the result of Vallade & Houchmandzadeh (2003) given in eqn 10, we see that these expressions are different in general, being only equal for infinite $J_M$, for which we have eqn 23. The expression of Vallade & Houchmandzadeh (2003) given in eqn 10 is incorrect, because it is not invariant under hypergeometric sampling. In fact, it corresponds to an approximate discretization of the exact integral result (eqn 6) and only converges to eqn 6 when $J_M$ tends to infinity (see Appendix B). In Fig. 1 we show that eqn 10 converges to the exact result (eqn 6) when $J_M$ is large enough, but substantially deviates from it for lower values of $J_M$.

[Figure 1: two panels plotting $E[S_n \mid m, \theta, J]$ against abundance (logarithmic axis), comparing Eq. 6 with Eq. 10 for $J_M = 10^6$ and for $J_M = 10^4$.]

Figure 1 Example of the difference in expected number of species between the exact result (eqn 6) and the approximation (eqn 10) by Vallade & Houchmandzadeh (2003) for two different values of metacommunity size. The parameter values used are $\theta = 50$ and $m = 0.5$. Local community size is $J = 20\,000$. Particularly the diversity of species with low abundances is underestimated with eqn 10. The lower and upper boundaries of the abundance classes are such that abundance class $i$ contains all abundances $n$ for which $2^{i-1} \leq n < 2^i$.

As in the case without dispersal limitation, the expressions (eqns 23 and 29) for infinite and finite metacommunity size $J_M$ are identical, as we show in Appendix C (see also Table 1).

The dispersal-limited hypergeometric distribution can be generalized to

$$
\begin{aligned}
P^{DL}_{\mathrm{hyp}}[\vec{D}_1 \mid m, \vec{D}_2, J_M, J] = {} & \frac{J!}{n_1! \ldots n_S!} \sum_{A=1}^{J} \sum_{a_1=1}^{n_1} \cdots \sum_{a_{S-1}=1}^{n_{S-1}} \left[\prod_{i=1}^{S-1} s(n_i, a_i)\right] s\!\left(J - \sum_{i=1}^{S-1} n_i,\; A - \sum_{i=1}^{S-1} a_i\right) \\
& \times \frac{a_1! \ldots a_S!}{A!}\,\frac{I^A}{(I)_J}\,P_{\mathrm{hyp}}[\vec{a} \mid \vec{j}, J_M, A]
\end{aligned}
\tag{30}
$$

which leads to eqn 11 when applied to a sample from the metacommunity [which is governed by the ('labelled-species' form of the) Ewens sampling formula (eqn 2)]. While eqn 28 has a parallel expression in population genetics (Wakeley 1999), its generalization (eqn 30) is, to our knowledge, entirely new.
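The claim that metacommunity size enters only through $\theta$ can be checked numerically for small samples. The Python sketch below implements eqn 28 by direct summation and plugs it into eqn 29; it is an illustration under stated assumptions, not the authors' code. The function names and parameter values are hypothetical, the Stirling numbers are computed from the standard recurrence for unsigned Stirling numbers of the first kind (which count permutations of $n$ elements with $k$ cycles), and the sum over $a$ is started at 0 with the convention $s(0, 0) = 1$ so that the case $n = 0$ is also covered.

```python
from functools import lru_cache
from math import comb, exp, lgamma


@lru_cache(maxsize=None)
def stirling1(n: int, k: int) -> int:
    """Unsigned Stirling number of the first kind, with s(0, 0) = 1."""
    if n == 0 and k == 0:
        return 1
    if n == 0 or k == 0 or k > n:
        return 0
    return (n - 1) * stirling1(n - 1, k) + stirling1(n - 1, k - 1)


def rising(a: float, k: int) -> float:
    """Pochhammer symbol (a)_k = a (a + 1) ... (a + k - 1)."""
    out = 1.0
    for i in range(k):
        out *= a + i
    return out


def hypergeom_pmf(n: int, j: int, J1: int, J2: int) -> float:
    """Eqn 15: n of a species in a subsample of J2, given j in a sample of J1."""
    return comb(j, n) * comb(J1 - j, J2 - n) / comb(J1, J2)


def dl_hypergeom_pmf(n: int, m: float, j: int, JM: int, J: int) -> float:
    """Eqn 28 by direct summation (a starts at 0 so that n = 0 is included)."""
    I = m / (1.0 - m) * (J - 1)  # eqn 8
    total = 0.0
    for A in range(1, J + 1):
        for a in range(0, min(n, A) + 1):
            weight = stirling1(n, a) * stirling1(J - n, A - a)
            if weight:
                total += weight / comb(A, a) * I**A * hypergeom_pmf(a, j, JM, A)
    return comb(J, n) * total / rising(I, J)


def expected_species_meta(j: int, theta: float, JM: int) -> float:
    """Eqn 1 applied to the whole metacommunity of size JM."""
    log_ratio = (lgamma(JM + 1) - lgamma(JM + 1 - j)
                 + lgamma(JM + theta - j) - lgamma(JM + theta))
    return theta / j * exp(log_ratio)


def expected_species_local(n: int, theta: float, m: float, JM: int, J: int) -> float:
    """Eqn 29: expected number of species with abundance n in the local sample."""
    return sum(dl_hypergeom_pmf(n, m, j, JM, J) * expected_species_meta(j, theta, JM)
               for j in range(1, JM + 1))


# Illustrative values (assumptions): same theta, two metacommunity sizes.
theta, m, J = 3.0, 0.4, 6
small_meta = [expected_species_local(n, theta, m, 30, J) for n in range(1, J + 1)]
large_meta = [expected_species_local(n, theta, m, 80, J) for n in range(1, J + 1)]
```

For these illustrative values the two lists agree to within floating-point error, as the invariance under hypergeometric sampling requires. The direct summation is only practical for small $J$, since the Stirling numbers grow very quickly.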
#### The subsample approach

In this section, we relate the expected number of species, eqns 1 and 6, to the corresponding multivariate probability distributions, eqns 2 and 11. First, we examine whether eqns 2 and 11 can be expressed in terms of eqns 1 and 6, respectively, for the observed values $n_1, \ldots, n_S$. This not only shows the link between the two types of expressions (from the two lines of research), but also has practical importance, because the expected number of species with a particular abundance is usually easier to obtain (using the master equation approach) than the multivariate probability distribution.

We need the concept of subsamples. First, we note that $P[\vec{D} \mid \Theta, J] = P[n_1, \ldots, n_S \mid \Theta, J]$ can, like every multivariate probability, be written as

$$P[\vec{D} \mid \Theta, J] = P[n_1, \ldots, n_S \mid \Theta, J] = P[n_1 \mid \Theta, J]\,P[n_2 \mid n_1, \Theta, J] \cdots P[n_S \mid n_1, \ldots, n_{S-1}, \Theta, J] \tag{31}$$

where $\Theta$ represents the model parameters [$\theta$ or $(\theta, m)$]. Equation 31 just follows from the definition of conditional probabilities. The first term in eqn 31, $P[n_1 \mid \Theta, J]$, is the probability of a species in a sample of size $J$ to have exactly abundance $n_1$. The second term in eqn 31, $P[n_2 \mid n_1, \Theta, J]$, is the probability of a species in a sample of size $J$ to have exactly abundance $n_2$ given that another species in the sample has abundance $n_1$. This probability is equivalent to the probability of a species in a sample of size $J - n_1$ to have exactly abundance $n_2$. It can therefore be expressed as

$$P[n_2 \mid n_1, \Theta, J] = P[n_2 \mid \Theta, J - n_1] \tag{32}$$

We call the sample size $J - n_1$ the effective sample size for species 2. More generally, we can define the effective sample size $J_i$ for species $i$ as

$$J_i = J - \sum_{k=1}^{i-1} n_k \tag{33}$$

This definition implies, for instance, that $J_1 = J$, $J_S = n_S$ and $J_{S+1} = 0$. For later convenience, we define the partial data sets

$$\vec{D}_i = (n_i, \ldots, n_S) \tag{34}$$

so that $\vec{D}_1 = \vec{D}$ and $\vec{D}_S = n_S$. We further define $\Phi_{n_i}$ as the number of species with abundance $n_i$ in the subsample $\vec{D}_i$.
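For the case without dispersal limitation, the subsample construction can be checked numerically. The following Python sketch compares the Ewens sampling formula (eqn 2) with the product of single-species expectations over effective sample sizes that the text goes on to give as eqn 36. The function names, the example data set and the value of $\theta$ are illustrative assumptions.

```python
from collections import Counter
from math import exp, factorial, lgamma, prod


def expected_species(n: int, theta: float, J: int) -> float:
    """Eqn 1: expected number of species with abundance n in a sample of J."""
    log_ratio = (lgamma(J + 1) - lgamma(J + 1 - n)
                 + lgamma(J + theta - n) - lgamma(J + theta))
    return theta / n * exp(log_ratio)


def ewens_probability(abundances, theta: float) -> float:
    """Eqn 2: Ewens sampling formula for unlabelled species."""
    J, S = sum(abundances), len(abundances)
    phi = Counter(abundances)  # phi[j] = number of species with abundance j
    denominator = prod(abundances) * prod(factorial(c) for c in phi.values())
    pochhammer = exp(lgamma(theta + J) - lgamma(theta))  # (theta)_J, eqn 3
    return factorial(J) / denominator * theta**S / pochhammer


def effective_sample_sizes(abundances):
    """Eqn 33: J_i = J minus the abundances of species 1 .. i-1."""
    sizes, remaining = [], sum(abundances)
    for n in abundances:
        sizes.append(remaining)
        remaining -= n
    return sizes


def subsample_probability(abundances, theta: float) -> float:
    """Eqn 36: product of E[S_{n_i} | theta, J_i] divided by the product of Phi_j!."""
    phi = Counter(abundances)
    numerator = prod(expected_species(n, theta, Ji)
                     for n, Ji in zip(abundances, effective_sample_sizes(abundances)))
    return numerator / prod(factorial(c) for c in phi.values())


# Illustrative data set and parameter (assumptions, not from the paper).
data, theta = (5, 3, 3, 1), 2.5
direct = ewens_probability(data, theta)
via_subsamples = subsample_probability(data, theta)
print(effective_sample_sizes(data), direct, via_subsamples)
```

The agreement follows because the gamma-function ratios in eqn 1 telescope along the sequence $J_1, J_2, \ldots, J_{S+1} = 0$, leaving $J!\,\theta^S / [(\theta)_J \prod_i n_i]$.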
With the definitions in eqn 33, eqn 31 becomes

$$P[\vec{D} \mid \Theta, J] = \prod_{i=1}^{S} P[n_i \mid \Theta, J_i] \tag{35}$$

In Appendix D we show that this leads to the following expressions (see also Table 1):

$$P[\vec{D} \mid \theta, J] = \frac{\prod_{i=1}^{S} E[S_{n_i} \mid \theta, J_i]}{\prod_{j=1}^{J} \Phi_j!} \tag{36}$$

and

$$P[\vec{D} \mid \theta, m, J] = \frac{\prod_{i=1}^{S} \hat{E}[S_{n_i} \mid \theta, m, J_i]}{\prod_{j=1}^{J} \Phi_j!} \tag{37}$$

with

$$\hat{E}[S_{n_i} \mid \theta, m, J_i] = \int_0^1 P^{DL}_{\mathrm{bin}}[n_i \mid m, x, J_i]\,\hat{\Omega}(x \mid \theta, m, \vec{D}_{i+1})\,dx \tag{38}$$

where $P^{DL}_{\mathrm{bin}}[n_i \mid m, x, J_i]$ is defined in eqn 24 and $\hat{\Omega}(x \mid \theta, m, \vec{D}_{i+1})$ is defined by

$$\hat{\Omega}(x \mid \theta, m, \vec{D}_{i+1}) = \Omega(x)\,F(x \mid \theta, m, \vec{D}_{i+1}) \tag{39}$$

with $\Omega(x)$ given by eqn 21 and $F(x \mid \theta, m, \vec{D}_{i+1})$ defined in equation (D-7) in Appendix D. Comparing eqns 23 and 38, we can interpret eqn 38 as having an abundance distribution $\Omega(x)$ that is modified by a factor that takes into account the subsample $\vec{D}_{i+1}$. We further note that eqns 36 and 37 are even simpler when species are labelled: then there is only $S!$ in the denominator.

We also note that eqns 1 and 6 can be derived from the multivariate probability distributions (eqns 2 and 11) using the equality

$$E[S_n \mid \Theta, J] = \sum_{\Phi_n=0}^{J} \Phi_n\,P[\Phi_n \mid \Theta, J] \tag{40}$$

where $P[\Phi_n \mid \Theta, J]$ is the probability that exactly $\Phi_n$ species with abundance $n$ are observed. This is a sum over all possible data sets that have $\Phi_n$ species with abundance $n$:

$$E[S_n \mid \Theta, J] = \sum_{\Phi_n=0}^{J} \Phi_n \sum_{\vec{D} \mid \Phi_n} P[\vec{D} \mid \Theta, J] \tag{41}$$

In Appendix E we show that, with the help of the subsample concept, this indeed leads to eqns 1 and 6. Watterson (1974) already provided alternative derivations for the mathematically identical model in population genetics when $m = 1$. However, no such derivations have been given for the case with dispersal limitation.

### DISCUSSION

We have presented previously obtained results of neutral community theory in a general framework where the dispersal-limited sampling nature of the theory plays a central role. We have summarized our results in Table 1.
For the first time in neutral community ecology, the main results of two lines of research have been presented together and related to one another: $E[S_n \mid \theta, m, J]$, the expected number of species with abundance $n$ in a sample of size $J$, and $P[\vec{D} \mid \theta, m, J]$, the joint multivariate probability of observing $S$ species with abundances $n_1, n_2, \ldots, n_S$ in a sample of size $J$. In the case without dispersal limitation ($m = 1$), $P[\vec{D} \mid \theta, J]$ can even be expressed in terms of $E[S_{n_i} \mid \theta, J_i]$ using subsamples $\vec{D}_i$, whereas in the case with dispersal limitation, this expression must be somewhat modified, but has a similar form. Also, we have derived $E[S_n \mid \theta, m, J]$ and $E[S \mid \theta, m, J]$ from $P[\vec{D} \mid \theta, m, J]$. Although this has been derived in the mathematically identical theory in population genetics for the case without dispersal limitation, the derivation for the case with dispersal limitation is given here for the first time. Relating expected values to multivariate distributions is important because it is much easier to write down one-dimensional dynamical models involving expected values and solve them for stationarity (McKane et al. 2000, 2004; Vallade & Houchmandzadeh 2003) than it is to do so for their corresponding multivariate distributions. However, we emphasize that precisely these exact multivariate sampling distributions, taken as likelihood functions, are actually needed to perform maximum likelihood estimation of model parameters (Etienne 2005) and sound statistical model comparisons (Etienne & Olff 2005).

Moreover, our sampling framework has enabled us to show that the sampling distributions are valid for a metacommunity of any size $J_M$. In other words, two samples of equal size from two metacommunities of different sizes $J_{M,1}$ and $J_{M,2}$ are characterized by exactly the same sampling distributions, as long as both metacommunities are described by the same biodiversity number ($\theta_1 = \theta_2$). This has not been emphasized in previous work. This is important for two reasons.
First, an already existing expression $E[S_n|\theta, m, J_M, J]$ for finite $J_M$ (Vallade & Houchmandzadeh 2003) turns out to be incorrect. Alonso & McKane (2004), assuming Vallade & Houchmandzadeh (2003) to be correct, suggested that species-abundance data can be used to estimate the metacommunity size and hence the speciation rate $\nu$, because $\theta := \nu (J_M - 1)/(1 - \nu)$ (Vallade & Houchmandzadeh 2003; Alonso & McKane 2004; Etienne 2005). The independence of metacommunity size that we have shown in the present study, however, implies that this is not possible. Second, as metacommunity size does not matter, we can safely assume infinite metacommunity size, which simplifies our formulae, because we can use binomial sampling instead of hypergeometric sampling. We want to stress, however, that it is invariance under hypergeometric sampling that provided the basis for our sampling theory.

Thus, mathematically, our formulae are valid for any $J_M$. Nevertheless, we need to remember the model assumption of separation of spatiotemporal scales: a local scale with immigration as the source of new species vs. a regional metacommunity scale with speciation as the source of new species. We cannot, therefore, choose any size $J_M$ we want; we need to require that $J_M \gg J$. This assumption allows us to safely ignore speciation at the local level, and to assume that local dynamics are much faster than regional dynamics, so the metacommunity composition does not change appreciably while the ancestors are sampled (which occurs at different instants). The assumption $J_M \gg J$ is biologically very realistic because, within our framework, $J$ is the sample size, which in practice is much smaller than the metacommunity size.

We already noted that sampling effects have been recognized since Fisher et al. (1943). However, other stochastic models of communities do not (fully) take this into account (Volkov et al.
2003; He 2005), or impose Poisson sampling afterwards (Engen & Lande 1996a,b; Dewdney 2000; Diserud & Engen 2000). This makes comparison of different models difficult, even in the latter case, because the expressions may be conditioned differently. Some (implicitly) condition on the number of sampled species $S$, and others on the number of sampled individuals $J$, as do our formulae. For a correct comparison, we need to condition on both (Etienne & Olff 2005).

Neutral community theory as formulated by Hubbell (2001) can be seen as an extension of Ewens’ (1972) theory into the ecological arena. This extension is far from trivial because Hubbell’s (2001) main intuition is that, in addition to neutral (or ecological) drift, it is dispersal limitation that is the leading factor structuring ecological communities. All recent theoretical advances in neutral community theory based on Hubbell’s (2001) formulation can now be translated back to population genetics to extend Ewens’ (1972) work as ‘a dispersal-limited sampling theory of selectively neutral alleles’. With the dispersal-limited sampling distributions introduced in this work, we can not only examine whether a certain allelic polymorphism is maintained neutrally, but we can also easily estimate the amount of dispersal limitation (or degree of isolation) of the locality where this allelic polymorphism comes from. It also enables computation of the ages of alleles in dispersal-limited populations.

Concerning the evolutionary age of species (or, equivalently, species time-to-extinction), the neutral theory has been strongly criticized for yielding unrealistically old species (Lande et al. 2003; Nee 2005). However, this finding may depend more on other model assumptions than on the assumption of neutrality. For instance, Nee’s (2005) estimates of species ages are based on Ewens’ (1972) equilibrium model for fixed community size with $\theta \to 0$ and $m = 1$.
Griffiths & Lessard (2005) recently presented a formula for any value of $\theta$ that already makes species ages a few orders of magnitude smaller. Species ages might also be appreciably different if dispersal limitation is taken into account. Furthermore, non-equilibrium dynamics and fluctuations in community size may substantially affect effective community size and thereby the time scales of species origination. Also, even if species ages are better explained by non-neutral processes at evolutionary time scales, such as ecological succession (a process involving ecologically non-equivalent species interacting through non-neutral processes such as facilitation and hierarchical competition), the final mature community that we observe today may still be consistent with neutral dynamics. In sum, the use of species ages to falsify the neutral theory is rather premature.

A stronger test of neutrality than the goodness-of-fit of a single species-abundance distribution is a test of whether two local communities that are both dispersal-limited hypergeometric samples from the same metacommunity, but are separated by a known distance, have the (dis)similarity in their species-abundance distributions that one would expect from neutrality. We believe that our sampling framework is able to provide such a test in principle. As the distance between the local communities obviously matters, a spatially explicit model seems to be unavoidable, but perhaps the spatially implicit model with appropriately chosen parameters may be used as a proxy that captures the essence. In any case, this is a difficult task mathematically, but one that merits further study. Ideas in population genetics involving ‘isolation by distance’ (e.g. Wakeley & Aliacar 2001) may provide fruitful starting points.

We have expressed the local community as a sample from the larger regional metacommunity, a sample which may or may not be affected by dispersal limitation.
In our expressions the metacommunity is purely regulated by speciation and extinction, and thus governed by the Ewens sampling formula, but this is not necessary. Our dispersal-limited hypergeometric distribution can also be applied to metacommunities that are structured according to other, even non-neutral, rules. Although the dynamics at the local community level are neutral, any differences in species abundances because of (non-neutral) metacommunity structure propagate to this local level. This allows for a dispersal-limited sampling theory for non-neutral communities. A more exact but more challenging approach would be to replace the dispersal-limited hypergeometric distribution of eqns 28 and 30, which assumes local neutrality, by a new dispersal-limited distribution that takes into account, at the local level, the same non-neutral factors controlling abundances in the metacommunity. This can potentially be done in essentially the same formalism we have presented here, possibly following suggestions in the population genetics literature (e.g. Wakeley & Takahashi 2004; Slade & Wakeley 2005). Our expressions are, however, good approximations that are fully in line with the model assumptions on the time scales discussed above.

The picture that emerges is thus: species and niche assembly originate through evolutionary time, shaping species abundances on the regional, long temporal scale. The very spatially extended nature of ecological systems involves dispersal limitation on the local and short temporal scale. So, if a particular locality is sampled, we will always have some degree of dispersal limitation in addition to other factors determining species abundances at the metacommunity level. The current challenge is to develop a dynamic community theory that can quantify the relative importance of dispersal limitation vs. other, neutral or non-neutral, factors determining species abundances through evolutionary time.
We strongly believe that our dispersal-limited sampling theory provides the basis for such a unifying theoretical framework.

ACKNOWLEDGEMENTS

Authors thank three anonymous referees, John Wakeley, Jérôme Chave and Han Olff for very constructive comments. D.A. thanks the support of the James S. McDonnell Foundation through a Centennial Fellowship to Mercedes Pascual.

REFERENCES

Alonso, D. & McKane, A.J. (2004). Sampling Hubbell’s neutral theory of biodiversity. Ecol. Lett., 7, 901–910.
Brown, J.H. & Kodric-Brown, A. (1977). Turnover rate in insular biogeography: effect of immigration on extinction. Ecology, 58, 445–449.
Bulmer, M.G. (1974). On fitting the Poisson lognormal distribution to species-abundance data. Biometrics, 30, 101–110.
Dewdney, A.K. (1998). A general theory of the sampling process with applications to the ‘veil line’. Theor. Popul. Biol., 54, 294–302.
Dewdney, A.K. (2000). A dynamical model of communities and a new species-abundance distribution. Biol. Bull., 35, 152–165.
Diserud, O.H. & Engen, S. (2000). A general and dynamic species abundance model, embracing the lognormal and the gamma models. Am. Nat., 155, 497–511.
Engen, S. & Lande, R. (1996a). Population dynamic models generating the lognormal species abundance distribution. Math. Biosci., 132, 169–183.
Engen, S. & Lande, R. (1996b). Population dynamic models generating the species abundance distributions of the Gamma type. J. Theor. Biol., 178, 325–331.
Etienne, R.S. (2005). A new sampling formula for neutral biodiversity. Ecol. Lett., 8, 253–260.
Etienne, R.S. & Olff, H. (2004a). How dispersal limitation shapes species–body size distributions in local communities. Am. Nat., 163, 69–83.
Etienne, R.S. & Olff, H. (2004b). A novel genealogical approach to neutral biodiversity theory. Ecol. Lett., 7, 170–175.
Etienne, R.S. & Olff, H. (2005).
Bayesian analysis of species-abundance data: assessing the relative importance of dispersal and niche-partitioning for the maintenance of biodiversity. Ecol. Lett., 8, 493–504.
Ewens, W.J. (1972). The sampling theory of selectively neutral alleles. Theor. Popul. Biol., 3, 87–112.
Fisher, R.A., Corbet, A.S. & Williams, C.B. (1943). The relation between the number of species and the number of individuals in a random sample of an animal population. J. Anim. Ecol., 12, 42–58.
Griffiths, R.C. & Lessard, S. (2005). Ewens’ sampling formula and related formulae: combinatorial proofs, extensions to variable population size and applications to ages of alleles. Theor. Popul. Biol. (in press).
Grinnell, J. (1922). On the role of the accidental. Auk, 39, 373–380.
Hanski, I. (1983). Coexistence of competitors in patchy environment. Ecology, 64, 493–500.
He, F.L. (2005). Deriving a neutral model of species abundance from fundamental mechanisms of population dynamics. Funct. Ecol., 19, 187–193.
Hubbell, S.P. (1997). A unified theory of biogeography and relative species abundance and its application to tropical rain forests and coral reefs. Coral Reefs, 16, S9–S21.
Hubbell, S.P. (2001). The Unified Neutral Theory of Biodiversity and Biogeography. Princeton University Press, Princeton, NJ, USA.
Johnson, N.L., Kotz, S. & Balakrishnan, N. (1997). Discrete Multivariate Distributions. Wiley, New York, NY, USA.
Karlin, S. & McGregor, J. (1972). Addendum to a paper of W. Ewens. Theor. Popul. Biol., 3, 113–116.
Lande, R., Engen, S. & Saether, B.-E. (2003). Stochastic Population Dynamics in Ecology and Conservation. Oxford Series in Ecology and Evolution. Oxford University Press, Oxford, UK.
Levins, R. & Culver, D. (1971). Regional coexistence of species and competition between rare species. Proc. Natl Acad. Sci. USA, 68, 1246–1248.
Loreau, M. & Mouquet, N. (1999). Immigration and the maintenance of local species diversity. Am. Nat., 154, 427–440.
MacArthur, R.H. & Wilson, E.O. (1967).
Island Biogeography. Princeton University Press, Princeton, NJ, USA.
McGill, B.J. (2003). A test of the unified neutral theory of biodiversity. Nature, 422, 881–885.
McKane, A.J., Alonso, D. & Solé, R.V. (2000). A mean field stochastic theory for species rich assembled communities. Phys. Rev. E, 62, 8466–8484.
McKane, A.J., Alonso, D. & Solé, R.V. (2004). Analytic solution of Hubbell’s model of local community dynamics. Theor. Popul. Biol., 65, 67–73.
Moran, P.A.P. (1958). Random processes in genetics. Proc. Camb. Philos. Soc., 54, 60–71.
Moran, P.A.P. (1962). Statistical Processes of Evolutionary Theory. Clarendon Press, Oxford, UK.
Nee, S. (2005). The neutral theory of biodiversity: do the numbers add up? Funct. Ecol., 19, 173–176.
Pielou, E.C. (1969). An Introduction to Mathematical Ecology. Wiley, New York, NY, USA.
Slade, P.F. & Wakeley, J. (2005). The structured ancestral selection graph and the many-demes limit. Genetics, 169, 1117–1131.
Solé, R.V., Alonso, D. & McKane, A.J. (2000). Scaling in a network model of multispecies communities. Physica A, 286, 337–344.
Tilman, D. (1994). Competition and biodiversity in spatially structured habitats. Ecology, 75, 2–16.
Vallade, M. & Houchmandzadeh, B. (2003). Analytical solution of a neutral model of biodiversity. Phys. Rev. E, 68, 061902.
Volkov, I., Banavar, J.R., Hubbell, S.P. & Maritan, A. (2003). Neutral theory and relative species abundance in ecology. Nature, 424, 1035–1037.
Wakeley, J. (1998). Segregating sites in Wright’s island model. Theor. Popul. Biol., 53, 166–175.
Wakeley, J. (1999). Non-equilibrium migration in human history. Genetics, 153, 1863–1871.
Wakeley, J. & Aliacar, N. (2001). Gene genealogies in a metapopulation. Genetics, 159, 893–905; Corrigendum in Genetics, 160, 1263 (2001).
Wakeley, J. & Takahashi, T. (2004). The many-demes limit for selection and drift in a subdivided population. Theor. Popul. Biol., 66, 83–91.
Watterson, G.A. (1974).
Models for the logarithmic species abundance distribution. Theor. Popul. Biol., 6, 217–250.
Wright, S. (1931). Evolution in Mendelian populations. Genetics, 16, 97–159.

SUPPLEMENTARY MATERIAL

The following supplementary material is available for this article from http://www.Blackwell-Synergy.com:

Appendix A Derivation of eqn 13.
Appendix B The relation of the approximation (eqn 10) to the exact result (eqn 6).
Appendix C Proof of the equality of eqns 23 and 29.
Appendix D Derivation of eqns 36 and 37.
Appendix E Derivation of eqns 1 and 6 from eqns 2 and 11.
Appendix F A historical note on the origins of the binomial and hypergeometric distributions.

Editor, Jerome Chave
Manuscript received 11 May 2005
First decision made 20 June 2005
Second decision made 11 July 2005
Manuscript accepted 12 July 2005