Hi everyone,
I am writing to report a bug in the ses.pd function. As you may know, the arguments of ses.pd are passed on to the function pd, which means that one can specify whether to include the distance between the most recent common ancestor (MRCA) of the species and the root of the tree (MRCA – root distance). The default setting in the pd function is "include.root = TRUE". However, I have noticed that ses.pd computes the null PD values without the MRCA – root distance, regardless of the value given to the include.root argument. That is, if include.root = TRUE (the default), the observed PD will include the MRCA – root distance, but the null PD values will not. For example:
library(picante)
data(phylocom)
Sample <- phylocom$sample
Tree <- phylocom$phylo
Tree_F <- prune.sample(Sample, Tree)
Sample <- Sample[, Tree_F$tip.label]
set.seed(12345) # fix the seed to make the analyses reproducible
# the MRCA – root distance should be excluded from all the computations
No_root_1 <- ses.pd(Sample, Tree_F, null.model = "taxa.labels", runs = 999, include.root = FALSE)
set.seed(12345) # back to the same seed
# the MRCA – root distance should be included in all the computations
root_1 <- ses.pd(Sample, Tree_F, null.model = "taxa.labels", runs = 999, include.root = TRUE)
No_root_1
root_1
A quick look at the results reveals that, as expected, the observed PD values in "No_root_1" and "root_1" differ for communities "clump1" and "clump2a" (the only two communities where the root of the tree is not traversed by the minimum spanning path connecting the species in the phylogeny), because include.root was set to FALSE in "No_root_1". However, the descriptors of the null distributions in "No_root_1" and "root_1" are identical. This is because the null PD values are always computed without the MRCA – root distance, regardless of the value given to the include.root argument.
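The comparison can also be done programmatically rather than by eye. The following is only a minimal sketch: compare_ses is a hypothetical helper (it is not part of picante), and it assumes the two result tables carry the columns pd.obs, pd.rand.mean and pd.rand.sd, as in the data frame returned by ses.pd. If the null values ignore include.root as described above, pd.obs would be flagged as differing while the two null descriptors would be flagged as identical.

```r
# Hypothetical helper (not part of picante): for each column of interest,
# report whether two ses.pd() result tables hold (numerically) equal values.
compare_ses <- function(a, b,
                        cols = c("pd.obs", "pd.rand.mean", "pd.rand.sd")) {
  sapply(cols, function(col) isTRUE(all.equal(a[[col]], b[[col]])))
}

# Only runs if the objects from the example above exist in the session.
if (exists("No_root_1") && exists("root_1")) {
  print(compare_ses(No_root_1, root_1))
}
```

A TRUE for pd.rand.mean and pd.rand.sd alongside a FALSE for pd.obs would be consistent with the behaviour reported here; all.equal is used instead of identical so that negligible floating-point differences are not reported as mismatches.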
The good news is that this anomaly only significantly affects communities with low species richness (fewer than four species). For more details, you may take a look at the following link, a preprint that describes the issue in detail (a step-by-step tutorial), together with a simulation exercise (which indicates the communities that may be affected) and two alternatives to fix the problem (all the R code is available in the Supplementary Material of the preprint): https://www.biorxiv.org/content/10.1101/579300v1
Rafael Molina Venegas