Description
While everybody uses a stitch distance of 12.5 kb, the rationale for that value isn't really clear. The original paper says:
We noted that there were often closely spaced enriched regions with very high signal, and we wished to capture that whole span as a single region. Thus, we further combined the constituent enhancers that occurred within 12.5kb of each other into a single larger enhancer domain. We used a distance of 12.5kb based on an analysis of the enriched regions in murine ESCs. In that dataset we found that a distance of 12.5kb was optimal for stitching together the closely spaced enriched regions with very high signal while not being so large as to stitch together the more widely spaced regions with lower signal.
It's not clear what this analysis actually entailed. Presumably it maximized some ratio of high-signal regions to background/low-signal regions, but that ratio will naturally decrease as the stitch distance increases. Perhaps they picked the point past which the decrease in this ratio is smallest? One way to test this: step the stitch distance in increments of roughly 250 bp, perform stitching at each step, and calculate the mean/median stitched region length and the mean/median length covered by the constituents within each stitched region. Divide the mean constituent coverage by the mean stitched region length to get a coverage fraction, then take the difference in that fraction between each step and the previous one. Plotting those differences should show how it behaves, presumably with the optimum at the smallest difference.
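A minimal sketch of that sweep, to make the proposal concrete. This is my guess at a workable procedure, not the original paper's analysis. It assumes peaks are half-open `(start, end)` intervals on a single chromosome, and the names `stitch`, `coverage_sweep`, and `step_differences` are made up for illustration. It uses means only; because each stitched region contributes once to both means, the ratio of means is just total constituent coverage over total stitched length.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome (assumption)


def stitch(peaks: List[Interval], distance: int) -> List[Tuple[int, int, int]]:
    """Merge peaks separated by a gap <= distance.

    Returns (start, end, covered_bp) per stitched region, where covered_bp
    is the number of bases inside the region covered by constituent peaks.
    """
    stitched: List[Tuple[int, int, int]] = []
    for start, end in sorted(peaks):
        if stitched and start - stitched[-1][1] <= distance:
            s, e, cov = stitched[-1]
            # Only count bases that extend past the current region end,
            # so overlapping constituents are not double counted.
            new_cov = max(0, end - max(start, e))
            stitched[-1] = (s, max(e, end), cov + new_cov)
        else:
            stitched.append((start, end, end - start))
    return stitched


def coverage_sweep(peaks: List[Interval], max_distance: int = 20000,
                   step: int = 250) -> List[Tuple[int, float]]:
    """Coverage fraction (constituent bp / stitched bp) at each stitch distance."""
    if not peaks:
        raise ValueError("need at least one peak")
    rows = []
    for d in range(0, max_distance + 1, step):
        regions = stitch(peaks, d)
        total_len = sum(e - s for s, e, _ in regions)
        total_cov = sum(c for _, _, c in regions)
        rows.append((d, total_cov / total_len))
    return rows


def step_differences(rows: List[Tuple[int, float]]) -> List[Tuple[int, float]]:
    """Change in coverage fraction at each step relative to the previous step."""
    return [(d2, f2 - f1) for (_, f1), (d2, f2) in zip(rows, rows[1:])]


# Toy example: two close peaks and one distant peak.
toy_peaks = [(0, 1000), (1500, 2500), (20000, 21000)]
toy_rows = coverage_sweep(toy_peaks, max_distance=500, step=250)
toy_diffs = step_differences(toy_rows)
```

For real data this would need to run per chromosome and per sample, and the median variants would need the per-region values kept rather than summed.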
The original analysis was also done with Med1 ChIP-seq, which is more focal than H3K27ac. It'd be nice to optimize this parameter for different histone marks and cell types.
ESCs are also a rather unique cell state given how broadly accessible their genomes are. I expect this ratio differs between cell types or states, particularly in differentiated cells. This may result in over/under-calling of SEs in many circumstances.