
The only cousin statistics that account for the differences between paternal and maternal relatives caused by their different recombination rates.

On average, a meiosis in a mother produces about 42 crossovers across the autosomes, whereas a meiosis in a father produces only about 27. This leads to a conclusion that’s intuitive to geneticists: more recombination decreases variance, which narrows the range of shared DNA among maternal relatives. Less recombination increases variance, which is why fully or predominantly paternal relatives can share a much wider range of DNA. Graham Coop has blogged about this phenomenon.
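To make the variance argument concrete, here is a minimal Python sketch (not the author’s model) that estimates the mean and standard deviation of the fraction of one transmitted haplotype that comes from a particular grandparent, with 42 versus 27 expected crossovers. It assumes approximate GRCh37 physical lengths for chromosomes 1-22, places crossovers uniformly in physical distance, and ignores interference. A grandchild’s shared fraction with that grandparent is half of this quantity. Python’s `random` module has no Poisson sampler, so a small Knuth-style one is included.

```python
import math
import random
import statistics

# Approximate GRCh37 physical lengths of chromosomes 1-22, in Mb (rounded).
CHROM_MB = [249, 243, 198, 191, 181, 171, 159, 146, 141, 136, 135,
            134, 115, 107, 103, 90, 81, 78, 59, 63, 48, 51]


def poisson(lam, rng):
    """Knuth's multiplication method; fine for the small per-chromosome means here."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def fraction_from_one_grandparent(mean_crossovers, rng):
    """Fraction of one transmitted haplotype that came from a particular grandparent."""
    total = sum(CHROM_MB)
    from_a = 0.0
    for length in CHROM_MB:
        # Assumption: expected crossovers per chromosome are proportional to its Mb length.
        k = poisson(mean_crossovers * length / total, rng)
        cuts = sorted(rng.uniform(0, length) for _ in range(k))
        on_a = rng.random() < 0.5              # homolog the gamete starts on
        start = 0.0
        for cut in cuts + [length]:
            if on_a:
                from_a += cut - start
            on_a = not on_a                    # each crossover switches homologs
            start = cut
    return from_a / total


def summarize(mean_crossovers, trials=10_000, seed=1):
    """Mean and standard deviation of the grandparental fraction over many meioses."""
    rng = random.Random(seed)
    shares = [fraction_from_one_grandparent(mean_crossovers, rng) for _ in range(trials)]
    return statistics.mean(shares), statistics.stdev(shares)


for label, rate in [("maternal, ~42 crossovers", 42), ("paternal, ~27 crossovers", 27)]:
    mean, sd = summarize(rate)
    print(f"{label}: mean {mean:.3f}, sd {sd:.4f}")
```

Under these simplifying assumptions both means sit near 0.5, and the maternal standard deviation comes out smaller than the paternal one, which is the pattern described above.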

I’ve developed an autosomal DNA model. It doesn’t rely on mathematical tricks, for example taking bad data and then stretching, compressing, or otherwise manipulating them to reconcile them with the peer-reviewed literature or to calculate shared DNA for multiple cousin relationships. It’s a natural model: it simulates the processes that DNA goes through in real life. That includes separating the genome into two copies of 22 chromosomes of known lengths and recombining at the known maternal and paternal recombination rates, both genome-wide and for each chromosome, with crossover counts sampled from Poisson distributions. The model also includes crossover interference. So far, recombination hotspots and jungles aren’t simulated, and they likely won’t be unless they’re needed to improve accuracy. Even so, the standard deviations below show that the model is already very accurate.
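The meiosis steps described above can be sketched for a single chromosome as follows. This is a minimal illustration under stated assumptions, not the author’s implementation: the chromosome length, expected crossover count, and 20 Mb minimum spacing in the usage lines are hypothetical, and the hard minimum spacing between crossovers is a crude stand-in for interference rather than a fitted interference model.

```python
import math
import random


def poisson(lam, rng):
    """Knuth's multiplication method (Python's random module has no Poisson sampler)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def crossover_positions(length, mean_crossovers, min_gap, rng):
    """Poisson number of crossovers, placed uniformly subject to a minimum spacing.

    The hard minimum spacing is a crude stand-in for positive crossover interference.
    """
    k = poisson(mean_crossovers, rng)
    k = min(k, int(length // min_gap) + 1)     # cap k so the spacing rule can be met
    if k == 0:
        return []
    free = length - (k - 1) * min_gap
    base = sorted(rng.uniform(0, free) for _ in range(k))
    # Shifting the i-th sorted point right by i * min_gap makes every gap >= min_gap.
    return [x + i * min_gap for i, x in enumerate(base)]


def make_gamete(length, mean_crossovers, min_gap, rng):
    """Return (start, end, homolog) segments tiling [0, length]; homolog is 0 or 1."""
    cuts = crossover_positions(length, mean_crossovers, min_gap, rng)
    homolog = rng.randrange(2)                 # which parental homolog comes first
    segments, start = [], 0.0
    for cut in cuts + [length]:
        segments.append((start, cut, homolog))
        homolog = 1 - homolog                  # a crossover switches homologs
        start = cut
    return segments


# Hypothetical chromosome: 150 Mb long, 1.8 expected crossovers, 20 Mb minimum spacing.
rng = random.Random(7)
for segment in make_gamete(150.0, 1.8, 20.0, rng):
    print(segment)
```

Keeping the placement rule inside `crossover_positions` means a more realistic interference model, such as the gamma model, could replace it without touching the segment bookkeeping.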

It can compute averages and ranges of shared DNA for any relationship or multiple cousin relationship, and it can combine any set of kits to calculate DNA coverage.
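Combining kits for coverage amounts to taking the union of the segments that several relatives’ kits attribute to an ancestor and measuring its total length. A minimal sketch, assuming each kit is given as hypothetical (start, end) segments per chromosome in consistent units; the toy genome and kits below are made up for illustration.

```python
def merged_length(intervals):
    """Total length covered by (start, end) intervals, counting overlaps only once."""
    total, cur_start, cur_end = 0.0, None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:      # gap: close the current run
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:                                       # overlap or touch: extend the run
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def coverage(kits, chrom_lengths):
    """Fraction of the genome covered by the union of segments across several kits.

    kits: list of dicts mapping chromosome -> list of (start, end) segments.
    chrom_lengths: dict mapping chromosome -> length, in the same units as the segments.
    """
    covered = sum(
        merged_length([seg for kit in kits for seg in kit.get(chrom, [])])
        for chrom in chrom_lengths
    )
    return covered / sum(chrom_lengths.values())


# Toy two-chromosome genome and two hypothetical kits.
chrom_lengths = {"1": 100.0, "2": 50.0}
kit_a = {"1": [(0.0, 30.0)], "2": [(10.0, 20.0)]}
kit_b = {"1": [(20.0, 60.0)]}
print(round(coverage([kit_a, kit_b], chrom_lengths), 3))   # 0.467
```

The overlap between the two kits on chromosome 1 (20 to 30) is counted once, which is why coverage is computed from a union rather than a sum.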

The model can’t produce the wrong averages of shared DNA unless the user introduces an error, for example by coding the simulation to compare someone to an aunt when the intention was to compare them to a half-sibling. An error like that is fairly difficult to make, but strict quality control is still needed to ensure it doesn’t happen. The means aren’t very sensitive to the number of trials. I generally run 500,000 trials per simulation, but for the means for, say, full siblings or half-siblings to be off by even a tenth of a percentage point, one would have to cut the number of trials to about 2,000.
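The insensitivity of the means to the number of trials follows from the standard error of a Monte Carlo mean, which shrinks as sd / √n. A small helper, using a hypothetical standard deviation of 4 percentage points purely for illustration:

```python
import math


def standard_error(sd, trials):
    """Standard error of a Monte Carlo mean: sd / sqrt(trials)."""
    return sd / math.sqrt(trials)


def trials_needed(sd, target_se):
    """Number of trials for the standard error to reach target_se, rounded up."""
    return math.ceil((sd / target_se) ** 2)


# Hypothetical: a relationship whose shared-DNA sd is 4 percentage points.
for n in (2_000, 500_000):
    print(n, f"{standard_error(4.0, n):.3f}")      # 0.089, then 0.006
print(trials_needed(4.0, 0.1))
```

With that hypothetical sd, 2,000 trials leave a standard error of roughly 0.09 percentage points, the same order as the tenth of a point mentioned above, while 500,000 trials shrink it to about 0.006.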

You can judge the accuracy of a shared DNA chart or table by comparing some of its data points with their known standard deviations. Veller et al. (2019, 2020) calculated the standard deviations of shared DNA for paternal grandparents/grandchildren, paternal half-siblings, full siblings, and maternal grandparents/grandchildren. They calculated these for the genomic metric (bp), which represents the number of base pairs that two people actually share, and for the genetic linkage metric (centiMorgans), which reflects what they would share as reported by a direct-to-consumer genotyping platform. Although my correspondence with geneticists has shown that they prefer the bp metric, I’m reporting genetic linkage results below for users of genotyping platforms.
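The two metrics differ because of the genetic map: a segment’s bp length is fixed, while its cM length depends on how much recombination occurs across it. A minimal sketch, assuming a made-up four-point genetic map and linear interpolation between map points (real maps are sex-specific and far denser):

```python
from bisect import bisect_right

# Hypothetical genetic map for one chromosome: (position in bp, position in cM).
GENETIC_MAP = [(0, 0.0), (20_000_000, 30.0), (60_000_000, 50.0), (100_000_000, 120.0)]


def bp_to_cm(bp, genetic_map=GENETIC_MAP):
    """Linearly interpolate a cM position from a bp position."""
    positions = [p for p, _ in genetic_map]
    if bp <= positions[0]:
        return genetic_map[0][1]
    if bp >= positions[-1]:
        return genetic_map[-1][1]
    i = bisect_right(positions, bp) - 1        # map interval containing bp
    (p0, c0), (p1, c1) = genetic_map[i], genetic_map[i + 1]
    return c0 + (c1 - c0) * (bp - p0) / (p1 - p0)


def segment_lengths(start_bp, end_bp, genetic_map=GENETIC_MAP):
    """Return (length in bp, length in cM) of a shared segment."""
    cm = bp_to_cm(end_bp, genetic_map) - bp_to_cm(start_bp, genetic_map)
    return end_bp - start_bp, cm


for start, end in [(20_000_000, 60_000_000), (60_000_000, 100_000_000)]:
    bp, cm = segment_lengths(start, end)
    print(f"{bp / 1e6:.0f} Mb -> {cm:.1f} cM")   # 40 Mb -> 20.0 cM, then 40 Mb -> 70.0 cM
```

Both segments cover the same number of base pairs, yet on this toy map one is more than three times longer in cM, so the same relationship can have different means and spreads depending on which metric is reported.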

I’m currently updating this page with the most recent results. Results from the updated model are shown first; after a clear separator, results from an older version of the model follow.

If you want to see cM ranges for a particular platform, please click one of the links below:


Table 1. Shared DNA between siblings. Standard deviations for relationships with published values to compare against are given to one extra decimal place to show how closely they approximate the known values.

It’s hard to say which is the bigger advantage of this method of computing shared DNA averages and ranges: that it’s the most accurate method or that it can compute any combination of relatives. The latter is illustrated below, since the model easily computes any type of 3/4 sibling or double first cousin.
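The averages for compound relationships like these can be cross-checked with the classical recursive kinship coefficient: for relatives who aren’t inbred, the expected fraction of shared autosomal DNA (half of the HIR + FIR fraction) equals twice the kinship coefficient. The sketch below is that textbook recursion, not the author’s simulation, and it yields only expected values, not ranges. The pedigree labels are hypothetical, and only one of the several types of 3/4 siblings is shown.

```python
from functools import lru_cache


def make_kinship(pedigree):
    """Return a kinship-coefficient function for a pedigree.

    pedigree maps each name to (parent1, parent2), or to None for an unrelated,
    non-inbred founder. Parents must be listed before their children.
    """
    order = {name: i for i, name in enumerate(pedigree)}

    @lru_cache(maxsize=None)
    def kinship(a, b):
        if order[a] < order[b]:
            a, b = b, a                # recurse on the later-listed person
        parents = pedigree[a]
        if a == b:
            return 0.5 if parents is None else 0.5 * (1 + kinship(*parents))
        if parents is None:
            return 0.0                 # b is listed earlier, so not a's descendant
        p1, p2 = parents
        return 0.5 * (kinship(p1, b) + kinship(p2, b))

    return kinship


# 3/4 siblings: X and Y share parent P; their other parents M1 and M2 are full siblings.
three_quarter = {
    "G1": None, "G2": None, "P": None,
    "M1": ("G1", "G2"), "M2": ("G1", "G2"),
    "X": ("P", "M1"), "Y": ("P", "M2"),
}
# Double first cousins: siblings A and B have children with siblings C and D.
double_first = {
    "G1": None, "G2": None, "G3": None, "G4": None,
    "A": ("G1", "G2"), "B": ("G1", "G2"),
    "C": ("G3", "G4"), "D": ("G3", "G4"),
    "X": ("A", "C"), "Y": ("B", "D"),
}

# Expected shared fraction = 2 * kinship for non-inbred relatives.
print(2 * make_kinship(three_quarter)("X", "Y"))   # 0.375
print(2 * make_kinship(double_first)("X", "Y"))    # 0.25
```

A simulation is still needed for the ranges, since the recursion says nothing about how widely individual pairs scatter around these averages.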

Table 2. Results for shared DNA between six different types of 3/4 siblings. HIR = ‘half-identical regions,’ where at least one of the two chromosome homologues matches. FIR = ‘fully identical regions,’ where both copies of a chromosome match. HIR + FIR = all of the points on chromosomes where two people match on at least one copy plus all of the points where they match on both copies. HIR counting includes FIR bp, but counts them only as if they were half-identical.
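The HIR and FIR definitions can be expressed on a genome divided into equal bins, where each bin holds the number of homologue copies (0, 1, or 2) that two people share identical by descent. A minimal sketch with a hypothetical eight-bin toy genome; for full siblings, the count in a bin is whether they match on the paternal copy plus whether they match on the maternal copy.

```python
def hir_fir(ibd_states):
    """ibd_states: IBD count (0, 1 or 2) for each equal-width bin along the genome.

    Returns (HIR fraction, FIR fraction, shared fraction of the diploid genome).
    """
    n = len(ibd_states)
    hir = sum(1 for s in ibd_states if s >= 1) / n   # FIR bins count here once too
    fir = sum(1 for s in ibd_states if s == 2) / n
    return hir, fir, (hir + fir) / 2


def sibling_states(paternal_match, maternal_match):
    """For full siblings, the IBD count in a bin is paternal match + maternal match."""
    return [int(p) + int(m) for p, m in zip(paternal_match, maternal_match)]


states = sibling_states([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 1, 1, 0, 0])
print(states)            # [2, 2, 1, 1, 1, 1, 0, 0]
print(hir_fir(states))   # (0.75, 0.25, 0.5)
```

Halving HIR + FIR converts the two region totals back into the share of the diploid genome, which is why FIR bins effectively count twice in that last figure.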

Table 3. Results for double first cousins. All parameters are the same as for Tables 1-2.

Table 4. Results for grandparents and some of their descendants. All parameters are the same as for Tables 1-3.

Table 5. Results for descendants of grandparents, continued, for half-relationships. All parameters are the same as for Tables 1-4.

Table 6. Results for great-grandparents and some of their descendants.

Table 7. Results for second cousins. I was surprised to see that some second cousins may not share any DNA.
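Whether second cousins appear to share no DNA also depends on the minimum segment length a platform reports. The sketch below, which is not the author’s model, estimates how often two second cousins share no segment at or above a hypothetical 7 Mb cutoff (a stand-in for a cM threshold). It tracks founder labels through segment lists, uses 27 and 42 mean crossovers for fathers and mothers, assumes approximate GRCh37 lengths and crossovers spread uniformly in physical distance, and ignores interference.

```python
import math
import random

# Approximate GRCh37 lengths of chromosomes 1-22, in Mb (rounded).
CHROM_MB = [249, 243, 198, 191, 181, 171, 159, 146, 141, 136, 135,
            134, 115, 107, 103, 90, 81, 78, 59, 63, 48, 51]
TOTAL_MB = sum(CHROM_MB)
RATE = {"F": 42, "M": 27}   # mean autosomal crossovers per meiosis, by parent's sex
UNRELATED = -1              # label for DNA from outside the shared ancestral couple


def poisson(lam, rng):
    """Knuth's multiplication method; fine for small per-chromosome means."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def founder(label):
    """A haplotype: for each chromosome, a list of (start, end, label) segments."""
    return [[(0.0, float(length), label)] for length in CHROM_MB]


def gamete(hap_a, hap_b, sex, rng):
    """Recombine a parent's two haplotypes, spreading crossovers by physical length."""
    out = []
    for chrom, length in enumerate(CHROM_MB):
        k = poisson(RATE[sex] * length / TOTAL_MB, rng)
        bounds = [0.0] + sorted(rng.uniform(0, length) for _ in range(k)) + [float(length)]
        use_a, pieces = rng.random() < 0.5, []
        for lo, hi in zip(bounds, bounds[1:]):
            source = hap_a[chrom] if use_a else hap_b[chrom]
            pieces += [(max(s, lo), min(e, hi), lab) for s, e, lab in source if s < hi and e > lo]
            use_a = not use_a                  # switch homologs at each crossover
        out.append(pieces)
    return out


def line_haplotype(sexes, rng):
    """Haplotype a second cousin inherits from the shared great-grandparents.

    sexes: sexes of the couple's child and grandchild in this line, e.g. ("F", "M").
    """
    child = (gamete(founder(0), founder(1), "M", rng),   # from great-grandfather
             gamete(founder(2), founder(3), "F", rng))   # from great-grandmother
    grandchild_hap = gamete(*child, sexes[0], rng)
    return gamete(grandchild_hap, founder(UNRELATED), sexes[1], rng)


def shared_segments(hap1, hap2):
    """Lengths (Mb) of contiguous regions where both haplotypes carry the same ancestral label."""
    lengths = []
    for segs1, segs2 in zip(hap1, hap2):
        overlaps = sorted((max(s1, s2), min(e1, e2))
                          for s1, e1, l1 in segs1 for s2, e2, l2 in segs2
                          if l1 == l2 != UNRELATED and max(s1, s2) < min(e1, e2))
        merged = []
        for lo, hi in overlaps:
            if merged and lo <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], hi)
            else:
                merged.append([lo, hi])
        lengths += [hi - lo for lo, hi in merged]
    return lengths


def prob_no_detected_sharing(sexes1, sexes2, min_mb=7.0, trials=1000, seed=3):
    """Share of simulated pairs with no shared segment of at least min_mb (hypothetical cutoff)."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        segs = shared_segments(line_haplotype(sexes1, rng), line_haplotype(sexes2, rng))
        misses += not any(length >= min_mb for length in segs)
    return misses / trials


# Two hypothetical configurations: both lines through men, or both through women.
for lines in [(("M", "M"), ("M", "M")), (("F", "F"), ("F", "F"))]:
    print(lines, f"P(no segment >= 7 Mb) ~ {prob_no_detected_sharing(*lines):.3f}")
```

Because the cutoff is in Mb rather than cM and interference is omitted, the printed probabilities only illustrate how a detection threshold can produce apparent zero sharing; they aren’t estimates to compare with the table.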

Results below are for the old model parameters.

Table 8. Results for 2nd great-grandparents.

Table 9. Results for 3rd great-grandparents.

I hope you’ve found these results useful. More will be on the way.

Feel free to ask me about modeling & simulation, genetic genealogy, or genealogical research. To see my articles on Medium, click here. And try out a nifty calculator that’s based on the first of my three genetic models. It lets you find the amount of an ancestor’s DNA you have when combined with various relatives. And most importantly, check out these ranges of shared DNA percentages or shared centiMorgans, which are the only published values that match known standard deviations.