The centiMorgan (cM) is the unit most often used to tell us how much DNA we share with a relative who, like us, has had their DNA genotyped. The most popular charts for determining genetic relationships use cM. The unit has plenty of benefits and is one of the most used concepts in genetic genealogy. It seems to be used orders of magnitude more often than a simple “percentage,” which would work just as well or better in most cases.
Percentages or fractions of shared DNA are much more intuitive concepts to genetic genealogy novices and are preferred by scientists as the most natural metric (from correspondence with a prominent, published geneticist). Additionally, people who understand centiMorgans also understand percentages, while the reverse is far from true.
People often compare cM numbers across platforms, as if those numbers mean the same thing on AncestryDNA as they do on GEDmatch. This is not the case. AncestryDNA tests a set of single nucleotide polymorphisms (SNPs) such that the maximum shared cM is 6,950. By contrast, GEDmatch has a maximum of 7,174 cM.
Below is a table that shows the total cM across the largest platforms.
|Site|Half (cM)|Full (cM)|
Comparison of the total number of cM possible across five platforms. FTDNA stands for Family Tree DNA. The second column gives the amount of cM possible in one copy of the genome, while the third column gives the amount in both copies. Numbers close to these can be found by comparing a kit to itself, a person to their parent, or identical twins to each other. Note that some platforms show only half of the available cM by default, because they report half-identical regions (HIR), while other sites show the full amount. MyHeritage numbers come from their FAQ page. All numbers except for MyHeritage come from here.
For close relationships, there’s a real difference between GEDmatch and FTDNA. A paternal grandparent could easily share 35% of their DNA with you. That would be 2,511 cM at GEDmatch or 2,369 cM at FTDNA, a difference of 142 cM. Genetic genealogists have serious arguments over much lower values, and people asking for advice are often told that their family has secrets based on differences this large. The differences aren’t as stark between other platforms, but they still exist. To compare a cM value from AncestryDNA exactly with a cM value at GEDmatch, one would have to multiply the AncestryDNA value by the fraction 7,174 / 6,950. Essentially, a person doing this is converting from one platform to a percentage, and then from that percentage to the other platform.
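The conversion described above can be sketched in a few lines of Python. This is only an illustration: the function names and the `TOTAL_CM` dictionary are made up for this example, and the two totals are the full-genome figures quoted in the conversion fraction above (7,174 cM for GEDmatch and 6,950 cM for AncestryDNA). Totals for other platforms would need to be added from a trusted source.

```python
# Total cM across both copies of the genome, per platform.
# Only the two totals quoted in the text are included here;
# the dictionary itself is a hypothetical helper for this sketch.
TOTAL_CM = {
    "GEDmatch": 7174.0,
    "AncestryDNA": 6950.0,
}


def cm_to_percent(cm, platform, totals=TOTAL_CM):
    """Express a shared cM value as a percentage of that platform's total."""
    return 100.0 * cm / totals[platform]


def percent_to_cm(percent, platform, totals=TOTAL_CM):
    """Express a shared percentage as cM on a given platform."""
    return percent / 100.0 * totals[platform]


def convert_cm(cm, source, target, totals=TOTAL_CM):
    """Convert cM from one platform to another by going through percentage."""
    return percent_to_cm(cm_to_percent(cm, source, totals), target, totals)


# 35% shared DNA expressed in GEDmatch cM
print(round(percent_to_cm(35, "GEDmatch")))  # 2511

# An AncestryDNA value restated in GEDmatch terms
# (equivalent to multiplying by 7,174 / 6,950)
print(round(convert_cm(1000, "AncestryDNA", "GEDmatch"), 1))
```

Going through a percentage in the middle is the whole point: the percentage is the platform-independent quantity, and the cM value is just that percentage scaled by whichever total a site happens to use.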
For this reason, percentages are a universal unit, while values in cM aren’t comparable between platforms. It’s no wonder that scientists prefer fractions or percentages of shared DNA over cM.
This brings up an interesting problem. Carefully controlled scientific studies apply the same methods to all of the data involved in the study. But datasets that combine cM from multiple direct-to-consumer genotyping sites have some level of inaccuracy necessarily baked in, as the number of total cM is different for every site. At the very least, one would have to know exactly what proportion of the data came from each platform. And, hopefully, the total cM didn’t change at any of those platforms during the course of data collection.
In my very accurate tables that show ranges of shared DNA, and do so for gender-specific relationships in most cases, I prefer to use percentages. This way, it only takes one step for someone to calculate what cM that would imply at a particular platform. And if I didn’t report my statistics in percentages, it would be hard to decide which platform’s total cM numbers I would use to convert those percentages to cM. Whichever site I chose, people would likely then compare the numbers to other sites without properly converting them.
I admit that percentages for distant relationships are cumbersome. I’d rather see that one of my matches at 23andMe shares 59 cM with me than have it presented as 0.79%. But there’s a higher point at which the benefits of percentages outweigh the benefits of cM. That point could be different for each person, but for me it probably falls somewhere between 1% and 3%, which means that I’d like to see relationships more distant than 2nd cousins reported to me as cM. But, then, I’ll have to decide whether I want to bother making conversions between platforms. Luckily, the differences aren’t as important at those lower levels.
The cM is very useful in providing a cutoff to eliminate many potentially false matches. After that cutoff is applied, it’s far more beneficial to use percentages. People may think that using percentages would mean going back to a “dumbed-down” unit, but they would be wrong. Sometimes a simpler solution is actually better, or more accurate, as it is in this case. I hope to see more people using percentages in the future.
Cover photo by Annie Spratt.