Nature’s big news: unprecedented precision, major sequencing breakthrough

Scientists finally have the ability to study genetic changes in any human tissue with unprecedented precision.

Recently, Iñigo Martincorena and his team at the Sanger Institute in the UK published an important study in Nature describing NanoSeq, a new single-molecule sequencing technology that cuts the sequencing error rate to an unprecedented “fewer than 5 errors per billion base pairs”.

Notably, this error rate is far lower than the frequency of somatic mutations, which makes it possible to detect somatic variants accurately in any tissue or cell type [1].

More importantly, the study did more than introduce a new technology: using NanoSeq, the researchers also found that errors made during DNA replication in cell division may not be the main cause of mutations. This finding challenges the view that cell division is the primary driver of somatic mutation.

The new technology is also expected to make it easier to study the effects of carcinogens on healthy cells in the future.

Screenshot of the paper’s front page

Biological differences between people arise mainly from differences in their DNA sequences (some are epigenetic), and changes in DNA sequence are called mutations. Mutations fall into two broad classes, germline and somatic. Germline mutations can be passed on to the next generation, whereas somatic mutations are not inherited.

Somatic mutations are the main cause of cancer and may also contribute to aging and to other diseases, such as neurodegenerative disorders.

As we age, our cells slowly accumulate somatic variants, but most of these variants are present in only a few cells of a tissue, or even in a single cell, which makes them difficult to detect.
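To see why rare variants are hard to detect, consider the variant allele fraction (VAF): a heterozygous mutation carried by k of n diploid cells appears in roughly k/(2n) of the DNA copies in a sample. When that fraction drops below the per-base error rate, sequencing errors at the site can be as common as true mutant reads. The sketch below compares the two; the 1-in-1,000 error rate for conventional sequencing is an illustrative assumption, not a figure from the study, and the function name is hypothetical.

```python
def variant_allele_fraction(mutant_cells: int, total_cells: int) -> float:
    """Fraction of DNA copies carrying a heterozygous somatic mutation.

    Assumes diploid cells in which each mutant cell carries the variant
    on one of its two copies.
    """
    return mutant_cells / (2 * total_cells)

# Assumed per-base error rate for conventional sequencing (illustrative only).
CONVENTIONAL_ERROR_RATE = 1e-3

for mutant_cells, total_cells in [(1, 10), (1, 1_000), (1, 1_000_000)]:
    vaf = variant_allele_fraction(mutant_cells, total_cells)
    hidden = vaf < CONVENTIONAL_ERROR_RATE  # true signal weaker than error noise
    print(f"{mutant_cells} mutant cell in {total_cells:,}: "
          f"VAF = {vaf:.1e}, below error rate: {hidden}")
```

A mutation in one cell out of ten stands out clearly, but one in a thousand cells already falls below the assumed error rate.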

DNA sequencing is currently the main way to detect variants, and somatic variants are generally studied with single-cell sequencing. However, existing techniques either have too high an error rate or work only on cells that still divide actively and are not terminally differentiated, whereas most cells in the adult body are terminally differentiated [3].

Most cancers begin with a few somatic variants in a small number of cells of a given tissue, but current DNA sequencing technologies are still not accurate enough to reliably identify cancer-associated variants carried by so few cells.

In recent years, to improve sequencing accuracy at the single-molecule level, scientists have proposed duplex sequencing, which sequences the two strands of each DNA molecule separately. In conventional sequencing, most errors come from mispaired bases introduced into a single strand during PCR amplification and sequencing.

Because the two strands of DNA are complementary, a true mutation appears on both strands, while such an error usually affects only one. Duplex sequencing therefore compares the reads from the two complementary strands and discards changes that appear on only one of them. In theory, this brings the error rate of duplex sequencing down to about 1 per billion base pairs.
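The strand-comparison step can be sketched in a few lines. This is a minimal illustration, not NanoSeq's actual pipeline: it assumes reads have already been grouped by original molecule and strand (real pipelines use molecular barcodes and read positions for this), that bottom-strand reads are already reverse-complemented into top-strand orientation, and that all reads are aligned and equal in length. The function names and the 0.9 agreement threshold are hypothetical.

```python
from collections import Counter

def strand_consensus(reads: list[str], min_agreement: float = 0.9) -> str:
    """Collapse the reads from one strand into a consensus by per-position majority.

    Positions where the most common base is supported by fewer than
    `min_agreement` of the reads are marked 'N' (no call).
    """
    consensus = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) >= min_agreement else "N")
    return "".join(consensus)

def duplex_consensus(top_reads: list[str], bottom_reads: list[str]) -> str:
    """Keep a base only when both single-strand consensuses report it."""
    top = strand_consensus(top_reads)
    bottom = strand_consensus(bottom_reads)
    return "".join(t if t == b and t != "N" else "N" for t, b in zip(top, bottom))

# An early PCR error (G at the fourth base) is copied into every top-strand read
# but is absent from the bottom strand, so it is masked instead of being called.
print(duplex_consensus(["ACGGA", "ACGGA", "ACGGA"], ["ACGTA", "ACGTA", "ACGTA"]))
```

The toy call prints `ACGNA`. A genuine mutation, present on both strands, would survive the comparison.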

Principle of duplex sequencing

BotSeqS, a duplex sequencing method, was proposed by Margaret L. Hoang et al. in 2016 [4]. In practice, however, errors introduced during read alignment and sequencing library preparation (mostly caused by end repair and nick extension) gave BotSeqS an error rate of about 200 per billion base pairs.

To avoid mispairing introduced by end repair and nick extension during library preparation, NanoSeq fragments DNA with restriction endonucleases instead of sonication. The enzymes leave blunt ends, which do not require end repair. In addition, NanoSeq adds dideoxynucleotides at nicks in the duplex, which terminate extension so that the nicks are not filled in with potentially erroneous bases.
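Why a blunt-cutting enzyme removes the need for end repair can be illustrated with a toy in silico digest. The sketch assumes an enzyme that recognizes the palindrome TGCA and cuts between TG and CA, modeled on HpyCH4V; treat that enzyme choice as an assumption for illustration. Because the cut falls at the same position on both strands, every fragment from the bottom strand is the exact reverse complement of a top-strand fragment, so the ends are fully base-paired with no overhang to fill in.

```python
import re

SITE = "TGCA"    # assumed recognition sequence (palindromic)
CUT_OFFSET = 2   # assumed cut position inside the site: TG^CA, a blunt cut

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def blunt_digest(seq: str) -> list[str]:
    """Cut a single-strand sequence at every occurrence of SITE."""
    cuts = [m.start() + CUT_OFFSET for m in re.finditer(SITE, seq)]
    bounds = [0, *cuts, len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]

dna = "AATGCACCGTTGCAGG"
top_fragments = blunt_digest(dna)
bottom_fragments = blunt_digest(reverse_complement(dna))

# Blunt ends: each bottom-strand fragment pairs exactly with a top-strand fragment.
assert [reverse_complement(f) for f in reversed(bottom_fragments)] == top_fragments
print(top_fragments)
```

With a staggered cutter, the two strands would end at different positions, leaving single-stranded overhangs that conventional library preparation fills in or trims during end repair.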

Principle of NanoSeq

The study also developed a bioinformatics pipeline to reduce errors that arise when sequencing reads are aligned to the reference genome. Together, these changes brought NanoSeq's error rate below 5 errors per billion base pairs.
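Scaling the two error rates to a genome's worth of sequence shows how large the improvement is. The sketch below multiplies each rate by an assumed diploid human genome size of about 6 billion base pairs (an approximation chosen for illustration). The function name is hypothetical.

```python
# Approximate diploid human genome size (assumption for this sketch).
DIPLOID_GENOME_BP = 6_000_000_000

def expected_errors(errors_per_billion: float, bases: int) -> float:
    """Expected number of false base calls when `bases` bases are examined."""
    return errors_per_billion * bases / 1e9

for method, rate in [("BotSeqS", 200), ("NanoSeq (upper bound)", 5)]:
    print(f"{method}: ~{expected_errors(rate, DIPLOID_GENOME_BP):,.0f} "
          "errors per diploid genome's worth of bases")
```

Under these assumptions, the error count falls from about 1,200 to about 30 per genome's worth of bases.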

The researchers next used NanoSeq to detect somatic variants in a variety of different tissues and cells.

One hypothesis held that stem cells carry relatively few somatic variants and acquire more as they proliferate and differentiate, so that terminally differentiated cells would carry more somatic variants.

However, when the researchers sequenced hematopoietic stem cells and granulocytes, and colonic stem cells and colonic epithelial cells, they found, surprisingly, no statistically significant difference in the number of variants between the stem cells and the terminally differentiated cells. Moreover, most mutations in the terminally differentiated cells had already accumulated at the stem cell stage, and only a few arose during proliferation and differentiation.

This suggests that errors arising from DNA replication during cell division may not be the main cause of mutations.

Comparison of the number of mutations in hematopoietic stem cells and granulocytes (b) and the number of mutations in colonic stem cells and colonic epithelial cells (d)

Most cells in the adult body no longer divide, and some, such as cardiomyocytes and neurons, are essentially never replaced. Because such cells do not divide and cannot be expanded into clones for sequencing, previous studies could not obtain their somatic mutation spectra.

So do these cells, which do not replicate their DNA, carry somatic mutations, and if so, how do those mutations arise?

The researchers sequenced prefrontal cortex neurons from healthy individuals and from patients with Alzheimer's disease (AD) and found that the number of somatic variants increased with age: point mutations increased by about 17.1 per year (interval 13.7 to 20.5), and small insertions and deletions (indels) by about 2.5 per year (interval 1.7 to 3.3).
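These rates imply a steady, roughly linear accumulation with age. As a back-of-the-envelope sketch, the snippet below multiplies the central estimates reported above by a span of years. It assumes a strictly linear trend and ignores both the intercept and the uncertainty intervals, so it only illustrates scale. The function name and the ages chosen are hypothetical.

```python
# Central estimates reported for neurons (per year); the linear model is an assumption.
SNV_PER_YEAR = 17.1
INDEL_PER_YEAR = 2.5

def extra_mutations(age_from: float, age_to: float, rate_per_year: float) -> float:
    """Mutations expected to accumulate between two ages under a linear model."""
    return (age_to - age_from) * rate_per_year

snvs = extra_mutations(30, 80, SNV_PER_YEAR)
indels = extra_mutations(30, 80, INDEL_PER_YEAR)
print(f"Age 30 -> 80: ~{snvs:.0f} point mutations, ~{indels:.0f} indels")
```

Over these 50 years, the estimates come to roughly 855 additional point mutations and 125 additional indels.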

Comparison also showed that the mutation spectrum of neurons was similar to that of granulocytes, and to those of smooth muscle cells from the bladder and colon, which were examined later. This suggests that some mechanism generates mutations even in non-dividing cells, and that mutation frequencies do not differ greatly between types of somatic cells.
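How similar two mutation spectra are is commonly quantified with cosine similarity between vectors of mutation counts per substitution class. The sketch below uses the six standard base-substitution classes. The counts are made-up toy numbers, not data from the study, and the paper's exact comparison method may differ. Cosine similarity compares the shape of a spectrum and ignores its total size, so two cell types can share a spectrum even if one carries more mutations.

```python
import math

CLASSES = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two count vectors (1.0 means same shape)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical toy counts per class, in the order of CLASSES.
spectrum_a = [10, 8, 60, 6, 20, 5]
spectrum_b = [20, 16, 120, 12, 40, 10]  # same shape, twice as many mutations
spectrum_c = [60, 5, 10, 5, 5, 40]      # a differently shaped spectrum

print(round(cosine_similarity(spectrum_a, spectrum_b), 3))
print(round(cosine_similarity(spectrum_a, spectrum_c), 3))
```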

Comparison of mutations in neurons from healthy subjects and AD patients

The researchers propose that somatic mutations in neurons arise partly from deamination of methylated cytosine, which converts it to thymine, and that this change is then fixed permanently during subsequent DNA repair.
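Mutations like these are usually tallied by substitution class, reported relative to the pyrimidine (C or T) of the base pair, together with their sequence context. Deamination of methylated cytosine shows up mainly as C>T changes at CpG sites, because in human DNA cytosine methylation occurs mostly at CpG. The helper below is a hypothetical sketch of that classification. It takes the reference trinucleotide around the mutated base and the alternate base, and reports the class and whether the site is a CpG.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def classify(ref_trinuc: str, alt: str) -> tuple[str, bool]:
    """Return (substitution class, is_CpG) for a mutation at the middle base.

    Purine reference bases are flipped to the opposite strand, so a G>A change
    is recorded as C>T on the complementary strand.
    """
    if ref_trinuc[1] in "GA":
        ref_trinuc = reverse_complement(ref_trinuc)
        alt = alt.translate(COMPLEMENT)
    substitution = f"{ref_trinuc[1]}>{alt}"
    is_cpg = ref_trinuc[1] == "C" and ref_trinuc[2] == "G"
    return substitution, is_cpg

print(classify("ACG", "T"))  # C>T at a CpG, the deamination pattern
print(classify("CGT", "A"))  # the same event seen from the other strand
print(classify("TCA", "T"))  # C>T outside a CpG
```

The first two calls describe the same event seen from opposite strands, which is why reporting relative to the pyrimidine avoids double counting.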

Interestingly, the researchers also found that indels longer than one base pair in neurons were strongly enriched in highly expressed genes, a pattern previously observed in cancer genomes [5]. This may help explain why neurodegenerative diseases are associated with age.

Overall, this duplex sequencing-based technology achieves unprecedented accuracy at the single-molecule level, with fewer than 5 errors per billion base pairs, making it possible to study somatic variants in any tissue.

The first results from NanoSeq indicate that DNA replication and cell division, long considered the main sources of somatic mutation, do not significantly increase the number of variants in cells after division. Instead, somatic mutations accumulate gradually over time, once again confirming their relevance to aging and cancer.

In the future, its high accuracy and non-invasive nature might allow NanoSeq to be used to assess in vitro the risk that a substance induces mutations, and hence its potential cancer risk. If this proves feasible, research in related fields may move to a new stage.