bioRxiv preprint doi: https://doi.org/10.1101/2020.08.14.251538; this version posted August 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Differential methylation as a mediator of COVID-19 susceptibility

Sandra Steyaert¹, Geert Trooskens¹, Joris R. Delanghe², Wim Van Criekinge³

¹ doc.ai Inc. 636 Waverley Street, Palo Alto, CA 94301
² Department of Diagnostic Sciences, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
³ Biobix, Department of Data Analysis and Mathematical Modelling, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium

Corresponding author: Dr. Sandra Steyaert, PhD (sandra@doc.ai)

Abstract
The COVID-19 outbreak shows a huge variation in prevalence and mortality at the geographical level but also within populations [1]. The ACE2 gene, identified as the SARS-CoV2 receptor, has been shown to facilitate viral invasion, and people with higher ACE2 expression are generally more severely affected [2, 3]. As there is a lot of variability in ACE2 expression between individuals, we hypothesized that differential DNA methylation profiles could be (one of) the confounding factors explaining this variability. Here we show that the epigenetic profile of host tissue, especially at the ACE2 promoter region and at its homologue ACE1, may be an important risk factor for COVID-19. Our results suggest that variable methylation can explain (part of) the differential susceptibility, symptom severity and death rate for COVID-19. Our findings are a promising starting point to further evaluate the potential of ACE1/2 methylation and other candidates as a predictor for clinical outcome upon SARS-CoV2 infection.
Keywords
SARS-CoV2, DNA methylation, ACE genes, COVID-19 susceptibility, epigenetic age

List of Abbreviations
3’ UTR: 3’ untranslated region
5’ UTR: 5’ untranslated region
450K: HumanMethylation450 array
ADT: Androgen deprivation therapy
ACE1: Angiotensin-converting enzyme 1
ACE2: Angiotensin-converting enzyme 2
BMIQ: Beta mixture quantile
CDC: Centers for Disease Control and Prevention
COVID-19: Coronavirus disease 19
D/I: Deletion/Insertion
GEO: Gene Expression Omnibus
IQR: Interquartile range
M: Methylated intensity
r2: squared Spearman correlation coefficient
SARS-CoV2: Severe acute respiratory syndrome coronavirus 2
TMPRSS2: Transmembrane protease serine 2
TSS: Transcription start site
U: Unmethylated intensity

Main

Background
In December 2019, a novel human coronavirus now named Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV2) emerged and was responsible for the now global outbreak of potentially severe and fatal atypical pneumonia defined as coronavirus disease 19 (COVID-19) [1]. As of August 15th, 2020, SARS-CoV2 is responsible for more than 21 million detected COVID-19 cases and at least 762k deaths worldwide [4]. The outbreak of the COVID-19 pandemic is especially unnerving because it’s hard to predict how the virus will affect any individual person.
As symptoms vary a lot between infected persons (if they experience them at all), determining whether a person is indeed infected by this novel coronavirus is surprisingly hard without testing [5]. The Centers for Disease Control and Prevention (CDC) reports that the main symptoms are fever/chills, headache, muscle pain, fatigue, coughing, sore throat and shortness of breath, which can appear between 2 and 14 days after exposure to the virus [6]. Confirmed COVID-19 cases have shown a wide range of reported symptoms, going from mild to severe illness. According to the CDC, approximately 80% of infected people present few to mild symptoms, while others have a more severe manifestation of the disease, with extreme cases relying on a ventilator to breathe.

There are a few clear risk factors [7], including age and general health status, but even within these subgroups there is a huge range in severity. Next to age and health status, socio-economic and environmental factors [8], sex [9], and even vitamin D status [10] have been shown to be associated with the immune response and differential susceptibility. Identifying which factors are responsible for the interindividual response to the virus is of utmost importance to help identify population groups at higher risk and to guide more effective strategies and protective measures.

It should be noted that this difference in individual susceptibility is not unusual for infectious diseases, and thus not unique to COVID-19. The same is true for tuberculosis, malaria and the ‘common’ flu [11]. The underlying reason, however, is mostly disease-specific because the biological pathways involved in the manifestation of the illness often differ. Certain genomic variants play a role in the specific immune response to viral infections.
In recent weeks, efforts were initiated to look into the role of genetics, and whether each person’s unique genetics may affect their susceptibility to COVID-19 and the severity of the disease. The interplay between the ACE genes and SARS-CoV2 has been of particular interest. The angiotensin-converting enzyme 2 (ACE2) gene, which has been identified as the SARS-CoV2 receptor, has been shown to facilitate viral invasion [2]. While most COVID-19 studies currently focus on ACE2, a Belgian research group very recently released data from 33 European countries in which they looked at the gene coding for angiotensin-converting enzyme 1 (ACE1) [12]. Although ACE2 and ACE1 share only 42% amino acid identity, they both act as carboxypeptidases that cleave amino acids from the carboxyl terminal of peptides. The ACE1 gene is characterized by a genetic deletion/insertion (D/I) polymorphism in intron 16, which is associated with alterations in circulating and tissue concentrations of ACE. Their results demonstrate that prevalence and mortality of COVID-19 infection correlate with the D allele frequency of the ACE1 D/I polymorphism. Another research group has recently proposed a similar hypothesis [3]. Interestingly, the D allele has been shown to be associated with reduced expression of ACE2 [3].

It was already known that expression of ACE2 is significantly increased amongst people who smoke or suffer from diabetes, hypertension, heart disease and other conditions [13], and people with higher ACE2 expression generally also seem more severely affected by SARS-CoV2 [14]. Interestingly, this mechanistically implicated ACE2 gene has been shown to be epigenetically regulated.
Taken together with the patient characteristics associated with severe COVID-19 outcome, this supports an essential role for epigenetic regulation.

Results
To explore the importance of ACE2 methylation, publicly available DNA methylation data from 5 research studies (~1000 samples) was used. All data originated from blood from “healthy” people, and methylation analysis was performed on Illumina Infinium HumanMethylation450 BeadChip (450K) DNA methylation arrays. As in all Illumina methylation assays, methylation values of each probe are expressed as β values, which range from 0 (no methylation) to 1 (fully methylated). Raw β values were preprocessed, cleaned and subsequently normalized. In the next step, the 8 probes falling into the ACE2 region were fetched and visualized.

Figure 1 displays, for each of the 8 probes, the corresponding methylation levels per sample. The x- and y-axis show each sample’s age and its normalized β value for that position/probe, respectively. Probes are sorted by genomic position in ACE2 (left to right per row). Two groups are noticeable: (i) the big cloud of samples marked with circles and a lighter color, and (ii) the smaller cloud of darker triangles. This latter group represents data from a cohort of centenarians. Of special interest are the promoter (flanking) regions. Here, TSS200 covers the region from the transcription start site (TSS) to 200 nucleotides upstream; TSS1500 represents the region from 200 to 1500 nucleotides upstream of the TSS. Probes falling into these regulatory regions are colored in orange, red and purple
(TSS1500/200 and 1stExon/5’UTR). It is apparent that, in general, the average methylation levels gradually decline with age while the spread increases. However, when looking only at the group of centenarians, methylation levels appear higher and show a remarkably higher consistency (Table 1).

Figure 1: Chronological age (x-axis) vs normalized β values (y-axis) for 8 methylation probes in ACE2. Subjects older than 85 years (here referred to as ‘centenarians’) are marked with a triangle and a darker color. Probes are colored as follows: orange=TSS1500; red=TSS200; purple=1stExon and/or 5’UTR; green=Gene Body; blue=3’UTR.

There is a lot of variability in ACE2 expression between individuals, and many variants of ACE2 are associated with medical conditions such as diabetes, hypertension and cardiovascular disorders [13]. Thus, such predisposing genetic variants may also contribute to susceptibility and explain the enormous variability in infection rate, symptom development/severity and death rate. Although this is plausible, only limited associations have been found to date between ACE2 variants and response to SARS-CoV2 infection. As ACE2 plays a role in the SARS-CoV2 infection pathway and ACE2 expression seems very variable between individuals, differential DNA methylation of ACE2 could be one of the confounding factors explaining (part of) this variability.

Table 1: Summary statistics of methylation (β) values per probe. The first column shows the squared Spearman correlation coefficient (r2) between methylation and age.
The second and third columns display for each probe the median β value as well as the interquartile range (IQR) for the non-centenarian group (°) and the centenarian group (*), separately.

Probe        r2(β - Age)   Median β        IQR β
cg18458833   -0.41         0.88° / 0.94*   0.07° / 0.02*
cg21598868   -0.44         0.76° / 0.83*   0.16° / 0.12*
cg18877734   -0.41         0.85° / 0.90*   0.09° / 0.04*
cg08559914   -0.40         0.82° / 0.89*   0.09° / 0.06*
cg16734967   -0.47         0.62° / 0.69*   0.17° / 0.14*
cg05748796   -0.40         0.81° / 0.88*   0.10° / 0.04*
cg05039749    0.12         0.10° / 0.10*   0.05° / 0.04*
cg23232263   -0.46         0.80° / 0.92*   0.13° / 0.04*

Hoffmann and coworkers showed that in addition to ACE2, SARS-CoV2 infection also depends on the host cell factor transmembrane protease serine 2 (TMPRSS2), a cellular protease. While ACE2 is used for initial binding and host cell entry, the spike protein of SARS-CoV2 is primed by TMPRSS2, a process that the authors demonstrate can be blocked by a clinically proven protease inhibitor [15]. The TMPRSS2 gene is known to be regulated by androgen [16]. Interestingly, inhibition of the androgen receptor, for example by androgen deprivation therapy (ADT), decreases TMPRSS2 levels [17] (the same effect as higher promoter methylation), resulting in fewer viral priming options and thus less viral entry [18].

DNA methylation is not static. With age, the methylation state of various genes may change [19]. It is also known that general health status and lifestyle have an impact on methylation signatures. These changes are quantifiable and serve as a means to determine one’s “epigenetic age”, which often differs from the chronological age [20]. Epigenetic age is a footprint reflecting a combination of a person’s genetic make-up and his/her history of past external experiences/exposures. A compelling example is the case of super-centenarians (people who live beyond 100 years and seem to age very healthily). Their epigenetic age is significantly lower than their chronological age.
In these supercentenarians, the underlying reason is probably a combination of both genetic predisposition and lifestyle.

Could differential ACE2 promoter methylation in host tissue be causal for the differential expression that is seen amongst COVID-19 patients? Could variable DNA methylation signatures of ACE2 (and TMPRSS2) explain (at least part of) the differential susceptibility, symptom development/severity and death rate for COVID-19? Here, we hypothesize that epigenetic profiling, especially of DNA methylation signatures, may indeed permit discovery of population-, age- and gender-related risk factors for COVID-19. The more consistent methylation levels in the centenarian cohort are at least very compelling. Our findings are a promising starting point to further evaluate the potential of ACE2 and TMPRSS2 methylation as a predictor for clinical outcome upon SARS-CoV2 infection.

Note that blood DNA represents a mixture of DNA from various leucocyte types. However, due to close contact in the alveoli, methylation patterns found in blood can closely reflect methylation in lung tissue. Other studies focusing on the effect of smoking on DNA methylation indeed found that the effects in blood samples were very similar to the changes in lung tissue [21, 22]. But to prove any association, follow-up research is needed on respiratory samples from patients. We are now collecting these samples and aim to compare methylation profiles of ACE2, TMPRSS2 and other interesting candidate genes between COVID-19 negatives and positives (with ranging severity/complications).
As the virus continues to spread, more mild cases will arise, and healthcare professionals need to recognize these to accurately portray total numbers of COVID-19 infections. Differentiating mild and moderate from severe disease may also help clinicians to more accurately triage cases that need medical attention and to minimize the risks to the population, health systems, and economy.

Methods
Methylation was measured on Illumina’s 450K methylation array, for which raw data was downloaded from the Gene Expression Omnibus (GEO) [23] (GSE30870, GSE32149, GSE36064, GSE41169, GSE42861). Details on the individual data sets and characteristics of the study cohort can be found in Table 2 below. A full description of each dataset can be found in the original reference.

The Illumina 450K BeadChip measures bisulfite-conversion-based, single-CpG resolution DNA methylation levels for over 480K cytosine sites and covers 96% of CpG islands in the human genome. Unlike the previous platform (27K BeadChip), the Illumina 450K BeadChip includes two distinct probe types, Infinium I (n=135,501) and Infinium II (n=350,076). In the Infinium I type, each CpG site is targeted by two 50bp probes: one for detecting the methylated intensity (M) and one for detecting the unmethylated intensity (U), whereas for the Infinium II type, both the M and U intensity of each CpG site are detected by one single probe using different dye colors (green and red). Methylation values per CpG site are indicated by the β-value, which ranges from 0 (no methylation) to 1 (fully methylated) and is computed as β = M / (M + U + α), where α is generally set to 100 [24].

Raw β-values were preprocessed in R (v3.6.3) with the RnBeads package (v2.4.0) [25]. Probes not in CpG context were filtered out, as well as probes for which the β values were NA or had low variability (standard deviation < 0.005).
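The β-value formula and the variability filter described above can be illustrated with a minimal Python sketch. This is not the RnBeads pipeline used in the study. The probe names and intensities are made up for illustration, and two choices are assumptions: a probe is dropped if any of its β values is missing, and the sample standard deviation is used.

```python
import math
from statistics import stdev

ALPHA = 100.0  # offset alpha from the beta-value formula in the text


def beta_value(meth, unmeth, alpha=ALPHA):
    """Return beta = M / (M + U + alpha) for one probe in one sample."""
    return meth / (meth + unmeth + alpha)


def keep_probe(betas, min_sd=0.005):
    """Return True if a probe passes the filter.

    Assumptions (not stated in the source): a probe with ANY missing
    beta value is dropped, and the sample standard deviation is used.
    Requires at least two samples per probe.
    """
    if any(b is None or math.isnan(b) for b in betas):
        return False
    return stdev(betas) >= min_sd


# Hypothetical intensities: probe -> (methylated, unmethylated) per sample
signals = {
    "probe_a": ([900, 950, 880], [100, 50, 120]),            # variable
    "probe_b": ([50, 50, 50], [850, 850, 850]),              # constant
    "probe_c": ([400, float("nan"), 420], [500, 480, 470]),  # has a missing value
}

betas = {
    name: [beta_value(m, u) for m, u in zip(ms, us)]
    for name, (ms, us) in signals.items()
}
kept = [name for name, values in betas.items() if keep_probe(values)]
print(kept)  # only probe_a passes both filters
```

With α = 100, a probe with M = 300 and U = 100 has β = 300 / 500 = 0.6; the offset keeps β defined and pulls it towards 0 when both intensities are low.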
β-values of the remaining probes were next normalized using the Beta MIxture Quantile dilation method (BMIQ) [26]. In BMIQ, the β-values of type II probes are adjusted into a statistical distribution characteristic of type I probes. After normalization, probes corresponding with ACE2 were fetched and the β-values of each sample were plotted against the subjects’ respective chronological age.

Table 2: Details on the used 450K methylation data sets and characteristics of the study cohort.

ID         DNA origin   Platform   #samples   Median age (range), years   Description
GSE30870   Blood PBMC   450K       40         44 (0, 100)                 Peripheral blood mononuclear cells from newborns and centenarians.
GSE32149   Blood PBMC   450K       46         15 (3.5, 76)                Peripheral blood leukocytes from a DNA methylation study of Crohn's disease and ulcerative colitis. There was no significant evidence that disease status affects DNA methylation.
GSE36064   Blood PBMC   450K       77         3.1 (1, 16)                 Peripheral blood leukocyte samples from healthy male children from Children's Hospital Boston.
GSE41169   Blood WB     450K       94         29 (18, 65)                 Whole blood samples from a Dutch population comprised of schizophrenics and healthy control subjects. Schizophrenia status was not related to DNA methylation.
GSE42861   Blood WB     450K       689        54 (17, 70)                 Whole blood from rheumatoid arthritis patients. The effect of rheumatoid arthritis on DNA methylation was found to be negligible.

References
1. Zheng, J., SARS-CoV-2: an Emerging Coronavirus that Causes a Global Threat. Int J Biol Sci, 2020. 16(10): p. 1678-1685.
2. Zhou, P., et al., A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature, 2020. 579(7798): p. 270-273.
3. Skarstein Kolberg, E., ACE2, COVID19 and serum ACE as a possible biomarker to predict severity of disease. J Clin Virol, 2020. 126: p. 104350.
4. Dong, E., H. Du, and L. Gardner, An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect Dis, 2020. 20(5): p. 533-534.
5. Xie, J., et al., Clinical Characteristics of Patients Who Died of Coronavirus Disease 2019 in China. JAMA Netw Open, 2020. 3(4): p. e205619.
6. Centers for Disease Control and Prevention. Symptoms of Coronavirus. [cited 2020 15 June]; Available from: https://www.cdc.gov/coronavirus/2019-ncov/symptomstesting/symptoms.html.
7. Centers for Disease Control and Prevention. People who are at higher risk for severe illness. [cited 2020 15 June]; Available from: https://www.cdc.gov/coronavirus/2019-ncov/need-extraprecautions/people-at-higher-risk.html.
8. Manrai, A. and C. Patel, COVID-19: Mapping communities at risk. [cited 2020 15 June]; Available from: https://www.xy.ai/process/covid-19-mapping-communities-at-risk.
9. Jordan, R.E., P. Adab, and K.K. Cheng, Covid-19: risk factors for severe disease and death. BMJ, 2020. 368: p. m1198.
10. Grant, W.B., et al., Evidence that Vitamin D Supplementation Could Reduce Risk of Influenza and COVID-19 Infections and Deaths. Nutrients, 2020. 12(4).
11. Verhein, K.C., H.L. Vellers, and S.R. Kleeberger, Inter-individual variation in health and disease associated with pulmonary infectious agents. Mamm Genome, 2018. 29(1-2): p. 38-47.
12. Delanghe, J.R., M.M. Speeckaert, and M.L. De Buyzere, The host's angiotensin-converting enzyme polymorphism may explain epidemiological findings in COVID-19 infections. Clin Chim Acta, 2020. 505: p. 192-193.
13. Patel, V.B., et al., Role of the ACE2/Angiotensin 1-7 Axis of the Renin-Angiotensin System in Heart Failure. Circ Res, 2016. 118(8): p. 1313-26.
14. Sawalha, A.H., et al., Epigenetic dysregulation of ACE2 and interferon-regulated genes might suggest increased COVID-19 susceptibility and severity in lupus patients. Clin Immunol, 2020. 215: p. 108410.
15. Hoffmann, M., et al., SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor. Cell, 2020. 181(2): p. 271-280 e8.
16. Lucas, J.M., et al., The androgen-regulated protease TMPRSS2 activates a proteolytic cascade involving components of the tumor microenvironment and promotes prostate cancer metastasis. Cancer Discov, 2014. 4(11): p. 1310-25.
17. Ferraldeschi, R., et al., Targeting the androgen receptor pathway in castration-resistant prostate cancer: progresses and prospects. Oncogene, 2015. 34(14): p. 1745-57.
18. Montopoli, M., et al., Androgen-deprivation therapies for prostate cancer and risk of infection by SARS-CoV-2: a population-based study (N = 4532). Ann Oncol, 2020.
19. Bell, C.G., et al., DNA methylation aging clocks: challenges and recommendations. Genome Biol, 2019. 20(1): p. 249.
20. Horvath, S., DNA methylation age of human tissues and cell types. Genome Biol, 2013. 14(10): p. R115.
21. Stueve, T.R., et al., Epigenome-wide analysis of DNA methylation in lung tissue shows concordance with blood studies and identifies tobacco smoke-inducible enhancers. Hum Mol Genet, 2017. 26(15): p. 3014-3027.
22. Gao, X., et al., DNA methylation changes of whole blood cells in response to active smoking exposure in adults: a systematic review of DNA methylation studies. Clin Epigenetics, 2015. 7: p. 113.
23. Edgar, R., M. Domrachev, and A.E. Lash, Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res, 2002. 30(1): p. 207-10.
24. Pidsley, R., et al., A data-driven approach to preprocessing Illumina 450K methylation array data. BMC Genomics, 2013. 14: p. 293.
25. Muller, F., et al., RnBeads 2.0: comprehensive analysis of DNA methylation data. Genome Biol, 2019. 20(1): p. 55.
26. Teschendorff, A.E., et al., A beta-mixture quantile normalization method for correcting probe design bias in Illumina Infinium 450 k DNA methylation data. Bioinformatics, 2013. 29(2): p. 189-96.

Availability of data and materials
The datasets analyzed during the current study are available in the Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/) under accession numbers GSE30870, GSE32149, GSE36064, GSE41169 and GSE42861.

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
SS pre-processed, analyzed and interpreted the methylation data and wrote the manuscript with biological and technical insight from WVC. Additional validation was done by GT. GT and JRD reviewed the analysis and final text. WVC oversaw the work and was a major contributor in writing the manuscript. All authors read and approved the final manuscript.

Acknowledgements
We thank Dr. Adriaan Verhelle (Scripps Research, La Jolla, CA, USA) for valuable discussions, comments and help with figure layout.