Long-read sequencing is rapidly expanding the catalog of long noncoding RNAs (lncRNAs), revealing thousands of transcripts, many of them implicated in cancer and other diseases. Many disease-associated lncRNAs overlap GWAS loci and can reshape the epitranscriptome, influence chromatin organization, or modulate transcription. This striking diversity of molecular phenotypes underscores their importance but also poses a challenge: unlike protein-coding genes, lncRNAs generally lack the strong sequence conservation and recognizable domains that would enable a systematic framework for functional classification.
What is missing is a unifying strategy that moves beyond case-by-case characterization toward scalable annotation. RNA structure offers such a foundation, because all well-characterized noncoding RNAs function through higher-order structures. Conserved helices and motifs provide evolutionary signatures of function even when the primary sequence diverges, offering a principled way to prioritize lncRNAs for mechanistic study.
I will present ECSFinder, a new computational framework for identifying evolutionarily conserved RNA structures (ECSs). By integrating thermodynamic folding, background modeling, and covariation analysis within a supervised classifier, ECSFinder outperforms existing methods across benchmarks. Applied genome-wide to mammalian alignments, it identifies more than 700,000 ECSs enriched in promoters, untranslated regions, enhancers, and lncRNA exons. These structures overlap disease-associated variants significantly more often than expected and highlight candidate lncRNAs with therapeutic potential.
Together, these findings establish RNA structure as a unifying axis for lncRNA functional annotation, linking disease association, evolutionary conservation, and epitranscriptomic regulation.