Long noncoding RNAs (lncRNAs) are transcripts longer than 200 nucleotides (nt) that are involved in the regulation of gene expression. In plants, lncRNAs have been implicated in diverse regulatory processes, including the modulation of chromatin topology, interactions with microRNAs, the regulation of alternative splicing, and serving as scaffolds for the assembly of protein complexes. Despite their biological importance, identifying evolutionarily conserved lncRNAs remains methodologically challenging, mainly because of their limited sequence conservation. Nonetheless, lncRNAs can display other forms of conservation, such as the preservation of RNA secondary structure [1,2], which is crucial for their molecular function.
The proportion of lncRNAs with conserved secondary structures within the Brassicaceae clade is currently unknown. In this study, we analyzed the secondary structure conservation of 4,281 annotated lncRNAs in Arabidopsis thaliana. To this end, we developed a bioinformatics pipeline to evaluate the secondary structure of these lncRNAs. To identify potential orthologous lncRNAs, we built a whole-genome alignment [3] of representative species across the Brassicaceae clade and extracted the alignment blocks corresponding to positions orthologous to A. thaliana lncRNAs. These alignment blocks were consolidated with MAFtools [4] to reduce their fragmentation. Finally, we screened the consolidated alignment blocks for conserved secondary structure using a sliding-window approach with RNAz [5].
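The sliding-window step can be illustrated with a minimal Python sketch. It assumes that a consolidated alignment block is held in memory as a dictionary mapping species names to equal-length gapped sequences. The function name `sliding_windows`, the window length (120 columns) and the step (40 columns) are hypothetical choices for illustration, not the parameters used in this study. Writing each window to disk and scoring it with RNAz is not shown.

```python
from typing import Dict, Iterator, Tuple

# Hypothetical illustration values, not the study's actual settings.
WINDOW = 120  # window length in alignment columns
STEP = 40     # columns to advance between consecutive windows


def sliding_windows(
    block: Dict[str, str], window: int = WINDOW, step: int = STEP
) -> Iterator[Tuple[int, Dict[str, str]]]:
    """Yield (start_column, sub_alignment) pairs over one alignment block.

    `block` maps species names to gapped sequences of identical length.
    Rows that are gap-only inside a window are dropped, and a window is
    yielded only if at least two sequences remain, because a comparative
    screen needs at least two aligned sequences.
    """
    lengths = {len(seq) for seq in block.values()}
    if len(lengths) != 1:
        raise ValueError("all rows of an alignment block must have the same length")
    n_cols = lengths.pop()

    if n_cols <= window:
        starts = [0]  # short block: a single window covers everything
    else:
        starts = list(range(0, n_cols - window + 1, step))
        if starts[-1] + window < n_cols:
            starts.append(n_cols - window)  # extra window so the block tail is covered

    for start in starts:
        sub = {sp: seq[start:start + window] for sp, seq in block.items()}
        # Remove species whose row contains only gap characters in this window.
        sub = {sp: s for sp, s in sub.items() if s.strip("-.")}
        if len(sub) >= 2:
            yield start, sub
```

With a 300-column block, this sketch yields windows starting at columns 0, 40, 80, 120, 160 and 180; the last window is shifted back so that the final columns are not skipped. Overlapping windows let a locally conserved structure be detected even when the rest of the lncRNA alignment carries no structural signal.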
Our pipeline allows us to screen ~50% of A. thaliana lncRNAs across 14 Brassicaceae species spanning ~19 million years of divergence [6], and it reports that ~10% of them show a signal of conserved RNA secondary structure, including known lncRNAs such as HID1 and IPS1.