Flowchart of the analysis used to identify unannotated microRNAs

The pipeline begins by selecting the precursor sequences of all microRNAs from a reference species (Human, Mouse, Rat, Chicken, X. tropicalis, Zebrafish, Fugu, Drosophila and C. elegans) that are not annotated in at least one of the other species analyzed, as assessed from the miRBase and EnsEMBL databases. Each of these precursors is then searched with BLASTN against the genome sequence of the species in which the microRNA is missing.

For each BLAST hit that satisfies the thresholds length > 70 nt, percentage identity > 70% and E-value < 0.01, we run a second BLASTN, using the detected hit as the query against the human genome sequence (reciprocal BLAST). If the best hit of this search corresponds to the starting microRNA precursor, a third BLASTN compares the hit sequence with the mature sequence of the starting microRNA. If this alignment passes the thresholds length > 20 nt, percentage identity > 90% and E-value < 0.01, we predict the secondary structure of the new microRNA with the UNAFold software.

If at least one predicted structure has a free energy lower than -20 kcal/mol and contains a hairpin whose free energy lies within one standard deviation of the average hairpin free energy of the homologous, already annotated microRNAs, the microRNA is labeled a High Confidence (HC) prediction; otherwise it is labeled a Low Confidence (LC) prediction. Both are stored in the database, but only HC predictions are used in further analyses.

We tested the specificity (98%) and sensitivity (76%) of the method by running the entire set of human microRNAs against the mouse genome.
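The filtering and labeling rules above can be summarized in a short Python sketch. It assumes that BLASTN and UNAFold have already been run and their outputs parsed into plain numbers (parsing is not shown), and that reciprocal hits have already been mapped to precursor identifiers. The names `Hit`, `passes_precursor_filter`, `passes_mature_filter`, `reciprocal_best_hit_ok` and `classify` are hypothetical, and the "average plus or minus one standard deviation" window is one reading of the HC criterion; this illustrates the thresholds and is not the database's actual code.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Hit:
    """One parsed BLASTN alignment (hypothetical container)."""
    length: int      # alignment length in nucleotides
    identity: float  # percentage identity, 0-100
    evalue: float    # BLAST E-value


def passes_precursor_filter(hit: Hit) -> bool:
    """First BLASTN: precursor vs. genome of the species missing the microRNA."""
    return hit.length > 70 and hit.identity > 70.0 and hit.evalue < 0.01


def reciprocal_best_hit_ok(hits: list[tuple[str, float]], starting_id: str) -> bool:
    """Second BLASTN (reciprocal): the best hit must be the starting precursor.

    Assumption: `hits` holds (precursor_id, evalue) pairs, already mapped from
    genomic coordinates to precursor identifiers; the best hit is the one
    with the lowest E-value.
    """
    if not hits:
        return False
    best_id, _ = min(hits, key=lambda h: h[1])
    return best_id == starting_id


def passes_mature_filter(hit: Hit) -> bool:
    """Third BLASTN: candidate hit vs. mature sequence of the starting microRNA."""
    return hit.length > 20 and hit.identity > 90.0 and hit.evalue < 0.01


def classify(structure_dgs: list[float],
             hairpin_dgs: list[float],
             homolog_hairpin_dgs: list[float]) -> str:
    """Label a candidate 'HC' or 'LC' from UNAFold free energies (kcal/mol).

    structure_dgs[i]: free energy of predicted structure i
    hairpin_dgs[i]: free energy of the hairpin in structure i
    homolog_hairpin_dgs: hairpin free energies of annotated homologs
    """
    mu = mean(homolog_hairpin_dgs)
    # Assumption: with a single homolog the window collapses to the mean.
    sd = stdev(homolog_hairpin_dgs) if len(homolog_hairpin_dgs) > 1 else 0.0
    for dg, hairpin_dg in zip(structure_dgs, hairpin_dgs):
        if dg < -20.0 and mu - sd <= hairpin_dg <= mu + sd:
            return "HC"
    return "LC"
```

For example, with annotated homolog hairpins at -28, -32 and -30 kcal/mol the accepted window is -32 to -28 kcal/mol, so a structure at -25 kcal/mol with a -30 kcal/mol hairpin is labeled HC, while the same hairpin in a structure at -15 kcal/mol is labeled LC.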