Author summary
The most important functional parts of proteins are often small, but very specific, sequence motifs. Because of their functional role, these motifs also tend to be strongly conserved during evolution. Nevertheless, when protein sequences from the same family are aligned, standard multiple sequence alignment methods often fail to align such motifs correctly. Aligning functional residues correctly is essential for detecting motif conservation, which in turn can be used to filter out motifs that occur by chance. Additionally, many downstream analyses, such as phylogenetics, rely strongly on alignment quality. We have developed a sequence alignment program, Motif-Aware PRALINE (MA-PRALINE), that explicitly incorporates information about motifs. Motifs are provided to MA-PRALINE in the PROSITE pattern syntax; the program scans the input sequences for instances of each pattern and gives a score bonus to matching sequence positions. Our method offers a reproducible alternative to editing alignments by hand to account for motif conservation, which is a tedious and error-prone process. We show that MA-PRALINE allows the alignment of motif-rich regions to be fine-tuned without degrading the rest of the alignment. MA-PRALINE is available on GitHub as open-source software, so it can easily be tailored to similar problems. We apply MA-PRALINE to the HIV-1 envelope glycoprotein (gp120) to obtain an improved alignment of its N-linked glycosylation motifs. The presence of these motifs is essential for the virus to evade the host immune response.
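The pattern-scanning step described above (PROSITE pattern in, score bonus for matching positions out) can be illustrated with a minimal Python sketch. It is not MA-PRALINE's implementation. It assumes a simplified subset of the PROSITE syntax (`x`, `[..]`, `{..}`, `(n)`/`(n,m)` repeats, and `<`/`>` terminal anchors), translates the pattern into a regular expression, and collects every residue covered by a possibly overlapping match. The function names, the identity-based toy score, and the `bonus` value are hypothetical placeholders.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simplified PROSITE pattern into a Python regular expression.

    Supports x, [..], {..}, (n) and (n,m) repeats, and leading < / trailing >
    terminal anchors. Anchors inside brackets (e.g. [G>]) are NOT supported.
    """
    pattern = pattern.strip().rstrip(".")  # PROSITE patterns end with '.'
    parts = []
    for element in pattern.split("-"):
        prefix, suffix = "", ""
        if element.startswith("<"):  # N-terminal anchor
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # C-terminal anchor
            suffix, element = "$", element[:-1]
        # Separate the residue spec from an optional repeat, e.g. x(2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        if m is None:
            raise ValueError(f"cannot parse PROSITE element {element!r}")
        core, repeat = m.group(1), m.group(2)
        if core == "x":  # any residue
            regex = "."
        elif core.startswith("{") and core.endswith("}"):  # excluded residues
            regex = "[^" + core[1:-1] + "]"
        else:  # single residue or [..] set, identical in regex syntax
            regex = core
        if repeat:
            regex += "{" + repeat + "}"
        parts.append(prefix + regex + suffix)
    return "".join(parts)


def motif_positions(sequence: str, pattern: str) -> set[int]:
    """Return 0-based indices of residues covered by any pattern match.

    The lookahead wrapper lets overlapping matches (e.g. in 'NNST') be found.
    """
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    covered: set[int] = set()
    for m in regex.finditer(sequence):
        start = m.start()
        covered.update(range(start, start + len(m.group(1))))
    return covered


def pair_score(seq_a, i, motif_a, seq_b, j, motif_b, bonus=2.0):
    """Toy pairwise score: +1/-1 for identity/mismatch, plus a hypothetical
    bonus when both residues lie inside a motif match."""
    score = 1.0 if seq_a[i] == seq_b[j] else -1.0
    if i in motif_a and j in motif_b:
        score += bonus
    return score


if __name__ == "__main__":
    glyco = "N-{P}-[ST]-{P}."  # N-glycosylation pattern (PROSITE PS00001)
    seq = "MKNGTAVNPSLNQT"
    print(prosite_to_regex(glyco))
    print(sorted(motif_positions(seq, glyco)))
```

Run on the example sequence, the pattern becomes `N[^P][ST][^P]` and only positions 2 to 5 (`NGTA`) are flagged: `NPS` fails because of the proline, and the final `NQT` lacks a fourth residue. A real aligner would add such a bonus inside its dynamic-programming scoring rather than through a standalone function like `pair_score`.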
Motif-Aware PRALINE: Improving the alignment of motif regions
