Detecting sequence signals in targeting peptides using deep learning (2024)

Abstract

In bioinformatics, machine learning methods have been used to predict features embedded in the sequences. In contrast to what is generally assumed, machine learning approaches can also provide new insights into the underlying biology. Here, we demonstrate this by presenting TargetP 2.0, a novel state-of-the-art method to identify N-terminal sorting signals, which direct proteins to the secretory pathway, mitochondria, and chloroplasts or other plastids. By examining the strongest signals from the attention layer in the network, we find that the second residue in the protein, that is, the one following the initial methionine, has a strong influence on the classification. We observe that two-thirds of chloroplast and thylakoid transit peptides have an alanine in position 2, compared with 20% in other plant proteins. We also note that in fungi and single-celled eukaryotes, less than 30% of the targeting peptides have an amino acid that allows the removal of the N-terminal methionine compared with 60% for the proteins without targeting peptide. The importance of this feature for predictions has not been highlighted before.
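The position-2 statistics in the abstract come down to counting residues over labelled sequences. The sketch below shows one way such fractions could be tallied. It is not TargetP 2.0 code. It assumes (label, sequence) pairs whose sequences start with the initiator methionine. It also assumes the commonly cited rule that methionine aminopeptidase removes the initiator Met when residue 2 is small (G, A, S, C, P, T or V). The class labels and toy sequences are hypothetical and are not data from the paper.

```python
from collections import defaultdict

# Residues at position 2 commonly cited as allowing methionine aminopeptidase
# to remove the initiator Met (small side chains). Assumed rule, not from the paper.
METAP_PERMISSIVE = set("GASCPTV")


def position2_stats(records):
    """Summarize residue 2 for each class label.

    records: iterable of (label, sequence) pairs; sequences should start with Met.
    Returns {label: (fraction with Ala at position 2,
                     fraction with a MetAP-permissive residue at position 2)}.
    """
    counts = defaultdict(lambda: [0, 0, 0])  # [total, Ala at P2, permissive P2]
    for label, seq in records:
        if len(seq) < 2 or seq[0] != "M":
            continue  # skip fragments that do not start with the initiator Met
        c = counts[label]
        c[0] += 1
        c[1] += int(seq[1] == "A")
        c[2] += int(seq[1] in METAP_PERMISSIVE)
    return {label: (c[1] / c[0], c[2] / c[0]) for label, c in counts.items()}


if __name__ == "__main__":
    # Hypothetical toy records, not data from the paper.
    toy = [("cTP", "MASSMLSS"), ("cTP", "MAVSLRPA"), ("other", "MKLVEDGA")]
    for label, (ala, permissive) in sorted(position2_stats(toy).items()):
        print(f"{label}: Ala at P2 {ala:.0%}, MetAP-permissive P2 {permissive:.0%}")
```

Real comparisons would group proteins by annotated targeting-peptide type and organism group, as the abstract does for plants versus fungi and single-celled eukaryotes.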

Original language: English
Article number: 201900429
Journal: Life Science Alliance
Volume: 2
Issue number: 5
Number of pages: 14
ISSN: 2575-1077
DOI: 10.26508/lsa.201900429
Publication status: Published, 1 Jan 2019

    Cite this

    Armenteros, J. J. A., Salvatore, M., Emanuelsson, O., Winther, O., Von Heijne, G., Elofsson, A., & Nielsen, H. (2019). Detecting sequence signals in targeting peptides using deep learning. Life Science Alliance, 2(5), Article 201900429. https://doi.org/10.26508/lsa.201900429

    Armenteros, Jose Juan Almagro ; Salvatore, Marco ; Emanuelsson, Olof et al. / Detecting sequence signals in targeting peptides using deep learning. In: Life Science Alliance. 2019 ; Vol. 2, No. 5.

    @article{909568c1cf73473597b9c066b6b4cee0,

    title = "Detecting sequence signals in targeting peptides using deep learning",

    author = "Armenteros, {Jose Juan Almagro} and Marco Salvatore and Olof Emanuelsson and Ole Winther and {Von Heijne}, Gunnar and Arne Elofsson and Henrik Nielsen",

    year = "2019",

    month = jan,

    day = "1",

    doi = "10.26508/lsa.201900429",

    language = "English",

    volume = "2",

    journal = "Life Science Alliance",

    issn = "2575-1077",

    publisher = "Life Science Alliance",

    number = "5",

    }

    Armenteros, JJA, Salvatore, M, Emanuelsson, O, Winther, O, Von Heijne, G, Elofsson, A & Nielsen, H 2019, 'Detecting sequence signals in targeting peptides using deep learning', Life Science Alliance, vol. 2, no. 5, 201900429. https://doi.org/10.26508/lsa.201900429

    Detecting sequence signals in targeting peptides using deep learning. / Armenteros, Jose Juan Almagro; Salvatore, Marco; Emanuelsson, Olof et al.
    In: Life Science Alliance, Vol. 2, No. 5, 201900429, 01.01.2019.

    Research output: Contribution to journal; Journal article; Research; peer-reviewed

    TY - JOUR

    T1 - Detecting sequence signals in targeting peptides using deep learning

    AU - Armenteros, Jose Juan Almagro

    AU - Salvatore, Marco

    AU - Emanuelsson, Olof

    AU - Winther, Ole

    AU - Von Heijne, Gunnar

    AU - Elofsson, Arne

    AU - Nielsen, Henrik

    PY - 2019/1/1

    Y1 - 2019/1/1

    U2 - 10.26508/lsa.201900429

    DO - 10.26508/lsa.201900429

    M3 - Journal article

    C2 - 31570514

    AN - SCOPUS:85072779066

    SN - 2575-1077

    VL - 2

    JO - Life Science Alliance

    JF - Life Science Alliance

    IS - 5

    M1 - 201900429

    ER -

    Armenteros JJA, Salvatore M, Emanuelsson O, Winther O, Von Heijne G, Elofsson A et al. Detecting sequence signals in targeting peptides using deep learning. Life Science Alliance. 2019 Jan 1;2(5):201900429. doi: 10.26508/lsa.201900429

    FAQs

    How do you find the sequence of a peptide?

    Liquid chromatography-mass spectrometry (LC-MS) is a common method used to determine peptide sequences due to its ease of use and high-throughput workflows.

    How to find signal peptides?

    Signal peptides are found in proteins that are targeted to the endoplasmic reticulum and are eventually secreted (extracellular, periplasmic, etc.), retained in the lumen of the endoplasmic reticulum, the lysosome, or another organelle along the secretory pathway, or destined to become type I single-pass membrane ...

    What is the peptide sequencing method?

    Mass spectrometry has either been used alongside Edman degradation or has largely replaced it as the primary technology for peptide sequencing. This is because it is highly sensitive, and because long proteins are first digested into manageable peptides that can be sequenced quickly.

    What are signal peptide sequences?

    Signal peptides (SPs) are short peptides located at the N-terminus of proteins that carry the information needed for protein secretion. They are found in all prokaryotes and eukaryotes.

    How to identify protein sequence?

    There are two main methods used to find the amino acid sequences of proteins. Mass spectrometry is the most common method in use today because of its ease of use. Edman degradation using a protein sequenator is the second method, which is most useful if the N-terminus of a protein needs to be characterized.

    How to determine peptide sequence from mass spectrum?

    Generally, there are two approaches: database search and de novo sequencing. Database search is the simpler of the two: the mass spectrum of the unknown peptide is compared with spectra predicted for known peptide sequences, and the peptide with the highest matching score is selected. De novo sequencing instead reads the sequence directly from the spectrum, without relying on a database.
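A database search can be sketched as scoring each candidate peptide against the observed peaks. The example below is a deliberately simple illustration, not the scoring function of any real search engine. It predicts singly charged b- and y-ion m/z values from standard monoisotopic residue masses, ignoring modifications. It counts how many predicted ions fall within a tolerance of an observed peak and returns the best-scoring candidate. The function names and the 0.02 Da default tolerance are assumptions.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # Da
WATER = 18.010565   # Da, monoisotopic H2O


def fragment_ions(peptide):
    """Predicted singly charged b- and y-ion m/z values for an unmodified peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = []
    for i in range(1, len(masses)):
        ions.append(sum(masses[:i]) + PROTON)          # b ion: N-terminal i residues
        ions.append(sum(masses[i:]) + WATER + PROTON)  # y ion: remaining C-terminal residues
    return ions


def shared_peak_score(peptide, spectrum, tol=0.02):
    """Count predicted fragment ions that lie within tol (Da) of some observed peak."""
    return sum(any(abs(ion - peak) <= tol for peak in spectrum)
               for ion in fragment_ions(peptide))


def best_match(candidates, spectrum, tol=0.02):
    """Return the candidate peptide with the highest shared-peak score."""
    return max(candidates, key=lambda p: shared_peak_score(p, spectrum, tol))
```

Production search engines use probabilistic or correlation-based scores, consider several charge states, and also filter candidates by precursor mass before fragment matching.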

    What is a database for signal peptides?

    SPdb is a signal peptide database containing signal sequences of archaea, bacteria and eukaryotes. The signal-associated data are stored in a MySQL relational database and provided as DNA and protein sequences. FASTA-formatted files containing the sequences are available for download.
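FASTA files like those offered for download are plain text. Each record has a header line starting with `>`, followed by one or more sequence lines. A minimal reader is sketched below. The example headers are hypothetical and do not reflect SPdb's actual header format. In practice, Biopython's `Bio.SeqIO.parse(handle, "fasta")` does the same job with more validation.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream.

    Blank lines are skipped; sequence lines before the first header are ignored.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # emit the previous record
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)  # emit the last record


# Hypothetical example records; sequences may be wrapped over several lines.
example = io.StringIO(">sp1 hypothetical signal peptide\nMKKLLA\nLAVAA\n>sp2 hypothetical\nMRSLL\n")
records = list(read_fasta(example))
```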

    Can peptides be detected?

    There are a number of analytical approaches for the detection of peptides, QADs, and β2-agonists with liquid chromatography–mass spectrometry (LC–MS) using electrospray ionization being the preferred technique because of its high sensitivity, selectivity, and throughput.

    How do you predict peptides?

    Computational models for peptide prediction have been built on a narrow sample of data, with an emphasis on the position and chemical properties of the amino acids. In past literature, this approach has given higher predictive accuracy than models that rely on the geometrical arrangement of atoms.

    What is a targeting peptide sequence?

    Proteins are directed to the peroxisome by two types of targeting peptides, called peroxisomal targeting signals (PTS). PTS1 consists of three amino acids at the C-terminus. PTS2 is a 9-amino-acid sequence often located near the N-terminus of the protein.
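The two peroxisomal signals can be illustrated with regular expressions. The sketch below uses simplified consensus motifs that are commonly cited in the literature. For PTS1, that is (S/A/C)-(K/R/H)-(L/M) at the very C-terminus. For PTS2, it is (R/K)-(L/V/I)-x5-(H/Q)-(L/A). Real signals tolerate more variation, so treat matches as hints rather than predictions. The 40-residue search window for PTS2 is an assumption.

```python
import re

# Simplified consensus motifs (assumed; real signals are more variable).
# PTS1: C-terminal tripeptide, canonical example "SKL".
PTS1 = re.compile(r"[SAC][KRH][LM]$")
# PTS2: nonapeptide (R/K)(L/V/I)x5(H/Q)(L/A) near the N-terminus.
PTS2 = re.compile(r"[RK][LVI].{5}[HQ][LA]")


def peroxisomal_motifs(seq, pts2_window=40):
    """Report which simplified PTS motifs a protein sequence matches.

    PTS1 is checked at the C-terminus; PTS2 only within the first pts2_window residues.
    """
    return {
        "PTS1": bool(PTS1.search(seq)),
        "PTS2": bool(PTS2.search(seq[:pts2_window])),
    }
```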

    What is the best method for protein sequencing?

    Mass spectrometry offers high sensitivity and precision for N-terminal sequencing and is applicable to a broad spectrum of proteins and peptides. Its main limitation is that the accuracy of N-terminal sequencing may be affected by the presence of specific amino acids and by post-translational modifications.

    What are the different methods of peptide identification?

    In mass spectrometry-based proteomics, peptides are typically identified from tandem mass spectra using spectrum comparison. A sequence search engine compares experimentally obtained spectra with those predicted from protein sequences, applying enzyme cleavage and fragmentation rules.
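For trypsin, the enzyme cleavage rule mentioned above is usually simplified as follows. Cut after lysine (K) or arginine (R), unless the next residue is proline (P). The sketch below applies that simplified rule to generate candidate peptides for a search. The parameter names `missed_cleavages` and `min_length` are illustrative and not taken from any particular search engine.

```python
import re


def tryptic_digest(protein, missed_cleavages=0, min_length=1):
    """In silico trypsin digest using the simplified K/R-not-before-P rule.

    Returns peptides in order of their start position; with missed_cleavages > 0,
    peptides spanning up to that many uncut sites are included as well.
    """
    # Zero-width split: after K or R, but not when the next residue is P.
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]
    peptides = []
    for i in range(len(pieces)):
        # Join piece i with up to missed_cleavages following pieces.
        for j in range(i, min(i + missed_cleavages + 1, len(pieces))):
            pep = "".join(pieces[i:j + 1])
            if len(pep) >= min_length:
                peptides.append(pep)
    return peptides
```

Splitting on a zero-width pattern requires Python 3.7 or later.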

    How to identify signal peptides?

    Signal peptides (SPs) are short amino acid sequences in the amino terminus of many newly synthesized proteins that target proteins into, or across, membranes. Bioinformatic tools can predict SPs from amino acid sequences, but most cannot distinguish between various types of signal peptides.
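A crude way to spot a possible signal peptide is to look for a strongly hydrophobic stretch (the h-region) near the N-terminus. The sketch below scans the first residues with a sliding window on the Kyte-Doolittle hydropathy scale. It is only an illustration. Dedicated predictors such as SignalP and TargetP use trained models. The window size, region length, and threshold here are arbitrary assumptions.

```python
# Kyte-Doolittle hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def max_hydrophobic_window(seq, region=30, window=8):
    """Highest mean hydropathy of any window within the first `region` residues.

    Returns (score, 0-based start); (-inf, -1) if the region is shorter than the window.
    Unknown residue letters are scored as 0.0.
    """
    head = seq[:region]
    best_score, best_start = float("-inf"), -1
    for start in range(len(head) - window + 1):
        score = sum(KYTE_DOOLITTLE.get(aa, 0.0) for aa in head[start:start + window]) / window
        if score > best_score:  # keep the first window with the highest score
            best_score, best_start = score, start
    return best_score, best_start


def looks_like_signal_peptide(seq, threshold=2.0, region=30, window=8):
    """Flag sequences whose N-terminal region contains a strongly hydrophobic stretch."""
    score, _ = max_hydrophobic_window(seq, region, window)
    return score >= threshold
```

Such a scan cannot tell a signal peptide from an N-terminal transmembrane anchor, which is one reason trained predictors are used instead.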

    What is the sequence analysis of peptides?

    MALDI (matrix-assisted laser desorption/ionization) is frequently used when dealing with large numbers of samples. Peptide sequences are then obtained by analyzing the mass spectrum of each fragment, and together the fragments cover the full-length protein sequence.
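Reading a sequence from fragment spectra relies on the fact that consecutive peaks in one ion series (for example, b ions) differ by the mass of one residue. The sketch below infers residues from those mass differences. It assumes a clean, complete ladder of singly charged ions. Real de novo sequencing must cope with missing and noisy peaks. Leucine and isoleucine have identical masses and cannot be told apart this way. Glutamine and lysine differ by only about 0.036 Da, so the tolerance must stay well below that.

```python
# Monoisotopic residue masses (Da). Leu and Ile are isobaric, so they share one entry.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L/I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276  # Da; usable as the zero point of a singly charged b-ion ladder


def read_ladder(peaks, tol=0.01):
    """Infer residues from mass differences between consecutive peaks of one ion series.

    Returns one residue per gap; gaps matching no residue within tol become "?".
    """
    ordered = sorted(peaks)
    residues = []
    for low, high in zip(ordered, ordered[1:]):
        delta = high - low
        # Nearest residue mass to the observed gap.
        name, mass = min(RESIDUE_MASS.items(), key=lambda item: abs(item[1] - delta))
        residues.append(name if abs(mass - delta) <= tol else "?")
    return residues
```

Prepending `PROTON` to a b-ion ladder acts as a zero point, so the first residue is recovered as well.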

    What is the function of the signal recognition particle?

    The signal recognition particle (SRP) is a ribonucleoprotein particle essential for the targeting of signal peptide-bearing proteins to the prokaryotic plasma membrane or the eukaryotic endoplasmic reticulum membrane for secretion or membrane insertion.

    How do you determine the sequence of amino acids?

    The sequence of amino acids can be chemically determined through mass spectrometry or Edman degradation.

    What is the sequence of a polypeptide?

    A polypeptide chain is a linear sequence of amino acids, each connected to the next by a peptide bond. Proteins are built from twenty standard amino acids. Each amino acid has a central carbon atom attached to an amino group, a carboxyl group, a hydrogen atom, and a variable side chain, the R group.
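Each peptide bond forms by condensation, which releases one water molecule. So the mass of a linear peptide of n amino acids follows directly from the masses of its building blocks. Let M_i be the mass of the free amino acid at position i, and let m_i = M_i - M_H2O be the corresponding residue mass:

```latex
M_{\text{peptide}} = \sum_{i=1}^{n} M_i - (n-1)\,M_{\mathrm{H_2O}} = \sum_{i=1}^{n} m_i + M_{\mathrm{H_2O}}
```

For example, glycylglycine can be computed from the monoisotopic masses of free glycine (75.03203 Da) and water (18.01056 Da): 2 × 75.03203 - 18.01056 = 132.05350 Da.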

    What is the sequence of a peptide based on the mRNA sequence?

    The mRNA sequence 5'... UUUUCUUAUUGUCUU 3' can be read codon by codon (UUU, UCU, UAU, UGU, CUU). This gives the peptide sequence Phe-Ser-Tyr-Cys-Leu.
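The codon-by-codon reading can be automated with the standard genetic code. The sketch below builds the codon table from the compact 64-letter string used for NCBI translation table 1. It translates from the first base until a stop codon. It assumes the reading frame starts at the first nucleotide, as in the example above, rather than searching for an AUG start codon.

```python
BASES = "UCAG"
# Standard genetic code (NCBI table 1), codons ordered UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def translate(mrna):
    """Translate 5'->3' mRNA from its first base, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):  # ignore an incomplete trailing codon
        aa = CODON_TABLE[mrna[pos:pos + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def to_three_letter(protein):
    """Render a one-letter protein sequence in hyphenated three-letter codes."""
    return "-".join(THREE_LETTER[aa] for aa in protein)
```

For the sequence in the answer above, `to_three_letter(translate("UUUUCUUAUUGUCUU"))` gives `Phe-Ser-Tyr-Cys-Leu`.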
