1. General Information



This page collects the test protein sets. The test sets serve two purposes:

  • to demonstrate how HH-MOTiF works
  • to measure and compare the performance of HH-MOTiF

In standard mode, every set produces reproducible output that matches the presented samples.

2. ELM database



Major performance tests of HH-MOTiF were conducted on the ELM database as of 26.03.2016. Only motifs with instances in at least 3 proteins were considered. Some metrics of the resulting dataset are provided below:

  • 176 motifs
  • 1,677 unique proteins
  • 2,022 proteins gross
  • 1,452,618 total residues gross
  • 17,909 motif residues gross
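
The selection rule and the unique versus gross protein counts can be illustrated with a minimal Python sketch. It assumes that "gross" means a protein is counted once for every motif set it belongs to, which is an interpretation and is not stated on this page. The function name `filter_and_count` and the toy identifiers are hypothetical and are not taken from ELM or HH-MOTiF.

```python
from collections import defaultdict


def filter_and_count(instances, min_proteins=3):
    """Keep motifs that have instances in at least `min_proteins` proteins.

    `instances` is an iterable of (motif_id, protein_id) pairs.
    Returns (kept_motif_ids, n_unique_proteins, n_gross_proteins).
    Assumption: "gross" counts a protein once per motif set it appears in.
    """
    proteins_by_motif = defaultdict(set)
    for motif_id, protein_id in instances:
        proteins_by_motif[motif_id].add(protein_id)

    # Apply the "at least 3 proteins" rule per motif
    kept = {m: p for m, p in proteins_by_motif.items() if len(p) >= min_proteins}

    unique_proteins = set().union(*kept.values())
    gross = sum(len(p) for p in kept.values())
    return sorted(kept), len(unique_proteins), gross


# Toy data with hypothetical identifiers
toy = [
    ("MOTIF_A", "P1"), ("MOTIF_A", "P2"), ("MOTIF_A", "P3"),
    ("MOTIF_B", "P1"), ("MOTIF_B", "P4"), ("MOTIF_B", "P5"),
    ("MOTIF_C", "P6"), ("MOTIF_C", "P7"),  # only 2 proteins, dropped
]
motifs, n_unique, n_gross = filter_and_count(toy)
print(motifs, n_unique, n_gross)
```

In the toy data, protein P1 carries two motifs, so it contributes once to the unique count and twice to the gross count.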

3. Selected protein sets



TRG_LysEnd_APsAcLL_3

This set contains an ELM motif responsible for the sorting and internalisation signals directing type I transmembrane proteins from the cell surface or TGN to the lysosomal-endosomal compartment (more information here). It consists of 3 unrelated proteins of approximately the same length from different organisms and represents an 'easy case'. HH-MOTiF recovers this motif almost perfectly, with a residue-wise F1 of 0.927. This set is used as the default sample set.

LIG_SH3_3

This set contains an ELM motif involved in protein-protein interactions mediated by SH3 domains (more information here). It consists of 12 proteins of diverse length. This motif is harder to find because it is a low-complexity region (the only conserved residues are prolines) surrounded by other low-complexity regions that do not contain the motif. Nevertheless, HH-MOTiF partially recovers the motif, with a moderate residue-wise F1 of 0.279. Although no straightforward filter is implemented, HH-MOTiF ignores the majority of low-complexity regions that do not belong to motifs. This specificity arises because HH-MOTiF also takes into account the surrounding amino acid context and the number of proteins with occurrences.

LIG_AP_GAE_1

This set contains an ELM motif responsible for the interaction with gamma-ear domains (more information here). It consists of 7 proteins, some of which are close homologs. The major difficulty in identifying this motif, however, is its similarity to low-complexity regions (D/E-rich). Nevertheless, the advanced filtering algorithm of HH-MOTiF makes it possible to locate this motif with fair accuracy (residue-wise F1 of 0.582).

LIG_EF_ALG2_ABM_2

This set contains an ELM motif responsible for the interaction with the apoptosis-linked gene 2 (ALG-2) protein (more information here). It consists of 3 proteins, 2 of which have a strongly pronounced low-complexity character. Nevertheless, the smart homology filtering of HH-MOTiF allows almost perfect recognition of this motif (residue-wise F1 of 0.865) without outputting false positives.

LIG_Actin_WH2_1

This set contains an ELM motif responsible for the interaction with actin (more information here). The motif contains a short α-helix at its N-terminus followed by a disordered region and, in several cases, by the additional conserved pattern T.[DE]...P. This motif is quite long and stands out from the surrounding sequence. However, this ELM set contains only 6 proteins forming only 2 non-homologous groups, which makes recognition more challenging. HH-MOTiF locates this motif with almost ideal accuracy (residue-wise F1 of 0.965).
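
The residue-wise F1 values quoted for the sets above can be read as the harmonic mean of precision and recall computed over individual residue positions. The sketch below shows that standard definition in Python. The exact evaluation protocol used for HH-MOTiF is not described on this page, so the function `residue_f1` and the representation of residues as (protein_id, position) pairs are illustrative assumptions.

```python
def residue_f1(annotated, predicted):
    """Residue-wise F1 between annotated and predicted motif residues.

    Both arguments are collections of hashable residue identifiers,
    here assumed to be (protein_id, position) pairs.
    """
    annotated_set = set(annotated)
    predicted_set = set(predicted)

    true_positives = len(annotated_set & predicted_set)
    if true_positives == 0:
        # No overlap (or an empty prediction): F1 is defined as 0 here
        return 0.0

    precision = true_positives / len(predicted_set)
    recall = true_positives / len(annotated_set)
    return 2 * precision * recall / (precision + recall)


# Toy example with a hypothetical protein: the prediction is shifted by one residue
annotated = {("P1", i) for i in range(10, 16)}  # positions 10..15
predicted = {("P1", i) for i in range(11, 17)}  # positions 11..16
score = residue_f1(annotated, predicted)
print(round(score, 3))
```

In this toy case 5 of 6 predicted residues are correct and 5 of 6 annotated residues are found, so precision and recall are both 5/6 and the F1 equals 5/6 as well.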