MAM is a prediction tool built on deep learning algorithms. It provides two main functions. First, MHC-CNN offers a high-accuracy method for predicting the binding affinity between a specific MHC molecule and a specific peptide. Second, we propose a MAM network that generates high-affinity peptides for a specific MHC molecule.
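The description above does not specify MHC-CNN's architecture. Purely as an illustration of how a CNN can turn a peptide into an affinity-like score, the sketch below assumes a one-hot peptide encoding, a single 1D convolution layer, global max pooling and a sigmoid output. The function names (`one_hot`, `toy_cnn_score`) are hypothetical, and the random, untrained weights produce no meaningful predictions.

```python
import numpy as np

# The 20 standard amino acids; this ordering is an arbitrary choice for the sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(peptide: str) -> np.ndarray:
    """Encode a peptide as a (length, 20) one-hot matrix."""
    x = np.zeros((len(peptide), len(AMINO_ACIDS)))
    for pos, aa in enumerate(peptide):
        x[pos, AA_INDEX[aa]] = 1.0
    return x


def toy_cnn_score(x: np.ndarray, filters: np.ndarray, w_out: np.ndarray, b_out: float) -> float:
    """Hypothetical forward pass: Conv1D (valid) -> ReLU -> global max pool -> dense -> sigmoid.

    `filters` has shape (n_filters, kernel_size, 20); `w_out` has shape (n_filters,).
    """
    n_filters, k, _ = filters.shape
    n_windows = x.shape[0] - k + 1
    conv = np.empty((n_windows, n_filters))
    for start in range(n_windows):
        window = x[start:start + k]  # (kernel_size, 20)
        # Sum over kernel positions and amino-acid channels for every filter.
        conv[start] = np.tensordot(filters, window, axes=([1, 2], [0, 1]))
    pooled = np.maximum(conv, 0.0).max(axis=0)  # (n_filters,)
    logit = float(pooled @ w_out + b_out)
    return 1.0 / (1.0 + np.exp(-logit))


rng = np.random.default_rng(0)
filters = rng.normal(size=(8, 3, 20))  # untrained, random weights
w_out = rng.normal(size=8)
x = one_hot("SLYNTVATL")  # an example 9-mer
score = toy_cnn_score(x, filters, w_out, b_out=0.0)
print(x.shape, score)
```

A real predictor would learn `filters`, `w_out` and `b_out` from measured affinities. Global max pooling is used here so that the same filters can score 9-mers and 10-mers alike.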

In this work, we propose the Motif Activation Mapping (MAM) network, which extracts binding motifs from peptides for MHC-I peptide binding. The MAM network calculates a contribution score for each position of a peptide and then generates new peptides by mutating the amino acid at the position with the lowest contribution score. In addition, we generate high-affinity peptides by randomly substituting amino acids according to the extracted motifs.
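A minimal sketch of the two generation strategies, assuming the MAM network has already produced one contribution score per position and that a motif is available as a mapping from (0-based) position to weighted preferred residues. The motif format, the fallback to a uniform choice, and all function names are hypothetical; they are not taken from the MAM implementation.

```python
from __future__ import annotations

import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def substitute(peptide: str, pos: int, motif: dict[int, dict[str, float]], rng: random.Random) -> str:
    """Replace the residue at `pos` with one sampled from the motif weights there.

    Positions missing from the motif fall back to a uniform choice over the
    20 standard amino acids. The current residue is excluded, so the peptide
    changes whenever another candidate exists.
    """
    prefs = motif.get(pos) or {aa: 1.0 for aa in AMINO_ACIDS}
    candidates = {aa: w for aa, w in prefs.items() if aa != peptide[pos]}
    if not candidates:
        return peptide
    new_aa = rng.choices(list(candidates), weights=list(candidates.values()), k=1)[0]
    return peptide[:pos] + new_aa + peptide[pos + 1:]


def mutate_lowest_contribution(peptide: str, scores: list[float],
                               motif: dict[int, dict[str, float]], rng: random.Random) -> str:
    """Strategy 1: mutate the position with the lowest contribution score."""
    if len(scores) != len(peptide):
        raise ValueError("expected one contribution score per position")
    pos = min(range(len(peptide)), key=lambda i: scores[i])
    return substitute(peptide, pos, motif, rng)


def random_motif_substitution(peptide: str, motif: dict[int, dict[str, float]],
                              rng: random.Random, n_sites: int = 1) -> str:
    """Strategy 2: substitute residues at randomly chosen motif positions."""
    positions = rng.sample(sorted(motif), k=min(n_sites, len(motif)))
    for pos in positions:
        peptide = substitute(peptide, pos, motif, rng)
    return peptide


rng = random.Random(0)
# Hypothetical per-position contribution scores for a 9-mer (lowest at position 3).
scores = [0.9, 0.8, 0.7, 0.1, 0.6, 0.5, 0.4, 0.3, 0.95]
# Hypothetical motif: preferred residues at positions 1 and 8.
motif = {1: {"L": 2.0, "M": 1.0}, 8: {"V": 2.0, "L": 1.0}}
print(mutate_lowest_contribution("SLYNTVATL", scores, motif, rng))
print(random_motif_substitution("SLYNTVATL", motif, rng, n_sites=2))
```

Excluding the current residue guarantees that each call proposes a different peptide whenever the motif allows it.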

Statistics of different MHC datasets

A histogram of the number of peptides in each MHC dataset in our project. Human (HLA) datasets are generally larger than those from other animals, and HLA_A*02:01 is the largest of all. In our work, we use the datasets for HLA_A*02:01, HLA_A*02:06, HLA_B*27:05 and Mamu_A1*001:01.
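The per-allele counts behind such a histogram, and the allele selection, could be computed as below. The `Record` layout and the function names are assumptions for this sketch, and the records are placeholders rather than entries from our datasets.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class Record(NamedTuple):
    """Hypothetical layout: one peptide measured against one MHC allele."""
    allele: str
    peptide: str


# Alleles selected in this work (names as written in the text).
SELECTED_ALLELES = {"HLA_A*02:01", "HLA_A*02:06", "HLA_B*27:05", "Mamu_A1*001:01"}


def peptides_per_allele(records: Iterable[Record]) -> Counter:
    """Count peptides per allele, i.e. the bar heights of a per-allele histogram."""
    return Counter(r.allele for r in records)


def select_alleles(records: Iterable[Record], alleles=SELECTED_ALLELES) -> List[Record]:
    """Keep only the records whose allele is in the selected set."""
    return [r for r in records if r.allele in alleles]


# Placeholder records, not real data.
records = [
    Record("HLA_A*02:01", "AAAAAAAAA"),
    Record("HLA_A*02:01", "CCCCCCCCC"),
    Record("HLA_B*07:02", "DDDDDDDDD"),
    Record("Mamu_A1*001:01", "EEEEEEEEE"),
]
print(peptides_per_allele(records).most_common())
print(len(select_alleles(records)))
```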

Statistics of datasets with different lengths

A histogram of the number of peptides of each length, from 7 to 20 residues. Length 9 has the most peptides (119,997), followed by length 10 (31,614). In our work, we mainly use peptides of these two lengths for training, evaluation and testing.
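Similarly, the length histogram and the restriction to 9-mers and 10-mers can be sketched as follows, assuming peptides are plain one-letter amino-acid strings. The function names are hypothetical and the peptides are placeholders.

```python
from collections import Counter
from typing import Iterable, List


def length_histogram(peptides: Iterable[str]) -> Counter:
    """Count peptides of each length (e.g. 7 to 20 residues)."""
    return Counter(len(p) for p in peptides)


def select_lengths(peptides: Iterable[str], lengths=(9, 10)) -> List[str]:
    """Keep peptides whose length is in `lengths` (9-mers and 10-mers by default)."""
    wanted = set(lengths)
    return [p for p in peptides if len(p) in wanted]


# Placeholder peptides of lengths 7, 9, 9, 10 and 11.
peptides = ["A" * 7, "A" * 9, "C" * 9, "D" * 10, "E" * 11]
print(sorted(length_histogram(peptides).items()))
print(select_lengths(peptides))
```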

Proportion of binders vs. non-binders

The pie chart shows the proportion of binders vs. non-binders in the dataset used in our work. A peptide is labeled a binder if its IC50 for the MHC molecule is below 500 nM (IC50, the half-maximal inhibitory concentration, is an experimental measurement used to quantify binding affinity; a lower IC50 means stronger binding). According to the figure, binders make up about 40% of the whole dataset.
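The binder label is a simple threshold on IC50. This sketch assumes IC50 values are given in nM and treats exactly 500 nM as a non-binder, following the strict "below 500 nM" rule; the function names are hypothetical and the IC50 values in the example are made up.

```python
from typing import Iterable

# Binder threshold stated in the text: IC50 below 500 nM.
BINDER_THRESHOLD_NM = 500.0


def is_binder(ic50_nm: float, threshold_nm: float = BINDER_THRESHOLD_NM) -> bool:
    """A peptide-MHC pair is a binder if its IC50 is strictly below the threshold."""
    return ic50_nm < threshold_nm


def binder_fraction(ic50_values_nm: Iterable[float]) -> float:
    """Fraction of measurements labeled as binders."""
    values = list(ic50_values_nm)
    if not values:
        raise ValueError("need at least one IC50 value")
    return sum(is_binder(v) for v in values) / len(values)


# Made-up IC50 values in nM, for illustration only.
example = [12.0, 499.9, 500.0, 4800.0, 20000.0]
print(binder_fraction(example))
```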