Phage display is a versatile and powerful technology for finding ligands for any given target. These targets span a wide variety of substances, such as small molecules, proteins, glycans, cells, organs, and even whole organisms. Therefore, ligands that bind to the polystyrene surface (PS) can appear in the biopanning results unintentionally.
The cysteine residues at both ends of the circular peptides were removed. All peptides harboring ambiguous residues or nonalphabetic characters were excluded. We compared each sequence in the negative dataset against the sequences in the positive dataset, deleted any identical sequences from the negative dataset, and replenished it with new peptides.
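The cleaning steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a peptide is treated as circular when it both starts and ends with cysteine, "ambiguous" is taken to mean anything outside the 20 standard one-letter codes, and the function names and example sequences are hypothetical rather than taken from the original pipeline.

```python
# Minimal sketch of the dataset cleaning steps (names are hypothetical).
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def strip_flanking_cysteines(seq):
    """Remove the terminal cysteines of a peptide flanked by C on both ends.

    Assumption: a peptide that starts and ends with C is a circular peptide.
    """
    if len(seq) > 2 and seq[0] == "C" and seq[-1] == "C":
        return seq[1:-1]
    return seq


def is_clean(seq):
    """True if every character is one of the 20 standard residues."""
    return bool(seq) and all(ch in STANDARD_AA for ch in seq)


def clean_dataset(raw_sequences):
    """Strip flanking cysteines, then drop peptides with non-standard characters."""
    stripped = (strip_flanking_cysteines(s.strip()) for s in raw_sequences)
    return [s for s in stripped if is_clean(s)]


def remove_overlap(negatives, positives):
    """Delete negative peptides that are identical to a positive peptide."""
    positive_set = set(positives)
    return [s for s in negatives if s not in positive_set]


# Made-up sequences, used only to exercise the functions.
positives = clean_dataset(["CWHWRLPSC", "FKFWLYEHVIRG", "AXB", "HW-RL"])
negatives = remove_overlap(clean_dataset(["WHWRLPS", "SVSVGMKPSPRP"]), positives)
```

The replenishing step is not shown, because it depends on how additional negative peptides were sampled.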
To exclude PSBPs that might have crept into the negative data, we used the generalized Jaccard similarity to keep the sequence similarity between positive and negative peptides below 90%. However, many PSBPs do not have the typical motifs, and no existing tool can rationally predict PSBPs when peptides bear no such motifs.
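To make the similarity filter concrete, the sketch below computes the generalized Jaccard similarity, the sum of element-wise minima divided by the sum of element-wise maxima, between two count vectors. How the peptides were vectorized for this step is not stated here, so the use of k-mer counts (with k = 2 by default) is an assumption, as are the function names and the toy sequences.

```python
from collections import Counter


def kmer_counts(seq, k=2):
    """Count overlapping k-mers; this vectorization is an assumption."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def generalized_jaccard(a, b):
    """Generalized Jaccard similarity: sum(min) / sum(max) over all keys."""
    keys = set(a) | set(b)
    numerator = sum(min(a[key], b[key]) for key in keys)
    denominator = sum(max(a[key], b[key]) for key in keys)
    return numerator / denominator if denominator else 0.0


def filter_negatives(negatives, positives, threshold=0.9, k=2):
    """Keep negatives whose similarity to every positive is below threshold."""
    positive_vectors = [kmer_counts(p, k) for p in positives]
    kept = []
    for seq in negatives:
        vec = kmer_counts(seq, k)
        if all(generalized_jaccard(vec, p) < threshold for p in positive_vectors):
            kept.append(seq)
    return kept


# Toy sequences for illustration only.
kept = filter_negatives(["WWGF", "AAAA"], ["WWGF"])
```

Here `threshold=0.9` mirrors the 90% cutoff; a negative peptide identical to a positive one has similarity 1.0 and is dropped.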
PSBinder was modeled on dipeptide features, which handle such cases successfully. Our model was built with 146 features, of which the top three are WG, WF, and WE. Analysis of the amino acid composition showed that the most frequently occurring amino acids were W, Y, and F.
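For readers unfamiliar with dipeptide features, the following sketch computes the full 400-dimensional dipeptide composition of a peptide, that is, the frequency of each ordered residue pair among its overlapping dipeptides. It is a generic illustration, not the PSBinder code: the normalization by the number of dipeptides is a common convention assumed here, and the selection of the 146 optimized features is not reproduced because the selected set is not listed in this section.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs, in a fixed order so vectors are comparable.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(seq):
    """Return the 400 dipeptide frequencies of a peptide.

    Each frequency is count / (len(seq) - 1); this normalization is assumed.
    """
    total = len(seq) - 1
    if total < 1:
        raise ValueError("peptide must contain at least two residues")
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(total):
        pair = seq[i:i + 2]
        if pair not in counts:
            raise ValueError(f"non-standard residue in {pair!r}")
        counts[pair] += 1
    return [counts[d] / total for d in DIPEPTIDES]


# Hypothetical short peptide containing the dipeptides WG, GW, and WF.
features = dipeptide_composition("WGWF")
wg_frequency = features[DIPEPTIDES.index("WG")]
```

A feature-selection step would then keep only a subset of these 400 columns before training the classifier.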
Moreover, all the hydrophobic amino acids appear in our features. Thus, a peptide that contains amino acids bearing a benzene ring together with many hydrophobic amino acids may be a PSBP. In addition, after our predictor was completed, a very recently published paper reported a PSBP with the sequence VHWDFRQWWQPS.
Since this peptide does not occur in the training datasets, we used it as an independent test case. PSBinder predicted this peptide to be a PSBP (with a probability of about 0.88), in agreement with the experimental result. In this paper, we developed an SVM-based predictor to detect whether a peptide is a PSBP.
The model constructed from the optimized dipeptide features performed well, achieving a maximum accuracy of 87.02% with an MCC of 0.74, a sensitivity of 88.46%, and a specificity of 85.58%. In addition, to facilitate its use, the SVM-based model was implemented as an online web service called PSBinder.
PSBinder should be a useful tool for predicting PSBPs, whether they occur as TUPs or as intended peptides. It will help speed up the experimental process and facilitate the development of biological products.