Upload PDB File

Instructions

  • Descriptions
    • SplitSeek-Pro is a deep learning–based method for evaluating the feasibility of protein engineering strategies that involve residue-level splitting, such as circular permutation or split-reconstitution. The method integrates sequence information (ESM2-650M embeddings and AAindex encodings) with structural information (pairwise atomic distances) to estimate a splitting probability for each amino acid residue.
    • Currently, users can submit files in PDB format to obtain per-residue splitting probability predictions.
    • On the web server, prediction results are displayed on a color scale (white → red representing scores from 0 to 1). SplitSeek-Pro also outputs a predicted PDB file in which the original B-factor column is replaced with the corresponding splitting scores (0–100). Users can download this file and visualize it locally (e.g., in PyMOL) by coloring residues by B-factor.
    • A score of 0.5 is recommended as the threshold to distinguish between feasible and infeasible splitting sites.
    • We will continue to improve the prediction accuracy as more data become available.
  • Tips
    • The predicted score at residue n corresponds to the feasibility of splitting the protein between residues n and n+1.
    • Stretches of ≥3 consecutive residues scoring above 0.8 generally indicate high splitting feasibility. An isolated high-scoring residue whose neighbors score low is less likely to be practically feasible.
    • The model was trained in two stages: (1) pretraining on a computational splitting-probability dataset and (2) fine-tuning on circular permutation data from structurally similar proteins. Because the fine-tuning dataset is derived from circular permutation examples, predictions are biased toward identifying sites for circular permutation.
    • Predictions tend to be more reliable for proteins with fewer than 400 residues.
  • Shortcomings
    • While the model can partially recognize nonsplittable residues near active sites, the absence of explicit active-site annotations might lead to false positives caused by perturbation of functionally relevant residues or by ghost effects.
    • The model is primarily optimized for distinguishing splittable sites in loop regions. It currently has limited accuracy for identifying splittable sites in rigid secondary-structure elements, which is the main source of false negatives.
    • Circular permutation at residues located near the termini is usually well tolerated. However, such sites are underrepresented in the pretraining data and in experimental examples, so the model tends to underestimate their scores. Low scores at terminal residues should therefore be interpreted with caution, while high scores remain reliable.
  • Download
    • A predicted PDB file in which the original B-factor column is replaced with the corresponding splitting scores (0–100). Users can download this file and visualize it locally (e.g., in PyMOL) by coloring residues by B-factor.
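
The downloaded file can also be inspected programmatically. The following is a minimal sketch, not part of SplitSeek-Pro itself: it reads the per-residue scores from the B-factor column of standard fixed-width ATOM records and lists stretches of at least 3 consecutive residues scoring above 0.8, as suggested in the Tips. The function names read_scores and high_score_runs are hypothetical, and the code assumes that the 0–100 values in the file are the 0–1 scores multiplied by 100 and that every atom of a residue carries the same value.

```python
from typing import Dict, List, Tuple

# (chain ID, residue number, insertion code)
ResidueKey = Tuple[str, int, str]


def read_scores(pdb_text: str, scale: float = 100.0) -> Dict[ResidueKey, float]:
    """Read one score per residue from the B-factor column (columns 61-66).

    Assumption: file values are on a 0-100 scale, so dividing by `scale`
    gives scores between 0 and 1.
    """
    scores: Dict[ResidueKey, float] = {}
    for line in pdb_text.splitlines():
        # Only ATOM records that are long enough to hold a B-factor field.
        if not line.startswith("ATOM") or len(line) < 66:
            continue
        key = (line[21], int(line[22:26]), line[26])
        # Assumption: all atoms of a residue share one score,
        # so the first atom seen for that residue is kept.
        scores.setdefault(key, float(line[60:66]) / scale)
    return scores


def high_score_runs(
    scores: Dict[ResidueKey, float], cutoff: float = 0.8, min_len: int = 3
) -> List[Tuple[str, int, int]]:
    """Return (chain, first residue, last residue) for each qualifying run.

    Residues are visited in file order. A run is broken by a score at or
    below `cutoff`, by a change of chain, or by a gap in residue numbering.
    """
    runs: List[Tuple[str, int, int]] = []
    current: List[ResidueKey] = []

    def flush() -> None:
        if len(current) >= min_len:
            runs.append((current[0][0], current[0][1], current[-1][1]))
        current.clear()

    for key, score in scores.items():
        if current:
            prev = current[-1]
            if key[0] != prev[0] or key[1] - prev[1] > 1:
                flush()
        if score > cutoff:
            current.append(key)
        else:
            flush()
    flush()
    return runs
```

A typical call would be high_score_runs(read_scores(open("predicted.pdb").read())), where predicted.pdb is a placeholder file name. Because the score at residue n refers to the split between residues n and n+1, each reported stretch marks a range of candidate split points rather than a single one. For visual inspection in PyMOL, a command along the lines of `spectrum b, white_red, minimum=0, maximum=100` should reproduce the white-to-red coloring used on the web server, assuming that palette name is available in the installed PyMOL version.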