Proteins are complex molecules found in all living organisms, as well as in viruses, which are not considered living organisms but are essentially fragments of nucleic acid surrounded by a protein coat. Do we know everything about proteins, given that every cell contains them? Not necessarily. Their structure ranges from very simple to highly complicated, so their classification and analysis still pose many problems. The protein recognition methods used so far have relied mainly on sequence similarity, i.e. on comparing the order of amino acids (the building blocks of proteins) in a new protein with that of proteins already known. A team from the Department of Molecular Biology, Faculty of Biology, University of Warsaw has taken a different approach to this task, turning the complex structure of a protein into a point cloud and teaching their tool to draw conclusions from it. Thanks to this idea, proteins on the surface of a virus, or potential drug binding sites within them, can be identified faster and more accurately by relying on spatial structure and machine learning rather than on the tedious process of comparing the structures of new proteins with those that have already been classified.
This innovative bioinformatics tool, created by Albert Roethel, Dr Piotr Biliński and Dr Takao Ishikawa, is called BioS2Net (Biological Sequence and Structure Network). It is an advanced algorithm based on deep learning, which learns on its own, without human control, purely by processing data.
The use of deep learning by BioS2Net
Deep learning is a branch of artificial intelligence. Its name comes from the fact that artificial neural networks are built from many layers (an input layer, an output layer and hidden layers between them), so the process can genuinely be called deep. This complexity was put to use in the development of BioS2Net. The tool converts the input protein data into a protein point cloud (labelled "protein point cloud" in the diagram below). Each point is represented by 3D spatial coordinates and may carry additional information, for example about the structural or physicochemical characteristics of the protein. The sequence data is analyzed by a neural network element called a "sequence convolutional extractor with five inception modules" (5 x inception). The resulting protein features are combined with the point cloud and then interpreted by another element of the network, a "3D structure extractor" called "PointNet++", which is in turn built from simpler networks (PointNet). To better understand the concept of neural networks, it is useful to compare them to the human nervous system on which they are modelled. Artificial neural networks are made up of layers of artificial neurons that mimic the way biological neurons transmit information to each other. By transmitting this data, neural networks learn on their own over time and improve their accuracy, which is the basis of deep learning. Returning to the process described above: in the next step, the extracted data is processed further with deep learning methods. The result is a global feature vector that can finally be used to recognize proteins and classify the folding of their protein chains.
Fig. 1 BioS2Net operation diagram. 
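To make the idea of a protein point cloud and a global feature vector more concrete, here is a minimal sketch in plain Python. It is not the BioS2Net implementation. The residue coordinates, the choice of hydropathy as the extra per-point feature, the function names and the fixed weights are all assumptions made for illustration; in a real network the weights would be learned from data. The sketch shows only the core PointNet idea: the same transformation is applied to every point, and a symmetric operation (here a maximum) summarizes the whole cloud, so the result does not depend on the order of the points.

```python
from typing import Dict, List, Sequence, Tuple

Residue = Tuple[str, Tuple[float, float, float]]  # (amino acid letter, xyz position)
Point = Tuple[float, ...]                         # (x, y, z, extra feature)

# Kyte-Doolittle hydropathy values, used here only as an example of a
# physicochemical feature attached to each point (an assumption of this sketch).
HYDROPATHY: Dict[str, float] = {"A": 1.8, "G": -0.4, "L": 3.8}


def build_point_cloud(residues: Sequence[Residue]) -> List[Point]:
    """Turn residues into points: centred 3D coordinates plus one feature."""
    n = len(residues)
    cx = sum(xyz[0] for _, xyz in residues) / n
    cy = sum(xyz[1] for _, xyz in residues) / n
    cz = sum(xyz[2] for _, xyz in residues) / n
    return [
        (x - cx, y - cy, z - cz, HYDROPATHY[aa])
        for aa, (x, y, z) in residues
    ]


def per_point_features(point: Point, weights: Sequence[Sequence[float]]) -> List[float]:
    """Shared linear map followed by ReLU, applied identically to every point."""
    return [max(0.0, sum(w * v for w, v in zip(row, point))) for row in weights]


def global_feature_vector(cloud: Sequence[Point],
                          weights: Sequence[Sequence[float]]) -> List[float]:
    """Max-pool each feature over all points (order-independent summary)."""
    features = [per_point_features(p, weights) for p in cloud]
    return [max(column) for column in zip(*features)]


# Hypothetical three-residue fragment with made-up coordinates (in angstroms).
residues: List[Residue] = [
    ("A", (0.0, 0.0, 0.0)),
    ("G", (3.8, 0.0, 0.0)),
    ("L", (3.8, 3.8, 0.0)),
]

# Fixed illustrative weights: two output features, four inputs per point.
WEIGHTS = [
    [0.5, -0.2, 0.1, 0.3],
    [-0.4, 0.6, 0.2, -0.1],
]

cloud = build_point_cloud(residues)
global_vec = global_feature_vector(cloud, WEIGHTS)
print(global_vec)
```

Shuffling the points in `cloud` leaves `global_vec` unchanged, which is the property that makes this kind of pooling suitable for unordered point clouds. A classifier would then take such a vector as its input.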
The importance of the tool for protein classification
Through its holistic, AI-based study of the molecules that make up proteins, BioS2Net can solve protein classification problems with very high accuracy. The tool recognizes proteins based on both their amino acid sequence and their spatial structure. This includes detecting proteins that have a similar three-dimensional structure but a different way of folding the protein chain, which has not been possible so far. BioS2Net can therefore identify proteins that, despite this difference, have similar functions and can be used for the same purposes. When the tool was evaluated, BioS2Net achieved a precision of 95.4% on one of the data sets, which indicates its practical usefulness in recognizing protein structures. And because new proteins that require analysis and the most precise classification possible are constantly being discovered, BioS2Net has the potential to become a tool used on a very large scale.
Fig. 2 Examples of proteins with a similar overall structure, differing in the folding of the protein chain. Each of the five examples (A-E) shows two different proteins, on the left and on the right, with their structural overlap shown in the centre in green and red. The proteins shown may have a common function, for example providing an interaction surface for other molecules, because despite their topological differences their overall 3D structures are sufficiently similar to enable them to play similar biological roles. BioS2Net detects these similarities in 3D structure.
-  BioS2Net: Holistic Structural and Sequential Analysis of Biomolecules Using a Deep Neural Network, Albert Roethel, Piotr Biliński, Takao Ishikawa, https://www.mdpi.com/1422-0067/23/6/2966/htm (accessed on October 21, 2022)
- Getting Started with Point Clouds Using Deep Learning – MATLAB & Simulink, MathWorks, https://www.mathworks.com/help/vision/ug/getting-started-with-deep-learning-using-point-clouds.html (accessed on October 21, 2022)
- Czym są sieci neuronowe i jaki mają związek z uczeniem głębokim? (What are neural networks and how do they relate to deep learning?), DeepTechnology.ai, https://www.deeptechnology.ai/sieci-neuronowe-i-ich-zwiazek-z-uczeniem-glebokim/ (accessed on October 21, 2022)
- Głębokie sieci neuronowe i ich zastosowania w eksploracji danych (Deep neural networks and their applications in data mining), Stanisław Osowski, https://sep.com.pl/photo/files/04%20-%20Stanisław%20Osowski%20-%20Głębokie%20sieci%20neuronowe%20i%20ich%20zastosowania%20w%20eksploracji%20danych.pdf (accessed on October 21, 2022)