BioCode: A data-driven procedure to learn the growth of biological networks

Author: Sefer, Emre
Date accessioned: 2023-06-13
Date available: 2023-06-13
Date issued: 2022-11
ISSN: 1545-5963
Handle: http://hdl.handle.net/10679/8381
DOI: https://doi.org/10.1109/TCBB.2022.3165092
Language: eng
Access: restrictedAccess
Type: article
Volume: 19
Issue: 6
Pages: 3103-3113
WoS ID: 000966719600005
Scopus ID: 2-s2.0-85127819168
Keywords: Algorithms; Biological networks; Graph mining; Network growth models

Abstract: Probabilistic biological network growth models have been used for many tasks, including capturing the mechanisms and dynamics of biological growth, representing null models, and detecting anomalies. Well-known examples are the Kronecker model, the preferential attachment model, and the duplication-based model. However, new models must continually be developed to better fit and explain observed network features as new networks are observed, and it is difficult to develop a growth model by hand each time a new network is studied. In this paper, we propose BioCode, a framework that automatically discovers novel biological growth models matching user-specified graph attributes in directed and undirected biological graphs. BioCode defines a basic set of instructions that is general enough to express a number of well-known biological graph growth models. We combine this instruction-wise representation with a genetic-algorithm-based optimization procedure to encode models for various biological networks. We mainly evaluate BioCode on discovering models for biological collaboration networks, gene regulatory networks, and protein interaction networks whose features, such as assortativity, clustering coefficient, and degree distribution, closely match those of the corresponding real biological networks. As shown by tests on simulated graphs, the variance of the distributions of biological networks generated by BioCode is similar to the variance of the known models for these biological network types.
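
The idea described in the abstract, representing a growth model as a short program of instructions and searching over such programs with a genetic algorithm, can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and not BioCode's actual design: the three-instruction set (`RANDOM`, `PREFERENTIAL`, `DUPLICATE`), the two target features (mean degree and average clustering coefficient), the loss function, and all function names and parameter values are hypothetical.

```python
import random

# Hypothetical instruction set (an assumption, not BioCode's real instructions).
INSTRUCTIONS = ("RANDOM", "PREFERENTIAL", "DUPLICATE")

def grow(program, n_nodes, rng):
    """Grow an undirected graph by running `program` once per new node."""
    adj = {0: {1}, 1: {0}}  # seed graph: a single edge
    for new in range(2, n_nodes):
        existing = list(adj)
        adj[new] = set()
        for op in program:
            if op == "RANDOM":
                targets = [rng.choice(existing)]
            elif op == "PREFERENTIAL":
                # weight each node by degree plus one, so weights are never all zero
                weights = [len(adj[v]) + 1 for v in existing]
                targets = rng.choices(existing, weights=weights, k=1)
            else:  # DUPLICATE: copy the neighbors of a random existing node
                targets = list(adj[rng.choice(existing)])
            for t in targets:
                if t != new:  # never create a self-loop
                    adj[new].add(t)
                    adj[t].add(new)
    return adj

def features(adj):
    """Return (mean degree, average local clustering coefficient)."""
    n = len(adj)
    mean_deg = sum(len(nb) for nb in adj.values()) / n
    total = 0.0
    for nb in adj.values():
        k = len(nb)
        if k < 2:
            continue  # clustering is taken as 0 for degree < 2
        nbs = list(nb)
        links = sum(1 for i in range(k) for j in range(i + 1, k)
                    if nbs[j] in adj[nbs[i]])
        total += 2 * links / (k * (k - 1))
    return mean_deg, total / n

def loss(program, target, n_nodes, seed):
    """Distance between the features of a grown graph and the target (lower is better)."""
    deg, clus = features(grow(program, n_nodes, random.Random(seed)))
    return abs(deg - target[0]) / target[0] + abs(clus - target[1])

def evolve(target, n_nodes=40, pop_size=12, generations=8, seed=0):
    """Toy genetic algorithm over instruction lists of length 1 to 4."""
    rng = random.Random(seed)
    pop = [[rng.choice(INSTRUCTIONS) for _ in range(rng.randint(1, 4))]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda p: loss(p, target, n_nodes, seed))
        parents = pop[: pop_size // 2]  # keep the better half unchanged
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = rng.sample(parents, 2)
            # one-point crossover: a prefix of `a` followed by a suffix of `b`
            child = a[: rng.randint(0, len(a))] + b[rng.randint(0, len(b)):]
            if not child:
                child = [rng.choice(INSTRUCTIONS)]
            elif rng.random() < 0.3:
                # point mutation: replace one instruction at random
                child[rng.randrange(len(child))] = rng.choice(INSTRUCTIONS)
            children.append(child[:4])
        pop = parents + children
    return min(pop, key=lambda p: loss(p, target, n_nodes, seed))

# Usage: take the features of a graph grown by a known program as the target,
# then search for a program whose grown graph has similar features.
target = features(grow(["PREFERENTIAL", "DUPLICATE"], 40, random.Random(1)))
best = evolve(target)
print("target features:", target)
print("best program:", best)
```

The sketch scores each candidate on a single grown graph with a fixed seed, which keeps the search deterministic but ignores the run-to-run variance that the abstract mentions. Averaging the loss over several seeds would be a natural refinement.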