Biomarkers are cornerstones of healthcare, with applications ranging from disease diagnosis to patient stratification and outcome prediction. Despite significant efforts that have identified thousands of potential biomarkers, their translation into clinical practice remains poor, averaging 1.5 per year across all diseases. This inefficiency results primarily from the lack of connection between candidate biomarkers and the underlying pathophysiological mechanisms they monitor, which leads to poor reproducibility along the development pipeline. In addition, the current single-biomarker-to-single-disease approach does not capture the multifactorial nature of complex diseases such as Chronic Kidney Disease (CKD). CKD is a major public health problem that affects approximately 14% of the general population, and it requires disease-specific biomarkers that are informative at the asymptomatic, early stage to deliver more precise diagnostic and predictive information.
Here we propose, and experimentally validate, a biomarker discovery pipeline that aims to identify molecular signatures that not only discriminate between CKD subtypes but also capture the underlying biology of the disease. First, we integrate protein-protein interaction networks with annotated gene sets into a knowledge base that captures a plethora of information about biological entities and their interactions. Second, this knowledge base is combined with CKD transcriptional data to generate a disease-specific model. Third, the model is analysed using different state-of-the-art methods (e.g., network analysis and pathway analysis), yielding for each protein a molecular profile that captures different aspects of disease biology. Relevant features are then selected and optimized by training an elastic net model on plasma samples, which is used to predict biomarker performance. Finally, the resulting individual candidates are integrated into a biomarker panel with increased performance and stability, using linear discriminant analysis as the integrative machine learning method. Results show that our holistic approach can find biomarkers that are associated with disease mechanisms while retaining competitive predictive performance.
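
The last two steps of the pipeline (feature selection with an elastic net, then integration of the selected candidates into a panel with linear discriminant analysis) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that the regularized model meant in the text is elastic net regression, fits it by plain coordinate descent on 0/1 class labels as a simplification, and uses a two-class Fisher discriminant with a midpoint threshold. The data are synthetic, and the protein names `P1` to `P4`, the penalty settings, and all helper functions are hypothetical.

```python
import random

def standardize(X):
    """Scale each column to zero mean and unit variance."""
    n, p = len(X), len(X[0])
    out = [row[:] for row in X]
    for j in range(p):
        col = [row[j] for row in X]
        mean = sum(col) / n
        sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5 or 1.0
        for i in range(n):
            out[i][j] = (X[i][j] - mean) / sd
    return out

def soft_threshold(z, t):
    return z - t if z > t else z + t if z < -t else 0.0

def elastic_net(X, y, lam=0.1, alpha=0.5, n_iter=200):
    """Coordinate descent on (1/2n)|y - Xb|^2 + lam*(alpha*|b|_1 + (1-alpha)/2*|b|_2^2)."""
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    resid = [yi - y_mean for yi in y]  # residual for b = 0 (intercept absorbed)
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            norm = sum(X[i][j] ** 2 for i in range(n)) / n
            new = soft_threshold(rho, lam * alpha) / (norm + lam * (1 - alpha))
            delta = new - b[j]
            for i in range(n):
                resid[i] -= X[i][j] * delta
            b[j] = new
    return b

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_lda(X, y):
    """Two-class Fisher LDA: w = Sw^-1 (mu1 - mu0), threshold at the class midpoint."""
    p = len(X[0])
    groups = {c: [x for x, yi in zip(X, y) if yi == c] for c in (0, 1)}
    mu = {c: [sum(r[j] for r in g) / len(g) for j in range(p)] for c, g in groups.items()}
    Sw = [[0.0] * p for _ in range(p)]  # pooled within-class scatter
    for c, g in groups.items():
        for r in g:
            d = [r[j] - mu[c][j] for j in range(p)]
            for a in range(p):
                for k in range(p):
                    Sw[a][k] += d[a] * d[k]
    w = solve(Sw, [mu[1][j] - mu[0][j] for j in range(p)])
    mid = sum(w[j] * (mu[0][j] + mu[1][j]) / 2 for j in range(p))
    return w, mid

def predict(w, mid, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) > mid else 0

# Synthetic plasma-like data: P1 and P2 shift with disease class, P3 and P4 are noise.
rng = random.Random(0)
names = ["P1", "P2", "P3", "P4"]  # hypothetical candidate proteins
y = [i % 2 for i in range(80)]
X = [[rng.gauss(3.0 * yi, 1.0), rng.gauss(-3.0 * yi, 1.0),
      rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for yi in y]

Xs = standardize(X)
coef = elastic_net(Xs, y)
selected = [j for j, b in enumerate(coef) if abs(b) > 1e-8]  # surviving candidates
panel = [[row[j] for j in selected] for row in Xs]
w, mid = fit_lda(panel, y)
accuracy = sum(predict(w, mid, x) == yi for x, yi in zip(panel, y)) / len(y)
print("panel:", [names[j] for j in selected], "training accuracy:", accuracy)
```

The accuracy printed here is computed on the training samples and is therefore optimistic; an actual evaluation of a panel would rely on held-out or cross-validated samples.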