Protein scores (=node scores)

Each protein is assigned a score according to whether it is present in or absent from the proteome sample. A protein from the sample is scored as the negative average of the scores of its connecting edges. Every other interactome protein is assigned the average of all edge scores. A file containing both scores can be downloaded here; it is used in the following steps to score the sample proteins.
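The scoring rule above can be illustrated on a toy network. This is only a sketch: the edge table, its column names (from, to, edge_score) and all numbers are made up, and reading "the average of all edge scores" as one global mean is an assumption. The downloadable file remains the authoritative source of the scores.

```r
# toy edge list (hypothetical values, not taken from the real interactome)
edges <- data.frame(
  from       = c(1, 1, 2, 3),
  to         = c(2, 3, 3, 4),
  edge_score = c(-2, -4, -1, -3)
)
sample_ids <- c(1, 4)   # proteins assumed to be present in the sample

# every gene ID that occurs in the toy network
all_ids <- sort(unique(c(edges$from, edges$to)))

# average score of the edges touching one protein
mean_incident <- function(id) {
  mean(edges$edge_score[edges$from == id | edges$to == id])
}

# sample proteins: negative average of their connecting edges
protein_score_sample <- -sapply(all_ids, mean_incident)

# remaining proteins: average of all edge scores (assumed to be one global mean)
protein_score_interactome <- rep(mean(edges$edge_score), length(all_ids))

# pick the score that applies to each protein
toy_scores <- data.frame(
  gene_id = all_ids,
  score1  = ifelse(all_ids %in% sample_ids,
                   protein_score_sample,
                   protein_score_interactome)
)
```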

The sample proteins that the user wants to test have to be mapped to NCBI Entrez gene IDs. UniProt IDs or other identifiers can be mapped to gene IDs using the UniProt ID Mapping Service (http://www.uniprot.org/mapping/). The resulting list of gene IDs can be saved in a .txt file (for example “sample_proteins.txt”) with the header “gene_id” on the first line. The following R script automatically assigns the sample or interactome score to every protein. Replace full_path_name with the full path to the files.
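The expected layout of the sample file can be sketched as follows. The three gene IDs are arbitrary placeholders chosen only to show the format, and the file is written to a temporary directory rather than to full_path_name.

```r
# placeholder Entrez gene IDs, used only to show the file layout
gene_ids <- c(7157, 672, 1956)

# one column with the header "gene_id", one ID per line
sample_file <- file.path(tempdir(), "sample_proteins.txt")
write.table(data.frame(gene_id = gene_ids), sample_file,
            sep = "\t", quote = FALSE, row.names = FALSE)

# read it back the same way the script below does
sample <- read.table(sample_file, header = TRUE, sep = "\t")
```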

#read the tables into R
protein_scores <- read.table("full_path_name/file2_protein_scores.tsv", header=TRUE, sep="\t")
sample <- read.table("full_path_name/sample_proteins.txt", header=TRUE, sep="\t")

#match the sample proteins with their protein scores and store the result in a new table
sample_protein_scores <- merge(sample,protein_scores,by="gene_id")

#delete the interactome scores and rename the protein_score_sample column to "score1"
sample_protein_scores$protein_score_interactome <- NULL
colnames(sample_protein_scores) <- c("gene_id","score1")

#extract the rest of the interactome proteins as a one-column table
rest_interactome <- data.frame(gene_id=setdiff(protein_scores$gene_id, sample_protein_scores$gene_id))

#merge the file with the interactome scores
rest_interactome_scores <- merge(rest_interactome,protein_scores, by="gene_id")

#delete the sample scores and rename the protein_score_interactome column to "score1"
rest_interactome_scores$protein_score_sample <- NULL
colnames(rest_interactome_scores) <- c("gene_id","score1")

#merge the protein sample scores with the rest of the genes in the interactome
protein_scores_heinz <- rbind(sample_protein_scores, rest_interactome_scores)
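As a cross-check of the script, the same selection can be written in one step with ifelse(). This is a sketch on made-up toy tables, not a replacement for the script; it also shows that sample IDs missing from the interactome (here the hypothetical ID 999) do not appear in the result, which is likewise true of the merge() approach.

```r
# toy stand-ins for the two input tables (hypothetical values)
protein_scores <- data.frame(
  gene_id                   = c(101, 102, 103, 104),
  protein_score_sample      = c(3.0, 1.5, 2.7, 3.0),
  protein_score_interactome = c(-2.5, -2.5, -2.5, -2.5)
)
sample <- data.frame(gene_id = c(102, 104, 999))  # 999 is not in the interactome

# pick the sample score for sample proteins, the interactome score otherwise
in_sample <- protein_scores$gene_id %in% sample$gene_id
node_scores <- data.frame(
  gene_id = protein_scores$gene_id,
  score1  = ifelse(in_sample,
                   protein_scores$protein_score_sample,
                   protein_scores$protein_score_interactome)
)

# sample IDs that could not be matched and are absent from the result
unmatched <- setdiff(sample$gene_id, protein_scores$gene_id)
```

Unlike the rbind() result, this table keeps the row order of the protein score file; the set of rows and their scores should be the same.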

The resulting table, protein_scores_heinz, is the node file that the heinz algorithm uses to calculate the maximum-scoring subnetwork. In the examples below, the proteins have already been mapped to gene IDs and assigned protein scores using the R script above.
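The script keeps the node table in memory, so it still has to be written to disk before heinz can read it. A minimal sketch follows, using a toy table in place of protein_scores_heinz. The output file name is hypothetical, and the tab separator and the omitted header line are assumptions that should be checked against the heinz documentation.

```r
# toy node table standing in for protein_scores_heinz (hypothetical values)
protein_scores_heinz <- data.frame(gene_id = c(101, 102),
                                   score1  = c(-2.5, 1.5))

# hypothetical output path; replace with the location of the heinz input files
node_file <- file.path(tempdir(), "nodes_heinz.txt")

# tab-separated, no quotes, no row names; dropping the header is an assumption
write.table(protein_scores_heinz, node_file,
            sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
```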

  • Example 1:
    In the example of H9N2 virus-infected cells, 22 proteins were identified in the initial proteomic analysis. Sample proteins receive their value from the file of pre-calculated protein scores. We then add 10 to these protein scores to ensure that the algorithm includes all 22 proteins in the final solution. The file can be downloaded here.

  • Example 2:
    In the example of T-cells, 861 proteins were identified in the initial proteomic analysis. The final protein file can be downloaded here.
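The shift of 10 used in Example 1 could be applied to the sample scores before they are combined with the rest of the interactome by rbind(). The sketch below uses a made-up table in the layout produced by the script (gene_id, score1); the variable name offset is introduced only for illustration.

```r
# toy sample table (hypothetical values) in the layout produced by the script
sample_protein_scores <- data.frame(gene_id = c(101, 102, 103),
                                    score1  = c(3.0, -0.5, 1.2))

# shift every sample score upwards so that all of them become clearly positive
offset <- 10
sample_protein_scores$score1 <- sample_protein_scores$score1 + offset
```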