A generic method for assignment of reliability scores applied to solvent accessibility predictions

Bent Petersen; Thomas Nordahl Petersen; Pernille Andersen; Morten Nielsen; Claus Lundegaard

doi:10.1186/1472-6807-9-51

CLOSED: Department of Immunology and Microbiology

454 Citations (Scopus)

4 Downloads (Pure)

Abstract

BACKGROUND: Estimating the reliability of specific real-valued predictions is nontrivial, and the efficacy of such estimates is often questionable. It is important to know whether a given prediction can be trusted, and therefore the best methods associate each prediction with a reliability score or index. For discrete qualitative predictions, the reliability is conventionally estimated as the difference between the output scores of selected classes. Such an approach is not feasible for methods that predict a biological feature as a single real value rather than as a classification. As a solution to this challenge, we have implemented a method that predicts the relative surface accessibility of an amino acid and simultaneously predicts the reliability of each prediction in the form of a Z-score.
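The conventional reliability measure for classifiers can be shown in a short Python sketch. It assumes that the "selected classes" are the two highest-scoring ones, which is a common choice but is not stated in the abstract. `margin_reliability` and the example scores are hypothetical.

```python
from typing import Sequence


def margin_reliability(class_scores: Sequence[float]) -> float:
    """Reliability of a discrete prediction as the gap between the two
    highest class output scores."""
    if len(class_scores) < 2:
        # A single real-valued output has no second score to compare with,
        # which is why this scheme does not carry over to regression.
        raise ValueError("margin reliability needs at least two class scores")
    top, second = sorted(class_scores, reverse=True)[:2]
    return top - second


# Hypothetical two-class (buried, exposed) network outputs for three residues.
outputs = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8]]
for scores in outputs:
    print(f"{scores} -> reliability {margin_reliability(scores):.2f}")
```

A confident call such as [0.9, 0.1] gets a large margin, and a near tie such as [0.55, 0.45] gets a small one. A network with a single real-valued output has nothing to subtract. That gap is what predicting the reliability directly is meant to fill.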

RESULTS: An ensemble of artificial neural networks has been trained on a set of experimentally solved protein structures to predict the relative exposure of the amino acids. The method assigns a reliability score to each surface accessibility prediction as an inherent part of the training process. This is in contrast to the most commonly used procedures, where reliabilities are obtained by post-processing the output.
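The abstract does not say how the Z-score is derived, so the Python sketch below is only an illustration under stated assumptions. First, the ensemble members are combined by plain averaging. Second, the trained networks output a predicted error alongside each RSA value. That predicted error is then standardized against the mean and standard deviation of all predicted errors. `ensemble_mean`, `reliability_z_scores`, and the numbers are hypothetical and are not the authors' implementation.

```python
import statistics
from typing import List, Sequence


def ensemble_mean(member_predictions: Sequence[Sequence[float]]) -> List[float]:
    """Average per-residue RSA predictions over the networks in an ensemble.

    Assumption: plain averaging is used to combine the ensemble members.
    """
    # zip(*...) turns a members-by-residues table into per-residue columns.
    return [statistics.fmean(column) for column in zip(*member_predictions)]


def reliability_z_scores(predicted_errors: Sequence[float]) -> List[float]:
    """Standardize per-residue predicted errors into reliability Z-scores.

    Assumption: the trained networks emit a predicted absolute error for each
    RSA prediction; a smaller error means a more reliable prediction, so the
    sign is flipped to make higher Z-scores mean higher reliability.
    """
    mean = statistics.fmean(predicted_errors)
    sd = statistics.pstdev(predicted_errors)
    if sd == 0:
        # Every prediction looks equally reliable, so all Z-scores are zero.
        return [0.0] * len(predicted_errors)
    return [(mean - error) / sd for error in predicted_errors]


# Hypothetical outputs of three networks for four residues.
rsa_by_network = [
    [0.10, 0.45, 0.80, 0.30],
    [0.12, 0.40, 0.70, 0.35],
    [0.08, 0.50, 0.75, 0.25],
]
predicted_errors = [0.05, 0.10, 0.15, 0.30]

print(ensemble_mean(rsa_by_network))
print(reliability_z_scores(predicted_errors))
```

The Z-scores are centred on the mean predicted error. Each one therefore reads as "more or less reliable than a typical prediction", which makes the scores easy to sort or threshold.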

CONCLUSION: The performance of the neural networks was evaluated on a commonly used set of sequences known as the CB513 set. An overall Pearson's correlation coefficient of 0.72 was obtained, which is comparable to the performance of Real-SPINE, currently the best publicly available method. Both methods associate a reliability score with the individual predictions. However, our reliability score, expressed as a Z-score, is shown to be the more informative measure for discriminating good predictions from bad ones across the entire range from completely buried to fully exposed amino acids. This is evident when comparing the Pearson's correlation coefficient for the top 20% of predictions sorted by reliability: for this subset, our method and Real-SPINE obtain values of 0.79 and 0.74, respectively. The same tendency holds for any selected subset.
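The evaluation described above can be sketched in Python. It computes an overall Pearson's correlation coefficient, then the same coefficient on the top fraction of predictions ranked by reliability. The helper names, the rounding of the subset size, and the toy data are assumptions for illustration, not the paper's evaluation code.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def top_fraction_pcc(
    predicted: Sequence[float],
    observed: Sequence[float],
    reliability: Sequence[float],
    fraction: float = 0.2,
) -> float:
    """Pearson's correlation over the most reliable `fraction` of residues."""
    # Rank residue indices from most to least reliable.
    order = sorted(range(len(predicted)), key=lambda i: reliability[i], reverse=True)
    # Keep at least two residues so the correlation is defined.
    k = max(2, round(fraction * len(order)))
    keep = order[:k]
    return pearson([predicted[i] for i in keep], [observed[i] for i in keep])


# Hypothetical RSA values for ten residues.
predicted = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
observed = [0.1, 0.2, 0.3, 0.4, 0.5, 0.9, 0.2, 0.8, 0.1, 0.7]
reliability = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

print(f"all residues: {pearson(predicted, observed):.2f}")
print(f"top 50%:      {top_fraction_pcc(predicted, observed, reliability, 0.5):.2f}")
```

With only ten residues, a 20% subset would hold just two points, and the correlation of two points is always ±1. The toy example therefore uses 50%. On a set the size of CB513, the default `fraction=0.2` gives a meaningful subset.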

Original language	English
Article number	51
Journal	BMC Structural Biology
Volume	9
Number of pages	10
ISSN	1472-6807
DOI	https://doi.org/10.1186/1472-6807-9-51
Status	Published - 2009

Access to the document

10.1186/1472-6807-9-51 License: CC BY

A generic method for assignment of reliability scores applied to solvent accessibility predictions: publisher's version, 853 KB, License: CC BY

Citation formats

@article{d9c58493b23c4ab698ba1ab6b8734cfa,

title = "A generic method for assignment of reliability scores applied to solvent accessibility predictions",

abstract = "BACKGROUND: Estimating the reliability of specific real-valued predictions is nontrivial, and the efficacy of such estimates is often questionable. It is important to know whether a given prediction can be trusted, and therefore the best methods associate each prediction with a reliability score or index. For discrete qualitative predictions, the reliability is conventionally estimated as the difference between the output scores of selected classes. Such an approach is not feasible for methods that predict a biological feature as a single real value rather than as a classification. As a solution to this challenge, we have implemented a method that predicts the relative surface accessibility of an amino acid and simultaneously predicts the reliability of each prediction in the form of a Z-score. RESULTS: An ensemble of artificial neural networks has been trained on a set of experimentally solved protein structures to predict the relative exposure of the amino acids. The method assigns a reliability score to each surface accessibility prediction as an inherent part of the training process. This is in contrast to the most commonly used procedures, where reliabilities are obtained by post-processing the output. CONCLUSION: The performance of the neural networks was evaluated on a commonly used set of sequences known as the CB513 set. An overall Pearson's correlation coefficient of 0.72 was obtained, which is comparable to the performance of Real-SPINE, currently the best publicly available method. Both methods associate a reliability score with the individual predictions. However, our reliability score, expressed as a Z-score, is shown to be the more informative measure for discriminating good predictions from bad ones across the entire range from completely buried to fully exposed amino acids. This is evident when comparing the Pearson's correlation coefficient for the top 20\% of predictions sorted by reliability: for this subset, our method and Real-SPINE obtain values of 0.79 and 0.74, respectively. The same tendency holds for any selected subset.",

keywords = "Algorithms, Computational Biology, Databases, Protein, Neural Networks (Computer), Proteins/chemistry, Solvents/chemistry",

author = "Petersen, Bent and Petersen, {Thomas Nordahl} and Andersen, Pernille and Nielsen, Morten and Lundegaard, Claus",

year = "2009",

doi = "10.1186/1472-6807-9-51",

language = "English",

volume = "9",

journal = "BMC Structural Biology",

issn = "1472-6807",

publisher = "BioMed Central Ltd.",

}

TY - JOUR

T1 - A generic method for assignment of reliability scores applied to solvent accessibility predictions

AU - Petersen, Bent

AU - Petersen, Thomas Nordahl

AU - Andersen, Pernille

AU - Nielsen, Morten

AU - Lundegaard, Claus

PY - 2009

Y1 - 2009

N2 - BACKGROUND: Estimating the reliability of specific real-valued predictions is nontrivial, and the efficacy of such estimates is often questionable. It is important to know whether a given prediction can be trusted, and therefore the best methods associate each prediction with a reliability score or index. For discrete qualitative predictions, the reliability is conventionally estimated as the difference between the output scores of selected classes. Such an approach is not feasible for methods that predict a biological feature as a single real value rather than as a classification. As a solution to this challenge, we have implemented a method that predicts the relative surface accessibility of an amino acid and simultaneously predicts the reliability of each prediction in the form of a Z-score. RESULTS: An ensemble of artificial neural networks has been trained on a set of experimentally solved protein structures to predict the relative exposure of the amino acids. The method assigns a reliability score to each surface accessibility prediction as an inherent part of the training process. This is in contrast to the most commonly used procedures, where reliabilities are obtained by post-processing the output. CONCLUSION: The performance of the neural networks was evaluated on a commonly used set of sequences known as the CB513 set. An overall Pearson's correlation coefficient of 0.72 was obtained, which is comparable to the performance of Real-SPINE, currently the best publicly available method. Both methods associate a reliability score with the individual predictions. However, our reliability score, expressed as a Z-score, is shown to be the more informative measure for discriminating good predictions from bad ones across the entire range from completely buried to fully exposed amino acids. This is evident when comparing the Pearson's correlation coefficient for the top 20% of predictions sorted by reliability: for this subset, our method and Real-SPINE obtain values of 0.79 and 0.74, respectively. The same tendency holds for any selected subset.

AB - BACKGROUND: Estimating the reliability of specific real-valued predictions is nontrivial, and the efficacy of such estimates is often questionable. It is important to know whether a given prediction can be trusted, and therefore the best methods associate each prediction with a reliability score or index. For discrete qualitative predictions, the reliability is conventionally estimated as the difference between the output scores of selected classes. Such an approach is not feasible for methods that predict a biological feature as a single real value rather than as a classification. As a solution to this challenge, we have implemented a method that predicts the relative surface accessibility of an amino acid and simultaneously predicts the reliability of each prediction in the form of a Z-score. RESULTS: An ensemble of artificial neural networks has been trained on a set of experimentally solved protein structures to predict the relative exposure of the amino acids. The method assigns a reliability score to each surface accessibility prediction as an inherent part of the training process. This is in contrast to the most commonly used procedures, where reliabilities are obtained by post-processing the output. CONCLUSION: The performance of the neural networks was evaluated on a commonly used set of sequences known as the CB513 set. An overall Pearson's correlation coefficient of 0.72 was obtained, which is comparable to the performance of Real-SPINE, currently the best publicly available method. Both methods associate a reliability score with the individual predictions. However, our reliability score, expressed as a Z-score, is shown to be the more informative measure for discriminating good predictions from bad ones across the entire range from completely buried to fully exposed amino acids. This is evident when comparing the Pearson's correlation coefficient for the top 20% of predictions sorted by reliability: for this subset, our method and Real-SPINE obtain values of 0.79 and 0.74, respectively. The same tendency holds for any selected subset.

KW - Algorithms

KW - Computational Biology

KW - Databases, Protein

KW - Neural Networks (Computer)

KW - Proteins/chemistry

KW - Solvents/chemistry

U2 - 10.1186/1472-6807-9-51

DO - 10.1186/1472-6807-9-51

M3 - Journal article

C2 - 19646261

SN - 1472-6807

VL - 9

JO - BMC Structural Biology

JF - BMC Structural Biology

M1 - 51

ER -
