Sacrificing information for the greater good: how to select photometric bands for optimal accuracy

Kristoffer Stensbo-Smidt; Fabian Cristian Gieseke; Christian Igel; Andrew Wasmuth Zirm; Kim Steenstrup Pedersen

doi:10.1093/mnras/stw2476

10 Citations (Scopus)

Abstract

Large-scale surveys make huge amounts of photometric data available. Because of the sheer amount of objects, spectral data cannot be obtained for all of them. Therefore, it is important to devise techniques for reliably estimating physical properties of objects from photometric information alone. These estimates are needed to automatically identify interesting objects worth a follow-up investigation as well as to produce the required data for a statistical analysis of the space covered by a survey. We argue that machine learning techniques are suitable to compute these estimates accurately and efficiently. This study promotes a feature selection algorithm, which selects the most informative magnitudes and colours for a given task of estimating physical quantities from photometric data alone. Using k-nearest neighbours regression, a well-known non-parametric machine learning method, we show that using the found features significantly increases the accuracy of the estimations compared to using standard features and standard methods. We illustrate the usefulness of the approach by estimating specific star formation rates (sSFRs) and redshifts (photo-z's) using only the broad-band photometry from the Sloan Digital Sky Survey (SDSS). For estimating sSFRs, we demonstrate that our method produces better estimates than traditional spectral energy distribution fitting. For estimating photo-z's, we show that our method produces more accurate photo-z's than the method employed by SDSS. The study highlights the general importance of performing proper model selection to improve the results of machine learning systems and how feature selection can provide insights into the predictive relevance of particular input features.
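To make the idea in the abstract concrete, the following is a minimal sketch of feature selection over magnitudes and colours with k-nearest neighbours regression. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which selection strategy is used, so greedy forward selection is assumed here, and the synthetic catalogue, the toy target, and all function names (`make_synthetic_catalogue`, `knn_predict`, `validation_rmse`, `forward_select`) are hypothetical.

```python
import math
import random
from itertools import combinations

BANDS = ["u", "g", "r", "i", "z"]  # SDSS broad-band filters


def make_synthetic_catalogue(n, rng):
    """Build toy features: 5 magnitudes plus all 10 pairwise colours.

    Assumption: the toy target depends only on the g-r colour plus noise.
    """
    names = BANDS + [f"{a}-{b}" for a, b in combinations(BANDS, 2)]
    X, y = [], []
    for _ in range(n):
        mags = {b: rng.uniform(15.0, 22.0) for b in BANDS}
        colours = [mags[a] - mags[b] for a, b in combinations(BANDS, 2)]
        X.append([mags[b] for b in BANDS] + colours)
        y.append(0.1 * (mags["g"] - mags["r"]) + rng.gauss(0.0, 0.01))
    return names, X, y


def knn_predict(train_X, train_y, query, k):
    """Average the targets of the k training points closest to `query`."""
    nearest = sorted(
        (math.dist(row, query), target) for row, target in zip(train_X, train_y)
    )[:k]
    return sum(target for _, target in nearest) / len(nearest)


def validation_rmse(X_train, y_train, X_val, y_val, columns, k):
    """RMSE of k-NN regression restricted to the given feature columns."""
    train = [[row[c] for c in columns] for row in X_train]
    val = [[row[c] for c in columns] for row in X_val]
    squared = [
        (knn_predict(train, y_train, q, k) - t) ** 2 for q, t in zip(val, y_val)
    ]
    return math.sqrt(sum(squared) / len(squared))


def forward_select(X_train, y_train, X_val, y_val, n_features, k=5):
    """Greedily add the feature that gives the lowest validation RMSE."""
    selected, remaining, history = [], list(range(len(X_train[0]))), []
    while remaining and len(selected) < n_features:
        scores = {
            c: validation_rmse(X_train, y_train, X_val, y_val, selected + [c], k)
            for c in remaining
        }
        best = min(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
        history.append(scores[best])
    return selected, history


if __name__ == "__main__":
    rng = random.Random(0)
    names, X, y = make_synthetic_catalogue(200, rng)
    chosen, errors = forward_select(X[:150], y[:150], X[150:], y[150:], n_features=2)
    for index, error in zip(chosen, errors):
        print(f"added {names[index]:>4}  validation RMSE = {error:.4f}")
```

Because the toy target is built from a single colour, the sketch should rank that colour first; on real survey data the selected set depends on the task, the value of k, and how the validation split is drawn.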

Original language	English
Journal	Monthly Notices of the Royal Astronomical Society
Volume	464
Issue number	3
Pages (from-to)	2577-2596
Number of pages	20
ISSN	0035-8711
DOIs	https://doi.org/10.1093/mnras/stw2476
Publication status	Published - 21 Jan 2017

Access to Document

10.1093/mnras/stw2476

http://arxiv.org/pdf/1511.05424 (Licence: Other)

Cite this

@article{a10e71dc4c9146a784a5bfb628b3acf2,

title = "Sacrificing information for the greater good: how to select photometric bands for optimal accuracy",

abstract = "Large-scale surveys make huge amounts of photometric data available. Because of the sheer amount of objects, spectral data cannot be obtained for all of them. Therefore, it is important to devise techniques for reliably estimating physical properties of objects from photometric information alone. These estimates are needed to automatically identify interesting objects worth a follow-up investigation as well as to produce the required data for a statistical analysis of the space covered by a survey. We argue that machine learning techniques are suitable to compute these estimates accurately and efficiently. This study promotes a feature selection algorithm, which selects the most informative magnitudes and colours for a given task of estimating physical quantities from photometric data alone. Using k-nearest neighbours regression, a well-known non-parametric machine learning method, we show that using the found features significantly increases the accuracy of the estimations compared to using standard features and standard methods. We illustrate the usefulness of the approach by estimating specific star formation rates (sSFRs) and redshifts (photo-z's) using only the broad-band photometry from the Sloan Digital Sky Survey (SDSS). For estimating sSFRs, we demonstrate that our method produces better estimates than traditional spectral energy distribution fitting. For estimating photo-z's, we show that our method produces more accurate photo-z's than the method employed by SDSS. The study highlights the general importance of performing proper model selection to improve the results of machine learning systems and how feature selection can provide insights into the predictive relevance of particular input features.",

author = "Kristoffer Stensbo-Smidt and Gieseke, {Fabian Cristian} and Christian Igel and Zirm, {Andrew Wasmuth} and Pedersen, {Kim Steenstrup}",

year = "2017",

month = jan,

day = "21",

doi = "10.1093/mnras/stw2476",

language = "English",

volume = "464",

pages = "2577--2596",

journal = "Monthly Notices of the Royal Astronomical Society",

issn = "0035-8711",

publisher = "Oxford University Press",

number = "3",

}

TY - JOUR

T1 - Sacrificing information for the greater good

T2 - how to select photometric bands for optimal accuracy

AU - Stensbo-Smidt, Kristoffer

AU - Gieseke, Fabian Cristian

AU - Igel, Christian

AU - Zirm, Andrew Wasmuth

AU - Pedersen, Kim Steenstrup

PY - 2017/1/21

Y1 - 2017/1/21

AB - Large-scale surveys make huge amounts of photometric data available. Because of the sheer amount of objects, spectral data cannot be obtained for all of them. Therefore, it is important to devise techniques for reliably estimating physical properties of objects from photometric information alone. These estimates are needed to automatically identify interesting objects worth a follow-up investigation as well as to produce the required data for a statistical analysis of the space covered by a survey. We argue that machine learning techniques are suitable to compute these estimates accurately and efficiently. This study promotes a feature selection algorithm, which selects the most informative magnitudes and colours for a given task of estimating physical quantities from photometric data alone. Using k-nearest neighbours regression, a well-known non-parametric machine learning method, we show that using the found features significantly increases the accuracy of the estimations compared to using standard features and standard methods. We illustrate the usefulness of the approach by estimating specific star formation rates (sSFRs) and redshifts (photo-z's) using only the broad-band photometry from the Sloan Digital Sky Survey (SDSS). For estimating sSFRs, we demonstrate that our method produces better estimates than traditional spectral energy distribution fitting. For estimating photo-z's, we show that our method produces more accurate photo-z's than the method employed by SDSS. The study highlights the general importance of performing proper model selection to improve the results of machine learning systems and how feature selection can provide insights into the predictive relevance of particular input features.

U2 - 10.1093/mnras/stw2476

DO - 10.1093/mnras/stw2476

M3 - Journal article

SN - 0035-8711

VL - 464

SP - 2577

EP - 2596

JO - Monthly Notices of the Royal Astronomical Society

JF - Monthly Notices of the Royal Astronomical Society

IS - 3

ER -
