
Data-driven astronomy (DDA) refers to the use of data science in astronomy. The outputs of telescopic observations and sky surveys are processed with data-mining and big-data management techniques that analyze, filter, and normalize the data. The resulting data sets are then used for classification, prediction, and anomaly detection through advanced statistical methods, digital image processing, and machine learning. Astronomers and space scientists use the results to identify patterns, anomalies, and motions in outer space, and to develop theories and make discoveries about the cosmos.

History

In 2007, the Galaxy Zoo project[1] was launched for the morphological classification[2][3] of a large number of galaxies. The project considered 900,000 galaxy images collected by the Sloan Digital Sky Survey (SDSS)[4] over the previous seven years. The task was to examine each galaxy image, classify the galaxy as elliptical or spiral, and determine whether it was spinning. A team of astrophysicists at the University of Oxford led by Kevin Schawinski was in charge of the project, and Schawinski and his colleague Chris Lintott estimated that such a team would need three to five years to complete the work.[5] This led them to the idea of using machine learning and data science techniques to analyze and classify the images.[6]

Methodology

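The data retrieved from sky surveys are first preprocessed: the data are filtered and redundant entries are removed. Feature extraction is then performed on the filtered data set, and the extracted features are passed on to the data-mining tasks described below.[7] Well-known sky surveys include the Sloan Digital Sky Survey (SDSS),[4] the Digitized Palomar Observatory Sky Survey (DPOSS),[8] the Two Micron All-Sky Survey (2MASS),[9] the Green Bank Telescope (GBT),[10] the Galaxy Evolution Explorer (GALEX),[11] the SkyMapper Southern Sky Survey,[12] Pan-STARRS,[13] the Rubin Observatory (formerly the Large Synoptic Survey Telescope, LSST),[14] and the Square Kilometre Array (SKA).[15]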
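The preprocessing steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline of any particular survey: the catalogue layout (object ID, right ascension, declination, g and r magnitudes), the validity rules, and the `preprocess` function are all hypothetical, and redundancy is detected only by repeated object IDs.

```python
import math


def preprocess(rows):
    """Filter invalid rows, drop duplicates, and extract simple features.

    Each row is a hypothetical catalogue entry:
    (object_id, ra_deg, dec_deg, g_mag, r_mag).
    Returns a list of (object_id, g_minus_r_colour, r_mag) tuples.
    """
    seen = set()
    features = []
    for obj_id, ra, dec, g, r in rows:
        # Filtering: skip rows with missing (NaN) magnitudes...
        if math.isnan(g) or math.isnan(r):
            continue
        # ...or with coordinates outside the valid sky range.
        if not (0.0 <= ra < 360.0 and -90.0 <= dec <= 90.0):
            continue
        # Redundancy removal: keep only the first valid row per object ID.
        if obj_id in seen:
            continue
        seen.add(obj_id)
        # Feature extraction: the g - r colour (rounded) plus the r magnitude.
        features.append((obj_id, round(g - r, 3), r))
    return features


# Hypothetical input rows; the values are invented for illustration.
catalogue = [
    ("A", 10.0, 5.0, 18.2, 17.5),
    ("A", 10.0, 5.0, 18.2, 17.5),            # exact duplicate
    ("B", 400.0, 0.0, 19.0, 18.0),           # invalid right ascension
    ("C", 20.0, -30.0, float("nan"), 18.0),  # missing g magnitude
    ("D", 30.0, 45.0, 17.0, 16.4),
]
print(preprocess(catalogue))  # [('A', 0.7, 17.5), ('D', 0.6, 16.4)]
```

Real surveys usually identify duplicates by cross-matching sky positions within a small angular radius rather than by identical IDs.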

The volume of data produced by these sky surveys ranges from 3 TB to almost 4.6 EB.[7] The data-mining tasks used to manage and analyze these data include classification, regression, clustering, anomaly detection, and time-series analysis, and each task has several approaches and applications.

Classification

Classification[16] is used to identify and categorize astronomical data; examples include spectral classification, photometric classification, morphological classification, and the classification of solar activity.
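As an illustration of photometric classification, the sketch below implements a k-nearest-neighbour classifier in plain Python. It is only one of many possible classifiers, chosen here for brevity; the two colour features, the star and quasar labels, the training values, and the function name `knn_classify` are hypothetical.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points.

    `train` is a list of ((feature_1, feature_2, ...), label) pairs.
    """
    # Sort training examples by Euclidean distance to the query point.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labelled examples: ((u-g, g-r) colours, class).
training_set = [
    ((1.2, 0.5), "star"),
    ((1.3, 0.6), "star"),
    ((1.1, 0.4), "star"),
    ((0.2, 0.1), "quasar"),
    ((0.3, 0.2), "quasar"),
    ((0.1, 0.0), "quasar"),
]
print(knn_classify(training_set, (0.25, 0.15)))  # quasar
```

k-nearest neighbours needs no training step, which keeps the example short; for millions of objects, tree-based neighbour search or a parametric model is usually preferred.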

Regression

Regression[17] is used to make predictions from the retrieved data by fitting statistical models to trends in the data. In astronomy it is used to estimate photometric redshifts and to measure the physical parameters of stars.[18]
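A minimal regression sketch, assuming a single hypothetical colour index that varies linearly with redshift, fits a straight line by ordinary least squares. Real photometric-redshift estimators typically combine several bands and non-linear models; the data and the `fit_line` function here are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical training set: a colour index and spectroscopic redshifts.
colours = [0.2, 0.4, 0.6, 0.8, 1.0]
redshifts = [0.05, 0.15, 0.25, 0.35, 0.45]

intercept, slope = fit_line(colours, redshifts)
# Predict the redshift of a new object with colour index 0.7.
print(round(intercept + slope * 0.7, 3))  # 0.3
```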

Clustering

Clustering[19] groups objects according to a similarity measure. In astronomy it is used both for classification and for detecting special or rare objects.
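Clustering can be illustrated with k-means, one widely used algorithm (not necessarily the one used in any cited work). The sketch below is plain Python; the two-dimensional feature vectors and the `kmeans` function are hypothetical, and the centroids are initialized deterministically from the first k points to keep the example reproducible.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means clustering; returns (centroids, labels)."""
    # Deterministic initialization: use the first k points as centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels


# Hypothetical feature vectors forming two well-separated groups.
points = [(0.0, 0.0), (5.0, 5.0), (0.2, 0.1), (5.1, 4.9), (0.1, 0.3), (4.8, 5.2)]
centroids, labels = kmeans(points, 2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

In practice, random restarts or k-means++ initialization reduce the sensitivity of k-means to the starting centroids.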

Anomaly detection

Anomaly detection[21] is used to detect irregularities in a data set. In astronomy, the technique is used mainly to find rare or special objects.
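One simple anomaly-detection technique flags values that lie far from the median, measured in units of the median absolute deviation (MAD). The sketch below applies this modified z-score to a list of hypothetical g - r colours; the data, the 3.5 threshold, and the `robust_outliers` function are assumptions for illustration. The median and MAD are used instead of the mean and standard deviation because a single extreme object inflates the latter and can hide itself.

```python
import statistics


def robust_outliers(values, threshold=3.5):
    """Return indices whose modified z-score exceeds `threshold`."""
    med = statistics.median(values)
    # Median absolute deviation: a spread estimate that ignores extreme values.
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread, so deviations cannot be scaled
    # The 0.6745 factor makes the score comparable to an ordinary z-score
    # when the data are normally distributed.
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > threshold]


# Hypothetical g - r colours; the object at index 6 is unusually red.
colours = [0.51, 0.48, 0.55, 0.50, 0.49, 0.52, 2.10, 0.47]
print(robust_outliers(colours))  # [6]
```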

Time-series analysis

Time-series analysis[22] is used to analyze trends and predict values over time. In astronomy it is used for trend prediction and for novelty detection (the detection of previously unseen kinds of data).
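Novelty detection in a time series can be sketched by comparing each new measurement with the statistics of the measurements just before it. In the example below, a hypothetical light curve of normalized flux values contains one flare-like jump; the window length, the 4-sigma threshold, and the `detect_novelties` function are illustrative assumptions, not a method taken from the cited sources.

```python
import statistics


def detect_novelties(flux, window=5, k=4.0):
    """Flag indices where flux departs from the trailing window by > k sigma."""
    flagged = []
    for i in range(window, len(flux)):
        history = flux[i - window:i]  # the `window` samples before index i
        mean = statistics.fmean(history)
        spread = statistics.pstdev(history)
        # Skip perfectly flat history, where sigma is zero.
        if spread > 0 and abs(flux[i] - mean) > k * spread:
            flagged.append(i)
    return flagged


# Hypothetical normalized light curve with a flare at index 8.
light_curve = [1.00, 1.02, 0.99, 1.01, 1.00, 0.98, 1.01, 1.00, 1.60, 1.02, 0.99]
print(detect_novelties(light_curve))  # [8]
```

Because flagged points remain in the trailing window, a large outburst inflates the spread for the next few samples; a more robust version would exclude flagged values from the history.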

References

  1. ^ "Zooniverse". www.zooniverse.org. Retrieved 2024-05-10.
  2. ^ Cavanagh, Mitchell K.; Bekki, Kenji; Groves, Brent A. (2021-07-08). "Morphological classification of galaxies with deep learning: comparing 3-way and 4-way CNNs". Monthly Notices of the Royal Astronomical Society. 506 (1): 659–676. arXiv:2106.01571. doi:10.1093/mnras/stab1552. ISSN 0035-8711.
  3. ^ Goyal, Lalit Mohan; Arora, Maanak; Pandey, Tushar; Mittal, Mamta (2020-12-01). "Morphological classification of galaxies using Conv-nets". Earth Science Informatics. 13 (4): 1427–1436. doi:10.1007/s12145-020-00526-w. ISSN 1865-0481.
  4. ^ a b "Sloan Digital Sky Survey-V: Pioneering Panoptic Spectroscopy - SDSS-V". Retrieved 2024-05-10.
  5. ^ Pati, Satavisa (2021-06-18). "How Data Science is Used in Astronomy?". Analytics Insight. Retrieved 2024-05-10.
  6. ^ Baron, Dalya (2019-04-15), Machine Learning in Astronomy: a practical overview, arXiv:1904.07248
  7. ^ a b Zhang, Yanxia; Zhao, Yongheng (2015-05-22). "Astronomy in the Big Data Era". Data Science Journal. 14: 11. Bibcode:2015DatSJ..14...11Z. doi:10.5334/dsj-2015-011. ISSN 1683-1470.
  8. ^ "The Palomar Digital Sky Survey (DPOSS)". sites.astro.caltech.edu. Retrieved 2024-05-10.
  9. ^ "IRSA - Two Micron All Sky Survey (2MASS)". irsa.ipac.caltech.edu. Retrieved 2024-05-10.
  10. ^ "GBT". Green Bank Observatory. 2023-06-26. Retrieved 2024-05-10.
  11. ^ "GALEX - Galaxy Evolution Explorer". www.galex.caltech.edu. Retrieved 2024-05-10.
  12. ^ "SkyMapper Southern Sky Survey". skymapper.anu.edu.au. Retrieved 2024-05-10.
  13. ^ "Pan-STARRS1 data archive home page - PS1 Public Archive - STScI Outerspace". outerspace.stsci.edu. Retrieved 2024-05-10.
  14. ^ Large Synoptic Survey Telescope. "Rubin Observatory". Rubin Observatory. Retrieved 2024-05-10.
  15. ^ "Explore | SKAO". www.skao.int. Retrieved 2024-05-10.
  16. ^ Chowdhury, Shovan; Schoen, Marco P. (2020-10-02). "Research Paper Classification using Supervised Machine Learning Techniques". 2020 Intermountain Engineering, Technology and Computing (IETC). IEEE. pp. 1–6. doi:10.1109/IETC47856.2020.9249211. ISBN 978-1-7281-4291-3.
  17. ^ Sarstedt, Marko; Mooi, Erik (2014), Sarstedt, Marko; Mooi, Erik (eds.), "Regression Analysis", A Concise Guide to Market Research: The Process, Data, and Methods Using IBM SPSS Statistics, Berlin, Heidelberg: Springer, pp. 193–233, doi:10.1007/978-3-642-53965-7_7, ISBN 978-3-642-53965-7, retrieved 2024-05-10
  18. ^ "Bulletin de la Société Royale des Sciences de Liège | PoPuPS". Bulletin de la Société Royale des Sciences de Liège (in French). ISSN 0037-9565.
  19. ^ Bindra, Kamalpreet; Mishra, Anuranjan (September 2017). "A detailed study of clustering algorithms". 2017 6th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO). IEEE. pp. 371–376. doi:10.1109/ICRITO.2017.8342454. ISBN 978-1-5090-3012-5.
  20. ^ Pizzuti, C.; Talia, D. (May 2003). "P-autoclass: scalable parallel clustering for mining large data sets". IEEE Transactions on Knowledge and Data Engineering. 15 (3): 629–641. doi:10.1109/TKDE.2003.1198395. ISSN 1041-4347.
  21. ^ Thudumu, Srikanth; Branch, Philip; Jin, Jiong; Singh, Jugdutt (Jack) (2020-07-02). "A comprehensive survey of anomaly detection techniques for high dimensional big data". Journal of Big Data. 7 (1): 42. doi:10.1186/s40537-020-00320-x. hdl:10536/DRO/DU:30158643. ISSN 2196-1115.
  22. ^ Weiner, Irving B., ed. (2003-04-15). Handbook of Psychology (1 ed.). Wiley. doi:10.1002/0471264385.wei0223. ISBN 978-0-471-17669-5.
