Speaker
Description
As far as we know, galaxies form inside dark matter halos, and elucidating this connection is a key element in theories of galaxy formation and evolution. In this work, we propose a suite of machine learning (ML) tools to analyze these intricate relations in the IllustrisTNG300 magnetohydrodynamical simulation. We apply four individual algorithms: extremely randomized trees (ERT), K-nearest neighbors (kNN), light gradient boosting machine (LGBM), and neural networks (NN). Moreover, we combine the results of the different methods in a stacked model. In addition, we apply all these methods to a dataset augmented with the synthetic minority over-sampling technique for regression with Gaussian noise (SMOGN), in order to alleviate the problem of unbalanced datasets, and show that it improves the shape of the predicted distributions. Overall, all the ML algorithms produce consistent results in terms of predicting central galaxy properties from a set of input halo properties that include halo mass, concentration, spin, and halo overdensity. For stellar mass, the Pearson correlation coefficient between predicted and real values is 0.98, dropping to 0.7-0.8 for specific star formation rate (sSFR), colour, and size. We also demonstrate that our predictions are good enough to reproduce, with high accuracy, the power spectra of multiple galaxy populations defined in terms of stellar mass, sSFR, colour, and size. Our analysis adds evidence to previous works indicating that certain galaxy properties cannot be reproduced using halo features alone.
Main Topic | Supervised/Unsupervised/Semi-supervised Learning |
---|---|
Secondary Topic | Data preparation, generation and augmentation |
Participation mode | Remote |