Data Science Using Unsupervised Learning & Visualization of Astronomy Data
Dhaval Mandalia
Empowering companies with AI, Quantum Solutions, Data & Cloud Engineering to reach their full potential. | CEO / Founder @ Arocom | Certified Data Scientist by JHU | Managing T1D | Volunteer - Diabetes Awareness Programs
"To confine our attention to terrestrial matters would be to limit the human spirit."
— By Stephen Hawking
The night sky is always fascinating. With so many stars and celestial objects around us, it has always inspired me to contribute to astronomy data science. Data science in astronomy started long ago; one of its most important milestones is shown below:
As Edwin Hubble mentions - "A simple visualization of a complicated data makes the science behind it seem obvious."
Above is Edwin Hubble's 1929 data plot showing that the farther away a galaxy is, the faster it is moving away from us, as revealed by its redshift.
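The relationship in Hubble's plot is usually written as Hubble's law, v = H0 × d. The sketch below turns a measured redshift into a recession velocity and a distance. It assumes the low-redshift approximation v ≈ c·z and an illustrative round value H0 = 70 km/s/Mpc; both are assumptions for demonstration, not values taken from Hubble's original data.

```python
# Minimal sketch of Hubble's law, v = H0 * d, using the low-redshift
# approximation v ≈ c * z (valid only for z << 1).
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in km/s
H0_KM_S_PER_MPC = 70.0             # assumed round value of the Hubble constant


def recession_velocity(z: float) -> float:
    """Recession velocity in km/s for a small redshift z."""
    return SPEED_OF_LIGHT_KM_S * z


def hubble_distance_mpc(z: float, h0: float = H0_KM_S_PER_MPC) -> float:
    """Distance in megaparsecs implied by Hubble's law, d = v / H0."""
    return recession_velocity(z) / h0


if __name__ == "__main__":
    for z in (0.001, 0.01, 0.05):
        print(f"z={z}: v={recession_velocity(z):.0f} km/s, d={hubble_distance_mpc(z):.1f} Mpc")
```

Larger redshift gives a proportionally larger distance, which is exactly the straight-line trend in Hubble's plot.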
As we’ve mapped more areas of the known universe, we’ve discovered astounding structures on the largest scales. Visualizing this structure in 2D or 3D maps gives us an intuitive grasp of the composition and properties of galaxies within the universe, and of the forces behind the creation of that structure.
With the goal of contributing to data science for astronomy, I looked into two major datasets: GAIA and Spitzer Space Telescope data.
Here is a snippet of the Spitzer S4G data (Spitzer Survey of Stellar Structure in Galaxies):
Columns shown here indicate:
- mstar(solMass): log10(stellar mass), in solar masses. This is the stellar mass indicator.
- c31_1: r75/r25 concentration index at 3.6 microns. A measure of how centrally concentrated the galaxy's light and stellar density are.
- c42_1: 5*log10(r80/r20) concentration index at 3.6 microns. Another measure of how centrally concentrated the galaxy's light and stellar density are.
- phys_size: in kpc (kiloparsecs; 1 parsec ≈ 3.26 light-years). This is the physical size of the galaxy.
- mabs1 and mabs2: absolute magnitudes at 3.6 and 4.5 microns, respectively.
- Note: R75 and R25 are the radii enclosing 75% and 25% of the total luminosity, respectively (R80 and R20 are defined likewise).
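To make the two concentration indices concrete, here is a minimal sketch that computes them from a galaxy's growth curve (enclosed flux as a function of radius). It assumes the last point of the curve represents the total flux and uses linear interpolation to find each radius; the S4G pipeline's own photometry may differ in detail, and the function names are illustrative.

```python
import numpy as np


def radius_enclosing(radii, cumulative_flux, fraction):
    """Radius enclosing `fraction` of the total flux, by linear interpolation.

    Assumes `radii` is increasing, `cumulative_flux` is non-decreasing,
    and the last value of `cumulative_flux` is the galaxy's total flux.
    """
    radii = np.asarray(radii, dtype=float)
    growth = np.asarray(cumulative_flux, dtype=float)
    growth = growth / growth[-1]  # normalise the growth curve to 1 at the outermost radius
    return float(np.interp(fraction, growth, radii))


def concentration_indices(radii, cumulative_flux):
    """Return (c31, c42) with c31 = r75/r25 and c42 = 5*log10(r80/r20)."""
    r20, r25, r75, r80 = (
        radius_enclosing(radii, cumulative_flux, f) for f in (0.20, 0.25, 0.75, 0.80)
    )
    c31 = r75 / r25
    c42 = 5.0 * np.log10(r80 / r20)
    return c31, float(c42)
```

A galaxy whose light is piled up in the centre reaches 25% and 20% of its flux at very small radii, so both ratios, and therefore both indices, come out large.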
Sample descriptions of these galaxies can be found at links such as:
The next step is to find the interrelationships between these six parameters. Plotting is the easiest way to see how the parameters relate to one another, and with six of them a scatter matrix is an ideal visualization.
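A scatter matrix of the six parameters can be produced with pandas. In this sketch the column names are assumed to match the list above (rename them if your S4G download uses different headers), and the demo DataFrame is random placeholder data, not S4G measurements.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Assumed column names for the six S4G parameters discussed above.
S4G_COLUMNS = ["mstar", "c31_1", "c42_1", "phys_size", "mabs1", "mabs2"]


def plot_s4g_scatter_matrix(df: pd.DataFrame, path="s4g_scatter_matrix.png"):
    """Draw a 6x6 scatter matrix (histograms on the diagonal); save it if `path` is given."""
    axes = scatter_matrix(df[S4G_COLUMNS].dropna(), figsize=(12, 12), alpha=0.5, diagonal="hist")
    if path is not None:
        plt.savefig(path, dpi=100)
    plt.close("all")
    return axes


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    demo = pd.DataFrame(rng.normal(size=(100, 6)), columns=S4G_COLUMNS)  # placeholder, not S4G data
    plot_s4g_scatter_matrix(demo)
```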
Some of the inferences we can observe here are:
- Galaxies with a larger c31 ratio are also larger in size.
- As a galaxy's physical size increases, its stellar mass also tends to increase. (This does not always hold: some galaxies dominated by dark matter are large yet have comparatively low stellar mass.)
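Trends read off a scatter matrix can be checked numerically. This sketch uses a Spearman rank correlation, which only asks whether one quantity tends to rise with another, a reasonable fit for the monotonic trends listed above. Choosing Spearman is an assumption for illustration; the article itself draws these inferences visually.

```python
import pandas as pd


def rank_correlations(df: pd.DataFrame, columns):
    """Spearman rank correlation matrix for the given columns.

    Values near +1 mean the two quantities rise together; values near 0
    mean no monotonic relationship.
    """
    return df[list(columns)].corr(method="spearman")
```

For example, `rank_correlations(df, ["phys_size", "mstar", "c31_1"]).loc["phys_size", "mstar"]` gives a single number for the size versus stellar-mass trend.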
Unsupervised learning methods such as PCA and t-SNE can help evaluate this data further. Here is the PCA plot of this data using all six parameters.
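A two-component PCA projection like the one plotted can be computed with scikit-learn. The sketch standardizes each column first. That step is an assumption, not something stated above, but it matters here: magnitudes, log masses and kpc sizes live on very different scales, and unscaled PCA would be dominated by whichever column has the largest variance.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def pca_projection(X, n_components=2):
    """Standardise the parameters, then project them onto the leading principal components.

    Returns the projected scores (n_samples x n_components) and the fraction
    of total variance explained by each component.
    """
    X_scaled = StandardScaler().fit_transform(np.asarray(X, dtype=float))
    pca = PCA(n_components=n_components)
    scores = pca.fit_transform(X_scaled)
    return scores, pca.explained_variance_ratio_
```

Plotting `scores[:, 0]` against `scores[:, 1]` gives the kind of 2D view shown in the PCA plot.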
The results are bimodal. After clustering is applied to the PCA projection, the galaxies group by type: elliptical (red) and spiral (blue).
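The clustering algorithm used on the PCA output is not named above; k-means with two clusters is a common choice for a bimodal projection, so this sketch assumes it. The cluster IDs k-means returns (0 or 1) are arbitrary. Matching them to "elliptical" and "spiral" needs an external reference such as the morphological type code.

```python
import numpy as np
from sklearn.cluster import KMeans


def two_group_labels(scores, random_state=0):
    """Split PCA scores into two clusters with k-means (k = 2).

    Returns one integer label (0 or 1) per galaxy; the label values
    themselves carry no physical meaning.
    """
    km = KMeans(n_clusters=2, n_init=10, random_state=random_state)
    return km.fit_predict(np.asarray(scores, dtype=float))
```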
Another approach is to run the t-SNE algorithm with various perplexity values. The values tried here are 5, 10, 15, 30, 40 and 50.
These plots corroborate the PCA analysis. We can zoom in further and select a pocket within this data for further analysis.
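One simple way to "select a pocket" is to keep the galaxies whose 2D embedding falls inside a rectangle. The box bounds here are hypothetical; in practice they would be read off the plot by eye. The article does not say how its pocket of 30 galaxies was chosen.

```python
import numpy as np


def select_pocket(embedding, x_range, y_range):
    """Row indices of points whose 2-D embedding lies inside a rectangular box.

    `x_range` and `y_range` are (min, max) pairs; the bounds are inclusive.
    """
    emb = np.asarray(embedding, dtype=float)
    x, y = emb[:, 0], emb[:, 1]
    inside = (x >= x_range[0]) & (x <= x_range[1]) & (y >= y_range[0]) & (y <= y_range[1])
    return np.flatnonzero(inside)
```

The returned indices can then pick out the matching catalog rows, for example `df.iloc[select_pocket(emb, (-10, 0), (5, 15))]`.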
Plotting mstar (stellar mass) and phys_size for this selected subset of 30 galaxies against their morphological type code “t” (ref: https://en.wikipedia.org/wiki/Galaxy_morphological_classification) shows:
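A two-panel version of this plot can be sketched with matplotlib. The column names `t`, `mstar` and `phys_size` are assumed to match the catalog fields mentioned above.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd


def plot_against_type(subset: pd.DataFrame, type_col="t"):
    """Two panels: stellar mass and physical size versus morphological type code."""
    fig, (ax_mass, ax_size) = plt.subplots(1, 2, figsize=(10, 4), sharex=True)
    ax_mass.scatter(subset[type_col], subset["mstar"])
    ax_mass.set_xlabel("morphological type code t")
    ax_mass.set_ylabel("mstar (log10 stellar mass)")
    ax_size.scatter(subset[type_col], subset["phys_size"])
    ax_size.set_xlabel("morphological type code t")
    ax_size.set_ylabel("phys_size (kpc)")
    fig.tight_layout()
    return fig
```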
The plot shows that the 30 galaxies in this pocket belong to various morphological classes, but they are all comparatively small, indicating that they are highly concentrated, and they have comparatively low stellar mass.
Unsupervised learning methods such as PCA and t-SNE can help us derive these inferences from Spitzer galaxy catalog data.
Another example is the data published by GAIA. On April 25th, 2018, GAIA published its Data Release 2 (DR2) archive. While going through this archive, I stumbled upon this video.
GAIA data contains spatial information about stellar objects. It is a huge dataset, and the best way to consume it is with Astroquery in Python. GAIA data showed Cepheid variable stars in different regions of the sky, and exoplanets in relation to Earth's mass and radius.
Cepheid variables are standard candles for gauging distances in space. A Cepheid's intrinsic luminosity is tightly linked to its pulsation period (the period-luminosity relation), so measuring the period gives its luminosity, and comparing that with its apparent brightness gives its distance from Earth.
Right Ascension (RA) and Declination (Dec) are sky coordinates. Right Ascension is the angular distance of a point measured eastward along the celestial equator from the March equinox. Declination is the angular distance of a point north or south of the celestial equator.
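The Cepheid distance reasoning can be sketched in two steps. First, a period-luminosity (Leavitt) relation M = a·log10(P) + b turns the pulsation period into an absolute magnitude. Second, the distance modulus m − M = 5·log10(d / 10 pc) turns that into a distance. The coefficients `slope` and `intercept` depend on the photometric band and calibration, so they are left as user-supplied inputs rather than quoted values. The sketch also ignores interstellar extinction.

```python
import math


def cepheid_absolute_magnitude(period_days: float, slope: float, intercept: float) -> float:
    """Period-luminosity relation: M = slope * log10(P) + intercept.

    `slope` and `intercept` must come from a published calibration for the
    band being used; none is assumed here.
    """
    return slope * math.log10(period_days) + intercept


def distance_parsecs(apparent_mag: float, absolute_mag: float) -> float:
    """Distance in parsecs from the distance modulus m - M = 5 * log10(d / 10 pc).

    Ignores extinction, which would make the star look fainter and the
    distance come out too large.
    """
    return 10 ** ((apparent_mag - absolute_mag + 5) / 5)
```

For example, a star with m − M = 5 is 100 pc away, and each additional 5 magnitudes multiplies the distance by 10.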
The visualization below shows the positions of various Cepheids with RA between 73 and 80 and Dec between -65 and -67. It was created using Python libraries.
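Pulling Cepheids in such an RA/Dec box from the GAIA archive can be sketched with Astroquery's `Gaia.launch_job_async`. The table names `gaiadr2.gaia_source` and `gaiadr2.vari_cepheid`, and the assumption that the box bounds are in degrees, should be checked against the GAIA archive documentation before use. The query builder runs offline; `fetch_cepheids` needs astroquery and network access.

```python
def build_cepheid_box_query(ra_min, ra_max, dec_min, dec_max):
    """ADQL for GAIA DR2 Cepheids inside an RA/Dec box (assumed to be in degrees)."""
    return (
        "SELECT gs.source_id, gs.ra, gs.dec, gs.phot_g_mean_mag "
        "FROM gaiadr2.gaia_source AS gs "
        "JOIN gaiadr2.vari_cepheid AS vc ON vc.source_id = gs.source_id "
        f"WHERE gs.ra BETWEEN {ra_min} AND {ra_max} "
        f"AND gs.dec BETWEEN {dec_min} AND {dec_max}"
    )


def fetch_cepheids(ra_min=73, ra_max=80, dec_min=-67, dec_max=-65):
    """Run the query against the GAIA archive and return an astropy Table."""
    # Imported lazily so the query builder above works without astroquery installed.
    from astroquery.gaia import Gaia

    job = Gaia.launch_job_async(build_cepheid_box_query(ra_min, ra_max, dec_min, dec_max))
    return job.get_results()
```

The returned table's `ra` and `dec` columns can be passed straight to a scatter plot to reproduce a positional map like the one described.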
The visualization below is a scatter plot of exoplanet data made with Bokeh in Python.
Plotting all 2,500+ planets in the exoplanet archive helps identify near-Earths and super-Earths in the data. Many other parameters, such as luminosity and temperature, can also be visualized from this data.
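A mass-radius scatter plot of this kind can be sketched with Bokeh as follows. The input lists are assumed to be planet masses and radii in Earth units plus planet names, already extracted from an exoplanet archive download; the column layout of that download is not specified here. Log axes are used because planet masses span several orders of magnitude.

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure


def exoplanet_scatter(masses_earth, radii_earth, names):
    """Log-log scatter of planet mass vs radius (Earth units) with hover tooltips."""
    source = ColumnDataSource(
        data=dict(mass=list(masses_earth), radius=list(radii_earth), name=list(names))
    )
    p = figure(
        title="Exoplanets: mass vs radius",
        x_axis_type="log",
        y_axis_type="log",
        x_axis_label="Mass (Earth masses)",
        y_axis_label="Radius (Earth radii)",
        tooltips=[("planet", "@name"), ("mass", "@mass"), ("radius", "@radius")],
    )
    p.scatter("mass", "radius", source=source, size=6, alpha=0.6)
    return p
```

To view the result, call `bokeh.plotting.show(p)`, which writes and opens an HTML file.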
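Picking out near-Earths and super-Earths from such a plot amounts to binning planets by radius. The cut-offs below (1.25, 2 and 6 Earth radii) follow one commonly used Kepler-style size convention and are an assumption here. Other studies draw the boundaries differently.

```python
def size_class(radius_earth: float) -> str:
    """Rough size class from planet radius in Earth radii (assumed Kepler-style cut-offs)."""
    if radius_earth < 1.25:
        return "Earth-size"
    if radius_earth < 2.0:
        return "super-Earth"
    if radius_earth < 6.0:
        return "Neptune-size"
    return "giant"
```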
Astronomy is a treasure trove for data analysis that any amateur data scientist can get their hands on. This was a start for me to capture more information around these measurements. Next in line is to pay tribute to KEPLER and draw inferences from the data that TESS shares.
A more consistent and focused initiative can foster collaboration among astronomers, statisticians, data scientists, and information and computer professionals, thereby expanding our understanding of the SPACE around us.
References:
- S4G Catalog Definitions
- Wikipedia.org
- GAIA Archives
- SPITZER Data
#unsupervisedlearning #machinelearning #datascience #astronomy #GAIA
Note: I published this article on Medium a few days ago. After valuable feedback from some of my readers, I have revised the content and am republishing it on LinkedIn.
Dhaval Mandalia enjoys data science, project management, training executives and writing about AI and management. He is also a contributing member of the management association community in Gujarat. Follow him on Twitter and Facebook.