Data Science Using Unsupervised Learning & Visualization of Astronomy Data

"To confine our attention to terrestrial matters would be to limit the human spirit."
— By Stephen Hawking

The night sky is always fascinating. With so many stars and celestial objects around, it has always inspired me to contribute to astronomy data science. Data science in astronomy started long ago; one of its most important milestones is shown below:

As Edwin Hubble mentions - "A simple visualization of a complicated data makes the science behind it seem obvious."

Above is the plot published by Edwin Hubble in 1929, showing that the farther away a galaxy is, the faster it is moving away from us. This recession velocity is measured from the redshift of the galaxy's light.

As we’ve mapped more areas of the known universe, we’ve discovered astounding structures on the largest scales. Visualizing this structure in two- or three-dimensional maps gives us an intuitive grasp of the composition and properties of galaxies within the universe, and of the forces behind the creation of that structure.

With the goal of contributing to data science for astronomy, I looked into two major datasets: GAIA and Spitzer Space Telescope data.

Here is a snippet of the Spitzer S4G data, a survey of stellar structure within galaxies:

Columns shown here indicate:

  • mstar (solMass): log10(stellar mass). This is the stellar mass indicator.
  • c31_1: r75/r25 concentration index at 3.6 microns. A measure of how centrally concentrated the galaxy's light, and hence its stellar density, is.
  • c42_1: 5*log10(r80/r20) concentration index at 3.6 microns. A second measure of light concentration and stellar density.
  • phys_size: physical size of the galaxy in kpc (kiloparsecs; 1 parsec ≈ 3.26 light years).
  • mabs1 and mabs2: absolute magnitude at 3.6 and 4.5 microns, respectively.
  • Note: rN is the radius within which N% of the galaxy's luminosity is enclosed, so r75 and r25 enclose 75% and 25% of the light, and r80 and r20 enclose 80% and 20%.
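To make the column definitions concrete, here is a minimal sketch of how the two concentration indices and the size conversion follow from their formulas. The function names and the example radii are made up for illustration; they are not part of the S4G catalog or its tooling.

```python
import math

# 1 parsec is about 3.26 light years, so 1 kpc is about 3260 light years.
LIGHT_YEARS_PER_KPC = 1000 * 3.26


def c31(r75, r25):
    """c31 index: ratio of the radii enclosing 75% and 25% of the light."""
    return r75 / r25


def c42(r80, r20):
    """c42 index: 5 * log10 of the ratio of the radii enclosing 80% and 20%."""
    return 5 * math.log10(r80 / r20)


def kpc_to_light_years(size_kpc):
    """Convert a physical size from kiloparsecs to light years (approximate)."""
    return size_kpc * LIGHT_YEARS_PER_KPC


# Hypothetical radii (same units within each pair), purely for illustration.
print(c31(30.0, 10.0))           # 3.0
print(c42(100.0, 10.0))          # 5.0
print(kpc_to_light_years(10.0))  # roughly 32,600 light years
```

Both indices grow as the light becomes more centrally concentrated, which is why they move together in the plots that follow.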

Sample descriptions of these galaxies can be found at links like:

The next step is to find the relationships among the six parameters above. Plotting is the easiest way to see how these parameters relate to one another, and with six of them a scatter matrix is an ideal visualization.
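A scatter matrix like this can be sketched with pandas. The snippet below is only an illustration: it uses randomly generated stand-in values, not the real S4G catalog, and the helper names `build_demo_catalog` and `plot_scatter_matrix` are hypothetical. Only the column names come from the list above.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display

import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix


def build_demo_catalog(n=200, seed=0):
    """Synthetic stand-in for the six S4G columns (NOT real catalog values)."""
    rng = np.random.default_rng(seed)
    phys_size = rng.uniform(1.0, 30.0, n)
    c31_1 = 2.0 + 0.05 * phys_size + rng.normal(0, 0.3, n)
    mabs1 = -16.0 - 0.15 * phys_size + rng.normal(0, 0.5, n)
    return pd.DataFrame({
        "mstar": 8.0 + 0.08 * phys_size + rng.normal(0, 0.2, n),
        "c31_1": c31_1,
        "c42_1": 1.2 * c31_1 + rng.normal(0, 0.2, n),
        "phys_size": phys_size,
        "mabs1": mabs1,
        "mabs2": mabs1 + rng.normal(0, 0.1, n),
    })


def plot_scatter_matrix(catalog):
    """Draw every pairwise scatter plot, with histograms on the diagonal."""
    return scatter_matrix(catalog, figsize=(10, 10), diagonal="hist", alpha=0.5)


catalog = build_demo_catalog()
axes = plot_scatter_matrix(catalog)
print(axes.shape)               # one panel per pair of columns
print(catalog.corr().round(2))  # numeric companion to the visual check
# To keep the figure: axes[0, 0].figure.savefig("scatter_matrix.png")
```

Printing the correlation table next to the plot is a cheap way to confirm that a trend seen by eye is really there.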

Some of the inferences we can draw here are:

  1. Galaxies with a larger c31 ratio are also larger in size.
  2. As the physical size of a galaxy increases, its stellar mass also increases. (This is not a given: we now know of dark-matter-dominated galaxies that show the inverse.)

Unsupervised learning techniques such as PCA and t-SNE can help evaluate this data further. Here is the PCA plot of this data using the six parameters.

The results are bimodal. After clustering is applied to the PCA output, the galaxies fall into two groups by type: elliptical (red) and spiral (blue).
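The PCA-then-cluster step can be sketched with scikit-learn as follows. This is an illustration under assumptions: the article does not say which clustering algorithm was used, so k-means with two clusters is a stand-in, and the data here is synthetic (two artificial populations with six features each), not the S4G catalog. The helper names are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def make_two_population_catalog(n_per_group=100, seed=0):
    """Two synthetic groups with six features each (NOT real galaxy data)."""
    rng = np.random.default_rng(seed)
    group_a = rng.normal(0.0, 1.0, size=(n_per_group, 6))
    group_b = rng.normal(4.0, 1.0, size=(n_per_group, 6))
    return np.vstack([group_a, group_b])


def pca_then_cluster(features, n_clusters=2, seed=0):
    """Standardize, project onto two principal components, then run k-means."""
    scaled = StandardScaler().fit_transform(features)
    components = PCA(n_components=2).fit_transform(scaled)
    # k-means is an assumed choice; the original clustering method is not stated.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(components)
    return components, labels


components, labels = pca_then_cluster(make_two_population_catalog())
print(components.shape)     # one 2-D point per galaxy
print(np.bincount(labels))  # number of galaxies in each cluster
```

Standardizing first matters here because the six columns are on very different scales (log mass, ratios, kpc, magnitudes), and PCA would otherwise be dominated by the column with the largest variance.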

Another approach is to run t-SNE, another unsupervised learning algorithm, with various perplexity values. The values tried here are 5, 10, 15, 30, 40 and 50.

These plots corroborate the PCA analysis. We can zoom in further and select a pocket within this data for further analysis.

Plotting mstar (stellar mass) and phys_size for this selection of 30 galaxies against their morphological type code “t” (ref: https://en.wikipedia.org/wiki/Galaxy_morphological_classification) shows:

The conclusion is that the 30 galaxies in this pocket belong to various morphological classes, but they are all small, which indicates that they are highly concentrated, and they have comparatively low stellar mass.

Unsupervised learning techniques such as PCA and t-SNE can help us derive these inferences from the Spitzer galaxy catalog data.
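A perplexity sweep of this kind can be sketched with scikit-learn's `TSNE`. The perplexity values are the ones listed above; everything else (the synthetic two-population data, the helper name `tsne_embeddings`, the PCA initialization and the fixed seed) is an assumption made for illustration.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

PERPLEXITIES = (5, 10, 15, 30, 40, 50)


def tsne_embeddings(features, perplexities=PERPLEXITIES, seed=0):
    """Return one 2-D t-SNE embedding per perplexity value."""
    scaled = StandardScaler().fit_transform(features)
    embeddings = {}
    for perplexity in perplexities:
        # Perplexity must be smaller than the number of samples.
        model = TSNE(n_components=2, perplexity=perplexity,
                     init="pca", random_state=seed)
        embeddings[perplexity] = model.fit_transform(scaled)
    return embeddings


# Synthetic stand-in data: two groups of 60 points with six features each.
rng = np.random.default_rng(0)
demo = np.vstack([rng.normal(0.0, 1.0, (60, 6)),
                  rng.normal(4.0, 1.0, (60, 6))])

embeddings = tsne_embeddings(demo)
for perplexity, embedding in embeddings.items():
    print(perplexity, embedding.shape)
```

Low perplexity emphasizes very local neighborhoods and high perplexity emphasizes broader structure, so a grouping that persists across the whole sweep is more trustworthy than one that appears at a single setting.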

Another example is the data published by GAIA. On April 25, 2018, GAIA published its DR2 archive. While going through this archive, I stumbled upon this video.

GAIA data contains spatial information about stellar objects. The dataset is huge, and the best way to consume it is through Astroquery in Python. GAIA showed information about Cepheid variable stars within different arcs of the sky, and about exoplanets in relation to Earth's mass and radius. Cepheid variables are standard candles for gauging distances in space. A Cepheid's intrinsic luminosity is tied to its pulsation period, so once the period is measured, comparing that known luminosity with the observed brightness gives its distance from Earth.

Right Ascension and Declination are the coordinates that place an object on the sky. Right Ascension is the angular distance of a point measured eastward along the celestial equator from the Sun's position at the March equinox. Declination is the angular distance of a point north or south of the celestial equator.
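The standard-candle idea for Cepheids comes down to the distance modulus: if a star's absolute magnitude M is known and its apparent magnitude m is measured, then m - M = 5 log10(d / 10 pc). The sketch below applies that relation. It ignores dust extinction, and the function names and example magnitudes are hypothetical.

```python
LIGHT_YEARS_PER_PARSEC = 3.26  # approximate conversion used in this article


def distance_parsecs(apparent_mag, absolute_mag):
    """Distance from the distance modulus m - M = 5 * log10(d / 10 pc).

    Extinction by interstellar dust is ignored in this sketch.
    """
    return 10 ** ((apparent_mag - absolute_mag + 5) / 5)


def distance_light_years(apparent_mag, absolute_mag):
    """Same distance, converted to light years."""
    return distance_parsecs(apparent_mag, absolute_mag) * LIGHT_YEARS_PER_PARSEC


# Made-up magnitudes for illustration: a star that looks 5 magnitudes
# fainter than it would at 10 parsecs is 10 times farther away.
print(distance_parsecs(10.0, 5.0))      # 100.0 parsecs
print(distance_light_years(10.0, 5.0))  # about 326 light years
```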

The visualization below shows the positions of various Cepheids between RA 73 and 80 degrees and Dec -65 and -67 degrees. It was created using Python libraries.
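A selection like this could be pulled with Astroquery along the following lines. Treat it as a sketch, not the query used for the plot: the table names `gaiadr2.vari_cepheid` and `gaiadr2.gaia_source` and the join on `source_id` are assumptions to check against the Gaia archive schema, and the helper names are hypothetical. Running the fetch requires the `astroquery` package and network access.

```python
def cepheid_region_query(ra_min=73.0, ra_max=80.0, dec_min=-67.0, dec_max=-65.0):
    """Build an ADQL query for Cepheids inside an RA/Dec box (degrees)."""
    return (
        "SELECT g.source_id, g.ra, g.dec "
        # Assumed table names; verify them in the Gaia archive schema browser.
        "FROM gaiadr2.vari_cepheid AS c "
        "JOIN gaiadr2.gaia_source AS g ON g.source_id = c.source_id "
        f"WHERE g.ra BETWEEN {ra_min} AND {ra_max} "
        f"AND g.dec BETWEEN {dec_min} AND {dec_max}"
    )


def fetch_cepheids(query):
    """Run the query against the Gaia archive and return an astropy Table."""
    # Imported lazily so the query builder works without astroquery installed.
    from astroquery.gaia import Gaia
    job = Gaia.launch_job_async(query)
    return job.get_results()


query = cepheid_region_query()
print(query)
# table = fetch_cepheids(query)  # uncomment to query the archive
# table["ra"] and table["dec"] can then be passed to any scatter plot.
```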


The visualization below shows a scatter plot of exoplanet data, created with Bokeh in Python.

Plotting all 2,500+ exoplanets in the archive helps identify near-Earths and super-Earths in the data. Many other parameters, such as luminosity and temperature, can also be visualized from this data.
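One simple way to pick out near-Earths and super-Earths before plotting is to bin planets by radius in Earth radii. The cut-offs below are illustrative assumptions only: the article does not state which thresholds were used, and published definitions vary. The planet names and radii are made up.

```python
def classify_by_radius(radius_earth, near_earth=(0.8, 1.25), super_earth_max=2.0):
    """Label a planet by its radius in Earth radii.

    The default thresholds are illustrative assumptions, not a standard.
    """
    low, high = near_earth
    if low <= radius_earth <= high:
        return "near-Earth"
    if high < radius_earth <= super_earth_max:
        return "super-Earth"
    return "other"


# Hypothetical planets: name -> radius in Earth radii.
planets = {"planet_a": 1.0, "planet_b": 1.6, "planet_c": 11.2}

labels = {name: classify_by_radius(radius) for name, radius in planets.items()}
for name, label in labels.items():
    print(name, label)
```

The resulting labels can be used as the color or marker category in a scatter plot of mass against radius.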

Astronomy is a treasure trove for data analysis, and one that any amateur data scientist can get their hands on. This was a start for me toward capturing more information around these measurements. Next in line would be to pay tribute to KEPLER and draw inferences from the data that TESS shares.

A more consistent and focused initiative can foster collaboration between astronomers, statisticians, data scientists, and information and computer professionals, thereby expanding our understanding of the SPACE around us.

References:

  1. S4G Catalog Definitions
  2. Wikipedia.org
  3. GAIA Archives 
  4. SPITZER Data


Note: I published the same article a few days ago on Medium. After valuable feedback from a few of my readers, I have revised the content and am republishing it on LinkedIn.

Dhaval Mandalia enjoys data science, project management, training executives, and writing about AI & Management. He’s also a contributing member of the management association community in Gujarat. Follow him on Twitter and Facebook.



