Exploring the Fascinating World of Exoplanets: A Statistical and Machine Learning Analysis

Yan Barros

CTO & Co Founder | Next-Generation AI Solutions for Physics and Engineering

Published: August 18, 2023

Introduction:

The study of exoplanets has captivated the minds of scientists and space enthusiasts alike, unveiling new realms of possibilities beyond our solar system. In this article, we embark on an exciting journey of data analysis, hypothesis testing, and machine learning to unravel the mysteries of exoplanets.

1. Exploratory Data Analysis

I. Exploring the Dataset:

Let's start by delving into the exoplanet dataset provided by NASA. Below is a snapshot of the first 10 rows of the dataset:

Figure: Visualization of the first 10 rows of the dataset.

The dataset comes with a variety of columns that provide diverse information. Here are the names of the columns:

Each column carries essential information about the exoplanets. Here's an overview of the column details:

Let's take a look at some key statistical information about the dataset:
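The statistics tables from the original post are not reproduced here. As a minimal, dependency-free sketch of how such a summary can be computed, the snippet below reads a CSV and reports count, missing values, mean, standard deviation, minimum, and maximum for numeric columns. The column names are an assumption based on NASA Exoplanet Archive conventions (`pl_orbper`, `pl_bmasse`, `st_mass`), and the rows are made-up placeholders, not real planets. With pandas, `DataFrame.describe()` produces a similar table.

```python
import csv
import io
import statistics

# Made-up rows for illustration only. The column names are an assumption based on
# NASA Exoplanet Archive conventions: pl_orbper (orbital period, days),
# pl_bmasse (planet mass, Earth masses), st_mass (stellar mass, solar masses).
SAMPLE_CSV = """pl_name,discoverymethod,pl_orbper,pl_bmasse,st_mass
Example-1 b,Radial Velocity,4.2,150.0,1.1
Example-2 b,Transit,12.5,8.3,0.9
Example-3 b,Transit,,5.1,0.8
Example-4 b,Imaging,,3000.0,1.5
"""


def describe(rows, column):
    """Summary statistics for one numeric column; blank cells count as missing."""
    values = [float(row[column]) for row in rows if row[column].strip()]
    return {
        "count": len(values),
        "missing": len(rows) - len(values),
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values) if len(values) > 1 else float("nan"),
        "min": min(values),
        "max": max(values),
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for column in ("pl_orbper", "pl_bmasse", "st_mass"):
    print(column, describe(rows, column))
```

Counting missing values matters for archive data like this, where many parameters are only measured for some planets.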

2. Hypothesis Tests

I. Analysis of Exoplanet Discoveries by Discovery Method

Our first hypothesis is that exoplanet discovery methods differ in the number of discoveries they produce. To compare the frequencies of discoveries by method, we employ a chi-square test and an ANOVA.

Chi-Square and ANOVA:

Kruskal-Wallis

I initially conducted the chi-square and ANOVA tests and observed discrepancies between their outcomes. I then visualized the distributions of the groups and noted that they are far from normal, which violates one of ANOVA's assumptions. I therefore repeated the second test, replacing ANOVA with the non-parametric Kruskal-Wallis test, and re-evaluated the results.

This indicates that exoplanet discovery methods do yield different discovery counts: the Kruskal-Wallis test, which does not assume normally distributed data, shows significant differences between the groups.
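The test outputs above were shown as figures in the original post. To make the procedure concrete, here is a minimal, stdlib-only sketch of both tests: a chi-square goodness-of-fit test against equal expected counts (an assumption; the post does not state the expected frequencies) and a Kruskal-Wallis H test with tie correction, both using the chi-square approximation for p-values. How the data were grouped in the post is not stated, so the input numbers are made up. In practice, `scipy.stats.chisquare` and `scipy.stats.kruskal` implement the same tests.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for a chi-square distribution with df degrees of freedom.

    Computed as the regularized upper incomplete gamma function Q(df/2, x/2),
    using a series for small arguments and a continued fraction otherwise.
    """
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    log_prefactor = -z + a * math.log(z) - math.lgamma(a)
    if z < a + 1:
        # Series for the lower regularized gamma P(a, z); Q = 1 - P.
        term = total = 1.0 / a
        denom = a
        for _ in range(1000):
            denom += 1
            term *= z / denom
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction for Q(a, z), evaluated with the modified Lentz method.
    tiny = 1e-300
    b = z + 1 - a
    c, d = 1 / tiny, 1 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1 / d
        delta = d * c
        h *= delta
        if abs(delta - 1) < 1e-15:
            break
    return math.exp(log_prefactor) * h


def chi_square_gof(observed):
    """Chi-square goodness-of-fit test against equal expected counts per category."""
    expected = sum(observed) / len(observed)
    stat = sum((o - expected) ** 2 / expected for o in observed)
    return stat, chi2_sf(stat, len(observed) - 1)


def kruskal_wallis(*groups):
    """Kruskal-Wallis H test with tie correction and chi-square approximation."""
    pooled = sorted((value, g) for g, group in enumerate(groups) for value in group)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_sum = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # tied values share their mean 1-based rank
        for k in range(i, j + 1):
            rank_sums[pooled[k][1]] += average_rank
        tie_sum += (j - i + 1) ** 3 - (j - i + 1)
        i = j + 1
    h = 12 / (n * (n + 1)) * sum(r * r / len(g) for r, g in zip(rank_sums, groups)) - 3 * (n + 1)
    h /= 1 - tie_sum / (n ** 3 - n)  # undefined if every value is identical
    return h, chi2_sf(h, len(groups) - 1)


# Made-up counts for three hypothetical groups (illustrative only).
print(chi_square_gof([120, 45, 30]))
print(kruskal_wallis([3, 5, 4, 6], [40, 55, 38, 60], [1, 0, 2, 1]))
```

Because Kruskal-Wallis works on ranks, a few very large counts cannot dominate the statistic the way they can dominate ANOVA's group means.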

II. The distribution of exoplanet masses follows a normal distribution.

Shapiro-Wilk Test.

In summary, the Shapiro-Wilk test on exoplanet masses yielded a p-value of 1.0, which on its own suggests no significant departure from normality. However, the Q-Q plot shows a sharp upward curvature at its upper end, indicating deviations from normality, particularly in the tails. So although the Shapiro-Wilk test did not reject normality, visual inspection of the Q-Q plot points to a heavy-tailed distribution. Because a p-value of exactly 1.0 is unusual for data with visibly heavy tails, the input to the test (for example, missing values) is worth double-checking. These heavy tails could reflect outliers or rare, very massive planets in the data. Further analyses should consider outlier identification, data transformation (for example, taking the logarithm of the mass), or robust statistical methods to limit the influence of these outliers and non-normal characteristics.
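A Q-Q plot pairs sorted, standardized observations with the corresponding quantiles of a standard normal distribution; points bending upward at the right end indicate a heavier right tail than a normal distribution has. The sketch below computes those pairs plus two numeric tail diagnostics (skewness and excess kurtosis) and compares raw values with their base-10 logarithms, one of the transformations suggested above. The sample is made up, and the plotting positions (i - 0.5)/n are one convention among several. `scipy.stats.shapiro` and `scipy.stats.probplot` provide the test and the Q-Q data directly.

```python
import math
import statistics


def qq_points(sample):
    """(theoretical standard-normal quantile, standardized sample value) pairs for a Q-Q plot."""
    xs = sorted(sample)
    n = len(xs)
    mean, sd = statistics.fmean(xs), statistics.stdev(xs)
    normal = statistics.NormalDist()
    # Plotting positions (i - 0.5) / n for 1-based i, one common convention.
    return [(normal.inv_cdf((i + 0.5) / n), (x - mean) / sd) for i, x in enumerate(xs)]


def skewness_and_excess_kurtosis(sample):
    """Moment-based skewness and excess kurtosis (both 0 for a normal distribution)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


# Made-up, heavy-tailed "masses" (Earth masses) to show the effect of a log transform.
masses = [1, 2, 3, 5, 8, 20, 300, 3000]
print("raw:", skewness_and_excess_kurtosis(masses))
print("log10:", skewness_and_excess_kurtosis([math.log10(m) for m in masses]))
print("largest Q-Q point:", qq_points(masses)[-1])
```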

III. There is a positive correlation between the exoplanet mass and the mass of the host star.

Pearson's Correlation Coefficient

Based on the analysis of the correlation between exoplanet mass and host star mass, we observed a Pearson correlation coefficient of approximately 0.26. This positive value indicates a weak correlation: r² ≈ 0.07, so under a linear model host star mass accounts for only about 7% of the variance in planet mass. Although the relationship is not strong, the positive sign suggests that, on average, more massive exoplanets tend to orbit more massive stars. However, other factors, such as the observational selection effects of different discovery methods, may influence this relationship, and correlation does not imply a cause-and-effect relationship between exoplanet and star masses.
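For reference, Pearson's r is the covariance of the two variables divided by the product of their standard deviations, and r² is the share of variance explained by a linear fit. Because planet masses are heavy-tailed, a rank-based Spearman correlation is a useful robustness check; it is an addition here, not part of the original analysis. A minimal stdlib sketch with made-up inputs:

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariance over the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Made-up data with one extreme value: Pearson is pulled down, Spearman is not.
x = [1, 2, 3, 4, 5, 6]
y = [1, 4, 9, 16, 25, 1000]
print(pearson_r(x, y), spearman_rho(x, y))
```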

IV. The masses of exoplanets differ between systems with different numbers of stars.

After conducting Student's t-test to compare exoplanet masses between systems with one and two host stars, we observed a t-statistic of approximately -1.57 and a p-value of about 0.12. Because the p-value is greater than the usual significance level of 0.05, there is not enough evidence to reject the null hypothesis that the mean exoplanet masses of the two groups are equal. Therefore, I did not find a statistically significant difference in exoplanet mass between single-star and two-star systems.

The Results:

T-Statistic: -1.5736487102572945

P-Value: 0.11562735525096766
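The comparison above can be sketched in a few lines. The code computes Student's pooled-variance t statistic; because exoplanet masses are heavy-tailed and the two groups are likely to have different variances, it also computes Welch's t statistic, an alternative that the original analysis did not use. The p-value here uses a normal approximation, which is only reasonable for large samples; `scipy.stats.ttest_ind(a, b)` returns an exact Student's t p-value, and `equal_var=False` switches it to Welch's test. The masses are made up.

```python
import math
import statistics


def student_t_test(a, b):
    """Two-sample Student's t-test with pooled variance (assumes equal variances).

    Returns (t, degrees of freedom, two-sided p-value from a normal approximation);
    the approximation is only reasonable when the degrees of freedom are large.
    """
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)) / df
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    p_approx = 2 * (1 - statistics.NormalDist().cdf(abs(t)))
    return t, df, p_approx


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom (unequal variances)."""
    v1 = statistics.variance(a) / len(a)
    v2 = statistics.variance(b) / len(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(a) - 1) + v2 ** 2 / (len(b) - 1))
    return t, df


# Made-up masses (Earth masses) for single-star and two-star systems.
single = [5.0, 12.0, 300.0, 1.5, 80.0, 640.0]
binary = [900.0, 15.0, 2.0, 1200.0]
print(student_t_test(single, binary))
print(welch_t(single, binary))
```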

V. The relationship between semi-major axis and orbital period follows Kepler's law.

Based on the analysis of the relationship between the semi-major axis and the orbital period of exoplanets, we observe a strong positive correlation between these two variables. A polynomial curve fitted to the data follows the observed points closely, indicating a non-linear relationship between semi-major axis and orbital period. This is consistent with Kepler's third law, which states that the square of the orbital period is proportional to the cube of the semi-major axis (P² ∝ a³, with the proportionality constant set by the mass of the host star). The results therefore support the idea that Kepler's law accurately describes the relationship between semi-major axis and orbital period for the analyzed exoplanets.
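In units of years, astronomical units, and solar masses, Kepler's third law reads P² = a³ / M★ when the planet's mass is negligible, so log10 P = 1.5 log10 a - 0.5 log10 M★. A straight-line fit in log-log space is therefore a more direct check than a generic polynomial: the slope should come out close to 1.5. The sketch below assumes periods in days (as in the NASA Exoplanet Archive's `pl_orbper` column) and semi-major axes in AU, and checks itself on synthetic planets around a one-solar-mass star; it is an illustration, not the author's fitting code.

```python
import math

DAYS_PER_YEAR = 365.25


def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y against x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def kepler_loglog_fit(semi_major_axis_au, period_days):
    """Fit log10(P / yr) = slope * log10(a / AU) + intercept; Kepler predicts slope 1.5."""
    log_a = [math.log10(a) for a in semi_major_axis_au]
    log_p = [math.log10(p / DAYS_PER_YEAR) for p in period_days]
    return fit_line(log_a, log_p)


def kepler_period_days(semi_major_axis_au, star_mass_solar):
    """Period predicted by Kepler's third law, neglecting the planet's mass."""
    return DAYS_PER_YEAR * math.sqrt(semi_major_axis_au ** 3 / star_mass_solar)


# Synthetic planets around a 1-solar-mass star follow the law exactly.
a_values = [0.05, 0.1, 0.5, 1.0, 5.2]
periods = [kepler_period_days(a, 1.0) for a in a_values]
slope, intercept = kepler_loglog_fit(a_values, periods)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

With real planets orbiting stars of different masses, the intercept term -0.5 log10 M★ varies from star to star and shows up as scatter around the line; dividing each period by the `kepler_period_days` prediction for its own host star removes that scatter.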

VI. The distributions of visual magnitude, infrared magnitudes, and Gaia magnitude are different.

Results of the two-sample Kolmogorov-Smirnov (KS) tests:

Visual vs Infrared - KS Statistic: 0.5279374522923377, P-Value: 0.0

Visual vs Gaia - KS Statistic: 0.08053122400252233, P-Value: 6.873588918671671e-97

Infrared vs Gaia - KS Statistic: 0.49627150013308535, P-Value: 0.0

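These results indicate statistically significant differences between the magnitude distributions measured in different wavelength ranges. P-values reported as 0.0 are below the precision of the computation, not literally zero. The most direct reason for the differences is that a star has different brightness in different bands (its color), but they could also be related to the varying sensitivities of the instruments in each wavelength range, absorption of light by the interstellar medium, and other variables affecting observations at different wavelengths. Note also that with samples this large, even a small difference, such as the Visual vs Gaia KS statistic of about 0.08, is highly significant. Therefore, when comparing magnitudes across wavelength ranges, it's important to consider the potential sources of variation behind these differences.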
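The KS statistic is the largest vertical gap between the two empirical cumulative distribution functions. The minimal sketch below computes it, plus the classic large-sample approximation of the p-value from the Kolmogorov distribution; this textbook series is not necessarily what the original analysis used (`scipy.stats.ks_2samp` can also compute exact p-values for small samples). With a large gap and large samples, the series underflows and returns exactly 0.0, which is how a reported p-value of 0.0 arises. The magnitudes in the usage lines are made up.

```python
import bisect
import math


def ks_statistic(x, y):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for v in xs + ys:
        cdf_x = bisect.bisect_right(xs, v) / len(xs)
        cdf_y = bisect.bisect_right(ys, v) / len(ys)
        d = max(d, abs(cdf_x - cdf_y))
    return d


def ks_pvalue_asymptotic(d, n1, n2):
    """Approximate p-value from the Kolmogorov distribution (large-sample approximation)."""
    ne = n1 * n2 / (n1 + n2)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    a2 = -2.0 * lam * lam
    total, sign, previous = 0.0, 1.0, 0.0
    for j in range(1, 101):
        # Alternating series 2 * sum (-1)^(j-1) exp(-2 j^2 lam^2).
        term = 2.0 * sign * math.exp(a2 * j * j)
        total += term
        if abs(term) <= 1e-3 * previous or abs(term) <= 1e-8 * total:
            return min(max(total, 0.0), 1.0)
        sign, previous = -sign, abs(term)
    return 1.0  # series did not converge: lam is tiny and p is effectively 1


# Made-up magnitudes for a handful of stars.
visual = [8.1, 9.4, 10.2, 11.0, 12.3]
infrared = [6.0, 7.2, 8.3, 9.1, 10.0]
d = ks_statistic(visual, infrared)
print(d, ks_pvalue_asymptotic(d, len(visual), len(infrared)))
```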

VII. There is a relationship between the distance and the visual magnitude of host stars.

Results:

Regression Slope: 0.0027953678529972206

Regression Intercept: 11.771645119985003

R-squared: 0.3336357728105161

P-Value: 0.0

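Overall, the results suggest a statistically significant positive relationship between distance and the visual magnitude of host stars: more distant stars tend to have larger magnitudes, meaning they appear fainter. The relationship is moderate rather than strong (R² ≈ 0.33, so distance accounts for about a third of the variance in visual magnitude), which indicates that other factors not included in the analysis, such as the intrinsic luminosity of each star, also contribute to the variability. The reported p-value of 0.0 is below the precision of the computation rather than literally zero.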
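Physically, apparent magnitude is expected to grow with the logarithm of distance rather than with distance itself: the distance modulus gives m = M + 5 log10(d / 10 pc), where M is the absolute magnitude. The sketch below, an illustration that is not part of the original analysis, fits visual magnitude against both distance and log10(distance) and reports R² for each. The stars are synthetic and all share a Sun-like absolute magnitude of 4.83 (an assumption), so the logarithmic fit is exact by construction; real host stars span a wide range of luminosities.

```python
import math


def linear_fit(x, y):
    """Least-squares slope, intercept and R-squared of y against x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot


ABSOLUTE_MAG = 4.83  # assumed Sun-like absolute visual magnitude for every synthetic star
distances_pc = [10, 50, 100, 500, 1000, 2000]
v_mags = [ABSOLUTE_MAG + 5 * math.log10(d / 10) for d in distances_pc]

print("V vs distance:       ", linear_fit(distances_pc, v_mags))
print("V vs log10(distance):", linear_fit([math.log10(d) for d in distances_pc], v_mags))
```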

VIII. There is a temporal trend in the discoveries of exoplanets over the years.

The sharp increases in exoplanet discoveries in 2014 and 2016 coincide with large batch announcements of planets validated from Kepler mission data (715 planets in February 2014 and 1,284 in May 2016), which likely account for much of these peaks. More generally, advances in observation techniques, improvements in data analysis methods, new space telescopes and missions, collaboration among research groups, and increased funding have all contributed to the growth in discoveries. Investigating historical records, scientific publications, and announcements from those years would help pin down the specific factors behind the observed increases.
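A simple way to locate such peaks is to count discoveries per year (the NASA Exoplanet Archive calls this column `disc_year`, assumed here) and flag years that stand well above both neighboring years. The threshold and the data below are hypothetical.

```python
from collections import Counter


def yearly_counts(discovery_years):
    """Number of discoveries per year, sorted by year."""
    return dict(sorted(Counter(discovery_years).items()))


def spike_years(counts, factor=2.0):
    """Years whose count is at least `factor` times the mean of the two adjacent years.

    The factor of 2 is an arbitrary, illustrative threshold; years without
    both neighbors in the data are skipped.
    """
    spikes = []
    for year, count in counts.items():
        if year - 1 in counts and year + 1 in counts:
            neighbors = (counts[year - 1] + counts[year + 1]) / 2
            if count >= factor * neighbors:
                spikes.append(year)
    return spikes


# Made-up discovery years (one entry per planet), not the archive's real counts.
made_up = {2012: 10, 2013: 12, 2014: 50, 2015: 14, 2016: 70, 2017: 15}
years = [year for year, n in made_up.items() for _ in range(n)]
counts = yearly_counts(years)
print(counts, spike_years(counts))
```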

IX. There is a difference in stellar properties among different discovery methods.

The box plots and the clustering analysis show evidence of groups with similar stellar properties within some discovery methods, and of differences in those properties between methods.

Note: the clustering was performed after applying a dimensionality-reduction algorithm.
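Box plots summarize each group by its five-number summary (minimum, first quartile, median, third quartile, maximum). The sketch below computes these per discovery method with the standard library. The stellar effective temperature field `st_teff` follows NASA Exoplanet Archive naming, and both it and the records are assumptions for illustration. Quartile conventions differ between libraries, so values can differ slightly from a plotting package's box plot.

```python
import statistics
from collections import defaultdict


def five_number_summary(values):
    """(min, Q1, median, Q3, max); 'inclusive' quartiles match numpy's default interpolation."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def summarize_by_method(records, field):
    """Five-number summary of `field` for each discovery method with at least 2 values."""
    groups = defaultdict(list)
    for record in records:
        if record.get(field) is not None:
            groups[record["discoverymethod"]].append(record[field])
    return {method: five_number_summary(v) for method, v in groups.items() if len(v) >= 2}


# Made-up records; st_teff = stellar effective temperature in kelvin.
records = [
    {"discoverymethod": "Transit", "st_teff": 5600},
    {"discoverymethod": "Transit", "st_teff": 5900},
    {"discoverymethod": "Transit", "st_teff": 6100},
    {"discoverymethod": "Radial Velocity", "st_teff": 5200},
    {"discoverymethod": "Radial Velocity", "st_teff": 5750},
    {"discoverymethod": "Microlensing", "st_teff": None},
]
for method, summary in summarize_by_method(records, "st_teff").items():
    print(method, summary)
```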

3. Machine Learning Analysis

Machine learning techniques were used to predict a planet's year length (orbital period) and its orbital distance from the host star. After some data analysis and feature selection, I used a linear regression approach to obtain the results.

I. Pearson's Correlation Matrix

After selecting the variables with strong correlations (positive or negative) with the output variable (orbital period), I built a simple RNN architecture trained with a mean squared error loss to analyze the results. Then, using Keras Tuner, I selected the best hyperparameters and reorganized the architecture.
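The selection step described above, keeping input variables whose correlation with the target has a large absolute value, can be sketched as follows. The threshold of 0.5 and the column names are hypothetical. Pearson's r only captures linear relationships, so strongly non-linear predictors (such as the power-law link between semi-major axis and period) may be better scored after a log transform.

```python
import math


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_features(columns, target, threshold=0.5):
    """Names of columns with |r| >= threshold (strongest first), plus all scores."""
    scores = {name: pearson_r(values, target) for name, values in columns.items()}
    selected = sorted((name for name, r in scores.items() if abs(r) >= threshold),
                      key=lambda name: -abs(scores[name]))
    return selected, scores


# Hypothetical columns and target (orbital period); values are made up.
target = [1.0, 2.0, 3.0, 4.0, 5.0]
columns = {
    "feature_a": [1.1, 2.0, 2.9, 4.2, 5.0],
    "feature_b": [5.0, 4.1, 3.0, 2.2, 1.0],
    "feature_c": [1.0, -1.0, 1.0, -1.0, 1.0],
}
print(select_features(columns, target))
```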

4. Explaining The AI.

But this is not enough. After training the neural network, I tried to visualize what happens during training and to understand the influence of each variable in each layer. Here is an example of how we can visualize this.
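The layer-by-layer visualizations from the original post are not reproduced here. As a simpler, model-agnostic alternative (not the technique used in the post, and different from methods such as LIME or SHAP), permutation feature importance measures how much a model's error grows when one input column is randomly shuffled. The sketch assumes `predict` is any function mapping a feature row to a number, for example a wrapper around a trained Keras model; the toy model and data are hypothetical.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Average increase in MSE when each feature column is shuffled.

    X is a list of feature rows (lists); predict maps one row to a prediction.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Replace column j with its shuffled values, keeping other columns intact.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(r) for r in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy example: the hypothetical model only uses the first feature.
X_demo = [[float(i), float((i * 7) % 5)] for i in range(10)]
y_demo = [2.0 * row[0] for row in X_demo]
print(permutation_importance(lambda row: 2.0 * row[0], X_demo, y_demo))
```

A feature the model ignores gets an importance of zero, because shuffling it leaves every prediction unchanged.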

Results like these could help us better understand the physics behind complex data, systems, and problems, and could also support further technological and scientific development and discovery.

Other visualizations

5. Conclusion and Next Steps

I'm Eager to Connect and Collaborate!

At the intersection of data science, astronomy, and exploration lies a universe of endless possibilities. As I delve deeper into the mysteries of exoplanets, I invite fellow researchers, scientists, and enthusiasts to join in this cosmic journey. Your insights, feedback, and collaborative spirit can fuel new discoveries and foster innovative thinking. Let's forge connections that transcend the boundaries of space and knowledge.

If you're as passionate about exoplanetary science as I am, I'd be thrilled to connect with you. Feel free to explore my profile and drop me a message. Whether it's sharing your perspectives, discussing new research directions, or simply indulging in the wonders of the cosmos, I'm excited to engage in fruitful conversations with like-minded individuals.

Let's embark on a shared voyage of curiosity, exploration, and discovery. Together, we can unlock the secrets of the universe, one dataset at a time.

Stay curious, stay inspired, and let's chart the course to the stars together! Let's keep exploring data!
