Comparing AI-Generated Spaceflight Animations: Grok 3 vs. DeepSeek, ChatGPT, and Claude AI


Introduction

Elon Musk's AI company, xAI, recently launched Grok 3, showcasing a demonstration of a spacecraft's journey from Earth to Mars and back. The live simulation highlighted Grok 3’s ability to generate complex orbital mechanics with smooth, realistic animations. This caught my attention, leading me to test how other major AI models—DeepSeek, ChatGPT, and Claude AI—performed in a similar task.

I asked each model to generate Python code for a 3D animated simulation of a Mars-bound spacecraft mission and then evaluated their results based on animation quality, scientific accuracy, and execution errors.

Key Takeaways

  • Grok 3 produced the most scientifically accurate and visually smooth animation.
  • DeepSeek and Claude AI performed well, generating error-free code with decent animations.
  • ChatGPT failed, producing incomplete animations and error-prone Python code.

The video I compiled shows the generated animations side by side.

The Experiment Setup

The prompt given to each AI model was:

"Generate code for an animated 3D plot of a launch from Earth, landing on Mars, and then back to Earth at the next launch window."

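The "next launch window" in the prompt refers to the point at which Earth and Mars return to the same relative alignment. As a rough sketch, assuming circular, coplanar orbits and textbook values for the orbital periods (these constants are not taken from any of the generated scripts), the spacing between windows is the synodic period:

```python
# Approximate sidereal orbital periods in days (textbook values, assumed here).
EARTH_PERIOD_DAYS = 365.256
MARS_PERIOD_DAYS = 686.980


def synodic_period(inner_period: float, outer_period: float) -> float:
    """Time between successive identical alignments of two planets."""
    return 1.0 / (1.0 / inner_period - 1.0 / outer_period)


window_spacing = synodic_period(EARTH_PERIOD_DAYS, MARS_PERIOD_DAYS)
print(f"Launch windows repeat roughly every {window_spacing:.0f} days")
```

With these inputs the spacing comes out at a little over two years, which is why a round trip has to wait at Mars before the return leg.
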
The generated Python scripts were executed in VS Code, and I recorded the animations for direct comparison. The key aspects of evaluation included:

  • Visual Accuracy: How realistic the transition appeared
  • Physics Implementation: Use of correct orbital mechanics
  • Execution Success: Whether the generated code ran without errors
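The execution-success criterion can also be checked automatically rather than by eye. The sketch below is one possible way to do it, not the procedure used for this article: `check_script` is a hypothetical helper, and setting `MPLBACKEND=Agg` assumes the generated scripts use Matplotlib and would otherwise block on an interactive window.

```python
import os
import subprocess
import sys


def check_script(path, timeout_s=120):
    """Run a Python script and return (succeeded, stderr_text)."""
    # Assumption: a non-interactive Matplotlib backend keeps plt.show() from blocking.
    env = dict(os.environ, MPLBACKEND="Agg")
    try:
        result = subprocess.run(
            [sys.executable, path],
            capture_output=True,
            text=True,
            timeout=timeout_s,
            env=env,
        )
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout_s} s"
    # A non-zero exit code means the script raised or exited with an error.
    return result.returncode == 0, result.stderr
```

Calling `check_script` on each model's script (for example a hypothetical `grok3_mars.py`) would give a pass/fail flag plus the traceback text for any script that failed.
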

source: https://github.com/clickonsol/race-to-mars-ai-code-challenge

AI Model Comparisons


Grok 3

Grok 3 swiftly produced Python code that accurately depicted a spacecraft’s journey using Hohmann transfer orbits. The animation was smooth and, although less visually rich than those from Claude AI or DeepSeek, reflected a deep understanding of orbital mechanics. This aligns with xAI’s demonstration, where Grok 3 generated code for a 3D animation of a space launch, showcasing its advanced reasoning capabilities.
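For readers unfamiliar with the term, a Hohmann transfer is a half-ellipse that touches the departure orbit at one end and the arrival orbit at the other. The following is a minimal sketch of the underlying numbers, not Grok 3's generated code. It assumes circular, coplanar planetary orbits, and the constants and the `hohmann` function name are chosen here for illustration.

```python
import math

MU_SUN = 1.32712440018e20  # Sun's gravitational parameter, m^3/s^2
AU = 1.495978707e11        # astronomical unit, m
R_EARTH = 1.000 * AU       # assumed circular orbit radius
R_MARS = 1.524 * AU        # assumed circular orbit radius


def hohmann(r1, r2, mu=MU_SUN):
    """Return (transfer time in s, departure delta-v, arrival delta-v in m/s)."""
    a = (r1 + r2) / 2                          # semi-major axis of the transfer ellipse
    transfer_time = math.pi * math.sqrt(a**3 / mu)  # half the ellipse's period
    v_circ_1 = math.sqrt(mu / r1)              # circular speed on the inner orbit
    v_circ_2 = math.sqrt(mu / r2)              # circular speed on the outer orbit
    v_peri = math.sqrt(mu * (2 / r1 - 1 / a))  # vis-viva speed at departure
    v_apo = math.sqrt(mu * (2 / r2 - 1 / a))   # vis-viva speed at arrival
    return transfer_time, v_peri - v_circ_1, v_circ_2 - v_apo


time_s, dv_depart, dv_arrive = hohmann(R_EARTH, R_MARS)
print(f"Transfer time: {time_s / 86400:.0f} days")
print(f"Departure burn: {dv_depart / 1000:.2f} km/s, arrival burn: {dv_arrive / 1000:.2f} km/s")
```

These are heliocentric figures only. They ignore the planets' own gravity, so they understate what a real mission needs for launch and landing.
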

DeepSeek

DeepSeek provided a detailed explanation of the mission's mechanics but required more time to generate the code. The resulting animation utilised parametric equations for trajectory approximation; however, the transitions lacked the refinement seen in Grok 3's output.
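To illustrate what a parametric trajectory approximation can look like, here is a small sketch. It is not DeepSeek's generated code. It assumes circular planetary orbits in a single plane with radii in astronomical units, and the function names are made up for this example.

```python
import math

R_EARTH_AU = 1.0    # assumed circular orbit radius
R_MARS_AU = 1.524   # assumed circular orbit radius


def circular_orbit(radius, n_points=200):
    """Parametric circle in the ecliptic plane: (r cos t, r sin t, 0)."""
    return [
        (radius * math.cos(t), radius * math.sin(t), 0.0)
        for t in (2 * math.pi * i / n_points for i in range(n_points + 1))
    ]


def transfer_arc(r1, r2, n_points=100):
    """Half of an ellipse with the Sun at one focus, running from radius r1 to r2."""
    a = (r1 + r2) / 2            # semi-major axis
    e = (r2 - r1) / (r2 + r1)    # eccentricity
    points = []
    for i in range(n_points + 1):
        theta = math.pi * i / n_points  # true anomaly from 0 to 180 degrees
        r = a * (1 - e**2) / (1 + e * math.cos(theta))
        points.append((r * math.cos(theta), r * math.sin(theta), 0.0))
    return points


earth_path = circular_orbit(R_EARTH_AU)
mars_path = circular_orbit(R_MARS_AU)
outbound = transfer_arc(R_EARTH_AU, R_MARS_AU)
```

The point lists can be passed to any 3D plotting library. Note that stepping the angle uniformly does not step time uniformly, which is one reason a purely parametric animation can look less physically convincing than one that integrates the motion.
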

Claude AI

Claude AI focused on Hohmann transfer orbits and ensured a scientifically accurate approach. Its code was structured efficiently, and the animation was visually pleasing. However, the transition between the orbits lacked the correct positioning and motion paths seen in Grok 3’s output.
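Getting the positioning right in such an animation largely comes down to where Mars is when the spacecraft leaves Earth. As a hedged sketch, assuming circular, coplanar orbits and Kepler's third law in units of AU and years (the function name is hypothetical and this is not Claude's code), the required lead angle at departure can be estimated like this:

```python
R_EARTH_AU = 1.0    # assumed circular orbit radius
R_MARS_AU = 1.524   # assumed circular orbit radius


def departure_phase_angle(r_inner, r_outer):
    """Angle in degrees by which the target planet must lead at launch.

    Uses Kepler's third law in AU and years: period = a ** 1.5.
    """
    transfer_years = 0.5 * ((r_inner + r_outer) / 2) ** 1.5  # half the transfer ellipse period
    target_period_years = r_outer ** 1.5
    # Angle the target planet sweeps while the spacecraft is in transit.
    target_sweep_deg = 360.0 * transfer_years / target_period_years
    # The spacecraft itself sweeps 180 degrees along the half-ellipse.
    return 180.0 - target_sweep_deg


lead = departure_phase_angle(R_EARTH_AU, R_MARS_AU)
print(f"Mars should lead Earth by about {lead:.0f} degrees at departure")
```

If an animation starts the transfer with the planets at some other angle, the spacecraft reaches Mars's orbit while Mars is elsewhere, which shows up as a jump or a mismatched path at arrival.
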


ChatGPT (Multiple Models)

I tested multiple ChatGPT variants (4o, 4o-mini, and o3-mini-high). Unfortunately, ChatGPT failed completely. The animations were either half-done or visually incomplete, lacking proper physics. Worse, its generated Python code had multiple errors when executed. No other AI chatbot had execution errors, making ChatGPT the weakest performer in this comparison.

Video Demonstration

I documented the entire comparison in a video where you can see the animations side by side. Watch the full breakdown here: https://www.youtube.com/watch?v=xtoDMD0ZY2A

Conclusion

This experiment shows how different AI models interpret complex tasks like orbital mechanics. While Grok 3 led in the mathematical realism of its animation, each model had strengths in different areas, except ChatGPT, which struggled with execution. As AI continues to evolve, future advancements could lead to even more sophisticated AI-generated aerospace simulations.

Note: I used AI to help with the analysis, as I am not an aerospace engineer or an astronaut. After playing the animations and comparing them, however, I found that the analysis sounded accurate and needed little editing.

Intizar Jaffery

Functional Analyst at eHealth NSW

2 weeks ago

Does this mean Microsoft made a mistake in partnering with ChatGPT / Copilot? Or maybe their think tank considered that generating complex code and leveraging it for corporate systems is far from adoption, at least for a few years.
