DailySnap: Transforming Daily Life into Visual Narratives with AI
Mohamed Yasser
Government Solution Architect | Emerging Technology Advisor | Technology Analyst
The inspiration behind this project stems from Meta's Orion AR glasses, a cutting-edge augmented reality device under development. These glasses are designed to capture daily moments through a live video feed, integrating seamlessly into a user's day-to-day life. Leveraging AR and AI, the glasses have the potential to act as a personal assistant, automatically identifying and documenting significant events. This could revolutionize how users track and reflect on their activities, transforming mundane moments into meaningful digital memories.
This project centers on using advanced AI models and Meta's smart glasses to automate capturing, journaling, and creating infographics from daily life. Starting from a live video feed, the system identifies significant moments, processes the visual data into textual descriptions, and finally transforms those descriptions into visually engaging infographics. The goal is to simplify and enrich how users reflect on their day, combining AI-driven journaling with infographic representation.
Key Components:
- A live video feed (simulated here with recorded video) as the raw input
- YOLOv10 object detection to identify key frames
- Llama-3.2-11B-Vision-Instruct to turn key frames into journal-style descriptions
- Qwen2.5-Coder-7B-Instruct to convert those descriptions into infographics
My Experiment: Mocking the Process with Recorded Video
Since Meta’s smart glasses are still in development, I conducted a mock experiment using recorded video instead of live feeds. Here’s a detailed breakdown of my process:
Step 1: Video Recording as Input
I started by using a pre-recorded video simulating my daily routine, which featured segments of work, leisure, and outdoor activities. The video provided a steady stream of visual data, which mimicked what the smart glasses would capture in a real-time setup.
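To make the substitution concrete, here is a minimal sketch of how a recorded file can stand in for the live feed, assuming OpenCV (cv2) is available; the filename daily_routine.mp4 and the two-second sampling interval are illustrative choices, not part of the original setup.

```python
import cv2

def sample_frames(video_path: str, every_n_seconds: float = 2.0):
    """Yield (timestamp_seconds, frame) pairs sampled from a recorded video.

    A recorded file is read the same way a live camera stream would be,
    so the rest of the pipeline does not care which source the frames came from.
    """
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    step = max(1, int(fps * every_n_seconds))
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            yield index / fps, frame
        index += 1
    cap.release()

# Example usage (the filename is only a placeholder):
# for timestamp, frame in sample_frames("daily_routine.mp4"):
#     handle_frame(timestamp, frame)
```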
Step 2: Detecting Key Frames with YOLOv10
Using YOLOv10 from Ultralytics, I analyzed the video for key moments. YOLOv10 scanned each frame to detect objects such as laptops, coffee mugs, vehicles, and natural environments, and it accurately identified the objects that marked the work, leisure, and outdoor segments of the routine.
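A rough sketch of this detection step is shown below, assuming the ultralytics Python package and one of its published YOLOv10 checkpoints (yolov10n.pt); the set of "interesting" labels is my own illustrative choice for selecting key frames, not the exact rule used in the experiment.

```python
from ultralytics import YOLO

# Pretrained YOLOv10 checkpoint distributed through the Ultralytics package.
model = YOLO("yolov10n.pt")

# Labels that, for this sketch, mark a frame as a potential key moment.
INTERESTING = {"laptop", "cup", "car", "bicycle", "person", "potted plant"}

def is_key_frame(frame) -> bool:
    """Return True if the frame contains any object we consider noteworthy."""
    result = model(frame, verbose=False)[0]
    labels = {result.names[int(cls)] for cls in result.boxes.cls}
    return bool(labels & INTERESTING)
```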
Step 3: Generating Descriptions with Llama-3.2-11B-Vision-Instruct
Once the key frames were detected, I passed these images to Llama-3.2-11B-Vision-Instruct. This model generated natural language descriptions for each selected image. These descriptions captured not just the objects in the frame but also the activities associated with them, such as working at a desk versus relaxing outdoors.
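The sketch below shows one way to caption a selected frame with the Hugging Face transformers API for Llama 3.2 Vision, assuming access to the gated meta-llama/Llama-3.2-11B-Vision-Instruct checkpoint and a GPU with enough memory; the prompt wording and the PIL image input (e.g., converted from an OpenCV frame) are assumptions for illustration.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

def describe_frame(image: Image.Image) -> str:
    """Ask the vision model for a short, journal-style description of one frame."""
    messages = [{
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe what the person is doing in this "
                                     "moment, in one or two sentences, as a journal entry."},
        ],
    }]
    prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
    inputs = processor(image, prompt, add_special_tokens=False,
                       return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=80)
    # Decode only the newly generated tokens, skipping the prompt.
    return processor.decode(output[0][inputs["input_ids"].shape[-1]:],
                            skip_special_tokens=True)
```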
Step 4: Infographic Creation with Qwen2.5-Coder-7B-Instruct
After generating the journal, I used Qwen2.5-Coder-7B-Instruct to transform these text descriptions into infographics. The model organized the journal text into structured visual elements for the final infographic.
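As a sketch of this last step, the snippet below loads the Qwen/Qwen2.5-Coder-7B-Instruct checkpoint through transformers and asks it to lay the day's entries out as a self-contained HTML/SVG page; generating HTML/SVG is one plausible output format for an infographic, not necessarily the exact format used in the experiment.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-Coder-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

def journal_to_infographic(journal_entries: list[str]) -> str:
    """Ask the code model to render the day's journal as a single infographic page."""
    prompt = (
        "Create a single self-contained HTML page with inline SVG that presents "
        "the following daily journal entries as a timeline-style infographic:\n\n"
        + "\n".join(f"- {entry}" for entry in journal_entries)
    )
    messages = [{"role": "user", "content": prompt}]
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer([text], return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=2048)
    # Decode only the generated continuation, not the prompt itself.
    return tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:],
                            skip_special_tokens=True)
```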
Results and Future Implications
The experiment successfully demonstrated how a combination of object detection, language generation, and infographic creation could automate the journaling process. By turning mundane daily tasks into visually compelling stories, the system can help users better understand their habits, productivity, and daily routine. Once integrated with Meta’s smart glasses, this process could happen in real time, offering an effortless way for users to document and visualize their day.
Looking ahead, this technology could be extended to applications like personal productivity tracking, memory aids, or even social sharing of daily highlights in a visually engaging format. The potential to combine vision, language, and design opens up exciting opportunities for how we capture, reflect on, and share our daily experiences.