DailySnap: Transforming Daily Life into Visual Narratives with AI

The inspiration behind this project stems from Meta's Orion AR glasses, a cutting-edge augmented reality device under development. These glasses are designed to capture daily moments through a live video feed, integrating seamlessly into a user's day-to-day life. Leveraging AR and AI, the glasses have the potential to act as a personal assistant, automatically identifying and documenting significant events. This could revolutionize how users track and reflect on their activities, transforming mundane moments into meaningful digital memories.

This project centers on using advanced AI models together with Meta's smart glasses to automate capturing, journaling, and creating infographics from daily life. By leveraging a live video feed, the system identifies significant moments, processes the visual data into textual descriptions, and finally transforms those descriptions into visually engaging infographics. The goal is to simplify and enrich how users reflect on their day, combining AI-driven journaling with infographic representation.

Key Components:

  1. Smart Glasses with Live Video Feed: capture a continuous stream of visual data throughout the wearer's day.
  2. Ultralytics YOLOv10 for Object Detection: scans frames and flags key moments worth journaling.
  3. Meta Llama-3.2-11B-Vision-Instruct for Descriptive Journaling: turns the selected frames into natural-language journal entries.
  4. Alibaba Cloud Qwen2.5-Coder-7B-Instruct for Infographic Creation: converts the journal text into visually engaging infographics.

My Experiment: Mocking the Process with Recorded Video

Since Meta’s smart glasses are still in development, I conducted a mock experiment using recorded video instead of live feeds. Here’s a detailed breakdown of my process:

Step 1: Video Recording as Input

I started by using a pre-recorded video simulating my daily routine, which featured segments of work, leisure, and outdoor activities. The video provided a steady stream of visual data, which mimicked what the smart glasses would capture in a real-time setup.
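
For reference, here is a minimal sketch of this frame-sampling step using OpenCV; the file name daily_routine.mp4 and the five-second sampling interval are illustrative assumptions, not details from the actual recording.

```python
import cv2

def sample_frames(video_path: str, every_n_seconds: float = 5.0):
    """Yield (timestamp_in_seconds, frame) pairs sampled from a recorded video."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if metadata is missing
    step = max(int(fps * every_n_seconds), 1)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            yield index / fps, frame
        index += 1
    cap.release()

# "daily_routine.mp4" is a placeholder name for the mock recording.
frames = list(sample_frames("daily_routine.mp4"))
```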

Step 2: Detecting Key Frames with YOLOv10

Using YOLOv10 from Ultralytics, I analyzed the video for key moments. YOLOv10 scanned each frame to detect objects like laptops, coffee mugs, vehicles, and natural environments. For example, it accurately identified:

  • A laptop and coffee on a desk while working.
  • A vehicle during outdoor commuting.
  • A plate of food during a meal break.

The object detection system filtered out irrelevant frames, focusing only on those that held meaningful context for a daily journal.
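
Below is a rough sketch of how this filtering step could be wired up with the Ultralytics package and a pretrained YOLOv10 checkpoint; the relevant-class list and confidence threshold are illustrative choices, not the exact settings used in the experiment.

```python
from ultralytics import YOLO

# Pretrained YOLOv10 nano checkpoint distributed through the Ultralytics package.
model = YOLO("yolov10n.pt")

# Illustrative set of COCO class names that signal journal-worthy moments.
RELEVANT = {"laptop", "cup", "car", "bus", "dining table", "pizza", "sandwich"}

def is_key_frame(frame, conf_threshold: float = 0.5) -> bool:
    """Return True if the frame contains at least one relevant, confident detection."""
    result = model(frame, verbose=False)[0]
    for box in result.boxes:
        name = model.names[int(box.cls)]
        if name in RELEVANT and float(box.conf) >= conf_threshold:
            return True
    return False

# key_frames = [(t, f) for t, f in frames if is_key_frame(f)]  # frames from the sampling sketch above
```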

Step 3: Generating Descriptions with Llama-3.2-11B-Vision-Instruct

Once the key frames were detected, I passed these images to Llama-3.2-11B-Vision-Instruct. This model generated natural language descriptions for each selected image. These descriptions included not just the objects in the frame but also the activities associated with them. For example:

  • "At 9:30 AM, working at a desk with a laptop and coffee."
  • "At 12:00 PM, enjoying lunch at an outdoor café." The result was a structured journal that accurately summarized my day based on visual inputs.

Step 4: Infographic Creation with Qwen2.5-Coder-7B-Instruct

After generating the journal, I used Qwen2.5-Coder-7B-Instruct to transform these text descriptions into infographics. The model intelligently organized the text into visual elements:

  • Icons representing different activities (e.g., work, meals, commuting).
  • A timeline illustrating the order and timing of events.
  • Visual highlights for the most significant moments of the day.

The final infographics provided a beautiful and informative way to review daily activities, making it easier to visualize and reflect on how my time was spent.
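
As a rough sketch, the snippet below prompts Qwen2.5-Coder-7B-Instruct via Hugging Face transformers to emit a self-contained HTML/SVG timeline; the prompt wording and the HTML/SVG output format are assumptions made for illustration, since the infographic could be produced in several different formats.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Qwen/Qwen2.5-Coder-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)

def journal_to_infographic(journal_entries: list[str]) -> str:
    """Prompt the coder model to render journal entries as an HTML/SVG timeline."""
    prompt = (
        "Generate a self-contained HTML page with inline SVG that presents the "
        "following journal entries as a vertical timeline, with a simple icon "
        "for each activity:\n"
        + "\n".join(f"- {entry}" for entry in journal_entries)
    )
    messages = [{"role": "user", "content": prompt}]
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer([text], return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=2048)
    return tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:],
                            skip_special_tokens=True)

html = journal_to_infographic([
    "At 9:30 AM, working at a desk with a laptop and coffee.",
    "At 12:00 PM, enjoying lunch at an outdoor café.",
])
```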

Results and Future Implications

The experiment successfully demonstrated how a combination of object detection, language generation, and infographic creation could automate the journaling process. By turning mundane daily tasks into visually compelling stories, the system can help users better understand their habits, productivity, and daily routine. Once integrated with Meta’s smart glasses, this process could happen in real time, offering an effortless way for users to document and visualize their day.

Looking ahead, this technology could be extended to applications like personal productivity tracking, memory aids, or even social sharing of daily highlights in a visually engaging format. The potential to combine vision, language, and design opens up exciting opportunities for how we capture, reflect on, and share our daily experiences.
