Computational Time-Lapse

If you've ever watched a time-lapse video, it probably looked fairly 'jumpy' from one frame to the next, with objects and people 'popping' in and out of the field of view. This is because the goal of a time-lapse is to take a long video and compress it down to a more digestible form (perfect for our limited attention spans!). To do that, most time-lapse videos simply sample every Nth frame (e.g. one out of every 10-15) and discard the rest. As shown below, the streets of NYC are filmed over several hours and then compressed into a 2:42 video, which gives the viewer a nice overall summary.
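For reference, that uniform sampling strategy is simple to implement. Here is a minimal Python sketch that keeps one frame out of every N; the use of OpenCV and the function names here are my own choices for illustration, not part of any original pipeline.

```python
import cv2  # OpenCV for reading and writing video frames

def uniform_timelapse(in_path, out_path, step=12, fps=30):
    """Conventional time-lapse: keep every `step`-th frame, drop the rest."""
    cap = cv2.VideoCapture(in_path)
    writer = None
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:  # uniform sampling: one frame out of every `step`
            if writer is None:
                h, w = frame.shape[:2]
                fourcc = cv2.VideoWriter_fourcc(*"mp4v")
                writer = cv2.VideoWriter(out_path, fourcc, fps, (w, h))
            writer.write(frame)
        idx += 1
    cap.release()
    if writer is not None:
        writer.release()
```

Because the sampling is blind to content, any motion that happens between the kept frames is simply lost, which is exactly the 'jumpiness' described above.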

However, what if there were a way to compress the video to the same length while preserving as much motion as possible? Put another way, can we remove the choppiness seen in the NYC time-lapse above? Why would you want to do that? Imagine you're creating a time-lapse of someone cooking in the kitchen. The person is in and out of the kitchen, grabbing supplies and ingredients as needed. It would be great if an algorithm could cut out all of the footage without motion, i.e. when no one was in the kitchen cooking, and focus on preserving the frames where someone was actively moving.

This is the question tackled by Eric Bennett and Leonard McMillan in their paper Computational Time-Lapse Video. The authors explore two approaches to time-lapse video. The first is a non-uniform sampling method that maximizes a user-chosen visual objective; in my case, the objective was to maximize the amount of motion preserved between frames. The second, which I did not focus on implementing, is a virtual shutter that extends the effective exposure time of time-lapse frames. This is how you create photos similar to the cover photo of this blog post.

To create a motion-preserving video, you first need an objective function to optimize. The authors define a min-error metric that computes the cost of jumping from frame i to frame j: each pixel's value over that span is modeled as a line, where A_ij(x, y) is the y-intercept and B_ij(x, y) is the slope for pixel (x, y), and the metric sums how far the actual frames deviate from that fit. The gist is that we are adding up the amount of 'motion' we would miss between each pair of frames we jump to. For example, if we jump from frame 1 to frame 5, our error is the sum of this metric over frames 1 through 5.
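Since the metric is easier to read as code than as prose, here is a minimal NumPy sketch of computing it for a single jump from frame i to frame j. The least-squares line fit follows the description above, but the absolute-difference error and single-channel (grayscale) frames are assumptions on my part and may differ from the paper's exact formulation.

```python
import numpy as np

def min_error(frames, i, j):
    """Sketch of a min-error cost for jumping from frame i to frame j.

    Fits a least-squares line A + B * t to each pixel over frames i..j and
    sums how far the actual frames deviate from that line -- i.e. the
    'motion' we would lose by jumping straight from i to j.
    `frames` is assumed to be a float array of shape (T, H, W) (grayscale).
    """
    if j <= i + 1:
        return 0.0                                    # nothing is skipped
    t = np.arange(i, j + 1, dtype=np.float64)         # time indices i..j
    clip = frames[i:j + 1].astype(np.float64)         # shape (j - i + 1, H, W)
    t_mean = t.mean()
    v_mean = clip.mean(axis=0)
    # Per-pixel least-squares slope B_ij(x, y) and intercept A_ij(x, y).
    B = ((t - t_mean)[:, None, None] * (clip - v_mean)).sum(axis=0) \
        / ((t - t_mean) ** 2).sum()
    A = v_mean - B * t_mean
    predicted = A[None] + B[None] * t[:, None, None]  # fitted line per pixel
    return float(np.abs(clip - predicted).sum())      # accumulated missed motion
```

In practice you would cache these pairwise costs, since the dynamic program described next evaluates them for many (i, j) pairs.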

To decide which frames to keep, the authors use a dynamic-programming technique. The recurrence, D(s, M), produces the optimal sampling set v, i.e. the min-error reconstruction of the input frame set s. To implement it, I created a matrix D with rows equal to the length of s (the number of input frames) and columns equal to M (the number of frames I wanted to sample for the time-lapse). Then, for each column, i.e. for each frame I wanted to keep, I filled in the entry of D with the lowest accumulated cost. Tracing back through D at the end gives the series of frames that minimizes the given error metric (the min-error metric described above). A minimal sketch of this recurrence is shown below.
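This sketch is my reading of the D(s, M) recurrence, not the authors' reference implementation; the assumption that the first and last frames are always kept, and the helper name choose_frames, are mine.

```python
import numpy as np

def choose_frames(error, n_frames, M):
    """Pick M of n_frames frames so the summed jump cost error(i, j)
    between consecutive picks is minimal (assumes 1 < M <= n_frames)."""
    INF = float("inf")
    # D[j, m] = lowest cost of a sampling that keeps m frames and ends at frame j.
    D = np.full((n_frames, M + 1), INF)
    parent = np.full((n_frames, M + 1), -1, dtype=int)
    D[0, 1] = 0.0                       # assumption: the first frame is always kept
    for m in range(2, M + 1):
        for j in range(m - 1, n_frames):
            for i in range(m - 2, j):
                cost = D[i, m - 1] + error(i, j)
                if cost < D[j, m]:
                    D[j, m] = cost
                    parent[j, m] = i
    # Backtrack from the last frame to recover the optimal sampling set v.
    v, j = [n_frames - 1], n_frames - 1
    for m in range(M, 1, -1):
        j = parent[j, m]
        v.append(j)
    return v[::-1]

# Hypothetical usage: `frames` is a (T, H, W) grayscale array, min_error as above.
# picks = choose_frames(lambda i, j: min_error(frames, i, j), len(frames), M=120)
```

The triple loop runs in roughly O(n^2 * M) time for n input frames, which is why caching the pairwise min-error costs matters on longer clips.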

You can see the results of my implementation in the video below.

All of the code for this assignment lives on my GitHub.

Thanks for reading!
