A Brief History of the iPhone 12 - from an 'XR' perspective
David Francis
Immersive storyteller, content-maker, advisor & strategist | Virtual Method Co-Founder | Forbes Technology Council
So I have been hanging on to my iPhone XS Max for a while now, waiting for the iPhone 12. Why? Because of one thing: the LIDAR (Light Detection and Ranging) sensor. In this brief article, I will give you a little history of Apple and their LIDAR sensor, some definitions of computer depth sensing, and then outline a few of the huge opportunities it presents.
In 2013, a little computer vision company in Israel called PrimeSense really started to ramp up. PrimeSense had already done very well supplying Microsoft with the RGBD (Red Green Blue Depth) sensors they needed for the Microsoft Kinect 1, although the device was mostly used to enhance Xbox gameplay interface options at the time. But Microsoft decided to bring the development of RGBD sensors in-house and was moving away from PrimeSense for Kinect 2. So PrimeSense started to develop sensors that could clip onto other devices - such as tablets - to open up their options. Their first major offering was the Capri: https://www.engadget.com/2013-05-15-primesense-demonstrates-capri-3d-sensor.html
No doubt Apple had, by this time, begun to watch PrimeSense very closely.
At the time, the AR industry was dominated by two SDK-providing players: Qualcomm Vuforia (US) and Metaio (Germany). When Qualcomm brought the Capri to their annual Uplinq conference in September 2013 and showcased their 'Smart Terrain' software, which recognised the geometry of any environment and procedurally turned it into an Augmented Reality game terrain (amongst other use cases), the AR world really sat up. And no doubt, so did Apple. Two great innovators of the computer-vision-for-AR industry, Jeff Powers and Vikas Reddy (Co-Founders of Occipital), had already seen the power of a device like Capri and had built their own version with a PrimeSense sensor inside, called the Structure Sensor (I still have the one I bought back then and it works amazingly well): https://techcrunch.com/2013/09/17/occipitals-new-structure-sensor-turns-your-ipad-into-a-mobile-3d-scanner/
There was just too much momentum happening around PrimeSense at that time. The iron was hot - and Apple struck, acquiring them two months later, in November 2013, for $360M: https://www.theverge.com/2013/11/24/5141416/apple-confirms-primesense-acquisition. And in typical Apple fashion, the PrimeSense team disappeared from public view (although they were allowed to stay in Israel and significantly increased in size, I am told).
Fast-forward six months and Google launched Project Tango, the first Android-powered device with RGBD sensors embedded. But the device was a beta prototype and they only ever released a very small number of units (as I recall, maybe 200?). I was fortunate enough to be loaned one by my mentor and great mate, Dave Lorenzini, and I showed off the first Project Tango in the country to the Australian development community - and to the TODAY Show on Channel 9(!), whose producers wanted to focus on the concept of 3D selfies. Project Tango wasn't stable enough for 3D model creation at that time, so I actually just used 123D Catch on my iPad to make the 3D bust of the reporter (please excuse the shockingly bad audio, which grabbed the background sound as well):
But Project Tango was never destined to become a featured, embedded kit on Android phones - ultimately it just served to provide a lot of the data required to build out ARCore's Visual Inertial Odometry (VIO).
At the same time, Facebook made their first (arguably second, after Face.com) AR acquisition, taking the world's leading Visual SLAM SDK, 13th Lab, into the fold: https://arcticstartup.com/swedish-13th-lab-acquired-by-facebook/
The final piece of the puzzle came in mid-2015, with Apple's acquisition of Metaio for what my educated guess puts at about $200M: https://techcrunch.com/2015/05/28/apple-metaio/. Co-Founders Peter Meier and Thomas Alt both knew the power of RGBD in mobile devices and how it would solve so many of the issues they had been trying to resolve through VIO. This is evidenced in this video by Peter, from 2014 (he mentions it at 1:15):
Peter's special obsession had been the 'Augmented City': having persistent, correctly occluded information overlaid on a city to turn it into the ultimate IoT-powered 'Smart City'. Note well that Peter is still at Apple.
And then RGBD in the rear-facing cameras of phones and tablets effectively disappeared - until the launch of the new iPad Pro several months ago and, this week, the iPhone 12.
WHY IS LIDAR SO IMPORTANT??
Because LIDAR allows your phone to understand the exact dimensions and distance of everything your camera sees. As human beings, we recognise things in the real world not just by the photographs in our memory, but also by our stereoscopic depth perception: our two eyes are offset around 6 cm from each other (on average). When we close our eyes, most of us get lost; people who are blind, however, often develop highly functioning aural senses and can 'hear' their way through a space, so they can still navigate. When you apply these perception types to devices, this is how we describe the tech:
Visual SLAM: this uses just one camera - or two, to create stereoscopic points of view (poses) of a scene. With two vantage points, the two images act like a pair of independent eyes, and depth can be triangulated from the offset between them. But what happens if the sun reflects into the camera, the room is dark, or there are lots of similarly coloured objects? Then we need:
VIO: the 'inertial' in Visual Inertial Odometry is the fall-back on a device's gyroscope and accelerometer - collectively called the IMU (Inertial Measurement Unit) - plus GPS, when the vision fails. These sensors act analogously to 'heightened hearing', detecting movement and trying to determine position in space through those senses alone. A great example of pure IMU spatial sensing is the new Spatial Audio feature for Apple AirPods, which uses the IMU in the earbuds to track how your head moves. But all these sensors, vision and IMU, require a lot of compute and can struggle with accuracy in certain environments. Cue: LIDAR (like the iPhone 12 has):
LIDAR: shoots out a scatter-spray of laser beams into the space in front of the camera. Typically, these will extend around 15 feet before they lose effective definition (although I don't know the exact specs for the iPhone 12). Each beam's time-of-flight - how long the light takes to bounce back - gives the distance to whatever it hit, so the device can log that object and understand its position relative to itself. With a lot of compute, it can even do that with moving objects (TBC on the iPhone 12). It can then use the normal camera to map images onto those objects. And it can store a very lightweight representation of that space, so that the device will always remember it has been there. (I sketch both the basic maths and the developer-facing API below.)
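To make the depth ideas above a little more concrete, here is a minimal sketch (in Swift) of the two calculations involved: triangulating depth from two offset images (the stereo side of Visual SLAM) and converting a laser pulse's round-trip time into a distance (the LIDAR side). The function names and numbers are my own illustrative assumptions, not Apple's specs.

import Foundation

// Stereo (two-camera) depth: the same point appears shifted ("disparity")
// between two images taken from cameras a known distance apart (the baseline).
// depth = (focal length x baseline) / disparity
func stereoDepth(focalLengthPixels: Double, baselineMetres: Double, disparityPixels: Double) -> Double {
    return (focalLengthPixels * baselineMetres) / disparityPixels
}

// Time-of-flight depth: a laser pulse travels to the object and back,
// so the distance is half the round-trip time multiplied by the speed of light.
func timeOfFlightDepth(roundTripSeconds: Double) -> Double {
    let speedOfLight = 299_792_458.0 // metres per second
    return speedOfLight * roundTripSeconds / 2.0
}

// Illustrative example: a pulse that returns after ~33 nanoseconds
// came from roughly 5 metres (about 16 feet) away.
print(timeOfFlightDepth(roundTripSeconds: 33e-9)) // ≈ 4.95 metres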
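On the developer side, Apple exposes this data through ARKit. The sketch below reflects how I understand ARKit 4's scene-depth and mesh-reconstruction options on a LiDAR-equipped device; the class name and print statements are my own, and this is a minimal illustration rather than a complete app - the specifics of what the iPhone 12 exposes may differ.

import ARKit
import CoreVideo

// A minimal sketch: turn on per-pixel depth and mesh reconstruction,
// then watch the depth frames and mesh anchors arrive.
class DepthSessionDelegate: NSObject, ARSessionDelegate {

    let session = ARSession()

    func start() {
        let config = ARWorldTrackingConfiguration()

        // Both features are only supported on LiDAR-equipped hardware.
        if ARWorldTrackingConfiguration.supportsSceneReconstruction(.mesh) {
            config.sceneReconstruction = .mesh
        }
        if ARWorldTrackingConfiguration.supportsFrameSemantics(.sceneDepth) {
            config.frameSemantics.insert(.sceneDepth)
        }

        session.delegate = self
        session.run(config)
    }

    // Each frame carries a depth map: a distance, in metres, for every pixel the camera sees.
    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        guard let depthMap = frame.sceneDepth?.depthMap else { return }
        let width = CVPixelBufferGetWidth(depthMap)
        let height = CVPixelBufferGetHeight(depthMap)
        print("Depth map this frame: \(width) x \(height) pixels")
    }

    // Mesh anchors are the 'remembered space': chunks of reconstructed geometry
    // that persist as you move around.
    func session(_ session: ARSession, didAdd anchors: [ARAnchor]) {
        let meshChunks = anchors.compactMap { $0 as? ARMeshAnchor }
        if !meshChunks.isEmpty {
            print("Added \(meshChunks.count) new mesh chunk(s)")
        }
    }
}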
WHAT ARE THE OPPORTUNITIES??
Search: Forget image search - now we can object-search. You can find something that not only looks like your current couch, but is also the exact same size.
E-Commerce: Selling something online? Walk around it with your phone and create a 3D model in seconds. The dimensional data is also stored, so when you put it on your site the potential buyer can click on it and view and inspect it spatially at its real-world size - even check how it looks in their kitchen.
Property: Selling a property? Just walk through it and scan the walls and floors to build a completely accurate 3D model. Don't have time? Get five mates with iPhone 12s to join you, and the mesh network created by the ultra-wideband (UWB) connectivity will allow you to combine your scans into one model straightaway.
Make your mark: Leave your friends notes, animations, videos and more in the real world at your favourite bar. When they arrive a few nights later, they scan around and see the notes and videos you left for them. Or leave a customer or maintenance person a video or note on what you've done or what needs to get done.
Fashion: Don't know your size? No worries. Just get a friend to spend a couple of minutes scanning you in, and your dimensions will be perfectly matched as you shop online.
Volumetric Capture: One of your best mates just wrote an awesome song. Why don't you and three of your friends stand around her as she sings and capture her performance in full 3D? Or even broadcast it live in 3D.
Gaming: Imagine Fortnite meets Pokémon GO. Running around the streets with characters hiding behind objects and around corners, firing weapons down streets and seeing ordnance from other players whizz past you. UWB connectivity delivering near-zero latency between you and your buddies/team as you play in the augmented world, together.
These are just a few of the opportunities that open up when you have a really powerful LIDAR sensor in your phone. The whole world of content becomes WAY more about 3D than 2D. The iPhone 12 has been many years in the making, and very few people yet appreciate the massive opportunities it represents (AND what a huge precursor it is for Apple's eyewear). We have just been 'given' a very powerful 'extra sense' in LIDAR - and I, for one, feel very fortunate to be in this industry and this community at this moment.
If you're imagining ways this might benefit your brand/business and want to validate your ideas, please get in touch and I'm more than happy to chat.