The Eyes of AI
How Machines Can Learn to ‘See’ the Weather and Enhance Future Energy Management
By: Sho Akama, Rich Johnson, Saurabh Shrivastava, Niladri Roy - Climate Connect Technologies
UPDATE: For those interested in a deeper technical understanding of CNNs, the lead technical author (Sho Akama) has written an extended version of this article, with a more substantial explanation in the additional 'Machine's Eye' section.
“Erlang Shen, nephew of the Jade Emperor, and greatest warrior-god, opened his third-eye of Heaven until it was quite round like a phoenix, and looked about him… He saw that the Monkey King, Great Sage Equalling Heaven, and fiend of the Mountain of Flowers and Fruit, had changed himself into a sparrow… Upon landing on a mountain stream he transformed into a fish, then plunged into the water. Erlang pursued him to the bank of the stream, but could see no trace of him… Then, he saw a snake jump out of the water, and realised it was the Monkey King, who rolled down the precipice… When Erlang came to the foot of the precipice, he opened his phoenix eye and looked carefully around… He saw a temple with its flagpole at the back, and observed with a smile, “It must be that fiend monkey over there… He’s trying to fool me again…”
- Wu Cheng’en, Journey to the West
Sight or Intuition?
How do we understand the world around us? That is a surprisingly complex question. Let’s try to explain using a real-life example. In the past, whilst enjoying a merry late night meander back from the bar, you may have experienced being barked at, or even chased by a dog.
It was obviously a dog, but did you know what kind of dog? German Shepherd, Labrador, Beagle? In that situation it didn’t matter; it was just a ‘dog’, a barking one with sharp teeth.
But how was that understood? For humans, there is a degree of intuition. Aside from the barking and teeth, it is because we all share a common abstract image of 'dog'. The internal image is likely to vary from person to person. If you grew up with a Golden Retriever, your internal ‘dog’ will look more like that, with some positive emotional attachment. And if you grew up with a Rottweiler snarling at you, your image will be one with scary teeth, and some frightful memories.
Perhaps surprisingly, this is not so dissimilar to how a visual processing AI 'sees' the world.
A Machine's Brain
Having established how humans understand the world in an abstract sense - by holding representations inside our brains - we can start to grasp conceptually how a machine ‘sees’. It builds its own internal image of reality, and that is, in a sense, the mission of Machine Learning (ML): to obtain a representation of whatever we want the machine to learn, but in a systematic way. Using ML, we can encourage machines to form their own internal images of reality by giving them a mathematical foundation from which to construct them.
So if we have a picture of a dog, how can we enable a machine to learn how to see that dog?
Figure 1: Dog - a friendly, drooling one (Image credit: Wikimedia Commons)
Figure 1 shows just one dog. But there are hundreds of different breeds, and a machine cannot obtain a general image of ‘dog’ from looking at just a single example. Figure 2 shows several dogs, and we could assemble thousands, each of a different breed and in a different pose.
Figure 2: Selection of different dogs (Image credit: Wikimedia Commons)
They are simple enough to tell apart. But now squint your eyes, and you’ll see something like Figure 3.
Figure 3: Blurred dogs - but still identifiable by a human (Image credit: Wikimedia Commons)
As a human, you can probably still tell them apart, but don’t those dogs now look at least slightly similar to each other? Certainly more so than the clear pictures (Figure 2). All we did was decrease the resolution of the image (known as down-sampling). More precisely, we picked a group of pixels and took their average (Figure 4), then repeated for all areas.
Figure 4: The operation of blurring (Image credit: Wikimedia Commons and Sho Akama Images)
Now imagine applying the same process to all the pictures several times. They will look increasingly similar. We might then only be able to obtain abstract images of ‘dog’, containing some distinct features, like the shape of the head and body, saucer-bowl eyes, a wet button nose, a drooling tongue, floppy ears, and maybe sharp white teeth. But whereas human intuition can still make sense of these incomplete data, machines typically cannot; they lack that intuition.
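For the curious, here is a minimal sketch of that blurring step in Python (using NumPy). The random array simply stands in for a real greyscale photo; the idea is just to split the image into small blocks of pixels and replace each block by its average.

```python
# A minimal sketch of the blurring / down-sampling described above:
# split a greyscale image into small blocks and replace each block by its average.
# The random array is a stand-in for a real photo of a dog.
import numpy as np

def downsample(image: np.ndarray, block: int = 4) -> np.ndarray:
    """Average each block x block patch of pixels into a single value."""
    h, w = image.shape
    h, w = h - h % block, w - w % block           # trim so the image divides evenly
    patches = image[:h, :w].reshape(h // block, block, w // block, block)
    return patches.mean(axis=(1, 3))              # one averaged pixel per patch

image = np.random.rand(256, 256)                  # stand-in for a 256x256 greyscale photo
blurred = downsample(image, block=4)              # 64x64: the coarser, 'squinted' version
print(image.shape, '->', blurred.shape)
```

Repeating this operation is what makes all the dog pictures drift towards the same abstract blob.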
Convolutional Neural Networks
We all see the world through our own personal 'lens'. Elderly people tend to have slightly blurrier eyes than young people in a physical sense. More conceptually, a politician has a better lens to understand parliamentary dynamics. And a physicist has a lens to see the invisible laws of the universe.
But what does the sight lens of a machine look like? Within the ML domain, this lens is called a filter (or kernel). For a computer, an image is just a 2-dimensional array of numbers. It cannot recognise a complex object like a dog in one go; it starts from understanding edges and lines, and by stacking different filters, it can eventually ‘see’ a dog. An ML model specifically designed for computer vision is called a Convolutional Neural Network (CNN).
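To make ‘applying a filter’ concrete, here is a small sketch using a standard Sobel-style kernel (a textbook example, not one taken from any particular model). A 3x3 grid of weights slides across the image, and the weighted sum at each position lights up wherever a vertical edge sits.

```python
# A single hand-crafted filter in action: a 3x3 kernel slid over the image
# (the core of 'convolution'). The tiny test image is a stand-in, not a photo.
import numpy as np
from scipy.signal import convolve2d

vertical_edge_kernel = np.array([[-1, 0, 1],
                                 [-2, 0, 2],
                                 [-1, 0, 1]])     # responds strongly to vertical edges

image = np.zeros((8, 8))
image[:, 4:] = 1.0                                # a bright right half: one vertical edge

edges = convolve2d(image, vertical_edge_kernel, mode='valid')
print(edges)                                      # large (absolute) values where the edge sits
```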
Traditionally, the set of filters was handcrafted by experts in each specific area. For a human face, you may want a filter that can detect the general geometric shape of faces, like the ‘T-line’ between the nose and eyes (Figure 5).
Figure 5: T-Line, on just some random guy (Image credit: Wikimedia Commons)
To identify a chameleon, you need a very different set of filters than to identify a human (maybe less so if they are of the Karma variety). This means that every time you want to detect something new, you need a group of experts for R&D. Imagine trying to develop the best filters for a chameleon. What would they be? Filters to detect rounded edges, stacked up to find two big eyeballs? Why not a filter to find their signature long tongue? And how are those filters different from the filters for a human? Can our filters distinguish the two creatures in Figure 6?
Figure 6: Chameleon vs human (......probably) (Image credit: Wikimedia Commons)
So there is a seemingly infinite number of possibilities, and even with years of research, it is just not practical to obtain a ‘best’ set of filters by hand. This is where the CNN enters the frame and excels: a machine capable of learning the right filters for the objects it is given.
Figure 7: How a Convolutional Neural Network recognises an object (Image credit: Wikimedia Commons)
In Figure 7, each blue block applies a filter and extracts the core information relevant to that filter (an operation known as ‘convolution’). The size of subsequent blocks decreases because each time a filter is applied, the resulting image shrinks.
Each sequential group of blue blocks is called a layer. By exploiting sets and combinations of filters, a CNN can recognise complex shapes, like dogs (or chameleons).
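As a hedged sketch of what such a stack of layers looks like in code, here is a tiny classifier written in PyTorch. The layer widths, the 64x64 input size, and the two output classes (‘dog’ vs ‘chameleon’) are illustrative assumptions, not the architecture in Figure 7.

```python
# A tiny CNN: stacked convolution layers (learned filters) followed by a classifier.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes: int = 2):          # e.g. 'dog' vs 'chameleon'
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # early filters: edges and lines
            nn.ReLU(),
            nn.MaxPool2d(2),                              # shrink, like the blurring above
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # later filters: combinations of edges
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(x.flatten(start_dim=1))

model = TinyCNN()
batch = torch.randn(4, 3, 64, 64)                   # four 64x64 RGB images (random stand-ins)
print(model(batch).shape)                           # -> torch.Size([4, 2])
```

The filters inside the `Conv2d` layers start as random numbers and are learned from labelled examples, rather than being designed by experts.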
CNN for Weather Prediction and Better Energy Management
So now we have an understanding of CNNs and how a machine can ‘see’. But the link between recognising a dog and energy management probably still feels very abstract.
Rather than jump straight to how machines can ‘see’ and interpret weather patterns to make forecasts, let’s first consider how applications to forecasting differ from simple object recognition. There are some important future possibilities that could provide compelling solutions to energy industry problems, but it is important to first bridge that gap in understanding.
ML-enhanced visual recognition can be used to recognise, and more crucially interpret, specific visual weather patterns, in much the same way that it can learn to recognise dogs. These machines can then combine image interpretation with other weather data-sets to perform some of the higher-level integrative decision-making tasks that a human meteorologist currently does. This automation will enable a speedier response by renewable energy generation and distribution management systems, better matching energy supply and demand by predicting changes in weather patterns over specific local areas.
For example, a CNN could be applied to imagery from various weather sources (satellite and radar images, forecast model output graphics, etc.) to forecast temperature, humidity and precipitation. Such a weather combiner would greatly help with the operation and management of electricity grid load, which is heavily dependent on the weather, so such forecasts are hugely important for balancing energy supply with demand. The use of meteorological imagery in the energy industry has started to garner more interest, yet the intersection of ML (CNNs), Energy, and Meteorology remains under-explored. But with the integration of renewable generation rapidly growing across the globe, the need to focus on this is becoming more pertinent.
Taking weather satellite imagery specifically, a CNN can observe and forecast solar irradiation at the surface. Since it can ‘see’ anything, any weather data represented in the form of an image can have filters applied to it. These images are both inherently complex and often look similar, even to the human eye. Human eyes are attentive to fine detail, in principle all-encompassing, and hugely experienced due to the constant repetition of situations. Nevertheless, humans can and do make mistakes, and one set of eyes can only see what is in front of it. This is where a CNN wins over human eyes: given enough data, it can help interpolate multiple images, around the clock. In an ideal future, perhaps meteorologists would no longer have to do night-shifts – a CNN-driven forecasting model could do the job instead.
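As a purely illustrative sketch of that satellite idea, the network below maps a single multi-channel satellite image patch to one surface irradiance value. The four input channels, the 128x128 patch size, and the layer widths are assumptions for illustration, not a description of any operational system.

```python
# A speculative regression CNN: satellite image patch in, irradiance value (W/m^2) out.
import torch
import torch.nn as nn

irradiance_net = nn.Sequential(
    nn.Conv2d(4, 16, kernel_size=3, padding=1),     # e.g. four assumed satellite channels
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),                        # summarise the whole patch
    nn.Flatten(),
    nn.Linear(32, 1),                               # a single irradiance value
)

patches = torch.randn(8, 4, 128, 128)               # a batch of stand-in image patches
print(irradiance_net(patches).shape)                # -> torch.Size([8, 1])
```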
CNN Can Bring Human Intuition to Numerical Weather Prediction Models
Within the field of Meteorology, weather forecasts are generated by Numerical Weather Prediction (NWP) models, which are then interpreted by Meteorologists to accurately communicate a weather forecast for a given location. Due to the complexity of weather, the human interpretation of these outputs is a crucial step – the model cannot be relied upon alone.
Figure 8: Typhoon Mawar - didn't require much interpretation (Image credit: Wikimedia Commons)
For example, an NWP model output may show there to be short, sharp, heavy showers over Buckingham Palace in London tomorrow. But in reality, these mesoscale heavy showers may occur further to the north than the model predicted. Or maybe the convective rain clouds develop more slowly than forecast and end up causing rainfall later in the day and further downstream from Buckingham Palace. Clearly, there is uncertainty in the NWP forecast. This is where the job of a trained meteorologist comes in.
The meteorologist would hopefully have understood the complete picture of the NWP forecast and made solid interpretations with that understanding. They see that the predicted rain clouds are small in size and so covering only a very small part of London. They also see that other NWP models are showing the rain clouds to cover different locations. With this added layer of understanding, the meteorologist can then communicate a forecast along the lines of: “Short, heavy showers expected to be scattered around central London.”
Figure 9: ‘Not raining’ at Buckingham Palace (Image credit: Wikimedia Commons)
A CNN applied to NWP model graphics could replicate the meteorologist’s interpretation. The CNN can be trained to interpret the spatial relationship of components within the output image and produce an accurate local forecast. This would be achieved by reading every model output graphic and learning how they evolve, then matching them against historical model graphics and the conditions actually observed at every time-step and every location. Once the CNN has seen enough repetitions of each scenario, it will have learnt to understand the complete picture of the forecast, just as a human would. Though the above example of Buckingham Palace is light-hearted, there are far more destructive potential cases across much of the world, where weather patterns are highly volatile and severe.
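A hedged sketch of that training setup is below: pair each historical model output graphic with the rainfall actually observed at the corresponding time and place, then fit a CNN to map one to the other. The random tensors and the small `forecast_cnn` network are placeholders standing in for a real archive of graphics and observations.

```python
# Training sketch: learn to map NWP output graphics to the conditions observed afterwards.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

model_graphics = torch.randn(100, 3, 64, 64)         # stand-in historical NWP output images
observed_rain  = torch.rand(100, 1)                  # stand-in observed rainfall per time-step

forecast_cnn = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(8, 1),
)
optimiser = torch.optim.Adam(forecast_cnn.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for graphics, rain in DataLoader(TensorDataset(model_graphics, observed_rain), batch_size=16):
    optimiser.zero_grad()
    loss = loss_fn(forecast_cnn(graphics), rain)      # compare prediction with observation
    loss.backward()
    optimiser.step()
```

With enough historical pairs, the repetition of scenarios described above is exactly what this loop provides.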
Similarly, CNN visual recognition can be applied to radar imagery. Radars typically scan the atmosphere every 15 minutes and output a value for the reflectivity bounced back to them by water droplets within clouds and rainfall. Currently, a meteorologist must use these radar images to decide if and when rainfall may occur, say over an airport. CNN visual recognition could replicate that human interpolation, and eventually avoid the equivalent human error.
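One speculative way to frame that radar case for a CNN is sketched below: stack the most recent 15-minute reflectivity scans as input channels and output a probability of rain over the point of interest. The frame count, image size, and single-probability output are illustrative assumptions, not a production nowcasting design.

```python
# Radar nowcasting sketch: recent reflectivity scans in, probability of rain out.
import torch
import torch.nn as nn

recent_scans = torch.randn(1, 4, 96, 96)             # last four radar frames (stand-in data)

nowcast_cnn = nn.Sequential(
    nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 1), nn.Sigmoid(),                   # probability of rain, e.g. over an airport
)
print(nowcast_cnn(recent_scans))                      # e.g. tensor([[0.53]])
```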
Conclusion
Given how fundamental weather forecasting is for the Energy sector, especially as we rapidly integrate more renewables into the grid, there is a clear need to develop weather-specific CNNs. Computer-vision researchers used to hand-craft filters so that machines could see; now a CNN can learn these filters by itself. In a similar spirit, a CNN can start to interpret weather images instead of a meteorologist, eventually achieving far greater accuracy and efficiency. Though development is still very nascent, it echoes what is happening across the energy sector, and many other domains. However, though machines can learn ‘how’ to see, they don’t yet know ‘what’ to see. So for now, machines have only gained sight. To enhance the sector and address our pressing global energy needs, we must help them gain vision.
“The only thing worse than being blind is having sight but no vision”
- Helen Keller
Max credit to Sho A. and Rich Johnson for providing their CNN and Meteorology expertise. They gave the real substance to work with.