Neuro Design AI
Darren Bridger
Co-Founder and VP Science at CloudArmy | Author of Neuro Design & Decoding the Irrational Consumer
How Computational Neuroaesthetics works
The idea that beauty is subjective - "in the eye of the beholder" - has long been taken for granted. However, recent advances in computer vision are challenging this notion. Researchers have developed algorithms that can analyze images and predict how visually appealing or memorable they will be to humans with surprising accuracy. This suggests that qualities like attractiveness and memorability are not purely subjective, but have objective underpinnings that can be quantified and predicted. Though people will always experience art and imagery through the lens of their personal tastes, computers are revealing that some aesthetic qualities transcend individual perspectives and may be rooted in how our visual systems are wired.
The field behind these discoveries is ‘Computational neuroaesthetics’ or just ‘Computational aesthetics’ (I will call it ‘Neuro Design AI’ from here on).
Researchers in this field codify how people respond to different types of images and designs. They often do this by taking large collections of images for which people’s reactions have already been recorded - such as online image libraries where viewers can ‘like’ an image - and then using AI to search for features that the most-liked images share and the least-liked images lack.
Their tools are mathematical, but often inspired by the human brain itself. Like so many areas of the world before it, aesthetics is yielding to greater understanding thanks to its ability to be described, understood and predicted via mathematical models. And, in turn, these models are giving us insights into how our own minds work.
Obviously, the ability of software to understand how pleasing an image is to the human eye is useful across many fields. For example, it could be used to automatically screen images to ensure they reach a certain threshold of quality, or to give designers feedback on ways to potentially improve a design.
You've already encountered the world of Design AI if you've ever used suggested photo effects, trimmed an image using an overlay grid, or sought suggestions for films or TV shows on a streaming service. Photo apps in particular are now replete with AI features designed to help you create more compelling images, bringing design abilities once the sole domain of design professionals into everyone's hands. And newsfeeds on social media platforms are already serving you images and videos that they 'think' you will like. There are many other digital ‘tricks’ that AI can perform when analysing images. For example, there are algorithms that can identify an artist or author purely by analysing the features of a particular piece of work.
Design AI algorithms are being used to optimize the aesthetics of everyday products - for example, Philips smart lights can change your mood by changing the colour temperature, leading to better sleep or greater alertness.
This long-form article is intended to give a flavor of the types of things that Design AI is able to measure that relate to our sense of aesthetics. I’ll examine some of the most frequently used Neuro Design AI tools and consider why they are relevant to how our minds perceive images and prefer particular types of images.
Design and maths rules
Since ancient times, artists have used mathematical laws for beauty. Renaissance painters paid attention to qualities like harmony and proportion, often inspired by the mathematical patterns embedded in nature itself. But only recently have we had tools to analyse images to a depth and complexity far beyond what those artists could achieve.
Our eyes evolved over 600 million years and are able to quickly comprehend the world. Yet within only a few decades we have created computer algorithms that are beginning to rival our own visual abilities. Design AI has grown out of the parent field of computer vision. Computer vision, which emerged in the 1950s, advanced more rapidly in the late 1990s thanks to powerful chips and neural networks that mimic the way our brains work. In more recent years it has received substantial research funding to develop technologies like self-driving cars and to create sophisticated visual simulations.
The developments in computer vision have subsequently been joined by researchers using similar analytical techniques to understand human responses to images.
Just as humans can glance at an image and intuitively feel whether it is appealing or not, researchers are developing models that use image statistics to mimic this skill. It can be helpful to understand some of the principles behind these image statistics.
Artificial neural networks need large amounts of data to learn. They need to be fed many examples of images along with how people reacted to them - for example, collections of images from social media or online photography image banks together with the ‘likes’ or other feedback scores they received. Artificial neural networks are also being fed text descriptions of images, so that they have the same understanding of what's happening in an image as people do. Sometimes, particularly when studying the memorability of images, researchers will show large numbers of images to people and gather their responses directly, rather than relying on pre-existing databases.
These three sources of data are creating Artificial Intelligence algorithms with a very good knowledge of how humans react to different types of designs or images.
These algorithms ‘peel back’ the layers of patterns in an image. Or, to use another analogy, they are like applying different forms of scanner to a body: the X-ray, the fMRI and the ultrasound each give us a different, insightful view into the body. Similarly, the various mathematical tools of Design AI can reveal different types of information about an image. And just as medical scanners have provided new insights into the workings of the human body and brain, so these new computational tools have provided new insights into why we find particular images attractive.
One of the surprising findings of the Design AI field has been just how successful this approach has been so far.
The hidden connection between beauty and information processing
Why did we evolve to have a sense of beauty or aesthetics?
Evolutionary psychologists theorise that it was to make judgments about people and the natural world that would help us successfully reproduce or survive.
For example, the link between strong, mutation-free genetics and a symmetrical face gave us an intuitive way to pick mates with whom we would be more likely to have children who also carried mutation-free genes - so we evolved to feel attracted to such faces. Similarly, we gain pleasure from looking at calorie-rich foods, or natural scenes that contain bodies of clear water, as we anticipate the evolutionarily important drives for food and drink. Equally, as with most evolutionary explanations, there are examples that relate to animals and their attraction to beauty, such as the way peacocks evolved elaborate decorative feathers to attract a mate. For me, it’s slightly odd to contemplate how animals may also have pleasurable aesthetic experiences.
However, a more widely applicable theory for explaining our aesthetic feelings looks at them from an information processing perspective. Our visual system - our eyes, the visual cortex in our brains and all the rich connections it makes to other parts of our brains - can be thought of as a tool for extracting information about the world, for learning. Yet the dilemma faced by evolution was that capturing this information comes at a cost: it burns energy. Our eyes and brains have to rapidly process a large amount of light information representing the world around us. Just by opening your eyes and looking at something that’s moving, the glucose consumption in your visual cortex immediately increases by 50% (Lennie, 2003).
Our visual system evolved to save energy and use a range of tricks to see and understand the world around us with the minimal amount of necessary effort. For example, we pay more attention to the edges and contours of shapes - particularly those whose colours and patterns are homogeneous across the shape. Imagine a green circle. All you need to do to understand and remember it is to understand its edge forms a circle and that it's green. You don’t need to pay equal attention to every part of the inside of the circle in the way that a camera recording an image would.
Then the visual information that gets sent from the eyes to the visual cortex is itself compressed. Our brains employ a range of techniques to focus only on the highest-priority information: directing our eyes to elements that seem most different from their surroundings, or to things that look like animals or people, or focusing on things that match the colour of the object we are searching for while suppressing objects that aren’t that colour. The outcome of all this has been described by cognitive psychologist Donald Hoffman: “Billions of bits enter the eye each second, but only forty win the competition for attention” (Hoffman, 2019).
But it turns out that the energy saving tricks don’t stop there. Our brain has evolved to reward us for looking at things that are informative yet easy to process. Our brains actually generate pleasurable feelings upon seeing images that are ‘easy on the eye’.
In other words, they are efficiently encodable.?
This is explained by the theory of processing fluency: that any image which gives us the most information with the least amount of mental effort expended will be attractive to us. (Reber, et al, 2004)
As an example, consider why people willingly put effort into working on apparently pointless puzzles like crosswords or sudoku. It's because the pay-off is a particular form of mental pleasure. Neuroscientists have found that the moment of ‘aha!’ when we solve a puzzle can give us a little hit of pleasure, making the task rewarding in its own right. By putting people in an fMRI scanner to monitor which areas of their brains became active while solving simple puzzles, they discovered that brain networks involved with the release of dopamine, one of the brain's pleasure chemicals, were activated. (Tik et al, 2018)
The feeling of ‘aha!’ when solving a problem can feel like a pleasurable mix of surprise and relief, as the difficulty and potentially confusing experience of trying to solve the problem gives way to a feeling of ease upon discovering the solution.
This is an example of processing fluency. Many aspects of visual perception are like solving small puzzles: there is brain effort involved in figuring out what we are looking at, given less than perfect information or viewing conditions. (Ishikawa, et al, 2019) It may be that this solving of object and pattern recognition is the single largest reason why we find certain images pleasurable to view.
So, with this in mind, let’s now consider how computers can decode images.
Detecting Image Features With Neural Networks
Some of the early computational studies looked at image features that had been pre-defined using the knowledge of professional artists. In other words, the things that artists intuitively believe are important to making images attractive, such as the rule of thirds in composition. Overlaying an image, such as a photograph or painting, with a grid of equally spaced pairs of horizontal lines and vertical lines, and then placing the subject or subjects of the image at one of the intersection points of the lines helps to make it feel more pleasing. Such compositional rules are already used in applications for automatically cropping images to make them more appealing.
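To make the rule-of-thirds idea concrete, here is a minimal sketch in Python of how software could score how closely a photo's subject sits to one of the thirds-grid intersections. The scoring function and its normalisation are invented for illustration - they are not taken from any of the cropping tools or studies mentioned here.

```python
import numpy as np

def rule_of_thirds_score(subject_xy, image_size):
    """Hypothetical score: higher means the subject lies closer to one of the
    four rule-of-thirds intersections of the image."""
    w, h = image_size
    x, y = subject_xy
    # The four intersection points of the thirds grid
    points = [(w / 3, h / 3), (2 * w / 3, h / 3),
              (w / 3, 2 * h / 3), (2 * w / 3, 2 * h / 3)]
    # Distance to the nearest intersection, normalised by the image diagonal
    d = min(np.hypot(x - px, y - py) for px, py in points)
    return 1.0 - d / np.hypot(w, h)

# Example: a 1200x800 photo whose subject sits near the upper-left thirds
# point scores higher than one with a dead-centre subject.
print(rule_of_thirds_score((400, 270), (1200, 800)))  # near a thirds point
print(rule_of_thirds_score((600, 400), (1200, 800)))  # dead centre
```

An automatic cropping tool could, in the same spirit, search over candidate crops and keep the one that maximises such a score for the detected subject.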
This rule-based approach to studying images has mostly been overtaken by computer neural networks that learn on their own which visual features to extract. While this approach can be more powerful – potentially able to take into account many more image statistics than the ‘hand-crafted’ technique – it can be harder to interpret as one doesn’t always understand the rules that the software derives.
Neural networks are a type of machine learning software that takes inspiration from the organisation and function of the brain. They’ve been found to be effective for computer vision applications. These systems ‘learn’ by being fed large sets of data, such as thousands of photos together with user ratings of those photos. Alternatively, they can be trained to recognise a particular category of object by being given a large collection of images that contain that object and images that don’t.
At their most basic, computer vision systems use very simple rules to detect lines or edges in an image. They iteratively process areas of an image and calculate whether they contain sharp changes in contrast. If they do, there's a good chance they have alighted on an edge. By building up an understanding of where all the edges are, they can then detect patterns such as objects.
A typical type of neural network used not only for detecting edges but also for detecting more complex objects, patterns and the qualities that make an image aesthetically pleasing is the convolutional neural network (CNN). These essentially apply rules to an image to ‘filter’ for the presence of particular patterns, then pool these simple patterns into increasingly complex ones.
You can imagine this process by visualising an image in black-and-white. Imagine that a grid has been imposed over the image, so that it's broken down into a large number of small squares.
Next, imagine you have a mask with a square hole that you place over the image, revealing only a grid of 25 of those squares at any one time. Each little square can be ‘seen’ by the computer as a number somewhere between -1 (totally black) and +1 (totally white), with 0 representing a mid-range grey.
Now, as you move your masked window over the grid, for each view of 25 small squares you can answer a question like: do you see at least five squares above one another that have a high contrast with the squares to their left and right? In other words, this ‘rule’ enables the algorithm to detect a line or edge.
If there is, then you give that big square a score on your second grid. You continue this process as you move your mask across the image systematically (e.g. left to right, top to bottom).?
The result is a new, smaller, grid that reveals the existence of any vertical lines. Then you can imagine another window mask that moves over this ‘vertical line’ grid and detects, again just with a simple rule, whether these lines join together to make a shape. This results in a shape grid. Then this shape grid might use another rule to determine if the shape matches that of a particular object. This type of iterative process, using comparatively simple rules, allows the software to process an image, extracting mathematical patterns from it that can be used to detect objects.
These grids are called layers, and the rules that transform one grid into another are called filters or kernels.
They are essentially a process of filtering the image over and over to extract more complex and abstract information. Each new grid, or layer, is smaller than the previous one, and therefore focused more on the overall patterns and objects. The process of scanning over the image (the window mask in our example above) and translating the results to a new grid is known as ‘convolution’.
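Here is a minimal sketch of that sliding-window filtering step, using a toy image and a hand-made vertical-edge kernel. The kernel values and the example image are purely illustrative, not taken from any particular model:

```python
import numpy as np
from scipy.signal import correlate2d

# A tiny 'filter' (kernel) that responds to vertical edges: a dark-to-light
# change between neighbouring columns produces a large value.
vertical_edge_kernel = np.array([[-1, 0, 1],
                                 [-1, 0, 1],
                                 [-1, 0, 1]])

# A toy greyscale 'image': a dark left half and a bright right half,
# with pixel values between -1 (black) and +1 (white), as in the text.
image = np.concatenate([-np.ones((6, 3)), np.ones((6, 3))], axis=1)

# Sliding the kernel over the image produces a new, smaller grid (a 'feature
# map') whose large values mark where a vertical edge was found.
feature_map = correlate2d(image, vertical_edge_kernel, mode='valid')
print(feature_map)
```

In a trained CNN the kernel values are not hand-made like this; they are learned from data, and hundreds of such feature maps are stacked and pooled into the deeper layers described below.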
A convolutional neural network works in this spatial manner, understanding that a particular pattern or object can be positioned in different regions of the image.
For example, in the diagram above, there is an image of a square. The first layer of the network, represented by the circular nodes A, is simply detecting the presence of vertical and horizontal lines. Then when these are detected, the second layer of the network, represented by the circular nodes at B, detects the fact that there are a pair of horizontal lines, and a pair of vertical lines. Finally, node C, puts together that information to confirm the image is a square.
The process is simply a method of breaking down the analysis of an image into detecting simpler patterns.?
Such networks are able to quickly detect things like areas of high contrast, something which our brains also detect early on in their process of decoding the visual world in front of us. Increased contrast has also been shown to increase our preference for photographs, probably because it makes them clearer and easier for us to process. (Mayer and Landwehr, 2018)
Of course, this is relatively easy to envisage when you have simple rules for detecting simple lines, but how does such a network detect a particular object? For example, detecting a particular animal or a person’s face.
The answer is that it needs some feedback as to whether it was right or wrong. It then tries out rules and checks the outputs against the desired one (e.g. the actual image it's trying to recognise). Through sheer brute force of repeated attempts it can alight on a successful sequence of rules for detecting the object. This might take millions of cycles. It works, but the downside is that the exact rules it's using might not be clear to us as humans.
The intermediate layers of rules between the inputs and the outputs are why techniques like this are called ‘deep learning’. With only one or a few layers - to solve a simpler, more direct problem - you would have a ‘shallow learning’ algorithm.
For example, simpler image analysis tasks are often handled by types of analysis called decision trees and linear regression. While these don’t have the power of deep neural networks, the way they calculate their results is more transparent. The layers in CNNs are often referred to as ‘hidden’ as their exact operations can be opaque.
Nevertheless, the process itself - grouping together small elements using basic rules into increasingly larger and more global patterns and objects - is the same way that the human visual cortex processes images.
(Some other popular techniques for decoding images are covered in the Appendix)
Predicting Attractiveness: Low-Level Features and Complexity
Computers use a variety of techniques to break down an image into mathematical patterns. The types of information that researchers typically look at can be thought of as either low-level and local, or higher-level and global. Low-level features are the small details of an image - the little bits that our brains have to process first before they can understand the higher-level features, such as what objects are present, how they are positioned relative to one another and how familiar they are to us.
Here are some examples. Low-level, local features include contrast, colour, symmetry, the smoothness of contours, and complexity or entropy. Higher-level, global features include the objects depicted, how they are composed relative to one another, and how familiar or typical they appear.
The difference between these categories isn’t perfect, and can sometimes be blurred. But they nevertheless give us a way to categorise the types of visual pattern that make up an image.
In the rest of this article I will summarise the evidence that the attractiveness of images is driven by a combination of low-level and higher-level/global features, that the attention-grabbing power of images is typically predicted more by lower-level features, and that the memorability of images is mostly predicted by higher-level/global features.
Low-Level Features of Images
What triggers our brains to generate pleasure from sensory information seems to be, at least in part, what are called ‘low level’ features of an image. These include things like balance, complexity, symmetry and contour (the degree of smoothness of shapes). They affect the subconscious reactions our brains have to viewing images.
This seems to occur because the easier these features are for our brains to process, the less energy they require. But there are also probably some more sensory-based reasons. We anticipate threats in our environment through our eyes. One of the most fundamental threats we face is that of sharp objects. Therefore a preference for curves and smooth objects over sharp ones seems an obvious choice for natural selection.
The preference for curved contours has also been demonstrated in product design and architecture. In cartoons, friendly characters are often depicted with rounded features and physiques (think of the friendly bear Baloo in Disney's 'The Jungle Book'), while villains are typically angular, with sharper features and physiques (think of Maleficent). (Bar and Neta, 2006)
However, this preference is - to coin a phrase - more complex than meets the eye.
If a straight line has one orientation angle, a curve is made up of many. But within an image there can be many connected or unconnected lines that share the same orientation or don’t. If there is little similarity in the orientation of lines across an image it is said to have high ‘entropy’. While curved contours are more a property of a shape, edge-orientation entropy is more of a global property of the whole image, or of ‘texture’.?
For a shape, or even a simple pattern with a small number of disconnected lines, people prefer curves. When the number of lines or edges increases to form a texture or a more complex image, they prefer high edge-orientation entropy - that is, contours distributed across a wide variety of orientations. (Stanischewski et al, 2020)
This is perhaps unsurprising: while it makes sense to us that a single object should share curved contours all over, an overall image might look unnatural if too many of its contours are lined up - an effect known as ‘abhorrence of coincidence’. It might seem ‘too good to be true’.
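To give a rough feel for how such a measure can be computed, here is a sketch of an edge-orientation entropy calculation. It is a simplified stand-in for the kind of measure discussed by Stanischewski et al., not their exact method, and the bin count and threshold are illustrative choices:

```python
import numpy as np
from scipy import ndimage

def edge_orientation_entropy(gray, n_bins=36, magnitude_threshold=0.1):
    """Shannon entropy of edge orientations in a greyscale image (values 0..1).
    High entropy = edges point in many different directions;
    low entropy = edges mostly share one orientation."""
    gy = ndimage.sobel(gray, axis=0)           # vertical gradient
    gx = ndimage.sobel(gray, axis=1)           # horizontal gradient
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx)           # -pi .. pi
    # Keep only pixels that actually lie on an edge
    angles = orientation[magnitude > magnitude_threshold]
    hist, _ = np.histogram(angles, bins=n_bins, range=(-np.pi, np.pi))
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))
```

A texture whose edges fan out in all directions would score near the maximum (log2 of the bin count), while an image of parallel stripes would score near zero.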
Inspired by nature
One other area of insights has been people’s preference for visuals of the natural world. Surveys have demonstrated that people prefer natural landscape images over those of urban or artificial environments (Kaplan and Kaplan, 1989).
Designers already exploit this through strategies such as using natural materials, colours, or shapes from nature. Interestingly, it’s not just any natural landscape that people prefer. There are some specific features that make sense from an evolutionary perspective. For example, while an expanse of natural water in an image adds appeal, it is preferred even more when it is fully in view, so that viewers can see both sides (e.g. of a river) or all the way around a lake (Ibarra et al, 2017). Similarly, people show a preference for images that appear to be taken from a safe vantage point affording a good view to a clearly visible horizon (Appleton, 1996).
These types of visuals are also beneficial as they are easier to learn and remember.
Natural scenes are, themselves, full of such redundant information. They have repetitive patterns that are predictable to our brains and make such imagery very easy for us to process. Given that we evolved in natural scenes and they are easy to encode, it is unsurprising that many people find them pleasant and relaxing - even restorative - to view. It also means that natural scenes can hold some clues for designers in making their designs more aesthetically pleasing.
Self Similarity: Balance, Symmetry and Harmony
Self-similarity is when a part of an image is repeated. This creates some redundancy in the image, such that if you understand one part, you potentially understand many. For example, if you look at an image of a forest, you don’t need to inspect and process every tree individually; there is enough repetitive similarity between the trees that you can quickly comprehend the gist of the image. And this, as mentioned earlier, is probably part of the reason why we find images of natural landscapes so enjoyable to look at. They are ‘easy on the eye’, which is our intuitive way of saying that they don’t require much effort to look at.
And these sorts of patterns can be detected by Neuro Design AI algorithms.
Many patterns in nature are like this: a close-up of a length of coastline can resemble a larger stretch of the coastline, a leaf from a fern can resemble the whole branch of leaves, or one segment of broccoli can resemble the whole broccoli.
Symmetrical shapes also tend to be liked, as do faces that are close to the average of many faces (a phenomenon that was itself discovered thanks to computers’ ability to average together many hundreds of face photos) (Rhodes, 2006); both are further examples of images that are easier to process because there is redundancy in the image. Similarly, familiar images are known to be more liked, as they are easier to process. This is because our brains are constantly trying to predict what we are looking at, and feeding these predictions to the ‘lower’ levels of our visual cortex. If these areas are accurately primed with information about what we should expect to see, their visual processing becomes more efficient. This also helps images that are ‘prototypical’, or very close to what you would expect them to look like.
Another example of self-similarity is a phenomenon known as ‘scale invariance’. This means that similar patterns occur at both larger and smaller scales within the image. This adds harmony and redundancy to an image and makes it pleasant to look at. Interestingly, visual AI analyses of portrait paintings reveal that the painters have - almost certainly unconsciously - imbued their art with statistical regularities similar to those of natural scenes; regularities that are not present in photographs of faces. The researchers used a technique called ‘Fourier power analysis’, which essentially calculates the patterns of contrast at different distances across an image. (Redies et al, 2007) Similarly, high levels of scale-invariant self-similarity have been found in pleasing-to-view advertising images. (Braun, et al 2013)
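To give a feel for how this is measured, here is a simplified sketch of a Fourier power analysis: it estimates the slope of the image's radially averaged power spectrum, which for natural scenes (and, per the studies above, many paintings) falls off in a roughly scale-invariant way, with slopes around -2 on a log-log plot. This is an illustrative approximation, not the cited authors' exact pipeline:

```python
import numpy as np

def spectral_slope(gray):
    """Fit the slope of log power vs. log spatial frequency for a 2-D
    greyscale image. Values near -2 indicate roughly scale-invariant,
    'natural-looking' statistics."""
    f = np.fft.fftshift(np.fft.fft2(gray))
    power = np.abs(f) ** 2
    h, w = gray.shape
    y, x = np.indices((h, w))
    radius = np.hypot(x - w / 2, y - h / 2).astype(int)
    # Average the power within rings of equal spatial frequency
    radial_power = (np.bincount(radius.ravel(), weights=power.ravel())
                    / np.bincount(radius.ravel()))
    freqs = np.arange(1, min(h, w) // 2)   # skip the DC component
    slope, _ = np.polyfit(np.log(freqs), np.log(radial_power[freqs]), 1)
    return slope
```

Comparing the slope of a painting, a photograph and a patch of random noise gives a quick sense of how 'natural' each image's contrast statistics are.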
The role of personality
One quick aside: interestingly, while most people prefer balanced, complex, symmetrical and smooth imagery, some individuals prefer the opposite or are indifferent to one or more of these features.
Interestingly, these low-level feature preferences have correlates in music, and individuals likewise show stable patterns of preference for features in music. Overall, people prefer music that is complex, smooth, asymmetrical and balanced. For individuals, liking a particular feature in visuals doesn’t seem to predict that you’ll like it in music. The only exception is contour: those who like smooth melodies also like curved imagery, and vice versa for jagged equivalents. (Clemente, 2021)
Other, less obviously familiar features have also been used in Neuro Design AI analyses that predict image appeal - features such as the distribution of colour edges, image ‘entropy’ and the so-called scale invariance or fractality of an image.
Case Study of Using Low-Level Features: Logos
Commercial logos have to be designed with great intelligence. They need to work quickly – at a glance – widely, often across the world and across different cultures, and across time, as investment in a logo is usually intended to last over a long time. To do this, they must tap into universal principles of good design. The same can be true for other types of graphical icons. For example, icons intended to represent qualities of foods on packaging, such as that the product is gluten-free, vegan, or contains bio-active ingredients.
Logos typically need to be simple. This means they need to communicate with a great economy. In a way this makes them an interesting case study for computational aesthetics, as they represent design at its most simple yet hard-working. They have minimal elements to study. For example, they don’t typically have fine details in them.
In one study, researchers took a range of sixty existing corporate logos, rendered in black-and-white, and developed computer models for measuring several features of the logos that had a high correlation with how a group of human volunteers also rated each logo on those features. They included:
Overall aesthetic value: how attractive the logo appears at first sight.
Balance: similar to the balance of a physical object, if you imagine the shape of the logo has its own ‘weight’, how centrally located or otherwise balanced is that weight around the image's centre of gravity? This can also be thought of as similar to the symmetry of a shape around horizontal, vertical and diagonal axes.
Contrast: the amount of difference in values such as light/dark, texture, colour and shapes. This included accounting for whether shapes had curved or sharp corners. Research has shown that people tend to look more at the edges of shapes, and particularly where edges change direction rapidly – e.g. on a sharp point – perhaps as an evolutionary adaptation to avoid potentially harmful sharp edges.
Harmony: How well the different shape elements in the logo feel like they belong or fit together.
Sixty human volunteers rated each logo on each of these characteristics using a 5-point rating scale, in order to validate the computer model. Then, using the computer readings of balance, contrast and harmony, the researchers developed a model that was very accurate in predicting human ratings of the aesthetics of the logos. Harmony was a more important element in the prediction than balance or contrast.
This form of analysis shows the potential for creating a quick feedback system for logo designers that could give them real-time guidance on which aspects of their design to improve. It also provides a general guideline for anyone creating icons: the concept of harmony is particularly important. (See Zhang et al, 2017)
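As a flavour of what a computational 'balance' measure can look like, here is a deliberately crude sketch - far simpler than the model in Zhang et al. (2017), and not their implementation - that just checks how centred a logo's visual 'weight' is:

```python
import numpy as np

def balance_score(mask):
    """Rough 'balance' measure for a black-and-white logo.
    `mask` is a 2-D array where 1 = ink and 0 = background.
    1.0 means the centre of visual weight coincides with the image centre;
    lower values mean the weight sits off to one side."""
    h, w = mask.shape
    ys, xs = np.nonzero(mask)
    centre_of_mass = np.array([ys.mean(), xs.mean()])
    image_centre = np.array([(h - 1) / 2, (w - 1) / 2])
    offset = np.linalg.norm(centre_of_mass - image_centre)
    # Normalise by half the image diagonal
    return 1.0 - offset / (np.hypot(h, w) / 2)
```

A fuller model would also score balance around the horizontal, vertical and diagonal axes, and combine it with contrast and harmony measures, as in the study described above.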
Another study looked at the balance, complexity and repetition (a similar quality to the ‘harmony’ of the previous study) and found that, across a sample of 26,000 logos, they tended to have low complexity and high balance. (Liao and Chen, 2014)
When in a competitive display, such as an online app store, other factors can influence the success of a logo. For example, one study looked at a range of app icons in the Google Play store, and compared them to data – within different categories of app – for the popularity of each app. They found that almost 40% of the variation in the popularity of the app could be predicted by the visual saliency (higher is better) and complexity (lower is better) of its icon. Another study confirmed that users are more willing to interact with an app if they find its icon aesthetically appealing. (Jylhä and Hamari, 2019)
Complexity and entropy
A third key concept in the Neuro Design AI toolkit for measuring attractiveness is that of measuring complexity and entropy in images. Researchers have found that we tend to like images that have a certain amount of complexity, as long as it is ordered and not simply random. (Mondol and Brown, 2021)
One way of pulling apart the difference between a complex image with underlying patterns and one that is just random detail is to borrow concepts originally developed to help engineers transmit information efficiently.
Much of this work (and of computer vision in general) grew out of a field called information theory. As telecommunication networks and early computing systems evolved in the mid-20th century, a significant challenge emerged: transmitting data effectively over channels plagued by noise and interference. This need for efficient communication became a central concern for engineers and scientists of the era. The turning point came in 1948 with a paper by Claude Shannon, an American mathematician and electrical engineer often dubbed the "father of information theory." In his seminal work, "A Mathematical Theory of Communication," Shannon introduced entropy as a tool to quantify the amount of randomness, or unpredictability, in signals - the more predictable a signal, the more it can be compressed. His insights provided not just theoretical clarity but also practical methods for compressing data and ensuring its reliable transmission, even amidst noise. Through Shannon's contributions, information theory crystallised as a vital discipline, shaping the trajectory of modern digital communication.
One concept in information theory is called Kolmogorov complexity (named after the Soviet mathematician Andrey Kolmogorov). Kolmogorov complexity is essentially a measure of how ‘compressible’ a set of information - such as an image - is. In other words what is the smallest amount of information that you would need in order to reproduce the image. The more that an image describes an underlying pattern or patterns, the easier it is to compress. Conversely, the more random or chaotic the image is (its entropy measure), the harder it is to compress.
It turns out that these concepts are helpful in understanding our own reactions to mentally processing images. If an image contains patterns, this means that they are compressible: learn the pattern and you can process the image and remember it more easily. Conversely, if an image simply contains a lot of random information - noise or ‘entropy’ - then it's making our brain work hard to process something that might have little or no meaning.
So, along with self-similarity, the complexity/entropy scores for an image can help tell us how much we will like it due to it being easier for our brains to process.
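A quick way to build intuition for this is to use an off-the-shelf compression algorithm as a stand-in for compressibility. This is only an illustrative proxy (true Kolmogorov complexity cannot be computed exactly, and researchers use more principled measures), but it captures the idea:

```python
import zlib
import numpy as np

def compression_complexity(gray_uint8):
    """Crude complexity proxy: how well a greyscale image (uint8 array)
    compresses with a general-purpose algorithm. Values near 0 mean highly
    redundant/patterned; values near 1 mean close to random noise."""
    raw = gray_uint8.tobytes()
    compressed = zlib.compress(raw, level=9)
    return len(compressed) / len(raw)

# A flat grey square compresses almost completely; pure noise barely at all.
flat = np.full((128, 128), 128, dtype=np.uint8)
noise = np.random.randint(0, 256, (128, 128), dtype=np.uint8)
print(compression_complexity(flat), compression_complexity(noise))
```

Images that people tend to like - patterned but not monotonous - would typically land somewhere between these two extremes.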
Case Study on a Higher Level Visual Feature: Calculating web aesthetics Using Image Complexity
There are now billions of web pages. Users take for granted that a page should functionally work well, and decisions on whether to linger on a new page that one has clicked onto are typically made within several seconds. This heightens the importance of a webpage’s aesthetics in its ability to hold users’ attention.
Research has shown a strong relationship between users’ own ratings of how complex a webpage looks and their sense of how aesthetically pleasing it is. (Schmidt and Wolff, 2018)
Researchers have found that measures of entropy in our eye movements over an image can predict its aesthetic appeal. Using eye-tracking cameras, they measured participants' patterns of eye movements while they briefly (for three seconds) viewed a sequence of forty web pages. This may seem like a brief view of each page, but previous research has shown that we form rapid opinions on the attractiveness of webpages and, indeed, decide within seconds whether to remain on one (Liu et al, 2010). Within these three seconds viewers' brains had to rapidly decide where to move and focus their eyes. The researchers looked at one of the main outputs from the eye-tracking data: the overall distribution of fixations, within and across viewers, on the image. This is often visualised as a heatmap: a coloured overlay that shows areas receiving more attention in ‘hotter’ colours and areas receiving less attention in ‘cooler’ colours. (The duration of individual fixations can vary a lot, from less than 0.1 seconds to well over a second; Irwin, 1996.)
After the eye-tracking recordings of the pages, each participant gave an aesthetic rating of each. They found a weak correlation between the average number of fixations on the page and its attractiveness: the more attractive it was, the more fixations or more eye-movement occurred on the page. They then measured the degree of Shannon entropy contained in the distribution of fixations. This basically measures the degree of focusing of attention within particular areas on the image: the more agreement on these areas, the lower the entropy and the higher the order. They found that this measure was significantly correlated with the aesthetic ratings of the image. In other words: the greater the focus of attention on particular areas on the image, the more attractive it was deemed.?
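Here is a sketch of how a fixation-entropy measure like the one just described might be computed from eye-tracking data. The grid size and other details are illustrative assumptions, not the exact metric used by Gu et al. (2021):

```python
import numpy as np

def fixation_entropy(fixations, image_size, grid=(8, 8)):
    """Shannon entropy of where fixations land on a page.
    `fixations` is a list of (x, y) points pooled across viewers.
    Low entropy = attention converges on a few regions;
    high entropy = attention is scattered across the page."""
    w, h = image_size
    counts, _, _ = np.histogram2d(
        [y for _, y in fixations], [x for x, _ in fixations],
        bins=grid, range=[[0, h], [0, w]])
    p = counts.ravel() / counts.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Example: fixations clustered on one region give low entropy,
# fixations spread evenly over the page give high entropy.
clustered = [(100 + np.random.randn() * 10, 80 + np.random.randn() * 10)
             for _ in range(200)]
print(fixation_entropy(clustered, image_size=(1280, 800)))
```

In the study described above, lower fixation entropy (more agreement on where to look) went along with higher aesthetic ratings.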
In practice, what does this mean? If you feel uncertain where to look on a webpage - either because no areas of interest stand out to you, or because there are too many of seemingly equal importance dispersed across the page - it can feel effortful and unpleasant to look at. By contrast, a webpage where it's clear to everyone where to look feels easier, less effortful and less distracting to view.
One of the limitations of these metrics is that you could get a high measure of order from the eye-tracking data simply by displaying a very simple image that was also boring. The order wouldn’t then necessarily reflect its attractiveness. So the researchers also devised a way of calculating the metric that at least partially neutralised this effect. (Gu et al 2021).
Entropy measures can also be thought of as an indicator of where an image sits on a spectrum from chaos to monotony. Visual chaos feels overwhelming and uncomfortable, while monotony feels boring. Somewhere in between these extremes is a position where a visual image delivers meaningful information: a pattern. As our brains are energy hungry, and we evolved to avoid depleting too much of our energy, images that deliver information-rich content naturally appeal to us, much as we are drawn to calorie-rich foods. This isn’t a new idea. The 18th century Dutch writer Francois Hemsterhuis defined beauty as “that which gives the greatest number of ideas in the shortest space of time.”
Surveys of people shown images with varying levels of entropy bear this out: a preference for a mixed position between chaos and monotonous order. It's also been observed that this balance of order and disorder mirrors that found in nature, which could provide another reason why people are so favourably attuned to it. (Lakhal et al 2020)
Image entropy measures have also been shown to be meaningful in the pattern of eye movements that viewers make when seeing an image.
Overall, research shows that a combination of low-level features like complexity, entropy, symmetry, and self-similarity as well as higher-level features like familiarity play an important role in determining aesthetic appeal and attractiveness of images. A key thread connecting many low-level features is that they make images easier for our brains to rapidly process, which elicits a mildly pleasurable feeling. Measures of visual elements that enhance efficient processing provide valuable insights designers can use to improve aesthetics. While perception of attractiveness relies partly on universal principles, individual differences also shape aesthetic responses.
Next we’ll look at how Neuro Design AI algorithms can predict what will grab our attention.
Predicting Attention
When predicting which parts of an image will most likely grab our attention, Neuro Design AI researchers mostly refer to the concept of Visual Saliency.
Visual saliency models aim to predict human eye fixations on images by identifying visually salient regions: the areas of an image that are most likely to grab attention, mostly by being the areas that contain the strongest contrasts. They work by extracting different low-level visual features from the image and using those features to assign saliency values to each location. (Itti et al, 1998)
A common approach is to analyze an image across different scales and compute local contrast features like intensity, color, and orientation of patterns at each scale. The idea is that regions that stand out from their surroundings in terms of these basic features will attract attention. The contrast computations are done using center-surround operations, where a pixel is compared to its neighboring pixels within some radius.
To analyze images at different scales, the models create something called Gaussian image pyramids. This is a method of making a series of smaller and blurrier images from an original picture.
Here's how it works: the original image is smoothed with a Gaussian blur and then shrunk (typically to half its size), and this blur-and-shrink step is repeated to build a stack of progressively smaller, blurrier versions of the picture.
This pyramid analysis helps in analyzing images. It shows both tiny details and bigger, broader parts. For instance, it can help spot both a tiny bright dot and a big dark area in a picture. So, Gaussian pyramids are a simple way to see images at different sizes and levels of detail.
At each scale, basic low-level features like intensity, color, and orientation are captured. For example, intensity contrast can be measured by finding the difference between a pixel's intensity and the average intensity within a surrounding area, at each different ‘scale’ or level of the pyramid.
Similarly, color contrast is measured by comparing a pixel's color to the average color of the neighbors. Orientation contrast looks at differences in edge orientations.
The key is that these are local computations, comparing a pixel to its close spatial neighbors. The radius of this spatial neighborhood is proportional to the scale of each different version of the image.
So at a fine scale, the center-surround comparison happens over a small region. But at a very coarse scale, the neighborhood encompasses more of the image.
This multi-scale analysis allows the model to represent contrast at different granularities and distances. Nearby fine differences or faraway gross differences both matter.
The idea is that pixels (or regions) that stand out from their immediate surroundings across scales will draw attention. The local scale-dependent contrast computations aim to capture such visual conspicuity.
The contrast maps from the different scales are then combined into an overall conspicuity map for that feature.
So in summary, multi-scale analysis and local center-surround contrast computations are key ingredients for extracting low-level saliency features from images.
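Below is a minimal, single-feature sketch of this pipeline in the spirit of Itti et al. (1998): an intensity-only Gaussian pyramid with centre-surround differences. Real saliency models add colour and orientation channels, normalisation and more careful scale handling; the number of levels and the blur width here are illustrative choices.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def intensity_saliency(gray, levels=4):
    """Toy intensity saliency map: build a Gaussian pyramid, take
    centre-surround differences between the full-resolution image and each
    coarser level, and sum them into one conspicuity map."""
    pyramid = [gray.astype(float)]
    for _ in range(levels - 1):
        blurred = gaussian_filter(pyramid[-1], sigma=1.0)
        pyramid.append(blurred[::2, ::2])      # blur, then downsample by 2

    saliency = np.zeros_like(pyramid[0])
    for coarse in pyramid[1:]:
        # Upsample the coarse 'surround' back to full size and compare it
        # with the fine 'centre' image; big differences = conspicuous regions.
        factors = (pyramid[0].shape[0] / coarse.shape[0],
                   pyramid[0].shape[1] / coarse.shape[1])
        surround = zoom(coarse, factors, order=1)
        saliency += np.abs(pyramid[0] - surround)
    return saliency / saliency.max()
```

Running this on a photo produces a rough heat map in which regions that differ strongly from their surroundings, at any of the scales, light up.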
More recent models have incorporated additional features inspired by neuroscience research on the primate visual system. These include visual cues like local symmetry, visual boundaries, and global rarity of features. Machine learning techniques are also often used to learn feature weights and saliency computations from human eye tracking data.
The early saliency models focused mostly on basic image attributes like color, intensity, and orientation. But scientists have learned more about how the primate visual system works.
This neuroscience research has inspired new features to add to attention models, to make them work more like the human brain.
For example, we tend to notice symmetric patterns in images. So symmetry computations are included to highlight locally symmetrical regions.
Visual boundaries like edges and contours also stand out to us. Features like oriented gradients help models detect these boundaries.
Another new feature is "global rarity." Our eyes get drawn to parts of an image that are unique or infrequent across the whole image. Like a red flower in a green field. Measuring this rarity helps predict attention.
In addition, rather than manually designing how to combine features, machine learning is used. Models are trained on human eye tracking data to learn the best way to weigh and integrate features.
This allows discovering how to predict human visual attention by example, instead of just logical rules. The models get better at mimicking our actual attention as a result.
So in summary, new kinds of features based on neuroscience research are making saliency models smarter. And machine learning helps the models learn directly from human vision patterns.
After feature extraction, the different conspicuity maps for each feature are combined to create a master saliency map. This can be done through linear or nonlinear integration. Finally, this master map is thresholded to create binary maps highlighting the salient regions.
The full pipeline allows models to take an image as input and output a heat map showing the predicted visual attentional landscape. Performance is evaluated by comparing against human fixation maps collected through eye tracking experiments. These models have applications in areas like image compression, content delivery, and ad placement.
In summary, computational saliency models analyze images across scales to extract basic features like contrast and complex cues like symmetry. The features are integrated to predict attentional allocation. The models aim to mimic and explain aspects of low-level, bottom-up visual attention in humans viewing natural images.
Predicting Memorability
What causes certain images to be unforgettable while others fade away?
Neuro Design AI research reveals that memory is strongly shaped by image content, particularly the people, objects, and text depicted. Computational models can now predict how memorable photos will be for humans.
To quantify memorability, researchers use memory games where people view streams of photos and press a button whenever they see a repeat (Isola et al., 2011; Isola et al., 2013; Khosla et al., 2015). The hit rate for each image measures how often people correctly identify repeats, indicating an image's memorability. Intriguingly, different groups of observers are remarkably consistent in which photos they remember or forget. So memorability is largely intrinsic to images themselves, rather than the individual viewer.
What specific elements of photographs drive this consistency in memory? Studies have systematically investigated various visual factors.
Looking at pixel-level features, memorability shows little dependence on simple, low-level properties like color, contrast, brightness, or aesthetics. Images can be visually striking yet forgettable. Beautiful sunsets often slip away while quirky selfies stick.
However, the higher-level, semantic contents of images—the identifiable people, objects, and scenes—exert a large influence. To measure semantic content, researchers use techniques like object detection and scene categorization. These can identify objects like faces and people in images, which are particularly memorable (Bylinskii et al, 2015). Specific object classes like animals and vehicles also stand out (Dubey et al, 2015; Khosla et al., 2015). Researchers detect these objects using trained convolutional neural networks for object recognition like VGGNet (Simonyan & Zisserman, 2014). Images with text grab attention, helping us remember them. Text can be detected using optical character recognition algorithms.
The composition and layout of images also impacts memorability. To measure image layout computationally, researchers have used visual saliency algorithms that predict which regions draw visual attention. Images centralized on a single prominent object are more memorable than cluttered scenes. Photos with a close-up, detailed view of an item tend to stick better than tiny objects in a sprawling landscape (Dubey et al., 2015). Layout analysis has found that images containing many small disconnected regions are less memorable. Emotionally charged images, both positive and negative, are also more intrinsically memorable. Facial emotion recognition algorithms can detect emotional expression in images.
On the flip side, images of homogeneous textures, peaceful nature scenes, and mundane snapshots fare poorly in memory tests. When content lacks central objects and eye-grabbing details, images wash away in memory.
Predicting Memorability with Computational Models
Given these consistent effects of semantic image content on memorability, can algorithms automatically predict how memorable photos will be? Researchers have developed specialized models that analyze image features to estimate memorability scores.
Earlier work extracted simple features describing blob shapes and textures, such as GIST scene descriptors (Oliva & Torralba, 2001) and SIFT features (Lowe, 2004). These features were input to machine learning models called support vector regression machines to predict memorability scores. Performance was mediocre, suggesting that low-level features alone cannot fully explain memorability.
More recent methods use convolutional neural networks (CNNs). When trained on tens of thousands of images with memorability scores, CNNs can analyze image contents from simple edges to complex objects and scenes. This allows predicting memorability at near human consistency.
The best models can now predict memorability with an accuracy approaching the consistency between different groups of people.
For example, on a dataset where human consistency was 0.68 (meaning two groups of people agreed 68% of the time on high and low memorability images), the best deep learning model achieved a predictive accuracy of 0.67. It essentially matches humans in its ability to classify images as memorable or forgettable based solely on visual patterns, without any need to run psychological experiments. (Bylinskii et al, 2022)
One top-performing model is MemNet (Khosla et al., 2015), a CNN trained to recognize thousands of object and scene categories. Adding an extra layer to directly predict memorability scores achieves high accuracy. Examining MemNet’s learned features shows that it associates photos containing people, animals, and text with high memorability. On the contrary, images of natural landscapes and cluttered textures produce low memorability predictions.
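As a hedged sketch of what 'adding an extra layer to predict memorability' can look like in practice, the snippet below bolts a single-score regression head onto a generic pretrained image-recognition network. MemNet itself is built on a different backbone (an AlexNet-style network trained on objects and scenes); a ResNet is used here purely for illustration, and the training loop is minimal.

```python
import torch
import torch.nn as nn
from torchvision import models

# Pretrained recogniser, with its classification layer swapped for a
# single-output regression head that predicts a memorability score.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = nn.Linear(backbone.fc.in_features, 1)

optimiser = torch.optim.Adam(backbone.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

def train_step(images, memorability_scores):
    """images: (N, 3, 224, 224) tensor; memorability_scores: (N,) tensor of
    hit rates gathered from memory-game experiments."""
    optimiser.zero_grad()
    predictions = backbone(images).squeeze(1)
    loss = loss_fn(predictions, memorability_scores)
    loss.backward()
    optimiser.step()
    return loss.item()
```

The design choice mirrors the description above: the pretrained layers already encode objects and scenes, so only a small amount of memorability-labelled data is needed to map those representations onto a score.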
Looking inside CNNs reveals how their hierarchical processing mirrors human vision in building up from pixels to objects to full scenes when determining image memorability. And training CNNs on ever larger datasets produces features that capture more of the complexity in the visual world, improving predictions.
Applications for Memorability Algorithms
What can we do by automatically estimating how memorable images will be??
There are many practical uses, ranging from marketing and advertising - choosing the images most likely to stick with an audience - to education.
An exciting direction is tailoring algorithms to individual interests and experiences, as current models predict group-level memorability. The ideal system would know which specific images you will find memorable based on the contents you engage with most.
By revealing the features that stick in memory, computational models teach us about the remarkable capabilities of human vision. Advancements in modelling memorability will continue to provide new insights into subjective perception and how our minds filter visual experiences into lasting memories.
As an aside, interestingly, one research paper (Guo and Bainbridge, 2022) has shown that children don’t seem to develop the adult patterns of remembering images until the age of four. The 3-year-olds did show consistent memory for certain scenes within their age group. Specifically, they tended to remember scenes depicting places very familiar to young children, like playgrounds, bounce houses, and libraries.
This suggests that 3-year-olds may rely more on the familiarity of a scene when encoding it into memory. So their memory strategies seem to be different from adults and older children who rely more on intrinsic visual features that make images memorable.
The paper hypothesizes that 3-year-olds have a sense of which scenes are familiar to them, so they consistently remember those types of familiar places. But by age 4, children transition to using more adult-like visual memory strategies, so their memory patterns match adults' predictions.
The quest to understand why some images stick in our minds while others fade is gaining clarity, thanks to AI research in neuro design. We've discovered that while the basic visual traits of an image—like color or brightness—have little impact on memorability, it's the higher-level details, such as the presence of people, objects, and text, that make images linger in our memory. With the help of algorithms, we can now almost match human ability in predicting an image's staying power, without the need for traditional memory tests. These insights are not just academic; they have real-world applications in areas ranging from marketing to education, as we learn to harness the elements that command our attention and foster lasting impressions.
The Potential and Limits of Neuro Design AI
While it might seem that our preference for images is very personal and subjective, researchers have found that there are predictable and measurable qualities in images that can tell us how attractive, emotionally engaging and even memorable they are - that, ultimately, maths can be used to predict our reactions to images. Equally, people may assume that the subject of an image is the main driver of its appeal. While this is true to some extent, it's not just the content but the way it's rendered that makes images appealing.
The ability of computer algorithms to uncover patterns in our aesthetic preferences, attention and ability to remember images will undoubtedly have myriad applications in the years ahead. As the number of digital images grows, so will the demand for ways to automatically identify and improve them - whether they are images of products, logos, web pages or icons. Neuro Design AI tools hold the promise of a broad range of applications.
Where the field has generalised learnings, these can help supplement guidance from designers and market research. For example, market research image testing can be expensive. In many cases the design process can be fully or partially guided by algorithms. Equally, the demand for imagery may outstrip the available budgets for the creation of new designs by human designers and the generation of designs by computer or the adaptation of existing designs by computer for new contexts may be adopted as a solution. One example is the possibility of more webpage designs being dynamically altered based on the personal tastes and interests of the user. Having systems that could automatically keep such designs aesthetically pleasing could be valuable.
However, the main limit to computational aesthetics is how data-hungry it is. In order to find robust, widely applicable patterns in human visual preferences it can require many thousands, if not millions, of examples of human ratings of images. This requires that platforms with rating options, such as social media photo-sharing sites, remain popular and that the ratings remain publicly accessible. Alternatively, access to this image-preference data may increasingly become the proprietary property of the platforms themselves.
If you don’t have enough data then ‘overfitting’ can become a problem. This is the phenomenon of the algorithm’s learnings being too specific to the vagaries of the particular data-set of images it was trained on.
Another limit is that its powers are, by necessity, statistical. That is to say, they look for group trends. There may be preferences held by smaller groups or individuals that are not visible to these algorithms. Equally, there may be individuals whose tastes lean towards the imperfect - such as the Japanese aesthetic of wabi-sabi, where imperfections are valued for their own sake - or who simply have less mainstream tastes.
Finally, the more powerful neural network methods are, the more opaque they often are. They may be able to find patterns that we don’t understand, let alone know how to reproduce. Without an understanding of the patterns of preference that the CNNs are discovering, their applicability may be limited. This problem is known as the ‘black box’ dilemma.
As we've seen, neuro design AI offers powerful new capabilities for quantifying and even predicting human aesthetic responses to images. While the math behind the algorithms may seem abstract, the insights they uncover about why we find certain images visually pleasing or memorable feel intuitive. This nascent field will continue advancing, bringing neuroscience and computer vision together to reveal the universal patterns underlying our subjective perceptions. However, as with any modelling of complex human cognition, neuro design AI has inherent limitations. Individual differences, context, and the sheer intricacy of perception imply ceilings to its predictive accuracy. Still, even approximate tools provide valuable practical guidance, and this young science promises to progress, uncovering ever more about the nuanced aesthetics of the human mind.
If you found this article interesting, you may enjoy my book 'Neuro Design' (UK, US, Can, Brazil; please note that the Brazilian translated edition is called 'Neuromarketing').
Appendix: Some Additional Examples of Image Analysis Methods
As well as Neuro Design AI techniques such as convolutional neural networks, there are a range of other techniques that software uses to analyse imagery. Here are just three examples: Fourier analysis, quadtree decomposition and histogram analysis.
Fourier transformations: One of the tools that computers use to understand images is transforming them from a ‘spatial’ language into the language of waves. This is critical to the digital images that surround us in our daily lives, but also to the detection of patterns such as scale invariance which are often not consciously obvious to us.
In the real world objects around us aren’t digital and discrete but analogue and continuous.
For example, consider an oil painting. The patterns on the painting are continuous streaks of paint that were formed from the continuous movement of a brush.
So how can we represent them digitally, both efficiently and without losing important information? One answer might be to sample and record every single point down to the highest possible resolution and store the colour information for each. However, it turns out that this would be highly computationally intensive, and it is not how it's done.
The solution came from two mathematicians: one born in the 18th century and one working in the 20th.
First, the French mathematician Jean-Baptiste Joseph Fourier (1768-1830). Fourier showed that not only music and voice can be described as waves - a concept most of us are familiar with - but so can images. The idea is not as intuitive as with sound, and it doesn’t refer to the light waves themselves that travel into our eyes. A sound wave is typically visualised as a single point tracing a sine curve up and down over time, with many such waves of varying amplitudes and frequencies summing to form the wave that describes the sound. Image ‘waves’, by contrast, can be thought of as two-dimensional sheets with ripples across them. These sheets are stacked together, at different angles, and the waves (or ripples) add up or cancel each other out to form the resulting picture. Low frequencies capture the overall pattern or shape of the image; high frequencies capture the fine detail.
The Russian mathematician Vladimir Kotelnikov then showed that you only need to sample the signal at discrete points along the way - two points for every cycle of the fastest wave are enough (representing its peak and trough) - thus minimising the amount of information that has to be stored. In an image these samples are the pixels, and they are not, contrary to common misconception, little squares. Rather, they are just points: a single number denoting a shade of grey, or three numbers denoting a colour. Each of these points is then smudged or spread out again to reconstruct a continuous image before being sent to a screen or printer.
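To give a feel for the sampling idea in one dimension, here is a small Python/NumPy sketch (the 3 Hz wave, the 8 samples-per-second rate and the 4-second duration are arbitrary choices of mine): a sine wave is stored as a handful of point samples and then ‘smeared’ back into a continuous curve, mirroring what happens when pixels are turned back into a smooth image.

```python
import numpy as np

# A 3 Hz sine wave "recorded" for 4 seconds at 8 samples per second --
# comfortably above the minimum of 2 x 3 = 6 samples per second.
freq, sample_rate, duration = 3.0, 8.0, 4.0
sample_times = np.arange(0, duration, 1.0 / sample_rate)
samples = np.sin(2 * np.pi * freq * sample_times)

# Rebuild a continuous curve from the point samples by "smearing" each
# sample with a sinc function (Whittaker-Shannon interpolation).
fine_times = np.linspace(0, duration, 4000)
reconstructed = np.zeros_like(fine_times)
for t_k, s_k in zip(sample_times, samples):
    reconstructed += s_k * np.sinc((fine_times - t_k) * sample_rate)

original = np.sin(2 * np.pi * freq * fine_times)
middle = (fine_times > 1.0) & (fine_times < 3.0)  # ignore edge effects
print("max error away from the edges:",
      np.max(np.abs(original - reconstructed)[middle]))
```

Away from the ends of the recording, the rebuilt curve tracks the original wave closely even though only eight points per second were kept.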
The magic of this approach is that the wave patterns that make up any image can be recovered by a mathematical procedure known as the fast Fourier transform. Once an image is expressed as a set of frequencies, it also becomes possible to filter out noise, compress the image, or sharpen it by suppressing or boosting particular frequency components.
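As a toy illustration (my own sketch in Python/NumPy, not a production filtering method), the code below builds a small synthetic image from a smooth, low-frequency pattern plus fine-grained noise, moves it into the frequency domain with the fast Fourier transform, zeroes everything outside a small low-frequency region, and transforms back.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "image": a smooth, low-frequency pattern plus fine-grained noise.
x, y = np.meshgrid(np.arange(128) / 128, np.arange(128) / 128)
pattern = np.sin(2 * np.pi * 2 * x) * np.cos(2 * np.pi * 2 * y)
image = pattern + 0.2 * rng.normal(size=(128, 128))

# Forward 2D fast Fourier transform: spatial domain -> frequency domain,
# shifted so the lowest frequencies sit at the centre of the array.
spectrum = np.fft.fftshift(np.fft.fft2(image))

# Circular low-pass mask: keep only frequencies near the centre.
rows, cols = image.shape
r, c = np.ogrid[:rows, :cols]
distance = np.sqrt((r - rows / 2) ** 2 + (c - cols / 2) ** 2)
low_pass = distance < 10  # cutoff radius is an arbitrary illustrative choice

# Apply the mask and transform back to a (real-valued) image.
filtered = np.real(np.fft.ifft2(np.fft.ifftshift(spectrum * low_pass)))

print("noise level before:", np.std(image - pattern))
print("noise level after: ", np.std(filtered - pattern))
```

Because the noise lives mostly in the high frequencies and the underlying pattern in the low ones, the filtered image retains the pattern while losing most of the noise; boosting rather than suppressing the high frequencies would instead sharpen fine detail.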
By the way - forgive me, this is a small detour - one of the things we commonly misunderstand about computer images is the nature of pixels. We typically think of a pixel as a little square; that is how they are depicted in computer-speak and pop culture. For example, when you zoom in on a digital picture, each pixel appears as a larger square. But that isn't showing you the real pixel - the display is simply repeating the pixel's single value across a square patch of the screen. The best way to think of a digital image is as points on a grid; software blends them into smooth pictures, but the pixels themselves remain points, not squares. (For more on this, see https://alvyray.com/Memos/CG/Microsoft/6_pixel.pdf and Alvy Ray Smith’s excellent book ‘A Biography of the Pixel’: https://alvyray.com/DigitalLight/default.htm )
Quadtree Decomposition serves as a powerful tool for isolating regions of interest and simplifying complex images. By breaking down an image into smaller, more manageable sections, analysts can pinpoint areas with high variability or detail, such as edges or textures. This is especially useful in applications like satellite imagery, where vast areas might be uniform (like oceans or deserts), but small regions could contain critical details (like ships or vegetation). Additionally, quadtrees can aid in reducing noise in images. Since the decomposition groups similar pixels together, it becomes easier to identify and filter out anomalies or outliers. Furthermore, in the realm of object detection and recognition, quadtrees can help in narrowing down search areas, making the process faster and more accurate. By understanding the hierarchical structure of an image through quadtrees, analysts can make more informed decisions about where to focus their attention and computational resources.
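A minimal Python/NumPy sketch of the basic mechanism is shown below (an illustrative toy of my own that assumes a square image whose side length is a power of two): a block is split into four quadrants whenever its pixel variance exceeds a threshold, so detailed regions end up covered by many small blocks while flat regions are left as a few large ones.

```python
import numpy as np

def quadtree(block, threshold, min_size=8):
    """Recursively split a square image block into quadrants until each
    block's pixel variance falls below `threshold` (or the block is tiny).
    Returns a list of (top, left, size) leaf blocks."""
    def split(region, top, left):
        size = region.shape[0]
        if size <= min_size or region.var() <= threshold:
            return [(top, left, size)]
        half = size // 2
        leaves = []
        for dr in (0, half):
            for dc in (0, half):
                leaves += split(region[dr:dr + half, dc:dc + half],
                                top + dr, left + dc)
        return leaves
    return split(block, 0, 0)

# Toy example: a mostly flat 64x64 image with one small detailed patch.
rng = np.random.default_rng(2)
image = np.zeros((64, 64))
image[40:56, 40:56] = rng.normal(size=(16, 16))  # the "interesting" region

leaves = quadtree(image, threshold=0.01)
print(len(leaves), "leaf blocks; smallest block size:",
      min(size for _, _, size in leaves))
```

In the toy image, the flat background collapses into a handful of large leaf blocks while the noisy patch is subdivided down to the minimum block size - precisely the behaviour that lets quadtrees concentrate attention and computation on the interesting regions.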
Image Histogram Analysis is a technique used in computer vision and image processing to understand the distribution and frequency of pixel values in an image. Essentially, a histogram is a graphical representation that shows how often each pixel intensity value appears in the image. For a grayscale image, the histogram will display the frequency of pixels ranging from black (0 intensity) to white (255 intensity). For color images, separate histograms can be created for each color channel (Red, Green, and Blue). Analyzing these histograms can provide insights into the contrast, brightness, and overall tonal distribution of an image. For instance, a histogram skewed to the left indicates a predominance of darker pixels, suggesting the image might be underexposed. Conversely, one skewed to the right might indicate overexposure. Histogram analysis is fundamental in many image processing tasks, including image enhancement, thresholding, and equalization.
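As a minimal sketch of this kind of analysis in Python/NumPy (the exposure_hint helper, the quarter-of-the-range cut-offs and the 0.6 thresholds are arbitrary choices of mine for illustration), the snippet below bins an 8-bit grayscale image into a 256-bin histogram and reports a rough exposure verdict based on where the pixel mass sits.

```python
import numpy as np

def exposure_hint(gray_image):
    """Compute a 256-bin histogram of an 8-bit grayscale image and give a
    rough under/over-exposure hint from where the histogram mass sits.
    Thresholds are arbitrary illustrative choices."""
    hist, _ = np.histogram(gray_image, bins=256, range=(0, 256))
    total = hist.sum()
    dark_share = hist[:64].sum() / total     # pixels in the darkest quarter
    bright_share = hist[192:].sum() / total  # pixels in the brightest quarter
    if dark_share > 0.6:
        return "likely underexposed"
    if bright_share > 0.6:
        return "likely overexposed"
    return "tonal distribution looks balanced"

# Toy example: a synthetic dark image drawn from the low end of the range.
rng = np.random.default_rng(3)
dark = rng.integers(0, 60, size=(100, 100), dtype=np.uint8)
print(exposure_hint(dark))  # -> likely underexposed
```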
References
Appleton, J., 1996. The experience of landscape. Chichester: Wiley, pp.66-67.
Balietti, S., 2020. The human quest for discovering mathematical beauty in the arts. Proceedings of the National Academy of Sciences, 117(44), pp.27073-27075.
Bar, M. and Neta, M., 2006. Humans prefer curved visual objects. Psychological Science, 17(8), pp.645-648.
Braun, J., Amirshahi, S.A., Denzler, J., and Redies, C., 2013. Statistical image properties of print advertisements, visual artworks, and images of architecture. Frontiers in Psychology, 4, p.808.
Bylinskii, Z., Goetschalckx, L., Newman, A., and Oliva, A., 2022. Memorability: An image-computable measure of information utility. Human Perception of Visual Information: Psychological and Computational Perspectives, pp.207-239.
Clemente, A., Pearce, M.T., Skov, M., and Nadal, M., 2021. Evaluative judgment across domains: Liking balance, contour symmetry, and complexity in melodies and visual designs. Brain and Cognition, 151, p.105729.
Dubey, R., Peterson, J., Khosla, A., Yang, M.H., and Ghanem, B., 2015. What makes an object memorable?. In Proceedings of the IEEE International Conference on Computer Vision, pp.1089-1097.
Gu, Z., Jin, C., Chang, D., and Zhang, L., 2021. Predicting webpage aesthetics with heatmap entropy. Behaviour & Information Technology, 40(7), pp.676-690.
Guo, X. and Bainbridge, W.A., 2022. Children develop adult-like visual sensitivity to image memorability by the age of four. bioRxiv, pp.2022-12.
Hoffman, D., 2019. The case against reality: Why evolution hid the truth from our eyes. WW Norton & Company.
Hübner, R. and Fillinger, M.G., 2019. Perceptual balance, stability, and aesthetic appreciation: Their relations depend on the picture type. i-Perception, 10(3), p.2041669519856040.
Ibarra, F.F., Kardan, O., Hunter, M.R., Kotabe, H.P., Meyer, F.A., and Berman, M.G., 2017. Image feature types and their predictions of aesthetic preference and naturalness. Frontiers in Psychology, 8, p.632.
Irwin, D.E., 1996. Integrating information across saccadic eye movements. Current Directions in Psychological Science, 5(3), pp.94-100.
Ishikawa, T., Toshima, M., and Mogi, K., 2019. How and when? Metacognition and solution timing characterize an "aha" experience of object recognition in hidden figures. Frontiers in Psychology, 10, p.1023.
Isola, P., Parikh, D., Torralba, A., and Oliva, A., 2011. Understanding the intrinsic memorability of images. Advances in Neural Information Processing Systems, 24.
Isola, P., Xiao, J., Parikh, D., Torralba, A., and Oliva, A., 2013. What makes a photograph memorable?. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(7), pp.1469-1482.
Itti, L., Koch, C., and Niebur, E., 1998. A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11), pp.1254-1259.
Jahanian, A., Vishwanathan, S.V.N., and Allebach, J.P., 2015. Learning visual balance from large-scale datasets of aesthetically highly rated images. In Human Vision and Electronic Imaging XX, Vol. 9394, p.93940Y. International Society for Optics and Photonics.
Jylhä, H. and Hamari, J., 2019. An icon that everyone wants to click: How perceived aesthetic qualities predict app icon successfulness. International Journal of Human-Computer Studies, 130, pp.73-85.
Kaplan, R. and Kaplan, S., 1989. The experience of nature: A psychological perspective. Cambridge University Press.
Khosla, A., Raju, A.S., Torralba, A., and Oliva, A., 2015. Understanding and predicting image memorability at a large scale. In Proceedings of the IEEE International Conference on Computer Vision, pp.2390-2398.
Lakhal, S., Darmon, A., Bouchaud, J.P., and Benzaquen, M., 2020. Beauty and structural complexity. Physical Review Research, 2(2), p.022058.
Lennie, P., 2003. The cost of cortical computation. Current Biology, 13(6), pp.493-497.
Liao, W.H. and Chen, P.M., 2014. Analysis of visual elements in logo design. In International Symposium on Smart Graphics, Springer, Cham, pp.73-85.
Liu, C., White, R.W., and Dumais, S., 2010. Understanding web browsing behaviors through Weibull analysis of dwell time. In Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp.379-386.
Mayer, S. and Landwehr, J.R., 2018. Quantifying visual aesthetics based on processing fluency theory: Four algorithmic measures for antecedents of aesthetic preferences. Psychology of Aesthetics, Creativity, and the Arts, 12(4), p.399.
Mondol, T. and Brown, D.G., 2021. Computational creativity and aesthetics with algorithmic information theory. Entropy, 23(12), p.1654.
Oliva, A. and Torralba, A., 2001. Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42, pp.145-175.
Reber, R., Schwarz, N., and Winkielman, P., 2004. Processing fluency and aesthetic pleasure: Is beauty in the perceiver's processing experience? Personality and Social Psychology Review, 8(4), pp.364-382.
Redies, C., Hänisch, J., Blickhan, M., and Denzler, J., 2007. Artists portray human faces with the Fourier statistics of complex natural scenes. Network: Computation in Neural Systems, 18(3), pp.235-248.
Rhodes, G., 2006. The evolutionary psychology of facial beauty. Annual Review of Psychology, 57, pp.199-226.
Schmidt, T. and Wolff, C., 2018. The influence of user interface attributes on aesthetics. i-com, 17(1), pp.41-55.
Simonyan, K. and Zisserman, A., 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
Stanischewski, S., Altmann, C.S., Brachmann, A., and Redies, C., 2020. Aesthetic perception of line patterns: Effect of edge-orientation entropy and curvilinear shape. i-Perception, 11(5), p.2041669520950749.
Thömmes, K. and Hübner, R., 2018. Instagram likes for architectural photos can be predicted by quantitative balance measures and curvature. Frontiers in Psychology, 9, p.1050.
Thömmes, K. and Hayn-Leichsenring, G., 2021. What Instagram can teach us about bird photography: The most photogenic bird and color preferences. i-Perception, 12(2), p.20416695211003585.
Tik, M., Sladky, R., Luft, C.D.B., Willinger, D., Hoffmann, A., Banissy, M.J., Bhattacharya, J., and Windischberger, C., 2018. Ultra-high-field fMRI insights on insight: Neural correlates of the Aha!-moment. Human Brain Mapping, 39(8), pp.3241-3252.
Zhang, J., Yu, J., Zhang, K., Zheng, X.S., and Zhang, J., 2017. Computational aesthetic evaluation of logos. ACM Transactions on Applied Perception (TAP), 14(3), pp.1-21.