CGarchitect Uses an AI Judge to Evaluate the 3D Awards
For the past several years, CGarchitect has been working on an AI Aesthetics Prediction Engine. We put this engine into production on the new CGarchitect website launched in early 2020, and it has proven to work exceptionally well. It's a controversial approach, but at the end of the day, it works and does much of the heavy lifting on the site to determine what we showcase on the homepage and in our gallery's 'Popular' section.
I was curious what would happen if we used this same engine to judge the 2020 CGarchitect 3D Awards nominees. The results were surprisingly good: measured against the expert industry judges, the AI's overall accuracy across the three image categories was 78%.
The AI was intended to solve a long-standing problem on the site: which images get featured on the homepage. We've always taken a rather democratic approach to this and wanted to ensure that everyone, whether an industry veteran or someone brand new to the field, would have an equal chance to have their work showcased.
In theory, this sounds great, but in practice, it does not work. Site engagement and the calibre of work submitted to the site are directly proportional to the quality of what appears on the homepage. Traditionally, there have been only a few ways to solve this: you can manually curate every single image uploaded to the site, which does not scale well, or you can wait for social engagement (likes, views and comments) to dictate what is good or bad. The latter also has its drawbacks: social feedback takes time to accumulate, and in the interim you are potentially hurting engagement.
About three years ago, I started thinking about using machine learning to determine what our industry considers aesthetically pleasing. I hypothesized that there is a pattern in what we find visually pleasing. If I could find this pattern, I could leverage an AI to push as many aesthetically pleasing images as possible to the homepage and then allow traditional social cues to take over.
I presented some of this work and research at the Vertex Conference in London in early 2020. The talk was based on reading thousands of pages of academic research in the field of neuroaesthetics: "Neuroesthetics is a relatively recent sub-discipline of empirical aesthetics. Empirical aesthetics takes a scientific approach to the study of aesthetic perceptions of art, music, or any object that can give rise to aesthetic judgments."
Much of the existing research attempts to quantify the elements of an image that we would normally expect to make it pleasing to the eye: the rule of thirds, composition, areas of contrast, and so on. That research was generally able to achieve 70-80% accuracy on photographs. However, I wanted to take a less complex and slightly more elegant approach.
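For a sense of what that older, feature-based approach looks like in practice, here is a loose sketch; the specific features are illustrative assumptions, not drawn from any particular paper:

```python
# A loose sketch of the hand-engineered approach described above, which
# our engine deliberately avoids. Feature choices here are illustrative
# assumptions, not taken from any specific paper.
import numpy as np

def handcrafted_features(gray: np.ndarray) -> dict:
    """Compute crude aesthetic proxies from a 2D grayscale image array."""
    h, w = gray.shape
    # Rule of thirds: mean intensity along the horizontal and vertical
    # third lines, a rough stand-in for visual weight near those lines.
    thirds = np.concatenate([
        gray[h // 3, :], gray[2 * h // 3, :],
        gray[:, w // 3], gray[:, 2 * w // 3],
    ]).mean()
    # Contrast: spread of pixel intensities across the whole frame.
    contrast = gray.std()
    # A downstream regressor would be trained to map features like these
    # to human aesthetic ratings.
    return {"thirds_energy": float(thirds), "contrast": float(contrast)}
```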
As the engine is proprietary IP and our 'secret sauce', I won't explain it in full detail here, but its core involves training an AI to recognize visual quality levels within the industry. We then run some post-processing on the AI's output to obtain a score that tells us how good an image likely is.
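While the real details stay under wraps, the general shape of that idea is familiar from published work such as Google's NIMA: train a classifier over discrete quality tiers, then collapse the predicted tier distribution into a single score. Here is a minimal sketch along those lines, assuming a standard PyTorch backbone and five hypothetical tiers; none of this is our actual implementation:

```python
# Minimal sketch of a tier-based aesthetic scorer, in the spirit of
# published approaches such as Google's NIMA. The backbone, tier count
# and rescaling below are illustrative assumptions, not our engine.
import torch
import torch.nn as nn
from torchvision import models

NUM_TIERS = 5  # hypothetical quality tiers: 1 (poor) .. 5 (excellent)

# Fine-tune a standard backbone to predict an image's quality tier.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = nn.Linear(backbone.fc.in_features, NUM_TIERS)

def aesthetic_score(logits: torch.Tensor) -> torch.Tensor:
    """Post-process tier logits into a 0-100 score per image.

    Takes the expected tier under the softmax distribution and rescales
    it to a percentage (the NIMA-style trick, not necessarily ours).
    """
    probs = torch.softmax(logits, dim=-1)                  # (batch, NUM_TIERS)
    tiers = torch.arange(1, NUM_TIERS + 1, dtype=probs.dtype)
    expected = (probs * tiers).sum(dim=-1)                 # expected tier in [1, 5]
    return (expected - 1) / (NUM_TIERS - 1) * 100          # rescale to [0, 100]
```

The appeal of this framing is that the network learns what the industry rewards directly from examples, rather than from hand-picked compositional rules.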
Obviously, this is a controversial approach. As humans, we want to believe that aesthetic perception is an innately human trait that cannot be quantified mathematically. And to a large degree, that is absolutely true. Culture, context and a host of other factors play into what we consider aesthetically pleasing, but there is also a pattern under the right circumstances.
If you think about the act of pressing 'like' on social media or the average score of a group of judges, neither of these scenarios delivers a result that says 'this is THE best', only that it's the best in the confines of that group of data and represents an average consensus on something that many find pleasing. On social media, not everyone likes the same things, and in judging, not all judges agree on THE best image. However, both require finding some form of consensus. We leverage that consensus in our AI to help find work we think the industry will like. The AI was not designed to be used in the context of judging images for the 3D Awards, but I was impressed just how well it did!
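The consensus point is easy to see with a toy example: averaged across judges, an image that no single judge ranked first can still come out on top.

```python
# Toy illustration: consensus can crown an image that no individual
# judge ranked first. All scores are invented for this example.
scores = {
    "image_a": [7.5, 8.0, 7.5],  # never any judge's top pick
    "image_b": [9.0, 6.0, 7.0],  # judge 1's favourite
    "image_c": [6.0, 8.5, 8.0],  # judges 2 and 3's favourite
}
consensus = {name: sum(s) / len(s) for name, s in scores.items()}
best = max(consensus, key=consensus.get)
print(best, round(consensus[best], 2))  # image_a 7.67
```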
Below, I've created several graphs that illustrate how close (and, in some cases, not so close) the AI got to predicting the expert industry judges' scores.
I'm really proud of this engine. No one has ever done anything quite like this. It is groundbreaking and fascinating all at the same time. It's not perfect, nor will it likely ever be, nor does it need to be. But it has shown it can do enough heavy lifting to facilitate better site engagement. While I don't expect we will be replacing our real judges in the 3D Awards anytime soon, this exercise provided some insights into where this might go. Under the right circumstances, maybe human aesthetic experience is not as unquantifiable as we might think.
Commissioned Image Category
Comparing the judges' average scores to the AI's average scores across all images in the category, there was a 10.86% difference. The smallest individual image difference was 2.6%, and the largest was 21.95%. (A sketch of how these figures may have been computed follows this category's results.)
Overall Accuracy: 89%
Image Credits: Sonny Holmberg / Depth Per Image (Denmark), Henri Khoo (UK), Nicolas Dagna / Mad architects (Argentina), Massimiliano Marzoli / imperfct* (Italy), Pictury Archviz (Spain)
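The formulas behind these figures aren't published here, but one reading reproduces all three category accuracies: take the relative difference between the judges' average and the AI's score for each image, average those differences over the category, and subtract the result from 100%. A hypothetical reconstruction (the choice of denominator is an assumption):

```python
# Hypothetical reconstruction of the reported metrics; the article does
# not publish its formulas. Using the judges' average as the denominator
# is an assumption.
def percent_diff(judge_avg: float, ai_score: float) -> float:
    """Relative difference between judge and AI scores, in percent."""
    return abs(judge_avg - ai_score) / judge_avg * 100

def category_accuracy(pairs: list[tuple[float, float]]) -> float:
    """Accuracy as 100 minus the mean per-image percentage difference.

    pairs: (judges' average score, AI score) for each nominee image.
    """
    diffs = [percent_diff(j, a) for j, a in pairs]
    return 100 - sum(diffs) / len(diffs)
```

Under this reading, the Commissioned category's 10.86% mean difference gives 100 - 10.86 ≈ 89%, and the same arithmetic yields 78% and 66% for the two categories below.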
Non-Commissioned Image Category
Comparing the judges' average scores to the AI's average scores across all images in the category, there was a 22.23% difference. The smallest individual image difference was 6.63%, and the largest was 57.14%.
Overall Accuracy: 78%
Image Credits: Vittorio Bonapace (UK), Thomas Dubois (France), Arqui9 / Pedro Fernandes (UK), Reinaldo Handaya / 2G Studio (Indonesia), Csaba Bánáti (Norway)
Student Image Category
Comparing the judges' average scores to the AI's average scores across all images in the category, there was a 33.71% difference. The smallest individual image difference was 5.25%, and the largest was 82.04%.
Overall Accuracy: 66%
Image Credits: Nadia Monte / MADI IUAV (Italy), Manuel Gomez / 24Studio LAB (Spain), Nicola Scognamiglio / MADI IUAV (Italy), Christian Paul Espinoza / School-ing (Spain), Elena Beltrán Villar / School-ing (Spain)
NOTES:
1) The full 3D Awards dataset was too large to justify evaluating every submission, so only the judges' top 5 nominees in each category were evaluated for this experiment.
2) The AI cannot interpret story or meaning. In cases where the AI scored an image lower, it was often clear that the concept, not the image quality, drove the judges' higher scores.
3) Our AI Aesthetics Prediction Engine does tend to score exterior images higher. Exterior images are often considered more aesthetically pleasing across the wider industry as well; as a point of reference, only 4 of the 15 3D Awards nominee images were interiors.
About this article
CGarchitect puts its AI Image Aesthetics Prediction Engine to the test to see how well it fared against expert industry judges in evaluating the nominees in the 2020 CGarchitect 3D Awards.