On Building An Importance Engine
There is a race on for the perfect rendering to emerge from each render engine. Render technology is racing towards “realism”, but not necessarily clarity. Are there ways we can increase the communication of digital renderings by going beyond the concept of realism, to better connect with human perception? My goal in writing this thought-piece is to explore what about painting and other forms of creative storytelling produces the motivating reactions it does in people, and translate that back to our newest mode of visual communication--digital art. We need to think about the effects visual art has on us as we experience the work, reverse engineer that into what we need digital 3D rendering technology to be able to do, and then translate those insights to the toolset.
The current standard in digital rendering engines is Physically Based Rendering (PBR), where the reflection properties of materials and their corresponding light transport are calculated based on natural physics. This allows for global illumination, skylight, caustics and radiation to be rendered to a picture-plane.
The results can be quite convincingly realistic. I fully agree with building a physically based rendering environment. Bounced light brings joy, subtlety invites immersion.
But should we stop there? There are still visual arenas to evolve beyond merely representing a scene's lighting accurately. New tools can open even better ways to create digital work. In my thinking, that involves looking toward traditional art for ideas about where to take digital rendering next. It is worthwhile to consider the end goals while developing a new engine. Now is the time. A smart company could leapfrog other Digital Content Creation (DCC) apps and rendering plugins with an intuitive innovation in CG/3D imaging.
PBR, taken to its logical ends, should deliver the competing engines to a place where they converge, each producing identical results. What fun is that? Is the ultimate goal to have a reality simulator, such that all energy propagation in a scene is fully and flawlessly rendered? Then we should be looking at spectral calculations, blackbody emitters for lights, atmospherics, physics at all levels. These can be slow to render. Or are we seeking an engine to create a visually believable version of reality that is fast, allowing many artistic iterations and choices? Such an engine would not just be a simulation tool but also a flexible system to mix convincing reality with more expressive and potentially impactful narrative choices. It would be a true, unique visualization medium.
There are fundamental differences between human perception and technological encoding. Photography is the latter. It is an imperfect method of translating a lightfield to a 2D matrix for human viewing, which the viewer must then relate back to perception. It has many inherent biases, from optical distortions to non-linear spectral responses, to being able to 'see what no person can'. Now it adds the differences between analog 'film' and digital sensors to its complexity. Photography is wonderful. But CG is something else. It is not a reflection or capture of “reality”. It is a suggestion of it, an idea of how it could be, might be or should be or cannot ever be. So why do we try so hard to render to the standard of faking photography over the possibility of engaging human perception? The human eye only sees a small portion of our field-of-vision in full detail. The rest is suggested more than rendered perfectly. We get more information by focusing our attention on new areas, but at the expense of others.
We relate what we see to others using a hierarchy. We say what is important and what is detailed over what is deemed less important. You may describe an encounter with a cheetah with detail about the animal but without describing the car you were in. Or, while doing an ad for a safari-worthy SUV, you may show the wildlife beyond without much information while making clear that the car's build is strong and the doors lock easily. The same scene is depicted differently based on what is important. Focusing on what is important to the author turns a report into a narrative. It is a story, but not just one story. You emphasize what is important to the story you are telling and minimize the rest, or leave it for the viewer to fill in from their own mind. And so it should be with rendering. But CG has equality of focus. That is not how humans see.
Can tools in a render engine directly address this? The simple answer is that it is up to the artist to use the existing CG tech to tell their story using ad-hoc effects not determined within the engine, not rule-based, often relegated to “post”. Photoshop works by borrowing techniques from traditional-media art creation, but is still a picture-plane effect. I have presented ideas, in other venues, about integrating lessons from art into rendering. They are valid and I am not suggesting abandoning adding value to rendering work through expert composition and lighting, visual sweetening.
But are there tools that we could create to work within the 3D rendering environment itself that could adjust the lighting, edge qualities, contrast, sense of depth based on relationships and importance--an Importance Engine?
Depth-based level-of-detail (LOD) is a start, but taken further the engine should understand that close up a tree is thousands of leaves and branches, while at a distance it is a unified object of canopy and trunk, and further still it is a tile in the mosaic of a forest. We could write rules that would allow the treatment and integration of elements based on distance or proximity, and also their relevance to our story. Simplifying some parts can provide significant benefit. Areas of flattened effect can elicit responses in the viewer with clues and links that create an experience, not just an image. What triggers memories? Vagueness, hints, suggestion? By not providing all information there is room for the viewer’s mind to fill in, to create with you, drawing from their own lives. Lead them in, give them room.
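As a minimal sketch of what such a rule might look like, assume each scene element carries a camera distance and an artist-assigned importance between 0 and 1. The tier names, the `near` and `far` thresholds, and the way importance shortens the effective distance are all hypothetical choices made for illustration, not features of any existing engine.

```python
from dataclasses import dataclass

# Hypothetical representation tiers for a tree, most detailed first.
TIERS = ["leaves_and_branches", "canopy_and_trunk", "forest_tile"]


@dataclass
class Element:
    name: str
    distance: float    # distance from the camera, in scene units
    importance: float  # 0.0 (background) to 1.0 (story-critical), set by the artist


def choose_tier(element, near=10.0, far=100.0):
    """Pick a representation from distance, letting importance pull toward detail."""
    # Assumed rule: importance shrinks the effective distance by up to half,
    # so a story-critical element is treated as if it were closer.
    effective = element.distance * (1.0 - 0.5 * element.importance)
    if effective < near:
        return TIERS[0]
    if effective < far:
        return TIERS[1]
    return TIERS[2]


if __name__ == "__main__":
    for tree in (Element("background oak", 15.0, 0.0),
                 Element("hero oak", 15.0, 1.0),
                 Element("distant grove", 300.0, 0.0)):
        print(tree.name, "->", choose_tier(tree))
```

Two trees at the same distance can resolve to different tiers here, which is the point: distance proposes a level of detail and relevance to the story adjusts it.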
In painting you learn to 'find an edge, lose an edge', where you use the sharpness and contrast along a boundary between visual elements to control how they relate to each other, putting certain aspects forward and others back. You can use localized treatments to link or separate players in your story. It is to treat a scene holistically. It goes beyond the accurate rendering of light transport and caustics and haze to consider a scene as a narrative that has a hierarchy of elements and message. The render engine could employ an 'importance map' and user-defined rules about interactions through the virtual lens.
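One way to picture 'find an edge, lose an edge' driven by an importance map is sketched below. It assumes two adjoining regions, each with a tone value and an importance between 0 and 1. The function names and the particular formula are illustrative assumptions; an engine could use entirely different rules.

```python
def edge_sharpness(importance_a, importance_b, base=0.5):
    """Return a 0..1 sharpness for the boundary between two regions.

    Assumed rule: the edge is 'found' when either side matters or when the
    two sides differ in importance, and 'lost' when both are unimportant.
    """
    strongest = max(importance_a, importance_b)
    difference = abs(importance_a - importance_b)
    value = base * strongest + (1.0 - base) * difference
    return min(1.0, max(0.0, value))


def treat_edge(tone_a, tone_b, sharpness):
    """Pull the two tones toward their mean as the edge is lost."""
    mean = (tone_a + tone_b) / 2.0
    return (mean + (tone_a - mean) * sharpness,
            mean + (tone_b - mean) * sharpness)


if __name__ == "__main__":
    # A story-critical figure against an unimportant wall keeps its full contrast.
    print(treat_edge(0.2, 0.8, edge_sharpness(1.0, 0.0)))
    # Two background objects partly merge into one another.
    print(treat_edge(0.2, 0.8, edge_sharpness(0.2, 0.2)))
```

The design choice is that contrast across a boundary is not a fixed property of the two surfaces but a function of how much each side matters to the story.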
Would people react favorably to imagery with clear, purposeful importance within the view? Museums are full of just such works. It is part of what makes a work of art resonant for its viewer. It should feel like the subject more than just look like it. There are portraits that look back at you, across four hundred years, there are landscapes that you walk into, films that play as your own memories. This is not the result of magic, just applied techniques in creating a narrative image. That is what we want a render engine to do. Rarely are the revered works in the world's museums just literal captures of a scene, devoid of interpretation and without bias. They have voice. So what tools would a new CG engine need to give that kind of control and assistance to the digital artist? It's not enough to just say that this is the realm of the artist to use the tools available. The discussion is about what tools to develop. And why. Can an engine be built that leads to more intuitive and even entirely new ways to create and present digital content?
Tools:
- Depth-Based LOD, plus contrast and other properties
- Depth-Based Object Integration (leaves become tree canopy, trees become forest)
- Spectral Calculations addressable to human vision or film or something new
- Importance Mapping, with response rules--perhaps like weight painting
- Importance/Focus-based Light Transport by a process like weight painting or proximity
- POV Dynamic by non-rendering light, e.g. from eyes, what is being ‘looked at’ in-scene
- Forward In Time Focus--importance of where something will be soon
I am limiting my thinking to picture plane output. Practitioners in many media are using a 3D scene to produce an end result that is 2D, presented on a picture-plane versus field-of-play (VR). There is more to think about in immersive and integrative environments--VR/AR.
Some additional properties of traditional art that we might be able to adapt to new CG render engine tools:
An underpainting is a foundational visual field that informs a deeper meaning to the primary image. It is applied first, then painted over. In watercolor, the tones read through via transparency. In oils, they tend to 'peek' around areas of opaque paint that do not fully cover the 'ground'. Either way, and in combination, the technique adds life and depth to a work, acting to unify the elements. For example, you can tone warmer and cooler areas of the visual field so that the paint above takes on these characteristics, such that separate objects then share a unified tonal structure and feel.
Underpainting also serves to literally ground a picture. It binds the floating illusion of the imagery to a physical thing that exists in your environment. Paintings are objects as well as illusions. Done well, they have physical depth, whether by transparency of paint layers or by a three-dimensional surface--heavy paint, impasto. They are both an image and an object.
Art on a substrate--paper or prepared canvas, a photograph printed or projected--provides a “ground”, literally and figuratively: a state of being, a sense of objective existence, a foundation on which the image is built. Ground brings imperfections born out of the creative process itself and the materials used into the work. Such ‘noise’ provides an innate nod to objective reality. Early photography had grain that lent a sense of ‘object’. CG, on the other hand, without the rendered image, is simply black. It is nothing. It has no physicality on its own. I am imagining that within a render engine there can be subtle stochastic patterns that exist in the absence of important levels of illumination and provide subtle grounding. Perhaps the engine facilitates a sort of controlled layer blend, where varying local conditions determine the mix between the created visual aspects and the absence of them. Put another way, every pixel need not resolve to a single, physically-perfect result. Instead, each has multiple possible expressions that are contextually weighted. Like life.
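The controlled layer blend described above could be sketched as follows, under loud assumptions: the "ground" is a faint, repeatable per-pixel noise, and the mix is driven only by local illumination. The names `ground_noise` and `blend_with_ground`, the `ground_level` of 0.08 and the `threshold` of 0.25 are all hypothetical, chosen just to make the idea concrete.

```python
import random


def ground_noise(x, y, seed=0):
    """Repeatable pseudo-random value in [0, 1) for a pixel.

    Stands in for the 3D noise field an engine might provide; seeding from
    the pixel coordinates keeps the pattern stable between renders.
    """
    return random.Random(f"{seed}:{x}:{y}").random()


def blend_with_ground(rendered, illumination, x, y,
                      ground_level=0.08, threshold=0.25):
    """Mix the rendered value with the ground; the ground shows in low light."""
    # The weight of the rendered result rises from 0 to 1 as the local
    # illumination approaches the threshold, and stays at 1 above it.
    weight = min(1.0, max(0.0, illumination / threshold))
    ground = ground_level * ground_noise(x, y)
    return weight * rendered + (1.0 - weight) * ground


if __name__ == "__main__":
    print(blend_with_ground(0.7, 1.0, 3, 4))  # well lit: the render passes through
    print(blend_with_ground(0.0, 0.0, 3, 4))  # unlit: a faint ground value, not pure black
```

Illumination is only one possible driver. The same weighting could just as well come from an importance map or any other local condition.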
The Importance Engine and Relational Weighting could add tools and techniques for narrative within rendered output:
- Ground -- 3D noise providing a presence to the field allowing areas less detailed to have a subtle unifying base of visual complexity
- Underpainting -- a routine could add a depth of expression and integration to the next render engine, informing tonal and/or chromatic results, rule-based, scripted or by map
- Edge Condition Rendering -- from find/lose to line generation and local contrast, could produce more holistic, connected-feeling, relational renderings and animations
We don't tend to think of CG images as objects themselves. I am curious to find out if a more self-aware treatment of the medium will add acceptance of it on its own terms—on screens without materiality. In a funny way, pixel-art does just that. It is self-referencing. It does not seek to 'fool' the viewer. Photo-realism in CG does. I'm suggesting that stepping back a bit from the fake-photo paradigm can open up render engine output to another aspect of realism--connection.
Even when my handmade renderings were going for realism, my least-favorite compliments were always 'wow, it looks just like a photograph'. I would thank the person and die a little inside. I knew I had failed to rise above a factual telling. My all-time favorite compliment was of a restaurant rendering. 'I can almost hear the plates and silverware' someone said.
Artists build their narrative by modulating the relationships of the elements being described. They bias an otherwise perfect model of something toward their very personal perception of it. The cultural valuation of the work is then a product of others’ ability to perceive it as the artist did, or of the work opening up the creative mind of the viewer to become aware of their own perceptions and expectations. Bias is voice.
Relational weighting could control how the parts of a scene render within the output as a whole. It would control how edge conditions are treated, rendering subtle boundary relationships between what we would otherwise think of as equal, discrete objects. This becomes even more exciting a prospect as we think about animation. For example, if we see a person, we might weight their hands to have a strong relationship to what they touch. If they grab an object, it could render with sharper edges, more contrast than before, or other effects like increased levels of patterning or detail or saturation. The effect could be the same for the part of the table it was on, diminishing as the object leaves contact and proximity. Clarifying effects could follow a character’s gaze, allowing the viewer to share the focus of the story visually. These effects could be subtle yet enriching. They would add focus and story by connecting the elements to advance meaning.
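The hand-and-object example can be sketched as a weight that is strongest at contact and fades with distance. This is a minimal illustration under assumptions: the exponential falloff, the `reach` and `strength` parameters and the function names are hypothetical, and contrast and saturation stand in for whatever properties an engine chose to expose.

```python
import math


def relation_weight(distance, reach=0.5):
    """Weight of a relationship: 1.0 at contact, fading smoothly with distance."""
    return math.exp(-max(0.0, distance) / reach)


def emphasize(base_contrast, base_saturation, distance, strength=0.4):
    """Boost contrast and saturation of an object in proportion to the relationship."""
    gain = 1.0 + strength * relation_weight(distance)
    return base_contrast * gain, base_saturation * gain


if __name__ == "__main__":
    # Distance between a character's hand and a cup over a few frames
    # as the hand withdraws after setting the cup down.
    for frame, distance in enumerate([0.0, 0.1, 0.3, 0.8, 2.0]):
        contrast, saturation = emphasize(1.0, 0.5, distance)
        print(frame, round(contrast, 3), round(saturation, 3))
```

Because the weight is continuous, the emphasis eases in and out across an animation rather than switching on and off, which fits the subtle treatment described above. The same function could be fed by gaze direction instead of touch.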
I read an article recently about AI researchers working to improve their programs by going from trying to mimic the abilities of the adult brain to studying the creative learning and expression of children. Apparently Alan Turing suggested just this path in the 1950s. Always listen to Turing. Whenever we seek to recreate or extend something, we need to find the right model. When it comes to DCC and rendering, is photography the right model? That is the one we have been using, to be sure. Where has it gotten us? We can sometimes convince viewers that a render is a photograph. And some photographs are good. Some are not. Those that are not fail by having little or nothing to say. In that way some of CG is very much like photography.
Physically Based Rendering (PBR) is the most promising direction because it starts with an attempt at synthetic photography by utilizing a key trait of reality, that of energy propagation. A rendering is a map of 3D energy distribution within a defined 2D field. Sensing relative energy levels and interpreting meaning from them is how we perceive our environment. Imagination simulates these perceptions. We dream of fields and friends, fleeting impressions of breezes and colors and, collectively, feelings. Perception opens the door to emotion. When there is a lack of stimulating input from the outside world we tend to create worlds in our minds, just to feel, just to be.
Art can be another door opening to more than what is in front of us. We understand that independent of medium. In CG the basis of the medium is simulated light, a great start. But then we apply a standard of photography, even going out of our way to model lens imperfections, a step back. In many ways CG has sought to be photography's virtual twin. By developing new tools for 3D artists to use we have an opportunity to make rendering and animation more, to make it something evolutionary and new by going beyond the camera model.
We can give PBR voice.
About this article
What lessons from painting and other forms of illusionary imaging can we translate to new tools for our newest mode of visual communication—3D rendering?
Ernest - thank you so much for sharing this with us! I strongly agree that many of these points tend to be overlooked in the pursuit of "realism", as though that was the end goal. There is a lot of potential here of course, but I was struck especially by the concepts of "ground" and "underpainting". I see "ground" in some really good CGI, but I suspect it takes a lot of time to get it right in 3D as there are so many variables. At least it takes me a long time :) And, underpainting is a technique that I often try to work in as well, but almost always in post-production which really is quite backwards. It's very difficult to achieve in "raw", or as some would say, "honest" 3D.
A very timely read - thanks again, and I hope you are well!
Thanks Ernest for the great contribution to the site! Much appreciated.