What’s the story behind NVIDIA’s new Iray AI?

feature | Posted by John Montgomery | May 17, 2017

Much of the GTC keynote address by NVIDIA CEO and founder Jensen Huang focused on machine learning and applications that aren’t central to the visual effects and post industry. However, one demonstration showed how this tech can resolve ray-traced images much faster by removing noise. NVIDIA is calling this tech Iray AI, and the best way to see it in action is to watch the keynote presentation below (be sure to watch in full screen to view the results more clearly).

After watching the high-level presentation, I was quite curious about what was happening. On the exhibit floor later in the day, NVIDIA had the same split-screen demo running on a workstation with dual P100 graphics cards, with the camera moving from one position to another. What was impressive was that the image on the right side quickly (within one or two seconds) resolved to a noise-free state that is much easier to make creative judgements on than the noisy image, which would require more iterations to get to the same point.

In this interactive mode on the exhibition floor, the image didn’t go through enough iterations to achieve final render quality, as the automated camera would move to a new position before the noise had significantly resolved. The “AI” image on the right had significant edge detail, yet it exhibited characteristics similar to running a median filter on the image, with large areas of lower detail. Both images continued to resolve and gain detail over time, but the camera moved positions before this got too far. It’s important to note that the quickly denoised image is far better for judging lighting and shading than the noisy ray-traced render that is still resolving.

So what’s going on here? Phil Miller, NVIDIA’s Director of Product Management, was kind enough to leave a rendering meeting with other renderer companies to let us in on a few of the details before I hopped a flight back to LA.

Using the NVIDIA DGX-1 supercomputer, the team trained a neural network to translate a noisy image into a clean reference image. Once trained, the network takes a fraction of a second to clean up noise in almost any image, even those not represented in the original training set. As a user there’s no need for your own neural network, as the functionality will be built into Iray and the Iray SDK. The tech will also eventually find its way to Mental Ray.

On the Pascal system on the exhibit floor, the render needed to progress through a couple of iterations before the AI could be applied, which accounted for the delay we noticed. Once it is applied, the cleanup takes approximately 100ms, according to Miller.

The tech works especially well on interiors and darker scenes (as you might be able to see in the YouTube video above when the camera enters the car). With traditional iterative ray tracing, “for scenes that are really dark, the pixels fill in so the image gets lighter over time,” says Miller, “but with this it is the right lighting level right away and detail fill in over time.”

This isn’t just for interactive rendering at artist workstations: a fully converged, noiseless final frame renders about four times faster. If a final render took an hour before, with the new AI it will take approximately 15 minutes.

So how was this trained?

NVIDIA took several hundred rendering jobs with diverse content, textures, and lighting conditions. Variety in texture selection was especially important because they needed to train the system as to what is noise and what is a texture (good noise). They trained it with a full progression of renders of each scene from the noisy first iteration to the final clean image, producing a series of image sets for each scene. “The deep learning algorithm basically learns how an image converges,” says Miller, “and can then predict what another image is going to do.”
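To make the idea of an “image set” concrete, here is a minimal sketch of how a progression of renders could be turned into training examples. It assumes each noisy intermediate frame is paired with the final converged frame as its target, which matches the description of translating a noisy image into a clean reference image. The function name and the toy list-of-lists image type are hypothetical, not NVIDIA’s code.

```python
from typing import List, Sequence, Tuple

# Toy grayscale image: rows of pixel values (assumption for illustration only).
Image = List[List[float]]


def make_training_pairs(progression: Sequence[Image]) -> List[Tuple[Image, Image]]:
    """Pair every intermediate render with the final, converged render.

    `progression` is ordered from the noisy first iteration to the clean
    final image. Hypothetical helper, not part of Iray.
    """
    if len(progression) < 2:
        raise ValueError("need at least one noisy frame and one clean frame")
    reference = progression[-1]  # the converged image is the target
    return [(noisy, reference) for noisy in progression[:-1]]


if __name__ == "__main__":
    # Four tiny 1x2 "renders" of one scene, converging toward [0.5, 0.5].
    frames = [[[0.9, 0.1]], [[0.7, 0.3]], [[0.6, 0.4]], [[0.5, 0.5]]]
    for noisy, clean in make_training_pairs(frames):
        print(noisy, "->", clean)
```

A network trained on many such pairs, drawn from many scenes, sees what each stage of convergence looks like relative to the finished image.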

In addition, they found that providing a different group of image sets for interactive rendering and final frame rendering also proved useful. “It’s more discriminating as to what needs to be done for final frame,” Miller relates. “For example, it can make mistakes early on with noise but in the end, you can’t make a mistake.”

According to the NVIDIA blog, the neural network was trained in less than 24 hours using more than 15,000 image sets with varying amounts of noise. To be clear, the scene of the car shown during the keynote and in this article was not part of any of the image sets used for training.

I mentioned above that during the interactive demonstration, it took a second or two before the noise started being removed from the render. This is by design, as the Iray API to the host application provides the ability for the user (or app) to set the number of render iterations before the denoise algorithm is applied. This makes sense, as the first progressive render displayed has far too much noise, so applying the denoise at that point would result in a smushy watercolor image. According to Miller, the denoise can become effective after only five or six iterations.

“We’re providing APIs in our own renderer so that the application can choose to wait for a certain number of iterations and then maybe blend a certain percentage and then increase it over time,” says Miller. “Whatever they (the host application) feel is necessary for the user to make the denoise helpful and not jarring. All those controls are there, but it’s up to the application whether to expose it.”
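The kind of control Miller describes (wait a number of iterations, then blend in the denoised result and increase it over time) might look something like the sketch below. This is not the Iray API. The function names, the default of five iterations, and the linear ramp are all assumptions chosen for illustration.

```python
def denoise_blend_weight(iteration: int,
                         start_iteration: int = 5,
                         ramp_iterations: int = 20) -> float:
    """Return how much of the denoised image to show, from 0.0 to 1.0.

    Hypothetical schedule: show only the raw render until `start_iteration`,
    then ramp the denoised contribution up linearly over `ramp_iterations`.
    """
    if iteration < start_iteration:
        return 0.0
    if ramp_iterations <= 0:
        return 1.0  # no ramp requested: switch fully to the denoised image
    progress = (iteration - start_iteration + 1) / ramp_iterations
    return min(1.0, progress)


def blend_pixel(raw: float, denoised: float, weight: float) -> float:
    """Linear blend between the raw and denoised value of one pixel."""
    return (1.0 - weight) * raw + weight * denoised


if __name__ == "__main__":
    for i in (0, 4, 5, 14, 24, 100):
        print(i, denoise_blend_weight(i))
```

A host application could expose `start_iteration` and `ramp_iterations` as user settings or keep them fixed, which is the choice Miller says is left to the application.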

Unlike some other denoise procedures which require a lot of additional buffers, NVIDIA is only using one additional buffer to obtain their results.

With Iray, multiple GPUs in a single machine (such as the dual P100 on the exhibit floor) allow each single iteration to happen faster, but they don’t provide additional iterations. With multiple machines rendering a scene together, each one is producing an iteration. For example, with five machines rendering at once you’ll get those aforementioned five iterations right at the beginning, so the denoise can effectively be on all the time. “With a cluster of VCAs, you could have enough so that your reflections are solid and on the outside of the car you could actually move it around and judge the reflections interactively without any noise,” says Miller.
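A back-of-the-envelope model of that scaling, as a sketch only: it assumes every machine contributes exactly one iteration per rendering round, and that extra GPUs in a machine shorten a round linearly. Both are simplifications made for illustration, and the function names are hypothetical.

```python
import math


def rounds_until_denoise(machines: int, min_iterations: int = 5) -> int:
    """Rendering rounds needed before `min_iterations` iterations exist.

    Assumes each machine produces one iteration per round.
    """
    if machines < 1 or min_iterations < 1:
        raise ValueError("machines and min_iterations must be positive")
    return math.ceil(min_iterations / machines)


def seconds_until_denoise(machines: int,
                          gpus_per_machine: int,
                          seconds_per_iteration: float,
                          min_iterations: int = 5) -> float:
    """Estimated wait before the denoiser has enough iterations to work with.

    `seconds_per_iteration` is the time for one iteration on a single GPU;
    idealized linear GPU scaling is an assumption, not a measured figure.
    """
    if gpus_per_machine < 1 or seconds_per_iteration <= 0:
        raise ValueError("gpus_per_machine and seconds_per_iteration must be positive")
    round_seconds = seconds_per_iteration / gpus_per_machine
    return rounds_until_denoise(machines, min_iterations) * round_seconds


if __name__ == "__main__":
    # One dual-GPU workstation versus five such machines (made-up timing).
    print(seconds_until_denoise(1, 2, 1.0))
    print(seconds_until_denoise(5, 2, 1.0))
```

Under this model the five-machine case reaches the five-iteration threshold after a single round, which is why the denoise can effectively stay on.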

One interesting side benefit of the image being cleaned and smoothed this way is that remote sessions streamed from a server to a client are much more responsive. Previously, with the noise changing dramatically and every pixel changing, there was no continuity for the run-length encoded compression that’s required for streaming display video. Now, according to Miller, the smoothness/frame rate of the video “has tripled and it’s a much nicer experience for the artist.”
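A toy illustration of why noise hurts run-length encoding: the sketch below encodes one row of pixels as (value, count) runs. The smooth row and the noise-like row are made-up data, and real streaming compression is considerably more involved than this.

```python
from typing import List, Sequence, Tuple


def run_length_encode(row: Sequence[int]) -> List[Tuple[int, int]]:
    """Collapse consecutive equal values into (value, run_length) pairs."""
    runs: List[Tuple[int, int]] = []
    for value in row:
        if runs and runs[-1][0] == value:
            runs[-1] = (value, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((value, 1))  # start a new run
    return runs


if __name__ == "__main__":
    # A denoised-looking row: two flat regions of 32 pixels each.
    smooth_row = [10] * 32 + [200] * 32
    # A deterministic stand-in for noise: every neighbour differs.
    noisy_row = [(i * 37) % 256 for i in range(64)]

    print("smooth runs:", len(run_length_encode(smooth_row)))
    print("noisy runs: ", len(run_length_encode(noisy_row)))
```

The fewer runs a row produces, the less data has to be sent per frame, which is consistent with the smoother remote sessions Miller describes.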

They hope to have this shipping as part of Iray in the fall.

Miller feels that NVIDIA’s approach is something that most ray tracers which do progressive sampling will be able to use as well. The team at NVIDIA will be publishing a paper for SIGGRAPH in August which describes what they’re doing, and is already sharing the information with developers of other renderers who were attending GTC. In fact, I ran into several of them at the NVIDIA booth checking out the results close-up. NVIDIA won’t be supplying the library produced by its own AI training to other manufacturers, but will be describing how it achieved its results.

Another paper to check out at this year’s SIGGRAPH in Los Angeles….





