This is the first of a three-part series on the mathematics of 3D graphics. I'm not going to put out a bunch of formulas; I will just attempt to explain, in layman's terms, the massive amount of calculation needed to display a 3D image. Here is a typical 3D image I created:
Objects and Cameras

The first step toward understanding how computers can create images like this is to understand how the "scene" is represented mathematically to the computer. First, we define all the objects in the scene mathematically. In analytic geometry we learn to define many basic shapes in terms of mathematical formulas.
For example, a sphere of radius 1 can be defined as x² + y² + z² = 1. Every basic shape can be similarly defined in terms of x, y, and z.

The other thing we need is a camera position. To create our 2D image of our 3D world, we project lines (at least one per pixel) from the camera position into the 3D scene. We then calculate where these projected lines intersect the objects in the scene, and that determines the color of that particular pixel. If you recall from algebra, finding the intersection point is a matter of finding x, y, and z values that satisfy both the equation of the view line and the equation of the object it is intersecting. Solving these "systems of equations" is done millions, maybe even billions, of times in order to "render" the image.
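To make the intersection step concrete, here is a minimal sketch for the unit sphere x² + y² + z² = 1. The function name and the way the view line is passed in (an origin plus a non-zero direction) are my own assumptions for illustration, not something taken from any particular renderer. Substituting the points of the line into the sphere equation leaves an ordinary quadratic in the distance t along the line.

```python
import math

def intersect_ray_unit_sphere(origin, direction):
    """Nearest point where a view line hits x^2 + y^2 + z^2 = 1, or None.

    The line is the set of points origin + t * direction for t >= 0.
    Assumes direction is not the zero vector.
    """
    ox, oy, oz = origin
    dx, dy, dz = direction
    # Plugging the line into the sphere equation gives a*t^2 + b*t + c = 0.
    a = dx * dx + dy * dy + dz * dz
    b = 2.0 * (ox * dx + oy * dy + oz * dz)
    c = ox * ox + oy * oy + oz * oz - 1.0
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None  # no real solution: the line misses the sphere
    root = math.sqrt(disc)
    # Try the smaller t first so we return the hit closest to the camera.
    for t in sorted(((-b - root) / (2.0 * a), (-b + root) / (2.0 * a))):
        if t >= 0.0:
            return (ox + t * dx, oy + t * dy, oz + t * dz)
    return None  # the sphere is entirely behind the camera

hit = intersect_ray_unit_sphere((0.0, 0.0, -5.0), (0.0, 0.0, 1.0))
miss = intersect_ray_unit_sphere((0.0, 2.0, -5.0), (0.0, 0.0, 1.0))
```

A renderer would run something like this once per pixel per object, which is where the millions of solved equations come from.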
Now the color of the pixel will be the color of the object at that particular point. Color is determined by two things: maps and light sources (represented by the yellow arrow). If the line does not intersect any object, the pixel is assigned a background color, in the above case black. Before explaining maps and light sources, we need to look at other objects besides the simple "primitives", as the basic shapes above are called.

Complex Objects

Now we know how to calculate intersections with primitive objects, but let's face it: most of the world is not made up of cones and spheres. It is a lot more complicated than that. Two tricks of the trade are commonly used to model "natural" objects: spline meshes and fractals.
Spline meshes are curved surfaces defined by four beginning and ending points and four beginning and ending angles. A multitude of spline meshes connected to one another can model almost any smooth surface, whether it be cars, people, weird alien creatures, what have you. I described the mathematics in a previous essay, "The Mathematics of String Art".
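The earlier essay is not reproduced here, so the following is only a sketch under an assumption: it uses a cubic Hermite curve, one common kind of spline that is defined by exactly this sort of data (a beginning point, an ending point, and the direction the curve is heading at each end). A single edge of a spline patch can be thought of this way; the function name and parameters are hypothetical.

```python
def hermite_point(p0, p1, m0, m1, t):
    """Point on a cubic Hermite curve at parameter t in [0, 1].

    p0, p1: beginning and ending points (tuples of coordinates).
    m0, m1: tangent vectors (direction and speed) at the beginning and end.
    """
    # The four standard cubic Hermite blending weights.
    h00 = 2 * t**3 - 3 * t**2 + 1   # weight of the beginning point
    h10 = t**3 - 2 * t**2 + t       # weight of the beginning tangent
    h01 = -2 * t**3 + 3 * t**2      # weight of the ending point
    h11 = t**3 - t**2               # weight of the ending tangent
    return tuple(
        h00 * a + h10 * ta + h01 * b + h11 * tb
        for a, b, ta, tb in zip(p0, p1, m0, m1)
    )

# A curve from (0, 0) to (1, 0) that leaves heading up and arrives heading down.
midpoint = hermite_point((0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.0, -1.0), 0.5)
```

Changing only the end tangents bends the curve without moving its endpoints, which is what makes it practical to stitch many such pieces into one smooth surface.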
Fractals, on the other hand, are rough and random. One way to describe a fractal shape is to try graphing a coin flip. Start at 0 and at each flip go up a notch if it is heads, down a notch if it is tails. If you do it enough times, the resulting graph will look like a mountain range of some sort. Do it in three dimensions instead of two and the result is an artificial mountain range.

Now that is not a true definition of a fractal. A fractal function is one that is infinitely complex, such that any portion of the function is as complex as the whole function. So mountains are made up of randomly positioned rocks, rocks are made up of randomly positioned grains of sand, grains of sand are made up of randomly positioned molecules, and so on. Fractals are simple formulas repeated over and over to infinity in the mathematical sense, but in the 3D graphics sense they are repeated only to the point where further repetition makes no visible difference in the detail. A mountain range like the one pictured above may be simply defined. Graphic artists use fractals to describe many other natural objects, like clouds, trees, bushes, etc.

Now once you have the 3D scene described, and you start calculating intersections with imaginary lines from the camera, you have to know the "color" of the objects in the scene. The color of the object is defined by two things: lighting and mapping.

Lighting Models

Lights are defined just like cameras. To determine how an object looks in response to its lighting, we go back to calculus and determine the vector perpendicular to the surface of the object at each point (the surface normal). In other words, we need to know which way it is facing. If the object at that point is facing a light, it is going to "shine"; if it is pointing away, it will be in shadow. At least that is the simple case.
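Here is a rough sketch of that simple case for the unit sphere, assuming a single point light and ignoring shadows cast by other objects. The helper names are my own. The calculus step is the gradient of x² + y² + z², which is (2x, 2y, 2z) and points straight out of the surface; the cosine of the angle between that vector and the direction to the light tells us how squarely the surface faces the light.

```python
import math

def normalize(v):
    """Scale a vector to length 1 (assumes v is not the zero vector)."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def sphere_normal(point):
    """Unit vector perpendicular to the sphere x^2 + y^2 + z^2 = 1 at point."""
    # The gradient of x^2 + y^2 + z^2 is (2x, 2y, 2z).
    return normalize(tuple(2.0 * c for c in point))

def facing_light(point, light_pos):
    """1.0 when the surface faces the light head on, 0.0 when it faces away."""
    n = sphere_normal(point)
    to_light = normalize(tuple(l - p for l, p in zip(light_pos, point)))
    # The dot product of two unit vectors is the cosine of the angle between them.
    cosine = sum(a * b for a, b in zip(n, to_light))
    return max(0.0, cosine)  # negative means facing away: no direct light

lit = facing_light((0.0, 0.0, 1.0), (0.0, 0.0, 5.0))    # side toward the light
dark = facing_light((0.0, 0.0, -1.0), (0.0, 0.0, 5.0))  # side away from the light
```

Multiplying the object's color by this number darkens the parts of the surface that are turned away from the light.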
In the real world, different objects behave differently in light, so 3D artists define an object's behavior in light using several characteristics: diffuse color, specular color, highlight, ambience, transparency, translucence (glow), reflection, refraction, and bumpiness. Each of these lighting characteristics has its own physical model.
For example, reflection mirrors the environment around the object, so a reflective sphere is like a ball bearing. Refraction is like a lens, so a refractive sphere is like a glass marble, picking up the background behind it and distorting it according to the laws of optics. All of these lighting variables can be combined to define an object's look. A really shiny glass marble may have both reflection and refraction characteristics, for example.
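As a sketch of the optics involved, the two functions below bend a view line when it hits a surface: one bounces it like a mirror, the other bends it using Snell's law. The names and the eta parameter (the ratio of the refractive index on the incoming side to the index on the far side) are assumptions made for illustration. Both expect unit-length vectors, with the normal pointing back toward the side the line came from.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def reflect(d, n):
    """Mirror the incoming unit direction d about the unit surface normal n."""
    k = 2.0 * dot(d, n)
    return tuple(di - k * ni for di, ni in zip(d, n))

def refract(d, n, eta):
    """Bend the incoming unit direction d through a surface (Snell's law).

    eta is n1 / n2, e.g. 1.0 / 1.5 when going from air into glass.
    Returns None on total internal reflection.
    """
    cos_i = -dot(d, n)                          # cosine of the incoming angle
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)  # squared sine of the bent angle
    if sin2_t > 1.0:
        return None                             # no refracted line exists
    cos_t = math.sqrt(1.0 - sin2_t)
    return tuple(eta * di + (eta * cos_i - cos_t) * ni for di, ni in zip(d, n))

s = 1.0 / math.sqrt(2.0)
bounced = reflect((s, -s, 0.0), (0.0, 1.0, 0.0))                   # 45 degree bounce
straight = refract((0.0, 0.0, -1.0), (0.0, 0.0, 1.0), 1.0 / 1.5)   # head-on entry
```

The renderer then follows the new line into the scene, and whatever it hits supplies the color seen in the ball bearing or through the marble.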
Note that all the above examples are done with a single light source. Lights are like objects themselves and can be defined as a point, a spotlight, or an object of any shape that gives off light. All the rules are the same.

Mapping

A "map" in 3D graphics is a two-dimensional image that is wrapped over an object. A good example is a world map, which is a flat map of a globe. It is often distorted, like the huge size of Greenland on a Mercator projection. A map tells us what color every point on a surface is.
The above map is an example of Diffuse Color mapping. Maps can represent any kind of lighting model mentioned above.
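To show how a flat image can tell us the color at every point of a sphere, here is a small sketch. It assumes the image is laid out as a plain latitude-longitude grid (not a Mercator projection) and is stored as a list of rows; the function names and that layout are my own choices for illustration.

```python
import math

def sphere_uv(point):
    """Map a point on the unit sphere to (u, v) map coordinates in [0, 1].

    u runs around the sphere like longitude; v runs from the top pole (0)
    to the bottom pole (1) like latitude.
    """
    x, y, z = point
    u = 0.5 + math.atan2(z, x) / (2.0 * math.pi)
    y = max(-1.0, min(1.0, y))  # guard against rounding just outside [-1, 1]
    v = 0.5 - math.asin(y) / math.pi
    return u, v

def sample_map(image, point):
    """Look up the map value for a point on the sphere (nearest pixel)."""
    u, v = sphere_uv(point)
    height, width = len(image), len(image[0])
    col = min(int(u * width), width - 1)
    row = min(int(v * height), height - 1)
    return image[row][col]

# A tiny two-row "map": the top row covers the northern half of the sphere.
tiny_map = [["north", "north"],
            ["south", "south"]]
top = sample_map(tiny_map, (0.0, 1.0, 0.0))
bottom = sample_map(tiny_map, (0.0, -1.0, 0.0))
```

Every pixel near the top row of such a map ends up squeezed around the pole, which is the same kind of distortion that inflates Greenland on a flat world map.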
Here are three kinds of maps. The first is a diffuse map applied to a sphere. The second sphere has reflection maps and transparency maps applied, and the third has a "bump" map, which can be used to add details to a shape that would otherwise be too complex to model. Modeling of humans almost always involves multiple maps. A texture map will show changes in the color of the skin. A bump map will simulate flaws such as moles, pimples, or scars. Transparency maps are used primarily for hair such as eyelashes, eyebrows, and bangs, which are too complex to model one hair at a time.

Now with this information we have the object tools, the lighting tools, and the texturing tools to "render" (let the computer create) a realistic picture:
Now on a good desktop computer, an image this big and this complex may take five to ten minutes to render. A good artist would not stop at just the render. They would bring up the image in a paint program and add details like buttons on the shirt or some bushes to hide the transition between the road and the mountains in the background, both of which I neglected to do. I might also add realistic flaws to the picture to avoid the overly clean look. That is where the "art" of 3D graphics comes in.

Now rendering stills is a fun hobby, but there is not much money in it. The money is in computer games and computer animation, both of which are very different from stills.

Computer animation has a whole different set of problems. First, post-production touch-ups are too time consuming to be done frame by frame, so you must have a near-perfect image in every frame. But that is the easy part. Synchronizing lips to speech, realistic motion of clothes and hair, realistic motion of liquid: all of these are huge problems in computer animation. I will cover some of this in part 2.

Computer games are even more complicated. You cannot wait 10 minutes a frame for a computer game. Games require real-time animation, with a new frame every fraction of a second, and they cannot use up a ton of memory either. I will cover computer game 3D graphics in part 3.