Basic Mathematical Knowledge for Graphical Engineering
A Principal Engineer's Foundational Guide to the Mathematics Underpinning Real-Time Rendering
"You cannot reason about a renderer you cannot derive. Every rendering technique, from a single rasterized triangle to a full path-traced reservoir resampling pipeline, is the application of three centuries of mathematics to the problem of moving photons from a virtual world into a human eye. Learn the math, and the engine becomes obvious."
Abstract
Real-time computer graphics is, at its core, applied mathematics with a deadline. Every frame, the GPU executes billions of arithmetic operations whose correctness rests on a small set of foundational mathematical structures: linear algebra, projective geometry, calculus, probability, and signal processing. Yet the modern engine engineer is rarely taught these subjects in a unified, graphics-oriented form. Textbooks treat them in isolation; tutorials skip the derivations; production codebases hide them behind opaque utility functions. This paper attempts to close the gap. We present a comprehensive, self-contained survey of the mathematical knowledge required to build, debug, and reason about a modern AAA renderer, from the first vector dot product up to the rendering equation, Monte Carlo importance sampling, quaternion blending, and the numerical stability concerns that haunt every shader. We motivate each topic with its concrete graphical application, derive the formulas from first principles where it aids understanding, and provide reference C++23 implementations consistent with the conventions of the Graphical Playground (GP) Engine's in-house mathematics library. This document is intended both as a study reference for engineers entering the field and as the foundational curriculum on which all subsequent GP educational material will build.
Keywords: linear algebra, quaternions, projective geometry, rendering equation, BRDF, Monte Carlo integration, signal processing, color spaces, numerical stability, SIMD, real-time rendering, GP Engine
Table of Contents
- Introduction and Motivation
- Number Systems and Floating-Point Reality
- Linear Algebra: The Lingua Franca of Graphics
- Coordinate Systems and the Transformation Pipeline
- Affine and Projective Transformations
- Rotations: From Euler Angles to Quaternions
- Trigonometry and Analytic Geometry
- Calculus for Graphics Engineers
- Probability, Statistics, and Monte Carlo Methods
- The Rendering Equation and Light Transport
- Color Theory and Color Spaces
- Signal Processing and Anti-Aliasing
- Numerical Methods and Stability
- Spatial Data Structures and Computational Geometry
- Physics and Animation Mathematics
- The GP Math Library: Design Rationale
- Curriculum Roadmap and Further Study
- References
1. Introduction and Motivation
1.1 Why a Graphics Engineer Cannot Outsource the Math
Modern engines bundle mathematical utilities into libraries: GLM for OpenGL-style C++, DirectXMath for D3D, Eigen for the academic crowd, Unreal's FMath and FVector/FMatrix, Unity's Vector3 and Quaternion, and so on. A casual reader of these libraries might conclude that the math is solved, that one need only call glm::lookAt() and the framework will produce the correct view matrix.
This is dangerously incomplete. The libraries are correct, but they are also opinionated. Each one bakes in conventions, row-major versus column-major, right-handed versus left-handed, depth in versus , pre-multiplication versus post-multiplication, that are invisible to the caller but catastrophic when crossed. A glm::lookAt matrix multiplied with a DirectX-convention projection matrix produces a scene that renders inside-out, with depth test inverted, and where culling is upside down. The error is a one-character convention mismatch; finding it without understanding the underlying math takes days.
More fundamentally, every nontrivial rendering technique requires the engineer to derive new math, not just apply old. Implementing screen-space reflections requires deriving the reflection vector from the partial derivatives of the depth buffer. Implementing physically based shading requires deriving the half-vector formulation of microfacet BRDFs and choosing between Smith and V-cavity shadowing. Implementing temporal anti-aliasing requires deriving sub-pixel jitter sequences from low-discrepancy number theory. None of these are library calls. All of them rest on the foundations this paper covers.
1.2 Why GP Writes Its Own Math Library
The Graphical Playground engine ships its own mathematics module, source/runtime/core/public/maths/ (referenced in the Engine Architecture post). We did not adopt GLM, Eigen, or DirectXMath. The reasons are pedagogical first and engineering second:
- Transparency. Every formula in GP's math library is derivable by a student reading the source. There is no hidden SIMD intrinsic that obscures the mathematical operation; the SIMD specialization sits next to the scalar reference implementation.
- Convention discipline. GP standardizes on right-handed, column-major, -up world space, -up tangent space, depth, pre-multiplication: . These conventions are documented in the public headers and enforced by the type system, no implicit conversions across handedness.
- Educational value. A student reading our
quaternion.hpplearns how slerp works, not just that it exists. Comments cite the original papers; tests prove the algebraic identities. - Performance autonomy. When the time comes to vectorize for AVX-512 or NEON, we are not blocked on a third-party release cadence.
This paper is written from inside that library: every formula presented can be cross-referenced to a function in core/public/maths/, and every function in core/public/maths/ derives from a formula presented here.
1.3 Reading Order
The paper is organized in dependency order: section requires only sections through . A reader new to graphics should read linearly. A reader with prior background can skip §2-§3 and jump to §6 (quaternions), §10 (rendering equation), or §12 (signal processing) as needed. Every section closes with a short list of canonical references (collated in §18) for deeper study.
2. Number Systems and Floating-Point Reality
2.1 IEEE 754 in Three Sentences
A 32-bit single-precision float (float / f32) stores a number as
where is the sign bit, is a 23-bit mantissa, and is an 8-bit biased exponent. This buys roughly 7 decimal digits of precision spanning , with a smallest normal value of .
A 64-bit double (double / f64) extends mantissa to 52 bits and exponent to 11 bits, yielding roughly 15-16 decimal digits spanning .
Half precision (f16, used in HDR textures and tensor cores) has 10 mantissa bits and 5 exponent bits, yielding decimal digits and a max value of .
2.2 Why This Matters: The Open World Origin Problem
A position vector stored in f32 has different precision at different magnitudes. The gap between consecutive representable floats near is (one ULP, unit in the last place). At (10 km from the world origin) it is , almost a millimeter. At (1000 km) it is meters.
This is why open-world games like Star Citizen and No Man's Sky implement camera-relative rendering (also called origin shifting or floating origin): they translate the world so that the camera is always at , performing the subtraction in f64 on the CPU before feeding f32 positions to the GPU. The math:
without this trick, vertices wobble visibly past a few kilometers from origin.
2.3 The Cancellation Trap
Subtraction of nearly equal floats catastrophically loses precision:
with only 1 significant bit remaining. This drives a number of graphics-specific reformulations:
- Quadratic formula: the textbook catastrophically cancels when . The numerically stable form selects the sign:
This appears in every ray-sphere intersection routine.
- Depth comparison: comparing two
f32depths is unreliable when both are near 1.0. Reverse-Z (storing depth as ) shifts precision back to the far plane where it is needed; it is now standard in AAA engines [Reed 2015].
2.4 Fixed-Point and Integer Math
Not everything is a float. Texture coordinates in GPU samplers use 8.8 fixed-point for sub-texel addressing. Color buffers use 8-bit unsigned normalized (unorm8): the byte represents the value . Mip-level computation uses fixed-point in hardware. The unorm decode is exactly:
Note the divisor is , not , a subtlety that catches students implementing custom decoders.
2.5 Practical Rules
- Use
f32for vertex positions, UVs, normals;f64only on the CPU for world-scale arithmetic. - Never compare floats with
==. Use|a - b| < εwith chosen relative to the magnitude (relative tolerance) or as a fixed ulp count. - Beware
NaNpropagation: a single uninitialized normal turns the entire shader output to black. Initialize. - Denormals (
subnormals) are slow on many CPUs; flush-to-zero is set by default in shader compilers and recommended on the CPU side too.
3. Linear Algebra: The Lingua Franca of Graphics
Every graphical operation, rotating a model, projecting it onto a screen, lighting it, transforming a normal, blending two animation poses, is expressible as linear algebra over real-valued vectors. We cover the operations that appear most frequently in rendering code.
3.1 Vectors
A vector is an ordered -tuple of real numbers. In graphics, almost without exception. We write column vectors:
Magnitude (length):
Normalization (producing a unit vector ):
If , normalization is undefined; production code checks for this with a small epsilon to avoid NaN.
3.2 The Dot Product
Given :
The dot product is the most-used scalar in all of rendering. It answers four questions simultaneously:
| Question | Formula | Used In |
|---|---|---|
| Are these vectors aligned? | Lambertian shading, backface culling | |
| What is the projection of onto ? | Reflection, shadow projection | |
| What angle separates them? | Cone tests, anisotropic lighting | |
| Are they perpendicular? | Plane equations, basis construction |
The single most common pattern in shading: , the Lambertian cosine term, where is the surface normal and is the direction to the light. The clamp to zero discards back-facing light.
// GP-style scalar reference implementation
constexpr f32 dot(const Vec3f& a, const Vec3f& b) noexcept {
return a.x * b.x + a.y * b.y + a.z * b.z;
}
3.3 The Cross Product
For :
The result is a vector perpendicular to both inputs, with magnitude (the area of the parallelogram they span). The direction follows the right-hand rule.
Used for:
- Computing face normals from triangle edges: .
- Building orthonormal bases: given a normal and a tangent , the bitangent is .
- Triangle area: .
- Front/back face determination: the sign of where is the view direction.
3.4 Vector Operations Summary
| Operation | Formula | Result | Cost |
|---|---|---|---|
| Add | Vec3 | 3 add | |
| Scale | Vec3 | 3 mul | |
| Dot | scalar | 3 mul + 2 add | |
| Cross | Vec3 | 6 mul + 3 sub | |
| Length squared | scalar | 3 mul + 2 add | |
| Length | scalar | + 1 sqrt | |
| Normalize | Vec3 | + 1 rsqrt + 3 mul | |
| Linear interp | Vec3 | 6 mul + 3 add |
Lerp is so central that it deserves its own line:
The first form (single multiply) is preferred when should be exactly returned at ; the second form is symmetric in and but multiplies twice. Both forms exist in production code; choose deliberately.
3.5 Matrices
A matrix is a rectangular array. In graphics we use , , and . The fundamental operation is matrix-vector multiplication:
For a matrix:
Matrix-matrix multiplication . For matrices this is 64 multiplies and 48 adds, the unit operation of all transformation pipelines.
3.6 Memory Layout: Row-Major vs Column-Major
The same matrix can be stored two ways in memory:
Row-major (C, C++, DirectX convention):
[ m00 m01 m02 m03 | m10 m11 m12 m13 | m20 m21 m22 m23 | m30 m31 m32 m33 ]
Column-major (Fortran, OpenGL, Metal, GLSL convention):
[ m00 m10 m20 m30 | m01 m11 m21 m31 | m02 m12 m22 m32 | m03 m13 m23 m33 ]
The mathematics is unaffected; the memory layout is a storage decision. The peril is mismatch: passing a row-major matrix to a column-major shader transposes it implicitly, and a transposed transformation is its inverse for orthonormal matrices, exactly the wrong answer.
GP standardizes on column-major storage (matching the GLSL / SPIR-V default) and column vectors with pre-multiplication: . Composing transforms reads right-to-left:
3.7 Special Matrices
Identity matrix : ones on diagonal, zeros elsewhere. and .
Transpose : swap rows and columns: . For a product: .
Inverse : the matrix such that . Exists if and only if . For a matrix :
For larger matrices, the closed-form via cofactor expansion is impractical; LU decomposition or specialized inverse formulas are used. Crucial graphics fact: for an orthonormal matrix (pure rotation), the inverse is the transpose:
This is why view matrices are cheap to invert: they are translation + rotation, and we invert each piece separately.
Determinant measures signed volume change. For a matrix:
A negative determinant indicates a handedness flip, and for normals this means winding order inversion. Engines use this to detect mirrored skeletons.
3.8 The Normal Matrix
A subtle but critical distinction: when transforming a vertex position by matrix , the corresponding normal is not transformed by but by:
This is because normals are covectors (one-forms): they transform inversely so that the dot-product with tangent vectors stays invariant. If contains only rotation and uniform scale, up to the scale factor and we can use directly. Under non-uniform scale, using for normals warps lighting visibly; the inverse-transpose is mandatory.
Reference: PBR Book §3.10 (Applying Transformations: Normals)
3.9 Useful Reading
- Strang, Introduction to Linear Algebra, 5th ed. The canonical undergraduate reference.
- 3Blue1Brown, Essence of Linear Algebra video series. Visual intuition. (youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab)
- Lengyel, Foundations of Game Engine Development, Volume 1: Mathematics.
4. Coordinate Systems and the Transformation Pipeline
4.1 The Cardinal Spaces
A vertex moves through five canonical coordinate spaces between the artist's modeling tool and the user's monitor:
LOCAL (object) ──[M_model]──> WORLD ──[M_view]──> VIEW (camera) ──[M_proj]──> CLIP ──[/w]──> NDC ──[viewport]──> SCREEN
| Space | Origin | Used For |
|---|---|---|
| Local / Object | Mesh authoring origin | Storing vertex data |
| World | Game world origin | Physics, gameplay queries |
| View / Camera / Eye | The camera | Lighting (sometimes), screen-space effects |
| Clip | Origin shifted by projection | Frustum clipping (hardware) |
| NDC (Normalized Device Coordinates) | Center of cube | Hardware-internal rasterization |
| Screen / Window | Pixel grid | Final output |
Each transition is a matrix multiply (or, in the clip→NDC step, a perspective divide: ).
4.2 Handedness
A coordinate system is right-handed if the cross product when measured with the right hand. It is left-handed if measured with the left.
The major conventions in industry:
| System | Forward | Up | Right | Notes |
|---|---|---|---|---|
| OpenGL / Vulkan world | Right-handed | |||
| OpenGL clip | Right-handed, | |||
| Vulkan clip | Right-handed, | |||
| DirectX | Left-handed, | |||
| Unreal Engine | Left-handed, -up | |||
| Unity | Left-handed | |||
| Blender / GP world | or | Right-handed, -up |
GP follows Blender / scientific convention: right-handed, -up world space. We acknowledge this is the minority among game engines; the choice is motivated by physical intuition (gravity along ) and interoperability with engineering and CAD pipelines.
4.3 The View Matrix
The view matrix transforms world-space coordinates into camera-space coordinates: it is the inverse of the camera's world transform.
Given a camera at world position , looking at target , with up reference (typically world up):
For a right-handed system with camera looking down :
The top-left is the rotation (an orthonormal basis); the top-right column is the camera's negated position projected onto each basis vector; the bottom row is homogeneous.
4.4 The Projection Matrix
There are two common projections.
Perspective projection (the eye sees a frustum):
(left-handed, convention; Vulkan / DirectX style)
where is the vertical field of view, is the aspect ratio, is the near plane and is the far plane.
The bottom row is , meaning . The subsequent perspective divide scales and inversely with depth, producing the foreshortening that defines a perspective image.
Orthographic projection (parallel rays, used for shadow maps and 2D):
where are the extents of the view volume. The bottom row is , so ; no perspective divide effect.
4.5 Reverse-Z Depth
The standard projection wastes precision: with f32 depth and far/near ratios common in games (1000:1), 99% of the depth buffer's distinguishable values fall within the first 10% of the view distance, leaving the far field with severe z-fighting.
Reverse-Z swaps the mapping: the near plane maps to and the far plane to . This is a one-line change in the projection matrix and an inverted depth comparison (GREATER instead of LESS). The win comes from the multiplicative interaction of f32's exponent representation with the projection's hyperbolic depth, which now concentrates precision near the far plane where it is needed [Reed 2015]. GP enables Reverse-Z by default.
Reference: NVIDIA Developer Blog: Visualizing the Z-Buffer's Depth Distribution
5. Affine and Projective Transformations
5.1 The Homogeneous Coordinate Trick
A pure matrix can rotate and scale, but it cannot translate, for any linear . To unify translation with rotation/scale, we extend to with a fourth coordinate :
A direction has so translation doesn't affect it; a point has so translation does. The translation matrix is:
Multiplying gives for points, and for directions, exactly the desired behavior.
5.2 The Five Affine Building Blocks
Every affine transformation is a composition of these:
Translation :
Uniform scale and non-uniform scale :
Rotation about axis by angle (Rodrigues' formula, derived in §6).
Shear (rare in games, common in 2D UI):
Reflection about a plane (a sign flip in scale combined with a rotation).
5.3 TRS Composition
The conventional model matrix is the composition :
Reading right-to-left: first scale the local mesh, then rotate, then translate to its world position. The composed matrix has the structure:
The decomposition (extracting from a given ) is non-trivial in general, especially under non-uniform scale and shear. Engines that store transforms as tuples avoid this problem entirely.
5.4 Why GP Stores Transforms as TRS Tuples
A 4×4 matrix has 16 floats and embeds rotation, scale, and translation in a non-orthogonal way; reconstructing rotation from a scaled matrix is lossy under non-uniform scale. GP's Transform struct is:
struct alignas(16) Transform {
Vec3f position; // 12B
f32 _pad0; // 4B (alignment)
Quaternion rotation; // 16B (unit quaternion)
Vec3f scale; // 12B
f32 _pad1; // 4B
}; // 48B = 3 cache-line-friendly halves
This is 48 bytes versus 64 for a 4×4 matrix, decomposes cleanly, blends correctly (slerp on the quaternion, lerp on and ), and allows the matrix to be derived only when needed. The matrix is then reconstructed:
Mat4f Transform::ToMatrix() const noexcept {
Mat3f R = rotation.ToMat3();
Mat4f M;
M.Col(0) = Vec4f(R.Col(0) * scale.x, 0.0f);
M.Col(1) = Vec4f(R.Col(1) * scale.y, 0.0f);
M.Col(2) = Vec4f(R.Col(2) * scale.z, 0.0f);
M.Col(3) = Vec4f(position, 1.0f);
return M;
}
This pattern is mirrored in Unreal's FTransform and Unity's Transform.
6. Rotations: From Euler Angles to Quaternions
Rotations deserve their own section because they are the single most error-prone topic in 3D graphics. A small rotation bug compounds across animation frames into visible jitter; a mistuned interpolation causes visible "snapping" at angle boundaries; a misapplied basis change inverts the world. Every senior engineer has a story about gimbal lock.
6.1 Euler Angles
Euler angles parameterize rotation by three sequential angles about three axes, typically named pitch, yaw, roll (or roll, pitch, yaw depending on the field). The rotation is the product of three elementary axis rotations:
Pros: human-readable, three intuitive numbers, used by artists and animators in tools.
Cons (numerous):
- Gimbal lock: when two rotation axes align (e.g., ), one degree of freedom is lost. The system becomes singular; inversion fails.
- Order matters: in general. The convention (XYZ, ZYX, ZYZ, ...) must be specified.
- Interpolating Euler angles linearly produces non-uniform angular velocity and visible jitter at gimbal-lock boundaries.
GP exposes Euler angles only as a display representation; internally all rotations are quaternions.
6.2 Rotation Matrices
A rotation matrix satisfies and . The set of such matrices forms the Special Orthogonal group .
Elementary rotations:
Pros: composes naturally with other linear transforms, GPU hardware applies them in one matrix multiply.
Cons: 9 floats with 6 redundant constraints; numerical drift away from orthogonality after many compositions; no clean interpolation; expensive to blend.
6.3 Axis-Angle and Rodrigues' Formula
Any 3D rotation can be expressed as a rotation by angle about a unit axis . The corresponding matrix is given by Rodrigues' rotation formula:
where is the cross-product matrix:
Expanding:
with , . Axis-angle uses 4 floats; it is intuitive but does not compose by simple multiplication.
6.4 Quaternions
A quaternion is a four-component object with multiplication rules
discovered by Hamilton in 1843 and burned into Brougham Bridge in Dublin. The unit quaternions () form the group , which is a double cover of : every rotation corresponds to two quaternions, and , representing the same orientation.
A unit quaternion encoding rotation by angle about unit axis :
Note the half-angle: this is essential and often the source of bugs.
Quaternion product (Hamilton product):
This composes rotations: rotates first by then by . The product is non-commutative, mirroring the non-commutativity of 3D rotations.
Conjugate: . For a unit quaternion, .
Rotating a vector by a quaternion:
(treating as a quaternion with ). Expanded for performance:
This is the form used in shaders, 18 multiplies and 12 adds, faster than constructing a matrix when only one vector is to be rotated.
Quaternion-to-matrix (when many vectors are to be rotated, build the matrix once):
constexpr Mat3f Quaternion::ToMat3() const noexcept {
const f32 xx = x*x, yy = y*y, zz = z*z;
const f32 xy = x*y, xz = x*z, yz = y*z;
const f32 wx = w*x, wy = w*y, wz = w*z;
return Mat3f{
{ 1 - 2*(yy+zz), 2*(xy - wz), 2*(xz + wy) },
{ 2*(xy + wz), 1 - 2*(xx+zz), 2*(yz - wx) },
{ 2*(xz - wy), 2*(yz + wx), 1 - 2*(xx+yy) }
};
}
6.5 Spherical Linear Interpolation (Slerp)
Quaternion interpolation has its own technique because linear interpolation between two quaternions, followed by renormalization, produces non-uniform angular velocity. Slerp [Shoemake 1985] gives constant angular velocity:
where is the angle between them on the 4-sphere.
Two crucial details:
- If , negate one of the quaternions: this picks the short way around the 4-sphere. Otherwise the rotation takes the long way (>180°), causing visible flips.
- When is near zero, and the formula is numerically unstable. Fall back to linear interpolation (
lerp + normalize, also called nlerp) for small .
Quaternion Slerp(Quaternion q0, Quaternion q1, f32 t) noexcept {
f32 cos_omega = Dot(q0, q1);
if (cos_omega < 0.0f) { q1 = -q1; cos_omega = -cos_omega; }
if (cos_omega > 0.9995f) return Normalize(Lerp(q0, q1, t)); // Fallback
const f32 omega = std::acos(cos_omega);
const f32 sin_omega = std::sin(omega);
const f32 a = std::sin((1.0f - t) * omega) / sin_omega;
const f32 b = std::sin(t * omega) / sin_omega;
return q0 * a + q1 * b;
}
6.6 When to Use Which Representation
| Use case | Recommended representation |
|---|---|
| Storage, animation keyframes | Quaternion (16B, no drift) |
| Composition (chain of rotations) | Quaternion (16 mul/12 add per compose) |
| Many vectors to rotate | Matrix (cache-friendly batch op) |
| Single vector to rotate | Quaternion sandwich form |
| Artist-facing UI | Euler angles (display only) |
| Smooth interpolation | Slerp on quaternions |
| Specifying rotations programmatically | Axis-angle, then convert |
Reference: Sommer et al., Why and How to Avoid the Flipped Quaternion Multiplication (arXiv:1801.07478) sorts out conventions across libraries.
7. Trigonometry and Analytic Geometry
7.1 The Unit Circle Identities
The graphics-relevant trig identities, memorized by every engineer:
The half-angle identities, central to quaternions:
The dot product as cosine:
This is the workhorse of every shading model in graphics.
7.2 The Law of Cosines
For a triangle with sides and angle opposite :
A surprisingly common appearance: deriving the half-vector formulation in microfacet BRDFs. The angle between view and light directions and relates to the half-vector via:
7.3 Atan2: The Right Way to Find an Angle
atan2(y, x) returns the angle in of the point from the positive x-axis. Crucially, it handles all four quadrants correctly, unlike the textbook . Use it for:
- Spherical coordinates:
- 2D angle between vectors: (sign correctness in 2D)
- Direction-to-cubemap-face determination
7.4 Lines, Planes, and Rays
A ray is parameterized as with and unit-length.
A plane is defined by a unit normal and offset :
Points satisfying this lie on the plane. The signed distance from any point to the plane is:
A positive value places on the side points to. This single formula powers frustum culling: a sphere is fully outside a plane if .
7.5 Ray-Plane Intersection
Substituting :
If , the ray is parallel to the plane (no intersection or infinitely many). If , the plane is behind the ray origin.
7.6 Ray-Sphere Intersection
For sphere :
Expanding and letting :
(using for a unit ray direction). This is a quadratic in with discriminant:
If : no intersection. If : two solutions . Take the smaller positive one for the closest intersection.
7.7 Ray-Triangle Intersection (Möller-Trumbore)
The standard, branchless ray-triangle test [Möller and Trumbore 1997]:
// Returns t (distance along ray) and barycentric (u, v) on hit, or no-hit.
struct HitInfo { f32 t, u, v; bool hit; };
HitInfo IntersectRayTriangle(Vec3f o, Vec3f d,
Vec3f v0, Vec3f v1, Vec3f v2) noexcept {
constexpr f32 EPSILON = 1e-7f;
const Vec3f e1 = v1 - v0;
const Vec3f e2 = v2 - v0;
const Vec3f h = Cross(d, e2);
const f32 a = Dot(e1, h);
if (std::abs(a) < EPSILON) return { 0,0,0,false }; // parallel
const f32 f = 1.0f / a;
const Vec3f s = o - v0;
const f32 u = f * Dot(s, h);
if (u < 0.0f || u > 1.0f) return { 0,0,0,false };
const Vec3f q = Cross(s, e1);
const f32 v = f * Dot(d, q);
if (v < 0.0f || u + v > 1.0f) return { 0,0,0,false };
const f32 t = f * Dot(e2, q);
return { t, u, v, t > EPSILON };
}
This is the inner loop of every CPU ray tracer and the backbone of GPU bounding-volume hierarchies.
7.8 Ray-AABB Intersection (Slab Method)
For an axis-aligned bounding box with min and max corners, treat each pair of opposing faces as a "slab" and intersect the ray with all three slabs:
bool IntersectRayAABB(Vec3f o, Vec3f inv_d,
Vec3f bmin, Vec3f bmax,
f32& t_min_out, f32& t_max_out) noexcept {
const Vec3f t1 = (bmin - o) * inv_d;
const Vec3f t2 = (bmax - o) * inv_d;
const Vec3f tmin = Min(t1, t2);
const Vec3f tmax = Max(t1, t2);
const f32 tn = std::max({ tmin.x, tmin.y, tmin.z, 0.0f });
const f32 tf = std::min({ tmax.x, tmax.y, tmax.z, FLT_MAX });
t_min_out = tn; t_max_out = tf;
return tn <= tf;
}
The trick of pre-computing inv_d = 1.0f / d (with explicit handling of via encoding) eliminates branches in the inner loop. This is the standard BVH traversal primitive.
7.9 Barycentric Coordinates
A point inside triangle can be written as:
The triple is the barycentric coordinate of . They are used for:
- Vertex attribute interpolation in fragment shaders (a hardware feature).
- Triangle membership tests: a point is inside iff all three are non-negative.
- Ray-triangle intersection results (Möller-Trumbore returns above; ).
- Texture sampling for procedural meshes without UV authoring.
The areas are proportional to the coordinates: , etc.
8. Calculus for Graphics Engineers
A working knowledge of single-variable and multivariable calculus is required for shader writing, Monte Carlo techniques, and physically based rendering. We cover the essentials.
8.1 Derivatives
The derivative of a function is the rate of change of with respect to . In graphics it appears most often as:
- Texture LOD selection: the GPU computes (texture coordinate derivatives w.r.t. screen-space pixels) and uses them to select the appropriate mipmap level. Available in shaders as
dFdx,dFdy(GLSL) orddx,ddy(HLSL). - Normal mapping: the tangent-space basis is derived from texture-space partial derivatives.
- Ambient occlusion and SSR: these effects sample neighborhoods whose extent is proportional to derivatives of depth.
8.2 The Gradient
For a scalar field in 3D, the gradient is the vector of partial derivatives:
The gradient points in the direction of steepest increase, with magnitude equal to the maximum rate of change.
In rendering, appears in:
- Signed distance fields: is the surface normal (when on the surface).
- Heightmap-derived normals: .
- Volumetric rendering: density gradients drive single-scattering shading.
8.3 The Jacobian
For a vector function , the Jacobian is the matrix of partial derivatives:
The Jacobian determinant is the local volume scaling factor of the transformation. It is essential in:
- Importance sampling: when changing variables from to , the PDF transforms as .
- Texture warping: anisotropic texture filtering analyzes the projected Jacobian onto the texture plane.
8.4 Integrals and the Fundamental Theorem
The definite integral measures the signed area under . The fundamental theorem of calculus states:
where is an antiderivative of . In graphics, closed-form antiderivatives are rare; we approximate integrals numerically (§9).
The integrals that matter in rendering are over solid angles (§10):
where is the hemisphere above the surface point .
8.5 Solid Angles
A solid angle is the 3D analog of a 2D angle, measured in steradians (sr). The full sphere subtends sr; a hemisphere . In spherical coordinates:
A small surface patch at distance from a point, with normal making angle with the line of sight, subtends:
This falloff is the geometric origin of inverse-square light attenuation.
8.6 Taylor Series
A function can be approximated locally by its Taylor expansion:
In graphics:
- Schlick's Fresnel approximation is a low-order expansion of the full Fresnel equations:
This single expression replaces the costly exact Fresnel and is accurate to within a few percent for most materials [Schlick 1994].
- Tone mapping operators like
x / (x + 1)are approximations of the human visual system's response curve. - Trigonometric polynomial approximations in shaders avoid the slow
sin/cosinstructions on older hardware.
8.7 Recommended Reading
- Spivak, Calculus, 4th ed. The undergraduate gold standard.
- Pharr, Jakob, Humphreys, Physically Based Rendering: From Theory to Implementation, 4th ed. Calculus applied to rendering. (pbr-book.org)
9. Probability, Statistics, and Monte Carlo Methods
Modern photorealistic rendering is statistical: we estimate the value of an integral over light paths by random sampling. This section gives the foundation.
9.1 Random Variables and Distributions
A random variable takes values from a distribution. Its probability density function (PDF) satisfies:
The cumulative distribution function (CDF) runs from 0 to 1.
The expectation of a function of :
The variance:
9.2 Common Distributions in Graphics
- Uniform : on the interval. Generated by RNG.
- Cosine-weighted hemisphere: . Importance sampling for diffuse BRDFs.
- GGX (Trowbridge-Reitz): the dominant microfacet distribution in modern PBR. Used to importance-sample specular BRDFs.
- Multivariate normal: gradient noise, terrain generation.
9.3 Monte Carlo Integration
To estimate , draw samples and compute:
This is unbiased: . The variance scales as:
The standard error therefore decreases as , which is the curse of Monte Carlo: doubling sample count halves the noise's magnitude only by a factor of . To halve noise visibly, the samples.
9.4 Importance Sampling
The art of Monte Carlo is choosing so that has low variance. The ideal is , which gives zero variance (and is the integral itself, hence unrealizable). In practice we sample proportional to a major factor of .
Example: in path tracing, the integrand is . We typically sample one of:
- -weighted hemisphere (good for diffuse)
- BRDF-weighted (good for specular)
- Light-source-weighted (good for direct illumination)
Multiple Importance Sampling (MIS) [Veach 1995] combines several strategies optimally, weighting each sample by:
with (the "power heuristic") performing best in practice.
9.5 Quasi-Monte Carlo and Low-Discrepancy Sequences
Pure pseudo-random sampling clusters; is wasteful. Low-discrepancy sequences (Halton, Sobol, Owen-scrambled Sobol) cover the sample space more uniformly and achieve convergence in dimensions.
The Halton sequence with bases for 2D:
where is the radical inverse:
with the digits of in base . Used in TAA jitter sequences in nearly every modern renderer.
constexpr f32 RadicalInverse(u32 base, u32 i) noexcept {
f32 result = 0.0f;
f32 inv_base = 1.0f / base;
f32 inv_base_n = inv_base;
while (i > 0) {
result += (i % base) * inv_base_n;
i /= base;
inv_base_n *= inv_base;
}
return result;
}
9.6 Reservoir Sampling
For real-time applications where samples cannot be stored, reservoir sampling maintains a single sample summarizing a stream of weighted candidates. This underpins ReSTIR [Bitterli et al. 2020], the dominant real-time path tracing technique:
Reservoir<S>: maintains one sample s with weight W
Update(s_new, w_new):
W += w_new
if rand() < w_new / W: s = s_new
After processing candidates, the reservoir holds a sample distributed proportionally to its weight, with constant memory. ReSTIR adds spatial and temporal resampling of reservoirs to amortize the cost of light selection across pixels and frames.
10. The Rendering Equation and Light Transport
10.1 The Equation
Kajiya's rendering equation [Kajiya 1986] is the master equation of physically based rendering. It states that the outgoing radiance at point in direction equals the emitted radiance plus the integral of incoming radiance reflected through the BRDF:
where:
- : outgoing radiance, the quantity we want
- : emitted radiance (zero for non-emitters)
- : the BRDF, units
- : incoming radiance from direction
- : the cosine projection, accounting for foreshortening
- : the hemisphere above the surface
This is a Fredholm integral equation of the second kind, at one point depends on from elsewhere, hence the iterative/Monte Carlo approach.
10.2 The BRDF
The Bidirectional Reflectance Distribution Function (BRDF) describes how a surface reflects light. It must satisfy:
- Non-negativity:
- Helmholtz reciprocity:
- Energy conservation: for all
The simplest physically valid BRDF is Lambertian:
with albedo . The normalizes so that energy is conserved when integrated over the hemisphere.
10.3 Microfacet Models (Cook-Torrance)
Physically based specular BRDFs use the microfacet model [Cook and Torrance 1982]:
with:
- : the half-vector
- : Normal Distribution Function, the proportion of microfacets oriented to reflect from to
- : Fresnel term, the fraction reflected (vs. transmitted/absorbed)
- : geometric self-shadowing/masking
The GGX (Trowbridge-Reitz) NDF, the modern industry standard:
with roughness (a perceptual remap recommended by [Burley 2012]).
Schlick's Fresnel approximation:
with the reflectance at normal incidence (4% for dielectrics, RGB-tinted for metals).
Smith's geometric term (height-correlated form, used in UE5 and Frostbite):
The full Cook-Torrance microfacet specular, plus a Lambertian diffuse modulated by for energy conservation, is the Disney/Burley BRDF that dominates AAA pipelines [Burley 2012, Karis 2013].
References:
- Karis 2013, Real Shading in Unreal Engine 4
- Burley 2012, Physically-Based Shading at Disney
- Heitz 2014, Understanding the Masking-Shadowing Function
10.4 Image-Based Lighting via Spherical Harmonics
For diffuse environment lighting, integrating the irradiance over the hemisphere is expensive per pixel. Spherical harmonics project directional functions onto an orthonormal basis on the sphere:
For diffuse irradiance, the first three bands (9 SH coefficients) capture nearly all the energy [Ramamoorthi and Hanrahan 2001]. Storage: 9 RGB floats per probe; evaluation: 9 dot products in the shader.
Reference: Ramamoorthi & Hanrahan, An Efficient Representation for Irradiance Environment Maps, SIGGRAPH 2001.
11. Color Theory and Color Spaces
Color is not a number; it is a perceptual response to a power spectrum, mediated by three retinal cone types. The mapping from spectrum to perceived color, and from perceived color to display, is a deeper topic than most engineers realize.
11.1 The CIE 1931 Color Matching Functions
Human color vision is well-modeled by three weighting functions , the CIE 1931 Standard Observer. A spectrum produces tristimulus values:
is luminance, the perceptual brightness. encode chromaticity.
11.2 Linear vs Gamma-Encoded Color
The single most common bug in beginner shaders: confusing linear-light values (where addition and lerp are physically meaningful) with gamma-encoded sRGB values (perceptually-uniform encoding for storage).
The sRGB encoding (approximately gamma 2.2):
with the inverse:
Rule: do all lighting math in linear space; encode to sRGB only when writing to the framebuffer (or use hardware sRGB texture views, which do this automatically).
11.3 HDR and Tone Mapping
Physical light intensities span many orders of magnitude (sun at cd/m², candle at cd/m²). Display devices have limited dynamic range. Tone mapping compresses HDR scene radiance into the displayable range.
The simplest, Reinhard:
Modern AAA pipelines use ACES (Academy Color Encoding System) which provides cinematic film-like response with controlled saturation and toe/shoulder behavior. The full ACES Reference Rendering Transform is a complex matrix and curve sequence; approximations like ACES Filmic [Hill 2017] are typically inlined into the shader:
vec3 ACESFilmic(vec3 x) {
const float a = 2.51;
const float b = 0.03;
const float c = 2.43;
const float d = 0.59;
const float e = 0.14;
return clamp((x * (a*x + b)) / (x * (c*x + d) + e), 0.0, 1.0);
}
References:
11.4 Wide Gamut, HDR Displays, and Rec.2020
Modern HDR-capable monitors and HDR10/Dolby Vision broadcasts use the Rec.2020 color space, with a wider gamut than sRGB. Engines targeting HDR output must:
- Render in linear, scene-referred (HDR) space.
- Apply tone mapping that preserves HDR headroom.
- Encode in PQ (Perceptual Quantizer, SMPTE ST 2084) for HDR10, or HLG for broadcast.
The math of these transforms is documented in the BT.2100 specification.
12. Signal Processing and Anti-Aliasing
Rendering is sampling: we evaluate a continuous image function at discrete pixel locations. Sampling theory dictates what we can faithfully reconstruct and what produces aliasing artifacts (jaggies, moire, shimmering).
12.1 The Nyquist-Shannon Sampling Theorem
A signal containing frequencies up to can be perfectly reconstructed from samples spaced at intervals . The threshold is the Nyquist frequency.
Frequencies above in the original signal alias to lower apparent frequencies in the sampled signal. In rendering this manifests as:
- Jaggies along high-contrast edges (the geometric edge has infinite frequency)
- Moire patterns in textures with fine repeating detail viewed at oblique angles
- Shimmering on specular highlights when the camera moves
12.2 Pre-Filtering: The Only True Solution
The mathematically correct fix is to band-limit the signal before sampling, applying a low-pass filter that removes frequencies above . In practice:
- Texture mipmapping: precompute progressively-filtered versions of textures; the GPU selects the appropriate one based on screen-space derivatives. The mipmap chain is the practical realization of pre-filtering for textures.
- Multi-Sample Anti-Aliasing (MSAA): take multiple sub-pixel samples per pixel and average. Reduces geometric aliasing; does not help with shading aliasing.
- Super-Sampling Anti-Aliasing (SSAA): render at higher resolution and downsample. Brute-force but correct for everything.
- Temporal Anti-Aliasing (TAA): use sub-pixel jitter across frames + temporal accumulation with motion-vector-based reprojection. The current AAA standard.
12.3 Mipmap Generation
A mipmap level is half the resolution of level in each axis. Level is computed by box-filtering (or better: tent, Lanczos, or Kaiser) the source level:
for box filter. Higher-quality filters use larger kernels and address frequencies just below Nyquist.
The total storage overhead of a full mipmap chain is , only 33% extra memory.
12.4 Convolution and Filter Kernels
A convolution of signal with kernel is:
Discrete convolution (on a pixel grid):
Common kernels in graphics:
- Gaussian (separable, smooth, theoretically infinite support):
A 2D Gaussian factors into two 1D passes: . This separability makes Gaussian blur affordable for bloom, depth of field, screen-space ambient occlusion.
- Box filter: average over the kernel. Simple but introduces ringing at high contrasts.
- Sobel/Scharr: derivative kernels for edge detection (used in screen-space normals from depth).
- Bilateral filter: edge-aware blur. Used in denoising path-traced output.
12.5 The Frequency Domain (FFT)
The Fourier transform decomposes a signal into its frequency components:
In graphics:
- Bloom: large-radius blurs are faster in the frequency domain via FFT.
- Convolution theorem: . Convolution becomes multiplication; this is why FFT-based filtering exists.
- Spherical harmonics (§10.4) are the angular Fourier basis on the sphere.
12.6 Reading
- Glassner, Principles of Digital Image Synthesis. The classical signal-processing reference for graphics.
- Pharr et al., PBRT §8 (Sampling and Reconstruction).
13. Numerical Methods and Stability
13.1 Iterative Solvers
Many graphics problems reduce to root-finding . The standard tools:
Newton-Raphson:
Quadratic convergence near a simple root. Used for:
- Sphere tracing in raymarched SDFs (when augmented to handle Lipschitz constraints)
- Inverse kinematics
- Fast inverse square root (Quake III's
0x5f3759dfconstant is one Newton-Raphson step on a clever initial guess)
Bisection: brackets a root in with and halves the interval. Linear convergence but unconditionally robust.
13.2 Numerical Linear Algebra
Solving :
- For small dense (4×4 or smaller): direct Gaussian elimination or LU decomposition.
- For sparse : iterative methods (conjugate gradient, GMRES). Used in cloth simulation, fluid simulation, and global illumination via radiosity.
Eigendecomposition appears in:
- Principal Component Analysis on tangent frames or SH coefficients
- Inertia tensor diagonalization in physics
- BVH construction (longest-axis splits use eigenvectors of the centroid covariance)
Singular Value Decomposition (SVD) :
- Polar decomposition (extracting rotation from a possibly-scaled matrix): where , .
- Best low-rank approximation (Eckart-Young theorem), used in compressed materials and neural shading.
13.3 Conditioning and Stability
A computation is ill-conditioned if small input perturbations produce large output changes. The condition number of a matrix is .
Graphics-specific instability examples:
- Inverting a near-singular matrix: a TBN matrix becomes near-singular when tangent and bitangent collide. Detect via and fall back to a synthesized basis.
- Acos of a value slightly above 1: produces NaN. Always clamp:
std::acos(std::clamp(d, -1.0f, 1.0f)). - Normalizing a near-zero vector: divide-by-zero. Guard with epsilon.
- Quaternion drift: chaining many quaternion products accumulates drift in . Renormalize periodically (every products is safe).
13.4 SIMD and Vectorization
Modern CPUs offer SIMD (Single Instruction, Multiple Data) units that operate on 4-8-16 floats simultaneously: SSE (128-bit, 4×f32), AVX (256-bit, 8×f32), AVX-512 (512-bit, 16×f32), NEON (ARM, 128-bit). On GPUs, every shader thread is implicitly SIMD across 32 (NVIDIA warp) or 64 (AMD wavefront) threads.
A scalar Vec3f doesn't naturally SIMD-vectorize (3 lanes of 4). The two strategies:
- AoS-of-Vec4 padding: store as
Vec4f(12B → 16B), use SIMD intrinsics. Wastes 25%. - SoA (Structure of Arrays): store vectors as three contiguous arrays of floats each. Operate on vectors at once. Used by Embree, OptiX, and modern ray tracers.
GP exposes both patterns; the SoA path is reserved for batch operations on vectors.
13.5 Recommended Reading
- Trefethen and Bau, Numerical Linear Algebra. The graduate reference.
- Goldberg, What Every Computer Scientist Should Know About Floating-Point Arithmetic. (docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html)
14. Spatial Data Structures and Computational Geometry
Real-time queries (visibility, collision, ray tracing) require organizing geometry so that "what is near ?" is faster than scanning all objects.
14.1 Bounding Volume Hierarchies (BVH)
A BVH is a binary tree where each node holds an axis-aligned bounding box (AABB) enclosing all geometry in its subtree. Ray-AABB tests prune entire subtrees. Construction by Surface Area Heuristic (SAH) [MacDonald & Booth 1990]:
with the probability a ray traverses each child (proportional to surface area), the traversal cost, and the intersection cost. The splitting plane minimizing this cost yields good trees in build time.
Production engines now use Linear BVH (LBVH) built from Morton-coded primitives, parallelizable on GPUs in . See our Virtualized Geometry Systems post for the cluster-BVH application.
14.2 KD-Trees
A KD-tree splits along axis-aligned planes, with the split axis cycling per level. Faster traversal than BVH for static scenes; slower to update for dynamic scenes. Largely superseded by BVH in real-time graphics.
14.3 Octrees
3D space recursively subdivided into 8 octants. Used for:
- Sparse Voxel Octrees (SVO): efficient storage of voxelized geometry [Laine and Karras 2010].
- Position-based collision broadphase.
- Cone tracing for global illumination (NVIDIA VXGI).
14.4 Spatial Hashing
For uniformly-sized objects scattered through space, hashing to a fixed-size table provides neighbor queries on average. Used for SPH fluid simulation and broadphase collision in unbounded worlds.
14.5 Convex Hulls and Half-Plane Intersections
The convex hull of a point set is the smallest convex polytope containing all points. Computed in by QuickHull or in 3D by incremental algorithms. Used for:
- Conservative collision shapes from raw meshes
- Frustum construction (intersection of 6 half-planes)
- Convex decomposition for physics (V-HACD)
14.6 Reading
- Ericson, Real-Time Collision Detection (companion site). The graphics-side reference for spatial structures.
- Wald, Boulos, Shirley, Ray Tracing Deformable Scenes Using Dynamic Bounding Volume Hierarchies, TOG 2007.
15. Physics and Animation Mathematics
15.1 Newton's Laws and Numerical Integration
A rigid body's motion is governed by:
In simulation we integrate numerically. The choice of integrator matters:
Forward Euler (not used in production):
Cheap but unstable; energy grows unboundedly for oscillators.
Semi-implicit (symplectic) Euler (used in most physics engines):
Same cost, conserves energy on average. The default for game physics.
Verlet integration (used for cloth, hair, and constraint-based systems):
Position-based; velocity is implicit. Robust under iterative constraint solving (PBD).
Runge-Kutta 4 (RK4) (used in offline solvers): Four-stage method with truncation error. High accuracy at cost; rarely used in games.
15.2 Quaternion Time Derivatives
For an angular velocity (a vector in body or world frame), the rate of change of orientation is:
(quaternion product). Integrating over :
with the quaternion exponential of a pure-imaginary quaternion:
This is the correct way to integrate orientation; first-order with renormalization is a common approximation.
15.3 Inertia Tensor
For a rigid body, the inertia tensor is a 3×3 symmetric positive-definite matrix relating angular velocity to angular momentum:
For a uniform-density box of dimensions and mass :
For a sphere of radius :
The tensor transforms with the body's orientation: . Diagonalizing once at startup yields principal axes along which angular dynamics decouple.
15.4 Skeletal Animation
A skeleton is a tree of bones, each with a local-to-parent transform. The world-space transform of a vertex bound to bone with bind pose is:
where is the current world-space bone transform. With linear blend skinning, a vertex influenced by multiple bones with weights :
LBS is fast (a sum of weighted matrix-vector products) but produces "candy-wrapper" artifacts at sharply twisted joints. Dual-quaternion skinning [Kavan et al. 2007] fixes this at higher cost by interpolating in the dual-quaternion representation that preserves rigid-body length.
References:
16. The GP Math Library: Design Rationale
The GP Engine ships its mathematics module under source/runtime/core/public/maths/. This section enumerates the design choices and their motivations.
16.1 Type Inventory
| Type | Size | Purpose |
|---|---|---|
Vec2f, Vec3f, Vec4f | 8B / 12B / 16B | Floating-point vectors |
Vec2i, Vec3i, Vec4i | 8B / 12B / 16B | Integer vectors (texture coordinates, grid indices) |
Mat2f, Mat3f, Mat4f | 16B / 36B / 64B | Column-major matrices |
Quaternion | 16B | Unit quaternion |
Transform | 48B | TRS tuple |
AABB | 24B | Axis-aligned bounding box |
Sphere | 16B | Bounding sphere |
Plane | 16B | Plane equation |
Ray | 24B | Origin + direction |
Frustum | 96B | Six planes |
Color | 16B | RGBA float |
16.2 Conventions, Enforced by the Type System
- Right-handed, -up world space.
- Column-major matrix storage, column vectors, pre-multiplication: .
- Quaternions are always unit-length when used to represent rotation; the type's invariant is enforced at construction. Non-unit quaternions are a different type (
UnnormalizedQuaternion) used only as an intermediate. - Angles are radians at the API boundary. Conversion utilities
Degrees(rad)andRadians(deg)are explicit.
16.3 The Scalar/SIMD Duality
Every operation has two implementations:
// public/maths/Vec3.hpp
namespace gp::math {
struct alignas(16) Vec3f {
f32 x, y, z;
f32 _pad; // align to 16B for SIMD load efficiency
[[nodiscard]] friend constexpr f32 Dot(const Vec3f& a, const Vec3f& b) noexcept {
return a.x * b.x + a.y * b.y + a.z * b.z;
}
};
} // namespace gp::math
The SIMD specialization lives next door in private/maths/Vec3_SIMD.cpp and is selected at compile time based on GP_HAS_SSE/GP_HAS_NEON macros. The scalar version is the reference: every formula in this paper compiles directly to it.
16.4 Constexpr-Everything
GP's math types are constexpr-friendly. This allows:
- Compile-time matrix construction for static look-up tables.
- Static asserts on basis orthonormality.
- Template metaprogramming for shader uniforms whose layout is determined at compile time.
This was a deliberate departure from GLM (which is mostly non-constexpr) and DirectXMath (whose intrinsics defy constexpr).
16.5 No Implicit Conversions
GP does not allow implicit conversions across handedness, between row-major and column-major, or between unit and unnormalized quaternions. These conversions require explicit calls (ToRowMajor(), Normalize(), FlipHandedness()) so the cost is visible and the conversion is auditable.
16.6 Forward Declarations
Heavy templates (Mat<T, R, C>) are hidden behind a MathForward.hpp header. Most consumers include only the forward declarations and avoid the full template instantiation in their compilation units, dramatically improving compile times.
17. Curriculum Roadmap and Further Study
This paper is the entry point. The recommended progression for a student building toward AAA-level proficiency:
17.1 Foundational (months 1-3)
- Strang, Linear Algebra and Its Applications (or 3Blue1Brown's video series for visual intuition).
- Lengyel, Foundations of Game Engine Development, Vol. 1: Mathematics.
- Implement a software rasterizer from scratch in C++. The Tinyrenderer tutorial is canonical.
17.2 Intermediate (months 4-9)
- Akenine-Möller, Haines, Hoffman, Real-Time Rendering, 4th ed. The standard textbook of the field.
- Pharr, Jakob, Humphreys, Physically Based Rendering, 4th ed. (pbr-book.org)
- Ericson, Real-Time Collision Detection.
- Implement a CPU path tracer (use PBRT as the reference).
17.3 Advanced (year 2+)
- Veach, Robust Monte Carlo Methods for Light Transport Simulation (PhD thesis, the bible of variance reduction).
- Shoemake, Quaternions tutorial.
- SIGGRAPH course notes (annual): Advances in Real-Time Rendering, Physically Based Shading. Available at advances.realtimerendering.com.
- Implement a real-time deferred renderer, then a Vulkan-based path tracer with ReSTIR, then virtualized geometry (see our previous post).
17.4 The GP Curriculum
Future GP educational posts will build on this foundation in roughly this order:
- Practical Linear Algebra: Implementing Vec/Mat/Quat from Scratch (anatomy of
core/public/maths/). - The Rasterization Pipeline: From Triangle to Pixel (deriving the perspective divide).
- PBR From First Principles: Building a Cook-Torrance BRDF (microfacet derivation, code).
- Monte Carlo Integration for Real-Time Path Tracing (importance sampling, MIS, ReSTIR).
- Temporal Anti-Aliasing: The Subpixel Frequency Domain.
- Skeletal Animation and Dual-Quaternion Skinning.
Each builds on the foundations laid here; none introduces a formula not derivable from this paper.
18. References
Foundational Mathematics
- Strang, G. (2016). Introduction to Linear Algebra, 5th ed. Wellesley-Cambridge Press.
- Spivak, M. (2008). Calculus, 4th ed. Publish or Perish.
- Trefethen, L. N. & Bau, D. (1997). Numerical Linear Algebra. SIAM.
- Goldberg, D. (1991). What every computer scientist should know about floating-point arithmetic. ACM Computing Surveys 23(1). Available: docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
- Sanderson, G. (3Blue1Brown). Essence of Linear Algebra. Video series. Available: youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab
Graphics-Oriented Math
- Lengyel, E. (2016). Foundations of Game Engine Development, Volume 1: Mathematics. Terathon. Available: foundationsofgameenginedev.com
- Lengyel, E. (2019). Foundations of Game Engine Development, Volume 2: Rendering. Terathon.
- Akenine-Möller, T., Haines, E., Hoffman, N., Pesce, A., Iwanicki, M., & Hillaire, S. (2018). Real-Time Rendering, 4th ed. CRC Press. Errata and supplements: realtimerendering.com
- Dunn, F. & Parberry, I. (2011). 3D Math Primer for Graphics and Game Development, 2nd ed. CRC Press.
Quaternions and Rotations
- Shoemake, K. (1985). Animating rotation with quaternion curves. SIGGRAPH '85, 245-254.
- Sommer, H., Gilitschenski, I., Bloesch, M., et al. (2018). Why and how to avoid the flipped quaternion multiplication. Aerospace 5(3), 72. arxiv.org/abs/1801.07478
- Hanson, A. J. (2006). Visualizing Quaternions. Morgan Kaufmann.
Geometry and Intersection
- Möller, T. & Trumbore, B. (1997). Fast, minimum storage ray-triangle intersection. Journal of Graphics Tools 2(1), 21-28.
- Ericson, C. (2005). Real-Time Collision Detection. Morgan Kaufmann. Companion site: realtimecollisiondetection.net
- MacDonald, J. D. & Booth, K. S. (1990). Heuristics for ray tracing using space subdivision. The Visual Computer 6(3), 153-166.
Physically Based Rendering
- Kajiya, J. T. (1986). The rendering equation. SIGGRAPH '86, 143-150.
- Cook, R. L. & Torrance, K. E. (1982). A reflectance model for computer graphics. ACM Transactions on Graphics 1(1), 7-24.
- Schlick, C. (1994). An inexpensive BRDF model for physically-based rendering. Computer Graphics Forum 13(3), 233-246.
- Burley, B. (2012). Physically-based shading at Disney. SIGGRAPH 2012 Course Notes. Available: disneyanimation.com/publications/physically-based-shading-at-disney/
- Karis, B. (2013). Real shading in Unreal Engine 4. SIGGRAPH 2013 Course Notes. Available: blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf
- Heitz, E. (2014). Understanding the masking-shadowing function in microfacet-based BRDFs. Journal of Computer Graphics Techniques 3(2), 48-107. jcgt.org/published/0003/02/03/
- Pharr, M., Jakob, W. & Humphreys, G. (2023). Physically Based Rendering: From Theory to Implementation, 4th ed. MIT Press. Full book online: pbr-book.org
- Veach, E. (1997). Robust Monte Carlo methods for light transport simulation. PhD thesis, Stanford University.
- Veach, E. & Guibas, L. J. (1995). Optimally combining sampling techniques for Monte Carlo rendering. SIGGRAPH '95.
Sampling and Real-Time Path Tracing
- Bitterli, B., Wyman, C., Pharr, M., Shirley, P., Lefohn, A. & Jarosz, W. (2020). Spatiotemporal reservoir resampling for real-time ray tracing with dynamic direct lighting. SIGGRAPH 2020. Available: research.nvidia.com/publication/2020-07_Spatiotemporal-reservoir-resampling
- Halton, J. H. (1964). Algorithm 247: Radical-inverse quasi-random point sequence. Communications of the ACM 7(12), 701-702.
- Owen, A. B. (1995). Randomly permuted (t, m, s)-nets and (t, s)-sequences. In: Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing.
Color and Tone Mapping
- CIE (1931). Commission Internationale de l'Éclairage Proceedings, 1931. Cambridge University Press.
- Reinhard, E., Stark, M., Shirley, P. & Ferwerda, J. (2002). Photographic tone reproduction for digital images. ACM Transactions on Graphics 21(3).
- Narkowicz, K. (2016). ACES Filmic Tone Mapping Curve (approximation). Available: knarkowicz.wordpress.com/2016/01/06/aces-filmic-tone-mapping-curve/
- Academy of Motion Picture Arts and Sciences. Academy Color Encoding System (ACES). Available: acescentral.com
- ITU-R Recommendation BT.2100. Image parameter values for high dynamic range television. Available: itu.int/rec/R-REC-BT.2100/en
Signal Processing
- Glassner, A. S. (1995). Principles of Digital Image Synthesis. Morgan Kaufmann.
- Mitchell, D. P. & Netravali, A. N. (1988). Reconstruction filters in computer graphics. SIGGRAPH '88.
- Smith, S. W. (1997). The Scientist and Engineer's Guide to Digital Signal Processing. California Technical Publishing. Available: dspguide.com
Spatial Data Structures
- Karras, T. (2012). Maximizing parallelism in the construction of BVHs, octrees, and k-d trees. High-Performance Graphics 2012.
- Laine, S. & Karras, T. (2010). Efficient sparse voxel octrees. I3D 2010.
- Wald, I., Boulos, S. & Shirley, P. (2007). Ray tracing deformable scenes using dynamic bounding volume hierarchies. ACM Transactions on Graphics 26(1).
Animation and Physics
- Kavan, L., Collins, S., Žára, J. & O'Sullivan, C. (2007). Skinning with dual quaternions. I3D 2007. Available: cs.utah.edu/~ladislav/kavan07skinning/kavan07skinning.pdf
- Müller, M., Heidelberger, B., Hennix, M. & Ratcliff, J. (2007). Position based dynamics. Journal of Visual Communication and Image Representation 18(2).
- Catto, E. (2015). Numerical methods. GDC 2015. Available: box2d.org/files/ErinCatto_NumericalMethods_GDC2015.pdf
Spherical Harmonics and Image-Based Lighting
- Ramamoorthi, R. & Hanrahan, P. (2001). An efficient representation for irradiance environment maps. SIGGRAPH 2001. Available: graphics.stanford.edu/papers/envmap/envmap.pdf
- Sloan, P.-P. (2008). Stupid spherical harmonics tricks. GDC 2008.
Engine Math Library Engineering
- Reed, N. (2015). Depth precision visualized. NVIDIA Developer Blog. Available: developer.nvidia.com/content/depth-precision-visualized
- Lottes, T. (2009). FXAA. NVIDIA Whitepaper.
- Sucker Punch / Naughty Dog GDC presentations on world-scale precision (various years). Aggregated at advances.realtimerendering.com.
Tutorials
- Sokolov, D. Tinyrenderer: How OpenGL works, software rendering in 500 lines of code. Tutorial. Available: github.com/ssloy/tinyrenderer
- De Vries, J. Learn OpenGL. Tutorial. Available: learnopengl.com
GP Engine Companion Posts
- Scotton, M. (2026). Virtualized Geometry Systems: Architecture, Theory, and Implementation. GP Engineering Reference Series. (/blog/virtualized-geometry-systems)
- Scotton, M. (2026). Engine Architecture and Directory Layout. GP Engineering Reference Series. (/blog/engine-architecture-layout)
Document prepared as part of the GP SDK Engineering Reference Series. Version 1.0, Principal Systems Engineer, Graphical Playground SDK.
