Skip to main content

Basic Mathematical Knowledge for Graphical Engineering

· 63 min read
Mallory Scotton
Founder of Graphical Playground

A Principal Engineer's Foundational Guide to the Mathematics Underpinning Real-Time Rendering


"You cannot reason about a renderer you cannot derive. Every rendering technique, from a single rasterized triangle to a full path-traced reservoir resampling pipeline, is the application of three centuries of mathematics to the problem of moving photons from a virtual world into a human eye. Learn the math, and the engine becomes obvious."


Abstract

Real-time computer graphics is, at its core, applied mathematics with a deadline. Every frame, the GPU executes billions of arithmetic operations whose correctness rests on a small set of foundational mathematical structures: linear algebra, projective geometry, calculus, probability, and signal processing. Yet the modern engine engineer is rarely taught these subjects in a unified, graphics-oriented form. Textbooks treat them in isolation; tutorials skip the derivations; production codebases hide them behind opaque utility functions. This paper attempts to close the gap. We present a comprehensive, self-contained survey of the mathematical knowledge required to build, debug, and reason about a modern AAA renderer, from the first vector dot product up to the rendering equation, Monte Carlo importance sampling, quaternion blending, and the numerical stability concerns that haunt every shader. We motivate each topic with its concrete graphical application, derive the formulas from first principles where it aids understanding, and provide reference C++23 implementations consistent with the conventions of the Graphical Playground (GP) Engine's in-house mathematics library. This document is intended both as a study reference for engineers entering the field and as the foundational curriculum on which all subsequent GP educational material will build.

Keywords: linear algebra, quaternions, projective geometry, rendering equation, BRDF, Monte Carlo integration, signal processing, color spaces, numerical stability, SIMD, real-time rendering, GP Engine


Table of Contents

  1. Introduction and Motivation
  2. Number Systems and Floating-Point Reality
  3. Linear Algebra: The Lingua Franca of Graphics
  4. Coordinate Systems and the Transformation Pipeline
  5. Affine and Projective Transformations
  6. Rotations: From Euler Angles to Quaternions
  7. Trigonometry and Analytic Geometry
  8. Calculus for Graphics Engineers
  9. Probability, Statistics, and Monte Carlo Methods
  10. The Rendering Equation and Light Transport
  11. Color Theory and Color Spaces
  12. Signal Processing and Anti-Aliasing
  13. Numerical Methods and Stability
  14. Spatial Data Structures and Computational Geometry
  15. Physics and Animation Mathematics
  16. The GP Math Library: Design Rationale
  17. Curriculum Roadmap and Further Study
  18. References

1. Introduction and Motivation

1.1 Why a Graphics Engineer Cannot Outsource the Math

Modern engines bundle mathematical utilities into libraries: GLM for OpenGL-style C++, DirectXMath for D3D, Eigen for the academic crowd, Unreal's FMath and FVector/FMatrix, Unity's Vector3 and Quaternion, and so on. A casual reader of these libraries might conclude that the math is solved, that one need only call glm::lookAt() and the framework will produce the correct view matrix.

This is dangerously incomplete. The libraries are correct, but they are also opinionated. Each one bakes in conventions, row-major versus column-major, right-handed versus left-handed, depth in [0,1][0, 1] versus [1,1][-1, 1], pre-multiplication versus post-multiplication, that are invisible to the caller but catastrophic when crossed. A glm::lookAt matrix multiplied with a DirectX-convention projection matrix produces a scene that renders inside-out, with depth test inverted, and where culling is upside down. The error is a one-character convention mismatch; finding it without understanding the underlying math takes days.

More fundamentally, every nontrivial rendering technique requires the engineer to derive new math, not just apply old. Implementing screen-space reflections requires deriving the reflection vector from the partial derivatives of the depth buffer. Implementing physically based shading requires deriving the half-vector formulation of microfacet BRDFs and choosing between Smith and V-cavity shadowing. Implementing temporal anti-aliasing requires deriving sub-pixel jitter sequences from low-discrepancy number theory. None of these are library calls. All of them rest on the foundations this paper covers.

1.2 Why GP Writes Its Own Math Library

The Graphical Playground engine ships its own mathematics module, source/runtime/core/public/maths/ (referenced in the Engine Architecture post). We did not adopt GLM, Eigen, or DirectXMath. The reasons are pedagogical first and engineering second:

  1. Transparency. Every formula in GP's math library is derivable by a student reading the source. There is no hidden SIMD intrinsic that obscures the mathematical operation; the SIMD specialization sits next to the scalar reference implementation.
  2. Convention discipline. GP standardizes on right-handed, column-major, zz-up world space, yy-up tangent space, [0,1][0, 1] depth, pre-multiplication: v=Mv\mathbf{v}' = M \mathbf{v}. These conventions are documented in the public headers and enforced by the type system, no implicit conversions across handedness.
  3. Educational value. A student reading our quaternion.hpp learns how slerp works, not just that it exists. Comments cite the original papers; tests prove the algebraic identities.
  4. Performance autonomy. When the time comes to vectorize for AVX-512 or NEON, we are not blocked on a third-party release cadence.

This paper is written from inside that library: every formula presented can be cross-referenced to a function in core/public/maths/, and every function in core/public/maths/ derives from a formula presented here.

1.3 Reading Order

The paper is organized in dependency order: section nn requires only sections 11 through n1n-1. A reader new to graphics should read linearly. A reader with prior background can skip §2-§3 and jump to §6 (quaternions), §10 (rendering equation), or §12 (signal processing) as needed. Every section closes with a short list of canonical references (collated in §18) for deeper study.


2. Number Systems and Floating-Point Reality

2.1 IEEE 754 in Three Sentences

A 32-bit single-precision float (float / f32) stores a number as

x=(1)s(1.m22m21m0)22e127x = (-1)^s \cdot (1.m_{22}m_{21}\ldots m_0)_2 \cdot 2^{e - 127}

where ss is the sign bit, mm is a 23-bit mantissa, and ee is an 8-bit biased exponent. This buys roughly 7 decimal digits of precision spanning ±3.4×1038\pm 3.4 \times 10^{38}, with a smallest normal value of 1.18×1038\approx 1.18 \times 10^{-38}.

A 64-bit double (double / f64) extends mantissa to 52 bits and exponent to 11 bits, yielding roughly 15-16 decimal digits spanning ±1.8×10308\pm 1.8 \times 10^{308}.

Half precision (f16, used in HDR textures and tensor cores) has 10 mantissa bits and 5 exponent bits, yielding 3\approx 3 decimal digits and a max value of 65504\approx 65504.

2.2 Why This Matters: The Open World Origin Problem

A position vector stored in f32 has different precision at different magnitudes. The gap between consecutive representable floats near x=1|x| = 1 is 1.19×107\approx 1.19 \times 10^{-7} (one ULP, unit in the last place). At x=104|x| = 10^4 (10 km from the world origin) it is 9.77×104\approx 9.77 \times 10^{-4}, almost a millimeter. At x=106|x| = 10^6 (1000 km) it is 0.12\approx 0.12 meters.

This is why open-world games like Star Citizen and No Man's Sky implement camera-relative rendering (also called origin shifting or floating origin): they translate the world so that the camera is always at 0\mathbf{0}, performing the subtraction in f64 on the CPU before feeding f32 positions to the GPU. The math:

prel=pworldf64ccameraf64,cast to f32\mathbf{p}_{rel} = \mathbf{p}_{world}^{f64} - \mathbf{c}_{camera}^{f64}, \quad \text{cast to } f32

without this trick, vertices wobble visibly past a few kilometers from origin.

2.3 The Cancellation Trap

Subtraction of nearly equal floats catastrophically loses precision:

1.0000001f321.0f32=1.19×1071.0000001_{f32} - 1.0_{f32} = 1.19 \times 10^{-7}

with only 1 significant bit remaining. This drives a number of graphics-specific reformulations:

  • Quadratic formula: the textbook x=b±b24ac2ax = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} catastrophically cancels when b24acb^2 \gg 4ac. The numerically stable form selects the sign:

x1=bsgn(b)b24ac2a,x2=cax1x_1 = \frac{-b - \text{sgn}(b)\sqrt{b^2 - 4ac}}{2a}, \quad x_2 = \frac{c}{a x_1}

This appears in every ray-sphere intersection routine.

  • Depth comparison: comparing two f32 depths is unreliable when both are near 1.0. Reverse-Z (storing depth as 1z1 - z) shifts precision back to the far plane where it is needed; it is now standard in AAA engines [Reed 2015].

2.4 Fixed-Point and Integer Math

Not everything is a float. Texture coordinates in GPU samplers use 8.8 fixed-point for sub-texel addressing. Color buffers use 8-bit unsigned normalized (unorm8): the byte bb represents the value b/255b / 255. Mip-level computation uses fixed-point in hardware. The unorm decode is exactly:

xdecoded=b2n1,b[0,2n1]x_{decoded} = \frac{b}{2^n - 1}, \quad b \in [0, 2^n - 1]

Note the divisor is 255255, not 256256, a subtlety that catches students implementing custom decoders.

2.5 Practical Rules

  • Use f32 for vertex positions, UVs, normals; f64 only on the CPU for world-scale arithmetic.
  • Never compare floats with ==. Use |a - b| < ε with ε\varepsilon chosen relative to the magnitude (relative tolerance) or as a fixed ulp count.
  • Beware NaN propagation: a single uninitialized normal turns the entire shader output to black. Initialize.
  • Denormals (subnormals) are slow on many CPUs; flush-to-zero is set by default in shader compilers and recommended on the CPU side too.

3. Linear Algebra: The Lingua Franca of Graphics

Every graphical operation, rotating a model, projecting it onto a screen, lighting it, transforming a normal, blending two animation poses, is expressible as linear algebra over real-valued vectors. We cover the operations that appear most frequently in rendering code.

3.1 Vectors

A vector vRn\mathbf{v} \in \mathbb{R}^n is an ordered nn-tuple of real numbers. In graphics, n{2,3,4}n \in \{2, 3, 4\} almost without exception. We write column vectors:

v=(vxvyvz)\mathbf{v} = \begin{pmatrix} v_x \\ v_y \\ v_z \end{pmatrix}

Magnitude (length):

v=vx2+vy2+vz2\|\mathbf{v}\| = \sqrt{v_x^2 + v_y^2 + v_z^2}

Normalization (producing a unit vector v^\hat{\mathbf{v}}):

v^=vv\hat{\mathbf{v}} = \frac{\mathbf{v}}{\|\mathbf{v}\|}

If v=0\|\mathbf{v}\| = 0, normalization is undefined; production code checks for this with a small epsilon to avoid NaN.

3.2 The Dot Product

Given a,bR3\mathbf{a}, \mathbf{b} \in \mathbb{R}^3:

ab=axbx+ayby+azbz=abcosθ\mathbf{a} \cdot \mathbf{b} = a_x b_x + a_y b_y + a_z b_z = \|\mathbf{a}\| \|\mathbf{b}\| \cos\theta

The dot product is the most-used scalar in all of rendering. It answers four questions simultaneously:

QuestionFormulaUsed In
Are these vectors aligned?a^b^\hat{\mathbf{a}} \cdot \hat{\mathbf{b}}Lambertian shading, backface culling
What is the projection of a\mathbf{a} onto b\mathbf{b}?abb2b\frac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{b}\|^2} \mathbf{b}Reflection, shadow projection
What angle separates them?arccos(a^b^)\arccos(\hat{\mathbf{a}} \cdot \hat{\mathbf{b}})Cone tests, anisotropic lighting
Are they perpendicular?ab=0\mathbf{a} \cdot \mathbf{b} = 0Plane equations, basis construction

The single most common pattern in shading: max(NL,0)\max(\mathbf{N} \cdot \mathbf{L}, 0), the Lambertian cosine term, where N\mathbf{N} is the surface normal and L\mathbf{L} is the direction to the light. The clamp to zero discards back-facing light.

// GP-style scalar reference implementation
constexpr f32 dot(const Vec3f& a, const Vec3f& b) noexcept {
return a.x * b.x + a.y * b.y + a.z * b.z;
}

3.3 The Cross Product

For a,bR3\mathbf{a}, \mathbf{b} \in \mathbb{R}^3:

a×b=(aybzazbyazbxaxbzaxbyaybx)\mathbf{a} \times \mathbf{b} = \begin{pmatrix} a_y b_z - a_z b_y \\ a_z b_x - a_x b_z \\ a_x b_y - a_y b_x \end{pmatrix}

The result is a vector perpendicular to both inputs, with magnitude absinθ\|\mathbf{a}\| \|\mathbf{b}\| \sin\theta (the area of the parallelogram they span). The direction follows the right-hand rule.

Used for:

  • Computing face normals from triangle edges: N=(v1v0)×(v2v0)\mathbf{N} = (\mathbf{v}_1 - \mathbf{v}_0) \times (\mathbf{v}_2 - \mathbf{v}_0).
  • Building orthonormal bases: given a normal N\mathbf{N} and a tangent T\mathbf{T}, the bitangent is B=N×T\mathbf{B} = \mathbf{N} \times \mathbf{T}.
  • Triangle area: A=12(v1v0)×(v2v0)A = \frac{1}{2} \|(\mathbf{v}_1 - \mathbf{v}_0) \times (\mathbf{v}_2 - \mathbf{v}_0)\|.
  • Front/back face determination: the sign of (v1v0)×(v2v0)V(\mathbf{v}_1 - \mathbf{v}_0) \times (\mathbf{v}_2 - \mathbf{v}_0) \cdot \mathbf{V} where V\mathbf{V} is the view direction.

3.4 Vector Operations Summary

OperationFormulaResultCost
Adda+b\mathbf{a} + \mathbf{b}Vec33 add
Scalesas \mathbf{a}Vec33 mul
Dotab\mathbf{a} \cdot \mathbf{b}scalar3 mul + 2 add
Crossa×b\mathbf{a} \times \mathbf{b}Vec36 mul + 3 sub
Length squaredaa\mathbf{a} \cdot \mathbf{a}scalar3 mul + 2 add
Lengthaa\sqrt{\mathbf{a} \cdot \mathbf{a}}scalar+ 1 sqrt
Normalizea/a\mathbf{a} / \|\mathbf{a}\|Vec3+ 1 rsqrt + 3 mul
Linear interp(1t)a+tb(1-t)\mathbf{a} + t\mathbf{b}Vec36 mul + 3 add

Lerp is so central that it deserves its own line:

lerp(a,b,t)=a+t(ba)=(1t)a+tb\text{lerp}(\mathbf{a}, \mathbf{b}, t) = \mathbf{a} + t(\mathbf{b} - \mathbf{a}) = (1-t)\mathbf{a} + t\mathbf{b}

The first form (single tt multiply) is preferred when a\mathbf{a} should be exactly returned at t=0t=0; the second form is symmetric in a\mathbf{a} and b\mathbf{b} but multiplies twice. Both forms exist in production code; choose deliberately.

3.5 Matrices

A matrix MRm×nM \in \mathbb{R}^{m \times n} is a rectangular array. In graphics we use 2×22 \times 2, 3×33 \times 3, and 4×44 \times 4. The fundamental operation is matrix-vector multiplication:

(Mv)i=j=1nMijvj(M \mathbf{v})_i = \sum_{j=1}^{n} M_{ij} v_j

For a 3×33 \times 3 matrix:

Mv=(m00m01m02m10m11m12m20m21m22)(vxvyvz)=(m00vx+m01vy+m02vzm10vx+m11vy+m12vzm20vx+m21vy+m22vz)M \mathbf{v} = \begin{pmatrix} m_{00} & m_{01} & m_{02} \\ m_{10} & m_{11} & m_{12} \\ m_{20} & m_{21} & m_{22} \end{pmatrix} \begin{pmatrix} v_x \\ v_y \\ v_z \end{pmatrix} = \begin{pmatrix} m_{00} v_x + m_{01} v_y + m_{02} v_z \\ m_{10} v_x + m_{11} v_y + m_{12} v_z \\ m_{20} v_x + m_{21} v_y + m_{22} v_z \end{pmatrix}

Matrix-matrix multiplication (AB)ij=kAikBkj(AB)_{ij} = \sum_k A_{ik} B_{kj}. For 4×44 \times 4 matrices this is 64 multiplies and 48 adds, the unit operation of all transformation pipelines.

3.6 Memory Layout: Row-Major vs Column-Major

The same matrix can be stored two ways in memory:

Row-major (C, C++, DirectX convention):

[ m00 m01 m02 m03 | m10 m11 m12 m13 | m20 m21 m22 m23 | m30 m31 m32 m33 ]

Column-major (Fortran, OpenGL, Metal, GLSL convention):

[ m00 m10 m20 m30 | m01 m11 m21 m31 | m02 m12 m22 m32 | m03 m13 m23 m33 ]

The mathematics is unaffected; the memory layout is a storage decision. The peril is mismatch: passing a row-major matrix to a column-major shader transposes it implicitly, and a transposed transformation is its inverse for orthonormal matrices, exactly the wrong answer.

GP standardizes on column-major storage (matching the GLSL / SPIR-V default) and column vectors with pre-multiplication: v=Mv\mathbf{v}' = M\mathbf{v}. Composing transforms reads right-to-left:

vclip=MprojMviewMmodelvlocal\mathbf{v}_{clip} = M_{proj} \cdot M_{view} \cdot M_{model} \cdot \mathbf{v}_{local}

3.7 Special Matrices

Identity matrix II: ones on diagonal, zeros elsewhere. IM=MI=MIM = MI = M and Iv=vI\mathbf{v} = \mathbf{v}.

Transpose MTM^T: swap rows and columns: (MT)ij=Mji(M^T)_{ij} = M_{ji}. For a product: (AB)T=BTAT(AB)^T = B^T A^T.

Inverse M1M^{-1}: the matrix such that MM1=M1M=IMM^{-1} = M^{-1}M = I. Exists if and only if detM0\det M \neq 0. For a 2×22 \times 2 matrix (abcd)\begin{pmatrix}a & b \\ c & d\end{pmatrix}:

M1=1adbc(dbca)M^{-1} = \frac{1}{ad - bc}\begin{pmatrix}d & -b \\ -c & a\end{pmatrix}

For larger matrices, the closed-form via cofactor expansion is impractical; LU decomposition or specialized inverse formulas are used. Crucial graphics fact: for an orthonormal matrix (pure rotation), the inverse is the transpose:

R1=RTwhen RTR=IR^{-1} = R^T \quad \text{when } R^T R = I

This is why view matrices are cheap to invert: they are translation + rotation, and we invert each piece separately.

Determinant measures signed volume change. For a 3×33 \times 3 matrix:

detM=m00(m11m22m12m21)m01(m10m22m12m20)+m02(m10m21m11m20)\det M = m_{00}(m_{11}m_{22} - m_{12}m_{21}) - m_{01}(m_{10}m_{22} - m_{12}m_{20}) + m_{02}(m_{10}m_{21} - m_{11}m_{20})

A negative determinant indicates a handedness flip, and for normals this means winding order inversion. Engines use this to detect mirrored skeletons.

3.8 The Normal Matrix

A subtle but critical distinction: when transforming a vertex position by matrix MM, the corresponding normal is not transformed by MM but by:

Mnormal=(M1)TM_{normal} = (M^{-1})^T

This is because normals are covectors (one-forms): they transform inversely so that the dot-product with tangent vectors stays invariant. If MM contains only rotation and uniform scale, (M1)T=M(M^{-1})^T = M up to the scale factor and we can use MM directly. Under non-uniform scale, using MM for normals warps lighting visibly; the inverse-transpose is mandatory.

Reference: PBR Book §3.10 (Applying Transformations: Normals)

3.9 Useful Reading


4. Coordinate Systems and the Transformation Pipeline

4.1 The Cardinal Spaces

A vertex moves through five canonical coordinate spaces between the artist's modeling tool and the user's monitor:

LOCAL (object) ──[M_model]──> WORLD ──[M_view]──> VIEW (camera) ──[M_proj]──> CLIP ──[/w]──> NDC ──[viewport]──> SCREEN
SpaceOriginUsed For
Local / ObjectMesh authoring originStoring vertex data
WorldGame world originPhysics, gameplay queries
View / Camera / EyeThe cameraLighting (sometimes), screen-space effects
ClipOrigin shifted by projectionFrustum clipping (hardware)
NDC (Normalized Device Coordinates)Center of cube [1,1]2×[0,1][-1,1]^2 \times [0,1]Hardware-internal rasterization
Screen / WindowPixel gridFinal output

Each transition is a matrix multiply (or, in the clip→NDC step, a perspective divide: pNDC=pclip.xyz/pclip.w\mathbf{p}_{NDC} = \mathbf{p}_{clip}.xyz / \mathbf{p}_{clip}.w).

4.2 Handedness

A coordinate system is right-handed if the cross product x×y=z\mathbf{x} \times \mathbf{y} = \mathbf{z} when measured with the right hand. It is left-handed if measured with the left.

The major conventions in industry:

SystemForwardUpRightNotes
OpenGL / Vulkan worldZ-Z+Y+Y+X+XRight-handed
OpenGL clipZ-Z+Y+Y+X+XRight-handed, z[1,1]z \in [-1,1]
Vulkan clip+Z+ZY-Y+X+XRight-handed, z[0,1]z \in [0,1]
DirectX+Z+Z+Y+Y+X+XLeft-handed, z[0,1]z \in [0,1]
Unreal Engine+X+X+Z+Z+Y+YLeft-handed, zz-up
Unity+Z+Z+Y+Y+X+XLeft-handed
Blender / GP worldY-Y or +Y+Y+Z+Z+X+XRight-handed, zz-up

GP follows Blender / scientific convention: right-handed, zz-up world space. We acknowledge this is the minority among game engines; the choice is motivated by physical intuition (gravity along z-z) and interoperability with engineering and CAD pipelines.

4.3 The View Matrix

The view matrix transforms world-space coordinates into camera-space coordinates: it is the inverse of the camera's world transform.

Given a camera at world position c\mathbf{c}, looking at target t\mathbf{t}, with up reference uref\mathbf{u}_{ref} (typically world up):

f=tctc(forward)\mathbf{f} = \frac{\mathbf{t} - \mathbf{c}}{\|\mathbf{t} - \mathbf{c}\|} \quad (\text{forward})

r=f×ureff×uref(right)\mathbf{r} = \frac{\mathbf{f} \times \mathbf{u}_{ref}}{\|\mathbf{f} \times \mathbf{u}_{ref}\|} \quad (\text{right})

u=r×f(actual up)\mathbf{u} = \mathbf{r} \times \mathbf{f} \quad (\text{actual up})

For a right-handed system with camera looking down zcam-z_{cam}:

Mview=(rxryrzrcuxuyuzucfxfyfzfc0001)M_{view} = \begin{pmatrix} r_x & r_y & r_z & -\mathbf{r} \cdot \mathbf{c} \\ u_x & u_y & u_z & -\mathbf{u} \cdot \mathbf{c} \\ -f_x & -f_y & -f_z & \mathbf{f} \cdot \mathbf{c} \\ 0 & 0 & 0 & 1 \end{pmatrix}

The top-left 3×33 \times 3 is the rotation (an orthonormal basis); the top-right column is the camera's negated position projected onto each basis vector; the bottom row is homogeneous.

4.4 The Projection Matrix

There are two common projections.

Perspective projection (the eye sees a frustum):

Mprojpersp=(1atan(θ/2)00001tan(θ/2)0000ffnnffn0010)M_{proj}^{persp} = \begin{pmatrix} \frac{1}{a \tan(\theta/2)} & 0 & 0 & 0 \\ 0 & \frac{1}{\tan(\theta/2)} & 0 & 0 \\ 0 & 0 & \frac{f}{f-n} & -\frac{nf}{f-n} \\ 0 & 0 & 1 & 0 \end{pmatrix}

(left-handed, z[0,1]z \in [0, 1] convention; Vulkan / DirectX style)

where θ\theta is the vertical field of view, a=W/Ha = W/H is the aspect ratio, nn is the near plane and ff is the far plane.

The bottom row is (0,0,1,0)(0, 0, 1, 0), meaning wclip=zvieww_{clip} = z_{view}. The subsequent perspective divide pNDC=pclip/wclip\mathbf{p}_{NDC} = \mathbf{p}_{clip} / w_{clip} scales xx and yy inversely with depth, producing the foreshortening that defines a perspective image.

Orthographic projection (parallel rays, used for shadow maps and 2D):

Mprojortho=(2rl00r+lrl02tb0t+btb001fnnfn0001)M_{proj}^{ortho} = \begin{pmatrix} \frac{2}{r-l} & 0 & 0 & -\frac{r+l}{r-l} \\ 0 & \frac{2}{t-b} & 0 & -\frac{t+b}{t-b} \\ 0 & 0 & \frac{1}{f-n} & -\frac{n}{f-n} \\ 0 & 0 & 0 & 1 \end{pmatrix}

where l,r,t,b,n,fl, r, t, b, n, f are the extents of the view volume. The bottom row is (0,0,0,1)(0,0,0,1), so wclip=1w_{clip} = 1; no perspective divide effect.

4.5 Reverse-Z Depth

The standard z[0,1]z \in [0, 1] projection wastes precision: with f32 depth and far/near ratios common in games (1000:1), 99% of the depth buffer's distinguishable values fall within the first 10% of the view distance, leaving the far field with severe z-fighting.

Reverse-Z swaps the mapping: the near plane maps to z=1z = 1 and the far plane to z=0z = 0. This is a one-line change in the projection matrix and an inverted depth comparison (GREATER instead of LESS). The win comes from the multiplicative interaction of f32's exponent representation with the projection's hyperbolic depth, which now concentrates precision near the far plane where it is needed [Reed 2015]. GP enables Reverse-Z by default.

Reference: NVIDIA Developer Blog: Visualizing the Z-Buffer's Depth Distribution


5. Affine and Projective Transformations

5.1 The Homogeneous Coordinate Trick

A pure 3×33 \times 3 matrix can rotate and scale, but it cannot translate, M0=0M\mathbf{0} = \mathbf{0} for any linear MM. To unify translation with rotation/scale, we extend R3\mathbb{R}^3 to R4\mathbb{R}^4 with a fourth coordinate ww:

v(vx,vy,vz,1)T(point)\mathbf{v} \mapsto (v_x, v_y, v_z, 1)^T \quad \text{(point)} v(vx,vy,vz,0)T(direction)\mathbf{v} \mapsto (v_x, v_y, v_z, 0)^T \quad \text{(direction)}

A direction has w=0w = 0 so translation doesn't affect it; a point has w=1w = 1 so translation does. The translation matrix is:

T(t)=(100tx010ty001tz0001)T(\mathbf{t}) = \begin{pmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix}

Multiplying gives T(t)p=(p+t,1)T(\mathbf{t}) \mathbf{p} = (\mathbf{p} + \mathbf{t}, 1) for points, and T(t)d=(d,0)T(\mathbf{t}) \mathbf{d} = (\mathbf{d}, 0) for directions, exactly the desired behavior.

5.2 The Five Affine Building Blocks

Every affine transformation is a composition of these:

Translation T(t)T(\mathbf{t}):

(100tx010ty001tz0001)\begin{pmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix}

Uniform scale S(s)S(s) and non-uniform scale S(sx,sy,sz)S(s_x, s_y, s_z):

S=(sx0000sy0000sz00001)S = \begin{pmatrix} s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & s_z & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}

Rotation about axis a\mathbf{a} by angle θ\theta (Rodrigues' formula, derived in §6).

Shear (rare in games, common in 2D UI):

Hxy=(1h00010000100001)H_{xy} = \begin{pmatrix} 1 & h & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}

Reflection about a plane (a sign flip in scale combined with a rotation).

5.3 TRS Composition

The conventional model matrix is the composition M=TRSM = T \cdot R \cdot S:

Mmodel=T(t)R(q)S(s)M_{model} = T(\mathbf{t}) \cdot R(\mathbf{q}) \cdot S(\mathbf{s})

Reading right-to-left: first scale the local mesh, then rotate, then translate to its world position. The composed matrix has the structure:

Mmodel=(sxR00syR01szR02txsxR10syR11szR12tysxR20syR21szR22tz0001)M_{model} = \begin{pmatrix} s_x R_{00} & s_y R_{01} & s_z R_{02} & t_x \\ s_x R_{10} & s_y R_{11} & s_z R_{12} & t_y \\ s_x R_{20} & s_y R_{21} & s_z R_{22} & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix}

The decomposition (extracting T,R,ST, R, S from a given MM) is non-trivial in general, especially under non-uniform scale and shear. Engines that store transforms as (T,R,S)(T, R, S) tuples avoid this problem entirely.

5.4 Why GP Stores Transforms as TRS Tuples

A 4×4 matrix has 16 floats and embeds rotation, scale, and translation in a non-orthogonal way; reconstructing rotation from a scaled matrix is lossy under non-uniform scale. GP's Transform struct is:

struct alignas(16) Transform {
Vec3f position; // 12B
f32 _pad0; // 4B (alignment)
Quaternion rotation; // 16B (unit quaternion)
Vec3f scale; // 12B
f32 _pad1; // 4B
}; // 48B = 3 cache-line-friendly halves

This is 48 bytes versus 64 for a 4×4 matrix, decomposes cleanly, blends correctly (slerp on the quaternion, lerp on TT and SS), and allows the matrix to be derived only when needed. The matrix is then reconstructed:

Mat4f Transform::ToMatrix() const noexcept {
Mat3f R = rotation.ToMat3();
Mat4f M;
M.Col(0) = Vec4f(R.Col(0) * scale.x, 0.0f);
M.Col(1) = Vec4f(R.Col(1) * scale.y, 0.0f);
M.Col(2) = Vec4f(R.Col(2) * scale.z, 0.0f);
M.Col(3) = Vec4f(position, 1.0f);
return M;
}

This pattern is mirrored in Unreal's FTransform and Unity's Transform.


6. Rotations: From Euler Angles to Quaternions

Rotations deserve their own section because they are the single most error-prone topic in 3D graphics. A small rotation bug compounds across animation frames into visible jitter; a mistuned interpolation causes visible "snapping" at angle boundaries; a misapplied basis change inverts the world. Every senior engineer has a story about gimbal lock.

6.1 Euler Angles

Euler angles parameterize rotation by three sequential angles about three axes, typically named pitch, yaw, roll (or roll, pitch, yaw depending on the field). The rotation is the product of three elementary axis rotations:

RXYZ(α,β,γ)=RX(α)RY(β)RZ(γ)R_{XYZ}(\alpha, \beta, \gamma) = R_X(\alpha) \cdot R_Y(\beta) \cdot R_Z(\gamma)

Pros: human-readable, three intuitive numbers, used by artists and animators in tools.

Cons (numerous):

  • Gimbal lock: when two rotation axes align (e.g., β=±π/2\beta = \pm\pi/2), one degree of freedom is lost. The system becomes singular; inversion fails.
  • Order matters: RX(α)RY(β)RY(β)RX(α)R_X(\alpha)R_Y(\beta) \neq R_Y(\beta)R_X(\alpha) in general. The convention (XYZ, ZYX, ZYZ, ...) must be specified.
  • Interpolating Euler angles linearly produces non-uniform angular velocity and visible jitter at gimbal-lock boundaries.

GP exposes Euler angles only as a display representation; internally all rotations are quaternions.

6.2 Rotation Matrices

A 3×33 \times 3 rotation matrix RR satisfies RTR=IR^T R = I and detR=+1\det R = +1. The set of such matrices forms the Special Orthogonal group SO(3)SO(3).

Elementary rotations:

RX(θ)=(1000cosθsinθ0sinθcosθ)R_X(\theta) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{pmatrix}

RY(θ)=(cosθ0sinθ010sinθ0cosθ)R_Y(\theta) = \begin{pmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{pmatrix}

RZ(θ)=(cosθsinθ0sinθcosθ0001)R_Z(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}

Pros: composes naturally with other linear transforms, GPU hardware applies them in one matrix multiply.

Cons: 9 floats with 6 redundant constraints; numerical drift away from orthogonality after many compositions; no clean interpolation; expensive to blend.

6.3 Axis-Angle and Rodrigues' Formula

Any 3D rotation can be expressed as a rotation by angle θ\theta about a unit axis a^\hat{\mathbf{a}}. The corresponding matrix is given by Rodrigues' rotation formula:

R=I+(sinθ)[a^]×+(1cosθ)[a^]×2R = I + (\sin\theta) [\hat{\mathbf{a}}]_\times + (1 - \cos\theta) [\hat{\mathbf{a}}]_\times^2

where [a^]×[\hat{\mathbf{a}}]_\times is the cross-product matrix:

[a^]×=(0azayaz0axayax0)[\hat{\mathbf{a}}]_\times = \begin{pmatrix} 0 & -a_z & a_y \\ a_z & 0 & -a_x \\ -a_y & a_x & 0 \end{pmatrix}

Expanding:

R=(c+ax2(1c)axay(1c)azsaxaz(1c)+aysayax(1c)+azsc+ay2(1c)ayaz(1c)axsazax(1c)aysazay(1c)+axsc+az2(1c))R = \begin{pmatrix} c + a_x^2(1-c) & a_x a_y (1-c) - a_z s & a_x a_z (1-c) + a_y s \\ a_y a_x (1-c) + a_z s & c + a_y^2(1-c) & a_y a_z (1-c) - a_x s \\ a_z a_x (1-c) - a_y s & a_z a_y (1-c) + a_x s & c + a_z^2(1-c) \end{pmatrix}

with c=cosθc = \cos\theta, s=sinθs = \sin\theta. Axis-angle uses 4 floats; it is intuitive but does not compose by simple multiplication.

6.4 Quaternions

A quaternion is a four-component object q=(w,x,y,z)=w+xi+yj+zk\mathbf{q} = (w, x, y, z) = w + x\mathbf{i} + y\mathbf{j} + z\mathbf{k} with multiplication rules

i2=j2=k2=ijk=1\mathbf{i}^2 = \mathbf{j}^2 = \mathbf{k}^2 = \mathbf{ijk} = -1

discovered by Hamilton in 1843 and burned into Brougham Bridge in Dublin. The unit quaternions (q=1\|\mathbf{q}\|=1) form the group S3S^3, which is a double cover of SO(3)SO(3): every rotation corresponds to two quaternions, q\mathbf{q} and q-\mathbf{q}, representing the same orientation.

A unit quaternion encoding rotation by angle θ\theta about unit axis a^\hat{\mathbf{a}}:

q=(cosθ2,sinθ2a^)\mathbf{q} = \left(\cos\frac{\theta}{2}, \sin\frac{\theta}{2}\hat{\mathbf{a}}\right)

Note the half-angle: this is essential and often the source of bugs.

Quaternion product (Hamilton product):

q1q2=(w1w2v1v2w1v2+w2v1+v1×v2)\mathbf{q}_1 \mathbf{q}_2 = \begin{pmatrix} w_1 w_2 - \mathbf{v}_1 \cdot \mathbf{v}_2 \\ w_1 \mathbf{v}_2 + w_2 \mathbf{v}_1 + \mathbf{v}_1 \times \mathbf{v}_2 \end{pmatrix}

This composes rotations: q2q1\mathbf{q}_2 \mathbf{q}_1 rotates first by q1\mathbf{q}_1 then by q2\mathbf{q}_2. The product is non-commutative, mirroring the non-commutativity of 3D rotations.

Conjugate: q=(w,v)\mathbf{q}^* = (w, -\mathbf{v}). For a unit quaternion, q=q1\mathbf{q}^* = \mathbf{q}^{-1}.

Rotating a vector by a quaternion:

v=qvq\mathbf{v}' = \mathbf{q} \mathbf{v} \mathbf{q}^*

(treating v\mathbf{v} as a quaternion with w=0w = 0). Expanded for performance:

v=v+2qv×(qv×v+wv)\mathbf{v}' = \mathbf{v} + 2 \mathbf{q}_v \times (\mathbf{q}_v \times \mathbf{v} + w \mathbf{v})

This is the form used in shaders, 18 multiplies and 12 adds, faster than constructing a matrix when only one vector is to be rotated.

Quaternion-to-matrix (when many vectors are to be rotated, build the matrix once):

R(q)=(12y22z22xy2wz2xz+2wy2xy+2wz12x22z22yz2wx2xz2wy2yz+2wx12x22y2)R(\mathbf{q}) = \begin{pmatrix} 1 - 2y^2 - 2z^2 & 2xy - 2wz & 2xz + 2wy \\ 2xy + 2wz & 1 - 2x^2 - 2z^2 & 2yz - 2wx \\ 2xz - 2wy & 2yz + 2wx & 1 - 2x^2 - 2y^2 \end{pmatrix}

constexpr Mat3f Quaternion::ToMat3() const noexcept {
const f32 xx = x*x, yy = y*y, zz = z*z;
const f32 xy = x*y, xz = x*z, yz = y*z;
const f32 wx = w*x, wy = w*y, wz = w*z;
return Mat3f{
{ 1 - 2*(yy+zz), 2*(xy - wz), 2*(xz + wy) },
{ 2*(xy + wz), 1 - 2*(xx+zz), 2*(yz - wx) },
{ 2*(xz - wy), 2*(yz + wx), 1 - 2*(xx+yy) }
};
}

6.5 Spherical Linear Interpolation (Slerp)

Quaternion interpolation has its own technique because linear interpolation between two quaternions, followed by renormalization, produces non-uniform angular velocity. Slerp [Shoemake 1985] gives constant angular velocity:

slerp(q0,q1,t)=sin((1t)Ω)sinΩq0+sin(tΩ)sinΩq1\text{slerp}(\mathbf{q}_0, \mathbf{q}_1, t) = \frac{\sin((1-t)\Omega)}{\sin\Omega} \mathbf{q}_0 + \frac{\sin(t\Omega)}{\sin\Omega} \mathbf{q}_1

where Ω=arccos(q0q1)\Omega = \arccos(\mathbf{q}_0 \cdot \mathbf{q}_1) is the angle between them on the 4-sphere.

Two crucial details:

  1. If q0q1<0\mathbf{q}_0 \cdot \mathbf{q}_1 < 0, negate one of the quaternions: this picks the short way around the 4-sphere. Otherwise the rotation takes the long way (>180°), causing visible flips.
  2. When Ω\Omega is near zero, sinΩ0\sin\Omega \to 0 and the formula is numerically unstable. Fall back to linear interpolation (lerp + normalize, also called nlerp) for small Ω\Omega.
Quaternion Slerp(Quaternion q0, Quaternion q1, f32 t) noexcept {
f32 cos_omega = Dot(q0, q1);
if (cos_omega < 0.0f) { q1 = -q1; cos_omega = -cos_omega; }
if (cos_omega > 0.9995f) return Normalize(Lerp(q0, q1, t)); // Fallback
const f32 omega = std::acos(cos_omega);
const f32 sin_omega = std::sin(omega);
const f32 a = std::sin((1.0f - t) * omega) / sin_omega;
const f32 b = std::sin(t * omega) / sin_omega;
return q0 * a + q1 * b;
}

6.6 When to Use Which Representation

Use caseRecommended representation
Storage, animation keyframesQuaternion (16B, no drift)
Composition (chain of rotations)Quaternion (16 mul/12 add per compose)
Many vectors to rotateMatrix (cache-friendly batch op)
Single vector to rotateQuaternion sandwich form
Artist-facing UIEuler angles (display only)
Smooth interpolationSlerp on quaternions
Specifying rotations programmaticallyAxis-angle, then convert

Reference: Sommer et al., Why and How to Avoid the Flipped Quaternion Multiplication (arXiv:1801.07478) sorts out conventions across libraries.


7. Trigonometry and Analytic Geometry

7.1 The Unit Circle Identities

The graphics-relevant trig identities, memorized by every engineer:

sin2θ+cos2θ=1\sin^2\theta + \cos^2\theta = 1

sin(2θ)=2sinθcosθ,cos(2θ)=cos2θsin2θ\sin(2\theta) = 2\sin\theta\cos\theta, \quad \cos(2\theta) = \cos^2\theta - \sin^2\theta

sin(α±β)=sinαcosβ±cosαsinβ\sin(\alpha \pm \beta) = \sin\alpha\cos\beta \pm \cos\alpha\sin\beta

cos(α±β)=cosαcosβsinαsinβ\cos(\alpha \pm \beta) = \cos\alpha\cos\beta \mp \sin\alpha\sin\beta

The half-angle identities, central to quaternions:

sinθ2=±1cosθ2,cosθ2=±1+cosθ2\sin\frac{\theta}{2} = \pm\sqrt{\frac{1-\cos\theta}{2}}, \quad \cos\frac{\theta}{2} = \pm\sqrt{\frac{1+\cos\theta}{2}}

The dot product as cosine:

a^b^=cosθab\hat{\mathbf{a}} \cdot \hat{\mathbf{b}} = \cos\theta_{ab}

This is the workhorse of every shading model in graphics.

7.2 The Law of Cosines

For a triangle with sides a,b,ca, b, c and angle γ\gamma opposite cc:

c2=a2+b22abcosγc^2 = a^2 + b^2 - 2ab\cos\gamma

A surprisingly common appearance: deriving the half-vector formulation in microfacet BRDFs. The angle between view and light directions ωv\omega_v and ωl\omega_l relates to the half-vector h=(ωv+ωl)/ωv+ωl\mathbf{h} = (\omega_v + \omega_l)/\|\omega_v + \omega_l\| via:

cosθh=1+ωvωl2(1+ωvωl)\cos\theta_h = \frac{1 + \omega_v \cdot \omega_l}{\sqrt{2(1 + \omega_v \cdot \omega_l)}}

7.3 Atan2: The Right Way to Find an Angle

atan2(y, x) returns the angle in (π,π](-\pi, \pi] of the point (x,y)(x, y) from the positive x-axis. Crucially, it handles all four quadrants correctly, unlike the textbook arctan(y/x)\arctan(y/x). Use it for:

  • Spherical coordinates: ϕ=atan2(y,x)\phi = \text{atan2}(y, x)
  • 2D angle between vectors: θ=atan2(a×b,ab)\theta = \text{atan2}(\mathbf{a} \times \mathbf{b}, \mathbf{a} \cdot \mathbf{b}) (sign correctness in 2D)
  • Direction-to-cubemap-face determination

7.4 Lines, Planes, and Rays

A ray is parameterized as p(t)=o+td\mathbf{p}(t) = \mathbf{o} + t\mathbf{d} with t0t \geq 0 and d\mathbf{d} unit-length.

A plane is defined by a unit normal n^\hat{\mathbf{n}} and offset dd:

n^p=d\hat{\mathbf{n}} \cdot \mathbf{p} = d

Points satisfying this lie on the plane. The signed distance from any point p0\mathbf{p}_0 to the plane is:

dist(p0)=n^p0d\text{dist}(\mathbf{p}_0) = \hat{\mathbf{n}} \cdot \mathbf{p}_0 - d

A positive value places p0\mathbf{p}_0 on the side n^\hat{\mathbf{n}} points to. This single formula powers frustum culling: a sphere (c,r)(c, r) is fully outside a plane if n^cd<r\hat{\mathbf{n}} \cdot \mathbf{c} - d < -r.

7.5 Ray-Plane Intersection

Substituting p(t)\mathbf{p}(t):

n^(o+td)=dt=dn^on^d\hat{\mathbf{n}} \cdot (\mathbf{o} + t\mathbf{d}) = d \quad \Longrightarrow \quad t = \frac{d - \hat{\mathbf{n}} \cdot \mathbf{o}}{\hat{\mathbf{n}} \cdot \mathbf{d}}

If n^d=0\hat{\mathbf{n}} \cdot \mathbf{d} = 0, the ray is parallel to the plane (no intersection or infinitely many). If t<0t < 0, the plane is behind the ray origin.

7.6 Ray-Sphere Intersection

For sphere (c,r)(\mathbf{c}, r):

o+tdc2=r2\|\mathbf{o} + t\mathbf{d} - \mathbf{c}\|^2 = r^2

Expanding and letting m=oc\mathbf{m} = \mathbf{o} - \mathbf{c}:

t2+2(md)t+(mmr2)=0t^2 + 2(\mathbf{m} \cdot \mathbf{d}) t + (\mathbf{m} \cdot \mathbf{m} - r^2) = 0

(using dd=1\mathbf{d} \cdot \mathbf{d} = 1 for a unit ray direction). This is a quadratic in tt with discriminant:

Δ=(md)2(mmr2)\Delta = (\mathbf{m} \cdot \mathbf{d})^2 - (\mathbf{m} \cdot \mathbf{m} - r^2)

If Δ<0\Delta < 0: no intersection. If Δ0\Delta \geq 0: two solutions t±=(md)±Δt_\pm = -(\mathbf{m} \cdot \mathbf{d}) \pm \sqrt{\Delta}. Take the smaller positive one for the closest intersection.

7.7 Ray-Triangle Intersection (Möller-Trumbore)

The standard, branchless ray-triangle test [Möller and Trumbore 1997]:

// Returns t (distance along ray) and barycentric (u, v) on hit, or no-hit.
struct HitInfo { f32 t, u, v; bool hit; };

HitInfo IntersectRayTriangle(Vec3f o, Vec3f d,
Vec3f v0, Vec3f v1, Vec3f v2) noexcept {
constexpr f32 EPSILON = 1e-7f;
const Vec3f e1 = v1 - v0;
const Vec3f e2 = v2 - v0;
const Vec3f h = Cross(d, e2);
const f32 a = Dot(e1, h);
if (std::abs(a) < EPSILON) return { 0,0,0,false }; // parallel

const f32 f = 1.0f / a;
const Vec3f s = o - v0;
const f32 u = f * Dot(s, h);
if (u < 0.0f || u > 1.0f) return { 0,0,0,false };

const Vec3f q = Cross(s, e1);
const f32 v = f * Dot(d, q);
if (v < 0.0f || u + v > 1.0f) return { 0,0,0,false };

const f32 t = f * Dot(e2, q);
return { t, u, v, t > EPSILON };
}

This is the inner loop of every CPU ray tracer and the backbone of GPU bounding-volume hierarchies.

7.8 Ray-AABB Intersection (Slab Method)

For an axis-aligned bounding box with min and max corners, treat each pair of opposing faces as a "slab" and intersect the ray with all three slabs:

bool IntersectRayAABB(Vec3f o, Vec3f inv_d,
Vec3f bmin, Vec3f bmax,
f32& t_min_out, f32& t_max_out) noexcept {
const Vec3f t1 = (bmin - o) * inv_d;
const Vec3f t2 = (bmax - o) * inv_d;
const Vec3f tmin = Min(t1, t2);
const Vec3f tmax = Max(t1, t2);
const f32 tn = std::max({ tmin.x, tmin.y, tmin.z, 0.0f });
const f32 tf = std::min({ tmax.x, tmax.y, tmax.z, FLT_MAX });
t_min_out = tn; t_max_out = tf;
return tn <= tf;
}

The trick of pre-computing inv_d = 1.0f / d (with explicit handling of di=0d_i = 0 via ±\pm \infty encoding) eliminates branches in the inner loop. This is the standard BVH traversal primitive.

7.9 Barycentric Coordinates

A point inside triangle (v0,v1,v2)(v_0, v_1, v_2) can be written as:

p=uv0+vv1+wv2,u+v+w=1,u,v,w0\mathbf{p} = u v_0 + v v_1 + w v_2, \quad u + v + w = 1, \quad u, v, w \geq 0

The triple (u,v,w)(u, v, w) is the barycentric coordinate of p\mathbf{p}. They are used for:

  • Vertex attribute interpolation in fragment shaders (a hardware feature).
  • Triangle membership tests: a point is inside iff all three are non-negative.
  • Ray-triangle intersection results (Möller-Trumbore returns (u,v)(u, v) above; w=1uvw = 1 - u - v).
  • Texture sampling for procedural meshes without UV authoring.

The areas are proportional to the coordinates: u=A(v1v2p)/A(v0v1v2)u = A(v_1 v_2 \mathbf{p}) / A(v_0 v_1 v_2), etc.


8. Calculus for Graphics Engineers

A working knowledge of single-variable and multivariable calculus is required for shader writing, Monte Carlo techniques, and physically based rendering. We cover the essentials.

8.1 Derivatives

The derivative dfdx\frac{df}{dx} of a function ff is the rate of change of ff with respect to xx. In graphics it appears most often as:

  • Texture LOD selection: the GPU computes ux,uy,vx,vy\frac{\partial u}{\partial x}, \frac{\partial u}{\partial y}, \frac{\partial v}{\partial x}, \frac{\partial v}{\partial y} (texture coordinate derivatives w.r.t. screen-space pixels) and uses them to select the appropriate mipmap level. Available in shaders as dFdx, dFdy (GLSL) or ddx, ddy (HLSL).
  • Normal mapping: the tangent-space basis (T,B,N)(\mathbf{T}, \mathbf{B}, \mathbf{N}) is derived from texture-space partial derivatives.
  • Ambient occlusion and SSR: these effects sample neighborhoods whose extent is proportional to derivatives of depth.

8.2 The Gradient

For a scalar field f(x)f(\mathbf{x}) in 3D, the gradient is the vector of partial derivatives:

f=(fx,fy,fz)T\nabla f = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}\right)^T

The gradient points in the direction of steepest increase, with magnitude equal to the maximum rate of change.

In rendering, f\nabla f appears in:

  • Signed distance fields: SDF(x)\nabla \text{SDF}(\mathbf{x}) is the surface normal (when on the surface).
  • Heightmap-derived normals: N=normalize(h/x,h/y,1)\mathbf{N} = \text{normalize}(-\partial h/\partial x, -\partial h/\partial y, 1).
  • Volumetric rendering: density gradients drive single-scattering shading.

8.3 The Jacobian

For a vector function f:RnRm\mathbf{f}: \mathbb{R}^n \to \mathbb{R}^m, the Jacobian is the m×nm \times n matrix of partial derivatives:

Jij=fixjJ_{ij} = \frac{\partial f_i}{\partial x_j}

The Jacobian determinant detJ|\det J| is the local volume scaling factor of the transformation. It is essential in:

  • Importance sampling: when changing variables from x\mathbf{x} to y=f(x)\mathbf{y} = \mathbf{f}(\mathbf{x}), the PDF transforms as pY(y)=pX(x)/detJp_Y(\mathbf{y}) = p_X(\mathbf{x}) / |\det J|.
  • Texture warping: anisotropic texture filtering analyzes the projected Jacobian onto the texture plane.

8.4 Integrals and the Fundamental Theorem

The definite integral abf(x)dx\int_a^b f(x) dx measures the signed area under ff. The fundamental theorem of calculus states:

abf(x)dx=F(b)F(a)\int_a^b f(x) dx = F(b) - F(a)

where FF is an antiderivative of ff. In graphics, closed-form antiderivatives are rare; we approximate integrals numerically (§9).

The integrals that matter in rendering are over solid angles (§10):

Lo(x,ωo)=H2fr(x,ωiωo)Li(x,ωi)(ωin)dωiL_o(\mathbf{x}, \omega_o) = \int_{H^2} f_r(\mathbf{x}, \omega_i \to \omega_o) L_i(\mathbf{x}, \omega_i) (\omega_i \cdot \mathbf{n}) d\omega_i

where H2H^2 is the hemisphere above the surface point x\mathbf{x}.

8.5 Solid Angles

A solid angle dωd\omega is the 3D analog of a 2D angle, measured in steradians (sr). The full sphere subtends 4π4\pi sr; a hemisphere 2π2\pi. In spherical coordinates:

dω=sinθdθdϕd\omega = \sin\theta \, d\theta \, d\phi

A small surface patch dAdA at distance rr from a point, with normal making angle θ\theta' with the line of sight, subtends:

dω=dAcosθr2d\omega = \frac{dA \cos\theta'}{r^2}

This 1/r21/r^2 falloff is the geometric origin of inverse-square light attenuation.

8.6 Taylor Series

A function can be approximated locally by its Taylor expansion:

f(x0+h)=f(x0)+f(x0)h+f(x0)2h2+O(h3)f(x_0 + h) = f(x_0) + f'(x_0) h + \frac{f''(x_0)}{2}h^2 + O(h^3)

In graphics:

  • Schlick's Fresnel approximation is a low-order expansion of the full Fresnel equations:

FSchlick(θ)=F0+(1F0)(1cosθ)5F_\text{Schlick}(\theta) = F_0 + (1 - F_0)(1 - \cos\theta)^5

This single expression replaces the costly exact Fresnel and is accurate to within a few percent for most materials [Schlick 1994].

  • Tone mapping operators like x / (x + 1) are approximations of the human visual system's response curve.
  • Trigonometric polynomial approximations in shaders avoid the slow sin/cos instructions on older hardware.
  • Spivak, Calculus, 4th ed. The undergraduate gold standard.
  • Pharr, Jakob, Humphreys, Physically Based Rendering: From Theory to Implementation, 4th ed. Calculus applied to rendering. (pbr-book.org)

9. Probability, Statistics, and Monte Carlo Methods

Modern photorealistic rendering is statistical: we estimate the value of an integral over light paths by random sampling. This section gives the foundation.

9.1 Random Variables and Distributions

A random variable XX takes values from a distribution. Its probability density function (PDF) pX(x)p_X(x) satisfies:

pX(x)dx=1,pX(x)0\int p_X(x) dx = 1, \quad p_X(x) \geq 0

The cumulative distribution function (CDF) PX(x)=xpX(t)dtP_X(x) = \int_{-\infty}^x p_X(t) dt runs from 0 to 1.

The expectation of a function of XX:

E[g(X)]=g(x)pX(x)dxE[g(X)] = \int g(x) p_X(x) dx

The variance:

Var[X]=E[X2]E[X]2\text{Var}[X] = E[X^2] - E[X]^2

9.2 Common Distributions in Graphics

  • Uniform U[a,b]U[a, b]: p(x)=1/(ba)p(x) = 1/(b-a) on the interval. Generated by RNG.
  • Cosine-weighted hemisphere: p(ω)=cosθ/πp(\omega) = \cos\theta / \pi. Importance sampling for diffuse BRDFs.
  • GGX (Trowbridge-Reitz): the dominant microfacet distribution in modern PBR. Used to importance-sample specular BRDFs.
  • Multivariate normal: gradient noise, terrain generation.

9.3 Monte Carlo Integration

To estimate I=Ωf(x)dxI = \int_\Omega f(x) dx, draw samples XipX_i \sim p and compute:

I^N=1Ni=1Nf(Xi)p(Xi)\hat{I}_N = \frac{1}{N} \sum_{i=1}^{N} \frac{f(X_i)}{p(X_i)}

This is unbiased: E[I^N]=IE[\hat{I}_N] = I. The variance scales as:

Var[I^N]=σ2N\text{Var}[\hat{I}_N] = \frac{\sigma^2}{N}

The standard error therefore decreases as O(1/N)O(1/\sqrt{N}), which is the curse of Monte Carlo: doubling sample count halves the noise's magnitude only by a factor of 21.41\sqrt{2} \approx 1.41. To halve noise visibly, 4×4\times the samples.

9.4 Importance Sampling

The art of Monte Carlo is choosing pp so that f(x)/p(x)f(x)/p(x) has low variance. The ideal is p(x)f(x)p(x) \propto f(x), which gives zero variance (and is the integral itself, hence unrealizable). In practice we sample proportional to a major factor of ff.

Example: in path tracing, the integrand is frLicosθf_r \cdot L_i \cdot \cos\theta. We typically sample one of:

  • cosθ\cos\theta-weighted hemisphere (good for diffuse)
  • BRDF-weighted (good for specular)
  • Light-source-weighted (good for direct illumination)

Multiple Importance Sampling (MIS) [Veach 1995] combines several strategies optimally, weighting each sample by:

wi(x)=nipi(x)βjnjpj(x)βw_i(x) = \frac{n_i p_i(x)^\beta}{\sum_j n_j p_j(x)^\beta}

with β=2\beta = 2 (the "power heuristic") performing best in practice.

9.5 Quasi-Monte Carlo and Low-Discrepancy Sequences

Pure pseudo-random sampling clusters; O(1/N)O(1/\sqrt{N}) is wasteful. Low-discrepancy sequences (Halton, Sobol, Owen-scrambled Sobol) cover the sample space more uniformly and achieve O(logdN/N)O(\log^d N / N) convergence in dd dimensions.

The Halton sequence with bases b1=2,b2=3b_1 = 2, b_2 = 3 for 2D:

xn=ϕ2(n),yn=ϕ3(n)x_n = \phi_2(n), \quad y_n = \phi_3(n)

where ϕb(n)\phi_b(n) is the radical inverse:

ϕb(n)=k=0ak(n)bk1\phi_b(n) = \sum_{k=0}^{\infty} a_k(n) b^{-k-1}

with aka_k the digits of nn in base bb. Used in TAA jitter sequences in nearly every modern renderer.

constexpr f32 RadicalInverse(u32 base, u32 i) noexcept {
f32 result = 0.0f;
f32 inv_base = 1.0f / base;
f32 inv_base_n = inv_base;
while (i > 0) {
result += (i % base) * inv_base_n;
i /= base;
inv_base_n *= inv_base;
}
return result;
}

9.6 Reservoir Sampling

For real-time applications where samples cannot be stored, reservoir sampling maintains a single sample summarizing a stream of weighted candidates. This underpins ReSTIR [Bitterli et al. 2020], the dominant real-time path tracing technique:

Reservoir<S>: maintains one sample s with weight W
Update(s_new, w_new):
W += w_new
if rand() < w_new / W: s = s_new

After processing NN candidates, the reservoir holds a sample distributed proportionally to its weight, with constant memory. ReSTIR adds spatial and temporal resampling of reservoirs to amortize the cost of light selection across pixels and frames.

Reference: Bitterli et al., Spatiotemporal reservoir resampling for real-time ray tracing with dynamic direct lighting, SIGGRAPH 2020.


10. The Rendering Equation and Light Transport

10.1 The Equation

Kajiya's rendering equation [Kajiya 1986] is the master equation of physically based rendering. It states that the outgoing radiance LoL_o at point x\mathbf{x} in direction ωo\omega_o equals the emitted radiance LeL_e plus the integral of incoming radiance reflected through the BRDF:

Lo(x,ωo)=Le(x,ωo)+H2fr(x,ωi,ωo)Li(x,ωi)(ωin)dωiL_o(\mathbf{x}, \omega_o) = L_e(\mathbf{x}, \omega_o) + \int_{H^2} f_r(\mathbf{x}, \omega_i, \omega_o) \, L_i(\mathbf{x}, \omega_i) \, (\omega_i \cdot \mathbf{n}) \, d\omega_i

where:

  • Lo(x,ωo)L_o(\mathbf{x}, \omega_o): outgoing radiance, the quantity we want
  • Le(x,ωo)L_e(\mathbf{x}, \omega_o): emitted radiance (zero for non-emitters)
  • frf_r: the BRDF, units sr1\text{sr}^{-1}
  • Li(x,ωi)L_i(\mathbf{x}, \omega_i): incoming radiance from direction ωi\omega_i
  • (ωin)(\omega_i \cdot \mathbf{n}): the cosine projection, accounting for foreshortening
  • H2H^2: the hemisphere above the surface

This is a Fredholm integral equation of the second kind, LiL_i at one point depends on LoL_o from elsewhere, hence the iterative/Monte Carlo approach.

10.2 The BRDF

The Bidirectional Reflectance Distribution Function (BRDF) fr(ωi,ωo)f_r(\omega_i, \omega_o) describes how a surface reflects light. It must satisfy:

  • Non-negativity: fr0f_r \geq 0
  • Helmholtz reciprocity: fr(ωi,ωo)=fr(ωo,ωi)f_r(\omega_i, \omega_o) = f_r(\omega_o, \omega_i)
  • Energy conservation: H2frcosθidωi1\int_{H^2} f_r \cos\theta_i \, d\omega_i \leq 1 for all ωo\omega_o

The simplest physically valid BRDF is Lambertian:

fr,Lambert=ρπf_{r,\text{Lambert}} = \frac{\rho}{\pi}

with albedo ρ[0,1]3\rho \in [0, 1]^3. The 1/π1/\pi normalizes so that energy is conserved when integrated over the hemisphere.

10.3 Microfacet Models (Cook-Torrance)

Physically based specular BRDFs use the microfacet model [Cook and Torrance 1982]:

fr,spec(ωi,ωo)=D(h)F(ωo,h)G(ωi,ωo,h)4(ωin)(ωon)f_{r,\text{spec}}(\omega_i, \omega_o) = \frac{D(\mathbf{h}) \, F(\omega_o, \mathbf{h}) \, G(\omega_i, \omega_o, \mathbf{h})}{4 (\omega_i \cdot \mathbf{n})(\omega_o \cdot \mathbf{n})}

with:

  • h=(ωi+ωo)/ωi+ωo\mathbf{h} = (\omega_i + \omega_o) / \|\omega_i + \omega_o\|: the half-vector
  • DD: Normal Distribution Function, the proportion of microfacets oriented to reflect from ωi\omega_i to ωo\omega_o
  • FF: Fresnel term, the fraction reflected (vs. transmitted/absorbed)
  • GG: geometric self-shadowing/masking

The GGX (Trowbridge-Reitz) NDF, the modern industry standard:

DGGX(h)=α2π((nh)2(α21)+1)2D_{GGX}(\mathbf{h}) = \frac{\alpha^2}{\pi ((\mathbf{n} \cdot \mathbf{h})^2 (\alpha^2 - 1) + 1)^2}

with roughness α=roughness2\alpha = \text{roughness}^2 (a perceptual remap recommended by [Burley 2012]).

Schlick's Fresnel approximation:

FSchlick(ωo,h)=F0+(1F0)(1ωoh)5F_\text{Schlick}(\omega_o, \mathbf{h}) = F_0 + (1 - F_0)(1 - \omega_o \cdot \mathbf{h})^5

with F0F_0 the reflectance at normal incidence (4% for dielectrics, RGB-tinted for metals).

Smith's geometric term (height-correlated form, used in UE5 and Frostbite):

G2=11+Λ(ωi)+Λ(ωo),Λ=1+α2tan2θ12G_2 = \frac{1}{1 + \Lambda(\omega_i) + \Lambda(\omega_o)}, \quad \Lambda = \frac{\sqrt{1 + \alpha^2 \tan^2\theta} - 1}{2}

The full Cook-Torrance microfacet specular, plus a Lambertian diffuse modulated by 1F1 - F for energy conservation, is the Disney/Burley BRDF that dominates AAA pipelines [Burley 2012, Karis 2013].

References:

10.4 Image-Based Lighting via Spherical Harmonics

For diffuse environment lighting, integrating the irradiance over the hemisphere is expensive per pixel. Spherical harmonics project directional functions onto an orthonormal basis on the sphere:

f(θ,ϕ)=l=0m=llclmYlm(θ,ϕ)f(\theta, \phi) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} c_l^m Y_l^m(\theta, \phi)

For diffuse irradiance, the first three bands (9 SH coefficients) capture nearly all the energy [Ramamoorthi and Hanrahan 2001]. Storage: 9 RGB floats per probe; evaluation: 9 dot products in the shader.

Reference: Ramamoorthi & Hanrahan, An Efficient Representation for Irradiance Environment Maps, SIGGRAPH 2001.


11. Color Theory and Color Spaces

Color is not a number; it is a perceptual response to a power spectrum, mediated by three retinal cone types. The mapping from spectrum to perceived color, and from perceived color to display, is a deeper topic than most engineers realize.

11.1 The CIE 1931 Color Matching Functions

Human color vision is well-modeled by three weighting functions xˉ(λ),yˉ(λ),zˉ(λ)\bar{x}(\lambda), \bar{y}(\lambda), \bar{z}(\lambda), the CIE 1931 Standard Observer. A spectrum S(λ)S(\lambda) produces tristimulus values:

X=S(λ)xˉ(λ)dλ,Y=S(λ)yˉ(λ)dλ,Z=S(λ)zˉ(λ)dλX = \int S(\lambda)\bar{x}(\lambda) d\lambda, \quad Y = \int S(\lambda)\bar{y}(\lambda) d\lambda, \quad Z = \int S(\lambda)\bar{z}(\lambda) d\lambda

YY is luminance, the perceptual brightness. X,ZX, Z encode chromaticity.

11.2 Linear vs Gamma-Encoded Color

The single most common bug in beginner shaders: confusing linear-light values (where addition and lerp are physically meaningful) with gamma-encoded sRGB values (perceptually-uniform encoding for storage).

The sRGB encoding (approximately gamma 2.2):

sRGB(L)={12.92LL0.00313081.055L1/2.40.055L>0.0031308\text{sRGB}(L) = \begin{cases} 12.92 L & L \leq 0.0031308 \\ 1.055 L^{1/2.4} - 0.055 & L > 0.0031308 \end{cases}

with the inverse:

L(sRGB)={sRGB/12.92sRGB0.04045(sRGB+0.0551.055)2.4sRGB>0.04045L(\text{sRGB}) = \begin{cases} \text{sRGB} / 12.92 & \text{sRGB} \leq 0.04045 \\ \left(\frac{\text{sRGB} + 0.055}{1.055}\right)^{2.4} & \text{sRGB} > 0.04045 \end{cases}

Rule: do all lighting math in linear space; encode to sRGB only when writing to the framebuffer (or use hardware sRGB texture views, which do this automatically).

11.3 HDR and Tone Mapping

Physical light intensities span many orders of magnitude (sun at 109\sim 10^9 cd/m², candle at 10\sim 10 cd/m²). Display devices have limited dynamic range. Tone mapping compresses HDR scene radiance into the displayable [0,1][0, 1] range.

The simplest, Reinhard:

Ldisplay=L1+LL_{display} = \frac{L}{1 + L}

Modern AAA pipelines use ACES (Academy Color Encoding System) which provides cinematic film-like response with controlled saturation and toe/shoulder behavior. The full ACES Reference Rendering Transform is a complex matrix and curve sequence; approximations like ACES Filmic [Hill 2017] are typically inlined into the shader:

vec3 ACESFilmic(vec3 x) {
const float a = 2.51;
const float b = 0.03;
const float c = 2.43;
const float d = 0.59;
const float e = 0.14;
return clamp((x * (a*x + b)) / (x * (c*x + d) + e), 0.0, 1.0);
}

References:

11.4 Wide Gamut, HDR Displays, and Rec.2020

Modern HDR-capable monitors and HDR10/Dolby Vision broadcasts use the Rec.2020 color space, with a wider gamut than sRGB. Engines targeting HDR output must:

  1. Render in linear, scene-referred (HDR) space.
  2. Apply tone mapping that preserves HDR headroom.
  3. Encode in PQ (Perceptual Quantizer, SMPTE ST 2084) for HDR10, or HLG for broadcast.

The math of these transforms is documented in the BT.2100 specification.


12. Signal Processing and Anti-Aliasing

Rendering is sampling: we evaluate a continuous image function at discrete pixel locations. Sampling theory dictates what we can faithfully reconstruct and what produces aliasing artifacts (jaggies, moire, shimmering).

12.1 The Nyquist-Shannon Sampling Theorem

A signal containing frequencies up to fmaxf_{max} can be perfectly reconstructed from samples spaced at intervals Δx1/(2fmax)\Delta x \leq 1/(2 f_{max}). The threshold fN=1/(2Δx)f_N = 1/(2\Delta x) is the Nyquist frequency.

Frequencies above fNf_N in the original signal alias to lower apparent frequencies in the sampled signal. In rendering this manifests as:

  • Jaggies along high-contrast edges (the geometric edge has infinite frequency)
  • Moire patterns in textures with fine repeating detail viewed at oblique angles
  • Shimmering on specular highlights when the camera moves

12.2 Pre-Filtering: The Only True Solution

The mathematically correct fix is to band-limit the signal before sampling, applying a low-pass filter that removes frequencies above fNf_N. In practice:

  • Texture mipmapping: precompute progressively-filtered versions of textures; the GPU selects the appropriate one based on screen-space derivatives. The mipmap chain is the practical realization of pre-filtering for textures.
  • Multi-Sample Anti-Aliasing (MSAA): take multiple sub-pixel samples per pixel and average. Reduces geometric aliasing; does not help with shading aliasing.
  • Super-Sampling Anti-Aliasing (SSAA): render at higher resolution and downsample. Brute-force but correct for everything.
  • Temporal Anti-Aliasing (TAA): use sub-pixel jitter across frames + temporal accumulation with motion-vector-based reprojection. The current AAA standard.

12.3 Mipmap Generation

A mipmap level LL is half the resolution of level L1L-1 in each axis. Level LL is computed by box-filtering (or better: tent, Lanczos, or Kaiser) the source level:

ML[x,y]=14(ML1[2x,2y]+ML1[2x+1,2y]+ML1[2x,2y+1]+ML1[2x+1,2y+1])M_L[x, y] = \frac{1}{4}\left( M_{L-1}[2x, 2y] + M_{L-1}[2x+1, 2y] + M_{L-1}[2x, 2y+1] + M_{L-1}[2x+1, 2y+1] \right)

for box filter. Higher-quality filters use larger kernels and address frequencies just below Nyquist.

The total storage overhead of a full mipmap chain is L=04L=4/3\sum_{L=0}^{\infty} 4^{-L} = 4/3, only 33% extra memory.

12.4 Convolution and Filter Kernels

A convolution of signal ff with kernel gg is:

(fg)(x)=f(τ)g(xτ)dτ(f * g)(x) = \int f(\tau) g(x - \tau) d\tau

Discrete convolution (on a pixel grid):

(fg)[x,y]=i,jf[xi,yj]g[i,j](f * g)[x, y] = \sum_{i, j} f[x-i, y-j] g[i, j]

Common kernels in graphics:

  • Gaussian (separable, smooth, theoretically infinite support):

g(x)=1σ2πex2/(2σ2)g(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-x^2/(2\sigma^2)}

A 2D Gaussian factors into two 1D passes: G(x,y)=g(x)g(y)G(x, y) = g(x) g(y). This separability makes Gaussian blur affordable for bloom, depth of field, screen-space ambient occlusion.

  • Box filter: average over the kernel. Simple but introduces ringing at high contrasts.
  • Sobel/Scharr: derivative kernels for edge detection (used in screen-space normals from depth).
  • Bilateral filter: edge-aware blur. Used in denoising path-traced output.

12.5 The Frequency Domain (FFT)

The Fourier transform decomposes a signal into its frequency components:

f^(ξ)=f(x)e2πiξxdx\hat{f}(\xi) = \int f(x) e^{-2\pi i \xi x} dx

In graphics:

  • Bloom: large-radius blurs are faster in the frequency domain via FFT.
  • Convolution theorem: fg^=f^g^\widehat{f * g} = \hat{f} \cdot \hat{g}. Convolution becomes multiplication; this is why FFT-based filtering exists.
  • Spherical harmonics (§10.4) are the angular Fourier basis on the sphere.

12.6 Reading


13. Numerical Methods and Stability

13.1 Iterative Solvers

Many graphics problems reduce to root-finding f(x)=0f(x) = 0. The standard tools:

Newton-Raphson:

xn+1=xnf(xn)f(xn)x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}

Quadratic convergence near a simple root. Used for:

  • Sphere tracing in raymarched SDFs (when augmented to handle Lipschitz constraints)
  • Inverse kinematics
  • Fast inverse square root (Quake III's 0x5f3759df constant is one Newton-Raphson step on a clever initial guess)

Bisection: brackets a root in [a,b][a, b] with f(a)f(b)<0f(a) f(b) < 0 and halves the interval. Linear convergence but unconditionally robust.

13.2 Numerical Linear Algebra

Solving Ax=bA\mathbf{x} = \mathbf{b}:

  • For small dense AA (4×4 or smaller): direct Gaussian elimination or LU decomposition.
  • For sparse AA: iterative methods (conjugate gradient, GMRES). Used in cloth simulation, fluid simulation, and global illumination via radiosity.

Eigendecomposition A=VΛV1A = V \Lambda V^{-1} appears in:

  • Principal Component Analysis on tangent frames or SH coefficients
  • Inertia tensor diagonalization in physics
  • BVH construction (longest-axis splits use eigenvectors of the centroid covariance)

Singular Value Decomposition (SVD) A=UΣVTA = U \Sigma V^T:

  • Polar decomposition (extracting rotation from a possibly-scaled matrix): A=RSA = R S where R=UVTR = U V^T, S=VΣVTS = V \Sigma V^T.
  • Best low-rank approximation (Eckart-Young theorem), used in compressed materials and neural shading.

13.3 Conditioning and Stability

A computation is ill-conditioned if small input perturbations produce large output changes. The condition number of a matrix AA is κ(A)=AA1=σmax/σmin\kappa(A) = \|A\| \|A^{-1}\| = \sigma_{max}/\sigma_{min}.

Graphics-specific instability examples:

  • Inverting a near-singular matrix: a TBN matrix becomes near-singular when tangent and bitangent collide. Detect via det<ε|\det| < \varepsilon and fall back to a synthesized basis.
  • Acos of a value slightly above 1: produces NaN. Always clamp: std::acos(std::clamp(d, -1.0f, 1.0f)).
  • Normalizing a near-zero vector: divide-by-zero. Guard with epsilon.
  • Quaternion drift: chaining many quaternion products accumulates drift in q\|\mathbf{q}\|. Renormalize periodically (every 100\sim 100 products is safe).

13.4 SIMD and Vectorization

Modern CPUs offer SIMD (Single Instruction, Multiple Data) units that operate on 4-8-16 floats simultaneously: SSE (128-bit, 4×f32), AVX (256-bit, 8×f32), AVX-512 (512-bit, 16×f32), NEON (ARM, 128-bit). On GPUs, every shader thread is implicitly SIMD across 32 (NVIDIA warp) or 64 (AMD wavefront) threads.

A scalar Vec3f doesn't naturally SIMD-vectorize (3 lanes of 4). The two strategies:

  1. AoS-of-Vec4 padding: store as Vec4f (12B → 16B), use SIMD intrinsics. Wastes 25%.
  2. SoA (Structure of Arrays): store NN vectors as three contiguous arrays of NN floats each. Operate on lanes\text{lanes} vectors at once. Used by Embree, OptiX, and modern ray tracers.

GP exposes both patterns; the SoA path is reserved for batch operations on 8\geq 8 vectors.


14. Spatial Data Structures and Computational Geometry

Real-time queries (visibility, collision, ray tracing) require organizing geometry so that "what is near p\mathbf{p}?" is faster than scanning all objects.

14.1 Bounding Volume Hierarchies (BVH)

A BVH is a binary tree where each node holds an axis-aligned bounding box (AABB) enclosing all geometry in its subtree. Ray-AABB tests prune entire subtrees. Construction by Surface Area Heuristic (SAH) [MacDonald & Booth 1990]:

Cost(split)=Tt+pLLTi+pRRTi\text{Cost}(\text{split}) = T_t + p_L \sum_L T_i + p_R \sum_R T_i

with pL,pRp_L, p_R the probability a ray traverses each child (proportional to surface area), TtT_t the traversal cost, and TiT_i the intersection cost. The splitting plane minimizing this cost yields good trees in O(NlogN)O(N \log N) build time.

Production engines now use Linear BVH (LBVH) built from Morton-coded primitives, parallelizable on GPUs in O(N)O(N). See our Virtualized Geometry Systems post for the cluster-BVH application.

14.2 KD-Trees

A KD-tree splits along axis-aligned planes, with the split axis cycling per level. Faster traversal than BVH for static scenes; slower to update for dynamic scenes. Largely superseded by BVH in real-time graphics.

14.3 Octrees

3D space recursively subdivided into 8 octants. Used for:

  • Sparse Voxel Octrees (SVO): efficient storage of voxelized geometry [Laine and Karras 2010].
  • Position-based collision broadphase.
  • Cone tracing for global illumination (NVIDIA VXGI).

14.4 Spatial Hashing

For uniformly-sized objects scattered through space, hashing p/δ\lfloor \mathbf{p}/\delta \rfloor to a fixed-size table provides O(1)O(1) neighbor queries on average. Used for SPH fluid simulation and broadphase collision in unbounded worlds.

14.5 Convex Hulls and Half-Plane Intersections

The convex hull of a point set is the smallest convex polytope containing all points. Computed in O(NlogN)O(N \log N) by QuickHull or in 3D by incremental algorithms. Used for:

  • Conservative collision shapes from raw meshes
  • Frustum construction (intersection of 6 half-planes)
  • Convex decomposition for physics (V-HACD)

14.6 Reading


15. Physics and Animation Mathematics

15.1 Newton's Laws and Numerical Integration

A rigid body's motion is governed by:

F=ma,τ=Iα\mathbf{F} = m\mathbf{a}, \quad \boldsymbol{\tau} = I\boldsymbol{\alpha}

In simulation we integrate numerically. The choice of integrator matters:

Forward Euler (not used in production): vn+1=vn+anΔt,xn+1=xn+vnΔt\mathbf{v}_{n+1} = \mathbf{v}_n + \mathbf{a}_n \Delta t, \quad \mathbf{x}_{n+1} = \mathbf{x}_n + \mathbf{v}_n \Delta t

Cheap but unstable; energy grows unboundedly for oscillators.

Semi-implicit (symplectic) Euler (used in most physics engines): vn+1=vn+anΔt,xn+1=xn+vn+1Δt\mathbf{v}_{n+1} = \mathbf{v}_n + \mathbf{a}_n \Delta t, \quad \mathbf{x}_{n+1} = \mathbf{x}_n + \mathbf{v}_{n+1} \Delta t

Same cost, conserves energy on average. The default for game physics.

Verlet integration (used for cloth, hair, and constraint-based systems): xn+1=2xnxn1+anΔt2\mathbf{x}_{n+1} = 2\mathbf{x}_n - \mathbf{x}_{n-1} + \mathbf{a}_n \Delta t^2

Position-based; velocity is implicit. Robust under iterative constraint solving (PBD).

Runge-Kutta 4 (RK4) (used in offline solvers): Four-stage method with O(h4)O(h^4) truncation error. High accuracy at 4×4\times cost; rarely used in games.

15.2 Quaternion Time Derivatives

For an angular velocity ω\boldsymbol{\omega} (a vector in body or world frame), the rate of change of orientation q\mathbf{q} is:

q˙=12(0,ω)q\dot{\mathbf{q}} = \frac{1}{2} (0, \boldsymbol{\omega}) \mathbf{q}

(quaternion product). Integrating over Δt\Delta t:

qn+1=exp(12ωΔt)qn\mathbf{q}_{n+1} = \exp\left(\frac{1}{2}\boldsymbol{\omega}\Delta t\right) \mathbf{q}_n

with the quaternion exponential of a pure-imaginary quaternion:

exp(0,v)=(cosv,sinvv^)\exp(0, \mathbf{v}) = (\cos\|\mathbf{v}\|, \sin\|\mathbf{v}\| \, \hat{\mathbf{v}})

This is the correct way to integrate orientation; first-order qn+1=qn+q˙Δt\mathbf{q}_{n+1} = \mathbf{q}_n + \dot{\mathbf{q}} \Delta t with renormalization is a common approximation.

15.3 Inertia Tensor

For a rigid body, the inertia tensor II is a 3×3 symmetric positive-definite matrix relating angular velocity to angular momentum:

L=Iω\mathbf{L} = I \boldsymbol{\omega}

For a uniform-density box of dimensions a×b×ca \times b \times c and mass mm:

Ibox=m12(b2+c2000a2+c2000a2+b2)I_{box} = \frac{m}{12}\begin{pmatrix} b^2 + c^2 & 0 & 0 \\ 0 & a^2 + c^2 & 0 \\ 0 & 0 & a^2 + b^2 \end{pmatrix}

For a sphere of radius rr:

Isphere=25mr2I3×3I_{sphere} = \frac{2}{5} m r^2 \cdot I_{3 \times 3}

The tensor transforms with the body's orientation: Iworld=RIbodyRTI_{world} = R I_{body} R^T. Diagonalizing IbodyI_{body} once at startup yields principal axes along which angular dynamics decouple.

15.4 Skeletal Animation

A skeleton is a tree of bones, each with a local-to-parent transform. The world-space transform of a vertex bound to bone bb with bind pose BbB_b is:

vworld=MbBb1vrest\mathbf{v}_{world} = M_b \cdot B_b^{-1} \cdot \mathbf{v}_{rest}

where MbM_b is the current world-space bone transform. With linear blend skinning, a vertex influenced by multiple bones with weights wiw_i:

vworld=iwiMiBi1vrest,wi=1\mathbf{v}_{world} = \sum_i w_i \cdot M_i \cdot B_i^{-1} \cdot \mathbf{v}_{rest}, \quad \sum w_i = 1

LBS is fast (a sum of weighted matrix-vector products) but produces "candy-wrapper" artifacts at sharply twisted joints. Dual-quaternion skinning [Kavan et al. 2007] fixes this at higher cost by interpolating in the dual-quaternion representation that preserves rigid-body length.

References:


16. The GP Math Library: Design Rationale

The GP Engine ships its mathematics module under source/runtime/core/public/maths/. This section enumerates the design choices and their motivations.

16.1 Type Inventory

TypeSizePurpose
Vec2f, Vec3f, Vec4f8B / 12B / 16BFloating-point vectors
Vec2i, Vec3i, Vec4i8B / 12B / 16BInteger vectors (texture coordinates, grid indices)
Mat2f, Mat3f, Mat4f16B / 36B / 64BColumn-major matrices
Quaternion16BUnit quaternion
Transform48BTRS tuple
AABB24BAxis-aligned bounding box
Sphere16BBounding sphere
Plane16BPlane equation
Ray24BOrigin + direction
Frustum96BSix planes
Color16BRGBA float

16.2 Conventions, Enforced by the Type System

  • Right-handed, zz-up world space.
  • Column-major matrix storage, column vectors, pre-multiplication: v=Mv\mathbf{v}' = M\mathbf{v}.
  • Quaternions are always unit-length when used to represent rotation; the type's invariant is enforced at construction. Non-unit quaternions are a different type (UnnormalizedQuaternion) used only as an intermediate.
  • Angles are radians at the API boundary. Conversion utilities Degrees(rad) and Radians(deg) are explicit.

16.3 The Scalar/SIMD Duality

Every operation has two implementations:

// public/maths/Vec3.hpp
namespace gp::math {

struct alignas(16) Vec3f {
f32 x, y, z;
f32 _pad; // align to 16B for SIMD load efficiency

[[nodiscard]] friend constexpr f32 Dot(const Vec3f& a, const Vec3f& b) noexcept {
return a.x * b.x + a.y * b.y + a.z * b.z;
}
};

} // namespace gp::math

The SIMD specialization lives next door in private/maths/Vec3_SIMD.cpp and is selected at compile time based on GP_HAS_SSE/GP_HAS_NEON macros. The scalar version is the reference: every formula in this paper compiles directly to it.

16.4 Constexpr-Everything

GP's math types are constexpr-friendly. This allows:

  • Compile-time matrix construction for static look-up tables.
  • Static asserts on basis orthonormality.
  • Template metaprogramming for shader uniforms whose layout is determined at compile time.

This was a deliberate departure from GLM (which is mostly non-constexpr) and DirectXMath (whose intrinsics defy constexpr).

16.5 No Implicit Conversions

GP does not allow implicit conversions across handedness, between row-major and column-major, or between unit and unnormalized quaternions. These conversions require explicit calls (ToRowMajor(), Normalize(), FlipHandedness()) so the cost is visible and the conversion is auditable.

16.6 Forward Declarations

Heavy templates (Mat<T, R, C>) are hidden behind a MathForward.hpp header. Most consumers include only the forward declarations and avoid the full template instantiation in their compilation units, dramatically improving compile times.


17. Curriculum Roadmap and Further Study

This paper is the entry point. The recommended progression for a student building toward AAA-level proficiency:

17.1 Foundational (months 1-3)

  • Strang, Linear Algebra and Its Applications (or 3Blue1Brown's video series for visual intuition).
  • Lengyel, Foundations of Game Engine Development, Vol. 1: Mathematics.
  • Implement a software rasterizer from scratch in C++. The Tinyrenderer tutorial is canonical.

17.2 Intermediate (months 4-9)

  • Akenine-Möller, Haines, Hoffman, Real-Time Rendering, 4th ed. The standard textbook of the field.
  • Pharr, Jakob, Humphreys, Physically Based Rendering, 4th ed. (pbr-book.org)
  • Ericson, Real-Time Collision Detection.
  • Implement a CPU path tracer (use PBRT as the reference).

17.3 Advanced (year 2+)

  • Veach, Robust Monte Carlo Methods for Light Transport Simulation (PhD thesis, the bible of variance reduction).
  • Shoemake, Quaternions tutorial.
  • SIGGRAPH course notes (annual): Advances in Real-Time Rendering, Physically Based Shading. Available at advances.realtimerendering.com.
  • Implement a real-time deferred renderer, then a Vulkan-based path tracer with ReSTIR, then virtualized geometry (see our previous post).

17.4 The GP Curriculum

Future GP educational posts will build on this foundation in roughly this order:

  1. Practical Linear Algebra: Implementing Vec/Mat/Quat from Scratch (anatomy of core/public/maths/).
  2. The Rasterization Pipeline: From Triangle to Pixel (deriving the perspective divide).
  3. PBR From First Principles: Building a Cook-Torrance BRDF (microfacet derivation, code).
  4. Monte Carlo Integration for Real-Time Path Tracing (importance sampling, MIS, ReSTIR).
  5. Temporal Anti-Aliasing: The Subpixel Frequency Domain.
  6. Skeletal Animation and Dual-Quaternion Skinning.

Each builds on the foundations laid here; none introduces a formula not derivable from this paper.


18. References

Foundational Mathematics

  1. Strang, G. (2016). Introduction to Linear Algebra, 5th ed. Wellesley-Cambridge Press.
  2. Spivak, M. (2008). Calculus, 4th ed. Publish or Perish.
  3. Trefethen, L. N. & Bau, D. (1997). Numerical Linear Algebra. SIAM.
  4. Goldberg, D. (1991). What every computer scientist should know about floating-point arithmetic. ACM Computing Surveys 23(1). Available: docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
  5. Sanderson, G. (3Blue1Brown). Essence of Linear Algebra. Video series. Available: youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab

Graphics-Oriented Math

  1. Lengyel, E. (2016). Foundations of Game Engine Development, Volume 1: Mathematics. Terathon. Available: foundationsofgameenginedev.com
  2. Lengyel, E. (2019). Foundations of Game Engine Development, Volume 2: Rendering. Terathon.
  3. Akenine-Möller, T., Haines, E., Hoffman, N., Pesce, A., Iwanicki, M., & Hillaire, S. (2018). Real-Time Rendering, 4th ed. CRC Press. Errata and supplements: realtimerendering.com
  4. Dunn, F. & Parberry, I. (2011). 3D Math Primer for Graphics and Game Development, 2nd ed. CRC Press.

Quaternions and Rotations

  1. Shoemake, K. (1985). Animating rotation with quaternion curves. SIGGRAPH '85, 245-254.
  2. Sommer, H., Gilitschenski, I., Bloesch, M., et al. (2018). Why and how to avoid the flipped quaternion multiplication. Aerospace 5(3), 72. arxiv.org/abs/1801.07478
  3. Hanson, A. J. (2006). Visualizing Quaternions. Morgan Kaufmann.

Geometry and Intersection

  1. Möller, T. & Trumbore, B. (1997). Fast, minimum storage ray-triangle intersection. Journal of Graphics Tools 2(1), 21-28.
  2. Ericson, C. (2005). Real-Time Collision Detection. Morgan Kaufmann. Companion site: realtimecollisiondetection.net
  3. MacDonald, J. D. & Booth, K. S. (1990). Heuristics for ray tracing using space subdivision. The Visual Computer 6(3), 153-166.

Physically Based Rendering

  1. Kajiya, J. T. (1986). The rendering equation. SIGGRAPH '86, 143-150.
  2. Cook, R. L. & Torrance, K. E. (1982). A reflectance model for computer graphics. ACM Transactions on Graphics 1(1), 7-24.
  3. Schlick, C. (1994). An inexpensive BRDF model for physically-based rendering. Computer Graphics Forum 13(3), 233-246.
  4. Burley, B. (2012). Physically-based shading at Disney. SIGGRAPH 2012 Course Notes. Available: disneyanimation.com/publications/physically-based-shading-at-disney/
  5. Karis, B. (2013). Real shading in Unreal Engine 4. SIGGRAPH 2013 Course Notes. Available: blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf
  6. Heitz, E. (2014). Understanding the masking-shadowing function in microfacet-based BRDFs. Journal of Computer Graphics Techniques 3(2), 48-107. jcgt.org/published/0003/02/03/
  7. Pharr, M., Jakob, W. & Humphreys, G. (2023). Physically Based Rendering: From Theory to Implementation, 4th ed. MIT Press. Full book online: pbr-book.org
  8. Veach, E. (1997). Robust Monte Carlo methods for light transport simulation. PhD thesis, Stanford University.
  9. Veach, E. & Guibas, L. J. (1995). Optimally combining sampling techniques for Monte Carlo rendering. SIGGRAPH '95.

Sampling and Real-Time Path Tracing

  1. Bitterli, B., Wyman, C., Pharr, M., Shirley, P., Lefohn, A. & Jarosz, W. (2020). Spatiotemporal reservoir resampling for real-time ray tracing with dynamic direct lighting. SIGGRAPH 2020. Available: research.nvidia.com/publication/2020-07_Spatiotemporal-reservoir-resampling
  2. Halton, J. H. (1964). Algorithm 247: Radical-inverse quasi-random point sequence. Communications of the ACM 7(12), 701-702.
  3. Owen, A. B. (1995). Randomly permuted (t, m, s)-nets and (t, s)-sequences. In: Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing.

Color and Tone Mapping

  1. CIE (1931). Commission Internationale de l'Éclairage Proceedings, 1931. Cambridge University Press.
  2. Reinhard, E., Stark, M., Shirley, P. & Ferwerda, J. (2002). Photographic tone reproduction for digital images. ACM Transactions on Graphics 21(3).
  3. Narkowicz, K. (2016). ACES Filmic Tone Mapping Curve (approximation). Available: knarkowicz.wordpress.com/2016/01/06/aces-filmic-tone-mapping-curve/
  4. Academy of Motion Picture Arts and Sciences. Academy Color Encoding System (ACES). Available: acescentral.com
  5. ITU-R Recommendation BT.2100. Image parameter values for high dynamic range television. Available: itu.int/rec/R-REC-BT.2100/en

Signal Processing

  1. Glassner, A. S. (1995). Principles of Digital Image Synthesis. Morgan Kaufmann.
  2. Mitchell, D. P. & Netravali, A. N. (1988). Reconstruction filters in computer graphics. SIGGRAPH '88.
  3. Smith, S. W. (1997). The Scientist and Engineer's Guide to Digital Signal Processing. California Technical Publishing. Available: dspguide.com

Spatial Data Structures

  1. Karras, T. (2012). Maximizing parallelism in the construction of BVHs, octrees, and k-d trees. High-Performance Graphics 2012.
  2. Laine, S. & Karras, T. (2010). Efficient sparse voxel octrees. I3D 2010.
  3. Wald, I., Boulos, S. & Shirley, P. (2007). Ray tracing deformable scenes using dynamic bounding volume hierarchies. ACM Transactions on Graphics 26(1).

Animation and Physics

  1. Kavan, L., Collins, S., Žára, J. & O'Sullivan, C. (2007). Skinning with dual quaternions. I3D 2007. Available: cs.utah.edu/~ladislav/kavan07skinning/kavan07skinning.pdf
  2. Müller, M., Heidelberger, B., Hennix, M. & Ratcliff, J. (2007). Position based dynamics. Journal of Visual Communication and Image Representation 18(2).
  3. Catto, E. (2015). Numerical methods. GDC 2015. Available: box2d.org/files/ErinCatto_NumericalMethods_GDC2015.pdf

Spherical Harmonics and Image-Based Lighting

  1. Ramamoorthi, R. & Hanrahan, P. (2001). An efficient representation for irradiance environment maps. SIGGRAPH 2001. Available: graphics.stanford.edu/papers/envmap/envmap.pdf
  2. Sloan, P.-P. (2008). Stupid spherical harmonics tricks. GDC 2008.

Engine Math Library Engineering

  1. Reed, N. (2015). Depth precision visualized. NVIDIA Developer Blog. Available: developer.nvidia.com/content/depth-precision-visualized
  2. Lottes, T. (2009). FXAA. NVIDIA Whitepaper.
  3. Sucker Punch / Naughty Dog GDC presentations on world-scale precision (various years). Aggregated at advances.realtimerendering.com.

Tutorials

  1. Sokolov, D. Tinyrenderer: How OpenGL works, software rendering in 500 lines of code. Tutorial. Available: github.com/ssloy/tinyrenderer
  2. De Vries, J. Learn OpenGL. Tutorial. Available: learnopengl.com

GP Engine Companion Posts

  1. Scotton, M. (2026). Virtualized Geometry Systems: Architecture, Theory, and Implementation. GP Engineering Reference Series. (/blog/virtualized-geometry-systems)
  2. Scotton, M. (2026). Engine Architecture and Directory Layout. GP Engineering Reference Series. (/blog/engine-architecture-layout)

Document prepared as part of the GP SDK Engineering Reference Series. Version 1.0, Principal Systems Engineer, Graphical Playground SDK.