What Is Camera Calibration? A Scientific Introduction

A first-principles tour of the pinhole model, distortion, the calibration problem, and reprojection error — with the math, the diagrams, and the references behind each step.

May 6, 2026 · 11 min read · By Nassim Hammami

Key takeaways

What you will learn

Camera calibration estimates the parameters that map a 3D world point to its 2D image pixel.
The intrinsic matrix K describes focal lengths and principal point; the distortion vector D describes how lenses deviate from an ideal pinhole.
Calibration is a nonlinear least-squares problem: given known 3D–2D correspondences, find K and D that minimise reprojection error.
RMS reprojection error is the standard quality metric, but per-frame and per-region diagnostics matter more than a single number.
OpenCV’s calibrateCamera is built on Zhengyou Zhang’s 2000 paper, which showed how to calibrate from a few views of a flat target held in arbitrary, unknown orientations.

The mapping from world to pixel

A camera does not record pixels — it records light. Every pixel value in an image is the result of light reflecting off some 3D surface in the world, passing through a lens, and being measured by a sensor element. Camera calibration is the inverse problem: given the pixels, can we describe the geometric function $f : \mathbb{R}^3 \to \mathbb{R}^2$ that produced them?

That function is the composition of two transformations. The first is extrinsic: where the camera is in the world (a rotation $R$ and translation $\mathbf{t}$). The second is intrinsic: how the camera focuses light onto its sensor (focal length, principal point, distortion). Calibration is the process of estimating the intrinsic parameters, and (usually as a byproduct) the per-image extrinsic parameters for the frames used in the solve.

Once we know the intrinsics, we can undistort images, project world points, triangulate from stereo pairs, and pose-estimate from known features. None of those are possible without first answering: what camera am I dealing with?

The pinhole projection maps every world point P to a unique image point p along the ray from P to the camera centre O.

The pinhole model

The simplest geometric model of a camera is the pinhole: imagine all light passing through a single infinitesimal aperture at the camera centre $O$, then landing on a planar sensor at distance $f$ (the focal length) behind it. The image formed on that sensor is upside down, so it is conventional to work with a virtual image plane at the same distance $f$ in front of $O$, between the camera and the world. For a point $(X, Y, Z)$ expressed in the camera frame, similar triangles then give the projection equations $x' = f \cdot \frac{X}{Z}, \qquad y' = f \cdot \frac{Y}{Z}$.

Written in homogeneous coordinates and combined with the extrinsics, this gives the standard pinhole equation $s\,\mathbf{x} = K \, [\, R \mid \mathbf{t} \,] \, \mathbf{X}$, where $\mathbf{X} = (X, Y, Z, 1)^\top$ is a homogeneous world point, $\mathbf{x} = (u, v, 1)^\top$ is its homogeneous pixel coordinate, and $s$ is a scale factor equal to the point's depth in the camera frame. The $3 \times 3$ intrinsic matrix $K$ does the work of converting normalised camera coordinates $(X/Z, Y/Z)$ into pixel coordinates.

In pixel units rather than metres, the focal length splits into $f_x$ and $f_y$ (the metric focal length divided by the horizontal and vertical pixel pitch). The principal point $(c_x, c_y)$ is where the optical axis pierces the sensor: close to, but rarely exactly at, the image centre. Assuming zero skew (the usual assumption for modern sensors), $K = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}$, and $f_x, f_y, c_x, c_y$ are the four intrinsic parameters of an ideal pinhole camera.

The intrinsic matrix K, with focal lengths fx, fy and principal point cx, cy highlighted, encodes how the lens focuses light onto pixel coordinates. Real cameras almost always have fx ≈ fy and cx, cy near the image centre, but rarely exactly.
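To make the pinhole equation concrete, here is a minimal pure-Python sketch that applies the extrinsics, then $K$, then the perspective division by depth. It assumes zero skew and no lens distortion; the helper names (`make_K`, `mat_vec`, `project`) are our own, and the focal length and principal point in the usage lines are illustrative numbers, not a real camera.

```python
# Minimal pinhole projection sketch (zero skew, no distortion).
# Helper names are illustrative, not from any library.

def make_K(fx, fy, cx, cy):
    """Intrinsic matrix K as nested lists (zero skew)."""
    return [[fx, 0.0, cx],
            [0.0, fy, cy],
            [0.0, 0.0, 1.0]]


def mat_vec(M, v):
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def project(K, R, t, X):
    """Project a 3D world point X = (X, Y, Z) to pixel coordinates (u, v)."""
    # Extrinsics: world frame -> camera frame, X_c = R X + t.
    Xc = [a + b for a, b in zip(mat_vec(R, X), t)]
    if Xc[2] <= 0:
        raise ValueError("point is behind the camera")
    # Intrinsics: K X_c = s * (u, v, 1), where the scale s is the depth Z_c.
    su, sv, s = mat_vec(K, Xc)
    return su / s, sv / s


if __name__ == "__main__":
    K = make_K(800.0, 800.0, 320.0, 240.0)
    R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    t = [0.0, 0.0, 0.0]
    print(project(K, R, t, [0.1, -0.05, 2.0]))  # (360.0, 220.0)
```

A point on the optical axis always lands on $(c_x, c_y)$, whatever its depth; that is the quickest sanity check for any projection code.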

Why distortion exists

A real lens is not an infinitesimal pinhole. It is a stack of curved glass elements designed to gather more light and project a sharp image. That design introduces deviations from the pure pinhole projection — most prominently, radial distortion, where points appear shifted along the radial direction from the optical centre. Wide-angle and fisheye lenses bow lines outward (barrel distortion), while telephoto lenses can bow them inward (pincushion).

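OpenCV models radial distortion with a polynomial in $r$, the radial distance of the normalised image point $(X/Z, Y/Z)$ from the optical axis: $r_d = r \cdot \bigl(1 + k_1 r^2 + k_2 r^4 + k_3 r^6\bigr)$. Most reasonable lenses are captured by $k_1$ and $k_2$; $k_3$ is added for wide-angle lenses. The rational model extends this with $k_4, k_5, k_6$ in a denominator polynomial for cameras with stronger distortion. Fisheye lenses use a different model entirely, the equidistant projection, because the perspective projection underneath the polynomial places a ray at incidence angle $\theta$ at radius $f \tan\theta$, which diverges as $\theta$ approaches 90° (a 180° field of view); no polynomial correction can bring those points back onto a finite sensor.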
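The sign of $k_1$ decides which way straight lines bow. A small sketch of the radial term alone, assuming OpenCV's convention that $r$ is measured on the normalised image plane; `radial_distort` is a hypothetical helper and the coefficient values are made up for illustration.

```python
def radial_distort(r, k1, k2=0.0, k3=0.0):
    """Radial term r_d = r * (1 + k1 r^2 + k2 r^4 + k3 r^6).

    r is the distance of a normalised image point (X/Z, Y/Z) from the optical axis.
    """
    r2 = r * r
    return r * (1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3)


if __name__ == "__main__":
    for k1 in (-0.2, 0.0, 0.2):
        # Compare a point near the centre with one near the edge of the field.
        near, far = radial_distort(0.1, k1), radial_distort(0.5, k1)
        print(f"k1={k1:+.1f}: r=0.1 -> {near:.4f}, r=0.5 -> {far:.4f}")
```

With $k_1 = -0.2$ the point at $r = 0.5$ is pulled inward to 0.475 while the one at $r = 0.1$ barely moves (0.0998). The displacement grows like $k_1 r^3$, which is what bends straight lines into barrel curves; a positive $k_1$ pushes points outward instead (pincushion).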

There is also tangential distortion, parameterised by $p_1$ and $p_2$, which appears when the lens elements are not perfectly parallel to the sensor. It is usually small for modern cameras but matters for high-precision work. The five-coefficient distortion vector $D = (k_1, k_2, p_1, p_2, k_3)$, in OpenCV's ordering, is what calibration estimates alongside $K$.
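Putting the radial and tangential terms together, the forward model with the five-coefficient vector can be sketched as below. The formulas follow the Brown–Conrady form used by OpenCV's pinhole model; the function names are our own and the coefficient values in the test are arbitrary.

```python
def distort_normalised(x, y, D):
    """Apply D = (k1, k2, p1, p2, k3) to a normalised point (x, y) = (X/Z, Y/Z)."""
    k1, k2, p1, p2, k3 = D
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_d = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return x_d, y_d


def project_camera_point(fx, fy, cx, cy, D, Xc):
    """Project a point already in the camera frame, Xc = (X, Y, Z) with Z > 0."""
    x, y = Xc[0] / Xc[2], Xc[1] / Xc[2]      # perspective division
    x_d, y_d = distort_normalised(x, y, D)   # lens distortion
    return fx * x_d + cx, fy * y_d + cy      # intrinsics K


if __name__ == "__main__":
    D = (-0.2, 0.05, 0.001, -0.0005, 0.0)
    print(project_camera_point(800.0, 800.0, 320.0, 240.0, D, (0.1, -0.05, 2.0)))
```

Because distortion is applied between the perspective division and $K$, the coefficients are unitless: resizing the image uniformly changes $K$ but leaves $D$ unchanged.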

Three grids: barrel distortion, an ideal undistorted grid, and pincushion distortion. Radial distortion shows up as systematic curvature of straight world lines; calibration recovers the coefficients that map distorted pixels back to ideal pinhole pixels.

The calibration problem, stated formally

Calibration is an optimisation. We capture $N$ images of a planar target whose 3D coordinates $\mathbf{X}_j$ are known by construction (a chessboard, ChArUco board, or circles grid), conventionally with the board lying in the $Z = 0$ plane of its own coordinate frame. In each image $i$ we detect the projections $\mathbf{x}_{ij}$ of those known points. The unknowns are the intrinsics $K$ and distortion $D$ (shared across all images) and a per-image rigid pose $(R_i, \mathbf{t}_i)$.
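The "known by construction" coordinates are usually just a regular grid. A minimal sketch for a chessboard with `cols` x `rows` inner corners, assuming the square size is given in millimetres; the function name is hypothetical. Whatever unit you choose here becomes the unit of the recovered translations $\mathbf{t}_i$, and the point order must match the order in which your corner detector returns its detections.

```python
def chessboard_object_points(cols, rows, square_size):
    """3D coordinates of a chessboard's inner corners in the board's own frame.

    The board is the Z = 0 plane; points are listed row by row.
    """
    return [(c * square_size, r * square_size, 0.0)
            for r in range(rows)
            for c in range(cols)]


if __name__ == "__main__":
    pts = chessboard_object_points(9, 6, 25.0)  # 9 x 6 inner corners, 25 mm squares
    print(len(pts), pts[0], pts[-1])  # 54 (0.0, 0.0, 0.0) (200.0, 125.0, 0.0)
```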

The goal is to find the parameters that make the predicted projections $\pi(K, D, R_i, \mathbf{t}_i, \mathbf{X}_j)$ match the observations $\mathbf{x}_{ij}$ as closely as possible, in the sense of squared pixel distance. Concretely: $\min_{K, D, \{R_i, \mathbf{t}_i\}} \;\; \sum_{i, j} \bigl\lVert \mathbf{x}_{ij} - \pi(K, D, R_i, \mathbf{t}_i, \mathbf{X}_j) \bigr\rVert^2$
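Read as code, the objective is a loop over every detected point in every image. This sketch keeps the projection abstract: `project_fn` stands in for $\pi$ evaluated at the current parameter estimate, and the data layout (a list of `(i, j, (u, v))` tuples) is an assumption for illustration. A solver such as Levenberg–Marquardt evaluates the individual residuals repeatedly while it updates the parameters.

```python
def reprojection_cost(observations, project_fn):
    """Sum of squared pixel residuals over all observations.

    observations: iterable of (i, j, (u, v)), i.e. point j detected in image i at (u, v).
    project_fn(i, j): predicted pixel (u, v) of point j in image i, standing in for
    pi(K, D, R_i, t_i, X_j) under the current parameter estimate.
    """
    total = 0.0
    for i, j, (u, v) in observations:
        pu, pv = project_fn(i, j)
        total += (u - pu) ** 2 + (v - pv) ** 2
    return total


if __name__ == "__main__":
    predicted = {(0, 0): (100.0, 100.0), (0, 1): (200.0, 100.0)}
    obs = [(0, 0, (103.0, 104.0)), (0, 1, (200.0, 100.0))]
    print(reprojection_cost(obs, lambda i, j: predicted[(i, j)]))  # 25.0
```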

This is a nonlinear least-squares problem because $\pi$ is nonlinear in the unknowns: the distortion polynomial is the obvious source, but the rotation and the perspective division by depth are nonlinear too. The standard solver is Levenberg–Marquardt, initialised from a closed-form solution that ignores distortion. That closed-form initialisation is Zhang's contribution from his 2000 paper: he showed that each homography between the planar target and one of its images gives two linear constraints on $K$ (strictly, on the symmetric matrix $K^{-\top} K^{-1}$), so three or more views in general position determine $K$. Distortion is then added back and the whole model refined nonlinearly.
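Zhang's key observation is easy to check numerically. For a board on the $Z = 0$ plane, the projection collapses to a homography $H \propto K\,[\mathbf{r}_1\ \mathbf{r}_2\ \mathbf{t}]$, and because $\mathbf{r}_1$ and $\mathbf{r}_2$ are orthonormal, $B = K^{-\top}K^{-1}$ must satisfy $\mathbf{h}_1^\top B\,\mathbf{h}_2 = 0$ and $\mathbf{h}_1^\top B\,\mathbf{h}_1 = \mathbf{h}_2^\top B\,\mathbf{h}_2$. The sketch below estimates $H$ with a plain, unnormalised Direct Linear Transform on noise-free synthetic data and checks those two constraints against the true $K$. It assumes NumPy, and the function names are our own; a real implementation would normalise coordinates, cope with noise, and solve for $B$ from several views rather than merely verify it.

```python
import numpy as np


def estimate_homography(obj_xy, img_uv):
    """Direct Linear Transform: H (3x3) with [u, v, 1]^T ~ H [X, Y, 1]^T.

    obj_xy: (n, 2) board coordinates on the Z = 0 plane; img_uv: (n, 2) pixels; n >= 4.
    No coordinate normalisation, so only suitable for clean, well-scaled data.
    """
    rows = []
    for (X, Y), (u, v) in zip(obj_xy, img_uv):
        rows.append([-X, -Y, -1.0, 0.0, 0.0, 0.0, u * X, u * Y, u])
        rows.append([0.0, 0.0, 0.0, -X, -Y, -1.0, v * X, v * Y, v])
    # h minimises ||A h|| subject to ||h|| = 1: the right singular vector
    # belonging to the smallest singular value (last row of Vt).
    _, _, Vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]


def zhang_constraints(H, K):
    """Zhang's two per-view constraints with B = K^-T K^-1; both are ~0 for the true K."""
    K_inv = np.linalg.inv(K)
    B = K_inv.T @ K_inv
    h1, h2 = H[:, 0], H[:, 1]
    return h1 @ B @ h2, h1 @ B @ h1 - h2 @ B @ h2


def synthetic_view(K, angle_deg, t, square=0.025, cols=9, rows=6):
    """Noise-free board points and their pixels for a board tilted about the x axis."""
    a = np.radians(angle_deg)
    R = np.array([[1.0, 0.0, 0.0],
                  [0.0, np.cos(a), -np.sin(a)],
                  [0.0, np.sin(a), np.cos(a)]])
    H_true = K @ np.column_stack([R[:, 0], R[:, 1], t])  # H ~ K [r1 r2 t]
    obj = np.array([[c * square, r * square] for r in range(rows) for c in range(cols)])
    proj = (H_true @ np.column_stack([obj, np.ones(len(obj))]).T).T
    return obj, proj[:, :2] / proj[:, 2:3], H_true


if __name__ == "__main__":
    K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
    obj, img, H_true = synthetic_view(K, 20.0, np.array([-0.1, -0.05, 1.0]))
    H = estimate_homography(obj, img)
    print(np.allclose(H, H_true / H_true[2, 2], atol=1e-6))
    print(zhang_constraints(H, K))
```

Plugging in a wrong focal length breaks the second constraint, which is exactly the signal Zhang's linear system uses to recover $K$ from several such views.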

A calibration target with detected corners, reprojected (predicted) corners, and the residual vectors between them. Calibration minimises the sum of squared residuals between detected corners (dark green) and the corners predicted by the model (light green).

Reading reprojection error

The standard quality metric is the root mean square reprojection error: $e_{\text{rms}} = \sqrt{\, \frac{1}{M} \sum_{i, j} \bigl\lVert \mathbf{x}_{ij} - \pi(K, D, R_i, \mathbf{t}_i, \mathbf{X}_j) \bigr\rVert^2 \,}$ where $M$ is the total number of point observations across all images. It is the square root of the optimiser's objective divided by $M$, evaluated at the converged solution, so minimising one minimises the other. It has units of pixels.
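The formula translates directly into code. A minimal pure-Python sketch, assuming observed and predicted pixels arrive as matching lists of `(u, v)` pairs with one entry per point observation; the function name and sample values are illustrative.

```python
import math


def rms_reprojection_error(observed, predicted):
    """e_rms = sqrt(sum ||x - x_hat||^2 / M), with M the number of point observations."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("need two non-empty lists of (u, v) pairs of equal length")
    squared = sum((u - pu) ** 2 + (v - pv) ** 2
                  for (u, v), (pu, pv) in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


if __name__ == "__main__":
    observed = [(100.0, 100.0), (200.0, 150.0), (320.0, 240.0), (50.0, 400.0)]
    predicted = [(100.3, 99.6), (200.0, 150.0), (319.8, 240.0), (50.0, 400.5)]
    print(f"RMS = {rms_reprojection_error(observed, predicted):.3f} px")
```

Here the result is about 0.37 px, yet the single point with a 0.5 px residual accounts for almost half of the squared total. Squaring makes the metric sensitive to a few large residuals, which is worth keeping in mind when reading it.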

What counts as "low"? It depends on the application. For typical machine-vision lenses on a moderate-resolution sensor, an RMS below $0.5\,\text{px}$ is excellent, $0.5$–$1.0\,\text{px}$ is good, and above $2\,\text{px}$ usually means weak frames, wrong board parameters, or a model mismatch (e.g. trying to fit a fisheye lens with a pinhole model). For sub-pixel measurement work in metrology, sub-$0.2\,\text{px}$ is the target.

The trap is that one number can hide a lot. A run can show a respectable average while one or two outlier frames carry most of the error, or while the error is concentrated near the edges of the sensor where the distortion model is failing. That is why analytics that show per-image error, sensor coverage, and residual heatmaps are essential — and why CalibrX surfaces all three.
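Both kinds of diagnostic are cheap to compute once you keep the individual residuals. The sketch below computes per-image RMS, flags frames well above the median (the factor of 2 is an arbitrary heuristic, not a standard), and bins residuals over a coarse grid on the sensor so that edge-concentrated error stands out. The data layout and function names are assumptions for illustration; this is not how CalibrX computes its analytics.

```python
import math
from statistics import median


def per_image_rms(residuals_by_image):
    """{image_id: [(du, dv), ...]} in pixels -> {image_id: rms}; empty images are skipped."""
    return {img: math.sqrt(sum(du * du + dv * dv for du, dv in res) / len(res))
            for img, res in residuals_by_image.items() if res}


def flag_outlier_frames(rms_by_image, factor=2.0):
    """Frames whose RMS exceeds `factor` times the median per-image RMS (a heuristic)."""
    m = median(rms_by_image.values())
    return sorted(img for img, e in rms_by_image.items() if e > factor * m)


def rms_by_region(points, residuals, width, height, grid=3):
    """Bin residuals on a grid x grid layout over the sensor -> {(gx, gy): rms}.

    points: detected pixel positions (u, v); residuals: matching (du, dv).
    """
    cells = {}
    for (u, v), (du, dv) in zip(points, residuals):
        gx = min(max(int(u / width * grid), 0), grid - 1)
        gy = min(max(int(v / height * grid), 0), grid - 1)
        cells.setdefault((gx, gy), []).append(du * du + dv * dv)
    return {cell: math.sqrt(sum(sq) / len(sq)) for cell, sq in cells.items()}


if __name__ == "__main__":
    rms = per_image_rms({"img_000": [(0.3, 0.4)], "img_001": [(0.0, 0.5)],
                         "img_002": [(3.0, 4.0)]})
    print(rms, flag_outlier_frames(rms))
```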

Three camera models, three trade-offs

In practice, "the pinhole model" is one of several models, and choosing the right one matters: a wide-angle lens fitted with a strict pinhole model will end up with a high RMS and poor undistortion at the edges, while a normal lens fitted as a fisheye will produce parameters that are hard to interpret physically.

The three most commonly used families are the standard pinhole (Brown-Conrady distortion, OpenCV’s calibrateCamera) for moderate lenses up to roughly 90° field of view; the rational pinhole (additional k₄, k₅, k₆ in a denominator polynomial) for wide-angle lenses up to about 120°; and the equidistant fisheye model (Kannala–Brandt or OpenCV’s fisheye namespace) for true fisheye projections approaching 180°. CalibrX surfaces all three so you can solve the same captures against each and compare RMS and undistorted previews directly.

Pinhole — standard rectilinear lenses, 4 to 8 distortion coefficients.
Pinhole wide — rational distortion model for wide-angle and action cameras.
Fisheye — equidistant projection (Kannala–Brandt) for dome and 180° lenses.
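The reason the fisheye family exists is visible in one line of trigonometry. For a ray at angle $\theta$ from the optical axis, a rectilinear (pinhole) lens images it at radius $f\tan\theta$, while the equidistant model uses $f\,\theta_d$, where OpenCV's fisheye module takes $\theta_d = \theta(1 + k_1\theta^2 + k_2\theta^4 + k_3\theta^6 + k_4\theta^8)$. The sketch compares the two radii; the focal length is an arbitrary illustrative value and the helper names are hypothetical.

```python
import math


def radius_pinhole(theta, f):
    """Rectilinear projection: image radius f * tan(theta); diverges as theta -> 90 deg."""
    return f * math.tan(theta)


def radius_equidistant(theta, f, k=(0.0, 0.0, 0.0, 0.0)):
    """Equidistant fisheye with a Kannala-Brandt style polynomial in theta."""
    k1, k2, k3, k4 = k
    t2 = theta * theta
    theta_d = theta * (1.0 + k1 * t2 + k2 * t2 ** 2 + k3 * t2 ** 3 + k4 * t2 ** 4)
    return f * theta_d


if __name__ == "__main__":
    f = 300.0  # illustrative focal length in pixels
    for deg in (10, 45, 80, 89):
        th = math.radians(deg)
        print(f"{deg:2d} deg: pinhole {radius_pinhole(th, f):9.1f} px, "
              f"equidistant {radius_equidistant(th, f):6.1f} px")
```

At 10° the two radii differ by about 1%; at 89° the pinhole radius is over 17,000 px while the equidistant one stays under 500 px, which is why a pinhole model cannot represent a lens whose field of view approaches 180°.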

What the parameters let you do

Once calibration has converged, K and D are the keys to every geometric operation the camera can support. Undistorting an image becomes a function call that takes pixels in and returns pixels out with straight lines preserved. Projecting a known 3D point into the image becomes a direct evaluation of the model. Triangulating depth from a stereo pair requires both cameras’ intrinsics plus the relative pose between the two cameras (estimated by stereo calibration); that pose is also what lets you compute the rectifying transforms that align their epipolar lines for dense matching.
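Whole images are usually undistorted by remapping, which only needs the forward model: for each output pixel, distort its ideal position and sample the input image there. Undistorting individual points, such as detected features, needs the inverse, which has no closed form for this model. A common approach, similar in spirit to what OpenCV's `undistortPoints` does, is fixed-point iteration. The sketch below assumes the five-coefficient model and moderate distortion (it may fail to converge for extreme coefficients); the names and the iteration count are our own choices.

```python
def distort(x, y, D):
    """Forward OpenCV-style model on normalised coordinates, D = (k1, k2, p1, p2, k3)."""
    k1, k2, p1, p2, k3 = D
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    return (x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x),
            y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y)


def undistort_normalised(xd, yd, D, iterations=20):
    """Invert distort() by fixed-point iteration: x = (x_d - tangential(x)) / radial(x)."""
    k1, k2, p1, p2, k3 = D
    x, y = xd, yd  # start from the distorted point itself
    for _ in range(iterations):
        r2 = x * x + y * y
        radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
        dx = 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
        dy = p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
        x, y = (xd - dx) / radial, (yd - dy) / radial
    return x, y


def undistort_pixel(u, v, fx, fy, cx, cy, D, iterations=20):
    """Pixel in the distorted image -> pixel where an ideal pinhole camera would see it."""
    x, y = undistort_normalised((u - cx) / fx, (v - cy) / fy, D, iterations)
    return fx * x + cx, fy * y + cy


if __name__ == "__main__":
    D = (-0.2, 0.05, 0.001, -0.0005, 0.0)
    xd, yd = distort(0.3, -0.2, D)
    print(undistort_normalised(xd, yd, D))  # should be very close to (0.3, -0.2)
```

A round trip (distort, then undistort) is a cheap way to confirm that the iteration has converged for your coefficients before trusting it on real detections.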

For applications that consume the calibration — robotics, drones, AR, 3D reconstruction, machine vision — the export needs more than the raw numbers. It needs the model identifier (pinhole, pinhole_wide, fisheye), the image size used during solving, and ideally the RMS and per-image quality metadata. That structured export is what CalibrX writes to JSON or YAML, and what the calibrx Python SDK reads back to drive undistortion locally with the correct model-specific routine.
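One reason to store the model identifier and image size alongside $K$ and $D$ is that intrinsics are only valid for the model and resolution they were solved at: $f_x, f_y, c_x, c_y$ are in pixels, so they scale with image size. The sketch below writes and reads a hypothetical JSON layout with those fields and refuses to load a calibration for a different resolution. The field names are made up for illustration; this is not CalibrX's file format or the calibrx SDK API.

```python
import json

MODELS = ("pinhole", "pinhole_wide", "fisheye")


def export_calibration(model, image_size, K, D, rms, per_image_rms=None):
    """Serialise a calibration to JSON (hypothetical field names)."""
    if model not in MODELS:
        raise ValueError(f"unknown model identifier: {model!r}")
    record = {
        "model": model,
        "image_size": {"width": int(image_size[0]), "height": int(image_size[1])},
        "K": [[float(x) for x in row] for row in K],
        "D": [float(d) for d in D],
        "rms_px": float(rms),
    }
    if per_image_rms:
        record["per_image_rms_px"] = {str(k): float(v) for k, v in per_image_rms.items()}
    return json.dumps(record, indent=2)


def load_calibration(text, image_size):
    """Parse the JSON and refuse to use it for images of a different resolution."""
    record = json.loads(text)
    solved = (record["image_size"]["width"], record["image_size"]["height"])
    if solved != tuple(image_size):
        raise ValueError(f"calibration was solved at {solved}, images are {tuple(image_size)}")
    return record


if __name__ == "__main__":
    text = export_calibration(
        "pinhole", (640, 480),
        [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]],
        [-0.2, 0.05, 0.001, -0.0005, 0.0], rms=0.31,
        per_image_rms={"img_000": 0.28, "img_001": 0.35})
    print(load_calibration(text, (640, 480))["model"])  # pinhole
```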
