TL;DR: a geometric proof of the determinant as the “volume scaling factor”, equal to the product of the eigenvalues, without the familiar (and confusing) sign-flipping arithmetic.
Why do defective matrices have eigenvalues?
The eigensystem of an $N\times N$ square matrix $A$ can be read right-to-left:\begin{equation}A = Q^{-1} D\, Q\end{equation}The transformation $Q$ translates into $A$’s coordinate system, where the diagonal matrix $D$ scales each coordinate without mixing, before $Q^{-1}$ translates back to the original coordinate system. The $N$ eigenvalues lie along the diagonal of $D$, and their matching eigenvectors are the columns of $Q^{-1}$. The $i^\text{th}$ eigenvalue $\lambda_i$ and eigenvector $\vert v_i \rangle$ satisfy the equation \begin{equation}A\vert v_i \rangle=\lambda_i\vert v_i\rangle\label{eigenvector}\end{equation}Of course, many square matrices $A$ are defective, lacking a complete set of eigenvectors that satisfy Eq. \ref{eigenvector}. Curiously, these matrices always keep all $N$ eigenvalues (counted with multiplicity). How can an eigenvalue exist without a matching eigenvector?
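As a sanity check, here is a minimal numpy sketch (the matrix entries are made up for illustration). In numpy’s convention the eigenvectors come back as the columns of a matrix $V$, which plays the role of $Q^{-1}$ here:

```python
import numpy as np

# A small diagonalizable matrix; the entries are arbitrary.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# np.linalg.eig returns the eigenvalues and eigenvectors (as COLUMNS of V).
eigenvalues, V = np.linalg.eig(A)
D = np.diag(eigenvalues)

# In this article's convention A = Q^{-1} D Q, so Q^{-1} = V and Q = V^{-1}.
Q = np.linalg.inv(V)
assert np.allclose(A, V @ D @ Q)

# Each column of Q^{-1} is an eigenvector: A|v_i> = lambda_i |v_i>.
for lam, v in zip(eigenvalues, V.T):
    assert np.allclose(A @ v, lam * v)
```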
A repeated eigenvalue runs out of true eigenvectors when $A$ shears its victim off-axis:\begin{equation}A\vert u_i\rangle-\lambda_i\vert u_i\rangle\neq\vert0\rangle\end{equation}The generalized eigenvector $\vert u_i \rangle$ is scaled and sheared, breaking Eq. \ref{eigenvector}. However, the shear lands entirely off-axis: $(A-\lambda_i \mathbb{I})\vert u_i\rangle$ has no component along the generalized eigenvector $\vert u_i \rangle$ itself. Thus, eigenvalues satisfy a different constraint than eigenvectors:\begin{equation}\langle u_i \vert\big(A -\lambda_i\mathbb{I}\big)\vert u_i \rangle=0\label{eigenvalue}\end{equation}Since $(A - \lambda \mathbb{I})$ nullifies at least one direction at every eigenvalue, each eigenvalue is a root of the characteristic polynomial\begin{equation}\det(A - \lambda \mathbb{I}) = 0\end{equation}A determinant is zero when at least one axis is nullified, but doesn’t indicate total nullification! Thus, the characteristic polynomial has always been about solving the eigenvalue constraint (Eq. \ref{eigenvalue}) rather than the eigenvector constraint (Eq. \ref{eigenvector}), which is why a true eigenvector was never required.
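A toy example makes the constraint concrete. The $2\times 2$ Jordan block below is the textbook defective matrix: its generalized eigenvector is scaled and sheared, so the eigenvector equation (Eq. \ref{eigenvector}) fails while the eigenvalue constraint (Eq. \ref{eigenvalue}) holds:

```python
import numpy as np

# The classic defective matrix: a 2x2 Jordan block, eigenvalue 2 repeated twice.
A = np.array([[2.0, 1.0],
              [0.0, 2.0]])
lam = 2.0
u = np.array([0.0, 1.0])   # generalized eigenvector: scaled AND sheared

print(A @ u - lam * u)                # [1. 0.] -- nonzero, eigenvector equation fails
print(u @ (A - lam * np.eye(2)) @ u)  # 0.0 -- the shear is off-axis, constraint holds
```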
Why is the determinant the product of eigenvalues?
The determinant is the volume scaling factor of $A$, equal to the product of $A$’s eigenvalues\begin{equation}\det(A)=\prod_i \lambda_i\end{equation}When the determinant is zero, at least one eigenvalue is zero. A zero eigenvalue deletes the information along its axis, so that the incoming vector space collapses to a smaller dimension. When the characteristic polynomial solves $\det(A - \lambda \mathbb{I})=0$ for unknown $\lambda$, each root is an eigenvalue whose shifted matrix $(A - \lambda_i\mathbb{I})$ nullifies one axis (the axis of the matching generalized eigenvector), satisfying Eq. \ref{eigenvalue}.
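A quick numerical spot-check (the matrix is random, so nothing about the entries is special; its complex eigenvalues arrive in conjugate pairs whose product is real):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))     # a random, generally non-symmetric matrix

eigenvalues = np.linalg.eigvals(A)  # may be complex, in conjugate pairs
print(np.prod(eigenvalues).real)    # the product of the eigenvalues...
print(np.linalg.det(A))             # ...matches the determinant to round-off
```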
The generalized eigensystem has a property that the regular eigensystem lacks: orthogonality. While many eigensystems are naturally orthogonal (e.g., when $A=A^\dagger$), some eigensystems have non-orthogonal, even linearly dependent, eigenvectors. But the Schur decomposition guarantees that the generalized eigensystem can always use orthogonal eigenvectors\begin{equation}A = Q^\dagger U\,Q\end{equation}Here $U$ is an upper-triangular matrix and the generalized eigenvectors are the columns of the unitary $Q^\dagger$. The eigenvalues of $A$ ride the diagonal of upper-triangular $U$, and its off-diagonal elements encode the shears that break an orthogonal eigensystem.
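Here is a short sketch of the decomposition using `scipy.linalg.schur`, which returns $A = Z U Z^\dagger$; in that convention the article’s $Q$ corresponds to $Z^\dagger$:

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))

U, Z = schur(A, output='complex')              # A = Z U Z^dagger
assert np.allclose(A, Z @ U @ Z.conj().T)      # decomposition rebuilds A
assert np.allclose(Z.conj().T @ Z, np.eye(4))  # generalized eigenvectors are orthonormal
assert np.allclose(U, np.triu(U))              # U is upper-triangular

# The eigenvalues of A ride the diagonal of U (possibly in a different order).
print(np.sort_complex(np.diag(U)))
print(np.sort_complex(np.linalg.eigvals(A)))
```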
The Schur decomposition is proven to exist for every square matrix, so we can use it to explain why a matrix scales volume in proportion to the product of its eigenvalues. The key insight is that a shear transformation preserves volume. A parallelogram’s area depends only on its base and height, not the shear angle. Similarly, any 3D shape can be partitioned into parallel planes, and any shear between those planes doesn’t change the total volume (e.g., a stack of playing cards). The matrix $A$ stretches its generalized eigenvector $\vert u_i\rangle$ by $\lambda_i$ and also shears it by some $\vert \tau_i \rangle$ drawn from the span of the preceding generalized eigenvectors (encoded by the off-diagonal of $U$), orthogonal to $\vert u_i\rangle$. But the shear by $\vert \tau_i \rangle$ doesn’t affect the volume along $\vert u_i\rangle$; the volume is only changed by $\lambda_i$. The Schur decomposition provides an orthogonal basis that independently covers every axis in space; thus, all volume transformation is exhaustively described by the eigenvalues alone, and the shears preserve volume!
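The key insight fits in a few lines of numpy: a pure shear has determinant 1 no matter how hard it shears.

```python
import numpy as np

# A pure shear slides the top edge of the unit square sideways by s.
# The base and height never change, so the area never changes:
for s in [0.0, 0.5, 10.0, -3.0]:
    shear = np.array([[1.0, s],
                      [0.0, 1.0]])
    print(s, np.linalg.det(shear))   # determinant is 1.0 for every s
```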
Imagine the unit hyper-cube: a vector from zero to each standard coordinate. The vectors describing this cube are the columns of the identity matrix $\mathbb{I}$. Thus, $A\mathbb{I}=A$ is the result of $A$ acting on the unit hyper-cube; the unit hyper-cube started with unit volume and was stretched into the parallelepiped whose edges are the columns of $A$. Given the Schur decomposition of $A$, the eigenvalues along the diagonal of $U$ stretch each axis, scaling the overall volume, while the off-diagonal shears change nothing. Since the final volume is the same with or without the shears, the new volume must equal that of the un-sheared hyper-rectangle. Thus, if we define the determinant as the volume-scaling factor of a square matrix, then the determinant is the product of eigenvalues\begin{equation}\det(A)=\prod_i \lambda_i\end{equation}This derivation explicitly avoided the familiar, sign-flipping arithmetic of the Leibniz formula, which hides the meaning of the determinant behind complicated bookkeeping.
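The same argument can be watched numerically: delete the shears (the off-diagonal of $U$) and the volume scaling survives untouched. A sketch, again using `scipy.linalg.schur` with random entries:

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))

U, Z = schur(A, output='complex')
unsheared = np.diag(np.diag(U))        # keep the eigenvalue stretches, drop the shears

print(np.linalg.det(A).real)           # volume scaling of A...
print(np.linalg.det(U).real)           # ...unchanged by the unitary basis change...
print(np.linalg.det(unsheared).real)   # ...and unchanged by deleting the shears
```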
Leave determinant arithmetic to the machines!
Linear algebra and eigensystems are too important for students to get hung up on determinant arithmetic. The Leibniz formula is taught extensively in school and never used in real applications because it does not scale (requiring $\mathcal{O}(n!)$ operations). And because of Leibniz’s arcane arithmetic, one can earn a PhD in particle physics yet only truly understand the determinant after watching a YouTube video. Finally, practical algorithms for calculating the determinant factor the matrix into triangular form (e.g., an LU or Schur decomposition) and multiply along the diagonal, skipping the Leibniz formula entirely. The characteristic polynomial is not required to define nor find eigenvalues; it is merely a useful result of the Leibniz formula! Why not define eigenvalues via Eq. \ref{eigenvalue} and the Schur decomposition, and teach the determinant as “the product of the eigenvalues: the volume scaling factor”?
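To make the scaling complaint concrete, here is a toy Leibniz implementation, shown only so it can be retired; it is exactly the $\mathcal{O}(n!)$ sign-flipping arithmetic in question, checked against the product of the eigenvalues:

```python
import numpy as np
from itertools import permutations

def leibniz_det(A):
    """O(n!) textbook determinant: a signed term for every permutation."""
    n = A.shape[0]
    total = 0.0
    for perm in permutations(range(n)):
        # Sign = (-1)^inversions: the arithmetic this article wants retired.
        inversions = sum(perm[i] > perm[j]
                         for i in range(n) for j in range(i + 1, n))
        term = (-1) ** inversions
        for i in range(n):
            term *= A[i, perm[i]]
        total += term
    return total

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 6))            # 6! = 720 terms already
print(leibniz_det(A))                      # Leibniz, the slow way
print(np.prod(np.linalg.eigvals(A)).real)  # the product of the eigenvalues
print(np.linalg.det(A))                    # what the machines actually do
```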
Of course, the fact that the Leibniz formula calculates the product of eigenvalues is certainly no accident, and likely indicates that Hamilton and Grassmann were onto something with their exterior algebra. But that is a question for scholars to ponder; an engineer is happy with the simplest tool that works. Yet most engineers don’t understand the determinant, one of the most important concepts in all of linear algebra! It’s time for humanity to step up our determinant game.
This work is similar to “Down with Determinants!” by Sheldon Axler, perhaps the first to define the determinant as the product of eigenvalues. For more detail on eigensystems, determinants, and bra-ket notation for linear algebra, see my paper on linear algebra.