The constrained-optimisation view: minimising squared error plus a penalty is
equivalent to minimising squared error subject to the constraint
\(\|\beta\|_q \le t\), for a value of \(t\) that depends on the penalty strength.
The \(\ell_2\) ball is a smooth sphere: the point where the squared-error
contours (ellipses) first touch it almost never lies on a coordinate axis, so no
coefficient is exactly zero. The \(\ell_1\) ball is a polytope whose vertices sit
on the coordinate axes; the contact point is often a vertex (or a low-dimensional
face), which zeroes out every coordinate that vertex lacks.
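
The practical consequence is easy to see numerically. Below is a minimal sketch
(assuming NumPy and scikit-learn are available; the synthetic data and penalty
strengths are arbitrary choices for illustration) that fits ridge and Lasso on
the same design and counts coefficients that land at exactly zero.

```python
# Minimal sketch: ridge shrinks coefficients but rarely zeroes them,
# while lasso sets many to exactly zero.
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]              # only three truly active features
y = X @ beta_true + 0.5 * rng.normal(size=n)

ridge = Ridge(alpha=1.0, fit_intercept=False).fit(X, y)
lasso = Lasso(alpha=0.1, fit_intercept=False).fit(X, y)

print("ridge exact zeros:", np.sum(ridge.coef_ == 0.0))   # typically 0
print("lasso exact zeros:", np.sum(lasso.coef_ == 0.0))   # typically many
```
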
Formally, for the penalised objective
\(\tfrac12\|y - X\beta\|_2^2 + \alpha\|\beta\|_1\), the minimiser \(\hat\beta\)
satisfies the KKT conditions
\[
X^\top (X\hat\beta - y) + \alpha s = 0, \quad s_j \in \begin{cases}
\{\operatorname{sign}(\hat\beta_j)\} & \hat\beta_j \neq 0 \\
[-1, 1] & \hat\beta_j = 0
\end{cases}
\]
where \(s_j\) is a subgradient of \(|\cdot|\) at \(\hat\beta_j\) and can take any
value in \([-1, 1]\) when \(\hat\beta_j = 0\); this slack is exactly what lets the
Lasso park a coordinate at zero rather than push it through.
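
As a sanity check, here is a minimal numerical sketch of these conditions
(assuming scikit-learn; the data, \(\alpha\), and tolerances are illustrative).
One caveat: scikit-learn's Lasso scales the squared error by \(1/(2n)\), so the
gradient term in the check is divided by \(n\).

```python
# Minimal sketch: verify the Lasso KKT conditions on a fitted model.
# scikit-learn minimises (1/(2n))||y - Xb||^2 + alpha*||b||_1, hence the /n.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p = 200, 10
X = rng.normal(size=(n, p))
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=n)

alpha = 0.1
beta = Lasso(alpha=alpha, fit_intercept=False, tol=1e-10).fit(X, y).coef_

g = X.T @ (X @ beta - y) / n          # gradient of the smooth part
active = beta != 0

# On active coordinates the gradient should equal -alpha * sign(beta_j) ...
print("max violation on active set:",
      np.max(np.abs(g[active] + alpha * np.sign(beta[active]))))
# ... and on zeroed coordinates it only has to lie inside [-alpha, alpha].
print("max |g_j| on zero set:", np.max(np.abs(g[~active])), "<= alpha =", alpha)
```
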