Let $F\subset L_2$ be a class of complex-valued functions on a set $D$, such that, for all $x\in D$, point evaluation $f\mapsto f(x)$ is a continuous linear functional.
We study the $L_2$-approximation of functions from $F$ and want to compare the power of function values with the power of arbitrary linear information.
To be precise, the sampling width $e_n(F)$ is the minimal worst-case error that can be achieved with $n$ function values,
whereas the linear width $a_n(F)$ is the minimal worst-case error that can be achieved with $n$ pieces of arbitrary linear information (like derivative values or Fourier coefficients), using linear algorithms.
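In standard notation, the two quantities just described can be written as follows (a sketch of the usual definitions; the precise admissible classes of algorithms are as stated above):
$$
e_n(F) \;=\; \inf_{x_1,\dots,x_n\in D}\;\inf_{\varphi\colon\mathbb{C}^n\to L_2}\;\sup_{f\in F}\;\bigl\|f-\varphi\bigl(f(x_1),\dots,f(x_n)\bigr)\bigr\|_{L_2},
$$
$$
a_n(F) \;=\; \inf_{L_1,\dots,L_n}\;\inf_{g_1,\dots,g_n\in L_2}\;\sup_{f\in F}\;\Bigl\|f-\sum_{j=1}^n L_j(f)\,g_j\Bigr\|_{L_2},
$$
where the $L_j$ range over continuous linear functionals on $F$. Since function values are particular linear functionals, one always has $a_n(F)\le e_n(F)$; the question is how much larger $e_n(F)$ can be.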
We show, under a certain assumption on $F$, that these quantities have the same order of convergence, up to logarithmic factors. We also demonstrate that no general comparison is possible if this assumption is not fulfilled, which shows that our results are nearly optimal.
Despite its generality, this result is sharp enough to improve upon several existing bounds in specific settings. Moreover, our proof reveals that i.i.d. random sampling points, together with a suitable (weighted) least squares method, are with high probability as good as all known, sophisticated constructions of approximation algorithms.
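The sampling-and-least-squares recipe can be illustrated in a toy setting. The sketch below (a minimal illustration, not the paper's algorithm: the target function, the monomial basis, the uniform sampling density, and all parameter values are choices made here for the example, and the weights are constant, i.e. plain least squares) draws i.i.d. uniform points, fits an $m$-dimensional polynomial space by least squares from $n = 4m$ samples, and estimates the resulting $L_2$ error by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy target on [0, 1]; smooth, so a low-degree polynomial space suffices.
f = lambda x: np.exp(x)

m = 8        # dimension of the approximation space (degree < m monomials)
n = 4 * m    # number of i.i.d. samples: linear oversampling of the space

# i.i.d. uniform sampling points -- no sophisticated point construction.
x = rng.uniform(0.0, 1.0, size=n)

# Design matrix A[i, j] = x_i^j and (unweighted) least squares fit.
A = np.vander(x, m, increasing=True)
coef, *_ = np.linalg.lstsq(A, f(x), rcond=None)

# Monte Carlo estimate of the L2([0,1]) error of the fitted approximant.
t = rng.uniform(0.0, 1.0, size=20000)
approx = np.vander(t, m, increasing=True) @ coef
err = np.sqrt(np.mean((f(t) - approx) ** 2))
print(f"estimated L2 error: {err:.2e}")
```

For this smooth target the random-point least squares fit already attains an error of the same order as the best approximation from the polynomial space, mirroring (in miniature) the abstract statement that random samples lose at most logarithmic factors against optimal linear information.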