I will introduce a class of regression problems, called multi-instance learning problems, in which the target variable depends on a summary statistic (typically a maximum or softmax) applied to a linear transformation of a collection of input vectors. Motivated by biological applications such as immune repertoire classification and sequence-to-phenotype mapping, I will then present results on a toy model in which key aspects of the learning dynamics can be understood analytically using techniques from the theory of disordered systems.
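
To make the setup concrete, here is a minimal sketch of how such a target could be computed for a single collection ("bag") of input vectors. It is an illustration under stated assumptions, not the model from the talk: the function names, the parameter `beta`, and the choice of log-sum-exp as the "softmax" summary are all hypothetical, and the linear transformation is taken to be a single weight vector `w`.

```python
import numpy as np

def max_pool_target(X, w):
    """Hard-max summary: y = max_i (w . x_i) over the rows x_i of X."""
    return float(np.max(X @ w))

def softmax_pool_target(X, w, beta=1.0):
    """Smooth-max summary: y = (1/beta) * log(sum_i exp(beta * w . x_i)).

    beta is a hypothetical inverse-temperature parameter (an assumption,
    not from the abstract); as beta grows this approaches the hard max.
    """
    z = beta * (X @ w)
    m = np.max(z)  # subtract the max before exponentiating, for stability
    return float((m + np.log(np.sum(np.exp(z - m)))) / beta)

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))  # one bag: 5 instances, each a vector in R^3
w = rng.standard_normal(3)       # linear readout shared by all instances
y_max = max_pool_target(X, w)
y_soft = softmax_pool_target(X, w, beta=10.0)
```

For a bag of N instances the smooth version always lies between the hard maximum and the hard maximum plus log(N)/beta, so `beta` controls how sharply the target depends on the single best-scoring instance.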