machine learning
thinking about machine learning and philosophy of science
Scientists increasingly rely on AI systems based on deep neural networks as computational tools for conducting research. These systems defy earlier theoretical predictions about the limits of machine learning. Empirically, deep learning models exhibit a remarkably small gap between training error and generalization error. This is surprising because deep neural networks have a massive number of parameters, and statistical learning theory predicts that such large networks should overfit to idiosyncrasies in their training data and thus generalize poorly to novel data. A major outstanding question is how to explain this unreasonable effectiveness of deep learning. In my dissertation, I discuss one possible contributing factor: invariance under transformations of the data. Such invariances constrain the space of possible solutions a network can learn in a way that reflects underlying regularities in its target system. This connects the success of deep learning to the fundamental notion of symmetry invoked in mathematical physics. I plan to continue investigating the extent to which we can explain the success of deep learning in terms of symmetries and scale. This project calls for connecting philosophical work on the epistemology of applied mathematics to empirical work on the mechanistic interpretability of deep neural networks.