Deep learning is an art
Deep learning is the art of shaping latent space. We shape it by choosing structures and connections; when we train a model, gradients flow through them and bring it to life.
The beauty lies in crafting structures, arranging them, and connecting them appropriately. We build in inductive structures: convolutions, attention mechanisms, residual connections, and so forth.
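As a minimal sketch of two of these inductive structures wired together, here is a toy single-head self-attention layer wrapped in a residual connection. All names, shapes, and initializations are mine for illustration, not any particular library's API:

```python
import numpy as np

def attention(x, Wq, Wk, Wv):
    """Single-head self-attention: each position mixes information
    from all positions, weighted by query-key similarity."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Numerically stable softmax over the last axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def residual_block(x, sublayer):
    """Residual connection: the sublayer learns a correction to x,
    which keeps gradients flowing through deep stacks."""
    return x + sublayer(x)

rng = np.random.default_rng(0)
seq_len, d = 4, 8                      # toy sequence of 4 tokens, width 8
x = rng.normal(size=(seq_len, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))

y = residual_block(x, lambda h: attention(h, Wq, Wk, Wv))
print(y.shape)  # (4, 8)
```

The residual wrapper is the part that matters for the "gradients bring it to life" point: the identity path guarantees a gradient route around the sublayer, so the attention weights only need to learn a correction.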
How do we know they work? I think it depends on the size of the model and the amount of data. On small scales, the inductive structures may not work as intended.
In LLMs, for example, attention components tend to store algorithms and reasoning patterns, while MLPs store facts.
It's not just the size and the components that matter; the connections matter too. Compare MoE models with dense ones: an MoE has fewer active connections for the same number of neurons, and it is not as good.
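The dense-vs-MoE contrast above can be sketched concretely. This is a hypothetical toy comparison (all names and sizes are mine): a dense layer pushes every token through the same weight matrix, while a top-1 MoE layer routes each token to a single expert, so each token exercises only a fraction of the total connections:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts = 8, 4
x = rng.normal(size=(16, d))           # 16 toy tokens

# Dense layer: one weight matrix, touched by every token.
W_dense = rng.normal(size=(d, d))
dense_out = np.maximum(x @ W_dense, 0.0)

# MoE layer: n_experts weight matrices, but each token passes
# through only the one expert its router score picks (top-1 routing).
experts = rng.normal(size=(n_experts, d, d))
router = rng.normal(size=(d, n_experts))
choice = (x @ router).argmax(axis=-1)   # expert index per token
moe_out = np.stack([np.maximum(x[i] @ experts[choice[i]], 0.0)
                    for i in range(len(x))])

# Per token, both paths activate d*d weights, but the MoE draws them
# from a pool n_experts times larger: more neurons overall, fewer
# connections exercised on any one forward pass.
print(dense_out.shape, moe_out.shape)
```

This is only the routing skeleton; real MoE layers add load-balancing losses and often top-2 routing, but the sketch shows where the "fewer connections per neuron" trade-off comes from.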
We design the compute graphs. I think the next generation of models will have dynamic compute graphs: we can start with rigid structures and let gradients learn efficient graphs. In a sense they do this already, but I am thinking of more sophisticated mechanisms, e.g., hyperconnections.
I am not very interested in mechanistic interpretability per se, but I believe it will provide better tools for looking into these structures and checking whether they work as designed. I am more interested in mechanistic probing.
These are my random thoughts.