Via novel path-routing techniques we prove a lower bound on the I/O-complexity of all recursive matrix multiplication algorithms computed in serial or in parallel and show that it is tight for all square and near-square matrix multiplication algorithms. Previously, tight lower bounds were known only for the classical $\Theta\left(n^3\right)$ matrix multiplication algorithm and those similar to Strassen's algorithm that lack multiple vertex copying. We first prove tight lower bounds on the I/O-complexity of Strassen-like algorithms, under weaker assumptions, by constructing a routing of paths between the inputs and outputs of sufficiently small subcomputations in the algorithm's CDAG. We then further extend this result to all recursive divide-and-conquer matrix multiplication algorithms, and show that our lower bound is optimal for algorithms formed from square and nearly square recursive steps. This requires combining our new path-routing approach with a secondary routing based on the Loomis-Whitney Inequality technique used to prove the optimal I/O-complexity lower bound for classical matrix multiplication.