The goal of coded distributed batch matrix multiplication is to efficiently multiply L instances of \lambda x \kappa matrices, A=(A_1, ..., A_L)$, with L instances of \kappa x \mu matrices B=(B_1,..., B_L), by distributing the computation across S servers, such that the response from any R servers (R is called the recovery threshold) is sufficient to compute the L matrix products, AB=(A_1B_1, A_2B_2, ..., A_LB_L). Existing solutions either compute each $A_lB_l$ one at a time by partitioning individual matrices and coding across these partitions, or rely only on batch processing, i.e., coding across the batch of matrices without any matrix partitioning. The state-of-art for matrix-partitioning and batch processing approaches is represented by Entangled Polynomial Codes (EP codes), and Lagrange Coded Computing (LCC), respectively. In order to combine the benefits of the two approaches, we propose Generalized Cross-Subspace Alignment Codes (GCSA codes) that unify, generalize and improve upon the state of art. GCSA codes bridge the two extremes by efficiently combining both matrix-partitioning and batch processing, and offer flexibility in how much of each approach is used. Both EP codes and LCC codes can be recovered as special cases of GCSA codes. Remarkably, even without matrix partitioning, GCSA codes demonstrate an advantage over LCC codes in download-constrained settings. This is due to cross-subspace alignment, characterized by a Cauchy-Vandermonde code structure that aligns interference along Vandermonde terms, while the desired matrix products remain resolvable along Cauchy terms.