Modern multiprocessor system-on-chip designs have high bandwidth constraints which must be satisfied by the underlying communication architecture. Traditional hierarchical shared bus communication architectures can only support limited bandwidths and are not scalable for very high-performance designs. Bus matrix-based communication architectures consist of several parallel busses which provide a suitable backbone to support high-bandwidth systems but suffer from high-cost overhead due to extensive bus wiring inside the matrix. Manual traversal of the vast exploration space to synthesize a minimal cost bus matrix that also satisfies performance constraints is practically infeasible. In this paper, we address this problem by proposing an automated approach for synthesizing a bus matrix communication architecture, which satisfies all performance constraints in the design and minimizes wire congestion in the matrix. To validate our approach, we consider several industrial strength applications from the networking domain and show that our approach results in up to 9 x component savings when compared to a full bus matrix, and up to 3.2 x savings when compared to a maximally connected reduced bus matrix, while satisfying all performance constraints in the design.