In multimodal sentiment analysis, a model learns to predict one’s opinion towards a certain entity after observing data from multiple sources. Just as humans merge information from all five senses to read someone, the model needs a proper fusion mechanism to utilize the data from multiple sources. Tensor fusion networks have been proposed and have been shown to perform well for multimodal sentiment analysis. However, they suffer from the curse of dimensionality because they use the full tensor format. As a solution, low-rank fusion networks have been proposed, but they require the rank to be determined in advance, which is an ill-posed problem. Instead, we propose adaptive-rank fusion networks that learn the rank in a data-driven way. We compare the three models in terms of accuracy, complexity, and runtime. We then test how adaptive-rank fusion networks respond to being trained on a subset of the original training dataset.
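To illustrate the dimensionality contrast between the full tensor format and a low-rank factorization, the following minimal sketch counts the fusion-layer weights under each scheme. It assumes the common formulation in which a constant 1 is appended to each modality embedding before fusion; the embedding sizes, output size, rank, and function names are hypothetical values chosen for illustration and are not taken from the experiments.

```python
from math import prod


def full_tensor_params(dims, out_dim):
    """Weights of a fusion layer acting on the full outer-product tensor.

    Each modality embedding of size d is extended with a constant 1,
    so the fused tensor has prod(d + 1) entries per output unit.
    """
    return prod(d + 1 for d in dims) * out_dim


def low_rank_params(dims, out_dim, rank):
    """Weights of a rank-`rank` factorized fusion layer.

    Each of the `rank` components holds one factor per modality,
    so the count grows with sum(d + 1) instead of prod(d + 1).
    """
    return rank * sum(d + 1 for d in dims) * out_dim


# Hypothetical embedding sizes for three modalities (illustrative only).
dims = (32, 32, 64)
out_dim = 16  # hypothetical size of the fused representation

full = full_tensor_params(dims, out_dim)
low = low_rank_params(dims, out_dim, rank=4)  # the rank must be fixed in advance

print(f"full tensor fusion weights: {full}")
print(f"rank-4 fusion weights:      {low}")
```

The full-tensor count grows multiplicatively with the number and size of the modalities, whereas the factorized count grows additively and scales linearly with the chosen rank, which is why the choice of rank matters.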