Several popular sequence-based and pretrained language models have proven successful for text-driven prediction of brain activations. However, these models still lack long-term cognitive plausibility and offer little insight into the underlying neural mechanisms. This paper studies the influence of context representations from different language models: sequence-based models (long short-term memory networks (LSTMs) and ELMo) and a popular pretrained Transformer language model (Longformer). In particular, we study how their internal hidden representations align with brain activity observed via fMRI while subjects listen to narrative stories, using these brain recordings to interpret word and sequence embeddings. We further investigate how the layer-wise representations of these language models capture semantic context during listening. Experiments across all language model representations provide the following cognitive insights: (i) LSTM cell-state representations align better with brain recordings than LSTM hidden states, suggesting that cell-state activity encodes more long-term information; (ii) the representations of ELMo and Longformer show good predictive performance across brain regions for listening stimuli; (iii) the Posterior Medial Cortex (PMC), Temporo-Parieto-Occipital junction (TPOJ), and Dorsal Frontal Lobe (DFL) show higher correlations than the Early Auditory Cortex (EAC) and Auditory Association Cortex (AAC).
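The alignment described above is commonly evaluated with an encoding model: a regularized linear map from language-model representations to per-voxel fMRI responses, scored by the correlation between predicted and observed activity on held-out data. The sketch below illustrates this setup with ridge regression on synthetic data; the specific regression, regularization, and data shapes are assumptions for illustration, not the paper's exact pipeline.

```python
# Hedged sketch of an encoding-model alignment: ridge regression from
# language-model representations to fMRI voxel responses, scored with
# per-voxel Pearson correlation. Synthetic data stands in for real
# embeddings and brain recordings (assumption for illustration).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_trs, n_dims, n_voxels = 300, 64, 10  # fMRI time points, embedding dims, voxels

X = rng.standard_normal((n_trs, n_dims))             # one LM representation per TR
W_true = rng.standard_normal((n_dims, n_voxels))     # hidden ground-truth mapping
Y = X @ W_true + 0.5 * rng.standard_normal((n_trs, n_voxels))  # simulated voxels

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=0)
model = Ridge(alpha=1.0).fit(X_tr, Y_tr)
Y_hat = model.predict(X_te)

def voxelwise_corr(a, b):
    """Pearson correlation between columns of a and b (per voxel)."""
    a = (a - a.mean(0)) / a.std(0)
    b = (b - b.mean(0)) / b.std(0)
    return (a * b).mean(0)

corrs = voxelwise_corr(Y_hat, Y_te)
print(f"mean voxelwise correlation: {corrs.mean():.3f}")
```

Region-level comparisons such as PMC versus EAC would then amount to averaging these voxelwise correlations within each region of interest.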