Accurate estimation of driver vigilance is crucial for reducing fatigue-related incidents and traffic accidents. Despite advances in fatigue detection, effective utilization of multimodal information remains a major challenge. Moreover, prevailing methods focus predominantly on local features and overlook the importance of global features in this context. To address these problems, we propose the deep channel attention transformer (DCAT) model, which effectively utilizes multimodal information and extracts local-global features for fatigue detection regression tasks. We first introduce a novel multimodal approach that integrates electroencephalography (EEG) and electrooculography (EOG) data, capitalizing on their complementary strengths to enhance the understanding and assessment of fatigue states. The DCAT model then extracts local and global features from this multimodal information using channel attention and transformer encoder modules, respectively. Evaluations on the public SEED-VIG and SADT datasets show that the model outperforms state-of-the-art baselines.
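For illustration, the PyTorch sketch below shows one plausible arrangement of the components named above: an SE-style channel-attention module for local, per-channel feature weighting, a transformer encoder for global dependencies, and a regression head for the vigilance score. All layer sizes, channel counts, module names (`ChannelAttention`, `DCATSketch`), and the concatenation-based fusion are assumptions made for this sketch, not the paper's exact design.

```python
# Minimal sketch of the DCAT idea: channel attention (local features)
# followed by a transformer encoder (global features) and a regression head.
# Shapes and hyperparameters are illustrative assumptions only.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style gating over EEG/EOG feature channels."""

    def __init__(self, n_channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(n_channels, n_channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(n_channels // reduction, n_channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                       # x: (batch, channels, features)
        w = self.fc(x.mean(dim=-1))             # squeeze features -> (batch, channels)
        return x * w.unsqueeze(-1)              # re-weight each channel (local)


class DCATSketch(nn.Module):
    """Channel attention -> transformer encoder -> vigilance regression."""

    def __init__(self, n_channels: int = 25, feat_dim: int = 64, n_heads: int = 4):
        super().__init__()
        self.channel_attn = ChannelAttention(n_channels)
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(feat_dim, 1)      # scalar vigilance estimate

    def forward(self, eeg, eog):
        x = torch.cat([eeg, eog], dim=1)        # fuse modalities along channel axis
        x = self.channel_attn(x)                # local per-channel weighting
        x = self.encoder(x)                     # global dependencies across channels
        return self.head(x.mean(dim=1)).squeeze(-1)


# Example with assumed sizes: 17 EEG + 8 EOG channels, 64-dim features each.
model = DCATSketch(n_channels=25, feat_dim=64)
out = model(torch.randn(2, 17, 64), torch.randn(2, 8, 64))
print(out.shape)                                # torch.Size([2])
```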