As we sample the world via shifts in gaze, the visual system filters out irrelevant information to prioritize the most relevant visual input. However, there is ongoing debate about why one source of information is selected for attention over another. Much of the literature has suggested that the physical salience of image features is the dominant guidance factor in scene perception. Cognitive relevance theory, by contrast, proposes that scene meaning (our knowledge of the world) guides attention. Because meaning and image salience are correlated, and because they have previously been represented in different formats, it has not been possible to test whether either uniquely predicts attention when both are expressed in a common format.

To test their unique contributions to attention, Chapters 2 and 3 examined whether attention, operationalized as fixation density, was more strongly related to meaning maps, which capture the spatial distribution of semantic density in real-world scenes, or to saliency maps, which capture the spatial distribution of physically conspicuous features (a map-comparison analysis sketched in the first code example below). In Chapter 2, viewers counted bright patches in scenes or rated the overall brightness of scenes while their eye movements were recorded, making image salience task-relevant and meaning task-irrelevant. Despite its task-irrelevance, meaning uniquely predicted fixation density whereas image salience did not. A caveat of Chapter 2, however, is that the task required directing eye movements to scene-dependent information, leaving open whether the task was truly meaning-independent. To remedy this, Chapter 3 employed a free-viewing task that did not require participants to attend to either meaning or salience. Even during free viewing, meaning explained both the overall and the unique patterns of attention significantly better than image salience. Together, these findings suggest that the visual system prioritizes meaningful information for attentional selection, consistent with cognitive relevance theory.

Finally, prior work has combined spatial constraint (knowledge of where objects are located in scenes) with image salience to predict where fixations are directed during visual search. Given that meaning uniquely predicts attention beyond image salience, Chapter 4 tested whether combining spatial constraint with meaning also predicts eye movements during visual search. Meaning was represented as meaning maps, and spatial constraint as surface maps encoding the likely locations of target objects as continuous probabilities. Combining spatial constraint and meaning predicted eye movements better than either factor alone (see the second sketch below), suggesting that the visual system selects for fixation meaningful regions that appear on surfaces associated with search targets.

Collectively, these findings demonstrate that the human visual system prioritizes for attention scene regions whose meaningful content is grounded in our knowledge of the world. This has implications for cognitive relevance theory, which describes how humans orient attention in the real world, and may help inform technologies that reduce distraction.
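As a concrete illustration of the map-comparison analyses summarized above, the sketch below shows how one might quantify the overall and unique variance in a fixation density map explained by meaning and saliency maps, using squared linear and semipartial correlations. This is a minimal sketch rather than the dissertation's actual analysis code: the random arrays stand in for real maps, all names are hypothetical, and the maps are assumed to share a single spatial resolution.

```python
# Minimal sketch (hypothetical names; random stand-in data): comparing a
# fixation density map against meaning and saliency maps via squared
# linear and semipartial correlations.
import numpy as np

def flatten_norm(m):
    """Flatten a 2-D map and z-score it so all maps share a common scale."""
    v = m.ravel().astype(float)
    return (v - v.mean()) / v.std()

def r2(x, y):
    """Squared linear correlation: proportion of variance in y explained by x."""
    return np.corrcoef(x, y)[0, 1] ** 2

def unique_r2(x, control, y):
    """Variance in y uniquely explained by x: the squared semipartial
    correlation, computed by regressing `control` out of x first."""
    beta = np.dot(control, x) / np.dot(control, control)
    residual = x - beta * control  # the part of x independent of control
    return r2(residual, y)

# Random arrays standing in for one scene's maps (same resolution).
rng = np.random.default_rng(0)
meaning = rng.random((48, 64))    # hypothetical meaning map
salience = rng.random((48, 64))   # hypothetical saliency map
fixations = rng.random((48, 64))  # hypothetical fixation density map

m, s, f = (flatten_norm(a) for a in (meaning, salience, fixations))
print("overall meaning R^2: ", r2(m, f))
print("overall salience R^2:", r2(s, f))
print("unique meaning R^2:  ", unique_r2(m, s, f))
print("unique salience R^2: ", unique_r2(s, m, f))
```

On real data, the overall R^2 values indicate how well each map predicts attention on its own, while the unique R^2 values isolate what each map adds beyond the other; the pattern reported above is that meaning's unique contribution survives while salience's does not.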
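Continuing the sketch above, the Chapter 4 combination of spatial constraint and meaning can be illustrated as a pixelwise product of normalized maps. The multiplicative rule and the `norm01` helper are assumptions for illustration only; the dissertation may combine the maps differently.

```python
# Continues the previous sketch (reuses rng, meaning, f, flatten_norm, r2).
def norm01(m):
    """Rescale a map to the [0, 1] range before combining."""
    return (m - m.min()) / (m.max() - m.min())

# Hypothetical surface map: continuous probabilities that the search
# target appears on each surface region.
surface = rng.random((48, 64))
combined = norm01(surface) * norm01(meaning)  # assumed pixelwise combination

print("surface-only R^2:", r2(flatten_norm(surface), f))
print("combined R^2:    ", r2(flatten_norm(combined), f))
```

On real maps, a higher combined R^2 than either component alone would mirror the Chapter 4 result that spatial constraint and meaning jointly predict search fixations better than either factor does by itself.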