Synthesizing a realistic or abstract scene from sentences remains challenging because there is no single "model answer" when imagining a "picture" from text descriptions. Using abstract images or 3D scenes decouples the synthesis task from fundamental vision problems such as segmentation, object detection, and classification, and thus shifts attention to the semantics contained in images. Our goal is to explore the correlations between semantics in 3D scenes and linguistic information extracted from sentence parse graphs. We use a conditional random field (CRF) to model and sample scenes, computing the conditional probability of a generated scene given the parse graph extracted from a group of sentences. The information contained in a parse graph includes objects, attributes, and the relations between objects. Instead of synthesizing real-world scenes, we use symbols and indices to denote atomic 3D models selected from a dataset we collected ourselves. A generated scene is typically composed of several atomic objects and rendered in Unity.
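To make the CRF formulation concrete, the following is a minimal sketch, not our actual model: all potential functions, attribute names, and the candidate-scene representation are illustrative assumptions. It scores a scene with hypothetical unary potentials (object and attribute agreement) and pairwise potentials (relation satisfaction), and computes the conditional probability of a scene given a parse graph by normalizing over a finite candidate set.

```python
import math

# Hypothetical parse graph for "a red cup is on the table".
parse_graph = {
    "objects": [("cup", {"color": "red"}), ("table", {})],
    "relations": [("on", "cup", "table")],  # (relation, subject, object)
}

def unary_potential(obj, attrs, scene):
    """Reward scenes that instantiate the object with matching attributes."""
    placed = scene.get(obj)
    if placed is None:
        return -2.0  # object mentioned in the sentences but missing from scene
    score = 1.0
    for key, value in attrs.items():
        score += 1.0 if placed["attrs"].get(key) == value else -1.0
    return score

def pairwise_potential(rel, subj, obj, scene):
    """Reward scenes whose geometry satisfies the stated relation."""
    a, b = scene.get(subj), scene.get(obj)
    if a is None or b is None:
        return -2.0
    if rel == "on":  # toy geometric check: subject sits above object
        return 1.0 if a["pos"][1] > b["pos"][1] else -1.0
    return 0.0

def energy(scene, graph):
    """Total (unnormalized) compatibility of a scene with the parse graph."""
    e = sum(unary_potential(o, att, scene) for o, att in graph["objects"])
    e += sum(pairwise_potential(r, s, t, scene) for r, s, t in graph["relations"])
    return e

def conditional_prob(scene, candidates, graph):
    """P(scene | parse graph), normalized over a finite candidate set."""
    z = sum(math.exp(energy(c, graph)) for c in candidates)
    return math.exp(energy(scene, graph)) / z

# Two toy candidate scenes: indices into an atomic-model dataset would
# replace the inline attribute dicts in a real system.
good = {"cup": {"attrs": {"color": "red"}, "pos": (0.0, 1.0)},
        "table": {"attrs": {}, "pos": (0.0, 0.0)}}
bad = {"cup": {"attrs": {"color": "blue"}, "pos": (0.0, 0.0)},
       "table": {"attrs": {}, "pos": (0.0, 1.0)}}
candidates = [good, bad]
print(conditional_prob(good, candidates, parse_graph))
```

In the full model, sampling proposes scenes (object selections and placements) and the CRF scores them against the parse graph; here the candidate set is enumerated by hand only to keep the normalization explicit.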