Recent advances in spatial transcriptomics (ST) technologies offer unprecedented opportunities to unveil the spatial heterogeneity of gene expression and cell states within tissues. Despite these capabilities, accurately dissecting spatiotemporal structures (e.g., spatial domains, temporal trajectories, and functional interactions) from ST data remains challenging. Here, we introduce PearlST (partial differential equation [PDE]-enhanced adversarial graph autoencoder of ST), a computational framework for accurately inferring spatiotemporal structures from ST data. PearlST employs contrastive learning to extract histological image features, integrates a PDE-based diffusion model to better characterize spatial features at domain boundaries, and learns latent low-dimensional embeddings via Wasserstein adversarially regularized graph autoencoders. Comparative analyses across multiple ST datasets of varying resolutions demonstrate that PearlST outperforms existing methods in spatial clustering, trajectory inference, and pseudotime analysis. Furthermore, PearlST elucidates the functional regulation underlying the latent features by linking intercellular ligand-receptor interactions to the genes that contribute most to the low-dimensional embeddings, as illustrated with a human breast cancer dataset. Overall, PearlST is a powerful tool for extracting interpretable latent features and dissecting intricate spatiotemporal structures in ST data across diverse biological contexts.
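The abstract does not state which diffusion equation PearlST uses, so the following is only a minimal sketch of the general idea of PDE-based feature diffusion that sharpens domain boundaries. It assumes a Perona-Malik-style anisotropic diffusion discretized on a k-nearest-neighbour graph of spot coordinates and integrated with explicit Euler steps. The edge conductance `g = exp(-(|x_j - x_i| / kappa)^2)` is close to 1 between similar spots and close to 0 across sharp feature jumps, so noise is smoothed within a domain while the contrast between domains is kept. The functions `knn_graph` and `anisotropic_diffusion` and the parameters `k`, `dt`, `kappa`, and `n_steps` are hypothetical illustrations, not PearlST's API.

```python
import numpy as np


def knn_graph(coords, k=4):
    """Symmetric k-nearest-neighbour adjacency (bool matrix) from spot coordinates."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a spot is never its own neighbour
    n = len(coords)
    nbrs = np.argsort(d, axis=1)[:, :k]  # indices of the k closest spots per row
    adj = np.zeros((n, n), dtype=bool)
    adj[np.repeat(np.arange(n), k), nbrs.ravel()] = True
    return adj | adj.T  # symmetrize so diffusion is mass-conserving


def anisotropic_diffusion(x, adj, n_steps=10, dt=0.1, kappa=1.0):
    """Explicit-Euler Perona-Malik-style diffusion of features x (n_spots, n_feat).

    Hypothetical sketch: each step applies x_i += dt * sum_j g_ij (x_j - x_i),
    with g_ij = exp(-(||x_j - x_i|| / kappa)^2) on graph edges and 0 elsewhere.
    Keep dt * (2 * max degree) below 2 for numerical stability.
    """
    x = np.asarray(x, dtype=float).copy()
    w = adj.astype(float)
    for _ in range(n_steps):
        diff = x[None, :, :] - x[:, None, :]  # diff[i, j] = x_j - x_i
        grad = np.linalg.norm(diff, axis=-1)  # feature "gradient" along each edge
        g = np.exp(-(grad / kappa) ** 2) * w  # low conductance across boundaries
        x = x + dt * np.einsum("ij,ijf->if", g, diff)
    return x
```

As a usage example, a 1-D strip of 20 spots whose first 10 have values near 0 and last 10 values near 5 keeps its step between the two halves, while the small within-half fluctuations are damped. Replacing the exponential conductance with a constant would give isotropic (heat-equation) smoothing, which would instead blur the boundary.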