This thesis covers techniques for improving wildfire detection accuracy through the use of spatial and temporal context. Our wildfire dataset consists of high-resolution images containing smoke plumes that must be detected early, while they are smallest, to give firefighters enough time to react. We propose two novel architectures: the first combines a region proposal network on a lower-resolution image with a sequence network on the full-resolution image for efficient, accurate predictions; the second combines a ResNet feature extractor with a vision transformer for high-resolution tiled-image classification. We also propose novel loss functions that combine a precise per-grid-element loss with a coarser per-image loss, improving the precision of our detected fires without increasing the number of false positives. With these techniques, we achieve state-of-the-art results on this dataset, with an accuracy of 89% and an average time to detection of 12 seconds.
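The combined loss described above can be sketched as follows. This is a minimal NumPy illustration, not the thesis's actual formulation: the weighting `alpha`, the max-pooling reduction from grid to image level, and all function names are assumptions.

```python
import numpy as np

def bce(p, y, eps=1e-7):
    # Elementwise binary cross-entropy between probabilities p and labels y.
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def combined_loss(grid_probs, grid_labels, alpha=0.5):
    """Combine a precise per-grid-element loss with a coarser per-image loss.

    grid_probs:  (H, W) predicted smoke probabilities, one per grid element
    grid_labels: (H, W) binary ground truth, one per grid element
    alpha:       assumed hyperparameter weighting the two terms
    """
    # Fine-grained term: average BCE over every grid element.
    per_grid = bce(grid_probs, grid_labels).mean()
    # Coarse term: reduce the grid to a single image-level score (here via
    # max pooling, an assumption) and compare it against whether any grid
    # element contains smoke at all.
    image_prob = grid_probs.max()
    image_label = float(grid_labels.max())
    per_image = bce(image_prob, image_label)
    return alpha * per_grid + (1 - alpha) * per_image
```

The intuition is that the per-image term rewards getting the image-level "fire / no fire" decision right even when the exact grid location is uncertain, while the per-grid term sharpens localization.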