Recent large language models (LLMs) have demonstrated strong reasoning capabilities, often enhanced through online reinforcement learning (RL), particularly within the left-to-right autoregressive (AR) generation paradigm. In contrast, diffusion-based LLMs (dLLMs), which generate text in a coarse-to-fine manner, have shown competitive language modeling performance, but their reasoning abilities remain less explored. To address this gap, we propose d1, a framework for adapting pre-trained masked dLLMs into effective reasoning agents through a combination of supervised finetuning (SFT) and RL. Specifically, we introduce two techniques tailored for reasoning: (a) a masked SFT procedure that distills reasoning patterns from existing datasets and encourages self-improvement, and (b) diffu-GRPO, a novel critic-free policy-gradient RL algorithm, which represents the first integration of policy-gradient methods with masked dLLMs. We conduct empirical evaluations across mathematical, planning, and coding benchmarks and find that d1 substantially improves reasoning performance over a strong dLLM baseline. Code is available at https://dllm-reasoning.github.io/.
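For context, critic-free policy-gradient methods in the GRPO family replace a learned value function (critic) with group-relative advantage estimates: for each prompt, a group of completions is sampled and scored by a reward function, and each completion's advantage is its reward standardized within that group. As a generic illustration only (not the specific diffu-GRPO objective, whose details are not given in this abstract; the group size $G$, rewards $r_i$, and advantages $\hat{A}_i$ are notation introduced here for illustration):
\[
\hat{A}_i \;=\; \frac{r_i - \operatorname{mean}(r_1, \dots, r_G)}{\operatorname{std}(r_1, \dots, r_G)}, \qquad i = 1, \dots, G,
\]
with the policy then updated via a clipped PPO-style surrogate that uses these group-relative advantages in place of value-based estimates.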