A traditional extensible processor with customized circuits achieves high performance at the cost of flexibility. A dynamically extensible processor with a reconfigurable fabric offers flexibility for instruction-set extensions (ISEs) but suffers from computational inefficiency. We introduce a novel architecture, the Just-in-Time Customizable (JiTC) processor, that reconciles these conflicting demands of performance and flexibility in extensible processors. Our key innovation is a multi-stage accelerator, called the Specialized Functional Unit (SFU), that is tightly integrated into the processor pipeline. The SFU design is derived from a systematic study of a wide range of representative embedded applications. The SFU can be reconfigured on a per-cycle basis to support different application-specific instructions at near the ideal performance of an extensible processor. We also provide an automated compilation tool chain for the JiTC processor. The experimental results confirm the efficiency and applicability of our approach. © 2013 IEEE.