Soft multipliers have been designed and optimized to provide area, latency, and energy gains for
FPGA-based accelerators. While approximation techniques and FPGA architectural features have been leveraged to produce these gains, certain unorthodox precisions have been overlooked. This work presents the novel use of 3x3 multipliers that exploit the lookup-table architecture of commercial Xilinx and Altera FPGAs to compose higher-order soft multipliers. Generalized recursive methods for composing higher-order multipliers are presented, and a novel area-efficient, fast 8x8 truncated soft multiplier is designed for use with popular quantization methods and accelerators in domains, such as machine learning, that are robust to errors induced by quantization and truncation. This design
improves upon a comparable 8x8 truncated soft multiplier built with state-of-the-art soft multiplier approximation techniques, reducing both area and energy by 37.5% while matching its delay and producing an exact product.
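The recursive composition referred to above can be illustrated with a schoolbook divide-and-conquer sketch. This is a simplified software model, not the paper's exact design: `mul3x3` stands in for the LUT-based 3x3 primitive, and the split/combine logic shows how four smaller products assemble one larger one.

```python
def mul3x3(a, b):
    # Base case: stands in for a 3x3 LUT multiplier on the FPGA
    # (emulated here with the native multiply).
    assert 0 <= a < 8 and 0 <= b < 8
    return a * b

def soft_mul(a, b, n):
    """Compose an n-bit x n-bit product recursively from 3x3 multipliers,
    using the schoolbook split a = aH*2^k + aL, b = bH*2^k + bL."""
    if n <= 3:
        return mul3x3(a, b)
    k = (n + 1) // 2          # split point; low half gets k bits
    mask = (1 << k) - 1
    aH, aL = a >> k, a & mask
    bH, bL = b >> k, b & mask
    hi = soft_mul(aH, bH, n - k)             # high x high partial product
    lo = soft_mul(aL, bL, k)                 # low x low partial product
    mid1 = soft_mul(aH, bL, max(n - k, k))   # mixed partial products
    mid2 = soft_mul(aL, bH, max(n - k, k))
    # a*b = hi*2^(2k) + (mid1 + mid2)*2^k + lo
    return (hi << (2 * k)) + ((mid1 + mid2) << k) + lo
```

For an 8x8 operand pair this recursion bottoms out in 2- and 3-bit base multiplies, mirroring how higher-order soft multipliers are assembled from the 3x3 primitive; the truncated variant described in the paper would additionally drop or approximate low-order partial products.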