Bloom filters are an important tool in many applications, including genomics, where they serve as a compact data structure for counting k-mers, representing de Bruijn graphs, and more. However, their performance is often limited by the large filter sizes that genomics requires and by their random-access pattern. Although accelerators such as FPGAs and GPUs can easily remove the computational overhead of the multiple hash functions, the random-access performance of off-chip memory remains a bottleneck, calling for costly high-performance memory. We propose BunchBloomer, which improves the cost-effectiveness of FPGA Bloom filter accelerators by making better use of cheaper, lower-power DDR memory. As part of this project, I design the architecture of a two-layer radix sorter that groups table updates into bursts directed to the same 8 KiB memory region, which can be efficiently cached in on-chip memory. The sorters sustain high-performance data ingestion by processing four 32-bit tuples at once while using a reasonable amount of chip area. The overall BunchBloomer device achieves much better power efficiency than a traditional multicore server, or even a conventional FPGA Bloom filter accelerator equipped with Hybrid Memory Cubes.
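
The grouping idea can be illustrated with a small software sketch. This is not the BunchBloomer hardware design. It only shows how a two-pass radix sort on the region number turns scattered bit updates into per-region bursts that can be applied to a cached 8 KiB block. The 8 KiB region size comes from the text. The 32-bit bit addresses, the 8-bit digit width, and all function names (`region_of`, `radix_pass`, `group_updates`, `apply_bursts`) are assumptions made for illustration.

```python
from itertools import groupby

REGION_BYTES = 8 * 1024                      # region size stated in the text
REGION_BITS = REGION_BYTES * 8               # 65536 filter bits per region
OFFSET_WIDTH = REGION_BITS.bit_length() - 1  # 16 low bits address a bit inside a region
DIGIT_WIDTH = 8                              # assumed radix digit width (not from the source)
DIGIT_MASK = (1 << DIGIT_WIDTH) - 1


def region_of(bit_index):
    """Region number that a filter bit address falls into."""
    return bit_index >> OFFSET_WIDTH


def radix_pass(items, shift):
    """One stable bucket pass keyed on a single digit of the region number."""
    buckets = [[] for _ in range(1 << DIGIT_WIDTH)]
    for x in items:
        buckets[(region_of(x) >> shift) & DIGIT_MASK].append(x)
    return [x for bucket in buckets for x in bucket]


def group_updates(bit_indices):
    """Two LSD radix passes order 32-bit addresses by region, then split into bursts."""
    ordered = radix_pass(bit_indices, 0)
    ordered = radix_pass(ordered, DIGIT_WIDTH)
    return [(region, list(burst)) for region, burst in groupby(ordered, key=region_of)]


def apply_bursts(filter_bytes, bursts):
    """Apply each burst to a local copy of its region, then write the region back."""
    for region, burst in bursts:
        base = region * REGION_BYTES
        cache = bytearray(filter_bytes[base:base + REGION_BYTES])  # one sequential read
        for bit in burst:
            offset = bit & (REGION_BITS - 1)
            cache[offset >> 3] |= 1 << (offset & 7)
        filter_bytes[base:base + REGION_BYTES] = cache             # one sequential write


if __name__ == "__main__":
    bloom = bytearray(4 * REGION_BYTES)
    bursts = group_updates([5, 70000, 9, 3 * REGION_BITS + 1, 65540])
    apply_bursts(bloom, bursts)
    print([(region, len(burst)) for region, burst in bursts])
```

With two 8-bit digits, the passes cover all 16 region bits of a 32-bit bit address, so each region is read and written once per batch no matter how many updates it receives.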