This thesis presents a scalable RF data generation framework that aims to address the limited availability of data for testing RF systems. The framework combines simulation-based and real-world data generation methods to produce large, diverse datasets for training and testing RF machine learning (ML) models. It includes modules for pre-processing and metadata generation, as well as a data retrieval system that simplifies experimentation. The effectiveness of the framework is demonstrated through experiments that include signal detection and modulation classification. Overall, the thesis contributes a comprehensive framework that makes RF data easy to generate and can significantly reduce the development and deployment time of RF systems.