This dissertation explores the potential of machine learning in the swine industry, which faces challenges such as disease outbreaks, antimicrobial resistance, and the need for efficient decision-making in production processes. Traditional methods may not be scalable, accurate, or timely enough to meet these demands. Machine learning offers data-driven solutions that can support decision-making, improve disease prediction, optimize antimicrobial usage, and increase overall production efficiency. The dissertation consists of six chapters. Chapter 1 presents an introduction and an overview of the remaining chapters. Chapters 2, 3, and 4 present machine learning applications in the swine industry. Chapters 5 and 6 develop and analyze deep learning models, specifically for novelty detection and synthetic tabular data generation.
Chapters 2, 3, and 4 address real-world problems in the swine industry by applying machine learning methods to real-world data. Each chapter focuses on a specific issue: virus classification, antimicrobial resistance (AMR) prediction, and Minimal Inhibitory Concentration (MIC) prediction. Specifically, Chapter 2 aims to classify the Porcine Reproductive and Respiratory Syndrome Virus, the causative agent of a highly infectious disease of pigs, into four sublineages using amino acid scores from Open Reading Frame 5 gene sequences. Chapter 3 focuses on predicting the future AMR burden of the bacterial pathogen through time series analysis. Chapter 4 aims to predict MIC values for 12 antibiotics using Random Forest, with a k-mer counting approach used to process whole gene information of \textit{Streptococcus suis}.
Chapters 5 and 6 investigate how deep generative models can improve the quality of training data for machine learning applications. These models, such as Generative Adversarial Networks and Autoencoders, are employed for novelty detection and synthetic tabular data generation. Specifically, Chapter 5 proposes a new method based on Adversarial Autoencoders for detecting novelties in multi-modal normality cases. Chapter 6 provides a systematic and comprehensive assessment of how label noise influences synthetic tabular data generation by deep generative model-based synthesizers.