Probably Approximately Correct Learnable Fuzzy System
- Author(s): Wang, Yan
- Advisor(s): Liu, Honghu
- et al.
This dissertation develops a probably approximately correct (PAC) learnable fuzzy system that predicts clinical outcomes from a small number of survey questions (a short form). The system has five layers: input, fuzzification, inference, defuzzification, and production. The main contribution of this dissertation is a PAC learnable, knowledge-driven machine learning algorithm derived by enlarging the sample with bootstrap samples perturbed by Gaussian distributed noise. The input layer is the procedure for preparing the data input. In the fuzzification layer, the sample size is significantly increased using bootstrap resampling with replacement. The fuzzy set, with the proposed membership function, is generated by adding Gaussian distributed noise to the survey responses of the bootstrap samples to reflect uncertainty. This is a natural-language extension of the survey response, from a single point option to a region of input with probabilities over the survey design space. The inference layer includes both classification and prediction. Here we use machine learning techniques to derive the algorithms in this layer, e.g., the naive Bayes method and eXtreme Gradient Boosting (XGBoost). The predicted values then require a defuzzification process in the next layer to remove noise from the prediction. There are four types of input after fuzzification: original input, fuzzy input, input requiring interpolation, and input requiring extrapolation. The defuzzification process is based on weighted means of related information. The last layer of the system is the output (production) layer, which provides the algorithms, the final prediction, and internal and external validation. Lastly, we apply this fuzzy system to derive PAC learnable algorithms that predict oral health clinical outcomes. The input predictors include short forms and demographic information. The short forms, developed from Graded Response Models in Item Response Theory, come in two versions (one for children and one for their parents).
The clinical outcomes are referral for treatment needs (categorical) and children’s oral health status index score (continuous). Prediction is evaluated internally and externally by sensitivity and specificity for the binary outcome, and by the correlation between original and predicted values and the root mean square error (RMSE) for the continuous outcome. Both internal and external validation show that prediction improves as new information is added, and both support the generalizability and stability of the algorithm. The best prediction (high sensitivity and relatively high specificity for the categorical outcome, low RMSE and high correlation for the continuous outcome) is reached when the child's self-reported short form, the parent's proxy-reported short form, and demographic characteristics are used together.
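The fuzzification step described in the abstract (bootstrap resampling with replacement, followed by Gaussian noise on the survey responses) can be illustrated with a minimal Python sketch. This is not the dissertation's implementation: the function name `fuzzify`, the parameters `n_boot` and `noise_sd`, the toy response matrix, and the clipping of noisy values to the response scale are all assumptions made for illustration, and the proposed membership function is not reproduced here.

```python
import numpy as np

def fuzzify(responses, n_boot, noise_sd, low, high, seed=0):
    """Grow a sample by bootstrap resampling, then blur each response with Gaussian noise.

    responses : (n_subjects, n_items) array of point-option survey answers
    n_boot    : number of bootstrap rows to draw (hypothetical parameter)
    noise_sd  : standard deviation of the Gaussian noise (hypothetical parameter)
    low, high : bounds of the response scale; clipping to them is an assumption
    """
    rng = np.random.default_rng(seed)
    responses = np.asarray(responses, dtype=float)
    # Draw row indices with replacement (the bootstrap step).
    idx = rng.integers(0, responses.shape[0], size=n_boot)
    boot = responses[idx]
    # Turn each point response into a noisy value around it (the fuzzy input).
    noisy = boot + rng.normal(0.0, noise_sd, size=boot.shape)
    # Keep values inside the survey's response range (an illustrative choice).
    return idx, np.clip(noisy, low, high)

# Toy data: 3 respondents, 3 items on a 1-5 scale (made up for this sketch).
survey = np.array([[1, 3, 2],
                   [4, 4, 5],
                   [2, 1, 1]])
idx, fuzzy = fuzzify(survey, n_boot=1000, noise_sd=0.25, low=1, high=5)
```

Returning `idx` alongside the noisy responses lets each bootstrap row be matched back to the outcome of the respondent it was drawn from, so the enlarged sample can be passed to a classifier or regressor in the inference layer.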
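The evaluation measures named in the abstract (sensitivity and specificity for the binary referral outcome, RMSE and correlation for the continuous index score) follow their standard definitions and can be sketched as below. The function names and the toy values are hypothetical and are not results from the dissertation. The sketch assumes that both outcome classes are present, so neither denominator is zero.

```python
import numpy as np

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    tn = np.sum(~y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)
    return float(tp / (tp + fn)), float(tn / (tn + fp))

def rmse(y_true, y_pred):
    """Root mean square error between original and predicted values."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def correlation(y_true, y_pred):
    """Pearson correlation between original and predicted values."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

# Toy values, made up for this sketch (1 = referred for treatment).
referral_true = [1, 1, 1, 0, 0, 0, 0, 1]
referral_pred = [1, 1, 0, 0, 0, 1, 0, 1]
sens, spec = sensitivity_specificity(referral_true, referral_pred)

score_true = [60, 70, 80, 90]
score_pred = [62, 68, 82, 88]
err = rmse(score_true, score_pred)
r = correlation(score_true, score_pred)
```

Applying the same functions to a held-out internal split and to an independent external sample gives the internal and external validation figures in a comparable form.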