Statistical Learning Procedures for Monitoring Regulatory Compliance: An Application to Fisheries Data
As a special case of statistical learning, ensemble methods are well suited to the analysis of opportunistically collected data that involve many weak and sometimes specialized predictors, especially when subject-matter knowledge favors inductive approaches. In this paper, we analyze data on the incidental mortality of dolphins in the purse-seine fishery for tunas in the eastern Pacific Ocean. The goal is to identify those rare purse-seine sets for which incidental mortality would be expected but none was reported. We use random forests, an ensemble method, to classify sets according to whether mortality was (response = 1) or was not (response = 0) reported. To identify questionable reporting practices, we construct "residuals" as the difference between the categorical response (0, 1) and the proportion of trees in the forest that correctly classify a given set. We illustrate two ways of using these residuals to identify suspicious data. This approach shows promise for identifying suspect data gathered for environmental monitoring.
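The residual defined above can be computed directly from per-tree votes. The sketch below is a minimal illustration, not the authors' code: it assumes each tree's 0/1 prediction for every set is already available (for instance, collected from the individual trees of a fitted forest), and the function names, the toy votes, and the `flag_unreported` screening rule with its `threshold` are hypothetical. The screen is only one plausible way to use the residuals and is not claimed to be either of the two uses illustrated in the paper.

```python
from typing import List, Sequence


def proportion_correct(response: int, tree_votes: Sequence[int]) -> float:
    """Share of trees whose 0/1 vote matches the observed response for one set."""
    if not tree_votes:
        raise ValueError("need at least one tree vote")
    matches = sum(1 for vote in tree_votes if vote == response)
    return matches / len(tree_votes)


def residuals(responses: Sequence[int], votes: Sequence[Sequence[int]]) -> List[float]:
    """Residual = observed response (0 or 1) minus the proportion of trees
    that classify the set correctly, following the definition in the text."""
    if len(responses) != len(votes):
        raise ValueError("responses and votes must have the same length")
    return [y - proportion_correct(y, v) for y, v in zip(responses, votes)]


def flag_unreported(responses: Sequence[int], votes: Sequence[Sequence[int]],
                    threshold: float = 0.5) -> List[int]:
    """Hypothetical screen: indices of sets reported as 0 (no mortality) that
    fewer than `threshold` of the trees classify as 0, i.e. most trees expect mortality."""
    return [i for i, (y, v) in enumerate(zip(responses, votes))
            if y == 0 and proportion_correct(y, v) < threshold]


# Toy example: three sets, four trees each (made-up votes for illustration only).
responses = [1, 0, 0]
votes = [[1, 1, 1, 0],   # mortality reported; 3 of 4 trees agree
         [0, 0, 0, 0],   # no mortality reported; all trees agree
         [1, 1, 1, 0]]   # no mortality reported; 3 of 4 trees expect mortality
print(residuals(responses, votes))        # [0.25, -1.0, -0.25]
print(flag_unreported(responses, votes))  # [2]
```

Under this definition, a set reported as 0 that the forest classifies well has a residual near -1. A set reported as 0 that most trees would classify as 1 has a residual closer to 0, and it is these sets that the hypothetical screen singles out for review.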