The ability to use inexpensive, noninvasive sensors to accurately classify flying insects would have significant implications for entomological research and would allow for the development of many useful applications in vector control for both medical and agricultural entomology. Given this, the last sixty years have seen many research efforts on this task. To date, however, none of this research has had a lasting impact. We feel that the lack of progress in this pursuit can be attributed to two related factors: the lack of effective sensors makes data collection difficult, resulting in limited, poor-quality data; and complicated classifiers built on such limited data tend to over-fit and generalize poorly when classifying previously unseen insects. In this dissertation, we strive to solve these problems. The contributions of this dissertation are as follows:
First, we use optical sensors to record the “sound” of insect flight from meters away, with invariance to interference from ambient sounds. With these sensors, we have recorded on the order of millions of labeled training instances, far more data than all previous efforts combined. The enormous amount of data we have collected allows us to avoid the over-fitting that has plagued previous research efforts. Based on this data, we propose a simple, accurate, and robust classification framework. We also explore several novel features that better distinguish different insect species and thereby improve classification accuracy. Experimental results on a large-scale insect dataset show that the proposed classification framework is both accurate and robust.
In addition, we consider the scenario where the amount of labeled training data is limited. An obvious solution to this problem is semi-supervised learning. However, although there is a plethora of semi-supervised learning algorithms in the literature, we found that off-the-shelf semi-supervised learning algorithms do not typically work well for time series. In the second part of this dissertation, we explain why semi-supervised learning algorithms typically fail for time series problems, and we introduce a simple but very effective fix.
Finally, we consider the scenario of insect monitoring, where the task is to discover new or invasive insect species from the data stream produced by the sensors during monitoring. We generalize this scenario and propose a never-ending learning framework that continuously attempts to extract new concepts from time series data streams. We demonstrate the utility of the framework by applying it to the detection of invasive insect species, and the results show that it is both accurate and robust.