eScholarship
Open Access Publications from the University of California

Classification of Imbalanced Data Using Synthetic Over-Sampling Techniques

  • Author(s): Huang, Peng Jun
  • Advisor(s): Wu, Ying Nian
Abstract

A dataset is considered imbalanced if the classification objects are not approximately equally represented. Classification problems on imbalanced datasets have attracted growing attention in recent years. They pose a relatively new challenge in both industrial and academic fields because many machine learning techniques perform poorly on such data. Often the distribution of the training data may differ from that of the testing data. Typically, sampling methods are used in imbalanced learning applications; they modify the distribution of the training samples by some mechanism in order to obtain a relatively balanced classifier. A novel synthetic sampling technique, SMOTE (Synthetic Minority Over-sampling Technique), has shown a great deal of success in many applications. Soon after this powerful method was introduced, other SMOTE-based sampling methods such as SMOTEBoost, Borderline-SMOTE, and ADASYN (Adaptive Synthetic Sampling) were developed. This paper reviews and compares some of these synthetic sampling methods for learning from imbalanced datasets.
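To make the over-sampling idea concrete, the following is a minimal sketch of SMOTE's core mechanism as commonly described (the function name `smote` and all parameters are illustrative, not from this paper): each synthetic minority sample is created by interpolating between a minority sample and one of its k nearest minority-class neighbors.

```python
import numpy as np

def smote(X_min, n_synthetic, k=5, seed=0):
    """Generate synthetic minority-class samples by linear interpolation
    between each chosen sample and one of its k nearest minority neighbors.

    X_min       : (n, d) array of minority-class feature vectors
    n_synthetic : number of synthetic samples to generate
    """
    rng = np.random.default_rng(seed)
    n = len(X_min)
    # pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)          # exclude each point from its own neighbors
    neighbors = np.argsort(d, axis=1)[:, :k]
    synthetic = np.empty((n_synthetic, X_min.shape[1]))
    for i in range(n_synthetic):
        j = rng.integers(n)                              # base minority sample
        nb = neighbors[j, rng.integers(min(k, n - 1))]   # one of its nearest neighbors
        lam = rng.random()                               # interpolation factor in [0, 1)
        synthetic[i] = X_min[j] + lam * (X_min[nb] - X_min[j])
    return synthetic
```

Because each synthetic point lies on the segment between two existing minority samples, the generated data stays inside the convex hull of the minority class; variants such as Borderline-SMOTE and ADASYN differ mainly in how they choose which base samples to interpolate from.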
