Reducing the Effect of Imbalance in Text Classification Using SVD and GloVe with Ensemble and Deep Learning
Keywords: Deep learning, ensemble learning, machine learning, text classification, imbalanced data, singular value decomposition, global vectors
With the recent growth in the amount of text data available and used online, text classification has become a staple for data analysts extracting relevant information. Yet machine learning algorithms are susceptible to bias when applied to any large-scale automated task, especially in text analysis. As newer branches of machine learning, such as ensemble and deep learning, gain popularity, we must analyze the potential pitfalls in the common experimental setups built around learning algorithms. Imbalance in text data is one such pitfall: when data is not equally distributed across the categories in a dataset, classifiers tend to favor the well-represented categories, which undermines the classification of underrepresented ones. In this research, we propose several techniques and approaches to tackle this obstacle. We prepared four datasets with varying degrees of imbalance for our experiments. We show that the feature extraction techniques singular value decomposition (SVD) and global vectors (GloVe) are key to reducing the effect of imbalance in text classification, especially with ensemble and deep learning. Building on these results, we also propose a modified ensemble classifier that can classify imbalanced and balanced data alike.
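To make the notion of "degree of imbalance" concrete, the following minimal sketch computes the per-category share of a labeled dataset and a simple imbalance ratio (largest category size divided by smallest). This ratio is one common way to quantify imbalance and is an assumption here, not necessarily the measure used to construct the four datasets; the function names and the toy labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def class_distribution(labels):
    """Return each category's share of the dataset as a fraction of all samples."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def imbalance_ratio(labels):
    """Ratio of the largest category's size to the smallest category's size.

    1.0 means perfectly balanced; larger values mean stronger imbalance.
    (Assumed measure for illustration; not taken from the paper.)
    """
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    return max(counts.values()) / min(counts.values())


# Hypothetical toy labels, not drawn from the datasets used in this work.
balanced = ["sports"] * 50 + ["politics"] * 50
skewed = ["sports"] * 90 + ["politics"] * 10

print(imbalance_ratio(balanced))   # 1.0
print(imbalance_ratio(skewed))     # 9.0
print(class_distribution(skewed))  # {'sports': 0.9, 'politics': 0.1}
```

A ratio near 1.0 indicates a balanced dataset, while the skewed example shows how a single dominant category can account for most of the samples a classifier sees during training.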