Journal of Intelligent Learning Systems and Applications

Volume 16, Issue 3 (August 2024)

ISSN Print: 2150-8402   ISSN Online: 2150-8410

Google-based Impact Factor: 1.5

State Space Models Based Efficient Long Documents Classification

PP. 143-154
DOI: 10.4236/jilsa.2024.163009

ABSTRACT

Large language models like the Generative Pretrained Transformer (GPT) have significantly advanced natural language processing (NLP) in recent years, excelling at tasks such as language translation, question answering, and text generation. However, their effectiveness is limited by the quadratic training complexity of Transformer models, O(L²) in sequence length L, which makes complex tasks like classifying long documents challenging. To overcome this limitation, researchers have explored architectures and techniques such as sparse attention mechanisms, hierarchical processing, and efficient attention modules. A recent innovation called Mamba, based on a state space model approach, offers fast inference and linear scalability in sequence length due to its unique selection mechanism. By incorporating this selection mechanism, Mamba enables context-dependent reasoning and targeted focus on particular inputs, thereby reducing computational costs and enhancing performance. Despite these advantages, the application of Mamba to long document classification has not been thoroughly investigated. This study aims to fill that gap by developing a Mamba-based model for long document classification and assessing its efficacy on four datasets: Hyperpartisan, 20 Newsgroups, EURLEX, and CMU Book Summary. Our study reveals that the Mamba model surpasses NLP models such as BERT and Longformer, showcasing exceptional performance and highlighting Mamba's efficiency in long document classification tasks. These results hold implications for NLP applications, empowering advanced language models to tackle challenging tasks with extended sequences more effectively. This study opens the door for further exploration of Mamba's capabilities and its potential use across diverse NLP domains.
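The contrast the abstract draws is between attention's O(L²) pairwise score computation and the single O(L) recurrent pass of a selective state space model, whose per-step dynamics depend on the current input. The sketch below (not the authors' code) illustrates that idea in NumPy; the state size, random projection weights, and softplus step-size parameterization are illustrative assumptions in the spirit of Mamba, not the published architecture.

```python
# Minimal sketch of a selective state space recurrence (Mamba-style),
# assuming a 1-D input and illustrative random projections.
import numpy as np

rng = np.random.default_rng(0)

def selective_ssm_scan(x, N=8):
    """Run a selective SSM over a length-L input sequence x.

    Unlike attention's O(L^2) pairwise scores, this is a single O(L)
    pass: each step updates a fixed-size hidden state whose dynamics
    depend on the current input -- the "selection mechanism" that lets
    the model focus on or forget particular inputs.
    """
    L = x.shape[0]
    # Illustrative input-dependent parameterization.
    W_delta = rng.standard_normal()
    W_B = rng.standard_normal(N)
    W_C = rng.standard_normal(N)
    A = -np.abs(rng.standard_normal(N))  # stable continuous-time dynamics

    h = np.zeros(N)
    y = np.empty(L)
    for t in range(L):
        delta = np.log1p(np.exp(W_delta * x[t]))  # softplus step size
        A_bar = np.exp(delta * A)                 # input-dependent decay
        B_bar = delta * W_B * x[t]                # input-dependent write
        h = A_bar * h + B_bar                     # O(N) state update
        y[t] = W_C @ h                            # readout
    return y

y = selective_ssm_scan(rng.standard_normal(4096))
print(y.shape)  # one pass over 4096 tokens, linear in sequence length
```

Because the hidden state has fixed size N, memory and compute grow linearly with document length, which is what makes this style of model attractive for the long-document classification task studied here.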

Share and Cite:

Song, B., Xu, Y., Liang, P. and Wu, Y. (2024) State Space Models Based Efficient Long Documents Classification. Journal of Intelligent Learning Systems and Applications, 16, 143-154. doi: 10.4236/jilsa.2024.163009.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

Creative Commons License

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.