Features in extractive supervised single-document summarization: case of Persian news

Rezaei, Hosein; Mirhosseini, Seyed Amid Moeinzadeh; Shahgholian, Azar; Saraee, Mohamad

doi:10.1007/s10579-024-09739-7

Authors

Hosein Rezaei

Seyed Amid Moeinzadeh Mirhosseini

Azar Shahgholian

Prof Mo Saraee M.Saraee@salford.ac.uk
Interim Director of Computer Science

Abstract

Text summarization is one of the most challenging research areas in NLP. Much effort has gone into addressing it with either abstractive or extractive methods. Extractive methods are often preferred because they are simpler than the more elaborate abstractive methods. In extractive supervised single-document approaches, the system does not generate sentences. Instead, via supervised learning, it learns to score the sentences of a document based on textual features and then selects those with the highest rank. The core objective is therefore ranking, which depends heavily on the document's structure and context. Many state-of-the-art solutions overlook these dependencies. In this work, document-related features such as topic and relative length are integrated into the vector of every sentence to enhance the quality of summaries. Our experimental results show that the system takes contextual and structural patterns into account, which increases the precision of the learned model. Consequently, our method produces more comprehensive and concise summaries.
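The score-then-select pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the specific features, the reading of "relative length" as sentence length divided by the document's average sentence length, and the integer topic code are all assumptions made for illustration. The learned regression model is represented by a hypothetical `score` callable supplied by the caller.

```python
from typing import Callable, List, Sequence


def sentence_vector(sentence: str, index: int,
                    sentences: Sequence[str], topic_id: int) -> List[float]:
    """Build one feature vector mixing sentence-level and document-level features.

    The feature set is illustrative only; the paper's actual features may differ.
    """
    words = sentence.split()
    doc_len = len(sentences)
    # Average sentence length in words; fall back to 1.0 if every sentence is empty.
    avg_len = (sum(len(s.split()) for s in sentences) / doc_len) or 1.0
    return [
        float(len(words)),            # sentence length in words
        index / max(doc_len - 1, 1),  # relative position within the document
        len(words) / avg_len,         # length relative to the document average (assumed)
        float(doc_len),               # document length in sentences
        float(topic_id),              # document topic as a hypothetical integer code
    ]


def summarize(sentences: Sequence[str], topic_id: int,
              score: Callable[[List[float]], float], k: int) -> List[str]:
    """Score every sentence, keep the k highest-ranked, return them in document order."""
    vectors = [sentence_vector(s, i, sentences, topic_id)
               for i, s in enumerate(sentences)]
    scores = [score(v) for v in vectors]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]
```

In practice `score` would be the prediction function of a trained regressor, and a categorical topic would more commonly be one-hot encoded than passed as a single number. The point of the sketch is only that document-level values are repeated in every sentence's vector, so the model can condition its ranking on the document a sentence belongs to.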

Citation

Rezaei, H., Mirhosseini, S. A. M., Shahgholian, A., & Saraee, M. (2024). Features in extractive supervised single-document summarization: case of Persian news. Language Resources and Evaluation, 58(4), 1073-1091. https://doi.org/10.1007/s10579-024-09739-7

Journal Article Type	Article
Acceptance Date	Apr 5, 2024
Online Publication Date	May 8, 2024
Publication Date	Dec 1, 2024
Deposit Date	May 22, 2024
Publicly Available Date	May 28, 2024
Journal	Language Resources and Evaluation
Print ISSN	1574-020X
Electronic ISSN	1574-0218
Publisher	Springer Verlag
Peer Reviewed	Peer Reviewed
Volume	58
Issue	4
Pages	1073-1091
DOI	https://doi.org/10.1007/s10579-024-09739-7
Keywords	Natural language processing, Feature extraction, Machine learning, Supervised extractive summarization, Regression
Publisher URL	https://link.springer.com/article/10.1007/s10579-024-09739-7