YD Woldemariam
A cloud-hosted MapReduce architecture for syntactic parsing
Woldemariam, YD; Pletschacher, S; Clausner, C; Bass, JM
Authors
Mr Stefan Pletschacher S.Pletschacher@salford.ac.uk
Lecturer
Mr Christian Clausner C.Clausner@salford.ac.uk
Senior Research Fellow
Prof Julian Bass J.Bass@salford.ac.uk
Professor of Software Engineering
Abstract
Syntactic parsing is a time-consuming task in natural language processing, particularly where a large number of text files are being processed. Parsing algorithms are conventionally designed to operate on a single machine in a sequential fashion and, as a consequence, fail to benefit from high performance and parallel computing resources available on the cloud. We designed and implemented a scalable cloud-based architecture supporting parallel and distributed syntactic parsing for large datasets. The main architecture consists of a syntactic parser (constituency and dependency parsing) and a MapReduce framework running on clusters of machines. The resulting cloud-based MapReduce parsing is able to build a map where syntactic trees of the same input file have the same key and collect them into a single file containing sentences along with their corresponding trees. Our experimental evaluation shows that the architecture scales well with regard to the number of processing nodes and the number of cores per node. In the fastest tested cloud-based setup, the proposed design performs 7 times faster when compared to a local setup. In summary, this study takes an important step toward providing and evaluating a cloud-hosted solution for efficient syntactic parsing of natural language data sets consisting of a large number of files.
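The keying scheme described in the abstract (trees from the same input file share a key and are collected into one output per file) can be illustrated with a minimal, single-machine sketch. This is not the authors' implementation: `toy_parse` is a hypothetical stand-in for a real constituency or dependency parser, the naive split on full stops is an assumption, and a thread pool takes the place of a MapReduce cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def toy_parse(sentence):
    """Hypothetical stand-in for a real syntactic parser: flat bracketed tree."""
    tokens = " ".join(f"(X {tok})" for tok in sentence.split())
    return f"(S {tokens})"


def map_file(item):
    """Map step: emit (file name, (sentence index, sentence, tree)) pairs."""
    name, text = item
    # Assumption: sentences are separated by full stops.
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return [(name, (i, s, toy_parse(s))) for i, s in enumerate(sentences)]


def reduce_by_file(pairs):
    """Reduce step: group records by file name and restore sentence order."""
    grouped = defaultdict(list)
    for name, record in pairs:
        grouped[name].append(record)
    return {
        name: "\n".join(f"{s}\t{tree}" for _, s, tree in sorted(records))
        for name, records in grouped.items()
    }


def parse_corpus(files, workers=4):
    """Run the map step in parallel, then reduce to one output per file."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_file, files.items()))
    pairs = [pair for chunk in mapped for pair in chunk]
    return reduce_by_file(pairs)


if __name__ == "__main__":
    corpus = {"a.txt": "Dogs bark. Cats sleep.", "b.txt": "Birds sing."}
    for name, content in parse_corpus(corpus).items():
        print(name)
        print(content)
```

Using the file name as the key means the reduce step needs no knowledge of which worker parsed which sentence; the sentence index carried in each record is enough to restore the original order within a file.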
Citation
Woldemariam, Y., Pletschacher, S., Clausner, C., & Bass, J. (2019). A cloud-hosted MapReduce architecture for syntactic parsing. In Euromicro Conference on Software Engineering and Advanced Applications, Kallithea, Greece. https://doi.org/10.1109/SEAA.2019.00024
Conference Name | Euromicro Conference on Software Engineering and Advanced Applications |
---|---|
Start Date | Aug 28, 2019 |
End Date | Aug 30, 2019 |
Acceptance Date | May 7, 2019 |
Online Publication Date | Nov 21, 2019 |
Publication Date | Nov 21, 2019 |
Deposit Date | Jul 3, 2019 |
Publicly Available Date | Jul 3, 2019 |
Publisher | Institute of Electrical and Electronics Engineers |
Book Title | Kallithea, Greece |
DOI | https://doi.org/10.1109/SEAA.2019.00024 |