YD Woldemariam
A cloud-hosted MapReduce architecture for syntactic parsing
Woldemariam, YD; Pletschacher, S; Clausner, C; Bass, JM
Authors
Mr Stefan Pletschacher S.Pletschacher@salford.ac.uk
Lecturer
Mr Christian Clausner C.Clausner@salford.ac.uk
Senior Research Fellow
Prof Julian Bass J.Bass@salford.ac.uk
Professor of Software Engineering
Abstract
Syntactic parsing is a time-consuming task in natural language processing, particularly where a large number of text files are being processed. Parsing algorithms are conventionally designed to operate on a single machine in a sequential fashion and, as a consequence, fail to benefit from the high-performance and parallel computing resources available on the cloud. We designed and implemented a scalable cloud-based architecture supporting parallel and distributed syntactic parsing for large datasets. The main architecture consists of a syntactic parser (constituency and dependency parsing) and a MapReduce framework running on clusters of machines. The resulting cloud-based MapReduce parsing builds a map in which syntactic trees of the same input file share the same key, and collects them into a single file containing the sentences along with their corresponding trees. Our experimental evaluation shows that the architecture scales well with regard to the number of processing nodes and the number of cores per node. In the fastest tested cloud-based setup, the proposed design performs 7 times faster than a local setup. In summary, this study takes an important step toward providing and evaluating a cloud-hosted solution for efficient syntactic parsing of natural language data sets consisting of a large number of files.
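The keying scheme described in the abstract can be illustrated with a minimal, single-process sketch. It is not the authors' implementation: the function names (`toy_parse`, `split_sentences`, `map_phase`, `shuffle`, `reduce_phase`, `run_job`) are hypothetical, the "parser" is a placeholder that emits a flat bracketed string, and sentence splitting on full stops is an assumption made only to keep the example self-contained. A real deployment would run the map and reduce steps on a MapReduce framework across cluster nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Value = Tuple[int, str, str]  # (sentence index, sentence, tree)


def toy_parse(sentence: str) -> str:
    """Hypothetical stand-in for a real constituency or dependency parser."""
    tokens = sentence.split()
    return "(S " + " ".join(f"(X {t})" for t in tokens) + ")"


def split_sentences(text: str) -> List[str]:
    """Naive splitter on full stops (an assumption of this sketch)."""
    return [s.strip() for s in text.split(".") if s.strip()]


def map_phase(filename: str, text: str) -> Iterator[Tuple[str, Value]]:
    """Emit one (key, value) pair per sentence; the key is the input file name."""
    for index, sentence in enumerate(split_sentences(text)):
        yield filename, (index, sentence, toy_parse(sentence))


def shuffle(pairs: Iterable[Tuple[str, Value]]) -> Dict[str, List[Value]]:
    """Group values by key, so all trees of one file end up together."""
    grouped: Dict[str, List[Value]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(values: List[Value]) -> str:
    """Merge one file's sentences and trees into a single text block."""
    # Sorting by sentence index restores the original sentence order.
    return "\n".join(f"{sentence}\t{tree}" for _, sentence, tree in sorted(values))


def run_job(files: Dict[str, str]) -> Dict[str, str]:
    """Run map, shuffle and reduce sequentially over an in-memory corpus."""
    pairs = [pair for name, text in files.items() for pair in map_phase(name, text)]
    return {key: reduce_phase(values) for key, values in shuffle(pairs).items()}


if __name__ == "__main__":
    corpus = {"a.txt": "Dogs bark. Cats sleep.", "b.txt": "Birds sing."}
    for name, merged in run_job(corpus).items():
        print(name)
        print(merged)
```

Using the file name as the key is what lets the reduce step write one output per input file; the sentence index is carried in the value only so that the original order can be restored after grouping.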
Citation
Woldemariam, Y., Pletschacher, S., Clausner, C., & Bass, J. (2019). A cloud-hosted MapReduce architecture for syntactic parsing. In Kallithea, Greece. https://doi.org/10.1109/SEAA.2019.00024
Conference Name | Euromicro Conference on Software Engineering and Advanced Applications |
---|---|
Start Date | Aug 28, 2019 |
End Date | Aug 30, 2019 |
Acceptance Date | May 7, 2019 |
Online Publication Date | Nov 21, 2019 |
Publication Date | Nov 21, 2019 |
Deposit Date | Jul 3, 2019 |
Publicly Available Date | Jul 3, 2019 |
Publisher | Institute of Electrical and Electronics Engineers |
Book Title | Kallithea, Greece |
DOI | https://doi.org/10.1109/SEAA.2019.00024 |
Files
PID5964649 Camrea Ready.pdf
(688 Kb)
PDF