YD Woldemariam
A cloud-hosted MapReduce architecture for syntactic parsing
Woldemariam, YD; Pletschacher, S; Clausner, C; Bass, JM
Authors
S Pletschacher
Dr Christian Clausner C.Clausner@salford.ac.uk
Senior Research Fellow
Prof Julian Bass J.Bass@salford.ac.uk
Professor of Software Engineering
Abstract
Syntactic parsing is a time-consuming task in natural language processing, particularly where a large number of text files are being processed. Parsing algorithms are conventionally designed to operate on a single machine in a sequential fashion and, as a consequence, fail to benefit from high performance and parallel computing resources available on the cloud. We designed and implemented a scalable cloud-based architecture supporting parallel and distributed syntactic parsing for large datasets. The main architecture consists of a syntactic parser (constituency and dependency parsing) and a MapReduce framework running on clusters of machines. The resulting cloud-based MapReduce parsing is able to build a map where syntactic trees of the same input file have the same key and collect them into a single file containing sentences along with their corresponding trees. Our experimental evaluation shows that the architecture scales well with regard to number of processing nodes and number of cores per node. In the fastest tested cloud-based setup, the proposed design performs 7 times faster when compared to a local setup. In summary, this study takes an important step toward providing and evaluating a cloud-hosted solution for efficient syntactic parsing of natural language data sets consisting of a large number of files.
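The keying scheme described in the abstract (trees of the same input file share a key and are collected into a single file) can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the function names (`toy_parse`, `map_phase`, `shuffle`, `reduce_phase`, `run_job`) are hypothetical, the splitting of sentences on full stops is a simplification, and `toy_parse` is a placeholder standing in for a real constituency or dependency parser. In a real deployment the map and reduce steps would run on a MapReduce framework across cluster nodes rather than in one process.

```python
from collections import defaultdict


def toy_parse(sentence):
    """Hypothetical stand-in for a real parser: wraps each token in a flat tree."""
    tokens = sentence.split()
    return "(S " + " ".join("(X " + tok + ")" for tok in tokens) + ")"


def map_phase(filename, text):
    """Emit one (key, value) pair per sentence, keyed by the input file name."""
    # Assumption: sentences are separated by full stops (a simplification).
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    for index, sentence in enumerate(sentences):
        yield filename, (index, sentence, toy_parse(sentence))


def shuffle(pairs):
    """Group values by key, as a MapReduce framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(filename, values):
    """Collect all sentences of one file with their trees, in original order."""
    lines = []
    for _, sentence, tree in sorted(values):
        lines.append(sentence + "\t" + tree)
    return filename, "\n".join(lines)


def run_job(files):
    """Run map, shuffle, and reduce sequentially over a dict of filename -> text."""
    pairs = []
    for filename, text in files.items():
        pairs.extend(map_phase(filename, text))
    grouped = shuffle(pairs)
    return dict(reduce_phase(name, values) for name, values in grouped.items())
```

Calling `run_job({"a.txt": "Dogs bark. Cats sleep."})` returns a dictionary with one entry per input file, each holding that file's sentences paired with their trees. Using the file name as the key is what allows the reduce step to reassemble a per-file output regardless of which worker parsed which sentence.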
Presentation Conference Type | Conference Paper (published) |
---|---|
Conference Name | Euromicro Conference on Software Engineering and Advanced Applications |
Start Date | Aug 28, 2019 |
End Date | Aug 30, 2019 |
Acceptance Date | May 7, 2019 |
Online Publication Date | Nov 21, 2019 |
Publication Date | Nov 21, 2019 |
Deposit Date | Jul 3, 2019 |
Publicly Available Date | Jul 3, 2019 |
Publisher | Institute of Electrical and Electronics Engineers |
Book Title | Kallithea, Greece |
DOI | https://doi.org/10.1109/SEAA.2019.00024 |
Files
PID5964649 Camrea Ready.pdf
(688 Kb)
PDF