Research Repository

A cloud-hosted MapReduce architecture for syntactic parsing

Woldemariam, YD; Pletschacher, S; Clausner, C; Bass, JM

YD Woldemariam


Syntactic parsing is a time-consuming task in natural language processing, particularly where a large number of text files are being processed. Parsing algorithms are conventionally designed to operate on a single machine in a sequential fashion and, as a consequence, fail to benefit from the high-performance and parallel computing resources available on the cloud. We designed and implemented a scalable cloud-based architecture supporting parallel and distributed syntactic parsing for large datasets. The main architecture consists of a syntactic parser (constituency and dependency parsing) and a MapReduce framework running on clusters of machines. The resulting cloud-based MapReduce parsing is able to build a map in which the syntactic trees of the same input file share the same key, and to collect them into a single file containing the sentences along with their corresponding trees. Our experimental evaluation shows that the architecture scales well with regard to the number of processing nodes and the number of cores per node. In the fastest tested cloud-based setup, the proposed design performs 7 times faster than a local setup. In summary, this study takes an important step toward providing and evaluating a cloud-hosted solution for efficient syntactic parsing of natural language data sets consisting of a large number of files.
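
The keying scheme described in the abstract (trees from the same input file share one key and are collected together) can be illustrated with a minimal, single-process Python sketch. This is not the authors' implementation: the function names, the naive sentence splitting on full stops, and the `toy_parse` stand-in for a real constituency or dependency parser are all assumptions made for illustration, and no actual MapReduce framework or cluster is involved.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[str, Tuple[str, str]]  # (file name, (sentence, tree))


def toy_parse(sentence: str) -> str:
    """Hypothetical stand-in for a real parser: wraps each token in a flat tree."""
    return "(S " + " ".join(f"(X {tok})" for tok in sentence.split()) + ")"


def map_phase(filename: str, text: str,
              parse: Callable[[str], str] = toy_parse) -> Iterator[Pair]:
    """Emit one (key, value) pair per sentence, keyed by the input file name."""
    for raw in text.split("."):  # naive sentence splitting (assumption)
        sentence = raw.strip()
        if sentence:
            yield filename, (sentence, parse(sentence))


def reduce_phase(pairs: Iterable[Pair]) -> Dict[str, List[Tuple[str, str]]]:
    """Group all (sentence, tree) values that share the same file-name key."""
    grouped: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def run_job(files: Dict[str, str]) -> Dict[str, List[Tuple[str, str]]]:
    """Run the map phase over every file, then reduce by file name."""
    pairs = [p for name, text in files.items() for p in map_phase(name, text)]
    return reduce_phase(pairs)
```

Calling `run_job({"a.txt": "Dogs bark. Cats sleep."})` returns a dictionary with one entry per input file, each holding that file's sentences paired with their trees. In a distributed setting the map calls would run in parallel across nodes and the framework would perform the grouping by key.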

Presentation Conference Type Conference Paper (published)
Conference Name Euromicro Conference on Software Engineering and Advanced Applications
Start Date Aug 28, 2019
End Date Aug 30, 2019
Acceptance Date May 7, 2019
Online Publication Date Nov 21, 2019
Publication Date Nov 21, 2019
Deposit Date Jul 3, 2019
Publicly Available Date Jul 3, 2019
Publisher Institute of Electrical and Electronics Engineers
Book Title Kallithea, Greece

