Determination of bioavailable arsenic threshold and validation of modelled permissible total arsenic in paddy soil using machine learning

Mandal, J; Jain, V; Sengupta, S; Rahman, Md A; Bhattacharyya, K; Rahman, M; Debasis, G; Wood, M; Mondal, D


Authors

V Jain

S Sengupta

Md A Rahman

K Bhattacharyya

M Rahman

G Debasis

Prof Mike Wood M.D.Wood@salford.ac.uk
Associate Dean Research & Innovation

D Mondal



Abstract

Minimizing arsenic (As) intake from food consumption is a key aspect of the public health response in As-contaminated regions. In many of these regions, rice is the predominant staple food. Here we present a validated maximum allowable concentration of total As in paddy soil and provide the first derivation of a maximum allowable soil concentration for bioavailable As. We have previously used meta-analysis to predict the maximum allowable total As in soil based on decision tree (DT) and logistic regression (LR) models. The models were defined using the maximum tolerable concentration (MTC) of As in rice grains as per the Codex recommendation. In the present study, we validated these models using three test data sets derived from purposely collected field data. The DT model performed better than the LR model in terms of accuracy and Matthews correlation coefficient (MCC). Therefore, the DT-estimated maximum allowable total As in paddy soil of 14 mg kg⁻¹ could confidently be used as an appropriate guideline value. We further used the purposely collected field data to predict the concentration of bioavailable As in the paddy soil with the help of random forest (RF), gradient boosting machine (GBM) and LR models. The category of grain As (<MTC and >MTC) was considered as the dependent variable; bioavailable As (BAs), total As (TAs), pH, organic carbon (OC), available phosphorus (AvP) and available iron (AvFe) were the predictor variables. LR performed better than RF and GBM in terms of accuracy, sensitivity, specificity, kappa, precision, log loss, F1 score and MCC. From the better-performing LR model, BAs, TAs, AvFe and OC were significant variables for grain As. From the partial dependence plots (PDP) and individual conditional expectation (ICE) of the LR model, 5.70 mg kg⁻¹ was estimated to be the limit for BAs in soil.
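
The two quantitative steps in the abstract, scoring a soil guideline value by accuracy and MCC and reading a limit from a partial dependence curve, can be sketched in a few lines of Python. This is only an illustration and not the authors' code. The sample values, the model coefficients and all function names are hypothetical, and reading the limit as the point where the averaged probability of exceeding the MTC first reaches 0.5 is an assumption, since the abstract does not state the rule that was used. Only the 14 mg kg⁻¹ cutoff is taken from the abstract.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for boolean labels; True means grain As > MTC."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t and p for t, p in pairs)
    tn = sum((not t) and (not p) for t, p in pairs)
    fp = sum((not t) and p for t, p in pairs)
    fn = sum(t and (not p) for t, p in pairs)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when a marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def predict_proba(coefs, intercept, row):
    """Logistic regression probability that grain As exceeds the MTC."""
    z = intercept + sum(c * x for c, x in zip(coefs, row))
    return 1.0 / (1.0 + math.exp(-z))

def partial_dependence(coefs, intercept, rows, feature, grid):
    """Average predicted probability with one feature forced to each grid value."""
    curve = []
    for value in grid:
        probs = []
        for row in rows:
            modified = list(row)
            modified[feature] = value  # overwrite only the feature of interest
            probs.append(predict_proba(coefs, intercept, modified))
        curve.append(sum(probs) / len(probs))
    return curve

def first_crossing(grid, curve, level=0.5):
    """Smallest grid value whose averaged probability reaches `level` (assumed rule)."""
    for value, prob in zip(grid, curve):
        if prob >= level:
            return value
    return None

# Step 1: score the total-As guideline value against observed grain classes.
TOTAL_AS_LIMIT = 14.0  # mg/kg, DT estimate quoted in the abstract
soil_total_as = [8.0, 12.5, 13.9, 15.2, 18.0, 22.4, 10.1, 16.7]        # hypothetical
grain_exceeds_mtc = [False, False, True, True, True, True, False, False]  # hypothetical
predicted = [x > TOTAL_AS_LIMIT for x in soil_total_as]
counts = confusion_counts(grain_exceeds_mtc, predicted)
print("accuracy:", accuracy(*counts), "MCC:", mcc(*counts))

# Step 2: partial dependence of a hypothetical two-predictor LR model on BAs.
coefs, intercept = [0.9, -0.4], -4.0             # hypothetical [BAs, OC] weights
rows = [[3.0, 0.6], [5.5, 0.9], [7.2, 0.5]]      # hypothetical (BAs, OC) samples
grid = [i * 0.5 for i in range(0, 21)]           # BAs from 0 to 10 mg/kg
curve = partial_dependence(coefs, intercept, rows, feature=0, grid=grid)
print("BAs value where averaged probability reaches 0.5:", first_crossing(grid, curve))
```

MCC is reported alongside accuracy because it uses all four cells of the confusion matrix, so it stays informative when the <MTC and >MTC classes are unevenly represented. An ICE plot corresponds to keeping the per-sample probabilities inside `partial_dependence` instead of averaging them.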

Citation

Mandal, J., Jain, V., Sengupta, S., Rahman, M. A., Bhattacharyya, K., Rahman, M., …Mondal, D. (in press). Determination of bioavailable arsenic threshold and validation of modelled permissible total arsenic in paddy soil using machine learning. Journal of Environmental Quality, https://doi.org/10.1002/jeq2.20452

Journal Article Type: Article
Acceptance Date: Dec 13, 2022
Online Publication Date: Jan 18, 2023
Deposit Date: Jan 20, 2023
Publicly Available Date: Feb 20, 2023
Journal: Journal of Environmental Quality
Print ISSN: 0047-2425
Electronic ISSN: 1537-2537
Publisher: Crop Science Society of America
DOI: https://doi.org/10.1002/jeq2.20452
Publisher URL: https://doi.org/10.1002/jeq2.20452