Nhundu Constintine
A Context-Aware Data Mining Method Recommender for Enhanced Insight Relevance: A Case Study Approach
Constintine, Nhundu
Abstract
Classical data mining approaches struggle to produce insights that relate to the practical situation being analysed. Determining an appropriate data mining method is resource intensive and largely a trial-and-error exercise. This thesis proposes a method recommender, Method Assistant Buddy, for data mining method recommendation. It also develops the Context-aware Method and Algorithm Selecting (CMAS) approach to integrate contextual factors into data mining. Data mining is moving towards context awareness, relating a dataset to the situation surrounding it. Main objective: improve the relevance of insights in the practical setting in which a dataset is found. Other objective: aid the recommendation of a data mining method based on expert knowledge of both the subject and the capabilities of specific data mining methods. Data mining, and analysis in general, focus more on figures or values than on the context in which datasets are found. The major issue with the current data mining approach is that analysts struggle to interpret outcomes. Issues also arise when analysis is divorced from the world it relates to, since it becomes difficult to scope and design a model appropriate for the situation. Established data mining approaches produce results that are not linked to physical activities in a straightforward way. Novel ways to appropriately scope effort for a situation need to be developed. Context-aware data mining can improve model usability and reusability. This research proposes the Context-aware Method and Algorithm Selecting (CMAS) framework, which applies the decision-making strengths of software. Those strengths are knowledge sharing, processing speed and continuous improvement. The thesis develops a Method Assistant Buddy to help novice analysts during model design. CMAS could assist both novice analysts and system end users. Academics aim to simplify the data mining process. Industry gains from focused effort given limited resources.
This research explores a proposed framework, three use cases and an analysis of the improvements demonstrated by applying the novel approach. Three real-life case studies were performed to illustrate the proposed solution. CMAS can be applied to medical records where background knowledge is informative; the PIMA Indian Diabetes context is one highlight. It can also be applied to India Traffic data, where many stakeholders may have varying contexts. World Pollution challenges are complex, so CMAS can be applied there as well. Domain knowledge and semantics guide context-based data analysis. The thesis creates a framework that applies contextual factors and uses accumulated expert information to select data mining methods. The proposal was evaluated on three real-life datasets (PIMA Indian Diabetes, India Traffic and World Pollution). The analysis seeks to develop an approach, illustrate it, and discuss the improvements achieved. The PIMA Indian Diabetes dataset was analysed using association rule mining, whose performance was measured with the confidence and conviction metrics. CMAS was applied to the India Traffic data, for which the Method Buddy selected the clustering data mining method. The quality of the clusters was measured using the Silhouette score and the Davies-Bouldin Index. The last use case, World Air and Water Pollution, was processed using the prediction tool regression. Regression performance was measured by mean squared error (MSE) and root mean squared error (RMSE). CMAS has been shown to reduce the complexity of the whole data analysis and improve the performance of the selected models.
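For readers unfamiliar with the evaluation metrics named above, the following is a minimal sketch of how confidence, conviction, MSE and RMSE are conventionally computed. It uses the standard textbook definitions and is not taken from the thesis; the function names and the toy data are hypothetical. The clustering metrics (Silhouette score, Davies-Bouldin Index) are omitted to keep the sketch free of third-party dependencies.

```python
import math

def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent: supp(A and C) / supp(A)."""
    a, c = set(antecedent), set(consequent)
    n_a = sum(1 for t in transactions if a <= set(t))
    n_ac = sum(1 for t in transactions if (a | c) <= set(t))
    return n_ac / n_a if n_a else 0.0

def conviction(transactions, antecedent, consequent):
    """Conviction: (1 - supp(C)) / (1 - confidence); infinite when confidence is 1."""
    c = set(consequent)
    supp_c = sum(1 for t in transactions if c <= set(t)) / len(transactions)
    conf = confidence(transactions, antecedent, consequent)
    if conf == 1.0:
        return math.inf
    return (1.0 - supp_c) / (1.0 - conf)

def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))

# Hypothetical toy data, for illustration only.
toy_transactions = [{"a", "b"}, {"a"}, {"a", "b"}, {"b"}]
rule_confidence = confidence(toy_transactions, {"a"}, {"b"})
rule_conviction = conviction(toy_transactions, {"a"}, {"b"})
toy_rmse = rmse([1, 2], [3, 4])
```

Higher confidence and conviction indicate a stronger rule, while lower MSE and RMSE indicate a better regression fit.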
Thesis Type | Thesis |
---|---|
Online Publication Date | Mar 27, 2025 |
Deposit Date | Mar 10, 2025 |
Publicly Available Date | Apr 28, 2025 |
Award Date | Mar 27, 2025 |
Files
Thesis
(3.5 Mb)
PDF