Data mining is an interdisciplinary subfield of computer science whose overall goal is to extract information from a data set and transform it into a comprehensible structure for various analytical applications. Using data mining and analytics, we review techniques and algorithms such as J48, Iterative Dichotomiser 3 (ID3), and the C4.5 algorithm (based on ID3) to identify the techniques that give the best results for analyzing wage payments under the MGNREGA scheme in various districts of India.
The Mahatma Gandhi National Rural Employment Guarantee Act provides for the enhancement of livelihood security of households in rural areas of the country by providing at least one hundred days of guaranteed wage employment in every financial year to every household whose adult members volunteer to do unskilled manual work.
Keywords: Data mining techniques; Data mining algorithms; Data mining application; J48; Iterative Dichotomiser 3 (ID3); C4.5 algorithm; MGNREGA
The term data mining appeared around 1990 in the database community, generally with positive connotations. For a short time in the 1980s the phrase “database mining” was used, but because it was trademarked by HNC, a San Diego-based company, to pitch their Database Mining Workstation, researchers turned to “data mining” instead.
Data mining is the process of discovering patterns in large data sets using methods at the intersection of machine learning, statistics, and database systems. It is an interdisciplinary subfield of computer science with an overall goal to extract information (with intelligent methods) from a data set and transform the information into a comprehensible structure for further use. Data mining is the analysis step of the “knowledge discovery in databases” process, or KDD. Aside from the raw analysis step, it also involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.
Data mining involves six common classes of tasks:
Anomaly detection (outlier/change/deviation detection)
Anomaly detection is a technique used to identify unusual patterns that do not conform to expected behavior, called outliers. It has many applications in business, from intrusion detection to system health monitoring, and from fraud detection in credit card transactions to fault detection in operating environments.
Association rule learning (dependency modeling)
Association rule learning is a rule-based machine learning method for discovering interesting relations between variables in large databases. It is intended to identify strong rules discovered in databases using some measures of interestingness.
Clustering analysis finds clusters of data objects that are similar in some sense to one another. The members of a cluster are more like each other than they are like members of other clusters. The goal of clustering analysis is to find high-quality clusters such that the inter-cluster similarity is low and the intra-cluster similarity is high.
Classification is a data mining function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data. For example, a classification model could be used to identify loan applicants as low, medium, or high credit risks.
Regression is a data mining function that predicts a number. Profit, sales, mortgage rates, house values, square footage, temperature, or distance could all be predicted using regression techniques. For example, a regression model could be used to predict the value of a house based on location, number of rooms, lot size, and other factors.
Data summarization condenses evaluation data, including both primitive and derived data, into derived data that is general in nature. Because the data in a data warehouse is very high in volume, a mechanism is needed to present only the relevant and meaningful information in a less cluttered format. Data summarization gives data consumers a generalized view of disparate bulks of data.
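As an illustration of the classification task described above, decision-tree learners such as ID3 choose the attribute to split on by information gain, that is, the reduction in entropy of the class labels after the split. The following is a minimal Python sketch of that criterion only, not a full tree learner. The records and attribute names are hypothetical and are not drawn from the MGNREGA data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy reduction obtained by splitting `rows` on `attribute`."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical toy records (loan applicants), for illustration only.
rows = [
    {"income": "low", "owns_home": "no", "risk": "high"},
    {"income": "low", "owns_home": "yes", "risk": "high"},
    {"income": "high", "owns_home": "no", "risk": "low"},
    {"income": "high", "owns_home": "yes", "risk": "low"},
]

gain_income = information_gain(rows, "income", "risk")
gain_home = information_gain(rows, "owns_home", "risk")

# An ID3-style learner would split first on the attribute with the largest gain.
best_attribute = max(
    ["income", "owns_home"],
    key=lambda a: information_gain(rows, a, "risk"),
)
print(best_attribute, gain_income, gain_home)
```

In this toy data, income separates the two risk classes completely while home ownership does not separate them at all, so the split falls on income. C4.5 refines the same idea by normalizing the gain (gain ratio).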
Mahatma Gandhi National Rural Employment Guarantee Act (MGNREGA)
The Mahatma Gandhi National Rural Employment Guarantee Act (MGNREGA) is an employment guarantee scheme enacted by legislation on August 25, 2005. Its most significant feature is that it treats work as a right: a rural citizen who is willing to work is entitled to a minimum of 100 days of work in a year, and the authorities must provide a job within a given time frame (15 days); otherwise the state government is liable to pay an unemployment allowance. Wages are also well defined in the scheme. The Central Government bears the cost of wages, 75% of the material cost, and a share of the administrative cost. As in many schemes, implementation is carried out by the State Governments. Because the unemployment allowance is paid by the state government, the state has an incentive to ensure proper employment opportunities under the scheme. The State also provides the remaining 25% of the material cost.
Kritika Yadav and Mahesh Parmar (2017) analyze the various data mining techniques used in e-governance of the Mahatma Gandhi National Rural Employment Guarantee Act.
P. Sumithra and V. Valli Kumari (2015) analyze the performance of the MGNREG scheme in villages of Visakhapatnam district using a distance-weighted k-nearest-neighbor classification technique. The paper also gives a comparison with the previous year's statistical data provided by the government.
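To make the distance-weighted k-nearest-neighbor idea concrete, the sketch below lets each of the k nearest training points vote with weight 1/distance, so that closer neighbors count for more. This is a generic illustration in Python, not the cited authors' implementation. The points, labels, and the function name `weighted_knn_predict` are hypothetical.

```python
from collections import defaultdict
from math import dist


def weighted_knn_predict(train, query, k=3):
    """Classify `query` from its k nearest points, each voting with weight 1/distance."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = defaultdict(float)
    for features, label in nearest:
        d = dist(features, query)
        if d == 0:
            return label  # exact match: use that point's label directly
        votes[label] += 1.0 / d
    return max(votes, key=votes.get)


# Hypothetical 2-D feature vectors with made-up performance labels.
train = [
    ((1.0, 1.0), "good"),
    ((3.0, 1.0), "poor"),
    ((1.0, 3.0), "poor"),
]

prediction = weighted_knn_predict(train, (1.1, 1.0), k=3)
print(prediction)
```

With k = 3 a plain majority vote would favor "poor" (two neighbors against one), but the single "good" point is much closer to the query, so the distance weighting selects "good".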
G. Sugapriyan and S. Prakasam (2015) analyze the success of MGNREGA in Kanchipuram district using a data mining technique, along with a comparison against the previous year's statistical data provided by the government. The aim of their work is to analyze the performance and success of the scheme.
G. Chandra (2015) studies the Mahatma Gandhi National Rural Employment Guarantee Act and its impact on Indian society, and analyzes the corruption involved in the implementation of the Act.
Vrushali Bhuyar (2014) considers soil, one of the parameters used to increase crop yield. Different classification algorithms are applied to a soil data set to predict its fertility. The paper focuses on classifying the soil fertility rate using the J48, Naïve Bayes, and Random Forest algorithms.
Niketa Gandhi and Leisa J. Armstrong (2016) examine the application of data visualization techniques to find correlations between climatic factors and rice crop yield. The study also applies data mining techniques to extract knowledge from a historical agriculture data set in order to predict rice crop yield for the Kharif season in the Tropical Wet and Dry climatic zone of India.
Amit Gupta et al. (2016) highlight trends in incidents that in turn help security agencies and police departments devise precautionary measures from prediction rates. The classification algorithms used in the study to assess trends and patterns are Bayes Net, Naïve Bayes, J48, JRip, OneR, and Decision Table. The outputs reported in the study are correctly classified instances, incorrectly classified instances, True Positive Rate (TP), False Positive Rate (FP), Precision (P), Recall (R), and F-measure (F).
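The evaluation measures listed above all follow from the four cells of a binary confusion matrix. The sketch below shows the standard definitions in Python. The counts passed in are made-up numbers for illustration and do not come from the cited study, and the function name is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary-classification measures from confusion-matrix counts."""
    tp_rate = tp / (tp + fn)      # True Positive Rate, identical to Recall
    fp_rate = fp / (fp + tn)      # False Positive Rate
    precision = tp / (tp + fp)
    f_measure = 2 * precision * tp_rate / (precision + tp_rate)
    accuracy = (tp + tn) / (tp + fp + fn + tn)  # share of correct classifications
    return {
        "tp_rate": tp_rate,
        "fp_rate": fp_rate,
        "precision": precision,
        "recall": tp_rate,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Hypothetical counts: 40 true positives, 10 false positives,
# 20 false negatives, 30 true negatives.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The F-measure is the harmonic mean of precision and recall, so it is high only when both are high.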
Dr. M. Usha Rani (2012) analyzed caste-wise data on households registered and households working under NREGS, collected for all 22 districts of Andhra Pradesh from 2006 to 2011. A caste-wise employee database was created from the NREGS data, and the data mining tool RapidMiner was used to extract knowledge from it and to discover interesting patterns in the caste-wise registered and working households.
Section 3(2) of MGNREGA states that wages shall be disbursed on a weekly basis, or in any case not later than a fortnight after the date on which such work is completed.
Internal studies were conducted on the reasons for the trend in the funds allotted each year and on how much was paid as wages to workers under the scheme.
For the present study, all data were sourced from the official MGNREGA website, www.nrega.nic.in. Details for different years were considered for the particular state on which the research was conducted.
For the study, data were extracted from the MGNREGA website, imported into a test database, and organized into datasets. Further analysis of the extracted sub-datasets was visualized using Microsoft Office tools and a database IDE. Scatter plots were used to show the trend in the approved budget for the state concerned.
Database connectivity was then established for further analysis using a data mining technique. Different parameters were set before applying the technique.
The figure above shows the plot obtained for three successive years (2016-17, 2017-18, and 2018-19) of the amount disbursed through banks and post offices under the scheme. The MGNREGA scheme pays workers on a weekly basis. The data show 240 million registered worker accounts for 2018-19, 210 million for 2017-18, and 190 million for 2016-17. Total wage distribution under the scheme was Rs. 50 crore for 2018-19, Rs. 48.4 crore for 2017-18, and Rs. 40.29 crore for 2016-17. This shows a trend, which was discovered by analyzing the data set using WEKA. The number of bank accounts opened under the scheme increased gradually over the three years, and the funds allocated increased with it. Comparing the funds allocated with the number of bank accounts opened, we found that fund allocation is directly proportional to the number of accounts opened.
Let the number of accounts opened be X and the funds allocated be Y.
Then Y / X = constant, i.e. Y = kX for some constant k.
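If funds are directly proportional to accounts, the ratio of funds to accounts should be about the same in every year. The short Python check below computes that ratio from the figures quoted in the text. It is only a sanity check on three data points, and it assumes that the third wage figure refers to 2016-17.

```python
# Figures as quoted in the text: (accounts in millions, wages in crore rupees).
# Assumption: the Rs. 40.29 crore figure belongs to 2016-17.
records = {
    "2016-17": (190, 40.29),
    "2017-18": (210, 48.4),
    "2018-19": (240, 50.0),
}

# Ratio Y / X for each year.
ratios = {year: wages / accounts for year, (accounts, wages) in records.items()}

# How far the largest ratio is from the smallest (1.0 would be exact proportionality).
spread = max(ratios.values()) / min(ratios.values())

for year, ratio in ratios.items():
    print(f"{year}: {ratio:.3f} crore rupees per million accounts")
print(f"largest ratio / smallest ratio = {spread:.2f}")
```

The three ratios fall between roughly 0.21 and 0.23, so the proportionality holds only approximately for these figures.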
These wages are distributed by the Public Financial Management System (PFMS) through the MIS to the nodal banks and post offices of the districts, and are then paid to the workers in hand. Further analysis revealed additional patterns in fund allocation, as shown in the graph below. There is a significant increase in the funds allocated for 2018-19 compared with previous years.
The above figure portrays the same. Using linear regression, we have shown a trend in the fund allocation. The graph is obtained from the ratio of fund allocation to number of accounts, using the following formula:
y = B1 + B2·x + E
where y = a continuous (dependent) variable,
x = the explanatory variable,
B1 = the intercept of the graph,
B2 = the slope of the graph,
E = the error term that cannot be explained by the equation.
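For a straight-line model of the form y = B1 + B2·x + E, with intercept B1 and slope B2, the ordinary least-squares estimates can be computed directly. The Python sketch below does this for the three (accounts, wages) pairs quoted earlier. Which variables the regression in the figure actually used is not stated, so the choice of x and y here is an assumption made for illustration, as is the function name `fit_line`.

```python
def fit_line(xs, ys):
    """Ordinary least-squares estimates (intercept B1, slope B2) for y = B1 + B2*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Assumed inputs: accounts (millions) as x, wages (crore rupees) as y,
# for 2016-17, 2017-18 and 2018-19 as quoted in the text.
accounts = [190, 210, 240]
wages = [40.29, 48.4, 50.0]

b1, b2 = fit_line(accounts, wages)

# Residuals play the role of the error term E for each observation.
residuals = [y - (b1 + b2 * x) for x, y in zip(accounts, wages)]

print(f"intercept B1 = {b1:.3f}, slope B2 = {b2:.4f}")
print("residuals:", [round(r, 3) for r in residuals])
```

With only three points the fit is indicative at best. The residuals show how much of each year's figure the straight line leaves unexplained.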
The results obtained from the analysis of the previous data set indicate the difference between the amount approved and the amount distributed for each financial year; the difference is largest in 2017-18. Comparing the data from the graphs, we can see that the low budget allocation in 2017-18 resulted in a larger gap between the funds issued and the funds distributed.
The Mahatma Gandhi National Rural Employment Guarantee Act (MGNREGA) has become a primary source of employment for the rural and tribal population of Indian society. But the success of the Act will depend on the proper disbursement of the funds approved by the government. This requires the participation of everyone: the people, the government, and the various organizations.
The research shows that the budget passed each year for the Dhule district varies, and so does its disbursement. This indicates a delay in payment to workers under the scheme. The research also shows that workers' wages are not satisfactory. It shows that the disbursement of the current year's budget is 6.4% better than that of the previous year.
G. Sugapriyan, S. Prakasam. Analyzing the Performance of MGNREGA Scheme using Data Mining Technique.
Kritika Yadav, Mahesh Parmar. Review Paper on Data Mining and its Techniques and Mahatma Gandhi National Rural Employment Guarantee Act.
As per calculations made by Siraj Dutta, more than 2 lakh bank transactions of MGNREGA wage payments worth Rs. 21.52 crore were “rejected” in Jharkhand from 1 April to 19 September 2016. This amounts to about 2% of all wage payments made in the state during that five-and-a-half-month period.