DATE: | Thursday, October 4, 2012 |
TIME: | 4:00 pm |
PLACE: | Council Room (SITE 5-084) |
TITLE: | Predicting High-Cost Patients in General Population Using Data Mining Techniques |
PRESENTER: | Seyed Izadshenas, University of Ottawa |
ABSTRACT: In this research, we apply data mining techniques to nationally representative expenditure data from the US to predict very high-cost patients, those in the top 5 cost percentiles, among the general population. Samples are derived from the Medical Expenditure Panel Survey (MEPS) Household Component data for 2006-2008, comprising 98,175 records. After pre-processing, partitioning, and balancing the data, the final MEPS dataset of 31,704 records is modeled with Decision Trees (including C5.0 and CHAID) and Neural Networks. Multiple predictive models are built, and their performance is analyzed using several measures, including accuracy, G-mean, and area under the ROC curve (AUC). We conclude that the CHAID tree returns the best G-mean and AUC among the top-performing predictive models, ranging from 76% to 85% and from 0.812 to 0.942, respectively. Among a primary set of 66 attributes, the best predictors of the top 5% high-cost population include the individual’s overall health perception, history of blood cholesterol checks, history of physical/sensory/mental limitations, age, and history of colonic prevention measures. This means we can predict high-cost patients without knowing how many times the patient visited a doctor or was hospitalized. Results from this study can be used by policy makers, health planners, and insurers to deliver health services more efficiently. |
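
The abstract reports G-mean alongside accuracy. A minimal sketch of why that matters for a 5% positive class is given below. It assumes the usual definition of G-mean as the geometric mean of sensitivity and specificity, which the abstract does not spell out. The function names and the toy labels are hypothetical illustrations and are not taken from the MEPS data or the presenter's models.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fn, tn, fp) for a binary labeling."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, tn, fp


def accuracy(y_true, y_pred, positive=1):
    """Fraction of records classified correctly."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive)
    total = tp + fn + tn + fp
    return (tp + tn) / total if total else 0.0


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity (assumed definition)."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Toy, made-up labels: 5 high-cost patients (1) among 100 people.
y_true = [1] * 5 + [0] * 95
# A trivial model that never flags anyone as high-cost.
always_low = [0] * 100

trivial_accuracy = accuracy(y_true, always_low)
trivial_g_mean = g_mean(y_true, always_low)
```

On these toy labels the trivial model scores high on accuracy while its G-mean is zero, because it never identifies a high-cost patient. This is one common reason to report G-mean and AUC, and to balance the data, when the target class is rare.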