A PREDICTIVE ANALYSIS PLATFORM USING LIVE BIG DATA
PredMine is a predictive analytics platform focused on the insurance sector. It provides a variety of data mining and big data analytics tools that make predictive modelling easier for insurance companies to adopt.
Turning big data into business value is a pressing challenge for companies. The cost of scaling data warehouses and the shortage of experts limit the ability to extract insights from big data. Predictive analytics is gradually finding application in areas such as policy risk scoring, claim fraud detection, referral scoring, and score-based customer segmentation for marketing and service purposes. The PredMine platform, designed and implemented by the participating enterprises, brings the advantages of predictive analytics, text mining and big data processing to the financial and insurance sector. Machine learning techniques provide new approaches for classifying data, creating risk segments and predicting user behavior. Text mining extends the training dataset given to the machine learning module so that it contains not only the traditional structured information but also "live" data automatically extracted from the web or other sources using data and text mining techniques. Big data techniques make the methods scalable and allow huge datasets to be processed. The goal is to simplify the application of complex machine learning, pattern recognition and data mining techniques in risk management by implementing a platform that helps insurance organizations estimate risk, thereby gaining competitive advantages in terms of cost, customer relationships (customer satisfaction), and market leadership.
PREDMINE: A Bilateral Cooperation of Science & Technology between Greece and Israel, 2013-2015
At its core, the insurance industry is about managing an individual’s risk. Across life, health and liability insurance, companies collect premiums on policies and invest them in holdings until a claim is made. If the amount paid out is greater than the premiums collected, the initial policy underestimated the individual’s level of risk. A number of factors are constantly being calculated to ensure that appropriate policies are issued. An actuary helps design insurance policies, using past information to analyse the financial consequences and risks. Likewise, an underwriter uses actuarial data along with financial data and claims reports to decide the appropriate level and terms of coverage. If the price is too low, profit margins may be inadequate; if it is too high, customers will not buy policies from the company.
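As an illustration only, the relationship between premiums and payouts described above can be expressed as a loss ratio (claims paid divided by premiums collected). The `Policy` record, its field names and the sample figures below are assumptions made for this sketch; they are not taken from PredMine.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    """Hypothetical record holding totals for one policy over its term."""
    policy_id: str
    premiums_collected: float
    claims_paid: float


def loss_ratio(policy: Policy) -> float:
    """Claims paid divided by premiums collected."""
    if policy.premiums_collected <= 0:
        raise ValueError("premiums_collected must be positive")
    return policy.claims_paid / policy.premiums_collected


def is_underpriced(policy: Policy) -> bool:
    """True when payouts exceeded premiums, i.e. the risk was underestimated."""
    return loss_ratio(policy) > 1.0


# Invented sample data, for illustration only.
policies = [
    Policy("A-1", premiums_collected=1200.0, claims_paid=300.0),
    Policy("A-2", premiums_collected=900.0, claims_paid=1800.0),
]
underpriced_ids = [p.policy_id for p in policies if is_underpriced(p)]
```

A ratio above 1.0 flags a policy whose payouts exceeded its premiums. A real pricing review would also account for expenses and investment income, which this sketch leaves out.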
The PredMine platform leverages predictive analysis tools and methodologies based on data mining and machine learning. It is aimed at the non-technical user, typically companies active in the financial and insurance sector and their employees. It lets such users handle large datasets easily, estimate the risk level associated with particular categories of drivers, vehicles and other related variables, and uncover relationships among them.
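One simple way to picture risk estimation per category is the observed claim frequency within each group. The following is a minimal sketch under that assumption; the record layout, the category labels and the sample values are hypothetical and do not come from the platform.

```python
from collections import defaultdict

# Hypothetical policy-year records: (driver_age_band, vehicle_type, had_claim)
records = [
    ("18-25", "sports", 1),
    ("18-25", "sports", 1),
    ("18-25", "sedan", 0),
    ("26-60", "sedan", 0),
    ("26-60", "sedan", 1),
    ("26-60", "sedan", 0),
    ("26-60", "sedan", 0),
]


def claim_frequency(rows, key_index):
    """Share of records with a claim, grouped by the column at key_index."""
    totals = defaultdict(int)
    claims = defaultdict(int)
    for row in rows:
        key = row[key_index]
        totals[key] += 1
        claims[key] += row[2]  # had_claim is 0 or 1
    return {key: claims[key] / totals[key] for key in totals}


by_age_band = claim_frequency(records, 0)
by_vehicle = claim_frequency(records, 1)
```

Grouping by one variable at a time keeps the example short. Relationships among several variables would require a joint grouping or a fitted model instead.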
A predictive analysis task can generally be decomposed into broader tasks: data acquisition, data preparation and cleansing (usually required, but not always), model generation, execution of the analysis, and, optionally, model evaluation and model refinement and update. These tasks form a repeating loop that leads to a progressively more accurate and refined solution. As is evident in the following sections, PredMine assists the non-technical user in all of the aforementioned tasks. The changing nature of the insurance industry has introduced new risks from catastrophes and regulatory compliance. As a result, risk management is becoming more important to insurance organizations.
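The task loop described above (acquisition, preparation, model generation, execution, evaluation, refinement) can be sketched in a few lines. This is a toy illustration, not PredMine's implementation: the single-threshold model on driver age, the function names and all data values are assumptions chosen to keep the example self-contained.

```python
def acquire(batch):
    """Data acquisition: stand-in for reading a new batch from a data source."""
    batches = [
        [{"age": 19, "claim": 1}, {"age": 45, "claim": 0}, {"age": None, "claim": 1}],
        [{"age": 23, "claim": 1}, {"age": 30, "claim": 0}, {"age": 60, "claim": 0}],
    ]
    return batches[batch]


def prepare(raw):
    """Data preparation and cleansing: drop records with a missing age."""
    return [(r["age"], r["claim"]) for r in raw if r["age"] is not None]


def fit(rows):
    """Model generation: choose the age threshold with the fewest training
    errors, under the rule 'predict a claim when age < threshold'."""
    candidates = sorted({age for age, _ in rows})
    candidates.append(candidates[-1] + 1)  # allows "everyone is below"

    def errors(threshold):
        return sum((age < threshold) != bool(claim) for age, claim in rows)

    return min(candidates, key=errors)


def predict(threshold, age):
    """Execution of the analysis: score one driver."""
    return int(age < threshold)


def accuracy(threshold, rows):
    """Model evaluation: share of correct predictions."""
    return sum(predict(threshold, a) == c for a, c in rows) / len(rows)


# Held-out records used only for evaluation (invented values).
holdout = [(21, 1), (35, 0), (50, 0)]

training = []
history = []
for batch in range(2):
    training.extend(prepare(acquire(batch)))  # acquire and clean new data
    threshold = fit(training)                 # refine: refit on all data so far
    history.append((threshold, accuracy(threshold, holdout)))
```

Each pass through the loop refits the model on all data gathered so far and records the threshold together with its held-out accuracy, so `history` shows how the model changes as more data arrives.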
The product provides a complete collaborative predictive analytics solution. It supports state-of-the-art machine learning and classification algorithms based on existing open source libraries (such as Apache Mahout and R), automatic data extraction and text mining techniques, and big data manipulation and management based on the Hadoop execution environment. The solution enables an insurance company’s existing IT personnel to build predictive models and use them to score huge datasets.