Call Network / Call Detail Records As New Big Data Source To Predict Credit Scoring

statswork

5 years ago

In Brief:

Big Data Scoring is a cloud-based
credit decision engine that helps banks, telecoms and consumer lenders improve
credit quality and acceptance rates through the use of big data.
The study demonstrates that including
call networks as a new Big Data source, in the context of positive credit
information, adds value in terms of profit, as shown by applying a profit
measure and profit-based feature selection.

Credit scoring is one of the oldest applications of analytics: lenders and financial institutions use statistical analysis to assess the creditworthiness of potential borrowers and decide whether or not to grant credit.

In 1956, Fair Isaac was founded as
one of the first analytics companies offering retail credit scoring services
in the US. Its well-known FICO score has been used as an analytical
decision tool by financial institutions, insurers, utility companies and
even employers.
Edward Altman developed a z-score
model for bankruptcy prediction, which is still used to this day in Bloomberg
reports as a default risk benchmark.

Initially,
these models were built using limited data and were based on simple
classification techniques such as linear programming, discriminant analysis and
logistic regression. The importance of these retail and corporate credit
scoring models grew further with regulatory compliance
guidelines such as the Basel Accords and IFRS 9, which specify the inputs and
outputs of a credit scoring model together with how these models can be used to
compute provisions and capital buffers.

Even the most basic handset passively generates a vast amount of metadata, leaving behind a digital trace of its user’s activity. These metadata provide information on when, how, from where and with whom we communicate. Initially, researchers realized the potential of such data by installing tracking software on the phones of consenting subjects through the Reality Mining project at MIT. They later gained access to actual metadata directly from mobile network operators, leading to larger-scale research and greater analytical power. Several initiatives have emerged, such as the Data for Development (D4D) challenge organized by Orange, which provided datasets to the research community for development-related projects. In a recent survey carried out by the World Bank, mobile phone data ranked first among the Big Data sources used in SDG-related projects.

However, while new sources of data offer the chance to profile potential borrowers using a more comprehensive representation of behaviour, they also pose an ethical challenge. Mobile phone data, e.g., in the form of call detail records (CDRs), allow an extensive social network to be constructed, and using this information to profile repayment behaviour can be seen as unfair to borrowers, who could be penalized for their mobile phone behaviour.
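To make the idea of constructing a social network from CDRs concrete, the following is a minimal sketch, not the method used in the study. The record fields (`caller`, `callee`, `start`, `duration_s`, `cell_id`), the helper names and the three features are all assumptions chosen for illustration; real CDR schemas differ between operators.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    # Hypothetical CDR fields: who called whom, when, for how long, from where.
    caller: str
    callee: str
    start: str        # ISO 8601 timestamp
    duration_s: int   # call duration in seconds
    cell_id: str      # identifier of the originating cell tower


def build_call_network(records):
    """Aggregate CDRs into an undirected weighted graph.

    Returns a dict mapping each subscriber to {neighbour: total call seconds}.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for r in records:
        if r.caller == r.callee:
            continue  # ignore self-calls
        graph[r.caller][r.callee] += r.duration_s
        graph[r.callee][r.caller] += r.duration_s
    return {node: dict(nbrs) for node, nbrs in graph.items()}


def network_features(graph, node, known_defaulters):
    """Compute a few illustrative network features for one subscriber."""
    nbrs = graph.get(node, {})
    degree = len(nbrs)
    defaulter_contacts = sum(1 for n in nbrs if n in known_defaulters)
    return {
        "degree": degree,
        "total_call_seconds": sum(nbrs.values()),
        "share_defaulter_contacts": defaulter_contacts / degree if degree else 0.0,
    }


# Toy data: subscriber A talks to B twice and to C once.
records = [
    CallRecord("A", "B", "2020-01-01T09:00:00", 120, "cell-1"),
    CallRecord("B", "A", "2020-01-01T18:30:00", 60, "cell-2"),
    CallRecord("A", "C", "2020-01-02T12:15:00", 30, "cell-1"),
]
graph = build_call_network(records)
features = network_features(graph, "A", known_defaulters={"C"})
```

A feature such as `share_defaulter_contacts` is exactly the kind of signal that raises the fairness concern above: it scores a borrower on the behaviour of the people they call rather than on their own repayment history.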

More
recently, interest in using call networks as a new Big Data source for credit
scoring has gained momentum, e.g., with Wei et al. describing the potential
value of credit scores obtained from networks and how strategic tie-formation might
affect these scores. Though especially interesting in light of the Chinese
government’s plan for a social credit system, the study is only theoretical
and lacks an empirical evaluation of the proposed models. Additionally,
recent press coverage of specialized smartphone applications that assess
people’s creditworthiness using the vast amount of data generated by their
handsets indicates the potential of call networks as an alternative data source
for credit scoring.

COMPARISON BETWEEN THE TRADITIONAL CDR ANALYSIS SOLUTIONS, HIGHLIGHTING THE ADVANTAGES AND THE LIMITATIONS OF EACH

CDR Solution	Year	Advantages	Limitations
Vertica	2007	Linear scalability. Analyzes up to 21 TB of CDRs. Low hardware cost.	Long average response time.
Redknee TCB Running on SQL Server	2009	Processes stored CDRs at more than 100,000 CDRs/second and up to 540 invoices/second. Linear scalability.	Addresses only one CDR use case (invoice generation).
DisTec	2012	Considers the system’s quality of service, security, and data privacy.	Cannot be extended to support real-time processing.
Processing 6 Billion CDRs/day	2013	Processes CDR streams in real time, up to 220,000 CDRs/second.	Scalability was not addressed.
Big Data Storage Techniques Implemented to Criminal Investigation for Telecom	2014	Overall performance improvement and cost reduction compared to the original platform.	Addresses only use cases related to criminal investigation applications.
Big Data Platform Development with a Domain Specific Language for Telecom Industries	2015	Analyzes many kinds of telecom data, such as billing, subscriber profiles, and network data. Performance up to 30 MB/sec.	No APIs for use by external applications.

Statistical Limitations

CDRs are an excellent example of a Big Data source that can be repurposed beyond its primary function to estimate socio-economic variables and population mobility. Because CDRs were not designed for this purpose, an inevitable bias will always affect any application based on these data. If not correctly understood, this could lead to serious misinterpretation of the results and, eventually, have damaging effects by misleading policy-makers.

Technical issues
Selection bias
Spatial bias

Data Privacy

To protect people’s privacy, phone data are always anonymised, i.e., all personal data such as name, address, etc., are either removed from the database or replaced by a randomly generated number to prevent identification. Data are then provided to a third party after a non-disclosure agreement (NDA) has been signed with the mobile network operator (MNO). The purpose of the agreement is to prevent CDRs from being shared with another party and to define the scope of the research questions that will be explored with the data. Both the anonymisation technique and the NDA are intended to preserve users’ privacy.
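The anonymisation step described above (dropping personal fields and replacing identifiers with randomly generated numbers) can be sketched as follows. This is an illustrative assumption about how such a step might look, not the procedure of any particular operator; the field names (`caller`, `callee`, `name`, `address`) and the helper `pseudonymise` are hypothetical.

```python
import secrets


def _fresh_id(used):
    """Draw a random number that has not been handed out yet."""
    while True:
        candidate = secrets.randbelow(10**12)
        if candidate not in used:
            used.add(candidate)
            return candidate


def pseudonymise(records, id_fields=("caller", "callee"),
                 drop_fields=("name", "address")):
    """Remove personal fields and replace identifiers with random numbers.

    The same original identifier always maps to the same random number,
    so the call network structure is preserved.
    """
    mapping = {}   # original identifier -> random number
    used = set()   # random numbers already assigned
    cleaned = []
    for rec in records:
        out = {k: v for k, v in rec.items() if k not in drop_fields}
        for field in id_fields:
            if field in out:
                original = out[field]
                if original not in mapping:
                    mapping[original] = _fresh_id(used)
                out[field] = mapping[original]
        cleaned.append(out)
    return cleaned, mapping


# Toy records with made-up phone numbers and personal details.
raw = [
    {"caller": "+3225550101", "callee": "+3225550102",
     "name": "Alice", "address": "1 Main St", "duration_s": 120},
    {"caller": "+3225550102", "callee": "+3225550101",
     "name": "Bob", "address": "2 Side St", "duration_s": 60},
]
anon, mapping = pseudonymise(raw)
```

In this sketch the `mapping` table is returned so the caller can decide what to do with it. Whether it is destroyed or retained by the operator is a policy choice the text does not specify, and anyone holding it can reverse the substitution.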

Conclusion

Compared to the traditional data collected to compute official statistics, CDRs are cost-effective and can deliver earlier or even near real-time insights. They might also be used to test hypotheses and define future research questions. Credit-scoring agencies and creditors continually test and refine new credit-scoring models. The availability of “big data” could create opportunities for creditors who want to prospect consumers, support new accounts, manage customers and grow profits. It is already clear that the mobile phone data used in this study are significant in terms of ‘Volume’, ‘Velocity’, ‘Veracity’ and ‘Variety’. Analysis of the data and the resulting well-performing models shows that they also have a positive effect on financial inclusion and on model profit, and as such are also essential for ‘Value’: the fifth V of Big Data!
