Call Networks / Call Detail Records As A New Big Data Source For Credit Scoring

In Brief:

  • Big Data Scoring is a cloud-based credit decision engine that helps banks, telecoms and consumer lenders improve credit quality and acceptance rates through the use of big data.
  • The study demonstrates that including call networks as a new Big Data source, in the context of positive credit information, adds value in terms of profit, as shown by applying a profit measure and profit-based feature selection.
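The exact profit measure used in the study is not described here. As a rough illustration of the idea of judging a scorecard by profit rather than accuracy, the sketch below assumes a deliberately simplified one-period model: an accepted good loan earns a hypothetical return `roi`, an accepted defaulter loses a hypothetical fraction `lgd`, and a rejected applicant contributes nothing. The function names `portfolio_profit` and `best_cutoff` are illustrative, not the study's method.

```python
from typing import List, Tuple

def portfolio_profit(pds: List[float], defaulted: List[int], cutoff: float,
                     roi: float = 0.10, lgd: float = 0.50, amount: float = 1.0) -> float:
    """Profit from accepting every applicant whose predicted PD is below `cutoff`.

    Hypothetical one-period model: a good loan earns roi * amount,
    a defaulted loan loses lgd * amount, a rejected applicant contributes 0.
    """
    profit = 0.0
    for pd, bad in zip(pds, defaulted):
        if pd < cutoff:  # applicant is accepted
            profit += -lgd * amount if bad else roi * amount
    return profit

def best_cutoff(pds: List[float], defaulted: List[int], **params: float) -> Tuple[float, float]:
    """Evaluate each observed PD (plus 'accept everyone') as a cutoff; return (cutoff, profit)."""
    candidates = sorted(set(pds)) + [float("inf")]
    return max(((c, portfolio_profit(pds, defaulted, c, **params)) for c in candidates),
               key=lambda pair: pair[1])

# Toy portfolio: two good applicants with low PDs, two defaulters with higher PDs.
pds = [0.05, 0.10, 0.30, 0.60]
defaulted = [0, 0, 1, 1]
cutoff, profit = best_cutoff(pds, defaulted)
print(cutoff, round(profit, 2))  # 0.3 0.2
```

In a profit-based feature selection, a profit figure like this one (rather than, say, AUC) would serve as the objective when comparing models built on different feature subsets.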

Credit scoring is one of the oldest applications of analytics, in which investors and financial institutions perform statistical analysis to assess the creditworthiness of potential borrowers and help them decide whether or not to grant credit.

  • In 1956, Fair Isaac was founded as one of the first analytics companies offering retail credit scoring services in the US. Its well-known FICO score has been used as an analytical decision tool by financial institutions, insurers, utility companies and even employers.
  • Edward Altman developed the Z-score model for bankruptcy prediction, which is still used today in Bloomberg reports as a default risk benchmark.
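To make the Z-score concrete, here is a minimal sketch of Altman's original (1968) formulation for publicly traded manufacturing firms, with its classic cut-offs. Other variants (e.g. for private or non-manufacturing firms) use different coefficients; the function names and the balance-sheet figures below are hypothetical.

```python
def altman_z(working_capital: float, retained_earnings: float, ebit: float,
             market_value_equity: float, sales: float,
             total_assets: float, total_liabilities: float) -> float:
    """Original (1968) Altman Z-score for publicly traded manufacturing firms."""
    x1 = working_capital / total_assets           # liquidity
    x2 = retained_earnings / total_assets         # cumulative profitability
    x3 = ebit / total_assets                      # return on assets
    x4 = market_value_equity / total_liabilities  # market-based leverage
    x5 = sales / total_assets                     # asset turnover
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5

def altman_zone(z: float) -> str:
    """Classic cut-offs: below 1.81 distress, 1.81 to 2.99 grey zone, above 2.99 safe."""
    if z < 1.81:
        return "distress"
    if z <= 2.99:
        return "grey"
    return "safe"

# Hypothetical balance-sheet figures (in millions)
z = altman_z(working_capital=200, retained_earnings=300, ebit=100,
             market_value_equity=600, sales=1200,
             total_assets=1000, total_liabilities=400)
print(round(z, 2), altman_zone(z))  # 3.09 safe
```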

Initially, these models were built using limited data and were based on simple classification techniques such as linear programming, discriminant analysis and logistic regression. The importance of these retail and corporate credit scoring models increased further due to regulatory guidelines such as the Basel Accords and IFRS 9, which specify the inputs and outputs of a credit scoring model and how these models can be used to compute provisions and capital buffers.
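A classic logistic-regression scorecard can be sketched in a few lines: fit a model that outputs a probability of default (PD), then feed that PD into a one-period expected-loss calculation, EL = PD × LGD × EAD, as used in the Basel framework. This is a simplified illustration, assuming hypothetical features (debt-to-income ratio, missed payments) and toy data; real IFRS 9 provisioning adds staging, lifetime PDs and discounting, and in practice a library such as statsmodels or scikit-learn would do the fitting.

```python
import math
from typing import List, Sequence

def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def predict_pd(w: List[float], x: Sequence[float]) -> float:
    """Probability of default for one applicant; w = [intercept, w1, ..., wk]."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_logistic(X: List[Sequence[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Batch gradient descent on the mean log-loss; returns [intercept, w1, ..., wk]."""
    k, n = len(X[0]), len(X)
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = predict_pd(w, xi) - yi  # derivative of log-loss w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

def expected_loss(pd: float, lgd: float, ead: float) -> float:
    """One-period expected loss, EL = PD x LGD x EAD."""
    return pd * lgd * ead

# Hypothetical applicants: [debt-to-income ratio, number of missed payments]
X = [[0.1, 0], [0.2, 0], [0.3, 1], [0.4, 0], [0.5, 1], [0.6, 2], [0.7, 1], [0.8, 3]]
y = [0, 0, 1, 0, 0, 1, 1, 1]
w = fit_logistic(X, y)
pd_low, pd_high = predict_pd(w, [0.15, 0]), predict_pd(w, [0.75, 2])
print(round(pd_low, 3), round(pd_high, 3))
print(expected_loss(pd_high, lgd=0.45, ead=10_000))
```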

Even the most basic handset passively generates a vast amount of metadata, leaving behind a digital trace of its user's activity. These metadata reveal when, how, from where and with whom we communicate. Researchers first recognized the potential of such data by installing monitoring software on volunteer subjects' phones, as in MIT's Reality Mining project [4]. They later obtained access to real metadata directly from mobile network operators, enabling larger-scale research with greater analytical power. Several initiatives followed, such as the Data for Development (D4D) challenge organized by Orange, which provided datasets to the research community for development-related projects. In a recent survey carried out by the World Bank, mobile phone data ranked first among the Big Data sources used in SDG-related projects.
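The "when, how, from where and with whom" of a CDR maps naturally onto a small record type. The sketch below assumes a hypothetical schema (real MNO formats differ and carry many more fields) and derives a few simple behavioural features of the kind a scoring model might use; `CallDetailRecord` and `usage_profile` are illustrative names only.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List

@dataclass(frozen=True)
class CallDetailRecord:
    caller: str       # with whom: pseudonymised ID of the originating subscriber
    callee: str       # with whom: pseudonymised ID of the receiving subscriber
    start: datetime   # when the call or SMS started
    kind: str         # how: "voice" or "sms"
    duration_s: int   # call length in seconds (0 for an SMS)
    cell_id: str      # from where: cell tower serving the caller

def usage_profile(records: List[CallDetailRecord], subscriber: str) -> Dict[str, float]:
    """Simple behavioural features for one subscriber's outgoing activity."""
    own = [r for r in records if r.caller == subscriber]
    night = sum(1 for r in own if r.start.hour >= 22 or r.start.hour < 6)
    return {
        "outgoing_events": len(own),
        "distinct_contacts": len({r.callee for r in own}),
        "voice_seconds": sum(r.duration_s for r in own if r.kind == "voice"),
        "night_share": night / len(own) if own else 0.0,
        "distinct_cells": len({r.cell_id for r in own}),
    }

records = [
    CallDetailRecord("u1", "u2", datetime(2024, 3, 1, 9, 15), "voice", 120, "cellA"),
    CallDetailRecord("u1", "u3", datetime(2024, 3, 1, 23, 40), "sms", 0, "cellB"),
    CallDetailRecord("u2", "u1", datetime(2024, 3, 2, 12, 5), "voice", 60, "cellA"),
]
print(usage_profile(records, "u1"))
# {'outgoing_events': 2, 'distinct_contacts': 2, 'voice_seconds': 120, 'night_share': 0.5, 'distinct_cells': 2}
```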

While new sources of data offer the chance to profile potential borrowers using a more comprehensive picture of their behaviour, they also present an ethical challenge. Mobile phone data, e.g., in the form of call detail records (CDRs), allow an extensive social network to be constructed, and using this information to profile repayment behaviour can be seen as unfair to borrowers, who could be penalized for their mobile phone behaviour.
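Constructing the social network from CDRs amounts to aggregating calls into weighted edges. The sketch below assumes each CDR has been reduced to a hypothetical `(caller, callee, duration_s)` tuple and uses total talk time as tie strength; the number of calls is an equally plausible choice, and a graph library such as NetworkX could replace the plain dictionaries.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple

def build_call_graph(cdrs: Iterable[Tuple[str, str, int]]) -> Dict[str, Dict[str, int]]:
    """Undirected, weighted call network: edge weight = total seconds talked."""
    graph: DefaultDict[str, DefaultDict[str, int]] = defaultdict(lambda: defaultdict(int))
    for caller, callee, duration_s in cdrs:
        if caller == callee:  # ignore self-loops
            continue
        # Count the call in both directions so the network is undirected.
        graph[caller][callee] += duration_s
        graph[callee][caller] += duration_s
    return {node: dict(nbrs) for node, nbrs in graph.items()}

calls = [("A", "B", 60), ("B", "A", 30), ("A", "C", 10)]
g = build_call_graph(calls)
print(g)  # {'A': {'B': 90, 'C': 10}, 'B': {'A': 90}, 'C': {'A': 10}}
```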

More recently, interest in using call networks as a new Big Data source for credit scoring has gained momentum, e.g., with Wei et al. showing the potential value of credit scores derived from networks and how strategic tie formation might affect these scores. Although especially interesting in light of the Chinese government's plans for a social credit system, the study is purely theoretical and lacks a meaningful empirical evaluation of the proposed models. Additionally, recent press coverage of specialized smartphone applications that assess people's creditworthiness using the vast amount of data generated by their handsets indicates the potential of call networks as an alternative data source for credit scoring.
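One common way to turn a call network into a credit score input is a relational feature: for each person, the share of their known contacts who defaulted, weighted by tie strength. The sketch below is an illustration of that general idea, not the specific model of Wei et al.; it assumes a weighted adjacency dictionary and a hypothetical `defaulted` mapping for contacts whose repayment outcome is known.

```python
from typing import Dict, Optional

def neighbour_default_rate(graph: Dict[str, Dict[str, float]],
                           defaulted: Dict[str, bool]) -> Dict[str, Optional[float]]:
    """Tie-strength-weighted share of a node's labelled contacts who defaulted.

    Returns None for nodes that have no labelled neighbours.
    """
    rates: Dict[str, Optional[float]] = {}
    for node, nbrs in graph.items():
        labelled = {n: w for n, w in nbrs.items() if n in defaulted}
        total = sum(labelled.values())
        bad = sum(w for n, w in labelled.items() if defaulted[n])
        rates[node] = bad / total if total > 0 else None
    return rates

graph = {"A": {"B": 90, "C": 10}, "B": {"A": 90}, "C": {"A": 10},
         "D": {"E": 5}, "E": {"D": 5}}
defaulted = {"A": False, "B": False, "C": True}
print(neighbour_default_rate(graph, defaulted))
# {'A': 0.1, 'B': 0.0, 'C': 0.0, 'D': None, 'E': None}
```

A feature like this also shows why strategic tie formation matters: a borrower who adds strong ties to non-defaulters lowers their own value without any change in repayment behaviour.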

COMPARISON BETWEEN THE TRADITIONAL CDR ANALYSIS SOLUTIONS, HIGHLIGHTING THE ADVANTAGES AND THE LIMITATIONS OF EACH

  • Vertica (2007)
    Advantages: linear scalability; analyzes up to 21 TB of CDRs; low hardware cost.
    Limitations: long average response time.
  • Redknee TCB running on SQL Server (2009)
    Advantages: processes stored CDRs at more than 100,000 CDRs/second and up to 540 invoices/second; linear scalability.
    Limitations: addresses only one CDR use case (invoice generation).
  • DisTec (2012)
    Advantages: considers the system’s quality of service, security, and data privacy.
    Limitations: cannot be extended to support real-time processing.
  • Processing 6 Billion CDRs/day (2013)
    Advantages: processes CDR streams in real time, at up to 220,000 CDRs/second.
    Limitations: scalability was not addressed.
  • Big Data Storage Techniques Implemented to Criminal Investigation for Telecom (2014)
    Advantages: overall performance improvement and cost reduction compared to the original platform.
    Limitations: addresses only use cases related to criminal investigation applications.
  • Big Data Platform Development with a Domain Specific Language for Telecom Industries (2015)
    Advantages: analyzes many kinds of telecom data, such as billing, subscriber profiles, and network data; performance of up to 30 MB/sec.
    Limitations: no APIs for external application usage.
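As a rough sanity check on the real-time figures in the comparison, assuming (unrealistically) that traffic is spread evenly over the day, 6 billion CDRs per day corresponds to the average rate below, so a peak capacity of 220,000 CDRs/second leaves roughly threefold headroom for busy hours:

```latex
\frac{6 \times 10^{9}\ \text{CDRs/day}}{86\,400\ \text{s/day}} \approx 6.94 \times 10^{4}\ \text{CDRs/s},
\qquad
\frac{2.2 \times 10^{5}\ \text{CDRs/s}}{6.94 \times 10^{4}\ \text{CDRs/s}} \approx 3.2
```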

Statistical Limitations

CDRs are an excellent example of a Big Data source that can be repurposed beyond its primary purpose to approximate socio-economic variables and population mobility. Because CDRs were not designed for this purpose, some bias will inevitably affect any application based on these data. If not properly understood, this bias can lead to serious misinterpretation of the results and, ultimately, mislead policy-makers.

  • Technical issues
  • Selection bias
  • Spatial bias

Data Privacy

To protect people’s privacy, phone data are always anonymised, i.e., all personal data such as name and address are either removed from the database or replaced by a randomly generated number to prevent identification. Data are then provided to a third party after a non-disclosure agreement (NDA) has been signed with the mobile network operator (MNO). The purpose of the agreement is to prevent the CDRs from being shared with other parties and to define the scope of the research questions that will be explored with the data. Both the anonymization technique and the NDA are intended to safeguard users’ privacy.
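The substitution step described above can be sketched as a pseudonymiser that drops names and replaces each phone number with a random token, reusing the same token every time the number reappears so that the call network stays intact. The class name, the record layout and the 64-bit token length are assumptions for illustration. Note that replacing identifiers alone does not rule out re-identification from behavioural patterns, which is one reason the NDA matters.

```python
import secrets
from typing import Dict, List, Set, Tuple

class Pseudonymiser:
    """Consistently replaces phone numbers with randomly generated tokens.

    Assumption: the lookup table stays with the MNO; only tokenised records are shared.
    """

    def __init__(self) -> None:
        self._table: Dict[str, str] = {}
        self._used: Set[str] = set()

    def token(self, phone_number: str) -> str:
        if phone_number not in self._table:
            candidate = secrets.token_hex(8)  # 64 random bits as 16 hex characters
            while candidate in self._used:    # regenerate on the (unlikely) collision
                candidate = secrets.token_hex(8)
            self._used.add(candidate)
            self._table[phone_number] = candidate
        return self._table[phone_number]

    def anonymise(self, records: List[Tuple[str, str, str, int]]) -> List[Tuple[str, str, int]]:
        """Input rows: (caller number, callee number, subscriber name, duration in s).

        The name is dropped and both numbers are replaced by tokens."""
        return [(self.token(a), self.token(b), d) for a, b, _name, d in records]

p = Pseudonymiser()
rows = [("+32470000001", "+32470000002", "Alice", 60),
        ("+32470000002", "+32470000001", "Bob", 30)]
shared = p.anonymise(rows)
print(shared)  # tokens differ on every run
```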

Conclusion

Compared to the traditional data collected to compute official statistics, mobile phone data are cost-effective and can deliver earlier, or even near real-time, insights. They might also be used to test ideas and define future research questions. Credit-scoring agencies and creditors continually test and assess new credit-scoring models. The availability of “big data” could create opportunities for creditors who want to prospect for customers, open new accounts, manage customers and grow profits. It is already clear that the mobile phone data used in this study score high in terms of ‘Volume’, ‘Velocity’, ‘Veracity’ and ‘Variety’. Analysis of the data and the resulting well-performing models show that it also has a positive effect on financial inclusion and on model profit, and as such also delivers ‘Value’: the fifth V of Big Data!
