This is a graded project from the course 'Make effective data visualization', part of the Data Analyst Nanodegree. The aim of the project is to select a dataset, either one provided by Udacity or your own, and create a web visualization that tells a story found in the data. I chose the dataset of loans provided by Prosper, a US lending marketplace. The CSV file and a file describing every column were compiled by Udacity and are available for download from its site.

The dataset is very large: it has almost 140,000 rows and 81 variables. Its columns describe the loan and the borrower, e.g. demographic data of the borrower, loan interest rate, loan size, and various loan classifications.

With a dataset this large, many different narratives are possible. I decided to focus on loan interest rates by US state and their relation to loan quality.


Let's start by exploring how interest rates vary across the states. I prepared a separate dataset with the median interest rate by state. It is also interesting to see how much interest rates vary within each state, so I included the interquartile range as well.
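As a rough illustration of how such a per-state summary could be computed, here is a minimal Python sketch using only the standard library. The function name, the (state, rate) pair layout, and the sample values are assumptions made for this example; they are not the code or data behind the plot. Quartile conventions also differ between tools, so exact values may not match another tool's output.

```python
from collections import defaultdict
from statistics import median, quantiles


def rate_summary_by_state(loans):
    """Return {state: (first quartile, median, third quartile)}.

    `loans` is an iterable of (state, interest_rate) pairs. This layout is
    an assumption for the sketch, not the structure of the original CSV.
    """
    rates = defaultdict(list)
    for state, rate in loans:
        rates[state].append(rate)

    summary = {}
    for state, values in rates.items():
        # quantiles() needs at least two values per state on most
        # Python versions; "inclusive" treats the data as the full population.
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        summary[state] = (q1, median(values), q3)
    return summary


# Made-up sample rows, for illustration only.
sample = [
    ("CA", 0.10), ("CA", 0.15), ("CA", 0.20), ("CA", 0.25), ("CA", 0.30),
    ("TX", 0.12), ("TX", 0.18), ("TX", 0.24),
]
summary = rate_summary_by_state(sample)
for state, (q1, med, q3) in sorted(summary.items()):
    print(f"{state}: Q1={q1:.3f} median={med:.3f} Q3={q3:.3f}")
```

The interquartile range for a state is then simply the third quartile minus the first.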

Interest rates vary only narrowly across US states. The lower quartile and the median interest rate each stay within a range of 5 percentage points across all 51 states. The upper quartile varies more, between 19% and 28%.

Range of interest rates by state

The dot is the median interest rate in the state.

The line shows the interquartile range of the interest rate. The left end is the first quartile, the right end is the third.


Let's visualize interest rate and loan quality by state next. I used the variable LoanStatus as a measure of loan quality.

I put the median interest rate on a US map and colored each state according to its median interest rate. Hovering over a state generates a bar chart with the share of loans by loan status.

Our previous plot suggested low variation of interest rates, and this map confirms it. Most states are in the middle buckets; only Maine and Iowa fall in the lowest bucket. Southern states have higher interest rates, and the other states with high interest rates are scattered across the country.

States with higher interest rates appear to have higher shares of bad loan statuses. To see whether this is really true, let's move to the final plot.

Average interest rate in US states

Darker color → larger interest rate.

When you hover over a state, a tooltip appears.

The tooltip shows the share of loans by loan status. Bad statuses are highlighted in orange, good ones in blue.
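The per-state shares shown in such a tooltip could be derived along the following lines. This is only a sketch under stated assumptions: the function name, the (state, status) pair layout, and the sample rows are hypothetical and are not taken from the code behind the map.

```python
from collections import Counter, defaultdict


def status_shares_by_state(loans):
    """Return {state: {status: share of that state's loans}}.

    `loans` is an iterable of (state, loan_status) pairs, an assumed
    layout chosen for this sketch.
    """
    counts = defaultdict(Counter)
    for state, status in loans:
        counts[state][status] += 1

    shares = {}
    for state, counter in counts.items():
        total = sum(counter.values())
        shares[state] = {status: n / total for status, n in counter.items()}
    return shares


# Made-up sample rows, for illustration only.
sample = [
    ("ME", "Completed"), ("ME", "Completed"), ("ME", "Current"),
    ("ME", "Chargedoff"), ("IA", "Current"), ("IA", "Defaulted"),
]
shares = status_shares_by_state(sample)
for state, by_status in sorted(shares.items()):
    print(state, by_status)
```

Within each state the shares sum to 1, so they can be drawn directly as the bars of one chart.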


This is perhaps the most interesting plot in this report. The interest rate does increase with the share of bad loans. However, there are two distinct outlier states: Maine and Iowa. They have by far the largest shares of bad loans, above 36% each, while their median interest rates are the lowest, around 15%.

Interest rate and share of bad loans
in US states

Bad loans are loans with one of these statuses:

  • Cancelled
  • Chargedoff
  • Defaulted
  • Past Due
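The classification above can be sketched as a small helper that computes the share of bad loans from a list of statuses. The names below are hypothetical, and matching by prefix is an assumption made in case past-due statuses carry a suffix such as a day range.

```python
# Statuses treated as "bad", taken from the list above.
BAD_STATUS_PREFIXES = ("Cancelled", "Chargedoff", "Defaulted", "Past Due")


def is_bad(status):
    """True if the status starts with one of the bad-status names.

    Prefix matching is an assumption of this sketch, so that a value like
    "Past Due (1-15 days)" would also count as bad.
    """
    return status.startswith(BAD_STATUS_PREFIXES)


def bad_loan_share(statuses):
    """Return the fraction of loans whose status counts as bad."""
    statuses = list(statuses)
    if not statuses:
        return 0.0
    return sum(is_bad(s) for s in statuses) / len(statuses)


# Made-up statuses, for illustration only.
example = ["Completed", "Current", "Defaulted", "Past Due (1-15 days)"]
share = bad_loan_share(example)
print(f"share of bad loans: {share:.0%}")
```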

Hover over a dot to see a tooltip with the exact values of the average loan interest rate and the share of bad loans.

The purple line is a regression line fitted to all data points except the outliers, Maine and Iowa.

The R² coefficient of this line is 0.3222. The correlation coefficient is 0.568 for the points without the outliers, and -0.078 if the outlier points are included in the dataset.
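For a straight-line fit with one predictor, R² equals the square of the correlation coefficient, which is consistent with the figures above (0.568² ≈ 0.32). The sketch below shows, on made-up points, how a single outlier can flip the sign of the correlation and how a fit excluding outliers could be computed. All names, state codes, and values are hypothetical; this is not the code that produced the plot.

```python
from math import sqrt


def _centered_sums(xs, ys):
    """Return (covariance sum, x variance sum, y variance sum, mean x, mean y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov, var_x, var_y, mean_x, mean_y


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    cov, var_x, var_y, _, _ = _centered_sums(xs, ys)
    return cov / sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Ordinary least squares fit; returns (slope, intercept)."""
    cov, var_x, _, mean_x, mean_y = _centered_sums(xs, ys)
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """1 - residual sum of squares / total sum of squares."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical points: state -> (share of bad loans, median interest rate).
points = {
    "AA": (0.10, 0.16), "BB": (0.15, 0.18), "CC": (0.20, 0.19),
    "DD": (0.25, 0.22), "XX": (0.40, 0.15),  # "XX" plays the outlier
}
outliers = {"XX"}

all_x, all_y = zip(*points.values())
kept = [v for k, v in points.items() if k not in outliers]
kept_x, kept_y = zip(*kept)

r_all = pearson_r(all_x, all_y)
r_kept = pearson_r(kept_x, kept_y)
slope, intercept = fit_line(kept_x, kept_y)
r2_kept = r_squared(kept_x, kept_y, slope, intercept)
print(f"r with outlier: {r_all:.3f}, r without: {r_kept:.3f}, R² without: {r2_kept:.3f}")
```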

Note that the y-axis starts at 14%, not at 0%.

Conclusion

  • Interest rates vary little across the US states
  • Southern states have the highest median interest rates
  • The share of bad loans correlates positively with the interest rate in a state (correlation coefficient 0.568 with the outliers excluded)
  • Two outlier states combine a high share of bad loans with a low interest rate: Maine and Iowa