Datalib provides a simple API for basic data analysis and manipulation. It can load data from CSV, TSV, DSV, JSON, TopoJSON and TreeJSON, among others. For this post I want to load data from a CSV and compare the three-point shooting of Steph Curry, Kyle Korver and Lou Williams. I chose these three because Lou Williams is seen as a high-variability player (either he’s hot or he’s not), while Korver and Curry are seen as consistent.
First we need to load the data and group it by player; then we can calculate a few statistics. Doing this in Datalib is mostly simple, aside from a disregard for camel case :(.
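To make the group-then-summarize step concrete, here is a minimal sketch in plain JavaScript with no dependencies. It is not Datalib's API, just the same computation written out by hand. The rows, the `player` and `pct` field names, and the `summarizeBy` helper are all hypothetical, and the numbers are made up for illustration rather than taken from real box scores. I am also assuming sample variance (dividing by n - 1).

```javascript
// Made-up rows for illustration only: one row per game, with a
// hypothetical three-point percentage. These are NOT real stats.
const rows = [
  { player: "A", pct: 0.5 },
  { player: "A", pct: 0.3 },
  { player: "A", pct: 0.4 },
  { player: "B", pct: 0.2 },
  { player: "B", pct: 0.4 },
];

// Hypothetical helper: group rows by `key`, then compute count,
// mean, and sample variance of `field` within each group.
function summarizeBy(rows, key, field) {
  const groups = new Map();
  for (const row of rows) {
    if (!groups.has(row[key])) groups.set(row[key], []);
    groups.get(row[key]).push(row[field]);
  }
  return [...groups].map(([name, values]) => {
    const n = values.length;
    const mean = values.reduce((a, b) => a + b, 0) / n;
    // Sum of squared deviations from the group mean.
    const ss = values.reduce((a, v) => a + (v - mean) ** 2, 0);
    // Sample variance (n - 1); defined as 0 for a single observation.
    const variance = n > 1 ? ss / (n - 1) : 0;
    return { [key]: name, count: n, mean, variance };
  });
}

const summary = summarizeBy(rows, "player", "pct");
console.table(summary);
```

A library grouping call should give you the same shape of result: one row per player, with the summary statistics as columns.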
On first pass I really like the .table() method; it reminds me of the features dataframe APIs offer. It takes JSON and puts it in a nice summary for you, without you having to do anything nifty with lodash. From here you can visualize the data pretty easily, or take it and dig deeper.
About the data
Interestingly enough, Lou Williams doesn’t have the highest variance among the three; he is just not as good at making threes. Curry and Korver shoot a much higher percentage, and even with their higher variance they are in a different world than Williams shooting-wise.
I’ve got some lofty plans for Datalib. Next up I want to probe the limitations of the package by loading some huge JSON and CSV files and running some summaries on them.