Why isn’t out-of-time validation more ubiquitous?

Train, validate and test partitions for out-of-time performance take planning and thought. (This piece is also at TDS.) The purpose of supervised machine learning is to classify unlabeled data. We want algorithms to tell us whether a borrower will default, a customer...

Building an interactive web “mapp” with Shiny

The purpose of this post is to discuss the key elements in developing an interactive web application that displays data with a geographic component. I discuss developing an app using Shiny – a powerful R package. I briefly compare that process to building a...

What drives the length of whiskers in a box plot?

Let’s consider a small data set with 12 observations sorted from lowest to highest: (1,1,4),(4,5,8),(8,9,10),(10,12,13).  I grouped the observations into four equal groups so that we can easily spot the quartiles. (I purposefully made the numbers at the border...

Programming Empirical Analysis from Beginning to End

I created a project that illustrates the use of programming code to perform empirical analysis from beginning to end: from database retrieval, through cleaning, manipulating and analyzing the data, to compiling the write-up and displaying the results. It consists of...

Review of Stata’s dyndoc

As a huge fan of Stata, I was super-excited about the dynamic markdown documents newly available in the latest Stata 15 release. I have played with the feature for the last few days, and can report that I was able to produce a decent-looking markdown document using Stata....

Shiny App for Exploring Colleges

I created my first Shiny app. It visualizes data from IPEDS and College Scorecard. For now it is mostly for my soon-to-be college-bound children and their friends to play with, but I can see that Shiny is a very flexible and powerful tool. Let me know what you think. You can fork the...