Why isn’t out-of-time validation more ubiquitous?

Train, validate and test partitions for out-of-time performance take planning and thought (This piece is also at TDS.) The purpose of supervised machine learning is to classify unlabeled data. We want algorithms to tell us whether a borrower will default, a customer...

Building an interactive web “mapp” with Shiny

The purpose of this post is to discuss the key elements in developing an interactive web application that displays data with geographic component. I discuss developing an app using Shiny – a powerful R package. I briefly compare that process to building a...

What drives the length of whiskers in a box plot?

Let’s consider a small data set with 12 observations sorted from lowest to highest: (1,1,4),(4,5,8),(8,9,10),(10,12,13).  I grouped the observations into four equal groups so that we can easily spot the quartiles. (I purposefully made the numbers at the border...