Going down the Rabbit Hole

The entire point of transitioning my blog from a personal account of my life to my data science portfolio is so that I can practice my data science skills and document the journey. I was taught that there are four skills that make up the pillars for every data scientist.

  1. Statistics – Every data scientist needs to have a firm grasp of statistical methods to be able to perform data analysis effectively.
  2. Programming – Programming skills are integral to the practice of data science, especially when working with extremely large datasets as we frequently do.
  3. Visualization – A data scientist has to also tell the story locked within the data. Otherwise what is the point? Having an artistic eye is key.
  4. Subject Matter Expertise – I tend to take this pillar for granted. I am one of the leading SMEs in my industry, so when I am working on a problem at work, I assume that I understand the data I’m looking at and the work involved with the tools I build. I can put myself in the seat of someone new and ask myself, “What would have been good to know when I first started that would have made my life and professional journey easier?”

Having the ability to instinctively answer that last question is one reason this portfolio is so invaluable. Understanding the different types of data and the questions that arise in other fields expands a data professional’s confidence and skills. That brings me to the Rabbit Hole.

Rabbit Hole

I had this problem in real life. I had to visualize some data. I’m building a tool that takes input from an analyst, updates a database and then populates a dashboard for management. Sounds like a pretty typical data visualization pipeline. The problem is that most managers in my experience don’t understand how powerful an interactive dashboard can be. The ability to create this functionality is somewhat limited. So in the absence of real tools, PowerPoint reigns supreme. Not that I’m a PowerPoint snob; I was once dubbed the PowerPoint Princess. However, these days we can be so much more creative.

I wanted to create a dashboard that a user could interact with by ‘filtering’ certain parameters. I put filtering in quotes because handing over control is a tricky thing. I want my users to be able to play with the data so they can understand it, but I want it to be a guided experience, one they don’t realize has been curated for them. I was beating my head against the problem of how to limit the user’s options without them knowing it. So this is what I did…

Data Set

First, I wanted to create a dataset that simulated what I was working with in real life. The data needed to have categories, subcategories and data points. Each row of the dataset needed to be unique – one row for each data point. The information captured was a score from 0 to 5 for each individual feature. From there I wanted to work with two tables, one with the raw data as entered and one that would count the number of times a feature was scored 0 to 5. I’ll call this the bin table. Now this sounds like a pretty straightforward problem.
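
A minimal sketch of the two tables described above, in plain Python with only the standard library. Everything named here is an assumption made for illustration: the category, subcategory and feature labels, the `entry_id` field used to keep each row unique, and the helper names `make_raw_table` and `make_bin_table`. The scores are random, so this only mimics the shape of the data, not its content.

```python
import random
from collections import Counter

# Hypothetical labels, invented for illustration only.
STRUCTURE = {
    "Category A": {"Sub A1": ["Feature 1", "Feature 2"], "Sub A2": ["Feature 3"]},
    "Category B": {"Sub B1": ["Feature 4", "Feature 5"]},
}
SCORES = range(6)  # possible scores: 0, 1, 2, 3, 4, 5


def make_raw_table(n_entries, seed=0):
    """Build the raw table: one row per scored data point."""
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    rows = []
    for entry_id in range(n_entries):
        for category, subcategories in STRUCTURE.items():
            for subcategory, features in subcategories.items():
                for feature in features:
                    rows.append({
                        "entry_id": entry_id,  # assumed key that keeps rows unique
                        "category": category,
                        "subcategory": subcategory,
                        "feature": feature,
                        "score": rng.choice(SCORES),
                    })
    return rows


def make_bin_table(raw_rows):
    """Count how many times each feature received each score from 0 to 5."""
    counts = Counter((row["feature"], row["score"]) for row in raw_rows)
    features = sorted({row["feature"] for row in raw_rows})
    # One list per feature; position s holds the count for score s.
    return {feature: [counts[(feature, s)] for s in SCORES] for feature in features}


raw = make_raw_table(n_entries=10)
bins = make_bin_table(raw)
```

Here `raw` plays the role of the table as entered, and `bins` maps each feature to six counts, one per score. A missing score simply shows up as a zero in the bin table, because `Counter` returns 0 for keys it has never seen.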