Data Science

Guidelines/Rubrics for Delivering the Course Paper

Three deliverables

Deliverable 2, 15 points


· The deliverable should contain the following components:

(4) Pre-processing activities, Feature Selection / Engineering (10 %)

· What were the most important features?

· We suggest you provide:

· a variable importance plot (an example about halfway down the page), showing the 10-20 most important features, and

· partial dependence plots for the 3-5 most important features

· If this is not possible, you should provide a list of the most important features.

· How did you select features?

· Did you make any important feature transformations?

· Did you find any interesting interactions between features?

· Did you use external data? (if permitted)
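
One possible way to produce the numbers behind a variable importance plot and a partial plot is sketched below. It is only an illustration under stated assumptions: it uses permutation importance (one of several importance measures, not one this rubric prescribes), mean squared error as the metric, and a hypothetical toy_predict function standing in for your fitted model. All function and variable names are invented for the example.

```python
import random


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in MSE when each feature column is shuffled (larger = more important)."""
    rng = random.Random(seed)
    baseline = mean_squared_error(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j replaced by its shuffled values.
            X_shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mean_squared_error(y, predict(X_shuffled)) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def partial_dependence(predict, X, feature, grid):
    """Average prediction when `feature` is forced to each value in `grid`."""
    curve = []
    for value in grid:
        X_fixed = [row[:feature] + [value] + row[feature + 1:] for row in X]
        preds = predict(X_fixed)
        curve.append(sum(preds) / len(preds))
    return curve


# Hypothetical stand-in for a fitted model: uses features 0 and 1, ignores feature 2.
def toy_predict(X):
    return [3.0 * row[0] + 0.5 * row[1] for row in X]


data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random(), data_rng.random()] for _ in range(200)]
y = toy_predict(X)

importances = permutation_importance(toy_predict, X, y)
# Feature indices ordered from most to least important.
ranking = sorted(range(len(importances)), key=lambda j: importances[j], reverse=True)
pd_curve = partial_dependence(toy_predict, X, feature=0, grid=[0.0, 0.5, 1.0])
```

The importances list can be drawn as a bar chart for the top 10-20 features, and each pd_curve plotted against its grid gives one partial plot. If your modelling library already reports importances, those values can be used in place of this sketch.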

(5) Training Method(s) (10 %)

· What training methods did you use?

· Did you ensemble the models?

· If you did ensemble, how did you weight the different models?
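
As a rough illustration of what "weighting the different models" can mean, the sketch below blends two models' predictions with a weighted average and picks the weight by a small grid search on validation data. This is only one simple option, not the method the rubric requires. The metric (RMSE), the function names, and the sample numbers are all assumptions made for the example.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between targets and predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def blend(predictions, weights):
    """Weighted average of per-model prediction lists; weights are normalized by their sum."""
    total = sum(weights)
    return [
        sum(w * p for w, p in zip(weights, row)) / total
        for row in zip(*predictions)
    ]


def best_two_model_weight(y_val, pred_a, pred_b, steps=20):
    """Grid-search the weight on model A (model B gets 1 - w) that minimizes validation RMSE."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda w: rmse(y_val, blend([pred_a, pred_b], [w, 1 - w])))


# Hypothetical validation targets and predictions: model A runs high, model B runs low.
y_val = [1.0, 2.0, 3.0, 4.0]
pred_a = [1.2, 2.2, 3.2, 4.2]
pred_b = [0.8, 1.8, 2.8, 3.8]

best_w = best_two_model_weight(y_val, pred_a, pred_b)
blended = blend([pred_a, pred_b], [best_w, 1 - best_w])
```

When reporting ensemble weights, it helps to state which data the weights were tuned on, since weights tuned on the training set tend to favour the most overfit model.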

A6. Interesting findings

· What was the most important trick you used?

· What do you think set you apart from others in the competition?

· Did you find any interesting relationships in the data that don’t fit in the sections above?

Many customers are happy to trade off model performance for simplicity. With this in mind:

· Is there a subset of features that would get 90-95% of your final performance? Which features? *

· Which model was most important? *

· What would the simplified model score?


· * Try to restrict your simple model to fewer than 10 features and one training method.
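
The 90-95% question above comes down to simple arithmetic, sketched here. The feature names, importance values, and scores are hypothetical placeholders, and the ratio only makes sense for a metric where higher is better. In practice you would retrain one model on the selected features and substitute your own validation scores.

```python
def top_k_features(importances, k=9):
    """Names of the k features with the largest importance scores."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    return ranked[:k]


def performance_retained(full_score, simple_score):
    """Fraction of the full model's score kept by the simplified model (higher-is-better metric)."""
    return simple_score / full_score


# Hypothetical importance scores, for illustration only.
importances = {"age": 0.40, "income": 0.30, "tenure": 0.20, "region": 0.07, "browser": 0.03}
selected = top_k_features(importances, k=3)

# Hypothetical validation scores of the full model and of the model retrained on `selected`.
full_score, simple_score = 0.80, 0.74
retained = performance_retained(full_score, simple_score)
meets_target = retained >= 0.90
print(f"features: {selected}, retained: {retained:.1%}, meets 90% target: {meets_target}")
```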

(6) Accuracy metrics reporting, charts, Model Execution Time (10 %)

Many customers care about how long the winning models take to train and generate predictions:

· How long does it take to train your model?

· How long does it take to generate predictions using your model?

· How long does it take to train the simplified model (referenced in section A6)?

· How long does it take to generate predictions from the simplified model?
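
A minimal way to collect the timings asked for above is to wrap each call with a wall-clock timer, as in this sketch. The train and predict functions are hypothetical stand-ins for your own routines; only time.perf_counter is a real standard-library call.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Hypothetical stand-ins for a real training and prediction routine.
def train(X, y):
    # The "model" here is just the mean of the targets.
    return sum(y) / len(y)


def predict(model, X):
    return [model for _ in X]


X = [[float(i)] for i in range(10_000)]
y = [float(i % 7) for i in range(10_000)]

model, train_seconds = timed(train, X, y)
preds, predict_seconds = timed(predict, model, X)
print(f"train: {train_seconds:.4f} s, predict: {predict_seconds:.4f} s")
```

Timings depend on hardware and dataset size, so it is worth reporting both alongside the numbers, and repeating the measurement for the simplified model.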

