The midterm

The idea is for you to explore using logistic regression, support vector machines, decision trees, and random forests.

I want you to explore at least 4 ways of running each one. As examples:

With logistic regression, there are quite a few parameters:

  • L2 penalty (or none). When using an L2 penalty, what is the right value for the C coefficient?
  • type of algorithm/solver to use
  • how to handle multi-class problems (number of classes > 2)
  • preprocessing: do you need to center/scale the data beforehand?

    • My guess is no: all the features are on the same scale, but it should be verified
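
As a starting point, here is a minimal sketch of how these logistic regression variants could be compared. It assumes the iris dataset as a stand-in for the midterm data, 5-fold cross-validation, and illustrative values for C; the names `logreg_variants` and `logreg_scores` are made up for this example. The unpenalized case is left out because the spelling of that argument differs between sklearn versions, so check the documentation for the version you have installed.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: iris is only a placeholder; swap in the midterm data.
X, y = load_iris(return_X_y=True)

# Each entry is one "way of running" logistic regression.
logreg_variants = {
    # Default L2 penalty with two different strengths (smaller C = stronger penalty).
    "L2, C=1.0, lbfgs": LogisticRegression(C=1.0, solver="lbfgs", max_iter=1000),
    "L2, C=0.01, lbfgs": LogisticRegression(C=0.01, solver="lbfgs", max_iter=1000),
    # Same model, different solver.
    "L2, C=1.0, newton-cg": LogisticRegression(C=1.0, solver="newton-cg", max_iter=1000),
    # Explicit one-vs-rest handling of the multi-class problem.
    "one-vs-rest wrapper": OneVsRestClassifier(LogisticRegression(max_iter=1000)),
    # Center/scale the features first to verify whether it matters.
    "scaled, L2, C=1.0": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}

logreg_scores = {}
for name, model in logreg_variants.items():
    scores = cross_val_score(model, X, y, cv=5)  # accuracy on each of 5 folds
    logreg_scores[name] = scores.mean()
    print(f"{name:25s} mean accuracy = {scores.mean():.3f}")
```

Comparing the scaled and unscaled rows is one way to check the guess about preprocessing.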

And with Support Vector Machines, explore:

  • Different multi-class parameters
  • LinearSVC vs SVC
  • For SVC, different kernels
  • Different margins (in sklearn, the C parameter controls how soft the margin is)
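
The SVM options above could be lined up in a similar way. This is only a sketch under assumptions: iris again stands in for the real data, the kernels and C values are illustrative picks, and `svm_variants` and `svm_scores` are names invented for the example. Note that `SVC` trains one-vs-one internally for multi-class problems, so the one-vs-rest case is shown here with the `OneVsRestClassifier` wrapper.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

# Assumption: iris is only a placeholder; swap in the midterm data.
X, y = load_iris(return_X_y=True)

svm_variants = {
    # LinearSVC vs SVC with a linear kernel (different underlying solvers).
    "LinearSVC": make_pipeline(StandardScaler(), LinearSVC(C=1.0, max_iter=10000)),
    "SVC, linear kernel": SVC(kernel="linear", C=1.0),
    # Different kernels for SVC.
    "SVC, rbf kernel": SVC(kernel="rbf", C=1.0),
    "SVC, poly kernel (degree 3)": SVC(kernel="poly", degree=3, C=1.0),
    # Smaller C tolerates more margin violations (softer margin).
    "SVC, rbf kernel, C=0.1": SVC(kernel="rbf", C=0.1),
    # Explicit one-vs-rest instead of SVC's built-in one-vs-one.
    "SVC, rbf kernel, one-vs-rest": OneVsRestClassifier(SVC(kernel="rbf", C=1.0)),
}

svm_scores = {}
for name, model in svm_variants.items():
    scores = cross_val_score(model, X, y, cv=5)  # accuracy on each of 5 folds
    svm_scores[name] = scores.mean()
    print(f"{name:30s} mean accuracy = {scores.mean():.3f}")
```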

With classification trees, sklearn has two types:

  • Decision Trees
  • Extra Tree Classification (I have never used this one)
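
For the two tree types, a comparison might look like the sketch below. The assumptions are the same as before: iris is a placeholder dataset, the split criteria and depth limits are illustrative choices rather than recommended settings, and `tree_variants` and `tree_scores` are hypothetical names. `random_state` is fixed so that repeated runs give the same trees.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier, ExtraTreeClassifier

# Assumption: iris is only a placeholder; swap in the midterm data.
X, y = load_iris(return_X_y=True)

tree_variants = {
    # Two split-quality criteria for an otherwise unconstrained tree.
    "DecisionTree, gini": DecisionTreeClassifier(criterion="gini", random_state=0),
    "DecisionTree, entropy": DecisionTreeClassifier(criterion="entropy", random_state=0),
    # Two ways of limiting tree growth to reduce overfitting.
    "DecisionTree, max_depth=3": DecisionTreeClassifier(max_depth=3, random_state=0),
    "DecisionTree, min_samples_leaf=5": DecisionTreeClassifier(min_samples_leaf=5, random_state=0),
    # Extremely randomized single tree.
    "ExtraTree": ExtraTreeClassifier(random_state=0),
}

tree_scores = {}
for name, model in tree_variants.items():
    scores = cross_val_score(model, X, y, cv=5)  # accuracy on each of 5 folds
    tree_scores[name] = scores.mean()
    print(f"{name:35s} mean accuracy = {scores.mean():.3f}")
```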

And there are random forests.
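
One possible set of random forest runs is sketched below. As with any such example, the specifics are assumptions: iris is a placeholder dataset, the parameter values are arbitrary illustrative picks, and `forest_variants` and `forest_scores` are names invented here.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Assumption: iris is only a placeholder; swap in the midterm data.
X, y = load_iris(return_X_y=True)

forest_variants = {
    # Number of trees in the forest.
    "n_estimators=10": RandomForestClassifier(n_estimators=10, random_state=0),
    "n_estimators=100": RandomForestClassifier(n_estimators=100, random_state=0),
    # Shallow trees vs fully grown trees.
    "max_depth=3": RandomForestClassifier(max_depth=3, random_state=0),
    # Consider all features at every split instead of a random subset.
    "max_features=None": RandomForestClassifier(max_features=None, random_state=0),
    # Require larger leaves.
    "min_samples_leaf=5": RandomForestClassifier(min_samples_leaf=5, random_state=0),
}

forest_scores = {}
for name, model in forest_variants.items():
    scores = cross_val_score(model, X, y, cv=5)  # accuracy on each of 5 folds
    forest_scores[name] = scores.mean()
    print(f"{name:20s} mean accuracy = {scores.mean():.3f}")
```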

Read the documentation, select the 4+ ways you want to explore each of these 4 classifiers, AND WRITE UP NOTES IN MARKDOWN CELLS. Your interpretation and conclusions are really important.