Feature Engineering lays the foundation for successful insights and modeling.

Feature Engineering normally describes the creation of new features (variables) from existing features of a dataset in order to derive new insights or to improve model performance. However, in the scope of this article, I am broadening the definition to include the major preprocessing steps that are necessary prior to modeling. These include, but are not limited to, the following:

  1. Missing Data Imputation
  2. High Cardinality and Rare Labels
  3. Outliers
  4. Variable Transformation (Normalizing, Scaling and Encoding)

Each of the above areas of feature engineering can be handled in many different ways…
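As a rough illustration of the four areas listed above, here is a minimal sketch that uses only the Python standard library. The toy `ages` and `cities` data, the 20% rarity threshold, the median fill, the 1.5 × IQR capping rule, and the min-max scaling are all assumptions chosen for the example. They are not taken from the article, and each step has many reasonable alternatives.

```python
from collections import Counter
from statistics import median, quantiles

# Hypothetical toy data (not from the article); None marks a missing value.
ages = [22, 25, None, 31, 28, 95, 27, None, 24, 30]
cities = ["Austin", "Austin", "Boston", "Austin", "Boston",
          "Reno", "Austin", "Boston", "Tulsa", "Austin"]

# 1. Missing data imputation: fill gaps with the median of the observed values.
observed = [a for a in ages if a is not None]
fill_value = median(observed)
ages_imputed = [fill_value if a is None else a for a in ages]

# 2. Rare labels: group categories seen in fewer than 20% of rows
#    (an assumed threshold) under a single "Other" label.
counts = Counter(cities)
min_count = 0.2 * len(cities)
cities_grouped = [c if counts[c] >= min_count else "Other" for c in cities]

# 3. Outliers: cap values that fall outside 1.5 * IQR of the quartiles.
q1, _, q3 = quantiles(ages_imputed, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
ages_capped = [min(max(a, lower), upper) for a in ages_imputed]

# 4. Variable transformation: min-max scale the numeric column to [0, 1]
#    and one-hot encode the grouped categorical column.
a_min, a_max = min(ages_capped), max(ages_capped)
ages_scaled = [(a - a_min) / (a_max - a_min) for a in ages_capped]
labels = sorted(set(cities_grouped))
cities_encoded = [[int(c == label) for label in labels] for c in cities_grouped]

print(labels)
print(cities_encoded[0], round(ages_scaled[0], 3))
```

The order matters in this sketch: imputing before capping means the filled values take part in the quartile calculation, and grouping rare labels before encoding keeps the number of one-hot columns small.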


Statistics is a powerful tool for understanding and interpreting large amounts of data. It can help drive sound decision making, but it can also be misused. One of the most important realizations about statistics is that, unlike other areas of math, it cannot be used to derive an exact answer. Statistics does have single-number measures of the center of the data, such as the mean and median, but these do not thoroughly describe the data. A single number used to quantify a large dataset can be extremely misleading. That is why statistics, instead, focuses on providing a holistic answer using…
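To make the point about single measures of center concrete, the sketch below compares two small, made-up datasets that share the same mean and median yet look nothing alike. The names `steady` and `erratic` and their values are hypothetical, and the population standard deviation and range are just two of several possible measures of spread.

```python
from statistics import mean, median, pstdev

# Two hypothetical datasets with the same center but very different spread.
steady = [48, 49, 50, 51, 52]
erratic = [0, 25, 50, 75, 100]

for name, data in (("steady", steady), ("erratic", erratic)):
    print(
        f"{name}: mean={mean(data)}, median={median(data)}, "
        f"std={pstdev(data):.2f}, range=({min(data)}, {max(data)})"
    )
```

Both datasets report a mean and median of 50, so only the spread measures reveal how different they are.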

A couple of years ago, I was given the opportunity to lead a large data analytics project that used machine learning and natural language processing to gain business insights into our maintenance strategies for more than 2 million pieces of equipment worldwide.

The ask was simple: figure out whether or not we were performing the right maintenance on the right equipment at the right time. And if we weren’t, use the 10+ years of maintenance, spend and reliability data to improve the maintenance strategies. Sounds pretty straightforward, right?

The prize was significant: potentially tens of millions of dollars in savings…

Brian Bentson

