Feature Engineering: What Actually Drives Model Performance

There is a persistent imbalance in how data science education frames the machine learning pipeline: considerable attention goes to model selection, hyperparameter tuning, and evaluation metrics, while the work that arguably matters most (transforming raw data into informative representations) receives comparatively little formal treatment. Feature engineering sits at the heart of predictive modeling, yet it remains as much craft as science.

At its core, feature engineering is the process of constructing new input variables from existing data in order to improve the performance of a learning algorithm. As Heaton (2016) explains, these engineered features "are essentially calculated fields based on the values of the other features" (p. 1), taking forms such as ratios, differences, logarithms, polynomial terms, or domain-specific composite measures. The familiar body mass index (BMI) is a classic real-world example: it is not measured directly but derived from weight and height (weight in kilograms divided by the square of height in meters), and it can carry more signal for health outcomes than either variable considered alone.
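As a minimal sketch of such calculated fields, the Python snippet below derives BMI, a ratio, and a logarithmic term from a single record. The `engineer_features` helper, the field names, and the income and debt values are hypothetical and chosen only for illustration; none of them come from Heaton (2016).

```python
import math


def engineer_features(record):
    """Return a copy of `record` with a few calculated fields added."""
    out = dict(record)
    # Composite measure: BMI = weight (kg) / height (m) squared.
    out["bmi"] = record["weight_kg"] / record["height_m"] ** 2
    # Ratio feature: debt relative to income (hypothetical fields).
    out["debt_to_income"] = record["debt"] / record["income"]
    # Logarithmic feature: log(1 + income) compresses a skewed scale.
    out["log_income"] = math.log1p(record["income"])
    return out


# Invented example record, not real data.
person = {"height_m": 1.75, "weight_kg": 70.0, "income": 50000.0, "debt": 10000.0}
features = engineer_features(person)
print(round(features["bmi"], 2))  # 22.86
```

Each new field is a deterministic function of columns already present, which is what distinguishes an engineered feature from newly collected data.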

The practical stakes of this work are significant. Chicco (2022) underscores that "data cleaning and feature engineering are key pillars of any scientific study involving data analysis" (para. 1) and cites Domingos's foundational observation that "easily the most important factor is the features used" (Introduction). This aligns with a well-known industry heuristic that Felice et al. (2023) also cite: "students in data science are usually taught that 80% of the workload on an ML project is about preparing the data, while the remaining 20% are concerned with the actual choice of ML model" (p. 1). The implication is that practitioners who fixate on model architecture while underinvesting in feature construction may be optimizing the wrong part of the pipeline.

One dimension that often goes underappreciated is the interaction between feature types and model architectures. Heaton's (2016) empirical work demonstrates that not all models benefit equally from the same engineered features: gradient boosting machines, for instance, can learn certain mathematical relationships (like polynomial transformations) internally, making manual engineering of those features redundant. As he notes, "if a model can synthesize a planned feature, it is not necessary to provide that feature" (p. 1). This has real implications for feature design: the decision of what to engineer should be informed by which model will consume the result.
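One case where this redundancy holds by construction, offered here as an illustration rather than a reproduction of Heaton's experiments, is a strictly monotonic transform fed to a threshold-based split. Splits of this kind are the building block of tree ensembles and depend only on the ordering of a feature's values, so adding a log-transformed copy gives the split nothing new. The sketch below assumes a toy one-feature dataset and a hypothetical `best_stump_partition` helper.

```python
import math


def minority_count(rows, labels):
    """Rows misclassified if this side predicts its majority class."""
    positives = sum(labels[i] for i in rows)
    return min(positives, len(rows) - positives)


def best_stump_partition(values, labels):
    """Return the row indices sent left by the best single-threshold split.

    Assumes all feature values are distinct.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    best_errors, best_left = None, None
    for cut in range(1, len(order)):
        left, right = order[:cut], order[cut:]
        errors = minority_count(left, labels) + minority_count(right, labels)
        if best_errors is None or errors < best_errors:
            best_errors, best_left = errors, frozenset(left)
    return best_left


# Invented toy data: one raw feature and binary labels.
x = [1.0, 3.0, 10.0, 30.0, 100.0, 300.0]
y = [0, 0, 0, 1, 1, 1]

raw_split = best_stump_partition(x, y)
log_split = best_stump_partition([math.log(v) for v in x], y)
print(raw_split == log_split)  # True: the log feature changes nothing
```

The same argument does not extend to features that combine several columns, such as ratios, which is one reason the choice of engineered features depends on the model that will consume them.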

Felice et al. (2023) push this thinking further by introducing a formal framework, Statistically Enhanced Learning (SEL), that categorizes feature engineering into three levels of increasing abstraction:

SEL Level	Type	Example
SEL 1	Proxy variables	Household consumption as a stand-in for standard of living
SEL 2	Descriptive statistics	Average player age as a team maturity proxy
SEL 3	Advanced modeling features	Exponentially weighted moving averages of wind speed for energy forecasting
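To make the second and third rows of the table concrete, here is a minimal sketch of a descriptive-statistic feature and an exponentially weighted moving average (EWMA). The player ages and wind speeds are invented, and the recursive EWMA form with `alpha = 0.5` is an assumption: the table does not say which weighting scheme Felice et al. (2023) use.

```python
from statistics import mean


def ewma(values, alpha):
    """Recursive EWMA: s_0 = x_0, s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    smoothed = []
    for v in values:
        if not smoothed:
            smoothed.append(v)  # seed the average with the first observation
        else:
            smoothed.append(alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed


# SEL 2 style: a descriptive statistic summarising a group (hypothetical ages).
player_ages = [24, 31, 27, 22]
team_mean_age = mean(player_ages)

# SEL 3 style: a smoothed signal built from a raw series (hypothetical readings).
wind_speed = [4.0, 6.0, 5.0, 9.0]
wind_ewma = ewma(wind_speed, alpha=0.5)

print(team_mean_age)  # 26
print(wind_ewma)      # [4.0, 5.0, 5.0, 7.0]
```

A larger `alpha` makes the average track recent observations more closely, while a smaller one retains more history; choosing it is itself a modeling decision.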

What unites these levels is the goal of recovering latent signals: "SEL rather is a means to recover information from signals that cannot be detected" (Felice et al., 2023, p. 4). This framing recasts feature engineering as a principled inferential task rather than a data-wrangling chore. In SEL, the practitioner makes explicit modeling decisions about what information is missing from the raw feature space and how to reconstruct it.

Given how central this work is to model quality, it is worth asking: how should practitioners develop intuition for which features to engineer? Is the answer primarily domain knowledge, automated search (AutoML/AutoFE), or iterative experimentation? I'm curious whether domain expertise or exploratory data analysis has been more generative for you in practice, and whether you have experimented with automated feature engineering tools.

References

Chicco, D. (2022). Eleven quick tips for data cleaning and feature engineering. PLOS Computational Biology, 18(12), e1010718. https://doi.org/10.1371/journal.pcbi.1010718

Felice, F., Ley, C., Bordas, S., & Groll, A. (2023). Statistically Enhanced Learning: A feature engineering framework to boost (any) learning algorithms. arXiv. https://arxiv.org/pdf/2306.17006

Heaton, J. (2016). An empirical analysis of feature engineering for predictive modeling. arXiv. https://arxiv.org/pdf/1701.07852