What started as a casual dinner conversation between two very different researchers in 2016 – one a data scientist and engineer, the other an expert in economic models – has since turned into a journal article quantifying the effects of the “beauty premium,” the notion that those who are more physically attractive tend to have a greater income.
The research team’s engineer is Stephen Baek, an associate professor of data science at the University of Virginia, while the econometrician is Suyong Song, an associate professor of economics and finance at the University of Iowa. Five years ago, the two found that their research interests overlapped more than they had initially realized, sparking an unexpected idea.
Baek began his collaboration with Song as a researcher at Iowa before joining the UVA School of Data Science faculty in August 2021. In his previous work, Baek analyzed and modeled human body shapes for engineering applications such as product design, virtual fashion, garment design and ergonomics. Song, on the other hand, brought expertise in studying economic models that suffer from measurement and reporting error.
Compared to previous publications on the beauty premium, Baek and Song’s research methods are novel because of the nature of their data set, sourced from the 2002 Civilian American and European Surface Anthropometry Resource project, or CAESAR. In addition to self-reported height and weight measures – which have been used in previous studies – the project also gathered 3-D body-scanned data, extensive demographic and family income information, and tape-measure and caliper body measurements from nearly 2,400 civilians. With this data, the two researchers could tell a richer story about physical appearance and socio-economic variables.

UVA’s Stephen Baek, above, and Suyong Song of the University of Iowa have used novel data and techniques to provide a richer link between physical appearance and family income. (Contributed photo)
“The issue with previous works was that people were oversimplifying the parameters to describe body shape,” Baek said. “The traditional processes for determining physical appearance, such as stature, weight and BMI, are imperfect processes, and therefore not capable of capturing all the dimensions of human body shape.”
Baek and Song fed the 3-D scans into a novel machine-learning algorithm called a “graphical autoencoder,” a form of “deep machine learning,” to encode the geometric features of human body shape. After being trained on thousands of individual scans, the algorithm reduced the data’s dimensionality – from a few hundred thousand points down to a few important features – characterizing each human body shape with a small set of numerical values. Baek and Song then visualized the features to determine which body parts the algorithm was referencing and estimated their relationships with socio-economic variables. With this approach, the researchers could quantify the causal effects of physical appearance.
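For readers curious how an autoencoder compresses data, the idea can be sketched in a few lines of Python. This is only an illustration, not the researchers’ graphical autoencoder: it uses a much simpler linear autoencoder, and the “scans” are randomly generated stand-ins rather than CAESAR data. The function name, sizes and training settings are all assumptions chosen for the example.

```python
import numpy as np

# Synthetic stand-in for body-scan data (NOT the CAESAR data): 200 "scans",
# each described by 30 measurements that are driven by 3 hidden shape factors.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 30))
scans = hidden @ mixing + 0.1 * rng.normal(size=(200, 30))
scans = (scans - scans.mean(axis=0)) / scans.std(axis=0)  # standardize columns


def train_autoencoder(X, n_features, epochs=1000, lr=0.1, seed=1):
    """Train a linear autoencoder by gradient descent (hypothetical helper).

    The encoder squeezes each row of X down to `n_features` numbers and the
    decoder tries to rebuild the original row from them.
    """
    init = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = init.normal(scale=0.1, size=(d, n_features))
    W_dec = init.normal(scale=0.1, size=(n_features, d))
    losses = []
    for _ in range(epochs):
        Z = X @ W_enc            # encode: many measurements -> few features
        err = Z @ W_dec - X      # decode and compare with the input
        losses.append(float(np.mean(err ** 2)))
        # Gradients of the mean squared reconstruction error.
        grad_dec = (2.0 / (n * d)) * (Z.T @ err)
        grad_enc = (2.0 / (n * d)) * (X.T @ (err @ W_dec.T))
        W_enc -= lr * grad_enc
        W_dec -= lr * grad_dec
    return W_enc, W_dec, losses


W_enc, W_dec, losses = train_autoencoder(scans, n_features=3)
codes = scans @ W_enc  # each scan is now summarized by 3 numbers

print("shape of compressed data:", codes.shape)
print("reconstruction error before/after training:", losses[0], losses[-1])
```

In this sketch, each row of `codes` plays the role of the “few important features” described above. In a study like Baek and Song’s, features of this kind would then be related to variables such as family income.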