Aesthetic of Rigor

On the appearance of credibility

Nov 25, 2025

Our culture has something of a fetish for numbers.

Do you have a statistic for that? Where does “The Science™” show that what you are saying is true? How much quantitative analysis went into your claim? Show me the graphs and show me the numbers, or else.

This is a curious state of affairs, given that the average person is not all that good with numbers. I know that I’m not. Despite having been trained as an Earth scientist, I humbly admit I am not the most quantitative of people. I can pass muster with my math, but I consider myself more a rigorous qualitative thinker than a quantitative one.

However, I know enough about numbers to understand that they can be dangerous. Where did this number that I am supposed to trust come from? Why was it calculated? Why was it quantified in that particular way? How were the boundaries of the analysis defined? Why should I trust it?

I think these questions matter because what I call “the aesthetic of rigor,” the mathematical sheen surrounding a number offered as proof of a truth claim, often conceals serious structural problems in the number, analysis, or projection itself.

Here is a tangible example from my work in safety that illustrates this phenomenon. The Total Recordable Incident Rate (TRIR) purports to represent the relative occurrence of workplace injuries within a given company for the purposes of cross-company and cross-industry comparisons. It is quantified as follows:

\(TRIR = \frac{\text{Number of recordable incidents} \cdot 200{,}000}{\text{Number of work hours}}\)

The number of recordable incidents refers to injuries and illnesses deemed recordable by the Occupational Safety and Health Administration (OSHA) over a given period of time.
200,000 is a scaling factor that normalizes the rate to the number of hours that a company of 100 full-time workers should expect to work over one year (100 workers × 40 hours per week × 50 weeks).
The number of work hours is the total number of hours worked during the time window in which a given set of recordable incidents occurred.

This metric seems reasonable enough. It makes sense to rely on the categorization scheme of the highest workplace safety regulator in the country to determine what is recordable or not. It also makes sense that you would want to figure out how to normalize this number so that the incident rate of small and large companies can be easily compared.

That being said, we need to consider the size of these variables and the constant. The number of work hours scales with the size of the employer. The larger the employer (i.e., more employees), the more work hours they log over a given period of time, and vice versa for smaller employers (i.e., fewer employees).

Consider the following: Small Company has 10 employees (20,000 worker hours) and Big Company has 100 employees (200,000 worker hours). Over the period of one year, both companies suffer one OSHA-recordable incident.

Here is the TRIR for Small Company:

\(\text{Small Company TRIR} = \frac{1 \cdot 200{,}000}{20{,}000} = 10\)

Here is the TRIR for Big Company:

\(\text{Big Company TRIR} = \frac{1 \cdot 200{,}000}{200{,}000} = 1\)

So, with the same number of recordables (1), Small Company’s TRIR is ten times Big Company’s, solely because its one case is divided by a tenth as many hours. The flip side is sensitivity: a single additional incident moves Small Company’s TRIR by 10 points but Big Company’s by only 1. This is why TRIR is so much noisier as a metric for small employers than for larger ones.
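For readers who prefer to check the arithmetic themselves, the two calculations above can be reproduced in a few lines of Python. This is only a minimal sketch: the function name `trir` and its parameter names are my own labels for illustration, not part of any OSHA tool or library.

```python
def trir(recordable_incidents: int, hours_worked: float) -> float:
    """Total Recordable Incident Rate, scaled to 200,000 hours.

    The name and signature are hypothetical, chosen for this example.
    """
    if hours_worked <= 0:
        raise ValueError("hours_worked must be positive")
    return recordable_incidents * 200_000 / hours_worked


small = trir(1, 20_000)    # Small Company: 10 employees
big = trir(1, 200_000)     # Big Company: 100 employees
print(small, big)          # 10.0 1.0

# How far does one more incident move each company's number?
small_jump = trir(2, 20_000) - small
big_jump = trir(2, 200_000) - big
print(small_jump, big_jump)  # 10.0 1.0
```

The second pair of numbers is the one worth dwelling on. Each additional recordable costs Small Company ten points of TRIR and Big Company one, so the smaller the employer, the coarser the ruler.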

Some excellent research from Dr. Matthew Hallowell in collaboration with the Construction Safety Research Alliance indicates that for many companies, especially smaller ones, TRIR has almost zero predictive validity. It is statistically random noise masquerading as a performance metric.
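A rough way to build intuition for this, and I stress that it is a back-of-the-envelope illustration rather than the method used in the research cited above, is to assume that both companies are equally safe "underneath" and that incidents arrive as a Poisson process. Both the Poisson assumption and the chosen true rate of 1 incident per 200,000 hours are mine, made only for the sake of the sketch, and the function name `prob_zero_incidents` is hypothetical.

```python
import math


def prob_zero_incidents(true_rate_per_200k: float, hours_worked: float) -> float:
    """Chance of logging zero recordables in the period.

    Assumes (for illustration only) that incidents follow a Poisson
    process with the given underlying rate per 200,000 hours.
    """
    expected_incidents = true_rate_per_200k * hours_worked / 200_000
    return math.exp(-expected_incidents)


# Same assumed underlying safety performance for both companies.
small_p0 = prob_zero_incidents(1.0, 20_000)    # expects 0.1 incidents per year
big_p0 = prob_zero_incidents(1.0, 200_000)     # expects 1.0 incident per year
print(round(small_p0, 3), round(big_p0, 3))    # 0.905 0.368
```

Under these assumptions, Small Company reports a TRIR of 0 in roughly nine years out of ten and a TRIR of 10 or more in the remaining years. Its reported number is never anywhere near the assumed true rate of 1, even though nothing about its actual safety has changed from year to year.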

Yet, despite this fragility, TRIR is treated with reverence. It is plastered on boardroom slides, used to deny or grant vendor contracts, and used to determine executive bonuses. It looks rigorous. It has a formula, a regulator’s stamp of approval, and a decimal point. It possesses the aesthetic of rigor, but lacks the fullness of truth.

This is the trap of tempting quantification. When we encounter numbers like this, we should ask the inconvenient questions. Who built this ruler, and for what reason? Is it actually measuring what we think it’s measuring? Or have we just found a sophisticated way to stop asking questions at all?
