What to look for in a good assessment
The Wild West of psychometrics
In the 2000s and early 2010s, behavioral science had its golden era. Big theories, bold claims and bestselling books made it one of the most popular branches of psychology – adored for its seemingly magical insights by HR departments and governments alike.
But as Helen Coffey points out in a recent Independent article, behind the headlines a quieter problem was brewing. As researchers made their names with headline-grabbing studies, scientific rigour slipped in the rush to cash in. Psychologists were often unable to replicate the original findings, and it became difficult to separate the real gems from the ‘junk science’.
The fallout was a wave of high-profile retractions, shaken confidence in behavioral research, and confusion over what we should believe. That legacy persists today.
At Thomas, we believe science must earn that trust back. And that starts with calling out bad practice and showing what good science really looks like.
If you are currently on your psychometric journey, here are some of the ‘junk science’ practices you should look out for when choosing your assessment partner.
1. p-Hacking – the problem you’ve never heard of
What it is: Manipulating your data - intentionally or not - until it produces a statistically significant result (typically p < 0.05). This can involve removing ‘bad’ data, excluding outliers, or being highly selective about which data you include. Cleaning data in these ways is often a legitimate and important part of research - but when the decisions are driven by the p-value rather than by quality control, it becomes p-hacking. Over-engineering your data might get you the number you want and a publishable study, but it reduces reliability and can mean your findings aren’t genuine.
Example: In 2011, a now-infamous study claimed to show evidence of precognition - that people could predict the future. It was published in a respected journal and passed peer review, but many in the research community were stunned. Why? Because it relied on subtle data manipulation and selective reporting - classic p-hacking tactics. The findings couldn’t be replicated, and the backlash sparked a wave of reform across psychology.
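To see why this matters, here is a minimal Python sketch (our illustration, not any real study’s analysis). Both groups are drawn from the same distribution, so there is no true effect to find - yet retesting after each round of ‘outlier cleaning’ pushes the false-positive rate well past the advertised 5%:

```python
# Illustrative sketch only: how chasing p < 0.05 by trimming "outliers"
# inflates false positives when no real effect exists.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
N_EXPERIMENTS, N_PER_GROUP, ALPHA = 5_000, 30, 0.05

def hacked_p(a, b, max_trims=5):
    """Keep trimming the most extreme observation and retesting until p < ALPHA."""
    p = stats.ttest_ind(a, b).pvalue
    for _ in range(max_trims):
        if p < ALPHA:
            break
        # Dropping an outlier can be legitimate cleaning; doing it *because*
        # the p-value is too big is p-hacking.
        if np.abs(a - a.mean()).max() > np.abs(b - b.mean()).max():
            a = np.delete(a, np.abs(a - a.mean()).argmax())
        else:
            b = np.delete(b, np.abs(b - b.mean()).argmax())
        p = stats.ttest_ind(a, b).pvalue
    return p

honest = hacked = 0
for _ in range(N_EXPERIMENTS):
    a = rng.normal(0, 1, N_PER_GROUP)  # both groups are identical,
    b = rng.normal(0, 1, N_PER_GROUP)  # so any "effect" found is spurious
    honest += stats.ttest_ind(a, b).pvalue < ALPHA
    hacked += hacked_p(a, b) < ALPHA

print(f"honest false-positive rate: {honest / N_EXPERIMENTS:.1%}")  # ~5%
print(f"hacked false-positive rate: {hacked / N_EXPERIMENTS:.1%}")  # noticeably higher
```

Each round of ‘cleaning’ is another roll of the dice, which is exactly why pre-registered analysis plans matter: they fix the cleaning rules before anyone sees a p-value.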
How we avoid p-hacking at Thomas:
- We pre-register our hypotheses and analysis plans - so we can’t cherry-pick after the fact.
- Our psychologists are encouraged to share inconclusive results and test again, not manipulate data to “make it work.”
- We prioritise long-term accuracy over short-term headlines. If it doesn’t work for customers, it doesn’t work for us - so we repeat key studies to ensure we find the same things with different participants.
2. Salami slicing – one study, many stories
What it is: Why have one piece of research when you can have two? Salami slicers butcher the validity of their research by breaking one dataset into multiple papers to inflate their publication count and perceived impact. Reusing the same sample of participants across several papers without disclosing it can give the impression that a finding has been replicated, when in fact the results all come from a single study.
Example: Brian Wansink, a prominent food scientist at Cornell University, undermined his own research by taking a ‘buy one get one free’ approach. He was found to have sliced larger studies into multiple papers, and encouraged his students to do the same. It may not sound unreasonable to a non-scientist, but presenting fragments of one study as distinct findings strips out context and can bias the results.
Avoiding salami-slicing with Thomas:
- We don’t inflate impact through duplication.
- We publish comprehensive, standalone studies that stand up to scrutiny.
- We take our data collection standards seriously, and never pass off a single data sample as multiple independent studies.
3. Small samples, big claims
What it is: Drawing big conclusions from tiny, unrepresentative datasets - often built on psychology undergraduates, simply because they are the easiest participants for academics to recruit. Psychology students in particular are likely to guess what a study is about, and have been shown to consciously or unconsciously alter their behaviour to confirm what they think the researchers expect. This badly undermines research findings.
Example: Amy Cuddy’s famous 2010 study on ‘power posing’ – used to terrible effect by the Conservative party – claimed that dominant body language could boost hormones and job interview success. Nice idea – but it was based on just 42 participants, and it failed to replicate under more rigorous conditions.
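As a rough illustration (ours, not the original study’s analysis), here is a quick simulation of statistical power. With around 21 people per group - roughly the power-posing sample - even a genuine small-to-medium effect (Cohen’s d = 0.3, an assumption we pick for the sketch) is detected only a small fraction of the time, so a lone significant result at that size carries very little evidential weight:

```python
# Illustrative sketch only: statistical power at different sample sizes.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
ALPHA, SIMS, TRUE_D = 0.05, 10_000, 0.3  # assume a true effect of d = 0.3

def power(n_per_group):
    """Fraction of simulated studies that detect the (real) effect."""
    hits = 0
    for _ in range(SIMS):
        a = rng.normal(0, 1, n_per_group)       # control group
        b = rng.normal(TRUE_D, 1, n_per_group)  # group with a genuine effect
        hits += stats.ttest_ind(a, b).pvalue < ALPHA
    return hits / SIMS

for n in (21, 100, 500):
    print(f"n = {n:>3} per group -> power ~ {power(n):.0%}")
# With 21 per group, most real effects are missed - and the significant
# results that do appear tend to overestimate the effect ("winner's curse").
```

Underpowered studies cut both ways: they miss real effects most of the time, and the ‘hits’ they do produce are disproportionately flukes or inflated estimates.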
At Thomas:
- We never use student samples.
- Our research is done with working adults, drawn from a dedicated occupational panel.
- And as a global business with assessments in use every day, our sample sizes are correspondingly large.
- All of which means you can be confident our findings are relevant to the workplace, and to the real world - because that’s where our assessments are used.
Why it matters
Scientific rigour isn’t just a nice-to-have - it’s what gives assessments real utility. At Thomas, we build for the long term. That means:
- Transparent research practices
- Real-world testing
- Continuous validation
- And yes, sometimes publishing results that challenge what we expected or previously believed. It may not be quite as exciting as the psychometrics of the noughties, but you can be sure it’s true.
As Stephen Cuppello, our Director of Psychology, puts it:
“We ground all of our findings in academic literature, to ensure that we are not reporting on spurious results. And we also encourage our research psychologists to fail! It’s important to evaluate the success of research based on what we learned, not what we found. We don’t always find support for our hypotheses, but science isn’t about always being right - it’s about learning something meaningful.”
So if you’re using psychometrics to hire, develop, or lead, ask the hard questions. Who was this tested on? What was the sample size? Has it been replicated?
And if you’re not satisfied with the answers, come and talk to Thomas.