Amberjack’s Immense Data Sets: Analysing and Utilising Data Correctly

June 27, 2022

In the best film adaptation of Roald Dahl’s Charlie and the Chocolate Factory (the 1971 version – this is not open for discussion!), when Willy Wonka is showing his guests into the first room in the chocolate factory he sings hypnotically:

“If you want to view paradise, take a look around and view it”.

Growing up I always wondered how Charlie Bucket felt in that scene. I now know it was probably similar to how I felt when our Head of Technology showed me the assessment data available in our volume assessment platform as part of my induction to Amberjack.

Amberjack’s Data

As Amberjack support clients across the full end-to-end talent assessment lifecycle, with the capability to host all stages within our technology, we are often in a position where we can look at a complete data set to really understand how our processes are performing. I’ve already seen the benefits of this as I’ve been involved in rich conversations using this data in end of campaign reviews and client catch-ups. I’ve also been able to share some of the cracking findings coming out of our annual insights work based on the data of the hundreds of thousands of candidates we assess.

But with great power there must also come great responsibility. Since seeing the data, and how it’s being used and requested by our clients, I’ve been reflecting on the appropriate use of data. This is particularly important to me because – as an Occupational Psychologist – I work to the British Psychological Society (BPS) Code of Ethics and Conduct, which binds us to presenting data accurately and safely so that decisions are made with the highest levels of confidence.

Since being given access to these significant data sets, I’ve been applying three key questions, and feeding these into how the data is being shared.

1. Have we started off with a hypothesis and used the data to test it?

With such vast data sets, there is an increased chance of finding spurious relationships: patterns that appear in the data purely by chance. These happen more often than you would think. In fact, there are entire books dedicated to them. You are much more likely to have found something meaningful if the data shows what you predicted in advance.
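To make the point about chance relationships concrete, here is a minimal sketch using made-up random numbers rather than any Amberjack data. The function names, the 40 variables, the 20 rows and the seed are all hypothetical choices for illustration. The threshold of 0.444 is roughly the correlation that would pass a conventional 5% significance test with 20 observations.

```python
import itertools
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_chance_correlations(n_variables=40, n_rows=20, threshold=0.444, seed=1):
    """Count 'significant' correlations among columns of pure noise.

    Every column is independent random noise (an assumption of this sketch),
    so any pair that clears the threshold is a chance finding by construction.
    """
    rng = random.Random(seed)
    columns = [[rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_variables)]
    pairs = list(itertools.combinations(columns, 2))
    hits = sum(1 for a, b in pairs if abs(pearson_r(a, b)) >= threshold)
    return hits, len(pairs)


def expected_false_positives(n_variables, alpha=0.05):
    """Number of pairs multiplied by the false-positive rate per test."""
    n_pairs = n_variables * (n_variables - 1) // 2
    return n_pairs * alpha


hits, total = count_chance_correlations()
print(f"{hits} of {total} unrelated pairs look 'significant'")
print(f"Expected by chance alone: about {expected_false_positives(40):.0f}")
```

With 40 unrelated variables there are 780 pairs to compare, so around 39 of them would be expected to look meaningful at the 5% level even though nothing is going on. Starting from a hypothesis means running one planned test instead of trawling through hundreds.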

An example of this came up when we shared our latest Insights data, which explored outputs from over 300,000 early career candidates and 95 employers.

Our data set showed an encouraging trend: a significantly increased proportion of applicants with a social mobility “flag” on their candidate record (accessed free school meals, identified as having refugee status, first generation to attend university, etc.) were being hired for graduate and apprentice roles. In fact, they were outperforming those without a flag.

This could still just be a chance finding. However, there is a much more likely explanation, and it is the reason we were so interested in this statistic: our assessment processes are designed on the vision that everybody should be assessed on their future potential, rather than past experience or privilege. As the 300,000+ candidates had been through our processes, and a high proportion of them would have completed our online screening assessments, this is exactly what we were expecting to see. It is therefore far more likely that this is more than a chance finding.

If you are interested to find out more about how we assess for potential, rather than past experience or privilege, you can request a copy of our paper, ‘Amberjack’s Model for Identifying Potential’, here.

2. Is sharing this data helpful?

Hear me out. I know there are a lot of people who will extol the virtues of complete transparency and open-source data, and I do absolutely see the merits in the right circumstances. However, I’ve also seen situations where oversharing and overwhelming people with data has done more harm than good.

One of the areas of focus in the BPS Code is “Competence” where they note “our members offer a range of services that usually require specialist knowledge training and skill” with competence referring to an “ability to provide those services to a requisite professional standard”. I’m sure we’ve all been in a meeting where data analysis is being discussed, certain knowledge or understanding assumed, and someone has made an inaccurate inference based on the data. Hopefully they’ve been corrected to avoid any decisions being made based on that inference.

But what about the data shared in the appendix of the slide deck which you didn’t have time to discuss? What about the senior manager who was forwarded the deck and made the same inference without being in the discussion where the misunderstanding was cleared up? Are we ever guilty of being transparent to the point of inadvertently sharing information in a way that makes it impossible for those we are sharing it with to receive it competently?

3. Is the data being presented in a way that makes incorrect understanding impossible?

The reason why the incorrect inferences referenced in Point 2 happen is often because of how the data is presented. Here are some of the most common mistakes I see:

1. Zooming too far into a scale

In a recent end of campaign review we were sharing the Net Promoter Score (NPS) of the different assessment stages we had been involved in. NPS is a standardised measure of candidate experience, scored from -100 to +100, meaning you can easily benchmark what candidates are saying about your process. The NPS on each stage of the process ranged from +50 to +75. These are very positive statistics, particularly given the large number of respondents to the questionnaire who had not been successful. However, the scale we were sharing started at +40 and went to +80 to make it easier to read on the screen. Suddenly, the score of +50 for the video interview looked like a problem and attendees immediately started discussing how to fix it. We were able to correct the course, but what if we hadn’t been there?
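The effect of the zoomed scale can be shown with a little arithmetic. This is a minimal sketch: the helper name bar_fraction is hypothetical, and it assumes the scores were drawn as bars rising from the bottom of the axis.

```python
def bar_fraction(score, axis_min, axis_max):
    """Fraction of the axis length that a bar for `score` fills."""
    return (score - axis_min) / (axis_max - axis_min)


lowest, highest = 50, 75  # the NPS range quoted above

axes = {
    "zoomed (+40 to +80)": (40, 80),
    "full (-100 to +100)": (-100, 100),
}

for label, (axis_min, axis_max) in axes.items():
    low_bar = bar_fraction(lowest, axis_min, axis_max)
    high_bar = bar_fraction(highest, axis_min, axis_max)
    # The ratio is how many times taller the +75 bar looks than the +50 bar.
    print(f"{label}: +75 looks {high_bar / low_bar:.2f}x as tall as +50")
```

On the zoomed axis the +75 bar is drawn 3.5 times as tall as the +50 bar. On the full scale it is only about 1.17 times as tall, and both bars sit well into positive territory.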

2. Not providing the sample size

A commonly used method of reviewing how your assessment process is performing in terms of diversity, equality and inclusion is adverse impact analysis – are a disproportionate number of your minority group candidates being sifted out at a certain stage?

This is something we look at in all our campaigns. I was once in a review where everyone was very self-congratulatory as 50% of their candidates who recorded their ethnicity as “Black” passed the assessment centre, as opposed to 40% of candidates from all other groups. This is indeed good news looking at the percentages alone, but only 4 candidates who identified as Black attended the assessment centre, a fact that was not initially considered. Had just one of the two successful candidates scored a couple of points lower on one exercise and missed the benchmark, the percentage would have fallen to 25% (or to 0% had both missed it), and suddenly this would have become a big problem to fix. To avoid this issue, always include the n (number of candidates).
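One way to show how little a percentage based on 4 candidates can tell us is to put a plausible range around it. The sketch below uses the Wilson score interval, which is my choice for illustration and not necessarily the method used in that review. The function names are hypothetical.

```python
import math


def pass_rate(passed, attended):
    """Simple pass rate as a proportion."""
    return passed / attended


def wilson_interval(passed, attended, z=1.96):
    """Approximate 95% Wilson score interval for a pass rate."""
    p = passed / attended
    denom = 1 + z**2 / attended
    centre = (p + z**2 / (2 * attended)) / denom
    half_width = z * math.sqrt(
        p * (1 - p) / attended + z**2 / (4 * attended**2)
    ) / denom
    # Clamp to the valid range to guard against tiny rounding errors.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


attended = 4  # the n from the example above
for passed in (2, 1, 0):
    low, high = wilson_interval(passed, attended)
    print(
        f"{passed} of {attended} passed: {pass_rate(passed, attended):.0%} "
        f"(plausible range {low:.0%} to {high:.0%})"
    )
```

For 2 passes out of 4, the range runs from roughly 15% to 85%. That comfortably contains the 40% pass rate of the other groups, so the data cannot really tell us whether this group did better or worse.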

3. Not providing any context or commentary

To avoid people misinterpreting the data, make sure your commentary sits alongside it where needed. What is the year-on-year trend? Are there any industry benchmarks this can be compared to? Are there any reasons for this data trend, and are these reasons evidence-based or your own speculation? Without this context and commentary, you are greatly increasing the chance of the wrong inferences being made.

Utilising Data Properly

In the song I referenced at the start of this piece, Willy Wonka also sings,

“Come with me, and you’ll be, in a world of pure imagination.”

I suppose what I’m saying is, when sharing data, the aim should be to stop people entering a world of pure imagination. I’m very excited about the opportunities this data makes available to the team here at Amberjack, but I’m also aware of the importance of making sure it’s used properly.
