Playing the Numbers Game

Maths has never been my strong point. Since leaving school, and beyond the basics required to get by in life, I have done everything possible to avoid the subject and its baffling world of logarithms, algebra and the dreaded trigonometry (the mere mention of the terms sine, cosine and tangent still brings me out in a cold sweat). So it may seem strange that the methodology I adopted for my history PhD thesis required me to play the numbers game once again.

One of the key objectives of my research was to evaluate perceptions of, and reactions to, crime and criminality in Hull and East Yorkshire during the interwar period. My primary resource was the local newspapers produced throughout this period, specifically the Hull Daily Mail. My plan was to carry out a detailed content analysis of articles, features, editorials, comment pieces and letters to the editor in these newspapers. But the period under analysis was 21 years long and featured approximately 6,500 editions. As is often the case with media content analyses, the solution was to select a sample from the whole – but the sample, of course, needs to be as ‘representative’ as possible. And this was where the fun (and sleepless nights) started.

As anyone who has tackled sampling in research knows, there is no shortage of techniques out there: simple random sampling, stratified sampling, convenience sampling, cluster sampling and so on. But it is your source material, as well as your research objectives, that will determine which technique is the most appropriate. After extensive research into all the options, I came across a methodology called ‘constructed week sampling’ that had been tested and used successfully in the content analysis of newspapers and a range of weekly and monthly publications. For this technique you identify all the Mondays in the population and select one at random, then do the same for the Tuesdays, Wednesdays, Thursdays and so on until a full week has been ‘constructed’. This ensures that every day of the week (Monday to Saturday in the case of the Hull Daily Mail, as no Sunday edition was produced) is represented equally and that no bias is introduced towards a specific day.
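For readers who find a procedure easier to follow in code, here is a minimal Python sketch of constructed week sampling as described above. It is an illustration only, not the tool used in the research: the function name `constructed_week` and the date range are hypothetical, and the sketch assumes an edition was published on every Monday to Saturday in the range (public holidays with no edition would need to be filtered out first).

```python
import random
from collections import defaultdict
from datetime import date, timedelta


def constructed_week(start, end, weekdays=range(6), rng=None):
    """Pick one random date per weekday between start and end (inclusive).

    Weekdays follow date.weekday() numbering: 0 = Monday ... 5 = Saturday.
    The default skips Sunday, as the Hull Daily Mail had no Sunday edition.
    """
    if rng is None:
        rng = random.Random()

    # Group every date in the range by its day of the week.
    by_weekday = defaultdict(list)
    day = start
    while day <= end:
        by_weekday[day.weekday()].append(day)
        day += timedelta(days=1)

    # Draw one date at random from each weekday's group.
    return [rng.choice(by_weekday[w]) for w in weekdays]


# Hypothetical four-month window; the seed only makes the draw repeatable.
week = constructed_week(date(1925, 1, 1), date(1925, 4, 30), rng=random.Random(0))
for d in week:
    print(d.strftime("%A %d %B %Y"))
```

Calling the function two or three times over the same window would give the 12-day and 18-day samples, although a stricter version would also prevent the same date being drawn twice.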

Building on earlier investigations into similar sampling techniques, such as those by Davis and Turner (1951), Stempel (1952), and Jones and Carter (1959), Riffe et al. carried out research into the efficacy of this type of stratified sampling – both in terms of optimum sample size for reliability, and efficiencies for time and resource limitations – and tested it against simple random sampling methodologies.[1] They discovered that constructed week sampling was more efficient than either random sampling or consecutive day sampling (which starts at a given point and includes each day of the week but may fail to sample across the whole period under investigation).[2] Riffe et al. concluded that using constructed week sampling in newspaper content analysis can provide reliable estimates of the population mean – i.e. the sample would be reflective of the whole, and significantly more reflective than a simple random sample.[3] Furthermore, their study found that one constructed week was as efficient as four in an examination of six months of editions of a daily newspaper (exceeding basic probability theory expectations), the significance of which is immeasurable to the researcher with limited time and resources.[4]

Of course, I had to test this for myself. For the purposes of consistency, January was selected as the test month, and the years, spaced at regular intervals throughout the whole period, were chosen to capture changes in newspaper size over time. The number of incidences of crime was counted and the mean calculated for each of the four months and for the population as a whole. In total, 106 editions of the Hull Daily Mail were analysed for the test. To test both the effectiveness and efficiency of the constructed week sampling technique, 20 samples were drawn at each of three sizes: 6 days (one constructed week), 12 days (two weeks) and 18 days (three weeks). The days were chosen by assigning each a number and then using an online random number generator to select the corresponding days. The mean for each sample was then calculated. So far, so good.

Then things got a little tricky, as I was confronted with formulae, calculations and statistical terminology that sent me into a spin (and brought back all those terrible memories of confusing maths lessons). The Riffe et al. study had used a 95% confidence interval to assess the efficiency of constructed week sampling. In practice, this meant checking that the sample means fell within two standard errors of the population mean (the internet helped with the explanation). So I constructed my test to try to do the same. The population mean was 14.1 and the population standard deviation 4.88, and from these the standard error ranges were calculated (again, the internet was a lifesaver here, and so too was an academic in the Mathematics department). It took hours of reading and re-reading to grasp the essential elements of these calculations. The findings, however, did make the hard work worthwhile.
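The calculation described above can be sketched in a few lines of Python. This is a hedged illustration rather than a record of the working actually used: it assumes the basic textbook formula (standard error = population standard deviation divided by the square root of the sample size) with no finite population correction, and the helper names `standard_error` and `share_within` are hypothetical. Only the population mean (14.1) and standard deviation (4.88) come from the post.

```python
import math

POP_MEAN = 14.1  # population mean reported in the post
POP_SD = 4.88    # population standard deviation reported in the post


def standard_error(sd, n):
    """Standard error of the mean for samples of size n (assumed basic formula)."""
    return sd / math.sqrt(n)


def share_within(sample_means, centre, se, k):
    """Fraction of sample means lying within k standard errors of centre."""
    hits = sum(1 for m in sample_means if abs(m - centre) <= k * se)
    return hits / len(sample_means)


# Print the one- and two-standard-error ranges for each sample size tested.
for n in (6, 12, 18):
    se = standard_error(POP_SD, n)
    print(
        f"n={n:2d}  SE={se:.2f}  "
        f"1 SE: {POP_MEAN - se:.2f} to {POP_MEAN + se:.2f}  "
        f"2 SE: {POP_MEAN - 2 * se:.2f} to {POP_MEAN + 2 * se:.2f}"
    )
```

Given the 20 sample means for one sample size, `share_within(means, POP_MEAN, se, 2)` would return the proportion falling inside the two-standard-error range, which is the figure to compare against 95%; with `k=1` the comparison is against roughly 68%.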

The test sample results echoed those of Riffe et al., with all three constructed week sample sizes (6-day, 12-day, and 18-day) meeting the 95% confidence interval, making constructed week sampling a suitable technique for a content analysis of the Hull Daily Mail. Moreover, the three constructed week samples also fell within one standard error of the mean, well above the 68% predicted by the Central Limit Theorem for random samples.[5] Again, following the example of Riffe et al., it is safe to conclude that as few as six days (one constructed week) would provide a reliable representative sample for four months of the Hull Daily Mail. Extrapolating these findings, at one constructed week per four months, means that three constructed weeks (18 editions) would cover each of the 21 years under investigation. In the end, 378 editions (18 × 21) of the Hull Daily Mail were analysed for crime-related content, which proved to be a challenging but ultimately achievable figure. A larger sample size for the quantitative (and subsequent qualitative) analysis would have required additional time and resources and, as the test findings reveal, may not have yielded a more representative or reliable sample.

It was a hard slog to get to the final results, and it will no doubt take years for the psychological scars to heal, but the exercise did prove useful. In my case, stepping out of my comfort zone and confronting my fears of mathematics should yield better results for my overall research. It may also prove valuable to other researchers who are about to tackle sampling techniques and have a similar phobia of numbers.

Just don’t ask me to do it again…

Ashley Borrett

[1] D. Riffe et al., ‘The effectiveness of random, consecutive day and constructed week sampling in newspaper content analysis’, Journalism Quarterly, 70, 1 (Spring 1993), 133–139; F. J. Davis & L. W. Turner, ‘Sample efficiency in quantitative newspaper content analysis’, Public Opinion Quarterly, 15, 4 (Winter 1951), 762–763; G. H. Stempel, ‘Sample size for classifying subject matter in dailies’, Journalism Quarterly, 29, 3 (Summer 1952), 333–334; R. L. Jones & R. E. Carter, ‘Some procedures for estimating “news hole” in content analysis’, Public Opinion Quarterly, 23, 3 (Autumn 1959), 399–403.

[2] Riffe et al., ‘The effectiveness of random, consecutive day and constructed week sampling’, 139.

[3] ibid.

[4] ibid.

[5] ibid., 138.

