
Data Analysis and Issues

In this week’s post, project volunteer Philip James looks at some of the challenges of working with large quantities of data across a team of many people – and back to some of the potential problems with how the data was created in the first place. Philip is one of the team based with the National Railway Museum and has valiantly been transcribing accidents and investigating some cases that caught his eye. Here he thinks more broadly, including around accidents that are currently being prepared for public release – our thanks as ever to Philip for this post and all his hard work. The current database can be found here.



In my previous post, I noted the enlarged pool of project data available to me, about 5,000 rows, and thought this might be a good time to undertake a superficial analysis of the data. On this occasion, I have not focused on a single accident of interest but have referenced several that support specific observations. It might also be a time to reflect on potential ideas and issues for the project and for researchers using the data. (All of my contributions can be found here.)

Data Issues

I wanted to check the data pool to see what shape it is in and identify any potential problems. The project has been evolving and we have all been growing with it. The guidance on how we should use each column in the spreadsheet has been developing and individuals will have made their own judgements on how best to record data for each accident.

Inevitably there are differences in style. Also, we are all prone to occasionally missing something out or putting it in the wrong column. I spotted several of my own mistakes in the process. Some errors, if not corrected, would lead to a loss of data, while others simply leave it in the wrong place or in an inconsistent format. I concluded that about 1.55% of the entries had issues and these have been passed to Craig Shaw (NRM volunteer) and Mike Esbester (project co-lead) for investigation. They have been doing similar work of their own in this respect and, with the help of other volunteers, have been making corrections.

One of the changes introduced after commencement of the project was the decision to split forenames and surname into separate columns. This is easy to do when making a new entry and not difficult when amending a small number of existing entries, but when you have hundreds to retrofit you have to use the text manipulation features of Excel. You then discover that variations in the number of forenames complicate the task, and apparently trivial details such as dots or multiple spaces between names or initials can invalidate the assumptions made when using these features. The use of a comma in place of a semicolon could also prevent a function working as intended.
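To illustrate the kind of clean-up involved, here is a minimal sketch in Python rather than Excel. It is not the method the project used. The function name `split_name` is hypothetical, and the sketch assumes that the surname is always the last word of the entry, which will not hold for every record (a surname written as two separate words, for example).

```python
import re


def split_name(full_name):
    """Split a full name into (forenames, surname).

    Assumption: the surname is the final word of the entry.
    """
    # Treat dots as separators so "J.J. Smith" and "J J Smith" end up the same.
    cleaned = full_name.replace(".", " ")
    # Collapse runs of spaces or tabs into one space and trim both ends.
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    if not cleaned:
        return ("", "")
    parts = cleaned.split(" ")
    # Everything before the last word is treated as forenames or initials.
    return (" ".join(parts[:-1]), parts[-1])
```

Normalising dots and spaces first means the split no longer depends on how each volunteer typed the initials. Entries with only one word come back with an empty forenames field, so they can be found and checked by hand.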

It is probable that in future, users of the expanding data pool will need to use text manipulation features on various columns so following the guidance given by Mike Esbester and Craig Shaw is most important. At some point, it may be necessary to filter data while keeping it in a spreadsheet or even move it into a database and these are the times when absolute consistency in the format will be of particular importance.

Database Design and Requirements

Anticipating a future database to hold and aid analysis of project data, it is worth mentioning that there are a number of drawing and other tools that can be used to support various specialist tasks such as database design.

This is not the place for a technical discussion about database design but it may be worth considering some of the candidate requirements in terms the lay person can understand, particularly where they can be linked to the experience of volunteers and needs of researchers, especially those who are likely to be using the data long after the data capture phase has been completed.

When designing a database, it is usual to start with a data model. Being a former Data Analyst, I have prepared one using the data entry spreadsheet and my own experience with accident reports as a guide. Suffice it to say that such models can become large and complex and you can end up trying to model the world. Clearly a project like this must focus on the key activities of those using the data, who are most likely to be researchers, and not expend effort on features of little benefit. With this in mind, some observations follow.

Contemporary Counties

The January 2020 version of the Project Handbook asked volunteers to include the county in which the accident took place. The requirement was for the county at the time of the accident rather than the current county.

“If you can identify the county in which the accident took place, according to the contemporary boundaries, please include it here. In terms of finding the relevant county, a Google search is a helpful starting point.”

It wisely suggested an Internet search although I must add that there are other search engines besides Google and for some searches it may be best to use several as each search algorithm is different and may prioritise results other than those you are looking for.

Suffice it to say that there have been various changes to county boundaries and administrative areas over the years and probably some during the period of the reports so apart from counselling care, there is little I can add.

It is also unlikely that research into railway accidents will need to pay much attention to changes in county boundaries and local government authorities so significant data modelling of this aspect may not be needed.

Grades of staff

At present volunteers simply enter the staff grade or role as it is described in the report. Each company had its own grades and roles and sometimes staff were acting in a grade or role or deputising for an absent senior. There is probably no viable alternative to this approach although it means researchers will need to treat comparisons of apparently similarly graded staff with care. There might be a case for having two fields for grade, one for the official grade and another for the acting or temporary role if different.

Linked Accidents

Inspectors sometimes make reference to other accidents that had similar causes or features and volunteers will have noted the similarity between many accidents. I have been able to copy and paste from one accident to another on many occasions just making detail changes such was their similarity. For investigative purposes, it may be desirable to link such accidents by a class key,[1] effectively a label that can be digitally stuck on things to enable them to be grouped.  This would enable researchers to quickly find all accidents labelled as having some related characteristic. This could be progressed if a database solution is sought.
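A class key of this kind can be sketched very simply. The example below is an illustration only and is not based on any existing project tool. The class name `ClassIndex`, the accident identifiers and the labels are all hypothetical.

```python
from collections import defaultdict


class ClassIndex:
    """Keep track of which accidents carry which label (class key)."""

    def __init__(self):
        # Maps a normalised label to the set of accident ids tagged with it.
        self._by_label = defaultdict(set)

    @staticmethod
    def _normalise(label):
        # Ignore case and stray spaces so labels typed differently still match.
        return label.strip().lower()

    def tag(self, accident_id, label):
        """Attach a label to an accident; an accident may have many labels."""
        self._by_label[self._normalise(label)].add(accident_id)

    def find(self, label):
        """Return the sorted ids of all accidents carrying the label."""
        return sorted(self._by_label.get(self._normalise(label), set()))
```

Because one accident can carry any number of labels, the same structure could hold several contributing factors for a single accident rather than forcing a choice of one category.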

People involved

As well as the victim, reports often name others involved and occasionally individuals may be associated with more than one incident. The project has already come across such cases but at present it relies largely on somebody spotting a name turning up in different accidents. If they are in the same batch then there is more chance of this happening but otherwise a methodical search is needed.

A database solution would need to incorporate this facility in its structure. A potential problem is that some reports capture minimal information about people: sometimes even the full name of the victim is not stated and it may be challenging to find further details elsewhere, so it may not be expedient or efficient to go into much detail. A person might appear as a victim in one report and as another named person in another. Their grade or role may be different in each due to changes in their employment. They may also have changed location.

If a spreadsheet-based approach is retained, then a potential solution is to split reports where more than one ‘other persons’ are named and create a record for each. Sorting on name will cluster the records together and analysis of grade, role and location will help determine if we are dealing with the same person, two people with the same name or the same person with details recorded differently on each occasion.
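The splitting and sorting described above can be sketched as follows. This is only an illustration under assumed field names (`report_id`, `location`, `people`, `surname`, `forenames`, `role`), none of which are taken from the project spreadsheet.

```python
def one_row_per_person(reports):
    """Turn each report into one row per named person, sorted by name.

    Each report is assumed to be a dict with a "report_id", a "location"
    and a "people" list; each person has "surname", "forenames" and "role".
    """
    rows = []
    for report in reports:
        for person in report["people"]:
            rows.append({
                "report_id": report["report_id"],
                "location": report["location"],
                "surname": person["surname"],
                "forenames": person["forenames"],
                "role": person["role"],
            })
    # Sort without regard to case so "smith" and "Smith" sit together.
    rows.sort(key=lambda r: (r["surname"].lower(), r["forenames"].lower()))
    return rows
```

Rows with the same name end up next to each other, so the role and location columns can be compared by eye. The sketch does not decide whether two rows are the same person; that judgement is left to the researcher.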

Rules and Recommendations

Reports identify relevant rules but it is not known if they are the same for all companies or even what they say. Only summary information at most is given about them. Presumably handbooks with rules survive but only a small proportion may be relevant to this project. It may be best to link similar rule breaches through a class key as discussed earlier. Similarly, for recommendations.

I have not researched rule books but Mike Esbester points out that these followed a standard model, agreed across many or all companies, via the Railway Clearing House. Rule 24a, for example, was the same throughout the UK, and similarly the rest of the book. I have not seen anything to contradict this although some reports suggest individual companies may not have signed up to particular rules and were encouraged to do so, often by reference to good practice elsewhere. The North British Railway 1898 version is seen in this online exhibition.


During the period covered by these reports, companies will have merged and changed identity, particularly during the grouping after the First World War. A line of research may be to study particular companies and their predecessors or successors so a database solution would need to build in that association. The data for this exists[2] but not in the reports. Also, information is available online.[3]
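As a rough sketch of how such an association might be held, the example below records each company's immediate successor and follows the chain. The three entries are a small illustrative sample and not a complete or authoritative company history; the names `SUCCESSOR`, `lineage` and `predecessors` are hypothetical, and the sketch assumes each company has at most one successor.

```python
# Illustrative sample only: company -> the company that absorbed it.
SUCCESSOR = {
    "Lancashire and Yorkshire Railway": "London and North Western Railway",
    "London and North Western Railway": "London, Midland and Scottish Railway",
    "North British Railway": "London and North Eastern Railway",
}


def lineage(company, successor=SUCCESSOR):
    """Return the company followed by each of its successors in order."""
    chain = [company]
    while chain[-1] in successor:
        next_company = successor[chain[-1]]
        if next_company in chain:
            # Guard against a data-entry error that creates a loop.
            break
        chain.append(next_company)
    return chain


def predecessors(company, successor=SUCCESSOR):
    """Return every recorded company whose lineage leads to this one."""
    return sorted(c for c in successor if company in lineage(c, successor)[1:])
```

With a table like this, a search for one of the post-grouping companies could also pull in reports filed under the names of its predecessors.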

Accident Types and Injuries

With a much larger pool of data to analyse, it is possible to have more confidence in the overall trends. About 20% of accidents involve a ‘fatality’, 38% involve ‘shunting’, 24% occur whilst ‘about the track’ and 12% while ‘working trains’. These figures were similar to my observations with a smaller data set.

Compared with the larger data pool, I have more frequently categorised casualties as ‘other’ (23% v 15%) or ‘multiple’ (14% v 11%), and less frequently identified ‘contusions or bruises’ (12% v 20%), ‘cuts or lacerations’ (5% v 7%) and ‘goods handling’ (3% v 6%).
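A comparison of this sort is simple arithmetic: count the entries in each category, divide by the total, and subtract one set of percentages from the other. The sketch below shows the calculation only. The function names are hypothetical and it does not reproduce how the figures above were actually obtained.

```python
from collections import Counter


def category_shares(categories):
    """Return each category's share of the entries, as a percentage."""
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


def share_differences(mine, pool):
    """Percentage-point difference (mine minus pool) for every category."""
    a = category_shares(mine)
    b = category_shares(pool)
    # A category missing from one side counts as 0% on that side.
    return {
        name: round(a.get(name, 0.0) - b.get(name, 0.0), 1)
        for name in sorted(set(a) | set(b))
    }
```

A positive difference means the category is used more often in the first list than in the second. It says nothing about why, which may be the accidents themselves or the way they were categorised.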

For other types of injury or accident, my figures are not much different to the larger pool and the percentages are low in all cases. In part, the differences may be due to the accidents themselves. I have noticed marked variations between different batches I have worked on. Another explanation could be the way individuals categorise accidents for reporting purposes.

Many injuries could easily fall into either ‘other’ or ‘multiple’ but I have more in both categories. Another explanation is that an injury might record cuts, bruises and various other types but the reader may pick up on one while ignoring the others thus attributing an injury to a specific type rather than to multiple. Similarly, some accidents occurred while a person was handling goods but the real cause was a shunting move causing them to fall. It is conceivable that such an accident might be attributed to ‘goods handling’ when the real cause was ‘shunting’.

It is not for me to say that one approach is right and another wrong but it illustrates the problem volunteers have and potentially may affect the conclusions of researchers. Perhaps there is a case for having multiple classifications for type of accident to reflect the different contributing factors. It may even be necessary for researchers to review and recategorise accidents.

Assuming that the injuries I have seen are typical in mix to those others have seen, then the difference in categorisation may be more of a problem. It may be necessary for researchers to review and recategorise injuries once they have decided what criteria to use to do this. This may sound like making it up as we go along but then until you have seen a significant quantity of real data the problems may not be apparent. Certainly, the work of many volunteers has made such issues apparent so it is in no way a criticism of their efforts that further work may be needed.

When reviewing my draft for this piece, Mike Esbester indicated that involving the current rail industry in a review of accident categorisation will help increase the utility of the database to them. For me this is an unexpected but welcome potential utilisation of the resource.

Accidents Not Witnessed

Most victims survived their accident and were able to give an account of what had happened although some such accounts might be at odds with other evidence. Inspecting officers would sometimes say when they doubted the evidence of a witness or if evidence was contradictory or given in an unsatisfactory manner.

In some cases, those with fatal injuries remained conscious and survived long enough to give an account of some sort before succumbing to their injuries.[4] One fatal accident report features a victim who survived for several days and eventually died from what would appear to be a survivable injury.[5]  Another account involves a person surviving the accident only to die from an unrelated cause before having the chance to give evidence.[6] One unfortunate victim caught pneumonia and died in hospital, his injury, although serious, being survivable.[7]

In other cases, a fatal accident would have been witnessed by people competent to say what had happened and it might be possible to make defensible assumptions about the cause and the behaviour of the victim. In other cases, the accident was not witnessed and it was for the inspector and others to gather what information they could from the scene and the circumstances leading up to the incident.

Typically, unwitnessed fatal accidents involve staff being caught between vehicles during shunting or knocked down by trains while about the track. Often the time of death could be determined from trains passing or activities taking place but sometimes there was uncertainty over time and precise cause. If several trains passed a location, which one caused the fatality?[8] Train crews might remain silent for their own defence but as keeping a lookout from a steam engine was difficult, they may not have seen a person about the track. The inspector might have to take a view as to the most likely cause and who, if anyone, acted incorrectly.

Inevitably assumptions were then made about the actions and intentions of the deceased and these might reflect on the competence or even the integrity of the person concerned. This is worrying because the victim was not available to defend themselves and might have had a plausible if improbable explanation. Relatives and dependants might be distressed by the conclusions reached and it is not clear how these might have affected any compensation awarded.

The focus is inevitably on the logic lying behind the investigator’s conclusions. Perhaps the investigating officer was able to consider previous behaviour of the victim in similar situations or other witness accounts not recorded in the accident report. Inspectors sometimes refer to previous behaviour of the accident victim or even the behaviour of groups of people at a location.

There is also the question of police involvement. Most accidents were clearly workplace accidents even if the competence of some parties was questioned and I doubt that the police would have been involved in such instances. Others could conceivably be due to malicious conduct on somebody’s part and in these, the police might have needed to be satisfied the unwitnessed death was not a premeditated criminal act. This done, it was for the investigation to determine what safety issues arose and what lessons could be learned. This probably involved a lower level of proof than might be required if a criminal trial were taking place.

Mike Esbester makes the observation “I don’t think the police were involved, as a rule. I interviewed a senior former British Rail employee who’d worked a lot in Health and Safety, and he put a change down to the later 1980s, I think, which saw police getting more involved in employee accidents, with a view to prosecutions; before this it seems rather like it was left well alone.”

Privatisation in the 1990s also led to more police involvement, perhaps a result of the weakening of the central governance structure and the possibility of rival companies eager to avoid or deflect responsibility. I recall the railway press of the time noting the lack of relevant experience and training within police ranks[9] to fulfil such a role and the potential of this for increasing risk and unnecessarily disrupting railway operations. The press was particularly scathing of the role and capabilities of the Health and Safety Executive.

Accident Victim Profile

Accidents reported are almost invariably to men. I found one exception[10] in the batches I dealt with and other volunteers have found at least three others for the 1911-15 era (blogged about here, here and here). This still amounts to only four instances in nearly 5,000 reports. Curiously none of these are in the period of the First World War when women were employed in some formerly male roles. That said, our records only cover the early part of the war.

Women were employed in numbers during both world wars (see this post or this one, for example) but typically had to give up their jobs soon after. Apparently, the investigations continued during the First World War but it is not known what happened to the reports. After the Second World War, reports were not made publicly available so there is no immediate prospect of seeing a period when more reported accidents might have involved women.

Motivation for Accident Reporting

A separate issue concerns the wide variation in severity and consequence of accidents and injuries. In one accident, it was not possible to identify who had left a wagon foul of a junction because it was not reported until the following day.[11] This makes me wonder who reported accidents and what, if any, incentive or compulsion there was to do it. Clearly those involving a fatality or serious injury would be reported but sometimes the injuries might seem trivial and the causes purely accidental. Could there be a lot of accidents, mostly with trivial injuries, that don’t appear in the record because they were not reported?

Also, how much say did the reporting officers have over what accidents they reported on? I assumed anything officially reported had to have an investigation of some sort even if the outcome was a short report stating the obvious. Not so. Mike Esbester comments: “The companies certainly did investigate all cases, and it was from these reports and paperwork that information was sent to the Board of Trade, via a standard submission form; the Board of Trade inspectors then worked out which cases they would investigate.”

NRM volunteer Brian Grainger has previously contributed an imaginative insight into understanding the inspectors about whom little is known and is currently working on how accidents were selected for investigation so I refer readers to his work, here.

For now, it suffices to say that accidents were probably graded in severity according to time off work, a metric that sometimes appears in reports, and the inspectors might agree among themselves how to allocate work, presumably also agreeing if and how to coordinate it. We should remember that their time was finite and had to be used to best effect.

The common feature of our data set is that all accidents involve an injury, whether trivial in some cases or a fatality. Potentially there could be lots of near misses where nobody was hurt but a mistake or bad practice could have had a much worse outcome. In some reports, the investigating officers say the outcome could have been more serious.

Mike Esbester has received a revealing guest blog post from a former BR cleaner, fireman and driver detailing some of the accidents he’d seen or had himself, confirming that a great many were not reported. [Mike’s note – this one to come: watch this space!]

Flaws in Railway Construction and Operation

Most of Britain’s railways were built during periods of railway mania, often by small companies who merged or were taken over by larger ones. Many had limited budgets and this might be reflected in poor construction or unsafe but cheap methods of operation.

The work was done in an age when risks to life and limb were tolerated more than they are now and perhaps the experience to understand the risks, and the technology to mitigate them, had yet to be developed. Some of the flaws could be attributed to genuine mistakes but others might be seen as cost cutting or incompetence.

Contractual fraud was also a risk, the width-restricted tunnels on the Tonbridge to Hastings line being a case in point. Extra brickwork had to be added later to compensate for what the contractor failed to use during construction. Until the singling of tracks through the tunnels, nonstandard rolling stock had to be used. While some structures have stood the test of time, others have not and needed replacement.

The Tay Bridge collapse is perhaps the best-known example of a structural failure although many structures have in whole or part been rebuilt to remove unsatisfactory materials or weaknesses before they failed in operation.

The take-over of small companies by larger competitors eventually resulted in most construction faults being rectified and better methods of operation introduced, the new owners having the resources and perhaps also the need to do this. The accident reports being seen in this project date from a time after many of the early mergers and takeovers so it might be interesting to see if they offer clues to earlier faults and failings. They may also offer clues to if and how operational methods and rules were catching up with operational reality.

Some accident reports make reference to lineside structures, some of them redundant, being too close to tracks. The design of wagons, brakes and couplings I mention in this post. Rules and procedures for shunting existed but were not necessarily standardised or fully fit for purpose. They may not have been explained to those who needed to know about them. There are also plenty of cases where rule books had not been issued.

The inspectors sometimes refer to unsatisfactory ways of working, a veiled criticism of companies and managers for not doing all they could to improve safety. Often it is something basic like ‘has a shunting warning been given, understood and acknowledged by those it may affect?’

A recurring theme is men boarding and alighting from moving trains. In many cases those involved in shunting would be expected to ride in an open wagon on top of coal or some other commodity. They might need to alight to operate hand brakes or points while travelling between distant sidings. Unsurprisingly, accidents happened. The inspectors often recommended that where men had to travel with a train, a proper vehicle should be provided.[12] They were probably thinking of a brake van but didn’t want to rule out another solution.

Signalling is largely outside the scope of the project but within yards and sidings it does feature and there appear to have been deficiencies in places. There are many instances of staff crossing or walking along the line as though it were a public road. In some places railway staff habitually crossed the line even when a footbridge was present.[13] Does this reflect an incomplete safety learning process or pressures to save time? Failures to appoint dedicated look-out men also feature regularly.

Philip James



[1] The ICL Data Dictionary, a database tool for holding data about data, had such a feature called ‘classification’, enabling a variety of different elements to be linked for ease of reference.

[2] Several editions of the Railway Magazine circa 2006 had family trees for the big four post grouping companies.


[4] 1909 Q1 App C, Cork, Bandon and South Coast Railway, 1/2/1909, Cork, Inspector J J Hornby; 1909 Q1 App C, North British Railway, 8/3/1909, Cockburnspath, Inspector C Campbell.

[5] 1913 Q1 App B, London and North Western Railway, 13/3/1913, St Helens, Inspector J J Hornby.

[6] 1923 Q4 App B, London and North Eastern Railway, 27/12/1923, Springhead, Hull, Inspector J L M Moore.

[7] 1910 Q2 App C, Great Northern Railway (Ireland), 4/4/1910, Belfast, Inspector C Campbell.

[8] 1907 Q4 App C, London and North Western Railway, 2/10/1907, Gildersome Tunnel, Inspector A Ford.

[9] The British Transport Police may have been an honourable exception.

[10] 1923 Q4 App B, London and North Eastern Railway, 27/11/1923, Ebberston Lane Level Crossing, Inspector J L M Moore.

[11] 1907 Q4 App C, Lancashire and Yorkshire Railway, 31/12/1907, North Docks, Liverpool, Inspector J J Hornby.

[12] 1910 Q2 App C, Barry Railway, 1/6/1910, Barry, Inspector J J Hornby.

[13] 1910 Q2 App C, Caledonian Railway, 6/5/1910, Rutherglen Station, Inspector C Campbell.

1 Comment

  1. TW

    “Most of Britain’s railways were built during periods of railway mania, often by small companies who merged or were taken over by larger ones. Many had limited budgets and this might be reflected in poor construction or unsafe but cheap methods of operation.”

    An interesting comparison is the US – compared to them, British railways were extravagant, and this is definitely visible in turn of the century staff (and passenger) accident rates.
