3 Structure of the data

The broad structure of each census is:

<census>
    year
    <household>+
        rooms-occupied
        in-occupation
        <abode>
        <member>+

where + means 'one or more'. In other words, a census consists of one or more households, each of which is made up of an abode (=house name) and one or more members. The year attribute is the year the census was taken. rooms-occupied and in-occupation are explained below.
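
As a rough illustration of this outline, the following Python sketch reads a small census and lists each household's abode and number of members. The sample data and the function name summarise are invented for the example. It assumes that year is an attribute of <census> and that <abode> and <member> are child elements of <household>, as outlined above.

```python
import xml.etree.ElementTree as ET

# Invented sample that follows the outline above; not taken from the real data.
SAMPLE = """<census year="1901">
  <household rooms-occupied="3">
    <abode>Rose Cottage</abode>
    <member><name><forename>John</forename><surname>Smith</surname></name></member>
    <member><name><forename>Mary</forename><surname>Smith</surname></name></member>
  </household>
  <household>
    <abode>Hill Farm</abode>
    <member><name><forename>Ann</forename><surname>Jones</surname></name></member>
  </household>
</census>"""


def summarise(census):
    """Return (year, [(abode, number_of_members), ...]) for a <census> element."""
    rows = []
    for household in census.findall("household"):
        abode = household.findtext("abode", default="")
        rows.append((abode, len(household.findall("member"))))
    return census.get("year"), rows


census = ET.fromstring(SAMPLE)
year, rows = summarise(census)
print(year, rows)
```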

Each <member> can contain the following fields:

<member>
    <name>
        <forename>+
        <surname>
    <relation>
    <condition>
    <age>
    <sex>
    <defects>
    <trade>
        standard
        work-status
        standardised-work-status
        at-home
        annotation
    <birthplace>
        standard
        county
        standard-county

<name>, <relation>, <age>, <sex> and <birthplace> are all obligatorily present (except where illegible in the original). The other fields may be absent.

Generally the censuses get more detailed over time. Earlier censuses lack some of the fields listed above. The 1841 census, for instance, does not have relation or condition, and only includes minimal information on birthplace (see §3.13).
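
Because some fields may be absent, a program reading a <member> has to tolerate missing elements. The sketch below shows one possible way of turning a member into a flat record, with None standing for an absent field. The sample data, the value "F" for sex, and the function name member_record are assumptions made for the example, not a description of the actual files.

```python
import xml.etree.ElementTree as ET

# Invented member; the value used for <sex> is an assumption.
MEMBER = """<member>
  <name><forename>Mary</forename><forename>Ann</forename><surname>Smith</surname></name>
  <relation>Daughter</relation>
  <age>12</age>
  <sex>F</sex>
  <birthplace county="Devon">Branscombe</birthplace>
</member>"""

# Simple text fields of <member>, in the order given in the outline above.
FIELDS = ("relation", "condition", "age", "sex", "defects", "trade", "birthplace")


def member_record(member):
    """Return a dict of the member's fields; absent fields map to None."""
    name = member.find("name")
    record = {
        "forenames": [f.text for f in name.findall("forename")] if name is not None else [],
        "surname": name.findtext("surname") if name is not None else None,
    }
    for field in FIELDS:
        record[field] = member.findtext(field)  # None when the element is missing
    return record


record = member_record(ET.fromstring(MEMBER))
print(record)
```

Attributes such as county are not copied by this sketch; only the element text is kept.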

Unoccupied houses are recorded in the original censuses and also in the electronic version. These have <abode> but no <member>s. (See also §3.4.)

3.1 General approach to the data

The original presentation of the censuses, with all its peculiarities of spelling and phraseology, is interesting in its own right. Also interesting, though, is the possibility of processing the data automatically (by program), for instance by sorting it in ways that differ from the original order, or by searching for particular values (a name, say). In order to be able to process two items (such as two names) that are equivalent, they need to have the same form. If they are different in the original census, they need to be standardised. These two requirements - fidelity to the original and standardisation - are in conflict.

One way of standardising the data is to edit it directly. But this loses the original form. Another way is to add standard forms as annotation, and then to process only the standard forms. In the present collection, names of houses and of people have been kept in their original form, except for spelling out some abbreviations. Other fields have been standardised (on the assumption, for instance, that it is not interesting that 'widower' is sometimes spelt out in full, sometimes written as 'wid', and sometimes written as 'widr').
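
The kind of standardisation meant here can be pictured as a lookup from variant spellings to one agreed form. The following Python sketch is only an illustration: the table CONDITION_VARIANTS, the choice of 'Widower' as the standard form, and the function name are hypothetical, and the sketch is not the procedure actually used to prepare the files.

```python
# Hypothetical lookup table: variant spellings (lower case, no full stop)
# mapped to one standard form.
CONDITION_VARIANTS = {
    "wid": "Widower",
    "widr": "Widower",
    "widower": "Widower",
}


def standardise_condition(raw):
    """Return the standard form of a condition, or the trimmed input if unknown."""
    key = raw.strip().rstrip(".").lower()
    return CONDITION_VARIANTS.get(key, raw.strip())


for value in ("widower", "wid", "Widr.", "Married"):
    print(value, "->", standardise_condition(value))
```

Values that are not in the table are passed through unchanged, so the lookup never discards information it does not recognise.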

Some editing has been done on the <trade> field to standardise the spelling, spacing, and capitalisation of names of trades. For instance, 'Washer woman' and 'Washerwoman' are standardised as 'Washerwoman'. However, no attempt has been made to standardise trade information consisting of several trades (e.g. 'Lacemaker and servant') or where specific details of the trade are given (e.g. 'Farmer of 250 acres employing 5 labourers'). Given that there is only partial standardisation, the <trade> field is only partially searchable in its present form. For the 1901 census, trade information has been entered in both original and standard form (see §3.12).

Information on birthplace has been separated into (a) the name of the village, town or city and (b) the name of the county. This allows these parts to be searched separately.

3.2 Rooms occupied

The rooms-occupied attribute of <household> records the number of rooms occupied, where that number is less than five. Set against the number of people in the household, it gives an indication of how crowded the rooms are.
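
One simple measure of crowding is the number of people per room. The sketch below computes it for a single household. The sample household and the function name persons_per_room are invented, and the sketch assumes that rooms-occupied, when present, holds a whole number.

```python
import xml.etree.ElementTree as ET

# Invented household: five members (left empty for brevity) in two rooms.
SAMPLE = """<household rooms-occupied="2">
  <abode>Rose Cottage</abode>
  <member/><member/><member/><member/><member/>
</household>"""


def persons_per_room(household):
    """Return members per occupied room, or None if the attribute is absent or unusable."""
    rooms = household.get("rooms-occupied")
    if rooms is None or not rooms.isdigit() or int(rooms) == 0:
        return None
    return len(household.findall("member")) / int(rooms)


household = ET.fromstring(SAMPLE)
print(persons_per_room(household))
```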

3.3 Not in occupation

The 1901 census has a column marked 'Not in occupation'. Properties marked as not in occupation tend to be those that are named but have no one listed as living there. In these cases 'Not in occupation' is redundant, so it is omitted from the electronic version.

3.4 In occupation

In a small number of cases, the 1901 census records a property as being 'In occupation' even though no one is listed as living there. In these cases, an in-occupation="true" attribute is attached to <household>.
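
Households with no members can therefore be of two kinds: unoccupied houses, and houses recorded as in occupation with no one listed. The following sketch separates the two. The sample data and the function name empty_households are invented for the example.

```python
import xml.etree.ElementTree as ET

# Invented sample: one occupied house, one unoccupied house, and one house
# marked as in occupation although no members are listed.
SAMPLE = """<census year="1901">
  <household>
    <abode>Rose Cottage</abode>
    <member><name><forename>John</forename><surname>Smith</surname></name></member>
  </household>
  <household><abode>Old Mill</abode></household>
  <household in-occupation="true"><abode>Hill Farm</abode></household>
</census>"""


def empty_households(census):
    """Return (unoccupied, in_occupation) lists of abodes for households without members."""
    unoccupied, in_occupation = [], []
    for household in census.findall("household"):
        if household.find("member") is not None:
            continue  # someone is listed, so the household is not empty
        abode = household.findtext("abode", default="")
        if household.get("in-occupation") == "true":
            in_occupation.append(abode)
        else:
            unoccupied.append(abode)
    return unoccupied, in_occupation


unoccupied, in_occupation = empty_households(ET.fromstring(SAMPLE))
print(unoccupied, in_occupation)
```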

3.5 Abode

Abode is a house (or farm, pub, etc.) name. In some cases - perhaps where there is no conventional name to distinguish one house from another - it is the enumerator's⁵ invention.

The enumerator on the 1871 census often wrote 'Cot' for abode, presumably meaning 'cottage'. We have put this in lower case to distinguish it from the proper name Cotte (and its variant spellings).

3.6 Name

Forenames and surnames are tagged as separate items. Some forenames are initials. Forenames which the enumerator wrote in abbreviated form (other than initials), such as 'Thos.' for 'Thomas', have been expanded to their full form.

3.7 Relation

Censuses are structured in terms of households. Households are seen as having a head and some number of dependents (possibly zero). The head is male apart from cases where the household is headed by a woman whose husband is absent or by a widow or a spinster. The relation field classifies members of the household in terms of their relation to the head. The head himself (sometimes herself) is labelled 'Head'.

3.8 Condition

'Condition' means marital status, including whether the person is a widow(er).

3.9 Age

Most ages are in years. Ages of children are sometimes given to the nearest month, or occasionally the nearest week or day. Examples are given below:

    3             = 3 years
    0.3           = 3 months
    1.3           = 1 year 3 months
    0.0.3         = 3 weeks
    0.0.0.3       = 3 days
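
The dotted notation can be read as years, months, weeks and days, in that order, with missing trailing parts taken as zero. The Python sketch below follows that reading. The function names are invented, and the conversion to a single number of years uses rough factors (12 months, 52 weeks, 365 days) that are an assumption of the example. Ages that are illegible in the original are not handled.

```python
def parse_age(text):
    """Split a dotted age into a (years, months, weeks, days) tuple."""
    parts = [int(p) for p in text.strip().split(".")]
    if len(parts) > 4:
        raise ValueError("too many parts in age: %r" % text)
    parts += [0] * (4 - len(parts))  # pad missing trailing parts with zeros
    return tuple(parts)


def approx_years(text):
    """Convert a dotted age to an approximate number of years, e.g. for sorting."""
    years, months, weeks, days = parse_age(text)
    return years + months / 12 + weeks / 52 + days / 365


for age in ("3", "0.3", "1.3", "0.0.3", "0.0.0.3"):
    print(age, parse_age(age))
```

Note that the ages must not be read as decimal numbers: '1.3' is one year three months, not 1.3 years.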

In the original censuses, age appears in one of two columns: age for males, and age for females. This implied sex information is made explicit in the sex field. See §3.10.

In the 1841 census, ages that are multiples of five occur much more often than expected. It appears that the enumerator has rounded to the nearest multiple of five (e.g. 60, 65, 70) where there was a doubt.
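
A quick check for this kind of rounding is to see what share of the recorded ages are multiples of five. If ages were spread evenly, roughly one in five would be. The sketch below is illustrative only: the list of ages is invented, not taken from the 1841 returns, and the function name is hypothetical.

```python
def share_multiples_of_five(ages):
    """Return the fraction of whole-year ages divisible by five (None for an empty list)."""
    if not ages:
        return None
    return sum(1 for age in ages if age % 5 == 0) / len(ages)


# Invented ages for illustration.
ages = [60, 65, 42, 30, 17, 25, 40, 8, 55, 50]
print(share_multiples_of_five(ages))
```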

3.10 Sex

This field is not in the original. It is added to allow sorting by sex.

3.11 Defects

All of the censuses have information on what might be called 'defects' (the labelling in the originals varies). The categories used include 'blind', 'cripple', 'deaf', 'idiot', 'imbecile'.

3.12 Trade

In the current version, trade (occupation) information is only partially searchable, because terminology and spelling have not been fully standardised. Nor have the different elements within the <trade> field been marked up separately.

For the 1901 census, the original form of the trade or occupation is recorded, as well as a standardised form (allowing searching). The standard form is recorded using the attribute standard.

In addition, annotations concerning occupation that were added later (apparently by the enumerator) are recorded using the attribute annotation.

3.12.1 Employment status

The 1891 census has columns for recording people's employment status. The categories are 'employer', 'employed', 'neither' and blank. There is no indication of how many people employers employ.

In the electronic version of the 1891 census this field has been omitted. For the 1901 census, however, this information is retained using the attribute work-status. Here the most common categories are 'worker', 'own account' and 'employer'. The work-status attribute keeps to the original spelling. To allow for searching on work status, a standard form is recorded in a second attribute, standardised-work-status.

3.12.2 Working at home

The 1901 census records whether a person works at home. Where the enumerator puts 'at home', an attribute at-home="true" is included in the <trade> element.

3.12.3 Example <trade> element

An example of a <trade> element with all the above attributes is:

    <trade standard="Shoemaker"
           work-status="own acc"
           standardised-work-status="own account"
           at-home="true"
           annotation="Boot M.">Shoe maker</trade>
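
A program reading this element might prefer the standardised values and fall back on the original wording where no standard form is given. The following Python sketch does this for the example above. The function name trade_info and the keys of the returned dictionary are invented for the illustration.

```python
import xml.etree.ElementTree as ET

# The example element from above.
TRADE = """<trade standard="Shoemaker"
       work-status="own acc"
       standardised-work-status="own account"
       at-home="true"
       annotation="Boot M.">Shoe maker</trade>"""


def trade_info(trade):
    """Collect the text and attributes of a <trade> element, preferring standard forms."""
    original = (trade.text or "").strip()
    return {
        "original": original,
        # Fall back on the original wording when there is no standard attribute.
        "standard": trade.get("standard", original),
        "work_status": trade.get("standardised-work-status") or trade.get("work-status"),
        "at_home": trade.get("at-home") == "true",
        "annotation": trade.get("annotation"),
    }


info = trade_info(ET.fromstring(TRADE))
print(info)
```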

3.13 Birthplace

Birthplace has two parts: (a) village, town, or city, (b) county.

In the data files, county is recorded as an attribute of <birthplace>. Where the census returns omitted county information, it has been added in the present electronic version.

Original spellings of placenames are used. The original counties are also retained (e.g. St Pancras, Middlesex).

The 1841 census has yes-no categories for recording people born outside the county in question (Devon in the present case) and born in Scotland, Ireland or 'foreign parts'. These have been recorded in the data files as

    <birthplace county="outside Devon"/>

and

    <birthplace>outside England and Wales</birthplace>

respectively.

For the 1901 census, standard and standard-county have been used to record standard forms of the settlement and county respectively, where the enumerator has used non-standard forms.
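
The different forms of <birthplace> can be handled by one routine that reads the settlement from the element text, the county from the county attribute, and the standard forms where they are present. The sketch below is a minimal illustration. The function name, the dictionary keys and the 1901-style sample with its abbreviated spellings are all invented. The two 1841 forms are those shown above.

```python
import xml.etree.ElementTree as ET


def birthplace_info(birthplace):
    """Return original and standard settlement and county for a <birthplace> element."""
    place = (birthplace.text or "").strip() or None
    county = birthplace.get("county")
    return {
        "place": place,
        "county": county,
        # Where no standard form is recorded, the original form is used.
        "standard_place": birthplace.get("standard", place),
        "standard_county": birthplace.get("standard-county", county),
    }


# Invented 1901-style example with non-standard spellings.
later = birthplace_info(ET.fromstring(
    '<birthplace standard="Sidmouth" county="Devonshire" '
    'standard-county="Devon">Sidmth</birthplace>'))

# The two 1841 forms described above.
outside_county = birthplace_info(ET.fromstring('<birthplace county="outside Devon"/>'))
abroad = birthplace_info(ET.fromstring(
    "<birthplace>outside England and Wales</birthplace>"))

print(later, outside_county, abroad, sep="\n")
```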
