The censuses included in this collection are the 1841, 1851, 1861, 1871, 1881, 1891 and 1901 censuses. These are the ones for which the original data are available (rather than just summaries).
The electronic versions of the censuses were assembled in the following way. First, printouts of the original census returns were transcribed into computer files. Next, the raw data were annotated with XML markup (see www.w3.org). The result is a set of XML files - the data files - one per census.
For readability, HTML files for viewing in a web browser have been generated from the data in two formats: one organised by household and one organised by individual.
The present version of these Branscombe censuses is version 2. Version 2 differs from version 1 in the inclusion of the 1901 census.
The broad structure of each census is:

    <census> year
        <household>+ rooms-occupied in-occupation
            <abode>
            <member>+

where + means 'one or more'. Names in angle brackets are elements; the other names are attributes of the element on the same line. In other words, a census consists of one or more households, each of which is made up of an abode (= house name) and one or more members (or none, for unoccupied houses; see below). The year attribute is the year the census was taken. The rooms-occupied and in-occupation attributes are explained below.
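As a minimal sketch of how a program might walk this structure, the Python below uses the standard-library xml.etree.ElementTree module. It assumes that <abode> holds the house name as its text and that year is an attribute of <census>, as described above. The sample fragment, its house names and its people are invented for illustration and are not taken from the data files.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Invented fragment following the structure described above (illustrative only).
SAMPLE = """
<census year="1901">
  <household rooms-occupied="3">
    <abode>Example Cottage</abode>
    <member><name><forename>John</forename><surname>Example</surname></name></member>
    <member><name><forename>Mary</forename><surname>Example</surname></name></member>
  </household>
  <household in-occupation="true">
    <abode>Example Farm</abode>
  </household>
</census>
"""


def summarise(xml_text: str) -> tuple[int, list[tuple[str, int]]]:
    """Return the census year and an (abode, member count) pair per household."""
    census = ET.fromstring(xml_text.strip())
    year = int(census.get("year"))
    households = []
    for household in census.findall("household"):
        abode = (household.findtext("abode") or "").strip()
        households.append((abode, len(household.findall("member"))))
    return year, households
```

Here summarise(SAMPLE) returns (1901, [('Example Cottage', 2), ('Example Farm', 0)]). The same function could be pointed at a real data file by passing its contents as a string.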
Each <member> can contain the following fields:

    <member>
        <name>
            <forename>+
            <surname>
        <relation>
        <condition>
        <age>
        <sex>
        <defects>
        <trade> standard work-status standardised-work-status at-home annotation
        <birthplace> standard county standard-county
<name>, <relation>, <age>, <sex> and <birthplace> are always present (except where illegible in the original). The other fields may be absent.
Generally the censuses get more detailed over time, and earlier censuses lack some of the fields listed above. The 1841 census, for instance, does not have relation or condition, and includes only minimal information on birthplace (see §3.13).
Unoccupied houses are recorded in the original censuses and also in the electronic version. These have an <abode> but no <member> elements. (See also §3.4.)
The original presentation of the censuses, with all its peculiarities of spelling and phraseology, is interesting in its own right. Also interesting, though, is the possibility of processing the data automatically (by program), for instance by sorting it in ways that differ from the original order, or by searching for particular values (a name, say). For a program to treat two equivalent items (such as two spellings of the same name) as the same, they need to have the same form; if they differ in the original census, they need to be standardised. These two requirements - fidelity to the original and standardisation - are in conflict.
One way of standardising the data is to edit it directly, but this loses the original form. Another way is to add standard forms as annotation, and then to process only the standard forms. In the present collection, names of houses and of people have been kept in their original form, except for spelling out some abbreviations. Other fields have been standardised (on the assumption, for instance, that it is of no interest that 'widower' is sometimes spelt out in full, sometimes written as 'wid', and sometimes written as 'widr').
Some editing has been done on the <trade> field to standardise the spelling, spacing, and capitalisation of names of trades. For instance, 'Washer woman' and 'Washerwoman' are both standardised as 'Washerwoman'. However, no attempt has been made to standardise trade information consisting of several trades (e.g. 'Lacemaker and servant') or giving specific details of the trade (e.g. 'Farmer of 250 acres employing 5 labourers'). Because standardisation is only partial, the <trade> field is only partially searchable in its present form. For the 1901 census, trade information has been entered in both original and standard form (see §3.12).
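The editing described above was done by hand. As a hedged illustration of the "add a standard form as annotation" approach used for the 1901 census, the sketch below collapses spacing, joins known compounds such as 'Washer woman', and capitalises the first letter, then stores the result in a standard attribute without touching the original text. The lookup table COMPOUNDS and the function names are hypothetical; multi-trade or detailed entries pass through with only whitespace and initial-capital changes, mirroring the fact that these were not standardised.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical table of compound trade names written sometimes as one word,
# sometimes as two. Keys are lower-case with single spaces.
COMPOUNDS = {
    "washer woman": "Washerwoman",
    "shoe maker": "Shoemaker",
    "lace maker": "Lacemaker",
}


def standardise_trade(original: str) -> str:
    """Return a standard form: single spaces, known compounds joined, initial capital."""
    text = re.sub(r"\s+", " ", original).strip()
    joined = COMPOUNDS.get(text.lower())
    if joined is not None:
        return joined
    return text[:1].upper() + text[1:]


def annotate(trade: ET.Element) -> None:
    """Record the standard form in a 'standard' attribute, keeping the original text."""
    trade.set("standard", standardise_trade(trade.text or ""))
```

Keeping the original text and adding the standard form alongside it preserves fidelity to the return while still allowing a program to search on one consistent spelling.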
Information on birthplace has been separated into (a) the name of the village, town or city and (b) the name of the county, so that these parts can be searched separately.
The rooms-occupied attribute of <household> gives the number of rooms occupied by the household, where that number is fewer than five. In houses with many occupants, this number indicates how crowded the rooms are.
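Persons per room is one simple way to turn rooms-occupied into a crowding measure. The sketch below assumes that every <member> of a household counts as an occupant, and returns None where the attribute is absent (five or more rooms, or a census that does not record it). The function name is illustrative.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def persons_per_room(household: ET.Element) -> Optional[float]:
    """Members divided by rooms occupied, or None if rooms-occupied is absent."""
    rooms = household.get("rooms-occupied")
    if rooms is None:
        return None
    return len(household.findall("member")) / int(rooms)
```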
The 1901 census has a column marked 'Not in occupation'. Properties marked as not in occupation tend to be those that are named but have no one listed as living there. In these cases 'Not in occupation' is redundant, and it has been omitted from the electronic version.
In a small number of cases, the 1901 census records a property as being 'In occupation' even though no one is listed as living there. In these cases, an in-occupation="true" attribute is attached to <household>.
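To list houses with no one recorded in them, a program can look for households without any <member> element and then check the in-occupation attribute. A minimal sketch, assuming the attribute value is the string "true" as shown above; the function name and status labels are illustrative.

```python
import xml.etree.ElementTree as ET


def empty_households(census: ET.Element) -> list:
    """Return (abode, status) for every household with no <member> elements."""
    result = []
    for household in census.findall("household"):
        # Compare with None: an Element with no children is falsy.
        if household.find("member") is None:
            if household.get("in-occupation") == "true":
                status = "in occupation"
            else:
                status = "unoccupied"
            result.append(((household.findtext("abode") or "").strip(), status))
    return result
```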
Abode is a house (or farm, pub, etc.) name. In some cases - perhaps where there is no conventional name to distinguish one house from another - it is the enumerator's invention.
The enumerator on the 1871 census often wrote 'Cot' as the abode, presumably meaning 'cottage'. We have put this in lower case ('cot') to distinguish it from the proper name Cotte (and its variant spellings).
Forenames and surnames are tagged as separate items. Some forenames are initials. Forenames which the enumerator wrote in abbreviated form (other than initials), such as 'Thos.' for 'Thomas', have been expanded to their full form.
Censuses are structured in terms of households. Each household is seen as having a head and some number of dependants (possibly zero). The head is male except where the household is headed by a woman whose husband is absent, or by a widow or a spinster. The relation field classifies members of the household by their relation to the head. The head himself (sometimes herself) is labelled 'Head'.
'Condition' means marital status, including whether the person is a widow(er).
Most ages are in years. Ages of children are sometimes given to the nearest month, or occasionally the nearest week or day, as dot-separated fields (years.months.weeks.days). Examples:

    3        = 3 years
    0.3      = 3 months
    1.3      = 1 year 3 months
    0.0.3    = 3 weeks
    0.0.0.3  = 3 days
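Reading these values as years.months.weeks.days, an age string can be split on the dots. The sketch below is one way to do this; Age, parse_age and approx_years are illustrative names. The conversion to fractional years uses rough averages (12 months, 52 weeks, 365 days to the year), so it is an approximation for sorting or plotting, not an exact age. An illegible or otherwise non-numeric age raises ValueError, which the caller would need to handle.

```python
from typing import NamedTuple


class Age(NamedTuple):
    years: int
    months: int = 0
    weeks: int = 0
    days: int = 0


def parse_age(value: str) -> Age:
    """Parse '3', '0.3', '1.3', '0.0.3' or '0.0.0.3' into an Age."""
    parts = [int(p) for p in value.strip().split(".")]
    if not 1 <= len(parts) <= 4:
        raise ValueError(f"unexpected age format: {value!r}")
    return Age(*parts)


def approx_years(age: Age) -> float:
    """Approximate age in years (12 months, 52 weeks, 365 days to the year)."""
    return age.years + age.months / 12 + age.weeks / 52 + age.days / 365
```

Because Age is a tuple compared field by field, parsed ages sort in the natural order: sorted(["1", "0.3", "0.0.3"], key=parse_age) gives ["0.0.3", "0.3", "1"].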
In the original censuses, age appears in one of two columns: age for males, and age for females. This implied sex information is made explicit in the sex field. See §3.10.
In the 1841 census, ages that are multiples of five occur much more often than would be expected. It appears that, where there was doubt, the enumerator rounded to the nearest multiple of five (e.g. 60 or 65).
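A quick way to see this effect is to compare the share of ages that are multiples of five with the roughly one in five that would be expected, as a rough baseline, if final digits were evenly spread. A sketch, assuming whole-year ages have already been extracted as integers; children under one are excluded because their ages are often given in months. The function name is illustrative.

```python
from typing import List


def share_multiple_of_five(ages: List[int]) -> float:
    """Fraction of ages of one year or more that are exact multiples of five."""
    whole_years = [a for a in ages if a >= 1]
    if not whole_years:
        return 0.0
    return sum(1 for a in whole_years if a % 5 == 0) / len(whole_years)
```

A value well above 0.2 for a census suggests rounding of the kind described above.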
This field is not in the original. It is added to allow sorting by sex.
All of the censuses have information on what might be called 'defects' (the labelling in the originals varies). The categories used include 'blind', 'cripple', 'deaf', 'idiot', 'imbecile'.
In the current version, trade (occupation) information is only partially searchable, because terminology and spelling have not been fully standardised. Nor have the different elements within the <trade> field been marked up separately.
For the 1901 census, the original form of the trade or occupation is recorded, as well as a standardised form (allowing searching). The standard form is recorded using the attribute standard.
In addition, annotations concerning occupation that were added later (apparently by the enumerator) are recorded using the attribute annotation.
The 1891 census has columns for recording people's employment status. The categories are 'employer', 'employed', 'neither' and blank. There is no indication of how many people employers employ.
In the electronic version of the 1891 census this field has been omitted. For the 1901 census, however, this information is retained using the attribute work-status. Here the most common categories are 'worker', 'own account' and 'employer'. The work-status attribute keeps to the original spelling; to allow searching on work status, a standard form is also recorded in the attribute standardised-work-status.
The 1901 census records whether a person works at home. Where the enumerator has written 'at home', an attribute at-home="true" is included in the <trade> element.
<trade> element
An example of a <trade> element with all the above attributes is:

    <trade standard="Shoemaker" work-status="own acc" standardised-work-status="own account" at-home="true" annotation="Boot M.">Shoe maker</trade>
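With ElementTree, the attributes of such an element can be read directly. The sketch below collects them into a dictionary. The dictionary keys, and the fallback to the original text when no standard attribute is present (as in censuses before 1901), are choices made for this illustration rather than part of the data format.

```python
import xml.etree.ElementTree as ET

EXAMPLE = (
    '<trade standard="Shoemaker" work-status="own acc" '
    'standardised-work-status="own account" at-home="true" '
    'annotation="Boot M.">Shoe maker</trade>'
)


def read_trade(trade: ET.Element) -> dict:
    """Collect original and standard forms of a <trade> element into a dict."""
    original = (trade.text or "").strip()
    return {
        "original": original,
        # Fall back to the original text where no standard form was recorded.
        "standard": trade.get("standard", original),
        "work_status": trade.get("standardised-work-status"),
        "at_home": trade.get("at-home") == "true",
        "annotation": trade.get("annotation"),
    }
```

For the example above, read_trade(ET.fromstring(EXAMPLE)) gives 'Shoemaker' as the standard form and 'own account' as the work status.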
Birthplace has two parts: (a) village, town, or city, (b) county.
In the data files, county is recorded as an attribute of <birthplace>. Where the census returns omitted county information, it has been added in the present electronic version.
Original spellings of placenames are used, and the counties as originally recorded are kept (e.g. St Pancras, Middlesex).
The 1841 census has yes/no categories for recording people born outside the county in question (Devon in the present case) and people born in Scotland, Ireland or 'foreign parts'. These have been recorded in the data files as <birthplace county="outside Devon"/> and <birthplace>outside England and Wales</birthplace> respectively.
For the 1901 census, the attributes standard and standard-county have been used to record standard forms of the settlement and county respectively, where the enumerator has used non-standard forms.
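Putting these rules together, a program that wants one settlement and one county per person can prefer the standard forms where they exist and fall back to the original text and the county attribute otherwise. A minimal sketch; the function name and the empty-string convention for a missing part are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def birthplace_parts(bp: ET.Element) -> Tuple[str, str]:
    """Return (settlement, county), preferring standard and standard-county."""
    settlement = bp.get("standard", (bp.text or "").strip())
    county = bp.get("standard-county", bp.get("county", ""))
    return settlement, county
```

With this convention, the 1841 forms come out as ("", "outside Devon") and ("outside England and Wales", "").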
Two views of the data are provided. One view is organised by household, following the pattern of the original census returns. The order in which households appear reflects the enumerator's walk around the village as he collected the data. This view is useful where the occupancy of a particular house is of interest.
The second view is organised alphabetically by individual. This view is useful when a particular individual or family is of interest.
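The alphabetical view can be approximated in a few lines. The sketch below lists every member as (surname, forenames, abode) and sorts case-insensitively by surname and then forenames. It is an illustration only, not the stylesheet (individuals.xsl) actually used to build the HTML. Because names are kept in their original spelling, variant spellings of the same surname will not sort together.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def name_parts(member: ET.Element) -> Tuple[str, str]:
    """Return (surname, forenames) from a <member>'s <name> element."""
    forenames = " ".join(
        f.text.strip() for f in member.findall("name/forename") if f.text
    )
    surname = (member.findtext("name/surname") or "").strip()
    return surname, forenames


def individuals(census: ET.Element) -> List[Tuple[str, str, str]]:
    """List (surname, forenames, abode) for every member, sorted by name."""
    rows = []
    for household in census.findall("household"):
        abode = (household.findtext("abode") or "").strip()
        for member in household.findall("member"):
            surname, forenames = name_parts(member)
            rows.append((surname, forenames, abode))
    return sorted(rows, key=lambda row: (row[0].lower(), row[1].lower()))
```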
Although the amount of information recorded increases over the decades, the same format is used for each census. Where information is not available, the relevant columns are blank.
A few pieces of information in the data files (.xml) are not displayed, for two reasons. First, some information was not present in the original. This includes <sex> and the standardised forms of trade (occupation) and employment status, which were added to enable automatic sorting and searching of the data.
The other reason for not displaying information is to save screen space. In particular, <defects> is not displayed because it is usually empty.
However, if one wants to see these pieces of information, one can open up the data files (see §5) with a text editor and search for, e.g., '<defects>'.
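The same search can also be done by program. A minimal sketch that lists members whose <defects> field is non-empty; to run it on a real file one would parse it with ET.parse(path).getroot(), where the path to a file such as census1871.xml depends on where the package has been unpacked. The function name is illustrative.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def members_with_defects(census: ET.Element) -> List[Tuple[str, str]]:
    """Return (surname, defects) for each member with a non-empty <defects>."""
    found = []
    for member in census.iter("member"):
        defects = (member.findtext("defects") or "").strip()
        if defects:
            surname = (member.findtext("name/surname") or "").strip()
            found.append((surname, defects))
    return found
```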
The package is structured as follows:

    censuses/
        households/
            census1841.htm, census1851.htm, census1861.htm, census1871.htm,
            census1881.htm, census1891.htm, census1901.htm
        individuals/
            census1841.htm, census1851.htm, census1861.htm, census1871.htm,
            census1881.htm, census1891.htm, census1901.htm
    data/
        census1841.xml, census1851.xml, census1861.xml, census1871.xml,
        census1881.xml, census1891.xml, census1901.xml
    description/
        source/
            description.xml
        description.htm
    style/
        css/
            census.css
        xsl/
            common.xsl, households.xsl, individuals.xsl
For viewing the censuses, the files of interest are the ones under censuses/households and censuses/individuals. These show the census information arranged by household (as in the original) and by individual respectively.
The data directory contains the actual census data. These files contain some information that is not presented for viewing (because of limits on what can be fitted onto the computer screen).
The description directory contains this file.
Style information, controlling the appearance of the censuses and of this file, is contained in style/css and style/xsl.
May 2005 | Last updated December 2008