BROWN CORPUS - XML VERSION

This version derives directly from

"A Standard Corpus of Present-Day Edited American
English, for use with Digital Computers."
by W. N. Francis and H. Kucera (1964)
Department of Linguistics, Brown University
Providence, Rhode Island, USA
Revised 1971, Revised and Amplified 1979
http://www.hit.uib.no/icame/brown/bcm.html

as distributed with NLTK (version 0.9.2)

The TEI-XML versions of the texts are in the directory Texts.
The file driver.xml or driver2.xml can be used to process all of them
together with the TEI-conformant header in the file brownHdr.xml.

Alternatively, the file corpus.xml contains a validated copy of the whole
corpus as a single file.
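
As a quick sanity check on the single-file copy, the sketch below streams
through an XML file and tallies elements by name. It is an illustration
only: the helper names local_name and count_elements are hypothetical, it
makes no assumption about which TEI elements the corpus uses, and it
assumes the file parses with Python's standard library (for example, that
it does not rely on externally declared entities).

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def count_elements(source):
    """Tally elements by local name; source is a path or a binary file object."""
    counts = Counter()
    # iterparse yields each element once its closing tag has been read,
    # so the whole tree never has to be held in memory at one time.
    for _event, elem in ET.iterparse(source, events=("end",)):
        counts[local_name(elem.tag)] += 1
        elem.clear()  # drop this element's children and text once counted
    return counts


# Example call (the path is an assumption; adjust it to your copy):
#     for name, n in count_elements("corpus.xml").most_common(10):
#         print(name, n)
```

Streaming is used here only because a single-file corpus may be large; for
the individual files in Texts, a plain ET.parse call would probably do.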

A Perl script was used to convert the texts from plain text to TEI XML
and to integrate them with the available metadata.
This Perl script and the other data sources used are all
in the directory XML-Work.

The TEI scheme used to validate the corpus is documented by the TEI
ODD document brownodd.xml. The files in the directory TEI are generated
from it: schemas in DTD, RelaxNG, and W3C Schema (XSD), as well as
documentation in XML and HTML.

Please address any enquiries about the TEI conversion to
lou.burnard@oucs.ox.ac.uk



