Digital Early American Imprints, Series I. Evans (1639–1800)
(access by subscription). Readex, a Division of Newsbank, Inc.
Reviewed Jan. 24, 2005 to March 31, 2005.
The online Evans digital collection, part of Readex’s Archive of Americana, consists of more than 2.3 million page images of microform copies of more than 37,000 books, pamphlets, and broadsides: every known American imprint prior to 1800, as listed in Charles Evans’s American Bibliography (14 vols., 1903–1959). The online Evans series includes neither books by American authors printed overseas nor overseas imprints popular in America, so it is not a complete body of what early Americans read, but it does contain a large portion of it. An optical character recognition (OCR) engine has been used to scan the microforms and translate the page images into ASCII text. The OCR engine provides the main feature of the collection, the ability to perform full-text searches on the complete corpus. The collection is expensive, costing initially somewhere between $20,000 and $100,000 depending on the size of the subscribing institution, with an annual maintenance fee of $2,000.
The digital Evans series has been hailed as revolutionary and democratizing by some of those fortunate enough to be able to afford it. Others—including participants in H-Net’s colonial and early American online discussion lists—have described it as part of the twenty-first-century theft of the commons. In my own case, the pricing structure of the Evans series and the other Archive of Americana offerings moves it well out of the range of the possible for a publicly funded state research university in the middle of the Pacific Ocean with only two faculty members working on anything related to early America. Without some gracious (and thus far imaginary) donor, our library simply cannot afford the price of such “democratization” within current budgetary constraints. To be fair, the Evans collection prices out to a little more than two dollars per title for our university. Just the same, finding more than $80,000 (the quoted price, including the trade-in value of the university’s Evans series microcards) during a time when the university is losing and not replacing many humanities faculty members is a dim prospect.
Alternative pricing structures that take into account how many people would use the resource rather than the size of the institution would move the digital Evans series—and the rest of the Archive of Americana, once it is ready—more toward being democratizing in any serious sense of the word. For example, I could see writing a grant for a few hundred or even thousand dollars for single-user access to the resource for a week, a month, or some other set period of time. Such access could be priced about the same as a research trip to an archive.
The digital Evans series has some problems under the hood. First and most important is the somewhat erratic OCR engine. A search for “creole” gave back “[in-]crease,” “credible,” “people,” “seele,” “groote” (the latter two being German words), “credit,” “[illegible],” “creek,” “Greele” (a surname), “geese,” and “treble,” along with 11 instances of “creole,” among the first twenty or thirty of 136 results. Such generous interpretations mean that few instances of “creole” will be missed, but there are a lot of false hits, so any statistical usage of the results would need to be carefully and laboriously checked for validity. The inaccuracy of the OCR is most likely why the promised (and powerful) feature of making the underlying text available in ASCII form is missing. Any findings must be transcribed by hand rather than being copied and pasted. A second problem is that sessions time out without being saved. If one wants to explore the 8,670 instances of the word “freedom,” she can forget about lunch unless it is at the computer. After fifteen minutes or so the session times out, saving neither results nor searches. A new search must be started, and the reader has to navigate back to the place where she left off, a time-consuming process. While complex Boolean searching is a boon, it is advisable to save search strings in a separate notepad in order to retrace one’s path through the results if one wants to replicate a timed-out search exactly. Even then, the limits on an advanced search have to be reset each time, introducing a window for errors. While still much quicker than combing through microcards, the interface is slow, even on a fast connection.
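The trade-off between missed instances and false hits can be illustrated with a small sketch. It does not reproduce Readex's search, whose internals are not described here; it simply assumes, for illustration, that a word counts as a hit when it lies within a fixed edit distance of the query. The word list is taken from the hits reported above, lowercased and with the bracketed forms reduced to the recognized letters; the function names and thresholds are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def loose_matches(query: str, words: list[str], max_distance: int) -> list[str]:
    """Return the words within max_distance edits of query, in input order."""
    return [w for w in words if levenshtein(query, w) <= max_distance]


# Hits reported in the review for the query "creole" (normalized; an
# assumption made for this sketch, not output from the Readex interface).
REPORTED_HITS = ["crease", "credible", "people", "seele", "groote", "credit",
                 "creek", "greele", "geese", "treble", "creole"]

if __name__ == "__main__":
    for limit in (0, 2, 3):
        print(limit, loose_matches("creole", REPORTED_HITS, limit))
```

Under this assumed rule, a limit of 0 returns only “creole,” a limit of 2 already admits “crease,” “greele,” and “treble,” and a limit of 3 admits every word in the list except “geese.” The looser the match, the fewer true instances are lost and the more results must be checked by hand.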
Perhaps such a critical assessment of the digital Evans series sounds cranky, but Readex makes big claims and costs a lot, setting expectations high. Despite its problems, it would be revolutionary if made more accessible. The ability to do full-text searches on such a large corpus of materials changes the nature of what kind of historical research is possible, opening up avenues of inquiry that were only imaginable a few years ago. As it stands, though, the digital Evans series is a revolution for the privileged.
Richard Cullen Rath
University of Hawaii
Letters to the Editor
To the Editor:
At Readex, we looked forward to the Digital Early American Imprints, Series I review in the influential Journal of American History (JAH, Sept., 2005). Perhaps because so many leading scholars have called this new collection revolutionary and democratizing, we are especially disappointed by Professor Richard Rath’s narrow perspective, described by Rath himself as “cranky,” on this resource now in use worldwide at 295 institutions of every type and size. Rath states, “The OCR engine provides the main feature of the collection ... ” but never notes the deep indexing that most distinguishes Early American Imprints from comparable databases. By enabling users to search and browse more than 37,000 works by author, title, subject, genre, and other fields, this collection offers unprecedented access to this material. For example, researchers browsing “Society, Manners, and Customs” are presented with more than 300 subtopics. Rath suggests “others” have accused Readex of participating in a “twenty-first-century theft of the commons.” This unfortunate charge ignores the reality that every imprint in the Evans bibliography is no less freely available today than before Readex created images from these works—first on microprint and now digitally. Most scholars hail Readex for dramatically expanding access to these imprints, originally filmed from the holdings of more than 250 institutions and private collections. Researchers no longer need to go to the imprints, because Readex has brought the imprints to the researchers—around the clock via the Web. Rath writes, “The collection is expensive ... ” when, in fact, prices for Early American Imprints have made the collection broadly obtainable, whether through acquisition, subscription, or other flexible options. 
While many institutions whose libraries are members of the Association of Research Libraries are customers, they account for only a quarter of the nearly 300 diverse institutions, including dozens of small colleges—many with fewer than 1,500 students—now enjoying access. Rath states that his own “publicly funded state research university in the middle of the Pacific Ocean” has “only two faculty members working on anything related to early America.” This surprisingly restrictive view, limiting the importance of 160 years of American printed works to faculty in one department, fails to recognize this collection’s multidisciplinary value not only to teachers, but also to students throughout his university’s departments of American Studies, Economics, Ethnic Studies, Geography, Journalism, Literature, Philosophy, Political Science, Religion, Women’s Studies, and others. Rath incorrectly states “the promised ... feature of making the underlying text available in ascii form is missing.” No commercial publisher of comparable digital collections, including Readex, has promised, let alone provided, the underlying OCR output. The limitations of OCR-generated text are well known. For this reason, the University of Michigan has undertaken a series of “Text Creation Partnerships” with other academic institutions to key the text of works from digital collections, including Early American Imprints. In concluding that “... it would be revolutionary if made more accessible ...,” Rath unfortunately fails to discover why hundreds of institutions—from high schools to large universities—have already brought this essential collection to their researchers.
A Division of NewsBank
To the Editor:
David Braden’s claim to be making the texts of the Digital Evans Series widely accessible is more limited than Readex would have us believe. When there are over 4,000 degree-granting institutions in the United States and more than 8,000 worldwide, 295 is a small and privileged elite, even without counting the high schools, libraries, government agencies, and historical societies that Readex also claims to be serving. Braden argues that “most scholars hail Readex for dramatically expanding access to these imprints.” In fact, only the privileged few with access are hailing it thus. He writes that the collection is affordable because dozens of small colleges have bought in, but small private colleges are often among the best endowed in proportion to the number of students. How can a university that can’t afford to hire new professors afford the digital Evans? There are ways, but the current pricing structure is geared toward profit, not access. American exceptionalism also needs to be addressed if the Digital Evans is to become accessible to institutions without an American focus. Braden wrongly assumes that I was referring only to my own department when I said there are only two people working on American history before 1800. He apparently has difficulty imagining a large university without American history as a top priority. We do exist. The indexing Braden says I neglected is a feature of Evans’s printed bibliography. Readex digitized the subject headings and other fields in the 1990s. The Digital Evans collates and then links them to page images, but someone else’s typology is just that. The old category-based Yahoo! is sometimes helpful, but Google and the other full-text search engines have changed how we relate to the world of words. Full-text searching makes it possible to create one’s own categories and connections. Searching by author, title, subject, and other pre-made categories is as revolutionary as a library catalog.
Braden contends that Readex never promised the underlying text to the page images. Yet as recently as July 22, 2003 (see
Richard Cullen Rath
University of Hawaii