Old Hay in New Stacks:
Researching Photo Archives in the 21st Century

Richard Pearce-Moses

Presented at The American Association of Museums, April 1994
Copyright 1994. Brief excerpts from this paper may be quoted if fully cited. Further use is prohibited without the advance written permission of the author.
* * * * * * *
I would very much appreciate feedback from readers on this piece. To contact me, see http://home.comcast.net/~pearcemoses/.
Jacques Barzun and Henry Graff observe in The Modern Researcher:
What is tantalizing about the proverbial task of finding the needle in the haystack is that you are assured the needle is there. The space is restricted, the object is unique, it must be possible to find it. And no doubt, with enough care and patience it could be found with just a pair of eyes and a pair of hands.(1)
Archival research closely parallels that proverbial haystack: the image suggests the large quantity of unorganized materials one must mine to find the useful nugget of information.

In the age of Infoglut, the haystack is not static; the amount of information to search through grows faster than one can sort it. The last time I checked, more than 11 billion photographs were made in the United States each year, and that was in 1981. Mass media is relying more and more on visuals for news and entertainment. Publishers that were once satisfied with a few photographs for a book are now asking for thousands for multimedia/CD-ROM products. The enormous number of images, coupled with the high demand for visual information, makes Infoglut seem imminent for graphic archives.

I'd like to offer a somewhat contrary opinion: Infoglut is merely a hip term for an old problem. The proliferation of computers and the growth of electronic information on the Internet have not increased the amount of information so much as they have increased the availability of information that already exists in another form.(2)

Infoglut is just old hay in new stacks. The fundamental problem of managing large quantities of information remains the same. Archivists have been dealing with Infoglut for years. It wasn't an electronic glut, but it was just as overwhelming. Acquiring a few hundred items may represent a very good year for an art museum, but it's peanuts compared to archives, which may acquire a single collection containing hundreds of thousands of items. Arizona State University has been designated the repository for the records of American Continental Corporation: more than 7,000 linear feet of records. Paper records. If more than a mile of records isn't glut, I don't know what is.

Although many people fear being overwhelmed by electronic information in the future, others already find themselves overwhelmed by hardcopy records. In particular, many repositories find their photographic collections in a quagmire. It is hardly unusual for repositories to have ignored the photographs that came into their collections over the years; with the increased use of visuals in the mass media and the rise of CD-ROM "edutainment," these repositories find they need to get control of their graphic collections quickly.

Many archives have looked at computers as a magical solution to their problems of collections management; they dumped information about their collections into a computer, only to discover that they had far from solved their problems--they had automated them, so that they could make more powerful mistakes even faster!

In spite of some notable failures, automation has had a positive effect on archival practice. Archivists now have a much better understanding of the underlying principles of descriptive practice and the needs of their patrons. The two fundamental principles of archives have been reaffirmed: provenance and original order preserve important contextual information and offer basic access points. We've also learned some limits to those principles: we need to provide access to the secondary value of collections and beef up finding guides to make them sufficient to aid patrons in selecting materials.

Traditional archival practices have benefited from these insights to produce even more effective strategies for gaining control over Infoglut so that our collections are accessible to researchers. When we have mastered those strategies, we can automate them to take advantage of the computer.


I thought I would give an overview of basic archival principles and techniques with some nuts-and-bolts tips on how they can help you get control of your collections.


Maintaining collections by provenance is the principal means of archival control; records created by different sources are kept as separate collections rather than mingled into a single, large collection. The key to this strategy is to manage collections of information rather than attempting to manage the items. Consider the number of photo collections you have, as opposed to the number of photos you have; at ASU it's a ratio of around 300 collections to 450,000 photographs (or 1:1,500). By working with collections you effectively reduce the size of the haystack, thinking in terms of "chunks" rather than in terms of the individual bits of hay.

Museums and libraries are accustomed to working with item-level control, and overcoming that bias is an enormous challenge. If you have a small enough collection or a large enough staff, item-level control is practical. Otherwise, if you're suffering from Infoglut, forget item-level control except as a long-term goal that may never be attained.

To gain control of your holdings, make sure you have basic information about your collections. One effective method to capture this information is a retrospective accessioning project. Make a list of every collection in the repository, capturing the storage location, provenance, and a very brief description.

Over a period of time, add information about each collection. Work in "layers," capturing one or two pieces of information during successive surveys. Many people try to capture complete information for each collection before moving on to the next; however, this means many collections have no control while others have exhaustive control. The layered approach spreads access and control evenly through the repository. It also has a built-in editing process, providing a chance to check previously recorded information on successive sweeps. Over time, a complete accession record for each collection is created in pieces.

Don't research during the survey; instead, say "probably," "possibly," or "may be" and give it your best guess. The information will be revised later.
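A modern reader might sketch this layered survey in software: each sweep adds one or two fields to every collection's record before any single collection is described exhaustively. All field names and sample values below are illustrative assumptions, not drawn from the paper.

```python
# Sketch of a "layered" retrospective accessioning survey.
# The first pass captures only location, provenance, and a terse
# description for every collection; later sweeps each add one more
# piece of information across the whole repository.

def first_pass(shelf_list):
    """Capture minimal information for every collection on the shelves."""
    return [
        {"location": loc, "provenance": source, "description": desc}
        for loc, source, desc in shelf_list
    ]

def add_layer(records, field, values):
    """A later sweep: add one field to every record, which also gives a
    chance to check information recorded in earlier passes."""
    for record, value in zip(records, values):
        record[field] = value
    return records

records = first_pass([
    ("Range 1, Shelf 3", "County Recorder", "Deed books, probably 1880s"),
    ("Range 2, Shelf 1", "Fire Department", "Incident photos, may be 1950s"),
])
records = add_layer(records, "extent", ["12 volumes", "3 boxes"])
```

The point of the sketch is that control spreads evenly: after any pass, every collection has at least a minimal record.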

Next, search administrative files, finding guides, and other documents, comparing your inventory against the records. The files may reference collections you can't locate; these often turn out to be collections you have misidentified or that remain unidentified. Some collections may have no documentation; take this opportunity to begin a collections file.

Going through the administrative files corrects and supplements the shelf inventory. Collections unidentified on the shelves may be identified in the files. The files often contain accession information not found with the materials, including the immediate source of acquisition and restrictions on the use of the materials.

Once you have basic control of your collections, then you can begin to refine your control at the series level. After you have control at the series level, you can continue a process of stepwise refinement to the folder level and finally (if you ever get there) to the item level. As you go through each layer, you can revise and correct previous descriptions based on your more complete knowledge of the collection. Realistically, you may decide to survey some important collections in depth before you describe less valuable collections; but I encourage you to have at least a minimum collection-level record for all your holdings.

This layered approach rapidly provides a broad level of access to all your collections. First, you avoid leaving a large number of items without any access because you spent so much time providing detailed access to a few. Second, a basic list of collections by provenance with a terse description typically indicates the records' primary value--the reason the records were made in the first place. If you're searching for land ownership, you know to check the records of the County Recorder rather than the Fire Department. Secondary values can often be inferred.


The second principle of archives--original order--is a key to expediting access to records. Many collections have an internal order which reflects the process of their creation. Original order has a practical value; it may not match researchers' queries perfectly and it probably won't be consistent with other collections' order, but it's immediately available. Reorganizing the materials will take time; exploiting the existing order and any accompanying indices means you can spend time doing other things.

Original order is particularly important in organic collections because the contextual relationship of the records may contain as much information as the records themselves. Many people try to save work by reorganizing the materials by subject headings, but that destroys important information. I can't emphasize enough: if you aren't trained in appraising records for contextual order, don't change their order; you can organize descriptions on paper to your heart's content, but leave the originals alone unless you know what you're doing. (Now, I'll step off my soap box.)

In your finding guides, describe the order of the materials with instructions to patrons as to how to request materials. For example, a collection of portraits might include the note "Arranged by name of sitter; include the name of the sitter on your call slip and you will be brought the appropriate box."

If there is an existing finding guide, direct the patron to request it. Ideally, the finding guide will index entries in a meaningful fashion, but many registers are in chronological order. Even though the relevant entries may be scattered by date, scanning the entries for a relevant name is likely to be quicker than trying to scan the records themselves.


Archives have long relied on guides to provide access to their collections. The guide limits the size of the haystack by describing the components of a collection, and by describing the order of the collection it helps researchers know which components to select. Brief descriptions of the components aid in the selection process.

I believe that finding guides are very important documents. They provide a single place to record the archivist's intimate knowledge of the materials gained during processing. I also use the finding guide as the place to record information about the collection that researchers need to know in the reading room; the reference archivist usually knows to direct the patron there, and information on restrictions, credit lines, and similar administrivia is then in an easily accessible place rather than buried in a file somewhere else.

The collection is typically described hierarchically, beginning with an administrative history or biographical note that places the collection in the context of its creation, followed by an overview of the collection in a scope and contents note. The next level of the hierarchy is the series, and within each series is a list of folder headings.
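The hierarchy just described can be sketched as a nested data structure; the collection, series, and folder names below are illustrative, not taken from any actual guide.

```python
# Sketch of a hierarchical finding guide: collection-level notes at the
# top, then series, then folder headings within each series.
finding_guide = {
    "biographical_note": "Places the collection in the context of its creation.",
    "scope_and_contents": "An overview of the whole collection.",
    "series": [
        {
            "title": "Correspondence",
            "folders": ["Letters received, 1901-1910", "Letters sent, 1901-1910"],
        },
        {
            "title": "Photographs",
            "folders": ["Portraits", "Landscapes"],
        },
    ],
}

# A researcher drills down the hierarchy instead of scanning every item:
for series in finding_guide["series"]:
    print(series["title"], "-", len(series["folders"]), "folders")
```

Each level of the hierarchy narrows the haystack before the researcher ever requests a box.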

The strategy we need to learn is one of effective description. What do patrons need to know about the materials in order to make an intelligent selection? More and more, I believe that folder headings are not sufficient; the record creators were so familiar with the materials that they only needed a mnemonic. Researchers unfamiliar with the materials need more information.

Rob Spindler, a colleague on the reference desk, observed that patrons tend to ask questions in terms of a topic, a format, and a date. Minimally, we should provide that information with the folder heading. Depending on the nature of our mission, we may want to describe other aspects. The Center for Creative Photography is interested in 20th century photographers; they might not choose to note every individual referenced in the records, but they might always note a photographer.

In studying patron understanding of descriptions of archival collections in online library catalogs, Rob Spindler and I discovered that people frequently misinterpreted descriptions written according to Archives, Personal Papers, and Manuscripts(3) (APPM)--the standard rules for describing archival materials.

At the Heard Museum I've taken the liberty to deviate from the rules a bit to describe records in a manner that I hope is easier for patrons to understand. I use APPM to construct a basic description that includes the title or folder heading, expanded to include a topic if necessary; the dates; and an indication of provenance or responsibility. I don't worry about transcribing "exactly as to wording, order, and spelling . . . ;" I supply useful information as necessary. I always include a statement of extent, to indicate the quantity and form of the materials; I seldom give specific materials information or dimensions. If necessary and appropriate, I include a note that gives additional detail. I may detail the components within a contents note, rather than adding another layer to the hierarchy; or I may note the subseries in a series note to provide a summary overview. For example:
Starlie Lomayaktewa, Mishongnovi town crier, ca. 1982 / by Suzanne and Jake Page. 1 photographic print.

Geronimo--Apache, before 1907 / by Edward S. Curtis. 1 photogravure print.

Indian trip journals, 1959-1989 / by R. Brownell McGrew. 32 spiral notebooks of mss.

Karl Moon portraits of Native Americans, ca. 1910. 5 photographic prints. 487. Desert dawn, c1907 -- 488. On the way to the trading post -- 489. A pool by the trail -- 490. In the Indian country = Awaiting the signal : To illustrate The End of the Trail -- 491. The meeting place : To illustrate The End of the Trail.

Archival description has traditionally been a list of folder headings. However, those headings often fail to capture much of the information researchers need to make a meaningful selection of materials. We must go beyond transcription to analysis; we must provide a sense of the contents of the folders.

The key is that I am interpreting the materials. Minimally, I am synthesizing information in the records that researchers need in order to decide if the material will be relevant. Traditional bibliographic description relies on transcription of existing information, but archivists often do not have the luxury of having something to transcribe. At this level, interpretation is no different from an annotation in a bibliographic record.

Any interpretation beyond synthesis verges on anathema to bibliographic catalogers; they believe strongly that the cataloger must leave judgment of a work to the patron. Ultimately, I agree; it is the patron who must interpret the materials. My intent is to provide information that will aid in the selection of materials. When possible I rely on scholars' interpretations and annotations, which are more authoritative than my own. Also, I put a rhetorical flag on anything that might be considered a value judgment, such as "May be interpreted as . . . . "

Archives differ from general collections, and I believe this difference justifies interpretation. For example, a cataloger would never label a book as propaganda. But in an archive that specializes in World War II, what good is it to collect examples of propaganda if the archive does not provide access to them under that heading?

Photographic collections place particular demands on archival description for several reasons. Photographs belong to the realm of vision, textual records to that of hearing. Photographic records' particular value is their ability to communicate information words cannot; we use photographs to "point" to things we cannot describe.

One of the most important requirements for a cataloger in a photographic archive is the ability to translate visual information into words. To write effective descriptions of visual materials, we must give the information relevant to patron queries: Who are the photographers?, Where and when were the photographs made?, and What is depicted in the images? But effective descriptions must go beyond the simple naming (transcription) of the who, what, where, and when. Those descriptions should suggest the how of the photographs; photographs' style or genre significantly colors the meaning of the pictorial elements. As important, researchers need to know the why behind the images; what is their significance, the reason they were made.

Effective description is not necessarily measured by the amount of detail. To the contrary, if we bury researchers under too much description, we might as well bury them under the records themselves. Our description should support the repository's mission; don't note every datum, but do note those that directly relate to the mission.


Although provenance and original order are valuable strategies, archivists have long recognized their limitations. When working with the archives of an organization, the provenance provides the essential access point: one locates relevant records by identifying the agency that would have created them. Access by provenance was entirely adequate within the institutional context, where people needed records for their primary value. However, provenance provided no meaningful access to the secondary value of the records. Access by provenance works poorly for personal papers, where the subject matter is often not apparent from the name of the individual.

In the worst case scenario the repository created a finding guide for each collection and the researcher was forced to look through each guide. That was not so onerous when there were only a handful of guides; but searching a hundred guides was impractical.

One technique to supplement access is a topical bibliography or essay describing relevant collections. This technique is a particularly handy reference tool to help direct patrons to collections related to common queries.

Another common technique to provide access is to create a single index containing entries for all collections. The index may be nothing more than a heading followed by a pointer to a collection guide or to the materials (in terms of collection, box, and folder numbers). Or the index may be more like a bibliographic catalog, containing both the callmark and a description of the materials.
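Such a repository-wide index is, at heart, a simple mapping from headings to pointers. The sketch below illustrates the idea; the headings, callmarks, and box/folder numbers are invented for illustration.

```python
# Sketch of a single repository-wide index: each heading points to the
# materials by (collection callmark, box number, folder number).
index = {
    "Propaganda--World War II": [("MSS-12", 3, 7), ("MSS-40", 1, 2)],
    "Land ownership--Maricopa County": [("County Recorder records", 5, 1)],
}

def lookup(heading):
    """Return all (collection, box, folder) pointers for a heading,
    or an empty list if the heading is not indexed."""
    return index.get(heading, [])

hits = lookup("Propaganda--World War II")
```

A richer version would attach a brief description to each pointer, turning the index into something closer to a bibliographic catalog, at the cost of the descriptive work discussed next.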

Providing description with the index headings can aid the researcher in the selection of materials. Great in theory; but in practice, producing an archival catalog containing adequate description of materials under the many different headings is an enormous amount of work.

Computers promised to reduce the amount of work, and the Society of American Archivists adopted APPM and the USMARC-AMC format as a means to automate the cataloging process and improve access. With mixed results.

The use of APPM and AMC has changed significantly in the ten years since they were introduced. Originally, archivists tended to lump the entire finding guide into one enormous record that took up a dozen screens for description and a dozen dozen screens for index headings. Search and retrieval engines have generally been developed for bibliographic databases that organize hits according to a bibliographic notion of main entry rather than the archival principles of provenance and original order. Creating an APPM/MARC record was generally done after a traditional finding guide had been written. Instead of saving work, automation became another layer in the descriptive process and took--rather than saved--time.

At the same time archivists were learning the basics of AACR/MARC-based online description, the Internet was born. Instead of adding another step to create a MARC record, many archives are loading their word-processed finding guides onto Gophers. Instead of indexing the materials, they're using Veronica and WAIS to provide access through full-text searches of the guides.

Gopher descriptions are scattered throughout the Internet and lack consistency; ironically, we've dumped the haystack on the researcher. But, the researchers seem to love it. The positive reception of finding guides on Gophers may give us some clues to how we can facilitate access.

First, Gophers are often organized regionally, and many topics lend themselves to regional access. Also, many patrons know which repositories hold relevant materials; check Yale for Texas and the Southwest, check the Bancroft for California and the Southwest.

Wired researchers haven't jumped into an electronic haystack; the haystack is organized in ways that will lead them to many useful collections. They may miss some important materials in out-of-the-way places, but the poly-hierarchical organization of Gophers seems to be an effective way to winnow down the size of the haystack. And while Veronica and WAIS searches are still primitive, they often help researchers find those out-of-the-way collections.

Second, Gopher guides are much more detailed than collection-level AMC records. My sense is that the AMC records are so broad that they do little to aid researchers in the selection of materials.

Gophers are great browsing tools. I think they will show some limitations as their novelty wears off and the size of the Internet grows.

Researchers on the leading edge of the Internet are among the most intelligent and diligent around, but Gophers may not be as effective for those who want more direct access. Many of those on the Internet pay no access charges, so their only expense in searching is their time; if people start being charged for time or file transfers, they will want more direct access to relevant materials to save money.

In the middle is virtue. I happen to believe that APPM/MARC descriptions can work effectively in archival description. We need to play more with their implementation to find out what really works and what doesn't work, but I think they can work.

Why use a thirty-year-old data architecture? My first response is simple: because it's there, so why reinvent the wheel? Second, the format and standards are not really thirty years old; they've evolved and will continue to change, and if something needs to be tweaked to make it work better, we'll tweak it.

I believe the significant problem that APPM/MARC has made apparent existed before we tried automation.

Many archivists have seen computers as a magical solution to their problems. But the computer cannot think; it can only process information according to a set of procedures. The process of automating archives has demonstrated many of the problems of past procedures used to provide physical control and intellectual access to collections. As a result, archival practice is improved for both archivists and researchers: archivists will have better control, and researchers will have better access to materials.


  1. Jacques Barzun and Henry Graff, The Modern Researcher, 4th ed. (1985).

  2. The slang neologism "Shovelware" reinforces my belief that Infoglut is more a reformatting of existing information than an explosion of new information. The term refers to products that try to capitalize on the glitz of the new CD-ROM medium merely by publishing megabytes of existing data or programs.

  3. Steven Hensen, Archives, Personal Papers, and Manuscripts, 2nd ed. (Chicago: Society of American Archivists, 1988).
