Caught in a World Wide Web of Words:
Musings on the Future of Archival Description

Richard Pearce-Moses

Presented at the Society of American Archivists 59th Annual Meeting, Washington DC, September 1995.


Draft

I would appreciate any comments on this draft that readers may have. Please do not distribute or reproduce this paper without permission. Richard Pearce-Moses
To contact me, see http://home.comcast.net/~pearcemoses/.

Abstract

Finding aids are first and foremost an access tool. Good finding aids go beyond a recitation of folder headings; they facilitate access by bundling similar bits of information into manageable units, explain the context and significance of the materials, and embody the archivist's experiential knowledge of the collection. This synthetic and abstract information supplied by the archivist cannot be replaced by automated systems that query the records directly or an artificial intelligence system.

In addition to providing access, finding aids often have secondary functions: recording administrative information about collections and increasing security.

This paper proposes objectives and means for an effective finding aids system. It suggests a rationale for minimum, sufficient description and essential access points. The paper also suggests tools for coordinating finding aids into an integrated access system.


I. Introduction/Straw Dog

It is a common conceit of science fiction novels that all documents are online; some character seeking information taps a few words into a console, to have the whole of recorded human knowledge searched. Accepting for the sake of argument that all archival collections will be converted to a machine-readable format, it would seem that finding aids would be superfluous. They would be imperfect abstractions, of limited value when the whole of the originals is readily accessible. Where is there value in a mediating aid to locate information that can be searched directly?

Many researchers believe that if they could just get into the stacks, they would magically find the documents they need; some see automated access to digitized collections as a means to achieve that end, as patrons can search through the records at no risk to the originals.

Nevertheless, loading data into computers does not necessarily increase the accessibility of the proverbial needle. To the contrary, it merely digitizes the haystack. Merely throwing open a barn door to a room of haystacks will not help patrons find the needle they want, whether the collections are in an electronic or a paper format. Researchers need a map, not a pitchfork, if they're to locate relevant materials. Finding aids are the maps that archivists provide researchers to facilitate locating relevant materials.

Finding aids' value lies in the information they contain. Through scope and history notes, an archivist creates signposts to relevant collections. By describing a collection's organization, an archivist maps its contents. The archivist's synthesis and abstraction of a collection provides patrons a shortcut to rummaging through boxes. It is this human factor that will ensure that finding aids will continue to be useful.

To date, computers cannot think; they cannot synthesize data into knowledge, nor do they have the benefit of epiphany. Hence, computers are not going to be able to explain or interpret the data they hold in their memory. Although computers may be able to recognize patterns within documents (and that ability is valuable), their ability to understand the nuances of human language (much less visual documents) is nowhere near a level to be of serious aid to researchers.

In fact, computers offer us a powerful tool to improve the quality of access to archival materials. However, the only way a computer can help is if we tell it how to do something helpful. Regardless of whether we are designing a manual or automated system, we must have a good idea of what we're trying to accomplish. We need to understand the nature of both the data and the query to be able to design an effective system.


II. A look at guides past

If we want to know how to automate finding aids, we should first analyze the manual system and build on that model. Such an analysis consists of two parts: how finding aids are constructed (description) and how they are used (retrieval).

In the summer of 1995 I conducted an informal survey on the Archives and Archivists listserv to get a sense of descriptive practice in the profession. I received many thoughtful responses filled with extensive comments.

Most respondents reported they had learned to write guides by the seat of their pants. They used existing guides as models; those who had some training in school felt it was inadequate. Given that most respondents' practice was based on their own experience, it's no surprise there are few universal principles coming out of what is largely disparate, parochial knowledge. However, a few basics of archival description surfaced:

In addition to recording information about the context and contents of a collection, finding aids are sometimes used as a place to record administrative information. Such information may include notes regarding acquisition, copyright, form of citation, restrictions on access, and location of related materials.

There was considerably less consensus among respondents as to the value of this information in finding aids. Some felt that some administrative information could be useful to researchers. Some also felt the information was more useful to the reference archivist. A number of archivists felt that information was important, but maintained that information in other places. Information about the acquisition of a collection may be taken from the deed in the administrative office; changes to order, deaccessioning information, and preservation information may be kept in a collections file.

I had hoped this survey would reveal some basic principles of archival description, a model that could be automated. But the survey suggests that there is little consistency in how we describe collections in finding aids. While many respondents developed style sheets to ensure that their guides had a consistent structure, no one gave explicit details on the contents of the parts. What goes into a scope note? Into a history note? Into a folder list? The underlying principle seemed to be to give the patron enough information to decide if the originals should be consulted, but we don't seem to know what information the patron would use as the basis for such a decision.

These themes do not constitute a sufficient model for finding aids or access systems. As such, we are left to the vagaries of intuition as to what constitutes good description.


III. Retrieval: How Do Patrons Use These Guides

Although we may have some vague notions about archival description, we are virtually clueless as to whether what we do is efficacious.

We simply don't know enough about how researchers use guides or what access means to them. Barbara Orbach Natason,(4) Paul Conway,(5) and Mary Jo Pugh(6) have addressed those questions. Here are some key points:


IV. Towards a Standard for Finding Aids: Objectives for Access

What might guidelines for finding aids look like? I would like to propose some fundamental objectives and means for archival description. In particular, I want to think about those guidelines in terms of access for researchers; after all, that was without question the most important function of guides in the eyes of the respondents.

For archives, I believe access can be defined in terms of five broad areas:

By provenance:

Archives must be able to answer the question, "What records do you have by N.?" Providing access through an individual or agency responsible for a collection is the primary access method used by archives. Access by provenance is highly effective, especially for locating relevant records whose primary value is the subject of the search.

But provenance becomes more complex when considering the components of a collection. The Office of the Director may be responsible for the series "Correspondence," but it was Marty Sullivan who wrote the letter to Buster Smith on July 1. The Public Relations Department is responsible for their records, but those records may include important research reports created by an outside consulting firm.

Providing access to creators of components within personal papers and manuscripts is even more important, as the relationships between individuals and organizations are often serendipitous and unpredictable.

Archives must be able to provide access to provenance at all levels.

By characteristics of provenance:

Archives must be able to answer questions based on characteristics of the provenance; for example, "What records do you have by black poets, Catholic missionaries, anthropologists, or oil refiners?" At Arizona State, I was asked this type of question much more often than I was asked for the records of a specific individual or organization. Among many possible characteristics, archives should consider a corporation's function, place of operation, ownership, or balance sheet; or an individual's occupation or avocation, and such diverse possibilities as gender, race, religion, and other personal traits.

By title:

Titles are often considered rare or inconsequential in archival collections, and therefore of little value as an access point. However, I believe titles are not insignificant. While few collections may have a formal title, all collections can be cited formally in notes and bibliographies; to the extent that citation identifies the collection, it serves as a title and is useful for locating the material. The fact that such citation/titles are generally assigned by the archivist is irrelevant to their functional value.

Collections may contain published and quasi-published materials, photographs, and other items that have formal titles which are important keys for locating specific materials. Materials may also carry identifying codes that can be used to locate them quickly; negative numbers, technical report numbers, and standard recording numbers are examples.

Finally, some materials may actually have "nick-titles;" a word or phrase by which a work is commonly known--sometimes formed from a subtitle, but often completely independent of any title the creator has assigned. The best example I can think of is Paul Strand's "Mexican Portfolio;" I don't know any photohistorians offhand who know the formal title of this work, but everyone knows it by its ad hoc title.

By content:

Content analysis includes personal and corporate subjects, locales, topics, and chronological periods. However, the notion of archival content goes beyond the bibliographic notion of subjects, which is limited to what the work is "about."

The bibliographic sense of subject looks to the synthetic nature of a book and its major themes. Archival collections' primary value is roughly equivalent to the synthetic subject of a book; but archival collections are often more valuable for their secondary subjects which, because of their organic nature, are not as tightly constructed by their creator and must be teased out of the materials by the researcher.(9)

Records which grow out of an organic process may be about some topic other than that process, but because those records provide primary evidence for understanding the process, the process itself is considered a subject. A similar problem lies with the confusion as to whether a work's creator should also be considered a subject, even though the work is not self-referential.

A frontier diary can illustrate both problems. The author may not talk about himself explicitly, but recounts the events of each day. Such a diary truly details the individual's life, if not overtly; hence the individual is a subject as well as an author. Further, the author may never speak of those events in terms of life on the frontier, but the data such a diary contains about life on the frontier makes the frontier a legitimate subject.

As another example, the accounting, personnel, inventory, and administrative files of an advertising agency may not be explicitly "about" the advertising, but advertising certainly is a principal subject.

Archivists have long wanted to increase subject access to their holdings, but subject analysis is far from widely practiced. I suspect limited resources are a significant barrier to subject access. I also suspect that the lack of a standard approach to subject analysis is a barrier; each repository must reinvent this wheel.

By form/genre:

Finally, archives contain information in many formats and of many intellectual types. Many researchers' topics are based in specific forms or genres, and archives must be able to assist them. Some researchers search broad forms, such as photographs or poetry, whereas others seek highly specific forms, such as uranium-collodion prints or sonnets.


V. Towards a Standard: Context

In addition to providing these access points, archival description should aid in the selection of material by providing an abstract of the material. If we're going to aid patrons in selection of material along the lines of provenance and its characteristics, titles, content, and form/genre, we must include information about those elements in our description. Presto! Here's a list of what constitutes sufficient description.

The finding aid's biographical/administrative history note should clearly identify the provenance of the collection and characterize the provenance in terms that match the repository's mission and the value of the materials. The scope note should include individuals, corporations, topics, locales, and periods represented in the collection. Nothing surprising there, and APPM, RAD, ISAD(G), and other sources give guidelines for constructing such notes.

However, I believe we must go further by providing such information to the components within a collection. An enormous wealth of information is buried in the component levels of collections, but often there's no clue as to the nature of that information in the finding aids.

A patron searching for records of the Santa Fe Railway might think to check the Fred Harvey Indian Department records because the corporate relationship is well known; but a straight list of folder headings would not suggest this collection is more likely than others to contain photographs by Charlie Lummis and Karl Moon, who are both represented in that collection. The Byron Harvey collection relates primarily to Native American Fine Art, but one subseries on San Francisco contains significant information on the occupation of Alcatraz; what researcher could be expected to infer that from Harvey's name or the subject of Indian Art?

As a profession we have relied heavily on the technique of research through inference. In many--possibly most--instances, inferring one subject from another will locate relevant material. While researchers are expected to come prepared with a fundamental understanding of their topic so that they can make such inferences, it is unreasonable to expect a patron to know every possible related name or to imagine some tenuous relationship. Further, checking every possible relationship is apt to lead patrons down many blind alleys. To the extent we can we should give patrons positive evidence of information (and even note the absence of some subjects likely to be inferred but missing), rather than leave them clueless.

Potentially, every level of description should contain information on those five broad areas of access: provenance and its characteristics, titles, subjects, and form/genre. By way of explanation, I'd like to share my pet peeve, contents lists. I've seen good, and I've seen bad. As an example of bad, here's a list I downloaded from the Internet:(9)

SERIES: Research and Development Project Reports
NOTES: None.
Box 1 Folder 1 - 8
Box 2 Folder 9 - 19
Box 3 Folder 20 - 31
Box 4 Folder 32 - 42 . . . and so on, through . . .
Box 31 Folder 350 - 359.
How does this list help a researcher select materials? What's in the folders? If the provenance and subjects are adequately covered in a series level scope note, why waste time creating this list at all? In the absence of notes, why anything more than a series title?

In some instances folder headings may help select materials; if the scope note describes the nature of the correspondence and indicates the materials are arranged alphabetically, a list of headings

1/1 Abrams - Adams
1/2 Afton - Albert
1/3 Albertson - Anthony
tells the researcher where to dive in.

However, I don't think we should create folder lists unless they're going to provide useful information. One archivist I spoke with while researching this paper numbered her materials as series/folder rather than box/folder. Because series typically spanned boxes, her folder list did not indicate the artificial divisions imposed by physical housing, and she created a separate container list. When I asked her why or what information that container list provided, her only answer was that she thought you were supposed to.

In fact, I think we should completely revisit the notion of the folder list. Instead of writing a list of folder headings, I often create an "umbrella" description that summarizes the information. On average it saves me time, but more importantly it packages the information into more manageable units for patrons. I write these summaries for small clusters of materials that are not series or sub-series; a patron should be able to examine this material in a reasonable amount of time.

For example, the Fred Harvey photograph collection uses headings that are subdivided topics; rather than listing some fifty folder headings for the Santa Fe Railway, such as

ATSF -- Barber shops
ATSF -- Dining cars
ATSF -- Dinnerware
ATSF -- Kitchens
ATSF -- Newsstands . . . .
I wrote a single umbrella description for the Santa Fe Railway with a scope note that captured the same information in the headings, as well as the names of photographers, which were not in those headings. Patrons get a box instead of a specific folder, but it's still a reasonable amount of information for the researcher to look through; and there's a good chance the patron would have wanted to see all the material as it's related.

Ultimately, we must recognize that archival description differs fundamentally from bibliographic description.(10) Bibliographic description emphasizes transcription, largely to aid in distinguishing variant editions. Because archival material is largely unique, the need to distinguish editions is irrelevant. Moreover, archival materials seldom have anything equivalent to a title page to transcribe; the archivist must interpret this information from the materials themselves.

Interpretation is a loaded word because it suggests bias. But I use the term in the sense of translation and explication; archivists are forced to summarize and synthesize the many pieces of a collection into a concise statement that conceptualizes the materials. This process of interpretation is the first step in embodying an archivist's knowledge of a collection.

Folder headings are one of the few things in an archival collection which could be transcribed exactly; collection and series titles are usually supplied. But even if possible, is it desirable to transcribe folder headings? A survey on LCSH-AMC suggested archivists share no consensus on this practice.

Some archivists silently modify folder headings; in some instances, they are merely normalizing headings by spelling out abbreviations and standardizing spellings, though in other instances they supply additional information. Another approach transcribes the title exactly and supplies the additional information in a note.

It's not a question of which approach is right; different situations demand different approaches. Many respondents felt that folder headings in personal papers were of limited value, as the heading was often little more than a telegraphic, mnemonic phrase to the creator which would be meaningless to a researcher; those headings should not be transcribed, or they should be annotated. Those same respondents felt folder headings in corporate archives were more meaningful, and generally transcribed them. In some instances, the archivist may appraise the headings as sufficiently interesting or significant that they are carefully transcribed.

Personally, I believe that folder headings are often insufficient; what was clear to the creator is opaque to subsequent users. Ultimately, it is the archivist's judgment that guides the best approach on a case-by-case basis: the folder heading, the contents, or both. "We are archivists because of our training and experience in dealing with significant historical documentation. We tout ourselves as being able to evaluate evidence from the past and determine whether it will be useful, or even valuable for the present and the future."(11)


VI. Integrating Access

No matter how exhaustive the description, access is still sorely limited by the fact that the information is scattered among many discrete finding aids. Researchers must determine which finding aids to consult, then they often must read through a great deal of irrelevant information to locate the materials they desire. Archives must find some means to direct patrons to relevant collections (and portions of collections).

The traditional means to connect researchers with relevant collections has been through the reference interview. But woe to the researcher who arrives on the day the archivist who processed the relevant collections is sick or just retired and is confronted by a student or new hire who has limited knowledge of the collections! Archivists must find ways to embody their experiential knowledge of the materials into guides and access tools so that it isn't lost.

Repository guides and topical guides containing descriptions of each collection in the repository are other traditional tools, the former typically being organized by provenance (or title for artificial collections) and the latter grouping descriptions by subjects. Both types of guides have enormous value as they can be browsed easily and widely distributed.

Printed guides quickly fall out of date, and before the advent of word processors it was common for decades to pass between editions. The rise of desktop publishing and high quality photocopying overcomes the cost of updating guides, making it economical to produce tiny editions that are frequently updated. Unfortunately, few repositories are taking advantage of these tools.

The National Union Catalog of Manuscript Collections, never easy to use because of its size and many indices, is no longer in print, having been replaced by the national online union catalogs of RLIN and OCLC. Chadwyck-Healey's National Inventory of Documentary Sources is another automated union guide available on CD-ROM. NIDS differs significantly in that it also supplies microfiche copies of the entire finding aids for each collection.

As noted earlier, researchers do not seem to use these tools. In many instances, they are not aware they exist. In other instances, the tools are not necessary because the researchers already know where the relevant materials are located. Neither utility has published a study on the number of researchers who have successfully located materials using these databases. However, someone is searching those databases. It's entirely possible that researchers learn about relevant collections from archivists who have searched these databases. Ultimately, the limited success of collection-level databases calls into question the theory that such a record (a MARC "wrapper") will be useful for locating relevant finding aids, which are then used for more detailed access.

Within a repository, the most obvious approach to providing integrated access might look something like a library catalog. Descriptions are organized under relevant headings. Under a heading for Apache Indians, a researcher might find an abstract of the A. F. Randall photographs of Apache Indians and an abstract of the Apache Indian Photographs, a chunk in the Fred Harvey Company records.

Generally, the collection level abstract lacks the detail of the contents list, and the component level abstracts lack the context of the surrounding documents. However, regardless of level, the heading must be justified in the description to aid in selection of materials. This approach treats each level of description as a hierarchy of successive in-analytics.

[Incorporate examples of collection, series, folder, and item level descriptions.]

[Work in Terry Eastwood's observations about the shift in Canadian descriptive practice from a catalog to finding aids, with the concurrent disintegration of access.]

Such an index could even be produced on three-by-five cards, and a few repositories have cabinets filled with thousands of drawers of such cards (though often those cards are merely pointers lacking the description to aid in selection). However, the computer can improve searching and display of archival descriptions in the same manner that OPACs improve searching and display of bibliographic descriptions in catalogs.

[Description of archival OPAC.]

In fact, the computer can help us here. We do not need to change the fundamental structure of finding aids. But, I believe that structure needs to be atomized so that it can organize and present the information it contains in different ways.

[For instance, instead of presenting a patron with a set of guides, present them with summaries based on the contents of all guides in a repository. Answer the question, "What photographers are in your collection?" with a summary list linked to the relevant citations in the guides, rather than pointing them to a set of guides--which could be every guide in the repository.]

To do this, I think we need to quit thinking of the finding aid as a monolithic document fixed in its order and content. Instead, we need to see the finding aid as a collection of descriptive elements that can be sorted in many orders and can be filtered to produce subsets of the whole. The traditional finding aid is just one view of a collection, showing its original order and the organic relationships between the parts. But, if we cut the folder headings apart, we could select and sort them any number of ways.

At the Heard Museum, a collection and its components are described in a single database; the same five access points are provided for the collection, its series, and the folders. To produce a repository guide, I generate a list of collection-level descriptions organized by provenance. To produce a finding guide, I generate a report that includes descriptions of all components in the collection in the order in which they appear in the collection. To produce a topical catalog, I produce a report organizing the individual descriptions by subject headings.
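
The three reports described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the software used at the Heard Museum: the record layout, the field names, the "order" numbers, and the sample entries are all hypothetical, though the collection titles are borrowed from the examples in this paper.

```python
from dataclasses import dataclass, field


@dataclass
class Description:
    """One descriptive element at any level (hypothetical layout)."""
    collection: str   # short key for the collection the element belongs to
    level: str        # "collection", "series", or "folder"
    order: int        # position within the collection's original order
    provenance: str
    title: str
    traits: list = field(default_factory=list)    # characteristics of provenance
    subjects: list = field(default_factory=list)
    forms: list = field(default_factory=list)     # form/genre headings


def repository_guide(records):
    """Collection-level descriptions, organized by provenance."""
    top = [r for r in records if r.level == "collection"]
    return sorted(top, key=lambda r: r.provenance)


def finding_guide(records, collection):
    """Every description in one collection, in original order."""
    parts = [r for r in records if r.collection == collection]
    return sorted(parts, key=lambda r: r.order)


def topical_catalog(records):
    """Descriptions from every level, grouped under each subject heading."""
    catalog = {}
    for r in records:
        for heading in r.subjects:
            catalog.setdefault(heading, []).append(r)
    return dict(sorted(catalog.items()))


# Illustrative sample data only; not actual catalog entries.
RECORDS = [
    Description("Harvey", "collection", 0, "Fred Harvey Company",
                "Fred Harvey Company records",
                subjects=["Santa Fe Railway"], forms=["Photographs"]),
    Description("Harvey", "series", 2, "Fred Harvey Company",
                "Apache Indian Photographs",
                subjects=["Apache Indians"], forms=["Photographs"]),
    Description("Harvey", "series", 1, "Fred Harvey Company",
                "Santa Fe Railway",
                subjects=["Santa Fe Railway"], forms=["Photographs"]),
    Description("Randall", "collection", 0, "A. F. Randall",
                "A. F. Randall photographs of Apache Indians",
                subjects=["Apache Indians"], forms=["Photographs"]),
]

if __name__ == "__main__":
    for heading, items in topical_catalog(RECORDS).items():
        print(heading)
        for r in items:
            print("   ", r.title, f"({r.level})")
```

The point of the sketch is that the descriptions are entered once; each guide is simply a different selection and sort of the same records.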

Other software on the near horizon offers other solutions:

[Web/HTML]

At the Heard Museum, I've used Web software to provide this level of access while preserving the completeness and context of information in a finding aid. A patron can approach the collections through a traditional finding aid that documents the collection and its components in its original order. Or a patron can browse a list of topical headings that point to the full description exactly where it appears in the finding guide.

[SGML]


VII. Indexing Vocabularies

[LCSH, AAT, LCGTM, local headings]

[Effective use of expanded authority records for name and topical headings]

We recognize the value of providing context through an administrative history/biographical note, but do we index that note? At the Heard Museum, I am in the process of adding extensive notes to be coupled to headings in the catalog. Not only will these notes provide basic information for novice researchers and handy reminders for the forgetful experts, they provide justification for additional access points. Tribal affiliations, occupations, and other characteristics will be traced based on the information in those notes.

[Form of headings]

Collections should be indexed using headings that aid browsing. Because of the focused collecting policies of most repositories, most indexing vocabularies are too broad. An Arizona history collection has three basic divisions for history in LCSH; hence, local indexing practice must build better vocabularies. The notes in those vocabularies which describe the use of terms do more than bring consistency to their application; they help patrons understand the nuances of the collection. It may seem elementary to us, and it may seem silly to provide this information to expert researchers; but we shouldn't assume all researchers are experts. (Or that their memories are perfect.)


VIII. Strategies for Access

All this access may sound impossible, and I would be the first to agree that direct access to every creator, title, subject, and form/genre in a repository is impossible given the realities of resources. However, basic archival principles suggest an approach that makes this type of access reasonable:

First and foremost, allow the mission and appraisal standards to guide what access points will be provided, which collections to analyze in depth, and to what level those collections should be analyzed. Do not apply one standard to all materials, but emphasize access to your important collections. Do not assume that all materials in even your most important collections should be exhaustively indexed.

If you are a regional history collection, you may want to provide access by format in the broadest terms based on questions received at the reference desk. At the Arizona Collection, I often was asked for photographic materials, and I was occasionally asked specifically for stereographs. However, the Arizona Collection was not the place to study the history of photography (that's down the street). I provided limited access to stereographs, but no access by process; I couldn't help the one researcher in six years who wanted to study a specific process.

Frank Burke is concerned that a great deal of description of institutional records (as opposed to special collections) is focused on the odds-and-ends series that make up less than ten percent of the whole. If the beef is in one series, spend your time analyzing the beef and disregard the other series.

Second, don't discount the general archival principle of broader access before specific access. No doubt the richest veins are those at the collection level. And those broader headings provide some access to narrower topics through contextual clues. Although I'm calling for more component-level analysis, access is organic; it can grow over time.

Again, appraisal must guide descriptive practice. After a collection is first accessioned, it is given a single description as an organic whole. While the accession level record is refined during processing, some collections of lesser value may not be described in any greater detail; a researcher calling for the materials will be brought the collection exactly as it stands, boxes and folders. In other instances, collections of significant value (especially those associated with an exhibition we're working on) will receive collection, series, and folder or item level descriptions.

Ultimately, a finding aid's value is the information which an archivist has captured about a collection and how that information is organized. The archivist can help researchers by including comments that bundle many pieces into a more meaningful chunk and by assigning headings to help them locate these materials. I believe this synthesis of information--which the computer is incapable of--makes the guide useful to researchers. The archivist's description must transcend mere facts and embody knowledge.

Free-form or standard, guides must contain real meat if they are to be useful. How many hours do we spend producing folder lists that offer no real information? I believe our patrons will be better served if we provide them a beefy summary that brings out a few salient items rather than a list of slightly varying folder headings. Again, this summary records the archivist's knowledge in a means that can be accessed by the patron--long after the archivist is gone.

Most finding guides are top-heavy, with a great deal of description of the whole and minimal description of the components. But series, subseries, and items within a collection can be described in the same fashion as the whole.

My guide to the Fred Harvey collection doesn't look at folders or items; it looks at "chunks" within the collection. I took the time I would have spent transcribing fifty folder headings to write a beefy description of the contents. Asking patrons to look through roughly a box (and when they get the box, they'll be guided by the folder headings I didn't transcribe) is not onerous.

Providing this additional description gives keyword and WAIS search engines a lot more to sink their teeth into than folder headings.


IX. The archivist as access tool

I believe the real value of automation is its ability to provide different perspectives on our holdings. We have long stood by the traditions of provenance and original order, two principles I continue to believe in firmly. Yet, many of the researchers I have contact with are not very interested in the context of creation; they want direct access to components within the collection. If we can use the computer to sort the descriptions according to different access points, we can provide more ways to access the information in the materials.
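As a minimal sketch of the re-sorting described above, the following Python groups component descriptions under each access point, independent of their place in the original arrangement. The collection names, levels, titles, and headings are invented for illustration, and the dictionary layout is an assumption, not a description of any particular system.

```python
from collections import defaultdict

# Hypothetical component descriptions; every value here is invented.
descriptions = [
    {"collection": "Collection A", "level": "series",
     "title": "Correspondence", "access_points": ["Railroads", "Hotels"]},
    {"collection": "Collection A", "level": "folder",
     "title": "Menus", "access_points": ["Restaurants", "Hotels"]},
    {"collection": "Collection B", "level": "item",
     "title": "Depot photograph", "access_points": ["Railroads"]},
]


def index_by_access_point(records):
    """Group descriptions under each heading they carry."""
    index = defaultdict(list)
    for record in records:
        for heading in record["access_points"]:
            index[heading].append(record)
    # Headings in alphabetical order; entries by collection, then title.
    return {
        heading: sorted(entries, key=lambda r: (r["collection"], r["title"]))
        for heading, entries in sorted(index.items())
    }


subject_index = index_by_access_point(descriptions)

for heading, entries in subject_index.items():
    print(heading)
    for entry in entries:
        print(f"  {entry['collection']}: {entry['title']} ({entry['level']})")
```

Each entry keeps its collection name and level, so the provenance-based view is still recoverable from the same records; the subject view is an additional perspective rather than a replacement.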


X. How to automate

MARC: Have we given up too soon?

Just when I thought we'd agreed on a data standard, everyone's decided MARC's out of date. Well, MARC was developed in the sixties to produce card catalog cards, and most people seem to think that if we were going to do it all over again we'd do something different. But we've got it. And for all that I've heard about how MARC doesn't do everything, it does do 95% of what I need it to do.

When we think of MARC, we often think of collection or series level descriptions on the big utilities OCLC and RLIN. But, I use MARC locally to produce all my access tools, combining descriptions from every level into a variety of tools ranging from the finding guides to topical catalogs to web guides.

MARC also suggests enormously complex records with fixed fields and weird encoding. MARC need be only as complex as is necessary for the job at hand; MARC requires only a title and a statement of extent: two fields. If you don't need more, don't use more. But if you do need more, it's there.
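To illustrate the two-field point, here is a rough sketch of a minimal record held as plain Python data. Tags 245 (title statement) and 300 (physical description) are standard MARC tags, but the list-of-dictionaries layout, the helper functions, and the sample values are assumptions made for illustration; this is not an actual MARC encoding or any library's API.

```python
# A minimal record: a title (245) and a statement of extent (300).
# The values are invented for illustration.
minimal_record = [
    {"tag": "245", "subfields": [("a", "Example family papers")]},
    {"tag": "300", "subfields": [("a", "3 linear feet")]},
]


def add_field(record, tag, subfields):
    """Return a new record with one more field, kept in tag order."""
    new_field = {"tag": tag, "subfields": list(subfields)}
    return sorted(record + [new_field], key=lambda field: field["tag"])


def display(record):
    """Render each field as a tag followed by its subfields."""
    lines = []
    for field in record:
        body = " ".join(f"${code} {value}" for code, value in field["subfields"])
        lines.append(f"{field['tag']} {body}")
    return "\n".join(lines)


# If more is needed, it is there: 520 is the standard tag for a summary note.
fuller_record = add_field(
    minimal_record, "520", [("a", "Hypothetical summary of the contents.")]
)

print(display(minimal_record))
print(display(fuller_record))
```

The original two-field record is left unchanged when a field is added, which mirrors the idea that a fuller description grows out of the minimal one rather than requiring a different structure.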

Simultaneously, MARC is only half the picture. What standard shall we use for data contents? For example, I use APPM as rules to describe every level within the collection.

I think we're just getting to the point where we have the expertise and tools to explore the potential of MARC. I'd hate to see all the other neat things happening in automation sidetrack some of the creative genius out there in the fields from playing with this particular tool right now.

The Web: Have we jumped the gun?

Gopher, the World Wide Web, and other Internet tools are fairly primitive, but they have a number of distinct features. From the archivist's point of view, they can accommodate existing electronic guides in half a dozen formats, and the guides can be linked to one another and to digital surrogates of the materials they describe.

Researchers seem to like them. I suspect they like them more than the bibliographic utilities because they're free and because they contain more meat than a collection-level record. Another possible reason researchers like guides on the net is that they don't have to try to explain to an archivist what they're looking for; if this is the case, we need to look even more closely at the reference process to ensure that patrons are not intimidated or patronized.

But the Internet is chaotic and HTML is not a meaningful standard. If you load your guides, will people find them to download them? Different search engines retrieve the contents of those guides with varying degrees of success.

SGML: Have we bitten off more than we can chew?

SGML seems to be the next thing on the Internet horizon. It has all the benefits of HTML and promises much more. I want to tread very lightly here because I have not been able to keep up with everything coming out of Daniel Pitti's work with the Berkeley Finding Aids Project. I've heard nothing but superlatives and that SGML can do anything and everything I can imagine. Hence, the few concerns I raise are humbly offered with a big grain of salt.

BFAP has developed a document type definition (DTD), which serves as a standard data structure. But, they have not developed a data contents standard. In my opinion, this is something of the cart before the horse. However, they have defined the basic data elements of a finding aid, and that is certainly the first step in developing a contents standard.

What I've seen of SGML suggests it can do an enormous amount, but I haven't seen it do something I need to do but cannot using MARC and HTML. No doubt SGML can do any number of things more elegantly, but I'm not sure I need that elegance.

One of the reasons the Web and HTML are widely accepted is that they're free. Only one SGML browser is available as shareware, and it doesn't do much more than produce a document prettier than HTML. How many researchers are going to fork over $200 for search and reporting features without a free test drive?


XI. Predicting the future

Ultimately, I'm less concerned about data architecture and software considerations. I feel I can create finding aids and other access tools that accomplish my objectives through the means I've outlined using MARC and HTML. In addition to SGML, I suspect we'll see a variety of tools that can be used to increase access for a good while. While this may complicate information sharing and the development of multi-repository indices, I suspect a lot of that can be ironed out over time.

I'm more concerned about the contents of our guides than about how they're automated. First and foremost, we need quality information in our guides. Archivists' value is providing researchers access to our materials. As long as we care for collections, there will be a need for access. If the profession is to truly flourish, we must ensure that people can find the information in our collections. To do that, we must always look for ways to embody our knowledge of the collections in our finding aids.


Notes:

(1)  Anonymous correspondent, private e-mail to the author, 21 July 1995.

(2)  On the LCSH-AMC listserv. Discussion indicated two distinct schools: those who transcribe and those who supply. Those who transcribe seem to do so exactly, even when the heading offers little sense of the contents. "Hopi Elevator" may sound like something interesting to someone working on mechanical, intra-building transportation, only to discover it's a genre piece of a Hopi man climbing an elevator; similarly, "Hopi Express" is not about a train, but shows Hopi children on a burro.

(3)  Private correspondence, April 3, 1996.

(4)  "The View from the Researcher's Desk: Historians' Perceptions of Research and Repositories," American Archivist 54:1.

(5)  "Understanding Users and Use at the National Archives," presented at the Society of American Archivists annual meeting, 1991.

(6)  "The Illusion of Omniscience: Subject Access and the Reference Archivist," American Archivist 45:1.

(7)  Helen R. Tibbo, "The Epic Struggle: Subject Retrieval from Large Bibliographic Databases," American Archivist 57:2 (Spring 1994).

(8)  One should note that this sense of archival subject may be applied to books as well. Looking at a collection of books' secondary value, one might discover any number of additional subjects. A collection of German literature on genetics from World War II might likely have subjects relating to propaganda, the war, and Jews.

(9)  I've picked this list because I think it illustrates my point well and because it would be unlikely someone might identify the repository from which it came. (Of course, the fact that its subject matter has no clues as to the collection, much less a repository likely to hold it, is evidence of my argument.) On the chance someone might recognize the list, I would like to note that I have no desire to demean or ridicule its creator. Not knowing the circumstances, there might be a very good reason for writing it this way.

(10)  Marion Matters, in private conversation, has questioned if archival and bibliographic description are truly so different. She observes that most descriptions for a linear inch of material that sufficiently abstract author, title, principal subjects, and form/genre easily fit on a three-by-five card, regardless of whether that material is a book, a manuscript, or a photograph. The difference lies in the amount of information necessary to describe sufficiently material measured in linear feet.

Nevertheless, few bibliographic catalogers supply much information beyond a transcription of the title page, while archivists generally must abstract the type of information typically found on a title page from the materials themselves.

(11)  Frank Burke, "Real Archivists Don't Use MARC," Archives & Museum Informatics 3:1 (Spring 1989), p. 7.