Defining Humanities Computing methodology
by Manfred Thaller

During the development of computing, fads have come and gone; the way in which the Humanities reacted to "computing", "computer science", "information technology" or whatever else the most popular term was at a specific time has changed along with them. This dependency on trends, as reflected by popular media, has not always been healthy. The discussions about artificial intelligence in the later eighties are a good example. Then, for a short time, it was almost impossible not at least to speak about the necessity of including an expert system in a project. This was unhealthy not so much because very little ever came of that fad, but because it rather fundamentally discredited the notion of expert systems being relevant for the Humanities at all. Therefore, even considering the notion currently seems almost as politically unwise as it seemed unavoidable then.

The following section tries to clarify what we can actually say about the relationship between computer science and the Humanities that remains valid while fads change. We add one more restriction: traditionally, discussions of that type easily become unfocused, because there are three types of relationship between a Humanities scholar and computing technology, which, to the detriment of them all, are frequently intermingled. Computers can be used to gain scientific knowledge, to teach that knowledge and to disseminate it. These three sets of activities are of course related; but the challenges they pose and the problems they have to solve are quite fundamentally different.

In the following, we intentionally restrict ourselves to the first of the three: we are dealing with methods, that is, with the canon needed to increase the agreed-upon knowledge within an academic field. And we restrict ourselves to those methods which can profit from the use of computational equipment or concepts. As this invariably requires that a question can be posed in such a way that a formalism exists for it, we speak about "formal methods". This is an intentional restriction of the field of discussion. We do not, e.g., discuss how to use a computer to teach a traditional subject, nor how to produce books more cheaply. (Though at some stage the reader will find a discussion of how far the new media make information available on such a scale that the methods to cope with it have to change.)

A final restriction: Information Technology as such is changing the world in which we live in many ways. The arts on the one hand and the social sciences on the other are very much geared to the reflection and interpretation of that world. They therefore have to tackle IT as they tackle other changes in the society in which we live. The Humanities, in our understanding, are different, however, from the production of art and from the interpretation of societal changes. So neither the artistic nor the sociological implications of a new generation of media are our topic.

While we restrict our topic in this way, in another we would like to see it as broad as possible. The formalisms needed to apply some of the tools for handling language which have been developed in computational linguistics are part of our agenda; as are the prerequisites for applying the canon of quantitative methods; as are the considerations which have to go into the transfer of knowledge from a document into a database; as are the assumptions that go into the application of a GIS to a Humanities topic. From our point of view, however, these formalisms exist independently of the Humanities. GIS are not our topic, nor databases, nor quantitative methods, nor, indeed, computational linguistics. Our subject is their application to the knowledge domain of the Humanities, to improve the possibilities for research in the latter.


Talking to Ralph Griswold, the developer of SNOBOL, one of the early programming languages dedicated to the processing of strings and, therefore, textual objects, one of the authors of this chapter once listened to the following story: "You know, computer science is anything but a homogeneous field. A short time ago I had a European visitor. Talking about various matters he said at one stage: 'Being a professor of computer science, I sincerely hope that nobody will ask me to get close to a keyboard again.' Having done programming for most of my life," Ralph Griswold continued, "I did feel offended."

"Computer science" is a very wide-ranging field, going from one extreme, where it becomes almost indistinguishable from mathematical logic, to another, where it is equally hard to tell the difference between it and electrical engineering. This, of course, describes the genealogy of the field: in mathematics the Turing machine would be an interesting construct even if it had no relationship to anything ever built out of a material more solid than ideas. Similarly, the building blocks of computers arose independently of their construction: transistors changed many aspects of our daily life long before computers built from them started doing so in their own right.

Having widely different ancestors itself, computer science in turn became parent to a very mixed crowd of offspring: disciplines like medical computer science and juridical computer science have sprung up abundantly in recent years. Some of them, like the "forestry research computer science" (Forstliche Biometrie und Informatik) for which a German university recently accepted a Habilitation, will probably continue to raise eyebrows for some time to come. Others, notably computational linguistics, have established themselves as independent areas of research and self-contained academic disciplines quite beyond dispute.

The existence of this wide variety of disciplines, related to or spun off from computer science in general, implies two things. (a) In computer science itself, hybrid as it is, there must be a core of methods, which are independent from their origins (otherwise we would speak also of medical mathematics). (b) For the application of this methodological core a thorough understanding of the knowledge domain to which it is applied is necessary (otherwise the concept of a medical computer science would not make sense).

As in many other cases, what does not constitute this "self-contained, but application-related" core is more easily specified than what does. Pure and clean engineering topics are not part of it - though, of course, the construction of sensors in the bio-sciences may require knowledge which the construction of sensors in thermal physics does not. The logical hard core should also be independent of the disciplines to which it is applied - though, of course, there are fields where fuzzy systems and their backing theory are more central than in others.

Leaving aside these subtle shades, for the purpose of a short introduction, we define: The core of computer science, which is more than the sum of its intellectual ancestors, which still requires an intimate knowledge of the knowledge domain to which it is applied, however, is the following.

This may, at first look, seem to be a highly abstract definition, which has few practical consequences, particularly if compared to what is actually going on in Humanities Computing.

As to practical consequences: surprisingly, the preceding paragraphs lead to a few conclusions which may explain why a very large number of attempts at introducing university courses in some branch of Humanities Computing have failed over the years.

If we accept the assumption that the way in which the general core of computational methods, in the sense above, is used depends on the domain of knowledge to which it is applied, we also have to accept that applying computational methods without an understanding of the domain to which they are applied leads to disaster. In more practical terms: a German university in the early eighties introduced a study programme called Informatik für die Geisteswissenschaften, which required more course credits for numerical analysis than a computer science master's degree at many other universities. The same programme did not require the students to work on a single project which asked them to apply their knowledge to a topic of the Humanities. After spectacular student interest in the first year, the programme had to be stopped in the second, as no students were willing to take it anymore.

It is pointless to teach computer science to Humanities scholars or students, when it is not directly related to their domain of expertise.

On the other hand, time and again, skills in computing are mistaken by Humanities scholars for a qualification in computer science. A good case in point is the plethora of word processing courses which arose at American universities in the early days of the introduction of the PC, again in the eighties. Most of these collapsed within a few years, as the students discovered that it was ultimately more convenient to learn the content of such courses at their own pace, based on general manuals and introductions.

Humanities computing that is not based on an understanding of what computer science is all about is a transient phenomenon, fluctuating wildly with the fads of fashion.

If these conclusions, drawn from the initial statements, do not seem sufficiently practical to the reader, we ask her or him to remain patient for one more consideration before we turn the observations into recommendations. How do the definitions above relate to what is actually going on at European universities?

We propose to divide the teaching and research that can be observed at the various Humanities-related institutes and faculties into three groups.

  1. A very large number of courses at Europe's universities are dedicated to the provision of basic computational skills for Humanities students. These will usually be geared towards specific disciplinary needs: A student of Russian needs to know how to write, display and print Cyrillic. As long as they are related to skills only, they do not influence the way in which scientific results are gained. At this level we are simply talking about the application of tools.
  2. A much smaller number of courses - and a substantial number of research projects - use computationally based methods (like database technology) or computationally dependent ones (like statistics) to gain scientific results which could not be gained without the tools employed. At this level, therefore, we talk about the application of methods.
  3. A small number of courses and projects, finally, deal with the study of computational methods themselves, aiming at an improved understanding of them, without directly claiming to gain new insight into the discipline. They are involved with the development of methods.

For readability's sake, we will refer to these levels in the following paragraphs as the Humanities Computer Literacy, the Humanities Computing and the Humanities Computer Science levels respectively.

For all practical purposes, most public discussions have focused on the Humanities Computer Literacy level. This is most unfortunate, as it is exactly here that requirements change most frequently. And it is the low mean life expectancy of such courses which creates the feeling that no progress is being made. The decision of a German university in the eighties to accept a course "Computer Science for German Studies: WordStar 2000" did damage the credibility of the Humanities in the computer science department at that university. Worse: the short half-life of such application packages implies a very short usefulness of such courses. Unfortunately, the problem is not restricted to individual courses, which may be read as amusing examples of uninformed enthusiasm. It can have very serious consequences indeed: one department of Computing for the Humanities was founded in the eighties to provide computer literacy for every student of the arts faculty. Not too far into the nineties, at least one of the departments of that faculty threatened to train its students in independent courses unless the curriculum was revised to meet current needs. And recently this department has been closed down, as the arts faculty considered it without value for its students.

Taking an elitist position, one might wonder whether it is the task of a university to teach basic computer literacy at all. Students never got academic credit for typewriting skills before the invention of word processing; why should they get it for word processing skills now? Before being accused of being overly elitist, however, we would like to point to two important differences.

The more visible one: typewriting was a skill that remained stable between finishing secondary school and gaining a doctorate. The modern information technologies have a habit of changing so rapidly that what is almost arcane knowledge at the start of a student's first term can easily have turned into basic computer literacy by the time of her or his graduation as a master, let alone as a PhD. If we take the notion of lifelong learning seriously, we might therefore claim that computer literacy should indeed be something the arts faculties are concerned about: not for its own sake, but to train students in updating their own knowledge - and to impress the constant necessity of doing so upon them.

Less visible: while new techniques like the use of word processing, spreadsheets, simple databases and most recently web authoring have rapidly turned from advanced knowledge into survival skills, one can master them completely and thoroughly - and still be helpless when applying them to a Humanities discipline. Even today many people who use word processors routinely will find it challenging to include Cyrillic characters in their texts. A person can routinely submit tax returns with the help of a spreadsheet and still despair of doing meaningful computations with a medieval taxation list. A student can have a brilliant homepage but still be unable to encode a literary text in such a way that it remains useful beyond the lifetime of the full text retrieval package currently in use. Even computer literacy, therefore, has to be taught in the Humanities by concentrating on the specific problems posed by the disciplines. Word processing for literary disciplines has to concentrate on the peculiarities of specific languages or editorial styles; quantitative packages have to be taught to historians in a way that prepares them for a world of non-decimal numbers; markup for text-based disciplines has to look to general principles, not the peculiarities of a specific generation of browsers.
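To make the remark about non-decimal numbers concrete, the following is a minimal sketch in Python of how sums from a taxation list might be handled. It assumes, purely for illustration, a system of 12 pence to the shilling and 20 shillings to the pound; the text above names no particular currency, real sources often use other units, and the sample entries are invented.

```python
from typing import Iterable, Tuple

# Assumption for illustration: 12 pence = 1 shilling, 20 shillings = 1 pound.
# A real source may use entirely different units and ratios.
PENCE_PER_SHILLING = 12
SHILLINGS_PER_POUND = 20

Amount = Tuple[int, int, int]  # (pounds, shillings, pence)


def to_pence(amount: Amount) -> int:
    """Convert (pounds, shillings, pence) to a single count of pence."""
    pounds, shillings, pence = amount
    return (pounds * SHILLINGS_PER_POUND + shillings) * PENCE_PER_SHILLING + pence


def from_pence(total: int) -> Amount:
    """Convert a non-negative count of pence back to (pounds, shillings, pence)."""
    shillings, pence = divmod(total, PENCE_PER_SHILLING)
    pounds, shillings = divmod(shillings, SHILLINGS_PER_POUND)
    return (pounds, shillings, pence)


def sum_amounts(amounts: Iterable[Amount]) -> Amount:
    """Add amounts by converting to the smallest unit and back."""
    return from_pence(sum(to_pence(a) for a in amounts))


# Hypothetical entries, invented for this example only.
entries = [(1, 15, 8), (0, 7, 6), (2, 19, 11)]
total = sum_amounts(entries)
print(total)  # (5, 3, 1)
```

Converting everything to the smallest unit before computing is the design choice here: it avoids treating "1.15" as a decimal fraction, which is exactly the mistake a spreadsheet user trained only on decimal currency is likely to make.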

To fulfil both requirements, Humanities Computer Literacy should be taught to Humanities students only if two prerequisites can be taken for granted: (a) It is taught by teachers who themselves are fully trained in Humanities Computing. (b) There is no fixed canon of skills; instead it is understood that precisely the courses at the most introductory level have to be revised year by year, to keep them at the shifting edge between what students can be expected to learn by themselves and what they cannot.

In a nutshell: nobody without experience in computer-supported Humanities research should teach computing skills to Humanities students, and that experience should preferably be in a subject close to the one from which the students of the course are recruited. Exceptions always exist; but among the (many) conferences on one angle of Humanities computing or another taking place every year, there are few where the difficulties of communication between "pure" technicians and content-interested Humanities students are not described as a severe problem.

Humanities Computing, the second of our three levels, then constitutes the sum of all existing methods which can enhance the scientific validity of results in research or enable the pursuit of research strategies which otherwise would not be possible. It starts with methods adapted from other fields of study - for example the canon of analytical statistics, which has been developed for various fields. To apply this canon to authorship studies, the traditional sampling techniques have to be augmented in specific ways. It continues with methods which originated in other fields but have developed into completely independent approaches in specific Humanities disciplines. In art history, e.g., thesaurus-based systems were originally adapted from other disciplines, but have taken on a life of their own and started a discussion on the proper way to describe the content of images which has no clear equivalent in other fields. And finally, there are computational methods which developed more or less within a field of the Humanities, independently of other disciplines. An example is the long and rich tradition of methods and techniques for the identification of individuals in historical documents, even though their names may vary by orthography, variable subsetting of name sets, property-based name shifts and other causes.
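As a rough illustration of the last example, the sketch below compares two name spellings after reducing them to a crude comparison key. It addresses orthographic variation only, not subsetting of name sets or property-based name shifts. The spelling rules, the function names and the threshold are all hypothetical and chosen for this example; they are not taken from any of the established methods referred to above, which are considerably more elaborate.

```python
import unicodedata
from difflib import SequenceMatcher

# Hypothetical spelling equivalences, for illustration only.
SPELLING_RULES = (("y", "i"), ("ck", "k"), ("th", "t"), ("ph", "f"))


def normalise(name: str) -> str:
    """Reduce a name to a crude comparison key."""
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    key = "".join(c for c in decomposed if not unicodedata.combining(c)).lower()
    # Apply the illustrative spelling equivalences in order.
    for old, new in SPELLING_RULES:
        key = key.replace(old, new)
    return key


def similarity(a: str, b: str) -> float:
    """Similarity of two names in the range 0.0 to 1.0, after normalisation."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def may_be_same_name(a: str, b: str, threshold: float = 0.85) -> bool:
    """Flag two spellings as candidates for the same name (assumed threshold)."""
    return similarity(a, b) >= threshold


print(may_be_same_name("Meyer", "Meier"))    # True
print(may_be_same_name("Meyer", "Schmidt"))  # False
```

A match of this kind only nominates candidates; deciding whether two records actually refer to the same individual still requires the contextual evidence that the historical methods bring to bear.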

Humanities Computing is a field which is most clearly in need of being stabilised institutionally. The tradition of the field is remarkably long. Many of the questions discussed today about the best way of entering Humanities information into a computer in a form it can handle can already be found in the conference volume of the Wartenstein conference in 1962, which seems to have been the first attempt at surveying the field. One of Humanities Computing's major problems is that it has a tradition of which few of its followers are aware. It is highly significant in that context that today a fresh wave of discussions about whether such a field exists has been ignited by two widely popularised WWW papers by Willard McCarty, in which the author simply assumes that he can totally ignore a tradition of forty years and start from scratch.

This lack of perception is most unfortunate for the individual researcher, as it usually means that newcomers to the field have to rediscover many solutions which have long been well known. It is even more unfortunate for the Humanities as a whole, as it means that methodological advancement proceeds much more slowly than it could. In most European countries, Humanities Computing describes a specific stage in the life of a scholar. The vast majority of practitioners are at the stage of their PhD thesis or in the years immediately after it. And, in the current system, in most European countries they face a crucial decision after working actively in the field for about five years. Either they become computer specialists, which means that they leave academia for industry, or they fall back upon more traditional paths in their home disciplines, as permanent positions for Humanities Computing specialists rarely exist.

As long as we stay with our original definition, that Humanities Computing is the application of computational tools for the benefit of the various Humanities disciplines, there is nothing wrong with this situation. Still, it means that many researchers all over Europe are constantly rediscovering some of the basics of Humanities Computing, while few, if any, possibilities exist to hand their discoveries on. To solve that situation, we propose that, just as we asked for Humanities Computer Literacy to be taught by people with a Humanities Computing background, Humanities Computing should in turn be taught by Humanities Computer Science specialists: persons, that is, who make the study and development of the possibilities of computer applications in the Humanities their profession. With a solid background in one or more Humanities fields, they understand the problems of these disciplines; with a strong background in computer science in general, they are able to contribute to the development of data structures and algorithms as defined initially.

This field of Humanities Computer Science has to be European from the very start. The field itself profits from the strongest possible emphasis on internationalisation: like any other new discipline, it is otherwise in danger of being unduly influenced by the idiosyncrasies and preferences of a few individuals dominating a national academic system.

Creating a European framework of reference has, however, also an added European value. Very few institutions exist today which offer training at a level that could be clearly identified as Humanities Computer Science in the terms above. There are many attempts, however, to offer Humanities students introductions to computational skills and appropriate background knowledge, bundled in a confusing plethora of degrees, add-on diplomas and occupationally qualifying courses. This has two massive drawbacks: