1. Information on Pynchon's Works: a Short Overview
The first bibliography of secondary items on Thomas Pynchon was published in 1972, not yet a decade after the publication of V., when Joseph Weixlmann listed 10 pages of items in Critique 14.2. This indicated even then, prior to the publication of Gravity's Rainbow, 'The Importance of Thomas Pynchon' (1). More were to follow (Herzberg, 1976; Scotto, 1977; Nakatani, 1987 and 1989), culminating in an extraordinary publication by Clifford Mead in 1989: apart from 36 pages of primary items, some illustrations, pictures and an overview of Pynchon's juvenilia, he identified about 90 pages of secondary items, ranging from complete monographs on Pynchon to valuable articles in the popular press, omitting only what he calls in the Preface some trifles of negligible value (2). He also lists 150 Ph.D. dissertations of Canadian and American origin. Mead's bibliography as a whole has not been updated since, though he published two more bibliographies: one on Vineland (1994) and another on Mason & Dixon (2000), each about 10 pages. Three others were published in Pynchon Notes: two of these (Sato, 1981; Osterhaus, 1999) on Pynchon studies in Japan, the third being a cumulative bibliography of Pynchon Notes issues 1 to 35 (Krafft, 1995) (3). Reception studies of Vineland (Keesey, 1990; McHoul, 1990) and Mason & Dixon (Clerc, 1997; Keesey, 1995 [sic]), though not strictly speaking bibliographies, contain information on more than 120 new items in the popular press.
The Modern Language Association organised special sessions devoted to Pynchon's works in 1975, 1976 and 1977, but chose to discontinue these in the two following years. Out of a certain discontent with this situation, two young scholars, John M. Krafft and Khachig Tölölyan, founded Pynchon Notes in 1979, a newsletter published in order to keep track of publications by and about Thomas Pynchon, and as a forum to renew or establish contacts between readers and academics interested in Pynchon's work. Pynchon Notes grew rapidly into a well-respected journal with the rather atypical quality that it is also read outside Academe. Every issue contains a bibliographical section of primary and secondary items. Contributors, first from the US and Canada and later from around the world, reflecting the internationalization of Pynchon studies, keep sending in bibliographical information, ranging from what the editors called in Pynchon Notes 1 (October 1979) the minutest fragments on Pynchon in the popular press, to bibliographical information about scholarly reviews, articles, lectures, monographs, artistic pieces or even multimedia, works in progress and the like, sometimes accompanied by a short quote or description. This section is quite substantial: Pynchon Notes 54-55 (Spring-Fall 2008) listed exactly 300 new items. John Krafft stores all the published or to-be-published bibliographical information in one document, which in the summer of 2006 counted an impressive 266 pages. Excluding the information on nearly 400 dissertations and the primary bibliographical information (212 items, including 39 support notices by Pynchon), it lists over 3,000 secondary items in 27 languages. A rough estimate is that the number of items on Pynchon and his works has more than tripled since 1989.
Pynchon's readers and scholars were early internet adopters: the pynchon-l discussion list archives date back to 1991, and in 1995 the Pomona Pages were published, the first web site on Pynchon. Numerous Pynchon pages and sites, like Alan B. Ruch's and Larry Daw's The Modern Word or Tim Ware's ThomasPynchon.com, often contain suggestions for further reading; sites in languages other than English were to follow ("Die Saubere Schweine", Sell, s.d.). Throughout the '90s electronic (versions of printed) journals appeared, Postmodern Culture being the first peer-reviewed one that was exclusively available online. Many institutions made the data in their archives, or rather the information on their data, more or less public. Furthermore, there are the online wikis on the novels (Ware et alii, 2006 onwards), the blogs and the internet group readings. The quantity of this information and the speed with which it is published seem to be on the increase: two months after the publication of Against the Day on November 21, 2006, one could easily identify 75 articles in print and online, and the first conference on Pynchon's biggest novel was organised 7 months after its publication, in Tours, France. A week after Against the Grain, the 9th International Pynchon Week at the Ludwig-Maximilian University in Munich (June 2008), organised by Sascha Poehlmann, most lectures were available online as mp3.
A massive amount of information: how to cope with it? In other words, what kind of information are you looking for? There are many angles from which to approach information on Pynchon and his works. A curious reader just wants more general information on Pynchon's works. A student may want to know more about narratology. A surfer might be interested in online reviews of The Crying of Lot 49. Someone wants to see a list of critical articles on a Pynchon monograph like Berressem's Pynchon's Poetics. A Japanese scholar looks for a list of available articles on Gravity's Rainbow in his or her language. Most people these days use search engines; while online catalogues and search engines have many advantages (easy to use, quick, and returning a lot of information), they are not detailed enough. Imagine you want to know more about Mélanie l'Heuremaudit, a character in V.'s Chapter 3. Your trusted catalogue would never hold information so detailed that it refers to a particular chapter in a certain monograph, in this particular case Berressem's Pynchon's Poetics. A catalogue should record that this study contains chapters about Pynchon's first four novels, and it will probably also mention that the theoretical frame uses Lacan, Derrida and Baudrillard. It will never list that 30 pages are devoted to V.'s Chapter 3. Another example is the strange search engine used by Project Muse. This search engine is apparently based on two criteria: (1) the more recently an article was written, the more relevant it is, and (2) the number of times a particular search string is used in an article. This is inefficient; irrelevant items might show up and in fact they do. A third issue we are faced with is that articles 'talk' to each other. Duyfhuizen's 'From Potsdam to Putzi's' shows convincingly that at least a part of Gravity's Rainbow cannot follow Weisenburger's strict and 'desirable' circular chronology, forcing Weisenburger to loosen up his chronology. The latter gives Duyfhuizen an obligatory nod in the Companion's Introduction to the 2nd edition, while paradoxically stating that 'claims about the circular design for Gravity's Rainbow have stood up well' and apparently not mentioning this article in the bibliography. While Duyfhuizen's article cannot be classified as a review of criticism, as 15 percent of the articles are, it would be interesting to compute this kind of intellectual dialogue.
All in all, the question 'how do I cope with such an amount of information?' can be stated otherwise: is the bibliographical information I retrieve valid for me? To answer this question, transparent criteria for the qualification of a secondary bibliographical item are required. Hence this attempt to create a relational database structured in such a way that the qualification of an item is part of the database itself. Let's take a nice walk to the library.
2. A Nostalgic Trip to the Library
It may sound highly unlikely, but yes, there was a time when we did not have computers around. An encyclopedia was a series of printed books! Imagine the teacher told you to deliver 5 pages about the V-1 and V-2 attacks on Antwerp: you went to the local library, where there was a whole room full of drawers. Stacked in these drawers were thousands and thousands of cards that smelled strangely. Each card contained enough information to identify an item uniquely, usually a book. You checked the drawers where the cards were sorted by subject, which in our case was History → 20th Century → World War 2 → Weapons → Bombs and Rockets; or it could be Sciences → Natural Sciences → Physics → Aerodynamics → Rockets. Or you did not find what you were looking for, which was more likely. You went over to the nice lady working there, who consulted a strange book and said to check out section 629.4. Now, the whole library was sorted around these funny numbers, starting with 000.00 (no pun intended). All books, even novels, had such a weird number on the cover.
These cards migrated to databases, and so did the way the cards were sorted. The examples above show why this kind of database is called hierarchical: parents can have many children, and each child can become a parent. The disadvantage is that once a book was classified using the Dewey Decimal Classification (or another scheme), the user had to follow a specific path to the information he or she was looking for. The way this information is stored is rather inflexible: it is not unlikely that Gravity's Rainbow is classified by a lazy librarian as Science Fiction; it was, after all, one of the candidates for the 1974 Nebula. A query on contemporary American literature would never return it: the query would be something like 'show me all items with a number that is 813.54', while Gravity's Rainbow was classified as 813.54.19. Search engines had to be developed to circumvent this path to information, and other tools for exporting and presenting the information were needed: the additional IT workload was, and is, all in all a very expensive process. An example is the catalogue of catalogues, OCLC's WorldCat (OCLC owns the Dewey Decimal Classification), where only the search engine remains, implying that the DDC is in fact useless as the basis for an online bibliographical database system, not to mention that using a search engine as a solution is rather inelegant. Hierarchical databases are useful in many environments, but a new database model was needed.
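The exact-match problem described above can be sketched in a few lines of Python. This is only an illustration: the two-entry catalogue and the second title are hypothetical, and the classification numbers are the ones used in the example in the text, not real catalogue data.

```python
# Hypothetical mini-catalogue: title -> classification number (illustrative only).
catalogue = {
    "Gravity's Rainbow": "813.54.19",
    "Another American Novel": "813.54",
}

def exact_match(number):
    """Titles filed under exactly this classification number."""
    return sorted(t for t, n in catalogue.items() if n == number)

def subtree_match(number):
    """Titles filed at this number or anywhere below it in the hierarchy."""
    return sorted(
        t for t, n in catalogue.items()
        if n == number or n.startswith(number + ".")
    )

exact_hits = exact_match("813.54")      # follows the fixed path only
subtree_hits = subtree_match("813.54")  # the kind of workaround a search layer adds
print(exact_hits)
print(subtree_hits)
```

The first lookup misses the novel because it sits one level deeper than the path the user followed; the second finds it, but only because extra search logic was bolted on top of the classification.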
The logical description of a relational database system based on set theory in mathematics was published in 1970 by E.F. Codd, a British mathematician employed at the IBM laboratories. At the time, the firm was not interested in developing this model further into commercial relational database programs; it would do so only after being pushed into it by some of its largest accounts. During the '70s work was carried out in different universities, mainly at Berkeley, and by companies like IBM on a computer language intended for the manipulation of information stored in databases. While not strictly speaking relational, this language, now called SQL and available in many flavors, offers the possibility of creating relationships between data. E.F. Codd was very unhappy with its development because it did not follow the strict rules he recommended for relational databases; a fully fledged relational computer language simply does not exist. At the end of the '70s a small Californian start-up company called Relational Software, mainly funded by the American Army, released the first commercial database system that allowed creating relations between tables, beating IBM by only a few weeks. It was the very same year Pynchon Notes kicked off (4). And the start-up changed its name to Oracle.
Think of a table in a database as a list with rows and columns; each row is called a record, and the intersection of a column with a row is called a field. A relational database contains many of these lists and they correlate: by defining relationships between tables (in fact a table can be described as a set of relationships), a record in table A can refer to information in table B, or C, or N, or a combination of these tables. The relationship is often established through a unique key, usually a meaningless, computer-generated number. The logical combinations of 'one' and 'many' allow only 4 kinds of relationships. A database cannot simply be called 'relational' or 'non-relational': there are varying degrees of it, governed by a set of rules called normalization. Each rule n+1 requires that rule n is fulfilled. Most databases do not go beyond rule 3: at that stage a set of data could still be split into smaller chunks of information, but the database designer is too lazy to do so. Beyond rule 3 a table can only be a set of relations: it holds information that only refers to other tables. Information is inserted once, and used endlessly. A table must contain the most basic human-usable piece of information, and the stored information is unformatted or 'raw'. Finally, the logical design of the relationships between tables defines the relevance of a query. The record one retrieves is a combination of the information stored in different tables and does not exist as such. And that is what we are looking for: a system where the qualification of an item is in the database itself.
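A minimal sketch of these ideas, using Python's built-in sqlite3 module. The table and column names (person, article, person_id) are assumptions made for illustration and are not the actual design of this database; the second article is a placeholder.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE person (
    person_id  INTEGER PRIMARY KEY,   -- meaningless, computer-generated key
    last_name  TEXT NOT NULL,
    first_name TEXT NOT NULL
);
CREATE TABLE article (
    article_id INTEGER PRIMARY KEY,
    title      TEXT NOT NULL,
    person_id  INTEGER NOT NULL REFERENCES person(person_id)
);
""")

# The person is inserted once ...
cur = con.execute(
    "INSERT INTO person (last_name, first_name) VALUES (?, ?)",
    ("Dalsgaard", "Inger H."),
)
author_id = cur.lastrowid

# ... and re-used by every article that refers to the key.
con.executemany(
    "INSERT INTO article (title, person_id) VALUES (?, ?)",
    [
        ("Gravity's Rainbow: 'An Historical Novel of a Whole New Sort'", author_id),
        ("A second, hypothetical article", author_id),
    ],
)

# The retrieved record is assembled from two tables and is stored nowhere as such.
rows = con.execute("""
    SELECT p.last_name, p.first_name, a.title
    FROM article AS a
    JOIN person AS p ON p.person_id = a.person_id
    ORDER BY a.article_id
""").fetchall()
person_count = con.execute("SELECT COUNT(*) FROM person").fetchone()[0]
print(rows, person_count)
```

Both result rows carry the author's name, although the name is stored only once.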
There's a problematic downside to the relational model: though efficient in maintaining and querying a database, which includes the presentation of the content, the 'insert once' rule is rather time-consuming. To take an example: the geographical location of a publisher requires writing into 3 different tables. A fourth table contains information on the publisher itself and refers to 5 other tables, but excludes the information on a particular publication. Once all this information is stored in the tables it can be easily re-used: the speed with which new records are added grows. But the initial work of making it possible to present an item is huge. For a library that has accumulated electronic data over the last four decades, it becomes nearly impossible to migrate the data from a catalogue's non-relational model to a relational one, which explains why a library prefers to upload the information in a non-relational way. To present 3,000-plus bibliographical items requires just below 13,000 records in 17 tables (the rate of roughly 1 to 4 is likely to fall for 4,000 or 5,000 bibliographical items). A relational database in an ideal world is built from scratch.
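The cost and the pay-off of the 'insert once' rule can be sketched as follows. This is a simplified illustration under stated assumptions: the text speaks of three location tables, while this sketch uses only two (country and city) plus a publisher table, and all table names, column names and the get_or_create helper are hypothetical. The second publisher is a placeholder.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE country (country_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE city (
    city_id    INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    country_id INTEGER NOT NULL REFERENCES country(country_id),
    UNIQUE (name, country_id)
);
CREATE TABLE publisher (
    publisher_id INTEGER PRIMARY KEY,
    name         TEXT UNIQUE NOT NULL,
    city_id      INTEGER NOT NULL REFERENCES city(city_id)
);
""")

def get_or_create(table, key_column, **values):
    """Return the key of the row matching `values`, inserting it only if absent.

    Table and column names come from this script itself, never from user input.
    """
    where = " AND ".join(f"{col} = ?" for col in values)
    args = tuple(values.values())
    row = con.execute(f"SELECT {key_column} FROM {table} WHERE {where}", args).fetchone()
    if row is not None:
        return row[0]
    cols = ", ".join(values)
    marks = ", ".join("?" for _ in values)
    return con.execute(f"INSERT INTO {table} ({cols}) VALUES ({marks})", args).lastrowid

def add_publisher(name, city, country):
    """Write the location first, then the publisher that refers to it."""
    country_id = get_or_create("country", "country_id", name=country)
    city_id = get_or_create("city", "city_id", name=city, country_id=country_id)
    return get_or_create("publisher", "publisher_id", name=name, city_id=city_id)

add_publisher("Novus Forlag", "Oslo", "Norway")                 # three inserts
add_publisher("A hypothetical second press", "Oslo", "Norway")  # one insert only

counts = {
    t: con.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
    for t in ("country", "city", "publisher")
}
print(counts)
```

The first publisher costs three writes, the second only one, which is the sense in which the speed of adding records grows as the tables fill up.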
There are several contexts in which a bibliographical item can show up: we've seen the library cards that ultimately point to a physical location, with a varying sorting order, usually by subject according to the DDC, or alphabetically, which means by the person(s) linked to an item. There is the bibliography at the end of an article, written in a particular style (MLA, Chicago, ...). Or there are the books and articles that systematically sum up various items on a particular subject, like Mead's. One of the requirements of a database is that there is a complete distinction between form and content. The same relational database can be used for all of these forms because the form, which includes the sorting order, can simply be written into the query, be it for one item or for many.
Finally, there is the issue of hygienic programming. One of the basic requirements is that the database server has to do all the work, and not the client machine (that is why it is called a server). This, however, excludes all techniques where information can be manipulated directly on the client machine. It makes the setup for manipulating or retrieving data a bit more complicated, but it guarantees that the integrity of the data is maintained.
3. A Query Example
A query, or view, is literally what it is: you ask the database something (you query), and the database responds in two steps:
- The correct constituent blocks of information are searched for and assembled, and then formatted according to hidden, but programmed, instructions on the database server. This process is called parsing.
- The formatted result is sent back to the client (that is you) and shown, hence the word view.
Here's a question containing 7 parameters, albeit translated into human language: which articles (parameter 1) in Norwegian (2) essay collections (3) in English (4) published after the year 2000 (5) deal extensively with [Pynchon's] Gravity's Rainbow's (6) Max und Moritz (7)? The response of the database is information on an otherwise fascinating article:
Dalsgaard, Inger H. "Gravity's Rainbow: 'An Historical Novel of a Whole New Sort'." Blissful Bewilderment: Studies in the Fiction of Thomas Pynchon. Ed. Anne Mangen and Rolf Gaasland. Novus Forlag, Oslo, Norway (2002): 81-102.
On a high level, one can identify 8 different 'blocks' of information, but we see only 2 of the parameters in the result. 6 blocks we did not ask for we get for free. They are necessary to constitute useful information. Here's what we see:
- The author of this article
- The editors of this essay collection
- The name of the publisher
- The geographical location of the publisher. Parameter 2 is answered: Norway
- The title of the publication.
- The title of the article. Parameter 6 is NOT answered. Imagine the article's title did not contain the string 'Gravity's Rainbow': even then we would still have to get a valid result (we did not ask: where Gravity's Rainbow is in the title)
- The temporal identification of the collection. Parameter 5 is answered: 2002 is after 2000
- The physical location of the article within the collection
We can now easily see what kinds of relationships there are:
- One-to-one: only the article above is in a particular page range in this collection. This implies that the title of the article and the page range both have to reside in the same table.
- One-to-many: Inger Dalsgaard has published 9 articles on Pynchon to date. This implies that the information stored on persons has to be separated from the information on articles, in other words: stored across different tables.
- Many-to-one: Lots of other persons have written articles in the year 2002; in fact about 80 did. The year 2002 is in all of these articles, but it is not unique.
- Many-to-many: many persons have written many articles on many objects produced by a guy called Thomas Ruggles Pynchon, Jr., whose works contain a tremendous number of characters. This requires that the relations be translated into one of the three previous kinds; in other words, it requires additional tables. Splitting this relationship up allows for managing variances in information. Example: this article is also about Wernher von Braun as a historical figure. There must be a table that contains enough information to state: this article refers not only to Max und Moritz, but also to von Braun.
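The many-to-many case can be sketched with an additional linking table, again using Python's sqlite3 module. The names article, subject and article_subject are hypothetical and stand in for whatever the actual design uses.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE article (article_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE subject (subject_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
-- Linking table: it holds nothing but references to the other two tables,
-- turning one many-to-many relationship into two one-to-many relationships.
CREATE TABLE article_subject (
    article_id INTEGER NOT NULL REFERENCES article(article_id),
    subject_id INTEGER NOT NULL REFERENCES subject(subject_id),
    PRIMARY KEY (article_id, subject_id)
);
""")
con.execute(
    "INSERT INTO article VALUES (1, ?)",
    ("Gravity's Rainbow: 'An Historical Novel of a Whole New Sort'",),
)
con.executemany(
    "INSERT INTO subject VALUES (?, ?)",
    [(1, "Max und Moritz"), (2, "Wernher von Braun")],
)
con.executemany("INSERT INTO article_subject VALUES (?, ?)", [(1, 1), (1, 2)])

# Which subjects does article 1 refer to?
subjects = [
    row[0]
    for row in con.execute("""
        SELECT s.name
        FROM subject AS s
        JOIN article_subject AS x ON x.subject_id = s.subject_id
        WHERE x.article_id = 1
        ORDER BY s.name
    """)
]
print(subjects)
```

The same linking table can later connect other articles to Max und Moritz or von Braun without touching the article or subject rows.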
But the question remains: why is this (and only this) item returned? Apparently the database contains a bit more information than is shown in the result. Five of the 7 parameters are not shown. Here is what happened with these:
- Parameter 1: Articles. The item is classified as being an article, which means: it is not a review, an index, a blog, a song or one of the 40-odd other secondary classifications.
- Parameter 3: Essay collection. An essay collection can be translated into database terms as: an object, which can be identified in time and place, and linked to at least one publisher, containing member-objects, where the persons that are linked to the member-objects are not the same as the ones linked to the main object, and having an International Standard number that is an ISBN or ISBN-13
- Parameter 4: The article seems to be linked to a table that contains languages, in this case English
- Parameter 6: The article seems to be linked to a table that contains titles of literary works, in this case Gravity's Rainbow
- Parameter 7: There seems to be an additional table that contains information on characters or objects in Pynchon novels, like Max und Moritz, and the article offers more than a mere mention, hence 'extensively' in the original query (note that there can be no fixed link between character and novel: Pig Bodine, Weissmann and Mucho Maas appear in more than one Pynchon novel; one can call it the Kilgore Trout syndrome)
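The seven parameters can be written out as one query. The sketch below uses Python's sqlite3 module and a deliberately flattened schema: kind, language and country are plain text columns here, whereas the text describes them as links to separate tables. All table and column names, the extensive flag and the second article are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE collection (
    collection_id INTEGER PRIMARY KEY,
    title TEXT, kind TEXT, country TEXT, year INTEGER
);
CREATE TABLE item (
    item_id INTEGER PRIMARY KEY,
    title TEXT, kind TEXT, language TEXT,
    collection_id INTEGER REFERENCES collection(collection_id)
);
CREATE TABLE item_work (item_id INTEGER, work TEXT);
CREATE TABLE item_subject (item_id INTEGER, subject TEXT, extensive INTEGER);
""")
con.execute(
    "INSERT INTO collection VALUES (1, 'Blissful Bewilderment', 'essay collection', 'Norway', 2002)"
)
con.executemany("INSERT INTO item VALUES (?, ?, ?, ?, ?)", [
    (1, "Gravity's Rainbow: 'An Historical Novel of a Whole New Sort'", "article", "English", 1),
    (2, "A hypothetical article with a passing mention", "article", "English", 1),
])
con.executemany("INSERT INTO item_work VALUES (?, ?)", [
    (1, "Gravity's Rainbow"),
    (2, "Gravity's Rainbow"),
])
con.executemany("INSERT INTO item_subject VALUES (?, ?, ?)", [
    (1, "Max und Moritz", 1),      # dealt with extensively
    (1, "Wernher von Braun", 1),
    (2, "Max und Moritz", 0),      # a mere mention
])

QUERY = """
SELECT i.title
FROM item AS i
JOIN collection   AS c ON c.collection_id = i.collection_id
JOIN item_work    AS w ON w.item_id = i.item_id
JOIN item_subject AS s ON s.item_id = i.item_id
WHERE i.kind = ?                          -- parameter 1: articles
  AND c.country = ?                       -- parameter 2: Norwegian
  AND c.kind = ?                          -- parameter 3: essay collection
  AND i.language = ?                      -- parameter 4: in English
  AND c.year > ?                          -- parameter 5: after 2000
  AND w.work = ?                          -- parameter 6: Gravity's Rainbow
  AND s.subject = ? AND s.extensive = 1   -- parameter 7: Max und Moritz, extensively
"""
params = ("article", "Norway", "essay collection", "English", 2000,
          "Gravity's Rainbow", "Max und Moritz")
titles = [row[0] for row in con.execute(QUERY, params)]
print(titles)
```

Only the first article is returned: the placeholder article matches six parameters but fails on 'extensively'. None of the five hidden parameters appears in the returned title, which is the point made in the list above.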
4. Form and Content: Rendering a Bibliographical Item
The Modern Language Association citation style is a convention, and, as good old Charlie Marx would say, the bearer of its own history which means in this case: a mechanical typewriter located in the USA. It is, however, a worldwide standard in the Humanities, with local exceptions on several levels: Japanese names are shown by first name last name. The use of capitals in an article's title and subtitle in French is completely different from the American way. We have identified up till now 28 variances, and we have to write these into the hidden query instructions.
A purely mechanical typewriter can't produce italics. This is why the convention at the time said it was OK to underline an item's title. When electronic typewriters became widely available, italics became the fashion, and they still are. Furthermore, the convention implied that everything should be done to avoid typing errors, as it was rather difficult to correct these. Hence it is still (silently) agreed that it is all right to use abbreviations and to hide information. The information on an item is as short as possible: 'WA' but not Washington, no mention of country, no details on publishers. A fine example is the explicit prescription that if a publication is widely known, it is not necessary to give more details. For a South African reader, an item stating that it was in the Times probably refers to a paper published in Johannesburg, for an American it is a NY paper, and a Dubliner might think it is his or her local paper. The MLA convention states implicitly that only American publications are widely known. There are many remarks to be made about the MLA convention, but we strive to write these conventions into query instructions.
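How raw fields might be turned into an MLA-like line, with one of the variances as a switch, can be sketched in Python. This is an assumption-laden illustration: the function names, the record layout and the invert_name switch are hypothetical, italics cannot be shown in plain text, and in the design described here such formatting would live in the query instructions on the server rather than in client code.

```python
def with_period(text):
    """Close a block with a full stop unless it already ends with one."""
    return text if text.endswith(".") else text + "."

def format_author(last, first, invert=True):
    """Default is 'Last, First'; invert=False gives 'First Last' (a local variance)."""
    return f"{last}, {first}" if invert else f"{first} {last}"

def render_item(rec):
    """Assemble one citation line from raw, unformatted fields."""
    author = format_author(rec["last"], rec["first"], rec.get("invert_name", True))
    editors = " and ".join(rec["editors"])
    parts = [
        with_period(author),
        '"' + with_period(rec["title"]) + '"',
        with_period(rec["collection"]),
        with_period("Ed. " + editors),
        # Full place names and country, not the abbreviations the convention allows.
        f'{rec["publisher"]}, {rec["city"]}, {rec["country"]} ({rec["year"]}): {rec["pages"]}.',
    ]
    return " ".join(parts)

record = {
    "last": "Dalsgaard",
    "first": "Inger H.",
    "title": "Gravity's Rainbow: 'An Historical Novel of a Whole New Sort'",
    "collection": "Blissful Bewilderment: Studies in the Fiction of Thomas Pynchon",
    "editors": ["Anne Mangen", "Rolf Gaasland"],
    "publisher": "Novus Forlag",
    "city": "Oslo",
    "country": "Norway",
    "year": 2002,
    "pages": "81-102",
}
citation = render_item(record)
print(citation)
```

With the record above, the function reproduces the line returned in the query example; setting invert_name to False would switch the author to 'First Last' order without touching the stored data.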
5. Using Metadata for Database Statistics
One of the golden database rules is that all information on a database is part of the database itself. This may sound abstract but it allows for the development of an analytical module on the database information.
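As a small illustration of this rule, SQLite keeps the description of its own tables in a table called sqlite_master that can be queried like any other. The sketch below uses Python's sqlite3 module; the two sample tables and their contents are placeholders, not the actual tables of this database.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE item   (item_id   INTEGER PRIMARY KEY, title TEXT);
INSERT INTO person (name) VALUES ('A'), ('B');
INSERT INTO item (title) VALUES ('X'), ('Y'), ('Z');
""")

# The list of tables is itself stored in a table.
table_names = [
    row[0]
    for row in con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# Simple statistics: number of records per table.
row_counts = {
    name: con.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
    for name in table_names
}
print(table_names, row_counts)
```

An analytical module can start from exactly this kind of query: counts per table, per year or per language are ordinary queries on information the database already holds.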
A relational database can be easily expanded with new tables. In the process of developing this bibliographical model, it became clear that it would be hardly any work to use the information already there to offer some extra functionalities.
A first expansion is a table that contains abstracts of papers delivered at conferences. It is linked to the table that contains some basic information on author and title/subtitle, and through that table to all other relevant information; the only objects that needed developing were a few queries and 3 pages. This functionality is available.
A second expansion is the compilation of extended tables of contents. As a work of fiction follows the formal structure of title of work, parts, chapters and sections, 3 new tables were developed to contain the necessary information. This second functionality is also up and running. It opened the possibility of expansions three and four: if a reader wants to offer a section-by-section reading, only one extra table is needed, and it is there. The fourth one (at the moment, June 2009, not yet fully developed) is the recreation of indexes such as were published in Pynchon Notes 36-39, thus re-using the enormous amount of work that was carried out manually. The problem with this last extension is adding the information; the main advantage, however, would be that it opens up vast possibilities for querying all of Pynchon's works.
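The structure of work, parts, chapters and sections might be laid out as follows. This is a guess at a workable shape, not the actual design: the table and column names are hypothetical, the work table is assumed to exist already, and the contents are placeholders rather than the table of contents of a real novel.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE work (work_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- Three contents tables, each pointing at its parent level.
CREATE TABLE part (
    part_id INTEGER PRIMARY KEY,
    work_id INTEGER REFERENCES work(work_id),
    seq INTEGER, title TEXT
);
CREATE TABLE chapter (
    chapter_id INTEGER PRIMARY KEY,
    part_id INTEGER REFERENCES part(part_id),
    seq INTEGER, title TEXT
);
CREATE TABLE section (
    section_id INTEGER PRIMARY KEY,
    chapter_id INTEGER REFERENCES chapter(chapter_id),
    seq INTEGER, title TEXT
);
""")
con.execute("INSERT INTO work VALUES (1, 'Placeholder Novel')")
con.executemany("INSERT INTO part VALUES (?, 1, ?, ?)", [
    (1, 1, "Part One"),
    (2, 2, "Part Two"),
])
con.executemany("INSERT INTO chapter VALUES (?, ?, ?, ?)", [
    (1, 1, 1, "Chapter 1"),
    (2, 1, 2, "Chapter 2"),
    (3, 2, 1, "Chapter 3"),
])
con.executemany("INSERT INTO section VALUES (?, ?, ?, ?)", [
    (1, 1, 1, "Section 1.1"),
    (2, 3, 1, "Section 3.1"),
])

# An extended table of contents; chapters without sections are kept (LEFT JOIN).
toc = con.execute("""
    SELECT p.title, c.title, s.title
    FROM part AS p
    JOIN chapter AS c ON c.part_id = p.part_id
    LEFT JOIN section AS s ON s.chapter_id = c.chapter_id
    WHERE p.work_id = 1
    ORDER BY p.seq, c.seq, s.seq
""").fetchall()
for line in toc:
    print(line)
```

A section-by-section reading would then need only one further table whose rows point at section_id.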
Future functionalities should include cover art, the physical description of items, and a link between the reviews of criticism/review essays and the works reviewed. A Pynchon scholar made an interesting suggestion: it would be a nice idea if non-Western items were described using the correct character set (Korean, Japanese, Russian), accompanied by a phonetic spelling and/or a translation. This requires a rather drastic re-definition of the character sets in use, but it most certainly is possible. Work on this has not yet begun.
6. And What About a Primary Bibliography?
Pynchon's works have been published in at least 23 different languages. There are about 180 different editions of his work. These works haven't been described yet in this database. When the initial work on the secondary bibliography has been finished by the end of 2009, another extension will be the description of the primary items. This is also a suggestion made by a Pynchon scholar who has difficulties in convincing his university of the importance of teaching Thomas Pynchon. Yes, there are over 3 million copies of V. around.
7. Including Other Authors
A secondary bibliography on Thomas Pynchon like this one contains about one fourth of the listed items on William Gaddis. This criticism was often published in the same journals, by the same persons, which opens the perspective of extending this bibliography to other contemporary authors. This would require only one or two new tables (at the moment there is no table that contains the names of novelists with an entry called Thomas Pynchon), but also the re-designing of existing tables, and this implies adapting current forms. All in all, this means a huge amount of checking current programming and developing new code. At the moment this plan needs to remain vague. The basic design and most of the information that should be used are already in this database. Creating bibliographies on other authors could be done much more quickly than the very slow development of the initial Pynchon database. The very idea, of course, is rather tempting. We are merely interested in working out this possibility on a database design level; we are, however, not interested in creating such bibliographies: Vheissu remains a site dedicated to the reception of Pynchon's works, for he is the only living author that keeps on puzzling you.
(3) Osterhaus' bibliography is in fact a translation into English of items in Japanese. He also lists two additional bibliographies published in Japan, one on Gravity's Rainbow, the other one on V. which are both for apparent reasons unusable for our purpose.
(4) 'Table Design' has nothing to do with the way data are presented. It is two things: What is the table's structure and what are the relations of this table to other tables. A relational database has to make a complete distinction between content, or the way data are stored, and form, the way data is presented or manipulated.
(5) An estimate in 2006 by Rachel Hollander was that it would require the labour of 2 fulltime database typists during a year, and 1 fulltime year of programming before this database could be considered as 'finished'. She was spot on.
Please note that when querying this database there are always 2 implicit queries:
- Show me secondary bibliographical information on Pynchon and his works and, when there is more than one result, sort it in a useful order
- Format the result in such a way that it mimics more or less the MLA citation style