The technique of indexing (that is, listing the occurrences of a lemma in a text) has been applied to many of Pynchon's works, most notably in Pynchon Notes 36-39 (Spring - Fall 1995-1996), which published indices up to and including Vineland. The indices were published separately: one for each of the novels, and one for the early work and nonfiction Pynchon had published up to that point. The Companions, first in print (1988: Weisenburger) and later online (2006: Ware), as well as other resources (2008: Hurley), remain indispensable for the creation of an integrated index. However, due to their encyclopaedic nature, these publications cannot readily be used as an index, which lists only the occurrences of a lemma within the text.
From this indexer's perspective, the following publications since PN 36-39 are of particular interest: Matt McLaurine's "Concordance of Characters in Against the Day" and his "Concordance of Characters in Gravity's Rainbow" (the latter in two versions, which also list the organisations). Matt's concordances are in spreadsheet format and available online; they are the first available indices not produced using a word processor.
The model for an integrated index (that is, one index covering the occurrences in all of Pynchon's works) that we propose today is based on relational database theory. The criteria for adding a lemma should remain identical regardless of the work it appears in, and the index should be independent of a specific edition or translation.
A Word on Relational Databases
In a paper published in 1970, IBM mathematician E.F. Codd described a relational database model inspired by mathematical set theory. The idea is that information is broken down into small data sets (tables) that contain rows of unique (unrepeated) information, while the tables are related to each other through keys that define the database's 'integrity constraints'; these identifiers are usually arbitrary values. An example: if we want to select all section headers of chapter 27 of Against the Day, we need a design like the one below. The list shows the headers and one or more rows of each table; the fields that are related to a key in another table carry the same name as that key (id_subject in Table 2, id_part in Table 3, id_chapter in Table 4). The field names are not the ones actually used by the indexing database.
Table 1: Works By Pynchon
Fields in this table: id_subject, subject, epigram
'AD', 'Against the Day', '"It's always night, or we wouldn't need light." -Thelonious Monk'
Table 2: Parts within a work.
Fields in this table: id_part, part, id_subject, starts, ends, part counter, epigram, part numerical
'2', 'Iceland Spar', 'AD', '121', '428', 'Two', '', '2'
Table 3: Chapters within a Part
Fields in this table: id_chapter, chapter, id_part, starts, ends, title, chapter counter
'27', '27', '2', '336', '357', '', '0'
Table 4: Sections in a Chapter
Fields in this table: id_section, section counter, start of section, id_chapter, starts, ends
'119', '6', 'Dally had imagined once that if she ever found Erlys again', '27', '353', '357'
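The four tables above can be sketched as a small runnable schema. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than the database server behind the index, the table names are hypothetical, the field names are the illustrative ones listed above with spaces replaced by underscores, and a few fields (part counter, title, chapter counter and so on) are left out for brevity.

```python
import sqlite3

# Hypothetical schema: names follow the illustrative field lists above,
# not the field names actually used by the indexing database.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # let SQLite enforce the key constraints
conn.executescript("""
CREATE TABLE subjects (
    id_subject TEXT PRIMARY KEY,
    subject    TEXT NOT NULL,
    epigram    TEXT
);
CREATE TABLE parts (
    id_part    INTEGER PRIMARY KEY,
    part       TEXT NOT NULL,
    id_subject TEXT NOT NULL REFERENCES subjects(id_subject),
    starts     INTEGER,
    ends       INTEGER
);
CREATE TABLE chapters (
    id_chapter INTEGER PRIMARY KEY,
    chapter    TEXT NOT NULL,
    id_part    INTEGER NOT NULL REFERENCES parts(id_part),
    starts     INTEGER,
    ends       INTEGER
);
CREATE TABLE sections (
    id_section       INTEGER PRIMARY KEY,
    section_counter  INTEGER,
    start_of_section TEXT,
    id_chapter       INTEGER NOT NULL REFERENCES chapters(id_chapter),
    starts           INTEGER,
    ends             INTEGER
);
""")

# One row per table, taken from the example rows above.
conn.execute(
    "INSERT INTO subjects VALUES ('AD', 'Against the Day', ?)",
    ('"It\'s always night, or we wouldn\'t need light." -Thelonious Monk',),
)
conn.execute("INSERT INTO parts VALUES (2, 'Iceland Spar', 'AD', 121, 428)")
conn.execute("INSERT INTO chapters VALUES (27, '27', 2, 336, 357)")
conn.execute(
    "INSERT INTO sections VALUES (119, 6, "
    "'Dally had imagined once that if she ever found Erlys again', 27, 353, 357)"
)

# Section headers of chapter 27 of Against the Day: all four tables are joined.
rows = conn.execute("""
    SELECT s.section_counter, s.start_of_section, s.starts, s.ends
    FROM sections AS s
    JOIN chapters AS c ON c.id_chapter = s.id_chapter
    JOIN parts    AS p ON p.id_part    = c.id_part
    JOIN subjects AS w ON w.id_subject = p.id_subject
    WHERE w.id_subject = 'AD' AND c.chapter = '27'
    ORDER BY s.section_counter
""").fetchall()

# The integrity constraint at work: a section cannot point to a missing chapter.
try:
    conn.execute("INSERT INTO sections VALUES (999, 1, 'orphan', 4242, 1, 2)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

With only the sample rows loaded, the query returns the single section shown in Table 4, and the attempt to add a section for a non-existent chapter is refused.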
We now have everything ready to write a server script and perform some database queries that render a detailed Table of Contents for Pynchon's works. Here's the Extended Table of Contents for Iceland Spar.
Such a setup is called 'normalised' and 'relational'. 'Normalised' indicates that no select, update or insert anomalies are possible (no rows, in whichever combination, contradict each other or offer ambiguous information); 'relational' refers to the constraints in the design. Of course, if we want to query the sections table and limit the results to one work, all four tables need to be addressed. Here is a query that finds all the section headers in Against the Day that contain the word 'light':
SELECT s.`start of section`, s.starts
/* only 2 fields from table sections needed */
FROM sections s, chapters h, parts d, subjects w
/* we need to address all 4 tables in order to be able to limit the results to one work */
WHERE LOCATE('light', s.`start of section`) > 0 AND s.id_chapter = h.id_chapter AND h.id_part = d.id_part AND d.id_subject = w.id_subject AND w.id_subject = 'AD'
/* show only the relevant sections, use constraints, limit to Against the Day */
ORDER BY s.starts
/* set sorting order */
This is the result:
- In the bright light of day, the figures still looked sinister, 329
- Next afternoon the light took its deep yellowish turn, 455
- A heavenwide blast of light., 779
- Late at night they would lie together watching lights, 879
- The light didn't come in exactly the way it was supposed to..., 919
- When they got moving again, Reef was delighted, 954
- One morning at first light they awoke into a firefight, 965
- Company searchlights set up on towers began sweeping the tents, 1008
- "Look at 'em down there." "All that light.", 1083
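The result list includes 'delighted' and 'searchlights' because LOCATE looks for a substring, not a whole word. The sketch below replays both tests in plain Python on the nine section openings listed above. It is an illustration only: the helper names are hypothetical, and the case-insensitive comparison assumes a case-insensitive collation on the database side.

```python
import re

# Section openings and their pages, copied from the result list above.
results = [
    ("In the bright light of day, the figures still looked sinister", 329),
    ("Next afternoon the light took its deep yellowish turn", 455),
    ("A heavenwide blast of light.", 779),
    ("Late at night they would lie together watching lights", 879),
    ("The light didn't come in exactly the way it was supposed to...", 919),
    ("When they got moving again, Reef was delighted", 954),
    ("One morning at first light they awoke into a firefight", 965),
    ("Company searchlights set up on towers began sweeping the tents", 1008),
    ('"Look at \'em down there." "All that light."', 1083),
]


def contains_substring(text, needle):
    """Substring test, comparable to LOCATE(needle, text) > 0."""
    return needle.lower() in text.lower()


def contains_word(text, word):
    """Whole-word test: the match must be delimited by word boundaries."""
    return re.search(rf"\b{re.escape(word)}\b", text, re.IGNORECASE) is not None


substring_pages = [page for text, page in results if contains_substring(text, "light")]
word_pages = [page for text, page in results if contains_word(text, "light")]
substring_only = [page for page in substring_pages if page not in word_pages]
```

All nine sections pass the substring test; the whole-word test drops the sections on pages 879 ('lights'), 954 ('delighted') and 1008 ('searchlights'). Which behaviour is wanted is an indexing decision rather than a technical one.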
The table design process is repeated over and over again: another table in this database contains 'readings' which are linked to a section (this table is outside the scope of this index): here's the reading of chapter 27. Note that there is no table with the writer's name: the database contains information on Pynchon's works only, so this information would be superfluous. All tables remain relatively small, as we have to combine information through a SELECT statement in order to render useful information. It can take a while, however, before it becomes clear what a table and its relations should look like...
A downside is that the information has to be complete: the 4 tables explained here do not contain information on The Crying of Lot 49 or Vineland, and contain only incomplete information on Slow Learner and minor works. As long as this remains the case, we can't even think of creating an index for these works.
Relational databases have been around for 35 years (as long as Pynchon Notes). IBM was not interested in developing the model as a commercial product at first, but an employee of the recording device company Ampex, working on a CIA-sponsored database research program, took the opportunity to start his own company and received funding from the U.S. Air Force, its first customer being an Air Force base in California. The company changed its name from Relational Software to Oracle, and turned Larry Ellison into one of the richest people in the world.
The first four tables for indexing purposes (and others, as shown) are ready. The next step is adding tables related to a lemma: three tables are required; a fourth is optional, and it addresses a fifth, without which its information is useless.
Items, qualifications, collections, aliases and relations
Before we can add a concordance, we need a table containing the lemmas, or items that need to be indexed. This table not only contains the lemma itself, but also addresses data from two additional tables whose only function is to give context to an item. This mimics the print index, which often offers more information on an item. It also offers the user the possibility of querying for series of related items. Note that the lemma table has no relationship with the 4 tables that render the Tables of Contents.
Here are the first five lemmata or items that are scientists (table qualifications) related to mathematics (table collections):
- Bertie
- Bertrand Russell
- Carl Friedrich Gauss
- David Hilbert
- Edmund Whittaker
'Bertie' is of course a short form of 'Bertrand Russell'. Note that both items are added separately. We can tie both items together by defining a relationship between the two. So a fourth table containing these relationships is needed, with a fifth containing the qualification of the relationship. In this case the relationship is 'Short name'. Russell is also nicknamed 'Mad Dog' on page 538 of Against the Day. We are now able to group the lemma 'Bertrand Russell' in three different ways (the first two are required, the third one is not):
- Qualified as a scientist
- Belonging to the Mathematics collection
- Having 'aliases' with a short form and a nickname
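The three groupings can be sketched with five small tables. This is a minimal illustration using Python's built-in sqlite3 module, not the actual indexing database: all table and field names are hypothetical, the relationship label 'Nickname' is assumed (only 'Short name' is named in the text), and giving 'Mad Dog' the same qualification and collection as the other items is an assumption made to keep the sample data short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE qualifications (
    id_qualification INTEGER PRIMARY KEY,
    qualification    TEXT NOT NULL UNIQUE
);
CREATE TABLE collections (
    id_collection INTEGER PRIMARY KEY,
    collection    TEXT NOT NULL UNIQUE
);
CREATE TABLE items (
    id_item          INTEGER PRIMARY KEY,
    item             TEXT NOT NULL UNIQUE,
    id_qualification INTEGER NOT NULL REFERENCES qualifications(id_qualification),
    id_collection    INTEGER NOT NULL REFERENCES collections(id_collection)
);
CREATE TABLE relation_types (
    id_relation_type INTEGER PRIMARY KEY,
    relation_type    TEXT NOT NULL UNIQUE
);
CREATE TABLE relations (
    id_item          INTEGER NOT NULL REFERENCES items(id_item),
    id_related_item  INTEGER NOT NULL REFERENCES items(id_item),
    id_relation_type INTEGER NOT NULL REFERENCES relation_types(id_relation_type),
    PRIMARY KEY (id_item, id_related_item)
);
""")

conn.execute("INSERT INTO qualifications VALUES (1, 'Scientist')")
conn.execute("INSERT INTO collections VALUES (1, 'Mathematics')")
conn.executemany(
    "INSERT INTO relation_types VALUES (?, ?)",
    [(1, "Short name"), (2, "Nickname")],  # 'Nickname' is an assumed label
)

# Every form is an item in its own right; ids 1..6 follow the list order.
names = ["Bertie", "Bertrand Russell", "Carl Friedrich Gauss",
         "David Hilbert", "Edmund Whittaker", "Mad Dog"]
conn.executemany("INSERT INTO items VALUES (?, ?, 1, 1)",
                 list(enumerate(names, start=1)))

# Tie the short form and the nickname to the lemma 'Bertrand Russell' (id 2).
conn.executemany("INSERT INTO relations VALUES (?, ?, ?)", [(2, 1, 1), (2, 6, 2)])

# First five scientists in the Mathematics collection.
first_five = [row[0] for row in conn.execute("""
    SELECT i.item
    FROM items AS i
    JOIN qualifications AS q ON q.id_qualification = i.id_qualification
    JOIN collections    AS c ON c.id_collection    = i.id_collection
    WHERE q.qualification = 'Scientist' AND c.collection = 'Mathematics'
    ORDER BY i.item
    LIMIT 5
""")]

# Aliases of one lemma, with the kind of relationship.
aliases = conn.execute("""
    SELECT alias.item, t.relation_type
    FROM relations AS r
    JOIN items          AS lemma ON lemma.id_item = r.id_item
    JOIN items          AS alias ON alias.id_item = r.id_related_item
    JOIN relation_types AS t     ON t.id_relation_type = r.id_relation_type
    WHERE lemma.item = 'Bertrand Russell'
    ORDER BY alias.item
""").fetchall()
```

Because the relationship lives in its own table, an item without aliases simply has no rows there, which matches the point that the third grouping is optional.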
The main issue with the qualifications and collections tables is that items are very easily categorised in too vague a way (e.g. classified with language and vocabulary). This can be corrected by adding new entries in these tables and requalifying the lemma (e.g. curses, words starting with 'all-', ...).
It turned out after a few hundred test records that there is no relation between the qualifications and collections tables.
It was an arbitrary decision to have (historical) persons and novel characters indexed with their full name in one field. Deviations are indexed separately (Dally for Dahlia Traverse), whatever the format in which they appear in the text. As this index is not meant to be printed, the sorting order for persons is first name, last name.
In short, step 2 in creating the database setup before adding occurrences requires three more tables for items, qualifications and collections (such as wines). An item can also have a relationship to another lemma, which requires two more tables; using these two tables is not required, but highly useful. An example is a list of all acronyms (insofar as they are indexed) and what they stand for.
Indexing an Item
The index table contains the information on where an item occurs; as long as an item is not indexed, it stays 'orphaned'. For this index we prefer to link an item to a section within a work. Different editions and translations have different paginations; the section or episode always remains identical in all forms. The page numbers that are added refer to the pagination of the first edition of a work, with the exception of Slow Learner. It is theoretically possible to add a new table containing information on the pagination in version X or translation Y, to which the occurrence can be linked.
The difference between an index and a concordance is that a concordance always shows information on the context: the words surrounding the occurrence. We have chosen to use a hybrid form, where information other than the page number within the context can be shown.
The index table addresses a ninth required table containing information on occurrence formatting. For persons, the first or last name can be used, or the default value can be shown where the occurrence equals the lemma. For words, several formats are possible: small caps, italics, single or double quotes; printing errors can be included, as can strange orthography and antiquated usage.
The indexing table in itself is unreadable; it becomes relevant only in conjunction with the other tables. Here's an example showing where the lemma 'Lindsay Noseworth' appears in the text, and how (occurrences 6 to 10 are shown):
- id_occurence, id_lemma, id_sectie, page, id_format
- 167, 449, 3, '15, 16', 7
- 224, 449, 4, '21, 23, 24', 7
- 329, 449, 6, '27', 1
- 330, 449, 6, '27, 28, 29,30', 7
- 452, 449, 9, '36', 1
And this is what Lindsay Noseworth as a lemma looks like in the 9-table relational setup, with additional information.
We now have enough database tables for the creation of an integrated index, that is, a database that contains indices for all of Pynchon's works. But this is not the same as automating an index. Indexing remains mainly manual work, because it is at the indexer's discretion what is to be included (and who would be able to resist adding 'Zumbledy bongbong' to the index?). The current status (May 2015) is as follows:
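To show how the raw rows become readable once they are combined with the lemma table, here is a minimal sketch in Python with the built-in sqlite3 module. It loads the five rows listed above, resolves id_lemma 449 to 'Lindsay Noseworth', and merges the page lists per section. The table names and the helper parse_pages are hypothetical; the column names are the ones shown in the example, and the meaning of the id_format values is not resolved because the formatting table is not reproduced here.

```python
import sqlite3
from collections import defaultdict

# The five rows from the example above; column names as shown there.
occurrence_rows = [
    (167, 449, 3, "15, 16", 7),
    (224, 449, 4, "21, 23, 24", 7),
    (329, 449, 6, "27", 1),
    (330, 449, 6, "27, 28, 29,30", 7),
    (452, 449, 9, "36", 1),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (
    id_lemma INTEGER PRIMARY KEY,
    lemma    TEXT NOT NULL
);
CREATE TABLE occurrences (
    id_occurence INTEGER PRIMARY KEY,
    id_lemma     INTEGER NOT NULL REFERENCES items(id_lemma),
    id_sectie    INTEGER NOT NULL,
    page         TEXT NOT NULL,
    id_format    INTEGER NOT NULL
);
""")
conn.execute("INSERT INTO items VALUES (449, 'Lindsay Noseworth')")
conn.executemany("INSERT INTO occurrences VALUES (?, ?, ?, ?, ?)", occurrence_rows)


def parse_pages(page_field):
    """Turn a stored page list such as '27, 28, 29,30' into integers."""
    return [int(p.strip()) for p in page_field.split(",")]


# Join the occurrences to the lemma and collect the pages per section.
pages_by_section = defaultdict(set)
query = """
    SELECT o.id_sectie, o.page
    FROM occurrences AS o
    JOIN items AS i ON i.id_lemma = o.id_lemma
    WHERE i.lemma = 'Lindsay Noseworth'
    ORDER BY o.id_occurence
"""
for section, page_field in conn.execute(query):
    pages_by_section[section].update(parse_pages(page_field))

summary = {section: sorted(pages) for section, pages in pages_by_section.items()}
```

Section 6 appears twice in the raw rows because the same lemma occurs there in two different formats; merging by section yields one page list per section.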
- Against the Day: fully indexed up to page 545, or about 50% of the novel
- Gravity's Rainbow: items have been loaded, only a few items are fully indexed.
Adding Occurrences: Inclusion Criteria
Indexing convention dictates that whatever is capitalised should be included, so we have to index at least all proper nouns or proper names. Capitalisation is often language-specific, and such language-specific items are often omitted from an index. The convention is not limited to what an index user would expect (characters, historical persons, geographical indications, concepts, events and many others); it also includes English-language specifics such as days of the week, which are not capitalised in many other languages. Some examples:
- Webb Traverse, Kieselguhr Kid, Phantom Dynamiter of the San Juans, Webbie
- Archduke Franz Ferdinand, F.F., Francis Ferdinand
- Mexico, Shambala
- Cartesian grid, Æther
- Civil War, First International Conference on Time-Travel
- Saturday, the Radiant Hour
A second convention is that non-alphabetical strings are added. This rule applies not only to numbers, but also to formulas and expressions with a specific notation, and sometimes to 'foreign' (non-English) expressions. Ideally, drawings should be stored and qualified in the indexing database; this is not yet the case. Examples for this second convention:
- 1900, '00
- √-1, 2/4
- ¡Cuidado Cabrón!, $3.50-a-quart, .38
Indexing Textual Particularities
Sometimes (in Pynchon's case pretty often) an author wants us to note something special. This is done through typography and/or punctuation, or through particularities at the lexical or syntactic level. As Brian McHale noted, Pynchon, at least since Vineland, uses the language of the time and the genre he is writing in, so many more items need to be indexed. Furthermore, our attention is drawn to words and expressions as they are stressed in the text: double or single quoted, italics, small capitals and others, and there are numerous 'foreign' words. Here are some examples:
- 'a roving military attaché'
- absquatulate, absquatulated, absquatulator
A third rule is the indexer's discretion to add an item. An example is a list of interjections.
Using the Index & Demo
Most items are used only once, in which case the complete information on the occurrence is shown, including a link to the relevant lemma page. For items with more than one occurrence, limited information is shown, but it includes the number of occurrences as a link to the corresponding lemma page, which displays all indexed occurrences. The lemma page for an item offers additional information on qualification, collection and related items. Here's the generic format for items with one or more occurrences, followed by some examples:
- Work, Part and Chapter
- Section/Episode Sequence with its first and last page (of the original edition)
- Occurrence, which is a page number, sometimes with context
- Formatting, Qualification and Collection (if default value this is not shown)
- Link to its lemma page, displayed as a right arrow
Indexed items with more than one occurrence are displayed as:
- # of occurrences as a link
- Qualification and Collection (if default value this is not shown)
Examples:
- Strand AD: Three 32.4: 445-448 p. 446 (City and -) | Neighbourhood London →
  - Format is not shown as it has the default value
  - The right arrow opens its lemma page
- star AD: Two 14.1: 156-162 p. 161 Italics | Language Vocabulary →
  - Formatting is shown as it does not have a default value
- X.O. 3* | Function The Chums of Chance
  - This item has 3 occurrences
- Georg Cantor AD: Two 21.2: 248-250 p. 250 (Dr. -) Last Name | Scientist Mathematics →
  - Persons are added first name first, followed by the last name. The first name of a historical person is usually added regardless of whether it 'exists' in the text or not
  - The occurrence shows how the name is used in the text. In this case, only the last name is used. It is preceded by the person's title
- imaginary 2*
  - This item was not qualified correctly when it was added
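The display rules above can be summarised in a short Python sketch. This is not the server script behind the demo: the function and parameter names are hypothetical, and None is assumed to stand for a default value that is not shown.

```python
def classification(qualification=None, collection=None):
    """Qualification and collection, omitted when both have the default value."""
    shown = [value for value in (qualification, collection) if value]
    return " | " + " ".join(shown) if shown else ""


def render_single(item, work, part, chapter, section, starts, ends, page,
                  context=None, formatting=None,
                  qualification=None, collection=None):
    """One line for an item with a single occurrence, ending in the lemma link."""
    line = f"{item} {work}: {part} {chapter}.{section}: {starts}-{ends} p. {page}"
    if context:
        line += f" ({context})"
    if formatting:  # a default format is not shown
        line += f" {formatting}"
    return line + classification(qualification, collection) + " →"


def render_multiple(item, count, qualification=None, collection=None):
    """One line for an item with several occurrences: only the count is shown."""
    return f"{item} {count}*" + classification(qualification, collection)


star = render_single("star", "AD", "Two", 14, 1, 156, 162, 161,
                     formatting="Italics",
                     qualification="Language", collection="Vocabulary")
cantor = render_single("Georg Cantor", "AD", "Two", 21, 2, 248, 250, 250,
                       context="Dr. -", formatting="Last Name",
                       qualification="Scientist", collection="Mathematics")
xo = render_multiple("X.O.", 3, "Function", "The Chums of Chance")
imaginary = render_multiple("imaginary", 2)
```

In the actual index the arrow and the occurrence count are links to the lemma page; the sketch only reproduces the text of each line.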
The Index Sorting Order
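The database field format for an item is a string. This implies that numbers are sorted alphabetically, not numerically. The sorting order is based on the numeric value of each character: the ASCII value for plain Latin letters, digits and punctuation, and, for characters outside ASCII, the value of the first byte of their multi-byte (UTF-8) encoding, which is why several different characters below share the value 194 or 206. Here's a list of the first characters of all items with their numeric conversion: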
- 36: $
- 39: '
- 46: .
- 48: 0
- 49: 1
- 50: 2
- 51: 3
- 52: 4
- 53: 5
- 54: 6
- 55: 7
- 56: 8
- 57: 9
- 60: <
- 65: A
- 66: B
- 67: C
- 68: D
- 69: E
- 70: F
- 71: G
- 72: H
- 73: I
- 74: J
- 75: K
- 76: L
- 77: M
- 78: N
- 79: O
- 80: P
- 81: Q
- 82: R
- 83: S
- 84: T
- 85: U
- 86: V
- 87: W
- 88: X
- 89: Y
- 90: Z
- 91: [
- 194: ¡
- 194: §
- 194: ¿
- 195: Æ
- 197: Œ
- 206: Δ
- 206: Ζ
- 206: Κ
- 206: Ν
- 206: Σ
- 226: √
(The Ζ listed under value 206 is the capital Greek letter zeta, not the Latin Z.)
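The numeric values in this list can be checked with a few lines of Python. The sketch assumes the items are stored as UTF-8 and sorted byte by byte, which is consistent with the values listed but is not confirmed for the actual database; the sample numbers in the last step are illustrative.

```python
# A selection of first characters from the list above with their listed values.
# Non-ASCII characters are written as escapes to avoid look-alike confusion.
listed = {
    "$": 36, "'": 39, ".": 46, "0": 48, "9": 57, "<": 60,
    "A": 65, "Z": 90, "[": 91,
    "\u00a1": 194,  # inverted exclamation mark
    "\u00a7": 194,  # section sign
    "\u00bf": 194,  # inverted question mark
    "\u00c6": 195,  # AE ligature
    "\u0152": 197,  # OE ligature
    "\u0394": 206,  # Greek capital delta
    "\u0396": 206,  # Greek capital zeta
    "\u039a": 206,  # Greek capital kappa
    "\u039d": 206,  # Greek capital nu
    "\u03a3": 206,  # Greek capital sigma
    "\u221a": 226,  # square root sign
}


def lead_byte(char):
    """Value of the first byte of the character's UTF-8 encoding."""
    return char.encode("utf-8")[0]


mismatches = [ch for ch, value in listed.items() if lead_byte(ch) != value]

# Items quoted elsewhere in this text, sorted the way a byte-wise sort would.
items = ["1900", "'00", "\u221a-1", "2/4", "\u00a1Cuidado Cabr\u00f3n!",
         "$3.50-a-quart", ".38", "\u00c6ther", "Saturday", "X.O."]
byte_order = sorted(items, key=lambda s: s.encode("utf-8"))

# Numbers stored as strings sort alphabetically, not numerically.
as_strings = sorted(["2", "10", "1900"])
as_numbers = sorted(["2", "10", "1900"], key=int)
```

Under these assumptions every listed value matches, the items starting with '$' and an apostrophe come first and the square root comes last, and '10' and '1900' sort before '2' as long as they are treated as strings.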
Using the Index: Demo!
- Current index for all items. Lemma pages. Sections.
- Finding an Item: tommyknockers. Select section. Item in more than one work: Admiralty.
- Collections: Water
- Qualifications: Book Titles. Select the Chums of Chance dime novels
- Relationships: Nick Names
- Pages: 500. This does not work well yet below page 100.
- Index with Occurrences: Lt. Tyrone Slothrop
- Link to the Secondary Bibliography: Author
- The following indices are available as of today. In print, the most complete publication is Pynchon Notes 36-39 (Spring - Fall 1995-1996):
- Curling, Dean B. "An Index to The Crying of Lot 49." Pynchon Notes 36-39 (Spring - Fall 1995-1996): 69-81.
- Duyfhuizen, Bernard. "An Index to Pynchon's Shorter Works." Pynchon Notes 36-39 (Spring - Fall 1995-1996): 7-34.
- Duyfhuizen, Bernard and Brian Swatek. "An Index to V.." Pynchon Notes 36-39 (Spring - Fall 1995-1996): 35-68.
- Tölölyan, Khachig, Bernard Duyfhuizen and Clay Leighton. "An Index to Gravity's Rainbow, 2nd edition." Pynchon Notes 36-39 (Spring - Fall 1995-1996): 83-138.
- Troester, Änne and Dirk Vanderbeke. "Vineland: The Names." Pynchon Notes 36-39 (Spring - Fall 1995-1996): 139-149.
- Matt McLaurine published online lists of characters (and their occurrences) for Gravity's Rainbow and Against the Day. Both are online, as an OpenOffice document or Excel spreadsheet.
- An Index to Mason & Dixon was presented during the Malta International Pynchon week but remains unpublished.
- The model proposed here is a module within a larger attempt to create a complete data-driven collection on Pynchon: primary and secondary bibliography, table of contents, comments, biographical information and more. Work on primary materials and a biography is in progress.
- A work like C. Julius Caesar's Commentarii De Bello Gallico has a standard division into Books (Book I, Book II, ...), with each Book divided into Chapters. Classical scholars always follow these same subdivisions.