Category Archives: digital humanities

Alt-Ac: The First Month

On August 1 I joined Brown University as their first Digital Humanities Librarian. This job is a dream come true. I was hired to help cultivate Digital Humanities projects by working with faculty, students, and staff, and serve as an ambassador for the great digital work already being done by the Brown University Library. I am also Brown’s new English subject librarian. I’ve decided to blog about my transition from history PhD student to library staff in the hopes that it might help others who are considering making the transition themselves. If future posts on this topic would be of interest, just let me know.

/* alt-ac is short for Alternate-Academic, referring to those of us with graduate-level training in the Humanities who have chosen to work in non-tenure-track positions within the academy, often (but not exclusively) in university libraries and Digital Humanities positions. To learn more, head over to Bethany Nowviskie’s blog. */

My first month at Brown has been an interesting combination of diving in head first and learning the ropes. On the DH front: I’ve already started working on a few longstanding projects, helping out where needed. I’ve met with faculty who are interested in starting new projects. And anyone who knows me will not be surprised to learn that I’ve begun a DH Project Documentation survey, which consists of interviewing everyone in the library who is currently working on a DH project and documenting the project to date (goals, accomplishments, work remaining, technical specifications, etc.).

On the Librarian front: I’ve been learning the library’s systems for acquisitions, collection development, gift appraisal, and cataloging. I’ve joined the Exhibits Committee (group of librarians who coordinate the Library’s physical exhibit spaces). I’ve met with a faculty member who wants to set up an exhibit in one of the Library’s museum spaces next year. My office is right off one of the main study areas in the Library so several people have come in with reference questions and several more have called my office after finding me on the phone tree. All my colleagues have been extremely helpful and patient as I learn how to do this better. Seriously. I’m not just saying that in case some of them find my blog.

All in all it’s been a busy month! But getting back to the question of transitioning from graduate school to full-time staff . . .

Honestly, one of the biggest changes is simply having a 9-5 job. I was certainly busy at the University of Virginia, but I worked from home and set my own schedule. I’m enjoying having an office and a place where I can focus my energies, but when I get home I’m basically wiped. Hopefully this will change as I get more used to the schedule. For now, I’m drinking too much coffee and trying to remember I need to be in bed by 11pm. I was originally going to post this last night, but at 11:30 I still wasn’t done. Two months ago I would have pushed on and posted at 1am, but these days I can’t sleep in to compensate for a late weeknight.

Another change relates to my not-quite-finished dissertation. I’m working on it after hours at the office before driving home, usually spending 1 to 1.5 hours a day. It’s hard to find tasks that work well in that timeframe for where I am in my work cycle. Despite my occasional use of #scholarsprints on Twitter, I typically write in 2-hour chunks. Though I still have a ways to go in pages and workflow, I’m finding that I like (and even look forward to) working on my dissertation to a degree that I haven’t felt in years.

As a closet generalist in a PhD program, I quickly tired of my favorite subject in all the world simply because it was all I did. Day in. Day out. All history. All the time. Now that I work with a number of disciplines and projects, I find myself looking forward to spending time with my Early American diplomats. Assuming I got to bed at 11pm the night before, working on the dissertation is more like a treat at the end of the day than a looming anxiety.

Finally, and this may be hard to express, there has been a change in how I relate to the people I work with on a day-to-day basis. I contributed and consulted on several DH projects at UVA, but always in the capacity of a graduate student who happened to be around. Sometimes the project was a summer job, sometimes consulting was part of my fellowship, sometimes I just had conversations with people who wanted a sounding board for their ideas. What I do at Brown hasn’t been all that different thus far, but my opinions have more weight and the activation energy required to turn one of my suggestions into a plan of work is much lower than last year.

It’s gratifying, but also somewhat intimidating, how quickly some of my ideas have taken off, so I am being very careful about what I suggest. The DH Documentation Project, for example, was something I suggested at the end of my second week. Within five days it had become one of my primary goals for the year, and I was assigned to interview dozens of people. If I had suggested something like that as a graduate student, I can’t imagine things would have moved that fast (assuming the project ever got started).

The best part of the job, however, is getting to help people. This is what I missed most in graduate school, where I often struggled with the feeling that I wasn’t a productive member of society (which may account for my decision to develop open source software). In my new position I help people, whether faculty, students, or fellow staff, all day long. Like I said at the beginning, a dream come true.

Republicans of Letters

Here are the slides for my January 26th talk at Brown University’s Center for Digital Scholarship, “Republicans of Letters: Historical Social Networks and The Early American Foreign Service Database.”

The abstract ran as follows: “Jean Bauer, an advanced doctoral candidate in the Corcoran Department of History at the University of Virginia and creator of The Early American Foreign Service Database, will discuss her use and creation of digital tools to trace historical social networks through time and space. Drawing on her research into the commercial, kinship, patronage, and correspondence networks that helped form the American diplomatic and consular corps, Bauer will examine how relational databases and computational information design can help scholars identify and analyze historical social networks. The talk will include demos of two open source projects Bauer has developed to help scholars analyze their own research, Project Quincy and DAVILA.”

Some of the slides are pretty text intensive, so if something catches your eye, go ahead and hit pause!

In Pursuit of Elegance

I wrote this for the HASTAC Scholars’ forum on Critical Code Studies, which I co-hosted in January. To see the post in its original context, click here.

***********************

One of the older jokes about programming states that every great programmer suffers from the following three sins: laziness, impatience, and hubris. Laziness makes you write the fewest lines of code necessary to accomplish a given task. Impatience means that your program will run as quickly as possible. And hubris compels you to create code that is as beautiful as you can make it. These three criteria – length, speed, and elegance – are the benchmark for evaluating code.

But what makes code elegant? One of the first things you learn in a programming class is that (in most languages) the computer will completely disregard any white space beyond the single space required to differentiate one part of the statement from another. However, in the next breath, your instructor adjures you to follow indentation guidelines and fill the eye space of your code with enough blank spaces to make a Scandinavian graphic designer drool. So your code ends up looking rather like an ee cummings poem with lots of random space, oddly placed capitalization, and sporadic punctuation.

Of course, that is the perspective of someone who is not used to looking at code. The indentations draw the eye to nested components (loops, subroutines, etc.), the capitalization signifies variables or other important components of the program, and the punctuation stands in for the myriad of mathematical and logical operators absent from a QWERTY keyboard.

I believe the fear Matt Kirschenbaum discusses above comes in part from the visual strangeness of code. It just looks weird and impenetrable. The mantra embraced by too many programmers of “It was hard to write, it should be hard to read” doesn’t help the situation either. Academics don’t like feeling stupid (especially once they’ve left their graduate student days behind them) and the seeming impenetrability of programming syntax makes them feel that way.

Of course it’s not the academic who is stupid, it’s the computer. People who have little experience with how computers actually work often miss this critical distinction. The “thinking machine” does not think. Like Mark Sample’s now lost haiku generator, the computer has no vocabulary we do not give it. And as Mark Marino points out, as far as the computer is concerned, even those words are completely devoid of meaning. This gives the programmer an extraordinary amount of power, but within the constraints that everything must be broken down into components so simple even a computer can work with them.

My hope for Critical Code Studies, a field I have only just become acquainted with while helping to create this forum, is that by analyzing the thick textuality of code and the highly social, highly contingent environments in which code is generated, we can find better ways of explaining code to those who are afraid of it.

As a historian of Early American Diplomacy who spends much of her day designing and building databases, websites, and data visualizations I find myself constantly trying to allay the fears of my less technically trained colleagues. However, there are crucial connections between the work of programmers and humanists. I think the link may lie with aesthetics.

This brings us back to laziness, impatience, and hubris. Speed and brevity were virtues of necessity in the early days of computer science. Early computers had very little memory or processing power. Even an efficient program could take hours, an inefficient one weeks. Also, if the program was too long it could not be entered on a punch card. The vast amounts of memory and processing power on even a budget home computer have made these restrictions all but obsolete, except in the case of very small devices or very large data sets. Yet these criteria continue to have great psychological power, not unlike a great professor’s ability to reduce the complexity of a historical event to the essential points her students will remember, or the identification of previously unrecognized leitmotifs which draws an author’s body of work into a new stylistic whole.

The virtue of elegance comes straight from mathematics, which to me suggests that it is built into the very fabric of the universe. We all recognize beauty in some form. Sometimes the best way to understand a foreign culture is to determine what they value as beautiful and find in it the beauty that they perceive. The elegance of code is bound up in structure, process, and product. The better we can explain it, the more accessible code will become.

Do You See What I See?

This is the abstract for my talk, “Do You See What I See?: Technical Documentation in Digital Humanities,” which I gave at the 2010 Chicago Colloquium on Digital Humanities and Computer Science.

The actual presentation was more informal and consisted of a series of examples from my various jobs as a database designer.

The slides are embedded below.

*********************

Technical diagrams are wonderfully compact ways of conveying information about extremely complex systems. However, they only work for people who have been trained to read them. Humanists might never see the technical diagrams that underlie the systems they work on, reducing their ability to make realistic plans or demands for their software needs. Conversely, if you design a database for a historian, and then hand him or her a basic E-R (Entity-Relationship) or UML (Unified Modeling Language) diagram, you will end up explaining the diagram’s nomenclature before you can talk about the database (and oftentimes you run out of time before getting back to the research question underlying the database). Either scenario removes the major advantage of technical diagrams and leads to an unnecessary divide between the technical and non-technical members of a digital humanities development team.

True collaboration requires documentation that can be read and understood by all participants. This is possible even for technical diagrams, but not without additional design work. Using the principles of information design, these diagrams can be enhanced through color coding, positioning, and annotation to make their meaning clear to non-technical readers. The end result is a single diagram that provides the same information to all team members. Unfortunately, graphical and information design are specialized fields in their own right, and not necessarily taught to people with backgrounds in systems architecture.

A tool that I have recently designed may provide some first steps in that direction. The program is called DAVILA, an open source relational database schema visualization and annotation tool. It is written in Processing using the toxiclibs physics library and released under the GPLv3. DAVILA comes out of my work on several history database projects, including my own dissertation research on the Early American Foreign Service. As a historian with a background in database architecture and a strong interest in information design, I have tried several ways of annotating technical diagrams to make them more accessible to my non-technical colleagues and employers. However, as the databases increased in complexity, making new diagrams by hand became a time-consuming and frustrating process. The plan was to create a tool that would create these annotated diagrams quickly to accommodate the workflow used in rapid application development.

With DAVILA you fill out a CSV file to label your diagram with basic information about the program (project name, URL, developer names) and license the diagram under the copyright or copyleft of your choice. You can then group your entities into modules, color code those modules, indicate which entity is central to each module, and provide annotation text for every entity in the database.

Once DAVILA is running, you can click and drag the entities into different positions, expand an individual module for more information, or hide the non-central entities in a module to focus on another part of your schema. All in a fun, force-directed environment courtesy of the toxiclibs physics library. Pressing the space bar saves a snapshot of the window as a timestamped, vector-scaled PDF.

I now use DAVILA to describe databases and have received positive feedback on the diagrams’ readability from programmers and historians. I have little training in visual theory or graphic design and would welcome comments from those with more expertise in those fields. DAVILA also only works with database schemas, but similar tools would be extremely useful for other types of technical diagrams. Collaboration would undoubtedly be improved if, when looking at a technical diagram, we could all see the same thing.

For more on the project see: http://www.jeanbauer.com/davila.html.

And now, without further ado: My Slides

Partial Dates in Rails with Active Scaffold

As a historian I am constantly frustrated (but bemused) by how computers record time. They are so idealistically precise and hopelessly presentist in their default settings that creating intellectually honest digital history becomes impossible without some serious modifications.

In designing Project Quincy, my open-source software package for tracing historical networks through time and space, I quickly realized that how I handled dates would make or break my ability to design the kinds of interfaces and visualizations I needed to perform my analysis.

As a database designer, however, I balk at entering improperly formatted data into the database (I am firm in my belief that this will always come back to bite you in the end). So while MySQL lets me enter an unknown birth date as 1761-00-00 (it accepts zero months and days unless the NO_ZERO_IN_DATE SQL mode is enabled), if I ever migrated the data to another database (say Postgres) I would be up to my eyebrows in errors. But I also don’t want to mislead my users into thinking that half the individuals in my database were born on January 1st.

So here are my solutions, drawn from the code of Project Quincy, which powers The Early American Foreign Service Database.

A relatively easy way to format partial dates in your frontend interface is to add 3 boolean flags to each date: year_known, month_known, and date_known. Then add the following method into your application helper (link to code here) to determine how you display each type of partial date.
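The linked helper is not reproduced here, but the idea can be sketched in a few lines of plain Ruby. This is only an illustration, not the code from Project Quincy: the three flag names come from the paragraph above (with date_known read as "day of the month known"), while the method name, the output formats, and the "unknown" label are assumptions made for the example.

```ruby
require "date"

# Hypothetical helper: formats a stored date according to three boolean flags.
# The flag names come from the post; the method name, the output formats,
# and the "unknown" label are assumptions made for this sketch.
def format_partial_date(date, year_known:, month_known:, date_known:)
  # Nothing trustworthy to show if the date is missing or the year is a guess.
  return "unknown" if date.nil? || !year_known
  # Year only, e.g. a birth date stored as January 1st of that year.
  return date.strftime("%Y") unless month_known
  # Month and year, with the placeholder day suppressed.
  return date.strftime("%B %Y") unless date_known
  # All three components are known, so show the full date.
  date.strftime("%B %-d, %Y")
end

# The month and day here are placeholders stored only to keep the column valid.
born = Date.new(1761, 1, 1)
puts format_partial_date(born, year_known: true, month_known: false, date_known: false)
```

With the flags set as in the last line, the placeholder January 1st never reaches the reader; only the year is displayed.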

For entering partial dates Project Quincy makes extensive use of ActiveScaffold, a Rails plugin that auto-generates an administrative backend. The nice thing about ActiveScaffold is that it is fully customizable. The problem with ActiveScaffold is that the defaults stink, so you basically end up customizing everything.

By default, ActiveScaffold treats date entry as a unified field, so you have to break up the JavaScript that knits day, month, and year together. You also have to change the default from today’s date to blank. If you enter only part of a date, it sets the other components to the lowest value possible.

Matt Mitchell, former Head of R&D for the University of Virginia Scholars’ Lab, came up with the following elegant solution to my problem:

Create a partial view in /app/views/activescaffold/_common_date_select.html.erb and populate it with the following code.

And activate that partial with a helper method in your application_helper (link here).

And you should be good to go.
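Whatever the form looks like, the values it submits still have to become a stored date plus the three flags. Here is a rough plain-Ruby sketch of that step. It is not Matt Mitchell’s partial and it does not use any ActiveScaffold API; the method name and the shape of the returned hash are invented for illustration, and blank components fall back to the lowest valid value, as described earlier.

```ruby
require "date"

# Hypothetical converter from three optional form values to a stored date
# plus the three "known" flags. Blank components fall back to the lowest
# valid value (month 1, day 1). A month only counts as known if the year is,
# and a day only counts as known if the month is.
def partial_date_attributes(year, month, day)
  year_known  = !year.to_s.strip.empty?
  month_known = year_known && !month.to_s.strip.empty?
  date_known  = month_known && !day.to_s.strip.empty?

  # With no year there is nothing meaningful to store.
  unless year_known
    return { date: nil, year_known: false, month_known: false, date_known: false }
  end

  {
    date: Date.new(year.to_i, month_known ? month.to_i : 1, date_known ? day.to_i : 1),
    year_known: year_known,
    month_known: month_known,
    date_known: date_known
  }
end

# A birth year with no month or day entered on the form.
p partial_date_attributes("1761", "", nil)
```

The stored value stays a properly formatted date, and the flags record which parts of it should be believed.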

**************************************

If the pastie links go down, you can find the partial view and helper methods in the Project Quincy repository on GitHub.

It’s [A]live!

It is with great pleasure, and no small amount of trepidation, that I announce the launch of the Early American Foreign Service Database (EAFSD to its friends). While the EAFSD has been designed as an independent, secondary source publication, it also exists symbiotically with my dissertation “Revolution-Mongers: Launching the U.S. Foreign Service, 1775-1825.”

I created the EAFSD to help me track the many diplomats, consuls, and special agents sent abroad by the various American governments during the first fifty years of American state-building. Currently the database contains basic information about overseas assignments and a few dives into data visualization (an interactive Google map and Moritz Stefaner’s Relation Browser).

I have been a reluctant convert to the principles of Web 2.0, and I keenly feel the anxiety of releasing something before my perfectionist tendencies have been fully exhausted. The pages of the EAFSD are therefore sprinkled with requests for feedback and my (hopefully humorous) under construction page, featuring Benjamin West’s unfinished masterpiece the “American Commissioners of the Preliminary Peace Agreement with Great Britain.”

Over the next few months (and coming years) I will be adding more information to the database, allowing me to trace the social, professional, and correspondence networks from which American foreign service officers drew the information they needed to represent their new (and often disorganized) government. I will also be enhancing the data visualizations to include hypertrees, time lines, and network graphs.

This launch has been over two years in the making. As I look back over that time, I am amazed at the generous support I have received from my colleagues at the University of Virginia and the Digital Humanities community writ large. I wrote an extended acknowledgments page for the EAFSD, my humble attempt to recognize the help and encouragement that made this project possible.

Launching the EAFSD also gives me a chance to test Project Quincy, the open-source software package I am developing for tracing historical networks through time and space. The EAFSD is the flagship (read guinea pig) application for Project Quincy. I hope my work will allow other scholars to explore the networks relevant to their own research.

To that end the EAFSD is, and always will be, open access and open source.

Introducing DAVILA

I have just released my first open source project. HUZZAH!

DAVILA is a database schema visualization/annotation tool that creates “humanist readable” technical diagrams. It is written in Processing with the toxiclibs physics library and released under GPLv3. DAVILA takes in the database’s schema and a pipe-separated customization file and uses them to produce an interactive, color-coded, annotated diagram similar in format to UML. There are many applications that will create technical diagrams based on a database schema, but as a digital humanist I require more than they can provide.

Technical diagrams are wonderfully compact ways of conveying information about extremely complex systems. But they only work for people who have been trained to read them. If you design a database for a historian, and then hand him or her a basic E-R or UML diagram, you will end up explaining the diagram’s nomenclature before you can talk about the database (and oftentimes you run out of time before getting back to the research question underlying the database). This removes the major advantage of technical diagrams and can also create an unnecessary divide between the technical and non-technical members of a digital humanities development team.

I have become fascinated by how documenting a project (either in development or after release) can build community. I’m not just talking about user-generated documentation (à la wikis), but rather the feeling created by a diagram or README file that really takes the time to explain how the software works and why it works the way it does. There is a generosity and even warmth that comes from thoughtful, helpful documentation, just as inadequate documentation can make someone feel stupid, slighted, or unwanted as a user/developer. I will be writing on this topic more in the months to come (perhaps leading up to an article). In the meantime, check out DAVILA and let me know what you think.

Project homepage: http://www.jeanbauer.com/davila.html

The Cloisters, Part I

On the fifth day of Christmas, my husband and I took the A train the length of Manhattan up to one of my favorite spots in New York City, The Cloisters, home of the Metropolitan Museum of Art’s Medieval Art Collection. Even more than the art, I love the building, a medieval-style cloister built in the 1930s to house the collection, featuring beautiful courtyards and contemplative spaces, blending architectural styles, and in many cases, salvaged sections of buildings from several centuries once located all over Europe. Stained glass windows from Italy shine light on an altar from Spain in a room where the wall sconces display icons from Germany. Then you walk through an archway into an indoor courtyard supported by columns brought from the courtyards of ten other cloisters, now long gone.

Although I was on vacation, I couldn’t help but see the Cloisters as a metaphor for digital humanities. We are digital architects, creating new spaces to display the glorious works of the past and structuring the fragments to see new patterns in disparate sources. If we do our jobs right, the digital edifices should enhance, not detract from, the sources we seek to analyze and share. The framework of each project is tailored to the subject matter, often with special nooks for contemplation and introspection.

Control your Vocab (or not)

I am a NINES Graduate Fellow for 2009-2010, and this post was written for the NINES Blog. To see it in its original context, click here.

Yesterday I had two conversations about controlled vocabulary in digital humanities projects (a.k.a. my definition of a really good day). Both conversations centered around the same question: what is the best way to associate documents with subject information? If you don’t attach some keywords or subject categories to your documents then you can forget about finding anything later. There are, in my estimate, two main camps for doing this in a digital project — tags and pre-selected keywords.

In my humble opinion, tags are best when you want your users to take ownership of the data. They decide the categories, so in some sense, they have a stake in the larger project and how it evolves. You might even be able to tell why people are using the data in the first place, by looking at what tags they associate with your (or their) content. On the downside, tags can be problematic for first time users who need to search (rather than explore) your data. On several occasions I have been confronted with tag clouds that have descended (or ascended) into the realm of performance art. They are fascinating in and of themselves, but fail to provide a meaningful path into the data.

Pre-selected keywords often work best when a clearly defined set of people are in charge of marking up the content. They are great for searching, and if indexed in a hierarchical structure, can provide semantically powerful groupings (especially for geographical information). And if you have a Third Normal Form database, then you never have to worry about misspellings or incorrect associations between your keywords (Disclaimer: I love 3NF databases. I know they don’t work for every project, but when your data fits that structure life is good). As a historian, however, I am wary of keywords that are imposed on a text. If someone calls himself a “justice,” I balk at calling him a “judge” even if it means a more efficient search.

Of course, it all depends on your data and what you want to do with it, but my favorite solution is to have, at minimum, two layers of keywords. The bottom layer reflects the language in the text (similar to tagging), but those terms are then grouped into pre-selected types. So “justice,” “justice of the peace,” “judge,” “lawyer,” “barrister,” “counselor” all get associated with type “legal.” You can fake hierarchies with tags, but it requires a far more careful attention to tag choices than I typically associate with that methodology.
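To make the two layers concrete, here is a minimal plain-Ruby sketch. It is not the schema used in Project Quincy (where this would live in database tables); the terms and the “legal” type come from the example above, while the variable names, the method name, and the sample documents are invented for illustration.

```ruby
# Hypothetical two-layer vocabulary. The bottom layer keeps each term exactly
# as it appears in the source text; the top layer assigns every term to one
# pre-selected type. In a real project this mapping would be a database table.
term_types = {
  "justice" => "legal",
  "justice of the peace" => "legal",
  "judge" => "legal",
  "lawyer" => "legal",
  "barrister" => "legal",
  "counselor" => "legal"
}

# Returns the titles of documents carrying at least one term of the given type.
# documents is a Hash of title => Array of terms taken from the text itself.
def documents_of_type(documents, term_types, type)
  documents.select { |_title, terms| terms.any? { |term| term_types[term] == type } }.keys
end

# Invented sample data: each document keeps the wording of its own source.
docs = {
  "Letter A" => ["justice"],
  "Letter B" => ["barrister"],
  "Letter C" => ["merchant"]
}

# Searching by type finds both legal documents without renaming anyone.
p documents_of_type(docs, term_types, "legal")
```

The man who calls himself a “justice” stays a “justice” in the record, but a search on the type still finds him alongside the barristers.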

I implemented the two-tiered approach in Project Quincy, but I would love to hear other suggestions and opinions.

and the name of a good book

When you leave a message on my friend’s voicemail, she asks that you give your name, your phone number, and the name of a good book. Since I’m in grad school for history, I tend to end my messages with phrases like “if you suddenly need to know about balance of power politics at the turn of the nineteenth century then…,” and when she calls back, we have a good laugh before she tells me about the new novel she’s reading.

I think recommending a really good book is one of the easiest ways to markedly improve someone’s life — personally or professionally.

But finding good books for digital humanities can be a real struggle. Especially since the market is flooded with computer books, most of them completely unsuited to the needs of your average beginning digital humanist.

So, I’ve decided to create an annotated bibliography for the digital side of digital humanities (the most frequently used languages/computer science concepts). This is hardly an exhaustive list of all the books that you could find useful, but instead a few books (2 or 3) on a given topic, aimed at beginners or those who want to move beyond basic knowledge. I assume that experts already know how to find the books they need. I’m starting with the books I’ve found most helpful and hope people will suggest new titles and categories over time.

To see the list, click on “Annotated Bibliography”, in the sidebar (under Pages).

Happy reading!