Category Archives: digital humanities

Republicans of Letters

Here are the slides for my January 26th talk at Brown University’s Center for Digital Scholarship, “Republicans of Letters: Historical Social Networks and The Early American Foreign Service Database.”

The abstract ran as follows: “Jean Bauer, an advanced doctoral candidate in the Corcoran Department of History at the University of Virginia and creator of The Early American Foreign Service Database, will discuss her use and creation of digital tools to trace historical social networks through time and space. Drawing on her research into the commercial, kinship, patronage, and correspondence networks that helped form the American diplomatic and consular corps, Bauer will examine how relational databases and computational information design can help scholars identify and analyze historical social networks. The talk will include demos of two open source projects Bauer has developed to help scholars analyze their own research, Project Quincy and DAVILA.”

Some of the slides are pretty text-intensive, so if something catches your eye, go ahead and hit pause!

In Pursuit of Elegance

I wrote this for the HASTAC Scholars’ forum on Critical Code Studies, which I co-hosted in January. To see the post in its original context, click here.

***********************

One of the older jokes about programming states that every great programmer suffers from the following three sins: laziness, impatience, and hubris. Laziness makes you write the fewest lines of code necessary to accomplish a given task. Impatience means that your program will run as quickly as possible. And hubris compels you to create code that is as beautiful as you can make it. These three criteria – length, speed, and elegance – are the benchmark for evaluating code.

But what makes code elegant? One of the first things you learn in a programming class is that (in most languages) the computer will completely disregard any white space beyond the single space required to differentiate one part of a statement from another. However, in the next breath, your instructor adjures you to follow indentation guidelines and fill the eye space of your code with enough blank space to make a Scandinavian graphic designer drool. So your code ends up looking rather like an e. e. cummings poem, with lots of seemingly random spacing, oddly placed capitalization, and sporadic punctuation.

Of course, that is the perspective of someone who is not used to looking at code. The indentations draw the eye to nested components (loops, subroutines, etc.), the capitalization signifies variables or other important components of the program, and the punctuation stands in for the myriad mathematical and logical operators absent from a QWERTY keyboard.
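
To take a trivial example in Ruby (a hypothetical snippet, not from any of my projects): the interpreter treats the two definitions below as the same method, but only one of them is kind to the human reader.

    # Two ways of writing the same method. The interpreter sees them as
    # equivalent; only the human reader benefits from the second.
    def letters_sent(letters, year); total = 0; letters.each { |l| total += 1 if l.year == year }; total; end

    def letters_sent(letters, year)
      total = 0
      letters.each do |letter|
        total += 1 if letter.year == year   # indentation marks the nested loop
      end
      total
    end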

I believe the fear Matt Kirschenbaum discusses above comes in part from the visual strangeness of code. It just looks weird and impenetrable. The mantra embraced by too many programmers, “it was hard to write, so it should be hard to read,” doesn’t help the situation either. Academics don’t like feeling stupid (especially once they’ve left their graduate student days behind them), and the seeming impenetrability of programming syntax makes them feel that way.

Of course it’s not the academic who is stupid; it’s the computer. People who have little experience with how computers actually work often miss this critical distinction. The “thinking machine” does not think. Like Mark Sample’s now-lost haiku generator, the computer has no vocabulary we do not give it. And as Mark Marino points out, as far as the computer is concerned, even those words are completely devoid of meaning. This gives the programmer an extraordinary amount of power, but within the constraint that everything must be broken down into components so simple even a computer can work with them.

My hope for Critical Code Studies, a field I have only just become acquainted with while helping to create this forum, is that by analyzing the thick textuality of code and the highly social, highly contingent environments in which code is generated, we can find better ways of explaining code to those who are afraid of it.

As a historian of Early American Diplomacy who spends much of her day designing and building databases, websites, and data visualizations, I find myself constantly trying to allay the fears of my less technically trained colleagues. Yet there are crucial connections between the work of programmers and the work of humanists, and I think the link may lie in aesthetics.

This brings us back to laziness, impatience, and hubris. Speed and brevity were virtues of necessity in the early days of computer science. Early computers had very little memory or processing power. Even an efficient program could take hours to run; an inefficient one, weeks. And if a program was too long, it could not be entered on a punch card. The vast amounts of memory and processing power in even a budget home computer have made these restrictions all but obsolete, except in the case of very small devices or very large data sets. Yet these criteria continue to have great psychological power, not unlike a great professor’s ability to reduce the complexity of a historical event to the essential points her students will remember, or the identification of previously unrecognized leitmotifs that draws an author’s body of work into a new stylistic whole.

The virtue of elegance comes straight from mathematics, which to me suggests that it is built into the very fabric of the universe. We all recognize beauty in some form. Sometimes the best way to understand a foreign culture is to determine what it values as beautiful and find in that the beauty its members perceive. The elegance of code is bound up in structure, process, and product. The better we can explain it, the more accessible code will become.

Do You See What I See?

This is the abstract for my talk, “Do You See What I See?: Technical Documentation in Digital Humanities,” which I gave at the 2010 Chicago Colloquium on Digital Humanities and Computer Science.

The actual presentation was more informal and consisted of a series of examples from my various jobs as a database designer.

The slides are embedded below.

*********************

Technical diagrams are wonderfully compact ways of conveying information about extremely complex systems. However, they only work for people who have been trained to read them. Humanists might never see the technical diagrams that underlie the systems they work on, reducing their ability to make realistic plans or demands for their software needs. Conversely, if you design a database for a historian, and then hand him or her a basic E-R (Entity-Relationship) or UML (Unified Modeling Language) diagram, you will end up explaining the diagram’s nomenclature before you can talk about the database (and oftentimes you run out of time before getting back to the research question underlying the database). Either scenario removes the major advantage of technical diagrams and leads to an unnecessary divide between the technical and non-technical members of a digital humanities development team.

True collaboration requires documentation that can be read and understood by all participants. This is possible even for technical diagrams, but not without additional design work. Using the principles of information design, these diagrams can be enhanced with color coding, positioning, and annotation to make their meaning clear to non-technical readers. The end result is a single diagram that provides the same information to all team members. Unfortunately, graphic and information design are specialized fields in their own right, and they are not necessarily taught to people with backgrounds in systems architecture.

A tool that I have recently designed may provide some first steps in that direction. The program is called DAVILA, an open-source relational database schema visualization and annotation tool. It is written in Processing using the toxiclibs physics library and released under the GPLv3. DAVILA comes out of my work on several history database projects, including my own dissertation research on the Early American Foreign Service. As a historian with a background in database architecture and a strong interest in information design, I have tried several ways of annotating technical diagrams to make them more accessible to my non-technical colleagues and employers. However, as the databases increased in complexity, making new diagrams by hand became a time-consuming and frustrating process. The plan, then, was to build a tool that could generate these annotated diagrams quickly enough to accommodate the workflow of rapid application development.

With DAVILA you fill out a CSV file to label your diagram with basic information about the program (project name, URL, developer names) and license the diagram under the copyright or copyleft of your choice. You can then group your entities into modules, color code those modules, indicate which entity is central to each module, and provide annotation text for every entity in the database.
Once DAVILA is running, users can click and drag the entities into different positions, expand an individual module for more information, or hide the non-central entities in a module to focus on another part of the schema, all in a fun, force-directed environment courtesy of the toxiclibs physics library. Pressing the space bar saves a snapshot of the window as a timestamped, scalable vector PDF.

I now use DAVILA to describe my databases and have received positive feedback on the diagrams’ readability from programmers and historians alike. I have little training in visual theory or graphic design and would welcome comments from those with more expertise in those fields. DAVILA also currently works only with database schemas, but similar tools would be extremely useful for other types of technical diagrams. Collaboration would undoubtedly be improved if, when looking at a technical diagram, we could all see the same thing.

For more on the project see: http://www.jeanbauer.com/davila.html.

And now, without further ado: My Slides

Partial Dates in Rails with Active Scaffold

As a historian I am constantly frustrated (but bemused) by how computers record time. They are so idealistically precise and hopelessly presentist in their default settings that creating intellectually honest digital history becomes impossible without some serious modifications.

In designing Project Quincy, my open-source software package for tracing historical networks through time and space, I quickly realized that how I handled dates would make or break my ability to design the kinds of interfaces and visualizations I needed to perform my analysis.

As a database designer, however, I balk at entering improperly formatted data into the database (I am firm in my belief that this will always come back to bite you in the end). So while MySQL lets me enter an unknown birth date as 1761-00-00 (it doesn’t require fully formed dates unless it is running in NO_ZERO_DATE mode), if I ever migrated the data to another database (say, Postgres) I would be up to my eyebrows in errors. But I also don’t want to mislead my users into thinking that half the individuals in my database were born on January 1st.

So here are my solutions, drawn from the code of Project Quincy, which powers The Early American Foreign Service Database.

A relatively easy way to format partial dates in your frontend interface is to add three boolean flags to each date: year_known, month_known, and date_known. Then add a method to your application helper (link to code here) that determines how to display each type of partial date.
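
Since pastie links have a way of disappearing (the canonical version lives in the Project Quincy repository on GitHub), here is a rough sketch of the idea, with hypothetical method and column names rather than my exact code:

    # A sketch, not the exact Project Quincy helper: format a date according
    # to the three boolean flags described above, falling back gracefully
    # as the known components run out.
    def display_partial_date(date, year_known, month_known, date_known)
      return "Date unknown" if date.nil? || !year_known

      if month_known && date_known
        date.strftime("%B %-d, %Y")   # e.g. "July 4, 1776"
      elsif month_known
        date.strftime("%B %Y")        # e.g. "July 1776"
      else
        date.strftime("%Y")           # e.g. "1776"
      end
    end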

For entering partial dates Project Quincy makes extensive use of ActiveScaffold, a Rails plugin that auto-generates an administrative backend. The nice thing about ActiveScaffold is that it is fully customizable. The problem with ActiveScaffold is that the defaults stink, so you basically end up customizing everything.

By default, ActiveScaffold treats date entry as a unified field, so you have to break up the JavaScript that knits day, month, and year together. You also have to change the default from today’s date to blank. And if you enter only part of a date, it sets the other components to the lowest possible value.

Matt Mitchell, former Head of R&D for the University of Virginia Scholars’ Lab, came up with the following elegant solution to my problem:

Create a partial view in /app/views/activescaffold/_common_date_select.html.erb and populate it with the following code.
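
Again, pasties are fragile, so here is a hypothetical stand-in rather than Matt’s exact code: three independent selects, each defaulting to blank, built from stock Rails form helpers and assuming ActiveScaffold hands the partial the column it is rendering as a local named column.

    <%# A hypothetical stand-in, not the original pastie: render day, month, %>
    <%# and year as separate selects that default to blank instead of today. %>
    <%= select_tag "record[#{column.name}][day]",
          options_for_select([["Day", ""]] + (1..31).to_a) %>
    <%= select_tag "record[#{column.name}][month]",
          options_for_select([["Month", ""]] + Date::MONTHNAMES.compact.each_with_index.map { |name, i| [name, i + 1] }) %>
    <%= select_tag "record[#{column.name}][year]",
          options_for_select([["Year", ""]] + (1700..1850).to_a) %>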

And activate that partial with a helper method in your application_helper (link here).

And you should be good to go.

**************************************

If the pastie links go down, you can find the partial view and helper methods in the Project Quincy repository on GitHub.

It’s [A]live!

It is with great pleasure, and no small amount of trepidation, that I announce the launch of the Early American Foreign Service Database (EAFSD to its friends). While the EAFSD has been designed as an independent, secondary-source publication, it also exists symbiotically with my dissertation, “Revolution-Mongers: Launching the U.S. Foreign Service, 1775-1825.”

I created the EAFSD to help me track the many diplomats, consuls, and special agents sent abroad by the various American governments during the first fifty years of American state-building. Currently the database contains basic information about overseas assignments and a few dives into data visualization (an interactive Google map and Moritz Stefaner’s Relation Browser).

I have been a reluctant convert to the principles of Web 2.0, and I keenly feel the anxiety of releasing something before my perfectionist tendencies have been fully exhausted. The pages of the EAFSD are therefore sprinkled with requests for feedback and my (hopefully humorous) under-construction page, featuring Benjamin West’s unfinished masterpiece, “American Commissioners of the Preliminary Peace Agreement with Great Britain.”

Over the next few months (and coming years) I will be adding more information to the database, allowing me to trace the social, professional, and correspondence networks from which American foreign service officers drew the information they needed to represent their new (and often disorganized) government. I will also be enhancing the data visualizations to include hypertrees, timelines, and network graphs.

This launch has been over two years in the making. As I look back over that time, I am amazed at the generous support I have received from my colleagues at the University of Virginia and the Digital Humanities community writ large. I wrote an extended acknowledgments page for the EAFSD, my humble attempt to recognize the help and encouragement that made this project possible.

Launching the EAFSD also gives me a chance to test Project Quincy, the open-source software package I am developing for tracing historical networks through time and space. The EAFSD is the flagship (read: guinea pig) application for Project Quincy. I hope my work will allow other scholars to explore the networks relevant to their own research.

To that end the EAFSD is, and always will be, open access and open source.

Introducing DAVILA

I have just released my first open source project. HUZZAH!

DAVILA is a database schema visualization/annotation tool that creates “humanist readable” technical diagrams. It is written in Processing with the toxiclibs physics library and released under the GPLv3. DAVILA takes in a database’s schema and a pipe-separated customization file and uses them to produce an interactive, color-coded, annotated diagram similar in format to UML. There are many applications that will create technical diagrams from a database schema, but as a digital humanist I require more than they can provide.

Technical diagrams are wonderfully compact ways of conveying information about extremely complex systems. But they only work for people who have been trained to read them. If you design a database for a historian, and then hand him or her a basic E-R or UML diagram, you will end up explaining the diagram’s nomenclature before you can talk about the database (and oftentimes you run out of time before getting back to the research question underlying the database). This removes the major advantage of technical diagrams and can also create an unnecessary divide between the technical and non-technical members of a digital humanities development team.

I have become fascinated by how documenting a project (either in development or after release) can build community. I’m not just talking about user-generated documentation (à la wikis), but rather the feeling created by a diagram or README file that really takes the time to explain how the software works and why it works the way it does. There is a generosity and even warmth that comes from thoughtful, helpful documentation, just as inadequate documentation can make someone feel stupid, slighted, or unwanted as a user or developer. I will be writing more on this topic in the months to come (perhaps leading up to an article). In the meantime, check out DAVILA and let me know what you think.

Project homepage: http://www.jeanbauer.com/davila.html

The Cloisters, Part I

On the fifth day of Christmas, my husband and I took the A train the length of Manhattan up to one of my favorite spots in New York City — The Cloisters — home of the Metropolitan Museum of Art’s Medieval Art Collection. Even more than the art, I love the building: a medieval-style cloister built in the 1930s to house the collection, featuring beautiful courtyards and contemplative spaces, blending architectural styles and, in many cases, incorporating salvaged sections of buildings from several centuries, once located all over Europe. Stained glass windows from Italy shine light on an altar from Spain in a room where the wall sconces display icons from Germany. Then you walk through an archway into an indoor courtyard supported by columns brought from the courtyards of ten other cloisters, now long gone.

Although I was on vacation, I couldn’t help but see the Cloisters as a metaphor for digital humanities. We are digital architects, creating new spaces to display the glorious works of the past and structuring the fragments so we can see new patterns in disparate sources. If we do our jobs right, the digital edifices should enhance, not detract from, the sources we seek to analyze and share. The framework of each project is tailored to its subject matter, often with special nooks for contemplation and introspection.

Control your Vocab (or not)

I am a NINES Graduate Fellow for 2009-2010, and this post was written for the NINES Blog. To see it in its original context, click here.

Yesterday I had two conversations about controlled vocabulary in digital humanities projects (a.k.a. my definition of a really good day). Both conversations centered on the same question: what is the best way to associate documents with subject information? If you don’t attach some keywords or subject categories to your documents, then you can forget about finding anything later. There are, in my estimation, two main camps for doing this in a digital project — tags and pre-selected keywords.

In my humble opinion, tags are best when you want your users to take ownership of the data. They decide the categories, so in some sense they have a stake in the larger project and how it evolves. You might even be able to tell why people are using the data in the first place by looking at what tags they associate with your (or their) content. On the downside, tags can be problematic for first-time users who need to search (rather than explore) your data. On several occasions I have been confronted with tag clouds that have descended (or ascended) into the realm of performance art. They are fascinating in and of themselves, but fail to provide a meaningful path into the data.

Pre-selected keywords often work best when a clearly defined set of people are in charge of marking up the content. They are great for searching and, if indexed in a hierarchical structure, can provide semantically powerful groupings (especially for geographical information). And if you have a Third Normal Form database, then you never have to worry about misspellings or incorrect associations between your keywords. (Disclaimer: I love 3NF databases. I know they don’t work for every project, but when your data fits that structure, life is good.) As a historian, however, I am wary of keywords that are imposed on a text. If someone calls himself a “justice,” I balk at calling him a “judge,” even if it means a more efficient search.

Of course, it all depends on your data and what you want to do with it, but my favorite solution is to have, at minimum, two layers of keywords. The bottom layer reflects the language in the text (similar to tagging), but those terms are then grouped into pre-selected types. So “justice,” “justice of the peace,” “judge,” “lawyer,” “barrister,” and “counselor” all get associated with the type “legal.” You can fake hierarchies with tags, but it requires far more careful attention to tag choices than I typically associate with that methodology.
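
As a rough sketch of what those two tiers might look like in Rails (hypothetical model and column names, not Project Quincy’s actual schema):

    # A hypothetical two-tiered vocabulary, not Project Quincy's actual schema.
    # The bottom layer (Keyword) preserves the language of the sources;
    # the top layer (KeywordType) holds the pre-selected categories.
    class KeywordType < ActiveRecord::Base
      has_many :keywords                  # e.g. "legal", "commercial", "kinship"
    end

    class Keyword < ActiveRecord::Base
      belongs_to :keyword_type            # "justice", "judge", "barrister", ...
      has_and_belongs_to_many :documents  # keywords attach directly to documents
    end

    class Document < ActiveRecord::Base
      has_and_belongs_to_many :keywords
    end

    # A document keyed to "justice of the peace" keeps that exact phrase,
    # but can still be retrieved alongside everything else typed "legal".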

I implemented the two-tiered approach in Project Quincy, but I would love to hear other suggestions and opinions.

and the name of a good book

When you leave a message on my friend’s voicemail, she asks that you give your name, your phone number, and the name of a good book. Since I’m in grad school for history, I tend to end my messages with phrases like “if you suddenly need to know about balance-of-power politics at the turn of the nineteenth century, then…,” and when she calls back, we have a good laugh before she tells me about the new novel she’s reading.

I think recommending a really good book is one of the easiest ways to markedly improve someone’s life — personally or professionally.

But finding good books for digital humanities can be a real struggle, especially since the market is flooded with computer books, most of them completely unsuited to the needs of your average beginning digital humanist.

So, I’ve decided to create an annotated bibliography for the digital side of digital humanities (the most frequently used languages and computer science concepts). This is hardly an exhaustive list of all the books you could find useful, but rather a few books (two or three) on a given topic, aimed at beginners or those who want to move beyond basic knowledge. I assume that experts already know how to find the books they need. I’m starting with the books I’ve found most helpful and hope people will suggest new titles and categories over time.

To see the list, click on “Annotated Bibliography” in the sidebar (under Pages).

Happy reading!

Aristocrats, Agendas, & Adams

Some days I think the biggest problem facing digital historians is our workflow. We are already expected to juggle archival research and secondary readings with teaching and writing. Add a digital project into the mix and the temptation to pull out one’s hair becomes almost irresistible. The madness increases when you are the primary programmer on your own project.

I feel your pain.

Last weekend I was despairing of the many moving parts that will (in the next eighteen months) constitute my dissertation. I realized that I was spending all my time on the database I am building to house my research and complete my analysis. The database is indispensable (and is being spun out into an open source project to help others), but it is only one part of my project. Then I had a brainwave, courtesy of my favorite Founding Father, John Adams.

According to Adams, the greatest threat to a successful republican government was a society’s aristocrats — people with the time, talents, charisma, and/or money necessary to accomplish their goals. Such people (and he was one) would always want to be in control, but in a republic The People, not the aristocrats, were supposed to govern. Adams’ solution was to shut the aristocrats into the upper house of the legislature (read: Senate) where they would have sufficient scope for their talents but not enough power to hijack the whole system.

Last weekend, I realized that my database is the “aristocrat” of my dissertation. It is powerful, attracts attention, and brings me funding, but it has become a drain on my time and energy out of proportion to its role in the larger project. So, I am going to try a constitutional experiment.

I am giving myself one day each week to work on the database. I’ve chosen Wednesdays to coincide with my office hours at NINES. With any luck, this concentrated time will push me to be more productive, while freeing up the rest of my time to do the research and writing necessary to complete my degree. Of course, everything comes back to the database in the end: my research is loaded into the data structure and analyzed for patterns, which I then explore in the prose of my dissertation.

But hopefully, a system of checks and balances will keep me stable.