Do You See What I See?

This is the abstract for my talk, “Do You See What I See?: Technical Documentation in Digital Humanities,” which I gave at the 2010 Chicago Colloquium on Digital Humanities and Computer Science.

The actual presentation was more informal and consisted of a series of examples from my various jobs as a database designer.

The slides are embedded below.

*********************

Technical diagrams are wonderfully compact ways of conveying information about extremely complex systems. However, they only work for people who have been trained to read them. Humanists might never see the technical diagrams that underlie the systems they work on, reducing their ability to make realistic plans or demands for their software needs. Conversely, if you design a database for a historian, and then hand him or her a basic E-R (Entity-Relationship) or UML (Unified Modeling Language) diagram, you will end up explaining the diagram’s nomenclature before you can talk about the database (and oftentimes you run out of time before getting back to the research question underlying the database). Either scenario removes the major advantage of technical diagrams and leads to an unnecessary divide between the technical and non-technical members of a digital humanities development team.

True collaboration requires documentation that can be read and understood by all participants. This is possible even for technical diagrams, but not without additional design work. Using the principles of information design, these diagrams can be enhanced through color coding, positioning, and annotation to make their meaning clear to non-technical readers. The end result is a single diagram that provides the same information to all team members. Unfortunately, graphical and information design are specialized fields in their own right, and not necessarily taught to people with backgrounds in systems architecture.

A tool that I have recently designed may provide some first steps in that direction. The program is called DAVILA, an open source relational database schema visualization and annotation tool. It is written in Processing using the toxiclibs physics library and released under the GPLv3. DAVILA comes out of my work on several history database projects, including my own dissertation research on the Early American Foreign Service. As a historian with a background in database architecture and a strong interest in information design, I have tried several ways of annotating technical diagrams to make them more accessible to my non-technical colleagues and employers. However, as the databases increased in complexity, making new diagrams by hand became a time-consuming and frustrating process. The plan was to build a tool that would generate these annotated diagrams quickly, to accommodate the workflow used in rapid application development.

With DAVILA you fill out a CSV file to label your diagram with basic information about the program (project name, URL, developer names) and license the diagram under the copyright or copyleft of your choice. You can then group your entities into modules, color code those modules, indicate which entity is central to each module, and provide annotation text for every entity in the database.

Once DAVILA is running, you can click and drag the entities into different positions, expand an individual module for more information, or hide the non-central entities in a module to focus on another part of your schema. All in a fun, force-directed environment courtesy of the toxiclibs physics library. Pressing the space bar saves a snapshot of the window as a timestamped, vector-scaled PDF.

I now use DAVILA to describe databases and have received positive feedback on their readability from programmers and historians. I have little training in visual theory or graphic design and would welcome comments from those with more expertise in those fields. DAVILA also only works with database schemas, but similar tools would be extremely useful for other types of technical diagrams. Collaboration would undoubtedly be improved if, when looking at a technical diagram, we could all see the same thing.

For more on the project see: http://www.jeanbauer.com/davila.html.

And now, without further ado: My Slides

Into the woods we go (again)

I owe an apology to woodcutters everywhere.

After my misadventures yesterday (see previous post), I confidently announced that Google Earth had allowed me to find the missing entrance to the original Blue Ridge Tunnel. Luckily for me, I blogged about my experience and was thus kept from making (yet another) critical error.

My friend (and experienced geocacher) Kristen Jensen read my blog and did some research of her own. Turns out someone had marked the location on Panoramio, but put it north of the new tunnel, not south. I looked at the Panoramio page and realized that I had misunderstood the woodcutter’s instructions. I was supposed to cross over the railroad tracks and continue up a hill following a trail that ran parallel to the new rail bed.

Sure enough, when my husband and I headed out this afternoon, we found the trail again on the other side of the tracks. It was rather overgrown and I can’t really say I’m surprised I didn’t see it yesterday. Once found, however, it was quite easy to follow and the overgrowth was relatively light.

As promised, it ended in a grotto, right at the entrance of a most imposing stone structure carved out of the hill. Definitely the place.

When we arrived, we could see little lights flickering in the tunnel. Over time, they resolved into three fellow hikers who had ventured into the tunnel with flashlights to see how far under the mountain they could go (an estimated 0.25 miles before they hit a drainage pipe). Apparently, there are plans to turn the tunnel and old railway into a Green Trail through Nelson County.

I got some great pictures of the tunnel (and surrounding foliage; it was a perfect fall day). We also walked inside the tunnel, but not too far, and marveled at how well the masonry has held up. On the way back, a cargo train thundered below us on the new tracks I had walked the day before.

On balance, I am rather pleased this expedition turned out the way it did. The light was better today than yesterday, and my husband had never been hiking before. I can’t imagine a better introduction to one of my favorite pastimes.

Buried within this story you will probably find a morality tale about the importance of local knowledge over satellite imagery, the triumph of idiosyncrasy over algorithms, and the value of a close network of friends. But personally (and artistically), I’m just glad everything worked out.

If you want to see the Tunnel for yourself, here is how to find it.

  • Take Interstate 64 to Exit 99.
  • Turn Left at the bottom of the exit ramp onto 250 West.
  • Drive down the mountain (about 1.5 miles) until you see a railroad overpass.
  • Park on the side of the road.
  • Cross the road and head down the trail on the far side of the overpass.
  • Follow that trail to the modern train tracks (about 5 minutes).
  • Cross the train tracks, and turn Right up the hill.
  • Follow the path to the tunnel entrance (about 10 minutes, possibly less if you aren’t a photographer).

For those of you with GPS devices, the correct coordinates are 38° 2′ 24.13″ N 78° 51′ 44.45″ W.

The path is clear but overgrown, so I would recommend long pants and sturdy shoes (hiking boots if you have them). Also bring some water and a jacket; the temperature varies by 15 degrees depending on where you are.

And if you meet a friendly woodcutter, say hello.

A Walk In the Woods

I am revising my opinion of friendly woodcutters.

This afternoon, I was out in the woods by Afton, VA looking for an abandoned railroad tunnel from the 1850s. My plan was to photograph the tunnel entrance, so Will Thomas could use the image in his new book. When it was completed in 1858, the Blue Ridge Tunnel was the longest train tunnel in the United States.

When I agreed to take the photograph I thought I knew where the tunnel was.

Wikipedia had a lat/long value for the tunnel which I happily plugged into my iPhone and got directions. Once I got off the interstate, I found what appeared to be the Frontage Road indicated on Google Maps (although AT&T lost me at several points so the Google API was spotty at best once I actually got out there). The road was in bad repair and covered in leaves, although it had been paved at one point.

I parked my car in a motel parking lot across the street and headed into the woods. After about 15 minutes it became apparent that I was not heading towards the location. I doubled back and came across an older man in work pants and a sweatshirt sitting on the back of his truck, which was piled with firewood.

He took one look at me (long sleeve tee-shirt, jeans, fleece vest and camera bag) and asked “You looking for the tunnel?”

I said yes, and he proceeded to explain that I was on the wrong side of the mountain. I had to get back in my car, drive to the railway overpass, find the trail that ran into the woods and then follow the railroad tracks (at least, that’s what I think he said, his directions were rather convoluted and referenced small differences in a geography I had yet to experience).

I did as he suggested, found the trail and headed off again. I soon found myself walking along suspiciously pristine railroad tracks, but my GPS said I was now headed in the right direction. Then I saw the tunnel: a sheer, concrete slab with a perfect arch cut into it, complete with trademark U.S. Government art-deco typeface etched into the surface, proclaiming “Blue Ridge Tunnel, 1942-1944.”

Huh?

I took a few shots (just in case), but by this point it was 4pm, and the sun was getting close to the mountain range, so I decided to go home.

As I prepared to merge onto the interstate, I saw the laughably small sign indicating the real Frontage Rd., Rt. 212, and made a quick right. The road was gravel, but well maintained and I drove until I found a gate.

I thought about heading down on foot, but realized the light would be terrible by the time I found the tunnel. Luckily, I didn’t try.

When I got home, I went online to see if I could figure out what had happened. Apparently, Wikipedia gave me the coordinates for the modern tunnel (built during WWII to handle the increased rail traffic) even though the article was about the nineteenth-century construction. The article claimed the new tunnel was built in parallel with the old one, but I certainly didn’t see any antebellum construction nearby.

Finally, I looked at Google Earth and I think I have found the old tunnel, about 200m south by southwest of the new tunnel. As far as I can tell, the easiest way to reach it will be off that Frontage Road and far, far away from the trail suggested to me in the woods. I’m heading back tomorrow to see if I’m right.

Either way, I’m pretty sure the woodcutter was wrong.

*****************************************************

The story continues in my next post, “Into the woods we go (again).”

Partial Dates in Rails with Active Scaffold

As a historian I am constantly frustrated (but bemused) by how computers record time. They are so idealistically precise and hopelessly presentist in their default settings that creating intellectually honest digital history becomes impossible without some serious modifications.

In designing Project Quincy, my open-source software package for tracing historical networks through time and space, I quickly realized that how I handled dates would make or break my ability to design the kinds of interfaces and visualizations I needed to perform my analysis.

As a database designer, however, I balk at entering improperly formatted data into the database (I am firm in my belief that this will always come back to bite you in the end). So while MySQL lets me enter an unknown birth date as 1761-00-00, because it accepts a zero month or day unless running in “NO_ZERO_IN_DATE” mode, if I ever migrated the data to another database (say Postgres) I would be up to my eyebrows in errors. But I also don’t want to mislead my users into thinking that half the individuals in my database were born on January 1st.

So here are my solutions, drawn from the code of Project Quincy, which powers The Early American Foreign Service Database.

A relatively easy way to format partial dates in your frontend interface is to add 3 boolean flags to each date: year_known, month_known, and date_known. Then add the following method into your application helper (link to code here) to determine how you display each type of partial date.
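In case the linked code is unavailable, here is a minimal sketch of what such a display helper could look like. The flag names come from the paragraph above; the method name `format_partial_date` and the output formats are my own assumptions for illustration, not the code actually used in Project Quincy.

```ruby
require "date"

# Hypothetical helper (name and output formats are assumptions):
# shows only the parts of a stored date that are flagged as known.
def format_partial_date(date, year_known:, month_known:, date_known:)
  # Nothing reliable to show without a year.
  return "unknown" if date.nil? || !year_known
  # Year only, e.g. a birth date stored as 1761-01-01 with unknown month.
  return date.year.to_s unless month_known
  # Month and year, with the placeholder day hidden.
  return "#{date.strftime('%B')} #{date.year}" unless date_known
  # Fully known date.
  "#{date.strftime('%B')} #{date.day}, #{date.year}"
end
```

With this sketch, a birth date stored as a valid `Date.new(1761, 1, 1)` but flagged with only `year_known` set would be displayed as just the year, so nobody appears to have been born on January 1st by accident.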

For entering partial dates Project Quincy makes extensive use of ActiveScaffold, a Rails plugin that auto-generates an administrative backend. The nice thing about ActiveScaffold is that it is fully customizable. The problem with ActiveScaffold is that the defaults stink, so you basically end up customizing everything.

By default, ActiveScaffold treats date entry as a unified field, so you have to break up the JavaScript that knits day, month, and year together. You also have to change the default from today’s date to blank. If you enter only part of a date, it sets the other components to the lowest value possible.
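The filling-in behavior described above can be illustrated in plain Ruby. This is only a sketch under my own assumptions, not ActiveScaffold’s code: the `PartialDate` struct and the `build_partial_date` and `to_component` methods are hypothetical names, and the flags mirror year_known, month_known, and date_known. It fills missing components with the lowest valid value, so the stored date is always well formed, and records which parts were really entered.

```ruby
require "date"

# Hypothetical container: a valid date plus flags for the parts that are real.
PartialDate = Struct.new(:date, :year_known, :month_known, :date_known)

# Turns blank form input ("" or nil) into nil, anything else into an integer.
def to_component(value)
  text = value.to_s.strip
  text.empty? ? nil : Integer(text, 10)
end

# Builds a well-formed date from possibly blank year/month/day input.
def build_partial_date(year, month = nil, day = nil)
  y = to_component(year)
  m = to_component(month)
  d = to_component(day)
  # Without a year there is nothing sensible to store.
  return PartialDate.new(nil, false, false, false) if y.nil?
  # A day without a month is not meaningful, so treat it as unknown.
  d = nil if m.nil?
  # Missing parts default to 1, the lowest valid month or day.
  PartialDate.new(Date.new(y, m || 1, d || 1), true, !m.nil?, !d.nil?)
end
```

For example, `build_partial_date("1761", "", "")` stores January 1, 1761 with only the year flagged as known, which migrates cleanly to any database while still telling the interface not to display the month or day.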

Matt Mitchell, former Head of R&D for the University of Virginia Scholars’ Lab, came up with the following elegant solution to my problem:

Create a partial view in /app/views/activescaffold/_common_date_select.html.erb and populate it with the following code.

And activate that partial with a helper method in your application_helper (link here).

And you should be good to go.

**************************************

If the pastie links go down, you can find the partial view and helper methods on Project Quincy at Github.

It’s [A]live!

It is with great pleasure, and no small amount of trepidation, that I announce the launch of the Early American Foreign Service Database (EAFSD to its friends). While the EAFSD has been designed as an independent, secondary source publication, it also exists symbiotically with my dissertation “Revolution-Mongers: Launching the U.S. Foreign Service, 1775-1825.”

I created the EAFSD to help me track the many diplomats, consuls, and special agents sent abroad by the various American governments during the first fifty years of American state-building. Currently the database contains basic information about overseas assignments and a few dives into data visualization (an interactive Google map and Moritz Stefaner’s Relation Browser).

I have been a reluctant convert to the principles of Web 2.0, and I keenly feel the anxiety of releasing something before my perfectionist tendencies have been fully exhausted. The pages of the EAFSD are therefore sprinkled with requests for feedback and my (hopefully humorous) under construction page, featuring Benjamin West’s unfinished masterpiece the “American Commissioners of the Preliminary Peace Agreement with Great Britain.”

Over the next few months (and coming years) I will be adding more information to the database, allowing me to trace the social, professional, and correspondence networks from which American foreign service officers drew the information they needed to represent their new (and often disorganized) government. I will also be enhancing the data visualizations to include hypertrees, time lines, and network graphs.

This launch has been over two years in the making. As I look back over that time, I am amazed at the generous support I have received from my colleagues at the University of Virginia and the Digital Humanities community writ large. I wrote an extended acknowledgments page for the EAFSD, my humble attempt to recognize the help and encouragement that made this project possible.

Launching the EAFSD also gives me a chance to test Project Quincy, the open-source software package I am developing for tracing historical networks through time and space. The EAFSD is the flagship (read: guinea pig) application for Project Quincy. I hope my work will allow other scholars to explore the networks relevant to their own research.

To that end the EAFSD is, and always will be, open access and open source.

As We May Code

Since the debut of the iPad, I can’t stop thinking about path dependency.

These virtual keyboards separate letters from numbers from symbols onto three distinct screens. While using my iPhone, I find myself spelling out words like “between” because I’d rather keep to the letters keyboard than switch back and forth to write “b/w.”

Everyone talks about how you would never write a book (or even an essay) on an iPad, but that would be a piece of cake compared to writing a computer program.

Laziness is one of hackers’ most beloved vices. This has the salutary effect of keeping programs short: the fewer keystrokes to accomplish a task, the better. It also means that programming languages are intensely optimized for the QWERTY keyboard.

The classic example is probably something like Perl, which uses the breadth of the QWERTY symbology to specify a wide range of mathematical and logical concepts, with very little reference to the notation of either higher mathematics or formal logic. Perhaps if people had thought of using typing machines as “thinking machines” they would have included the Greek alphabet along with the Roman and basic accounting symbols.

While some languages like Ruby and Python are closer to English, they would still be a nightmare.

All this makes me wonder if we are stuck with the QWERTY keyboard for good. Or if not, what new programming languages would grow up in its absence.

Thoughts?

Introducing DAVILA

I have just released my first open source project. HUZZAH!

DAVILA is a database schema visualization/annotation tool that creates “humanist readable” technical diagrams. It is written in Processing with the toxiclibs physics library and released under GPLv3. DAVILA takes in the database’s schema and a pipe-separated customization file and uses them to produce an interactive, color-coded, annotated diagram similar in format to UML. There are many applications that will create technical diagrams based on database schemas, but as a digital humanist I require more than they can provide.

Technical diagrams are wonderfully compact ways of conveying information about extremely complex systems. But they only work for people who have been trained to read them. If you design a database for a historian, and then hand him or her a basic E-R or UML diagram, you will end up explaining the diagram’s nomenclature before you can talk about the database (and oftentimes you run out of time before getting back to the research question underlying the database). This removes the major advantage of technical diagrams and can also create an unnecessary divide between the technical and non-technical members of a digital humanities development team.

I have become fascinated by how documenting a project (either in development or after release) can build community. I’m not just talking about user-generated documentation (à la wikis), but rather the feeling created by a diagram or README file that really takes the time to explain how the software works and why it works the way it does. There is a generosity and even warmth that comes from thoughtful, helpful documentation, just as inadequate documentation can make someone feel stupid, slighted, or unwanted as a user/developer. I will be writing on this topic more in the months to come (perhaps leading up to an article). In the meantime, check out DAVILA and let me know what you think.

Project homepage: http://www.jeanbauer.com/davila.html

Trees

(with apologies to Joyce Kilmer)

I think that I shall never see
A graph as lovely as a tree.

A tree whose thick, strong root is prest
Against the lower bound at rest;

A tree that looks a little strange
While its data does self arrange;

A tree that may grow up or out
But never round and round about;

Upon whose path constraint is lain
To go forward or back again.

Graphs are made by fools like me,
But only math can make a tree.
~ Jean Bauer

The Design Bug

Edward Tufte should come with a warning label. Since I took his course a year ago last October, I have been bitten by the design bug. I realized the depth of this obsession last night while putting together a projected syllabus for a summer course in the History Department. Just a simple word processing document, right? Wrong.

Before I knew it, I was agonizing over font choices (what is wrong with Times New Roman?), getting the spacing just right between the columns (ensuring that the document will have to be exported as a pdf file to avoid disaster), and designing a banner graphic (two versions: a large one for the front page and a smaller one for subsequent pages). And not just a pretty picture, but a semantically rich graphic, which made me think hard about the essential theme of the course before I could render it visually.

This is an internal document! It is only supposed to get the course accredited, but I just can’t send it in without some attention to its visual impact.

I wasn’t always like this. Until about eighteen months ago, I had two intense, but distinct, sets of aesthetic appreciation: one based in logic and one based in visual or written art. I have always been drawn to “elegant solutions,” whether in the relational algebra behind a third normal form database, a well constructed thesis, or a beautiful piece of code. I am also a photographer and the daughter of a novelist, so I prize an arresting composition of shapes or colors or words to convey thoughts and feelings.

My newfound interest in graphic and information design is starting to blend these two senses together, particularly as I seek to find more effective ways of visually rendering my research on information flows in the Early American Foreign Service.

I don’t know where this newfound interest is taking me, or my scholarship. I only know that, for now, I’m along for the ride.