
Tales from the Port: Part 2 — Migrating the Database

In retrospect, maybe I shouldn’t have promised to write a blog post every night this week. The port has been going well, but I’ve been working late each night, and it’s just too hard to write clear English prose starting at midnight. So here, at last, is the promised post on migrating Project Quincy’s database from Rails to Django.

My first love in Digital Humanities is data modeling and database architecture. The actual “code” in Project Quincy is pretty basic by professional programming standards; the underlying data structure is the real intellectual achievement. I spent six months of my nine-month fellowship at the Scholars’ Lab designing a database that would effectively and efficiently model historical sources and allow scholars to catalog and analyze their research in meaningful ways. I even wrote a program called DAVILA to auto-generate interactive, color-coded, annotated diagrams of my schema to show other historians how the system works. After all that work had been done, designing the interface for The Early American Foreign Service Database (EAFSD) took about two weeks.

As I mentioned last time, Rails and Django are similar frameworks for connecting databases to websites. They both have procedures for creating new databases in several open source database systems, such as MySQL, PostgreSQL, and SQLite3. But I already have a MySQL database with all the information I’ve been entering for the last three years. I really didn’t want to redo all that work, so I kept the same underlying database and connected it to the new Django project, with a few minor changes.

In the past three years, I’ve found a few shortcomings in the data model I created, so I’ve used the port as an opportunity to add a few more tables. Project Quincy records a “latitude” and “longitude” point for every location in the database, but I forgot to indicate which geographic coordinate system those coordinates came from. Luckily for me, all my coordinates were in the same system, so my maps work properly. But I can’t count on that forever, so I added a table called CoordinateSystem. I also extended the table that records which individuals were members of a specific organization. It had a field called “role,” but there was no way to build a list of those roles and reuse them. I added two new tables, “RoleTitle” and “RoleType,” to allow for reusable lists of roles and grouping by type.
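To make the change concrete, here is a minimal sketch of the new tables. It uses Python’s built-in sqlite3 module rather than MySQL, and the table and column names (coordinate_system, location, membership, role_title, role_type, and their fields) are hypothetical simplifications, not Project Quincy’s actual schema. The point it illustrates is that once roles live in their own tables, grouping memberships by role type becomes a simple join.

```python
import sqlite3

# Hypothetical, simplified tables; not Project Quincy's real schema.
SCHEMA = """
CREATE TABLE coordinate_system (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL                 -- e.g. 'WGS 84'
);
CREATE TABLE location (
    id                   INTEGER PRIMARY KEY,
    name                 TEXT NOT NULL,
    latitude             REAL,
    longitude            REAL,
    coordinate_system_id INTEGER REFERENCES coordinate_system(id)
);
CREATE TABLE role_type (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE role_title (
    id           INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    role_type_id INTEGER REFERENCES role_type(id)
);
CREATE TABLE membership (
    id              INTEGER PRIMARY KEY,
    individual_id   INTEGER NOT NULL,
    organization_id INTEGER NOT NULL,
    role_title_id   INTEGER REFERENCES role_title(id)  -- replaces a free-text "role"
);
"""


def build_db():
    """Create an empty in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def members_per_role_type(conn):
    """Count memberships per role type, following role_title -> role_type."""
    return conn.execute(
        """
        SELECT rt.name, COUNT(m.id)
        FROM membership AS m
        JOIN role_title AS t ON m.role_title_id = t.id
        JOIN role_type AS rt ON t.role_type_id = rt.id
        GROUP BY rt.name
        ORDER BY rt.name
        """
    ).fetchall()


if __name__ == "__main__":
    conn = build_db()
    # Illustrative sample rows only.
    conn.executemany("INSERT INTO role_type VALUES (?, ?)",
                     [(1, "Consular"), (2, "Diplomatic")])
    conn.executemany("INSERT INTO role_title VALUES (?, ?, ?)",
                     [(1, "Consul", 1), (2, "Vice Consul", 1), (3, "Minister", 2)])
    conn.executemany(
        "INSERT INTO membership (individual_id, organization_id, role_title_id) VALUES (?, ?, ?)",
        [(10, 1, 1), (11, 1, 2), (12, 2, 3)],
    )
    print(members_per_role_type(conn))  # [('Consular', 2), ('Diplomatic', 1)]
```

The same idea applies to CoordinateSystem: each location points at a row that names its system, instead of the system being an unstated assumption.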

Then there were a few changes required by Django, mostly to my Footnotes module. Since Project Quincy is designed to store scholarly research, it gives users the ability to ‘footnote’ any record in the system by attaching the record to a cited source and saying whether or not that source supports the information in the record. This is accomplished by the Validations table, which can (but does not have to) be connected to any record in the database. This type of unspecified relationship is known as a “polymorphic association,” and Rails and Django implement polymorphic associations differently. Rails stores the name of the related model in a text column alongside the related record’s ID. Django keeps a meta-table (django_content_type) that lists every installed model and assigns each one a numeric key. So I had to replace the model names in the Validations table with their new keys. Figuring out how to do this took a post to the ever-helpful Stack Overflow website, and then I was back in business. The old Footnotes module also had a little “Users” table that kept track of the people who could upload into the system. Django comes with a very powerful authentication system that also records users, so I got rid of my little table and hooked the Footnotes module directly into Django’s auth_user table.
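Here is a minimal sketch of that name-to-key swap, again using sqlite3 as a stand-in for MySQL. It assumes a Rails-style validatable_type column holding class names such as 'Individual', and a Django-style django_content_type table with id, app_label, and model columns, where Django stores the lowercased model name. The column name validatable_type, the app labels, and the helper migrate_polymorphic are all hypothetical; this is an illustration, not the fix from the Stack Overflow thread.

```python
import sqlite3


def migrate_polymorphic(conn, class_to_app):
    """Fill a Django-style content_type_id from a Rails-style type column.

    class_to_app maps each Rails class name stored in validatable_type
    (e.g. 'Individual') to the Django app label that now holds the model.
    Table and column names here are hypothetical.
    """
    conn.execute("ALTER TABLE validations ADD COLUMN content_type_id INTEGER")
    for rails_class, app_label in class_to_app.items():
        # Django's content type table stores the lowercased model name.
        row = conn.execute(
            "SELECT id FROM django_content_type WHERE app_label = ? AND model = ?",
            (app_label, rails_class.lower()),
        ).fetchone()
        if row is None:
            raise LookupError(f"no content type for {app_label}.{rails_class}")
        conn.execute(
            "UPDATE validations SET content_type_id = ? WHERE validatable_type = ?",
            (row[0], rails_class),
        )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE django_content_type (id INTEGER PRIMARY KEY, app_label TEXT, model TEXT);
        CREATE TABLE validations (id INTEGER PRIMARY KEY, validatable_type TEXT,
                                  validatable_id INTEGER, supports INTEGER);
        INSERT INTO django_content_type VALUES (7, 'biographical', 'individual'),
                                               (9, 'locations', 'location');
        INSERT INTO validations (validatable_type, validatable_id, supports)
            VALUES ('Individual', 1, 1), ('Location', 4, 0);
    """)
    migrate_polymorphic(conn, {"Individual": "biographical", "Location": "locations"})
    rows = conn.execute(
        "SELECT validatable_type, content_type_id FROM validations ORDER BY id"
    ).fetchall()
    print(rows)  # [('Individual', 7), ('Location', 9)]
```

Keeping the old text column until the new keys are checked makes the step easy to verify before dropping it.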

I had greater plans to include an “Events” module. But, as I started to design one, I realized that this was not a decision I should make on my own and under a deadline. Project Quincy is an open source project, and I want other scholars to use it for their research. I need to do more reading on modeling events and talk to people before I commit to one particular structure.

So how did I actually migrate the database? MySQL comes with a nice command for backing up and redeploying a database; it’s called mysqldump. I took a dump (yes, you read that correctly :-) of the database off my server and used it to create a transition database on my local machine. I then went in and made the changes to the transition database directly, safe in the knowledge that I could always restore the original database if I messed up. Once I had the transition database the way I wanted it, I made a second dump and used it to populate the database Django had already created for the new project.
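The same dump, edit, and reload cycle can be sketched with Python’s sqlite3 module. Its Connection.iterdump() method produces a SQL script, much as mysqldump does for MySQL. This is an analogy, not the MySQL commands actually used, and the location table and its sample row are illustrative.

```python
import sqlite3


def dump_sql(conn):
    """Return the whole database as a SQL script (mysqldump's role for MySQL)."""
    return "\n".join(conn.iterdump())


def load_sql(script):
    """Create a fresh in-memory database from a SQL dump."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(script)
    return conn


# 1. Stand-in for the production database on the server.
production = sqlite3.connect(":memory:")
production.executescript("""
    CREATE TABLE location (id INTEGER PRIMARY KEY, name TEXT, latitude REAL, longitude REAL);
    INSERT INTO location VALUES (1, 'Paris', 48.8566, 2.3522);
""")

# 2. First dump -> transition database, where schema changes are safe to try.
first_dump = dump_sql(production)
transition = load_sql(first_dump)
transition.execute("ALTER TABLE location ADD COLUMN coordinate_system_id INTEGER")
transition.execute("UPDATE location SET coordinate_system_id = 1")
transition.commit()

# 3. Second dump -> the database the new project will use.
second_dump = dump_sql(transition)
new_project = load_sql(second_dump)

if __name__ == "__main__":
    print(new_project.execute("SELECT name, coordinate_system_id FROM location").fetchall())
    # [('Paris', 1)]
```

The production copy is never modified; if an edit to the transition database goes wrong, reloading first_dump starts over.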

Once all my data was in the new database, I ran an extremely helpful Django command, ‘inspectdb.’ This lovely little program examined my database and created a file with its best guess at how to represent each database table in Django syntax. Then all I had to do was check for errors, and there weren’t many. It mistook my boolean (true/false) fields for integers, and it wanted me to specify some additional information for tables containing more than one relationship to the same second table (or to themselves), so that Django could tell those relationships apart.
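To show why those two fixes were needed, here is a toy, hypothetical approximation of the guesses involved. It is not Django’s introspection code. MySQL stores BOOLEAN columns as TINYINT(1), so a tool that sees only an integer type will guess an integer field unless told otherwise. Separately, when a table has two foreign keys to the same table (say, hypothetical sender and recipient columns on a letter table), Django needs distinct related_name values so the reverse lookups do not collide.

```python
import re

# Toy, hypothetical stand-in for the type guesses an introspection tool makes;
# NOT Django's actual inspectdb code.
BASE_TYPES = {
    "int": "IntegerField",
    "tinyint": "IntegerField",   # MySQL's BOOLEAN is really TINYINT(1)
    "varchar": "CharField",
    "text": "TextField",
    "date": "DateField",
    "double": "FloatField",
}


def guess_field(mysql_type, tinyint1_is_bool=False):
    """Map a MySQL column type such as 'varchar(255)' to a Django field class name."""
    match = re.fullmatch(r"(\w+)(?:\((\d+)\))?", mysql_type.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized type: {mysql_type!r}")
    base, width = match.groups()
    if tinyint1_is_bool and base == "tinyint" and width == "1":
        return "BooleanField"    # the correction made by hand
    return BASE_TYPES.get(base, "TextField")


def related_names(table, foreign_keys):
    """Suggest a related_name for each FK that shares its target table with another FK.

    foreign_keys is a list of (column, target_table) pairs.
    """
    targets = [target for _, target in foreign_keys]
    names = {}
    for column, target in foreign_keys:
        if targets.count(target) > 1:
            stem = column[:-3] if column.endswith("_id") else column
            names[column] = f"{table}_{stem}_set"   # hypothetical naming scheme
    return names


if __name__ == "__main__":
    print(guess_field("tinyint(1)"))                         # IntegerField
    print(guess_field("tinyint(1)", tinyint1_is_bool=True))  # BooleanField
    print(related_names("letter", [("sender_id", "individual"),
                                   ("recipient_id", "individual"),
                                   ("location_id", "location")]))
    # {'sender_id': 'letter_sender_set', 'recipient_id': 'letter_recipient_set'}
```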

Once I had the tables properly represented, it was time to sort them into their appropriate ‘applications.’ One of the biggest differences between Rails and Django is their file structure. Rails spreads each table’s code across folders organized by type (models, views, controllers), with its own folder of views for each table. Django asks developers to chunk their database into folders called applications, designed to keep similar functions together in the system. Project Quincy was always designed with six modules: Biographical, Locations, Correspondence, Citations, Organizations, and Assignments. Each of these modules has two to eight database tables inside it. One of the biggest decisions I had to make in planning this port was how to use applications. Did I put everything in one app folder, create an app for every module, or find a new way of grouping my system?

To make the decision, I wrote out index cards for each module, listing the tables involved and the other modules it related to. I realized that Assignments and Organizations both brought people to a location for a reason, and that I would likely be visualizing those two kinds of relationships in very similar ways. But what should I call the new app? I ran the idea past my father, who has been designing databases since before I was born and recently moved his entire development work to Python and Django. He suggested the name “Activities,” and pointed out that my future Events module could go in the same application.

After I sorted my tables into their appropriate (and newly created) applications, I synced my Django project with the underlying database. So far, everything looks good.

Tales from the Port: Day 1 — Dry Dock

Welcome to my one-week blog series, Tales from the Port, chronicling my rewriting of Project Quincy from Ruby on Rails to Django. This series may be a little rough around the edges; I’ll be writing it every night after I accomplish my goals for that day. But I wanted to give people a window into the life of (at least one) Digital Humanities developer: what it’s like to imperfectly translate your research and theories into lines of code and then watch your project come ‘alive.’

Of course, Digital Humanities is not about writing code or knowing how to program. DH is a community of people searching for a new way of working and researching, and we find inspiration in many disciplines. But, this is going to be one of the more intense work weeks of my life to date, so I’m hoping you’ll keep me company.

First, some background:

Project Quincy is an open source software package I wrote a few years back to trace historical networks through time and space. It is an integral part of my dissertation and currently runs The Early American Foreign Service Database, which went live almost two years ago on October 18, 2010. Project Quincy got its start at the University of Virginia’s Scholars’ Lab, where I was a graduate student fellow in 2008-2009. I was very pleased with the system when I first designed it, but technology doesn’t stop when your fellowship ends. Faced with an aging code base and an interface that could no longer accommodate the visual arguments that are becoming more and more central to my dissertation, it was time to upgrade. I could have taken Project Quincy from Rails 2.3.8 to 3.0 and tweaked the stylesheet along the way, but I am no longer at the University of Virginia. Last summer I was hired by the Brown University Library as their first Digital Humanities Librarian. My new colleagues program (mostly) in Django, and I’ve already met one or two professors here who could probably use the system for their own research. It was time to learn some new skills.

I have thoroughly enjoyed learning Python and Django, so much so that I will probably write more on them once this week is over. Since finishing up the tutorials, I have spent the last two weeks planning the port. As the week unfolds I’ll discuss how the system is changing and my reasons for making those changes. Although both Django and Rails exist to connect databases to websites with minimal headaches for the programmer, they have different affordances and make very different assumptions about what constitutes beautiful code.

So what have I done today?

Today I created the new Django project that I will be extending into Project Quincy. I had hoped to have the entire data model rewritten by now, but no plan survives contact with a new development environment . . . Apparently when I got my new MacBook Pro, my MySQL installation did not survive the transfer. It took a few hours of research, then reinstalling MacPorts, before I could really get underway. I will have more to say tomorrow on my changes to the data structure.

Planning this port has been a bittersweet experience. I’ve had a great deal of fun learning a new language and framework. My colleagues at Brown, particularly Birkin Diana and Joseph Rhoads, have been extremely helpful: suggesting good training materials, answering questions, and teaching me the ever crucial “best practices.” Thanks to their help, I am looking forward to having a cleaner, more robust system. But, my fellowship year at Scholars’ Lab is a cherished memory, and so many people there helped and taught me as I figured out how to make The Early American Foreign Service Database a reality. As I worked on the project my friends pitched in, putting their own stamp on the code base. This new, fresh start won’t have code from Bess Sadler, Matt Mitchell, Joe Gilbert, or Wayne Graham. For a little while it will just be my code, and that feels a little lonely. But it won’t last. Soon I’ll be showing the system to my colleagues at Brown, and I can’t wait to see Project Quincy afresh through their eyes.

Republicans of Letters

Here are the slides for my January 26th talk at Brown University’s Center for Digital Scholarship, “Republicans of Letters: Historical Social Networks and The Early American Foreign Service Database.”

The abstract ran as follows, “Jean Bauer, an advanced doctoral candidate in the Corcoran Department of History at the University of Virginia and creator of The Early American Foreign Service Database, will discuss her use and creation of digital tools to trace historical social networks through time and space. Drawing on her research into the commercial, kinship, patronage, and correspondence networks that helped form the American diplomatic and consular corps, Bauer will examine how relational databases and computational information design can help scholars identify and analyze historical social networks. The talk will include demos of two open source projects Bauer has developed to help scholars analyze their own research, Project Quincy and DAVILA.”

Some of the slides are pretty text-intensive, so if something catches your eye, go ahead and hit pause!

In Pursuit of Elegance

I wrote this for the HASTAC Scholars’ forum on Critical Code Studies, which I co-hosted in January.

***********************

One of the older jokes about programming states that every great programmer suffers from the following three sins: laziness, impatience, and hubris. Laziness makes you write the fewest lines of code necessary to accomplish a given task. Impatience makes you write programs that run as quickly as possible. And hubris compels you to create code that is as beautiful as you can make it. These three criteria – length, speed, and elegance – are the benchmark for evaluating code.

But what makes code elegant? One of the first things you learn in a programming class is that (in most languages) the computer will completely disregard any white space beyond the single space required to differentiate one part of the statement from another. However, in the next breath, your instructor adjures you to follow indentation guidelines and fill the eye space of your code with enough blank spaces to make a Scandinavian graphic designer drool. So your code ends up looking rather like an e. e. cummings poem, with lots of random space, oddly placed capitalization, and sporadic punctuation.

Of course that is the perspective of someone who is not used to looking at code. The indentations draw the eye to nested components (loops, subroutines, etc), the capitalization signifies variables or other important components of the program, and the punctuation stands in for the myriad of mathematical and logical operators absent from a QWERTY keyboard.

I believe the fear Matt Kirschenbaum discusses above comes in part from the visual strangeness of code. It just looks weird and impenetrable. The mantra embraced by too many programmers of “It was hard to write, it should be hard to read” doesn’t help the situation either. Academics don’t like feeling stupid (especially once they’ve left their graduate student days behind them) and the seeming impenetrability of programming syntax makes them feel that way.

Of course it’s not the academic who is stupid, it’s the computer. People who have little experience with how computers actually work often miss this critical distinction. The “thinking machine” does not think. Like Mark Sample’s now lost haiku generator, the computer has no vocabulary we do not give it. And as Mark Marino points out, as far as the computer is concerned, even those words are completely devoid of meaning. This gives the programmer an extraordinary amount of power, but within the constraint that everything must be broken down into components so simple even a computer can work with them.

My hope for Critical Code Studies, a field I have only just become acquainted with while helping to create this forum, is that by analyzing the thick textuality of code and the highly social, highly contingent environments in which code is generated, we can find better ways of explaining code to those who are afraid of it.

As a historian of Early American Diplomacy who spends much of her day designing and building databases, websites, and data visualizations I find myself constantly trying to allay the fears of my less technically trained colleagues. However, there are crucial connections between the work of programmers and humanists. I think the link may lie with aesthetics.

This brings us back to laziness, impatience, and hubris. Speed and brevity were virtues of necessity in the early days of computer science. Early computers had very little memory or processing power. Even an efficient program could take hours, an inefficient one weeks. Also, if the program was too long it could not be entered on a punch card. The vast amount of memory and processing power on even a budget home computer has made these restrictions all but obsolete except in the case of very small devices or very large data sets. Yet these criteria continue to have great psychological power, not unlike a great professor’s ability to reduce the complexity of a historical event to the essential points her students will remember, or the identification of previously unrecognized leitmotifs that draws an author’s body of work into a new stylistic whole.

The virtue of elegance comes straight from mathematics, which to me suggests that it is built into the very fabric of the universe. We all recognize beauty in some form. Sometimes the best way to understand a foreign culture is to determine what they value as beautiful and find in it the beauty that they perceive. The elegance of code is bound up in structure, process, and product. The better we can explain it, the more accessible code will become.

Do You See What I See?

This is the abstract for my talk, “Do You See What I See?: Technical Documentation in Digital Humanities,” which I gave at the 2010 Chicago Colloquium on Digital Humanities and Computer Science.

The actual presentation was more informal and consisted of a series of examples from my various jobs as a database designer.

The slides are embedded below.

*********************

Technical diagrams are wonderfully compact ways of conveying information about extremely complex systems. However, they only work for people who have been trained to read them. Humanists might never see the technical diagrams that underlie the systems they work on, reducing their ability to make realistic plans or demands for their software needs. Conversely, if you design a database for a historian, and then hand him or her a basic E-R (Entity-Relationship) or UML (Unified Modeling Language) diagram, you will end up explaining the diagram’s nomenclature before you can talk about the database (and oftentimes you run out of time before getting back to the research question underlying the database). Either scenario removes the major advantage of technical diagrams and leads to an unnecessary divide between the technical and non-technical members of a digital humanities development team.

True collaboration requires documentation that can be read and understood by all participants. This is possible even for technical diagrams, but not without additional design work. Using the principles of information design, these diagrams can be enhanced through color coding, positioning, and annotation to make their meaning clear to non-technical readers. The end result is a single diagram that provides the same information to all team members. Unfortunately, graphical and information design are specialized fields in their own right, and not necessarily taught to people with backgrounds in systems architecture.

A tool that I have recently designed may provide some first steps in that direction. The program is called DAVILA, an open source relational database schema visualization and annotation tool. It is written in Processing using the toxiclibs physics library and released under the GPLv3. DAVILA comes out of my work on several history database projects, including my own dissertation research on the Early American Foreign Service. As a historian with a background in database architecture and a strong interest in information design, I have tried several ways of annotating technical diagrams to make them more accessible to my non-technical colleagues and employers. However, as the databases increased in complexity, making new diagrams by hand became a time-consuming and frustrating process. The plan was to create a tool that would generate these annotated diagrams quickly enough to keep pace with the workflow used in rapid application development.

With DAVILA you fill out a CSV file to label your diagram with basic information about the program (project name, URL, developer names) and license the diagram under the copyright or copyleft of your choice. You can then group your entities into modules, color code those modules, indicate which entity is central to each module, and provide annotation text for every entity in the database.
Once DAVILA is running, users can click and drag the entities into different positions, expand an individual module for more information, or hide the non-central entities in a module to focus on another part of the schema, all in a fun, force-directed environment courtesy of the toxiclibs physics library. Pressing the space bar saves a snapshot of the window as a timestamped vector PDF.
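For readers curious about what “force-directed” means, here is a toy Python illustration of the general idea: springs pull linked entities toward a rest length, every pair of entities pushes apart, and positions are nudged step by step until things settle. This is not DAVILA’s code or the toxiclibs API; the function name force_layout and all of its constants are assumptions chosen for the example.

```python
import math
import random


def force_layout(nodes, edges, steps=200, rest_length=1.0,
                 k_spring=0.1, k_repel=0.05, max_step=0.1, seed=0):
    """Toy force-directed layout; returns {node: (x, y)}."""
    nodes = list(nodes)
    rng = random.Random(seed)                       # seeded, so layouts repeat
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    for _ in range(steps):
        force = {n: [0.0, 0.0] for n in nodes}
        # Every pair repels, more strongly the closer they are (inverse square).
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-6
                f = k_repel / dist ** 2
                force[a][0] += f * dx / dist
                force[a][1] += f * dy / dist
                force[b][0] -= f * dx / dist
                force[b][1] -= f * dy / dist
        # Each edge is a spring pulling its ends toward rest_length apart.
        for a, b in edges:
            dx, dy = pos[b][0] - pos[a][0], pos[b][1] - pos[a][1]
            dist = math.hypot(dx, dy) or 1e-6
            f = k_spring * (dist - rest_length)
            force[a][0] += f * dx / dist
            force[a][1] += f * dy / dist
            force[b][0] -= f * dx / dist
            force[b][1] -= f * dy / dist
        # Move each node along its net force, capped at max_step per iteration.
        for n in nodes:
            fx, fy = force[n]
            mag = math.hypot(fx, fy)
            if mag > max_step:
                fx, fy = fx * max_step / mag, fy * max_step / mag
            pos[n][0] += fx
            pos[n][1] += fy
    return {n: (x, y) for n, (x, y) in pos.items()}


if __name__ == "__main__":
    layout = force_layout(["Individual", "Letter", "Location"],
                          [("Individual", "Letter"), ("Letter", "Location")])
    for name, (x, y) in layout.items():
        print(f"{name:10s} {x:6.2f} {y:6.2f}")
```

In a schema diagram the nodes would be database entities and the edges their relationships. Capping each move at max_step keeps entities that start very close together from flying apart.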

I now use DAVILA to diagram databases and have received positive feedback on the diagrams’ readability from programmers and historians. I have little training in visual theory or graphic design and would welcome comments from those with more expertise in those fields. DAVILA also works only with database schemas, but similar tools would be extremely useful for other types of technical diagrams. Collaboration would undoubtedly be improved if, when looking at a technical diagram, we could all see the same thing.

For more on the project see: http://www.jeanbauer.com/davila.html.

And now, without further ado: My Slides

As We May Code

Since the debut of the iPad, I can’t stop thinking about path dependency.

These virtual keyboards separate letters from numbers from symbols onto three distinct screens. While using my iPhone, I find myself spelling out words like “between” because I’d rather keep to the letters keyboard than switch back and forth to write “b/w.”

Everyone talks about how you would never write a book (or even an essay) on an iPad, but that would be a piece of cake compared to writing a computer program.

Laziness is one of hackers’ most beloved vices. This has the salutary effect of keeping programs short: the fewer keystrokes needed to accomplish a task, the better. It also means that programming languages are intensely optimized for the QWERTY keyboard.

The classic example is probably something like Perl, which uses the breadth of the QWERTY symbology to specify a wide range of mathematical and logical concepts, with very little reference to the notation of either higher mathematics or formal logic. Perhaps if people had thought of using typing machines as “thinking machines” they would have included the Greek alphabet along with the Roman and basic accounting symbols.

While some languages, like Ruby and Python, are closer to English, writing them on an iPad would still be a nightmare.

All this makes me wonder if we are stuck with the QWERTY keyboard for good, or, if not, what new programming languages would grow up in its absence.

Thoughts?

Introducing DAVILA

I have just released my first open source project. HUZZAH!

DAVILA is a database schema visualization/annotation tool that creates “humanist readable” technical diagrams. It is written in Processing with the toxiclibs physics library and released under GPLv3. DAVILA takes in the database’s schema and a pipe-separated customization file and uses them to produce an interactive, color-coded, annotated diagram similar in format to UML. There are many applications that will create technical diagrams based on database schemas, but as a digital humanist I require more than they can provide.

Technical diagrams are wonderfully compact ways of conveying information about extremely complex systems. But they only work for people who have been trained to read them. If you design a database for a historian, and then hand him or her a basic E-R or UML diagram, you will end up explaining the diagram’s nomenclature before you can talk about the database (and oftentimes you run out of time before getting back to the research question underlying the database). This removes the major advantage of technical diagrams and can also create an unnecessary divide between the technical and non-technical members of a digital humanities development team.

I have become fascinated by how documenting a project (either in development or after release) can build community. I’m not just talking about user-generated documentation (à la wikis), but rather the feeling created by a diagram or README file that really takes the time to explain how the software works and why it works the way it does. There is a generosity and even warmth that comes from thoughtful, helpful documentation, just as inadequate documentation can make someone feel stupid, slighted, or unwanted as a user/developer. I will be writing on this topic more in the months to come (perhaps leading up to an article). In the meantime, check out DAVILA and let me know what you think.

Project homepage: http://www.jeanbauer.com/davila.html

The Design Bug

Edward Tufte should come with a warning label. Since I took his course a year ago last October, I have been bitten by the design bug. I realized the depth of this obsession last night while putting together a projected syllabus for a summer course in the History Department. Just a simple word processing document, right? Wrong.

Before I knew it, I was agonizing over font choices (what is wrong with Times New Roman?), getting the spacing just right between the columns (ensuring that the document will have to be exported as a pdf file to avoid disaster), and designing a banner graphic (two versions: a large one for the front page and a smaller one for subsequent pages). And not just a pretty picture, but a semantically rich graphic, which made me think hard about the essential theme of the course before I could render it visually.

This is an internal document! It is only supposed to get the course accredited, but I just can’t send it in without some attention to its visual impact.

I wasn’t always like this. Until about eighteen months ago, I had two intense, but distinct, sets of aesthetic appreciation: one based in logic and one based in visual or written art. I have always been drawn to “elegant solutions,” whether in the relational algebra behind a third normal form database, a well constructed thesis, or a beautiful piece of code. I am also a photographer and the daughter of a novelist, so I prize an arresting composition of shapes or colors or words to convey thoughts and feelings.

My new interest in graphic and information design is starting to blend these two senses together, particularly as I seek more effective ways of visually rendering my research on information flows in the Early American Foreign Service.

I don’t know where this newfound interest is taking me, or my scholarship. I only know that, for now, I’m along for the ride.