Category Archives: DAVILA

Tales from the Port: Part 2 — Migrating the Database

In retrospect, maybe I shouldn’t have promised to write a blog post every night this week. The port has been going well, but I’ve been working late each night, and it’s just too hard to write clear English prose starting at midnight. So here, at last, is the promised post on migrating Project Quincy’s database from Rails to Django.

My first love in Digital Humanities is data modeling and database architecture. The actual “code” in Project Quincy is pretty basic by professional programming standards. The underlying data structure is the real intellectual achievement. I spent six months of my nine month fellowship at the Scholars’ Lab designing a database that would effectively and efficiently model historical sources and allow scholars to catalog and analyze their research in meaningful ways. I even wrote a program called DAVILA to auto-generate interactive, color coded, annotated diagrams of my schema to show other historians how the system works. After all that work had been done, designing the interface for The Early American Foreign Service Database (EAFSD) took about two weeks.

As I mentioned last time, Rails and Django are similar frameworks for connecting databases to websites. They both have procedures for creating new database instances in several open source databases: MySQL, PostgreSQL, or SQLite3. But I already have a MySQL database with all the information I’ve been entering for the last three years. I really didn’t want to redo all that work, so I kept the same underlying database and connected it to the new Django project, with a few minor changes.

In the past three years, I’ve found a few shortcomings in the data model I created. So, I’ve used the port as an opportunity to add a couple more tables. Project Quincy records a “latitude” and “longitude” point for every location in the database, but I forgot to indicate which geographic coordinate system the latitude and longitude were from. Luckily for me all my coordinates were in the same system, so my maps work properly. But I can’t count on that forever, so I added a table called CoordinateSystem. I also extended the table that records which individuals were members of a specific organization. I had a field called “role” but there was no way of creating a list of all those roles and reusing them. I added two new tables “RoleTitle” and “RoleType” to allow for lists and grouping by type.

Then there were a few changes required by Django, mostly to my Footnotes module. Since Project Quincy is designed to store scholarly research, it gives users the ability to ‘footnote’ any record in the system by attaching the record to a cited source and saying whether or not that source supports the information in the record. This is accomplished by the Validations table, which can (but does not have to) be connected to any record in the database. This type of unspecified relationship is known as a “polymorphic association,” and Rails and Django implement polymorphic associations differently. Rails uses the name of the table to create the relationship. Django makes a meta-table that holds the names of all the other tables and assigns them a numeric key. So, I had to replace my table names in the Validations Table with their new keys. Figuring out how to do this took a post to the ever helpful Stackoverflow website and I was back in business. The old Footnotes module also had a little “Users” table that kept track of the people who could upload into the system. Django comes with a very powerful authentication system which also records users, so I got rid of my little table and hooked the footnotes module directly into the django_auth_user table.

I had greater plans to include an “Events” module. But, as I started to design one, I realized that this was not a decision I should make on my own and under a deadline. Project Quincy is an open source project, and I want other scholars to use it for their research. I need to do more reading on modeling events and talk to people before I commit to one particular structure.

So how did I actually migrate the database? MySQL has a nice command for backing up and redeploying a database; it’s called mysqldump. I took a dump (yes you read that correctly:-) of the database off my server and used it to create a transition database on my local machine. I then went in made the changes to the transition database directly, safe in the knowledge that could always restore the original database if I messed up. Once I had the transition database the way I wanted it, I made a second dump and used it to populate the database Django had already created for the new project.

Once all my data was in the new database, I ran an extremely helpful Django command ‘inspectdb.’ This lovely little program examined my database and created a file with its best guess on how to represent each database table in Django syntax. Then all I had to do was check for errors, and there weren’t many. It mistook my boolean (true/false) fields for integers and wanted me to specify some additional information for self joins (tables containing more than one relationship to the same, second table).

Once I had the tables properly represented it was time to sort them into their appropriate ‘applications.’ One of the biggest diferences between Rails and Django is their file structure. Rails creates a folder (with its own nested folders) for every table in the database. Django asks developers to chunk their database into folders called applications, designed to keep similar functions together in the system. Project Quincy was always designed with six modules: Biographical, Locations, Correspondence, Citations, Organizations, and Assignments. Each of these modules has 2 to 8 database tables inside it. One of the biggest decisions I had to make in planning this port was how to use applications. Did I put everything in one app folder, create an app for every module, or find an new way of grouping my system?

To make the decision, I wrote out index cards for each module listing the tables involved and what other modules it related to. I realized that Assignments and Organizations both brought people to a location for a reason, and that I would likely be visualizing those two kinds of relationships in vary simliar ways, but what should I call the new app? I ran the idea past my father, who has been designing databases since before I was born and recently took his entire development to python and django. He suggested the name “Activities” and that my future Events module could go in the same application.

After I sorted my tables into their appropriate (and newly created applications) I synced my Django project with the underlying database. So far, everything looks good.

Republicans of Letters

Here are the slides for my January 26th talk at Brown University’s Center for Digital Scholarship, “Republicans of Letters: Historical Social Networks and The Early American Foreign Service Database.”

The abstract ran as follows, “Jean Bauer, an advanced doctoral candidate in the Corcoran Department of History at the University of Virginia and creator of The Early American Foreign Service Database, will discuss her use and creation of digital tools to trace historical social networks through time and space. Drawing on her research into the commercial, kinship, patronage, and correspondence networks that helped form the American diplomatic and consular corps, Bauer will examine how relational databases and computational information design can help scholars identify and analyze historical social networks. The talk will include demos of two open source projects Bauer has developed to help scholars analyze their own research, Project Quincy and DAVILA.”

Some of the slides are pretty text intensive, so if something catches your eye, go ahead and hit pause!

Do You See What I See?

This is the abstract for my talk, “Do You See What I See?: Technical Documentation in Digital Humanities,” which I gave at the 2010 Chicago Colloquium on Digital Humanities and Computer Science.

The actual presentation was more informal and consisted of a series of examples from my various jobs as a database designer.

The slides are embedded below.

*********************

Technical diagrams are wonderfully compact ways of conveying information about extremely complex systems. However, they only work for people who have been trained to read them. Humanists might never see the technical diagrams that underlie the systems they work on, reducing their ability to make realistic plans or demands for their software needs. Conversely, if you design a database for a historian, and then hand him or her a basic E-R (Entity-Relationship) or UML (Unified Modeling Language) diagram, you will end up explaining the diagram’s nomenclature before you can talk about the database (and oftentimes you run out of time before getting back to the research question underlying the database). Either scenario removes the major advantage of technical diagrams and leads to an unnecessary divide between the technical and non-technical members of a digital humanities development team.

True collaboration requires documentation that can be read and understood by all participants. This is possible even for technical diagrams, but not without additional design work. Using the principles of information design, these diagrams can be enhanced through color coding, positioning, and annotation to make their meaning clear to non-technical readers. The end result is a single diagram that provides the same information to all team members. Unfortunately, graphical and information design are specialized fields in their own right, and not necessarily taught to people with backgrounds in systems architecture.

A tool that I have recently designed may provide some first steps in that direction. The program is called DAVILA, an open source relational database schema visualization and annotation tool. It is written in Processing using the toxiclibs physics library and released under the GPLv3. DAVILA comes out of my work on several history database projects, including my own dissertation research on the Early American Foreign Service. As a historian with a background in database architecture and a strong interest in information design, I have tried several ways of annotating technical diagrams to make them more accessible to my non-technical colleagues and employers. However, as the databases increased in complexity making new diagrams by hand became a time-consuming and frustrating process. The plan was to create a tool that would create these annotated diagrams quickly to accommodate the workflow used in rapid application development.

With DAVILA you fill out a CSV file to label your diagram with basic information about the program (project name, URL, developer names) and license the diagram under the copyright or copyleft of your choice. You can then group your entities into modules, color code those modules, indicate which entity is central to each module, and provide annotation text for every entity in the database.
Once DAVILA is running, users can click and drag the entities into different positions, expand an individual module for more information, or hide the non-central entities in a module to focus on another part of your schema. All in a fun, force-directed environment courtesy of the toxiclibs physics library. Pressing the space bar saves a snapshot of the window as a timestamped, vector-scaled pdf.

I now use DAVILA to describe databases and have received positive feedback on their readability from programmers and historians. I have little training in visual theory or graphic design and would welcome comments from those with more expertise in those fields. DAVILA also only works with database schemas, but similar tools would be extremely useful for other types of technical diagrams. Collaboration would undoubtably be improved if, when looking at a technical diagram, we could all see the same thing.

For more on the project see: http://www.jeanbauer.com/davila.html.

And now, without further ado: My Slides

Introducing DAVILA

I have just released my first open source project. HUZZAH!

DAVILA is a database schema visualization/annotation tool that creates “humanist readable” technical diagrams. It is written in Processing with the toxiclibs physics library and released under GPLv3. DAVILA takes in the database’s schema and a pipe separated customization file and uses them to produce an interactive, color-coded, annotated diagram similar in format to UML. There are many applications that will create technical diagrams based on database schema, but as a digital humanist I require more than they can provide.

Technical diagrams are wonderfully compact ways of conveying information about extremely complex systems. But they only work for people who have been trained to read them. If you design a database for a historian, and then hand him or her a basic E-R or UML diagram, you will end up explaining the diagram’s nomenclature before you can talk about the database (and oftentimes you run out of time before getting back to the research question underlying the database). This removes the major advantage of technical diagrams and can also create an unnecessary divide between the technical and non-technical members of a digital humanities development team.

I have become fascinated by how documenting a project (either in development or after release) can build community. I’m not just talking about user generated documentation (ala wikis), but rather the feeling created by a diagram or README file that really takes the time to explain how the software works and why it works the way it does. There is a generosity and even warmth that comes from thoughtful, helpful documentation, just as inadequate documentation can make someone feel stupid, slighted, or unwanted as a user/developer. I will be writing on this topic more in the months to come (perhaps leading up to an article). In the meantime, check out DAVILA and let me know what you think.

Project homepage: http://www.jeanbauer.com/davila.html