In retrospect, maybe I shouldn’t have promised to write a blog post every night this week. The port has been going well, but I’ve been working late each night, and it’s just too hard to write clear English prose starting at midnight. So here, at last, is the promised post on migrating Project Quincy’s database from Rails to Django.
My first love in Digital Humanities is data modeling and database architecture. The actual “code” in Project Quincy is pretty basic by professional programming standards. The underlying data structure is the real intellectual achievement. I spent six months of my nine-month fellowship at the Scholars’ Lab designing a database that would effectively and efficiently model historical sources and allow scholars to catalog and analyze their research in meaningful ways. I even wrote a program called DAVILA to auto-generate interactive, color-coded, annotated diagrams of my schema to show other historians how the system works. After all that work had been done, designing the interface for The Early American Foreign Service Database (EAFSD) took about two weeks.
As I mentioned last time, Rails and Django are similar frameworks for connecting databases to websites. They both have procedures for creating new database instances in several open source databases: MySQL, PostgreSQL, or SQLite3. But I already have a MySQL database with all the information I’ve been entering for the last three years. I really didn’t want to redo all that work, so I kept the same underlying database and connected it to the new Django project, with a few minor changes.
In the past three years, I’ve found a few shortcomings in the data model I created, so I’ve used the port as an opportunity to add a couple more tables. Project Quincy records a “latitude” and “longitude” point for every location in the database, but I forgot to indicate which geographic coordinate system the latitude and longitude were from. Luckily for me, all my coordinates were in the same system, so my maps work properly. But I can’t count on that forever, so I added a table called CoordinateSystem. I also extended the table that records which individuals were members of a specific organization. It had a field called “role,” but there was no way of creating a list of all those roles and reusing them. I added two new tables, “RoleTitle” and “RoleType,” to allow for lists and grouping by type.
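For readers who think in tables, here is a minimal sketch of how those additions might hang together. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and every table name, column name, and sample row below is a made-up illustration, not the real Project Quincy schema.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical lookup table naming the system each coordinate pair uses.
CREATE TABLE coordinate_systems (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE locations (
    id                   INTEGER PRIMARY KEY,
    name                 TEXT NOT NULL,
    latitude             REAL,
    longitude            REAL,
    coordinate_system_id INTEGER REFERENCES coordinate_systems(id)
);
-- Hypothetical reusable role lists: each title belongs to one type.
CREATE TABLE role_types (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE role_titles (
    id           INTEGER PRIMARY KEY,
    title        TEXT NOT NULL UNIQUE,
    role_type_id INTEGER REFERENCES role_types(id)
);
""")

# Illustrative sample rows only.
conn.executemany("INSERT INTO role_types (id, name) VALUES (?, ?)",
                 [(1, "Diplomatic"), (2, "Consular")])
conn.executemany("INSERT INTO role_titles (title, role_type_id) VALUES (?, ?)",
                 [("Minister Plenipotentiary", 1),
                  ("Secretary of Legation", 1),
                  ("Consul", 2)])

# List every reusable title, grouped by its type.
titles_by_type = conn.execute("""
    SELECT rt.name, t.title
    FROM role_titles AS t
    JOIN role_types AS rt ON rt.id = t.role_type_id
    ORDER BY rt.name, t.title
""").fetchall()
```

The point of the lookup tables is that a role becomes a row that can be listed and reused, instead of free text typed into each membership record.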
Then there were a few changes required by Django, mostly to my Footnotes module. Since Project Quincy is designed to store scholarly research, it gives users the ability to ‘footnote’ any record in the system by attaching the record to a cited source and saying whether or not that source supports the information in the record. This is accomplished by the Validations table, which can (but does not have to) be connected to any record in the database. This type of unspecified relationship is known as a “polymorphic association,” and Rails and Django implement polymorphic associations differently. Rails uses the name of the table to create the relationship. Django makes a meta-table that holds the names of all the other tables and assigns them a numeric key. So, I had to replace my table names in the Validations table with their new keys. Figuring out how to do this took one post to the ever-helpful Stack Overflow website, and then I was back in business. The old Footnotes module also had a little “Users” table that kept track of the people who could upload into the system. Django comes with a very powerful authentication system which also records users, so I got rid of my little table and hooked the Footnotes module directly into Django’s auth_user table.
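The name-to-key swap for the polymorphic association can be sketched roughly as follows. This is not the query I actually ran: it uses Python's sqlite3 module in place of MySQL, the Validations column names (validatable_type, validatable_id, content_type_id) are assumptions, and the sketch fills a new key column rather than overwriting the old one. The django_content_type table and its app_label and model columns are what Django's contenttypes framework creates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Django's meta-table: one row per model, each with a numeric key.
CREATE TABLE django_content_type (
    id        INTEGER PRIMARY KEY,
    app_label TEXT NOT NULL,
    model     TEXT NOT NULL   -- Django stores the model name in lowercase
);
-- Hypothetical Validations table carrying both styles side by side.
CREATE TABLE validations (
    id               INTEGER PRIMARY KEY,
    validatable_type TEXT,     -- Rails style: a name stored as text
    validatable_id   INTEGER,  -- primary key of the footnoted record
    content_type_id  INTEGER   -- Django style: key into django_content_type
);
INSERT INTO django_content_type (id, app_label, model) VALUES
    (1, 'biographical', 'individual'),
    (2, 'locations', 'location');
INSERT INTO validations (id, validatable_type, validatable_id) VALUES
    (1, 'Individual', 42),
    (2, 'Location', 7),
    (3, NULL, NULL);           -- a validation attached to no record
""")

# Look up each stored name in the meta-table and record its numeric key.
conn.execute("""
    UPDATE validations
    SET content_type_id = (
        SELECT ct.id
        FROM django_content_type AS ct
        WHERE ct.model = LOWER(validations.validatable_type)
    )
    WHERE validatable_type IS NOT NULL
""")

rows = conn.execute(
    "SELECT id, content_type_id FROM validations ORDER BY id"
).fetchall()
# rows is now [(1, 1), (2, 2), (3, None)]
```

Unattached validations keep a NULL key, which matches the "can but does not have to" nature of the relationship.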
I had greater plans to include an “Events” module. But, as I started to design one, I realized that this was not a decision I should make on my own and under a deadline. Project Quincy is an open source project, and I want other scholars to use it for their research. I need to do more reading on modeling events and talk to people before I commit to one particular structure.
So how did I actually migrate the database? MySQL has a nice command for backing up and redeploying a database; it’s called mysqldump. I took a dump (yes, you read that correctly :-) of the database off my server and used it to create a transition database on my local machine. I then went in and made the changes to the transition database directly, safe in the knowledge that I could always restore the original database if I messed up. Once I had the transition database the way I wanted it, I made a second dump and used it to populate the database Django had already created for the new project.
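The dump-and-reload round trip described above can be scripted along these lines. This is a sketch under assumptions: the helper names, user name, database names, and file name are all hypothetical, and it expects the mysqldump and mysql command-line clients to be installed. Only the --user, --password, and --result-file options are used; --password with no value makes the client prompt for it.

```python
import subprocess


def dump_command(user, database, outfile):
    """Build the mysqldump call that writes `database` to `outfile`."""
    return [
        "mysqldump",
        f"--user={user}",
        "--password",                 # no value given: prompt interactively
        f"--result-file={outfile}",
        database,
    ]


def load_command(user, database):
    """Build the mysql call that replays a dump (read from stdin) into `database`."""
    return ["mysql", f"--user={user}", "--password", database]


def copy_database(user, source_db, target_db, dumpfile):
    """Dump `source_db` to `dumpfile`, then load that file into `target_db`."""
    subprocess.run(dump_command(user, source_db, dumpfile), check=True)
    with open(dumpfile, "rb") as sql:
        subprocess.run(load_command(user, target_db), stdin=sql, check=True)


# Hypothetical usage (not executed here; target database must already exist):
# copy_database("quincy", "quincy_production", "quincy_transition", "quincy.sql")
```

Keeping the dump file around after each step is what makes the "restore the original if I mess up" safety net possible.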
Once all my data was in the new database, I ran an extremely helpful Django command, ‘inspectdb.’ This lovely little program examined my database and created a file with its best guess at how to represent each database table in Django syntax. Then all I had to do was check for errors, and there weren’t many. It mistook my boolean (true/false) fields for integers and wanted me to specify some additional information for self joins (tables containing more than one relationship to the same, second table).
Once I had the tables properly represented, it was time to sort them into their appropriate ‘applications.’ One of the biggest differences between Rails and Django is their file structure. Rails creates a folder (with its own nested folders) for every table in the database. Django asks developers to chunk their database into folders called applications, designed to keep similar functions together in the system. Project Quincy was always designed with six modules: Biographical, Locations, Correspondence, Citations, Organizations, and Assignments. Each of these modules has 2 to 8 database tables inside it. One of the biggest decisions I had to make in planning this port was how to use applications. Did I put everything in one app folder, create an app for every module, or find a new way of grouping my system?
To make the decision, I wrote out index cards for each module, listing the tables involved and which other modules it related to. I realized that Assignments and Organizations both brought people to a location for a reason, and that I would likely be visualizing those two kinds of relationships in very similar ways, so they belonged together in a single app. But what should I call the new app? I ran the idea past my father, who has been designing databases since before I was born and recently took his entire development to Python and Django. He suggested the name “Activities” and that my future Events module could go in the same application.
After I sorted my tables into their appropriate (and newly created) applications, I synced my Django project with the underlying database. So far, everything looks good.