
Aristocrats, Agendas, & Adams

Some days I think the biggest problem facing digital historians is our workflow. We are already expected to juggle archival research and secondary readings with teaching and writing. Add a digital project into the mix and the temptation to pull out one’s hair becomes almost irresistible. The madness increases when you are the primary programmer on your own project.

I feel your pain.

Last weekend I was despairing of the many moving parts that will (in the next eighteen months) constitute my dissertation. I realized that I was spending all my time on the database I am building to house my research and complete my analysis. The database is indispensable (and being spun out into an open source project to help others), but it is only one part of my project. Then I had a brain wave, courtesy of my favorite Founding Father, John Adams.

According to Adams, the greatest threat to a successful republican government was a society’s aristocrats — people with the time, talents, charisma, and/or money necessary to accomplish their goals. Such people (and he was one) would always want to be in control, but in a republic The People, not the aristocrats, were supposed to govern. Adams’ solution was to shut the aristocrats into the upper house of the legislature (read: Senate) where they would have sufficient scope for their talents but not enough power to hijack the whole system.

Last weekend, I realized that my database is the “aristocrat” of my dissertation. It is powerful, attracts attention, and brings me funding, but it has become a drain on my time and energy out of proportion to its role in the larger project. So, I am going to try a constitutional experiment.

I am giving myself one day each week to work on the database. I’ve chosen Wednesdays to coincide with my office hours at NINES. With any luck, this concentrated time will push me to be more productive, while freeing up the rest of my time to do the research and writing necessary to complete my degree. Of course, everything comes back to the database in the end: my research is loaded into the data structure and analyzed for patterns, which I then explore in the prose of my dissertation.

But hopefully, a system of checks and balances will keep me stable.

Troubleshooting Rails Migrations

I’m not sure why this worked, but if it can help anyone else then here it goes . . .

My Rails app decided to throw a hissy fit at me by refusing to let me run rake db:migrate because (as it claimed)

rake aborted!
An error has occurred, all later migrations canceled:

Mysql::Error: Table ‘states’ already exists: CREATE TABLE `states` (`id` int(11) DEFAULT NULL auto_increment PRIMARY KEY, `name` varchar(255), `continent_id` int(11) NOT NULL, `notes` text, `created_at` datetime, `updated_at` datetime, FOREIGN KEY (continent_id) REFERENCES continents (id)) ENGINE=InnoDB

This was deeply weird because that migration was the 3rd of about 30 migrations in my application. Rails had happily skipped over it in the many months since I had first run it to create the states table, moving directly to the new migrations and running them. Something was wrong, but I had no idea what. After many hours of frustration I stumbled across a viable (albeit hacky) solution to my problem.

Looking at a thread on Stack Overflow, I saw that if I commented out the create table in the failing migration and ran rake db:migrate again, Rails would skip the troubled migration. I meant to do this, but in my haste I only commented out the lines describing the table, forgetting to comment out the “create_table” and its associated “end” lines of code.

I ran rake db:migrate before I realized this and got the following:

(in /Users/jabauer/pq_eafsd)
== CreateStates: migrating ===================================================
== CreateStates: migrated (0.0000s) ==========================================

I quickly realized my mistake and took out the comments, hoping to put the fields back in the table. When I did so, rake db:migrate ran perfectly. I am pretty new to Rails, so I have no idea why this worked or if it is a replicable solution. However, I am offering it to you in the hopes that it will work, save you time, and keep you from having to leave a commented-out migration permanently in your app. And, if anyone knows why this worked (or has a better fix), please let me know!
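
One possible explanation, offered as a guess rather than a diagnosis: Rails records the version of each migration it has applied in a schema_migrations table, and rake db:migrate only runs migrations whose versions are missing from it. If the entry for CreateStates had somehow gone missing, Rails would try to create the table again and fail. Once an emptied-out version of the migration ran without error, its version would be recorded and every later run would skip it. The toy model below is plain Ruby with no Rails and no database. MigrationRunner, the version numbers, and the set of existing tables are all hypothetical stand-ins.

```ruby
require "set"

# Toy model of how Rails picks migrations to run. Nothing here touches Rails
# or a real database; MigrationRunner and its methods are hypothetical.
class MigrationRunner
  attr_reader :recorded_versions, :log

  def initialize(recorded_versions)
    # Stand-in for the schema_migrations table: versions already applied.
    @recorded_versions = Set.new(recorded_versions)
    @log = []
  end

  # migrations: array of [version, name, body], where body is a callable.
  def migrate(migrations)
    migrations.sort_by(&:first).each do |version, name, body|
      next if @recorded_versions.include?(version) # already applied: skip
      body.call                     # raises if the migration fails
      @recorded_versions << version # recorded only after success
      @log << name
    end
  end
end

# Assumption: the tables already exist in the database.
existing_tables = Set.new(["continents", "states"])

# Fails if the table is already there, like the MySQL error above.
create_states = lambda do
  raise "Table 'states' already exists" if existing_tables.include?("states")
  existing_tables << "states"
end
no_op = -> {} # the migration with its body commented out

# Assumption: version 3 (CreateStates) is missing from the recorded set.
runner = MigrationRunner.new([1, 2])

begin
  runner.migrate([[3, "CreateStates", create_states]])
rescue RuntimeError => e
  first_error = e.message # the run aborts and version 3 stays unrecorded
end

runner.migrate([[3, "CreateStates", no_op]])         # succeeds, records 3
runner.migrate([[3, "CreateStates", create_states]]) # now skipped entirely
```

If this guess is right, the restored migration only appeared to run: Rails skipped it because its version was finally on record.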

Normality: For or Against?

I wrote this post a year ago, when the Scholars’ Lab Blog was just getting off the ground. To see the post in its original context (including the interesting conversation it sparked in the comments), click here.


I’m a historian who is currently designing and/or building four databases. As I work through the complexities of each project, I’m struck by two thoughts.

First: I’m overworked.

Second: I like the way relational algebra makes me think.

Good database design involves breaking a data set into the smallest viable components and then linking those components back together to facilitate complex analysis. This process, known as normalization, helps keep the data set free of duplicates and protects the data from being unintentionally deleted or unevenly updated.
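
As a minimal sketch of that idea, the plain Ruby below splits a flat list of rows into two linked tables, so each continent name is stored once and referenced by id. The states and continents tables echo the migration earlier on this page, but the sample rows and the helper names normalize and states_in are illustrative assumptions, not the EAFSD's actual data or code.

```ruby
# Flat, denormalized rows: the continent name is repeated on every state.
# The sample values are illustrative only.
flat_rows = [
  { state: "France",        continent: "Europe" },
  { state: "Spain",         continent: "Europe" },
  { state: "United States", continent: "North America" }
]

# Split the rows into two tables linked by continent_id.
def normalize(rows)
  continents = {} # continent name => { id:, name: }
  states = []

  rows.each do |row|
    name = row[:continent]
    # Store each continent once, the first time it is seen.
    continents[name] ||= { id: continents.size + 1, name: name }
    states << { id: states.size + 1, name: row[:state],
                continent_id: continents[name][:id] }
  end

  [continents.values, states]
end

# Link the pieces back together to answer a question.
def states_in(continent_name, continents, states)
  continent = continents.find { |c| c[:name] == continent_name }
  return [] unless continent

  states.select { |s| s[:continent_id] == continent[:id] }
        .map { |s| s[:name] }
end

continents, states = normalize(flat_rows)
european_states = states_in("Europe", continents, states)
```

Renaming a continent now means changing one record rather than hunting down every row that mentions it, which is the protection against uneven updates described above.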

As I research merchants in the eighteenth century and how they connected people and empires with far-flung locations and transferred goods and ideas across oceans, I find it helpful to break those multivalent connections into discrete units. Who wrote to whom? Who worked for whom? Who became a diplomat or consul for the United States? Who recommended him for that position? And so on. Each question has become a relationship in my design for the Early American Foreign Service Database (EAFSD), and by linking all this (and more) information together, the EAFSD will track how the U.S. Foreign Service developed over fifty years. But there is a catch.
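
One way to picture "each question becomes a relationship" is a small set of link records that point from one person to another. The sketch below is an assumption about the general shape, not the EAFSD schema: Person, Relationship, the related helper, and the placeholder people are all hypothetical.

```ruby
# Hypothetical record types; the real EAFSD tables may look quite different.
Person = Struct.new(:id, :name)
Relationship = Struct.new(:kind, :from_id, :to_id)

# Placeholder people, not historical figures.
people = [
  Person.new(1, "Person A"),
  Person.new(2, "Person B"),
  Person.new(3, "Person C")
]

# Each row answers one question: who did what with respect to whom.
relationships = [
  Relationship.new(:wrote_to, 1, 2),
  Relationship.new(:worked_for, 3, 1),
  Relationship.new(:recommended, 2, 3)
]

# Return [from_name, to_name] pairs for one kind of relationship.
def related(kind, relationships, people)
  names_by_id = people.to_h { |person| [person.id, person.name] }
  relationships.select { |r| r.kind == kind }
               .map { |r| [names_by_id[r.from_id], names_by_id[r.to_id]] }
end

letters = related(:wrote_to, relationships, people)
employers = related(:worked_for, relationships, people)
```

Because every link is its own record, adding a new kind of question means adding rows (or a new relationship table) rather than reshaping the people themselves.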

When the database is done, I plan on publishing it online so that other researchers can have access to its data. However, I cannot deny that the EAFSD was designed to answer questions specific to my dissertation. Other researchers looking at information gathered from the papers of diplomats, consuls, and merchants will (hopefully) want to ask other questions which my database may or may not be able to answer. For example, I only focus on merchants who had a clear connection to the U.S. government (i.e., received positions in the Foreign Service), which means that a large segment of the merchant community will not appear in the database.

Along with the completed database I plan on releasing the source code (both for the database itself and the web application that permits the data migrations and the basic query structure) under an open source license, hopefully making it easier for other scholars to create their own relational databases to track social networks and institutional development. Once those databases are published, similar issues will arise.

When a scholar decides to use a relational database in her research, she is making a decision about methodology — not theory. A relational database does not dictate what scholars will find in a given data set, but rather shapes their search in ways that need to remain in the forefront of all our minds, even if the methodological discussions get relegated to footnotes or appendices. If an astronomer has to state the specifications of the telescope along with the data received, a digital humanist should be clear about the choices she made (and why) in designing a database to facilitate her analysis and the analytical limits of the final design.

I became a historian because I see the world as a complex and contingent place that doesn’t respond well to being forced into a constraining model. While having the EAFSD is a necessary condition of my dissertation, it is not a sufficient one.

There are real world ambiguities and unpredictable turns in my subject matter which should not be modeled in a relational data structure. High on this list are the many mistakes made by early American diplomats: John Adams picking a fight with the French Foreign Minister in the middle of the Revolutionary War (subject of my Master’s Thesis), James Monroe being recalled by a furious George Washington after denouncing (accurate) rumors regarding a new treaty with Great Britain, Thomas Jefferson breaking the Law of Nations to help Lafayette write the Rights of Man and Citizen, the list goes on and on. On the other hand, while the database also fails to capture the sheer brilliance of Benjamin Franklin it does hint at John Quincy Adams’ compulsive attention to detail. None of these stories or personalities map into the database, but they are all crucial to understanding how the newly United States interacted with the larger Atlantic World.

Designing the EAFSD has sharpened my historical analysis, but narrative prose blurs the edges back into the delightfully abnormal lives of the people I seek to understand.