I call myself a data designer because I am as interested in why a project has this data set (as opposed to some other) as I am in the data structure I’m creating or the algorithms I plan to run. I care deeply about workflows, cleaning pipelines, provenance, ethics, and security. I start every project by asking “Does your data match your questions?” (Pro tip: at the beginning of a project the answer is always “no.”)
I took a circuitous route into data design. It began in grad school, when I merged my fascination with database architecture with my aesthetic sense as a photographer and started visualizing my research into eighteenth-century diplomacy with Processing and Protovis. I then took deep dives into network and geospatial analysis but came up for air when I realized that most AI and Machine Learning folks had forgotten that garbage in leads directly to garbage out. To this day I am much happier when people tell me they are running a series of statistical methods (plus a few necessary hacks) than when they say they are applying “Artificial Intelligence.” Likewise, most people won’t cop to “automated bias reification,” so I always ask whoever is selling “Machine Learning” for sample training sets and outputs, and I keep it as far away from decisions that affect human bodies as I possibly can.