Most of what I know about databases and data management I taught myself while working with (or thinking about the problems of) political campaigns. The rest I learned in grad school while thinking about thinking about politics. Either way, most of my professional life has been spent organizing data and trying to figure out what, if anything, we could actually know from looking at data. If nothing else, I've figured out one thing for sure: organizing the data is inextricably linked to learning anything from it.
I have always been frustrated by the lack of careful thought about the relationship between how data is stored and modeled and its intended use. You can think of this relationship as the organizational posture of data—the state of your data when you are not looking at it determines how easy it is to get down to understanding it when the time comes. Awareness of posture matters just as much in business as it does in academic research and political campaigns.
Most small organizations get their data posture wrong because getting data management right used to be expensive and difficult. Worse still, if you got it wrong, you could turn trivial access to mission-critical data into a nightmare. Those circumstances created some bad habits.
Now data management infrastructure has matured to the point where tools for managing and collecting transactional data are readily and cheaply available. Browser-based database applications, e-commerce tools, point-of-sale customer identification, email marketing services, QR codes in printed communications, mobile computing apps: we have plentiful ways to measure the mechanisms of audience engagement.
Likewise, tools and techniques for understanding and analyzing data have reached the point where high-value insights are available for the simple cost of collecting good data and managing it well. Business Intelligence (BI) tools—built on the premise that ad hoc data analysis should be easy to implement without hand-coding and complicated programming—have broken out of the Enterprise-only realm and are accessible to organizations of even modest scale.
Inexpensive transactional data management coupled with accessible BI tools may sound like data paradise, but there’s a missing piece of the puzzle: the tools for managing and collecting transactional data don’t necessarily store and present data optimally for analytical tools—and they probably shouldn’t.
In technical jargon this is the difference between OLTP (online transaction processing) and OLAP (online analytical processing). Most small-to-medium organizations (especially those that are undercapitalized) will try to get by conducting analytical functions from within their transactional databases. This tactic is often manageable at small volumes, but it does not scale well. The tipping point comes when non-mission-critical data becomes valuable enough as an analytical resource that it makes sense for the organization to start collecting and keeping it. The value of that data is not the problem; the problem develops when the organization pools it, in what is likely the only data management tool available, with mission-critical transactional data.
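The two access patterns can be seen side by side in a minimal sketch. The schema, table, and data here are hypothetical, chosen only to make the contrast concrete: the transactional query touches a few rows for one customer, while the analytical query scans and summarizes everything.

```python
import sqlite3

# Hypothetical orders table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        order_id   INTEGER PRIMARY KEY,
        customer   TEXT,
        amount     REAL,
        order_date TEXT
    )
""")
conn.executemany(
    "INSERT INTO orders (customer, amount, order_date) VALUES (?, ?, ?)",
    [
        ("alice", 25.00, "2024-01-05"),
        ("bob",   40.00, "2024-01-17"),
        ("alice", 15.00, "2024-02-02"),
    ],
)

# OLTP-style access: fetch one customer's orders (small, targeted, frequent).
current = conn.execute(
    "SELECT order_id, amount FROM orders WHERE customer = ?", ("alice",)
).fetchall()

# OLAP-style access: scan and summarize the whole table (large, infrequent).
by_month = conn.execute(
    "SELECT substr(order_date, 1, 7) AS month, SUM(amount) "
    "FROM orders GROUP BY month ORDER BY month"
).fetchall()

print(current)   # [(1, 25.0), (3, 15.0)]
print(by_month)  # [('2024-01', 65.0), ('2024-02', 15.0)]
```

At three rows the two queries coexist happily; the essay's point is what happens when the second kind of query has to scan millions of rows sitting in the same database that is trying to serve the first kind.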
First, as live transactional data becomes a smaller and smaller proportion of the overall dataset, transactional performance suffers: processing power (or worse, users) must sort through irrelevant data to get to current customer information. Second, analytic power suffers because the schema is organized to optimize insertion of individual transactions rather than extraction and summarization of large-scale analytics.
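One common way out of that bind is to keep the two workloads in separate structures: a normalized schema for live transactions, and a separate denormalized table that analytical queries hit instead. The sketch below is a hypothetical illustration of that separation (all table and column names are invented), with a single join step copying transactional rows into a wide reporting table shaped for summarization.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- OLTP side: normalized, optimized for inserting one transaction at a time.
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE transactions (
        txn_id      INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        amount      REAL,
        txn_date    TEXT
    );

    -- OLAP side: one wide, denormalized table, shaped for GROUP BY.
    CREATE TABLE sales_fact (region TEXT, month TEXT, amount REAL);
""")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, "alice", "east"), (2, "bob", "west")])
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)",
                 [(1, 1, 25.0, "2024-01-05"),
                  (2, 2, 40.0, "2024-01-17"),
                  (3, 1, 15.0, "2024-02-02")])

# Extraction step: join once, write denormalized rows for the analysts.
conn.execute("""
    INSERT INTO sales_fact (region, month, amount)
    SELECT c.region, substr(t.txn_date, 1, 7), t.amount
    FROM transactions t JOIN customers c USING (customer_id)
""")

# Analytical queries run against the reporting table, not the live tables.
report = conn.execute(
    "SELECT region, month, SUM(amount) FROM sales_fact "
    "GROUP BY region, month ORDER BY region, month"
).fetchall()
print(report)
# [('east', '2024-01', 25.0), ('east', '2024-02', 15.0), ('west', '2024-01', 40.0)]
```

In practice the reporting table would live in a different database entirely, refreshed on a schedule, so that large scans never compete with live transactions for the same resources.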
The art of striking the right organizational data posture is to recognize when you are at the point of realizing serious value from data analytics. Two things to avoid: (1) playing catch-up later against competitors who got it right, and (2) having to disentangle an intertwined mess of OLTP/OLAP databases.