What Shape Is Your Data? – By Rand Fitzpatrick
Regardless of the product you are building, data collection and analysis are likely an increasingly important component. Rand Fitzpatrick, currently the Chief Product Officer at OkCupid Labs (the R&D offshoot of the company), has deep experience as a product innovator and a strong background in technology. Below are his insights into how to challenge yourself to shape and frame your data needs more effectively.
Fun Fact: Data collection volume increased by 400% in 2012.
"What shape is your data?" is a valuable question to ask repeatedly during the course of product development, from concept validation to feature iteration. It might seem like a simple and abstract question, but the process of answering it often yields a number of valuable insights. At the heart of every tech product is some collection of data, with varying degrees of centrality to the business needs. Consider the following brief examples:
A CRM product might have hierarchical and graphically connected documents (contacts’ profiles and messages) as its core data models, where the size of each document can be relatively big, but the overall collection of documents won’t likely be overly large.
An analytical tracking system might center on time-series data, often in the form of key value pairs, and will have to deal with high volumes and velocities of data.
A market-like system could have records of inventory, with attributes of the inventory made explicitly available in the data both to facilitate search and to represent availability accurately.
A dating product would need to model the attributes of people, and make sure that the data is structured in a way that enables quick and flexible parametric matching, clustering, and filtering.
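To make the last example a little more concrete, here is a minimal Python sketch of what "parametric filtering" over person attributes might look like. The `Profile` fields, the `matches` function, and the sample records are all hypothetical, invented purely for illustration; they do not describe any real product's data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Profile:
    """Hypothetical person record with explicit, filterable attributes."""
    name: str
    age: int
    city: str
    interests: frozenset = field(default_factory=frozenset)


def matches(profile, min_age, max_age, city=None, shared_interests=frozenset()):
    """Return True if the profile satisfies every supplied parameter."""
    if not (min_age <= profile.age <= max_age):
        return False
    if city is not None and profile.city != city:
        return False
    # Every requested interest must appear in the profile's interests.
    return shared_interests <= profile.interests


# Illustrative sample data only.
profiles = [
    Profile("A", 29, "Chicago", frozenset({"hiking", "jazz"})),
    Profile("B", 41, "Chicago", frozenset({"jazz"})),
    Profile("C", 31, "Boston", frozenset({"hiking"})),
]

hits = [
    p.name
    for p in profiles
    if matches(p, 25, 35, city="Chicago", shared_interests=frozenset({"jazz"}))
]
print(hits)
```

Because each attribute is an explicit, typed field, adding another filter parameter is a small, local change. A scan like this is only a starting point; at scale the same shape would likely be backed by indexes.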
It should be clear that these hypothetical examples mention only a core type of data that is dealt with in the product, as there will be myriad others involved. Additionally, these examples have only spoken about the data in terms of very high levels of abstraction, and not touched upon the lower level details. With a few examples at hand, and a notion that there are multiple levels at which we can think about the question, we can redefine the question “what shape is your data” in the following ways:
At a high level, what are the types of information that your product or business focuses on for the creation or delivery of value?
At a more discrete level, with the various types of data separated from one another, what are the models that best represent your particular data?
Finally, at a detailed level, what is the most natural implementation form for your data, given the models and uses you’ve conceived?
Answering these questions forces you to be clearer and more focused about the core flows in your product, and then encourages you to decompose those flows into mechanisms you can understand and model your data around. Once you have that more clearly articulated picture, you can think about the detailed shape of the data, and implement the systems that will manipulate and process it. Is your data always going to be a stream of consistently sized and typed data structures? Building around the concept of processing streams of tuples might be sensible and efficient. Is your data highly variable, with each record possessing its own structural properties? Document-oriented or property-graph databases might be good abstractions for your product. Are queries against document sets a core data interaction? Inverted indexes or trie structures might make for sensible representations of some of your data.
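As one illustration of the last case, here is a minimal sketch of an inverted index in Python. It assumes whitespace tokenization and AND-only queries, and the function names (`build_inverted_index`, `search`) and sample documents are hypothetical; a production search system would handle stemming, ranking, and much more.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never mutated.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Illustrative sample data only.
docs = {
    1: "time series data",
    2: "graph data model",
    3: "time series graph",
}
index = build_inverted_index(docs)
print(search(index, "time series"))
print(search(index, "graph data"))
```

The shape of the data follows the query: because lookups go from term to documents, the index is keyed by term rather than by document.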
Walking through this exercise, going up and down the ladder of abstraction, allows product developers to check their understanding of the flow of data in their market and to get clearly focused models in place. They can then use those models to choose the implementations and tools that best support the value of the system as a whole.
So…what shape is your data?
For more information about Rand and OkCupid Labs' work, please visit www.okcupidlabs.com.