In his book Information Anxiety (Doubleday), Richard Saul Wurman describes how a single Sunday edition of the New York Times contains more information than a Renaissance-era person had access to in an entire lifetime.
This is an exciting time. A smartphone has thousands of times more computing power than the first computers used to tabulate complex data projects such as census counts or election results. The capability of modern machines is astounding. Performing sophisticated data analysis no longer requires a research laboratory, just a cheap machine and some code. Complex data sets can be accessed, explored, and analyzed by the public in a way that simply was not possible in the past. This is a massive opportunity waiting to be seized.
Data visualization is a buzzword everyone is fond of, yet few have invested in it deeply enough to capture that opportunity.
Data visualization is not simply the visual artifact we have come to accept. In reality, it is a sophisticated process of translating large, complex data sets and metrics into charts, graphs, and other visuals.
Data visualization also stands in a cause-and-effect relationship with how data is acquired and managed. Just as we do not see with our eyeballs alone but rely on the entire neural pathway between eye and brain, a visualization relies on the whole pipeline of work behind it.
Each set of data has particular display needs, and the purpose for which you’re using the data set has just as much of an effect on those needs as the data itself. Add to this the fact that we are accustomed to thinking about data as fixed values to be analyzed, when data is really a moving target. How do we build representations of data that adjust to new values every second, hour, or week? This is essential to consider because most data comes from the real world, where there are no absolutes. The temperature changes, the train runs late, election winners change, or a product launch causes the traffic pattern on a web site to shift drastically. Together, the display needs of a data set and the real-time nature of data make it a living, evolving thing.
These realities have to be engaged as we build a structure that starts with data and moves to understanding, which in turn is reshaped by incoming data: a fluid cycle. The visual representation has to identify and share real-time trends, outliers, and new insights about the information the data carries.
Visualization is a process, not an end in itself. The process consists of eight steps.
Acquire
In simple words: obtain the data, whether from a file on a server, information in a PDF, or a source on the internet.
Acquisition concerns how we obtain the data in the first place. If the visualization is to be distributed over the internet, we design the application around the requirements and advantages of digital delivery; if we are targeting print, the opportunities are narrower and the limitations need to be understood. At this stage it is also necessary to structure the data so that common subsets can be retrieved easily, a tricky but indispensable task.
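A minimal acquisition sketch in Python, assuming pandas is installed; the file name sales.csv and the URL are placeholders for whatever source you actually work with:

```python
import pandas as pd

def acquire(source: str) -> pd.DataFrame:
    """Read a CSV into a DataFrame from a local path or an http(s) URL."""
    return pd.read_csv(source)

local_df = acquire("sales.csv")                          # file on disk
# remote_df = acquire("https://example.com/sales.csv")   # file served over the internet
print(local_df.head())
```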
Parse
Provide some structure for the data’s meaning, and order it into categories.
Once we acquire the data, it needs to be parsed: converted into a structure that tags each part with its intended use, with each piece of data changed into a usable format.
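A minimal parsing sketch with pandas, assuming the acquired data arrived as raw text fields for a hypothetical date, region, and revenue column:

```python
import pandas as pd

# Hypothetical raw rows as they might arrive: every field is still a string.
raw = pd.DataFrame({
    "date":    ["2023-01-05", "2023-01-06"],
    "region":  ["North", "South"],
    "revenue": ["1200.50", "980.00"],
})

# Parsing: tag each column with its intended use by converting it to a proper type.
parsed = raw.assign(
    date=pd.to_datetime(raw["date"]),          # text -> timestamp
    region=raw["region"].astype("category"),   # text -> categorical label
    revenue=pd.to_numeric(raw["revenue"]),     # text -> number
)
print(parsed.dtypes)
```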
Filter
Remove all but the data of interest.
The next step involves filtering the data to remove portions not relevant to the immediate use. Filters reduce the amount of data shown in a visualization and bring focus to the project. Filtering shapes the visualization, and there are three common kinds (a short sketch follows the list below):
Range filters: applied to data elements to limit the data to a contiguous range of values, such as revenue of $100,000 to $500,000. A range filter can also exclude, rather than include, a contiguous range of values.
List filters: applied to data elements with text data types, and to number data types that aren’t aggregatable; the values of interest are selected from a list.
Date filters: use calendar controls to adjust time or date selections. You can either select a single contiguous range of dates or exclude dates within a specified range.
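The same three kinds can be expressed directly in code. A minimal pandas sketch over a hypothetical parsed frame with date, region, and revenue columns:

```python
import pandas as pd

# A small, hypothetical parsed frame.
parsed = pd.DataFrame({
    "date":    pd.to_datetime(["2023-01-05", "2023-02-10", "2023-04-02"]),
    "region":  ["North", "South", "East"],
    "revenue": [120_000.0, 480_000.0, 650_000.0],
})

# Range filter: keep rows whose revenue lies inside a contiguous range.
in_range = parsed[parsed["revenue"].between(100_000, 500_000)]

# List filter: keep rows whose region matches a chosen set of text values.
in_list = parsed[parsed["region"].isin(["North", "South"])]

# Date filter: keep (or exclude) rows within a contiguous span of dates.
start, end = pd.Timestamp("2023-01-01"), pd.Timestamp("2023-03-31")
in_dates  = parsed[parsed["date"].between(start, end)]
out_dates = parsed[~parsed["date"].between(start, end)]   # exclusion variant
```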
Mine
Here we apply methods from statistics or data mining as a way to discern patterns or place the data in a mathematical context.
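A minimal mining sketch, using only generic descriptive statistics and a rolling mean rather than any particular mining algorithm; the columns and the synthetic daily revenue data are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical filtered data: daily revenue for two regions.
rng = np.random.default_rng(0)
filtered = pd.DataFrame({
    "date":    pd.date_range("2023-01-01", periods=60, freq="D").repeat(2),
    "region":  ["North", "South"] * 60,
    "revenue": rng.normal(1000, 150, 120).round(2),
})

summary = filtered["revenue"].describe()                 # central tendency and spread
by_region = filtered.groupby("region")["revenue"].sum()  # pattern by category

# A 7-day rolling mean places noisy daily values in a smoother mathematical context.
daily = filtered.groupby("date")["revenue"].sum().sort_index()
trend = daily.rolling(window=7, min_periods=1).mean()

print(summary, by_region, trend.tail(), sep="\n\n")
```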
Represent
Choose a basic visual model, such as a bar graph, list, or tree.
The Represent stage is a linchpin that informs the single most important decision in a visualization project and can make you rethink earlier stages. How you choose to represent the data can influence the very first step (what data you acquire) and the third step (what particular pieces you extract).
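As a rough illustration, a minimal matplotlib sketch that chooses a bar chart as the basic model for a hypothetical revenue-by-region aggregate:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical mined result: total revenue by region.
by_region = pd.Series({"North": 61_000, "South": 58_500, "East": 44_200})

# A bar chart is a reasonable basic model for comparing categories.
fig, ax = plt.subplots()
ax.bar(by_region.index, by_region.values)
ax.set_xlabel("Region")
ax.set_ylabel("Revenue")
ax.set_title("Revenue by region")
plt.show()
```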
Refine
Improve the basic representation to make it clearer and more visually engaging.
In this step, graphic design methods are used to further clarify the representation by calling more attention to particular data (establishing hierarchy) or by changing attributes (such as color) that contribute to readability.
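A minimal refinement sketch of the same hypothetical bar chart, assuming a recent matplotlib: color establishes hierarchy, direct labels and fewer chart elements improve readability.

```python
import matplotlib.pyplot as plt
import pandas as pd

by_region = pd.Series({"North": 61_000, "South": 58_500, "East": 44_200})

fig, ax = plt.subplots()
# Establish hierarchy: emphasize the largest value, mute the rest.
colors = ["#d62728" if v == by_region.max() else "#c7c7c7" for v in by_region.values]
bars = ax.bar(by_region.index, by_region.values, color=colors)

ax.bar_label(bars, fmt="%.0f")                 # label bars directly
for side in ("top", "right"):
    ax.spines[side].set_visible(False)         # remove non-data ink
ax.set_ylabel("Revenue")
ax.set_title("North leads regional revenue")   # a headline instead of a generic title
plt.show()
```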
Interact
Add methods for manipulating the data or controlling what features are visible.
The next stage lets viewers control or explore the data. Here a programmer is brought in to enable interactions, which might include selecting a subset of the data or changing the viewpoint. As another example of a stage affecting an earlier part of the process, interaction can also feed back into the refinement step, because a change in viewpoint might require the design to be reworked.
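A minimal interaction sketch using matplotlib’s Slider widget, again over the hypothetical revenue-by-region data: the viewer picks a revenue threshold and the chart redraws with only the matching subset.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.widgets import Slider

by_region = pd.Series({"North": 61_000, "South": 58_500, "East": 44_200})

fig, ax = plt.subplots()
fig.subplots_adjust(bottom=0.25)                  # leave room for the slider
slider_ax = fig.add_axes([0.2, 0.1, 0.6, 0.05])
threshold = Slider(slider_ax, "Min revenue", 0, float(by_region.max()), valinit=0)

def redraw(_val=None):
    """Redraw the chart with only the viewer-selected subset."""
    ax.clear()
    subset = by_region[by_region >= threshold.val]
    ax.bar(subset.index, subset.values)
    ax.set_ylabel("Revenue")
    fig.canvas.draw_idle()

threshold.on_changed(redraw)
redraw()
plt.show()
```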
Iteration
This is the last step before the cycle continues. The connections between the steps illustrate the importance of an individual or team addressing the project as a whole, which runs counter to how visualization is commonly understood: as a siloed task split between a data gatherer and a graphic designer. At the end of the interaction stage we learn that building a robust, lasting data visualization requires revisiting earlier steps and updating continuously.
Popular culture gives us the image of a sales meeting or presentation in which a professional unveils a scatter plot that delivers a decisive insight. In reality, the process is deep and iterative. Ready-made visualizations can help produce a quick view of your data set, but they’re inflexible commodity items that can be implemented in packaged software. Any bar chart or scatter plot made with Excel will look like a bar chart or scatter plot made with Excel. Packaged solutions can provide only packaged answers, like a pull-string toy that is limited to a handful of canned phrases, such as “Sales show a slight increase in each of the last five years!”
As the volume of big data grows, visualization processes that can turn it into insight on the viewer’s computer and on mobile devices represent a multimillion-dollar opportunity, and one that demands this kind of structured, process-driven engagement.