1 Introduction
The volume of data produced every day is on an unprecedented scale, and still growing. Businesses and governments alike have recognised the need for analysing this data and using the extracted knowledge to inform the decisions they have to make every day. This is commonly referred to as “Big Data” and touches on a whole range of different technologies.
Data and information visualisation plays an integral part in this. The ability to communicate the salient information succinctly and intuitively is becoming increasingly important. Visualisation technologies have progressed in leaps and bounds, especially when it comes to interactive visualisations on the web. The performance improvements in Javascript implementations and emerging web standards have made infographics and interactive charts possible, and indeed routine, that would have been a pipe dream only a decade ago.
2 Leveraging Web Standards
A Javascript library that has quickly risen to prominence in the last few years is D3.js. D3 stands for data-driven documents. The name says it all — the idea behind D3 is to create documents that are entirely driven by the data behind them. Note the use of “documents” rather than “visualisations” here — although the vast majority of applications that use D3 are visualisations, the library itself is more general than that. It only requires a DOM (Document Object Model) to work with. D3 leverages a number of existing web standards, such as DOM, SVG, and CSS selectors, which allow you to succinctly address the parts of the DOM you are interested in.
DOMs are present in many technologies that are used for visualisations, from humble HTML to the more sophisticated SVG to X3DOM, which allows you to create hardware-accelerated 3D visualisations directly in the browser. One notable technology that does not provide a DOM is the HTML5 canvas. This does not mean that you cannot use canvas with D3, but working with it does not feel as “natural” as working with a DOM.
On top of that, D3 defines a vast array of functionality that either makes working with documents easier, aids the creation of visual elements, or helps to compute the layout of common types of visualisations. In addition to static visualisations, D3 gives special attention to animations (called “transitions”) that let you “morph” document elements to represent new or changed data easily. It is under active development and there are frequent new releases that add functionality. Beyond its core functionality, there are a number of plugins available.
3 Selective Documents
At the core of D3 are the so-called selections. These are the parts of the DOM that are “matched” with the data they should represent. If, for example, you wanted to create a bar chart to visualise a list of values, each value would be represented by a bar. The D3 way is to select these bars (which may not exist yet) in your document and “bind” the data to them. The data binding process computes the so-called enter, update, and exit selections. The enter selection contains those document elements which do not exist yet, but for which there is data. The update selection contains the elements that do exist in the document and for which there is also data available. The exit selection contains the elements that are present in the document, but for which no data was given.
Almost everything is based around the selections in D3. When starting with an empty document, the enter selection tells you which elements need to be created to represent the data. When the data is changed, the update selection tells you what needs to be changed in the document, and the exit selection allows you to easily remove the parts of the document that are no longer relevant.
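As a rough sketch of how the three selections are typically handled, consider the bar chart mentioned above. The helper name renderBars, the class name bar, and the bar height are assumptions made for illustration; svg is assumed to be a D3 selection of an existing SVG element and values an array of numbers.

```javascript
// Hypothetical helper: draws one horizontal bar per value.
// `svg` is assumed to be a D3 selection, `values` an array of numbers.
function renderBars(svg, values) {
  const barHeight = 20; // assumed layout constant

  // Select the bars (which may not exist yet) and bind the data to them.
  const bars = svg.selectAll("rect.bar").data(values);

  // Enter: data without an element, so create the missing bars.
  bars.enter().append("rect")
      .attr("class", "bar")
      .attr("height", barHeight - 1)
      .attr("y", function (d, i) { return i * barHeight; })
      .attr("width", function (d) { return d; });

  // Update: elements that already exist and have data, so adjust them.
  bars.attr("width", function (d) { return d; });

  // Exit: elements without data, so remove them.
  bars.exit().remove();
}
```

Calling renderBars(svg, [10, 40, 25]) and later renderBars(svg, [30, 15]) would be expected to create three bars first, then resize the first two and remove the third.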
Starting out with D3 can be a somewhat daunting task. The data-driven model is somewhat unique to D3 and can cause headaches for people with no background in web development and seasoned programmers alike. D3 provides a vast amount of useful functionality, but the learning curve is steep and it can be frustrating to have nothing appear on the page without any error messages or other indication of what may have gone wrong. However, there are lots of resources available to make learning D3 easier.
4 Where To Start?
The starting point for anybody wanting to use D3 should be its website. There are literally hundreds of examples ranging from the almost trivial introductory bar chart to complex and interactive visualisations that have been published in high-profile places. Common visualisation types such as pie charts can be found next to choropleth maps and more exotic things such as chord diagrams and circular heat maps. Examples like the D3 show reel highlight D3's power to transition between different types of visualisations, while the D3 mower demonstrates its flexibility and that it is not limited to visualisations as such.
In addition to a large collection of examples, there is also a comprehensive library of tutorials, again ranging from introductory examples to in-depth discussions of internal details. There are also a number of books on D3 and related technologies, pointers to all of which can be found on the tutorial page. For those who prefer a more visual introduction, there are links to talks and demonstrations given on D3.
The D3 library excels not only in terms of its functionality, but also in terms of the quality and abundance of its documentation, which played no small part in its rise to fame. For those seeking help, there is a Google Group and a tag on Stack Overflow, where your humble author spends much of his time.
4.1 Standing on the Shoulders of Giants
There are a number of complementary libraries that leverage D3 to offer ready-made visualisations or chart components that are easier to use and deploy than “pure” D3. Probably the most prominent one is NVD3. More recent examples include C3, Dimple, Rickshaw, and D3.Chart. D3-generator is a service that generates D3 code for your chart. There are even projects that aim to take the coding part out completely, instead allowing you to specify everything in a configuration language. An example of this is Vega.
In the remainder of this article, I will explain some of the pitfalls that new users of D3 often encounter. Most of this material is already covered in existing tutorials, but nevertheless comes up frequently and deserves special attention. I will assume fleeting familiarity with D3; the basic concepts are explained in Mike Bostock’s excellent “Let’s Make a Bar Chart” tutorial for example.
5 Does Your Selection Contain What You Think It Does?
Most of the problems new users of D3 encounter stem from a lack of understanding of the implications of D3's data-driven approach and of selections. This can lead to errors that are extremely hard to debug and understand, as there are no error messages or other indications that anything has gone wrong.
I will illustrate the problem with a simple example: a line chart. The relevant parts of the code are reproduced below; the full code for all the examples in this article is available on vida.io (disclaimer: I am an advisor for Vida Lab, the company behind vida.io).
    // x axis
    svg.append("g")
        .attr("class", "x axis")
        .attr("transform", "translate(0," + height + ")")
        .call(xAxis);

    // y axis
    svg.append("g")
        .attr("class", "y axis")
        .call(yAxis);

    // the line itself
    svg.selectAll("path")
        .data([data])
      .enter().append("path")
        .attr("class", "line")
        .attr("d", line);
The code is reasonably straightforward, there are no error messages, yet the line does not show up. It becomes even more confusing if we move the code to add the axes after the code to add the line. Now everything works just fine! Surely the order of execution does not matter? What’s going on?
The short answer is that the order does matter. D3 selectors operate on the DOM of the document and, depending on the state of the DOM, will have different effects. As explained above, the selected elements are divided into three groups when data is bound to them — the enter, the update, and the exit selection. In our example, these selections are different for the two execution orders.
To shed more light on what is going on, we need to explore what D3's axis component actually does. It creates an axis that represents the domain and range of the scale that has been attached to it. The visual representation of an axis consists of a number of elements: lines and text. In particular, some of the lines are represented by path elements. Each of those elements has the data that represents the axis bound to it. Yes, D3 does not only talk the talk, it also walks the walk: the selection and data matching approach is used extensively internally.
The code that appends the actual line selects all path elements, binds the data for our line to them, and appends new path elements for the data that has no representation in the DOM (the enter selection). This works as expected when there is nothing in the DOM (the case where the code to draw the line is run first), but produces nothing when there are already path elements in the DOM (the case where the axes are created first).
The second piece needed to understand what is happening here is how D3 matches data to DOM elements. By default, it uses the index of data and elements in their respective arrays/selections. That is, the first datum is matched to the first DOM element in the selection, the second to the second, and so on. This default behaviour can be changed by providing an optional function as the second argument to .data().
So when svg.selectAll("path") is called with the axes already present, the selection will contain the paths that are part of those axes. Now .data([data]) tries to match the data in the argument (here an array with a single element containing the data that describes our single line) to the elements in the selection. This is successful as it uses the index-based method and does not compare the actual data. Hence the enter selection is empty (there is no datum that is not already represented by a DOM element) and nothing happens.
The animation below shows what’s happening in the DOM as the code is executed.
Contrast this with the case where no axes and therefore no path elements are present in the document. The selection works as expected here, selecting nothing, so the single datum ends up in the enter selection and the line is appended.
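The two cases can be reproduced with a small model in plain JavaScript. This is a simplified stand-in for the default index-based matching described above, not D3's actual implementation; the function name joinByIndex and the strings used as placeholder elements are made up for illustration.

```javascript
// Simplified model of an index-based data join.
// `elements` stands in for the DOM nodes already in the selection.
function joinByIndex(elements, data) {
  const shared = Math.min(elements.length, data.length);
  return {
    update: elements.slice(0, shared), // elements that received a datum
    enter: data.slice(shared),         // data without a matching element
    exit: elements.slice(shared)       // elements without a datum
  };
}

const lineData = [{ x: 0, y: 1 }, { x: 1, y: 3 }];

// Axes drawn first: the selection already holds the axes' two path elements.
const axesFirst = joinByIndex(["x axis path", "y axis path"], [lineData]);
console.log(axesFirst.enter.length); // 0, so nothing is appended

// Line drawn first: the selection is empty.
const lineFirst = joinByIndex([], [lineData]);
console.log(lineFirst.enter.length); // 1, so the line is appended
```

In the first case the line's datum is matched to the path of the x axis purely because both sit at index 0, which is why the enter selection is empty.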
Simply put, D3 is saying “the data you have given me is already represented by this path that is part of the axis, there is no need to add anything to the visualisation because of it”. This behaviour is intended and therefore there is no warning or error message. Traditional debuggers are not helpful either. Vida.io offers a D3-specific element inspector (available in Chrome only) that allows you to identify the code that added a specific element. This can help to figure out what’s going on in some cases.
5.1 Correct Selections
Apart from simply reordering the statements in the code, there are in principle two ways of fixing this problem. We can either modify the selection or the data matching. The idea behind the first is that we restrict the selection to those elements that we know are relevant, rather than everything of the same structure as what we are interested in. One way of doing so is to change the selector string to take the class of the elements into account as well as the type. You may have noticed that the line is already assigned the class “line” to be able to change its appearance in CSS. We can leverage this and change the selector to svg.selectAll("path.line"). This works as expected and appends the line, as the path elements that are part of the axis do not have this class.
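A minimal sketch of this fix, wrapped in a hypothetical helper called drawLine so that the dependencies are explicit: svg is assumed to be the D3 selection used in the example above, data the array describing the line, and line the path generator.

```javascript
// Hypothetical helper illustrating the class-restricted selector.
function drawLine(svg, data, line) {
  return svg.selectAll("path.line")  // only paths that carry the class "line"
      .data([data])
    .enter().append("path")
      .attr("class", "line")         // keep the class in sync with the selector
      .attr("d", line);
}
```

With this selector the order of execution should no longer matter, since the paths created by the axes are never part of the selection.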
An alternative approach along the same lines is to restrict the selection of elements through a subselection. DOMs are hierarchical structures and elements are only selected below the current root. By changing the root to a part of the DOM that does not contain the elements for the axis, we can avoid the selector choosing the “wrong” elements. This is done by simply appending a group element g to contain the line.
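The subselection variant might look roughly as follows. The helper name drawLineInGroup and the class name line-group are assumptions for illustration; svg, data, and line are the same objects as in the example above.

```javascript
// Hypothetical helper illustrating the subselection approach.
function drawLineInGroup(svg, data, line) {
  // A fresh group that contains none of the axes' elements.
  const lineGroup = svg.append("g")
      .attr("class", "line-group");

  // selectAll only looks below lineGroup, so the axes' paths are not selected.
  lineGroup.selectAll("path")
      .data([data])
    .enter().append("path")
      .attr("class", "line")
      .attr("d", line);

  return lineGroup;
}
```

Note that this sketch appends a new group every time it is called; for later updates you would keep a reference to the returned group and select within it rather than calling the helper again.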
The second approach changes the way data is matched to selection elements. That is, while the path elements that are part of the axis are still in the selection, D3 now knows that they do not represent the line we want to draw. This is done by passing a key function to .data() that compares not the index, but the actual data.
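One way to sketch this is shown below. The particular key function is an assumption chosen for illustration, not necessarily the one used in the full example: it gives the line's datum one key and everything else (such as the data bound to the axes' paths) another. The helper name drawLineWithKey is likewise made up.

```javascript
// Hypothetical helper illustrating a key function passed to .data().
function drawLineWithKey(svg, data, line) {
  // Assumed key: the line's own datum is "line", any other datum is "other".
  function key(d) {
    return d === data ? "line" : "other";
  }

  return svg.selectAll("path")
      .data([data], key)           // match by key instead of by index
    .enter().append("path")
      .attr("class", "line")
      .attr("d", line);
}
```

Because the key of the new datum matches none of the existing elements, it ends up in the enter selection even when the axes have already been drawn. The axes' paths are still part of the selection, though, so they land in the exit selection of this join.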
Of these two general approaches, the first (restricting the selection to the relevant elements) is usually better. The reason is that when changing the data matching, the path elements for the axes end up in the exit selection. When the visualisation is updated because new data becomes available, it is easy to accidentally remove these elements and part of the axes. Of course, the two methods can be combined as well.
5.2 Selective Problems
There is a whole class of problems that are related to this. For example, the selector string does not really matter at the beginning when there are no elements in the DOM, because the selection will always be empty. This may tempt you to specify just anything there, but that will cause problems later on when updating the visualisation. As a general rule, the selector should select those elements that are then appended for its enter selection. If, as in the example above, path elements with class line are appended, the selector should accordingly be “path.line”.
Another common problem is that not all of the three selections are handled. This is again not usually a problem at the beginning, but when updating the visualisation. If only the enter selection is operated on, everything will work great when setting up, but updating will have no discernible effect. As a general rule, all three selections should be handled.
6 Conclusions