Rapid Data Viz

Andrew Montalenti, CTO

What do we do?

_images/parsely.png

Parse.ly customers

_images/logos.png

Is online media special?

Websites have a variety of interesting "first-party" metrics:

Third-party metrics emerging

_images/social_icons.png

What about online journalism?

_images/pulse.png

Time series data

_images/sparklines_multiple.png _images/sparklines_stacked.png

Summary breakdowns

_images/summary_viz.png

Benchmark statistics

_images/benchmarked_viz.png

Information radiators

_images/glimpse.png

Contextual overlays

_images/extension.png

How do we do it?

_images/oss_logos.png

Parse.ly careers

_images/team_jobs.png

Agenda

Data Visualization Theory

Three people:

Edward Tufte

_images/et_dash.jpg

Tufte: Do Whatever It Takes

_images/minard.png

data-ink ratio, cognitive style, chartjunk

Bostock: Embrace Standards

_images/data_join.png

not just charts, data-document joins

Fry: It's a Process

_images/process_01.png _images/process_02.png

multi-disciplinary process, feedback loops, iteration

Chart Types (1)

_images/elements_01.png _images/elements_05.png _images/elements_06.png

Chart Types (2)

Paradox of choice?

_images/elements_02.png _images/elements_03.png _images/elements_04.png

Encoding Guide (1)

_images/viz_elements.png

Encoding Guide (2)

_images/elements_table.png

Dense Displays

_images/more_data.png

How to iterate?

_images/process_03.png

Tools for everything, but no dataviz REPL.

Or is there? Enter IPython Notebook, Pandas, the web.

pyrepl

Let's take a look at "pulse traffic time series".

_images/pulse.png

pandas

Data in my browser!

CONUNDRUM: Once I have some nice, clean, time series (or other) data rendering nicely in the IPython Notebook, how do I get it rendering nicely in the browser?

Options

d3-oriented Approach

d3

Data

_images/data_set.png

Documents

_images/data_values.png

Data-Driven Documents

_images/data_highlights.png

d3 scales

var data = [1, 2, 3, 4, 5];

var width = 200;
var height = 200;

var x = d3.scale
            .ordinal()
            .domain(data)
            .rangeBands([0, width]);
var y = d3.scale
            .linear()
            .domain([0, d3.max(data)])
            .range([0, height]);
var pct = d3.scale
            .linear()
            .domain([0, d3.max(data)])
            .range([0.4, 1]);

d3 scaling

y(1.7) // -> 68px
pct(1.7) // -> 60.4%
y(4.5) // -> 180px
pct(4.5) // -> 94%
x(5) // -> 160px
x.rangeBand() // -> 40px
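
To see where the numbers above come from, here is a minimal sketch of the arithmetic behind a linear scale and an ordinal band scale in plain JavaScript, with no d3 dependency. The helper names `linearScale` and `bandScale` are made up for illustration and are not part of d3; the band version assumes no padding, which matches `rangeBands([0, width])` as called on the previous slide.

```javascript
// Hypothetical helpers that mimic the arithmetic of d3.scale.linear()
// and d3.scale.ordinal().rangeBands() without padding. Not d3's own code.
function linearScale(domain, range) {
    var d0 = domain[0], d1 = domain[1];
    var r0 = range[0], r1 = range[1];
    return function (value) {
        // fraction of the way through the domain, mapped onto the range
        var t = (value - d0) / (d1 - d0);
        return r0 + t * (r1 - r0);
    };
}

function bandScale(domain, range) {
    // each domain value gets an equal-width band of the range
    var band = (range[1] - range[0]) / domain.length;
    var scale = function (value) {
        // left edge of the band; assumes value is present in the domain
        return range[0] + domain.indexOf(value) * band;
    };
    scale.rangeBand = function () { return band; };
    return scale;
}

var data = [1, 2, 3, 4, 5];
var x = bandScale(data, [0, 200]);
var y = linearScale([0, 5], [0, 200]);
var pct = linearScale([0, 5], [0.4, 1]);

console.log(y(1.7), pct(1.7), x(5), x.rangeBand());
```

The opacity scale starts its range at 0.4 rather than 0, so even the smallest bar stays visible.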

d3 drawing

var chart = d3.select("#container")
  .append("svg")
    .attr("class", "chart")
    .attr("fill", "steelblue")
    .attr("width", width)
    .attr("height", height)
  .append("svg:g");

chart.selectAll("rect")
    .data(data)
    .enter()
        .append("svg:rect")
            .attr("x", x)
            .attr("height", y)
            .attr("opacity", pct)
            .attr("y", function(d, i) { return height - y(d); })
            .attr("width", x.rangeBand());
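
The `.data(data).enter()` chain above is a data join. As a rough sketch of the idea, and not of d3's internals, the hypothetical `joinByIndex` function below pairs existing elements with data by position, which is how d3 joins when no key function is given. It reports which data need new elements (enter), which elements receive data (update), and which elements are left over (exit).

```javascript
// Hypothetical illustration of a data join by index; not d3's implementation.
function joinByIndex(elements, data) {
    var shared = Math.min(elements.length, data.length);
    return {
        // elements that already exist and have a matching datum
        update: elements.slice(0, shared).map(function (el, i) {
            return { element: el, datum: data[i] };
        }),
        // data with no element yet: what .enter() hands to .append()
        enter: data.slice(shared),
        // elements with no datum left: candidates for .remove()
        exit: elements.slice(shared)
    };
}

// First render: the container has no <rect> elements, so everything enters.
var first = joinByIndex([], [1, 2, 3, 4, 5]);
console.log(first.enter.length);

// Later render with less data: the trailing rects end up in exit.
var later = joinByIndex(["r0", "r1", "r2", "r3", "r4"], [10, 20, 30]);
console.log(later.update.length, later.exit);
```

On the first render the selection is empty, so all five values land in enter and each one gets an appended rect.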

Prototyping with d3

I built a tool called "webrepl" for this.

What about my data?

Need to convert Pandas DataFrame to JSON format of some sort.

Typically: data and labels.

Typically also a pain in the butt!
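
One way to ease the pain, sketched under assumptions: export the DataFrame on the Python side with `df.to_json(orient="split")`, which yields an object with `columns`, `index`, and `data` keys, then reshape it in the browser. The function name `splitToRecords`, the `label` field, and the sample payload are invented for illustration.

```javascript
// Hypothetical helper: turn pandas' "split" JSON layout into row objects.
function splitToRecords(payload) {
    return payload.data.map(function (row, i) {
        // the index entry becomes the row's label
        var record = { label: payload.index[i] };
        payload.columns.forEach(function (column, j) {
            record[column] = row[j];
        });
        return record;
    });
}

// Made-up payload in the shape df.to_json(orient="split") produces.
var payload = {
    columns: ["pageviews", "visitors"],
    index: ["2013-06-01", "2013-06-02"],
    data: [[120, 80], [150, 95]]
};

var records = splitToRecords(payload);
console.log(JSON.stringify(records[0]));
```

Each record carries both its label and its values, so it can be handed to a d3 data join directly.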

nvd3 add-on

nvd3 concepts

nvd3 graphs

_images/nvd3_graphs.png

nvd3 approach

Assumes a certain data format, typically an array of objects, one per series:

var data = [
    {"key": "data",
     "values": [
        1, 2, 3, 4, 5
     ]
    }
];

The values array will become your chart series data -- can use your own structure there.

An nvd3 model is basically a preset bundle of d3 scales, axes, labels, and data joins.
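
As a small sketch of producing that series layout, the hypothetical `toSeries` helper below wraps named columns of numbers into the array-of-series shape used on this slide. The helper is an assumption for illustration and is not part of nvd3.

```javascript
// Hypothetical helper: wrap named columns into an array of {key, values} series.
function toSeries(columns) {
    return Object.keys(columns).map(function (name) {
        return { key: name, values: columns[name] };
    });
}

var data = toSeries({ data: [1, 2, 3, 4, 5] });
console.log(JSON.stringify(data));
```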

nvd3 model

nv.addGraph(function() {
    // build nvd3 chart model
    var chart = nv.models.discreteBarChart()
        .x(function(d, i) { return i; })
        .y(function(d) { return d; })
        .tooltips(true)
        .showValues(true);

    // plain d3 code to do data-document binding
    d3.select('#chart svg').datum(data)
        .transition().duration(500)
            .call(chart);

    // nv utility for refreshing graph based on window size
    nv.utils.windowResize(chart.update);

    return chart;
});

nvd3 benefit

Still supports full power of d3, but gives you a starting point

_images/nvd3_bar.png

What is Vega?

_images/vega_website.png

Vega bar example (1)

var spec = {
    "width": 200,
    "height": 200,
    "data": [
        {
            "name": "table",
            "values": [
                {"x":"A", "y":1}, {"x":"B", "y":2}, {"x":"C", "y":3},
                {"x":"D", "y":4}, {"x":"E", "y":5}
            ]
        }
    ],
    // ...

Vega bar example (2)

"scales": [
    {"name": "x",
     "type": "ordinal",
     "range": "width",
     "domain": {"data":"table", "field":"data.x"} },
    {"name": "y",
     "range": "height",
     "nice": true,
     "domain": {"data": "table", "field": "data.y"} },
    {"name": "pct",
     "range": [0.4, 1],
     "nice": true,
     "domain": {"data": "table", "field": "data.y"} }
],
// ...

Vega bar example (3)

"marks": [
    {
        "type": "rect",
        "from": {"data": "table"},
        "properties": {
            "enter": {
                "x": {"scale": "x", "field": "data.x"},
                "width": {"scale":"x", "band": true, "offset": -1},
                "y": {"scale": "y", "field": "data.y"},
                "y2": {"scale": "y", "value": 0},
                "opacity": {"scale": "pct", "field": "data.y"}
            },
            "update": {
                "fill": {"value": "steelblue"}
            }
        }
    }
]
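
To make the `marks` properties less magical, here is a simplified sketch of how one property definition such as `{"scale": "y", "field": "data.y"}` could be resolved against a single datum. It illustrates the idea and is not Vega's implementation: `resolveProperty`, `getField`, and the hand-built `scales` object are hypothetical, and only the `value`, `field`, `scale`, `band`, and `offset` keys used on these slides are handled.

```javascript
// Hypothetical, simplified resolver for Vega-style property definitions.
function getField(datum, path) {
    // "data.y" -> datum.data.y
    return path.split(".").reduce(function (obj, key) {
        return obj[key];
    }, datum);
}

function resolveProperty(ref, datum, scales) {
    var value = ref.field !== undefined ? getField(datum, ref.field) : ref.value;
    if (ref.scale !== undefined) {
        var scale = scales[ref.scale];
        // "band": true asks for the band width instead of a scaled value
        value = ref.band ? scale.rangeBand() : scale(value);
    }
    if (ref.offset !== undefined) {
        value += ref.offset;
    }
    return value;
}

// Stand-in scales built by hand for a 200x200 chart with five bars.
var xs = ["A", "B", "C", "D", "E"];
var xScale = function (v) { return xs.indexOf(v) * 40; };
xScale.rangeBand = function () { return 40; };
// SVG y grows downward, so larger values map to smaller pixel positions
var yScale = function (v) { return 200 - (v / 5) * 200; };
var scales = { x: xScale, y: yScale };

var datum = { data: { x: "C", y: 3 } };
var rect = {
    x: resolveProperty({ scale: "x", field: "data.x" }, datum, scales),
    width: resolveProperty({ scale: "x", band: true, offset: -1 }, datum, scales),
    y: resolveProperty({ scale: "y", field: "data.y" }, datum, scales),
    y2: resolveProperty({ scale: "y", value: 0 }, datum, scales),
    fill: resolveProperty({ value: "steelblue" }, datum, scales)
};
console.log(rect);
```

The point of the declarative spec is that the same lookups are described as JSON data instead of being written as callback functions, which is what lets a Python library generate them.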

How does Vega work?

What is Vincent?

Vincent Graphs

_images/vincent_ipynb.png

vincent

vincent example

import vincent

# df is assumed to be a pandas DataFrame with a date index and one column per site
site_stack = vincent.StackedArea(df)
site_stack.axis_titles(x='Date', y='Pageviews')
site_stack.legend(title='Sites')
site_stack.display()

_images/vincent_stacked.png

My Tools

Step        Tools
acquire     pymongo, solr, apache pig
parse       python stdlib, custom tools
filter      ipython notebook, listcomps
mine        pandas
represent   matplotlib, vincent, nvd3
refine      d3, chrome inspector
interact    d3

Offline: I use PhantomJS to run the full stack, including d3.

Why is IPyNB so exciting?

New IPyNB dataviz utilities

Future Nirvana

My Use Cases

Authority Report

_images/authority_report.png

Extra Time?

Talk about new IPyNB comm capabilities.

Type Into Browser

Links:

  • parse.ly/jobs
  • parse.ly/authority

Contacts:

  • @amontalenti / @parsely

Questions? Tweet me!

This deck

Other resources