Things I wrote about things I made in 2016.
Since seeing Allison McCann and Mike Beuoy's Every NBA Team’s Chance Of Winning In Every Minute Across Every Game last year, I've been trying to think of ways of seeing an entire season of basketball at once. Beyond each team's average chance of winning over time, I was curious about what their distribution of chances looked like.
To squeeze distribution in, I had to make a couple of trade offs. Instead of being able to encode point differentials with vertical position like I did with my Golden State's win streak chart, I used color for point difference and saved vertical position for distribution. Since there have been more score differences (GSW was beating by MEM 52 at one point) than can be usefully encoded as unique colors, I bucketed the score differences into 7 colors. It isn't possible to see exact differences anymore without mousing over, but the charts still show when teams are ahead or behind by a lot or a little. To help convey that a collection of games are being shown, I also bucketed time into one minute segments. This hides second by second changes, but allows each game to be represented by a series of dots.
Showing an entire season - not just the first half shown here - might require using something other than dots to avoid a long vertical scroll. Stacked area charts, which also wouldn't require bucketing time, might work better? I'd also be curious to see how these charts would change if they showed winning percentage instead of the score differentials I had on hand.
I'm not sure where I first saw the key trick of this chart - showing a changing distributions by stacking colored objects. 538 and WSJ have both used the technique recently, but I wouldn't be surprised if there are much earlier examples.
Data from stats.nba.com. Scraping and chart code.
MotorIntelligence intelligence collects monthly sales figures for cars sold in the U.S. We did this as a follow up to our bigger car sales piece last year.
The interactive charts are made with D3. The Bali temple chart started as a simple bars showing the number of different models sold each year. I was curious about the distribution of sales and how sale of a model changed year to year, so we made it a bit fancier. To make it a little more visually interesting, call out some interesting info about different cars, and emphasized that models of crossovers were the unit of analysis, we also interspaced the images of cars through the small multiples on the bottom.
The two top charts were originally done with D3 and transitioned between each other on scroll. That interaction felt a little superfluous so we decided to make them static charts which also gave David a chance to clean them up a bit in Illustrator.
Businessweek put together two full page spreads interweaving charts and text about the prospect of another recession. To transform for the piece for the web, we initially considered using the new shape-outside style to wrap text around the charts. This ended up being a little too tricky to get working quickly and for different sized browsers, so we ended up going with a simpler design that put the charts into Grantland style footnotes. I wanted a shrunken down version of each chart to be used as the icon for each footnote, but all the small, different looking charts made the page look cluttered and they ended up being too tiny to see much. If each of the charts were simpler sparklines it might have worked.
Data from stats.nba.com. Charts are made with d3 - code on github.
Getting a rough version of the chart working once I had the data didn't take more than a few dozen lines of code. Progress after that was a bit slower.
I initially plotted total games v. total losses, which was a little less intuitive than total wins v. total losses. Turning the win v. loss representation 45 degrees has the nice property that x position also clearly encodes total games played.
Dealing with the rotated graph created a couple problems I didn't anticipate. The white space in the corners of the chart seemed like they needed to be filled, so I added some annotations. Positioning them in the rotated coordinated was slow and tedious. The presentation ended up in an awkward middle ground a sparse display of data (what I usually focus on) and something more like a magazine (which I don't have a ton of skill with yet).
If the chart wasn't rotated, I would have also used canvas instead of SVG to draw the games. Drawing 1000s of
rect elements can hang some browsers and probably should have been fixed. I got stuck spending too much cleaning up the rotation and wanted to publish before the game tonight though.
Data is from the FEC. https://projects.propublica.org/itemizer/ is a great alternative their pretty clunky website.
Charts are made with a bit of d3. For the last FEC release in October, we also used small multiples. This time around we decided to focus on super-PAC donations since we haven't gotten any data about them for the last 6 months. Instead of trying to cram a bunch of info into a small chart like I did last time, we went with a cleaner presentation of simple bars. Not having to worry about the display code was a nice bonus too. Committees typically release their filings late at night, so we're stuck trying trying to finish these charts after we're tired from keeping the overview table updated and dealing with the messy FEC data.
##Who Marries Whom?
Dorothy Gambrell made a full-page spread for Businessweek showing the most common marriages between occupations. The print version was too large and dense for the web, so I started with a clean slate design-wise. I wanted to try a more analytical approach and made a bunch of scatter plots looking at income, sex ratio, same-sex relationships and the distribution of other slices of data. There were some interesting tidbits, but nothing too compelling. Since the Valentine's Day deadline was coming fast, I decided to play to my strengths and go with more of a data-art take.
While exploring different ways of plotting the data, I was frustrated that the 500+ job titles took up so much space that they would only fit inside of tooltips. Which meant that reading the charts in their entirety required mousing over hundreds of points. I wanted to see if it would be possible to fit every job title on the screen, so I laid them out in a grid and shrunk the font. Once the type was small enough to see all the job titles, connecting the most common pairings with lines (just like in the print version) sort of suggested itself. Inverting the typical data/tooltip relationship — putting the data behind a mouse interaction and seeing all of the metadata normally in tooltips all at once — allows for quick scanning to find interesting jobs and serendipitous exploration.
My first byline at the nyt!
Charts are made with d3 - used v4 for the first time, with only a few minor hiccups (
.attrs isn't included in the default build; some of d3-jetpack's magic broke).
Data is mostly from the European Election Database. Finding sources for recent elections and reconciling them took way longer than I thought it would. Election data is surprisingly bad.
Idea and design draws heavily from Mira's Angry Voters insurgent parties graphic. Each of our country charts were originally much larger, but we kept squeezing and shrinking them down until all fit on a single screen — even on a phone! I also tried using the x scale to encode time directly, making the width of each rectangle proportional to the time between elections. That ended up being too busy and it wasn't clear how to size the latest election.
This started as a wonkier bubble chart plotting finals streaks over time. I sized the circles by average points score to emphasize well know players and fixed over-plotting by using a force layout. This version was a little hard for non basketball fans to read -- if you didn't already know when streaks occurred, you had to mouse over each of the bubbles to read names. Rodman, Kobe and other players with multiple streaks being represented by two bubbles was also a little confusing.
Archie Tse suggested transforming the y scale to stack each player on top each other and use grid lines instead of position to show the number of streaks. This created enough space to label each player. It also allowed us to condense the Celtics streaks into a '7+' category, which cleaned up a bunch of white space and move LeBron closer to the top of the chart.
I tried looking at other slices of basketball data to contextualize his streak. Two promising routes that I ran out of time to polish enough to publish: Strength of teams defeated in conference play (LeBron has had an easier path to the finals) and number of minutes played (LeBron has played a ton over the last 10 years).
Charts made with d3v4 (the new force module was super helpful for the bubble and beeswarm plots). Data from Basketball-Reference.
##This small Indiana county sends more people to prison than San Francisco and Durham, N.C., combined. Why? Hi! I'm one of the authors (Adam), ama.
This was the first thing I started working on at the nyt, back in April. Josh had already done a ton of leg work getting access to the data and researching interesting areas to explore.
I spent a couple of weeks exploring the data and making charts in R. NCRP has a row of data of data for almost every prisoner in America. It is personally identifying, so I had to remote into a windows vm w/o internet access to do all the analysis. States also don't report numbers to the Bureau of Justice Statistics consistently, so we had to contact individual states so figure out which numbers were accurate.
Studies of our criminal justice system typically use publicly accessible state level data from the National Prison Survey. This is significantly easier to work with, but using states as the unit of analysis masks substantial intrastate differences:
The stark disparities in how counties punish crime show the limits of recent state and federal changes to reduce the number of inmates. Far from Washington and state capitals, county prosecutors and judges continue to wield great power over who goes to prison and for how long. And many of them have no interest in reducing the prison population.
##Which Debate Clips Got Replayed the Most on CNN, Fox News and MSNBC We were curious how different channels covered the debate. Pundits are at their most influential post debate, strongly shaping how the public views the debate.
Starting with a list of replayed clips and their start times & durations, I tried bucketing clips by their start minute. It wasn't quite granular enough to see the interesting movements.
I also played with a sort of waterfall graph to show both the start and duration of clips. While a little strange, I really liked this approach and think it could have worked, we were running short on time (the data was only ready yesterday around 1 PM and there's another debate coming up soon). We ended up going with a simpler form that doesn't try to show each individual clip but still incorporates the duration data.
Charts are were made with d3, data from the Internet Archive.
##The Ways Out of World Group Stage I spent most of last weekend watching the first half of group play in worlds and I was curious which teams had the best shot of advancing to quarterfinals.
I worked on a piece for the upshot on eurocup advancement and wanted to try something similar for League of Legends.
Since there are still six remaining matches in each, there are a ton more possible outcomes. I would have liked to show more interaction between the possible outcomes. Since I couldn't come up with a good way of showing a 6d hypercube, I grouped primary by team with linked hover to give an idea of those interactions. Some kind of tree might also have worked.
##In the Race for Registered Voters, Republicans Are Gaining
Quick look at voter registration trends across a couple of key swing states. I would have liked to do more, but the some states like Virgina and Michigan don't have partisan voter registration and others don't provide monthly registration numbers.
I was a little slow putting everything together; 538 got out a similar piece a day earlier.
Checkout Gregor's thought's on his "deception"
##How Trump Can Influence Climate Change There wasn't a ton of data to work with here - just a couple of projections for 2025 emissions - so I showed some random numbers on the first slide to convey that the projections were uncertain and make the page more visually interesting. Hopefully less upsetting than the jittery dials on the election forecast page
We went back and forth a little bit on the scaling of the y axis. We didn't want domain of the axis to equal the extent of the projections since at first glance it'd look like emissions were going to 0. But starting the y axis at 0 wouldn't leave enough room to label the wedges. We ended up saving about 1/3 of the y scale for white space under the chart for the but still managed to extend the y axis to 0 (couldn't figure out how to do it on mobile though).
Data from [Climate Interactive](https://www.climateinteractive.org/blog/the-new-york-times-employs-climate-interactive-data-to-chart-out-how-trump-can-influence-climate-change