Category: DataViz Lecture Notes

3/30/2017 Lecture Notes

Quick announcement:

It’s our last class! Use this as a lab day to:

  • Work on Assignment 6 using NVD3.
  • Begin working on your final project. Feel free to run ideas by me.
  • Do your course evaluation for dataviz. Remember, it is completely confidential.

For those in New Tech for New Media:

  • We meet here next Tuesday.
  • Next Thursday is at the SU Makerspace with John Mangicaro. I will be at a conference in Italy so won’t be there.

But a couple last things:

Thanks for being part of the dataviz class!

3/28 Lecture Notes: D3 and Final Project

I. Propose your Final Project

II. D3.

What is it? Let’s see some examples and ideas for how to learn more about it.

D3, short for Data-Driven Documents, is emerging as a popular open-source JavaScript library for data visualizations. This is partly because its creator, Mike Bostock, uses it to create stunning visualizations for the New York Times as an interactive graphics editor. You can view a collection of D3 interactives Bostock has worked on at his personal web site.

D3 is way too advanced to get into in much detail in this course, but if you are interested you can start to go through some online tutorials on your own. But first, fair warning: If you don’t have a good base-level understanding of HTML, CSS and some JavaScript you are probably going to get stuck. A LOT. If that happens to you, don’t panic because you’re not alone. Just head over to a site like Codecademy and learn the basic skills you need there, then go back to where you got stuck on D3.

Here are several tutorials to try, starting with the easiest:

III. NVD3

As noted above, D3 is way too advanced to cover in much detail in this 5-week course.

Instead, we will use NVD3, which is a set of reusable charts that are rendered using D3. In terms of the raw output, you will notice that the examples look very similar to charts from Highcharts or other JavaScript chart libraries such as Dimple.

The difference is that by using NVD3 charts, you can start to just barely understand how D3 brings data together with HTML, Javascript and CSS to create visual experiences. Think of it like a gateway drug to D3. After going through this exercise, you should absolutely expect to feel hemmed in and wonder how you can do more. When you reach that point, that’s when you may want to jump into the tutorials above.

Getting Started With NVD3

The first step is to download the NVD3 code. Click the Zip button at the upper right corner of the NVD3 site, unzip the file and drag the resulting folder to your desktop.

Open the Examples folder, and you will see a different HTML file for each chart. The examples match up to those you see in the examples gallery. Find the file linePlusBarChart.html and drag it into an open browser window.

Let’s Examine Our Two-Axis Chart

You should see a chart with bars and lines, and two scales: one on the left and one on the right. This type of chart is useful for comparing two types of data that may relate to each other but are on completely different scales. For example, in this chart they’re comparing quantities in the millions of units with costs in the low hundreds of dollars. If you plotted both on a single scale in Excel, the line representing dollars would be squashed flat against the bottom of the chart. This NVD3 chart gives each set of data its own scale and sizes it to fit the data it’s given.

Getting Data Into the Chart

As with other Javascript libraries, this one accepts data directly in HTML. Let’s edit it!

First — and this is important — make a copy of that chart file and give it an original name. You will likely get tripped up the first time and need to start over. This way, you can play around with your new file and go back to square one if needed.

Next, open the file in a text editor such as TextWrangler or Sublime Text. You will see some HTML elements that should be familiar to you by now. Stop at line 49 where it says var testdata. See all the numbers below? That’s the array that the chart displays. Each point is a pair of numbers: the first sets where the point sits along the bottom of the chart, and the second sets the height of the bar.

You will also notice that there are two sets of arrays: one labeled “Quantity” and the other “Price.” Obviously, the array under Quantity controls the bars that are aligned to the scale on the left, and the array under Price controls the line that is aligned with the scale on the right.
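Here is a rough sketch of the shape of that data, in case it helps to see it stripped down. It is a shortened stand-in, not the full arrays from the example file, and the `bar: true` flag is my assumption about how the script marks the series that gets drawn as bars.

```javascript
// Shortened stand-in for the testdata variable: two series, two points each.
const testdata = [
  {
    key: "Quantity",   // label shown in the legend
    bar: true,         // assumed flag: draw this series as bars (left scale)
    values: [
      [1136005200000, 1271000.0], // [position along the bottom, bar height]
      [1138683600000, 1271000.0]
    ]
  },
  {
    key: "Price",      // drawn as the line, read against the right scale
    values: [
      [1136005200000, 71.89],
      [1138683600000, 75.51]
    ]
  }
];

// List each series with the number of points it holds.
const summary = testdata.map(function (series) {
  return series.key + ": " + series.values.length + " points";
});
console.log(summary.join("\n"));
```

Both series use the same first numbers, which is what lets the bars and the line sit side by side on one chart.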

Go back up to the Quantity array and look closely at each pair of numbers and you will see something like this:

     [ 1136005200000 , 1271000.0]

This format of numbers nested in square brackets follows JSON (JavaScript Object Notation), and you can read more about it at Json.org.

Compare those numbers with what’s in your chart, and you will eventually notice that the second number matches up with quantity. Great!

But what in the world is the number on the left? Do you have any guesses?

How Computers Calculate Dates

I’ll spare you the guesswork and just tell you. Those 13-digit numbers are dates that are formatted in a way that only computers understand. This is something that you would only be able to figure out by talking to a computer programmer, or doing some very specific web searches.

This will sound really bizarre, but those numbers represent the number of milliseconds (thousandths of a second) that have passed since midnight on January 1, 1970 (I know, it’s weird, just accept it). Counting time from that moment is true not just of this script, but of computers in general, although many systems count in whole seconds rather than milliseconds. Have you ever had your computer crash, or moved a bunch of files from one computer to another and noticed that the creation dates of the files are set back to 1969? Now you know why. Those files have lost their date, so the computer reads it as zero, and in U.S. time zones zero shows up as the evening of December 31, 1969.
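If you want to check this for yourself, here is a small sketch using JavaScript’s built-in Date, which reads a number like this as milliseconds counted from the start of 1970 in UTC. The variable names are mine, not from the NVD3 file.

```javascript
// Turn one of the chart's 13-digit numbers back into a readable date.
const stamp = 1136005200000;      // first number in the pair shown earlier
const asDate = new Date(stamp);   // read as milliseconds since the start of 1970 (UTC)

const readable = asDate.toISOString();     // "2005-12-31T05:00:00.000Z"
const startOfCount = new Date(0).toISOString(); // "1970-01-01T00:00:00.000Z"

console.log(readable);
console.log(startOfCount);
```

The 05:00 in the first result is UTC, which works out to midnight Eastern Standard Time on December 31, 2005.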

This means that if you want to input a set of date-value pairs for your chart, you need to convert every date into that format. This is where Excel can help you. Open an Excel spreadsheet, or download this one, and add two columns: One with dates in MM/DD/YY format, one with the values you want to display for each date. In the third column, enter this formula:

 =(A1-DATE(1970,1,1))*86400*1000

Make sure the A1 is referencing the date cell in the same row (so if you have column headers, you would probably want that to be A2). You will see a 13-digit number. Copy that formula all the way down the column. Now you have a list of dates the script can read.
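If you would rather skip the spreadsheet, the same conversion can be sketched in JavaScript. This is only an illustration: the function name is made up, and it assumes every two-digit year belongs to the 2000s.

```javascript
// Convert a date written as MM/DD/YY into the 13-digit number the chart expects.
// Assumption: two-digit years all belong to the 2000s.
function toTimestamp(mmddyy) {
  const parts = mmddyy.split("/").map(Number);
  const month = parts[0];
  const day = parts[1];
  const year = 2000 + parts[2];
  // Date.UTC counts months from 0, so January is 0.
  return Date.UTC(year, month - 1, day);
}

const dates = ["01/01/14", "01/02/14", "01/03/14"];
const stamps = dates.map(toTimestamp);
console.log(stamps); // [1388534400000, 1388620800000, 1388707200000]
```

Each result is midnight UTC on that date, so a browser set to a U.S. time zone may label the point as the evening before.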

Add your data to the script

The final step involves a lot of copying and pasting, or formatting in a text editor. You want to get all of your machine-readable date and data pairs into that script in exactly the same format as what you see, and remove all of the other data. If you are very careful and have a small amount of data you may be able to do it on the first try, but it’s very likely that you will have typos.

At the end, your array should look something like this:

[ [ 1388534400000,4234] , [ 1388620800000,234] , [ 1388707200000,53] , [ 1388793600000,3634] , [ 1388880000000,434] , [ 1388966400000,6433] , [ 1389052800000,43454] , [ 1389139200000,4354] , [ 1389225600000,54] , [ 1389312000000,34534] ]

The array under Price will look similar, but with different sets of numbers depending on the data you’re comparing.
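Since typos are so likely, a quick check can save some head-scratching. The sketch below is a hypothetical helper, not part of NVD3. It flags entries that are not pairs, dates that are not 13 digits (which assumes your dates fall after 2001), and dates that show up twice.

```javascript
// Check a pasted array for common slip-ups before putting it in the chart.
function findProblems(pairs) {
  const problems = [];
  const seen = {};
  pairs.forEach(function (pair, i) {
    if (!Array.isArray(pair) || pair.length !== 2) {
      problems.push("entry " + i + " is not a [date, value] pair");
      return;
    }
    const date = pair[0];
    const value = pair[1];
    if (typeof date !== "number" || String(date).length !== 13) {
      problems.push("entry " + i + " has a date that is not a 13-digit number");
    }
    if (typeof value !== "number" || isNaN(value)) {
      problems.push("entry " + i + " has a value that is not a number");
    }
    if (seen[date]) {
      problems.push("entry " + i + " repeats the date " + date);
    }
    seen[date] = true;
  });
  return problems;
}

// Sample with two deliberate mistakes: a repeated date and a 12-digit date.
const pasted = [
  [1388534400000, 4234],
  [1388620800000, 234],
  [1388620800000, 53],
  [138870720000, 3634]
];
const problems = findProblems(pasted);
console.log(problems);
```

An empty list means the array passed these particular checks; it does not guarantee the chart will draw.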

Finally, change the text next to “Key” above each array to whatever you are comparing. Drag your HTML file into an open browser window and you should see your data in there.

(Wondering how to get a lot of data into JSON format? There are converters online, like this one.)
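If you are curious what such a converter does, here is a minimal sketch. It assumes your spreadsheet was saved as CSV with a header row followed by one timestamp and one value per line; the function name and sample text are made up for illustration.

```javascript
// Turn CSV text into the nested array format the chart uses.
// Assumes a header row, then one "timestamp,value" pair per line.
function csvToPairs(csvText) {
  return csvText
    .trim()
    .split(/\r?\n/)   // handle both Windows and Mac/Unix line endings
    .slice(1)         // skip the header row
    .map(function (line) {
      const cells = line.split(",");
      return [Number(cells[0]), Number(cells[1])];
    });
}

const csv = "timestamp,value\n1388534400000,4234\n1388620800000,234\n";
const pairs = csvToPairs(csv);
const json = JSON.stringify(pairs);
console.log(json); // [[1388534400000,4234],[1388620800000,234]]
```

This simple version will trip on values that contain commas, so it only suits plain columns of numbers.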

Upload and Embed

As with every other code-based visualization, the final step is to upload the NVD3 folder to a web server, navigate to your chart file on the server, and put its URL into an iframe tag like this:

<iframe src="YOURURLHERE" width="100%" height="480"></iframe>

Put that code into a blog post along with a story explaining the data.

IV. NVD3, Assignment 6

Assignment 6 is to work on your own using your own data. Feel free to use one of the other charts at NVD3.org.

Due: Tuesday, April 4.

V. Remember to do course evaluation.

Please spend the last 15 minutes doing it in class.

3/23/17 Lecture Notes

Start working on Assignment 5. Here are some data sources you may want to peruse.

3/21/17 Lecture notes: Sortable tables

12:30: Housekeeping:

  • Who’s finished Assignment 4? Let’s see.
  • Sundance event on Thursday/Friday!!! It will be very cool, you should come.
  • Special invite for VR Journalism workshop: 5-7 p.m. Thursday in Innovation Lab for journalists and VR creators. Please RSVP here if you plan to come so I can get enough pizza:

12:45: Sortable data tables

  • Tabletop, what I used in the first class. Displays data from Google spreadsheets.
  • See how one past student used it in his final project, near the bottom.
  • Some coders, like Chris Keller, have modified it for journalism. Here’s an example of a sortable, searchable table.
  • Where do you get the code? Github of course!
  • Download the code. Then …

1-1:30: How to make it work

  • Make a Google spreadsheet.
  • Give it a name with no spaces. Change the tab label to the same name, also with no spaces.
  • Share it, and make it public to the web.
  • Publish it (File > Publish to the web). Yes, you must both share and publish it to make the data available.
  • Take the URL and save it somewhere.
  • Download the Tabletop for Data Tables code and unzip it. Note that the files you need are in a second zip inside the first one.
  • Open the Scripts folder, then open tabletop-feed.js in Sublime Text.
  • At the very top, find initializeTabletopObject and put your URL into the place after it, being careful not to remove any necessary syntax.
  • Remove everything except for the “key” which is the long list of characters near the end, before /pubhtml.
  • Scroll down to what you see in var tableColumns (around line 28).
  • Look at your spreadsheet column names. Copy and paste them into the field after “sTitle”. This is what will be displayed in the column on your site.
  • Then, in the field next to mDataProp, paste the same name, but remove all capitals and spaces. This is the variable that pulls from the Google sheet.
  • Finally, you will need to remove some logic at the end that makes the last column display a web site link. We will go over how to do this, and what the code means, in class.
  • Edit the top of the HTML file to remove the sample text, then preview the HTML. You should see a searchable sortable table.
  • The last step? Upload into the class FTP and embed.
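The two fiddly parts of those steps, pulling the key out of the URL and filling in the column names, can be sketched like this. Both helper functions are hypothetical (they are not in tabletop-feed.js), the URL is a placeholder, and I am assuming the key is the chunk of the URL right after /d/.

```javascript
// Pull the spreadsheet "key" out of a published Google Sheets URL.
// Assumes the key is the path segment right after "/d/".
function extractKey(url) {
  const match = url.match(/\/d\/([^\/]+)/);
  return match ? match[1] : null;
}

// Build one column entry per spreadsheet header.
// sTitle is the label shown on the page; mDataProp is the same name
// in lowercase with the spaces removed.
function buildColumns(headers) {
  return headers.map(function (header) {
    return {
      sTitle: header,
      mDataProp: header.toLowerCase().replace(/\s+/g, "")
    };
  });
}

// Placeholder URL, not a real sheet.
const exampleUrl = "https://docs.google.com/spreadsheets/d/EXAMPLEKEY123/pubhtml";
const key = extractKey(exampleUrl);
const columns = buildColumns(["Bridge Name", "County"]);

console.log(key);
console.log(columns);
```

Comparing the output with what you typed by hand into var tableColumns is a quick way to spot a stray capital letter or space in mDataProp.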

1:30-1:50. Phew, that was a lot!

So now work on it and try and fail and try and fail and try again until you can get a table to appear. I will go around to help you 🙂

Feel free to use data from previous exercises or assignments, such as New York State bridges or any data from the NY State open data site.

 

 

3/9 Lecture Notes: Highcharts Part 2

12:30-1:15 Highcharts, part 2

  • Downloading and modifying locally.
  • Previewing and preparing for upload.
  • Embedding after upload.
  • See if you can recreate this chart on Business Insider.

1:15-1:50 Lab

3/7 Lecture Notes: Highcharts

12:30-12:50: Assignment 3 review

Remember: you need to embed the code from timeline.knightlab.com, not the Google Spreadsheet.

A few good ones:

Who rocked it with code?

12:50-1:20: Highcharts

  • Highcharts demonstration.
  • Downloading code
  • Finding a chart you like
  • Customizing the data
  • Load locally
  • Upload to FTP
  • Embed in a blog post

1:20-1:50: Make a sample chart

Reminder of how to FTP and embed:
http://journovationsu.org/class-ftp/

3/2 Lecture Notes: Timelines and StoryMap

SAD UPDATE: CartoDB seems to have gone through a significant upgrade that removes some of the most useful features that work for journalistic stories, so we are (sadly) removing it from the schedule completely. It’s too bad because it used to be a really great mapping tool.

If you made a map using their tutorials, send me a link and I will give you extra credit. Meanwhile, please see the revised class schedule.

Moving on ….

12:30-12:45: Let’s look at timelines!

12:45-1: StoryMap

1-1:20: Another type of Timeline, this one vertical using open source code.

1:20-1:50: Work on your timeline or story map for assignment 3.

  • Assignment 3 is to make a timeline or storymap about a story you are interested in. Embed it in the class blog.
  • For extra credit, create a vertical timeline, FTP it into the class server and embed that into your post instead.

 

2/28 Lecture Notes: No-Code Dataviz tools, CartoDB

12:30: Housekeeping

12:31-12:50: Guest: Jodi Upton, Knight Chair in Data and Explanatory Journalism.

12:50-1: Your Infogr.ams (assignment 2).

1-1:40: Other no-code dataviz tools

But first, HTML! You can go through these three self-guided HTML teaching tools on your own.

Simple No-Code Dataviz Tools

Here are the ones we will go over today:

1) Infogr.am (http://infogr.am)

Infogr.am is an easy way to create simple charts and graphs, as well as scrolling infographics that you may notice people posting in places like Facebook and Tumblr. For journalism I think the graphs and charts work best, because you can embed them directly into stories to visually explain something you are reporting in text.

Take note of the Graphs and Charts tab at the top. Click the Charts tab to see all the different types of charts you can use. Choose your visualization type, double click different parts of the interface to edit them, and copy and paste your data in. If you have trouble copying data from a web site, try starting from a summary sheet you make in Excel.

Click “Share” and copy the iFrame code at the bottom to embed into the blog. If you find that your chart is too wide for the blog post, you have two choices.  You can manually change the width and height variables in the HTML code you copied, being very careful not to change anything other than those numbers, or you can try the “responsive” code which will make your chart shrink or stretch based on the width of the page where it’s embedded. The second option is good if you think your chart will be viewed by people on mobile devices, but be sure to test it out from a mobile device to be sure.

Here’s an example of a chart I made in Infogr.am using data from a previous exercise.

Cardiac Deaths in Central New York Hospitals in 2010

 

 

 

2) Easel.ly (http://easel.ly)

Think of Easel.ly as a quick infographic creator. You find a template you like, then start to manually edit it and add graphics from a built-in library. Charts can also be added and edited as spreadsheets, similar to Infogr.am.

 

3) Google Fusion Tables (http://fusiontables.google.com)

Google Fusion Tables turns columns and rows of numbers in spreadsheets into visualizations. Once signed in, import a spreadsheet in .xls or .csv format (not .xlsx, the newer Excel format). Make sure your spreadsheet has column headers, or it won’t work.

Sometimes Google will add a tab that it calls a “card” that is the best choice for your data — for example, a “Map of latitude” will appear if your data includes geocoordinates. If you see a card that works for you click it and see how it appears. If not, click the + sign to the right of that tab and choose Add a Chart. Click Done when your chart is set up the way you want it.

To publish your chart, you have to do two things:

1. Click the Share button at the very upper right of the browser, then “Change” next to Private under “Who can Access.” Select “Public on the Web” and then Save and Done.

2. Go to the Tools menu and choose Publish. You will see iframe tags here that you can embed in your blog post.

Here’s an example of a chart from Google Fusion Tables:

 

4) StoryMap (https://storymap.knightlab.com/)

From the Knight Lab at Medill, StoryMap lets you tell a story that’s broken up by points on a map. You can also use it to tell a story that moves across something that isn’t a map at all, such as a very detailed painting. Think of it like a timeline that takes place on a gigantic picture.

 

5) Google Maps (http://maps.google.com)

Quick overview of how Google Maps works if you don’t know already. Sign into Google, then click the three lines at upper left and choose Your Places. Click Maps, then Create a Map. To embed it, you must first click the Share button in edit mode and change the access to “Public on the Web.” Then you can choose Embed.

 

6) CartoDB – Prepare for Thursday (http://cartodb.com)

On Thursday we will crack open CartoDB, which is a powerful mapping service that lets you tweak some of the interface and mess with the code a little. You can think of it as the gateway drug to other dataviz tools we will use that do require you to mess around with code. You can get a head start on how to use CartoDB through these free video tutorials on their web site.

Today I will show how we can create a map of the bridges in New York State that are in need of repair.

 

2/23: Dataviz Class 2 Notes

Housekeeping

I. Excel
First we’ll go through some basic features of Excel and how to create formulas.

  • Adding information as data versus text.
  • When to save as XLS and CSV, and the difference.
  • Columns and rows.
  • Formula: using the equal sign for functions. Basic math.
  • Sum columns or rows.
  • Format cells to change cell type (text, number).
  • Making simple charts in Excel.
  • Adding formulas. Common formulas:  adding, subtracting, dividing, multiplying, summing.
  • Sorting and the importance of noting your list has headers.
  • Filtering.

II. Sorting and filtering NYS bridge data

  • Download and unzip this data set of 51,000 bridges in New York State. bridges_blanksremoved.csv.
  • Open it in Excel. Choose File > Save As > XLS.
  • Scroll right until you find the column “critfrac” (column DL) which stands for critical fracture. A y12 or y24 means outdated design, so a single solid hit can bring the entire bridge down.
  • Next, find the column “suffrtno” in column FC, which stands for Sufficiency Rating. Anything under 50 is considered dangerous.
  • Also note the “totlcost” (total cost to fix in thousands of dollars) in column DV, and “avdayno” (average daily traffic) in column AK.
  • Questions:
  1. How many bridges are in danger of collapsing due to critical fracture?
  2. How many bridges have an inadequate sufficiency rating?
  3. How many have both bad critical fracture and sufficiency rating numbers?
  4. How much traffic goes over the bridges with both bad critfrac and suffrtno ratings?
  5. How much will it cost to fix the bridges with both bad critfrac and suffrtno ratings?

Go through this yourself, then let’s review the answers and how to get them.
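Once you have worked through the questions in Excel, it can help to see the same filter-and-sum logic written out as code. This is only a sketch: the four rows are invented stand-ins for the spreadsheet, and the only things taken from the real data are the column names and the rules (y12 or y24 for critfrac, under 50 for suffrtno).

```javascript
// Invented rows standing in for the bridge spreadsheet; values are made up.
const bridges = [
  { critfrac: "y12", suffrtno: 42.5, totlcost: 1200, avdayno: 8000 },
  { critfrac: "",    suffrtno: 38.0, totlcost: 600,  avdayno: 1500 },
  { critfrac: "y24", suffrtno: 71.3, totlcost: 0,    avdayno: 22000 },
  { critfrac: "y24", suffrtno: 12.0, totlcost: 3400, avdayno: 500 }
];

// The two filters, written the way you would set them in Excel.
function isFractureCritical(b) {
  return b.critfrac === "y12" || b.critfrac === "y24";
}
function isInsufficient(b) {
  return b.suffrtno < 50;
}

// Add up one column across a set of rows.
function sum(rows, column) {
  return rows.reduce(function (total, row) {
    return total + row[column];
  }, 0);
}

const fracture = bridges.filter(isFractureCritical);      // question 1
const insufficient = bridges.filter(isInsufficient);      // question 2
const both = bridges.filter(function (b) {                // question 3
  return isFractureCritical(b) && isInsufficient(b);
});
const trafficOnBoth = sum(both, "avdayno");               // question 4
const costOfBoth = sum(both, "totlcost");                 // question 5, in thousands of dollars

console.log(fracture.length, insufficient.length, both.length);
console.log(trafficOnBoth, costOfBoth);
```

Applying both filters at once in Excel is the same as the `both` step here: a row has to pass each test to stay visible.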

III. Acquiring Data

  • New York State public data.
  • Getting data: formats to look for (CSV, JSON).
  • What if you get a big, fancy Excel document? How to dumb it down to a CSV.
  • Copying and pasting data as values versus as formulas.
  • Copying data from HTML tables.

IV. Sorting and Filtering Data

  • Show how to sort and filter data in Excel.

V. Sorting and Filtering Exercise

Assignment 2: Find some interesting data from the New York State open data site. Use sorting and filtering to home in on some interesting and easily digestible data points that could be used in a story. Create a graph of the data you find in Infogr.am. Create a blog post that includes a link to the raw data from the NYS site, explains how you filtered it, and embeds the Infogr.am chart. Due Monday morning.

Dataviz – Class 1 Notes

Tues 2/21 Class: Welcome!

I. Welcome! An Introduction to using data visualization to tell stories.
– Welcome presentation

– Let’s look at a few more dataviz examples.

II. Class Blog, Twitter hashtag
Most of the assignments will be filed by embedding widget code into the class blog at http://journovationsu.org. And you have an account! Walkthrough of how it works.

III. Exercise: And now for a little magic. Introduce yourself through data!

1. Open this URL. How did that data get in there? Let’s find out together.

2. Google yourself and find an image that is publicly available on the web. Right-click the image and get the URL.

3. Go to this Google spreadsheet and fill out your info. Put your image URL into the correct column, and put this code around it:

     <img src="YOURURLHERE" width="200" />

(Note: type this in, don’t copy and paste from this page).

4. Take a look at this URL or the home page to see class data populating in real time.

Congrats! You Participated in a Data Visualization
You not only introduced yourself to the class, but you participated in your very first interactive data-driven visualization. The data is all in a Google spreadsheet, and some free JavaScript and jQuery code called Tabletop.js, which we will use in a future class, pulls all of that data into a web page. Try changing any of your information and you will see that the public web roster updates in real time.

IV. Excel

We will go through some basic features of Excel, and formulas.

  • Adding information as data
  • Add a formula
  • Columns and rows.
  • Formula: using the equal sign for functions. Basic math.
  • Sum columns or rows.
  • Select an area.
  • Format cells to change cell type (text, number).
  • Making charts in Excel.
  • Common formulas:  adding, subtracting, dividing, multiplying, summing.
  • CSV format versus native Excel format.
  • Sorting.
  • Filtering.

V. Putting it into practice

  • Bridges across the country are badly in need of repair, and it can literally be a life or death issue. Here’s more about that.
  • Download and unzip this data set of 51,000 bridges in New York State. bridges_blanksremoved.csv
  • Open it in Excel. Scroll right until you find the column “critfrac” which stands for critical fracture. A y12 or y24 means outdated design, so a single solid hit can bring the entire bridge down.

Assignment 1: Register for the class blog, fill out this survey. Due before next class.

Assignment 2: To be given Thursday.