2/15/2018: Factba.se and Leaflet
February 15, 2018
12:30-1: Bill Frischling of Factba.se
- What is it? Check out the Leaflet.js site.
- Why use this instead of Google Maps?
- Take a look at the map I created this morning. It is embedded below, too.
- Let’s go through the tutorial I used to make this map. Please try to follow along!
- Start by downloading this simple HTML template and opening it in Sublime Text.
- Note: you may need to use this div code instead to make the map appear:
- You may need to get a Mapbox ID, then register for a token. Put that token into the “accessToken” part of the code when you get to that step. For now, try using this one: pk.eyJ1IjoicGFjaGVjb2QiLCJhIjoiLWk3WnpUTSJ9.MoCPhEjLDP6_wyjHjFDIvg
- Things not working? Feel free to work backwards from this workingmap.html.
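For reference, here is a minimal sketch of the kind of JavaScript a Leaflet map needs. It assumes the page already loads the Leaflet script and stylesheet and contains a div with the id "map" and a fixed height. The function name buildMap, the coordinates, the popup text, and the Mapbox tile URL and style id are illustrative assumptions, not values from the class template, so copy the exact tile settings from the tutorial you are following.

```javascript
// Sketch only: builds a Leaflet map inside the element with the given id.
// `L` is the global object that leaflet.js defines; it is passed in as a
// parameter so the function is easy to try out with a stand-in object.
function buildMap(L, elementId, accessToken) {
  // Approximate coordinates for Syracuse, NY (an assumption), zoom level 13.
  var map = L.map(elementId).setView([43.05, -76.15], 13);

  // The tile URL and style id below are assumptions; use your tutorial's values.
  L.tileLayer(
    "https://api.mapbox.com/styles/v1/{id}/tiles/{z}/{x}/{y}?access_token={accessToken}",
    {
      attribution: "Map data © OpenStreetMap contributors, Imagery © Mapbox",
      maxZoom: 18,
      id: "mapbox/streets-v11",
      tileSize: 512,
      zoomOffset: -1,
      accessToken: accessToken
    }
  ).addTo(map);

  // One clickable marker with a popup.
  L.marker([43.05, -76.15]).addTo(map).bindPopup("Hello from class!");

  return map;
}

// In the browser, after leaflet.js has loaded:
// buildMap(L, "map", "YOUR_MAPBOX_TOKEN");
```

If the map area stays blank, the usual suspects are a map div with no height set in CSS or a token that was not pasted in.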
1:30-1:50: Work on last few assignments.
- Assignment 6: NVD3: due Sunday night
- Assignment 7: Leaflet: due Sunday, Feb. 25
- Assignment 8: Propose your final project: due Thursday, Feb. 22
Also, don’t forget:
- Independent learning blog post: 20 points of grade!!!
- Grad students must publish a data visualization professionally somewhere, OR create a web site on your own domain name and web host and post your final project there.
Here is the example map from above embedded into a post.
2/13/2018 Lecture Notes: D3 and NVD3
February 12, 2018
- Course evaluations: please make sure you do them. Unfortunately you have to do one for each of the 3-pack classes. This is a pain, but I really value your feedback, and remember that it is all anonymous.
- #NHData Finds.
- Review of assignment 5.
- Where did you fail? How did you get out of the sand traps? What did you learn?
- A few great examples:
- Amanda Dominguez: Drug overdoses
- Zachary: Fake News Twitter accounts
- Tiffany: Domain names of Donald Trump
- Maria: Restaurant health code violations. Excellent work with both the map and the chart!
What is D3? Let’s see some examples and ideas for how to learn more about it.
D3 (short for Data-Driven Documents) is emerging as a popular open-source library for data visualizations. This is partly because its creator, Mike Bostock, uses it to create stunning visualizations for the New York Times as an interactive graphics editor. You can view a collection of D3 interactives Bostock has worked on at his personal web site. Here are a few good ones:
- Midterm election maps from 2014
- Interactive stories from the last Winter Olympics
- Tax rates across the country
- See examples of different types of visualizations here.
Here are several tutorials to try, starting with the easiest:
D3 is way too advanced to get into in much detail in this 5-week course, but if you are interested you can start to go through some online tutorials on your own.
Getting Started With NVD3
The first step is to download the NVD3 code. Click the Zip button at the upper right corner of the NVD3 site, unzip the file, and drag the resulting folder to your desktop.
Open the Examples folder, and you will see a different HTML file for each chart. The examples match up to those you see in the examples gallery. Find the file linePlusBarChart.html and drag it into an open browser window.
Let’s Examine Our Two-Axis Chart
You should see a chart with bars and lines, and two scales: one on the left, and one on the right. This type of chart is useful for comparing two types of data that may relate to each other but are on completely different scales. For example, in this chart they’re comparing quantities in the millions of units with costs in the low hundreds of dollars. If you tried to plot both on a single scale in Excel, the line representing dollars would be too flat to display anything. This NVD3 chart auto-scales each axis based on the data it’s given.
Getting Data Into the Chart
First — and this is important — make a copy of that chart file and give it an original name. You will likely get tripped up the first time and need to start over. This way, you can play around with your new file and go back to square one if needed.
Next, open the file in a text editor such as TextWrangler or Sublime Text. You will see some HTML elements that should be familiar to you by now. Stop at line 49 where it says var testdata. See all the numbers below? That’s the array that the chart displays. Each point consists of a pair of numbers that determines the height of the bar.
You will also notice that there are two sets of arrays: one labeled “Quantity” and the other “Price.” Obviously, the array under Quantity controls the bars that are aligned to the scale on the left, and the array under Price controls the line that is aligned with the scale on the right.
Go back up to the Quantity array and look closely at each pair of numbers and you will see something like this:
[ 1136005200000 , 1271000.0]
Compare those numbers with what’s in your chart, and you will eventually notice that the second number matches up with quantity. Great!
But what in the world is the number on the left? Do you have any guesses?
How Computers Calculate Dates
I’ll spare you the guesswork and just tell you. Those 13-digit numbers are dates that are formatted in a way that only computers understand. This is something that you would only be able to figure out by talking to a computer programmer, or doing some very specific web searches.
This will sound really bizarre, but those numbers represent the number of milliseconds that have passed since midnight UTC on January 1, 1970 (I know, it’s weird; just accept it). This is true not just of this script but of JavaScript in general, and many other systems count seconds from the same starting point. Have you ever had your computer crash, or moved a bunch of files from one computer to another, and noticed that the creation dates of the files are set back to 1969? Now you know why. The files have no date, so the computer treats the date as zero, which is the very start of 1970 in UTC and therefore the evening of December 31, 1969, in U.S. time zones.
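To check this for yourself, here is a small sketch you can paste into the browser console. It takes the 13-digit number from the pair shown earlier and turns it back into a readable date. The variable names are just for illustration.

```javascript
// The first number from the pair shown in these notes.
var stamp = 1136005200000;

// JavaScript dates count milliseconds since midnight UTC on January 1, 1970.
var asDate = new Date(stamp);

// toISOString always reports UTC, so the result does not depend on your time zone.
var isoText = asDate.toISOString();
console.log(isoText); // 2005-12-31T05:00:00.000Z

// A timestamp of zero is the starting point itself.
var epochText = new Date(0).toISOString();
console.log(epochText); // 1970-01-01T00:00:00.000Z
```

The 05:00 in the first result is midnight in U.S. Eastern time, which suggests the sample data was created in that time zone.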
This means that if you want to input a set of date-value pairs for your chart, you need to convert every date into that format. This is where Excel can help you. Open an Excel spreadsheet, or download this one, and add two columns: One with dates in MM/DD/YY format, one with the values you want to display for each date. In the third column, enter this formula:
Make sure the A1 is referencing the cell directly to the left (so if you have column headers, you would probably want that to be A2). You will see a 13-digit number. Copy that formula all the way down the column. Now you have a list of dates the script can read.
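If you would rather do the conversion in code than in a spreadsheet, the sketch below shows one way. It is not the class formula, just an alternative. The function name toEpochMillis is made up for this example, and it assumes two-digit years fall in 2000-2099 and that you want midnight UTC for each date.

```javascript
// Convert a date written as MM/DD/YY into milliseconds since January 1, 1970 (UTC).
// Assumption: two-digit years mean 2000-2099.
function toEpochMillis(mmddyy) {
  var parts = mmddyy.split("/").map(Number);
  var month = parts[0];
  var day = parts[1];
  var year = 2000 + parts[2];
  // Date.UTC counts months from 0, so January is 0.
  return Date.UTC(year, month - 1, day);
}

console.log(toEpochMillis("01/02/14")); // 1388620800000
```

A spreadsheet formula may give numbers that differ by a few hours' worth of milliseconds if it works in local time rather than UTC. For a daily chart that difference usually does not matter.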
Add your data to the script
The final step involves a lot of copying and pasting, or formatting in a text editor. You want to get all of your machine-readable date and data pairs into that script in exactly the same format as what you see, and remove all of the other data. If you are very careful and have a small amount of data you may be able to do it on the first try, but it’s very likely that you will have typos.
At the end, your array should look something like this:
[ [ 1388620800000,4234] , [ 1388620800000,234] ,[ 1388707200000,53] , [ 1388793600000,3634] , [ 1388880000000,434] , [ 1388966400000,6433] , [ 1389052800000,43454] , [ 1389139200000,4354] , [ 1389225600000,54] , [ 1389312000000,34534] ]
The array under Price will look similar, but with different sets of numbers depending on the data you’re comparing.
Finally, change the text next to “Key” above each array to whatever you are comparing. Drag your HTML file into an open browser window and you should see your data in there.
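Here is a rough sketch of how the finished data structure fits together, built from a few rows instead of typed by hand. The rows are hypothetical placeholders, the helper names are made up, and the bar flag is my recollection of how the NVD3 example marks the series drawn as bars, so compare it against your copy of the file.

```javascript
// Hypothetical rows: [date as MM/DD/YY, quantity, price]. Replace with your own data.
var rows = [
  ["01/02/14", 234, 12.5],
  ["01/03/14", 53, 13.0],
  ["01/04/14", 3634, 11.75]
];

// Same idea as the date conversion step: MM/DD/YY to milliseconds (UTC).
function toMillis(mmddyy) {
  var p = mmddyy.split("/").map(Number);
  return Date.UTC(2000 + p[2], p[0] - 1, p[1]);
}

// Build one series: a key (the label) plus an array of [timestamp, value] pairs.
function buildSeries(key, valueIndex, isBar) {
  var series = {
    key: key,
    values: rows.map(function (row) {
      return [toMillis(row[0]), row[valueIndex]];
    })
  };
  if (isBar) {
    series.bar = true; // assumed flag for the series that is drawn as bars
  }
  return series;
}

var testdata = [buildSeries("Quantity", 1, true), buildSeries("Price", 2, false)];
console.log(JSON.stringify(testdata));
```

Generating the pairs this way avoids the missing-comma and missing-bracket typos that hand editing tends to produce.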
(Power Tip: Wondering how to get a lot of data into JSON format? There are converters online, like this one. If you use it, upload a CSV, choose commas as the separator, make sure “First row is column names” is selected, and choose CSV to JSON Array as the output.)
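If you are curious what such a converter does, here is a minimal sketch. It assumes a comma-separated file with column names in the first row and no quoted fields or commas inside values. The function name csvToArrays and the sample text are made up for illustration, and the online tool may format its output differently.

```javascript
// Minimal CSV-to-array sketch: one inner array per data row.
function csvToArrays(csvText) {
  var lines = csvText.trim().split(/\r?\n/);
  // Skip the first line because it holds the column names.
  return lines.slice(1).map(function (line) {
    return line.split(",").map(function (cell) {
      var text = cell.trim();
      var number = Number(text);
      // Keep numeric cells as numbers and everything else as text.
      return text !== "" && !isNaN(number) ? number : text;
    });
  });
}

var sample = "timestamp,value\n1388620800000,234\n1388707200000,53";
var converted = JSON.stringify(csvToArrays(sample));
console.log(converted); // [[1388620800000,234],[1388707200000,53]]
```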
Upload and Embed
As with every other code-based visualization, the final step is to upload the NVD3 folder to a web server, navigate to it on the server, and put the URL into iFrame codes like this:
<iframe src="YOURURLHERE" width="100%" height="480"></iframe>
Put that code into a blog post along with a story explaining the data.
IV. A data set to work with
See if you can use sorting and filtering with this data set on causes of death in NYC to generate a comparison graph in NVD3 comparing two different factors (men vs. women, different causes, etc.)
V. NVD3, Assignment 6
Assignment 6 is to work on your own NVD3 chart using your own data. Feel free to use one of the other charts at NVD3.org.
2/8/2018 Lecture Notes
February 8, 2018
The rest of class is lab time for you to work on assignment 4. Instructions can be found in Tuesday’s notes. Meanwhile, if you are interested you can learn from me about how to create your own web site on your own web server.
2/6 Lecture Notes: Searchable data tables
February 6, 2018
- Assignment 4: What problems did you encounter? How did you fix them?
- Good examples:
- Tiffany: does some chocolate taste better than others?
- Mike Light: Lord of the Rings characters by screen time
And with customization. How did he do it?
Should have some sources in the story though
- Amanda Dominguez: college tuition
Also changed code to make the colors orange.
- Connor: top 5 quarterbacks
- CB: NCAA story
- Zachary: Pakistan refugees
12:45: Sortable data tables
- Check out this one I made this morning:
- How did I do it? It uses Tabletop, which we used in the first class. Tabletop can be used to display data from Google spreadsheets.
- See how one past student used it in his final project, near the bottom.
- Some coders, like Chris Keller, have modified Tabletop for journalism. Here’s an example of a sortable, searchable table.
- Where do you get the code? Github of course!
- Download the code. Then …
1-1:30: How to make it work
- Make a Google spreadsheet. I’ll use this one about cardiac incidents in NY state hospitals.
- Give it a name with no spaces. Change the tab label to the same name, also with no spaces.
- Share it, and make it public to the web.
- Publish it (File > Publish to the web). Yes, you must do both of these to make the data available.
- Take the URL from the top of the browser (not what is in the Publish to the Web window) and save it somewhere.
- Download the Tabletop for Data Tables code and unzip it. Note that the files you need are in a second zip inside the first one.
- Open the Scripts folder, then open tabletop-feed.js in Sublime Text.
- At the very top, find initializeTabletopObject and put your URL into the place after it, being careful not to remove any necessary syntax.
- Remove everything except for the “key” which is the long list of characters near the end, before /pubhtml.
- Scroll down to what you see in var tableColumns (around line 28).
- Look at your spreadsheet column names. Copy and paste them into the field after “sTitle”. This is what will be displayed in the column on your site.
- Then, in the field next to mDataProp, paste the same name, but remove all capitals and spaces. This is the variable that pulls from the Google sheet.
- Finally, you will need to remove some logic at the end that makes the last column display a web site link. We will go over how to do this, and what the code means, in class.
- Edit the top of the HTML file to remove the sample text, then preview the HTML. You should see a searchable, sortable table.
- The last step? Upload into the class FTP and embed.
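The column setup in the steps above can be sketched in code. This is an illustration only: the column titles are hypothetical, the helper toDataProp is made up, and the real tabletop-feed.js file may include extra settings per column that you should leave in place.

```javascript
// Turn a spreadsheet column title into the lowercase, space-free name
// described in the steps above.
function toDataProp(title) {
  return title.toLowerCase().replace(/\s+/g, "");
}

// Hypothetical column titles; use the ones from your own spreadsheet.
var titles = ["Hospital Name", "Region", "Cases"];

// One entry per column: sTitle is what readers see, mDataProp is the
// name used to pull that column from the Google sheet.
var tableColumns = titles.map(function (title) {
  return { mDataProp: toDataProp(title), sTitle: title };
});

console.log(JSON.stringify(tableColumns[0]));
// {"mDataProp":"hospitalname","sTitle":"Hospital Name"}
```

If a column shows up empty in the table, a mismatch between the mDataProp value and the spreadsheet header is a likely cause.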
1:30-1:50. Phew, that was a lot!
So now work on it and try and fail and try and fail and try again until you can get a table to appear. I will go around to help you 🙂
To practice, feel free to use data from previous exercises or assignments, such as New York State bridges, cardiac incidents or any data from the NY State open data site.
Looking ahead to Assignment 5
Assignment 5 will be to publish a chart using whatever tool you want, with a larger data set used for the story available in a searchable, sortable data table.
And the extra credit will be to figure out how to add more than three columns.
2/1 Lecture Notes: Highcharts, continued
February 1, 2018
- A possible data set for final projects is Factba.se. See the Jimmy Kimmel piece on it.
- Any cool #NHData finds?
- 2 points extra credit for Assignment 4: Modify the code in your Highcharts example to change the colors of the bars, use different images or icons, or make other customizations that are not configurable variables in the code library. Hint: it’s not in the HTML! If you attempt this, be sure to post something in the Extra Credit 2 assignment in Blackboard and explain what you did.
- Alberto Cairo, a dataviz expert and author of The Functional Art, will be here for a workshop Friday, March 30. Please plan to attend! More details soon.
12:45-1:10: Highcharts review
- I’ll go through the entire Highcharts process one last time.
- Downloading code
- Finding a chart you like
- Customizing the data
- Load locally
- Upload to FTP. Reminder of how class FTP works.
- Embed in a blog post
1:10-1:50: Work on your assignment
1/30 Lecture Notes: Highcharts
January 30, 2018
12:30-12:55: Review your work
- Any cool #NHData finds?
- Assignment 3: good examples
- Common problems
- Not telling a story, jumping right into the chart
- Embedding data only and nothing visual
- Not including a link to the source data
- Who rocked it with code for extra credit?
12:55-1:30: Highcharts
- Highcharts demonstration.
- Downloading code
- Finding a chart you like
- Customizing the data
- Load locally
- Upload to FTP
- Embed in a blog post
1:30-1:45: Create a sample Highcharts chart
Every year, five cities in the northeast compete for the Golden Snowball Competition, and Syracuse is one of them. Which one is most likely to win based on historic snowfall? Let’s use data from Weatherba.se for context. See if you can create the chart below based on data from Weatherba.se.
- If you have extra time, see if you can recreate this Highcharts chart on Business Insider.
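As a starting point for the snowfall exercise, here is a minimal sketch of the options object a basic Highcharts column chart takes. The city list (other than Syracuse), the zero values, the chart title, and the function name are placeholders I made up, so swap in the cities and the numbers you look up on Weatherba.se.

```javascript
// Placeholder values only, NOT real snowfall figures. Replace them with
// the numbers you look up on Weatherba.se.
var cities = ["Syracuse", "City B", "City C", "City D", "City E"];
var snowfallInches = [0, 0, 0, 0, 0];

// Build the configuration for a simple column chart.
function buildSnowfallOptions(categories, data) {
  return {
    chart: { type: "column" },
    title: { text: "Average annual snowfall" },
    xAxis: { categories: categories },
    yAxis: { title: { text: "Inches" } },
    series: [{ name: "Snowfall", data: data }]
  };
}

var options = buildSnowfallOptions(cities, snowfallInches);

// In the browser, with highcharts.js loaded and a div with id "container" on the page:
// Highcharts.chart("container", options);
```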
1:45-1:50: Assignment 4
- Start working on Assignment 4.
Reminder of how to FTP and embed:
1/25/2018 Lecture Notes
January 25, 2018
12:40 and on…
Continue working on Assignment 3 using one of the no-code visualization tools we reviewed Tuesday (see the lecture notes from Tuesday for more info).
For 5 points extra credit, you can follow these instructions for how to create and upload a timeline using code.
- Walk through Lisa Williams’ Absurdly Illustrated Guide to creating a timeline.
- NOTE: When you get the “key” after going through Publish to the Web, don’t use the URL Google gives you in the confirmation window. Instead, use the URL at the top of the browser and get the “key” from that.
- For example, I have a spreadsheet at this URL: https://docs.google.com/spreadsheets/d/1-eEocAeK92RgAdhA_h2D07cagKJqJcRkNUheshf5LdY/edit#gid=0
The “key” for that is the long string between /d/ and /edit: 1-eEocAeK92RgAdhA_h2D07cagKJqJcRkNUheshf5LdY
- Also, be sure to change the title of “Balanced Media’s Timeline” to your own. You do this by editing the index.html file, finding that title and changing it to your own.
- When you have it working on your computer, use these FTP login credentials to FTP it to the server and embed. It will ultimately look like this.
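If you want to double-check which part of the address is the key, this small sketch pulls it out. It assumes the key is the run of letters, numbers, hyphens, and underscores that follows /spreadsheets/d/ in the browser URL. The function name sheetKey is made up for illustration.

```javascript
// Return the spreadsheet key from a Google Sheets browser URL, or null if none is found.
function sheetKey(url) {
  var match = url.match(/\/spreadsheets\/d\/([A-Za-z0-9_-]+)/);
  return match ? match[1] : null;
}

var url = "https://docs.google.com/spreadsheets/d/1-eEocAeK92RgAdhA_h2D07cagKJqJcRkNUheshf5LdY/edit#gid=0";
var key = sheetKey(url);
console.log(key); // 1-eEocAeK92RgAdhA_h2D07cagKJqJcRkNUheshf5LdY
```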
1/23/2018 Lecture Notes
January 23, 2018
- Feedback on Jodi Upton?
- Look through Class Finds and Twitter hashtag #nhdata.
- Class survey! If you got a zero, you can do it today and still get partial credit.
- Submit URLs in Blackboard! (Let’s look at it again).
- And how do you embed?
- Assignment 2: Good examples. And a couple critiques.
12:45-1:30: More no-code dataviz tools.
Today we will be using free data visualization tools that require hardly any coding at all. Feel free to use any of these in your final project. We will also begin creating maps in CartoDB.
1) Infogr.am (http://infogr.am)
You’ve already used Infogr.am, but here’s an example of a tabbed chart that breaks things out in a nice step by step way.
2) Google Maps (http://maps.google.com)
Google Maps can be easily embedded into web sites using the My Maps feature.
First, sign into your Google account, then go to http://maps.google.com. Click the “hamburger” icon at the top left labeled Menu when you hover over it, then choose Your Places, then the Maps tab, then the Create Map button.
Search for a place on the map and click Add to Map in the info box for that location. You can edit the information on the map by clicking the pencil icon. You can add images or videos to the placemarks by clicking the camera icon.
Now, change the base map, which controls the look and feel of the map. Click Base Map on the left and choose a different layer.
When you’re ready to publish the map, click Share and then change the settings from private to public.
Finally, embed the map in a blog post. Click the three dots at the upper right and choose Embed this Map. You will get embed codes.
Go to the class site and create a blog post. Click on the Text tab of the posting screen and paste your embed codes in there. The end result should look like this:
3) Google Fusion Tables (fusiontables.google.com)
Google Fusion Tables turns columns and rows of numbers in spreadsheets into visualizations. Once signed in, go to Google Docs (drive.google.com) and create a spreadsheet. In Google Fusion Tables, create a table and choose the Google Doc you created as the source. You will need to fiddle with the settings to make sure it’s grabbing what you want.
On the next screen, click the Plus sign to the right of Cards and choose a visualization. Fiddle around with the appearance settings to get everything as you want it, then click Done.
To publish your chart, you have to do two things:
1. Click the Share button at the very upper right of the browser, then “Change” next to Private under “Who can Access.” Select “Public on the Web” and then Save and Done.
2. Go to the Tools menu and choose Publish. You will see iframe tags here that you can embed in your blog post.
From this point onward, that table will automatically update as the data in your Google spreadsheet changes.
Here’s an example of a chart from Google Fusion Tables:
4) Maps from Fusion Tables.
- Go through this tutorial.
- Check out the latitude and longitude fields in this spreadsheet.
- (One change: choose New > More > Google Fusion Tables.)
6) Timelines and Storymaps.
- Check out some cool Timelines.
- Demonstration: creating a timeline.
- Demonstration: creating a StoryMap.
See a list of other no-code tools in the Tools tag of the class Diigo group, and Professor Barbara Fought’s JTools web site.
8) Assignment 3: Assignment 3 is to create a data visualization using one of the no-code tools above to tell a story using data that you find and analyze yourself.
Note: on Thursday we will take a look at a coding-based timeline example. If you make a timeline with that instead of the Knight timeline, I’ll give you 5 points extra credit.
1/18/2018 Lecture Notes
January 18, 2018
Welcome back! Professor Pacheco is out of town today. You should begin the class working on Assignment 2, which is to find some data from the New York State data site and tell a story with it using Infogr.am.
At 1 p.m., Professor Jodi Upton, our Knight Chair in Data and Explanatory Journalism, will arrive to give a guest lecture about data journalism.
1/16/2018 Lecture Notes: Welcome to Dataviz!
January 11, 2018
Tues 1/16/2018 Class: Welcome!
12:30-12:50: Welcome! An Introduction to using data visualization to tell stories.
- Quick roll call.
- Before next class, please register for the class blog (follow instructions from an email the system sent you) and fill out this survey.
- Review how the class site is organized.
- Walkthrough of the syllabus and schedule.
12:50-1: What is Dataviz?
- Some things can be better understood by seeing or exploring. Here are some good examples:
- News you can Use: Is it better to buy or rent?
- Understanding complexity. Which Supreme Court justices agree or disagree?
- Understanding behaviors. Where do voters in NYC live?
- Understanding sentiment. Emotional arcs during presidential addresses.
- Understanding geographic distribution. Percentage of people without health insurance by U.S. county.
- Understanding our world. http://Hint.fm/wind
- Having fun! Here are some visualizations about where in the world people have the best and worst sex and recurring themes in Arrested Development.
- More examples: Here are some more I have collected, and on the #Dataviz Twitter hashtag and Data is Beautiful Reddit.
- And how about some really bad examples!
1-1:15: Exercise: And now for a little magic. Introduce yourself through data!
- Open this URL. How did that data get in there? Let’s find out together.
- Google yourself and find an image that is publicly available on the web. Right-click the image and get the URL.
- Go to this Google spreadsheet and fill out your info. Put your image URL into the correct column, and put this code around it:
<img src="YOURURLHERE" width="200" />
(Note: type this in, don’t copy and paste from this page).
- Take a look at this URL or the home page to see class data populating in real time.
Congrats! You Participated in a Data Visualization
We will go through some basic features of Excel, and formulas.
- Adding information as data
- Add a formula
- Columns and rows.
- Formula: using the equal sign for functions. Basic math.
- Sum columns or rows.
- Select an area.
- Format cells to change cell type (text, number).
- Making charts in Excel.
- Common formulas: adding, subtracting, dividing, multiplying, summing.
- CSV format versus native Excel format.
1:25-1:40: Putting it into practice: Sorting and filtering NYS bridge data
- Bridges across the country are badly in need of repair, and it can literally be a life or death issue. Here’s more about that.
- Download and unzip this data set of 51,000 bridges in New York State. bridges_blanksremoved.csv
- Open it in Excel. Scroll right until you find the column “critfrac” (column DL) which stands for critical fracture. A y12 or y24 means outdated design, so a single solid hit can bring the entire bridge down.
- Next, find the column “suffrtno” (column FC), which stands for Sufficiency Rating. Anything under 50 is considered dangerous.
- Also note the “totlcost” (total cost to fix in thousands of dollars) in column DV, and “avdayno” (average daily traffic) in column AK.
- How many bridges are in danger of collapsing due to critical fracture?
- How many bridges have an inadequate sufficiency rating?
- How many have both bad critical fracture and sufficiency rating numbers?
- How much traffic goes over the bridges with both bad critfrac and suffrtno ratings? (Use the data from “avdayno,” column AK).
- How much will it cost to fix the bridges with both bad critfrac and suffrtno ratings? (Use “totlcost”, column DV).
Go through these yourself, then let’s review the answers and how to get them.
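The same sorting and filtering logic can be sketched in code, which may help you see what the Excel filters are doing. The four rows below are made-up samples that only borrow the column names from the bridge file. They are not real bridges, and the helper names are invented for this example.

```javascript
// Made-up sample rows that only mimic the column names in the bridge file.
var bridges = [
  { critfrac: "y12", suffrtno: 42.5, totlcost: 1200, avdayno: 8000 },
  { critfrac: "n", suffrtno: 30.0, totlcost: 500, avdayno: 1500 },
  { critfrac: "y24", suffrtno: 75.0, totlcost: 900, avdayno: 20000 },
  { critfrac: "y24", suffrtno: 49.9, totlcost: 300, avdayno: 2500 }
];

// A y12 or y24 in critfrac flags a critical fracture risk.
function hasCriticalFracture(b) {
  return b.critfrac === "y12" || b.critfrac === "y24";
}

// A sufficiency rating under 50 is considered dangerous.
function hasLowSufficiency(b) {
  return b.suffrtno < 50;
}

// Add up one column across a set of rows.
function sum(rowList, column) {
  return rowList.reduce(function (total, r) {
    return total + r[column];
  }, 0);
}

var fracture = bridges.filter(hasCriticalFracture);
var lowRating = bridges.filter(hasLowSufficiency);
var both = fracture.filter(hasLowSufficiency);

console.log(fracture.length);       // 3
console.log(lowRating.length);      // 3
console.log(both.length);           // 2
console.log(sum(both, "avdayno"));  // 10500
console.log(sum(both, "totlcost")); // 1500 (thousands of dollars)
```

In Excel, the equivalent is applying a filter to each column in turn and reading the row count and column sums from the filtered rows.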
1:40-1:50: Looking ahead: