I'll take "Famous Titles" for $400 Alex. #IVMOOC Week 4

Two posts in a week, and two weeks of #IVMOOC down! Topical analysis this week, and keeping in theme, here’s what I’m doing. Alongside setting up data architecture, moving research along, planning future moves, and comparing and contrasting frameworks (including engineering thinking, skills, accreditation, assessment and professional behaviour), I’m eating lunch, listening to Hidetake Takayama, writing this post and trying to figure out what to make for dinner. Pretty much the same as everyone else!

Anyways. I liked this week, but I got sidetracked with digging into visualizing data from Twitter. In the midst of watching one of the videos, I wanted to see how the #antivaxprof shenanigans were heating up or calming down. I fired up R, figured out how to use the Twitter API through ROAuth and twitteR, and got to business. This is data from Feb 4th to 11th that contained #antivaxprof. I wanted to see graphically how tweets were being used, as in when tweets were originating and how they were progressing. Below is the quick and dirty plot of that. Red dots are original tweets, greens are retweets of the same tweet. Each is plotted against the retweet count of the original.

Twitter Viz
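For anyone curious, a rough sketch of that pull and plot is below. It is only a sketch: the credential strings are placeholders for your own app keys, the year in the date strings is my assumption, and I use twitteR's setup_twitter_oauth() helper here rather than the manual ROAuth handshake mentioned above.

# Sketch of the Twitter pull and plot described above.
library(twitteR)
library(ggplot2)

# Placeholder credentials -- substitute your own app keys
setup_twitter_oauth("CONSUMER_KEY", "CONSUMER_SECRET",
                    "ACCESS_TOKEN", "ACCESS_SECRET")

# Pull tweets containing #antivaxprof for the Feb 4-11 window (year assumed)
tweets <- searchTwitter("#antivaxprof", n = 5000,
                        since = "2015-02-04", until = "2015-02-11")
tweets.df <- twListToDF(tweets)

# One point per tweet: time posted vs. retweet count,
# coloured by whether the tweet is an original or a retweet
ggplot(tweets.df, aes(x = created, y = retweetCount, colour = isRetweet)) +
  geom_point() +
  scale_colour_manual(values = c("FALSE" = "red", "TRUE" = "darkgreen"),
                      labels = c("Original tweet", "Retweet")) +
  labs(x = "Time", y = "Retweet count", colour = "")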

Then I wanted to see how this was capturing the attention of the twittersphere, simply by looking at the frequency of tweets per day and overlaying that with the total retweet count of all those tweets in a day. (I guess the #IVMOOC workflow is sinking in.) Enter plyr, and here is the quick and dirty of that. I read this graph as showing that public interest in the story declined in both activity and reach over time. That’s the thing about controversies and social media: short bursts of attention that die down, or at least settle only among those most affected. I’m certain this data is a little biased though, since Twitter is just one small slice of the social media globe.

Twitter Viz 2
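Continuing the sketch above, the daily aggregation with plyr might look something like this. Rescaling the retweet series onto the tweet-count axis is my own choice for the overlay, not anything prescribed.

library(plyr)
library(ggplot2)

# Collapse to one row per day: number of tweets and total retweet count
tweets.df$day <- as.Date(tweets.df$created)
daily <- ddply(tweets.df, .(day), summarize,
               tweet_count    = length(text),
               total_retweets = sum(retweetCount))

# Overlay daily tweet volume (bars) with total retweets (line),
# rescaling the retweet series so both fit on one axis
scale.factor <- max(daily$total_retweets) / max(daily$tweet_count)
ggplot(daily, aes(x = day)) +
  geom_bar(aes(y = tweet_count), stat = "identity", fill = "grey70") +
  geom_line(aes(y = total_retweets / scale.factor), colour = "#dd1c77") +
  labs(x = "Date", y = "Tweets per day")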

Then I went back to actually doing what I was supposed to be doing for #IVMOOC. I searched the NSF grants database for “Engineering Education”, and worked on extracting and visualizing the word co-occurrence network in the titles. Had to use a bit of scripting to get things the way I wanted them.

# Scale node size by reference count and edge width by co-occurrence weight
resizeLinear(references, 2, 40)
resizeLinear(weight, .5, 5)
# Shade nodes from gray to black by reference count; colour all edges blue
colorize(references, gray, black)
g.edges.color = "34,94,168"
for n in g.nodes:
    # Spread out the precomputed layout and match each node's stroke to its fill colour
    n.x = n.xpos * 40
    n.y = n.ypos * 40
    n.strokecolor = n.color
    # Only label the most frequently referenced terms
    if (n.references > 20):
        n.labelvisible = true

A bit of time and some cleaning up in Affinity Designer, and I submitted this:

IVMOOC Assignment 4

I can see a lot of potential uses for this in my own work and research.  Fantastic week.  Next up: the midterm.  

(Cue ominous music)

Where in the world is Carmen Sandiego?: #IVMOOC Week 3

Getting through this week was a bit tough, not due to content or complexity, but due to life, work and winter. Being the only well parent is a lot of work. Couple that with deadlines, grant writing and research sessions, and there was very little free time. I made it, though!

After watching the videos and hands-on clips for the week, I felt tackling this assignment would be pretty easy. I chose to visualize the amount of NSF funding for Engineering Education research. My search of the scholarly database yielded 2000 results, with multiple awards per state. Aggregating in the very manual way outlined in the hands-on video was far too inefficient and tedious, so I turned to R and some of the wonderful packages that have been created (gdata, ggmap, ggplot2, plyr). I did use Sci2 to geocode the data, since I had used up my query limits with Google Maps while playing around with ggmap.

After that it was pretty easy. I aggregated at the state level, as the Sci2 geocoder gave me some errors when encoding at the zip code or city level. I pulled up a map of the continental USA and plotted the amount of funding, indicating amounts with both size and color. The code I used is below:

library(plyr)
library(gdata)
library(ggmap)
library(ggplot2)

setwd("~/MOOCs/IVMOOC")
data <- read.csv("NSF Master LandL.csv")

# Sum the expected award amounts for each state (keeping its geocoded lat/long)
data.state <- ddply(data, .(state, Latitude, Longitude), summarize,
                    expected_total_amount = sum(expected_total_amount))

# Base map of the continental US, and a shared legend for both size and colour
usa <- get_map(location = 'united states', zoom = 4, color = "color", maptype = 'roadmap')
g <- guide_legend("Amount of Funding (USD)")

# One point per state, encoding funding with both point size and colour
ggmap(usa) +
  geom_point(aes(x = Longitude, y = Latitude,
                 size = expected_total_amount, color = expected_total_amount),
             data = data.state, show_guide = TRUE) +
  scale_colour_gradient(low = "#e7e1ef", high = "#dd1c77") +
  guides(colour = g, size = g) +
  theme(legend.position = "bottom") +
  ggtitle("Geocoding of NSF Funding of Engineering Education Research") +
  xlab("Longitude") +
  ylab("Latitude")

I wanted to explore at both the state level and the city level, but ran into those query limits and the limit on how much time I could spend on this. Maybe in the future I’ll delve into this a bit more and expand the code to geocode by zip code in R, and then plot those on the US map (a rough sketch of how that might look follows the visualization below). My visualization:

Geospatial Assignment
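As a starting point for that future zip-level version, here is a minimal sketch. It assumes the CRAN zipcode package (which ships a lookup table of US zip codes with latitudes and longitudes, so no geocoding API or query limits are needed) and a zip column in the NSF export; both of those are assumptions on my part rather than anything from the assignment.

library(plyr)
library(ggmap)
library(ggplot2)
library(zipcode)   # lookup table of US zip codes with lat/long

data(zipcode)
data <- read.csv("NSF Master LandL.csv")

# Normalise the (assumed) zip column and join on the zipcode table
# to attach a latitude/longitude to each award
data$zip <- clean.zipcodes(data$zip)
data.zip <- merge(data, zipcode, by = "zip")

# Sum expected award amounts per zip code
data.zip <- ddply(data.zip, .(zip, latitude, longitude), summarize,
                  expected_total_amount = sum(expected_total_amount))

# Same map and encodings as the state-level plot, just at zip resolution
usa <- get_map(location = 'united states', zoom = 4, maptype = 'roadmap')
ggmap(usa) +
  geom_point(aes(x = longitude, y = latitude,
                 size = expected_total_amount, color = expected_total_amount),
             data = data.zip) +
  scale_colour_gradient(low = "#e7e1ef", high = "#dd1c77") +
  theme(legend.position = "bottom")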

All in all, a fun week. The bits I found most useful: a nice overview of colors and their use in geospatial maps, and the hands-on sessions. Next up, week 4 and the midterm!

#IVMOOC Week 2: Wibbley-wobbly timey-wimey stuff

I’m liking this course. The videos are easy to listen to and come in nice manageable chunks that I can watch while making dinner or on a quick break from work. I especially liked the discussion of the burst-detection model this week, although I would have enjoyed a more in-depth discussion of how the model works. I’m filing that for future research and reading once I get the opportunity to explore the textbook and associated papers. The hands-on videos this week were fantastic, and offered a great tutorial for accomplishing this analysis in Sci2. Below is my temporal bar graph/burst analysis of Mesothelioma, stopping at the year 2007. I might redo it, given some time, as I’m concerned about the generated scale being a bit wonky (two entries for zero?).

Mesothelioma temporal bar
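For my own future reading on how the burst model works: my understanding is that it is based on Kleinberg’s two-state burst detection, and the bursts package in R implements that algorithm. A minimal sketch is below; the event times are made-up placeholders rather than the real Mesothelioma records, so this only illustrates the mechanics rather than reproducing the Sci2 analysis.

library(bursts)

# Hypothetical event times (in fractional years) standing in for publication dates
pub.years <- c(1990.2, 1991.5, 1995.0, 1996.10, 1996.15,
               1996.20, 1996.30, 2001.6, 2005.9)

# kleinberg() fits the two-state (baseline vs. burst) automaton and
# returns the hierarchy of bursts with their levels, starts and ends
b <- kleinberg(sort(pub.years), s = 2, gamma = 1)
print(b)
plot(b)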

I do wonder about the self-assessments. They seem to be stand-alone checks of whether people have absorbed key facts and nuggets of information from the videos. That seems far less important to me than being able to discuss what each approach can be used for, what its benefits are, and how it would be applied in actual practice. The assignments reflect this to a degree, and the client project whole-heartedly embraces it. But I worry that the midterm/final will be a test of whether we’ve absorbed the discrete bits of content. Content is important, but developing the skills to conduct visual analysis and create useful and meaningful visualizations of data seems to be a goal of the course.

I think the midterm/final would be a perfect case for authentic assessment of the skills and techniques presented in the course thus far. It would require both knowledge of the content and practice of the skills and techniques presented so far. I’m crossing my fingers for this, and hoping I’m pleasantly surprised. I have my doubts though, as the grading scheme and expectations for the course have not been communicated to the participants, or are at the least very difficult to find.

Either way, it’s something I can overlook as I believe the opportunity to explore this kind of content in a guided fashion far outweighs my concerns about assessment!