Sketching D.C. Crime Data With R

By Matt Stiles | Topics: Crime

A burglar nabbed a radio from our car last week, prompting me to think (once again) about crime in Washington, D.C., where I live.

I wanted to know if certain crimes were more common in particular neighborhoods, so I downloaded a list of every serious crime in 2012 from the city’s data portal. The data contained about 35,000 reported incidents of homicides, thefts, assaults, etc., with fields listing the date, time and neighborhood associated with each case.

I used the statistical programming language R, which is great for quickly creating small multiples to examine data, to make some rough visual sketches.

Since we’re talking about cars, the first grid shows thefts from vehicles, by hour and “advisory neighborhood commission.” These commissions are the small groups of officials who represent their respective D.C. neighborhoods on issues like real estate development and alcohol sales. (I live in Brookland, which is governed by ANC 5B.) You can find your ANC here.

It’s clear that thefts from vehicles are most common in ANC 1B, a diverse, densely populated and rapidly changing section of the city. For those familiar with D.C., this is Shaw, U Street and parts of Columbia Heights. The x-axis shows the hour of the crime, and the y-axis shows the total number of crimes. My neighborhood is relatively safe, actually:

theftfromcar2012
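Each panel in a grid like this is just a count of incidents per ANC and hour. The sketches in this post were made in R; what follows is only a rough Python illustration of that counting step, not the code behind the charts. The field names (`offense`, `anc`, `hour`), the offense label and the sample rows are all made up for the example, and the city's real export may be laid out differently.

```python
from collections import Counter

# Hypothetical sample rows; the real data portal export may use other column names.
incidents = [
    {"offense": "theft from vehicle", "anc": "1B", "hour": 8},
    {"offense": "theft from vehicle", "anc": "1B", "hour": 8},
    {"offense": "theft from vehicle", "anc": "1B", "hour": 22},
    {"offense": "theft from vehicle", "anc": "5B", "hour": 7},
    {"offense": "robbery", "anc": "1A", "hour": 2},
]


def counts_by_anc_and_hour(rows, offense):
    """Count incidents of one offense type for each (ANC, hour) pair."""
    return Counter(
        (row["anc"], row["hour"]) for row in rows if row["offense"] == offense
    )


def panel_series(counts, anc):
    """Return 24 hourly counts for one ANC, i.e. the bars of one panel."""
    return [counts.get((anc, hour), 0) for hour in range(24)]


theft_counts = counts_by_anc_and_hour(incidents, "theft from vehicle")
series_1b = panel_series(theft_counts, "1B")
```

Here `series_1b` holds one value per hour (the x-axis), and each value is the number of incidents (the y-axis) for that ANC.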

Next we look at robberies, which appear common in ANC 1A, an area that also contains Columbia Heights and Park View. Notice the spikes in the early-morning hours in ANCs 1A and 1B, compared to the late-night spikes in ANCs 8B and 8C, both of which cover far southeast neighborhoods such as Anacostia and Buena Vista. These are among the poorest areas in the city. I’m not sure what that means, but it’s interesting:

robbery2012

Burglaries…

burglary2012

Car thefts…

cartheft2012

Assaults with dangerous weapons…

assault2012

Here are the homicides, all of which get coded as occurring at midnight, so we don’t get a distribution by hour. Still, the result is a simple bar chart that shows the variation by region (7D and 8E had more homicides last year than other locations).

homicide2012
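A quirk like the midnight coding is easy to check for before charting. As a rough Python sketch (again, not the R code used for these charts, and with made-up rows and field names), this flags any offense type whose records all carry hour 0, which suggests the time of day is a placeholder and not a real observation:

```python
from collections import defaultdict

# Hypothetical sample rows for illustration only.
rows = [
    {"offense": "homicide", "anc": "7D", "hour": 0},
    {"offense": "homicide", "anc": "8E", "hour": 0},
    {"offense": "homicide", "anc": "7D", "hour": 0},
    {"offense": "robbery", "anc": "1A", "hour": 2},
    {"offense": "robbery", "anc": "8B", "hour": 23},
]


def offenses_without_hour_detail(records):
    """Return offense types whose every record is coded at hour 0."""
    hours_seen = defaultdict(set)
    for record in records:
        hours_seen[record["offense"]].add(record["hour"])
    return sorted(name for name, hours in hours_seen.items() if hours == {0})


flagged = offenses_without_hour_detail(rows)
```

An offense that lands in `flagged` is better drawn as a plain bar chart by ANC than as an hourly grid.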

Here’s the grid with all of the crimes above combined (also including a small number of arson cases):

allcrimes

And here’s a grid with histograms for each offense type. Simple thefts (there were more than 12,000 last year) appear to be most commonly reported in the afternoon, while thefts from vehicles are most often reported first thing in the morning — probably because victims notice the crime when they wake up.

[Image: grid of histograms by offense type]

Again, these are just quick sketches, but they show you the power of R in exploring your data before investing time in a more complicated visualization. A look at the basic code also shows how quickly these types of sketches can happen.

Previously:

How Common Is Your Birthday?

By Matt Stiles | Topics: Demographics

UPDATE: I’ve written a clarification about this post here. Please read it.

A friend posted an interesting data table on my Facebook wall yesterday, which was my birthday. The data listed each day of the year with a ranking for how many babies were born in the United States on each date from 1973 to 1999. Some interesting trends are evident in the data. Apparently, people like to make babies around the winter holiday season: a large proportion of babies are born in September (ours is due Sept. 24, btw).

Sept. 16 was most common. Feb. 29* was least common. This heatmap is an effort to visualize the trends, with darker shades representing more births:

Data source: NYTimes.com, Amitabh Chandra, Harvard University

Follow Matt @Stiles on Twitter.

* A previous version of this post incorrectly listed Jan. 1 as the least common birth day. 

Use Calendar Heat Maps to Visualize Your Tweets Over Time

By Matt Stiles | Topics: Social Media, Tutorials

Following Nathan Yau’s excellent tutorial for creating heat maps with time series data (he used vehicle accidents by day for a year), I visualized 3,559 of my tweets dating back to March 2009.

These maps, created with a modified R script from the tutorial, show how often I sent tweets (both personal and RT), with darker shades representing more activity. It’s fun to go back to the dark days and recall what sparked flurries of tweets:
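The data behind a calendar heat map is simple: one count per day, placed in a grid by week and weekday. The maps here came from a modified R script; the snippet below is only a Python sketch of that preparation step, with invented timestamps and an assumed timestamp format.

```python
from collections import Counter
from datetime import datetime

# Hypothetical timestamps; a real tweet archive will use its own format.
tweet_times = [
    "2009-03-02 09:15:00",
    "2009-03-02 17:40:00",
    "2009-03-04 12:00:00",
    "2009-03-09 08:05:00",
]


def tweets_per_day(timestamps):
    """Count tweets for each calendar date."""
    return Counter(
        datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").date() for stamp in timestamps
    )


def calendar_cells(daily_counts):
    """Key each day's count by (ISO year, ISO week, ISO weekday)."""
    cells = {}
    for day, count in daily_counts.items():
        iso_year, iso_week, iso_weekday = day.isocalendar()
        cells[(iso_year, iso_week, iso_weekday)] = count
    return cells


daily = tweets_per_day(tweet_times)
cells = calendar_cells(daily)
```

Each key in `cells` is one square of the calendar (week as the column, weekday as the row), and the count sets how dark to shade it.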

Charting Marriage, Education

By Matt Stiles | Topics: Uncategorized

Lately I’ve been experimenting with bubble charts in R based on Nathan Yau’s great tutorial. In this case, I wanted to see the relationship between higher education and marriage among women by state. 

Some states, such as Idaho, Utah and Wyoming, have both high marriage rates and low higher education rates. But that really says more about those states than whether marriage and higher education correlate. Washington, D.C., for example, has the highest higher education rate and the lowest marriage rate.

Still, it’s fun to see how states compare. View a larger version here.

Data source: U.S. Census Bureau, American Community Survey

Another View of ONA

By Matt Stiles | Topics: Uncategorized

Yesterday I posted a map that used proportional symbols to visualize the home cities of Online News Association conference attendees. Today’s version uses great circles to map the routes attendees took to Boston (assuming they had direct flights, of course). Red lines represent more attendees from a location: 

Inspired by Nathan Yau’s great tutorial. (Thanks also, Nathan, for the generous help today.)
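For anyone curious how a great-circle route like the ones on this map is traced, here is a rough Python sketch using the standard spherical interpolation formula. It is not the R code from the tutorial; the function name is made up, the coordinates for Washington and Boston are approximate, and the Earth is treated as a perfect sphere.

```python
import math


def great_circle_points(lat1, lon1, lat2, lon2, n=50):
    """Return n (lat, lon) points in degrees along the great circle between two places.

    Assumes n >= 2 and that the two places are not exactly opposite each
    other on the globe (the route is undefined in that case).
    """
    phi1, lam1, phi2, lam2 = map(math.radians, (lat1, lon1, lat2, lon2))

    # Angular distance between the endpoints (haversine formula).
    h = (
        math.sin((phi2 - phi1) / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin((lam2 - lam1) / 2) ** 2
    )
    d = 2 * math.asin(min(1.0, math.sqrt(h)))
    if d == 0:
        return [(lat1, lon1)] * n

    points = []
    for i in range(n):
        f = i / (n - 1)  # fraction of the way from start to end
        a = math.sin((1 - f) * d) / math.sin(d)
        b = math.sin(f * d) / math.sin(d)
        x = a * math.cos(phi1) * math.cos(lam1) + b * math.cos(phi2) * math.cos(lam2)
        y = a * math.cos(phi1) * math.sin(lam1) + b * math.cos(phi2) * math.sin(lam2)
        z = a * math.sin(phi1) + b * math.sin(phi2)
        lat = math.degrees(math.atan2(z, math.hypot(x, y)))
        lon = math.degrees(math.atan2(y, x))
        points.append((lat, lon))
    return points


# Approximate coordinates: Washington, D.C. to Boston.
route = great_circle_points(38.91, -77.04, 42.36, -71.06, n=5)
```

Drawing a line through the points in `route` on a map projection gives the curved path; more points give a smoother curve.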