Datasets & Spreadsheet Design, Stephanie Falkowski

This project involves exploring datasets and thinking about spreadsheet design. The dataset used in this project comes from data.gov and concerns Graffiti removal in Chicago. Upon finding this dataset, I downloaded the CSV file.

The download CSV file button on the data.gove website

Opening this CSV file in OpenRefine, I took a look at the data and made some cursory observations about it. The column headings include things like:

Date the request was made to have graffiti removed,
Status of the job,
Date the job was completed,
Request number,
Type of surface the graffiti is on
Where the graffiti is
Street address
X-Y coordinates
Latitude and Longitude
Ward
Police District
Community Area

Naturally, the where the graffiti is and on what type of surface is useful information for those tasked with removing/covering it up, as is the street address. The Police District, Ward, and Community Area information would also make it easier to find. Having X-Y Coordinates and Latitude and Longitude seems overkill, but perhaps that is useful information for a city as big as Chicago. Having three columns, one for latitude, one for longitude, and one for both, seemed unnecessary,

chart showing Latitude, Longitude, and both together — latitude and longitude information split,
but next to a column listing the coordinates together

One column heading that definitely seemed redundant to this dataset is “type of service requested,” since the same info was entered for each entry, namely “graffiti removal.” This was confirmed by running a text facet on this column, which returned only the one choice.

Facet showing graffiti removal as the only option for the type of service request column — “Graffiti Removal” – the only choice for
“Type of Service Request” in this Spreadsheet

Another curious choice was revealed by running another text facet on the column “where is the Graffiti located?” While this returned information about the most and least common places to be graffitied, it also turned up the detail that one of the requests was for an expressway with the note “DSS will NOT remove.”

Facet on "Where" returns unexpected result — Facet on “Where is the Graffiti located?” returns an unexpected result – Is this the best place for the information that DSS will not remove the Graffiti?

Request marked "Will not remove" listed as "Complete" — Request with the “Will NOT remove” note marked as “Completed”
The “Status” column may have been the more obvious place to put the “will not remove” info.

This information was not included in the “status” field where I might have expected it, in fact this specific request is marked completed. Such is the information that can gathered when thinking about spreadsheet design, what information should be collected and how and where should it be entered.