This post goes over another lesson from the Programming Historian, this one covering the uses of Google Maps and Google Earth.

The first part uses the My Maps feature of Google Maps. Creating a new map was very intuitive. After titling the map, I imported the sample data concerning various fat exports to Britain in the mid-1890s. I then had to tell Google which columns of the uploaded spreadsheet correspond to places and which hold the information to be linked to those places.

On the map produced, all places had the same default marker. I changed this by switching from the uniform style to “Style by data column: Commodity.” This option gave each type of commodity its own color, with the ability to customize the colors and icons further.

Next, I added a new layer by clicking the new-layer button, and changed the base map to satellite imagery by selecting that option at the bottom of the editing panel.

Then, to practice placing markers, I used the place marker tool to add a few. Similarly, I used the tool next to it to draw polygons around a couple of lakes and to highlight a road that runs between them. All of these markers, lines, and polygons were added to the new “Layer 2,” as seen in the side panel.

The second part of the lesson uses Google Earth. After downloading the program and exploring how it works, the first step of the lesson was to turn on the Rumsey Historical Maps in the layers pane. This brought up icons marking places where historical maps are ready to be overlaid, for example this 1815 map of Quebec City and its surroundings.

The next part involved importing a saved map, and for this I used the map that I had just been working on as the example St. Lawrence Seaway map wasn’t currently compatible. This was as simple as opening the saved KMZ file. 

Next was drawing a polygon of Lake St. Clair, which worked much as it did in Google Maps.

The most interesting part was overlaying a historical map onto the satellite images. For this I found an 1886 map of Duluth, MN. To add it, I selected the overlay button and browsed to the right JPEG image. Then came adjusting the overlay to align with the satellite imagery.

In this post, I am working with QGIS, following along with the tutorial from the Programming Historian to create and edit a map of Prince Edward Island.

Upon opening a new project, the first task was to set the Coordinate Reference System (CRS) to one appropriate for Prince Edward Island. This was done in the CRS tab of Project Properties.

Project Properties window for selecting a Coordinate Reference System

Next was to add the coastlines from a shapefile containing that information by loading it as a vector layer, as shown below. By default, QGIS gives this layer a background fill, which was removed by double-clicking the coastline entry in the layers pane, thereby opening the Layer Properties window, where the symbology governing such aspects of appearance as fills and outlines can be customized. The background was removed by changing the fill style of the simple fill to “No Brush.”
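The same steps can also be scripted from the QGIS Python console rather than the dialogs. Here is a rough sketch under that assumption; the shapefile name and the EPSG code for a PEI-appropriate CRS are stand-ins for illustration, not values taken from the lesson.

```python
# Rough PyQGIS sketch (run in the QGIS Python console, where `iface` is predefined).
# The shapefile name and EPSG code below are illustrative assumptions.
from qgis.core import QgsProject, QgsCoordinateReferenceSystem
from qgis.PyQt.QtCore import Qt

# Set the project CRS to one suited to Prince Edward Island
QgsProject.instance().setCrs(QgsCoordinateReferenceSystem("EPSG:2954"))

# Load the coastline shapefile as a vector layer
coast = iface.addVectorLayer("coastline_polygon.shp", "coastline", "ogr")

# Remove the default background fill by setting the simple fill's style to "No Brush"
fill = coast.renderer().symbol().symbolLayer(0)
fill.setBrushStyle(Qt.NoBrush)
coast.triggerRepaint()
```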

Prince Edward Island’s hydro-network was entered as the next vector layer, and the lines were changed to blue, befitting waterways.

Then I added the shapefile showing the land use of Prince Edward Island, e.g. where forests, wetlands, developed areas, etc. are, according to the 1935 inventory of the region on which the shapefile is based. Editing the symbology was more complicated here, since the different categories needed to be distinguished in shades of green. From the Symbology tab of the Layer Properties window, I first had to select “Categorized” instead of the default “Single Symbol,” then set the value to be categorized to “landuse,” and set the color ramp to a gradient of greens.

using categorized option to configure the land-use vector layer

In order to remove the outlines between land sections, I selected “Configure Symbol” from the symbol menu and changed the stroke style of the simple fill to “No Pen.”
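For those working in the Python console, the same categorized styling could be sketched roughly as follows; the field name, the “Greens” ramp name, and the use of the active layer are assumptions for illustration.

```python
# Rough sketch of a categorized renderer on the land-use layer, from the QGIS
# Python console. Field name, ramp name, and layer reference are assumptions.
from qgis.core import (QgsCategorizedSymbolRenderer, QgsRendererCategory,
                       QgsFillSymbol, QgsStyle)

layer = iface.activeLayer()          # assume the land-use layer is selected
field = "landuse"
ramp = QgsStyle.defaultStyle().colorRamp("Greens")  # a stock green gradient

values = layer.uniqueValues(layer.fields().indexOf(field))
categories = []
for i, value in enumerate(sorted(values, key=str)):
    # "outline_style": "no" corresponds to the "No Pen" stroke style
    symbol = QgsFillSymbol.createSimple({"outline_style": "no"})
    symbol.setColor(ramp.color(i / max(len(values) - 1, 1)))
    categories.append(QgsRendererCategory(value, symbol, str(value)))

layer.setRenderer(QgsCategorizedSymbolRenderer(field, categories))
layer.triggerRepaint()
```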

map showing coastlines, hydro-network, and land-use of PEI.

Highways were added in the same way, again using the “categorized” option to make primary and secondary roads appear differently. 

Placenames followed, and were edited under the “Labels” tab of the Layer Properties window. A buffer was added so the names of cities would be legible against the other features of the map.

editing window for place name labels

Finally, a raster layer was added, in much the same way the vector layers were. This map image was then moved behind the other layers, and the coastlines were bolded to finish the lesson.
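As with the vector layers, this step can also be sketched in the Python console; the raster file name here is an assumption.

```python
# Sketch: add the georeferenced historic map and move it to the bottom of the
# layer stack so the vector layers draw on top. The file name is an assumption.
from qgis.core import QgsProject

raster = iface.addRasterLayer("PEI_HistoricMap.tif", "historic map")

root = QgsProject.instance().layerTreeRoot()
node = root.findLayer(raster.id())
clone = node.clone()
root.addChildNode(clone)              # appended at the end = bottom of the stack
node.parent().removeChildNode(node)   # remove the original entry near the top
```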

final map of PEI made in QGIS

Zotero is a useful tool for collecting, organizing, and citing sources. Here I explore some of its features that operate using a screen-scraping browser extension. I start in the Zotero application by adding a new collection, which for this example I named “desert monasticism.”

To populate this collection, I went to the university’s library website, entered the same phrase into the search bar, and began adding the most relevant and interesting sources using the Zotero add-on for Firefox, as shown below. There are such add-ons for most browsers, with the current exception of Safari.

screenshot showing the Zotero browser add-on in use.

The screen-scraping system isn’t perfectly able to find all relevant information and categorize it correctly. Indeed, looking at the new collection within the application, some of the entries need cleaning. Some examples are shown here:

Fixing these issues is a simple enough matter: just editing the fields so they have the correct information, or, as in the case of the contributor who was actually the editor, changing the name of the field from the drop-down menu as shown:

changing a contributor field to an editor field.

Another edit that I often forget is checking whether the authors’ names are properly separated into first and last. The “creator” column generally shows only a last name, so when a first name is given, this is a clue that the metadata needs further cleaning. In this example, three articles by John Wortley stand out, and one by Tobias Stanislas Haller.

the new collection with creator fields needing further cleaning highlighted.

These are easily fixed by clicking the “switch to two fields” button as shown.

The "switch to 2 fields" option

Zotero is helpful in keeping track of sources used in research, and as long as one then cleans the data, it can also integrate with Microsoft Word or LibreOffice to create citations and bibliographies.

Google’s Public Data directory has visualizations of data from many sources, and this post examines a couple of them, thinking about their design and possible ways the visualizations may be misleading.

The first visualization I’m going to consider has fertility rate, or rather the average number of children per woman, on the Y-axis and life expectancy on the X-axis. The data comes from the World Bank.

Birthrate and life expectancy bubble chart

One shortcoming of this graph lies in the nature of the bubble chart: the size of each bubble is supposed to be proportional to the population of the country it represents, something that is difficult to judge accurately from circles. The problem is exacerbated with this data, as the degree to which the bubbles are magnified can be altered by the viewer.

A second shortcoming is in the dimensions of the graph, which change with the size of the browser window. The scales seem reasonable for the data being presented, but depending on how narrow or elongated the X-axis is, it can appear that the difference between countries is more or less drastic.


The second visualization considered here is a line graph showing the broadband penetration rate, i.e. “Number of high-speed internet connections (capacity equal or higher than 144 kbit/s) per 100 inhabitants,” on the Y-axis and the year, ranging from 2003 to 2010, on the X-axis.

broadband penetration line graph

A line graph is appropriate in this case for showing the continuous change in each country.

This graph is based on data from Eurostat, which, importantly, covers only countries in the European Union, and so does not include all European countries, much less allow comparisons with those of other continents. Which countries are included in the graph can be specified from the checklist to the left of the chart. Selecting too many makes for a cluttered graph, so I have a more reasonable number checked.

This graph also has the issue that it can tell a story of rapid broadband implementation or one of slower, gradual adoption depending on the size of the browser window.

These visualizations are good at communicating information, but it must be recognized that there is a malleability to them, and distortions can shape how the data is interpreted.

This post examines the Linked Jazz network diagram found at https://linkedjazz.org/network/. The initial view of this diagram is below, and shows the nexus of relationships between people I assume are all jazz musicians. 

Initial view, the "fixed mode" of the Linked Jazz network diagram
Initial view, the “fixed mode” of the Linked Jazz network diagram

Some of these nodes are larger than others, both in the size of the picture of the musician and in the font size of their name; this is especially true of those located around the edge of the general oval shape. The vast majority of the nodes are dots too small to discern the picture, and have no name listed until you mouse over them, which opens a dialogue box giving some general information from Wikipedia about the selected musician.

selecting Milt Hinton in "fixed mode"

As can be seen above, doing so also grays out all extraneous data points and shows only those artists who had some relationship to the one selected. Why most edges in this mode are blue while some are green or orange is unclear, but presuming the edges denote musical influences, the way the nodes are laid out suggests the direction of influence in this case was that which Milt Hinton had on other musicians. This appears to be the sort of information this network diagram reveals: basic information on a large number of jazz musicians and, more importantly for this diagram, the lines of influence between them.

Other modes change the arrangement of nodes to highlight different things. The description of each is in the image below. 

different view modes available for Linked Jazz network diagram

In these other modes, it is unclear to me if the meaning of the edges has changed or if it is only a change in node arrangement. One clear difference in the edges appears in the gender mode shown below, in which edges are either blue for men or red for women.

Other modes have different colors for the edges, but with no legend showing and no knowledge of jazz, what they denote remains a mystery.

Stanford University has a variety of data visualizations on their Spatial History page, and this post looks at a couple of them to comment on the design choices.

First is the Historic Quad Index, found at https://web.stanford.edu/group/spatialhistory/cgi-bin/site/viz.php?id=7&project_id=

This visualization is in the form of a map, or, as its description puts it, a “georeferenced shapefile,” showing the United States Geological Survey’s quadrangles. It assumes the viewer knows something about USGS quads, as this is not explained in the about section. The description specifies that “the largest rectangles are 60 minute quads and the smallest are 7.5 minutes quads,” but again assumes the viewer knows what that means. Without more information, it is difficult to know what argument this visualization is making.

The Historic USGS Quad Index visualization

As seen in the above image, the quads are displayed in different colors, but there are a couple of issues with the design choices in this regard. First, no legend is included to specify which color corresponds to which scale; the description only says that “the different colors represent the different scales of USGS quads,” but not what those different scales are. Second, the yellow color is difficult to distinguish from the slightly lighter, more greenish yellow, especially where the two are not right next to each other.

The second visualization, this one displaying Cattle Production in the American West, is more interactive, and is found at https://web.stanford.edu/group/spatialhistory/cgi-bin/site/viz.php?id=7&project_id=

Like the last one, this visualization lays the data out geographically on a map, and it changes according to the year selected on the interactive date scale at the bottom. The assumption of viewer knowledge is lower, and the data tells a story of the vacillations in cattle-producing regions and the value of cattle in these areas. Specifically, Texas becomes a focus due to the size of its circle (somewhat misleading), and maintains that focus even as other areas expand their cattle production, as shown in the images below.

legend for circle size for Cattle Production in the American West visualization

Data is shown in circles corresponding to states that produced cattle in a given year, and also in a bar chart broken down by region and state. Both the circles and bars are colored to show the value of the cattle, and the size of the circles represents how many cattle there were. Here one flaw is introduced: as seen in the scale, the circle for 4 million cattle looks to be four times bigger than the circle for 2 million, and four times smaller than that for 7 million.
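To make the geometry behind that impression explicit, here is a small sketch; it assumes, purely for illustration, that the chart scales the circle’s radius (rather than its area) with the cattle count, which is the classic source of this kind of distortion.

```python
# Illustration of the perceptual problem with bubble sizes: if the radius is
# scaled linearly with the count (an assumption about how the chart is drawn),
# the area -- what the eye actually compares -- grows with the square.
import math

def area_if_radius_scales_with(count_millions):
    return math.pi * count_millions ** 2

base = area_if_radius_scales_with(2)
for count in (2, 4, 7):
    ratio = area_if_radius_scales_with(count) / base
    print(f"{count} million cattle -> circle looks {ratio:.1f}x the 2-million circle")
# 2 million -> 1.0x, 4 million -> 4.0x, 7 million -> 12.2x
```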

While some of the Stanford spatial visualizations link to the data used, neither of these two did so.

This project looks at examples of how Omeka can be used by comparing two sites, both made with Omeka.

The first is The Public Art Collection from Eastern Michigan University, which displays photos of artworks from various areas of Michigan, mostly found on university campuses. The metadata for each photo are not overly standardized. Each photo has a unique identifier number and reports who contributed it and when, but other fields may or may not be used, and not always in a uniform manner.

One example is the type of title: some titles provide only the name of the artwork, while others describe the piece, what it is, where it is, and even the angle the photo was taken from. The site is easy to navigate, having a search function and different ways to sort the items.

The search and sorting options for the Area Public Art Collection Omeka site.

The second is the Reitman v. Mulkey site from Reed University, which walks users through the context and ramifications of the Supreme Court case concerning housing discrimination. There are separate pages that can be navigated using arrows at the bottom of the screen or by clicking the page title in the header.

The header bar with pages listed to navigate through the site’s storyline/argument.

The first page, “Browse,” is where items are listed with their metadata. Items include maps, documents, photos, and books. Even so, this information (type, format, medium) is not part of the metadata, which instead focuses on a detailed paragraph in the description field, and also has date, creator, publisher, and source fields filled in.

As with the other site, there is a search function and different ways to sort the items – the same options as above.

These two sites have very different purposes: one constructs an argument about the Reitman v. Mulkey case, and the other simply catalogues artwork found on Michigan campuses. Both have items with metadata, but that metadata differs according to that differing purpose. Both are easy to navigate and share many of the same features, though the second site contains explanatory pages not found on the first.

This project involves exploring datasets and thinking about spreadsheet design. The dataset used in this project comes from data.gov and concerns graffiti removal in Chicago. Upon finding this dataset, I downloaded the CSV file.

The download CSV file button on the data.gov website

Opening this CSV file in OpenRefine, I took a look at the data and made some cursory observations about it. The column headings include things like:

  • Date the request was made to have graffiti removed
  • Status of the job
  • Date the job was completed
  • Request number
  • Type of surface the graffiti is on
  • Where the graffiti is
  • Street address
  • X-Y coordinates
  • Latitude and Longitude
  • Ward
  • Police District
  • Community Area

Naturally, where the graffiti is and what type of surface it is on is useful information for those tasked with removing or covering it up, as is the street address. The Police District, Ward, and Community Area information would also make it easier to find. Having both X-Y coordinates and latitude and longitude seems like overkill, but perhaps that is useful information for a city as big as Chicago. Having three columns, one for latitude, one for longitude, and one for both, seemed unnecessary.

chart showing Latitude, Longitude, and both together
latitude and longitude information split, but next to a column listing the coordinates together

One column heading that definitely seemed redundant in this dataset is “Type of Service Request,” since the same info was entered for each entry, namely “Graffiti Removal.” This was confirmed by running a text facet on this column, which returned only the one choice.

Facet showing graffiti removal as the only option for the type of service request column
“Graffiti Removal” – the only choice for “Type of Service Request” in this spreadsheet

Another curious choice was revealed by running another text facet on the column “Where is the Graffiti located?” While this returned information about the most and least common places to be graffitied, it also turned up the detail that one of the requests was for an expressway with the note “DSS will NOT remove.”

Facet on "Where" returns unexpected result
Facet on “Where is the Graffiti located?” returns an unexpected result – Is this the best place for the information that DSS will not remove the Graffiti?
Request marked "Will not remove" listed as "Complete"
Request with the “Will NOT remove” note marked as “Completed”
The “Status” column may have been the more obvious place to put the “will not remove” info.

This information was not included in the “Status” field, where I might have expected it; in fact, this specific request is marked as completed. Such is the information that can be gathered when thinking about spreadsheet design: what information should be collected, and how and where it should be entered.
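Outside OpenRefine, the same two checks could be reproduced with a few lines of pandas; a rough sketch is below, in which the file name and the exact column headings are assumptions based on the downloaded CSV.

```python
# Rough pandas equivalent of the two text facets. The file name and the exact
# column headings are assumptions about the downloaded CSV.
import pandas as pd

df = pd.read_csv("graffiti_removal.csv")

# Facet on the service type: expected to return only "Graffiti Removal"
print(df["Type of Service Request"].value_counts())

# Facet on the location column: surfaces the odd "DSS will NOT remove" entries
print(df["Where is the Graffiti located?"].value_counts())
```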

Continuing to learn about and use Dublin Core, in this project I got more practice collecting this information and filing it into the correct Dublin Core fields. I chose six items, all relating to the Crusades and relations between East and West. Five are books, and one is a journal article.

There are certainly limitations to using only these fields; not all relevant data can be fit into them. For instance, even to properly format a citation from one of these items, one needs more information, such as place of publication. For journal articles it is even worse, with no clear place to put the name of the journal. In this example, I first put this information under source, but then thought this would be problematic if I were to decide to use source to track other information, such as who digitized the older books that weren’t born-digital. For this reason I thought again and put the name of the journal under publisher, though that can obscure who published the journal.

Another difficulty is that with only a creator field, it is difficult to distinguish between authors, translators, and editors. Using only standard Dublin Core fields also introduces other ambiguities, as it disguises the categorical differences between, for example, articles and monographs, both having the type of “text.” With nearly everything being “text,” this becomes a less than helpful designation. Even in the format description this distinction cannot be discerned, as I opted not to enter the dimensions, weight, etc. of the physical book, but rather those of the electronic version I am actually accessing, whether it is a PDF or an EPUB – all these items happen to be in PDF format. Coverage was a difficult field, as this is not always clearly included in descriptions and subject keywords. Sometimes it can only be determined by actually looking through the book and its contents.


This project involved inputting two items into our class’s shared Omeka site. The process was straightforward and easy to figure out. Upon registering for a trial-level account, I found the Omeka site for Hist5891 that I had been invited to at the bottom of the page.

Clicking Manage Site (see above) brought me to a page where there was an option to add an item (see below). 

This in turn brought me to a page wherein I could enter metadata into Dublin Core fields. For my first item, I was working with the book Byzantium between the Ottomans and the Latins by Nevra Necipoğlu. Many fields were self-explanatory as to what information was to be put there. For instance, the title of the book clearly belonged in the first field “title,” the author’s name in the field “creator,” and the publisher in the field “publisher.” Referring to an explanation of the fields cleared up what was to go in the other fields. Subject entries and description were copied from the publisher’s website. Since I was the one contributing the book to this site, I put my name as the contributor. Some fields were not applicable for this source, such as “relation,” since there are no important related resources appropriate to such a field for this book. Format details were copied from the Amazon listing for this book. For “type” I had initially entered “hardcover,” forgetting that we were to use DCMI-Type Vocabulary, which only requires the more vague term “text.”

Finally, I added a JPEG of the cover of this book by clicking on the “Files” tab at the top of the page and choosing the file to upload. From there it was a matter of clicking “Save Changes,” and this first item was added to the list.

Cover of Byzantium between the Ottomans and the Latins

For my second item, I used the restored version of the Black Madonna of Częstochowa. For this icon, the metadata was largely supplied by its listing on ARTstor. While it is a very different object, and the metadata entered correspondingly different, the process of figuring out what information goes where was straightforward enough.
