To get more practice with HTML, I chose to mark up Carl Sandburg’s poem “Nocturne in a Deserted Brickyard” in this language.

I set the entire poem up as a block quote, using <blockquote> tags before and after the text, inclusive of title and citation. For the title, I decided on a heading level 2, judging that since the poem is set up as a block quote, there would be at least one higher-level heading on this hypothetical webpage.
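In outline, the markup described here looks something like this (the heading text is the poem’s title; the poem body is a placeholder):

```html
<blockquote>
  <h2>Nocturne in a Deserted Brickyard</h2>
  <!-- lines of the poem go here -->
</blockquote>
```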


Figure 1 Showing the HTML for a block quote and header level 2.

The first line of the poem begins with an indent. To include this indent, I used the <pre> element to tell the browser that the whitespace is not there just to keep my place in the lines of code, but is part of the text and belongs in the displayed version. This is shown in step 3 of Figure 2.
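A minimal sketch of how <pre> preserves that indent (the line text is a placeholder, not the poem’s actual wording):

```html
<pre>
     The indented first line of the poem
</pre>
```

Inside <pre>, the browser keeps spaces and line breaks exactly as typed instead of collapsing them.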


Figure 2 Indent of first line using <pre> tag.

After entering each line, I added a line break <br> element so that the following line would start on a new line. Without these line breaks, the lines run together as though they were a prose paragraph and not verse.
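The pattern described here looks roughly like this (placeholder lines, not Sandburg’s text):

```html
First line of the stanza<br>
Second line of the stanza<br>
Third line of the stanza
```

Without the <br> tags, the browser would collapse these onto a single flowing line.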


Figure 3 Using line break tags to separate lines of poetry.

After the final line break, I used the <cite></cite> citation tags to properly attribute this poem to Carl Sandburg. This is shown in step 10 of Figure 4.
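Following the post’s approach, the attribution looks something like this (note that it places the author’s name, rather than a work’s title, inside <cite>):

```html
<cite>Carl Sandburg</cite>
```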


Figure 4 The final HTML form of Carl Sandburg’s “Nocturne in a Deserted Brickyard,” including the citation.

Recipe for Barbecue Spareribs as printed in the 1961 Betty Crocker’s New Picture Cook Book. Note the lack of formatting.

For my first attempt at coding, I chose a favorite barbecue spareribs recipe from Betty Crocker’s 1961 New Picture Cook Book.

First, because this recipe was not originally written in list format for either the ingredients or the instructions, I had to type out the recipe introducing these style conventions. Then, using codepen.io, I put it into HTML. I began with the heading, putting the title of the recipe between <h1> tags, as shown in the first steps below in Figure 1.

From there, I added the first ordered list, using the <ol> </ol> tags before and after, and sandwiching each step of the instructions between list item <li></li> tags. This produced a numbered list in the right-hand pane.
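A sketch of that structure, with placeholder steps standing in for the actual instructions:

```html
<ol>
  <li>First step of the instructions</li>
  <li>Second step of the instructions</li>
</ol>
```

The browser numbers the items automatically, so the steps stay correctly numbered even if one is added or removed.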


Figure 1 Showing the HTML entries behind the main header and first ordered list.

The next part of the recipe is another recipe unto itself, so I added a lower level of heading, heading 2, to the words “Texas Barbecue Sauce,” just as I had done for the main heading, only now using <h2></h2> in place of <h1></h1>. A screenshot of the step is below in Figure 2.

Figure 2 Step 11 shows the new subheading after the ordered list.

After this comes the unordered ingredients list, which I wrapped in <ul></ul> unordered list tags. Each ingredient, however, uses the same <li></li> tags as the ordered list to denote a list item. Because this list uses unordered list tags, each ingredient was rendered with a bullet point in the right-hand pane, as seen in Figure 3.
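The ingredients list follows the same pattern, only with <ul> as the wrapper (placeholder ingredients):

```html
<ul>
  <li>First ingredient</li>
  <li>Second ingredient</li>
</ul>
```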


Figure 3 Showing the input for unordered lists in HTML

For the instruction section of this sub-recipe, I returned to the ordered list <ol> </ol> tags. With that, my recipe for barbecue spareribs is now in HTML format.


Figure 4 The complete recipe for Barbecue Spareribs with Texas Barbecue Sauce in HTML.

For my last post on using OpenRefine, I worked with a dataset of the British Library’s comic book holdings found on thomaspadilla.org.

This site provides guidance on some of the ways OpenRefine can be used to clean data, with walk-throughs of features including the text filter, the facet tool, clustering, and transforms on the data provided there. For my post, I used some of these same techniques to answer a different question: how many of the records list a less-than-certain place of publication? For the purposes of this exercise, I did not count records where the place of publication was unknown to the point of leaving the field blank (there are 2,050 of those), but only those that offered a place followed by a qualifying question mark. Only 233 records fit that description, and with OpenRefine I was also able to tell that most of them, 183, were given the designation “London?”, followed distantly by the broader placeholder “England?”, which was used in only fourteen instances; other values appeared fewer than four times each.

Facet tool sorting the most common questioned places of publication.

OpenRefine offers many ways of manipulating messy, inconsistent data so that it can be used to answer a variety of questions. It was not necessary to clean the data completely, only the aspects relevant to the question being asked, which in this case meant manipulating the Place of Publication column. Plenty of further cleaning could be done in other, equally inconsistent columns to open the data to further questions.

Using OpenRefine

A certain data set on the British Library’s comic book holdings has a number of issues that makes using this data to answer specific questions rather difficult. In this post, I will use OpenRefine to clean and sort the data such that it can answer a specific question, in this case, “How many of the British Library’s comic books have a questioned place of publication listed?”

As noted above, this data has numerous issues, including:

  • Authors: some authors’ names are given last, first; some have a period after the name, some a comma; sometimes the same name is entered in different formats
  • Place of Publication: some entries give only a city, others a city and state, others just a country; state names are abbreviated in various ways; some places are bracketed; some records have multiple entries
  • Publisher: the same publisher is not always entered the same way, e.g. Titan, Titan Books, and Titan [distributor] all refer to the same publisher
  • Date of Publication: some dates end with a period, some do not; some are bracketed; some are circa dates
  • Some entries do not have information for all the fields

The basic problem is in the inconsistent ways information is entered in the spreadsheet.
Not all of these problems need to be resolved to answer the question regarding comics whose place of publication is uncertain, but some of the inconsistencies are best eliminated.

Many of the place of publication entries are bracketed, and if we want to look deeper into the question, we probably want “London?” and “[London?]” counted together.
To do this, I used the transform tool and told it to remove the brackets in this column. The preview shows “[London?]” being transformed to just “London?”
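In OpenRefine’s Transform dialog, that bracket removal can be written as a GREL expression along these lines (the exact expression I used is not shown in the screenshots, so treat this as one possible version):

```
value.replace("[", "").replace("]", "")
```

Applied to the Place of Publication column, this turns “[London?]” into “London?” while leaving unbracketed values untouched.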

Our question can be answered by using the text filter tool on the Place of Publication column and entering a question mark in the search field, since a glance through the data suggests this is how uncertainty is denoted. The text filter limits the view to only those comics that have a question mark in their place of publication. Looking further into the matter with the facet tool, however, we see that a question mark appears in twenty-eight distinct values. Many of them denote the uncertainty in question; others seem to question the spelling of the place’s name.

At this point, it is simple enough to specify which of the twenty-eight values are relevant and select them for inclusion. This gives us 233 records for comics whose place of publication is guessed at. We can also see at a glance other information, such as London being by far the most common assumed place of publication, with 183 of the 233 comics, or nearly 80%.

A Quick Introduction to the Text Encoding Initiative Consortium

The Text Encoding Initiative Consortium develops and maintains guidelines for encoding the text types most commonly used by humanities researchers into a machine-readable form. The organization began in 1987, with the first version of the Guidelines dating to 1990. The current Guidelines, known as P5, were first released in 2007 and are updated regularly, roughly every six months. Contributors to this project come from a wide range of countries and group affiliations. Within the TEI are numerous special interest groups working on the encoding most relevant to their interests. Current groups include, for example, those working on East Asian/Japanese texts, graphs, manuscripts, and newspapers, among others, developing standards for encoding these materials with all their particular specifications.
