Thursday, March 10, 2016

Text Mining/Data Visualization: 1950's IF Project

I faced several challenges during this project. My first was using my computer for anything other than research, Word, PowerPoint, or binge-watching Hulu. In addition, the operating system I am most familiar with is Windows, and I was working on a Mac. After familiarizing myself with the operating system, archive.org, and voyant-tools, my next hurdle was downloading IF in the correct file format. I never figured out how to download and save the text file directly onto the Mac, but I was able to copy the text from archive.org and paste it into TextEdit. I retrieved all of my files and began the editing process. The files were missing large portions of the text, and some of what remained was illegible. I edited one issue in full, and it took almost six hours. Since I did not have the time to embark on an editing journey, I had to scrap the idea of perfection and move forward with partially edited issues.
All of the text files from 1952-1974 were supposed to be used in the project. However, after I uploaded all the files into the corpus reader, it did not load. I then compiled a single mega file, but that did not work either. That is when I concluded that I would have to work with the files in smaller portions.
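For anyone comfortable with a little scripting, splitting a large set of files into smaller portions can be sketched in Python. This is not what I did in the project, only a minimal illustration. The file names below are made up, and the code assumes each file name contains its four-digit year.

```python
import re
from collections import defaultdict

def group_by_year(filenames):
    """Group file names by the first year between 1950 and 1979 found in each name."""
    groups = defaultdict(list)
    for name in filenames:
        match = re.search(r"19[5-7]\d", name)
        # Files without a recognizable year go into an "unknown" batch.
        key = match.group(0) if match else "unknown"
        groups[key].append(name)
    return dict(groups)

# Hypothetical file names, for illustration only.
files = ["IF_1952_03.txt", "IF_1952_05.txt", "IF_1959_02.txt", "notes.txt"]
batches = group_by_year(files)
for year in sorted(batches):
    print(year, batches[year])
```

Each batch could then be uploaded on its own instead of loading the whole run of issues at once.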
I chose to read the February 1959 issue of IF. Considering when this issue was published, I decided to focus on the events leading up to the end of the decade. I created a list of potential themes that could be associated with events and developments such as the Korean War, the beginning of the Cold War, DNA, the launch of Sputnik 1, NASA, suburban life, and the baby boom (the list could go on and on). The themes I extrapolated from these events are space, spaceships, satellites, cloning, planets, overpopulation, resources, radiation, bombs, shelters, and spies.
In a previous post I discussed my issues with stop words. I was able to successfully use “Stop Words” to remove words I did not want from the Cloud only once, during my Frank Norris project. So, unfortunately, the Cloud was rendered useless. That led me to my decision to use Bubblelines. It is capable of clearing all the terms, and its “Find Terms” feature searches the files and locates similar words. If the words are not in the text, no results appear. I separated the issues by year and searched for my themes. The URL links did not work for this, so I exported each visualization by selecting “Export Bubblelines” and “a PNG image of the visualization” and then saved the image.
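For readers who have not met stop words before, the idea can be sketched in a few lines of Python. This is only an illustration of the concept, not how voyant-tools works internally. The stop list and the sample sentence are invented for the example.

```python
import re
from collections import Counter

# Hypothetical stop list for illustration; real tools use much longer lists.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "was", "it", "said"}

def top_terms(text, stop_words=STOP_WORDS, n=5):
    """Return the n most frequent words in text, ignoring stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    kept = [w for w in words if w not in stop_words]
    return Counter(kept).most_common(n)

# Invented sample sentence, not taken from IF.
sample = "The ship left the planet and the crew said the planet was lost in space."
print(top_terms(sample, n=3))
```

Without the stop list, "the" would be the most frequent word in the sample. With it, the content words rise to the top, which is what makes a word cloud readable.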
Here are my results:


1952 IF
Main Theme: Space
Other Predominant Themes: Planets, radiation, and bombs

1953 IF
Main Theme: Space
Other Predominant Themes: Satellites, planets, and bombs

1954 IF
Main Theme: Space
Other Predominant Themes: Spaceship and radiation

1955 IF
Main Theme: Space
Other Predominant Themes: Planets, spaceships, bombs, and radiation

1956 IF
Main Themes: Space and planets
Other Predominant Themes: Bombs and radiation

1957 IF
Main Theme: Space
Other Predominant Themes: Planets, bombs, and radiation

1958 IF
Main Theme: Space
Other Predominant Themes: Planets, spaceships, and bombs

1959 IF
Main Theme: Space
Other Predominant Themes: Planets, bombs, and spaceships
The main themes throughout the issues were space, planets, bombs, and radiation. Other themes such as overpopulation, resources, shelters, and spies appeared infrequently, whereas cloning did not appear at all. The main themes correlate with the Korean War, the Cold War, the launch of Sputnik 1, and NASA. This suggests that these events influenced the themes that appeared in literature during the 1950s.
The missing information definitely affected the quality of the results voyant-tools produced. Text that could have further supported my themes may have been missing. If one or two issues were assigned per student, attention to detail would be possible. Editing could be done in full, and issues could be analyzed more closely. Since we are such a small class, I am pleased that results were possible at all.
There were moments when this process seemed arduous because each step was foreign to me, but in retrospect I have taken away a lot from this project. My advice to future Digital Humanities students: things may not go as anticipated and challenges can arise, so be patient, be tenacious, and collaborate. When coming up with a list of themes, try to think of multiple words that could be used for each theme to search for in the texts (Ex: DNA--cloning, genetics, gene, etc.). For Mac users, allow Google to be your friend when you reach a hurdle. Ask "how to" questions; a solution may have already been posted.
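The advice about searching several words per theme can be sketched in Python as well. This is a minimal illustration, not the method I used. The theme groupings and word variants below are illustrative guesses, and the sample text is invented.

```python
import re
from collections import Counter

# Hypothetical theme lists; the variants are illustrative, not the project's actual search terms.
THEME_TERMS = {
    "space": ["space", "spaceship", "spaceships", "satellite", "satellites"],
    "DNA": ["cloning", "clone", "genetics", "gene", "genes"],
    "bombs": ["bomb", "bombs", "radiation", "shelter", "shelters"],
}

def theme_counts(text, theme_terms=THEME_TERMS):
    """Count how often any variant of each theme appears in text."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return {theme: sum(counts[t] for t in terms)
            for theme, terms in theme_terms.items()}

# Invented sample text, not taken from IF.
sample = "The bomb fell. Radiation spread while the spaceship left for space."
print(theme_counts(sample))
```

Searching for a single word such as "cloning" can miss a theme entirely, while a group of related words gives it more chances to show up.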
These tools cannot replace what you gain from manually reading literature. However, for dissecting literature, these tools are wonderful. You can see patterns in literature and make correlations. In addition, data mining and visualization tools are beneficial for all types of learners. Finally, and this is my favorite part, we can see how literature is influenced by culture, past and present.
