Saturday, August 3, 2019

A Digital Humanities Study of Reddit Student Discourse about the Humanities

by Raymond Steding
Published August 1, 2019
A STEM program is not superior to a Liberal Arts program and vice versa. There is a chance for success no matter the route any student takes — Reddit commenter VillageMed
This blog post documents how to locate Reddit social media comments that exemplify students’ and graduates’ discourse about the humanities using the tools and methods of the WhatEvery1Says project. It is based on research begun during the WhatEvery1Says 2018 Summer Research camp and work that continued into Fall 2018.
Reddit comments are the back-and-forth user posts and replies in titled subreddits (Reddit community forums). I show how Digital Humanities tools produce topics of interest or themes of student discourse such as “Jobs,” “PhD Advice,” “Stem vs. Non-Stem discourse,” “Teaching,” “Admissions,” and “Writing.” The reasons for the students’ positions are often stated clearly within the contexts of these thematic labels. Locating such “topics” of student discourse about the humanities helps to understand student issues by category. Further, a clearer understanding of the ideas and motives expressed in the Reddit comments facilitates advocacy for the humanities. Additionally, Digital Humanities newbies will learn from this study how to process the Reddit archive to answer their own research questions.
Let’s first look at the “what” of this article: four exemplary comments of the kind the research is expected to surface:
Comment #1 PhD Advice Topic 137 subreddit: askacademia
For most fields in the humanities and social_sciences, you have to accept that you may end up in a job that has nothing to do with your degree. There are not enough jobs in academia for the number of students graduating with PhDs [ . . . ] for those who are in top-tier programs and willing to make that sacrifice, grad school can obviously be very personally rewarding
Comment #2 Stem vs. Non-Stem Topic 105 subreddit: badhistory
Physics student, so I can chime in here. STEM students feel that their major is harder and more rigorous than those who major in the liberal_arts, particularly since STEM fields are very math heavy . . .
Comment #3 Political Rhetoric Topic 121 subreddit: changemyview
“SJW” [ . . . ] Notable demographics include liberal_arts majors in college, tumblr, BLM. As the acronym describes, they are “Social Justice Warriors” and fight for “Social Justice”
Comment #4 Stem vs. Non-Stem Topic 105 subreddit: college
It’s because of the stigma in society, because the way such subjects are advertised in society — they make it seem like math and science are difficult subjects [ . . . ] Not everyone can write a 70+ page essay with ease, some people find math and equations to be the easy thing, but many people assume that the opposite is true for everyone.
These comments touch on various issues of interest to the researcher who seeks to understand how students and graduates conceptualize the humanities. As will be seen, the process of generating the topic model itself plays a fundamental role in drawing comments like those above into the same contexts. The model demonstrates the semantic relationships between comments within the same topic and the semantic relationships between topics. Although this blog post doesn’t conduct a detailed review of how a close reading of exemplary comments such as those above may be used for advocacy, it does answer the question of why a researcher should use Reddit as a resource.

Reddit as a Data Source for Student Discourse about the Humanities

Reddit describes itself as “a website comprised of thousands of user-originated and operated communities, called ‘subreddits,’ or ‘subs,’ dedicated to a variety of interests.” Reddit’s data-rich set of global knowledge and discourse with “more than 330M monthly unique visitors and 18+billion views per month” provides the researcher through the use of Digital Humanities tools an in-depth look into comments about almost every public topic of interest (“Holiday on Reddit”).
All of this data is curated by “Moderators,” or “Mods,” who perform “a variety of functions within th[e] community, including removing spam and enforcing the rules of their subreddit” (Reddit). Since a Reddit user may create any number of pseudonyms to post comments, many comments can be expected to be deleted. The commenter might post angry comments that have nothing to do with the theme of the subreddit, or they may make irrational comments in another voice that aligns with their chosen pseudonym. Although the deletion service of the moderators doesn’t scrub the comments of all irrelevant data, it does spare the researcher some of the work. Off-topic comments and spam less often end up as tokens in the corpus submitted to computational analysis procedures. The result is that fewer spam and off-topic comments get mixed into the topic model.

The Corpus

Each minute as many as 5000 new comments, or more than ½ million new words, are added. The following graphics snapped from the front page of pushshift.io depict the statistical usage of Reddit (pushshift.io).
Statistical usage of Reddit. Source: pushshift.io.
The first job in assembling a corpus from Reddit data is to establish constraints on how much of the material to collect. For this study, a total of 3.3 terabytes of open-access Reddit comments (approximately five billion) from January 2006 through October 2018 were downloaded in JSON format from pushshift.io. The scope of this corpus is such that, if the comments were printed three per sheet of paper and each sheet stacked one on top of another, the length of the paper stack would exceed 100 miles. With such a large amount of text in the archive, the question becomes how does the researcher find what they are looking for?
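The paper-stack comparison can be checked with a back-of-the-envelope calculation. The sketch below takes the comment count and the three-comments-per-sheet layout from the paragraph above; the sheet thickness of 0.1 mm is my own assumption (a typical value for office paper), not a figure from the data source.

```python
# Rough check of the paper-stack comparison.
COMMENTS = 5_000_000_000        # approximate number of comments in the archive
COMMENTS_PER_SHEET = 3          # layout described in the text
SHEET_THICKNESS_MM = 0.1        # ASSUMPTION: thickness of one sheet of paper
MM_PER_MILE = 1_609_344         # 1 mile = 1,609.344 metres


def stack_height_miles(comments, per_sheet, thickness_mm):
    """Return the height in miles of a stack of sheets holding all comments."""
    sheets = comments / per_sheet
    return sheets * thickness_mm / MM_PER_MILE


height = stack_height_miles(COMMENTS, COMMENTS_PER_SHEET, SHEET_THICKNESS_MM)
print(f"{height:.0f} miles")
```

Under that thickness assumption the stack comes out a little over 100 miles, which is consistent with the comparison; thicker paper would only make the stack taller.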
To collect a corpus comprised of documents with exemplary comments such as those above, I initially filtered the downloaded Reddit data for comments containing at least one of the keywords “humanities,” “liberal arts,” or “the arts,” which resulted in a corpus of 154 files, totaling 980.5 MB of text. I next performed three further refinements. First, I filtered comments that contained the keywords “student,” “major,” or “college” (with or without affixes) into a new corpus. The Python code to search the text of the 154 source files follows:
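#!/usr/bin/env python3
import glob
import os
import re

# Placeholder directories: substitute your own source and destination paths.
source_dir = '/home/path-to-your-source-files/Student-Major/'
dest_dir = '/home/path-to-your-destination-files/'
# Case-insensitive match, with or without affixes (e.g. "students", "majors").
pattern = re.compile(r'student|major|college', re.IGNORECASE)

for json_filename in glob.glob(os.path.join(source_dir, '*.json')):
    filename_out = os.path.basename(json_filename) + '-student-majors-college.json'
    with open(json_filename, encoding='utf-8') as infile, \
            open(os.path.join(dest_dir, filename_out), 'w', encoding='utf-8') as outfile:
        # Each line holds one JSON comment; keep only the lines that match a keyword.
        outfile.writelines(line for line in infile if pattern.search(line))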
This second search resulted in 153 files totaling 335.5 MB, which were run through a Python preprocessing script for proper formatting before the data were uploaded to the WE1S server. The Python script removes comments containing fewer than 225 words and comments with a karma score of less than or equal to 2. It also calculates the sentiment and subjectivity values of each comment with the Python TextBlob library, and it writes out each comment as a single JSON file containing both the comment text and the metadata. The resulting corpus (“Corpus-A”) contains a total of 22,160 comments.
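The two thresholds in that preprocessing step can be sketched as follows. This is a minimal illustration, not the script used for the study: it assumes the lowercase Reddit field names `body` and `score`, counts words by splitting on whitespace (my assumption about how the word count is taken), and leaves out the sentiment and subjectivity calculation.

```python
import json

MIN_WORDS = 225   # comments with fewer words are dropped
MIN_SCORE = 3     # comments with a score of 2 or less are dropped


def keep_comment(comment, min_words=MIN_WORDS, min_score=MIN_SCORE):
    """Return True if a parsed comment passes both thresholds."""
    word_count = len(comment.get("body", "").split())
    return word_count >= min_words and comment.get("score", 0) >= min_score


def filter_comments(lines):
    """Yield parsed comments from JSON lines that pass the thresholds."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            comment = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines rather than stopping the run
        if keep_comment(comment):
            yield comment


# Tiny illustrative input: one long, well-rated comment and two rejects.
sample_lines = [
    json.dumps({"body": "word " * 300, "score": 5, "subreddit": "college"}),
    json.dumps({"body": "too short", "score": 10, "subreddit": "college"}),
    json.dumps({"body": "word " * 300, "score": 2, "subreddit": "college"}),
]
kept = list(filter_comments(sample_lines))
print(len(kept))
```

Keeping the thresholds as named constants makes it easy to treat them as adjustable values when building alternative corpora.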

Metadata

The JSON file format of the content downloaded from pushshift.io complements the researcher’s exploration by making parsing and processing easy with Python. Each line within the files contains the following metadata:
{
    "author": "xPadawanRyan",
    "author_flair_css_class": "",
    "author_flair_text": "SSW / BA & MA History / PhD* Human Studies",
    "body": "It's because of the stigma in society, because the way such subjects are advertised in society -- they make it seem like math and science are difficult subjects . . . Not everyone can write a 70+ page essay with ease, some people find math and equations to be the easy thing, but many people assume that the opposite is true for everyone.",
    "can_gild": true,
    "controversiality": 0,
    "created_utc": 1509939054,
    "distinguished": null,
    "edited": false,
    "gilded": 0,
    "id": "dper8w9",
    "is_submitter": false,
    "link_id": "t3_7b2gls",
    "parent_id": "t3_7b2gls",
    "permalink": "/r/college/comments/7b2gls/why_do_people_assume_that_we_major_in_worthless/dper8w9/",
    "retrieved_on": 1512171532,
    "score": 3,
    "stickied": false,
    "subreddit": "college",
    "subreddit_id": "t5_2qh3z",
    "subreddit_type": "public"
}
Parsing and extracting information that relates to the research is necessary whatever format the corpus files are in, but if the data are in JSON format, a Python script can extract any of the Reddit metadata fields and use them elsewhere. For example, the permalink value that points to the comment thread on the Reddit website can be reformatted as a link in the dfr-browser tool for visualizing topic models. A comparison of the original comment above with the view JSON link on the document title page (in WE1S’s customization of dfr-browser) shows that the JSON list file downloaded from pushshift.io above has been reformatted by the researcher’s Python script to include the permalink value as a hyperlink to the original Reddit thread. The reformatted comment page includes other essential statistics such as the karma score.
{
    "title": "2017-11-humanities-student-major_569_college.txt",
    "pub_date": "2017-11-05T00:00:00Z",
    "Sentiment": "0.01",
    "Subjectivity": "0.56",
    "KarmaScore": "3",
    "Upvotes": "0",
    "Downvotes": "0",
    "Wordcount": "371",
    "Permalink": "http://reddit.com/r/college/comments/7b2gls/why_do_people_assume_that_we_major_in_worthless/dper8w9/",
    "Threadlink": "http://reddit.com/r/college/comments/7b2gls/why_do_people_assume_that_we_major_in_worthless",
    "Commenter": "xPadawanRyan",
    "content_scrubbed": "It's because of the stigma in society, because the way such subjects are advertised in society -- they make it seem like math and science are difficult subjects . . . Not everyone can write a 70+ page essay with ease, some people find math and equations to be the easy thing, but many people assume that the opposite is true for everyone."
}
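A reformatting step of the kind that produced the record above might look like the following sketch. It covers only the link and attribution fields; the function names and the `REDDIT_BASE` constant are hypothetical, and the raw field names (`permalink`, `score`, `author`, `body`) are assumed to be the lowercase Reddit names.

```python
REDDIT_BASE = "http://reddit.com"


def build_links(permalink):
    """Return (comment_url, thread_url) for a Reddit comment permalink."""
    comment_url = REDDIT_BASE + permalink
    # Drop the trailing slash and the final path segment (the comment id).
    thread_path = permalink.rstrip("/").rsplit("/", 1)[0]
    return comment_url, REDDIT_BASE + thread_path


def reformat_comment(comment):
    """Map a few raw fields onto the names used in the reformatted record."""
    permalink_url, thread_url = build_links(comment["permalink"])
    return {
        "KarmaScore": str(comment["score"]),
        "Permalink": permalink_url,
        "Threadlink": thread_url,
        "Commenter": comment["author"],
        "content_scrubbed": comment["body"],
    }


raw = {
    "author": "xPadawanRyan",
    "body": "It's because of the stigma in society ...",
    "score": 3,
    "permalink": "/r/college/comments/7b2gls/"
                 "why_do_people_assume_that_we_major_in_worthless/dper8w9/",
}
record = reformat_comment(raw)
print(record["Threadlink"])
```

Deriving the thread link from the permalink, rather than storing it separately, keeps the two values consistent with each other.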
The karma metadata field is an important publicly assigned quality-of-commenter numerical value for the researcher to use as a proxy of authority when filtering for “higher” or “lower” quality comments. According to Reddit, “Posts and comments accrue votes, or points, called ‘karma’ . . . [it] is generally a measure of the perception of [the user’s] contribution to Reddit. Positive karma indicate[s] your fellow users regard your comments or posts as enjoyable and contributory to the subreddit.” The karma value is a seed used to winnow the search results into a corpus that includes the public’s approval of the comments being researched. Assuming users prefer a higher rating based on their overall karma points, then the bias of this metadata value is that it may be used to the exclusion of other commenters. The excluded commenters with low karma values could be authors of equally meaningful comments, but they are either new or their comments aren’t as highly rated by others.
Nonetheless, the karma value carries over onto the comments a commenter makes: since most commenters want to raise their karma rating rather than lower it, they tend to produce meaningful comments to win more karma points. The implicit equivalence between a comment and the karma rating of its commenter travels explicitly with the comment in its metadata. Despite this bias, the karma rating and the humanities search terms serve, as a research decision, as adjustable values for creating quality corpora to answer the research question.

Overview of the Methodology

Corpus-A is the result of what Jo Guldi refers to as “an iterative research process that require[s] successively re-seeding, re-winnowing, and re-reading resulting samples of text from a corpus” (“Critical Search: A Procedure for Guided Reading in Large-Scale Textual Corpora”, 13). The entire Reddit archive of comments filtered by the six search terms “humanities,” “liberal arts,” “the arts,” “student,” “major,” and “college” has “constrain[ed] a large corpus around a particular question” (Guldi 11). Finding exemplary comments for further analysis of how and why students and graduates talk about humanities fields in the way that they do is what the research seeks. Since every comment includes at least two of the search terms, Corpus-A contains many comments worthy of closer inspection. But with over 22,000 comments, many are not meaningful for understanding student discourse concerning the humanities, and many comments will not help determine what influences commenters’ viewpoints. In general, sociological, economic, and cultural factors, parental authority, and individual preference shape their opinions.
But anticipating these factors may lead to the exclusion of some specific and surprising possibilities. For instance, some comments may reveal stereotyping to the point of stigma as a primary element of student opinions within particular subreddits. Others may reveal an unexpected presence of references to “students” and “humanities” in some gaming subreddits. Therefore, the search made with the hope of finding the unknown about student discourse must be wide enough to include the broadest context of possible influences behind student opinions, and narrow enough to isolate the comments constrained by what is meant by the humanities.
As part of the WE1S project, I have analyzed the Reddit corpus using the WE1S workflow based around topic modeling using MALLET and visualized the resulting model with dfr-browser (Goldstone) and pyLDAvis (Mabey). “A ‘topic’ consists of a cluster of words that frequently occur together. Using contextual clues, topic models can connect words with similar meanings and distinguish between uses of words with multiple meanings” (MALLET). To this point Guldi says that “[t]opic models identify semantic similarities in collections of words that are used together” (19). The semantically similar collections, or topics, may be thought of as themes, such as “jobs,” “admissions,” or “campus infrastructure,” where the documents (in this case, Reddit comments) contain varying proportions of terms most highly associated with those topics. Each topic displayed in dfr-browser also includes a list of comments, as individual documents, that contribute to it. The tools’ grouping of the documents into coherent themes of discourse therefore makes it possible to analyze closely, within individual Reddit comments, the thematic bases of student rhetoric.
PyLDAvis, a Python port of the LDAvis package for R, is an important tool in the WE1S workflow for ascertaining the semantic coherence of topics generated in the model. According to Sievert and Shirley, authors of the original LDAvis package, it “attempts to answer a few basic questions about a fitted topic model: (1) What is the meaning of each topic?, (2) How prevalent is each topic?, and (3) How do the topics relate to each other?” (63). Knowing the meaning of a topic and how prevalent it is helped me to label the thematic topics of Corpus-A. The main component of pyLDAvis that helped me to determine the semantic relationships of topics to the comments most heavily represented in them is the relevance indicator. The authors explain “relevance” as an indicator that lets the user weigh a term’s lift, “the ratio of a term’s probability within a topic to its marginal probability across the corpus,” against “the familiar ranking of terms in decreasing order of their topic-specific probability” (Sievert and Shirley 65-66). Knowing the relevance of the terms of the topics, along with a close reading of a few of the comments that made up the topics, gave me confidence in the coherent semantic relationship between the topic’s label and the documents that make up the topic.
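To make the relevance metric concrete, here is a small sketch of the formula from Sievert and Shirley’s paper: relevance is a weighted sum of the log of a term’s probability within the topic and the log of its lift, with the weight λ set by the slider. The probabilities below are made up for illustration and are not values from the Corpus-A model; the function names are my own.

```python
import math


def relevance(p_term_given_topic, p_term, lam):
    """Relevance of a term to a topic for a weight lam between 0 and 1.

    lam = 1 ranks by the term's probability within the topic;
    lam = 0 ranks by lift (topic probability / corpus-wide probability).
    """
    lift = p_term_given_topic / p_term
    return lam * math.log(p_term_given_topic) + (1 - lam) * math.log(lift)


def rank_terms(terms, lam):
    """Sort {term: (p_term_given_topic, p_term)} by descending relevance."""
    return sorted(terms, key=lambda t: relevance(*terms[t], lam), reverse=True)


# Made-up probabilities for illustration only, not values from the model.
toy_topic = {
    "people":   (0.050, 0.0400),   # common everywhere, so low lift
    "majors":   (0.030, 0.0060),
    "non-stem": (0.010, 0.0005),   # rare overall, so high lift
}

for lam in (1.0, 0.6, 0.0):
    print(lam, rank_terms(toy_topic, lam))
```

With these toy numbers, a generic word such as “people” leads the ranking at λ = 1, while the topic-specific “non-stem” leads at λ = 0; intermediate settings balance the two orderings.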

Interpretation and Methodology

Although I did not experiment extensively with the number of topics, I generated the Corpus-A topic model with 200 topics from a corpus of 21,018 de-duplicated documents, which appears to provide close to an optimal granularity for locating student discourses of interest. I used the following algorithm to prepare the corpus for modeling:
  1. Remove 1376 stop words from the stoplist file
  2. Normalize all versions of “United States of America” to “United States”
  3. Remove punctuation
  4. Merge some phrases from a standard list with underscore
  5. Replace “‘s” with “[.]”
  6. Remove duplicates from the corpus
Once prepared, the corpus was modeled and the results were visualized through the dfr-browser and pyLDAvis modules in Jupyter notebooks on the UCSB server.
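The preparation steps listed above can be sketched roughly as follows. This is an illustration under several assumptions rather than the WE1S implementation: the stop words and phrases are tiny hypothetical stand-ins for the real lists, text is lowercased first, hyphens are kept, phrases are merged before stop words are dropped so that multi-word phrases survive, and the replacement of “‘s” (step 5) is left out.

```python
import re

# Hypothetical stand-ins: the real stoplist has 1376 entries and the real
# phrase list is the project's standard list; neither is reproduced here.
STOP_WORDS = {"the", "a", "of", "in", "is", "and", "to"}
PHRASES = ["liberal arts", "social sciences"]


def prepare(text):
    """Normalize one document and return it as a space-joined token string."""
    text = text.lower()
    text = text.replace("united states of america", "united states")
    text = re.sub(r"[^\w\s-]", " ", text)           # remove punctuation, keep hyphens
    text = re.sub(r"\s+", " ", text).strip()        # collapse whitespace
    for phrase in PHRASES:                          # merge phrases with underscores
        text = text.replace(phrase, phrase.replace(" ", "_"))
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)


def prepare_corpus(documents):
    """Prepare every document, then drop exact duplicates, keeping order."""
    seen = set()
    unique = []
    for doc in documents:
        prepared = prepare(doc)
        if prepared not in seen:
            seen.add(prepared)
            unique.append(prepared)
    return unique


docs = [
    "Liberal arts majors in the United States of America!",
    "Liberal arts majors in the United States of America!",
    "STEM is hard, and so is the essay.",
]
print(prepare_corpus(docs))
```

Merged phrases such as `liberal_arts` are why underscored terms appear among the topic keywords and in the quoted comments.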
A structured series of observation and judgment steps made in accordance with the WE1S interpretation protocol provided guidance for locating which topics are the most important in the model. Following WE1S guidelines, I went to dfr-browser’s List View and sorted the topics so that the most heavily weighted appear on top, as in the screenshot below.
Mega topics shown in dfr-browser’s List View
List View shows the top 13 topics relative to their topic weights within the corpus along with their topic words and a graph of the topic distribution over time. In this view, the “mega topics” are those with values of greater than two percent of the corpus. The mega topics have the highest proportional weights and because they consist of general topic words, they are difficult to label meaningfully. For illustrative purposes, I’ve labeled one such mega topic as “Non-noun Stop Words” since it contains mostly adjectives and adverbs: words that researchers sometimes remove by way of adding them to the stop word list file to improve model coherency.
The protocol asks the researchers to note the topics of interest where the topic words appear to have a semantic relationship. In my experience, these have typically been topics that represent less than two percent but more than 0.5 percent of the corpus. For instance, in the graphic above, Topic 150, with a 1.6 percent representation of the corpus, has the keywords “degree,” “job,” “degrees,” “major,” “liberal_arts,” “college,” “people,” “jobs,” “field,” “majors,” “school,” “career,” “work,” “business,” and “market.” What is noteworthy about Topic 150 is the number of search terms that appear as keywords within the topic, such as “liberal arts,” “major,” and “college.” The presence of key search terms in a topic’s keywords suggests that this should be considered a topic worthy of further investigation. The implied theme or label of the topic might be “degrees that lead to jobs.”
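The two weight thresholds described above amount to a simple binning rule, sketched here. The function name and labels are my own; Topic 150’s 1.6 percent is from the text, while the other two topic ids and weights are invented for illustration.

```python
MEGA_THRESHOLD = 2.0    # percent of corpus; above this is a "mega topic"
INTEREST_FLOOR = 0.5    # percent of corpus; lower bound suggested in the text


def classify_topic(weight_percent):
    """Label a topic by its share of the corpus, given in percent."""
    if weight_percent > MEGA_THRESHOLD:
        return "mega topic"
    if weight_percent > INTEREST_FLOOR:
        return "candidate topic of interest"
    return "minor topic"


# Topic 150's weight comes from the text; topics 1 and 2 are invented.
topic_weights = {150: 1.6, 1: 3.4, 2: 0.2}
for topic_id, weight in sorted(topic_weights.items()):
    print(topic_id, classify_topic(weight))
```

The bins only narrow down where to look; whether a candidate topic is coherent still depends on reading its keywords and top documents.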
Since the keywords of Topic 150 appeared coherent to me in List View, I turned to Topic View to examine the topic more closely.
Most prominent topics for Humanities and STEM shown in dfr-browser’s Topic View
The protocol asks the researcher to read the comments that contribute to the topic. After reading five of the comments that contribute to Topic 150, I concluded that this is a topic of interest, but the comments seemed to discuss the benefits of either STEM or humanities majors rather than degrees that lead to jobs. I therefore labeled the topic “Stem or Non-Stem Discourse.”
As Guldi states, “by thrashing the data with different tools, the digital scholar obtains insight into the bias of the tools themselves, and the variety of answers they can produce” (25). Indeed, my interpretation takes place in a back-and-forth manner between the dfr-browser and pyLDAvis visualizations as needed. I’m interested in the verification of pyLDAvis by dfr-browser and vice versa. These tools have slightly different ways of representing topics, and comparing these representations aids the interpreter in developing semantically meaningful labels for significant topics.
In the screenshot below, I have added custom labels indicating my interpretations of the topic’s content or theme. For example, Topic 121 “Political Rhetoric/Arguments” reflects general political discourse entering into the conversation of students over time. Further research into the individual documents containing this topic may or may not reveal that divisiveness in student political opinions creates a reactionary environment that accentuates stereotyping of humanities students.
To find out how students argue for and against becoming humanities majors, the topics containing comments for investigation appear to be Topic 105 (“Stem vs. Non-Stem”), Topic 150 (“Stem or Non-Stem Discourse”), Topic 62 (“Follow Your Bliss”), and Topic 172 (“Humanities and Jobs”).
Topics likely to represent student discourse
Together, these four topics related to humanities majors account for 4.2% of the corpus, roughly double the 2% that four topics of average weight would hold in a 200-topic model (each averaging 0.5%). The documents that make up each topic of interest require sample reading of the underlying comments to verify whether they help answer our research question, but their labels indicate what we should expect to find.
Using pyLDAvis further helped me to locate coherent topics that consist of comments related to our inquiry and thus to eliminate much of the need for sample reading beyond the first few comments of each topic of interest (Wieringa). pyLDAvis simplifies the labeling of topics, and therefore it simplifies the process of determining where to search for the answer to our question within the corpus. Its visual interface for locating the topics of choice lets us look deep within the topics to know that the topics, and by association, the documents that represent the topic, are consistent with the theme of the label. In the model below, Topic 105, located in the lower right quadrant, stands out in red. The relevance slider in the upper right is the primary tool of pyLDAvis.
Topic 105 in pyLDAvis with Relevance set to 0.6
With the relevance value set to 0.6, the first five or so words often give the researcher enough information to label a topic appropriately. In this case, the words are “stem,” “majors,” “fields,” “humanities,” and “non-stem,” which suggests that the comments that make up the topic contain opposing student rhetoric about humanities and STEM majors.
The researcher may double-check this assumption by returning to dfr-browser’s word index page and clicking on the “humanities” link. Amongst the “Prominent Topics,” Topic 105 (“Stem vs. Non-Stem”) has the highest probability of containing the word “humanities”:
Prominent Topics in the Humanities
A sample reading of the Reddit comments associated with this topic supports the interpretation based on Topic 105’s keywords (“humanities,” “stem,” “liberal arts,” “engineering,” and “non-stem”); the discourse of the topic involves opposing points of view. Going back to pyLDAvis and sliding the relevance indicator to the far left, the two words of Topic 105 with the highest lift (the ratio of a term’s probability within the topic to its probability across the corpus) are “stem” and “non-stem.” The following diagram shows the image with the value of the relevance metric set to zero.
Topic 105 in pyLDAvis with Relevance set to 0
In this manner of back-and-forth “thrashing” of the models, the researcher gains assurance that dfr-browser and pyLDAvis agree: high polarization exists within the comments of Topic 105. The documents constituting “Stem vs. Non-Stem” most likely contain sought-after student and graduate rhetoric.
Worthy of note is that the top 500 documents constituting Topic 105 come from 171 different subreddits, many of which are non-academic in nature. The first 15 of the 171 subreddits that contribute heavily to Topic 105 are “6thForm,” “ABCDesis,” “academiceconomics,” “actuallesbians,” “AdviceAnimals,” “Anarchism,” “Capitalism,” “antisrs,” “ApplyingToCollege,” “asianamerican,” “AsianParentStories,” “AskAcademia,” “AskAnAmerican,” “AskEngineers,” and “AskFeminists.” These results imply that discourse about humanities and STEM majors arises out of a broad demographic base and within contexts across a spectrum of interests.
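A count of distinct subreddits like the one above can be derived from the permalink stored with each document. The sketch below is one way to do it, assuming each document’s link contains the usual `/r/<subreddit>/` segment; the function names are hypothetical, and the second and third links are invented placeholders.

```python
from collections import Counter


def subreddit_from_permalink(permalink):
    """Return the subreddit name from a URL or path containing '/r/<name>/'."""
    parts = permalink.split("/r/", 1)
    if len(parts) < 2:
        return None
    return parts[1].split("/", 1)[0]


def count_subreddits(permalinks):
    """Count how many documents come from each subreddit."""
    names = (subreddit_from_permalink(p) for p in permalinks)
    return Counter(name for name in names if name)


# Illustrative links only; the second and third are invented placeholders.
links = [
    "http://reddit.com/r/college/comments/7b2gls/"
    "why_do_people_assume_that_we_major_in_worthless/dper8w9/",
    "http://reddit.com/r/AskAcademia/comments/xxxxxx/example_thread/yyyyyyy/",
    "http://reddit.com/r/college/comments/zzzzzz/example_thread/wwwwwww/",
]
counts = count_subreddits(links)
print(len(counts), counts.most_common())
```

Applied to the top documents of a topic, `len(counts)` gives the number of distinct subreddits and `most_common()` shows which communities contribute most heavily.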
Other topics of interest within the list labeled thus far include Topic 62 (“Follow Your Bliss”), wherein the comments, for the most part, subscribe to the idea that passions guide students towards a field of study; Topic 150 (“Stem or Non-Stem Discourse”), wherein the comments do not argue for or against the humanities but rather tell why the students have chosen a particular path; and Topic 172 (“Humanities and Jobs”) which speaks to the issue of job prospects for humanities majors. The comments that comprise each of the topics of interest require further examination to learn how best to address student concerns about the humanities.

Conclusion

The premise of this research blog is that the researcher will, after studying the rhetoric and diction for and against the humanities in the documents of a topic such as Topic 105, develop insight into how best to frame an answer presentable to the public in support of the humanities. Guldi states that at the end of the “critical search is [the] actual reading of particular texts,” which in this case are individual comments classified as exemplary (29). She refers to this stage as “Guided Reading,” where the “iterative encounters with the algorithm and reading allow[s] the researcher to find documents that fit best with [the] question” (29). And, although this research continues beyond the documentation here to re-model the exemplary comments of this and many other models combined, the results have shown the usefulness of the Digital Humanities tools used to find the comments and themes of student discourse about the humanities.
This blog post contains the technical information necessary for researchers who desire to explore Reddit for answers to particular questions about human discourse. It demonstrates that the Reddit archive is a vast aggregation of English-language text that supports investigating questions that would be impossible to pursue without Digital Humanities tools. Through software such as MALLET, dfr-browser, and pyLDAvis, the study shows that algorithmically analyzing a corpus into topics, or themed genres, consisting of file sets helps to answer the research question of how students talk about the humanities. For a detailed look at the results of this study, download the top-ranked 500 comments of Topic 105 (“Stem vs. Non-Stem”) here.

Works Cited

Goldstone, Andrew. Dfr-browser. “Take a MALLET to Disciplinary History.” 2013. 2018. GitHub, https://github.com/agoldst/dfr-browser.
Guldi, Jo. Critical Search: A Procedure for Guided Reading in Large-Scale Textual Corpora. Preprint, SocArXiv, 20 Dec. 2018. DOI.org (Crossref), doi:10.31235/osf.io/g286e.
“Holiday on Reddit.” Upvoted, http://redditblog.com/2018/11/13/holiday-on-reddit/. Accessed 4 Feb. 2019.
Mabey, Ben. Python Library for Interactive Topic Model Visualization. Port of the R LDAvis Package: Bmabey/PyLDAvis. 2015. 2019. GitHub, https://github.com/bmabey/pyLDAvis.
Pannucci, Christopher J., and Edwin G. Wilkins. “Identifying and Avoiding Bias in Research.” Plastic and Reconstructive Surgery, vol. 126, no. 2, Aug. 2010, pp. 619–25. PubMed Central, doi:10.1097/PRS.0b013e3181de24bc. Accessed 4 Mar. 2019.
Pushshift.io, files.pushshift.io/reddit/comments/. Accessed 27 Feb. 2019.
“Reddit: The Front Page of the Internet.” Reddit, https://www.reddit.com/r/askReddit/wiki/index. Accessed 4 Mar. 2019.
Sievert, Carson, and Kenneth Shirley. “LDAvis: A Method for Visualizing and Interpreting Topics.” Proceedings of the Workshop on Interactive Language Learning, Visualization, and Interfaces, Association for Computational Linguistics, 2014, pp. 63–70. ACLWeb, http://www.aclweb.org/anthology/W14-3110.
“TextBlob 0.15.2 Documentation.” https://textblob.readthedocs.io/en/dev/api_reference.html#textblob.blob.TextBlob.sentiment. Accessed 4 Mar. 2019.
Wieringa, Jeri E. “Using PyLDAvis with Mallet.” From Data to Scholarship, http://jeriwieringa.com/2018/07/17/pyLDAviz-and-Mallet/. Accessed 4 Mar. 2019.

Friday, August 2, 2019

A Digital Humanities Research Study


This study examines topic models for the purpose of determining the nature of a “crisis” in the humanities. By examining the original “Default Model,” the new “Default Model” without duplicates, and a Reddit “Corpus-A” model, along with topics related to the words “humanities,” “crisis,” “problem,” and “issues” in all of the models, and by using the WE1S Qualtrics surveys, the study takes note of the similarities between the models and attempts to understand the word “crisis.”

First Qualtrics Survey
Through a set of reductionist procedures, the topics that the default pyLDAvis model highlights when the word “crisis” is hovered over (the conditional topic distribution, seen around the red bubble of topic 110) may be reduced so that all documents in each topic distill down to a single topic sentence and then to a single word.

The resulting words might be thought of as the most concise approximation of the representation of each corresponding document. Utilizing pyLDAvis, for words similar to ‘crisis’ within correlated topics we have “stress,” “sustainability,” “trump,” “poverty,” “criticized,” “economic,” “terrorism,” and we have the following correlations:
looking at T223, ‘stress,’ we have T104 related
looking at T82, ‘sustainability,’ we have T37 related
looking at T85, ‘trump,’ we have very many topics related
looking at T65, ‘poverty,’ we have T118, T90, and T64 most related
looking at T134, ‘criticized,’ we have T240 most related
looking at T110, ‘economic,’ we have T245 most related
looking at T163, ‘terrorism,’ we have T178 most related
All of these reductions may be further reduced to the single word “problems.” The words “laughter,” “Freud,” and “humanities,” which oppose crisis, might be considered solutions:
looking at T90, “laughter,” we have T97 (life, love) most related
looking at T118, “humanities,” we have T34 and T90 most related
looking at T69, “Freud,” we have T146 most related
For clarity, examples of the above opposition follow: Freudian analysis resolves personality crisis; humanities discourse addresses and resolves human problems; laughter rejoices on behalf of being alive. And, many of the reviewed topic documents related to the word “humanities” are arguments on behalf of the humanities.

In conclusion, humanities discourse represents a high level of rhetoric that attempts to position the humanities as a worthy pursuit in a world of high tech and science. Rather than the humanities discourse found in a (university) humanities curriculum, the reading of topic documents reveals argumentation on behalf of the humanities. Positive justifications for the humanities and a sense of crisis often go hand in hand, and the rhetoric confirms some need that humanities academics have for resolving varying perspectives into a crisis in the humanities.

Using Qualtrics module 4.c to compare the words humanities with problems

For the first word, “humanities,” the topics and top documents convey friction, a state of high energy of human concern that attempts to dissipate itself through a synthesis with the sciences and a less problematic academic curriculum. Thus, a focused study of the sources of tension likely leads to better solutions. In other words, a close reading of documents within the listed associated topics-with-words (semantic relations) reveals how crisis-related words relate to the humanities in public discourse. Proceeding with this assumption, the documents of topic 191 appear to provide a recommendation for addressing a “crisis” in the humanities in general. The top documents of topic 191, which I labeled “Humanities and Science,” contain the following:
Doc 1: “As Dr. Chun will suggest, now is a perfect time for scientists and humanities scholars to come together to answer the hard questions about how to solve the most pressing problems of our world.”
Doc 2: ‘ ''The Hubris of the Humanities'' (column, Dec. 6): Nicholas D. Kristof correctly argues that Americans need a better diet of science to meet the complexities of civilization. His argument rests on a division between a liberal arts education and one based on science.’
Doc 3: ‘Intellectually we seldom venture outside our comfort zones unless forced to. Humanities students fulfill their physical sciences requirement only under duress while future computer scientists disdain having to sit through one ethnic studies class.’
Doc 4: ‘People generally see philosophy as impractical, unnecessary or entirely subjective. They say philosophers ponder the meaning of life and other abstract questions but contribute nothing to society’
Doc 5: ‘ABSTRACT [...] cutting requirements would decrease the number of social studies teachers, and would therefore decrease the number of electives available to all students. [...] social studies is not only fascinating, it is some of the most practical knowledge you can acquire.’
Doc 6: ‘The Ohio Department of Education's over-complicated and over-detailed new standards include a 40-page "Learning Standards" for Social Studies, as well as "model curricula" for each grade K-12. Four "strands" (history, government, geography, and economics) with included "skill topics" illustrate the broad scope of Social Studies and seem like a good formula for living (if actually taught and experienced in the classroom).’
Doc 7: ‘Of all the non-useful things people believe, have no proof of but perpetuate, I'd like to put one to rest: "You can't do anything with" (fill in the blank) an English degree, a degree in philosophy or anything in the general vicinity of the humanities or social sciences. But that belief is just false, false, false - especially today.’
It follows that whenever every document in a topic contains the term “humanities” together with “science” and/or STEM-like terms, the topic will mostly contain rhetorical discourse that argues in favor of more or different humanities curriculum and/or a synthesis of humanities and science curricula.

Second New Default 250 Topic Model Qualtrics Survey and Reddit

Topic 9, “Problems,” became the most substantially weighted topic in the new default model. This is not a coincidence, since it is the critical state of all problems that the new model, through its key search term “humanities,” is in conversation with. Etymologically speaking, “crisis,” documented in the HTOED as “< Latin crisis, < Greek κρίσις discrimination, decision, crisis, < κρίνειν to decide,” and “critical,” documented in the OED as “Critical 5. Of the nature of, or constituting, a crisis: a. Of decisive importance in relation to the issue. spec. critical path: the most important sequence of stages in an operation, determining the time needed for the whole operation; frequently attributive,” manifest as either “critical” conditions or “crisis” conditions of particular problems found in the corpus. In either case, critical (related words critic, criticism and critique) and analytical thinking found throughout humanities discourse are necessary and paramount to addressing all problematic issues we as humans face.
Model analysis to answer the question “How prevalent is ‘humanities crisis discourse’ (discourse about the crisis in the humanities) as compared with other ways of talking about the humanities (again, within specific contexts)?” reveals a small number of topics related to the words “crisis” and “critical.” Humanities crisis discourse is a small share of the corpus, given that every document in the corpus is in conversation with the "humanities" search term.

A quantitatively similar observation holds in the Reddit corpora: crisis discourse is small compared with overall discourse about the humanities. The following procedures support this conclusion:
Using grep -o -i "\bcrisis\b" Reddit-All-Humanities-2006-2018.json | wc -l, grep -o -i "\bcrisis in the humanities\b" Reddit-All-Humanities-2006-2018.json | wc -l, and grep -o -i "\bhumanities crisis\b" Reddit-All-Humanities-2006-2018.json | wc -l
Within a total of 270,784 comments that include the search term “humanities,” “crisis” appears 1562 times, “crisis in the humanities” appears 24 times, and “humanities crisis” appears 12 times.
Using grep -o -i "\bcrisis\b" Reddit-Liberal-Arts-All-2006-2018.json | wc -l, grep -o -i "\bcrisis in the humanities\b" Reddit-Liberal-Arts-All-2006-2018.json | wc -l, and grep -o -i "\bhumanities crisis\b" Reddit-Liberal-Arts-All-2006-2018.json | wc -l
Within a total of 243,476 comments that include the search term “liberal arts,” “crisis” appears 822 times, “crisis in the humanities” appears two times, and “humanities crisis” appears one time.
Using grep -o -i "\bcrisis\b" Reddit-The-Arts-All-2006-2018.json | wc -l, grep -o -i "\bcrisis in the humanities\b" Reddit-The-Arts-All-2006-2018.json | wc -l, and grep -o -i "\bhumanities crisis\b" Reddit-The-Arts-All-2006-2018.json | wc -l
Within a total of 287,335 comments that include the search term “the arts,” “crisis” appears 1029 times, “crisis in the humanities” appears 0 times, and “humanities crisis” appears 0 times.
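The word “crisis” appears a total of 3,413 times within 801,595 comments, for a corresponding representation within the corpus of 0.00425776, or roughly one occurrence of “crisis” for every 235 comments. (Because grep -o counts every match, a comment that repeats the word is counted more than once, so the share of comments containing “crisis” is at most this figure.) Similar to topics 93 and 143 discussed above, “crisis” is again more often related to financial matters and not related to the humanities.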
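The grep counts above can be approximated in Python. The following is a minimal sketch, not the procedure actually used for this study: the function names and the three sample comments are hypothetical, and it assumes the comments are already loaded as plain strings. It shows the difference between counting every occurrence (what grep -o ... | wc -l reports) and counting the comments that contain the term at least once.

```python
import re


def count_occurrences(texts, phrase):
    # Count every case-insensitive whole-word match, similar in spirit to
    # grep -o -i with word boundaries, piped to wc -l.
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    return sum(len(pattern.findall(text)) for text in texts)


def count_matching_texts(texts, phrase):
    # Count the texts that contain the phrase at least once.
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    return sum(1 for text in texts if pattern.search(text))


# Hypothetical sample comments for illustration only (not from the corpus).
sample = [
    "The crisis in the humanities is overstated.",
    "Crisis after crisis, and still no funding.",
    "I majored in the humanities and found a job.",
]

occurrences = count_occurrences(sample, "crisis")
comments_with_term = count_matching_texts(sample, "crisis")
phrase_occurrences = count_occurrences(sample, "crisis in the humanities")

print(occurrences, comments_with_term, phrase_occurrences)
```

In the sample, the second comment uses the word twice, so the occurrence count is higher than the number of comments that mention it. The same gap may exist in the Reddit counts, which is worth keeping in mind when reading "one out of every N comments" figures.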

“Crisis” is an integral “gloss” in humanities discourse most often addressed, at lower value levels of “a crisis in the humanities,” under the guise of the words “problem” and “problems.” Since a “problem” is a condition, and “problems,” are conditions along paths where critical points are stationed to prevent a crisis or to resolve a crisis, the listing of the number of times “problem,” and “problems” appear alongside the three search terms in Reddit comments follows:
Using grep -o -i "\bproblem\b" Reddit-All-Humanities-2006-2018.json | wc -l and grep -o -i "\bproblems\b" Reddit-All-Humanities-2006-2018.json | wc -l
reveals that within a total of 270,784 comments that include the search term “humanities,” “problem” appears 20,060 times, and “problems” appears 10,501 times.
Using grep -o -i "\bproblem\b" Reddit-Liberal-Arts-All-2006-2018.json | wc -l and grep -o -i "\bproblems\b" Reddit-Liberal-Arts-All-2006-2018.json | wc -l
reveals that within a total of 243,476 comments that include the search term “liberal arts,” “problem” appears 14,508 times, and “problems” appears 5,980 times.
Using grep -o -i "\bproblem\b" Reddit-The-Arts-All-2006-2018.json | wc -l and grep -o -i "\bproblems\b" Reddit-The-Arts-All-2006-2018.json | wc -l
reveals that within a total of 287,335 comments that include the search term “the arts,” “problem” appears 14,036 times, and “problems” appears 6,578 times.
Out of 801,595 comments that include the three search terms, “problem” and “problems” appear 71,663 times, a rate of 0.089400508, or approximately one occurrence for every eleven comments. Whereas “crisis” occurs about once per 235 comments, the “problem” terms are about 21 times as prevalent (71,663 / 3,413 ≈ 21.0). This fact complicates the question of how prevalent crisis discourse is: the answer depends on how closely the problems discussed in the documents approach the definition of crisis problems.
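The totals and rates reported above can be rechecked with a few lines of Python. This is a sketch that simply re-adds the counts listed in the text; the dictionary and variable names are my own, and the figures are occurrence counts from grep, not counts of distinct comments.

```python
# Counts copied from the grep results reported in the text.
comments = {"humanities": 270_784, "liberal arts": 243_476, "the arts": 287_335}
crisis = {"humanities": 1_562, "liberal arts": 822, "the arts": 1_029}
problem = {"humanities": 20_060, "liberal arts": 14_508, "the arts": 14_036}
problems = {"humanities": 10_501, "liberal arts": 5_980, "the arts": 6_578}

total_comments = sum(comments.values())
total_crisis = sum(crisis.values())
total_problem_terms = sum(problem.values()) + sum(problems.values())

# Occurrences per comment for each group of terms.
crisis_rate = total_crisis / total_comments
problem_rate = total_problem_terms / total_comments

# How many times more often the "problem" terms occur than "crisis".
ratio = total_problem_terms / total_crisis

print(f"comments: {total_comments:,}")
print(f"crisis: {total_crisis:,} ({crisis_rate:.6f}, about 1 per {1 / crisis_rate:.0f})")
print(f"problem(s): {total_problem_terms:,} ({problem_rate:.6f}, about 1 per {1 / problem_rate:.0f})")
print(f"problem(s) to crisis ratio: {ratio:.2f}")
```

Dividing the totals directly gives a ratio close to 21. Dividing the rounded "one in N" figures instead (235 / 11) gives a slightly higher value, because of the rounding.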

Despite the assumption that "humanities crisis discourse" is not that prevalent, examination of the relationship between humanities discourse and crisis discourse more broadly shows that conceptually these discourses are very intimately related in a broader intellectual sense. Humanities curriculum addresses the issues surrounding the condition of being human, of which the ultimate crisis is death. "Humanities" discourse is always critical, and, in some way in conversation with "crisis," especially when art makes a statement through performance. Like science, humanities discourse clears the forest with ever higher forms of awareness, and makes all that exists more than historically possible. Thus, humanities discourse is the act that overcomes the greatest crisis in humanities: non-existence; through critical analysis, the humanities disciplines work toward the goal of superseding the human condition. Therefore, humanities discourse is always at varying value levels of “a crisis in the humanities.” The most critical point is located at the apex of a “crisis”, and critical points exist in each step along the way to resolving a crisis. Tension first builds to a crucial point and then dissipates, breaks down, or resolves in some way. In this sense then, critical thinking enters almost every document written argumentatively as well as to rhetorically convey meaning via problem resolution. The what, how, and why of college essay writing as well as published problem resolution articles inherently contain some level of crisis.

In summary of the Qualtrics survey 4c, observation of the model with the above perspective in mind suggests that "crisis” language, especially through relationships to the word(s) “problem(s),” exists inherently as a “gloss” or ghosting of the meaning of “crisis” that enters into the four reviewed topics, T93, T134, T2, and T42.

Comparison of Two Models: Model-A Default Model 2238 with Model-B Reddit Corpus-A

Hypothesis:
By comparing a public corpus with a student-focused Reddit corpus, it may be shown that the differences and similarities between the two reveal discourses that may be addressed in such a way as to increase interest in becoming a humanities major.
The default 250-topic model, referred to here as Model-A 2238, “consists of documents from all U.S. sources in the WE1S corpus (as of the beginning of July 2019) found by searching on "humanities," minus materials from Reddit” (Lindsay).

The Reddit topic comments in the Corpus-A model, referred to as Model-B, contain the search words “humanities,” “liberal arts,” “the arts,” “student,” “major,” and “college,” and are therefore limited to comments about Student Life in particular.

This analysis expects to find where the greatest tension exists in student discourse about the humanities and where the greatest tension exists in public discourse about the humanities, and further to determine how the two discourses are the same or different.

The following is assumed to be true.

If aspects of student perception about the humanities become known, then that perception might be modified via appropriate advocacy. Identification by young adults with groups is necessary for most during the period of psychological development when individual identity forms. Students are at an age where they need to identify with their major because it constitutes their developing persona. If both models show tension exists between the humanities and science fields that represents an identification issue for students, it should be looked at across all models to see how the identification issues in the context of the documents relate to each other. The results of the document analysis will establish definitive elements for advocacy.

Exploring both models together helps to answer the research question of whether or not topics reveal high relative tension (possibly reaching a crisis) between the humanities and other topics.

In Model-A 2238 T9, top MegaTopic top words are: need, problem, system, issue, policy, time, important, believe, problems, question, fact, process, change, and issues

In Model-B Reddit T191, top MegaTopic top words are: people, wrong, argument, agree, true, problem, opinion, humanities, reason, simply, understand, read, argue, issue, view, and evidence

Comparison of the two MegaTopics shows that Model-A terms relate to larger situations than the smaller more individual focus of Model-B Reddit comments. The reasoning behind this conclusion follows:

System, issues, policy, and process typically relate to large issues. The words wrong, argument, true, opinion, reason, evidence, read, understand, simply, issue, view, and problem are words that might be used in smaller individual contexts.

In Reddit comments, only “humanities” and “people” seem to be words involving larger contexts. When compared with “system,” “issues,” “policy,” and “process,” the words involving larger contexts in Model-B are “Human-related,” and the Model-A terms are “System-related.”

Although this is the case, the words “need,” “problem,” “time,” “believe,” “fact” and “change” may be typical of either system-related issues or more human-related issues.

The words found in both models’ MegaTopics are “problem” and “issue.” Depending on the degree, these two circumstances may be crisis problems or crisis issues.

This finding relates to my prior finding that depending on degree, problems may be crisis problems or minor problems.
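The overlap between the two MegaTopic word lists can be checked mechanically. The sketch below is illustrative only: it copies the top words listed above for Model-A T9 and Model-B T191 into two Python lists (the variable names are my own) and uses a set intersection to find the shared words.

```python
# Top words as listed in the text for each MegaTopic.
model_a_t9 = [
    "need", "problem", "system", "issue", "policy", "time", "important",
    "believe", "problems", "question", "fact", "process", "change", "issues",
]
model_b_t191 = [
    "people", "wrong", "argument", "agree", "true", "problem", "opinion",
    "humanities", "reason", "simply", "understand", "read", "argue", "issue",
    "view", "evidence",
]

# Words that appear in both lists.
shared = sorted(set(model_a_t9) & set(model_b_t191))

# Words unique to each list, kept in their original order.
only_a = [w for w in model_a_t9 if w not in set(model_b_t191)]
only_b = [w for w in model_b_t191 if w not in set(model_a_t9)]

print("shared:", shared)
print("only in Model-A T9:", only_a)
print("only in Model-B T191:", only_b)
```

This comparison matches exact word forms only, so “problems” and “issues” in Model-A are treated as different from “problem” and “issue.” Whether to merge singular and plural forms is a judgment call that this sketch leaves open.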

Looking at the topics and their relationship with the prominent topics for the words “problem,” “problems,” “issue,” and “issues” we have:

Five prominent topics in Model-A 2238 show “problem” as a top word.
Twelve prominent topics in Model-B Reddit show “problem” as a top word.

A cursory glance at the prominent topics in Model-A 2238 for the word problem suggests labeling of the topics as follows:
T9 the need problem
T176 the drug problem
T144 the people problem
T164 the fact argument problem
T242 the sciences problem

A cursory glance at the prominent topics in Model-B Reddit Corpus-A for the word “problem” suggests labeling of the topics as follows:
T81 solving types of a problem
T123 black lives matter problem
T191 problem arguments
T5 the general problem
T176 the people problem
T28 the education problem
T106 the understanding and learning problem
T187 the job problem
T152 the math problem
T45 the student university problem
T148 the student debt problem
T172 the humanities and jobs problem

This implies that students speak about the individual “problem” more often than published articles do. And the topics where the word “problem” is prominent show a greater focus on individual student problems in the Reddit model, whereas the default model shows more societal types of problems.

Following the same rough analysis for the word “problems,” a cursory glance at the prominent topics in Model-A 2238 for the word “problems” might suggest labeling the topics as follows:
T9 the need problems
T219 health problems
T2 student problems
T106 official problems
T242 science problems
T66 public health problems

And a cursory glance at the prominent topics in Model-B Reddit Corpus-A for the word “problems” might suggest labeling the topics as follows:
T81 solving problems
T78 mental health problems
T93 social problems
T152 math problems
T77 philosophical problems
T47 female problems
T5 general problems

Thus, despite being a smaller corpus with a smaller number of topics (200 vs. 250 in Model-A) Model-B Reddit Corpus-A tends to have a greater focus on the words problem and problems.
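Because the two models have different numbers of topics, the raw counts are easier to compare as shares. The sketch below is a rough illustration under stated assumptions: the counts for “problem” (5 and 12) are given in the text, while the counts for “problems” (6 and 7) are tallied by me from the two lists above, and the names used in the code are hypothetical.

```python
# Number of topics in each model, as stated in the text.
topic_counts = {"Model-A": 250, "Model-B": 200}

# Number of prominent topics showing each word as a top word.
# "problem" counts are stated in the text; "problems" counts are
# tallied from the labeled lists above.
prominent = {
    "problem": {"Model-A": 5, "Model-B": 12},
    "problems": {"Model-A": 6, "Model-B": 7},
}

# Share of each model's topics in which the word is prominent.
shares = {
    word: {model: n / topic_counts[model] for model, n in by_model.items()}
    for word, by_model in prominent.items()
}

for word, by_model in shares.items():
    for model, share in by_model.items():
        print(f"{word:9s} {model}: {share:.1%} of topics")
```

Expressed this way, the Reddit model gives a larger share of its topics to both words, which is consistent with the observation that the smaller model has the greater focus on “problem” and “problems.”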

While the other shared MegaTopic word “issue” appears six times in prominent topics’ top topic words in both models, “issues” appears sixteen times in Model-A and seven times in Model-B.
Published articles in Model-A 2238 speak to the following issues:
T234 symposium issues
T61 social issues
T16 student administration issues
T41 election/political issues
T9 needs issues
T5 humanities presentation issues
T73 community support issues
T94 international issues
T207 female issues
T55 presidential issues
T81 legal justice issues
T183 gender issues
T32 inaudible issues
T168 humanities publication issues
T66 health issues
T162 political issues


And, students in Model-B Reddit Corpus-A speak about the following topic issues:
T78 mental health issues
T47 women issues
T81 people responsibility issues
T5 general issues
T121 political argumentative issues
T45 academic issues
T98 varying majors’ studying issues

The larger number of published-article topics that contain the word “issues” confirms that published articles speak to “issues” (larger themes/contexts) more often than students do.
Published articles have a broad audience, while the commenters on Reddit tend to have a focused, personalized “scope” to their themes. Individual commenters seek recognition in the subreddits of Model-B, whereas a business publication and the author of an article seek recognition in Model-A. Therefore, it makes sense that Model-A should contain more topics with the top word “issues.”

Consideration of the above indicates a divergence between the two models: Model-A has the greater number of topics related to the word “issues,” and Model-B has the greater number related to the word “problem.” But why?

From the OED we have:
1. problems in problem, n. View full entry a1382
...A difficult or demanding question; (now, more usually) a matter or situation regarded as unwelcome, harmful, or wrong and needing to be overcome; a difficulty

2. problems in † problem, v. View full entry 1645
...intransitive. To dispute or discuss an academic or scholastic question.

And for the word issue we have the following:
I. The action of going, flowing, or coming out; the means by which, or place where, this occurs.
1. The action of going or flowing out; the opportunity to flow or go out; exit; release; outflow; an instance of this. Also: (a quantity of) something which flows or comes out in this way.
a. With reference to physical movement, as by water, air, people, etc.
II . . . III (many various definitions for issue)
IV. A point of contention or significance.
a. Law. The point in question or dispute in a court action at the conclusion of the statements of case by the contending parties, when one side affirms and the other denies.
issue of fact n. an issue depending on or relating to the facts of a case.
issue of law n. an issue depending on or relating to the application or interpretation of the law.
general issue: see general adj. and n. Special uses 2; special issue: see special adj., adv., and n.

The diverse definitions of “issues” alone may be the reason why “issues” appear in the default model 2238 Model-A articles more than the singular word “issue.” In other words, it may not be a fact that “issues” represents a larger quantity, but instead, it represents a broader usage among publishers of its various definitions when compared with the way that Reddit commenters use the word “issues.”

Using Topic Bubbles for Reddit Model Corpus-A
The two comparison words humanities+problem reveal bubbles T45 and T191. Topic 45 is specific to student life in the university as determined by the words “students,” “university,” “study,” “programs,” “pressures,” “issue,” “diversity,” “change,” and “support,” while T191 contains words such as “people,” “opinion,” “argument,” “wrong,” “facts,” “reason,” “agree,” “disagree,” “true,” “correct,” and “discussion.” Both topics appear to be highly coherent. T45 might be labeled “Student Life,” while T191, although a MegaTopic might be labeled “Debate Words.”
Looking at the top documents of Topic 45 in Dfr-Browser we have:

Doc 1 "The university created a Career Services Diversity Fund Committee that allocates funding to campus events related to diversity and inclusion . . . CLA hired a multicultural advisor.\n\n* School hires a bunch of administrators and deans to implement absurd policies.\n* Student fees and tuition go up\n* School talks about how expensive college is and how it \"disadvantages poor students\",\n* School begs for money from alum because \"tuition and state funding does not fully cover the costs of the 'important' programs we offer\""
Doc 2 "peaking from experience within the Faculty of Arts.\n\nThe funding problem McGill is facing now, following student strikes over 2011-2012, are complex. When the provincial government froze funding to the university, and then slashed it following a change of parties in office, the university was forced to freeze certain funds"
Doc 3 ""Waikato student who left a few semesters back. This change might seem insignificant but it[.] quite a blow to the Maori and Indigenous studies department, it[.] also a situation where the straw is breaking the camel[.] back. \n\nCurrently, the Maori and Indigenous studies department has their own management structure, culture, and support network for students that they've built from the ground up. This shift will restructure all the staff, management team, and destroy what is effectively a space specifically for Maori education on campus."

Depending on the perspective taken, each of the three top comments of Topic 45 above could be read as describing a crisis situation.

The top documents of MegaTopic T191 follow:
Doc 1 "calling names is not an \"ad hominem argument.\" If you're acting stupid and I call you stupid because your argument is stupid, I'm not basing my argument on the fact that you're stupid, I'm noting that you're stupid as a result of your stupid argument. However, since you have many times brought up things that you think are true about me personally"
Doc 2 "Condemning someone for an arbitrary reason is therefore morally wrong in every sense."
Doc 3 "Feminists dont agree on what feminism is. And unfortunately, thats why i think feminism has gotten such a negative rep among many, because the same unreasonable third wave 'feminists' are grouped in with the more reasonable moderates."
Doc 4 "See, research in the fields of Humanities is not the same as that in Science. \n\nI know. I studied in a college that is renowned for the Humanities."

The few documents examined in topics that contain both humanities and problems suggest a division between two categories. First, T45 includes kinds of crises related to the university and students, and second, T191 contains a wide range of people problems, including student problems, but the top words might be defined as a debate vocabulary that is associated with the words humanities and problems.


The words in topic bubbles T45 are: “students,” “universities,” “issues,” “funding,” “resources,” “problem,” “hiring,” “diversity,” “issue,” “culture,” “pressure,” “enrollment,” “sciences,” “humanities” . . .

And in T191 the words are: “people,” “wrong,” “argument,” “true,” “facts,” “question,” “makes clear,” “opinion,” “issue,” “evidence,” “reality,” “disagree,” “position,” “debate,” “absolutely” . . .

From the above observation economic concerns such as “funding,” “resources,” and “hiring” might be considered sources of economic crises. Students have a lot of crisis-related issues, and some words in T45 such as “humanities” and “issue” that are present in the debate words of T191 may indicate a crisis.

A diachronic perspective into crisis related words.
Of note, the Google Ngram Viewer shows the word “problem” beginning to decline in 1980, and the word “issue” declining from 1998 through 2008.


But, Topic 9 in the Default Model 2238, and Topic 191 in the Reddit Corpus-A show a relative increase. T9 begins to increase in 2004, and T191 begins to increase in 2014.


From these conditional probability charts and the Google Ngram Viewer charts, the question arises as to why the conditional probability of T9 and its top words is increasing in published articles while the same words are declining in books. Part of the reason for the upward trend of both topics T9 and T191 is that the contexts that give the words their meanings have changed. Although it is beyond the scope of this report to determine the reason for the decline in the words “problem” and “issue” in books, the above graphs seem worthy of note, since most people today would conclude that both “problems” and “issues” are on the rise. Thus, the rise of “problems” and “issues” shown in the Default Model 2238 and the Reddit Model Corpus-A agrees with the general perception that that is indeed the case.
We could say then that as the perception of crisis related topics T9 and T191 increases, talk of a crisis in the humanities is not “outside” the context of a general increase in the perception of crisis. In other words, it is less difficult to talk of a crisis in the humanities today than it would have been a number of years back since the prevalence of the debate words, of topics T9 and T191, which are on the rise, may easily be called into conversation with the word “crisis.”
Speaking of “crisis” in general and “a crisis in the humanities” in particular may be more acceptable, and therefore more welcome to the ears of listeners, today as anxiety rises. Physiologically, an anxious condition has an emotional connection to debate words whenever an individual’s psychology reaches a state of crisis from being torn between “pressures” (a T191 word) and “stresses.” Unlike words such as “problem,” “crisis,” “pressure,” and “stress,” the word “anxiety,” which may lead to individual crises, is on the rise in the Google Ngram Viewer.


Further analysis of the “anxious condition” may be correlated to an emotional response that carries meaning in connection with the debate words of T9 and the word “pressure” of T191. The rise of stress that leads to crises may be confirmed by finding evidential psychiatric research studies of individuals over time as well as proved biologically by locating evidential research studies of the levels of cortisol in the general population over time.
If one speaks about crisis today, then the top topic words, the “debate words,” of T9 have more referents supplying meaning, and whenever the word “crisis” is used today, as in “a crisis in the humanities,” it calls into focus the meanings of the debate words of T9 and the words of T191. The perception of “crisis” is at some level understood when the phrase “a crisis in the humanities” is read, even though the reader hasn’t a notion of what it means to “a crisis in the humanities” researcher. And, further, “a crisis in the humanities” researcher never fully understands what the phrase means because the phrase always invokes a flux of understandings justified according to the needs of various perspectives taken. This implies that perspectives into “a crisis in the humanities” may effectively be rhetorical according to knowledge gained through ongoing analyses. And, further, the more that the phrase “a crisis in the humanities” appears in publication, the greater the audience it speaks to.

Comparing the two models’ top topics that contain the word “humanities,” we find that both models’ top topics may be labeled Humanities and Sciences.
This is a very noteworthy finding in that if there is a crisis that arises out of the tension between the humanities and the sciences then a level of tension reveals itself as a common theme whether the corpus is comprised of individual student comments, or comprised of public documents. The evidence for the above observation follows:

Model-A 2238 Topic 25 (a topic that could be labeled Humanities and Sciences)
Doc 1 “there is no guarantee that an engineering or tech diploma means you are set for life.” (“Did You Choose the Right Major?,” Mic, 2014-03-24)
Doc 2 “College graduates who received engineering degrees this year were offered salaries averaging about $25,000, again getting the highest offers of any group of graduates” (“Degrees in Engineering Bring Top Offers for ’81 Graduates,” The New York Times, August 04, 1981)
Doc 3 “college graduates with bachelor's degrees in the arts, humanities, and architecture experienced significantly higher rates of joblessness” (“It’s a matter of degrees - some are less useful.” The Virginian-Pilot, January 07, 2012)
Doc 4 “A new study supports the value of a liberal arts education over time, countering some of the claims that science, math, engineering, and technology are the most lucrative career paths” (“Study Highlights Value Of Liberal Arts Major,” Education Week, January 29, 2014)
Doc 5 “That is why so many employers who want to hire thinkers and problem-solvers look no further than humanities students.” (“College teaches one to think,” The Washington Post, September 03, 2014)

Model-B Reddit Topic 105 Stem vs. Non-Stem
Doc 1 “never EVER think that a liberal_arts major is \"inferior\" because they don't do math until 3 am.”
Doc 2 “Physics student, so can chime in here. STEM students feel that their major is harder and more rigorous than those who major in the liberal_arts, particularly since STEM fields are very math heavy.”
Doc 3 “It[.] because of the stigma in society, because the way such subjects are advertised in society -- they make it seem like math and science are difficult subjects that take real skill to do, and anything else can be done by everyone.”
Doc 4 "Well here[.] the thing. It[.] harder to FAIL at humanities, social_sciences, etc. So it[.] easier to coast, and the students who don't want to do much work are going to gravitate to those fields.”
Doc 5 “Its the ridiculous perception that these people are somehow *better* than non-STEM majors, and that their ability to use \"logic\" makes them superior to \"emotional\" women.”

The comparison of these topics reveals two different catalysts of tension between the humanities and science majors. In Model-A “humanities” T25, the concern is the economic viability of humanities majors vs. sciences majors. In Model-B “humanities” T105, the concern is the difficulty of the humanities majors’ curriculum when compared to the STEM majors’ curriculum and how the issue of stereotype plays a part in the way that students feel about their identity; the comments show that the students are concerned about how they are perceived and about the economic viability of their major.

So, the comparison implies that published articles’ biggest concern about the humanities and the sciences is the economic viability of the majors, and the Reddit commenters’ biggest concern is how students perceive themselves in relationship to their major. Therefore “if” a crisis exists in the humanities then it likely exists whenever the tension rises between the two “fields” of disciplines either in economic terms or when the tension rises due to negative stereotyping of student identity. Additionally, external pressures and stresses contribute to a general sense of a state of crisis, and may contribute in one way or another to a student’s choice of major (e.g., “I’m so stressed out that I’m just going to go along with my parents.”).

The correspondences between topics and keywords were surprising in the above finding that two topics of two different models confirm the highest tension in the models exists between the humanities and sciences; the distribution of the theme (the humanities/sciences debate) is homogeneous across models. The disparity between the two models is the different audiences that the documents address. The Reddit commenters address others within a topic thread of comments in a subreddit, and the published articles address specific public audiences related to their papers’ distribution.

Conclusions from the comparison between the two models follow:
The highest tension that might be termed a crisis is related to student identification with a particular major. And, the greatest external pressure on how well or poorly a student feels about their major is the economic viability of their major. Further, since the crisis is located in the discourse between the humanities and the sciences, in both published articles and within Reddit comments, I believe the same will be found in a comparison between all media forms. In other words, it will be possible to label a topic “Humanities and Sciences” in all of the existing topic models. This is due to the homogeneous distribution of the “Humanities and Sciences” theme across all spectrums of corpora that contain the search term “humanities.” And, further, this assumption implies that the humanities define the sciences and vice versa. Neither can exist without the other and therefore a crisis in the humanities, at some level, always was and always will be a crisis between the humanities and the sciences.
