Friday, December 14, 2018

Digital Humanities Final Paper 2018

13th Dec 2018
Reddit
By Ray Steding

The 525DH WE1S team began by aligning itself with the project goals of the UCSB Students and the Humanities team as set out by Abigail Droge. Her "Team Overview" states: "The goal of this team is to understand the interactions between US college students and the humanities. We will approach the question from multiple perspectives, including those of educators, students, and institutions." The focus of the project is to contextualize the discourse that students have about the humanities, though Abigail's project description also covers methodology and hypotheses. With her research goals understood, and through communications between the UCSB Students and the Humanities team and the 525DH WE1S team over the course of the semester, our team set out to gather meaningful data to supplement and to model.

Joyce Brummet became our WE1S project leader during one of our in-class meetings, in which we decided to separate into two data-inquiry groups: the Twitter team and the Reddit team. Since I have experience with gathering data, I assisted when called on, and I focused on how to obtain, process, and interpret the Reddit data.

Initially, Internet searches turned up a large Reddit file, but it turned out not to contain the Reddit comments of interest. Continuing down a Google thread, I found that Google offers a product known as Google BigQuery and gives anyone $300 of free credit to begin running searches against the datasets it hosts. Through BigQuery I obtained all of the available, and for my purposes usable, Reddit comments from all of the community colleges, the California State colleges, the UC system colleges, and the Ivy League colleges. Additionally, I downloaded comments from the askReddit, education, explainlikeimfive, politics, soccer, applyingtocollege, and LeagueOfLegends subReddits.

With that data available to me, I began to write the software to convert it into JSON files that would be accepted by the UCSB "harbor" server notebooks. Finally, I produced the Reddit topic models. To recap, this is the process that I used:
  • Query Google's database and download the results as source files in JSON format.
  • Grep the source files for the search terms humanities, liberal arts, the arts, and STEM.
  • Take the resultant files and process them through Python code so that they conform to the WE1S Workflow Management System.
  • Upload the files to a harbor server data directory as zipped files.
  • Run the Jupyter notebooks on the server to process the files into topic models aligned with the goals of the project.
The Google BigQuery SQL command that I used follows:
  SELECT author, body
  FROM `fh-bigquery.reddit_comments.2018*`
  WHERE subreddit = 'UCLA'
  AND REGEXP_CONTAINS(body, r'(?i)\bhumanities\b|\bliberal arts\b|\bthe arts\b')
  LIMIT 16000
The resulting JSON files from queries such as the one above were lists of all the comments in the acquired subReddits that contained the search terms humanities, liberal arts, the arts, and STEM.
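As a minimal illustration of that filtering step, here is a Python equivalent of the grep pass (the file names are hypothetical; the real pass ran grep over the downloaded source files):

import re

# Case-insensitive pattern for the four search terms
TERMS = re.compile(r'\bhumanities\b|\bliberal arts\b|\bthe arts\b|\bstem\b', re.IGNORECASE)

# One JSON object (one comment) per line in, matching lines out
with open('reddit_comments.json', encoding='UTF-8') as src, \
        open('matching_comments.json', 'w', encoding='UTF-8') as out:
    for line in src:
        if TERMS.search(line):
            out.write(line)

Note that a case-insensitive match for stem also catches comments about stem cells, which is why the processing code at the end of this post filters those out separately.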

To get all of the BigQuery files into a format that the UCSB Jupyter notebooks accepted, I wrote Python code as separate cells in Jupyter notebooks that I ran locally. The notebooks I created are located at the end of this post.

After processing the files into their acceptable WE1S format, I zipped them and uploaded them into the /data/data-scraped/CSUN-Reddit-Data subfolder on the UCSB server here. I often altered the code depending on what needed to be accomplished. For instance, I added the TextBlob sentiment values to the bibliography of the DFR Browser by including the value in the schema title field.
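As a minimal sketch of that alteration, assuming a document dictionary shaped like the WE1S JSON files produced by the code at the end of this post (the field names follow that code; the sample document is invented):

from textblob import TextBlob

def title_with_sentiment(title, text):
    # Prefix a title with the TextBlob polarity of the text so that the
    # score shows up in the DFR Browser bibliography
    polarity = round(TextBlob(text).sentiment.polarity, 3)
    return 'sentiment_' + str(polarity) + '_' + title

doc = {'title': 'comment_42.json', 'content': 'i love the humanities'}
doc['title'] = title_with_sentiment(doc['title'], doc['content'])
print(doc['title'])   # sentiment_0.5_comment_42.json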

All of the topic models produced are in a Google spreadsheet here.

The topic models within the spreadsheet have preliminary notes with a brief description of what I hoped to accomplish by producing each model. All of the models presented different themes (topics) that ran through the corpora included in the file list of Jupyter notebook 1, import topic model data. I always looked at the first five topics in each model and read through the first five comments in each of those topics. I also looked at the topics that had the highest representation within the model.

While I was making the first of the topic models with the BigQuery subReddit data, a torrent of all the 2017 Reddit comments, which I had requested about a month earlier, finished downloading onto my computer. The new 2017 data, some 122 GB of comments in three JSON-list-formatted files, presented the opportunity to model an entire year's worth of Reddit comments: I could analyze what every 2017 Reddit commenter says about the humanities. So I did, but first I again had to get the data from the source files into an acceptable format for the UCSB Jupyter notebooks. The two programs that I wrote were slight adaptations of the first code.
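The conversion code itself appears at the end of this post; the sketch below only illustrates the streaming approach that files of this size require, reading one comment per line instead of loading 122 GB into memory (the path, the search term, and treating the dumps as one JSON object per line are assumptions):

import json

def matching_comments(path, term='humanities'):
    # Stream a JSON-lines Reddit dump one comment at a time, yielding the
    # subreddit and body of each comment that mentions the search term
    with open(path, encoding='UTF-8') as f:
        for line in f:
            comment = json.loads(line)
            body = comment.get('body', '')
            if term in body.lower():
                yield comment.get('subreddit', ''), body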

The code of the last two programs shows improvement over the first two processing programs; the two pairs may be compared at the end of this post. The longer I used Google searches and Stack Overflow, the easier it became to get the code to do what I wanted and needed it to do. Working with files as lists of elements reminded me of working with arrays in other languages, but it is not an intuitive process, and many of the examples on Stack Overflow and other sites did not include a solution to what I was looking for.

After clearing up the coding issues, I had to deal with a pesky problem in the UCSB notebooks. I sometimes received the following error:

Error in load_from_mallet_state(mallet_state_file = paste0(modelDir, "/", \: hyperparameter notation missing. Is this really a file produced by mallet train-topics --output-state?

I still don't know whether the problem lies in one of the many thousands of source files or in cell 7, topics, of notebook 4, make topic model. Sometimes the files would go through, and other times the same files would not. Sometimes creating a whole new project worked, and other times it did not. I always ran the three Jupyter notebooks in sequence, but that didn't help.
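One check that might have narrowed the problem down: a state file written by mallet train-topics --output-state is a gzipped text file whose header should include #alpha and #beta hyperparameter lines, which is what the loader complains about. A minimal sketch, assuming that header format (the path is illustrative):

import gzip

def has_hyperparameters(state_path, header_lines=5):
    # Peek at the header of a MALLET --output-state file and report whether
    # the #alpha and #beta hyperparameter lines are present
    with gzip.open(state_path, 'rt', encoding='UTF-8') as f:
        header = [f.readline() for _ in range(header_lines)]
    return (any(l.startswith('#alpha') for l in header)
            and any(l.startswith('#beta') for l in header))

print(has_hyperparameters('topic-state.gz'))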

Despite the difficulties, the research models gave me insight into several topic-modeling questions, such as: does a balanced model, built from equal file sizes drawn from different subReddits, result in a more homogeneous mix of files within a topic? And does representativeness become more apparent when a corpus balances both file size and distribution across subReddits? The answer to both questions is yes. Another question that arose from looking at the topic models was why comments from specific subReddits align as the majority of files under specific topics; the files representing a topic often all came from one subReddit. This tendency became greatly exaggerated in the models whose corpora mixed University Wire articles and subReddit comments: files tended to constellate according to whether they were University Wire or subReddit documents, even when their file sizes were equal. In any event, it also became apparent that although articles matching different search terms may be mixed within a topic model, different genres of corpora should not be. Thus Twitter, for the most part, should be treated as its own genre and not mixed with University Wire or subReddits; genres should be analyzed independently of each other.

Persevering past the point of doubt

Although I'm still in the early stages of working with the models, they have given me first-hand experience with the terrain of discourse used by different sectors of commenters. My interest is in the discourse of two opposing groups of students: humanities students and STEM students. First, I want to know their reasons for choosing their majors, and, for postgraduate students, how they feel about the results as they seek employment. Second, I want the sentiment and subjectivity values to help analyze the emotional tone of student discourse. A noticeable type of discourse seems almost unique to each subReddit, which may be part of the reason specific subReddits align under specific topics; moderators also keep the commenters in line. Since I'm most interested in why students believe few job opportunities exist for humanities majors, I became excited when I found a topic consisting mostly of comments about a crisis in STEM caused by a surplus of STEM-field workers.

This led me to the last two models created this semester. Topic model M and topic model N have corpora built from comments containing the humanities search terms together with the word crisis, and the search term STEM together with the word crisis, respectively. What becomes immediately apparent when looking at these models is that the comments are unusually long; it seems the word crisis comes with an earnest explanation. More precisely, two dissimilar words are less likely to co-occur in smaller documents than they are within a larger context. I've read many of the crisis comments but am not through with them, so I don't have any conclusions at this point.

When I began work on the WhatEvery1Says project during Summer Camp, I thought about what I would like to topic model. What first came to mind was the crises that lead one to make difficult decisions in life such that one becomes more than what one is at the time: those moments of indecision where there are no good answers except to proceed in one doubtful way or another. On both an individual and a collective level, crisis leads to something greater. Since I'm a runner, I also liked the idea of somehow modeling the motivation that pushes one past the point of absolute doubt on to championship in whatever field or discipline. So to me, the question of a crisis in the humanities defines the conditions representative of the barrier of doubt that must be overcome. What is present in the mind of a researcher, engineer, scholar, or professional of any field that gives them the perseverance to push past doubt about their hypothesis? What is the discourse of students as they push past their parents' ideals? I presume, in the context of students, that the barrier I've been reading about is the belief that they will have no future should they become humanities majors. As one commenter bluntly states: "here's my story I graduated college almost years ago with a useless degree in humanities it took me more than a year to find a full-time job that paid less than dollar k prior to finding a full-time job." (The missing numbers are an artifact of the cleaning code, which strips digits.) But maybe that condition is a subset of the larger meaning of crisis: the crisis of being human; the human condition; the human condition that perseverance changes.

Researching two different forms of a crisis in the humanities

This is my personal narrative from a few days ago that relates to choosing what to research.

I woke up from a dream about how to add more material to my topic model analysis. I remembered from the dream a looping of information from the liberal arts data files back into the topic model source material, and I watched as each iteration changed the model. It must have come from what I was reading on my screen not long before I went to bed. To be clear, before going to bed that night I had talked with my roommate about a personal interest of mine: topic modeling research that might include analyzing the way people face the difficult choices in life, the choices that cause them to become more than who they are, the fulfillment of their potentialities. On the screen when I woke up, by coincidence, was the following Reddit comment, which is a short story:

i ignored the doorbell as it rang for the seventeenth time in the past four minutes i had work to do the study of neuroparasitology was not to be halted for girl scout
cookies i used amazon like a normal person besides i was pretty sure i was close to discovering a creature at least partially responsible for aberrant behavior in dogs
then i heard my door open the telephone rang again but i left it i hadn`t slept in hours and whoever kept calling could wait until my work was complete a moment later
there was a cough from behind me i waved my hand absently it was probably gary with some insignificant question about how a section of the brain functioned during
mating and how that would affect a mite hidden nearby honestly i had no idea how he got his phd i finished making my notes and turned around i had finished my pre
lecture sigh during the chair spin and was preparing the usual diatribe when i noticed three large men in suits were there instead of my idiot colleague startled but not
unduly alarmed i wondered if they were investors sir youre going to have to come with us the higher ups want to talk to you i was now quite concerned that the private
company funding my important research was going to pull funds or had made a deal with some mafia types ridiculous but not impossible i frowned at him but nodded
assent as i went to put on some pants the benefits of working from home meant i usually didnt have to worry about such stupid social delicacies as pants but id make
an exception to keep my job i followed the bulky men noticing guns and an alertness that spoke to their quality i noticed that my door was still in tact as we left the
building and that there was a couple of guards waiting outside who locked up after me i was led from my small house to one of four large black suvs on the street i
spent the next thirty minutes thinking about where my quarry might be hiding it clearly wasnt a stem parasite but it affected behaviors influenced by both lobes i had
been confident where id been looking but it hadnt been there i made some notes on my lab coat notebook eventually i arrived at a large building it was a government
building of some kind it wasnt clear what it was for but it had all the look of petty bureaucracy it looked like my afternoon was going to be wasted on paperwork and
financial inquiries i was lead to an office on the top floor not a good sign and everyone in the building seemed to be straining to get a look at me i was glad id
remembered to put on pants sometimes i forgot and everyone got in a tizzy i was motioned into a room and saw a large table and an empty chair close to me on the
other side was a line of people seated comfortably with an extensive spread of documents in front of them i sat down and looked at the documents waiting for them to
finish their arbitrary introductions this didnt look to be about neuroparasitology at all i caught the last couple of names senators of some such we brought you here
because after an extensive and exhaustive search we have uncovered that you alphonse anderson are the most intelligent person on the planet we have arranged for
you solve all the big questions posed to our species about time you idiots have been bumbling about these questions for years all the infrastructure is there and no
one gets how to use it its all the bloody paperwork a man from the corner stepped out and placed stacks of folders down in front of me each listed with a major crisis
each of these folders tells you the resources at your disposal and the problem that needs to be solved she droned on some more but i wasnt listening i scanned the
first folder homelessness and tossed it towards them put the homeless in a warehouse and have them work on assembling care packages to send abroad i continued
food is easy its all there it just has to rounded up and distributed instead of being left to rot reduce the best before dates and ship them to impoverished nations for
single day items like hamburgers or whatever give them to local food banks and shelters after closing i went through the files easily putting matters to rest after i was
halfway through the stack i came to one that gave me some trouble the meaning of the universe i asked the stunned board in front of me well yes one of them
stammered back at me no scientific solution ask some philosopher about your drivel but its the most important question there one of the women shouted she was
minister of something or another probably something in the arts with an attitude like that all the people ive saved will have plenty of time to ponder that once they stop
dying to make a determination on that id need more information than humanity has at its disposal if you want my personal opinion i would say there is no purpose to
the universe no divine entity and no driving unseen force its a story of star dust biology and luck no sooner had i finished my sentence than everyone in the room
began to scream i sat back perplexed as they wailed with a kind of agony that i hadnt even imagined i could see fires start in the building across the street cars
suddenly swerve into each other in the roads below id seen the microphone from the start of course but i hadnt been aware that the world was listening live over the
next few weeks death tolls mounted my policies to reduce the suffering of mankind more for efficiencys sake than sympathy had ended suddenly when everyone
despaired in the face of a cruel reality it was only a matter of time until everything broke down still i was pretty happy with the results id been allowed to go back to my
research after all.
The story has elements of looping within it: the microphone left on broadcasts out to the masses, and the protagonist, likely infected by the creepy parasites that control their hosts, has himself become the subject of the research. The story is still open in another tab in my browser; it appeared within the "The Arts" search-term results from grep searching the 2017 Reddit files. It is an exhibit, so to speak, a real-life example, of the kind of research that I was talking with my roommate about. I did not read the whole story closely until I awoke. It's an example of the "looping" of material back into the topic model results. In my case, it combined with my discussion with my roommate, formed into a dream, and resulted in a realization, the realization of what I had hoped to find with topic modeling at the beginning of Summer Camp. It has a fuzzy logic about it; it's kind of spooky. The reason is that the last line seems to explain what I'd been telling my roommate, what I'm interested in researching: "suddenly when everyone despaired in the face of a cruel reality it was only a matter of time until everything broke down still i was pretty happy with the results id been allowed to go back to my research" (anonymous, 2017, Reddit comment): the reality that breaks down and leads one to persevere and overcome is what I'm happy to research.

In conclusion, what a crisis in the humanities means to me, as opposed to what it means to the WE1S project, materialized as a legitimate path of research distinct from the project goals of WE1S, yet not so different in topic modeling procedure. My path forward, like that of the protagonist in the story above who says "id been allowed to go back to my research," in one way symbolizes a resolution of a crisis in the humanities. The fictional story represents the efforts of a struggling author confronting and overcoming a personal crisis in the humanities: on the way to becoming more than a fantasy about potentialities, the author is writing it out and affecting the world with words.



The following two Python cells process the Google BigQuery JSON files.

2018-11-21-JSON-Metadata-searchterm-sentiment is located on the 525 DH WE1S Google Drive here.


'''2018-11-21-JSON-Metadata-searchterm-sentiment.ipynb  Author raysteding@gmail.com
This file takes the .json Reddit list files from Google BigQuery and cleans them. Many things, such as
the exclusion of the comments with STEM in them, could have been handled better, but it was a fun
learning experience.
'''
import re
import os
import glob
import codecs

# Set the path to the .json list files that you wish to process into .txt files.
# The code assumes that only the .json source files exist in the source directory,
# although subdirectories are OK.
path = r'C:\Users\rayst\Documents\525-DH\software\dev\Lexos-cleaned\selected_documents'

# Set the counters to zero
count = 0   # comments written to the current output file
cnt = 0     # source files processed so far
cnt2 = 0    # lines read from the current source file

# Characters to blank out of every comment: punctuation, brackets, and digits
JUNK = '|"(){}[]*&/,.:;_?!-0123456789'
JUNK_TABLE = str.maketrans(JUNK, ' ' * len(JUNK))

# Process every .json source file from Google BigQuery in the source directory
for filename in glob.glob(os.path.join(path, '*.json')):
    fp = codecs.open(filename, 'r', 'UTF-8')
    filepath2 = filename.replace('.json', '-') + str(cnt) + '.txt'
    outfile = open(filepath2, 'w', encoding='UTF-8')
    cnt += 1
    cnt2 = 0
    for line in fp:
        cnt2 += 1
        # Omit the comments about stem cells, because they most likely refer to
        # biology rather than to STEM vs. the humanities
        if 'stem cell' in line or 'Stem cell' in line:
            continue
        # Each line is an entire comment; drop the author field and keep the body
        first, second = line.split('"body":', 1)
        line = second
        # Clean the line. Here you can add anything that you wish to get rid of or alter.
        line = line.replace('$', 'dollar ')
        line = line.replace('%', 'percent ')
        line = line.replace('\\n', ' ')          # literal \n sequences inside the JSON strings
        line = line.replace('\\', ' ')
        line = line.replace("'", '')
        line = line.translate(JUNK_TABLE)
        line = line.replace(' n n', ' ').replace(' n ', ' ')
        line = line.replace('deleted', '').replace('removed', '')
        line = line.replace('ufffd', '')         # mis-encoded replacement characters
        line = re.sub(r'\s+', ' ', line)
        # Change all the text to lower case; the newline keeps one comment per line
        # so that the search terms can be grepped out to separate files later
        outfile.write(line.lower() + '\n')
        count += 1
        # Roll over to a new output file after every 50 comments
        if count == 50:
            outfile.close()
            filename3 = filename.replace('.json', '-') + str(cnt2) + '.txt'
            outfile = open(filename3, 'w', encoding='UTF-8')
            count = 0
            cnt2 += 1
    outfile.close()
    fp.close()
# It might also be a good idea to convert HTML entities such as "&amp;" to "&" and "&gt;" to ">".

'''This file takes the output .txt files from the above program and separates the comments out into
individual .json files, with one of the four key search terms as the condition for being written.
The output file name pattern is: search term, sentiment value, counter, source file name.
'''
import json
import os
import glob
import codecs
from textblob import TextBlob

# Set the path to the .txt files produced by the cell above
path = r'C:\Users\rayst\Documents\525-DH\software\dev\Lexos-cleaned\selected_documents'

# One output-file counter per search term
term_counts = {'humanities': 0, 'liberal arts': 0, 'the arts': 0, 'stem': 0}

# Process every .txt file in the source directory
for filename in glob.glob(os.path.join(path, '*.txt')):
    fp = codecs.open(filename, 'r', 'UTF-8')
    basename = os.path.basename(filename)
    # The second dash-separated token of the file name is the date
    date = basename.split('-')[1]
    for line in fp:
        # A comment that contains more than one search term is written once per term
        for term in term_counts:
            if term not in line:
                continue
            term_counts[term] += 1
            # TextBlob sentiment polarity for the comment, formatted to three decimals;
            # '-' becomes 'neg-' so that the value survives in a file name
            sent = ('%.3f' % TextBlob(line).sentiment.polarity).replace('-', 'neg-')
            document = (term.replace(' ', '_') + '_' + sent + '_' + str(term_counts[term])
                        + '_' + basename.replace('.txt', '.json'))
            newfname = os.path.join(path, document)
            print(newfname)
            # Write the comment out with the metadata required by the WE1S schema
            record = {
                'doc_id': document,
                'attachment_id': 'none',
                'pub': 'Reddit',
                'pub_date': date,
                'length': '1500',
                'title': document,
                'name': document,
                'url': 'Google BigQuery',
                'namespace': 'we1sv2.0',
                'metapath': 'Corpus,' + basename + ',RawData',
                'content': line.splitlines()[0],
            }
            with open(newfname, 'w', encoding='UTF-8') as outfile:
                outfile.write(json.dumps(record))
    fp.close()
The following two Python cells process the three 2017 Reddit torrent files.

2017-Reddit-STEM-File-Processing-for-UCSB-Harbor is located on the 525DH WE1S Google Drive here.


'''2017-Reddit-STEM-File-Processing-for-UCSB-Harbor.ipynb  Author raysteding@gmail.com
This file adds a comma to the end of each line in the files grep'ped from the 40GB source files for the
search term. For simplicity I add the commas to all lines and then hand-edit the last comma out and add
the opening and closing brackets at the beginning and the end of the file to make a json-loadable file
for further processing in the cell below, which takes the json file and exports the Reddit comments.
'''
import re

path = r'C:\Users\rayst\Documents\525-DH\software\dev\Lexos-cleaned\Humanities-Crisis\the_arts3-crisis.json'
outfilepath = r'C:\Users\rayst\Documents\525-DH\software\dev\Lexos-cleaned\Humanities-Crisis\the_arts3-crisis-edited.json'

outfile = open(outfilepath, 'w', encoding='UTF-8')
with open(path, 'r', encoding='UTF-8') as f:
    for line in f:
        # Turn each object-closing '}' into '},' so the lines can become one JSON list
        line = line.replace('}', '},')
        # Strip links from comments
        line = re.sub(r'^https?:\/\/.*[\r\n]*', '', line, flags=re.MULTILINE)
        print(line)
        outfile.write(line)
outfile.close()

'''This file processes the individual search terms from the complete 2017 Reddit data into .json files.
It requires that the source file from above is in proper json format. It then opens that file and captures
the "body" and the subReddit that each comment comes from, as well as the file name. It writes each comment
out, along with the sentiment and subjectivity values produced by textblob, into a properly formatted json
file that includes the proper metadata according to the WE1S schema.'''
import json
import os
import re
from textblob import TextBlob
'''textblob is a library that returns sentiment and subjectivity values. It has to be added to your system.
The only thing that needs to be edited in this cell is the path below, which points to the file produced
in the cell above.
'''
path = r'C:\Users\rayst\Documents\525-DH\software\dev\Lexos-cleaned\Humanities-Crisis\the_arts3-crisis-edited.json'
file_cnt = 0

with open(path, 'r') as f:
    # The edited file is one JSON list with one object per comment
    doc = json.loads(f.read())
    for comment in doc:
        # Capture the subReddit and the comment body
        subreddit = comment['subreddit']
        line = comment['body']
        # Clean the comment text (the same kind of cleaning as in the BigQuery cells)
        line = line.replace('deleted', '') ; line = line.replace('removed', '')
        line = line.replace('ufffd', '') ; line = re.sub(r'\s+', ' ', line)
        # Change all the text to lower case
        line = line.lower()
        line = line.rstrip('\r\n')
        content = line

        '''Calculate the average sentiment polarity across the comment's sentences'''
        blob = TextBlob(content)
        sent_total = 0.000
        sent_cnt = 0
        for sentence in blob.sentences:
            sent_total += round(sentence.sentiment.polarity, 3)
            sent_cnt += 1
        SP = round(sent_total / max(sent_cnt, 1), 3)

        '''Calculate the average subjectivity value across the comment's sentences'''
        sent_total = 0.000
        sent_cnt = 0
        for sentence in blob.sentences:
            sent_total += round(sentence.sentiment.subjectivity, 3)
            sent_cnt += 1
        SS = round(sent_total / max(sent_cnt, 1), 3)

        '''This is the beginning of assigning the WE1S schema variables'''
        # Create an output file name from the source file name
        filename2 = f.name.replace('-edited', '').replace('.json', '')
        filename2 = filename2 + '_' + str(file_cnt) + '.json'
        file_title = os.path.basename(filename2)
        date = '2017'
        # Capture the word length as a string
        word_lengths = str(len(content.split()))
        # A single title with the subReddit, sentiment, subjectivity, word length, and file title
        document = (str(subreddit) + '_sentiment_' + str(SP) + '_subjectivity_' + str(SS)
                    + '_words=' + word_lengths + '_' + file_title)
        # Put all the data required by the WE1S schema into a dict for the output file
        record = {
            'doc_id': file_title,
            'attachment_id': 'none',
            'pub': 'subReddits',
            'pub_date': date,
            'length': word_lengths,
            'title': document,
            'content': content,
            'name': document,
            'namespace': 'we1sv2.0',
            'metapath': 'Corpus,' + file_title + ',RawData',
        }
        print(filename2)
        outfile = open(filename2, 'w', encoding='UTF-8')
        outfile.write(json.dumps(record, indent=1))
        file_cnt += 1
        outfile.close()

Saturday, December 8, 2018

What is Value?

On The Issue of Machine Value within a Marxist Theory of Capitalism
In "Why Machines Cannot Create Value; or, Marxist Theory of Machines," from the edited collection Cutting Edge (1997), C. George Caffentzis claims that "futurological assumptions and political dystopias turned out to be radically wrong in their common assumptions" (30), that is, that neither a future utopia nor a riotous lower class ever develops. He bases his claim on the premise that "machines cannot create value." Defining the word value according to its specific meaning in Marxist theory precedes the question of what happens when the surplus value created by computing becomes a dominant force within a capitalist society. I contend that capitalism adapts to the unrealized surplus values created by machines (software as well as hardware) in a coexistent manner within a robot economy until capitalism ceases to be a dominant force.
Caffentzis frames his "article [a]s a reanalysis and defense of Marx's original claim that machines cannot produce value" (31), "where value is human labor that is performed under capitalist production relations, that transforms use-values, and that must realize itself through exchange as money" (Carchedi 74). Caffentzis states that "[c]omputing . . . is just another aspect of human labor power [a value] that can be exploited to create surplus value" (53) and goes on to say that "the Marxist reason why machines cannot create value [is] because they are values already" (54). In a similar manner, Carchedi claims that implicit in Marxist theory is the notion that "only labor power can produce value" (73). It is the use-values of unrealized labor power in machines, such as open source software, DIY projects, unprofitable YouTube videos, Steemit.com (an open source platform similar to Facebook), countless other freely exchangeable nonprofit informational systems, and even commodities like Bitcoin (open source software) that are designated as money in some jurisdictions, that imply something other than a capitalist future.
Take for example the software that UCSB and CSUN students and professors write for the WhatEvery1Says project. Funded by the Andrew W. Mellon Foundation, WE1S requests that the software be made available for download on GitHub. The software must be licensed under the Creative Commons Attribution-ShareAlike 4.0 International Public License, which states:
1. Subject to the terms and conditions of this Public License,
the Licensor hereby grants You a worldwide, royalty-free,
. . . irrevocable license to: . . .
       a. reproduce and Share the Licensed Material, in whole or
          in part; and
       b. produce, reproduce, and Share Adapted Material . . .
The WE1S software posted to GitHub (WE1S) by members of the WE1S project, as well as software licensed by others under similar licenses such as the MIT License (The MIT) and the GNU Public License (gnu.org), all operate outside the Marxist idea of value. Take for another example all the machine software that our hyper-historical world (Floridi 3) runs on: the open source software that runs the Internet, computer and device operating systems, AI systems, and so on. To a great extent our world runs on open source software.
The informational society we live in could not have been anticipated by Marx, and neither is it included in his theory. According to Floridi, "by fostering the development of ICTs, the state ends by undermining its own future as the only, or even the main, information agent" (172). And further, "the more physical goods and even money become information-dependent, the more the informational power exercised by multi-agent systems acquire a significant financial aspect" (176). The royalty-free portion of our informational infrastructure has only use-value, and if it were to cease to exist, so too would the ecosystem of our human society, which is now dependent on it.
Michael Tiemann, former Red Hat CTO and co-founder of Cygnus Solutions, the first company to provide commercial support for open source, writes about how Cygnus, Red Hat, and other open source support companies function within the economic system (Tiemann). The article quotes Nobel laureate Ronald Coase: "institutional arrangements determine to a large extent what is produced, [so,] what we have is a very incomplete [mainstream economic] theory" (Tiemann). The economic theory that Coase refers to is one in which an economy can be coordinated by a system of prices alone; Tiemann shows that this is not so through his creation of an open source software service company. Incidentally, Red Hat purchased Cygnus for $674 million in a November 1999 transaction (Red Hat), and Red Hat was purchased by IBM for $34 billion this month, which implies even more demand for open source software solutions and maintenance as open source integrates into the highest levels of a supposedly capitalist economy (CNBC). Open source software is even specified for use by the U.S. Department of Defense (Tiemann).
Because hyper-historical economies depend less over time on human labor value, in which value must be realized through the transaction of money, the value-based economy that Caffentzis describes as a system of values misses the point of machines producing value. And although Caffentzis is correct that within Marxist theory machines do not create value because "only humans can create value," outside of a philosophical exercise in Marxist theory his article matters only relatively as an account of why machines cannot create value. Machines, as open source informational software systems, represent a large portion of the economic foundation that societies run on. The more machines produce products and services outside the scope of Marxist theory, the more society moves toward a robot economy (What is a robot economy?).
Works Cited
Caffentzis, C. George. "Why Machines Cannot Create Value; or, Marxist Theory of Machines." Cutting
Edge: Technology, Information Capitalism and Social Revolution, edited by Jim Davis et al., Verso, 1998.
Carchedi, G. "High-Tech Hype: Promises and Realities of Technology in the Twenty-First Century."
Cutting Edge: Technology, Information Capitalism and Social Revolution, edited by Jim Davis et al., Verso, 1998.
"Creative Commons — Attribution-ShareAlike 4.0 International — CC BY-SA 4.0."
Gnu.org. https://www.gnu.org/licenses/gpl-3.0.en.html. Accessed 23 Nov. 2018.
"IBM to Acquire Red Hat in Deal Valued at $34 Billion." CNBC.
"The MIT License." Open Source Initiative, https://opensource.org/licenses/MIT. Accessed 23 Nov. 2018.
Tiemann, Michael. "The (Awesome) Economics of Open Source." Opensource.com. Accessed 23 Nov. 2018.
"WE1S -- WhatEvery1Says." GitHub, https://github.com/whatevery1says. Accessed 23 Nov. 2018.
"What Is Robot Economy? - Definition from WhatIs.com." SearchEnterpriseAI, 2018.



Implications of Caffentzis’s Claim
In the edited collection Cutting Edge (1997), C. George Caffentzis claims that "futurological assumptions and political dystopias turned out to be radically wrong in their common assumptions" because "machines cannot create value" (30). His contribution to Cutting Edge, "Why Machines Cannot Create Value; or, Marxist Theory of Machines," misleads the reader by allowing the reader to signify complex, subjective, and porous connotations of value with a single word. "Value" is a symbolically challenged word, its signification imbued with meaning by referents from economics to advertising to mathematics. People intuitively sense that they know what it means, but, like God, nobody can pinpoint exactly what it is outside of the context within which it is framed. Caffentzis frames his "article [a]s a reanalysis and defense of Marx's original claim that machines cannot produce value" (31), "where value is human labor that is performed under capitalist production relations, that transforms use-values, and that must realize itself through exchange as money" (Carchedi 74). Without ever mentioning other forms of economic systems, he critiques a model of a capitalist system as if it were the one in which his audience lives.
Caffentzis speaks about the capitalist production of value as defined within Marxist theory: a kind of closed system of equilibrium of capitalist production wherein "only labor power can produce value" (Carchedi 73). In Marxist theory, value may be measured quantitatively (Carchedi 83). And while this may be true when making a Marxist critique of capitalism, when haven't the subversive forces of ever-changing financial systems continually rendered any specific value into a state of flux?
Value fluctuates by magnitudes and in different ways depending on whether the calculation is made within a specific financial system and on when it is calculated. The question of whether machines can produce value plays a subordinate role to what we mean when we speak of value. As a rough approximation of what I mean, consider the following graph of GDP, the total value of goods produced and services provided in a country during one year.
[Graph omitted: U.S. debt issuance (the upper line) plotted against GDP (Salmon).]
GDP, as shown above, is dependent on credit growth within debt-based economies: in the graph, the issuance of debt (the upper line) is shown to determine GDP (Salmon). According to Marx, the "capital-value of such paper is nevertheless wholly illusory"; it is "fictitious capital." But within the debt-based economy of the U.S. since the Greenspan era, the credit cycle has replaced the business cycle of late capitalism. Debt now determines value, not human labor power. Within the centralized wealth structure of debt-based economies, the allocation of value may be assigned to those first in line to receive debt instruments, monetary sums, at lower interest rates than others. And value in the form of debt provides students with funding to go to college, and so on. Value as debt functions slightly differently in other G20 nations, but nonetheless debt has replaced the value of money.
Rather than human labor power, debt allocation enables machines to create use value that in turn keeps the system running. One reason for this is that the use-values of the products produced must be exchanged on the market in a Marxist capitalist system "through the intermediation of money" (Carchedi 74), but in a debt-based economy, use-values are exchanged for the liabilities of the Federal Reserve Banks. Secondly, the machines provide the necessary software for the exchange of the financialized instruments of debt that transfer wealth from one part of the world to another. In China, for example, the surpluses of human labor production have been redistributed to the West, while the West relies on the use value of machine software to exchange those products for instruments of debt. And although an adherent of Marxist theory might claim that this is what Caffentzis explains when he quotes that capitalists recognize "their profits are not derived solely from the labor employed in their own individual sphere" and that they are involved in the collective exploitation of the total working class (41), he fails to mention that the capitalists are well aware that they are buying back their companies' stock with borrowed Federal Reserve notes as a means to increase their profits per share.
The elephant in the room of Caffentzis's article is the Western world's central banking system, which controls the motion of debt around the world; outside of a Marxist capitalist system, he fails to mention monetary systems at all. Federal Reserve banking policy provides a type of financial system that allows the use of derivatives and all manner of the financial instruments Randy Martin alluded to in his 2006 article "Where Did the Future Go?" when he wrote that capital was "reincarnated as a plethora of financial instruments" (Martin 2). According to Mike Maloney, a guest of Chris Martenson during a Peak Prosperity interview, "every 30 to 40 years, the world had a brand-new monetary system, completely different from the last one. The classical gold standard before World War I, the interwar gold exchange standard and the Bretton Woods from 1944 to 1971, and now the global dollar standard" (ChrisMartensondotcom).
Caffentzis misleads his reader by failing to state that the Marxist capitalism on which he bases his article is a historical financial model now superseded by centrally controlled models of fictitious capital, the exact thing (third-order simulacra) Baudrillard tried to warn us about. The answer Caffentzis gives to "Why Machines Can't Create Value?" can only be answered by constraining meaning to Marxist theory. Thus his article does not answer the question of whether machines can create value independent of human labor power. Within debt-based economies, within special economic zones, and inside and outside the scope of larger economies, machines can and do create value.


Works Cited
Caffentzis, C. George. "Why Machines Cannot Create Value; or, Marxist Theory of Machines." Cutting
Edge: Technology, Information Capitalism and Social Revolution, edited by Jim Davis et al., Verso, 1998.
Carchedi, G. "High-Tech Hype: Promises and Realities of Technology in the Twenty-First Century."
Cutting Edge: Technology, Information Capitalism and Social Revolution, edited by Jim Davis et al., Verso, 1998.
ChrisMartensondotcom. Mike Maloney: One Hell of a Crisis. YouTube,
https://www.youtube.com/watch?v=sIbB5o2A9Qk&feature=youtu.be. Accessed 19 Nov. 2018.
Martin, Randy. "Where Did the Future Go?" Logos, vol. 5, no. 1, Winter 2006,
http://www.logosjournal.com/issue_5.1/martin.htm. Accessed 19 Nov. 2018.
Salmon, Felix. "Chart of the Day: Growth and Debt." Reuters Blogs, 16 Mar. 2012,
http://blogs.reuters.com/felix-salmon/2012/03/16/chart-of-the-day-growth-and-debt/.


