Connor Jerzak (Instructor) Naijia Liu (Instructor) | Ruofan Ma (TF) Yuning Liu (TF)

Course Slack Gradescope

Exam and Final Project

Final Project:

April 3rd: One-page proposal.
April 17th: First draft of poster.
April 22nd: Final Poster draft due.
April 24th: Poster session (more details to come)

Midterm Exam:

Take-home and open-book. Please cite your peers if you receive help.
Midterm
gun_videos.csv
gun_transcripts.zip
gun_channels_labeled.csv
gun_rec.csv
More details are in the working paper if you are interested (optional reading).
Here is some code for pre-processing:

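library(readr)     # read_csv
library(dplyr)     # %>% and left_join
library(readtext)  # readtext
library(quanteda)  # corpus, tokens, dfm
# the path below assumes gun_transcripts.zip has been unzipped into gun_transcripts/
channel = read_csv("gun_channels_labeled.csv") %>% as.data.frame()
video = read_csv("gun_videos.csv") %>% as.data.frame()
docs = "gun_transcripts/*"
transcript = readtext(docs)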

And you can clean your data:

# turn to corpus
corp_tran = corpus(transcript)
video.ids = gsub("\\.txt", "", docnames(corp_tran))
video.data = data.frame(video.id = video.ids)
video.data = left_join(video.data, video, by = c("video.id" = "rec.video.id"))
# summary(corp_tran, 10) # take a look at the corpus
# preferred preprocessing method: PNLSWI
toke_tran = tokens(corp_tran, verbose = TRUE) # 77215
toke_tran_P = tokens(corp_tran, remove_punct = TRUE, verbose = TRUE) # 77166
toke_tran_PN = tokens(corp_tran, remove_punct = TRUE, remove_numbers = TRUE, verbose = TRUE) # 74664
dfm_tran_raw = dfm(toke_tran, tolower = FALSE, verbose = TRUE) # 77211?
dfm_tran_P = dfm(toke_tran_P, tolower = FALSE, verbose = TRUE) # 77163
dfm_tran_PN = dfm(toke_tran_PN, tolower = FALSE, verbose = TRUE) # 74661
dfm_tran_PNL = dfm(dfm_tran_PN, tolower = TRUE, verbose = TRUE) # 56582
dfm_tran_PNLS = dfm(dfm_tran_PNL, tolower = FALSE, stem = TRUE, verbose = TRUE) # 37175
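The cleaning code stops at PNLS, while the comment names PNLSWI as the preferred method. The sketch below shows one way the remaining two steps could look. It assumes W means stopword removal and I means dropping infrequent terms, which the code above does not confirm. The toy sentences, the `toy_*` names, and the document-frequency threshold of 2 are hypothetical and only there to keep the example self-contained. It uses quanteda's `dfm_remove()`, `dfm_wordstem()`, and `dfm_trim()` helpers rather than the `dfm()` arguments used above.

```r
library(quanteda)

# Hypothetical toy documents standing in for the transcripts
toy_text = c(d1 = "The rifles were cleaned, and 2 rifles were sold.",
             d2 = "Cleaning a rifle takes 10 minutes!",
             d3 = "The shop sold the pistol.")
toy_corp = corpus(toy_text)

# P and N: drop punctuation and numbers when tokenizing
toy_toks = tokens(toy_corp, remove_punct = TRUE, remove_numbers = TRUE)

# L: lowercase while building the document-feature matrix
toy_dfm = dfm(toy_toks, tolower = TRUE)

# W (assumed meaning): remove English stopwords.
# Done before stemming here so the unstemmed stopword list still matches.
toy_dfm = dfm_remove(toy_dfm, pattern = stopwords("en"))

# S: stem the remaining features; identical stems are merged
toy_dfm = dfm_wordstem(toy_dfm)

# I (assumed meaning): keep only features that appear in at least
# 2 documents; the threshold is illustrative, not a course requirement
toy_dfm_I = dfm_trim(toy_dfm, min_docfreq = 2, docfreq_type = "count")

featnames(toy_dfm_I)
```

On the real data, the same last two calls could be applied to `dfm_tran_PNLS`, with the caveat that stopwords removed after stemming may no longer match their stemmed forms, and with a threshold chosen to suit the size of the corpus.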

Solution:

Solution here. This is a very open-ended task, so this is just one possible way to solve it.

Office Hours:

Connor Jerzak: Walk-in OH Wed 1:35-2:35+ pm (CGIS S223)
Naijia Liu
Ruofan Ma
Yuning Liu