Exam and Final Project

Final Project:

  • April 3rd: One-page proposal.
  • April 17th: First draft of poster.
  • April 22nd: Final draft of poster.
  • April 24th: Poster session (more details to come).

Midterm Exam:

  • Take-home and open-book. Please cite any peers who help you.
  • Midterm
  • gun_videos.csv
  • gun_transcripts.zip
  • gun_channels_labeled.csv
  • gun_rec.csv
  • More details are in the working paper, if you are interested (optional reading).
  • Here is some code for pre-processing:

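    library(readr)     # read_csv
    library(dplyr)     # %>%, left_join
    library(readtext)  # readtext
    library(quanteda)  # corpus, tokens, dfm
    channel = read_csv("gun_channels_labeled.csv") %>% as.data.frame
    video = read_csv("gun_videos.csv") %>% as.data.frame
    docs = "gun_transcripts/*"
    transcript = readtext(docs)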


    You can then clean your data:

    # turn the transcripts into a corpus
    corp_tran = corpus(transcript)
    video.ids = gsub("\\.txt", "", docnames(corp_tran))
    video.data = data.frame(video.id = video.ids)
    video.data = left_join(video.data, video, by = c("video.id" = "rec.video.id"))
    #summary(corp_tran, 10) #take a look at the corpus

    #preferred preprocessing method: PNLSWI
    toke_tran = tokens(corp_tran, verbose = TRUE) #77215
    toke_tran_P = tokens(corp_tran, remove_punct = TRUE, verbose = TRUE) # 77166
    toke_tran_PN = tokens(corp_tran, remove_punct = TRUE, remove_numbers = TRUE, verbose = TRUE) #74664

    dfm_tran_raw = dfm(toke_tran, tolower = FALSE, verbose = TRUE) #77211?
    dfm_tran_P = dfm(toke_tran_P, tolower = FALSE, verbose = TRUE) #77163
    dfm_tran_PN = dfm(toke_tran_PN, tolower = FALSE, verbose = TRUE) #74661
    dfm_tran_PNL = dfm_tolower(dfm_tran_PN) #56582
    dfm_tran_PNLS = dfm_wordstem(dfm_tran_PNL) #37175
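The code above stops at PNLS. As a minimal sketch of the remaining two steps, assuming the W and I in PNLSWI stand for stopword removal and infrequent-term removal, the whole pipeline might look like the following on a toy corpus. The example texts, the object names, and the document-frequency threshold of 2 are all hypothetical, so choose a threshold that suits the real transcripts.

```r
library(quanteda)

# Toy stand-in for the transcripts (made-up text, not the course data)
txt <- c(d1 = "The rifles were cleaned in 2019, and the rifle was tested.",
         d2 = "Testing the scope: the rifle scope was tested again.",
         d3 = "A holster review.")
corp <- corpus(txt)

# P, N: drop punctuation and numbers while tokenizing
toks <- tokens(corp, remove_punct = TRUE, remove_numbers = TRUE)

# L: lowercase
toks <- tokens_tolower(toks)

# W: remove English stopwords (done before stemming so that the
# unstemmed stopword list still matches the tokens)
toks <- tokens_remove(toks, pattern = stopwords("en"))

# S: stem
toks <- tokens_wordstem(toks)

# I: keep only terms that occur in at least 2 documents
# (the threshold 2 is an arbitrary choice for this toy example)
dfm_PNLSWI <- dfm_trim(dfm(toks), min_docfreq = 2, docfreq_type = "count")

print(featnames(dfm_PNLSWI))
```

This sketch works on the tokens rather than on successive dfm objects, which keeps the stopword step ahead of stemming. The same steps could instead be applied to dfm_tran_PNLS with dfm_remove() and dfm_trim().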

Solution:

  • Solution here. This is a very open-ended task, so this is just one possible way to solve it.

Office Hours:

  • Connor Jerzak: Walk-in OH Wed 1:35-2:35+ pm (CGIS S223)
  • Naijia Liu
  • Ruofan Ma
  • Yuning Liu