Published: Tuesday
[Original] Twitter Text Mining and Sentiment Analysis in R (with Code and Data)
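  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%
  # NOTE: the string literals in this step were lost in extraction. The link/entity
  # pattern above and "[a-z]" below are assumed reconstructions, not confirmed by the source.
  # token = "regex" follows from the pattern argument being supplied.
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  filter(!word %in% stop_words$word,
         str_detect(word, "[a-z]"))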
tweet_words
## # A tibble: 8,753 × 4
##                    id source             created                   word
##
##  1 676494179216805888 iPhone 2015-12-14 20:09:15                 record
##  2 676494179216805888 iPhone 2015-12-14 20:09:15                 health
##  3 676494179216805888 iPhone 2015-12-14 20:09:15 #makeamericagreatagain
##  4 676494179216805888 iPhone 2015-12-14 20:09:15             #trump2016
##  5 676509769562251264 iPhone 2015-12-14 21:11:12               accolade
##  6 676509769562251264 iPhone 2015-12-14 21:11:12             @trumpgolf
##  7 676509769562251264 iPhone 2015-12-14 21:11:12                 highly
##  8 676509769562251264 iPhone 2015-12-14 21:11:12              respected
##  9 676509769562251264 iPhone 2015-12-14 21:11:12                   golf
## 10 676509769562251264 iPhone 2015-12-14 21:11:12                odyssey
## # ... with 8,743 more rows

tweet_words %>%
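  count(word, sort = TRUE) %>%
  head(20) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n)) +
  # The stat and axis-label strings were lost in extraction: "identity" is what
  # geom_bar needs to plot precomputed counts; the label "Occurrences" is an assumption.
  geom_bar(stat = "identity") +
  ylab("Occurrences") +
  coord_flip()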
The figure shows that the keyword "Hillary" ranks first, followed by "Trump 2016". Further down the ranking, keywords such as "Trump" and "Clinton" also appear.
Next we run a sentiment analysis on the data and compute, for each word, how often it is used on Android relative to iPhone.
The sentiment ratio of each platform is then derived from the sentiment of its characteristic words, and the result is visualized.
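android_iphone_ratios <- tweet_words %>%
  count(word, source) %>%
  # count() returns ungrouped data in current dplyr, so group by word
  # to apply the threshold to each word's total rather than the grand total
  group_by(word) %>%
  filter(sum(n) >= 5) %>%
  ungroup() %>%
  spread(source, n, fill = 0) %>%
  # add-one smoothing, then turn each source column into proportions
  # (across() replaces the deprecated mutate_each()/funs())
  mutate(across(-word, ~ (.x + 1) / sum(.x + 1))) %>%
  mutate(logratio = log2(Android / iPhone)) %>%
  arrange(desc(logratio))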
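
To make the arithmetic behind the log ratio concrete, here is a minimal sketch in base R. The words and counts are made up for illustration and are not taken from the tweet data, and the helper name smooth_prop is hypothetical. It applies the same add-one smoothing and log2(Android / iPhone) step as the pipeline above.

```r
# Toy word counts per source (hypothetical numbers, for illustration only)
counts <- data.frame(
  word    = c("crooked", "join", "thank"),
  Android = c(8, 0, 2),
  iPhone  = c(1, 9, 2)
)

# Add 1 to every count so that zero counts do not produce log2(0),
# then divide by the column total to get a smoothed proportion
smooth_prop <- function(x) (x + 1) / sum(x + 1)

# Positive values: relatively more common on Android;
# negative values: relatively more common on iPhone
counts$logratio <- log2(smooth_prop(counts$Android) / smooth_prop(counts$iPhone))

# Sort from most Android-leaning to most iPhone-leaning
counts[order(-counts$logratio), ]
```

Because the ratio is taken on a log2 scale, a value of 1 means a word is twice as frequent (in smoothed proportion) on Android as on iPhone, and a value of -1 means the reverse.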
nrc <-sentiments %>%