library(Matrix)
library(tidyverse)
library(igraph)

We work with the same dataset used for Tidyverse, which contains data on some of the top-voted Kaggle kernels.
# Use backticks (``) for column names with spaces
kaggle <- "kagglekernels.csv" %>%
  read_csv(col_types = cols(
    Votes = col_double(),
    Owner = col_factor(),
    Kernel = col_factor(),
    Dataset = col_factor(),
    Output = col_character(),
    `Code Type` = col_factor(),
    Language = col_factor(),
    Comments = col_double(),
    Views = col_double(),
    Forks = col_double())
  )
kaggle # Tibbles print only their first 10 rows by default

## # A tibble: 971 x 12
##    Votes Owner Kernel Dataset `Version Histor… Tags  Output `Code Type` Language
##    <dbl> <fct> <fct>  <fct>   <chr>            <chr> <chr>  <fct>       <fct>
##  1  2130 Mega… Explo… Titani… Version 8,2017-… tuto… This … Script      markdown
##  2  1395 Guid… Full … Data S… Version 19,2017… tuto… This … Notebook    Python
##  3  1363 Pedr… Compr… House … Version 47,2018… begi… This … Notebook    Python
##  4  1316 Anis… Intro… Titani… Version 93,2018… tuto… This … Notebook    Python
##  5  1078 Kaan… Data … Pokemo… Version 389,201… begi… This … Notebook    Python
##  6  1003 Phil… Explo… Zillow… Version 44,2017… begi… This … Script      markdown
##  7   946 Mana… Titan… Titani… Version 16,2017… tuto… This … Notebook    Python
##  8   826 Omar… A Jou… Titani… Version 6,2016-… begi… This … Notebook    Python
##  9   814 anok… Data … Quora … <NA>             inte… This … Notebook    Python
## 10   726 SRK   Simpl… Zillow… Version 19,2017… eda,… This … Notebook    Python
## # … with 961 more rows, and 3 more variables: Comments <dbl>, Views <dbl>,
## #   Forks <dbl>

Again, we can use the Tags column to create a number of new variables, each representing one tag.
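As a rough illustration of turning tags into one variable per tag, here is a minimal sketch on a toy tibble. It assumes that Tags holds comma-separated strings (the truncated `eda,…` entry in the printed output suggests this, but it is not confirmed). The toy data, the `tag_` column prefix, and the names `toy` and `tag_indicators` are all made up for the example and are not part of the original dataset or analysis.

```r
library(tidyverse)

# Toy stand-in for the kaggle tibble; kernel names and tag values are invented
toy <- tibble(
  Kernel = c("A", "B", "C"),
  Tags   = c("tutorial,beginner", "eda", "tutorial, eda")
)

tag_indicators <- toy %>%
  filter(!is.na(Tags)) %>%                    # skip kernels without tags (none here)
  separate_rows(Tags, sep = ",") %>%          # one row per (Kernel, tag) pair
  mutate(Tags = str_trim(Tags),               # remove stray spaces around tags
         present = 1L) %>%
  distinct() %>%                              # guard against repeated tags
  pivot_wider(names_from   = Tags,            # one 0/1 column per distinct tag
              values_from  = present,
              values_fill  = 0L,
              names_prefix = "tag_")

tag_indicators
```

Each kernel ends up with one row, and each distinct tag becomes a 0/1 indicator column. On the real data the same pipeline could in principle be applied to `kaggle`, provided the separator assumption holds.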