Metropolis-Hastings library(ggplot2) library(tidyverse) library(gridExtra) library(MASS) Random Walk Metropolis Hastings Let’s implement a Random Walk Metropolis-Hastings algorithm with a symmetric proposal density. rwmh <- function(start, niter, proposal, target){ # Store accepted, rejected and all samples accepted <- matrix(NA, nrow=niter, dimnames=list(NULL, "accepted")) rejected <- matrix(NA, nrow=niter, dimnames=list(NULL, "rejected")) samples <- matrix(NA, nrow=niter) # Set first sample to start and sample uniform random numbers for efficiency z <- start accepted[1] <- z; rejected[1] <- z; samples[1] <- z u <- runif(niter) # Draw candidate, calculate acceptance prob for niter-1 times for (i in 2:niter){ cand <- proposal$sample(z) prob <- min(1, target$density(cand)/target$density(z)) if (u[i] <= prob) {accepted[i] <- cand; z <- cand} else {rejected[i] <- cand} samples[i] <- z } result <- list("samples"=samples, "accepted"=accepted, "rejected"=rejected, "proposal"=proposal, "target"=target) return(result) } Plotting Function The following function takes the output of the Random-Walk Metropolis-Hastings algorithm and it produces two plots:

Continue reading

library(MASS) library(microbenchmark) library(tidyverse) Utility functions The kernel_matrix function calculates the kernel matrix with squared exponential in a vectorized way. This function is borrowed from the profiling portfolio. kernel_matrix <- function(X, sigmasq, Y=NULL){ if (is.null(Y)){ Y <- X } n <- nrow(X) m <- nrow(Y) # Find three matrices above Xnorm <- matrix(apply(X^2, 1, sum), n, m) Ynorm <- matrix(apply(Y^2, 1, sum), n, m, byrow=TRUE) XY <- tcrossprod(X, Y) return(exp(-(Xnorm - 2*XY + Ynorm) / (2*sigmasq))) } The function make_flags takes a y vector of length n and containing K class labels 0, 1, , K-1 and returns a matrix with n rows and K columns, where each row is a one-hot encoding for the class corresponding to that observation.

Continue reading

Matrices Portfolio

library(Matrix) library(tidyverse) library(igraph) We work with the same dataset used for Tidyverse containing data regarding some of the top-voted kaggle kernels. # Use `` for column names with spaces kaggle <- "kagglekernels.csv" %>% read_csv(col_types = cols( Votes=col_double(), Owner=col_factor(), Kernel=col_factor(), Dataset=col_factor(), Output=col_character(), `Code Type`=col_factor(), Language=col_factor(), Comments=col_double(), Views=col_double(), Forks=col_double()) ) kaggle # Tibbles automatically print head(tibble) ## # A tibble: 971 x 12 ## Votes Owner Kernel Dataset `Version Histor… Tags Output `Code Type` Language ## <dbl> <fct> <fct> <fct> <chr> <chr> <chr> <fct> <fct> ## 1 2130 Mega… Explo… Titani… Version 8,2017-… tuto… This … Script markdown ## 2 1395 Guid… Full … Data S… Version 19,2017… tuto… This … Notebook Python ## 3 1363 Pedr… Compr… House … Version 47,2018… begi… This … Notebook Python ## 4 1316 Anis… Intro… Titani… Version 93,2018… tuto… This … Notebook Python ## 5 1078 Kaan… Data … Pokemo… Version 389,201… begi… This … Notebook Python ## 6 1003 Phil… Explo… Zillow… Version 44,2017… begi… This … Script markdown ## 7 946 Mana… Titan… Titani… Version 16,2017… tuto… This … Notebook Python ## 8 826 Omar… A Jou… Titani… Version 6,2016-… begi… This … Notebook Python ## 9 814 anok… Data … Quora … <NA> inte… This … Notebook Python ## 10 726 SRK Simpl… Zillow… Version 19,2017… eda,… This … Notebook Python ## # … with 961 more rows, and 3 more variables: Comments <dbl>, Views <dbl>, ## # Forks <dbl> Again, we can use the Tags to create a number of different new variables, each representing one Tag.

Continue reading

Profiling Portfolio

Implementations Data Generation and Problem Settings Import the necessary packages. The library provfis is a stochastic profiler and can be used to profile our code. library(profvis) library(tidyverse) library(MASS) library(microbenchmark) First, let’s decide some parameters and settings such as the amount of training and testing data. # Number of data points for plotting & Bandwidth squared of the RBF kernel ntest <- 500 # Characteristic Length-scale for kernel sigmasq <- 1.

Continue reading

Tidyverse Portfolio

library(tidyverse) library(plotly) library(forcats) library(reshape2) library(magrittr) Spotify: Genre Popularity by Artist Import the spotify data an drop index column cause it contains redundant information. spotify <- read_csv("top50.csv", col_names=c( "index", "Song", "Artist", "Genre", "BPM", "Energy", "Danceability", "Loudness", "Liveness", "Valence", "Length", "Acousticness", "Speechiness", "Popularity"), col_types=cols( index=col_double(), Song=col_factor(), Artist=col_factor(), Genre=col_factor()), skip=1) # Index column contains row_numbers spotify <- spotify %>% select(-c("index")) For brevity, will change the name of the genres and change one of the artists cause it has a non utf-8 name.

Continue reading

Mathematical Set Up Suppose we have a training matrix \(X\) with \(n\) observations and \(d\) features \[ X = \begin{pmatrix} x_{11} & x_{12} & \ldots & x_{1d}\\ x_{21} & x_{22} & \ldots & x_{2d} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \ldots & x_{nd} \end{pmatrix} =\begin{pmatrix} {\bf{x}}_1^\top \\ {\bf{x}}_2^\top \\ \vdots \\ {\bf{x}}_n^\top \end{pmatrix} \in\mathbb{R}^{n\times d} \] and that we want to do a feature transform of this data using the Radial Basis Function.

Continue reading

Common R Portfolio

Vectorization As a general rule of thumb, we want to avoid writing for-loops as much as possible, and instead vectorize operations to speed up our calculations. For example suppose that we want to calculate \[\sum_{i=1}^n\cos(i) \times \sin(i) \times \tan(i) - \cos(i)^2-\sin(i)^3\] At first one could write a function similar to this: myfunc <- function(n){ total = 0 for (i in 1:n){ total = total + cos(i)*sin(i)*tan(i) - cos(i)^2 - sin(i)^3 } return(total) } We can see see how long a function takes to run in R by using the system.

Continue reading

Author's picture

Mauro Camara Escudero

PhD student in Computational Statistics and Data Science at the University of Bristol

COMPASS PhD Student

Bristol, United Kingdom