Tuesday, October 11, 2016

Using MongoDB with R directly

Hello everyone, this post is about integrating MongoDB and R directly using the "mongolite" package. To do so, we will build a data frame with 20,002 rows and insert it 'n' times into a MongoDB database, as a small database-writing benchmark.
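The row count follows from how the data frame is built: two "creation" rows plus 10,000 pairs of "update" rows. The short R sketch below spells out that arithmetic and the number of documents to expect in the collection after each insert, assuming it starts empty. The names `rowsPerInsert` and `expectedCount` are only for illustration.

```r
# Rows created by the script: 2 "creation" rows + 10,000 pairs of "update" rows
creationRows  <- 2
updatePairs   <- 10000
rowsPerInsert <- creationRows + 2 * updatePairs
rowsPerInsert                   # 20002

# Documents expected in the collection after each of n inserts,
# starting from 'initial' documents (0 for an empty collection)
expectedCount <- function(n, initial = 0) {
  initial + rowsPerInsert * seq_len(n)
}
expectedCount(3)                # 20002 40004 60006
```

Comparing these values with what m$count() reports is a quick sanity check that every insert wrote all of its rows.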

The code was tested in RStudio (R version 3.3.1) and should also work from the R command line. It assumes that MongoDB is already installed on your computer and that the server is running.
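If you are not sure the server is up, a quick check like the one below can help. It is a sketch: `mongoAvailable` is a hypothetical helper, and it assumes the server listens on localhost at the default port 27017. It tries to connect and count the documents in a collection, and returns FALSE on any error, including mongolite not being installed yet.

```r
# Returns TRUE if a MongoDB server answers at 'url', FALSE otherwise.
mongoAvailable <- function(url = "mongodb://localhost:27017") {
  # Without mongolite there is nothing to connect with
  if (!requireNamespace("mongolite", quietly = TRUE)) {
    return(FALSE)
  }
  tryCatch({
    con <- mongolite::mongo(collection = "test", db = "test", url = url)
    con$count()   # forces a round trip to the server
    TRUE
  }, error = function(e) FALSE)
}

if (!mongoAvailable()) {
  message("MongoDB does not seem to be running on localhost:27017")
}
```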

The first step is to install the package with: install.packages("mongolite")

Then load it with: library(mongolite)

Note that the installation may also install some additional packages that mongolite needs in order to work properly.
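To avoid reinstalling the package every time the script runs, one option is to install it only when it is missing. `ensurePackage` below is a hypothetical helper, not part of mongolite; with its default settings, install.packages() also installs the packages that mongolite depends on.

```r
# Install a package only when it is not installed yet, then attach it.
ensurePackage <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # The default dependencies = NA also installs Depends/Imports/LinkingTo packages
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# For this post:
# ensurePackage("mongolite")
```

When running from the command line instead of RStudio, install.packages() may ask for (or fail without) a CRAN mirror; passing one through its repos argument avoids that.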

--Building a custom data frame, inserting it into the database, and the benchmark function:

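library(mongolite)

# Connect to the "test" collection of the "test" database. mongo() defaults to
# url = "mongodb://localhost", i.e. a local server on the default port 27017;
# the URL is written out here so it is easy to change.
m <- mongo(collection = "test", db = "test", url = "mongodb://localhost:27017")

pairs <- 10000          # number of replicated "update" row pairs
n <- 2 + 2 * pairs      # 20,002 rows in total

# Preallocate the columns instead of growing a matrix with rbind() in a loop.
# The timestamps are stored as numbers here and converted to POSIXct at the
# end; building rows with c() would silently turn them into strings.
name      <- character(n)
path      <- character(n)
timestamp <- numeric(n)
type      <- character(n)

# The first two rows are the "creation" events
name[1:2] <- c("user", "user2")
path[1:2] <- c("/Users/user/home/examples/test", "/Users/user2/home/examples/test")
type[1:2] <- "creation"
timestamp[1] <- as.numeric(Sys.time())
timestamp[2] <- as.numeric(Sys.time())

# Fill the remaining rows with the replicated "update" events
for (i in seq_len(pairs)) {
  r <- 2 * i + 1        # row of user1; the row of user2 is r + 1
  name[r]          <- "user1"
  path[r]          <- "/Users/user/home/examples/test"
  timestamp[r]     <- as.numeric(Sys.time())
  name[r + 1]      <- "user2"
  path[r + 1]      <- "/Users/user2/home/examples/test"
  timestamp[r + 1] <- as.numeric(Sys.time())
  type[c(r, r + 1)] <- "update"
}
print("DATA CREATED")

df2 <- data.frame(name, path,
                  timestamp = as.POSIXct(timestamp, origin = "1970-01-01"),
                  type, stringsAsFactors = FALSE)
print("Converted to DATAFRAME")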

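# Insert df2 once and return how long it took, in seconds
insert <- function() {
  initialTime <- Sys.time()
  m$insert(df2)
  finalTime <- Sys.time()
  as.numeric(difftime(finalTime, initialTime, units = "secs"))
}

# Insert the 20,002 rows 'iterations' times. After each insert, record the
# time it took and the number of documents in the collection, then plot both.
# Note: the collection is not emptied first, so earlier runs add to the count.
testIt <- function(iterations) {
  x <- numeric(iterations)   # seconds needed for each insert
  y <- numeric(iterations)   # documents in the collection after each insert
  for (i in seq_len(iterations)) {
    x[i] <- insert()
    y[i] <- m$count()
  }
  plot(x, y, xlab = "Time to write (s)", ylab = "Number of items in database")
  invisible(data.frame(seconds = x, count = y))
}

# Run the benchmark, e.g. with n = 10 inserts
results <- testIt(10)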
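A note on the timestamp column: when a row is built with c() from strings and a Sys.time() value, for example c("user", path, Sys.time(), "creation"), R coerces every element to character, so the timestamp ends up as a number-like string (seconds since 1970-01-01) rather than a date. The snippet below reproduces this using only base R. As far as I know, mongolite stores POSIXct columns as BSON dates, which is why keeping the column as POSIXct is preferable, but treat that as an assumption and check the mongolite documentation for your version.

```r
# Build a row the way c() combines mixed values
t1  <- Sys.time()
row <- c("user", "/Users/user/home/examples/test", t1, "creation")

class(t1)    # "POSIXct" "POSIXt"
class(row)   # "character": c() coerced every element to a string
row[3]       # the time as seconds since 1970-01-01, stored as text

# A data frame column keeps the POSIXct class intact.
# Assumption: mongolite maps POSIXct columns to BSON dates when inserting.
fixed <- data.frame(name = "user", timestamp = t1, stringsAsFactors = FALSE)
class(fixed$timestamp)   # "POSIXct" "POSIXt"
```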

That's it.
For more information, refer to the following webpage:
https://cran.r-project.org/web/packages/mongolite/vignettes/intro.html
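If you want numbers in addition to the plot, the insert times and document counts that testIt() records can be turned into a small table with the write throughput of each insert. `summarizeBenchmark` is a hypothetical helper that assumes every insert writes the same 20,002 rows. The values in the example call are made-up inputs, not measured results.

```r
# Summarize a benchmark run. 'seconds' holds the time of each insert and
# 'counts' the number of documents in the collection after that insert.
summarizeBenchmark <- function(seconds, counts, rowsPerInsert = 20002) {
  stopifnot(length(seconds) == length(counts))
  data.frame(
    iteration     = seq_along(seconds),
    seconds       = seconds,
    count         = counts,
    docsPerSecond = rowsPerInsert / seconds   # write throughput of each insert
  )
}

# Hypothetical input values, not measurements:
s <- summarizeBenchmark(c(2, 2.5, 4), c(20002, 40004, 60006))
s$docsPerSecond   # c(10001, 8000.8, 5000.5)
```

Plotting docsPerSecond against count makes it easier to see whether writes slow down as the collection grows.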
