Cheminformatics

Showing posts with the label R.

Monday, April 11, 2016

Simulating queueing systems with simmer

We are very pleased to announce that a new release of simmer, the Discrete-Event Simulator for R, is on CRAN. There are quite a few changes and fixes, with support for preemption as the star new feature. Check out the complete set of release notes here.

Let’s simmer for a bit and see how this package can be used to simulate queueing systems in a very straightforward way.

The M/M/1 system

In Kendall’s notation, an M/M/1 system has exponentially distributed interarrival times (the first M), a single server (the 1) with exponentially distributed service times (the second M) and an infinite queue (implicitly, M/M/1/\(\infty\)). For instance, think of people arriving at an ATM at rate \(\lambda\), waiting their turn in the street and withdrawing money at rate \(\mu\).

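Let us recall the basic parameters of this system:

\[ \rho = \frac{\lambda}{\mu} \qquad \text{(server utilization)} \]

\[ N = \frac{\rho}{1-\rho} \qquad \text{(average number of customers in the system, queue plus server)} \]

\[ T = \frac{N}{\lambda} \qquad \text{(average time in the system, queue plus server, by Little's law)} \]

whenever \(\rho < 1\). If that is not true, the system is unstable: there are more arrivals than the server is capable of handling, and the queue will grow indefinitely.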
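As a quick numerical check of the utilization \(\rho = \lambda/\mu\), the mean number in the system \(N = \rho/(1-\rho)\) and the mean time in the system \(T = N/\lambda\), here is a minimal sketch in base R. The helper name `mm1_metrics` is hypothetical and not part of simmer; it only encodes these formulas and refuses unstable inputs.

```r
# Hypothetical helper: steady-state metrics of an M/M/1 queue
mm1_metrics <- function(lambda, mu) {
  rho <- lambda / mu                       # server utilization
  if (rho >= 1) stop("unstable system: rho must be < 1")
  N <- rho / (1 - rho)                     # mean number of customers in the system
  T_sys <- N / lambda                      # mean time in the system (Little's law)
  list(rho = rho, N = N, T_sys = T_sys)
}

m <- mm1_metrics(lambda = 2, mu = 4)
m$rho    # 0.5
m$N      # 1
m$T_sys  # 0.5
```

With \(\lambda = 2\) and \(\mu = 4\), the values used throughout this post, the mean time in the system is 0.5.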

The simulation of an M/M/1 system is quite simple using simmer. The trajectory-based design, combined with magrittr’s pipe, is very expressive and self-explanatory.

library(simmer)
set.seed(1234)

lambda <- 2         # arrival rate
mu <- 4             # service rate
rho <- lambda/mu    # server utilization = 2/4

# Each customer seizes the server, is served for an exponential time and leaves
mm1.trajectory <- create_trajectory() %>%
  seize("resource", amount=1) %>%
  timeout(function() rexp(1, mu)) %>%
  release("resource", amount=1)

# One server, infinite queue, exponential interarrival times
mm1.env <- simmer() %>%
  add_resource("resource", capacity=1, queue_size=Inf) %>%
  add_generator("arrival", mm1.trajectory, function() rexp(1, lambda)) %>%
  run(until=2000)

Our package provides convenience plotting functions to quickly visualise, for instance, the usage of a resource over time. Below, we can see how the simulation converges to the theoretical average number of customers in the system.

library(ggplot2)

# Evolution of the average number of customers in the system
graph <- plot_resource_usage(mm1.env, "resource", items="system")

# Theoretical value
mm1.N <- rho/(1-rho)
graph + geom_hline(yintercept=mm1.N)

It is also possible to visualise, for instance, the instantaneous usage of individual elements by playing with the parameters items and steps.

plot_resource_usage(mm1.env, "resource", items=c("queue", "server"), steps=TRUE) +
  xlim(0, 20) + ylim(0, 4)

We can obtain the time spent by each customer in the system and compare the average with the theoretical expression.

mm1.arrivals <- get_mon_arrivals(mm1.env)
mm1.t_system <- mm1.arrivals$end_time - mm1.arrivals$start_time

mm1.T <- mm1.N / lambda
mm1.T ; mean(mm1.t_system)

## [1] 0.5

## [1] 0.5012594
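As an independent cross-check that does not rely on simmer, the same mean can be estimated with the Lindley recursion for the waiting time in a single-server FIFO queue, W[i] = max(0, W[i-1] + S[i-1] - A[i]), where S is a service time and A an interarrival time. This is only a sketch in base R, not simmer's internal method; the seed, the sample size and the variable names are arbitrary choices.

```r
set.seed(42)
lambda <- 2     # arrival rate
mu <- 4         # service rate
n <- 50000      # number of simulated customers (arbitrary)

inter <- rexp(n, lambda)    # inter[i]: time between arrivals i-1 and i
service <- rexp(n, mu)      # service[i]: service time of customer i

wait <- numeric(n)          # the first customer finds an empty system
for (i in 2:n) {
  # waiting time of customer i, given what customer i-1 experienced
  wait[i] <- max(0, wait[i - 1] + service[i - 1] - inter[i])
}

t_system <- wait + service  # time in the system = waiting + service
mean(t_system)              # should be close to 1/(mu - lambda) = 0.5
```

The estimate should land near the theoretical value of 0.5, up to sampling noise.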

The simulated mean matches the theoretical value pretty well. But of course we are picky, so let’s take a closer look, just to be sure (and to learn more about simmer, why not). Replication can be done with standard R tools:

library(parallel)

envs <- mclapply(1:1000, function(i) {
  simmer() %>%
    add_resource("resource", capacity=1, queue_size=Inf) %>%
    add_generator("arrival", mm1.trajectory, function() rexp(1, lambda)) %>%
    run(1000/lambda) %>%
    wrap()
})

Et voilà! Parallelizing has the shortcoming that we lose the underlying C++ objects when each forked process finishes, but the wrap function does all the magic for us by retrieving the monitored data. Let’s perform a simple test:

library(dplyr)

t_system <- get_mon_arrivals(envs) %>%
  mutate(t_system = end_time - start_time) %>%
  group_by(replication) %>%
  summarise(mean = mean(t_system))

t.test(t_system$mean)

##
##      One Sample t-test
##
## data:  t_system$mean
## t = 344.14, df = 999, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.4953154 0.5009966
## sample estimates:
## mean of x
##  0.498156
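Note that, called with its defaults, t.test uses a null hypothesis of zero mean, so the informative part of the output above is the confidence interval rather than the p-value. A sketch of testing directly against the theoretical value follows. The data here are stand-ins, not simulator output: replication means are drawn from i.i.d. exponential samples with mean 0.5, and the names `T_theory` and `rep_means` are hypothetical.

```r
set.seed(1)
T_theory <- 0.5   # theoretical mean time in the system

# Stand-in data: 200 replication means, each averaging 500 exponential draws
rep_means <- replicate(200, mean(rexp(500, rate = 1 / T_theory)))

# One-sample t-test with the theoretical value as the null hypothesis
res <- t.test(rep_means, mu = T_theory)
res$conf.int   # 95% confidence interval for the mean
res$p.value    # a large p-value means no evidence against T_theory
```

With real simulation output, `rep_means` would be replaced by the per-replication means computed above.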

Good news: the 95% confidence interval contains the theoretical value \(T = 0.5\), so the simulator works. Finally, in an M/M/1 system the time spent in the system is itself an exponential random variable with average \(T\).

qqplot(mm1.t_system, rexp(length(mm1.t_system), 1/mm1.T))
abline(0, 1, lty=2, col="red")
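A Q-Q plot against a random exponential sample carries sampling noise on both axes. One possible variant, sketched here in base R, compares the sorted sample with theoretical exponential quantiles instead. The sample below is a stand-in drawn with rexp rather than simulator output, and the variable names are hypothetical.

```r
set.seed(1)
T_mean <- 0.5                                  # assumed mean time in the system
x <- rexp(2000, rate = 1 / T_mean)             # stand-in for observed times

# Theoretical quantiles of an exponential distribution with mean T_mean
theo <- qexp(ppoints(length(x)), rate = 1 / T_mean)

plot(theo, sort(x),
     xlab = "Theoretical quantiles", ylab = "Sample quantiles")
abline(0, 1, lty = 2, col = "red")             # perfect agreement lies on this line
```

Points close to the dashed line indicate that the sample is compatible with an exponential distribution of mean `T_mean`.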

M/M/c/k systems

An M/M/c/k system keeps exponential arrivals and service times, but in general has more than one server and a finite queue, which is often more realistic. For instance, a router may have several processors to handle packets, and the in/out queues are necessarily finite.

This is the simulation of an M/M/2/3 system (2 servers, 1 position in queue). Note that the trajectory is identical to the M/M/1 case.

lambda <- 2
mu <- 4

mm23.trajectory <- create_trajectory() %>%
  seize("server", amount=1) %>%
  timeout(function() rexp(1, mu)) %>%
  release("server", amount=1)

mm23.env <- simmer() %>%
  add_resource("server", capacity=2, queue_size=1) %>%
  add_generator("arrival", mm23.trajectory, function() rexp(1, lambda)) %>%
  run(until=2000)

In this case, there are rejections when the queue is full.

mm23.arrivals <- get_mon_arrivals(mm23.env)
mm23.arrivals %>%
  summarise(rejection_rate = sum(!finished)/length(finished))

##   rejection_rate
## 1     0.02065614
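For comparison, the steady-state blocking probability of an M/M/c/k queue follows from the standard birth-death solution: with \(a = \lambda/\mu\), the unnormalised state weights are \(a^n/n!\) for \(n \le c\) and \(a^n/(c!\,c^{n-c})\) for \(c < n \le k\), and an arrival is rejected when it finds the system in state \(k\). The sketch below evaluates this in base R; the function name `mmck_blocking` is hypothetical and not part of simmer.

```r
# Hypothetical helper: probability that an arrival finds an M/M/c/k system full
mmck_blocking <- function(lambda, mu, c, k) {
  a <- lambda / mu
  n <- 0:k
  # unnormalised steady-state weights of states 0..k
  w <- ifelse(n <= c,
              a^n / factorial(n),
              a^n / (factorial(c) * c^(n - c)))
  w[k + 1] / sum(w)   # probability of state k (system full)
}

p_block <- mmck_blocking(lambda = 2, mu = 4, c = 2, k = 3)
p_block   # 1/53, about 0.0189
```

The simulated rejection rate above is of the same order as this value, as expected from a single finite run.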

Despite the rejections, the time spent in the system still follows an exponential random variable, as in the M/M/1 case, but the average has dropped.

mm23.t_system <- mm23.arrivals$end_time - mm23.arrivals$start_time
# Comparison with M/M/1 times
qqplot(mm1.t_system, mm23.t_system)
abline(0, 1, lty=2, col="red")

from R-bloggers http://ift.tt/1N3PFyO
via IFTTT

Tuesday, February 2, 2016

Your First Machine Learning Project in R Step-By-Step (tutorial and template for future projects)

Do you want to do machine learning using R, but you’re having trouble getting started? In this post you will complete your first machine learning project using R. In this step-by-step tutorial you will: Download and install R and get the most useful package for machine learning in R. Load a dataset and understand its structure using statistical summaries […]

The post Your First Machine Learning Project in R Step-By-Step (tutorial and template for future projects) appeared first on Machine Learning Mastery.

from Machine Learning Mastery http://ift.tt/1Px7IMO
via IFTTT

Thursday, January 28, 2016

Pipelining R and Python in Notebooks

by Micheleen Harris, Microsoft Data Scientist

As a Data Scientist, I refuse to choose between R and Python, the top contenders currently fighting for the title of top Data Science programming language. I am not going to argue about which is better or pit Python and R against each other. Rather, I'm simply going to suggest playing to the strengths of each language and consider using them together in the same pipeline if you don't want to give up advantages of one over the other. This is not a novel concept. Both languages have packages/modules which allow for the...

from R-bloggers http://ift.tt/1QqS3xq
via IFTTT

In-depth analysis of Twitter activity and sentiment, with R

Astronomer and budding data scientist Julia Silge has been using R for less than a year but, judging by the posts on her blog, has already become very proficient at using R to analyze some interesting data sets. She has posted detailed analyses of water consumption data and health care indicators from the Utah Open Data Catalog, religious affiliation data from the Association of Statisticians of American Religious Bodies, and demographic data from the American Community Survey (that's the same dataset we mentioned on Monday). In a two-part series, Julia analyzed another interesting dataset: her own archive of...

from R-bloggers http://ift.tt/1PTKQmE
via IFTTT

Monday, January 25, 2016

Super Fast Crash Course in R (for developers)

As a developer you can pick up R super fast. If you are already a developer, you don’t need to know much about a new language to be able to read and understand code snippets and write your own small scripts and programs. In this post you will discover the basic syntax, data structures and control […]

The post Super Fast Crash Course in R (for developers) appeared first on Machine Learning Mastery.

from Machine Learning Mastery http://ift.tt/1KyeGeT
via IFTTT

Friday, September 18, 2015

From functional programming to MapReduce in R

The MapReduce paradigm has long been a staple of big data computational strategies. However, properly leveraging MapReduce can be a …


from R-bloggers http://ift.tt/1FQwYFF
via IFTTT

Wednesday, September 2, 2015

Logistic Regression in R – Part One

Please note that an earlier version of this post had to be retracted because it contained some content which was generated at work. I have since chosen to rewrite the document in a series of posts. Please recognize that this may take some time. Apologies for any inconvenience. Logistic regression is used to analyze the […]

from R-bloggers http://ift.tt/1Kr0tW1
via IFTTT

Wednesday, August 26, 2015

How to use lists in R

In the [last post](http://ift.tt/18yhf3k), I went over the basics of lists, including constructing, manipulating, and converting lists to other classes. Knowing the basics, in this post, we'll use the **apply()** functions to see just how powerful working with lists can be. I've done two posts on apply for dataframes and matrices, [here](http://ift.tt/1i25x6o) and [here](http://ift.tt/1A8JbS4), so give those a read if you need a refresher.

Intro to apply-based functions for lists

There are a variety of apply functions that can be used depending on what you want to do. The table below shows the function, what it inputs, and what it outputs:

For example, if we have a list and want to produce a vector (of the same length), we use **sapply()**. If we have a vector and want to produce a list of the same length, we use **lapply()**. Let's try an example. The syntax of lapply is:

lapply(INPUT, function(x) (Some function here))

where INPUT, as we see from the table above, must be a vector or a list, and function(x) is any kind of function that takes **each element of the INPUT** and applies the function to it. The function can be something that already exists in R, or it can be a new function that you've written up. For example, let's construct a list of 3 vectors like so:

mylist=5}
#apply that function to the list
sapply(mylist, span.fun)

Creating a list using lapply

You don't need to have a list already created to use lapply() - in fact, lapply can be used to _make_ a list. This is because the key about **lapply()** is that it *returns* a list of the same length as whatever you input. For example, let's initialize a list to have 2 empty matrices that are size 2x3. We'll use lapply(): our input is just a vector containing 1 and 2, and the function we specify uses the matrix() function to construct a 2x3 matrix of empty cells for each element of this vector, so it returns a list of two such matrices.

If instead of empty matrices you wanted to fill these matrices with random numbers, you could do so too. Check out both possibilities below.

#initialize list to 2 empty matrices of 2 by 3
list2
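The ideas described above can be sketched in base R as follows. This is not the original post's code, which is truncated here; all object names (`empty_list`, `random_list`, `demo_list`, `wide_enough`) are hypothetical stand-ins chosen for illustration.

```r
# lapply over the vector 1:2 returns a list of length 2;
# each element is a 2x3 matrix of empty (NA) cells
empty_list <- lapply(1:2, function(i) matrix(NA, nrow = 2, ncol = 3))

# same shape, but filled with random normal draws
random_list <- lapply(1:2, function(i) matrix(rnorm(6), nrow = 2, ncol = 3))

# sapply takes a list and returns a vector of the same length
demo_list <- list(a = 1:10, b = c(2, 4), c = 45:77)
wide_enough <- function(x) (max(x) - min(x)) >= 5   # is the range at least 5?
sapply(demo_list, wide_enough)
```

Here `lapply()` always hands back a list, while `sapply()` simplifies the result to a named logical vector because every call returns a single value.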

from R-bloggers http://ift.tt/1K09Wn4
via IFTTT

Monday, August 24, 2015

Predicting Titanic deaths on Kaggle IV: random forest revisited

On July 19th I used randomForest to predict the deaths on Titanic in the Kaggle competition. Subsequently I found that both bagging and boosting gave better predictions than randomForest. This I found somewhat unsatisfactory, hence I am now revisi...

from R-bloggers http://ift.tt/1MOw2ct
via IFTTT

Wednesday, July 8, 2015

Sports Data and R – Scope for a Thematic (Rather than Task) View? (Living Post)

Via my feeds, I noticed a package announcement today for cricketR!, a new package for analysing cricket performance data. This got me wondering (again!) about what other sports related packages there might be out there, either in terms of functional thematic packages (to do with sport in general, or one sport in particular), or particular […]

from R-bloggers http://ift.tt/1CnfYKR
via IFTTT

Time series outlier detection (a simple R function)

(By Andrea Venturini)

Imagine you have a lot of time series – they may be short ones – related to a lot of different measures, and very little time to find outliers. You need something not too sophisticated to sort out the mess quickly. This is, in short, the typical situation in which you can adopt the washer.AV() function in R. In this linked document (washer) you have the function and an example of an actual application in R: a data.frame (dati) with temperature and rain (phen) measures (value) in 4 periods of time (time) and in 20 geographical zones (zone), that is, 20*4*2=160 arbitrary observations.

> dati
           phen time zone value
1   Temperature    1  a01   2.0
2   Temperature    1  a02  20.0
…
160        Rain    4  a20   8.5

The example of 20 meteorological stations measuring rainfall and temperature is useful to understand in which situations you can apply the washer() methodology. This methodology considers only 3 observations at a time in a group of time series, for instance all 20 triples between time 2 and 4: if their shapes are similar to each other then no outlier will be detected, otherwise – as it happens to the orange time […]

from R-bloggers http://ift.tt/1HN6OYJ
via IFTTT

Thursday, June 25, 2015

KDD Cup 2015: The story of how I built hundreds of predictive models….And got so close, yet so far away from 1st place!

The challenge from the KDD Cup this year was to use their data relating to student enrollment in online MOOCs to predict who would drop out vs who would stay. The short story is that using H2O and a lot …

from R-bloggers http://ift.tt/1fFLDgk
via IFTTT
