Machine learning is a fascinating and powerful field of study filled with algorithms and data. The thing is, there are so many different types of people interested in machine learning, and each has different needs. It is important to understand what it is you want from machine learning and to tailor your self-study to those needs. […]
The post Find Your Machine Learning Tribe: Get Started And Avoid Getting The Wrong Advice appeared first on Machine Learning Mastery.
Implementing machine learning algorithms from scratch seems like a great way for a programmer to understand machine learning. And maybe it is. But there are some downsides to this approach too. In this post you will discover some great resources that you can use to implement machine learning algorithms from scratch. You will also discover some of […]
The post Understand Machine Learning Algorithms By Implementing Them From Scratch (and tactics to get around bad code) appeared first on Machine Learning Mastery.
I also load the package raster, mostly for the functions coordinates and projection, which I use to create spatial data. Strictly speaking there is no need to load it, since the same functionality is available under different names in sp; however, I prefer these two because they are easier to remember. The last packages are rgdal and rgeos, for performing various operations on geodata. The script therefore starts like:
library(gstat)
library(sp)
library(spacetime)
library(raster)
library(rgdal)
library(rgeos)
The time of each observation needs to be stored as POSIXlt or POSIXct, which are standard ways of representing time in R. The very first thing we have to do is of course set the working directory and load the csv file:
setwd("...")
data <- read.table("ozon_tram1_14102011_14012012.csv", sep=",", header=T)
The time of each observation is stored in the column generation_time:
> paste(data$generation_time[1])
[1] "1318583686494"
This number is the UNIX time expressed in milliseconds. We can convert it into POSIXlt, but we first need to extract only the first 10 digits (the seconds) and then convert them. This can be done in one line of code but with multiple functions:
data$TIME <- as.POSIXlt(as.numeric(substr(paste(data$generation_time), 1, 10)), origin="1970-01-01")
The first function is paste(data$generation_time), which creates the character string shown above. We can then subset it using the function substr. This function is used to extract characters from a string and takes three arguments: a string, a starting position and a stopping position. In this case we basically want to delete the last 3 digits, so we set the start at the first character (start=1) and the stop at the tenth (stop=10). Then we need to change the string back into numerical format, using the function as.numeric. Now we just need one last function to tell R that this particular number is a Date/Time object. We can do this using the function as.POSIXlt, which takes the number we just created plus an origin. Since we are using UNIX time, we need to set the starting point at "1970-01-01". We can test this chain of functions on the first element of the vector data$generation_time to check its output:
> as.POSIXlt(as.numeric(substr(paste(data$generation_time[1]), start=1, stop=10)), origin="1970-01-01")
[1] "2011-10-14 11:14:46 CEST"
Now the data.frame data has a new column named TIME where the Date/Time information is stored. Another issue with this dataset is the format of latitude and longitude. In the csv file these are represented as shown below:
> data$longitude[1]
[1] 832.88198
76918 Levels: 829.4379 829.43822 829.44016 829.44019 829.4404 ... NULL
> data$latitude[1]
[1] 4724.22833
74463 Levels: 4721.02182 4721.02242 4721.02249 4721.02276 ... NULL
data$LAT <- as.numeric(substr(paste(data$latitude),1,2))+(as.numeric(substr(paste(data$latitude),3,10))/60)
data$LON <- as.numeric(substr(paste(data$longitude),1,1))+(as.numeric(substr(paste(data$longitude),2,10))/60)
Here we again use a combination of paste and substr to extract only the numbers we need. The coordinates are stored as degrees followed by decimal minutes, so to convert them into decimal degrees we need to sum the degrees with the minutes divided by 60. In the first part of each expression we extract the degree digits (the first two for latitude, the first one for longitude) and transform them back into numerical format. In the second part we extract the rest of the string, transform it into a number and then divide it by 60. This operation creates some NAs in the dataset, for which you will get a warning message. We do not have to worry about them, as we can just exclude them with the following line:
data <- na.omit(data)
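To see what these two lines do, here is a quick check on the first latitude value shown above (an added illustration, not part of the original script): 4724.22833 means 47 degrees and 24.22833 minutes.
# 47 degrees plus 24.22833 minutes expressed in decimal degrees
as.numeric(substr("4724.22833", 1, 2)) + as.numeric(substr("4724.22833", 3, 10))/60
# [1] 47.40381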
We can check the time range covered by the dataset:
> min(data$TIME)
[1] "2011-10-14 11:14:46 CEST"
> max(data$TIME)
[1] "2012-01-14 13:40:43 CET"
The dataset covers three months of measurements, from mid October 2011 to mid January 2012. For this example we only need a few days, and R makes it very easy to subset data.frame objects using Date/Time:
> sub <- data[data$TIME>=as.POSIXct('2011-12-12 00:00 CET')&data$TIME<=as.POSIXct('2011-12-14 23:00 CET'),]
> nrow(sub)
[1] 6734
#Create a SpatialPointsDataFrame
coordinates(sub)=~LON+LAT
projection(sub)=CRS("+init=epsg:4326")
#Transform into Mercator Projection
ozone.UTM <- spTransform(sub,CRS("+init=epsg:3395"))
This gives us a SpatialPointsDataFrame with coordinates in metres.
The package spacetime provides two main classes of spatio-temporal objects: STFDF and STIDF. The first represents objects with a complete space-time grid. In other words, this category includes objects such as the grid of weather stations presented in Fig.1: the spatio-temporal object is created from the n locations of the weather stations and the m time intervals of their observations, so the space-time grid has size n × m.
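To make the idea of a full space-time grid concrete, here is a minimal sketch with made-up station coordinates and values (purely illustrative, not part of this dataset; sp and spacetime are already loaded above):
# hypothetical STFDF: n = 3 stations observed at m = 4 times needs a
# data slot of n*m = 12 rows (the spatial index varies fastest)
stations <- SpatialPoints(cbind(lon = c(8.50, 8.55, 8.60),
                                lat = c(47.35, 47.40, 47.45)))
times <- as.POSIXct("2011-10-14 00:00", tz = "CET") + 3600 * 0:3
obs <- data.frame(ozone_ppb = rnorm(length(stations) * length(times)))
grid.full <- STFDF(stations, times, data = obs)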
STIDF objects are the ones we are going to use for this example. These are unstructured spatio-temporal objects, where both space and time change dynamically. For example, in this case we have data collected on top of trams moving around the city of Zurich, which means that the location of the sensors is not consistent throughout the sampling window. Creating STIDF objects is fairly simple: we just need to disassemble the data.frame we have into spatial, temporal and data components, and then merge them together to create the STIDF object. The first thing to do is create the SpatialPoints object, with the locations of the sensors at any given time:
ozoneSP <- SpatialPoints(ozone.UTM@coords,CRS("+init=epsg:3395"))
Here I used the function SpatialPoints in the package sp. This function takes two arguments: the first is a matrix or a data.frame with the coordinates of each point. In this case I used the coordinates of the SpatialPointsDataFrame we created before, which are provided in matrix format. Then I set the projection to the same Mercator projection used before. Before assembling the spatio-temporal object we also need to identify duplicated locations, which we can do with the function zerodist:
dupl <- zerodist(ozoneSP)
Then we can create the data component of the STIDF object:
ozoneDF <- data.frame(PPB=ozone.UTM$ozone_ppb[-dupl[,2]])
This is a data.frame with only one column, named PPB, with the ozone observations in parts per billion. As you can see, I removed the duplicated points by excluding from the object ozone.UTM the rows with the indexes included in one of the columns of the object dupl. We can use the same trick while creating the temporal part:
ozoneTM <- as.POSIXct(ozone.UTM$TIME[-dupl[,2]],tz="CET")
Now we have all the components and we can merge them into an STIDF object:
timeDF <- STIDF(ozoneSP,ozoneTM,data=ozoneDF)
We can take a look at the STIDF object by using the spatio-temporal version of the function spplot, which is stplot:
stplot(timeDF)
The first step of the interpolation is computing the sample variogram, which for spatio-temporal data is done with the function variogramST. Its use is similar to the standard function for spatial kriging, even though there are some settings for the temporal component that need to be included:
var <- variogramST(PPB~1,data=timeDF,tunit="hours",assumeRegular=F,na.omit=T)
The first part of the call to variogramST is identical to a normal call to the function variogram: we first have the formula and then the data source. However, we then have to specify the time units (tunit) or the time lags (tlags). I found the documentation around this point a bit confusing, to be honest. I tested various combinations of parameters and the line of code presented above is the only one that gave me what appear to be good results. I presume that I am telling the function to aggregate the data into hourly intervals, but I am not completely sure. I hope some of the readers can shed some light on this! The sample variogram can be plotted in several ways:
plot(var,map=F)
plot(var,map=T)
plot(var,wireframe=T)
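For readers who want to experiment with the temporal settings, the lags can also be passed explicitly through tlags; this is only a sketch to illustrate the argument and I have not checked whether it improves the results:
# alternative variogram call with explicit hourly lags from 0 to 6 (untested variant)
var.alt <- variogramST(PPB~1, data=timeDF, tunit="hours",
                       tlags=0:6, assumeRegular=F, na.omit=T)
plot(var.alt, map=F)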
To fit a model to this sample variogram we need the functions vgmST and fit.StVariogram, which are the spatio-temporal counterparts of vgm and fit.variogram. A useful set of examples for fitting spatio-temporal variograms is included in the gstat demo:
demo(stkrige)
# lower and upper bounds
pars.l <- c(sill.s = 0, range.s = 10, nugget.s = 0,sill.t = 0, range.t = 1, nugget.t = 0,sill.st = 0, range.st = 10, nugget.st = 0, anis = 0)
pars.u <- c(sill.s = 200, range.s = 1000, nugget.s = 100,sill.t = 200, range.t = 60, nugget.t = 100,sill.st = 200, range.st = 1000, nugget.st = 100,anis = 700)
The first model we can try is the separable model, which combines a spatial and a temporal variogram with a joint sill:
separable <- vgmST("separable", space = vgm(-60,"Sph", 500, 1),time = vgm(35,"Sph", 500, 1), sill=0.56)
plot(var,separable,map=F)
The initial values we provide to vgmST do not seem to make much of a difference to the automatic fit, so probably you do not have to worry too much about the parameters you select in vgmST. We can evaluate this first guess with fit.StVariogram and the option fit.method=0, which keeps the model as it is but calculates its mean squared error (MSE) compared to the actual data:
> separable_Vgm <- fit.StVariogram(var, separable, fit.method=0)
> attr(separable_Vgm,"MSE")
[1] 54.96278
We can also let R optimize the fit, providing a starting anisotropy value and the lower and upper bounds defined above:
> separable_Vgm <- fit.StVariogram(var, separable, fit.method=11,method="L-BFGS-B", stAni=5, lower=pars.l,upper=pars.u)
> attr(separable_Vgm, "MSE")
[1] 451.0745
plot(var,separable_Vgm,map=F)
We can check the fitted parameters of the model with the function extractPar:
> extractPar(separable_Vgm)
range.s nugget.s range.t nugget.t sill
199.999323 10.000000 99.999714 1.119817 17.236256
The second model we can fit is the product-sum model. In the call to vgmST we need to provide both the spatial and the temporal component, plus the value of the parameter k (which needs to be positive):
prodSumModel <- vgmST("productSum",space = vgm(1, "Exp", 150, 0.5),time = vgm(1, "Exp", 5, 0.5),k = 50)
I first tried k = 5, but R returned an error message saying that it needed to be positive, which I did not understand. However, with 50 it worked and, as I mentioned, the automatic fit does not care much about these initial values; probably the most important things are the upper and lower bounds we set before.
> prodSumModel_Vgm <- fit.StVariogram(var, prodSumModel,method = "L-BFGS-B",lower=pars.l)
> attr(prodSumModel_Vgm, "MSE")
[1] 215.6392
Another option is the metric model, which uses a single joint variogram for space and time plus the spatio-temporal anisotropy parameter stAni. Creating a metric model in vgmST can be done as follows:
metric <- vgmST("metric", joint = vgm(50,"Mat", 500, 0), stAni=200)
> metric_Vgm <- fit.StVariogram(var, metric, method="L-BFGS-B",lower=pars.l)
> attr(metric_Vgm, "MSE")
[1] 79.30172
The sum-metric model combines separate spatial, temporal and joint variograms with the anisotropy parameter:
sumMetric <- vgmST("sumMetric", space = vgm(psill=5,"Sph", range=500, nugget=0),time = vgm(psill=500,"Sph", range=500, nugget=0), joint = vgm(psill=1,"Sph", range=500, nugget=10), stAni=500)
> sumMetric_Vgm <- fit.StVariogram(var, sumMetric, method="L-BFGS-B",lower=pars.l,upper=pars.u,tunit="hours")
> attr(sumMetric_Vgm, "MSE")
[1] 58.98891
There is also a simplified version of the sum-metric model, with a single overall nugget:
SimplesumMetric <- vgmST("simpleSumMetric",space = vgm(5,"Sph", 500, 0),time = vgm(500,"Sph", 500, 0), joint = vgm(1,"Sph", 500, 0), nugget=1, stAni=500)
> SimplesumMetric_Vgm <- fit.StVariogram(var, SimplesumMetric,method = "L-BFGS-B",lower=pars.l)
> attr(SimplesumMetric_Vgm, "MSE")
[1] 59.36172
plot(var,list(separable_Vgm, prodSumModel_Vgm, metric_Vgm, sumMetric_Vgm, SimplesumMetric_Vgm),all=T,wireframe=T)
For the prediction locations I want to use points along the road network, which I load from a shapefile and then subset to keep only the major (1_Klass) roads:
roads <- shapefile("VEC25_str_l_Clip/VEC25_str_l.shp")
Klass1 <- roads[roads$objectval=="1_Klass",]
Klass1.UTM <- spTransform(Klass1,CRS("+init=epsg:3395"))
These roads can be cropped to the extent of the ozone observations using the function crop from the raster package, together with the object ozone.UTM that I created before:
Klass1.cropped <- crop(Klass1.UTM,ozone.UTM)
plot(Klass1.cropped)
plot(ozone.UTM,add=T,col="red")
Then I can use the function spsample to create a random grid of points along the road lines:
sp.grid.UTM <- spsample(Klass1.cropped,n=1500,type="random")
I also saved this grid in RData format (gridST.RData). The last ingredient is the temporal part of the prediction grid, which we can build as a sequence of Date/Time values with the function seq:
tm.grid <- seq(as.POSIXct('2011-12-12 06:00 CET'),as.POSIXct('2011-12-14 09:00 CET'),length.out=5)
This creates a sequence of 5 elements (length.out=5), with POSIXct values between the two Date/Times provided. In this case we do not need to attach any data, since we do not yet have observations at the prediction locations; we only need a spatio-temporal grid. Therefore we can use the function STF to merge the spatial and temporal information:
grid.ST <- STF(sp.grid.UTM,tm.grid)
Now we can finally use the function krigeST to perform the spatio-temporal interpolation:
pred <- krigeST(PPB~1, data=timeDF, modelList=sumMetric_Vgm, newdata=grid.ST)
The predictions can be plotted with stplot:
stplot(pred)
vignette("st", package = "gstat")
demo(stkrige)
Microsoft recently launched support for machine learning in their Azure cloud computing platform. Buried in some of their technical documentation for the platform are some resources that you may find useful for thinking about what machine learning algorithm to use in different situations. In this post we take a look at the Microsoft recommendations for […]
The post Choosing Machine Learning Algorithms: Lessons from Microsoft Azure appeared first on Machine Learning Mastery.
Where does theory fit into a top-down approach to studying machine learning? In the traditional approach to teaching machine learning, theory comes first requiring an extensive background in mathematics to be able to understand it. In my approach to teaching machine learning, I start with teaching you how to work problems end-to-end and deliver results. […]
The post 5 Techniques To Understand Machine Learning Algorithms Without the Background in Mathematics appeared first on Machine Learning Mastery.
Where can you get good datasets to practice machine learning? Datasets that are real-world so that they are interesting and relevant, although small enough for you to review in Excel and work through on your desktop. In this post you will discover a database of high-quality, real-world, and well understood machine learning datasets that you […]
The post Practice Machine Learning with Small In-Memory Datasets from the UCI Machine Learning Repository appeared first on Machine Learning Mastery.
Has this happened to you? You are working on your dataset. You create a classification model and get 90% accuracy immediately. “Fantastic” you think. You dive a little deeper and discover that 90% of the data belongs to one class. Damn! This is an example of an imbalanced dataset and the frustrating results it can […]
The post 8 Tactics to Combat Imbalanced Classes in Your Machine Learning Dataset appeared first on Machine Learning Mastery.
(a.k.a. my answer to the question: “how do I get started in machine learning?“) I’m a developer. I have read a book or some posts on machine learning. I have watched some of the Coursera machine learning course. I still don’t know how to get started… Does this sound familiar? The most common question I’m asked by developers on […]
The post Machine Learning for Programmers: Leap from developer to machine learning practitioner appeared first on Machine Learning Mastery.
"Feature engineering" is a fancy term for making sure that your predictors are encoded in the model in a manner that makes it as easy as possible for the model to achieve good performance. For example, if your have a date field as a predictor and there are larger differences in response for the weekends versus the weekdays, then encoding the date in this way makes it easier to achieve good results.
However, this depends on a lot of things.
First, it is model-dependent. For example, trees might have trouble with a classification data set if the class boundary is a diagonal line since their class boundaries are made using orthogonal slices of the data (oblique trees excepted).
Second, the process of predictor encoding benefits the most from subject-specific knowledge of the problem. In my example above, you need to know the patterns of your data to improve the format of the predictor. Feature engineering is very different in image processing, information retrieval, RNA expressions profiling, etc. You need to know something about the problem and your particular data set to do it well.
Here is some training set data where two predictors are used to model a two-class system (I'll unblind the data at the end):
There is also a corresponding test set that we will use below.
There are some observations that we can make:
Depending on what model we might choose to use, the between-predictor correlation might bother us. Also, we should look to see if the individual predictors are important. To measure this, we'll use the area under the ROC curve on the predictor data directly.
Here are univariate box-plots of each predictor (on the log scale):
There is some mild differentiation between the classes but a significant amount of overlap in the boxes. The area under the ROC curves for predictor A and B are 0.61 and 0.59, respectively. Not so fantastic.
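One way such single-predictor AUCs can be computed is with the roc function from the pROC package; this is a sketch that assumes the training data live in a data frame called example_train with columns PredictorA, PredictorB and Class (shown further below), and it is not the author's original code:
# area under the ROC curve for each predictor on its own
library(pROC)
auc(roc(example_train$Class, example_train$PredictorA))
auc(roc(example_train$Class, example_train$PredictorB))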
What can we do? Principal component analysis (PCA) is a pre-processing method that does a rotation of the predictor data in a manner that creates new synthetic predictors (i.e. the principal components or PC's). This is conducted in a way where the first component accounts for the majority of the (linear) variation or information in the predictor data. The second component does the same for any information in the data that remains after extracting the first component and so on. For these data, there are two possible components (since there are only two predictors). Using PCA in this manner is typically called feature extraction.
Let's compute the components:
> library(caret)
> head(example_train)
PredictorA PredictorB Class
2 3278.726 154.89876 One
3 1727.410 84.56460 Two
4 1194.932 101.09107 One
12 1027.222 68.71062 Two
15 1035.608 73.40559 One
16 1433.918 79.47569 One
> pca_pp <- preProcess(example_train[, 1:2],
+                      method = c("center", "scale", "pca"))
> pca_pp
Call:
preProcess.default(x = example_train[, 1:2], method = c("center",
"scale", "pca"))
Created from 1009 samples and 2 variables
Pre-processing: centered, scaled, principal component signal extraction
PCA needed 2 components to capture 95 percent of the variance
> train_pc <- predict(pca_pp, example_train[, 1:2])
> test_pc <- predict(pca_pp, example_test[, 1:2])
> head(test_pc, 4)
PC1 PC2
1 0.8420447 0.07284802
5 0.2189168 0.04568417
6 1.2074404 -0.21040558
7 1.1794578 -0.20980371
Note that we computed all the necessary information from the training set and then applied these calculations to the test set. What do the test set data look like?
These are the test set predictors simply rotated.
PCA is unsupervised, meaning that the outcome classes are not considered when the calculations are done. Here, the area under the ROC curve is 0.5 for the first component and 0.81 for the second. These results jibe with the plot above; the first component is a random mixture of the classes while the second seems to separate the classes well. Box plots of the two components reflect the same thing:
There is much more separation in the second component.
This is interesting. First, despite PCA being unsupervised, it managed to find a new predictor that differentiates the classes. Second, it is the last component that is most important to the classes but the least important to the predictors. It is often said that PCA doesn't guarantee that any of the components will be predictive and this is true. Here, we get lucky and it does produce something good.
However, imagine that there are hundreds of predictors. We may only need to use the first X components to capture the majority of the information in the predictors and, in doing so, discard the later components. In this example, the first component accounts for 92.4% of the variation in the predictors; a similar strategy would probably discard the most effective predictor.
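With many predictors the usual approach is to keep only enough components to reach a variance threshold; in caret this is controlled by the thresh argument of preProcess. A purely illustrative sketch on the same two predictors:
# keep as many components as needed to capture 90 percent of the variance
pca_90 <- preProcess(example_train[, 1:2],
                     method = c("center", "scale", "pca"),
                     thresh = 0.90)
pca_90
With these two predictors and a 90 percent threshold, only the first component would be kept, which is exactly the situation described above: the most useful direction (the second component) would be discarded.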
How does the idea of feature engineering come into play here? Given these two predictors and seeing the first scatterplot shown above, one of the first things that occurs to me is "there are two correlated, positive, skewed predictors that appear to act in tandem to differentiate the classes". The second thing that occurs to me is "take the ratio". What does that data look like?
The corresponding area under the ROC curve is 0.8, which is nearly as good as the second component. A simple transformation based on visually exploring the data can do just as good of a job as an unbiased empirical algorithm.
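A sketch of how that ratio feature could be built and scored on the training set (again using pROC, not the author's code):
# ratio of the two predictors as a single engineered feature
example_train$ratio <- example_train$PredictorA / example_train$PredictorB
auc(roc(example_train$Class, example_train$ratio))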
These data are from the cell segmentation experiment of Hill et al. Predictor A is the "surface of a sphere created by rotating the equivalent circle about its diameter" (labeled as EqSphereAreaCh1 in the data) and predictor B is the perimeter of the cell nucleus (PerimCh1). A specialist in high content screening might naturally take the ratio of these two features of cells because it makes good scientific sense (I am not that person). In the context of the problem, their intuition should drive the feature engineering process.
However, in defense of an algorithm such as PCA, the machine has some benefit. In total, there are almost sixty predictors in these data whose features are just as arcane as EqSphereAreaCh1. My personal favorite is the "Haralick texture measurement of the spatial arrangement of pixels based on the co-occurrence matrix". Look that one up some time. The point is that there are often too many features to engineer and they might be completely unintuitive from the start.
Another plus for feature extraction is related to correlation. The predictors in this particular data set tend to have high between-predictor correlations and for good reasons. For example, there are many different ways to quantify the eccentricity of a cell (i.e. how elongated it is). Also, the size of a cell's nucleus is probably correlated with the size of the overall cell and so on. PCA can mitigate the effect of these correlations in one fell swoop. An approach of manually taking ratios of many predictors seems less likely to be effective and would take more time.
Last year, in one of the R&D groups that I support, there was a bit of a war being waged between the scientists who focused on biased analysis (i.e. we model what we know) versus the unbiased crowd (i.e. just let the machine figure it out). I fit somewhere in-between and believe that there is a feedback loop between the two. The machine can flag potentially new and interesting features that, once explored, become part of the standard book of "known stuff".
icecream <- data.frame(
  temp=c(11.9, 14.2, 15.2, 16.4, 17.2, 18.1,
         18.5, 19.4, 22.1, 22.6, 23.4, 25.1),
  units=c(185L, 215L, 332L, 325L, 408L, 421L,
          406L, 412L, 522L, 445L, 544L, 614L)
  )
I will be comparing the results from this model with others. Hence, I write a helper function to provide consistent graphical outputs:
basicPlot <- function(...){
  plot(units ~ temp, data=icecream, bty="n", lwd=2,
       main="Number of ice creams sold", col="#00526D",
       xlab="Temperature (Celsius)",
       ylab="Units sold", ...)
  axis(side = 1, col="grey")
  axis(side = 2, col="grey")
}
basicPlot()
As expected, more ice cream is sold at higher temperatures. A straight least-squares line gives a first impression of the trend:
# note: the original model assignment was not preserved in this copy;
# an ordinary least-squares fit via lm is assumed here
lsq.mod <- lm(units ~ temp, data=icecream)
basicPlot()
abline(lsq.mod, col="orange", lwd=2)
legend(x="topleft", bty="n", lwd=c(2,2), lty=c(NA,1),
       legend=c("observation", "linear least square"),
       col=c("#00526D","orange"), pch=c(1,NA))
That's easy and looks not unreasonable. I can obtain the same fit with the glm function, in which I specify the "response distribution" (namely the number of ice creams) as Gaussian and the link function from the expected value of the distribution to its linear predictor (here driven by temperature) as the identity. Indeed, this really is the trick with a GLM: it describes how the distribution of the observations and their expected value, often after a smooth transformation, relate to the actual measurements (predictors) in a linear way. The 'link' is the inverse function of the original transformation of the data.
lin.mod <- glm(units ~ temp, data=icecream,
               family=gaussian(link="identity"))
library(arm) # for 'display' function only
display(lin.mod)
## glm(formula = units ~ temp, family = gaussian(link = "identity"),
## data = icecream)
## coef.est coef.se
## (Intercept) -159.47 54.64
## temp 30.09 2.87
## ---
## n = 12, k = 2
## residual deviance = 14536.3, null deviance = 174754.9 (difference = 160218.6)
## overdispersion parameter = 1453.6
## residual sd is sqrt(overdispersion) = 38.13
Thus, to mimic my data I could generate random numbers from the following Normal distribution: y ~ N(mu, sigma^2) with mu = -159.47 + 30.09 * temp and sigma = 38.13. However, the Normal distribution is defined over the whole real line, so it can also generate negative sales, which make no sense. A log-transformation of the response avoids this; on the original scale the log-transformed model corresponds to a log-normal distribution, from which random draws can be generated with rlnorm.
log.lin.mod <- glm(log(units) ~ temp, data=icecream,
                   family=gaussian(link="identity"))
display(log.lin.mod)
## glm(formula = log(units) ~ temp, family = gaussian(link = "identity"),
## data = icecream)
## coef.est coef.se
## (Intercept) 4.40 0.20
## temp 0.08 0.01
## ---
## n = 12, k = 2
## residual deviance = 0.2, null deviance = 1.4 (difference = 1.2)
## overdispersion parameter = 0.0
## residual sd is sqrt(overdispersion) = 0.14
log.lin.sig <- summary(log.lin.mod)$dispersion
log.lin.pred <- exp(predict(log.lin.mod) + 0.5 * log.lin.sig)  # log-normal mean: exp(mu + sigma^2/2)
basicPlot()
lines(icecream$temp, log.lin.pred, col="red", lwd=2)
legend(x="topleft", bty="n", lwd=c(2,2), lty=c(NA,1),
legend=c("observation", "log-transformed LM"),
col=c("#00526D","red"), pch=c(1,NA))
The back-transformed intercept suggests sales of about 82 ice creams at 0 Celsius:
exp(coef(log.lin.mod)[1])
## (Intercept)
## 81.62131
Although this model makes a little more sense, it appears that it predicts too many sales at the low and high ends of the observed temperature range. Furthermore, there is another problem with this model, and with the previous linear model as well: the assumed model distributions generate real numbers, but ice cream sales only occur in whole numbers. As a result, any draw from the model distribution should be a whole number. The Poisson distribution, which produces only non-negative whole numbers, is therefore a natural candidate for count data like these:
pois.mod <- glm(units ~ temp, data=icecream,
                family=poisson(link="log"))
display(pois.mod)
## glm(formula = units ~ temp, family = poisson(link = "log"), data = icecream)
## coef.est coef.se
## (Intercept) 4.54 0.08
## temp 0.08 0.00
## ---
## n = 12, k = 2
## residual deviance = 60.0, null deviance = 460.1 (difference = 400.1)
pois.pred <- predict(pois.mod, type="response")
basicPlot()
lines(icecream$temp, pois.pred, col="blue", lwd=2)
legend(x="topleft", bty="n", lwd=c(2,2), lty=c(NA,1),
legend=c("observation", "Poisson (log) GLM"),
col=c("#00526D","blue"), pch=c(1,NA))
The model is fitted on the log scale, so strictly I would have to use the exp function to turn a prediction into sales for a given temperature. However, R will do this for me automatically if I set type="response" in the predict statement, as above. For example, the predicted number of ice creams sold at 32 Celsius is:
predict(pois.mod, newdata=data.frame(temp=32), type="response")
## 1
## 1056.651
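To make the role of the link explicit, here is a quick check (not in the original post) that back-transforming the linear predictor by hand and asking for type="response" give the same number:
nd <- data.frame(temp=32)
exp(predict(pois.mod, newdata=nd, type="link"))  # back-transform manually
predict(pois.mod, newdata=nd, type="response")   # let R apply the inverse link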
Perhaps the exponential growth in my model looks a little too good to be true. Indeed, I am pretty sure that my market will be saturated at around 800 units. Warning: this is a modelling assumption from my side! With a fixed market size I can treat the sales as the number of successes out of that many opportunities and use a Binomial distribution:
market.size <- 800
icecream$opportunity <- market.size - icecream$units
bin.glm <- glm(cbind(units, opportunity) ~ temp, data=icecream,
               family=binomial(link = "logit"))
display(bin.glm)
## glm(formula = cbind(units, opportunity) ~ temp, family = binomial(link = "logit"),
## data = icecream)
## coef.est coef.se
## (Intercept) -2.97 0.11
## temp 0.16 0.01
## ---
## n = 12, k = 2
## residual deviance = 84.4, null deviance = 909.4 (difference = 825.0)
bin.pred <- predict(bin.glm, type="response")*market.size
basicPlot()
lines(icecream$temp, bin.pred, col="purple", lwd=2)
legend(x="topleft", bty="n", lwd=c(2,2), lty=c(NA,1),
legend=c("observation", "Binomial (logit) GLM"),
col=c("#00526D","purple"), pch=c(1,NA))
To predict sales directly from the coefficients I have to use the inverse of the logit link, which in R is the function plogis:
# Sales at 0 Celsius
plogis(coef(bin.glm)[1])*market.size
## (Intercept)
## 39.09618
# Sales at 35 Celsius
plogis(coef(bin.glm)[1] + coef(bin.glm)[2]*35)*market.size
## (Intercept)
## 745.7449
So, that is 39 ice creams at 0ºC and 746 ice creams at 35ºC. Yet, these results will of course change if I change my assumptions on the market size. A market size of 1,000 would suggest that I can sell 55 units at 0ºC and 846 at 35ºC. Finally, let me compare the predictions of all four models over the temperature range:
temp <- 0:35  # temperature range for the comparison (assumed to be 0 to 35 Celsius)
p.lm <- predict(lin.mod, newdata=data.frame(temp=temp), type="response")
p.log.lm <- exp(predict(log.lin.mod, newdata=data.frame(temp=temp), type="response") +
                  0.5 * summary(log.lin.mod)$dispersion)
p.pois <- predict(pois.mod, newdata=data.frame(temp=temp), type="response")
p.bin <- predict(bin.glm, newdata=data.frame(temp=temp), type="response")*market.size
basicPlot(xlim=range(temp), ylim=c(-20,market.size))
lines(temp, p.lm, type="l", col="orange", lwd=2)
lines(temp, p.log.lm, type="l", col="red", lwd=2)
lines(temp, p.pois, type="l", col="blue", lwd=2)
lines(temp, p.bin, type="l", col="purple", lwd=2)
legend(x="topleft",
legend=c("observation",
"linear model",
"log-transformed LM",
"Poisson (log) GLM",
"Binomial (logit) GLM"),
col=c("#00526D","orange", "red",
"blue", "purple"),
bty="n", lwd=rep(2,5),
lty=c(NA,rep(1,4)),
pch=c(1,rep(NA,4)))
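Coming back to the market-size sensitivity mentioned above, this is roughly how the re-fit with an assumed market of 1,000 could look (a hedged sketch, not code from the original post):
# hypothetical sensitivity check with a larger assumed market size
market.size2 <- 1000
icecream$opportunity2 <- market.size2 - icecream$units
bin.glm2 <- glm(cbind(units, opportunity2) ~ temp, data=icecream,
                family=binomial(link="logit"))
plogis(coef(bin.glm2)[1])*market.size2                           # sales at 0 Celsius
plogis(coef(bin.glm2)[1] + coef(bin.glm2)[2]*35)*market.size2    # sales at 35 Celsius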
Finally, I can use the fitted models to simulate data from their respective distributions and compare the draws with the actual observations:
n <- nrow(icecream)
A <- model.matrix(units ~ temp, data=icecream)  # design matrix: intercept and temperature
set.seed(1234)
(rand.normal <- rnorm(n,
   mean=A %*% coef(lin.mod),
   sd=sqrt(summary(lin.mod)$dispersion)))
## [1] 152.5502 278.3509 339.2073 244.5335 374.3981 404.4103 375.2385
## [8] 403.3892 483.9470 486.5775 526.3881 557.6662
(rand.logtrans <- rlnorm(n,
   meanlog=A %*% coef(log.lin.mod),
   sdlog=sqrt(summary(log.lin.mod)$dispersion)))
## [1] 196.0646 266.1002 326.8102 311.5621 315.0249 321.1920 335.3733
## [8] 564.6961 515.9348 493.4822 530.8356 691.2354
(rand.pois <- rpois(n,
   lambda=exp(A %*% coef(pois.mod))))
## [1] 220 266 279 337 364 434 349 432 517 520 516 663
(rand.bin <- rbinom(n,
   size=market.size,
   prob=plogis(A %*% coef(bin.glm))))
## [1] 197 249 302 313 337 386 387 414 516 538 541 594
basicPlot(ylim=c(100,700))
cols <- adjustcolor(c("orange", "red", "blue", "purple"),  # model colours as above (assumed)
                    alpha.f = 0.75)
points(icecream$temp, rand.normal, pch=19, col=cols[1])
points(icecream$temp, rand.logtrans, pch=19, col=cols[2])
points(icecream$temp, rand.pois, pch=19, col=cols[3])
points(icecream$temp, rand.bin, pch=19, col=cols[4])
legend(x="topleft",
legend=c("observation",
"linear model",
"log-transformed LM",
"Poisson (log) GLM",
"Binomial (logit) GLM"),
col=c("#00526D",cols), lty=NA,
bty="n", lwd=rep(2,5),
pch=c(1,rep(19,4)))
R version 3.2.1 (2015-06-18)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.4 (Yosemite)
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] arm_1.8-6 lme4_1.1-8 Matrix_1.2-1 MASS_7.3-42
loaded via a namespace (and not attached):
[1] minqa_1.2.4 tools_3.2.1 coda_0.17-1 abind_1.4-3
[5] Rcpp_0.11.6 splines_3.2.1 nlme_3.1-121 grid_3.2.1
[9] nloptr_1.0.4 lattice_0.20-31
This post was originally published on mages' blog.