Organizing your work in R

I think the best way to manage the .RData and .Rhistory files is to set up a separate folder for each project or assignment. Make a copy of the shortcut to the R application, right-click on the copy, and set its properties so that R starts in the project folder you made. R will then store .RData and .Rhistory in that folder. At the end of a session, copy both to a floppy disk if you want to take your work to another computer. If you are working in the BSB lab, remember that you can't write to Drive K, so the best place to create the folder is in D:\Temp.
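You can also point R at a project folder and save the workspace from inside a session. The sketch below uses a hypothetical folder name, assign1. It creates the folder under `tempdir()` so that it runs on any machine. In the lab you would use a path under D:/Temp instead; R accepts forward slashes in Windows paths. For the command history, `savehistory()` should do the equivalent job in the Windows Rgui console. It is left out of the sketch because it is not available in every kind of R session.

```r
# Sketch only: "assign1" is a hypothetical project folder name.
# In the BSB lab you would use something like "D:/Temp/assign1" instead.
proj <- file.path(tempdir(), "assign1")
dir.create(proj, showWarnings = FALSE)   # create the folder if it doesn't exist yet
setwd(proj)                              # make it the working directory

x <- c(1.2, 3.6, 5.1)                    # some object worth keeping
save.image()                             # writes .RData into the working directory

# Later (e.g. after copying the folder to another computer):
rm(x)
load(file.path(proj, ".RData"))          # restores x (and proj) from the saved workspace
```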

Use the data frame mydata you set up when you were learning R.
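If `mydata` is no longer in your workspace, you can rebuild it from the values printed in the session transcript for this exercise. This is a minimal sketch: only the three numeric columns y, x1 and x2 are recreated, because the name column is what the first task asks you to add.

```r
# Recreate mydata from the values shown in the transcript (before the name column is added)
mydata <- data.frame(
  y  = c(1.2, 3.6, 5.1, 4.2, 2.1),
  x1 = c(1.5, 2.5, 6.0, 3.1, 2.2),
  x2 = c(1, 1, 1, 2, 2)
)
```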

- Add a column to the data frame giving the names of the subjects: "Joe", "Bill", "Sam", "Beth", "Sue".
- Fit a straight line to the plot of y against x1; that is, compute the simple linear regression of y on x1.
- Plot y against x1, but hide the points.
- Use **text()** to place the subjects' names in place of the points on the graph.
- Add a title to the graph.
- Add the fitted line to the graph.

> mydata$name <- c("Joe","Bill","Sam","Beth","Sue")
> mydata
    y  x1 x2 name
1 1.2 1.5  1  Joe
2 3.6 2.5  1 Bill
3 5.1 6.0  1  Sam
4 4.2 3.1  2 Beth
5 2.1 2.2  2  Sue
> lmfit <- lm(y~x1, data=mydata)
> lmfit

Call:
lm(formula = y ~ x1, data = mydata)

Coefficients:
(Intercept)           x1
     0.8519       0.7804

> plot(mydata$x1, mydata$y, xlab="x1", ylab="y", type="n")
> text(mydata$x1, mydata$y, mydata$name)
> title("An X-Y Text Plot")
> abline(lmfit)
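You can check the coefficients printed by `lm()` by hand with the usual least-squares formulas. Here is a worked sketch using the five (x1, y) pairs in mydata. Sums of squares are taken about the means, and the results agree with the printed 0.8519 and 0.7804.

```latex
\bar{x} = \frac{15.3}{5} = 3.06, \qquad \bar{y} = \frac{16.2}{5} = 3.24

S_{xx} = \sum_i (x_i - \bar{x})^2 = 12.132, \qquad
S_{xy} = \sum_i (x_i - \bar{x})(y_i - \bar{y}) = 9.468

\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{9.468}{12.132} \approx 0.7804, \qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \approx 3.24 - 0.78042 \times 3.06 \approx 0.8519
```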

Plot the Binomial distribution by setting up a spreadsheet with
consecutive values of *x* in the first column,
*f*(*x*) in the second column, and values for *n* and
*p* in nearby cells. The graph should automatically redraw if
*n* or *p* changes. Repeat for the Poisson distribution.

If you're not sure what I'm asking for here, see the Excel workbook distributions.xls.
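To cross-check your spreadsheet values, you can compute the same f(x) columns in R with `dbinom()` and `dpois()`. In Excel itself, `BINOMDIST` and `POISSON` with the cumulative argument set to FALSE give the same numbers. The helpers `binomplot()` and `poisplot()` below are hypothetical names, and the Poisson cut-off `xmax = 15` is an arbitrary choice for illustration. Calling either function again with new parameters redraws the graph, which plays the role of the spreadsheet's automatic redraw.

```r
# Hypothetical helper: R analogue of the Binomial spreadsheet
binomplot <- function(n = 10, p = 0.3) {
  x <- 0:n                                  # column 1: consecutive values of x
  fx <- dbinom(x, size = n, prob = p)       # column 2: f(x) = choose(n, x) p^x (1 - p)^(n - x)
  plot(x, fx, type = "h", lwd = 3, xlab = "x", ylab = "f(x)",
       main = paste0("Binomial(n = ", n, ", p = ", p, ")"))
  invisible(data.frame(x = x, fx = fx))     # return the "spreadsheet" columns
}

# Hypothetical helper: the Poisson version; x has no upper limit, so truncate at xmax
poisplot <- function(lambda = 3, xmax = 15) {
  x <- 0:xmax
  fx <- dpois(x, lambda = lambda)           # f(x) = exp(-lambda) lambda^x / x!
  plot(x, fx, type = "h", lwd = 3, xlab = "x", ylab = "f(x)",
       main = paste0("Poisson(mean = ", lambda, ")"))
  invisible(data.frame(x = x, fx = fx))
}

b <- binomplot(10, 0.3)
pz <- poisplot(3)
```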

Generate 20 observations from a standard normal distribution and draw a graph showing a histogram on the density scale (relative frequency divided by bin width, so that it is comparable with the density curves), a smoothed density estimate, a dot plot, and the true standard normal probability density function.

Repeat this a few times with n = 20, then a few times each with n = 40, n = 100, n = 1000, and n = 10000. How many observations do you need before you can say with any certainty whether or not a given sample came from a Normal distribution?

If you have time, do this again for a skewed distribution, such as the chi-square distribution on 1 or 3 degrees of freedom.

Since creating the graph involves several steps, you might want to
write a function **normdat(n)** so you don't have to type in all
the steps every time. The easiest way to write this function is to
type **fix(normdat)**; this will open a text editor, where you can
write the function, then save it as you exit the editor. The same
command **fix(normdat)** is also a convenient way to modify an existing function.

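> normdat
function(n = 50) {
  xdat <- rnorm(n)                      # sample of n standard normal values
  hist(xdat, prob = TRUE)               # histogram on the density scale
  lines(density(xdat))                  # smoothed (kernel) density estimate
  points(xdat, rep(0, n))               # dot plot along the x-axis
  xgr <- seq(-4, 4, length.out = 100)   # grid for the true density
  lines(xgr, dnorm(xgr), lty = 2)       # true N(0, 1) pdf, dashed
}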

Once **normdat()** is written, you just have to type
**normdat(20)** a few times, **normdat(40)** a few times, etc.,
to complete the exercise.
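For the skewed case, you can write a companion function in the same style. `chisqdat()` is a hypothetical name. Unlike `normdat()`, it sets `ylim` so that the histogram bars and the density estimate both fit on the plot. The true chi-square pdf is drawn starting just above 0, because for 1 degree of freedom it is infinite at 0, so its top may still be clipped. Expect the kernel estimate to leak a little below 0, which is itself a useful warning about smoothing skewed data.

```r
# Hypothetical companion to normdat(): same display for a chi-square sample
chisqdat <- function(n = 50, df = 1) {
  xdat <- rchisq(n, df = df)                     # skewed sample
  dens <- density(xdat)                          # kernel density estimate
  h <- hist(xdat, plot = FALSE)                  # compute the bars without drawing
  ymax <- max(h$density, dens$y)                 # leave room for bars and estimate
  hist(xdat, prob = TRUE, ylim = c(0, ymax), xlab = "x",
       main = paste0("Chi-square, df = ", df, ", n = ", n))
  lines(dens)
  points(xdat, rep(0, n))                        # dot plot along the x-axis
  xgr <- seq(0.01, max(xdat), length.out = 200)  # start above 0 (df = 1 pdf is infinite at 0)
  lines(xgr, dchisq(xgr, df = df), lty = 2)      # true pdf, dashed; may be clipped near 0
  invisible(xdat)
}

chisqdat(40, df = 3)
```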

Statistics 2MA3 2002-2003
