A simple way to reshape data

In everyday data analysis, it’s not at all uncommon to find that you have your data organised one way, but the software you want to use to analyse the data expects the data to be organised another way. Knowing how to reshape your data so that you can analyse it is a basic data analysis skill, but one that doesn’t usually get taught in a lot of detail in research methods classes in psychology. I think this is a pity: reorganising data is important, but not easy, and a lot of students struggle with it.

R contains a lot of very powerful functions that you can use to reshape data. The cast() and melt() functions within the reshape package are very powerful tools, for example, and the base reshape() function is pretty powerful too. Even better, there are quite a lot of helpful tutorials that you can find online for using these functions (e.g., here, here and here). The longer I use R, the more I find myself using these functions on a regular basis. They’re really handy things.

However, they do have a drawback: they’re not always easy for novices to use. The reason for this is that they are “general purpose” reshaping functions, especially cast() and melt(). They’re designed to handle lots of different situations. In fact, the help documentation for the reshape package even goes so far as to claim that cast() and melt() are the only two reshaping functions you should ever need. I think that’s a bit of an exaggeration, but not by a lot. It’s really quite impressive how much data manipulation you can do with just these functions. Unfortunately, the price of flexibility is complexity: cast() and melt() in particular take a long time to learn, and I still find myself needing to re-read the documentation every time I want to do something with them.
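To give a flavour of that complexity, here is a minimal sketch of a wide-to-long conversion using the base reshape() function. The data frame and its column names (id, score_t1, score_t2) are made up purely for illustration.

```r
# Hypothetical wide-format data: one row per participant,
# one column per measurement occasion.
wide <- data.frame(
  id       = 1:3,
  score_t1 = c(10, 12, 9),
  score_t2 = c(11, 15, 8)
)

# Convert to long format: one row per participant per occasion.
long <- reshape(
  wide,
  direction = "long",
  varying   = c("score_t1", "score_t2"),  # columns to stack
  v.names   = "score",                    # name of the stacked column
  timevar   = "time",                     # name of the new index column
  times     = c("t1", "t2"),              # values for the index column
  idvar     = "id"                        # column identifying each participant
)

print(long)
```

Even for a case this small, six arguments have to be specified correctly, which is a lot to ask of someone who is new to R.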

This poses a problem from an educational perspective. I do want my students to learn a little bit about how to reshape data, and I don’t want them to be restricted only to “toy” scenarios. However, in practice I don’t think it’s feasible to teach psychology students how to use reshape() in an introductory class, much less cast() and melt(). The solution that I came up with was to write two simpler functions, wideToLong() and longToWide() (both contained in the lsr package), which are much more restrictive than reshape(), cast() and melt(). The idea was to have functions that are powerful enough to handle reshaping problems of “intermediate” complexity, but are extremely simple to use.
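As a rough sketch of what this looks like in practice, the code below converts a small made-up data frame from wide to long and back again. It assumes the lsr package is installed and that the argument names (within for wideToLong(), a formula for longToWide()) match the package documentation; the data and variable names are hypothetical.

```r
# Hypothetical wide-format data. The column names follow a
# "measure_level" pattern (e.g. accuracy_t1), with "_" as the separator.
wide <- data.frame(
  id          = 1:3,
  accuracy_t1 = c(0.50, 0.03, 0.72),
  accuracy_t2 = c(0.94, 0.63, 0.49)
)

# Only run the conversion if the lsr package is available.
if (requireNamespace("lsr", quietly = TRUE)) {
  # Wide to long: "time" is the name given to the within-subject factor.
  long <- lsr::wideToLong(wide, within = "time")
  print(long)

  # Long to wide: the formula reads "accuracy, broken down by time".
  wide_again <- lsr::longToWide(long, accuracy ~ time)
  print(wide_again)
}
```

The trade-off is that these functions depend on the variable names following a consistent naming pattern, which is part of what makes them more restrictive than reshape().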


Posting frequency

I don’t imagine that anyone is reading this blog at this point. I’ve made no effort to publicise it and I’m not posting very frequently, especially over the (Australian) summer break. Besides, I only have a limited amount of time I can spend on statistics stuff, and I’m spending most of that time getting my lecture notes to a state where I wouldn’t be completely embarrassed to post them on the web. However, on the off chance that someone does find this blog before I start taking it seriously … yes, I do eventually intend to post more frequently than once a month!

A simple way to calculate group means in R

The very first thing that made me realise I would need to be careful when teaching R to undergraduate psych students was a remark in an introductory text on calculating group means using tapply(). Suppose you have a data frame called experiment containing three variables, outcome, group and gender. What you’d like to do is calculate means and confidence intervals for the outcome variable, broken down by group and gender. For one reason or another the most common advice I’ve seen for doing so is to use the tapply() function. However, I think it makes a lot more sense to introduce students to the aggregate() function first. The aggregate() function has a much simpler syntax, and produces much cleaner looking output.
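Here is a minimal sketch of the comparison for group means only. The experiment data frame uses the variable names from the description above, but the values are invented for illustration.

```r
# Hypothetical data: two observations in each group-by-gender cell.
experiment <- data.frame(
  outcome = c(10, 12, 14, 16, 20, 22, 24, 26),
  group   = rep(c("control", "treatment"), each = 4),
  gender  = rep(c("female", "male"), times = 4)
)

# aggregate(): formula interface, returns a data frame with
# one row per combination of group and gender.
means_df <- aggregate(outcome ~ group + gender, data = experiment, FUN = mean)
print(means_df)

# tapply(): takes a vector plus a list of grouping factors,
# returns a matrix with groups as rows and genders as columns.
means_mat <- tapply(
  experiment$outcome,
  list(experiment$group, experiment$gender),
  mean
)
print(means_mat)
```

Both calls give the same four means. The aggregate() version reads as "outcome broken down by group and gender" and returns a data frame, which is usually easier for students to work with afterwards.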

Another R blog, when there are so many

I think I started my first blog in 2002 or thereabouts. Back then I was working as a postdoc in quantitative psychology at Ohio State and filled with naive enthusiasm for the medium. I ran my blog using blosxom and hosted it on my homepage. Trackbacks were mostly manual, social media wasn’t a Thing, and we had ponies. That blog is now long gone, along with three or four others. Fortunately, the Wayback Machine has managed to archive one or two snapshots, proving that I was once young.

A big lesson I learned from blogging in the dark ages is that a blog needs to serve a purpose. Without a clear goal in mind, the author tends to get bored and start talking about kittens. The internet does not need more kittens; therefore, a blog needs a purpose.

The purpose of this blog is to talk about statistics, and in particular to talk about learning statistics with R. This is by no means the only blog on the internet with that goal, of course. However, I’ve recently posted an R package by the same name (lsr). Moreover, I’ve been teaching an intro stats class for psychology students since 2010, and my lecture materials are usually entitled “Learning Statistics with R” or some variant of this. So it seemed like as good a title as any for a blog. As I see it, the goal of this blog is to talk about topics such as the following:

  • How to run basic analyses in R, and how those analyses work
  • Teaching R in an intro stats class for psych students
  • The lsr package itself
  • My lecture notes, once I get around to posting them online.

I guess we’ll see how that goes.

In the meantime, the point of this post, in addition to introducing the blog itself, is to post some R code so that I can see how code will look. Let’s go with this:

print( "this is an R command" )

Yes, that’ll do nicely.