RStudio -- the Good, the Bad, the Ugly, and the Insufferable

Many R users, and the vast majority of students in my class, interact with R through RStudio, available here. Obviously for many users RStudio offers a superior way to work, and as far as I know it works with all of the material presented in these tutorials and with all the packages included. However, the miscellaneous scripts at the bottom of the LabDSV R page will cause RStudio to crash. Personally, I'm an Old School guy, and I'm not completely sold on RStudio. As a Linux user, I find it solves problems I don't have and encourages (if not enforces) a way of working I don't appreciate. Nonetheless, since so many people find it useful I'll contribute some thoughts on best practices using RStudio with LabDSV.

If you just want advice on best practices in RStudio, skip down to here and avoid my rant.

The Good

Projects

One of the excellent things RStudio does is encourage users to define "projects" and work within the projects. This keeps all the related data and work together in a single workspace and separates it from other projects you might be working on. Windows users, in particular, that start R by clicking on the icon on their desktop often end up with much (or all) of their work in a single workspace, commingling work from many potentially different projects. R of course has facilities to work with separate projects natively, but users enamored of the desktop often do not use them to full extent. So, RStudio projects are a good idea. You will certainly want a "project" for the labs presented in LabDSV, and you might want to separate the labs into separate projects like "species modeling", "ordination", and "clustering."

Script Editor

RStudio encourages users to work from a script file rather than directly at the console. In addition, it provides a text editor for those script files that is much superior to generic Wordpad. (Word obviously should never be used to write R scripts.) As a bonus, the editor can be made to mimic vi, but that's perhaps moot to most Windows and even Mac users.

Tool Tips

When you use the console or script editor window RStudio pops up the command-line version of tool-tips, providing the list of arguments for any function you start to enter. This can be really helpful for commands you're less familiar with and makes it unnecessary to start up the help file just to get the command arguments. Bravo for this innovation.

The Bad

Panel Format

By default, RStudio starts up with a four-panel configuration packed into a single frame. This packaging is very Windowsesque and in my mind impractical. The distribution of window types within the frame is configurable, and each of the windows may have tabs allowing you to select exactly what will appear in the window. However, even if you have a large, high resolution monitor the four panel arrangement tries to pack too much into too small a space and results in undesirable compression of information.

Command Redirection

Even if you work in the console, RStudio traps and redirects many simple R commands. In particular, I dislike the way it crams R help files into a tiny little window. Normally, ?command will produce formatted output in the console that is easily navigated with the arrow keys. If you want to stretch out, help.start() pops up the help file system in your browser with full hot-link capability. It's the same browser with the same capability you're already familiar with, and you can shrink it or pop it behind your R session to get it out of the way when you don't need it. It's a vastly superior solution to the RStudio box, but RStudio captures the help.start() command and redirects it to the little box. Bummer.

Wasted Space

Some of the little windows are of limited utility and waste space. The Environment, History, Connections window is of little use. The information it provides is normally easily printed to the console (if and when you want it) except that now RStudio traps the function calls. For example, history(100) will call up the last 100 commands entered at the console which you can browse and manage with your arrow keys. You can easily cut-and-paste commands from your history. Instead, in RStudio it redirects the output to the little Environment, History, Connections box. Bummer.

The Environment, History, Connections window provides the Environment tab which gives a summary of objects in your workspace. It's more detail than the ls() command provides, but not nearly as much information as str() provides. Clicking on the blue arrow button will provide the str() output, however. Maybe it's helpful, but ls() and str() in the console provide the same information without taking up real estate on your screen unless you specifically want to see something.
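As a rough sketch of the console route described above, the following uses a small made-up data frame (the object names veg and elev are purely illustrative) to show what ls(), str(), and the related ls.str() report.

```r
# Hypothetical objects standing in for whatever is in your workspace.
veg  <- data.frame(sp1 = c(0, 3, 1), sp2 = c(5, 0, 2),
                   row.names = c("plot1", "plot2", "plot3"))
elev <- c(1200, 1350, 1500)

ls()        # just the names of the objects in the workspace
str(veg)    # the structure of one object: class, dimensions, column types
ls.str()    # str() applied to every object in the workspace
```

Nothing here occupies screen space until you ask for it, which is the point of the comparison.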

Data File Import

Importing data into R seems to be a significant problem for many users. Admittedly the plethora of optional arguments for read.table and the profusion of read.whatever functions makes things a little funky. You might think that this is one area where a GUI could improve things. Unfortunately, this appears not to be the case.

In RStudio there are two ways to import data: (1) using the File menu and selecting "Import Dataset", or (2) clicking on a file in the File tab in the "Files, Plots, Etc" panel. Unfortunately (maybe deliberately?) they do things differently. The File menu approach first pops up a list of import format options. Notably, it offers a choice between "From text (base) ..." and "From Text (readr) ..." This is important because base and readr differ significantly in the data formatting they support. If you choose "From text (base)" you get a file chooser. Selecting a file opens the import GUI with options to the left, a file previewer to the top right, and a data.frame previewer to the bottom right. This part is nice. Unfortunately, it doesn't offer a code preview of the R function that will ultimately be called to do the import. After clicking on "Import" you can see in the console that it uses read.table. Importantly, it offers options for choosing or setting row.names() and a box to click for "stringsAsFactors". It follows the read.table protocol that if the first row has one fewer entry than the rest of the lines then the first line is assumed to be a header with column names (and the first column of the remaining lines supplies the row names). It offers a fairly limited set of separators (e.g., not including |), but is otherwise fairly flexible and functional.
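To make the read.table header rule concrete, here is a minimal sketch run at the console rather than through the GUI. The file contents, the "|" separator, and the names site, elev, and slope are invented for illustration; the only thing being demonstrated is that a first line with one fewer field is treated as a header and the first column becomes the row names.

```r
# Hypothetical example file: the first line has one fewer field than the rest.
tmp <- tempfile(fileext = ".txt")
writeLines(c("elev|slope",
             "plot1|1200|5",
             "plot2|1350|12",
             "plot3|1500|20"), tmp)

# No header= or row.names= arguments needed; read.table infers both
# from the short first line. sep accepts any single character.
site <- read.table(tmp, sep = "|", stringsAsFactors = FALSE)

print(site)
rownames(site)   # the plot labels from the first column
names(site)      # the column names from the first line
```

At the console the separator is whatever you pass to sep, so the limited dropdown in the GUI is not a constraint.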

The second option (clicking on a file name in the File panel) is quite different. First, it only offers the option to import files with a limited number of file extensions (maybe just .csv?). Files with .dat or .txt extensions cannot be imported. Second, even if package readr is not loaded, its first choice is to load readr and then use read_csv(). For example, for a file called test.csv, it provides a "Code Preview" that looks like

library(readr)
test <- read_csv("test.csv")
View(test)

There doesn't seem to be an option to choose (base) read.table(), but you can edit the Code Preview and change whatever you like. To the lower left is a panel with options to select. This time, the Delimiter dropdown menu offers "other" and you can specify "|" for example, which automatically changes the Code Preview to use read_delim() instead of read_csv. Unfortunately, read_delim() seems incapable of reading files where the first line has fewer entries than the rest of the lines. The previewer gives no warning (although you can see the headings are misaligned if you look). When you click "Import", however, the console fills with error messages ending with

In rbind(names(probs), probs_f) : number of columns of result is not a multiple of vector length (arg 1)

That's a fairly cryptic error message for a parsing failure, but the material that comes before that is somewhat more helpful. Curiously, despite the copious error messages it does not abort the import and happily reads the data into the wrong columns, pruning off the last column of data instead of manufacturing a column heading for the last column. As far as I know, this is standard behavior for readr functions, but it's disturbing.

Since read_delim() is a readr function, it returns a tibble instead of a data.frame whether that's what you want or not. I STRONGLY advise against using tibbles in LabDSV; row.names are critical and tibbles often mangle them. There is obviously more to say, but I'll leave it for the moment.
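If you do end up with an imported table whose plot labels sit in an ordinary column instead of in row.names, one possible repair looks like the sketch below. The object imported is a plain data.frame standing in for the imported table (so the example runs without readr), and the column name plot is an assumption; as.data.frame() is the base R call that turns a tibble back into an ordinary data.frame.

```r
# Stand-in for an imported table; with readr this would be a tibble.
imported <- data.frame(plot  = c("plot1", "plot2", "plot3"),
                       elev  = c(1200, 1350, 1500),
                       slope = c(5, 12, 20),
                       stringsAsFactors = FALSE)

veg <- as.data.frame(imported)  # drops the tibble class if there is one
rownames(veg) <- veg$plot       # move the labels into row.names
veg$plot <- NULL                # and remove the now-redundant column

veg
```

Reading the file with read.table in the first place avoids the extra steps.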

If you're really stubborn, you can edit the code in the Code Preview window to do what you want, but every time you touch something in the options panel RStudio will rewrite the Code Preview window to its liking. Frankly, it's much easier to use the console.

The Ugly

Ridiculous Graphics

RStudio totally hoses the outstanding graphics capabilities of R. By default, in RStudio graphics get plotted to a little window in a box that is much too small to allow for a decent font or resolution. It's cramped and ugly. Worse yet, it decides on the X and Y limits depending on the size and shape of that little box. There is an option to "zoom" the plot, which produces another re-sizable graphics window. The window is re-sizable, but it's not possible to specify an actual dimension or font point size so that it's difficult to simulate a graphic you would include in a lab report or publication.

Much worse, from the perspective of LabDSV, while you can resize it (with "automatic" font and glyph size re-scaling) you cannot interact with the zoomed window; you still have to precisely identify points in the tiny window. You can (and generally have to) resize the box (at the expense of the console and script editor) to actually see what you're doing. Equally bad, it doesn't render the points as you draw (just pops up a stupid blue bubble), and you can't see what you have drawn until you click "finish" or "ESC". It's ridiculously easy to make mistakes, and if you do, all you can do is start over. RStudio users should avoid the RStudioGD graphics device like the plague.

Worse yet, in my mind, it only offers you a single window. It stores all of your plots and you can recall them later if you want, but you can't put two plots up side-by-side to compare. You can do par(mfrow=c(1,2)) to get two plots side-by-side, but now their aspect ratios are squished and ridiculous until you resize the box again at the expense of the console and script editor windows. It's lame.

I routinely have two or three graphics windows open with specified sizes, font sizes and families. I can pop them to the front or back with the window manager and put them side-by-side or atop each other easily. It's beautiful.
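For readers who want separate graphics windows of a stated size, the sketch below shows one way to get them with base R. The helper open_plot_window and its default sizes are my own invention for illustration; dev.new() and its noRStudioGD argument are part of base R's grDevices, and width and height are in inches on the usual screen devices. The plotting part is wrapped in interactive() because it only makes sense with a display.

```r
# Hypothetical helper: open a new graphics window of a given size (inches)
# and point size, bypassing the RStudio plot pane when running in RStudio.
open_plot_window <- function(width = 7, height = 5, pointsize = 12) {
  dev.new(width = width, height = height, pointsize = pointsize,
          noRStudioGD = TRUE)
  invisible(dev.cur())   # return the device number of the new window
}

if (interactive()) {
  first <- open_plot_window(6, 6)
  plot(1:10, main = "first window")

  second <- open_plot_window(10, 5, pointsize = 14)
  hist(rnorm(100), main = "second window")

  dev.set(first)         # make the first window the active device again
}
```

Each window is an independent device, so the windows can be placed side by side with the window manager, and dev.set() chooses which one receives the next plot.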

The Script Editor

As I noted above, providing a built-in script editor is arguably a good idea, especially if you're stuck using Windows. If I had to use someone else's Windows computer that lacked a decent text editor I would undoubtedly be delighted to have the one inside RStudio.

While in exploratory mode I generally use R directly from the console. Encouraging users to write reproducible scripts is a good idea as their work matures. However, I insist "It's only a script if you source it." Instead, what I observe is that students enter all their interactive coding into the script window, as opposed to the console, and then highlight specific lines and click on "Run" to get it transferred down to the console. It's not just that it's inefficient (and it is, just type the command in the freakin console), it leads to horrible habits. The script window code gives the impression of a specific order of execution, but when you just highlight random lines and click on "Run" you have no real order of operation. You could easily highlight the same line twice and get different results because you have executed other commands in the meantime. Now arguably this is user error and not RStudio's fault, but it should be impossible to execute specific commands without re-executing all lines in the script that precede that line. So, instead of a script, at best the script editor becomes a closet of R code, and at worst a junk drawer of R code. It's dangerous.
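To illustrate what "source it" means in practice, here is a small sketch. The three-line script and the names x and total are made up; the point is that source() runs every line from the top, in order, so the result does not depend on which lines happened to be highlighted.

```r
# Write a tiny hypothetical script to a temporary file.
script <- tempfile(fileext = ".R")
writeLines(c("x <- 1:10",
             "x <- x * 2",
             "total <- sum(x)"), script)

# Run the whole script, top to bottom, in a fresh environment.
# echo = TRUE prints each command as it is executed.
env <- new.env()
source(script, local = env, echo = TRUE)

env$total   # 110
```

Running the second line alone, twice, would double x twice; sourcing the file cannot produce that kind of accident.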

The Insufferable

Window Manager

RStudio insists on managing your windows according to its own rules. On my computer, to cut-and-paste I use the first and second mouse buttons with no need to use the keyboard. In RStudio, to transfer code from the console to the script window I have to use the heinous CTRL-C/CTRL-V mantra. Worse, on my computer I use "focus follows mouse," i.e., whatever window my mouse cursor is in gets focus and I don't have to click in the window to get the old one to let go and get the new one to pay attention. Having to click in a window before you can type in it is like having to punch somebody in the nose before you can talk to them; it's violent and unnecessary. And it's infuriating when you are constantly typing in the wrong window after deliberately moving the cursor to the window you want. Students have heard many an expletive from me when I have to work on Windows, and RStudio does its damnedest to convert my Linux machine to Windows.

Command Redirection

In R I can generally execute the system() command to get to the shell interpreter and execute miscellaneous commands. In RStudio I can sometimes do that, i.e. system('ls -l') works and gives a list of files from my current directory in the console. On the other hand, something as simple as system('more filename.txt') crashes RStudio and you lose all your work. It's unforgivable. There is a separate "terminal" tab in the console window that allows interaction with the system, but it's another unnecessary nuisance in using RStudio.

Script Editor and fix()

In R you can enter fix(function_name) and pop up an editing session with the editor of your choice. When you are done, you exit the editor and it saves a copy of your function into your current workspace. If you are comfortable in your editor it's a godsend. In RStudio, if you enter fix(function_name) it pops up a ridiculous little postage stamp of an editor and completely ignores your choice of editor. Even though I specified vi emulation in the script editor, the fix() editor ignores that and gives me some primitive editing functions. To further the aggravation, the window is not re-sizable, but rather offers a horizontal scrolling bar if your text is wider than 35 characters. Give me a break!

Instead, you have to use the script editor (even though you're writing a function, not a script). When you're done, however, you can't just enter the exit command and save the function to your workspace. You can write it to a file with the "Source on Save" box ticked, or you can highlight all the lines and click on the Run button. However, that saves the whole thing to your console one line at a time, scrolling off anything else you were interested in and clogging up your .Rhistory file unnecessarily. It's lame.
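One workaround that stays within base R is to keep the function in its own file and source() that file, which defines the function without echoing it to the console line by line. The sketch below is only an illustration: the function centre and the file name centre.R are invented, and the editing step in the middle is whatever editor you prefer.

```r
# A hypothetical function already sitting in the workspace.
centre <- function(x) x - mean(x)

# Write its definition to a file ...
fn_file <- file.path(tempdir(), "centre.R")
dump("centre", file = fn_file)

# ... edit fn_file in the editor of your choice, then reload it.
rm(centre)
source(fn_file)   # defines centre again; nothing is echoed to the console

centre(c(1, 2, 3))
```

Only the source() call itself is typed at the console, not the body of the function.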

Best Practices in RStudio

If you find that RStudio provides a better way to interact with R then by all means make use of it. However, I strongly suggest the following: