RStudio -- the Good, the Bad, the Ugly, and the Insufferable
Many R users, and the vast majority of students in my class, interact
with R through RStudio, available here. Obviously for many users RStudio offers
a superior way to work, and as far as I know works with all of the material presented
in these tutorials and with all the packages included. However, the
miscellaneous scripts at the bottom of the LabDSV R page will cause RStudio to
crash. Personally, I'm an Old
School guy, and I'm not completely sold on RStudio. As a linux user, it solves
problems I don't have and it encourages (if not enforces) a way of working I
don't appreciate. Nonetheless, since so many people find it useful I'll
contribute some thoughts on best practices using RStudio with LabDSV.
If you just want advice on best practices in RStudio, skip down to the "Best Practices in RStudio" section at the end and avoid my rant.
One of the excellent things RStudio does is encourage users to define "projects"
and work within the projects. This keeps all the related data and work together in a
single workspace and separates it from other projects you might be working on.
Windows users, in particular, who start R by clicking on the icon on
their desktop often end up with much (or all) of their work in a single
workspace, commingling work from many potentially different projects. R
of course has facilities to work with separate projects natively, but users
enamored of the desktop often do not use them to full extent. So, RStudio
projects are a good idea. You will certainly want a "project" for the labs
presented in LabDSV, and you might want to separate the labs into separate
projects like "species modeling", "ordination", and "clustering."
RStudio encourages users to work from a script file rather than directly at the
console. In addition, it provides a text editor for those script files that is
much superior to generic WordPad. (Word obviously should never be used to write
R scripts.) As a bonus, the editor can be made to mimic vi, but that's perhaps
moot to most Windows and even Mac users.
When you use the console or script editor window RStudio pops up the command-line version of
tool-tips, providing the list of arguments for any function you start to enter.
This can be really helpful for commands you're less familiar with and makes it
unnecessary to start up the help file just to get the command arguments. Bravo
for this innovation.
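For comparison, the same argument list can be pulled up at a plain console without opening the help file. This is a minimal sketch using base R's args() and formals(); read.table is used only as a convenient example function.

```r
# Print the full argument list of a function without opening its help page.
args(read.table)

# formals() returns the arguments as a named list, so the names and
# individual defaults can be inspected programmatically.
arg_names <- names(formals(read.table))
head(arg_names)

# Default value of a single argument.
formals(read.table)$sep
```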
By default, RStudio starts up with a four-panel configuration packed into a
single frame. This packaging is very Windowsesque and in my mind impractical.
The distribution of window types within the frame is configurable, and each of
the windows may have tabs allowing you to select exactly what will appear in the
window. But even if you have a large, high resolution monitor the four panel arrangement
tries to pack too much into too small a space and results in undesirable
compression of information.
Even if you work in the console, RStudio traps and redirects many simple R
commands. In particular, I dislike the way it crams R help files into a tiny
little window. Normally, ?command will produce a formatted output to the
console that is easily negotiated with the arrow keys. If you want to stretch
out, help.start() pops up the help file system in your browser with full
hot-link capability. It's the same browser with the same capability you're
already familiar with, and you can shrink it or pop it behind your R session to get it out of
the way when you don't need it. It's a vastly superior solution to the RStudio
box, but RStudio captures the help.start() command and redirects it to
the little box. Bummer.
Some of the little windows are of limited utility and waste space. The
Environment, History, Connections window is of little use. The
information it provides is normally easily printed to the console (if and when
you want it) except that now RStudio traps the function calls. For example,
history(100) will call up the last 100 commands entered at the console
which you can browse and manage with your arrow keys. You can easily
cut-and-paste commands from your history. Instead, in RStudio it redirects the
output to the little Environment, History, Connections box. Bummer.
The Environment, History, Connections window provides the
Environment tab which gives a summary of objects in your
workspace. It's more detail than the ls() command provides, but not
nearly as much information as str() provides. Clicking on the blue
arrow button will provide the str() output, however. Maybe it's
helpful, but ls() and str() in the console provide the same
information without taking up real estate on your screen unless you specifically
want to see something.
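As a rough illustration of the console route, here is a small sketch with two made-up objects (elev and site are hypothetical names, not LabDSV data). ls.str() is the base R function that combines the two views.

```r
# Two small objects standing in for a real workspace.
elev <- c(1200, 1350, 1500)
site <- data.frame(plot = c("a", "b"), cover = c(10, 25))

# ls() lists the object names only.
ls()

# str() shows the structure of one object.
str(site)

# ls.str() gives every object together with its str() summary.
ls.str()
```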
Data File Import
Importing data into R seems to be a significant problem for many users.
Admittedly the plethora of optional arguments for read.table and the
profusion of read.whatever functions makes things a little
funky. You might think that this is one area where a GUI could improve things.
Unfortunately, this appears not to be the case.
In RStudio there are two ways to import data: (1) using the File menu and
selecting "Import Dataset", or (2) clicking on a file in the File tab in the
"Files, Plots, Etc" panel. Unfortunately (maybe deliberately?) they do things
differently. The File menu approach first pops up a list of import format
options. Notably, it offers a choice between "From text (base) ..." and "From
Text (readr) ..." This is important because base and readr
differ significantly in the data formatting they support. If you choose "From
text (base)" you get a file chooser. Selecting a file opens the import GUI with
options to the left, a file previewer to the top right, and a data.frame
previewer to the bottom right. This part is nice. Unfortunately, it doesn't
offer a code preview of the R function that will ultimately be called to do the
import. After clicking on "Import" you can see in the console that it uses
read.table. Importantly, it offers options for choosing or setting
row.names() and a box to click for "stringsAsFactors". It follows the
read.table protocol that if the first row has one fewer entries than
the rest of the lines then the first line is assumed to be a header with column
names. It offers a fairly limited set of separators (e.g., not including |), but
is otherwise fairly flexible and functional.
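The header rule is easy to see from the console. The following is a minimal sketch, assuming a small pipe-delimited file whose contents I have made up for illustration; only base read.table() is used.

```r
# Write a small pipe-delimited file whose first line has one fewer entry
# than the data lines (file contents are invented for this example).
tmp <- tempfile(fileext = ".txt")
writeLines(c("sp1|sp2|sp3",
             "plot1|0|3|1",
             "plot2|2|0|5"), tmp)

# With one fewer field on the first line, read.table() treats that line as
# the header and uses the first column of the data lines as row names.
veg <- read.table(tmp, sep = "|")

rownames(veg)
colnames(veg)
str(veg)
```

Any single-character separator can be given to sep, so the limited separator menu in the GUI is not a limit of read.table() itself.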
The second option (clicking on a file name in the File panel) is quite
different. First, it only offers the option to import files with a limited
number of file extensions (maybe just .csv?). Files with .dat or .txt extensions
cannot be imported. Second, even if package readr is not loaded, its
first choice is to load readr and then use read_csv(). For
example, for a file called test.csv, it provides a "Code Preview" that looks like
test <- read_csv("test.csv")
There doesn't seem to be an option to choose (base) read.table(), but
you can edit the Code Preview and change whatever you like. To the lower left is
a panel with options to select. This time, the Delimiter dropdown menu offers
"other" and you can specify "|" for example, which automatically changes the
Code Preview to use read_delim() instead of read_csv().
Unfortunately, read_delim() seems incapable of reading files where the
first line has fewer entries than the rest of the lines. The previewer gives no
warning (although you can see the headings are misaligned if you look). When
you click "Import", however, the console fills with error messages, ending with
In rbind(names(probs), probs_f) :
number of columns of result is not a multiple of vector length (arg 1)
That's a fairly cryptic error message for a parsing failure, but the material
that comes before that is somewhat more helpful. Curiously, despite the copious
error messages it does not abort the import and happily reads the data into the
wrong columns, pruning off the last column of data instead of manufacturing a
column heading for the last column. As far as I know, this is standard behavior
for readr functions, but it's disturbing.
Since read_delim() is a readr function, it returns a tibble
instead of a data.frame whether that's what you want or not. I STRONGLY advise
against using tibbles in LabDSV; row.names are critical and tibbles often mangle them.
There is obviously more to say, but I'll leave it for the moment.
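If you do end up with the plot identifiers sitting in an ordinary column, the repair is short. This is only a sketch: the object is a stand-in built with data.frame() rather than a real readr import, and the column name plot is hypothetical.

```r
# Stand-in for an imported table in which the plot identifiers ended up as
# an ordinary column (object and column names are hypothetical).
imported <- data.frame(plot = c("plot1", "plot2"),
                       sp1 = c(0, 2),
                       sp2 = c(3, 0))

# as.data.frame() returns a plain data.frame; on an object that is already
# a plain data.frame it changes nothing.
veg <- as.data.frame(imported)

# Move the identifier column into the row names and remove it from the data.
rownames(veg) <- veg$plot
veg$plot <- NULL

veg
```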
If you're really stubborn, you can edit the code in the Code Preview window to do
what you want, but every time you touch something in the options panel RStudio
will rewrite the Code Preview window to its liking. Frankly, it's much easier
to use the console.
RStudio totally hoses the outstanding graphics capabilities of R. By default, in RStudio
graphics get plotted to a little window in a box that is much too small to allow
for a decent font or resolution. It's cramped and ugly. Worse yet, it decides on the
X and Y limits depending on the size and shape of that little box. There is an option to
"zoom" the plot, which produces another re-sizable graphics window. The window
is re-sizable, but it's not possible to specify an actual dimension or font
point size so that it's difficult to simulate a graphic you would include in a
lab report or publication.
Much worse, from the perspective of LabDSV, while you can resize it (with
"automatic" font and glyph size re-scaling) you cannot interact with the zoomed
window; you still have to precisely identify points in the tiny window. You can
(and generally have to) resize the box (at the expense of the console and script
editor) to actually see what you're doing. Equally bad, it doesn't render the
points as you draw (just pops up a stupid blue bubble), and you can't see what
you have drawn until you click "finish" or "ESC". It's ridiculously easy to
make mistakes, and if you do, all you can do is start over. RStudio users
should avoid the RStudioGD graphics device like the plague.
Worse yet, in my mind, it only offers you a single window. It stores all of
your plots and you can recall them later if you want, but you can't put two
plots up side-by-side to compare. You can do par(mfrow=c(1,2)) to get
two plots side-by-side, but now their aspect ratios are squished and ridiculous
until you resize the box again at the expense of the console and script editor
windows. It's lame.
I routinely have two or three graphics windows open with specified sizes, font
sizes and families. I can pop them to the front or back with the window manager
and put them side-by-side or atop each other easily. It's beautiful.
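A rough sketch of that way of working follows. open_plot_window() is a hypothetical helper written for this example, not part of R; it relies on dev.new() with noRStudioGD = TRUE in an interactive session and falls back to a file-less pdf() device in batch mode so the code also runs where there is no display. The sizes are arbitrary.

```r
# Open a separate graphics device with an explicit size (inches) and font
# point size. open_plot_window() is a hypothetical helper, not part of R.
open_plot_window <- function(width = 7, height = 5, pointsize = 12) {
  if (interactive()) {
    # noRStudioGD = TRUE asks dev.new() not to use the RStudio plot pane.
    dev.new(width = width, height = height, pointsize = pointsize,
            noRStudioGD = TRUE)
  } else {
    # No display in batch mode: use a file-less PDF device as a stand-in.
    pdf(file = NULL, width = width, height = height, pointsize = pointsize)
  }
  dev.cur()
}

first  <- open_plot_window()
second <- open_plot_window(width = 5, height = 5)

# Draw into each device by making it current with dev.set().
dev.set(first);  plot(1:10, main = "first window")
dev.set(second); plot(10:1, main = "second window")

dev.list()        # all open devices with their numbers
graphics.off()    # close them when done
```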
The Script Editor
As I noted above, providing a built-in script editor is arguably a good idea,
especially if you're stuck using Windows. If I had to use someone else's Windows
computer that lacked a decent text editor I would undoubtedly be delighted to
have the one inside RStudio.
While in exploratory mode I generally use R directly from the console. Encouraging
users to write reproducible scripts is a good idea as their work matures.
However, I insist "It's only a script if you source it." Instead, what
I observe is that students enter all their interactive coding into the script
window, as opposed to the console, and then highlight specific lines and click
on "Run" to get it transferred down to the console. It's not just that it's
inefficient (and it is, just type the command in the freakin console), it leads
to horrible habits. The script window code gives the impression of a specific
order of execution, but when you just highlight random lines and click on "Run"
you have no real order of operation. You could easily highlight the same line
twice and get different results because you have executed other commands in the
meantime. Now arguably this is user error and not RStudio's fault, but it should be impossible
to execute specific commands without re-executing all lines in the script that
precede that line. So, instead of a script, at best the script editor becomes a
closet of R code, and at worst a junk drawer of R code. It's dangerous.
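To make "it's only a script if you source it" concrete, here is a minimal sketch. The three-line script is made up, and running it in its own environment is my choice for the example, not a requirement of source().

```r
# Write a tiny script to a temporary file (contents are made up).
script <- tempfile(fileext = ".R")
writeLines(c("x <- c(2, 4, 6)",
             "x <- x * 10",
             "total <- sum(x)"), script)

# source() runs every line, in order, from the top. Evaluating in a separate
# environment keeps the script's objects apart from the console workspace.
run <- new.env()
source(script, local = run)

get("total", envir = run)
```

Because every line runs in order each time, the result cannot depend on which lines happened to be highlighted earlier.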
RStudio insists on managing your windows according to its own rules. On my
computer, to cut-and-paste I use the first and second mouse buttons with no need
to use the keyboard. In RStudio to transfer code from the console to the
script window I have to use the heinous CTRL-C/CTRL-V mantra. Worse, on my computer I
use "focus follows Mouse." I.e., whatever window my mouse cursor is in gets focus and
I don't have to click in the window to get the old one to let go and get the new
one to pay attention. Having to click in a window before you can type in it is
like having to punch somebody in the nose before you can talk to them; it's
violent and unnecessary. And it's infuriating when you are constantly typing in
the wrong window after deliberately moving the cursor to the window you want.
Students have heard many an expletive from me when I have to work on Windows,
and RStudio does its damnedest to convert my linux machine to Windows.
In R I can generally execute the system() command to get to the shell
interpreter and execute miscellaneous commands. In RStudio I can sometimes do
that, i.e. system('ls -l') works and gives a list of files from my
current directory in the console. On the other hand, something as simple as
system('more filename.txt') crashes RStudio and you lose all your work.
It's unforgivable. There is a separate "terminal" tab in the console window
that allows interaction with the system, but it's another unnecessary nuisance
in using RStudio.
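A possible workaround, sketched under the assumption that the trouble comes from commands that expect a real terminal (that is a guess, not something established above), is to capture the command's output instead of paging it. The file contents are made up, and cat assumes a Unix-like shell.

```r
# A small text file to look at (contents are made up).
tmp <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line"), tmp)

# intern = TRUE returns the command's output as a character vector instead
# of printing it interactively. 'cat' assumes a Unix-like shell.
out <- system(paste("cat", shQuote(tmp)), intern = TRUE)
out

# The same content without leaving R at all.
readLines(tmp)
```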
Script Editor and fix()
In R you can enter fix(function_name) and pop up an editing session with
the editor of your choice. When you are done, you exit the editor and it saves
a copy of your function into your current workspace. If you are comfortable in
your editor it's a Godsend. In RStudio, if you enter
fix(function_name) it pops up a ridiculous little postage stamp of an
editor and completely ignores your choice of editor. Even though I specified vi
emulation in the script editor the fix() editor ignores that and gives
me some primitive editing functions. To further the
aggravation, the window is not re-sizable, but rather offers a horizontal
scrolling bar if your text is wider than 35 characters. Give me a break!
Instead, you have to use the script editor (even though you're writing a
function, not a script). When you're done, however, you can't just enter the
exit command and save the function to your workspace. You can write it to a file
with the "Source on Save" box ticked, or you can highlight all the lines and
click on the Run button. However, that saves the whole thing to your console
one line at a time, scrolling off anything else you were interested in and clogging
up your .Rhistory file unnecessarily. It's lame.
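One way to get something close to the fix() cycle without the echoing is to round-trip the function through a file. This is a sketch only; rescale() is a hypothetical function and the file name is arbitrary.

```r
# A small function to edit (hypothetical example).
rescale <- function(x) x / max(x)

# dump() writes the function's source to a file that any editor can open.
fn_file <- file.path(tempdir(), "rescale.R")
dump("rescale", file = fn_file)

# ... edit fn_file in the editor of your choice, then reload it.
# source() redefines the function in one step without echoing its lines.
source(fn_file)

rescale(c(1, 2, 4))
```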
Best Practices in RStudio
If you find that RStudio provides a better way to interact with R then by all
means make use of it. However, I strongly suggest the following:
- Do NOT import data using the RStudio data import tools. They will hose your data
and make your life miserable.
- Take your mouse, grab the vertical bar separating the script window and
console from the other two windows and push the bar all the way to the right to
eliminate the "Environment, History, Connections" and "Files, Plots, etc"
windows. You can always get them back if you want, but get them out of the way.
- Click in the console window, and enter x11() to pop up a floating
graphics window to plot to. If you're on a Mac, enter quartz().
As I noted above, you can specify a specific size
and font point size if you prefer. Enter the command more than once if you
want to do side-by-side comparisons or have multiple plots visible at the same
time. The windows will get numbered (starting with 2) and you can specify which
window is the current device with dev.set(2) for example.
- If you're not actually writing a script intended to be run from the first line
to the last, enter your commands directly in the console, not the script
editor. Don't worry, you have permission to enter text there; it doesn't belong
to RStudio exclusively. The order of operation is saved to the .Rhistory
file so you know exactly what you have done. Later, you can easily edit the .Rhistory
file into a script if desired.