Background, Motivation, & Assumptions

Throughout the R Assignments so far, I have been subliminally training you to create a reproducible workflow and file structure. The purpose of this assignment is to make explicit the reasons for some of the things I’ve had you do with little to no explanation.

Creating a reproducible file structure is important for efficiently working with and sharing your research and data. When you have a reproducible file structure, it will be easier for you to come back to a project after weeks or months of not working on it and pick up where you left off. Also, it will be easier for others to work directly with you, reproduce your work, and/or expand upon your work.

Of course, in order to create a reproducible file structure, you need to understand how your computer stores and accesses the files it uses. As computers have become more user-friendly, this knowledge has increasingly been lost among younger generations of computer users. In fact, it’s something you can regularly find older faculty complaining about on Twitter.

Monica Chin has an interesting article about the issue. In that article, she interviews professors and students and concludes by suggesting that your generation will likely create your own tools that will be less dependent on traditional file structures. But guess what? We’re not there yet. So you should learn something about the basics of creating a reproducible file structure so you can do reproducible work. Along the way we will also learn some things about R and RStudio that can be helpful in producing reproducible research. Specifically, we will:

  1. Create a basic file structure template.
  2. Use the “here” package to access files and create a robust workflow.
  3. Download data from ICPSR and load it into R using the “haven” package.
  4. Create a README file so future you (and others) know the logic of your file structure.
  5. Create a zip file of the file structure to share for reproducibility.

Part 1 (Assignment 4.1): Create a Basic File Structure

Let’s start by creating a template of a simple file structure for this assignment. I have had you do this in the past by having you create a separate folder for each assignment where you save your RMD and knitted html files. As you have seen in class, this allows me to take your RMD file and, given that I have all of the supporting files in my own OneDrive folder (e.g., datasets, images, etc.), run your RMD file on virtually any computer. Here, we’ll create a file structure for this assignment that is self-contained, meaning you can share the file structure and I can copy it to any computer and, given that computer has R, RStudio, and the associated packages, reproduce your work with one click.
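To make the idea of a self-contained template concrete, here is a minimal sketch in base R. The subfolder names ("Data", "Scripts", "Output") and the "Assignment_4" folder are hypothetical placeholders of my own, not names this assignment requires, and the sketch builds everything inside a temporary directory so it will not touch your OneDrive folder.

```r
# A minimal sketch, assuming a three-folder template.
# The folder names below are illustrative assumptions, not required names.
project_root <- file.path(tempdir(), "LastName_CRM495_work", "Assignment_4")
subfolders <- c("Data", "Scripts", "Output")

# Create each subfolder; recursive = TRUE also creates any missing parent folders.
for (folder in subfolders) {
  dir.create(file.path(project_root, folder), recursive = TRUE, showWarnings = FALSE)
}

# List the folders that now sit directly under the project root.
created <- list.dirs(project_root, full.names = FALSE, recursive = FALSE)
print(created)
```

Because every file the project needs would live under a single root folder, copying that one folder to another computer carries the whole structure with it.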

  • Note: If you haven’t already, you should download and install the OneDrive app on your computer and link it to your UNCW account. This will create a OneDrive folder on your computer where you can easily access and store files that automatically sync with the cloud, thus creating cloud-based backups of any file(s) you place in your OneDrive folder. Click on the following links for directions for installing OneDrive on Windows or Mac.

1.1: Find OneDrive File Location

In order to create a basic file structure, you first need to locate the OneDrive folder on your computer, and specifically your LastName_CRM495_work folder. Here is what it looks like on my home (Windows) computer:

I have blacked out my other files, but you can see where my “Day_CRM495_Work” folder is located along with the shared “Day-SP22_CRM495” folder where you turn in your work. First, notice in the left pane that “OneDrive - UNC-Wilmington” is highlighted. That simply means that OneDrive is installed on my computer, it’s linked to my UNCW account, and I’m currently looking inside my OneDrive folder.

You’ll also notice a couple of folders from my graduate course from last semester (“Day_SC500_work” and “Day_F21_SC500”). This actually points to a potential problem with how I’ve named my folders. Notice how the “Day_CRM495_Work” and “Day_SC500_work” folders are next to each other and not next to their corresponding shared folders? This could cause (and has caused) errors when I’m trying to move between my work folder and the shared folder where you commit your work. The folders are arranged this way because they are sorted alphabetically. Given that I want the corresponding course material next to each other, an easy fix would be to rename them so they each include the abbreviation for the semester (e.g., “F21” and “SP22”). Here is what that looks like: