Excel vs R: A Brief Introduction to R

With examples using dplyr and ggplot

Quantitative research often begins with the humble process of counting. Historical documents are never as plentiful as a historian would wish, but counting words, material objects, court cases, etc. can lead to a better understanding of the sources and the subject under study. When beginning the process of counting, the first instinct is to open a spreadsheet. The end result might be the production of tables and charts created in the very same spreadsheet document. In this post, I want to show why this spreadsheet-centric workflow is problematic and recommend the use of a programming language such as R as an alternative for both analyzing and visualizing data. There is no doubt that the learning curve for R is much steeper than producing one or two charts in a spreadsheet. However, there are real long-term advantages to learning a dedicated data analysis tool like R. Such advice to learn a programming language can seem both daunting and vague, especially if you do not really understand what it means to code. For this reason, after discussing why it is preferable to analyze data with R instead of a spreadsheet program, this post provides a brief introduction to R, as well as an example of analysis and visualization of historical data with R.1

The draw of the spreadsheet is strong. As I first thought about ways to keep track of and analyze the thousands of letters in the Daniel van der Meulen Archive, I automatically opened up Numbers — the spreadsheet software I use most often — and started to think about what columns I would need to create to document information about the letters. Whether one uses Excel, Numbers, Google Sheets or any other spreadsheet program, the basic structure and capabilities are well known. They all provide more-or-less aesthetically pleasing ways to easily enter data, view subsets of the data, and rearrange the rows based on the values of the various columns. But, of course, spreadsheet programs are more powerful than this, because you can add in your own programatic logic into cells to combine them in seemingly endless ways and produce graphs and charts from the results. The spreadsheet, after all, was the first killer app.

With great power, there must also come great responsibility. Or, in the case of the spreadsheet, with great power there must also come great danger. The danger of the spreadsheet derives from its very structure. The mixture of data entry, analysis, and visualization makes it easy to confuse cells that contain raw data from those that are the product of analysis. The nature of defining programatic logic — such as which cells are to be added together — by mouse clicks means that a mistaken click or drag action can lead to errors or the overwriting of data. You only need to think about the dread of the moment when you go to close a spreadsheet and the program asks whether you would like to save changes. It makes you wonder. Do I want to save? What changes did I make? Because the logic in a spreadsheet is all done through mouse clicks, there is no way to effectively track what changes have been made either in one session or in the production of a chart. Excel mistakes can have wide-ranging consequences, as the controversy around the paper of Carmen Reinhart and Kenneth Rogoff on national debt made clear.2

[Read More]

My Approach to Digital Humanities

Digital humanities holds the promise of increasing the means by which scholars are able to analyze and present data. Though some sentiments about the significance of digital humanities might be overblown, there is no doubt that the more ways we have to analyze sources the better. Learning a variety of the tools that make up the rather nebulous universe of digital humanities is like learning a new language. It opens up new possibilities that were previously closed or necessitated the expertise of others. This frames digital humanities as a collection of skills rather than a means to a predetermined end. I have adopted this perspective in learning about the possibilities opened by digital humanities and working on a digital humanities project. I am hardly the first person to take these steps, but I hope that by explaining my thought process I can set a basis for future posts on digital humanities.

If there has been one guiding force in my approach to digital humanities, it is to learn skills and tools in the process of production. In a way, this is simply the application of critical thinking to how I create my scholarly work. Instead of doing things in what seems the de facto manner, I have sought to question if there is either a more efficient way or a way in which I could gain or improve my competency. The goal of efficiency was particularly significant in thinking about how to organize my research and writing (DH 1.0), while learning new skills has been more important in the production of digital humanities projects (DH 2.0). This may not be the easiest way to complete a project in the short run, but by doing things the hard way, I am looking to open up new opportunities for future projects. With this theoretical approach in mind, let me now discuss a few concrete principles of my digital humanities practices concerning text, applications, and producing digital humanities projects.

[Read More]

New kinds of Projects: DH 2.0 and Coding

In the process of learning about how I could use digital technologies to better organize my research, I quickly started to think about how I might extend these skills to produce new kinds of outputs.1 I was familiar with the concept of digital humanities, but the step from an internal process of organizing research and writing to production seemed both too nebulous and difficult. Digital humanities also seemed to concentrate on the visual. This was intriguing, but did not present itself as the most pressing need for a graduate student who was writing a dissertation on sibling relations among 16th-century merchants. It took years for me to move from what I am calling DH 1.0 to DH 2.0, to move from research methodology to making what could properly be termed digital humanities projects. This post presents an introduction to how I eventually took this step and why I decided to learn to code instead of other available solutions.2

[Read More]

Thinking about Workflow: DH 1.0

In the spring of 2011, I was in the middle of doing research for my dissertation. I had recently returned from my second extended trip to the archives in the Netherlands and Belgium and had accumulated a ton of notes. I knew that technology had drastically altered the possibilities for research, but the fundamentals of my own workflow were hardly different than they had been when I began undergrad in the early 2000s. Sure, I used the internet to watch Netflix, but the basic tools—centered on the Microsoft Office Suite—were essentially the same. One day, I gave in to the nagging feeling that I was not getting enough out of my computer, that there were better ways of doing things. I started to poke around the internet to see if I could find ways to improve my workflow. What started as a distraction soon turned into a months-long project on finding better tools for conducting research. I quickly realized that the actual process of getting work done, doing research and writing papers, is little discussed in graduate school. Early career graduate students are so busy reading, writing, and generally trying to keep up, that the process of how to do the work is easily pushed to the background. Since then, I have tried to think critically and systematically about my workflow and have often discussed this with others.

[Read More]