
A practical example of why Ruby is a great language: comparing works cited in two sections of a paper

Programming

August 07, 2022

I’m writing a research paper, and I wanted to check whether all the papers I’ve cited in my main text were also mentioned in the discussion. Because Ruby is so easy to write and ‘tape together’, it only took a few lines to get the list of papers I still need to mention. In general terms, the goal was to find uses of a group of LaTeX commands with a varying argument in two sections of text (different types of \cite{key} with a varying key), and determine which arguments (keys) occur in one section but not in the other. Read on to see how Ruby handles this.

First, I saved the two sections to two files: /tmp/ool (for Overview of Literature, the main section) and /tmp/discussion. This is easy to do in Vim with the :write command.

The goal is to see whether everything that was cited in /tmp/ool was then also cited in /tmp/discussion. So, I started up irb. The first thing I did was load the files into two variables:

disc = File.read '/tmp/discussion'
ool = File.read '/tmp/ool'

Now I have the text in two variables, but I only want to find the citations. In LaTeX, I used two different forms of citations: \cite{key} and \citeauthor{key}. So, I needed to find the uses of these commands in both files:

re = /\\cite(?:author)?{([^}]*)}/
disc_cites = disc.scan(re).flatten
ool_cites = ool.scan(re).flatten
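
To see why the flatten is there, here is a small sketch on a made-up string (the sample text and cite keys are invented for illustration, and the braces in the pattern are escaped only for readability). When the pattern contains a capture group, scan returns one inner array per match, so the result is nested until it is flattened:

```ruby
# Hypothetical sample text; the cite keys are made up for illustration.
sample = 'As \cite{smith2020} shows, and \citeauthor{doe2019} argues.'

# Same idea as the pattern above, with the literal braces escaped.
re = /\\cite(?:author)?\{([^}]*)\}/

nested = sample.scan(re) # one inner array per match, holding the captured group
keys = nested.flatten    # a flat list of the captured arguments

p nested # [["smith2020"], ["doe2019"]]
p keys   # ["smith2020", "doe2019"]
```

Note that this pattern, like the one above, does not match cite commands with an optional argument such as \cite[p. 3]{key}.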

OK, now disc_cites and ool_cites each hold the arguments of the cite commands. Some cite commands may have multiple cite keys separated by commas, so I need to split those up:

disc_cites.map! { |s| s.split(',').map(&:strip) }.flatten!
ool_cites.map! { |s| s.split(',').map(&:strip) }.flatten!
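
As an alternative sketch, the same splitting can be written without mutating the array by using flat_map, which maps and flattens one level in a single step. The input below is hypothetical and only stands in for what scan and flatten might return:

```ruby
# Hypothetical captured arguments; the second one holds several keys.
raw = ['smith2020', 'doe2019, lee2021 ,kim2018']

# Split each argument on commas, trim whitespace, and flatten one level.
keys = raw.flat_map { |s| s.split(',').map(&:strip) }

p keys # ["smith2020", "doe2019", "lee2021", "kim2018"]
```

One reason to prefer this form when chaining: flatten! returns nil if it made no changes, so its return value is not safe to build on, while flat_map always returns an array.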

Finally, I want to remove any duplicates and I want to find the difference between these two lists; that’s easiest with sets:

require 'set' # needed before Ruby 3.2; harmless on later versions
ool_cites.to_set - disc_cites.to_set

And that gives me the list of things I cited in the Overview of Literature, but not in the Discussion.

As an added benefit, I can do the reverse to make sure I didn’t cite a paper in the Discussion that I hadn’t introduced before:

disc_cites.to_set - ool_cites.to_set
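
Here is a minimal sketch of both directions on made-up key lists (the keys are invented for illustration). It shows that converting to sets drops duplicates and that set difference is not symmetric, which is why the two checks give different answers:

```ruby
require 'set'

# Hypothetical cite keys; "doe2019" is cited twice in the overview.
ool_keys  = %w[smith2020 doe2019 lee2021 doe2019]
disc_keys = %w[doe2019 kim2018]

# Cited in the overview but never mentioned in the discussion.
missing_from_discussion = ool_keys.to_set - disc_keys.to_set

# Cited in the discussion without being introduced in the overview.
not_introduced = disc_keys.to_set - ool_keys.to_set

p missing_from_discussion.to_a # ["smith2020", "lee2021"]
p not_introduced.to_a          # ["kim2018"]
```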

While Ruby is not always the best choice, it’s great for these sorts of quick ‘prototype’ uses, where the goal isn’t really efficiency, but just getting from a problem/question to a solution easily. Furthermore, Ruby’s method chaining syntax lets you reach a result in steps, adding new methods to the chain one by one.
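
To illustrate the chaining point, the separate irb steps could be collapsed into one chain. This is only a sketch: cite_keys is a hypothetical helper name, and the sample strings stand in for the contents of the two files:

```ruby
require 'set'

# Hypothetical helper: returns the set of cite keys used in a LaTeX string.
def cite_keys(text)
  text.scan(/\\cite(?:author)?\{([^}]*)\}/) # capture each command's argument
      .flatten                              # one flat list of arguments
      .flat_map { |arg| arg.split(',') }    # split multi-key arguments
      .map(&:strip)                         # trim whitespace around keys
      .to_set                               # drop duplicates
end

# Made-up stand-ins for the two saved sections.
ool  = 'Prior work \cite{smith2020, doe2019} and \citeauthor{lee2021}.'
disc = 'We agree with \citeauthor{doe2019}.'

missing = cite_keys(ool) - cite_keys(disc)
p missing.to_a # ["smith2020", "lee2021"]
```

Each line of the chain corresponds to one of the steps above, so the chain can be built up in irb by appending one method at a time and checking the intermediate result.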