toXY: Convert graphs in PDF files into numbers with Inkscape

It is rather common that a scientific paper presents most of its results in graphs, reporting in text just a few exact numeric values for some relevant cases.
The missing exact numerical information could prevent other researchers from fully comparing their results with those of that paper.
When the paper is in PDF format (99.999% of the cases) and is relatively recent, the exact numerical information is very likely still in the file, because graphs are almost always embedded in vector form.

I wrote an Inkscape plugin that converts curves in a graph into a table of numbers. I called it “toXY” (name chosen in a moment of total lack of imagination).

There are a number of steps to follow, but the “toXY” plugin is a quick way to get the numbers in order to perform a comparison of results without contacting the original authors.
Note that although the plugin works pretty well, I cannot take responsibility for the numbers always being exact. Use your brain to check that the returned values are consistent with what your eyes see in the paper.
I also strongly suggest checking them with the original authors before including them in any new publication.

Installation

I assume you already have Inkscape installed.
Download the “toXY” plugin zip file.
Close Inkscape.
Copy the two files (toXY.inx and toXY.py) contained in the zip into the extensions directory of Inkscape, which is usually located…
…on Windows: “C:\Program Files\Inkscape\share\extensions”
…on Linux: “/usr/share/inkscape/extensions”
…on OS X: “$HOME/.config/inkscape/extensions” or “/Applications/Inkscape.app/Contents/Resources/extensions”, depending on the OS version.

On Linux/OS X you may also have to change the file permissions:
chmod 755 toXY.py
chmod 644 toXY.inx
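
If you prefer to script the copy and the permission change, something along these lines should do. It is only a sketch: the function name install_plugin and the zip file name in the usage comment are made up for illustration, and you have to pass the extensions directory that applies to your system.

```python
import os
import zipfile


def install_plugin(zip_path, extensions_dir):
    """Extract toXY.py and toXY.inx into extensions_dir and set permissions."""
    # Same modes as the chmod commands above.
    modes = {"toXY.py": 0o755, "toXY.inx": 0o644}
    installed = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            # Ignore any folder structure inside the zip.
            name = os.path.basename(member)
            if name in modes:
                target = os.path.join(extensions_dir, name)
                with open(target, "wb") as out:
                    out.write(archive.read(member))
                os.chmod(target, modes[name])
                installed.append(name)
    return sorted(installed)


# Hypothetical usage (zip name and destination are assumptions):
# install_plugin("toXY.zip", "/usr/share/inkscape/extensions")
```

Writing into a system-wide extensions directory may require administrator rights.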

Usage

  1. Start Inkscape.
  2. Drag and drop the PDF file on Inkscape.
  3. In the “PDF Import Settings” dialog, select the page that contains the graph to be converted and click OK.
  4. The page is now a single group of objects in the drawing. Select it and ungroup it repeatedly (Ctrl+Shift+G) until Inkscape reports “No groups to ungroup in the selection” (see the lower part of the second image below).
  5. Zoom (if needed) and select the curves you want to convert into numbers.
  6. Run the toXY plugin from the Extensions > Generate from Path menu.
  7. Fill in the lower left corner and upper right corner fields appropriately, then click Apply.
  8. A very long text box will appear below the graph. Each line reports the X and Y coordinates of a point (rescaled according to the lower left and upper right corner coordinates you entered) and the curve it belongs to. Copy and paste it wherever you like.
  9. For example, I copied the data into Excel and, after a minor cleanup, got back an exact replica of the graph.
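
To give an idea of what the rescaling in step 8 amounts to, here is a minimal sketch of a linear mapping between two corners. It is not the plugin's actual code. It assumes linear axes and that the drawing coordinates of the two corners are known; the function name rescale_points and all the numbers are made up for illustration.

```python
def rescale_points(points, drawing_ll, drawing_ur, data_ll, data_ur):
    """Linearly map drawing coordinates to graph (data) coordinates.

    drawing_ll / drawing_ur: lower left and upper right corners in drawing units.
    data_ll / data_ur: the values those two corners have on the graph axes.
    """
    (px0, py0), (px1, py1) = drawing_ll, drawing_ur
    (dx0, dy0), (dx1, dy1) = data_ll, data_ur
    sx = (dx1 - dx0) / (px1 - px0)  # data units per drawing unit along X
    sy = (dy1 - dy0) / (py1 - py0)  # negative when the drawing Y axis points down
    return [(dx0 + (x - px0) * sx, dy0 + (y - py0) * sy) for x, y in points]


# Made-up example: a drawing whose Y axis grows downwards, as in SVG.
points = [(100.0, 400.0), (200.0, 300.0), (300.0, 200.0)]
data = rescale_points(
    points,
    drawing_ll=(100.0, 400.0), drawing_ur=(300.0, 200.0),
    data_ll=(0.0, 0.0), data_ur=(10.0, 1.0),
)
for x, y in data:
    print(f"{x:.3f}\t{y:.3f}")
```

A graph with a logarithmic axis would need a different mapping, which this sketch does not cover.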

Final thoughts

Writing this plugin has been a nice experience: I liked both Python (this is my first non-trivial piece of code in Python) and the Inkscape plugin model.

I found that getting the absolute coordinates of the points of a group of curves is not easy: any curve can be nested under a chain of arbitrary transformations, and all of them must be taken into account.
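
To illustrate the problem, here is a small sketch of how a chain of transformations can be collapsed into a single one. It is not the code of the plugin. It assumes every transformation has already been reduced to the six numbers of an SVG matrix(a, b, c, d, e, f); the helper names compose, apply and absolute_point are made up for illustration.

```python
IDENTITY = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)


def compose(outer, inner):
    """Return the transform that applies inner first and outer second."""
    a1, b1, c1, d1, e1, f1 = outer
    a2, b2, c2, d2, e2, f2 = inner
    return (
        a1 * a2 + c1 * b2,
        b1 * a2 + d1 * b2,
        a1 * c2 + c1 * d2,
        b1 * c2 + d1 * d2,
        a1 * e2 + c1 * f2 + e1,
        b1 * e2 + d1 * f2 + f1,
    )


def apply(matrix, point):
    """Apply an SVG-style matrix: x' = a*x + c*y + e, y' = b*x + d*y + f."""
    a, b, c, d, e, f = matrix
    x, y = point
    return (a * x + c * y + e, b * x + d * y + f)


def absolute_point(chain, point):
    """chain lists the transforms from the outermost group down to the curve."""
    total = IDENTITY
    for transform in chain:
        total = compose(total, transform)
    return apply(total, point)


# Made-up example: a curve scaled by 2, inside a group translated by (10, 20).
chain = [(1.0, 0.0, 0.0, 1.0, 10.0, 20.0), (2.0, 0.0, 0.0, 2.0, 0.0, 0.0)]
print(absolute_point(chain, (3.0, 4.0)))  # (16.0, 28.0)
```

The order matters: the transformation closest to the curve is applied first, and the one on the outermost group last.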