Open-science, documentation, jupyter notebooks and readthedocs

Information technologies and the World Wide Web are turning, and will keep turning, many aspects of our societies upside down. In particular, the way we do science (in the broad sense: education, fundamental research and technology) could be deeply modified, with open-source methods and tools at the center. This dynamic is often called open-science, but we don't know yet where it will lead us.

Today, inventing open-science is still a job for brave and patient pioneers. In the short term, working on open-science is, for a scientist, just a big waste of time! Researchers have to publish papers in recognized journals, but the urgent work for open-science is software development: to work together really efficiently, the scientific communities need good open-source development frameworks. We have fantastic tools to design and build these frameworks, for example Python and its scientific ecosystem, but a lot of energy still has to be put into software development... And developing frameworks is much slower than writing bad code that only approximately works for our particular case, which, by the way, is still the common bad practice in science today. The frameworks need to be well written, with documentation, examples, tests and continuous integration. Finally, we also need to work a lot on convincing our colleagues to use these frameworks and to contribute...

I now come to the real technical subject of this post: tools for software documentation. These last few days, I have been working a lot on the documentation of the packages fluiddyn, fluidlab, fluidimage and fluidfft. As usual with open-source Python development, we have great tools to help us, in particular:

  • mercurial (a source control management tool, like git, but better suited to my needs because it has a really easy and intuitive interface),
  • Heptapod (a repository management service, like github, but one that works with mercurial),
  • sphinx (a python program to produce a website from reStructuredText or markdown pages and documented code),
  • readthedocs (an open-source service to "automatically" build the documentations and host the resulting websites),
  • anaconda (a Python distribution for science... and a little bit more),
  • the jupyter project (interactive computing in the browser).

So the fluid[...] documentation is written in reStructuredText, in rst files and in docstrings spread all over the code. All these files are contained in the mercurial repositories, which are hosted on the bitbucket servers. The static websites are built from the code with sphinx on the readthedocs servers and hosted by readthedocs (thanks a lot!). To build the website, sphinx needs to be able to import the packages, which is feasible now that conda can be used on the readthedocs servers.

Then, we can produce and display nice jupyter notebooks in the documentation. This is very convenient for demonstrating that a piece of software can help people! However, there was a real problem related to the size of the figures in the jupyter notebooks. Since I do not want to fill up the code repositories with figures, it was not really possible to use jupyter notebooks to show figures in the documentation.

So I worked to overcome this issue. It is now possible to include many figures in the notebooks of websites produced with sphinx, since I can now commit only the input cells to the repository. The output cells and the figures are produced by executing the notebooks on the readthedocs servers.
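To give an idea of the mechanism, here is a minimal sketch of the two steps for one notebook: execute the stripped notebook, then convert the executed copy to rst so that sphinx can include it. This is not the actual fluiddoc code: the helper name nbconvert_commands and the example path are hypothetical, and I assume that the jupyter command-line tool with nbconvert is installed on the build machine.

```python
from pathlib import Path


def nbconvert_commands(path_notebook):
    """Return the two nbconvert commands for one stripped notebook.

    Hypothetical helper for illustration only, not part of fluiddoc.
    """
    path = Path(path_notebook)

    # Step 1: execute the notebook. With "--to notebook --execute",
    # nbconvert writes the executed copy next to the input file, with a
    # name ending in ".nbconvert.ipynb".
    execute = ["jupyter", "nbconvert", "--to", "notebook", "--execute", str(path)]
    executed = path.with_name(path.stem + ".nbconvert.ipynb")

    # Step 2: convert the executed copy (outputs and figures included)
    # to reStructuredText, named after the original notebook.
    to_rst = [
        "jupyter", "nbconvert", "--to", "rst", str(executed),
        "--output", path.stem,
    ]
    return execute, to_rst
```

Each command can then be run in turn, for example with subprocess.run(command, check=True), from a function called in doc/conf.py. Building the commands separately from running them keeps the sketch easy to check without jupyter installed.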

The code to do this is in the package fluiddoc, provided with fluiddyn. To use it, just put this in the file doc/conf.py of your package:

from fluiddoc.ipynb_maker import ipynb_to_rst

ipynb_to_rst()

The documentation for this function is minimalist :-) but the function does the job! For example it produced these notebooks:

Now I just need some time to add nice figures :-)

Fluiddyn now provides a very small command-line tool to strip the outputs from notebooks. It uses nbstripout but, in contrast to nbstripout, it does not strip the notebooks produced by jupyter-nbconvert, i.e. the notebooks whose names end with ".nbconvert.ipynb". It can be used for example like this:

fluidnbstripout ipynb
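The selection rule described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the fluidnbstripout source: the function name notebooks_to_strip is hypothetical, and I assume here that the directory is searched recursively.

```python
from pathlib import Path

# Suffix given by jupyter-nbconvert to the notebooks it executes.
SUFFIX_EXECUTED = ".nbconvert.ipynb"


def notebooks_to_strip(directory):
    """List the notebooks under `directory` whose outputs should be stripped.

    Hypothetical helper: executed copies ("*.nbconvert.ipynb") are skipped,
    so their outputs and figures are kept.
    """
    return sorted(
        path
        for path in Path(directory).rglob("*.ipynb")  # recursion is an assumption
        if not path.name.endswith(SUFFIX_EXECUTED)
    )
```

Each path returned could then be handed to nbstripout, while the executed copies stay untouched and remain usable to build the documentation locally.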