Mercurial as a great source control management tool for academia

Subtitle: Why do I still use Mercurial in 2019?

I'd like to make this text readable and understandable for people with different levels of experience in software engineering. The introduction should be readable by anyone interested. Then, I try to present and discuss slightly more technical aspects.

Introduction

When working with source code of any type, it is very convenient to use source control management (SCM) tools, also called version control tools. They allow their users to keep track of their work by taking snapshots of the code, and they help a lot with collaboration.

Nowadays, in 2019, the most famous SCM tool is Git. All projects hosted on GitHub (and also GitLab) use Git (and they have no other choice). Git is a fantastic tool... for programming experts. It was created by Linus Torvalds (the creator of the Linux kernel) for the development of the Linux kernel. GitHub and Git are very convenient for open source, in particular for very popular open-source projects such as CPython (the main Python interpreter).

Nowadays, scientists and more generally people working in academia need to read, write and collaborate on code, so being able to take advantage of an SCM tool is very useful. Because of GitHub and GitLab, the natural choice is Git. I'm going to argue in this note that this is unfortunate, because Git is not so well suited to our uses in academia. It is not good for science and open source to be so crazy about Git. There are other solutions for version control, and I think Mercurial is especially well suited to academia.

Note that this opinion is not trendy at all. As a young scientist working with open-source code, it would be much simpler and more efficient for me to follow the Git / GitHub trend and to work a bit to really master Git. However, I'm concerned by this irrational phenomenon in open source for science and academia. People involved in this dynamic tend to follow some practices of professional programmers. In particular, they tend to use only GitHub and Git and to encourage everyone to do the same. However, Git is not well suited to everyone and every usage. The open-source communities in academia should be careful about uniformity in such technical choices. In the long term, if we think that version control should be used by most people working with code, more diversity in SCM tools would be very beneficial.

Git, a great tool for experts and Mercurial, an interesting alternative

Git is a great tool for experts in programming, but it is not the best SCM tool for everyone, especially for beginners in SCM.

At this point, I need to introduce Mercurial (command hg), the SCM tool that we use for the FluidDyn project. Mercurial and Git started at the same time (spring 2005) and are both "distributed" SCM tools. They share many similarities but also differ on some important points:

  • From the beginning, Git has been built to be a great tool for experts and Mercurial tries to make simple things simple. Simple commands, simple and clear concepts, simple documentation.

    • Some Git commands are really complicated with (good but) complicated documentation (see for example git help checkout, which can be compared to hg help update and hg help revert).

    • Mastering the concepts of the staging area (also called the Git index) and of branches is necessary even for basic Git usage.

  • Git is written mainly in C, with shell (and some Perl) scripts. In contrast, Mercurial is written in Python and C (and recently also in Rust), with a lot of Python. As users, we don't really care about such internal details, but they are important to understand the differences in terms of performance and extensibility.

  • Some tasks complete faster with Git than with Mercurial (but the contrary is also true). Mercurial has a pretty long startup time (hg --version yields its result in approximately 0.1 s). However, for my daily work / open-source activity, I have never had any performance problems with hg.

  • Git is (mainly?) a big monolithic program (from the beginning you can use all Git commands) whereas Mercurial is extensible in Python. By default, only a minimalist and simple set of features is accessible to users, who have to opt in to advanced features by enabling extensions (in a configuration file). The most important extensions are distributed with Mercurial, so no other installation steps are needed.

  • For Git, unsafe history editing is normal and encouraged, whereas for Mercurial history editing is circumscribed. By default, only simple history editing (for example hg commit --amend) is possible, and one needs to activate extensions to enable more complex history editing (strip, rebase, ...). Moreover, the phase concept makes it easy to know what can be modified without causing problems for other collaborators (commits have different "phases" depending on whether they have been pushed to a publishing repository, and the modification of "public" commits is strongly restricted). The evolve extension allows one to easily perform safe distributed history editing (the history of the history is kept and can be shared). Finally, the great command hg absorb was included in Mercurial 4.8. It automatically and intelligently incorporates uncommitted changes into prior commits, which is very convenient during code review.
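As a rough sketch of the opt-in mechanism mentioned in the list above, the following shell snippet writes a small Mercurial configuration file that enables a few history-editing extensions. The temporary file is only an assumption made for illustration (a real user configuration usually lives in ~/.hgrc), and the evolve and topic lines assume that the separate hg-evolve package is installed.

```shell
# Hypothetical target file for this sketch; a real user configuration
# usually lives in ~/.hgrc.
hgrc=$(mktemp)

cat > "$hgrc" <<'EOF'
[extensions]
# Distributed with Mercurial: they only need to be switched on.
rebase =
strip =
absorb =
# Provided by the separate hg-evolve package (assumed to be installed).
evolve =
topic =
EOF

# HGRCPATH tells Mercurial which configuration file(s) to read
# for the current shell session.
export HGRCPATH="$hgrc"

cat "$hgrc"
```

Without such lines, commands like hg rebase or hg absorb are simply not available, which is what keeps the default command set small.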

A quasi monopolistic position of Git in open-source

The huge popularity (and dominance) of Git can mainly be explained by the huge success of the collaboration platform GitHub.

GitHub and GitLab are not able to host Mercurial repositories. The only equivalent platform which offers this possibility is Bitbucket (thank you!), which is much smaller and less well known than GitHub. Moreover, Bitbucket tends to lag behind the features offered by Mercurial upstream (for example, Mercurial topics cannot be visualized in Bitbucket).

It is however possible to work with Mercurial on Git repositories hosted on GitHub or GitLab with an extension called hg-git. It works well but there are clear drawbacks, in particular different hashes in the local and remote repositories and performance issues when cloning big repositories. Moreover, some Mercurial concepts (phases, named branches, obsolete commits) cannot be stored in a Git repository.

In open source, Git and GitHub have acquired a kind of quasi-monopoly position and Mercurial has become a niche solution.

Mercurial still strong

However, Mercurial is still used for the big repositories of big projects, in particular Facebook, Mozilla (Firefox) and PyPy (an alternative faster Python interpreter with a JIT). As a consequence, "Mercurial has a very healthy development pace backed by serious actors in the industry".

Wait a moment, does it mean that Mercurial, which seems to be easier for newcomers, is also good for advanced users?! Yes, modern Mercurial (in particular with evolve and absorb) is a great tool and it has real advantages compared to Git.

Simple workflow for simple projects and beginners

Version control helps a lot when working alone or with a few people on some scripts, a small program or a paper / application / thesis manuscript (for example using LaTeX). It should be standard practice in academia. Students should learn how to do this quite early at university.

For these cases, people only have to use simple version control commands: init, clone, pull, commit, push. In practice, the commands that you need to use with Git and Mercurial are very similar (see for example our short Mercurial tutorial in the FluidDyn documentation).

However, Git is in my opinion too complicated for such a simple workflow.

Figure: Git data transport commands, showing the 4 Git levels: workspace, index, local repository and remote repository. Taken from this Stack Overflow answer.

For example, all users are confronted with the staging area (the Git index). This feature may be useful for 0.1 % of the users in 0.1 % of the use cases. But it is always there and all users can easily be confused by it. It is very easy to make a mistake like this:

git add .
# at this point, the Git index is up-to-date
pytest
# arf a test failed...
# some other modifications
pytest
# awesome the problem is fixed
# but the Git index has not been updated!
git commit
git push

(I know, one can do git commit -a but I'm considering a workflow for newcomers so it's easy to just type git commit.)

So here the user pushed a wrong commit, whereas the local code in her/his directory is right! The staging area is really not a feature for most users. In most cases, one should commit what is in the directory, what compiled without error and what makes the unit tests pass.
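The pitfall can be reproduced in a throwaway repository. The sketch below is only an illustration under stated assumptions: the file name code.py, its content and the demo identity are made up, and pytest is replaced by a plain file edit so that the script depends on Git alone.

```shell
#!/bin/sh
# Reproduce the staging-area pitfall in a temporary repository.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
# Local identity so that the sketch does not depend on global configuration.
git config user.name "Demo User"
git config user.email "demo@example.org"

echo "result = 1  # buggy" > code.py
git add .                               # the index now holds the buggy version
echo "result = 2  # fixed" > code.py    # fix made after staging
git commit -q -m "Add code"             # commits the index, not the directory

committed=$(git show HEAD:code.py)      # content recorded in the commit
working=$(cat code.py)                  # content of the working directory
echo "committed: $committed"
echo "working:   $working"
git status --short                      # the fix is still reported as modified
```

Running git status or git diff before committing is the usual way to notice that the working directory and the index have diverged.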

But wait, it is not so bad to have a bad commit pushed to the main repository because with Git, everything can be rewritten. Since history rewriting is directly available to all users and is part of the standard Git workflow, "let's do unsafe history editing" is a very natural thought! Even for a bad typo!

However, for such a simple workflow, unsafe history editing is not a good idea (especially without phases). It is useless and a common source of problems. Newcomers can even break the main repository.

We can also consider the case where another, slightly more experienced user starts to work with branches (because branches are part of the standard Git workflow). Then, the newcomers also have to learn how Git branches work.

We see that Git has a "flat-steep" learning curve. Getting started with init, clone, pull, add, commit, push is very simple, but as soon as you have a "problem" and need to really understand what is happening, oops...

In comparison, Mercurial is simpler and safer. One can even completely avoid the command line by using TortoiseHg (of course, there are also graphical tools for Git, but they do not always bring simplicity and safety).

Therefore, when working with such a simple workflow and with newcomers, it would really be reasonable to choose Mercurial (and with hg-git, this could even be true in cases where one has to use GitHub or GitLab).

Advanced workflows (branches, pull requests, history editing, GitHub / GitLab)

What about more advanced features? If Mercurial is used internally at Facebook, we can deduce that it is not only a tool for beginners. Let's see what Mercurial offers in 2019.

Most large open-source collaborative projects have adopted a "GitHub-style" workflow involving feature branches (short-lived, not kept in the history in the long term), pull requests, code review, history editing and merging.

With the evolve and absorb extensions, Mercurial offers a very nice user experience for such a workflow. In some respects, even nicer than with Git!

It is awesome that Bitbucket now offers experimental support for evolve. It works well and it greatly improves the user experience.

What about branching? There are many methods to do this in Mercurial. If we just consider branches with one repository, we can use

  • unnamed branches (the simplest: a commit from a changeset which is not at the tip of a branch creates an unnamed branch),

  • named branches (long-term, for example for versions),

  • bookmarks (together with unnamed branches, they form something similar to Git branches. This is used for feature branches and hg-git),

  • and finally, topic branches.

Topic is an extension (contained in the hg-evolve package) for better feature branches in Mercurial. Topics really improve the user experience compared to bookmarks (better behavior of hg pull -u, convenient command hg stack, ...). Only drawback: a topic does not correspond to a Git branch when working with hg-git, so topics cannot be used to work on projects hosted on GitHub / GitLab.

With all these tools, we see that Mercurial is also awesome for a "GitHub-style" workflow based on pull requests (with one publishing repository and some non-publishing repositories).

Conclusions

Open-source software wars are a waste of time and energy. But diversity in open source is also important. It is good for the improvement of the tools and good for the users (who are also diverse).

With the rise of GitLab, the centrality of GitHub is being questioned in the open-source community. A next step could be to also reconsider the monopolistic position of Git, which is unreasonable and restrains a wider adoption of version control in communities not specialized in software engineering.

People in the open-source community should stop thinking and acting as if "version control == Git".

Mercurial should regain popularity, in particular among students and scientists, which could increase the use of version control by these people.

The main blocker for this change is the lack of GitLab-class tooling for Mercurial. In this perspective, the ongoing work to add Mercurial support to GitLab (called Heptapod) is very promising. One can also mention Kallithea, "a free software source code management system supporting two leading version control systems, Mercurial and Git".

My wish list about Mercurial and Bitbucket

  • Python 3 support. In 2019, Mercurial is the last tool for which I use Python 2.7. On most systems it is not so difficult to get Python 2.7 and pip for Python 2.7, but it brings extra complications. Being able to install Mercurial and its extensions with Python >= 3.6 would make things easier.

  • Bitbucket improvements (in particular visualization of topics).

  • hg-git improvements, in particular in terms of performance (parallelizing the conversion between Git and Mercurial objects would save time) and distribution (pip install hg-git --user should just work, whereas I often encounter incompatibilities between versions).