v1.2 - 2016/09/14

Software compatible with the scientific method must allow researchers to be fully in control of their computations. Their results must be reproducible and falsifiable by anyone without restrictions. Research and educational documents should not restrict access, sharing and study. In order to implement and encourage good scientific software and digital documents, appropriate licenses should be applied.

Contents

  1. Scientific Software
  2. Non-scientific Software
    1. Related Issues
  3. Licenses for Software
    1. Absence of License
    2. Software Patents
  4. Licenses for Other Works
  5. Further Resources

Scientific Software

The concern of users to be in control of a program is common to the Free Software movement. This community has a social mission. Free software [1] refers to software that respects users' freedom. It is also called libre software to avoid ambiguities (the issue is not about price). Roughly, it means that the users have the following freedoms:

  1. The freedom to run the program as wished, for any purpose.
  2. The freedom to study how the program works, and change it as needed.
  3. The freedom to redistribute copies, to help neighbors.
  4. The freedom to distribute copies of modified versions to others. The whole community will then benefit from the changes.

When users don't control the program, it is said to be a "non-free" or "proprietary" program. The freedoms listed above turn out to be a precondition for scientific software.

An additional requirement for software to serve scientific purposes is the distribution of the code. If a developer does not publish a program, it still qualifies as libre software in a trivial way (the user being also the developer, she has full control over the program). However, it may not qualify as a program consistent with the scientific method if it cannot be validated by anyone and without restrictions. While for short and trivial programs it may be acceptable to provide the mathematical representation of an algorithm in place of the code itself, the reproducibility of less trivial code relies on respect for the four fundamental software freedoms together with code availability. The mathematical description of an algorithm remains a useful support, together with complete code documentation.

Examples of libre scientific software include Maxima and GNU Octave, as well as several scientific Python libraries available under free licenses (mostly MIT or New BSD, see below). This means that they are constantly open to peer review and can be modified and redistributed in both original and modified form. Most of them are well documented and widely used: numpy, scipy, sympy, matplotlib. User-friendly notebook interfaces based on ipython are also available and can be loaded within a web browser. For very demanding computations, a low-level programming language such as C or C++ should be considered, together with libraries such as the GNU Scientific Library.
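As a minimal sketch of how such libraries allow a result to be checked independently, the following computes the Gaussian integral algebraically with sympy and compares it with a plain numerical sum built with numpy. The choice of integral, the grid and the tolerance are illustrative assumptions, not taken from any particular project.

```python
import numpy as np
import sympy as sp

x = sp.symbols("x")

# Algebraic computation: closed form of the Gaussian integral over the real line.
exact = sp.integrate(sp.exp(-x**2), (x, -sp.oo, sp.oo))

# Numerical computation: Riemann sum on a fine grid. The integrand is
# negligible beyond |x| = 10, so truncating the domain is harmless here.
grid, step = np.linspace(-10.0, 10.0, 200001, retstep=True)
approx = float(np.sum(np.exp(-grid**2)) * step)

# The two independent results should agree to within the chosen tolerance.
agree = abs(float(exact) - approx) < 1e-9
print(exact, approx, agree)
```

Because both libraries are distributed with their source code, anyone can inspect how either number was obtained, which is the point made above.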

Non-scientific Software

A typical example of a program that does not satisfy fundamental scientific requirements is Wolfram Mathematica. Because Mathematica is proprietary software, its developers are in control of the computations performed by users. Privacy issues also arise from the impossibility of knowing about possible back-doors, since only binary files that humans cannot read are distributed to the users; this should be of particular concern within large collaborations. Adaptations of built-in functions to specific needs are not possible. Like all software, Mathematica's proprietary libraries contain bugs [2]. Unfortunately, like all proprietary software, it prevents users and researchers from peer reviewing its source code by studying it, changing it and running the original or modified versions. Hence, it represents an unnecessary obstacle to falsifiability. A valid replacement is given by the scientific Python libraries mentioned above, which can perform both numerical and algebraic computations.

Source availability is a necessary condition for software to be scientific, but it is not a sufficient one. Indeed, programs whose source code is available may still be proprietary software. One example is Numerical Recipes [3]. The source code is available, both as text files and as paragraphs of the related book. The book's excellent description of fundamental algorithms has served as an important reference for many physicists. Regrettably, the software was released under a proprietary license. Even the use of the programs is seriously restricted, including after a license has been purchased. Distribution of the software is not allowed (in either the original or modified versions), which prevents proper verification of computations. Adaptations or copies of the source code (extracted from the software or from the book) constitute copyright infringement. Academic citations alone do not avoid such infringement. Rather, specific exceptions must be requested (and may not be granted).

Related Issues

Use of libre standards (for example, digital document formats) is essential for document preservation. Proprietary formats are deliberately incompatible with most software, except for a few proprietary programs. Even then, backward compatibility with previous versions of a program is often not guaranteed. This endangers future accessibility. Furthermore, proprietary standards are usually accessible only after the purchase of extremely expensive programs. This should be a concern for institutions depending on public funds. Note, however, that reasonable pricing is compatible with the four fundamental software freedoms.

Digital Restrictions Management (DRM) comprises nasty methods that should definitely be rejected, especially in a scientific context. DRM systems are programs that aim to concentrate control over the production and distribution of media. They impose technological restrictions that control what users can do with digital media, so that a de facto damaged good is delivered. These restrictions are often applied to digital movies and music, but also to digital books. For example, DRM can prevent reading a legitimately purchased ebook on different devices, sharing it with a colleague or borrowing it from a library.

Privacy and security are further issues that typically affect proprietary programs more than libre software. Proprietary programs often have back-doors accessible by developers (and possibly by crackers), through which sensitive information is leaked. This should be of concern for individuals, but also for large scientific collaborations. The worry arises, for example, when using proprietary instant messaging, video chats or, more generally, cloud services (which are nothing other than other people's computers).

To depend on libre software only, appropriate hardware should be chosen. Unfortunately, hardware producers do not always release firmware as libre software. In these cases, only binaries that humans cannot read are distributed. Wireless and GPU cards are nowadays particularly affected by this issue. In the absence of appropriate specifications from the producer, libre firmware can only be obtained by reverse engineering, which is a difficult process. Hence, it is important to choose hardware responsibly.

Licenses for Software

The Free Software Foundation (FSF) maintains the GNU General Public License (GPL), based on the four essential freedoms. Hence, this license is also well suited to scientific programs. Other common free licenses are the MIT (also called Expat) and the modified BSD (also called New BSD) licenses. They are used, for example, by most Python libraries. These are lax, permissive licenses. They do not guarantee that derivative works based on free software will remain free. While the GPL requires derivative works to be licensed under the same terms, forks of MIT/New BSD software can be incorporated into proprietary programs. Under the GPL, any distributed derivative work can also contribute back to the original code and community, while this is not necessarily the case for permissive licenses. It is good practice to release scientific software under the more rigorous and protective GPL.

The GPL does not require publication of derivative work. Rather, it states that if a derivative work is shared, then the source code must also be made available. However, it is desirable that any modification to a program be made publicly available if it is used in research. This can happen at the time of publication of the results. To avoid license proliferation, it seems good practice to use the GPL anyway, adding a request rather than a requirement, since a requirement would not be compatible with the GPL. In fact, the license states that it is not possible to add constraints beyond those already present. Such requests can be stated in the README of a program, since modifying the content of the GPL would also require renaming it and modifying its preamble.

Another common academic practice is to acknowledge the program authors not only through a copyright notice, but also by properly citing any publication for which the software was conceived. The GPL allows author attribution to be required through additional terms, stated either in a separately written license or as exceptions.
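The two practices above can be sketched as the top of a source file: a copyright notice, the standard notice the FSF recommends for GPLv3-or-later programs, and a separate citation request. The author name, the module and the `CITATION_REQUEST` wording are hypothetical, and the request is written as a courtesy rather than a license condition.

```python
# Copyright (C) 2016  Jane Researcher  (hypothetical author)
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <https://www.gnu.org/licenses/>.

"""Hypothetical analysis module, used only to illustrate the header above."""

# A request, not a license term: it asks for a citation and for public
# modifications, without adding any restriction to the GPL itself.
CITATION_REQUEST = (
    "If you use this program in research, please cite the accompanying "
    "publication and make your modifications public when you publish."
)


def citation_request() -> str:
    """Return the citation request, e.g. for printing at program start-up."""
    return CITATION_REQUEST
```

The same request text can be repeated in the README, so that it is visible without opening the source.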

It has been suggested [4] that scientific journals should require the availability of the source code when an article is submitted for publication. This, combined with a free license such as the GPL completed with additional terms for author attribution, would guarantee the scientific nature of research software and proper acknowledgment.

Exceptions to the GPL as the best scientific license choice may arise. It is good practice to release a derivative work under the same (free) license as the original one, even if it differs from the GPL. This facilitates future merging of a project's forks. For example, while the MIT and New BSD licenses are compatible with the GPL (i.e., work released under MIT or New BSD can be integrated into a GPL project), the converse is not true. A public domain dedication or a permissive license (such as the Apache 2.0 license) may be considered for short programs, say fewer than 300 lines. If a program is conceived to be integrated into other works (e.g., a software library), a weak copyleft license such as the GNU Lesser General Public License (LGPL) may be considered and preferred to other common but lax choices (MIT/New BSD).
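The one-way nature of this compatibility can be sketched as a small directed relation. The table below encodes only the cases stated in the text; the names `CAN_FLOW_INTO` and `can_integrate` are hypothetical, and this is in no way a complete or authoritative compatibility matrix.

```python
# Code under the key license may be integrated into a project distributed
# under any license in the associated set. Only relations stated in the
# text are listed; other license pairs are deliberately left out.
CAN_FLOW_INTO = {
    "MIT": {"MIT", "GPL"},
    "New BSD": {"New BSD", "GPL"},
    "GPL": {"GPL"},
}


def can_integrate(source_license: str, project_license: str) -> bool:
    """Return True if code under source_license may enter the project."""
    return project_license in CAN_FLOW_INTO.get(source_license, set())


# MIT code can enter a GPL project, but GPL code cannot enter an MIT one.
print(can_integrate("MIT", "GPL"))
print(can_integrate("GPL", "MIT"))
```

Reading the relation as a direction of flow makes it clear why keeping the original license on a fork preserves the option of merging back.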

Absence of License

A license should always be stated. Users are then unambiguously aware of the conditions under which they are allowed to use the software. Public domain dedications should also be stated clearly, for example through the Creative Commons CC0 ("No Rights Reserved") dedication.

Indeed, regulations about software that does not declare a license vary substantially from country to country. As an example, under U.S. law the copyright holder possesses the exclusive right to prepare derivative works. Unless users are provided with a license, they will not be able to modify and share the work.
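A quick way to notice a missing license is to look for a license file at the top of a project. The sketch below assumes the conventional file names listed in `LICENSE_FILENAMES`; both that list and the function name are illustrative assumptions, and finding a file says nothing about whether its terms are appropriate.

```python
from pathlib import Path

# Conventional names only; a project may state its license elsewhere.
LICENSE_FILENAMES = ("LICENSE", "LICENSE.txt", "LICENSE.md", "COPYING", "COPYING.txt")


def find_license_files(project_dir):
    """Return the sorted names of license files found directly in project_dir."""
    root = Path(project_dir)
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_file() and entry.name in LICENSE_FILENAMES
    )
```

An empty result for a repository is a prompt to ask the authors which license applies before modifying or sharing the code.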

Software Patents

Software patents and copyright law pertain to different domains. This is so despite the fact that they are often conflated through the misleading use of the term "intellectual property". While copyrights (addressed by licenses) cover works, patents cover ideas. Unfortunately, patent laws are far from reasonable.

Patents are government-issued monopolies on using a certain idea, which typically last 20 years (a considerable amount of time in the fast-developing scientific community). Extremely trivial ideas are systematically patented. An example is the Elsevier patent on online peer review. It departs slightly from typical peer review in its discussion of what it calls a "waterfall process". In short, authors who are rejected by one journal are given an opportunity to immediately submit somewhere else. That a monopoly can be claimed on such an obvious idea is in itself debatable. Furthermore, this process was described well before the date claimed by Elsevier, and was known as "cascading review" [5]. The ultimate use of the patent is likely to be as leverage to prevent the rise of competing free-access journals.

Since ideas as trivial as a status bar can be patented (without the need to implement them in practice), and claims are often written in very technical yet ambiguous terms, an average program typically violates dozens of patents. This is unavoidable, since any algorithm will cross trifling (but patented) ideas. In practice, patent law is used by groups that already own thousands of patents to force the acquisition of monopolies on ideas granted to competitors.

Changing the effect of patents would be a most welcome approach. Developing, distributing, or running a program on generally used computing hardware should not constitute patent infringement. In the meantime, universities and research institutions must avoid selling patents to so-called "patent trolls", a practice that goes against the interest of research groups and individuals. Version 3 of the GNU General Public License has a section that prevents patents from making a program effectively non-libre. This license should therefore be preferred over more lax choices.

Licenses for Other Works

Principles related to software freedom also extend to other works. Research and educational documents should be accessible and sharable.

Documents with a practical purpose (e.g., software documentation, lecture notes, exercises) can be released under the GNU Free Documentation License (FDL), or under one of the Creative Commons licenses. An example is the Creative Commons Attribution-ShareAlike license (CC BY-SA), which grants permission to copy and redistribute both original and modified copies of the material (giving appropriate credit) in any medium or format. Creative Commons licenses are particularly useful for scientific data.

Works of opinion should also allow distribution of the original material. However, in this case it may be appropriate to prevent distribution of derivative material. This can be achieved through a Creative Commons Attribution-NoDerivatives license (CC BY-ND).

We stress that free licenses usually do not restrict the use of the respective works for commercial purposes. Free software developers and authors of freely licensed books may (and should, if need be) seek appropriate funds. This can be accomplished by directly selling the software, services (such as technical support) or documents, or by establishing donation plans.

Further Resources

A complete guide to software and other works licensing can be found on the pages of the Free Software Foundation [6].

A fundamental reading is Free Software Free Society: Selected Essays of Richard M. Stallman (GNU Press, 3rd Edition, 2015), a collection of essays by the FSF founder. It can be purchased from the FSF shop, or downloaded from the GNU website.


1. R. M. Stallman, Free Software Free Society: Selected Essays of Richard M. Stallman, GNU Press (3rd Edition, 2015).
2. A. J. Durán, M. Pérez and J. L. Varona, Misfortunes of a mathematicians' trio using Computer Algebra Systems: Can we trust?, Notices Amer. Math. Soc. 61, 1249-1252 (2014).
3. W. H. Press, S. A. Teukolsky, W. T. Vetterling and B. P. Flannery, Numerical Recipes: The Art of Scientific Computing, Cambridge University Press (3rd Edition, 2007), ISBN 0-521-88068-8.
4. D. C. Ince, L. Hatton and J. Graham-Cumming, The case for open computer programs, Nature 482, 485–488 (23 February 2012), doi:10.1038/nature10836.
5. E. Harmon and D. Nazer, Stupid Patent of the Month: Elsevier Patents Online Peer Review, eff.org, August 31, 2016.
6. https://fsf.org/.