Promoting your codes and software
- Introduction
- Definition
- Best practice
- Archiving and distribution of codes via Software Heritage and HAL
- How to do it?
- Some resources and tutorials for reporting codes and software in HAL
- Linking your codes to data and publications
- Some examples
- Writing a software paper
Introduction
A research code is a scientific output in the same way as publications and data. Software is one of the pillars of the research process. It is therefore essential to preserve it over the long term and to share it. Promoting research codes is the subject of the third axis of the 2nd National Open Science Plan (2021-2024), and an increasing number of research establishments and organisations are taking it into account. In its open science policy, the ANR has set itself the objective of “contributing to the sharing and opening up of research data, source codes and software”. In its 2023 Call for Generic Projects, the ANR recommended that the codes and software developed as part of a project be archived in Software Heritage, distributed under an open licence and described in HAL, indicating the reference (code décision) of the ANR project.
Definition
How do you define a source code? The Bothorel report proposes the following definition: “source code can be defined as a set of instructions that can be executed by a computer”. The Software and Source Codes College of The Committee for Open Science (COSO) defines research codes and software as follows: “research software is developed to meet specific scientific needs. It is designed, maintained and used by scientists (researchers and engineers) and research institutions, possibly on an international scale. They may arise from research work or encourage it, in particular through publications before/on/around/with the software. They can be formalised in different ways (a platform, middleware, workflow or library, module or plug-in for another piece of software), interacting as part of an ecosystem or, on the contrary, being more autonomous”.
In summary:
- Algorithm: describes the procedure for solving a given problem.
- Source code: implementation and formalisation of the algorithm in a computer language (e.g. Python, C++, Java, etc.). It consists of one or more text files.
- Executable: translation of source code (generally via a compiler or interpreter) into binary code that can be understood by a computer.
- Software: in general, the overall package comprising the source code and/or executable, the documentation, usage examples, any dependencies and, of course, the associated licence.
Best practice
To ensure that code produced as part of a research project can be shared and reused, a number of best practices are recommended:
- build it on a forge
- manage versions with Git
- document it
- choose a suitable licence
- check that it is archived and report it in HAL
- write a software paper
These best practices are detailed on the ….. page
A passport in the “Passport to Open Science” collection addresses the specific issues involved in opening up the code and software produced and used in scientific research. The report Scholarly infrastructures for research software makes recommendations and proposes good practice at European level for research software infrastructures.
Archiving and distribution of codes via Software Heritage and HAL
The loss of codes that have been used for scientific production is a regular occurrence. Source code is fragile (obsolete formats, hardware problems, dependencies that disappear). It is therefore essential to archive code and software, that is, to preserve them for the long term. Since 2016, archiving has been provided by the Software Heritage (SWH) universal source code archive. This programme, launched by Inria, is supported by UNESCO (1). SWH collects all software that is publicly available in source code form, from code hosting platforms such as GitHub, GitLab.com or Bitbucket, and from package archives such as npm or PyPI. SWH assigns a permanent identifier (SWHID) to each archived code. Thanks to a collaboration launched in 2018 between the CCSD and SWH, it is now possible to reference on HAL the codes archived on SWH via the SWHID. The two platforms therefore complement each other: codes and software referenced on HAL gain high visibility and are part of an open science approach, while permanent archiving is provided by Software Heritage. Metadata moderation guarantees the quality of the repository and the reusability of the code. HAL offers various export formats to facilitate citation.
How to do it?
Copy the contextualised SWHID and insert it in the HAL record. This makes it possible to recover all the associated metadata. The contextualised SWHID is available from the Permalinks tab: select the type “dir” (directory) and tick the “contextual information” box. The identifier will then include the following qualifiers: origin, visit and anchor.
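Before pasting an identifier into a HAL record, it can help to check that it has the expected shape. The following is a minimal sketch in Python, not an official tool: it assumes the usual SWHID syntax (a core identifier followed by qualifiers separated by semicolons), and the function names, the example URL and the hashes are placeholders invented for illustration.

```python
import re

# Core SWHID: scheme, version 1, object type, 40 hexadecimal digits.
CORE_RE = re.compile(r"^swh:1:(cnt|dir|rev|rel|snp):([0-9a-f]{40})$")


def parse_swhid(swhid):
    """Split a SWHID into its object type, hash and qualifiers."""
    core, *qualifier_parts = swhid.strip().split(";")
    match = CORE_RE.match(core)
    if match is None:
        raise ValueError(f"not a valid core SWHID: {core!r}")
    qualifiers = {}
    for part in qualifier_parts:
        # Split on the first "=" only, since an origin URL may contain "=".
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"malformed qualifier: {part!r}")
        qualifiers[key] = value
    return {
        "type": match.group(1),
        "hash": match.group(2),
        "qualifiers": qualifiers,
    }


def has_expected_context(swhid):
    """Check for type "dir" plus the origin, visit and anchor qualifiers.

    This mirrors the description in the text; it is not HAL's own validation.
    """
    parsed = parse_swhid(swhid)
    required = {"origin", "visit", "anchor"}
    return parsed["type"] == "dir" and required <= parsed["qualifiers"].keys()


# Placeholder identifier: the hashes and the URL are made up.
EXAMPLE = (
    "swh:1:dir:" + "a" * 40
    + ";origin=https://example.org/hypothetical/repo"
    + ";visit=swh:1:snp:" + "b" * 40
    + ";anchor=swh:1:rev:" + "c" * 40
)

if __name__ == "__main__":
    print(has_expected_context(EXAMPLE))
```

A bare identifier without qualifiers (for example one copied with the “contextual information” box unticked) would parse correctly but fail the `has_expected_context` check.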
Some resources and tutorials for reporting codes and software in HAL
See the article “Create software deposit in HAL” (hal-01872189v2) and the CCSD page. The Cellule Data Grenoble Alpes also gave a webinar in 2023 on referencing software via HAL and Software Heritage (slides and video in French).
Linking your codes to data and publications
Enabling the reusability and reproducibility of what has been produced in the course of research is an important issue in the context of open science. One of the ways of encouraging this reusability and reproducibility is to link publications to the data and codes that led to their publication.
Axis 3 of the 2nd National Open Science Plan provides for the “development of effective links between software forges, open publication archives, data repositories and the scientific publishing community”. To ensure this link, simply add permanent identifiers such as the DOI or SWHID to the descriptive metadata for publications, data and codes.
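The idea of cross-referencing can be sketched as a small consistency check. The Python example below is only an illustration: the record structure, the field names (`id`, `related`) and all identifiers are hypothetical placeholders, and do not correspond to the metadata schema of HAL, Recherche Data Gouv or Software Heritage.

```python
# Hypothetical metadata records; identifiers are placeholders, not real deposits.
PUBLICATION_ID = "hal-00000000"
DATASET_ID = "doi:10.0000/dataset-placeholder"
CODE_ID = "swh:1:dir:" + "0" * 40

records = {
    "publication": {"id": PUBLICATION_ID, "related": [DATASET_ID, CODE_ID]},
    "dataset": {"id": DATASET_ID, "related": [PUBLICATION_ID, CODE_ID]},
    "code": {"id": CODE_ID, "related": [PUBLICATION_ID]},
}


def missing_links(records):
    """Return (source, target) pairs where source does not reference target."""
    missing = []
    for name, record in records.items():
        for other_name, other in records.items():
            if other_name != name and other["id"] not in record["related"]:
                missing.append((name, other_name))
    return missing


if __name__ == "__main__":
    # The code record above does not yet point to the dataset.
    print(missing_links(records))  # prints [('code', 'dataset')]
```

In this sketch, each output carries the permanent identifiers of the two others, so a reader arriving at any one of them can find the rest.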
Some examples
In the submission form of the Recherche Data Gouv repository, you can indicate:
- the DOI or HAL identifier of the publication in the “associated publication” field
- the contextualised SWHID in the “calculation workflow” section
For this dataset, the SWHID has been specified in the metadata tab.
In HAL, for example, for this code and that one, the associated publications have been mentioned.
This communication posted in HAL indicates both the dataset deposited in the Recherche Data Gouv repository and the code archived in Software Heritage and HAL.
Writing a software paper
To highlight the codes produced as part of your research, you can write a software paper. The Institut Pasteur gives this definition: “a software paper (or software article or software tool article or article on software) is a peer-reviewed publication whose aim is to present software to the scientific community. Unlike a traditional publication, the aim of a software paper is not to share a significant result, but to describe software that has been developed for research purposes, including the objectives that led to its development, the design process, technical details on how it works, instructions on how to use it, its potential for reuse, etc.”
CIRAD’s “Where to publish” database can be used to identify journals that accept software papers. CIRAD has also produced a list of around a hundred titles that accept data papers or software papers: for example, Geoscientific Model Development (Copernicus Publications), Journal of Open Research Software (Ubiquity Press), Software Impacts (Elsevier), SoftwareX (Elsevier).
The Software Sustainability Institute has published a page entitled “In which journals should I publish my software?”, which lists a number of titles by discipline, particularly in engineering, physics and geosciences. EPFL also published a list of data and code journals in 2022.
In an article published in PLOS Computational Biology, JD Romano and JH Moore present ten rules to follow when writing a software paper. For more information, you can consult the following resource (in French) on DoRANum: Software papers, 2024.
There is another type of article that allows you to promote your code in a practical and direct way: executable articles. The Institut Pasteur describes them as follows: “the publication is packaged in the form of dynamic software that combines text, data and the code used for analysis. The publication is therefore interactive: the reader (or reviewer) can interact with the data and replay (or even modify) the codes used. The idea is to allow the reader to reproduce each step taken to arrive at the conclusions of a publication”. The journal eLife offers a collection of executable articles. The Institut Pasteur also provides some advice and feedback on publishing this type of article.
(1) See the article presenting the Software Heritage ecosystem: Roberto Di Cosmo, Stefano Zacchiroli, “The Software Heritage Open Science Ecosystem”, in Software Ecosystems, Springer International Publishing, pp. 33-61, 2023, doi:10.1007/978-3-031-36060-2_2