Documentary resources
Contents
- Chemists, physicists and open science
- Data management: the national and European framework
- Scientific integrity
- Scientific reproducibility
- Plan S
- Other resources
Chemists, physicists and open science
A report about obstacles to opening data by discipline: for over 50% of the physicists questioned as part of a survey conducted by Springer Nature, organising data efficiently is the main obstacle to opening data. This comes ahead of legal issues, choosing a repository and insufficient time. 41% of those questioned explained that they publish their data neither in the “supplementary materials” nor in repositories.
Study on motivations to share and reuse astrophysics research data: the study involved 9 professors, PhD students and post-docs in the Department of Physics at the University of Oxford. It addresses the factors that motivate and demotivate astrophysicists to share their data and reuse existing data, and how disciplines with lower sharing rates can be encouraged to share more. For example, two of the reasons given for not sharing data are the massive volume of data sets in astrophysics and the absence of metadata and tools to facilitate the reuse of data that are sometimes difficult to understand out of context.
Couperin’s report on the publication practices of French researchers: for over 70% of the chemists questioned as part of a Couperin survey in 2019, publishing in open access improves the visibility of their work. On the other hand, open archives such as HAL and ChemRxiv are not their first choice. Only 30% of those questioned state that they regularly deposit their publications, compared with over 80% of mathematicians, for example.
Implementation of a new interoperable format by the former INC director: as part of a plan to promote data sharing and access, chemist Dominique Massiot, CNRS senior researcher at CEMHTI, worked with American and Danish researchers to implement the Core Scientific Data Model (CSDM) format, which facilitates the exchange of data (such as spectra) between different software packages without losing metadata.
Open data in chemistry: an informal group of researchers and IT specialists called The Blue Obelisk formed at the beginning of the 2000s in San Diego to promote easier access to chemistry data. Their actions targeted open data, open formats and open source software in chemistry. An article published in the Journal of Cheminformatics describes the projects conducted between 2005 and 2011.
German NFDI4Chem consortium survey on data management practices in chemistry: conducted at the end of 2019 among 541 researchers, post-docs and PhD students in chemistry, the survey is part of the consortium’s initiative to digitise key steps of the research process. The results, published in December 2020, provide an overview of data management needs in chemistry. For example, 17% of respondents indicate that they use a laboratory notebook (the question also covered spreadsheet and word processing software). The results of the survey are available in English and German.
NFDI4Chem consortium roadmap: published in June 2020, the roadmap is organised into 6 key objectives reflecting the consortium’s vision of the need for chemistry to provide an infrastructure with services for each stage of the data life cycle: collection, storage, processing, analysis, publication and reuse. The third objective addresses, among other things, the need to move towards “intelligent” electronic tools in laboratories in order to improve the research data management process. This includes electronic laboratory notebooks.
Articles on open science in issue 62 of the journal Reflets de la Physique: published in June 2019, this issue puts open science in the spotlight with an editorial on the challenges in physics, a survey conducted in June 2018 by the Publications Commission of the French Physical Society among physicists on the state of the peer review system, and a forum about Plan S for open science and its limitations.
A book dedicated to the digitalisation of laboratories: first discussed at the American Chemical Society meeting in 1993, the issue of electronic laboratory notebooks (ELNs) is the subject of two chapters in a book dedicated to the digitalisation of laboratories (Wiley, June 2021). The authors review the key features of an ELN and also report on the implementation of the SciNote tool in a laboratory.
Article dedicated to the standardisation of computing practices in chemistry: published in JACS in August 2021, this article provides an overview of current practices for standardising and sharing data in the discipline, using machine learning as a common thread. The paper proposes a physical architecture that covers all phases of chemical synthesis, as well as a new means of exchanging chemical information: XDL. Together, these two elements allow the chemical synthesis process to be automated, up to the exploitation of the data by machine learning algorithms.
Data management: the national and European framework
Jussieu Call: after the Berlin Call (2003) and then the Amsterdam Call (2016) to encourage open access to scientific publications, the Jussieu Call, adopted in Paris on October 10, 2017, widens the debate to include data. The text supports the publication of data associated with articles and text mining.
The Digital Republic Bill (Loi pour une république numérique): passed on October 7, 2016, this French law promotes the circulation of public data and knowledge. Administrative data such as research data are covered by the text. The text serves two purposes: firstly, to limit the loss of scientific data, and secondly, to prevent data from falling into the hands of private publishers. Private publishers cannot restrict free use when data is generated by work supported by public funding (article 30, paragraph III, chapter on the knowledge economy). Nevertheless, the principle of free circulation and free publication assumes that the data is not confidential (trade secret, classified, personal data), with exceptions covered by the law.
The French National plan for open science: adopted in July 2018, the plan promotes “structure and open research data” (Directive 2). Its objectives include depositing data in repositories, creating new discipline-specific repositories, generalizing data management plans, publishing more data papers, and including data processing costs in calls for proposals.
The ANR data management plan: since 2019, all ANR projects must include a data management plan. The model proposed by the ANR includes six sections (reusing existing data, data documentation, storage, legal requirements, sharing, resources). A guide drafted by librarians working on data (Couperin working group on open science) was published in 2020 and can be helpful. Both the full version and a summary are available.
Recommendations of the Open Science Committee (Comité pour la Science Ouverte, CoSO): DMP assessment, adapted to how mature each discipline’s practices for opening data are, features in recommendations issued by the CoSO in response to a request by the ANR to facilitate the use of DMPs.
In the version published in April 2019, the European Directive on copyright stipulates that text mining is a competition issue in the EU. The text introduced an exception to copyright by authorizing research organisations (hence only public bodies) to reuse content to which they have legal access, in order to automate the collection of relevant data using algorithms. The directive has been transposed into French law via an ordinance adopted in 2021.
Study into the legal framework of text and data mining: this study preceded the negotiation of the European Directive reforming copyright.
Article 10 of the 2019 amendment of the European Directive on open data: extends the scope of public sector data to research data. It urges Member States to adopt national policies to make publicly funded research data openly available (‘open access policies’), in line with the ‘open by default’ principle and compatible with the FAIR principles. The text stipulates that the data can be re-used for commercial purposes.
Recommendation of the European Commission on access to and preservation of scientific information: adopted in 2018, the text, which has no binding legal value, describes the framework that the European Commission promotes to Member States. The EC urges them to implement an “effective system for depositing electronic scientific information […] including born-digital publications and related research output”.
Comparative European report: 14 out of 28 Member States voted in favour of opening research data. The report, drafted in 2019 by the Scholarly Publishing and Academic Resources Coalition (SPARC Europe), is available here.
OECD report on data repositories: a study based on 32 data deposit platforms in different fields (oceanography, neurosciences, physics, etc.).
Comparative study of the different infrastructures for data storage and data dissemination at international level: this study, published in January 2021, offers a perspective on 7 services developed in Australia, Norway, the Netherlands, Great Britain, Canada and Germany.
CNRS roadmap: made public on November 18, 2019, the document lists four actions for data management and three actions for text mining. The CNRS aims to support the implementation of data management in research facilities, the development of thematic data repositories, and tools enabling automatic content mining (provided that the methods for transposing the European Copyright Directive are specified).
Micado (Mission calcul et données) official report by the CNRS: published in January 2018, the report presents a highly informative picture of data management in the CNRS institutes. For physics, the huge volumes of data (between 200 TB and 10 PB a year depending on the facility) produced by the synchrotrons (SOLEIL and ESRF), neutron scattering facilities (ILL and Orphée) and the free-electron laser (X-FEL) triggered the implementation of a data management policy. In chemistry, there is less progress, yet the report describes ongoing discussions in the large research facilities RMN-THC (300 TB stored in 2017), RENARD (40 TB stored in 2017) and FT-ICR (260 TB stored in 2017, 5,000 TB expected in 2022).
Report on the future of scientific publishing in France. Published in November 2019, the report by Jean-Yves Mérindol to the Ministry of Research thoroughly investigates French national and international governance of open science while recommending the implementation of a support plan for scientific publishing, to be included in the law for research planning.
French national strategy for research facilities: published in 2021, the roadmap presents about one hundred research facilities. Fifteen of them are related to physics and chemistry.
Support on the legal framework for research data in France: as part of CommonData, the project led by Agnès Robin, lecturer in private law at the University of Montpellier, a seminar took place in January 2020 to discuss the legal challenges of opening data. Watch the video here.
CNRS Research Data Plan: released in November 2020, the document describes the actions that the CNRS wishes to carry out on the storage and opening of research data. In particular, it plans to create a “functional directorate for open data”, to advance the discussion on electronic laboratory notebooks by making recommendations, and to revise the evaluation criteria for researchers by taking into account the production of reusable data.
Scientific integrity
French Republic decree on scientific integrity. Published on 3 December 2021, the document concerns compliance with scientific integrity requirements by public institutions that contribute to the public research service and by foundations recognised as being of public utility whose main activity is public research. The decree changes the situation with regard to research data by adding ethical requirements framed by law to the FAIR and open science requirements. In Article 2, it encourages the dissemination of open access publications and the sharing of the methods and protocols, data and source code associated with research results. It also encourages the publication of so-called negative research results. In Article 6, it makes Data Management Plans the normal framework for all research and gives institutions new obligations in terms of data preservation.
European Commission report on the reform of the evaluation of researchers. Published end 202, the document deplores the perverse effects of the current system, where the culture of “publish or perish” comes at the expense of scientific quality and integrity. It therefore recommends that achievements other than publications be taken into account, such as peer reviewing, supervision of doctoral students, dissemination of science and sharing of research data.
HCERES (2019-2020) evaluation consortium: as part of its laboratory evaluation campaign, HCERES puts forward a number of quality criteria in favour of scientific integrity. Keeping laboratory notebooks is one of them. In particular, “referencing metadata in the notebook and writing down the result of daily work in the laboratory; the existence of a countersigning procedure, archiving, and managing enclosed files” are included.
Report of the CNRS “Qualité en recherche” network on the traceability of research activities: Published in 2018, the report mentions electronic laboratory notebooks as “recording and traceability tools”.
Scientific reproducibility
Survey from Nature in 2016: almost 90% of the chemists questioned in the Nature survey reported that they had already failed to reproduce another researcher’s experiment. This was the highest rate amongst the disciplines questioned. Even worse, 70% of the chemists reported failing to reproduce their own experiments, compared to 50% of physicists.
CERN policy: open-access data is not sufficient to ensure scientific reproducibility. In the field of high-energy physics, CERN has implemented a number of best practices to ensure the reproducibility and reuse of data. The CERN Analysis Preservation (CAP) service and the CERN reusable and reproducible research data analysis platform (REANA) are described in an article published in Nature Physics.
“No raw data, no science”: in an article published in February 2020, the editor-in-chief of the journal Molecular Brain announced his decision to require, in principle, the deposit of datasets supporting manuscript conclusions from March 1, 2020. In his publishing experience, he observed that a large proportion of authors prefer to withdraw their manuscripts rather than provide raw data when requested.
A book sprint on reproducibility: published in 2019, the document is the result of an original collaboration between researchers from different disciplines (biostatistics, physics-chemistry, computer science, neuroscience) and the URFIST in Bordeaux, the project leader. The goal was to “create a highly practical document designed by researchers for researchers”.
Plan S
Curated Resources: implemented by the main European research funding bodies (including the ANR), Plan S requires open access by 2021 for scientific articles resulting from projects they support. The Swiss National Science Foundation regularly inventories what is published on Plan S. 71 articles are already listed.
Other resources
France’s scientific profile: in the 2019 issue about the ESR, a whole section is dedicated to the “scientific and technological position of France in chemistry research”. France ranks as the 9th contributor to worldwide publications in chemistry, with special mention for inorganic and nuclear chemistry and for composite materials. The 2020 issue mentions the broad influence of French publications in geochemistry, geophysics, astronomy and astrophysics.