Resources

Key players in data management

Existing networks bring together various professions (IT specialists, researchers, scientific and technical information professionals, etc.) to focus on data management.

We will not present the many networks and organizations in astronomy and astrophysics, which have a very structured scientific community. We only mention a few networks in chemistry and physics (whether or not associated with other disciplines such as astronomy) and related disciplines.

The discipline-specific networks in France

CC-IN2P3

The Centre de Calcul de l’IN2P3 is a CNRS service and research unit (USR6402), attached to the IN2P3 institute. The center offers hosting, computing, and data storage solutions. Big data processing is also possible. Support is available for help with access conditions. See the FAQ.

France Grilles

The GIS FRANCE GRILLES provides access to an infrastructure of machines (hardware) and associated services (software). The CEA, Inria and Inserm are among the organization’s partners. Services include FG-DIRAC (distributed computing task management), FG-IRODS (storage), FG-cloud (IaaS cloud service) and FG-SOL (preservation and reuse of research software; this service is currently open only to pilot laboratories). See documentation here. The network also organizes a computing and data event, the Journées Calcul et Données (JCAD). See videos of the 2019 edition here and of JCAD 2018. The JCAD event is a continuation of the SUCCES Days (scientific meeting of grid, cloud, storage and regional center users), organized by France Grilles, Grid5000, GDR RSD, and the computing group Groupe Calcul. Videos from 2015, 2016 and 2017 are available.

The international discipline-specific networks

ESCAPE

The European Science Cluster for Astronomy and Particle Physics ESFRI research infrastructures (ESCAPE) aims to address the challenges of open science in cooperation with other pan-European research infrastructures (CERN, ESO, JIVE) in astronomy and particle physics. ESCAPE’s actions will focus on developing solutions for the big data sets processed by the ESFRI infrastructures.
The solutions concern integration with the European Open Science Cloud (EOSC) and the production of “FAIR” data. Deliverables and reports are available here.

EGI

The European Grid Infrastructure (EGI) provides advanced IT services for scientists in most disciplines, multi-national projects, and research facilities. Services include data computation, storage, archiving, transfer, security, and applications such as Notebooks. Access to these services is available for a fee. Other services, illustrated by case studies, are made available by members and partners of the EGI community.

OpenCern

CERN has long-standing expertise covering all aspects of big data management: data collection, processing, analysis, storage, dissemination, and reuse. CERN also became involved in open science very early. It makes its expert knowledge available in numerous ways:

  • Opendata portal: single entry point to the research data produced at CERN (more than 2 petabytes). These data are linked in particular to the ATLAS, ALICE, CMS, LHCb and OPERA projects. The portal publishes the results of research activities, including the software and supporting documentation required to understand and analyze the shared data.
  • CERN uses the Worldwide LHC Computing Grid (WLCG), which gives over 1,200 physicists access to its resources.
  • The documents, reports, bulletins, preprints, photos, images, and videos from CERN are available on the CERN Document Server.
  • INSPIRE, whose new version has been made publicly available, is a scientific information platform for high-energy physics. It includes eight interconnected databases on scientific literature (articles, preprints, etc.), lectures, institutions, journals, researchers, experiments, jobs, and data. INSPIRE works closely with arXiv, ADS, HEPData, ORCID and the Particle Data Group (PDG).
  • Zenodo, a multi-disciplinary repository of research data, provides access to 1.5 million data sets.
  • CERN openlab aims (via a public-private partnership) to develop IT solutions for the scientific community. In 2017, CERN published a white paper about IT challenges for scientific research.

Chemistry at RDA

The Research Data Alliance (RDA) is an international organization set up in 2013 to encourage data sharing. It brings together 137 countries and over 8,400 members (researchers, data professionals and information professionals) from all disciplines.

Chosen themes include metadata (with the catalog of discipline-specific metadata standards), data interoperability and data citations. There are also discipline-specific groups, such as the Chemistry Research Data Interest Group.

NFDI

The National Research Data Infrastructure in Germany (NFDI) aims to systematically manage scientific and research data, to ensure long-term data storage, backup and accessibility. The disciplinary consortium NFDI4Chem, for the management of research data in chemistry, offers various services:

  • support for setting up an electronic lab notebook
  • a list of chemical repositories
  • specific resources to facilitate the creation of a data management plan
  • a repository for chemistry ontologies: NFDI4Chem Terminology Service
  • information and recommendations for the digitisation of all key steps in chemical research: NFDI4Chem Knowledge Base

PUNCH4NFDI is the NFDI consortium of particle, astroparticle, hadron and nuclear physics. The NFDI4phys consortium represents nine fields of physics: atoms and molecules, optics & photonics, cold plasma, biological physics, dynamics, statistical physics & soft matter, socio-economic physics, quantum information & artificial intelligence, biomedical physics, and transdisciplinary physics.

French institutions

Data management clusters

The data management clusters are part of the national ecosystem set up under Research Data Gouv. They are geographically close to research teams so that they can provide researchers with first-line expertise in sound research data management. As of September 2023, nineteen data management clusters are part of the national network. The network of data management clusters is run by a board.

Resource centers

In the national Research Data Gouv ecosystem, the resource centers provide support to establishments offering data management services. They include DoraNum, Opidor, the GIS of Urfist and the Repository catalog resource center.

Thematic reference centers

The thematic reference centers are another essential part of the national Research Data Gouv ecosystem. Thanks to their nationally and internationally recognised expertise, they can provide specific, discipline-based support. For astronomy and astrophysics, there is the Strasbourg Astronomical Center; for the Earth system and environment, there are Data Terra and PNDB; for biology and health, there is IFB.

INIST

INIST (the Scientific and Technical Information Institute) is an important player in France for promoting best practices in data management. As a founding member of the DataCite consortium, INIST is the French agency that assigns DOIs (identifiers suited to data). In line with the FAIR principles, INIST offers many services for data management and use. To that end, INIST created Opidor, a set of tools and services for optimizing research data sharing and interoperability.

  • CatOpidor: a catalog with descriptions of services in France for reliable data management (computing platform, data platform, management platform, assistance in data management). Search is possible by type of service, by location or by discipline.
  • DMP OPIDoR: a tool that helps prepare an online Data Management Plan (DMP).
  • PidOpidor: an application for research organizations for bulk DOI management and assignment.

The Open Science Committee (CoSO)

The CoSO is responsible for implementing the French National Plan for Open Science, published in July 2018. It brings together 200 members from all fields, organized into bodies, one of which addresses research data. A blog features national and international news and the committee’s work. In 2020, the research data body plans to carry out a feasibility study for a shared data repository and an investigation into the use of digital tools and research data in French scientific communities, and to focus on the appropriation of open science by discipline-specific communities.

International institutions

DataCite

DataCite plays an essential role in research data management. Founded in 2009, it brings together data centers, libraries, government bodies, universities, and research organizations from 20 countries.

One of DataCite’s primary responsibilities is to provide persistent identifiers (DOIs) for research data. A tool also uses a data set’s DOI to present information about that data set according to different bibliographic standards (e.g. American Chemical Society or IEEE).
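As a rough illustration of how bibliographic information can be obtained from a DOI, the sketch below uses DOI content negotiation, where the `Accept` header asks the resolver for a formatted citation. This is an assumption about one possible route and not necessarily the tool referred to above. The DOI shown is a hypothetical placeholder, the function names are ours, and the style names (`apa`, `ieee`, `american-chemical-society`) are assumed to follow Citation Style Language identifiers.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen


def build_citation_request(doi: str, style: str = "apa") -> Request:
    """Build (but do not send) a request asking for a formatted citation."""
    # The DOI is appended to the resolver URL; "/" is kept unescaped.
    url = "https://doi.org/" + quote(doi, safe="/")
    # The Accept header selects a text citation in the requested style.
    headers = {"Accept": f"text/x-bibliography; style={style}"}
    return Request(url, headers=headers)


def fetch_citation(doi: str, style: str = "apa") -> str:
    """Send the request and return the citation text (needs network access)."""
    with urlopen(build_citation_request(doi, style), timeout=30) as response:
        return response.read().decode("utf-8")


# Hypothetical placeholder DOI, used only to show the shape of the request.
request = build_citation_request("10.1234/example-dataset", style="ieee")
print(request.full_url)
print(request.get_header("Accept"))
```

Calling `fetch_citation` with a real data set DOI would return the citation as plain text. The request is built separately from the network call so that it can be inspected without contacting the resolver.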

The Metadata Working Group develops metadata standards. One of the group’s major achievements is the publication of the metadata standard recommended by DataCite, with associated examples. DataCite’s roadmap can be consulted here. DataCite also has a support service for using the different services it offers.

CODATA

The Committee on Data (CODATA) is the data committee of the International Science Council (ISC). The purpose of the committee is to improve the availability and use of data in all research fields via international collaboration. France is one of the 20 associate member countries, alongside many international scientific unions, including the International Union of Pure and Applied Chemistry, the International Union of Pure and Applied Physics and the International Union of Crystallography.

CODATA focuses on data interoperability and ease of use. One of the issues in the organization’s strategic programme is Fundamental Physical Constants. Another working group focuses on nanomaterials. Achievements include the publication, since 2014, of a data journal entitled Data Science Journal (see Datajournals) and a blog to inform the community. CODATA collaborates with large data conferences such as SciDataCon and International Data Week (IDW) and participates in the working groups set up by RDA.

World Data System

World Data System (WDS) (Trusted Data Services for Global Science) is an interdisciplinary organization created in 2008 by the International Science Council. WDS has 81 members (including the Centre de Données astronomiques in Strasbourg). Its many active working groups collaborate with Research Data Alliance groups. WDS comprises two offices: the International Program Office (IPO), which coordinates WDS operations, implements the decisions of the Scientific Committee and performs day-to-day tasks; and the International Technology Office (ITO), whose mission is to build trustworthy and enduring global research data infrastructure for the public good. Also noteworthy is the network for early career researchers. Together with CODATA, WDS also organizes “SciDataCon”, a congress that takes place every two years.

GO FAIR

GO FAIR Data was initiated by open data players to promote the application of the FAIR principles, which make data findable, accessible, interoperable, and reusable. The self-managed GO FAIR ecosystem is open to researchers, institutions and organizations, which work in networks. Its actions aim to spread the word about data management in establishments and to train staff responsible for data processing.

GO FAIR, Research Data Alliance, CODATA and World Data System have published the Data Together statement, which signals strengthened cooperation between the four main international organizations for research data.

The European infrastructures

European Open Science Cloud

The European Open Science Cloud (EOSC) is a portal for uniting European open science initiatives. The project was launched in 2016. An EOSC statement has been approved by more than 70 institutions, leading to a roadmap for implementing the EOSC’s actions: architecture, data, services, access and interfaces, rules, and governance. The current portal lists available services and training aids related to data management (e.g. on text searching).

EUDAT

EUDAT Collaborative Data Infrastructure (or EUDAT CDI) is a collection of services for the promotion of collaborative research data management in Europe. Backed by a network of over 20 European research organizations and data and computing centers, EUDAT has been developed through close collaboration between over 50 research communities from many scientific disciplines. See information about available services.

OpenAire (Open Access Infrastructure for Research in Europe)

Backed by several European programs (DRIVER, OpenAire and OpenAirePlus), OpenAire, which currently unites 31 countries, aims to bring together all the players involved in open science through 34 national offices (National Open Access Desks or NOADs). The OpenAire Explore portal references 37 million publications, 1.5 million theses and 975,000 data sets.
The following are available to researchers, developers and funding bodies:

Publishers

Scientific integrity is also a concern for publishers, and data management plays a role: the scientific community must be able to question the methods and means used to acquire data. In this context, several organisations in which publishers and other players participate implement principles related to the FAIR management of data.

Committee on Publication Ethics (COPE)

The Committee on Publication Ethics (COPE), which addresses all issues related to scientific integrity, takes data and reproducibility into account through case studies (e.g. on the reproducibility of methods and retraction), brainstorming seminars, etc.

Center for Open Science

In 2015, the Center for Open Science published the Transparency and Openness Promotion (TOP) Guidelines in the journal Science. The guidelines list eight topics (including Citation Standards, Data Transparency, Analytic Methods (Code) Transparency, Research Materials Transparency, Design and Analysis Transparency, and Study Preregistration) and three levels of compatibility. Many publishers and organizations (such as Elsevier, Taylor & Francis, Springer, American Geophysical Union, Cambridge University Press, Oxford University Press, Research Data Alliance, DataOne, American Society of Civil Engineers) and 1100 journals attempt to apply the TOP principles. A search engine identifies journals compatible with these principles.

Coalition for Publishing Data in the Earth and Space Sciences (COPDESS)

Founded in 2014, the Coalition for Publishing Data in the Earth and Space Sciences (COPDESS) makes recommendations on good practices in research data management and data sharing in the Earth sciences, aimed at researchers and editors. These best practices are in line with the COPDESS Data Commitment Statement, signed by numerous publishers, by organisations like Datacite/Re3data and by repositories like DataOne and Pangaea (more information here). The coalition actively participates in the Enabling FAIR Data Project. It provides a FAQ and a guide for authors (see Author Guidelines).

COPDESS brings together funders, organisations, researchers, and publishers like the American Geophysical Union (AGU), Proceedings of the National Academy of Sciences (PNAS), Nature, Science, Elsevier, PLOS, Hindawi, and Copernicus Publications.

To go further…

Digital Curation Centre

The Digital Curation Centre (DCC) is a center of expertise backed by the University of Edinburgh and specialized in data processing, retention and sharing. The main goal of the center is to foster the acquisition of skills in research data management by the scientific community (researchers, documentation professionals). Since 2011, the DCC has provided access to a large number of resources, which include guidelines, checklists, case studies, tools, Data Management Plans, legal questions, data description standards, etc.

FOSTER

The FOSTER portal assembles multiple European resources for open science training. The portal is the result of a project conducted and financed as part of the Horizon2020 program, involving 11 universities and research organizations in 6 countries (Germany, Denmark, Spain, Netherlands, Portugal, United Kingdom). A large number of resources are available about text and data mining, data management, and research reproducibility. The contents are shared by a community of trainers in all languages.