Standard formats in chemistry
Reuse of research data is often hindered by the file formats in which the data are stored. It is important to consider format choices carefully when preparing a data management plan. Choose standards that are open or widely recognized by the community. The main examples for chemistry are listed below. (1)
Format | Description |
---|---|
JCAMP-DX | Extension .jdx or .dx. Open, general-purpose standard for spectral data. In use since 1988, it is one of the oldest formats. It is maintained by IUPAC and is compatible with most spectrum viewers. |
mzML | Open, XML-based format dedicated to mass spectrometry, created in 2006. Most proprietary formats can be converted to mzML with a converter (e.g. CompassXport for Bruker; MSConvert for Agilent, Thermo Fisher, Shimadzu, etc.). |
mol | Proprietary molecule format created by MDL, and one of the most common formats for accurately encoding molecules. Most software can read MOL files or export to this format. |
sdf | Structure-data file. An extension of the MOL format, also developed by MDL. It can hold several molecules in one file, each encoded as in the MOL format, and can also include metadata (“tags”). |
rxn | Format also developed by MDL during the 1990s. The most popular format for storing information about reactions. Contains reagents and reaction products. |
rdf | Reaction-data file. Stores reactions and molecules, with tags included at the end of the file. |
cml | Open, XML-based format for chemistry, developed at the end of the 1990s. Encodes molecules, reactions and spectra without losing associated information. Compatible with tools such as JChemPaint, Jmol, XDrawChem and MarvinView. |
SMILES | Universal format that encodes a molecule as a single line of text. Useful for describing substructures. It can also be used to encode reactions. |
Isomeric SMILES | Extension of SMILES that can also encode stereochemistry. |
InChI | Another universal format that encodes a molecule as a line of text. Provides more detail than SMILES. |
InChIKey | Another format that encodes a molecule as a line of text; it is found in many software programs and databases. |
xyz | A more specific format that defines molecule geometry. |
FID | Proprietary format developed by Bruker for encoding NMR data. |
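
To illustrate how an SDF file combines MOL-style structure blocks with metadata tags, the following is a minimal Python sketch using only the standard library. It assumes the common V2000 layout, in which each record ends with a `$$$$` line, the structure block ends with `M  END`, and each tag is introduced by a `> <NAME>` line. The function name `parse_sdf` and the sample data are hypothetical and chosen for illustration; for real work a dedicated toolkit is the safer choice.

```python
def parse_sdf(text):
    """Split SDF text into records with a MOL block, an atom count and tags."""
    records = []
    mol_lines, tags, current, in_data = [], {}, None, False
    for line in text.splitlines():
        if line.strip() == "$$$$":
            # End of one record: the atom count is the first 3 characters
            # of the counts line (4th line of a V2000 MOL block).
            n_atoms = int(mol_lines[3][:3]) if len(mol_lines) > 3 else None
            records.append({
                "molblock": "\n".join(mol_lines),
                "n_atoms": n_atoms,
                "tags": tags,
            })
            mol_lines, tags, current, in_data = [], {}, None, False
        elif not in_data:
            # Still inside the MOL block (header, counts, atoms, bonds).
            mol_lines.append(line)
            if line.startswith("M  END"):
                in_data = True
        elif line.startswith(">") and "<" in line:
            # Tag header such as "> <FORMULA>".
            current = line[line.index("<") + 1 : line.rindex(">")]
            tags[current] = ""
        elif not line.strip():
            # A blank line closes the current tag value.
            current = None
        elif current is not None:
            tags[current] = tags[current] + "\n" + line if tags[current] else line
    return records


# Hypothetical two-molecule sample (approximate coordinates, for illustration).
SAMPLE = "\n".join([
    "water",
    "",
    "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.9572    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -0.2400    0.9266    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "  1  3  1  0",
    "M  END",
    "> <NAME>",
    "water",
    "",
    "> <FORMULA>",
    "H2O",
    "",
    "$$$$",
    "methane",
    "",
    "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
    "> <NAME>",
    "methane",
    "",
    "$$$$",
])

if __name__ == "__main__":
    for record in parse_sdf(SAMPLE):
        print(record["n_atoms"], record["tags"])
```

Each returned record keeps the structure block untouched, so it can still be written out as a standalone MOL file, while the tags are available as an ordinary dictionary.
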
Other resources:
OpenBabel on-line converter specialized in chemistry formats
ChemAxon converter for on-line command conversion
(1) This typology has been prepared with the assistance of Thierry Billard, CNRS Director of Research at ICBMS (UMR CNRS 5246).