Data Formats


Choosing an appropriate format is essential for the long-term preservation and reuse of research data. However, these two requirements do not always point to the same data format. From the perspective of long-term preservation, the choice of format is often based on storage aspects (e.g. solid documentation and the use of an open format type); from the perspective of data reusability, the choice may depend primarily on the availability of easy-to-use processing software and on the standards (if any) used in the designated research community.

In general, data formats should respect national and international specifications of the corresponding scientific areas:

  • RADAR accepts all data formats for the preservation of research data. To facilitate long-term preservation and reuse, the formats recommended by RADAR are listed below. The list follows international specifications, such as the "Sustainability of Digital Formats" list from the Library of Congress, as well as the requirements of the corresponding scientific areas, and is regularly reviewed and updated.

  • Always choose widespread file formats - it should be possible to use common and well-known software to open and read the files.

  • Use clear, sequential names for data files - each file name should relate to its content.

  • Parameter notation within data packages should follow internationally recognised standards (for example, the International System of Units, SI). If no such standards exist for a specific research field, the use of institute- or workgroup-specific notations and the corresponding nomenclature is recommended.

  • Datasets deployed in RADAR must be adequately described using the respective metadata scheme, of which at least the mandatory fields must be filled out. This can be done through our website or by completing the appropriate XML template. This document, as well as the corresponding example files, can be found here.
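The naming recommendation in the list above can be illustrated with a short sketch. The pattern used here (topic, underscore, zero-padded running number, extension) and the function name `sequenced_name` are assumptions made for this example; RADAR does not prescribe a particular scheme.

```python
def sequenced_name(topic: str, index: int, extension: str, width: int = 3) -> str:
    """Build a file name such as 'soil_moisture_001.csv'.

    The pattern <topic>_<zero-padded index>.<extension> is an assumption
    made for this sketch, not a RADAR requirement.
    """
    # Lower-case the topic and join its words with underscores
    slug = "_".join(topic.lower().split())
    # Zero-padding keeps alphabetical order identical to numerical order
    return f"{slug}_{index:0{width}d}.{extension.lstrip('.')}"


# Hypothetical topic and indices, chosen only for illustration
names = [sequenced_name("Soil moisture", i, ".csv") for i in (1, 2, 10)]
print(names)
```

Zero-padding the running number is one possible design choice: it keeps files in the intended order in any file browser that sorts names alphabetically.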

Data type | Recommended formats* | Other appropriate formats# | Unfit formats+

Text document

XML-based formats such as Microsoft Office XML (.docx),
Open Office XML (.sxw),
Open Document Format (.odt) and structured text/markup
(.xml, .sgml, .html, .dtd, .xsd and others)

Portable Document Format PDF, PDF/A-1, PDF/A-2 (.pdf)
Rich Text Format (.rtf)
Plain Text (.txt)

Microsoft Word (.doc)

Spreadsheet

Comma Separated Values (.csv)
as well as XML-based formats such as
Microsoft Office XML (.xlsx)

Portable Document Format PDF, PDF/A-1, PDF/A-2 (.pdf)

Microsoft Excel (.xls)

Database

ANSI SQL (.sql)
Comma Separated Values (.csv)

  

Statistical data

 

SPSS Portable (.por)
SAS transport (.sas)
STATA (.dta)

  

Graphic (picture or raster-based)

TIFF (.tif, .tiff)
GeoTIFF (.geotiff for geo-referenced graphics) 
Adobe Digital Negative (.dng; for raw data from digital cameras)

Portable Network Graphics (.png)
Joint Photographic Experts Group (.jpeg, .jpg)
Graphics Interchange Format (.gif)
Bit-Mapped Graphics Format (Microsoft) (.bmp)
Photoshop (Adobe) (.psd)
CorelPaint (.cpt)
JPEG2000 (.jp2, .jpx)
RAW image format (.nef, .crw and others)

 

Graphic (vector-based)

 

Scalable Vector Graphics (.svg)

Portable Document Format PDF, PDF/A-1, PDF/A-2 (.pdf) 

Video

Lossless AVI (.avi)
MPEG-1, MPEG-2 (.mpg, .mpeg)
MPEG-4, H264 (.mp4)
FLV (.flv)

  

Audio

 

WAVE (.wav)
AIFF (.aiff)

  

Computer Aided Design (CAD)

AutoCAD DWG (Version 2000),
DXF (.dxf, Release 12/14)

  

Geographical Information (GIS)

MapInfo Interchange Format (.mif, .mid) or Esri Shapefile (.shp + .shx + .dbf) for vector data,
GeoTIFF for raster data

  

Virtual Reality, 3D

X3D (.x3d; contains animations as well as DG,P,F,I,B,M,V,L,T,G)
OBJ (.obj; contains DG,P,F,I,B,M,G)
COLLADA (.dae, XML-based; contains animations & DG,P,B-Rep,F,I,B,M,V,L,T,G)
PLY (.ply; contains DG,F,I,B,M)

Virtual Reality Modeling Language (.vrml and .avi, .mpg, .jpeg)
Universal 3D Format (.u3d and .avi, .mpg, .jpeg)
STL (.stl and .jpeg)
DXF (.dxf and .jpeg)

 

Nuclear Magnetic Resonance (NMR)

 Instrument-specific data formats
(Varian/Agilent, Bruker, Jeol)

Research areas often have their own subject-specific data and file formats.
These formats arise from discipline-specific guidelines or vendors
and remain necessary due to their subject-specific nature.

Difference gel electrophoresis (2D-DIGE)


TIFF (.tif, .tiff)
Gel Image (.gel)

Research areas often have their own subject-specific data and file formats.
These formats arise from discipline-specific guidelines or vendors
and remain necessary due to their subject-specific nature.


* Recommended formats: These file formats can safely be assumed to remain readable and reusable for a long period of time.

# Other appropriate formats: The file formats are likely to remain readable for several years.

+ Unfit formats: The file formats are unlikely to remain readable for more than a few years.
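As an illustration of how the three categories might be applied before a deposit, the following sketch looks up a file's extension in a small mapping. The mapping covers only the text document and spreadsheet rows of the table above; the names `CATEGORY_BY_EXTENSION` and `classify` are hypothetical and not part of any RADAR tool.

```python
from pathlib import Path

# Extension -> category, copied from the text document and spreadsheet
# rows of the table only; all other data types are deliberately left out.
CATEGORY_BY_EXTENSION = {
    ".docx": "recommended",
    ".odt": "recommended",
    ".xml": "recommended",
    ".csv": "recommended",
    ".xlsx": "recommended",
    ".pdf": "other appropriate",
    ".rtf": "other appropriate",
    ".txt": "other appropriate",
    ".doc": "unfit",
    ".xls": "unfit",
}


def classify(filename: str) -> str:
    """Return the table category for a file name, or 'not listed'."""
    # Compare extensions case-insensitively, so 'REPORT.DOC' matches '.doc'
    extension = Path(filename).suffix.lower()
    return CATEGORY_BY_EXTENSION.get(extension, "not listed")


# Hypothetical file names, chosen only for illustration
for name in ("thesis.docx", "measurements.xls", "notes.txt"):
    print(name, "->", classify(name))
```

A result of "not listed" only means the extension is missing from this reduced mapping, not that the format is unsuitable.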