LOFAR Data Format revisited
In document LOFAR-DATAFORMAT-001, Sydney Cadot describes the requirements for the LOFAR data format and shows that HDF5 meets nearly all of them. The casacore table system is not discussed in that document, but it meets all requirements except:
- fully nestable data types
- binding to Matlab and IDL
- installable on Windows
A few requirements mentioned in the document are debatable, especially the one in 3.4.4:
- 3.4.2. Accommodating 32-bit file systems. This is no longer needed; all current operating systems support 64-bit file systems.
- 3.4.4. The data format should be optimized for sequential access of large arrays. The opposite should hold: different applications (flagging, calibration, imaging) require very different access patterns to the same data, so the data format should allow several different access patterns to be efficient.
- 3.4.5. Pipeline processing. There is no need to process data sequentially from tape. Processing data in streaming mode, if done at all, will use very different data formats and is outside the scope of a discussion of disk-based data formats.
Furthermore, some possible requirements have been omitted:
- support of a boolean data type (in 3.5.1)
- support of concurrent access by multiple processes
- distributed storage (note that distributed processing is a separate topic)
- thread safety
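The 3.4.4 point can be made concrete by counting I/O requests for the typical access patterns (calibration: one time slot; imaging: one channel; flagging: one baseline) under two layouts of a (time, baseline, frequency) cube: plain row-major storage, and tiled (chunked) storage as offered by HDF5 chunked datasets and casacore's tiled storage managers (e.g. TiledColumnStMan). This is a minimal sketch: the cube shape, tile shape, and function names are hypothetical, and it counts requests, not bytes.

```python
from math import prod

def contiguous_runs(shape, lo, hi):
    """Separate contiguous reads needed to fetch the box [lo, hi) when the
    array is stored unchunked in C (row-major) order."""
    k = len(shape) - 1
    # Trailing axes selected in full merge into a single contiguous run.
    while k >= 0 and lo[k] == 0 and hi[k] == shape[k]:
        k -= 1
    if k < 0:
        return 1  # the whole array is one run
    # Axis k is only partly selected: one run per index combination before it.
    return prod(b - a for a, b in zip(lo[:k], hi[:k]))

def tiles_touched(shape, tile, lo, hi):
    """Tiles (chunks) that must be read to fetch the box [lo, hi) when the
    array is stored as a regular grid of tiles of the given shape."""
    count = 1
    for n, t, a, b in zip(shape, tile, lo, hi):
        if not 0 <= a < b <= n:
            raise ValueError("empty or out-of-range selection")
        count *= (b - 1) // t - a // t + 1  # tiles overlapping [a, b) on this axis
    return count

# Hypothetical cube: 1000 time slots x 100 baselines x 256 channels.
SHAPE = (1000, 100, 256)
TILE = (100, 10, 32)
PATTERNS = {
    "calibration: one time slot": ((0, 0, 0), (1, 100, 256)),
    "imaging: one channel": ((0, 0, 5), (1000, 100, 6)),
    "flagging: one baseline": ((0, 7, 0), (1000, 8, 256)),
}

if __name__ == "__main__":
    for name, (lo, hi) in PATTERNS.items():
        print(f"{name:28s} row-major runs={contiguous_runs(SHAPE, lo, hi):6d} "
              f"tiles={tiles_touched(SHAPE, TILE, lo, hi):4d}")
```

With these numbers the row-major layout needs 1, 100000, and 1000 separate reads for the three patterns, while the tiled layout needs 80, 100, and 80 tile reads. The price is over-reading: each tile holds 32000 values, so fetching a single channel reads 32 times more data than needed. The tile shape is the knob that balances request count against over-read.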
Needs for visibility data access
The main data axes are baseline, time, and frequency. The application defines the order in which the data are traversed. For example, calibration steps through the data in time order, while imaging preferably steps by frequency channel. Flagging is usually done per baseline in a running time/frequency window. Plots may require yet other traversal orders. The data are often regular, but not always: shorter baselines might use longer time integration. The file format should therefore allow data traversal in each of these directions, with access in all directions being about equally fast.
Brief comparison of HDF5 and CasaTables
Both the CasaTables and the HDF5 data formats are well suited for large collections of structured data. They share some characteristics, but differ in many others. The main difference is that HDF5 is hierarchical in nature, while CasaTables is relational. The 1980s saw a move from hierarchical to relational databases because the latter offer more flexibility: hierarchical databases are (too) hard to traverse in an order different from the hierarchy.
HDF5 has a much wider user base, so more tools are available for it. However, the casapy tools such as tablebrowser, tableplot, and casaviewer, together with the Table Query Language (TaQL), make inspecting (and changing) CasaTables very easy.
The following table summarizes the main differences between the formats and their (dis)advantages.
HDF5 | CasaTables
Usually single file (can be multiple) | Directory of files
storage manager best suited Storage managers can be loaded dynamically, so very adaptable
Hierarchical | Keywordset for table and per column
Higher level TaQL (SQL-like); can also create, update, delete, and insert
Supported by means of a lock on the entire table
Tableplot for arbitrary xy-plots (part of casapy, not casacore) TaQL
However, very slow when retrieving smallish data sizes (e.g. lines)
differences) Virtual storage managers to scale, e.g., float to short
A data manager can be virtual (e.g. VirtualTaQLColumn)
Virtual columns New storage managers possible and are automatically loaded as needed
Very responsive | Only serious bug fixing; new developments only if paid for
Documentation: quite extensive, but not always clear | Good class documentation
Very small chance of file corruption | Very small chance of file corruption in case of machine crash
Rows can be deleted depending on storage manager
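The idea of a virtual storage manager that scales float to short can be illustrated with a minimal sketch: the stored column holds 16-bit integers (2 bytes instead of 4 per value), and readers see reconstructed floats computed as stored * scale + offset, with a quantization error of at most scale/2. casacore provides virtual engines of this kind (ScaledArrayEngine is one), but the functions below are hypothetical and do not reproduce casacore's exact conventions.

```python
INT16_MAX = 32767
INT16_MIN = -32768

def choose_scale(vmin, vmax):
    """Scale and offset that map [vmin, vmax] onto [-32767, 32767]."""
    offset = (vmax + vmin) / 2.0
    scale = (vmax - vmin) / (2 * INT16_MAX) or 1.0  # avoid a zero scale for constant data
    return scale, offset

def to_stored(values, scale, offset):
    """Quantize floats to int16 values, i.e. what would be written to disk."""
    stored = []
    for v in values:
        q = round((v - offset) / scale)
        stored.append(max(INT16_MIN, min(INT16_MAX, q)))  # clip out-of-range input
    return stored

def to_virtual(stored, scale, offset):
    """Values seen by readers of the virtual column: stored * scale + offset."""
    return [s * scale + offset for s in stored]

if __name__ == "__main__":
    vals = [-1.0, -0.25, 0.0, 0.3, 1.0]
    scale, offset = choose_scale(min(vals), max(vals))
    stored = to_stored(vals, scale, offset)
    back = to_virtual(stored, scale, offset)
    print(stored)
    print(max(abs(a - b) for a, b in zip(vals, back)), "<=", scale / 2)
```

Because the scale and offset are chosen from the data range, they would have to be stored alongside the column (for example as column keywords) so that readers can reconstruct the values.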
- Peter Fridman can access the table file containing the DATA directly in his RFI
- It is straightforward to store the DATA and FLAG as plain files (outside the table system) and access them later as table columns using a dynamically loaded storage manager (such as LofarStMan).
- FLAG can be a virtual column on top of a LOFAR_FLAGS column, which can be an Int or similar.
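The two points above (DATA kept in a plain file but exposed as a column, and FLAG derived from an integer column) can be sketched as follows. This is a minimal illustration under stated assumptions: the file layout (fixed-size cells of little-endian complex64 values, one cell per row), the bitmask encoding of LOFAR_FLAGS (bit i flags correlation i), and the names RawVisibilityColumn, write_raw, and flag_cell are all hypothetical; LofarStMan's actual file format is not reproduced here.

```python
import os
import struct
import tempfile

class RawVisibilityColumn:
    """Hypothetical read-only column backed by a flat binary file that holds
    fixed-size cells of ncorr little-endian complex64 values, one per row."""

    def __init__(self, path, ncorr):
        self.path = path
        self.fmt = "<%df" % (2 * ncorr)  # interleaved real/imag float32 pairs
        self.cell_bytes = struct.calcsize(self.fmt)
        self.nrow = os.path.getsize(path) // self.cell_bytes

    def get_cell(self, row):
        if not 0 <= row < self.nrow:
            raise IndexError(row)
        with open(self.path, "rb") as f:
            # Cells have a fixed size, so the file offset follows from the row number.
            f.seek(row * self.cell_bytes)
            floats = struct.unpack(self.fmt, f.read(self.cell_bytes))
        return [complex(re, im) for re, im in zip(floats[0::2], floats[1::2])]

def write_raw(path, cells):
    """Write cells (lists of complex numbers, all the same length) as a raw file."""
    with open(path, "wb") as f:
        for cell in cells:
            flat = [x for c in cell for x in (c.real, c.imag)]
            f.write(struct.pack("<%df" % len(flat), *flat))

def flag_cell(flag_word, ncorr):
    """Virtual FLAG cell computed on the fly from a stored integer (assumed
    bitmask encoding: bit i flags correlation i)."""
    return [bool((flag_word >> i) & 1) for i in range(ncorr)]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.raw")
        write_raw(path, [[1 + 2j, 3 - 4j], [0.5 + 0j, -1 + 1j]])
        col = RawVisibilityColumn(path, ncorr=2)
        print(col.nrow, col.get_cell(1), flag_cell(0b10, 2))
```

Storing one integer per row instead of one boolean per correlation keeps the flag storage compact, while applications that expect a boolean FLAG column still see one.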