HDF5
.hdf5
Hierarchical Data Format 
File format and API/library for storing large datasets/arrays in a single, portable file.
There are serial vs parallel versions. 
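h5import  	# import ASCII or binary data into an hdf5 file
h5repack    	# copy an hdf5 file to a new file, changing layout/compression (use h5copy to copy a single dataset/group)
h5ls   		# list the groups and datasets in an hdf5 file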
h5dump  
h5diff  	ph5diff
h5debug  
h5stat        
h5copy   
h5cc   	h5pcc
h5c++  
h5fc    h5pfc
h5jam  
h5unjam 
h5mkgrp        
h5redeploy  
h5repart  
h52gif  
gif2h5  
h5perf_serial  
NetCDF
.nc
NetCDF - Network Common Data Form.  
Often used to store array-oriented scientific data, especially geospatial data
(eg input for CMAQ modeling, oceanography, ArcGIS).
The older CDF formats (classic NetCDF) were flat, whereas HDF was hierarchical.  Thus HDF originally provided more grouping functionality to organize data, good for long-term maintenance of the data "archive".
NetCDF v4 uses the HDF5 format, with some restrictions.  (The NetCDF API is typically considered simpler, with essentially the same functionality.)
CDF5 format is from parallel-netcdf project 
There are serial vs parallel versions (for NetCDF v4, parallel I/O goes through parallel HDF5).
One can learn to use the nc commands without invoking hdf5 commands directly.  For someone just using the data, the choice "between" .hdf5 vs .nc is dictated by the application at hand; it only becomes a real decision when writing a new app.
NetCDF files are self-describing, with all the descriptive info in the header.
The array structure recorded in the header also allows random seeks to the desired record.
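Since NetCDF v4 files are HDF5 files while the older formats carry their own header, the first few bytes of a file are enough to tell them apart. The sketch below is only an illustration: the function name sniff_format and the returned labels are made up for this note, and it only looks at offset 0 (an HDF5 file with a user block, as created by h5jam, keeps its signature further into the file).

```python
# 8-byte signature found at the start of an HDF5 file (and hence a NetCDF v4 file).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"

# Classic-family NetCDF files start with b"CDF" followed by a version byte.
NETCDF_CLASSIC_VARIANTS = {
    1: "netCDF classic (CDF-1)",
    2: "netCDF 64-bit offset (CDF-2)",
    5: "netCDF 64-bit data (CDF-5)",
}


def sniff_format(path):
    """Guess the on-disk format of `path` from its first 8 bytes.

    Hypothetical helper: it checks offset 0 only, so an HDF5 file that
    carries a user block is reported as "unknown".
    """
    with open(path, "rb") as fh:
        head = fh.read(8)
    if head == HDF5_SIGNATURE:
        # A NetCDF v4 file is an HDF5 file, so it lands here too.
        return "HDF5 (possibly netCDF-4)"
    if head[:3] == b"CDF" and len(head) >= 4:
        return NETCDF_CLASSIC_VARIANTS.get(head[3], "unknown CDF variant")
    return "unknown"
```

Calling sniff_format("some_file.nc") on a real file (the filename here is a placeholder) returns one of the labels above; telling a NetCDF v4 file apart from a generic HDF5 file needs a look inside the file, eg with ncdump or h5dump.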
module load  gcc/6.3.0 hdf5/1.8.20-gcc-p netcdf/4.6.1-gcc-p 
ncdump 			# show structure of file in ascii (from Unidata/UCAR)
ocprint 
nccopy  
ncgen  
ncgen3  
nc-config  
Related tools
- ncBrowse - Java-based viewer with graphics, 3D viz
- ncview - view multi-dimensional data with changing color maps, etc.  X-based.
- NCL - NCAR Command Language - viz netCDF files (and other formats)
IOAPI
IOAPI is the library used by CMAQ to read/write .nc files.
(Native Fortran binary files are machine specific.)
Much of the data in CMAQ is time series,
ie, for the variables/arrays defined, IOAPI automatically applies time stamp or time step info to them.
JDATE: YYYYDDD (year, then 3-digit day of year), eg 2019035 is Feb 4 of 2019.
JTIME: HHMMSS = (10000 * Hour) + (100 * Minute) + Seconds
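The JDATE/JTIME arithmetic above can be sketched in a few lines of Python. This is not part of IOAPI; the function names are invented for this note, and the JTIME helpers handle time-of-day values only (hours 0 to 23).

```python
from datetime import date, time, timedelta


def to_jdate(d):
    """Encode a date as YYYYDDD: year * 1000 + day of year."""
    return d.year * 1000 + d.timetuple().tm_yday


def from_jdate(jdate):
    """Decode YYYYDDD back into a date."""
    year, day_of_year = divmod(jdate, 1000)
    return date(year, 1, 1) + timedelta(days=day_of_year - 1)


def to_jtime(t):
    """Encode a time of day as HHMMSS: 10000*hour + 100*minute + second."""
    return 10000 * t.hour + 100 * t.minute + t.second


def from_jtime(jtime):
    """Decode HHMMSS back into a time of day (hours 0-23 only)."""
    hour, rest = divmod(jtime, 10000)
    minute, second = divmod(rest, 100)
    return time(hour, minute, second)


print(to_jdate(date(2019, 2, 4)))   # 2019035 (Jan has 31 days, so Feb 4 is day 35)
print(to_jtime(time(13, 5, 9)))     # 130509
```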
Logical filename: max 16 chars.  Hides the NetCDF internal file structure from the user/programmer.
IOAPI cmd
From CMAQ manual (v4.6), p50 of pdf:
M3XTRACT
extract a subset of variables from a file for a specified time interval
M3DIFF
compute statistics for pairs of variables
M3STAT
compute statistics for variables in a file
BCWNDW
build a boundary-condition file for a sub-grid window of a gridded file
M3EDHDR
edit header attributes/file descriptive parameters
M3TPROC
compute time period aggregates and write them to an output file
M3TSHIFT
copy/time shift data from a file
M3WNDW
window data from a gridded file to a sub-grid
M3FAKE
build a file according to user specifications, filled with dummy data
VERTOT
compute vertical-column totals of variables in a file
UTMTOOL
coordinate conversions and grid-related computations for Lat/Lon, Lambert, and UTM
CMAQ
Community Modeling ...   
By EPA.
Ref: CMAQ manual (v4.6)
Modeling components and workflow overview (see p56 of pdf).
CCTM - CMAQ Chemistry-Transport Model - the main program of CMAQ.  Most of the other programs are pre-processors that prepare data for CCTM.
Pre-processors
- MM5 or WRF - Meteorology model (input?)
- SMOKE - Emission model (not part of CMAQ)
- MCIP - processes meteorological model output; produces GRIDDESC (one sub-type of IOAPI format) suitable for CCTM.
- ICON - prepares initial conditions (eg from ascii), producing a netcdf file as input to CCTM.  The inputs are specific to the modeling grid and chem parameterization.
  Data source could be ascii or previous CCTM output.
- BCON - prepares boundary conditions, producing a .nc file as input for CCTM.
  Data source could be ascii or previous CCTM output.
- JPROC - output is a lookup table of photolysis rates (in clear-sky conditions), which CCTM needs to do its modeling.
Support tools
- PARIO - parallel I/O library; handles file input/output for parallel runs of CCTM.
- STENEX - stencil exchange library; governs communication between processors in parallel runs of CCTM.
BioInformatics File Format
sam
- sam/bam/cram :
- BCFtools: BCF2/VCF/gVCF
- VCF: Variant call format.  Mutation centric view 
ChemInformatics File Format
PDB
                                         
Protein databank:
http://pdb.org or
http://rcsb.org/pdb/
 
Sample protein:
1tii		A 900+ residue protein; not huge, but a sizable molecule for testing 3D rendering (especially in space-filling model)
11AS, 117E	Some pdb file around size of Haemoglobin
1A00            Hemoglobin
1z1g            Topoisomerases, a large protein, "Molecule of the Month" for Dec 2006.
.smi
SMILES - http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html
SMILES (Simplified Molecular Input Line Entry System) is a line notation (a typographical method using printable characters) for entering and representing molecules and reactions. Some examples are:
CC  			ethane  	
[OH3+]  		hydronium ion
c1ccccc1  		benzene  	
N[C@H](C)C(=O)O  	D-alanine
COc1cc(cc(OC)c1OC)C(=O)N\N=C(/C)\c2ccc(cc2)S(=O)(=O)N[C@@H](C)C(=O)O	some carboxylic acid
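Lines laid out like the examples above (a SMILES string, then whitespace, then an optional name) are easy to split apart with plain Python. The sketch below assumes that layout, which is a common .smi convention rather than a formal spec; parse_smi is a made-up helper name, and it does not check that the SMILES strings are chemically valid.

```python
def parse_smi(text):
    """Return a list of (smiles, name) pairs from .smi-style text.

    Assumes one molecule per line: the SMILES string comes first and
    anything after the first run of whitespace is treated as the name.
    Blank lines are skipped; a missing name becomes an empty string.
    """
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        parts = line.split(None, 1)
        smiles = parts[0]
        name = parts[1].strip() if len(parts) > 1 else ""
        records.append((smiles, name))
    return records


sample = "CC\tethane\n[OH3+]  hydronium ion\nc1ccccc1\tbenzene\n"
for smiles, name in parse_smi(sample):
    print(smiles, "->", name)
```

SMILES strings contain no whitespace, so splitting on the first run of whitespace is safe even for names made of several words.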
                  
OpenEye Omega takes SMILES input (.smi) and produces .oeb.gz files.
.oeb
.oeb receptor file containing the active site for docking.
Probably OpenEye proprietary.  Fred can open these files (and maybe ROCS?)
Engineering is the art of making compromises.
Science is the reverse engineering of the compromises made by nature.
Medicine is the hacking of the scientific knowledge base.   - A comp sci student :-)