Quality Control Level 2 Tools¶
Introduction¶
The QC L2 tools consist of a QC tool (C/C++/bash), QC-Tool, and a QC wrapper (Python), QC-Wrapper, which collects messages and results from the QC tool in a central QC DB (Oracle), QCDB, located at DKRZ.
All code is stored in the SVN repository at https://svn-mad.zmaw.de/svn/mad/Model/QualCheck. The QC L2 checks will be carried out by WDCC and BADC, and possibly by PCMDI as well.
Information on the QC L2 performance and on the criteria for QC L2 assignment is given in Quality-Control-Level-2.
BADC set up a wiki page with notes about the QC tool installation: http://proj.badc.rl.ac.uk/go-essp/wiki/CMIP5/QualityControl
General Requirements ¶
- Get SVN access: send an email with the subject prefix "[CMIP5_QC]" to dataATdkrz.de with your details (name, email, institute), stating that you would like to use the CMIP5QC tool with the QC Wrapper.
- Register on this redmine page (upper right corner)
- oracle: open port 1630 in your firewall
- Options named postgresdb are now in use for accessing the oracle DB
Contacts for QC Tools¶
- questions on QC tool QC-Tool: heinz-dieter.hollwegATzmaw.de
- other questions: martina.stockhauseATzmaw.de; dataATdkrz.de (please use the subject prefix "[CMIP5_QC]")
- bug report / support: https://redmine.dkrz.de/collaboration/projects/cmip5-qc/issues/new
Contacts for QC Runs ¶
- BADC: kevin.marsh AT stfc.ac.uk
- WDCC: Martina / QC Tool: HDH; machine: cmip5qc1.dkrz.de as user cmip5qc; test on pinball.dkrz.de
- PCMDI painter1 AT llnl.gov (Jeff Painter); machines: multiple
- CSIRO
QC Tools¶
QC Tool¶
SVN: https://svn-mad.zmaw.de/svn/mad/Model/QualCheck/QC/branches/QC-0.3
User manual for QC tool: qc-user-manual-20110511.pdf
Checks performed:
1. File consistency:
- in the end, files must have the right number of records; the number is given in the metadata.
- strictly regular time steps (the ESG checker allows for time gaps)
2. Metadata consistency (consistency between the metadata of the standard_output table and the metadata of the file headers)
3. Physical properties of variables:
- minimum and maximum are checked against specified ranges. Default criterion for an invalid current extreme value of a global field:
- mean of the extremes:
mextr(t) = (1 / N) * sum_{i=0}^{N-1} extr_{i}, with N = t / (delta t)
N := index over all time steps 'delta t' up to the actual time t
- default (applies to all variables in the standard_output table of K. Taylor):
mextr_{N-1} - extr_{N} > order_of_magnitude( mextr_{N-1} * 10^{5} )
- time series are calculated for:
- min
- max
- globally weighted mean (in case of no _FillValue)
- area weighted mean (in case of existing _FillValue; reasonable, e.g., for temperature of snow)
- standard deviation of the globally weighted mean.
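The default criterion for an invalid extreme value (check 3) can be pictured with a short Python sketch. This is not the QC tool's code, which is written in C/C++. It assumes that order_of_magnitude returns the largest power of ten not exceeding the absolute value of its argument, that the factor 10^5 is applied inside that function, and that the difference is taken as an absolute value. All function names are hypothetical.

```python
import math


def order_of_magnitude(x):
    # Assumption: largest power of ten not exceeding |x|; 1.0 for x == 0.
    if x == 0:
        return 1.0
    return 10.0 ** math.floor(math.log10(abs(x)))


def mean_extreme(extremes):
    # mextr = (1 / N) * sum_{i=0}^{N-1} extr_i
    return sum(extremes) / float(len(extremes))


def is_invalid_extreme(previous_extremes, current_extreme, factor=1e5):
    # Flag the current extreme if it departs from the mean of the previous
    # extremes by more than order_of_magnitude(mextr * factor).
    mextr = mean_extreme(previous_extremes)
    return abs(mextr - current_extreme) > order_of_magnitude(mextr * factor)


history = [300.0, 301.0, 299.0]             # e.g. previous maxima of a field
print(is_invalid_extreme(history, 305.0))   # a value close to the history
print(is_invalid_extreme(history, 1.0e9))   # a grossly deviating value
```

With a factor of 10^5 only values that are off by many orders of magnitude are flagged, which suggests the default is meant to catch gross errors such as wrong units or corrupted records rather than subtle outliers.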
Output: qc result files in netcdf for every atomic dataset
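The file consistency check (item 1 of the checks performed) can be sketched in Python as follows. The sketch assumes the time values are already available as plain numbers and ignores calendar handling; the function name and the tolerance are hypothetical and do not come from the QC tool.

```python
def check_time_axis(times, expected_records, tolerance=1e-9):
    """Return a list of problems found on a time axis (empty list = consistent)."""
    problems = []
    # The expected number of records would come from the metadata.
    if len(times) != expected_records:
        problems.append(
            "expected %d records, found %d" % (expected_records, len(times))
        )
    # Strictly regular time steps: every step must equal the first one.
    if len(times) >= 2:
        step = times[1] - times[0]
        for i in range(1, len(times)):
            delta = times[i] - times[i - 1]
            if abs(delta - step) > tolerance:
                problems.append(
                    "irregular step %g between records %d and %d" % (delta, i - 1, i)
                )
    return problems


print(check_time_axis([0, 1, 2, 3], 4))   # regular axis, complete
print(check_time_axis([0, 1, 3, 4], 4))   # one gap in the axis
```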
Record threshold checks¶
NOT IN USE !
SVN: https://svn-mad.zmaw.de/svn/mad/Model/QualCheck/QCWrapper/src/qcReccheck.py
The thresholds for the record checks are based on the initial values of Hans L. and were extended by colleagues of the MPI-M. The threshold values used are documented in table/ranges.txt.
Requirements:
- QCDB test1, qc1
- python >= 2.4 with modules sqlalchemy,
- oracle: cx_Oracle; oracle instant client packages "basic" and "sdk"; environment QCDB_TYPE=ORACLE
- openssl for password decryption of QCDB user ''qcdbuser''
- cdo >= 1.5 (1.4.5.1 has performance problems) for min/max extraction
- table with threshold values at table/ranges.txt
qcWrapper.py - QC Wrapper¶
SVN: https://svn-mad.zmaw.de/svn/mad/Model/QualCheck/QCWrapper/src/qcWrapper.py
Objectives ¶
- run the QC tools (QC-Tool,Record-threshold-checks) with a pre-defined and fixed set of options
- ensure the completeness of the DRS directory structure for the QC output directory
- prevent checking of old ESG published data versions
- store the QC results and messages (log files, error files, warning files, version, configuration) for all performed QCs in a central DB (QCDB)
User Manual of QC Wrapper: CMIP5-AR5-QC-L2.pdf
The wrapper uses the same configure file as the QC-Tool extended by a couple of options for the QCDB.
qcWrapper.py --configure=<qc.conf> [--rcheck] [--noqc] [--nodb] [--quiet]
general flags:
--noqc/-n don't run the QC tool but only fill QCDB with results (case of QCDB errors and deletion of QCDB)
--nodb/-y only QC tool run without any inserts/updates in the QCDB
--rcheck/-b do record checks on data not checked by QC tool
--quiet/-x suppress print out of messages less severe than given level (default:INFO; other values: WARNING, ERROR, CRITICAL, ALL)
--help/-h this message
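For scripted runs, the wrapper call can be assembled from Python. The following is a minimal sketch that uses only the flags listed in the usage message; the helper names are hypothetical, qcWrapper.py is assumed to be on the PATH, and --quiet is left out because the usage message does not show how its level is passed.

```python
import subprocess


def build_wrapper_command(conf, rcheck=False, noqc=False, nodb=False):
    """Assemble an argument list for qcWrapper.py from the documented flags."""
    cmd = ["qcWrapper.py", "--configure=%s" % conf]
    if rcheck:
        cmd.append("--rcheck")   # record checks on data not checked by QC tool
    if noqc:
        cmd.append("--noqc")     # only fill the QCDB with existing results
    if nodb:
        cmd.append("--nodb")     # QC tool run without QCDB inserts/updates
    return cmd


def run_wrapper(conf, **flags):
    # check_call raises CalledProcessError on a non-zero exit status.
    return subprocess.check_call(build_wrapper_command(conf, **flags))


print(" ".join(build_wrapper_command("qc.conf", nodb=True)))
```

Keeping the command construction separate from the execution makes it possible to log or inspect the exact call before anything touches the QCDB.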
Requirements¶
- QCDB test1, qc1
- python >= 2.4 with modules sqlalchemy,
- oracle: cx_Oracle; oracle instant client packages "basic" and "sdk"; environment QCDB_TYPE=ORACLE
- openssl for password decryption of QCDB user ''qcdbuser''
QCDB ¶
- Locations:
- ORACLE (CMIP5 usage) cera-www.dkrz.de (qc1 for production; test1 for testing; environment QCDB_TYPE=ORACLE)
- Access by sqlalchemy:
- ORACLE (cx_Oracle): engine=create_engine("oracle://user:password@(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(PORT=1630)(HOST=cman.dkrz.de))(ADDRESS=(PROTOCOL=TCP)(PORT=1521)(HOST=cmeta.dkrz.de)))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=pcera.dkrz.de))(source_route=yes))")
your firewall needs to have port 1630 open
- Access QC Wrapper: user with password encrypted by openssl
- Tools to work with the QCDB: qcDbselect.py, qcExclude.py, qcAssignL2.py, and qcDbdelete.py
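The connection string used for the sqlalchemy access is long and easy to mistype, so it can help to build it from its parts. The sketch below reproduces the descriptor given above from the host, port and service values stated there. The helper names are hypothetical, and whether a given sqlalchemy version accepts a full descriptor inside the URL has not been verified here.

```python
def oracle_descriptor(service="pcera.dkrz.de"):
    """Build the Oracle connect descriptor with the two address hops."""
    hops = [("cman.dkrz.de", 1630), ("cmeta.dkrz.de", 1521)]
    addresses = "".join(
        "(ADDRESS=(PROTOCOL=TCP)(PORT=%d)(HOST=%s))" % (port, host)
        for host, port in hops
    )
    return (
        "(DESCRIPTION=(ADDRESS_LIST=%s)"
        "(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=%s))"
        "(source_route=yes))" % (addresses, service)
    )


def qcdb_url(user, password):
    """Return the sqlalchemy URL for the QCDB."""
    return "oracle://%s:%s@%s" % (user, password, oracle_descriptor())


def connect(user, password):
    # Imported lazily so the helpers above also work without sqlalchemy.
    from sqlalchemy import create_engine
    return create_engine(qcdb_url(user, password))


print(qcdb_url("user", "password"))
```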
QC Result Analyses¶
qcDbselect.py - Select / Plot+Export¶
SVN: https://svn-mad.zmaw.de/svn/mad/Model/QualCheck/QCWrapper/src/qcDbselect.py
Objectives ¶
- check the results of the QC-Tool checks in the QCDB
- plot results of statistical QC tool and export quality metadata
- check for new data published
- check for experiments with finished QC L2 checks
- analyse exceptions using QC L2 exceptions and aggregate them on the DRS experiment level, i.e. the QC level, using the QC L2 assignment criteria
- export CIM quality documents
- export data for QC L3 double and cross-checks
qcDbselect.py --postgresdb=<database> [options]
- --postgresdb: test1 for testing and qc1 for production
qcDbselect 0.4 (martina.stockhause@zmaw.de, 2011-05-27)
Check QC L2 Results in QCDB
Usage: svn-work/QCWrapper/src/qcDbselect.py [-h/--help] [-q/--quiet] [-s/--postgresdb=db] [-a/--drsfile=drs] [-f/--experiment=experiment] [-e/--error] [-n/--download=destination] [-l/--logout=destination] [-r/--result=dataset] [-c/--list] [-u/--unchecked] [-o/--outcera=xmgrace template] [-d/--date=YYYY-MM-DD] [-v/--version=latest|vYYYYmmdd] [-x/--exportQC3] [-p/--esgds=publication unit] [-z/--cim]
with:
--postgresdb/-s specify db name of central QCDB; use test1 for tests and qc1 for production runs (mandatory)
--drsfile/-a drs syntax file (old:drsold.txt, new:drsnew.txt, flag for mapping; default: drsnew.txt)
--help/-h this message
--quiet/-q silent mode (no stdout)
option for --experiment or --esgds or --result
--logout/-l download QC log messages to specified directory
options for --experiment or --result
--error/-e list only QC warnings and errors for experiment
--download/-n download QC netCDF result files to specified directory
option for --experiment or --esgds
--cim/-z write a cim quality document (for experiment one for each esg dataset/publication unit and one for the aggregation on experiment level is written)
mode Analyse Experiment:
--experiment/-f specify experiment to extract from QcDB
--outcera/-o generate output for CERA and CIM and plot data using the given xmgrace parameter template for plots
--unchecked/-u return list of unchecked datasets for record checks (experiment)
--version/-v extract only specific version of experiment ('latest' or 'vYYYYmmdd')
--exportQC3/-x WDCC: export of QCL2 results for given experiment
mode Atomic Dataset:
--result/-r get QC result file for atomic dataset (dataset)
mode Status Atomic Datasets (of Experiment for given --experiment):
--date/-d list Atomic Datasets newer than given date (all experiments or of specified experiment)
mode Status All Experiments:
--list/-c list content/status (all experiments) in the QCDB; if --experiment specified: history of this experiment
mode Publication Unit:
--esgds/-p list of all Atomic Datasets for a ESG publication unit
Options/Examples¶
- List status of experiments qc checked (QC Stati Table):
qcDbselect.py --postgresdb=<database> --list
qcDbselect.py --postgresdb=test1 --list
- List history of one experiment qc checked (QC Stati Table):
qcDbselect.py --postgresdb=<database> --list --experiment=<drs experiment name>
qcDbselect.py --postgresdb=test1 --list --experiment=CMIP5/output/MPI-M/ECHAM6-MPIOM/rcp45
- Check for qc results (atomic datasets) newer than a given date for an experiment:
qcDbselect.py --postgresdb=<database> --date=<YYYY-MM-DD> [--experiment=<drs>]
qcDbselect.py --postgresdb=test1 --date=2010-08-31 [--experiment=CMIP5/output/MPI-M/ECHAM6-MPIOM/rcp45]
- Check results of atomic datasets for experiment:
qcDbselect.py --postgresdb=<database> --experiment=<drs> [--error] [--cim] [--log=<download dir for logfiles>]
qcDbselect.py --postgresdb=test1 --experiment=CMIP5/output/MPI-M/ECHAM6-MPIOM/rcp45 [--error] [--log=/tmp]
Option: error - list only atomic datasets with occurred errors or which are unchecked
Option: log - download all input and log files of QC tool run to specified directory
Option: cim - create CIM quality documents for all esg datasets/publication units
- Check atomic dataset result and download QC netCDF result:
qcDbselect.py --postgresdb=<database> --result=<drs> [--download=<download dir for netcdf result>]
qcDbselect.py --postgresdb=test1 --result=cmip5/output/MPI-M/ECHAM6-MPIOM-TR/amip/mon/atmos/Amon/r1i1p1/v20100928/cct [--download=/tmp]
- Create CIM quality document for esg dataset / publication unit (QC Stati Table):
qcDbselect.py --postgresdb=<database> --esgds=<drs> --cim
qcDbselect.py --postgresdb=test1 --esgds=cmip5/output/MPI-M/ECHAM6-MPIOM-TR/amip/mon/atmos/Amon/r1i1p1/v20100928 --cim
- Plot netcdf results for experiment and export CIM metadata (the option was initially designed for the export of CERA metadata, hence its now misleading name):
qcDbselect.py --postgresdb=<database> --experiment=<drs> --outcera=<xmgrace template>
qcDbselect.py --postgresdb=test1 --experiment=CMIP5/output/MPI-M/ECHAM6-MPIOM/rcp45 --outcera=../table/qc_xmgrace.par
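The examples above pass DRS names of different depth: five components for an experiment, ten for an ESG dataset / publication unit, and eleven for an atomic dataset. The following Python sketch splits a DRS name accordingly. The component names follow the CMIP5 DRS directory structure; the mapping of depth to level is inferred from the examples on this page only, and the helper names are hypothetical.

```python
DRS_COMPONENTS = [
    "activity", "product", "institute", "model", "experiment",
    "frequency", "realm", "mip_table", "ensemble", "version", "variable",
]

# Depth of a DRS name -> level, as inferred from the examples on this page.
DRS_LEVELS = {5: "experiment", 10: "publication unit", 11: "atomic dataset"}


def split_drs(drs):
    """Map the components of a DRS name to their names, as far as given."""
    parts = drs.strip("/").split("/")
    if len(parts) > len(DRS_COMPONENTS):
        raise ValueError("too many DRS components: %s" % drs)
    return dict(zip(DRS_COMPONENTS, parts))


def drs_level(drs):
    """Return the level a DRS name refers to, or 'other'."""
    return DRS_LEVELS.get(len(drs.strip("/").split("/")), "other")


def experiment_of(drs):
    """Return the DRS experiment name (first five components)."""
    return "/".join(drs.strip("/").split("/")[:5])


name = "cmip5/output/MPI-M/ECHAM6-MPIOM-TR/amip/mon/atmos/Amon/r1i1p1/v20100928/cct"
print(drs_level(name))
print(experiment_of(name))
print(split_drs(name)["variable"])
```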
Requirements ¶
- QCDB test1, qc1
- python >= 2.4 with modules sqlalchemy, lxml
- oracle: cx_Oracle; oracle instant client packages "basic" and "sdk"; environment QCDB_TYPE=ORACLE
- openssl for password decryption of QCDB user ''qcdbuser''
- ncdump for checking the netcdf result for readability
- PLOT (--outcera): xmgrace and cdo (cdo >= 1.5; 1.4.5.1 has performance problems) using template table/qc_xmgrace.par; xmgrace requires netcdf-3 qc results
qcExclude.py - Exclude ESG dataset / publication unit¶
SVN: https://svn-mad.zmaw.de/svn/mad/Model/QualCheck/QCWrapper/src/qcDbdelete.py
Objectives ¶
- exclude ESG dataset / publication unit from QC L2; atomic datasets connected to it are flagged (QCL2_excluded):
- They are excluded from QC L2 assignment by qcAssignL2.py and
- excluded from QC L2 data updates in the QCDB by qcWrapper.py
- include excluded ESG dataset / publication units again for QC L2; atomic datasets connected to it are flagged (QCL2_started, same as in qcWrapper.py):
qcExclude.py --postgresdb=<database> --esgds=<drs> --qstatus=<include|exclude>
qcExclude.py --postgresdb=test1 --esgds=cmip5/output/MPI-M/ECHAM6-MPIOM-TR/amip/mon/atmos/Amon/r1i1p1/v20100928 --qstatus=exclude
- --postgresdb: test1 for testing and qc1 for production
- --esgds: DRS name of ESG dataset / publication unit
- --qstatus: valid values: "exclude" and "include"
Requirements ¶
- QCDB test1, qc1
- python >= 2.4 with modules sqlalchemy,
- oracle: cx_Oracle; oracle instant client packages "basic" and "sdk"; environment variables set cf. Quick Start item 4.
- openssl for password decryption of QCDB user ''qcdbuser''
qcDbdelete.py - Delete¶
SVN: https://svn-mad.zmaw.de/svn/mad/Model/QualCheck/QCWrapper/src/qcDbdelete.py
Objectives ¶
- delete entry from QCDB and corresponding files from DRS directory
qcDbdelete.py --postgresdb=<database> --log=<qc logfile> --entry=<drs> [--dbonly]
qcDbdelete.py --postgresdb=test1 --entry=cmip5/output/MPI-M/ECHAM6-MPIOM-TR/amip/mon/atmos/Amon/r1i1p1/v20100928/tas --log=myqclogfile
- --postgresdb: test1 for testing and qc1 for production
- --dbonly: delete only QCDB entries for QC results but leave the QC results in the directory untouched
Note: the entry can be an atomic dataset, an experiment, or any other aggregated set of data. The SQL wildcard '%' is allowed as well, but use it with care.
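Because a '%' wildcard can match far more entries than intended, it is worth previewing a pattern before deleting anything. The sketch below only illustrates how SQL LIKE matching behaves, using an in-memory SQLite table in place of the Oracle QCDB. The table and column names are hypothetical, and how qcDbdelete.py applies the pattern internally is not documented here.

```python
import sqlite3


def preview_matches(entries, pattern):
    """Return the entries that an SQL LIKE pattern would match."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE qc_entry (drs TEXT)")   # hypothetical table
    con.executemany("INSERT INTO qc_entry VALUES (?)", [(e,) for e in entries])
    rows = con.execute(
        "SELECT drs FROM qc_entry WHERE drs LIKE ? ORDER BY drs", (pattern,)
    ).fetchall()
    con.close()
    return [row[0] for row in rows]


prefix = "cmip5/output/MPI-M/ECHAM6-MPIOM-TR/amip/mon/atmos/Amon/r1i1p1/v20100928/"
entries = [prefix + "tas", prefix + "tasmax", prefix + "pr"]

print(len(preview_matches(entries, prefix + "tas")))    # exact name
print(len(preview_matches(entries, prefix + "tas%")))   # also matches tasmax
print(len(preview_matches(entries, prefix + "%")))      # whole publication unit
```

Note that in SQL LIKE patterns the underscore '_' is a wildcard too (it matches any single character), so names containing underscores may match more than expected.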
Requirements ¶
- QCDB test1, qc1
- python >= 2.4 with modules sqlalchemy,
- oracle: cx_Oracle; oracle instant client packages "basic" and "sdk"; environment QCDB_TYPE=ORACLE
- openssl for password decryption of QCDB user ''qcdbuser''
QC L2 assignment¶
qcAssignL2.py - assign QC Level 2¶
SVN: https://svn-mad.zmaw.de/svn/mad/Model/QualCheck/QCWrapper/src/qcAssignL2.py
Objectives¶
- assign QC Level 2 to the DRS experiment according to QC L2 assignment criteria
- update QC status in QCDB
- add QC result plot (pdf) to QCDB
- add comment on QC Level 2 assignment
- send CIM quality report to CIM repository (using pubEsgdsResult.py)
qcAssignL2.py --postgresdb=<database> --experiment=<drs experiment> --cimxml=<CIM quality report xml> --plot=<QC result plot> --comment=<comment on QCL2 assignment>
qcAssignL2.py --postgresdb=test1 --experiment=CMIP5/output/MPI-M/ECHAM6-MPIOM/rcp45 --cimxml=cimqr_CMIP5_output_MPI-M_ECHAM6-MPIOM_rcp45.xml --plot=QCL2_images_CMIP5_output_MPI-M_ECHAM6-MPIOM_rcp45.pdf
- --postgresdb: test1 for testing and qc1 for production
- --plot: optional but recommended
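In the example call, the names of the CIM quality report and of the plot file are derived from the DRS experiment name by replacing the slashes with underscores. The Python sketch below reproduces that pattern and assembles the call. The naming rule is inferred from the single example above and the helper names are hypothetical.

```python
def qcl2_file_names(experiment):
    """Derive the CIM report and plot file names from a DRS experiment name."""
    tag = experiment.strip("/").replace("/", "_")
    return {
        "cimxml": "cimqr_%s.xml" % tag,
        "plot": "QCL2_images_%s.pdf" % tag,
    }


def build_assign_command(database, experiment, comment=None):
    """Assemble an argument list for qcAssignL2.py."""
    names = qcl2_file_names(experiment)
    cmd = [
        "qcAssignL2.py",
        "--postgresdb=%s" % database,
        "--experiment=%s" % experiment,
        "--cimxml=%s" % names["cimxml"],
        "--plot=%s" % names["plot"],
    ]
    if comment:
        cmd.append("--comment=%s" % comment)
    return cmd


print(" ".join(build_assign_command("test1", "CMIP5/output/MPI-M/ECHAM6-MPIOM/rcp45")))
```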
Requirements ¶
- QCDB test1, qc1
- python >= 2.4 with modules sqlalchemy,
- oracle: cx_Oracle; oracle instant client packages "basic" and "sdk"; environment QCDB_TYPE=ORACLE
- openssl for password decryption of QCDB user ''qcdbuser''
- run of qcDbselect.py with option --outcera to plot data (recommended)
and --cim to export CIM quality report xml (mandatory)
Send CIM QC documents to CIM QC repository¶
pubEsgdsResult.py - send CIM QC document to CIM QC repository¶
SVN: https://svn-mad.zmaw.de/svn/mad/Model/QualCheck/QCWrapper/src/pubEsgdsResult.py
Objectives¶
- upload given CIM QC document to CIM QC repository at DKRZ:
- documentVersion, documentID and pass flag set
- individualName, externalID (=DRS_ID) read from given XML
- CIM QC Documents are published by atom feed (http://cera-www.dkrz.de/WDCC/CMIP5/feed)
- CIM document versioning done during upload (pass-flag; uuid; external_id; QC Manager etc. are extracted from given XML)
- module used within qcAssignL2.py for QC L2 assignment for an experiment
pubEsgdsResult.py --postgresdb=<database> --xml=<CIM quality document xml>
pubEsgdsResult.py --postgresdb=test1 --xml=/home/k204082/programs/QCWrapper/cimqr_cmip5_output_MTEST_ECHAM6-MPIOM-TR_amip.xml
- --postgresdb: test1 for testing and qc1 for production
- --plot option optional but recommended
Requirements ¶
- QCDB test1, qc1
- python >= 2.4 with modules sqlalchemy, lxml
- oracle: cx_Oracle; oracle instant client packages "basic" and "sdk"; environment QCDB_TYPE=ORACLE
- openssl for password decryption of QCDB user ''qcdbuser''
- run of qcDbselect.py with option --cim to create CIM QC template XMLs
Installation Instructions¶
1. Installation of the QC tool (checkout from https://svn-mad.zmaw.de/svn/mad/Model/QualCheck/QC/branches/QC-0.3) according to Readme.txt. netcdf-3 library is sufficient, netcdf-4 library is not needed.
2. Python >= 2.4 + modules sqlalchemy, lxml, cx_Oracle
3. openssl in path for password decryption of QCDB user
4. For use of qcDbselect.py 'ncdump' in path (QC netcdf result's readability checked by ncdump -h)
5. For use of qcDbselect --outcera option for netCDF plotting 'xmgrace' (incl. netcdf-3, ps and pdf support) and 'cdo' (version >= 1.5 with netcdf-3 support) in path
6. For Oracle: oracle instant client packages "basic" and "sdk"; environment QCDB_TYPE=ORACLE
Quick Start¶
QC tool is to be installed for netcdf-3 I/O.
Preconditions:
- QC tool checkout from https://svn-mad.zmaw.de/svn/mad/Model/QualCheck/QC/branches/QC-0.3
1. Install QC tool:
install netcdf library netcdf-3.6.3 with configuration
export CC="/usr/bin/gcc"
export CFLAGS="-static -O2"
export CPPFLAGS="-DNDEBUG"
export CXX="/usr/bin/g++"
export CXXFLAGS="-static -O2"
export FC=""
export F90=""
./configure --prefix=path/netcdf-3.6.3 --disable-examples --disable-docs-install --disable-fortran-compiler-check
make install
cd QC/src
edit compiler instruction file zg++ : set -DNC3 (netcdf-3 option)
run ./zg++
- zg++ compiler script example (17.03.2011): example_zg__
- QC tool makefile example (12.12.2011): Makefile
2. Ensure the QC tool is installed and running by:
qcManager -E_CHECK_TOOLS
3. Run test for QC tool:
cd QC/example
edit qc_test.conf
run ../QC/scripts/qcManager -f qc_test.conf
4. Check accessibility of QCDB:
- python module: sqlalchemy, cx_Oracle
installation and accessibility check (port 1630):
- oracle instant client packages "basic" and "sdk" (http://www.oracle.com/technetwork/database/features/instant-client/index-097480.html)
- set softlinks for version independent libs, e.g.
libclntsh.so -> libclntsh.so.11.1
libocci.so -> libocci.so.11.1
- set or add to environment variables (cf. table/set_env.*):
- ORACLE_HOME=<instant client directory>
- LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$ORACLE_HOME"
- PATH="$PATH:$ORACLE_HOME"
- QCDB_TYPE="ORACLE"
- python module cx_Oracle: In my case I needed to install libaio as well.
- test to import cx_Oracle into python
- check that your local firewall is open on port 1630
- test DB access by
src/access_oracle.py
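Before running the DB access test, the environment settings from step 4 can be checked with a few lines of Python. This is a minimal sketch with hypothetical function names; it only inspects the variables named above and tries to import cx_Oracle, and it does not replace src/access_oracle.py.

```python
import os


def check_oracle_env(env=None):
    """Return a list of problems with the Oracle-related environment settings."""
    env = os.environ if env is None else env
    problems = []
    home = env.get("ORACLE_HOME", "")
    if not home:
        problems.append("ORACLE_HOME is not set")
    else:
        # ORACLE_HOME has to appear in both search paths.
        for var in ("LD_LIBRARY_PATH", "PATH"):
            if home not in env.get(var, "").split(os.pathsep):
                problems.append("%s does not contain ORACLE_HOME" % var)
    if env.get("QCDB_TYPE") != "ORACLE":
        problems.append("QCDB_TYPE is not ORACLE")
    return problems


def cx_oracle_importable():
    """Return True if the cx_Oracle module can be imported."""
    try:
        import cx_Oracle  # noqa: F401
        return True
    except ImportError:
        return False


for problem in check_oracle_env():
    print(problem)
print("cx_Oracle importable: %s" % cx_oracle_importable())
```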
5. Check QCDB result access by:
qcDbselect.py --postgresdb=test1 --list
6. Try a QCDB ingest by editing the qc.conf example and running:
qcWrapper.py --configure=qc.conf
7. Ensure for plotting (qcDbselect --outcera) that ncdump, xmgrace, gs (or psmerge) and cdo (version >= 1.5) are installed and working
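Whether the external programs from step 7 are on the PATH can be checked from Python. The sketch below uses shutil.which, which requires Python 3.3 or later, so it assumes a newer interpreter than the Python 2.4 minimum of the QC Wrapper. The function name is hypothetical, and the check does not verify that cdo is version 1.5 or later or that the tools actually work.

```python
import shutil


def missing_tools(tools=("ncdump", "xmgrace", "cdo"), converters=("gs", "psmerge")):
    """Return the names of required programs that are not found on the PATH.

    All entries of 'tools' are required; of 'converters' one is enough.
    """
    missing = [tool for tool in tools if shutil.which(tool) is None]
    if not any(shutil.which(conv) for conv in converters):
        missing.append(" or ".join(converters))
    return missing


missing = missing_tools()
if missing:
    print("not found on PATH: " + ", ".join(missing))
else:
    print("all plotting tools found")
```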