*%%%%%* *%%%%%%%%%%%* *%%%%%/\|/\%%%%%* *%%%%%%|--2--|%%%%%%* C Y C L O P S 2 ...... STAR data name checker *%%%%%\/|\/%%%%%* *%%%%%%%%%%%* Version 2.1.5 *%%%%%* 29 November 2009 CYCLOPS2 is a fortran program for checking STAR data names against data -------- name dictionaries written in DDL-STAR format proposed by Tony Cook of ORAC Ltd., Leeds. Data names may be checked in any text file. CYCLOPS Version 2 by Copyright © 1997 Sydney R. Hall (syd@crystal.uwa.edu.au) Crystallography Centre University of Western Australia Nedlands 6009, AUSTRALIA and Copyright © 1997 Herbert J. Bernstein (yaya@bernstein-plus-sons.com) Bernstein + Sons 5 Brewster Lane Bellport, NY 11713, U.S.A. Version 2 handles DDL1 and DDL2 dictionaries, and uses CIFtbx2 The latest program source and information is available from: Em: syd@crystal.uwa.edu.au ,-_|\ Sydney R. Hall sendcif@crystal.uwa.edu.au / \ Crystallography Centre Fx: +61 9 380 1118 --> *_,-._/ University of Western Australia Ph: +61 9 380 2725 v Nedlands 6009, AUSTRALIA
In order to ensure continuing availability of source code and documentation CYCLOPS2 and its documentation are subject to copyright. This does not prevent you from using the program, from making copies and changes, but prevents the creation of "closed source" versions out of the open source versions. See NOTICE.
Science is best served when the tools we use are fully understood by those who wield those tools and by those who make used of results obtained with those tools. When a scientific tool exists as software, access to source code is an important element in achieving full understanding of that tool. As our field evolves and new versions of software are required, access to source allows us to adapt our tools quickly and effectively.
In the early days of software development, most scientific software source code was freely and openly shared with a minimum of formalities. These days, it appears that carefully drawn legal documents are necessary to protect free access to the source code of scientific software. We are all deeply indebted to Richard Stallman for showing us how a creative combination of copyrights and seemingly restrictive licenses could give us truly unfettered freedom to use programs, to read their source code and to develop new versions. The GNU project, and the Linux project have shown that an open source approach works. We use the GNU General Public License (the "GPL") for our program starting with the releases of CIFtbx3. Older versions use the license from OpenRasmol. The OpenRasMol conditions for use have correctly been called "GPL-like".
If you are a user of this program, you will find that the copyrights and notices ask little more of you than that you avoid mistakes by others by keeping the notices with copies, display scientific integrity by citing your sources properly and treating this like other shared scientific developments by not inferring a warranty. If you are a software developer and wish to incorporate what you find here into new code, or to pick up bits and pieces and used them in another context, the situation becomes more complex. Read the copyrights and notices carefully. You will find that they are "infectious". Whatever you make from our Open Source code must itself be offered as Open Source code. In addition, in order to allow users to understand what has changed and to ensure orderly development you have to describe your changes.
This version of CYCLOPS is released with CIFtbx2. Before reading this document, please read the CIFtbx2 README file. The basic instructions for building CYCLOPS2 are included with those for building CIFtbx2. Additional information needed for the installation of CYCLOPS appears in the section INSTALLATION NOTES, below. If you are on a UNIX system, the Makefile for CIFtbx2 can be used to build and test cyclops.
NOTE: If no dictionaries are specified on the command line, an attempt is made to open a file named "STARDICT" to use either as a dictionary or from which to obtain a list of names of dictionaries.
STARDICT may directly contain a dictionary as in version 1 of CYCLOPS, or contain a list of file names of dictionaries, one per line. In that case the list must begin with the line for which the first 5 characters are "#DICT". The list of unreferenced data names is suppressed unless the line "#VERBOSE" appears in STARDICT, or the command line argument "-v yes" is used.
In a UNIX-like environment, the program is run as:
cyclops [-i input_text] [-o validation_output] \ [-d dictionary] [-p priority] [-c catck] \ [-f command_file] [-v verbose] [-s short]\ where: input_text defaults to $CYCLOPS_INPUT_TEXT or stdin validation_output defaults to $CYCLOPS_VALIDATION_OUT or stdout dictionary defaults to $CYCLOPS_CHECK_DICTIONARY (multiple dictionaries may be specified) input_text of "-" is stdin, validation_output of "-" is stdout -c has values of t or 1 or y vs. f or 0 or n, (default f, i.e. no category checking), -v has values of t or 1 or y vs. f or 0 or n, (default f, i.e. non-verbose), -s has values of t or 1 or y vs. f or 0 or n, (default f, i.e. not short), short restricts output to items not in dictionaries -p has values of first, final or nodup (default first for first dictionary has priority) a command file may contain additional arguments
1.0. Before you try to install this version of cyclops2
*** ========================================================== *** *** ========================================================== *** *** ==>>> You must have ciftbx version 2.5.4 or greater <<<== *** *** ==>>> installed in a directory named ciftbx.src. <<<== *** *** ==>>> The scripts mkdecompln and rmdecompln, which <<<== *** *** ==>>> come with ciftbx, must be installed in the <<<== *** *** ==>>> top level directory and executable. <<<== *** *** ==>>> To test cyclops, you must have compressed <<<== *** *** ==>>> copies of cif_core.dic and cif_mm.dic installed<<<== *** *** ==>>> in a directory named dictionaries. <<<== *** *** ========================================================== *** *** ========================================================== ***The directory structure within which you will work is
top level directory ------------------- | | ------------------------------ | | | dictionaries ciftbx.src cyclops.src ------------ ---------- -----------
You may have acquired this package in one of several forms. The most likely are as a "C-shell Archive," a "Shell Archive", or as separate files. The idea is to get to separate files, all in the same directory, named cyclops.src, parallel to the directory ciftbx.src, but let's start with the possibility that you got the package as one big file, i.e. in one of the archive file formats. Place the archive in the top level directory.
*** ========================================================== *** *** ========================================================== *** *** ==>>> The files in this kit will unpack into a <<<== *** *** ==>>> directory named cyclops.src. It is a good idea<<<== *** *** ==>>> to save the current contents of cyclops.src <<<== *** *** ==>>> and then to make the directory empty <<<== *** *** ========================================================== *** *** ========================================================== ***If you are on a machine which does not provide a unix-like shell, you will need to take apart the archive by hand using a text editor. We'll get to that in a moment.
1.1. ON A UNIX MACHINE
If you have the shell archive on a unix machine, follow the instructions at the front of the archive, i.e. save the uncompressed archive file as "file", then, if the archive is a "Shell Archive" execute "sh file". If the archive is a "C-Shell Archive" execute "csh file".
1.2. IF YOU DON'T HAVE UNIX
If sh or csh are not available, then it is best to start with the "C-Shell Archive" and do the steps that follow. If you must use the "Shell Archive" you should be aware that the lines you want to extract have been prefixed with "X", while most of the lines you want to discard have not. For a "C-Shell Archive" such prefixes are rare and the file is easier to read. Assume you have a "C-Shell Archive".
Use your editor to separate the different parts of the file into individual files in your workspace. Each part starts with a lot of unixisms, then several blank lines and then two lines which identify the file, and most importantly, contain the text "CUT_HERE_CUT_HERE_CUT_HERE" You can look at the line before and the line after to see if you are at the head or tail of a file. Use your editor to search for the "CUT_HERE" lines. Each part is carefully labeled and indicates the recommended filename for the separated file. On some machines these filenames may need to be altered to suit the OS or compiler.
1.3. MANIFEST
The partitions are as follows:
part filename description 1 COPYING GPL (GNU General Public License) 2 NOTICE Notices 3 cyclops.src/README.cyclops this file 4 cyclops.src/MANIFEST a list of files in the kit 5 cyclops.src/Makefile a control file for make to compile and test the code 6 cyclops.src/cyclops.cmn CYCLOPS common block 7 cyclops.src/cyclops.f CYCLOPS fortran source 8 cyclops.src/cyclops_test.prt CYCLOPS stdout test output 9 cyclops.src/mtest.cyc CYCLOPS STARCHEK output
For non-UNIX-like environments, you will have to provide replacements for iargc, getarg and getenv. The following are reasonable possibilities:
integer function iargc(dummy) iargc=1 return end subroutine getarg(narg,string) integer narg character*(*) string string=char(0) if (narg.eq.1) string='-vn' return end subroutine getenv(evar,string) character*(*) evar,string string=char(0) if(evar.eq."CYCLOPS_INPUT_TEXT") * string='STARTEXT'//char(0) if(evar.eq."CYCLOPS_VALIDATION_OUT") * string='STARCHEK'//char(0) if(evar.eq."CYCLOPS_CHECK_DICTIONARY") * string=char(0) return end
This combination of substitute routines would "wire-in" CYCLOPS2 to read its input text from a file named STARTEXT, write the validations output to a file named STARCHEK, and check names against the default dictionary or file listing dictionaries STARDICT
Note that the PARAMETER 'MAXBUF' should contain the maximum number of characters contained on a single text line. The default value is 200.
If you don't wish to use the Makefile or can't, then here are the essential steps to build cyclops:
(a) compile 'cyclops.f' [note that you will need the files 'cyclops.cmn', 'ciftbx.cmn', ciftbx.cmv', 'ciftbx.cmf', and 'ciftbx.sys' in the same directory as cyclops.f, or you will need to change the fortran includes.]
(b) if you hove not already done so, compile 'ciftbx.f' and 'hash_funcs.f' to create the object files 'ciftbx.o' and 'hash_funcs.o'.
(c) link 'cyclops.o', 'ciftbx.o' and 'hash_funcs.o' together to make the program 'cyclops'
(d) if at all possible, please test cyclops by creating a file named 'STARDICT' with the three lines
#DICT cif_core.dic cif_mm.dicand place a copy of cif_core.dic and of cif_mm.dic into the same directory. Then execute the command
./cyclops mtest.prt STARCHEK 2> cyclops_test.lstCompare STARCHEK to 'mtest.cyc' and 'cyclops_test.lst' to 'cyclops_test.prt' There should not be any differences.
(e) if you have any problems with this process please report them to Herbert J. Bernstein [em: yaya@bernstein-plus-sons.com, ph: 1-631-286-1339] for the changes from CYCLOPS to CYCLOPS2 in particular, or to Syd Hall [em: syd@crystal.uwa.edu.au fx: 61(9)3801118] for general ciftbx and CYCLOPS issues. If in doubt as to where your problem lies, send it to whichever one of us is more likely to be convenient to your time-zone, and we will try to sort things out for you.
dictionary input input on device 1 Optionally a list of dictionaries, 1 per line after a #DICT line validation output output on device 6 ('stdout') Input text input on device 1, if a file, 5 if 'stdin' Message device output on device 0 ('stderr') may be diverted to the validation output file Direct access in/out on device 3
CYCLOPS Check List ------------------ Dictionary data names = 2244 New data names in text = 4 [1] Dictionary cif_core.dic 2.0.1 data names = 624 [2] Dictionary cif_mm.dic 0.9.01 data names = 1620 Data names NOT in Dictionary Line Numbers _blat1 . . . . . . . . . . . . . . . . . . . . . . . . 9 11 94 96 197 199 306 312 318 324 330 _blat2 . . . . . . . . . . . . . . . . . . . . . . . . 13 15 98 100 201 203 303 309 315 321 327 _dummy_test . . . . . . . . . . . . . . . . . . . . . 5 7 90 92 193 195 217 _rubish_here . . . . . . . . . . . . . . . . . . . . . 447 [1] Dictionary cif_core.dic 2.0.1 [2] Dictionary cif_mm.dic 0.9.01 Line Numbers [2] _atom_site.calc_attached_atom . . . . . . . . . . 429 [1] = _atom_site_calc_attached_atom 428 [2] _atom_site.calc_flag . . . . . . . . . . . . . . . 426 [1] = _atom_site_calc_flag 425 [2] _atom_site.fract_x . . . . . . . . . . . . . . . . 38 44 50 406 [1] = _atom_site_fract_x 405 [2] _atom_site.fract_y . . . . . . . . . . . . . . . . 39 45 51 410 [1] = _atom_site_fract_y 409 [2] _atom_site.fract_z . . . . . . . . . . . . . . . . 40 46 52 414 [1] = _atom_site_fract_z 413
cyclops < text > summarythe new command line could be
cyclops text STARCHEK 2> summary
The STARCHEK output has been reorganized to be more practical with very large dictionaries. The list of data names not defined in any dictionary now comes first. This is followed by the data names which are both defined and referenced. Finally, the names which are defined but not referenced are listed.
The number of line numbers listed for any name has been reduced from 99 to 19. If there are more than 19 lines on which a name is used, this is indicated after the 10th line number by ".." with the first 10 and the last 9 line numbers being explicitly given.
To avoid having to copy large dictionaries and to allow for multiple, layered dictionaries, the file STARDICT may optionally contain a list of paths to dictionaries to load. To distinguish this case from a dictionary loaded directly from STARDICT, the first 5 characters of STARTDICT must be "#DICT" followed by a list of dictionary file names, one per line. If the dictionaries are not in the same directory, give full path names to the proper directories.
***** WARNING ***** CYCLOPS2 will overwrite an existing STARCHEK file without warning messages.
Release 2.1.5 adapted CYCLOPS2 to CIFtbx 4.1.0 to start the conversion to handling of DDLm dictionaries when they become available.
Release 2.1.4 restored the missing code for the "-f command-file" command line option. Examples were changed to use cif_core.dic and cif_mm.dic. Several typos in the documentation were corrected. The Makefile was updated to use compressed dictionaries. The new command line option "-c catck" to control dictionary category checking was added. The default is not to check.
Release 2.1.3 added the option "-p priority" to the command line, and changed the handling to STARDICT to prevent opening of STARDICT if any dictionaries are specified on the command line. The default list of tags to be ignored was updated from the DDL0 list to the DDL1.4 list. The scan for tags enclosed in parentheses, brackets or braces was made more robust.
Release 2.1.2 added the option "#SHORT" to STARDICT and "-s short" to the command line to allow for a short output of unreferenced names only. A bug which prevented recognition of names of the form "*_name" was corrected. Duplicate dictionary file names are now detected and removed.
Release 2.1.1 made the necessary changes to adapt to the reorganization of 'ciftbx.cmn' and 'ciftbx.sys' in ciftbx 2.5.1.
Release 2.1.0 reorganized the report, so that "[n]" is used to identify the fact that a tag came from dictionary "n" in a consolidated report for all dictionaries sorted alphabetically by name, first giving the data names which were not referenced, then the data names which were referenced, and, optionally (depending on the command-line argument -v[y|n]), a section reporting the data names defined in the dictionary, but not referenced.
Release 2.0.4 used ciftbx 2.4.6 to report upper/lower case versions of data item names.
Release 2.0.3 introduced support for aliases. The output is annotated with the remarks "Referenced Alias: ..." or "Unreferenced Alias: ..." listing the name of any referenced or unreferenced aliases to the name being cited. An unreferenced alias is not given a further listing, but a referenced alias will be listed with its own citations.
Release 2.0.2 fixed a problem with identification of data names within quoted strings. In this release, the first blank after the initial underscore terminates the search for the end of the token.
Release 2.0.1 differed from release 2.0 in having the format of the REPORT output cleaned up for better column alignment and some missing parameters on some of the cerr calls corrected.
*** ===>>>Note the change in line 12 from the same example in earlier versions of 'cyclops.sh' <<<=== ***
#! /bin/sh DICT=/usr1/syd/cif EXT=cif_core.dic if [ $# -eq 2 ]; then EXT=$2 fi DICT=$DICT/$EXT rm STARCHEK rm STARDICT echo "#DICT" > STARDICT echo $DICT >> STARDICT cyclops $1 STARCHEK rm STARDICT vi STARCHEKAs a first test, after compiling and linking cyclops.f to create cyclops, use the above script to check 'cif_core.dic' itself.
Enter:
cyclops.sh cif_core.dicNote that the two "extra" data names detected in the dictionary arose from an appendix and an example. Use the listed line numbers to check this in the dictionary file.
If you have a CIF that you wish to validate against the IUCr core dictionary, enter:
cyclops.shIf you have a CIF that you wish to validate against another dictionary (say, cifdic.P92), enter:
cyclops.shcifdic.P92
yaya@bernstein-plus-sons.com