CBF/imgCIF Extensions Dictionary

Draft version 1.5 for comment

Image dictionary (imgCIF)

Extended data types

The following extended data types are defined in this dictionary:

Code Primitive data type Regular expression construct Description
code char [_,.;:"&<>()/\{}'`~!@#$%A-Za-z0-9*|+-]* code item types/single words ...
ucode uchar [_,.;:"&<>()/\{}'`~!@#$%A-Za-z0-9*|+-]* code item types/single words (case insensitive) ...
line char [][ \t_(),.;:"&<>/\{}'`~!@#$%?+=*A-Za-z0-9|^-]* char item types / multi-word items ...
uline uchar [][ \t_(),.;:"&<>/\{}'`~!@#$%?+=*A-Za-z0-9|^-]* char item types / multi-word items (case insensitive)...
text char [][ \n\t()_,.;:"&<>/\{}'`~!@#$%?+=*A-Za-z0-9|^-]* text item types / multi-line text ...
binary char \n--CIF-BINARY-FORMAT-SECTION--\n\ [][ \n\t()_,.;:"&<>/\{}'`~!@#$%?+=*A-Za-z0-9|^-]*\ \n--CIF-BINARY-FORMAT-SECTION---- binary items are presented as MIME-like ascii-encoded sections in an imgCIF. In a CBF, raw octet streams are used to convey the same information.
int numb -?[0-9]+ int item types are the subset of numbers that are the negative or positive integers.
float numb -?(([0-9]+)[.]?|([0-9]*[.][0-9]+))([(][0-9]+[)])?([eE][+-]?[0-9]+)? float item types are the subset of numbers that are the floating point numbers.
any char .* A catch-all for items that may take any form...
yyyy-mm-dd char \ [0-9]?[0-9]?[0-9][0-9]-[0-9]?[0-9]-[0-9]?[0-9]\ ((T[0-2][0-9](:[0-5][0-9](:[0-5][0-9](.[0-9]+)?)?)?)?\ ([+-][0-5][0-9]:[0-5][0-9]))? Standard format for CIF date and time strings (see http://www.iucr.org/iucr-top/cif/spec/datetime.html), consisting of a yyyy-mm-dd date optionally followed by the character 'T' and a 24-hour clock time, optionally followed by a signed time-zone offset. The IUCr standard has been extended to allow for an optional decimal fraction on the seconds of time. Time is local time if no time-zone offset is given. Note that this type extends the mmCIF yyyy-mm-dd type but does not conform to the mmCIF yyyy-mm-dd:hh:mm type, which uses a ':' in place of the 'T' specified by the IUCr standard. For reading, both forms should be accepted, but for writing, only the IUCr form should be used. For maximal compatibility, the special time-zone indicator 'Z' (for 'zulu') should be accepted on reading in place of '+00:00' for GMT.
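
The regular expression constructs for the int, float and yyyy-mm-dd types listed above can be exercised directly. The following is a minimal sketch in Python, not part of the dictionary. It assumes that the backslash-space sequences in the printed yyyy-mm-dd construct are line continuations and can be dropped, and that a value conforms only when the whole string matches. The names matches_type and normalize_for_reading are hypothetical helpers introduced for illustration; the second one maps the two forms tolerated on reading (a trailing 'Z' and the mmCIF ':' separator) onto the IUCr form.

```python
import re

# Patterns transcribed from the constructs listed above. The backslash-space
# sequences in the printed yyyy-mm-dd construct are assumed to be line
# continuations and are dropped here.
INT_RE = re.compile(r"-?[0-9]+")
FLOAT_RE = re.compile(
    r"-?(([0-9]+)[.]?|([0-9]*[.][0-9]+))([(][0-9]+[)])?([eE][+-]?[0-9]+)?"
)
DATETIME_RE = re.compile(
    r"[0-9]?[0-9]?[0-9][0-9]-[0-9]?[0-9]-[0-9]?[0-9]"
    r"((T[0-2][0-9](:[0-5][0-9](:[0-5][0-9](.[0-9]+)?)?)?)?"
    r"([+-][0-5][0-9]:[0-5][0-9]))?"
)


def matches_type(pattern, value):
    """Return True when the whole string conforms to the given construct."""
    return pattern.fullmatch(value) is not None


def normalize_for_reading(value):
    """Hypothetical helper: map tolerated input forms onto the IUCr form.

    - a trailing 'Z' becomes '+00:00'
    - the mmCIF ':' between the date and the time becomes 'T'
    """
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return re.sub(r"^([0-9]{1,4}-[0-9]{1,2}-[0-9]{1,2}):", r"\1T", value)


if __name__ == "__main__":
    samples = [
        (INT_RE, "-42"),
        (FLOAT_RE, "12.34(5)"),  # parenthesized digits after the number
        (FLOAT_RE, "-1.2e-3"),
        (DATETIME_RE, "2007-04-11"),
        (DATETIME_RE, normalize_for_reading("2007-04-11:12:30:00Z")),
    ]
    for pattern, value in samples:
        print(value, matches_type(pattern, value))
```

Taken literally, the transcribed yyyy-mm-dd pattern groups the time together with the offset, so a value such as 2007-04-11T12:30:00 with no offset is not matched in full, even though the description permits local time. A reader implementation may wish to relax that group in practice.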

