Extension of RasMol to Display Surfaces,
to Handle CML/XML and Other New Features
by
Herbert J. Bernstein
Dowling College, Oakdale, NY 11769, USA
Frances C. Bernstein
Bernstein + Sons, Bellport, NY 11713, USA
(Work Supported in part (for HJB) by NSF grant DBI-0203064)
(expanded report of talk presented at IUCr meeting in Geneva, August 2002)
RasMol
- Molecular graphics program written by R. Sayle in 1992
- Heavily used, well established
- Multiple platforms (MS Windows, Mac, Unix, VMS, ...)
- Open-source (GPL-like license)
- Reads:
- 'pdb' (Protein Data Bank format),
- 'mdl' (Molecular Design Limited's MOL file format),
- 'alchemy' (Tripos' Alchemy file format),
- 'mol2' (Tripos' Sybyl Mol2 file format),
- 'charmm' (CHARMm file format),
- 'xyz' (MSC's XMol XYZ file format),
- 'mopac' (J. P. Stewart's MOPAC file format) or
- 'cif' (IUCr CIF or mmCIF file format)
- Draws:
- Wireframe, Sticks
- Spacefill (CPK)
- Backbone Ribbons, Cartoons
- Hydrogen Bonds, S-S Bonds
- Dotted surfaces (effectively a translucent surface)
- Exports GIF, PPM, BMP, PICT, ...
Capabilities being Added
- Read CML/XML
- Draw
- Translucent and Transparent Surfaces
- Solvent Accessible Surfaces
- Molecular Surfaces
- Bond editing
- Multimolecule support from UCB RasMol
- Multilingual support
- More platforms
Surface Calculation
- Commonly accepted definitions [Lee, Richards 71]
- Roll a probe molecule (e.g. water) over the van der Waals surface
- Solvent (accessible) surface -- surface traced by the center of the probe (a CPK surface with extended radii)
- Molecular surface -- surface of the volume from which the probe is
excluded (contact surfaces (spherical) + reentrant surfaces (toroidal))
- Extensive literature, e.g. [Connolly 96]
- Can be an expensive calculation
- RasMol has supported solvent surfaces drawn by dots
- RasTop removes interior dots
Two Carbon atoms and an Oxygen probe
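The solvent accessible surface defined above is the boundary of a union of spheres whose radii are the van der Waals radii extended by the probe radius. A minimal sketch in C follows; the `Atom` struct and both function names are illustrative assumptions and are not taken from the RasMol source.

```c
#include <stddef.h>

/* Hypothetical atom record: centre coordinates and van der Waals radius,
   all in the same length unit. */
typedef struct {
    double x, y, z;
    double vdw_radius;
} Atom;

/* Radius of the sphere traced by the probe centre around a single atom. */
double extended_radius(double vdw_radius, double probe_radius)
{
    return vdw_radius + probe_radius;
}

/* Returns 1 if the point (px, py, pz) lies strictly inside the extended
   sphere of at least one atom, that is, inside the volume bounded by the
   solvent accessible surface; returns 0 otherwise. */
int inside_solvent_volume(const Atom *atoms, size_t n, double probe_radius,
                          double px, double py, double pz)
{
    size_t i;
    for (i = 0; i < n; i++) {
        double dx = px - atoms[i].x;
        double dy = py - atoms[i].y;
        double dz = pz - atoms[i].z;
        double r = extended_radius(atoms[i].vdw_radius, probe_radius);
        /* Compare squared distances to avoid a square root. */
        if (dx * dx + dy * dy + dz * dz < r * r)
            return 1;
    }
    return 0;
}
```

With an atom radius of 1.7 and a probe radius of 1.4 (illustrative values in the same unit), the extended radius is 3.1, so any point closer than 3.1 to that atom's centre is inside the solvent accessible volume.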
New Approach for RasMol
- Solvent surface by extending radii in CPK calculation
- Molecular surface:
- Note that all surface segments result from
- Probe touching just one atom, or
- Probe tangent to two atoms
- Three atom case results from pseudo 3-membered ring of 2-atom cases
- CPK spheres
- Toroidal bond surfaces for atom pairs
- Hidden line removal
Reentrant surface formed by probe atom rolling in tangent contact
Top left: molecular surface; Top Right: CPK model
Bottom Center: pseudo-bond three-membered ring
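For the two-atom case above, a probe tangent to both atoms has its centre at distance r1 + rp from the first atom and r2 + rp from the second, so the centre is confined to the circle where the two extended spheres intersect. Sweeping the probe around that circle generates the toroidal reentrant surface. The sketch below computes that circle using standard sphere-sphere intersection geometry; the `ProbeCircle` type and `probe_circle` function are hypothetical names, not RasMol code.

```c
/* Where the centre of a probe can sit while touching two atoms at once. */
typedef struct {
    int exists;        /* 1 if the probe can touch both atoms simultaneously */
    double axial;      /* distance from atom 1 to the circle centre, measured
                          along the line from atom 1 to atom 2 */
    double radius_sq;  /* squared radius of the circle; take sqrt() from
                          <math.h> when the radius itself is needed */
} ProbeCircle;

/* r1, r2: atom radii; d: distance between atom centres; rp: probe radius. */
ProbeCircle probe_circle(double r1, double r2, double d, double rp)
{
    ProbeCircle c = {0, 0.0, 0.0};
    double e1 = r1 + rp;                 /* extended radius of atom 1 */
    double e2 = r2 + rp;                 /* extended radius of atom 2 */
    double diff = e1 > e2 ? e1 - e2 : e2 - e1;

    /* No circle if the atoms coincide, are too far apart for the probe to
       bridge them, or one extended sphere lies inside the other. */
    if (d <= 0.0 || d >= e1 + e2 || d <= diff)
        return c;

    c.axial = (d * d + e1 * e1 - e2 * e2) / (2.0 * d);
    c.radius_sq = e1 * e1 - c.axial * c.axial;
    c.exists = 1;
    return c;
}
```

When `exists` is 0 for a pair of atoms, no toroidal segment is needed between them, which is one way to decide which atom pairs receive a pseudo-bond.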
Transparent and Translucent Surfaces
- Truly transparent bodies cannot be seen
- Transparent effect by using bodies that absorb part of the spectrum
- Translucent bodies reflect part of the incident light from their surface
- Full model of transparent/translucent objects requires
- Model of entire volume
- Ray tracing
- Multiplicative cascading of absorbing layers
- Handling of dispersion
- Approximation for RasMol
- Approximate multiplication by minimization
- Model homogeneous volumes by their surfaces
- Avoid ray tracing
- Render opaque objects first
- Make a second pass for transparent objects
Transparent rendering of alternate conformers of residue 29 of 4ins
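The approximation listed above replaces the multiplicative cascade of absorbing layers with a minimization. One plausible reading, assumed here and not confirmed by the source, is a per-channel minimum of the colour already drawn and the colour of the transparent layer. The sketch compares that with the exact per-channel product; the `Rgb` type and the function names are hypothetical.

```c
/* 8-bit RGB colour. */
typedef struct { unsigned char r, g, b; } Rgb;

static unsigned char min_u8(unsigned char a, unsigned char b)
{
    return a < b ? a : b;
}

/* Multiply two 8-bit channel values as fractions of 255, with rounding. */
static unsigned char mul_u8(unsigned char a, unsigned char b)
{
    return (unsigned char)((a * b + 127) / 255);
}

/* Absorbing-filter model: each channel of the colour behind the layer is
   scaled by the layer's transmittance in that channel. */
Rgb filter_multiply(Rgb behind, Rgb filter)
{
    Rgb out;
    out.r = mul_u8(behind.r, filter.r);
    out.g = mul_u8(behind.g, filter.g);
    out.b = mul_u8(behind.b, filter.b);
    return out;
}

/* Cheaper stand-in (assumed reading of "minimization"): keep, per channel,
   the smaller of the two values. No multiplication or division is needed. */
Rgb filter_minimum(Rgb behind, Rgb filter)
{
    Rgb out;
    out.r = min_u8(behind.r, filter.r);
    out.g = min_u8(behind.g, filter.g);
    out.b = min_u8(behind.b, filter.b);
    return out;
}
```

In a two-pass renderer of the kind outlined above, the opaque pass would fill the frame buffer first, and the second pass would apply one of these functions to each pixel covered by a transparent object. Both functions leave the pixel unchanged when the filter is pure white, and both are independent of the order in which layers are applied.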
The Basics of XML
- Definition of XML:
- XML document:
- character data intermingled with "markup"
- "&", "%", "<" and ">" are highly significant in XML
- "Whitespace" in XML (as well as in CIF)
- refers to any non-empty sequence of spaces, tabs or line-terminators.
- XML name
- string beginning with a letter, underscore ("_") or colon (":"),
- followed by zero or more letters, digits, hyphens, underscores, colons or periods (".")
- reserve names beginning with "xml" (case-insensitive) or containing ":"
- XML "system" literal string
- quoted with either a single quote ("'") or a double quote
- may not contain the character chosen as the quote mark (unlike CIF)
- "'" and """ may be used within character data
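The rule for XML names given above can be checked with a few lines of C. This is a simplified sketch restricted to ASCII: the XML specification also admits many non-ASCII letters, which are ignored here. The function names are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* ASCII-only test for a character that may start an XML name. */
static int is_name_start(unsigned char c)
{
    return isalpha(c) || c == '_' || c == ':';
}

/* ASCII-only test for a character that may continue an XML name. */
static int is_name_char(unsigned char c)
{
    return isalnum(c) || c == '_' || c == ':' || c == '-' || c == '.';
}

/* Returns 1 if s is a well-formed (ASCII) XML name, 0 otherwise. */
int is_xml_name(const char *s)
{
    size_t i;
    if (s == NULL || !is_name_start((unsigned char)s[0]))
        return 0;
    for (i = 1; s[i] != '\0'; i++)
        if (!is_name_char((unsigned char)s[i]))
            return 0;
    return 1;
}

/* Returns 1 if s is a valid name that should be treated as reserved:
   it begins with "xml" in any mix of cases, or it contains a colon. */
int is_reserved_xml_name(const char *s)
{
    if (!is_xml_name(s))
        return 0;
    if (tolower((unsigned char)s[0]) == 'x' &&
        tolower((unsigned char)s[1]) == 'm' &&
        tolower((unsigned char)s[2]) == 'l')
        return 1;
    for (; *s != '\0'; s++)
        if (*s == ':')
            return 1;
    return 0;
}
```

For instance, `atomArray` and `_x3.y-1` are accepted as names, `3d` is rejected because it starts with a digit, and `XmlThing` and `cml:atom` are accepted but flagged as reserved.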
XML Markup consists of
markup | syntax | meaning
start-tags | <name> or <name attribute=value attribute=value ...> | marks the beginning of an XML element. The attribute-value pairs are optional, and no attribute may appear more than once.
end-tags | </name> | marks the end of the XML element begun by the start-tag with a matching name.
empty-element tags | <name/> or <name attribute=value attribute=value ... /> | a special form equivalent to <name></name> or <name attribute=value attribute=value ...></name>, used when an element has no content.
entity references | &name; or %name; | refer to objects by name. The symbols "&", "<", ">", "'", and the double quote are represented by "&amp;", "&lt;", "&gt;", "&apos;" and "&quot;" respectively.
character references | &#nnn; | specifies a character with decimal Unicode value nnn.
character references | &#xhhh; | specifies a character with hexadecimal Unicode value hhh.
comments | <!-- comment --> | special markup used to include comment text.
CDATA sections | <![CDATA[ character_data ]]> | special markup used to embed text which might otherwise be interpreted as markup.
document type declarations | <?xml version="1.0"?> | optional special markup that unambiguously identifies an XML document.
document type declarations | <!DOCTYPE name ... > | optional special markup that provides information on the markup declarations that define the grammar of the document.
element type declarations | <!ELEMENT name contents> |
attribute list declarations | <!ATTLIST name elementname type default ... > |
entity declarations | <!ENTITY name entity_definition > or <!ENTITY % name parsed_entity_definition > |
notation declarations | <!NOTATION name id > |
processing instructions | <?program_name parameters ?> |
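Because "&", "<", ">", "'" and the double quote are significant in markup, character data written into an XML document has to use the five predefined entity references in their place. The following is a minimal sketch of such an escaping routine; `xml_escape` and `predefined_entity` are hypothetical helper names and do not belong to any particular library.

```c
#include <stddef.h>
#include <string.h>

/* Returns the predefined entity reference for c, or NULL if c can be
   written as it stands. */
static const char *predefined_entity(char c)
{
    switch (c) {
    case '&':  return "&amp;";
    case '<':  return "&lt;";
    case '>':  return "&gt;";
    case '\'': return "&apos;";
    case '"':  return "&quot;";
    default:   return NULL;
    }
}

/* Copies src into dst, replacing the five special characters by entity
   references. Returns 0 on success, or -1 if dst is too small; whenever
   dstsize is nonzero, dst is left NUL-terminated. */
int xml_escape(const char *src, char *dst, size_t dstsize)
{
    size_t used = 0;
    if (dstsize == 0)
        return -1;
    for (; *src != '\0'; src++) {
        const char *rep = predefined_entity(*src);
        size_t len = rep ? strlen(rep) : 1;
        if (used + len >= dstsize) {     /* no room for text plus NUL */
            dst[used] = '\0';
            return -1;
        }
        if (rep)
            memcpy(dst + used, rep, len);
        else
            dst[used] = *src;
        used += len;
    }
    dst[used] = '\0';
    return 0;
}
```

An alternative for long runs of such text is a CDATA section, which avoids escaping at the cost of having to guard against the sequence "]]>" inside the data.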
XML has been used as a framework for definition of a Chemical Markup Language (CML)
[Murray-Rust, Rzepa 99]. The program Jmol [Gezelter 99] is able to display
CML datasets.
A typical fragment of a CML dataset presents atomic coordinates by columns, as seen in this
example of methanol distributed as an example in the Jmol release:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE molecule SYSTEM "cml.dtd" []>
<molecule id="METHANOL">
<atomArray>
<stringArray builtin="id">a1 a2 a3 a4 a5 a6</stringArray>
<stringArray builtin="elementType">C O H H H H</stringArray>
<floatArray builtin="x3" units="pm">-0.748 0.558 -1.293 -1.263 -0.699 0.716
</floatArray>
<floatArray builtin="y3" units="pm">-0.015 0.420 0.202 0.754 -0.934 1.404</floatArray>
<floatArray builtin="z3" units="pm">0.024 -0.278 -0.901 0.600 0.609 0.137</floatArray>
</atomArray>
</molecule>
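Each `floatArray` in the fragment above carries one whole column of coordinates as whitespace-separated text, possibly broken across lines. A sketch of turning that character data into numbers with the standard `strtod` function follows; `parse_float_array` is a hypothetical helper, and no attempt is made to interpret the `units` attribute.

```c
#include <stdlib.h>
#include <stddef.h>

/* Parses whitespace-separated numbers from text into out[0..max-1].
   Returns the number of values stored, or -1 if a token is not a number
   or if the text holds more than max values. */
int parse_float_array(const char *text, double *out, size_t max)
{
    size_t n = 0;
    const char *p = text;

    for (;;) {
        char *end;
        double v;

        /* Skip XML whitespace: spaces, tabs and line terminators. */
        while (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r')
            p++;
        if (*p == '\0')
            break;

        v = strtod(p, &end);
        if (end == p || n == max)   /* not a number, or array is full */
            return -1;
        out[n++] = v;
        p = end;
    }
    return (int)n;
}
```

Applied to the `x3` column of the methanol example, this yields six values, one per atom id in the `stringArray` above it.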
Issues in Handling CML/XML in RasMol
- Problems
- XML is tree oriented, "naturally" column-based -- could get all X's before any Y's or Z's
- Equivalent data may be at different depths in different data sets or even at different depths in the same tree.
- There is no inherent distinction between rows and columns
- CML has many dialects, and no "enforcement" of the CML DTD
- Any existing packages used must be open source
- Some CML test datasets have conflicts between declared units and actual values.
- Resources
- Solutions Chosen
- Parse with Expat
- Reliable and easy to work with
- Supports C
- License compatible with RasMol's
- Low overhead (event-driven)
- Parse to a CIF data structure
- Disadvantages
- Must translate CML/XML tags to CIF tags
- Extra data structures -- uses time and memory
- Advantages
- Translating CML/XML tags to CIF tags resolves ambiguities
- Extra data structures -- provides structure needed to sort atomic coordinates into tables of rows
- Allows the code to be extracted as a standalone cml2cif.
- Follow Jmol example and ignore units
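The advantage noted above, sorting column-oriented coordinates into rows, can be illustrated as follows. The sketch assumes the columns have already been collected into parallel arrays (all ids, all element symbols, all x, all y, all z) and writes them out one atom per row in an mmCIF-style loop. The function names are hypothetical, and this is not the code used in RasMol.

```c
#include <stdio.h>
#include <stddef.h>

/* Formats row i of the column-oriented data into buf as
   "id element x y z". Returns the value returned by snprintf. */
int format_atom_row(char *buf, size_t bufsize,
                    const char *const *ids, const char *const *elements,
                    const double *x, const double *y, const double *z,
                    size_t i)
{
    return snprintf(buf, bufsize, "%s %s %.3f %.3f %.3f",
                    ids[i], elements[i], x[i], y[i], z[i]);
}

/* Writes n atoms as an mmCIF-style loop: tag names first, then one row
   per atom. Coordinates are copied unchanged; units are ignored. */
void write_atom_site_loop(FILE *out, size_t n,
                          const char *const *ids,
                          const char *const *elements,
                          const double *x, const double *y, const double *z)
{
    char row[128];
    size_t i;

    fputs("loop_\n"
          "_atom_site.label_atom_id\n"
          "_atom_site.type_symbol\n"
          "_atom_site.Cartn_x\n"
          "_atom_site.Cartn_y\n"
          "_atom_site.Cartn_z\n", out);
    for (i = 0; i < n; i++) {
        format_atom_row(row, sizeof row, ids, elements, x, y, z, i);
        fprintf(out, "%s\n", row);
    }
}
```

Because the XML parser may deliver every x value before any y or z value, rows can only be written once all columns are complete, which is the reason for holding the data in an intermediate structure.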
Data used to translate from CML to mmCIF
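typedef struct _cmlitem {
    char* tag;        /* CML element name */
    char* atttag;     /* first attribute name, or NULL */
    char* valtag;     /* first attribute value, or NULL */
    char* atttag2;    /* second attribute name, or NULL */
    char* valtag2;    /* second attribute value, or NULL */
    char* ciftag;     /* corresponding CIF tag or category */
    char* ciftag2;    /* second CIF tag, or NULL */
    char* ciftag3;    /* third CIF tag, or NULL */
    cifitem ciftype;  /* kind of CIF item: tag or category */
} cmlitem;
#define CMLlist 200
#define NULL (char *)0
static cmlitem CMLitems[CMLlist] = {
    { "atom", "id", "%", NULL, NULL,
      "_atom_site.label_atom_id", NULL, NULL, tag },
    { "atomArray", NULL, NULL, NULL, NULL,
      "atom_site", NULL, NULL, category },
    { "coordinate3", "builtin", "xyz3", NULL, NULL,
      "_atom_site.Cartn_x",
      "_atom_site.Cartn_y",
      "_atom_site.Cartn_z", tag },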
CML File rendered by Jmol and RasMol
Example of CML/XML data set of Glycine Crystal from Jmol release kit, rendered on the left by Jmol and on the
right by RasMol
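A translation table of the kind shown earlier under "Data used to translate from CML to mmCIF" could be consulted with a simple linear search, sketched below. The `CmlRule` struct is a reduced, hypothetical stand-in for `cmlitem` that carries only one attribute and one CIF tag per entry, and treating "%" as "match any attribute value" is an assumption about the table's convention.

```c
#include <stddef.h>
#include <string.h>

typedef enum { CIF_TAG, CIF_CATEGORY } CifKind;

/* Reduced, hypothetical version of a CML-to-CIF translation entry. */
typedef struct {
    const char *element;    /* CML element name */
    const char *attribute;  /* attribute to test, or NULL for none */
    const char *value;      /* required value; "%" assumed to match any */
    const char *ciftag;     /* CIF tag or category name */
    CifKind kind;
} CmlRule;

static const CmlRule rules[] = {
    { "atom",        "id",      "%",    "_atom_site.label_atom_id", CIF_TAG },
    { "atomArray",   NULL,      NULL,   "atom_site",                CIF_CATEGORY },
    { "coordinate3", "builtin", "xyz3", "_atom_site.Cartn_x",       CIF_TAG },
};

/* Returns the first rule matching the element and, where the rule asks
   for one, the attribute name and value; returns NULL if none matches. */
const CmlRule *find_rule(const char *element, const char *attribute,
                         const char *value)
{
    size_t i;
    for (i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        const CmlRule *r = &rules[i];
        if (strcmp(r->element, element) != 0)
            continue;
        if (r->attribute == NULL)
            return r;                    /* rule ignores attributes */
        if (attribute == NULL || strcmp(r->attribute, attribute) != 0)
            continue;
        if (strcmp(r->value, "%") == 0)
            return r;                    /* wildcard value (assumed) */
        if (value != NULL && strcmp(r->value, value) == 0)
            return r;
    }
    return NULL;
}
```

In an event-driven parser, a lookup like this would typically run in the start-tag handler, once per attribute-value pair, so that the character data that follows can be stored under the CIF tag found.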
References
- [Bernstein et al. 98] Bernstein, H. J., Bernstein, F. C., Bourne, P. E.,
"pdb2cif: Translating PDB Entries into mmCIF Format," J. Appl. Cryst.,
31, pp. 282-295, 1998. Software available from http://www.iucr.org/iucr-top/CIF and
http://ndbserver.rutgers.edu
- [Bray, Paoli, Sperberg-McQueen 98] Bray, T., Paoli, J., Sperberg-McQueen, C. M., eds.,
"Extensible Markup Language (XML)", W3C Recommendation 10-Feb-98, REC-xml-19980210,
http://www.w3c.org/TR/1998/REC-xml-19980210
- [Connolly 96] M. L. Connolly "Molecular Surfaces: A Review", 1996,
http://www.netsci.org/Science/Compchem/feature14.html.
- [Gezelter 99] Gezelter, D., "Jmol" an open source
Java program. See http://www.openscience.org/jmol.
- [Lee, Richards 71] B. Lee, F. M. Richards "The interpretation of
protein structures: Estimation of static accessibility." J. Mol. Biol.
55: 379-400 (1971).
- [Murray-Rust, Rzepa 99] Murray-Rust, P., Rzepa, H., "Chemical markup, XML
and the WWW, Part I: Basic principles," J. Chem. Inf. Comp. Sci., 39, No. 6,
928-942 (1999). See http://www.xml-cml.org.