IUPAC SMILES+ Contributors: Vincent F. Scalfani (Chair), Evan Bolton, Helen Cooke, Chris Grulke, John Irwin, Oliver Koepler, Gregory Landrum, José L. Medina-Franco, Miguel Quirós Olozábal, Susan Richardson, and Issaku Yamada.
v0.1,2019-04-15: Working Draft
IUPAC SMILES+ Project No. 2019-002-024
Copyright © 2020, IUPAC
Content is available under GNU Free Documentation License 1.2
This IUPAC SMILES+ Specification Appendix [working draft] document is a modified derivative of the OpenSMILES Specification. We have endeavored to maintain all prior author names, contributor names, copyright notices, and revision history.
OpenSMILES Specification
Craig A. James
v1.0,2016-05-15: Current specification
Copyright © 2007-2016, Craig A. James
Content is available under GNU Free Documentation License 1.2
OpenSMILES Contributors: Richard Apodaca, Noel O’Boyle, Andrew Dalke, John van Drie, Peter Ertl, Geoff Hutchison, Craig A. James, Greg Landrum, Chris Morley, Egon Willighagen, Hans De Winter, Tim Vandermeersch, John May
Daylight proposed, and OpenEye actually implemented, an extension that
specifies bonds to external R-groups. An external R-group is specified
using an ampersand '&' followed by a ring-closure specification (either a
digit, or '%' and two digits). However, unlike ring-closures, the bond is to
an external, unspecified R-group. Example: n1c(&1)c(&2)cccc1 is a
2,3-substituted pyridine.
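The labelling rule can be sketched in a few lines of Python. This is only an illustration, not part of any toolkit: the function name `external_rgroup_labels` and the regular expression are assumptions, and the sketch merely locates '&' labels without validating the rest of the SMILES string.

```python
import re

# '&' followed by a ring-closure style label: one digit, or '%' and two digits.
_RGROUP = re.compile(r"&(?:%(\d\d)|(\d))")

def external_rgroup_labels(smiles):
    """Return the external R-group labels in the order they are written."""
    return [int(two or one) for two, one in _RGROUP.findall(smiles)]

print(external_rgroup_labels("n1c(&1)c(&2)cccc1"))  # [1, 2]
```

The plain ring-closure digits (the two '1' characters that close the pyridine ring) are ignored because they are not preceded by '&'.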
Daylight (Weininger) proposed, but never implemented, an extension for crystals and
polymers. This proposal also uses the ampersand '&' character (which may
conflict with the R-group proposal, above), but with the added rule that
if a number appears more than once, it creates a repeating unit.
SMILES | Name |
---|---|
 | polystyrene |
 | diamond |
 | graphite |
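To make the repeat rule concrete, the sketch below counts how often each '&' label occurs and reports the labels that occur more than once, which under this proposal would mark a repeating unit. The function name `repeating_labels` and the input string in the usage line are hypothetical and are not taken from the table above. The sketch does not try to resolve the possible conflict with the R-group proposal.

```python
import re
from collections import Counter

# '&' followed by one digit, or by '%' and two digits.
_LABEL = re.compile(r"&(?:%(\d\d)|(\d))")

def repeating_labels(smiles):
    """Return, in ascending order, the '&' labels written more than once."""
    counts = Counter(int(two or one) for two, one in _LABEL.findall(smiles))
    return sorted(label for label, n in counts.items() if n > 1)

# Hypothetical input: label 1 is written twice, label 2 only once.
print(repeating_labels("C&1C&1&2"))  # [1]
```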
The directional '/' and '\' marks for cis/trans bonds seem simple on
the surface but are problematic for complex systems. The issue is that
in conjugated systems one directional bond can be used in defining the
configuration of two double bonds. When assigning the directional bonds, the
existing labels must be considered or rewritten. In a long series of
conjugated double bonds, changing the configuration of one bond can require
rewriting dozens of bond symbols.
More importantly, there is a theoretical flaw with the use of '/' and '\'.
It is possible to write valid SMILES for the molecule cyclooctatetraene by
alternating directional assignments for the cis configurations. However, as
shown below, attempting to change one configuration is not possible.
Reassigning the directional labels for adjacent double bonds will not work,
because the reassignment propagates around the ring and the conflict is not
resolved.
Adding directional labels on bonds to explicit hydrogen atoms is a possible resolution, but it does not follow standard form and complicates the assignment procedure.
Depiction | SMILES | Comment |
---|---|---|
 | | cyclooctatetraene |
Todo | | one bond changes two configurations |
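The ring conflict can be illustrated with a small brute-force check. The model below is a simplifying assumption, not the specification's algorithm: each of the four ring single bonds carries exactly one mark ('/' or '\', modelled as a boolean), and each double bond requires its two flanking marks to be either different or equal. How cis and trans map onto "different" and "equal" depends on the writing order, so the sketch works with the abstract requirement only. All names are hypothetical.

```python
from itertools import product

def marks_exist(must_differ):
    """Search for a consistent set of directional marks around a ring.

    must_differ[i] is True when double bond i needs different marks on its
    two flanking single bonds, and False when it needs equal marks.
    Single bond i sits between double bond i - 1 and double bond i.
    """
    n = len(must_differ)
    for marks in product((False, True), repeat=n):
        if all((marks[i] != marks[(i + 1) % n]) == must_differ[i]
               for i in range(n)):
            return True
    return False

uniform = [True, True, True, True]        # all four double bonds alike
one_changed = [False, True, True, True]   # one configuration flipped

print(marks_exist(uniform))      # True
print(marks_exist(one_changed))  # False
```

Going once around the ring, the marks must return to their starting value, so the number of "different" requirements has to be even. Flipping a single configuration makes it odd, which is why no relabelling of the neighbouring bonds can succeed in this model.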
The proposed syntax for double bond configurations uses the '@' and '@@'
atom-based specification. For example:
Depiction | SMILES | Name |
---|---|---|
 | | trans-difluoroethene |
 | | cis-difluoroethene |
Interpretation of '@' and '@@' follows the tetrahedral convention:
The atoms, as encountered in the SMILES string, are either in anticlockwise
'@' or clockwise '@@' order as viewed on the page. Since cis/trans
configurations are planar, they can also be "viewed from underneath the
page", which results in the two valid SMILES shown for each compound,
above.
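As a rough illustration of this convention, the sketch below takes the 2D positions of the three neighbours of a double-bond atom, listed in the order they are encountered in the SMILES string, and reports whether they run anticlockwise ('@') or clockwise ('@@'). It assumes page coordinates with the y axis pointing up. The function names and the sample coordinates are hypothetical.

```python
def clock_symbol(neighbours):
    """Return '@' for anticlockwise or '@@' for clockwise neighbour order.

    neighbours: (x, y) positions in SMILES order, with y pointing up.
    """
    twice_area = 0.0
    n = len(neighbours)
    for i in range(n):
        x1, y1 = neighbours[i]
        x2, y2 = neighbours[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1   # shoelace formula term
    if twice_area == 0:
        raise ValueError("neighbours are collinear; order is undefined")
    return "@" if twice_area > 0 else "@@"

def viewed_from_underneath(neighbours):
    """Mirror the drawing left to right, as if looking through the page."""
    return [(-x, y) for x, y in neighbours]

around_atom = [(1.0, 0.0), (-0.5, 0.87), (-0.5, -0.87)]
print(clock_symbol(around_atom))                          # @
print(clock_symbol(viewed_from_underneath(around_atom)))  # @@
```

Viewing the same planar arrangement from the other side swaps the symbol, which is why each compound admits two equivalent atom-based spellings.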
As with the other atom-based specifications, one must consider the relative
position of implicit atoms. It is not always true that a trans form has
opposite "clock-ness" ('@','@@' or '@@','@'), and the cis form
has the same "clock-ness" ('@','@' or '@@','@@').
Depiction | SMILES | Name |
---|---|---|
 | | trans-difluoroethene |
 | | cis-difluoroethene |
Atom-based '@' and '@@' for the stereo-specification of double bonds does not
suffer from the theoretical flaw illustrated with cyclooctatetraene. The assignments
are not shared, and adjacent configurations do not need to be considered. This is more
flexible and simplifies generation of canonical SMILES.
Depiction | SMILES | Name |
---|---|---|
 | | cyclooctatetraene |
Note that the first stereo-specification carbon must be represented as '@'
since the '1' follows the H, whereas the rest of the carbons use '@@' to
characterize the cis configuration of each bond. Since this is a specification
on the atom, rather than the single bond, no conflict arises at the
ring-closure bond.
This section needs considerable work. The following text is courtesy Chris Morley, who commented: "I guess the last paragraph doesn’t look too good in a formal specification. There are two reasons for the frailty: lack of proof that the radical and aromatic uses can always be unambiguous (I doubt anybody has tried); and a known deficiency in the parser." However, it is a good starting point…
A single lowercase symbol is interpreted as a radical center. CCc is an
alternative to CC[CH2] and is the 1-propyl radical; CcC or C[CH]C is the
2-propyl radical; Co is the methoxy radical. An odd number of adjacent
lowercase symbols is a delocalized conjugated radical. So Cccccc is
CC=CC=C[CH2] or CC=C[CH]C=C or C[CH]C=CC=C.
Lowercase 'c' or 'n' can be used in a ring: C1cCCCC1 is the cyclohexyl
radical.
The use of the non-aromatic lowercase symbol is a shortened form with improved intelligibility that allows the use of implicit hydrogen in radicals. However, it is intended only for simple, unambiguous molecules and is not reliable when combined with aromatic atoms.
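The single-symbol case can be sketched as a rewrite from the lowercase shorthand to the bracket form. This is a minimal illustration under stated assumptions, not a real parser: it handles only single-letter C, N and O atoms, single bonds, branches and single-digit ring closures; it assumes default valences of 4, 3 and 2 for carbon, nitrogen and oxygen; and it rejects adjacent lowercase symbols (delocalized radicals and aromatic rings). The function name `expand_radicals` is hypothetical.

```python
# Assumed default valences for the lowercase symbols handled here.
VALENCE = {"c": 4, "n": 3, "o": 2}
SUPPORTED = set("CNOcno")

def expand_radicals(smiles):
    """Rewrite isolated lowercase radical centres as bracket atoms."""
    neighbours = {}     # atom position in the string -> bonded positions
    prev = None         # atom that the next atom will bond to
    branch_stack = []   # atoms to return to when a branch closes
    open_rings = {}     # ring-closure digit -> atom that opened it

    def bond(a, b):
        neighbours[a].append(b)
        neighbours[b].append(a)

    for i, ch in enumerate(smiles):
        if ch in SUPPORTED:
            neighbours[i] = []
            if prev is not None:
                bond(prev, i)
            prev = i
        elif ch == "(":
            branch_stack.append(prev)
        elif ch == ")":
            prev = branch_stack.pop()
        elif ch.isdigit():
            if ch in open_rings:
                bond(open_rings.pop(ch), prev)
            else:
                open_rings[ch] = prev
        else:
            raise ValueError(f"unsupported character {ch!r}")

    out = []
    for i, ch in enumerate(smiles):
        if ch not in VALENCE:
            out.append(ch)
            continue
        if any(smiles[j].islower() for j in neighbours[i]):
            raise ValueError("adjacent lowercase symbols are out of scope")
        # One valence is taken by the unpaired electron.
        hydrogens = VALENCE[ch] - len(neighbours[i]) - 1
        if hydrogens < 0:
            raise ValueError("too many bonds for a radical centre")
        h = "" if hydrogens == 0 else "H" if hydrogens == 1 else f"H{hydrogens}"
        out.append(f"[{ch.upper()}{h}]")
    return "".join(out)

for s in ("CCc", "CcC", "Co", "C1cCCCC1"):
    print(s, "->", expand_radicals(s))
```

The sketch deliberately refuses the delocalized case (for example Cccccc), since several equivalent bracket forms exist and choosing among them is exactly the ambiguity discussed above.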
An interesting extension that specifies conformational information via bond dihedral angles and bond lengths was proposed by McLeod and Peters:
Revision | Date | Description | Name |
---|---|---|---|
1.0 | 2020-09-24 | Transfer proposed extensions to this appendix | Vincent F. Scalfani |