Pharmacoinformatics Research Group
Department of Pharmaceutical Chemistry
Topliss batchwise scheme reviewed in the era of Open Data
Lars Richter, Gerhard F. Ecker
Dept. of Pharmaceutical Chemistry
pharminfo.univie.ac.at
Topliss batchwise scheme
Topliss ranking schemes (1)
subst. | π | σ | -σ | π+σ | Es
3,4-Cl2 | 1 | 1 | 5 | 1 | 2-5
4-Cl | 2 | 2 | 4 | 2 | 2-5
4-CH3 | 3 | 4 | 2 | 3 | 2-5
4-OCH3 | 4-5 | 5 | 1 | 5 | 2-5
H | 4-5 | 3 | 3 | 4 | 1

Topliss substituent proposals (1)
scheme | new substituent selection
π | 3-CF3,4-Cl; 3-CF3,4-NO2; 4-CF3; 2,4-Cl2; 4-c-C5H9; 4-c-C6H11; 4-CH(CH3)2; 4-C(CH3)3; 3,4-(CH3)2; 4-O(CH2)3CH3; 4-OCH2Ph; 4-N(C2H5)2
σ | 3-CF3,4-Cl; 3-CF3,4-NO2; 4-CF3; 2,4-Cl2; 4-c-C5H9; 4-c-C6H11
-σ | 4-N(C2H5)2; 4-N(CH3)2; 4-NH2; 4-NHC4H9; 4-OH; 4-OCH(CH3)2; 3-CH3,4-OCH3
π+σ | 3-CF3,4-Cl; 3-CF3,4-NO2; 4-CF3; 2,4-Cl2; 4-c-C5H9; 4-c-C6H11

(1) Topliss et al. J Med Chem 1977
Topliss batchwise scheme
Series of five phenyl-substituted propafenone
derivatives measured against P-Glycoprotein
substitution | EC50 (µM) | rank
3,4-Cl2 | 0.150 | 5
4-Cl | 0.132 | 4
4-CH3 | 0.063 | 2
4-OCH3 | 0.045 | 1
H | 0.079 | 3
Which compound should be synthesized next?
Topliss batchwise scheme
Propafenone dataset compared with the Topliss ranking schemes
substituent | propafenone rank | -σ scheme rank
3,4-Cl2 | 5 | 5
4-Cl | 4 | 4
4-CH3 | 2 | 2
4-OCH3 | 1 | 1
H | 3 | 3
The propafenone ranking matches the -σ scheme. Topliss new substituent selection for -σ: 4-N(C2H5)2; 4-N(CH3)2; 4-NH2; 4-NHC4H9; 4-OH; 4-OCH(CH3)2; 3-CH3,4-OCH3
Topliss batchwise scheme
Propafenone dataset (-σ pattern)
• The 4-N(CH3)2 derivative proposed by the -σ scheme was synthesized and tested
• No affinity increase was observed
How often do Topliss schemes
(π, σ, -σ, π+σ, Es) occur in large databases?
How useful do Topliss schemes prove in
activity optimization?
www.openphacts.org
How often do Topliss patterns occur?
1. Retrieve the 3,4-dichloro-substituted compounds from ChEMBL 20 (PostgreSQL) using the RDKit cartridge: 9312 cpds (540 x)
2a. For each 3,4-Cl2 compound, check the availability of the 4-Cl, 4-OCH3, 4-CH3 and H analogues: 200 series
3. Check each compound series for bioactivity data (pChEMBL) measured
- on the same target in the same assay (SQL query)
- with activity type = IC50 or Ki
- plus, if available, activity data for the new substituent selection: 1108 bioactivity data for additional substituents
[Figure: example series with activities of 3 nM, 5 nM, 8 nM, 9 nM and 10 nM assigned ranks 1 to 5]
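The series-building steps above can be sketched in Python, the language used for data processing in this workflow. This is a minimal illustration, not the authors' script: it assumes the ChEMBL activity records have already been fetched into dictionaries, and the keys (`series`, `tid`, `assay_id`, `substituent`, `standard_type`, `pchembl_value`) as well as the function name `build_series` are hypothetical, loosely modeled on ChEMBL column names. Rank 1 is assigned to the most potent analogue (highest pChEMBL).

```python
from collections import defaultdict

# The five substituents of the Topliss batch, in the order used for patterns.
TOPLISS_SET = ("3,4-Cl2", "4-Cl", "4-CH3", "4-OCH3", "H")
ALLOWED_TYPES = {"IC50", "Ki"}


def build_series(records):
    """Group records by (series, target, assay) and rank complete Topliss sets.

    Each record is a dict with the hypothetical keys 'series', 'tid',
    'assay_id', 'substituent', 'standard_type' and 'pchembl_value'.
    Returns {(series, tid, assay_id): rank tuple in TOPLISS_SET order}.
    """
    groups = defaultdict(dict)
    for rec in records:
        if rec["standard_type"] not in ALLOWED_TYPES:
            continue  # keep only IC50 or Ki measurements
        if rec["pchembl_value"] is None:
            continue  # pChEMBL may be missing (NULL) for some records
        key = (rec["series"], rec["tid"], rec["assay_id"])
        # Later duplicates overwrite earlier ones (a real pipeline might average).
        groups[key][rec["substituent"]] = rec["pchembl_value"]

    result = {}
    for key, values in groups.items():
        if not all(s in values for s in TOPLISS_SET):
            continue  # all five analogues must be measured in the same assay
        # Higher pChEMBL = more potent = rank 1; ties keep TOPLISS_SET order.
        order = sorted(TOPLISS_SET, key=lambda s: -values[s])
        rank = {s: i + 1 for i, s in enumerate(order)}
        result[key] = tuple(rank[s] for s in TOPLISS_SET)
    return result
```

If the example activities of 3, 5, 8, 9 and 10 nM (pChEMBL about 8.52, 8.30, 8.10, 8.05 and 8.00) belonged to 3,4-Cl2, 4-Cl, 4-CH3, 4-OCH3 and H respectively, the sketch would return the ranking (1, 2, 3, 4, 5).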
[Figure: raw data output after mining ChEMBL: 200 series with ranks 1 to 5 (example activities 3, 5, 8, 9 and 10 nM); 1108 bioactivity data for additional substituents from the new substitution selection]
How often do Topliss patterns occur?
subst. | π | σ | -σ | π+σ | Es
3,4-Cl2 | 1 | 1 | 5 | 1 | 2-5
4-Cl | 2 | 2 | 4 | 2 | 2-5
4-CH3 | 3 | 4 | 2 | 3 | 2-5
4-OCH3 | 4-5 | 5 | 1 | 5 | 2-5
H | 4-5 | 3 | 3 | 4 | 1
# of series | 13 | 7 | 3 | 2 | 34
57 of 200 series (29%) extracted from ChEMBL 20
follow a Topliss pattern
[Figure: distribution of the 200 series over the patterns π, σ, -σ, π+σ, Es and others]
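Matching a ranking against the table of Topliss schemes has to handle ambiguous entries such as "4-5" and "2-5". A minimal sketch, assuming each entry is read as an allowed rank interval (the authors' exact matching rules are not stated); `SCHEMES` and `matching_schemes` are illustrative names. Under this reading the π+σ ranking (1, 2, 3, 5, 4) also satisfies π, and every ranking with H in first place (24 of the 120 possible rankings) satisfies Es. Such overlaps could be one reason why the per-scheme counts (13 + 7 + 3 + 2 + 34 = 59) exceed the 57 matching series, although the slides do not say how overlaps were handled.

```python
# Allowed rank intervals per scheme, in the order 3,4-Cl2, 4-Cl, 4-CH3, 4-OCH3, H
# (taken from the Topliss ranking table; "4-5" becomes (4, 5), "2-5" becomes (2, 5)).
SCHEMES = {
    "π": [(1, 1), (2, 2), (3, 3), (4, 5), (4, 5)],
    "σ": [(1, 1), (2, 2), (4, 4), (5, 5), (3, 3)],
    "-σ": [(5, 5), (4, 4), (2, 2), (1, 1), (3, 3)],
    "π+σ": [(1, 1), (2, 2), (3, 3), (5, 5), (4, 4)],
    "Es": [(2, 5), (2, 5), (2, 5), (2, 5), (1, 1)],
}


def matching_schemes(ranking):
    """Return the names of all schemes whose rank intervals contain `ranking`."""
    return [
        name
        for name, intervals in SCHEMES.items()
        if all(lo <= r <= hi for r, (lo, hi) in zip(ranking, intervals))
    ]


print(matching_schemes((5, 4, 2, 1, 3)))  # propafenone dataset
print(matching_schemes((5, 4, 1, 3, 2)))  # aryloxy dataset
```

The propafenone ranking maps to `['-σ']` only, while the aryloxy ranking matches no scheme, in line with the slides.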
How useful do Topliss schemes prove in activity optimization?
Topliss pattern | # of series | substituent selection [1] | more active [2] | percentage
π | 13 | 29 | 9 | 31 %
σ | 7 | 9 | 1 | 11 %
-σ | 3 | 5 | 1 | 20 %
π+σ | 2 | 2 | 1 | 50 %
[1] For each series, bioactivity data for the substituents proposed by the Topliss new substituent selection were collected from ChEMBL 20, if available.
[2] Check whether the proposed substituents lead to more active compounds.
The Topliss approach seems to have difficulties in activity optimization for the ChEMBL series that follow the σ scheme (only 1 of 9 proposals, 11 %, was more active).
The poor performance of -σ (20 %) is in agreement with the propafenone data.
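The percentage column is "more active" divided by "substituent selection" (for π: 9 of 29 data points, about 31 %). A minimal sketch of how such counts might be derived per series; it assumes, since footnote [2] does not define the reference compound, that a proposed substituent counts as more active when its pChEMBL exceeds the best of the five Topliss analogues. The dictionary layout and the function names are hypothetical.

```python
def improvement_counts(series_list):
    """Count proposed-substituent data points and how many beat the parent set.

    Each series is a dict with hypothetical keys:
      'base'     -> {substituent: pChEMBL} for the five Topliss analogues
      'proposed' -> {substituent: pChEMBL} for the new substituent selection
    """
    tested = improved = 0
    for series in series_list:
        best_base = max(series["base"].values())
        for value in series["proposed"].values():
            tested += 1
            if value > best_base:  # assumed criterion: beats the best parent
                improved += 1
    return tested, improved


def percentage(improved, tested):
    """Share of proposals that led to a more active compound, in percent."""
    return round(100 * improved / tested) if tested else 0


# Reproduces the percentage column from the counts in the table.
for name, tested, improved in [("π", 29, 9), ("σ", 9, 1), ("-σ", 5, 1), ("π+σ", 2, 1)]:
    print(name, percentage(improved, tested), "%")
```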
How useful do Topliss schemes prove in activity optimization?
The Topliss proposal for the propafenone dataset, 4-N(CH3)2, did not show an activity gain.
Are there -σ series in ChEMBL with bioactivity data for the 4-N(CH3)2 substitution?
target | type | 4-OCH3 (nM) | 4-N(CH3)2 (nM)
P-Glycoprotein | EC50 | 45 | 82
Alpha-1a adrenergic receptor (ChEMBL) | Ki | 0.3 | 0.8
µ-opioid receptor (ChEMBL) | Ki | 0.50 | 63
In the two ChEMBL cases as well, the -σ proposal 4-N(CH3)2 failed to increase activity.
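Expressed on a log scale, the activity losses in the table above differ considerably in size. A small worked sketch using the table's values; `delta_p` is an illustrative helper, and the sign convention (negative = 4-N(CH3)2 weaker than 4-OCH3) is our choice for illustration.

```python
import math

# (target, 4-OCH3 value in nM, 4-N(CH3)2 value in nM) from the table above
PAIRS = [
    ("P-Glycoprotein (EC50)", 45, 82),
    ("Alpha-1a adrenergic receptor (Ki)", 0.3, 0.8),
    ("µ-opioid receptor (Ki)", 0.50, 63),
]


def delta_p(ref_nm, new_nm):
    """pActivity(new) - pActivity(ref); negative means the new compound is weaker."""
    return math.log10(ref_nm / new_nm)


for target, ome, nme2 in PAIRS:
    print(f"{target}: delta pActivity = {delta_p(ome, nme2):+.2f}")
```

All three differences are negative: about -0.26 (P-Glycoprotein), -0.43 (Alpha-1a) and -2.10 log units (µ-opioid).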
Topliss batchwise scheme
Propafenone aryloxy dataset (non-Topliss pattern)
substitution | EC50 | rank
3,4-Cl2 | 0.522 | 5
4-Cl | 0.190 | 4
4-CH3 | 0.063 | 1
4-OCH3 | 0.180 | 3
H | 0.079 | 2
The ranking pattern 5 4 1 3 2 in this dataset cannot be assigned to an existing Topliss scheme.
How often does the pattern 5 4 1 3 2 occur in ChEMBL?
In general, which other, non-Topliss patterns occur frequently in ChEMBL?
Which non-Topliss patterns occur in ChEMBL?
subst. | new1 | new2 | new3 | aryloxy
3,4-Cl2 | 1 | 5 | 5 | 5
4-Cl | 2 | 2 | 3 | 4
4-CH3 | 4 | 4 | 1 | 1
4-OCH3 | 3 | 1 | 4 | 3
H | 5 | 3 | 2 | 2
# series | 6 | 4 | 4 | 0
The pattern found in the aryloxy dataset does not occur in ChEMBL. However, it is highly similar to new3.
[Figure: distribution of the 200 series over the patterns π, σ, -σ, π+σ, Es, new1, new2 and new3]
Do we find an underlying physicochemical driving force in the new3 pattern?
Can we extrapolate to the aryloxy dataset?
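One way to quantify the "high similarity" between the aryloxy pattern and new3 is Spearman's rank correlation between the two rankings. A minimal sketch, assuming tie-free rankings so that the closed form ρ = 1 - 6Σd²/(n(n² - 1)) applies; Topliss schemes with ambiguous ranks (π, Es) are left out, and the names are illustrative.

```python
def spearman_rho(a, b):
    """Spearman's rho for two tie-free rankings of the same items."""
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Rankings in the order 3,4-Cl2, 4-Cl, 4-CH3, 4-OCH3, H
PATTERNS = {
    "σ": (1, 2, 4, 5, 3),
    "-σ": (5, 4, 2, 1, 3),
    "π+σ": (1, 2, 3, 5, 4),
    "new1": (1, 2, 4, 3, 5),
    "new2": (5, 2, 4, 1, 3),
    "new3": (5, 3, 1, 4, 2),
}
ARYLOXY = (5, 4, 1, 3, 2)

scores = {name: spearman_rho(ARYLOXY, p) for name, p in PATTERNS.items()}
best = max(scores, key=scores.get)
print(best, round(scores[best], 2))  # prints: new3 0.9
```

The aryloxy and new3 rankings differ only by swapping the ranks of 4-Cl and 4-OCH3, which gives ρ = 0.9; the closest Topliss scheme in this comparison, -σ, reaches 0.7.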
Correlation analysis within new3 series
target name | pattern | # of cpds in series [1] | r (π) | r (σ) | r (vdw_area)
Prostanoid EP1 receptor | 53142 | 5+8 | | | -0.81**
Adenosine A3 receptor | 53142 | 5+8 | | | -0.54*
Adenosine A3 receptor | 53142 | 5+8 | | | -0.67**
Chymase | 53142 | 5+13 | | | -0.49**
P-Glycoprotein | 54132 | 5 | | |
[1] In addition to the 5 data points for 3,4-Cl2, 4-Cl, 4-OCH3, 4-CH3 and H, bioactivity data for other substituents listed in Topliss et al. 1977 were selected for the correlation analysis.
** p < 0.05, * p < 0.10
Correlation analyses were undertaken to calculate the Pearson correlation coefficient (r) between the physicochemical features π, σ and vdw_area and the respective bioactivity data.
Statistically significant negative vdw_area correlations indicate that the new3 pattern and the aryloxy series bind to a tight pocket.
Discover the ranking globe
How to look at the ranking space globally?
There are 120 (5!) possible rankings (patterns): (1,2,3,4,5), (2,1,3,4,5), (1,3,2,4,5), … (5,4,3,2,1)
Calculation of the Spearman rank correlation distance matrix for the 120 possibilities (R function corDist)
Spherical MDS to represent the distance matrix on the surface of a sphere (R function smacofSphere), Kruskal stress = 0.15
Each point represents a pattern (e.g. 1,2,3,4,5); similar patterns lie close to each other
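The enumeration and distance steps were done in R (get120Possibilities, corDist). The following Python re-sketch is for illustration only and assumes the distance is 1 - ρ (Spearman), which ranges from 0 for identical rankings to 2 for reversed ones; whether corDist uses exactly this form is not stated on the slide. Such a matrix could then be written to CSV and passed to smacofSphere in R.

```python
from itertools import permutations


def spearman_distance(a, b):
    """1 - Spearman's rho for two tie-free rankings (0 = identical, 2 = reversed)."""
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return 6 * d2 / (n * (n * n - 1))


# All 5! = 120 possible rankings of the five Topliss substituents.
RANKINGS = list(permutations(range(1, 6)))

# Symmetric 120 x 120 distance matrix; entry [i][j] compares RANKINGS[i] and RANKINGS[j].
DIST = [[spearman_distance(a, b) for b in RANKINGS] for a in RANKINGS]

print(len(RANKINGS), "rankings")
```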
Frequency contour map
Color coding based on the frequency of patterns: red = high frequency, blue = low frequency.
[Figure: frequency contour map of the ranking globe with the regions "steric island", "π and σ continent" and "trench"; the -σ, *Es and aryloxy patterns are marked]
Map analysis
-σ
• Only three -σ series in ChEMBL
• In the investigated cases, poor predictability of the -σ scheme
aryloxy
• surrounded by Es patterns
• lies in an area with negative vdw_area correlation
Only Topliss patterns (π, σ, π+σ, Es) and ranking patterns with four or more series (new1, new2, new3) are shown.
Van der Waals contour map
Color coding based on vdw_area correlations with bioactivity.
Only series with activity data for five additional derivatives (e.g. 4-CF3, 4-OH ...) are used in the correlation analysis (n >= 10). Resulting correlations with p > 0.1 were omitted.
The remaining coefficients were used for color coding.
Red ... positive correlation
Blue ... negative correlation
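The selection rules for the Van der Waals map (n >= 10, correlations with p > 0.1 omitted, color by sign) amount to a small filter. A sketch assuming per-series correlation results are available as (series, r, p, n) tuples; this layout, the function name `color_code` and all values except those of Series 8 are hypothetical.

```python
def color_code(results, min_n=10, max_p=0.1):
    """Map each qualifying series to 'red' (positive r) or 'blue' (negative r).

    results: iterable of (series_id, r, p, n) tuples (hypothetical layout).
    Series with fewer than `min_n` compounds or with p > `max_p` are omitted.
    """
    colors = {}
    for series_id, r, p, n in results:
        if n < min_n or p > max_p:
            continue
        colors[series_id] = "red" if r > 0 else "blue"
    return colors


example = [
    ("Series_8", -0.70, 0.03, 10),  # values from the Series 8 example
    ("A", 0.55, 0.08, 12),          # hypothetical
    ("B", 0.90, 0.01, 6),           # hypothetical: too few compounds
    ("C", -0.20, 0.40, 15),         # hypothetical: not significant
]
print(color_code(example))
```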
Summary & Outlook
• Open medicinal chemistry data such as those in ChEMBL allow the analysis of complex SAR patterns
• Connecting these data with data on pathways and diseases, as implemented in the Open PHACTS Discovery Platform, will open up completely new possibilities for linking chemical SAR patterns to biological endpoints
• Data quality (assays) is key for the analysis
Next steps
• Look for X-ray structures of complexes
• Analyse with respect to target classes
SQL query: get all 3,4-Cl2 compounds
[Figure: workflow from ChEMBL 20 in PostgreSQL (> 13 000 000 activities), queried via the RDKit cartridge (RDKit Chemoinformatics toolkit 2014.03, SMILES), to data processing in Python, yielding 200 series]
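The substructure search in step 1 can be expressed with the RDKit PostgreSQL cartridge's substructure operator `@>` and its `qmol` query type. A sketch only: it assumes a table `rdk.mols` with columns `molregno` and `m`, as built from ChEMBL in the RDKit cartridge documentation, and the SMARTS pattern is our illustrative choice for a 3,4-dichlorophenyl group, not the query used in the study. The resulting string can be executed with any PostgreSQL driver.

```python
# Illustrative SMARTS: ring carbon bearing a non-hydrogen substituent, with Cl at
# positions 3 and 4 relative to it (an assumption, not the authors' pattern).
DICHLORO_SMARTS = "[!#1]c1ccc(Cl)c(Cl)c1"


def dichloro_query(table="rdk.mols"):
    """Build the substructure query; `table` assumes the RDKit cartridge schema."""
    return (
        f"SELECT molregno FROM {table} "
        f"WHERE m @> '{DICHLORO_SMARTS}'::qmol;"
    )


print(dichloro_query())
```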
-> 120 ranking possibilities are created
-> Spearman ranking distance matrix is calculated
-> Spherical MDS is undertaken
-> X,Y,Z coordinates are exported as a CSV file
Coordinates.csv
Python data preprocessing
Spherical MDS in R software
2D: equidistant cylindrical projection
3D: orthographic projection
Basemap toolkit
• provides a list of globe projections
• creates contour maps
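Basemap works with longitude/latitude coordinates, so the x, y, z coordinates from the spherical MDS need converting first. A minimal sketch of the standard Cartesian-to-spherical conversion, not necessarily the authors' preprocessing; the CSV column names x, y, z and the function names are assumptions.

```python
import csv
import io
import math


def xyz_to_lonlat(x, y, z):
    """Convert Cartesian coordinates on a sphere to (longitude, latitude) in degrees."""
    r = math.sqrt(x * x + y * y + z * z)
    # Clamp to [-1, 1] to guard against rounding just outside asin's domain.
    lat = math.degrees(math.asin(max(-1.0, min(1.0, z / r))))
    lon = math.degrees(math.atan2(y, x))
    return lon, lat


def read_coordinates(text):
    """Parse CSV text with assumed columns x, y, z into (lon, lat) pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        xyz_to_lonlat(float(row["x"]), float(row["y"]), float(row["z"]))
        for row in reader
    ]


print(read_coordinates("x,y,z\n1,0,0\n0,1,0\n0,0,1\n"))
```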
For each series, bioactivity data for 3,4-Cl2, 4-Cl, 4-CH3, 4-OCH3 and 4-H are available.
• For the majority of the series (91%), bioactivity data for more substituents (e.g. 4-CF3, 4-OH, 4-F, ...) are available (substituents taken from the "new substituent selection").
• More than 57% of the series have activity data for five or more additional substituents.
For series with 5 or more additional substituents (n >= 10), correlation analyses were run:
Series_8
substituent | pIC50 | vdw_area
3,4-Cl2 | 6.3 | 134
4-Cl | 7.0 | 117
4-CH3 | 7.4 | 116
4-OCH3 | 7.6 | 131
4-H | 8 | 99
4-CF3 | 6.9 | 129
4-F | 7.7 | 103
4-OH | 6.6 | 109
3,4-(CH3)2 | 7 | 134
4-C(CH3)3 | 6.1 | 152
In this example: R = -0.70, p = 0.03
Series 8, with pattern 5 4 3 2 1, has R(vdw) = -0.7
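The Series 8 example can be re-checked with a plain Pearson correlation. This sketch uses only the standard library and reports r together with the t statistic t = r·sqrt((n - 2)/(1 - r²)); turning t into a p-value needs a t-distribution with n - 2 = 8 degrees of freedom (for example, scipy.stats.pearsonr returns r and p directly). With the rounded values from the table, r comes out at about -0.70, matching the slide, and |t| exceeds 2.306, the two-sided 5 % critical value for 8 degrees of freedom.

```python
import math

# Series 8 values from the table above (order: 3,4-Cl2, 4-Cl, 4-CH3, 4-OCH3,
# 4-H, 4-CF3, 4-F, 4-OH, 3,4-(CH3)2, 4-C(CH3)3)
PIC50 = [6.3, 7.0, 7.4, 7.6, 8.0, 6.9, 7.7, 6.6, 7.0, 6.1]
VDW_AREA = [134, 117, 116, 131, 99, 129, 103, 109, 134, 152]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(VDW_AREA, PIC50)
n = len(PIC50)
t = r * math.sqrt((n - 2) / (1 - r * r))  # t statistic with n - 2 degrees of freedom
print(f"r = {r:.2f}, t = {t:.2f} (df = {n - 2})")
```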
Details on Multidimensional Scaling with
First: 2D MDS -> bad, Kruskal Stress-1 > 0.2
Second: 3D MDS -> good, Kruskal Stress-1 = 0.11, but visualization not helpful
Third: spherical MDS -> moderate, Kruskal Stress-1 = 0.15, good visualization ✔
get120Possibilities() ... creates a vector with 120 rankings [(1,2,3,4,5), (2,1,3,4,5) ...]
corDist() ... calculates Spearman's rank correlation distance
smacofSphere() ... runs spherical MDS, type = "ordinal" because we have rankings, algorithm = "primal" ... handling of ties
xyz.120 ... x,y,z coordinates of the MDS run
Coordinates (xyz.120) are exported to a CSV file and are the input for Basemap
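Kruskal's Stress-1 is the criterion used above to compare the 2D, 3D and spherical solutions. A minimal sketch of the formula Stress-1 = sqrt(Σ(δij - dij)² / Σ dij²), where dij are the distances in the fitted configuration and δij the disparities; for simplicity the sketch uses the input dissimilarities directly as disparities (metric case). In ordinal MDS, as run with smacofSphere(type = "ordinal"), the disparities come from a monotone regression instead, so this function does not reproduce smacof's reported values. The function names are illustrative.

```python
import math
from itertools import combinations


def euclidean(p, q):
    """Euclidean distance between two points given as coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def stress1(dissimilarities, coords):
    """Kruskal Stress-1 with disparities taken as the raw dissimilarities (metric case).

    dissimilarities: square matrix (list of lists); coords: list of point tuples.
    """
    num = den = 0.0
    for i, j in combinations(range(len(coords)), 2):
        d = euclidean(coords[i], coords[j])
        num += (dissimilarities[i][j] - d) ** 2
        den += d * d
    return math.sqrt(num / den)


D = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
print(stress1(D, [(0, 0), (3, 0), (0, 4)]))  # perfect fit
print(stress1(D, [(0, 0), (6, 0), (0, 8)]))  # configuration scaled by 2
```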
Potential questions:
-> How long does such a search take once the workflow is in place? ~1 day (computer with 4 processors, 8 GB RAM)
-> How are salts handled? The script is written so that salts are not taken into account. This means it would potentially be possible for the dichloro compound to be a sodium salt and the methyl derivative a potassium salt. However, this never occurred in the 200 series and therefore plays no role.
-> What about chirality? I did not take chirality into account in the query. This would have been possible, but since the encoding of chirality in ChEMBL is not comprehensive, I did not consider it.
-> How large must the difference between the bioactivities be for a set of compounds to be accepted as a series? In the Topliss paper, rankings with differences of log > 0.1 between the compounds are found. We did not take this into account and used all data (as, incidentally, did the group that carried out a similar analysis in 2014).
The data analysis of the 200 series shows: 43 have a difference of at least "> 0.1 log" between the ranks. 77 series have 1 violation of this rule, i.e. the difference between two ranks is smaller than 0.1 once. 80 have 2 or more violations.
-> Why did you not consider the other patterns 2π-π², π-σ etc.? The complexity would have been considerably higher without any notable gain in information. For clarification: the new patterns (new1, new2, new3) do not fall into any of the patterns postulated by Topliss, not even into the extended selection (2π-π², π-3σ, etc.).
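The "> 0.1 log" criterion from the answer on bioactivity differences can be checked per series by sorting the five pActivity values and counting neighbouring ranks that are closer than 0.1 log units. A small sketch under that reading; the exact rule behind the 43/77/80 split is not spelled out, and the function names are illustrative.

```python
def count_violations(p_values, min_gap=0.1):
    """Count adjacent ranks whose pActivity values differ by less than `min_gap`."""
    ordered = sorted(p_values, reverse=True)  # rank 1 = highest pActivity
    return sum(1 for a, b in zip(ordered, ordered[1:]) if a - b < min_gap)


def classify(series_values):
    """Bucket series into 0, 1 and >= 2 violations, as in the 43/77/80 split."""
    buckets = {"0": 0, "1": 0, ">=2": 0}
    for values in series_values:
        v = count_violations(values)
        key = "0" if v == 0 else "1" if v == 1 else ">=2"
        buckets[key] += 1
    return buckets


# Five Topliss analogues of Series 8 (pIC50): gaps of 0.4, 0.2, 0.4 and 0.7 log units.
print(count_violations([6.3, 7.0, 7.4, 7.6, 8.0]))
```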