I came upon a piece on an award won by Michele Samorani at the Leeds School of Business, part of the University of Colorado at Boulder. I read it with some interest, which turned to doubt and then outright cynicism: it claimed that you could input the characteristics of the molecules (“such as the presence of carbon”, a highly unique trait among drug candidates), the algorithm would pick out the important ones, and it could then act as a virtual screen for your library. There would hardly be any point in making the things at all.
It all sounds like another MBA approach to science, a field that does not do well with the simplifying principles that business folks like to apply.
I was curious enough about the paper that I sought out some more information. You can see the submitted paper for yourself. It is some pretty heavy data mining (beyond me anyway), but the essence I took from it was that he can use his technique, based on “multi-relational data mining” methods, to make predictions at least as good as anything else out there. The catch here is that he has looked at two particular things, not the panoply of all medicinal chemistry and pharmacology, so it is a first step on the road to such a process rather than an end product.
The parameters examined are quite representative of the sort of question a med chemist might ask, though: one looks at binding to a receptor (i.e. activity) and the other looks at the Ames test (a screen for mutagenicity and thus toxicity). The first is relatively straightforward, since a molecule should either fit well or not, but mutagens can operate by many different mechanisms, so they are quite a challenge for a virtual screen. The technique also looks at how the atoms are connected to formulate its comparisons, which certainly increases the complexity of the input compared to Lipinski’s Rule of Five.
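To make that comparison concrete, here is a rough Python sketch of my own; nothing in it comes from the paper. The Rule of Five thresholds are the standard ones (molecular weight over 500, logP over 5, more than 5 hydrogen-bond donors, more than 10 acceptors). Everything else is an assumption for illustration: the `Descriptors` class, the function names, the approximate hand-typed values for ethanol, and the bond-pair count, which is only a toy stand-in for the idea of connectivity-based input and not the multi-relational method itself.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Whole-molecule numbers; the only input the Rule of Five needs."""
    mol_weight: float      # g/mol
    log_p: float           # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(d: Descriptors) -> int:
    """Count how many of Lipinski's four thresholds are exceeded."""
    checks = [
        d.mol_weight > 500,
        d.log_p > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)


def bond_pair_counts(atoms: list[str], bonds: list[tuple[int, int]]) -> Counter:
    """Toy connectivity feature: count bonded element pairs such as 'C-O'."""
    return Counter("-".join(sorted((atoms[i], atoms[j]))) for i, j in bonds)


# Ethanol, heavy atoms only. Descriptor values are approximate and
# entered by hand purely for illustration.
ethanol_descriptors = Descriptors(
    mol_weight=46.07, log_p=-0.31, h_bond_donors=1, h_bond_acceptors=1
)
ethanol_atoms = ["C", "C", "O"]
ethanol_bonds = [(0, 1), (1, 2)]  # index pairs into ethanol_atoms

violations = rule_of_five_violations(ethanol_descriptors)
pairs = bond_pair_counts(ethanol_atoms, ethanol_bonds)

print("Rule of Five violations:", violations)
print("Bonded element pairs:", dict(pairs))
```

The Rule of Five needs four numbers per molecule, whereas even this toy connectivity feature needs the atom list and the bond list, and the input grows with every atom added. A real pipeline would presumably lean on a cheminformatics toolkit rather than hand-typed lists.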
It was not very clear to me what the paper set out to achieve. As far as I could tell, the data was not used to test new compounds, but to pinpoint elements of the structures which gave rise to the examined property. I was a little disappointed with the conclusions it reached, which were ones that any medicinal chemist could have reached without resorting to software. I’ll give it the benefit of the doubt and say that it is working towards tackling more sophisticated problems, and users would be more interested in its predictive powers in any case. I also did not see any mention of the reported “presence of carbon”, the nearest I saw being the “number of carbon atoms”, which is at least a more useful parameter to look at.
So in the end, this is a first step on the road to some useful data mining techniques for drug development, an area where many people have attempted to make predictions about the “drug-ability” of proposed compounds before wasting valuable time and chemicals in actually making them. The piece on the web page touting Michele’s achievement is quite misleading about the state of its development: this is the work of a third-year graduate student, not the product of countless man-hours in an industry think-tank. I noted that Michele is planning on working on other problems, so I hope that someone is going to take the work on to the next phase.