<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-5484972306101057660</id><updated>2011-09-16T07:44:32.975-07:00</updated><category term='CUDA'/><category term='Model Applicability'/><category term='Descriptor-based similarity'/><category term='Definition'/><category term='parallel programming'/><category term='Tanimoto index'/><category term='Structure-based similarity'/><category term='Cheminformatics'/><category term='QSAR'/><category term='Chemical Space'/><category term='QSPR'/><title type='text'>Cheminformatics - QSAR</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://cheminformatics-qsar.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5484972306101057660/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://cheminformatics-qsar.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Gustavo Vazquez</name><uri>http://www.blogger.com/profile/11930484901165691682</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='27' src='http://1.bp.blogspot.com/_oVmBc-tEwdU/StXZg4eqfII/AAAAAAAAI5Q/u7Kq3r_5LY8/S220/cara1.JPG'/></author><generator version='7.00' 
uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>13</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-5484972306101057660.post-2061308551001595340</id><published>2011-09-15T17:48:00.000-07:00</published><updated>2011-09-15T17:57:24.792-07:00</updated><title type='text'></title><content type='html'>A comprehensive and detailed essay about Y-Scrambling/Y-Randomization.&lt;br /&gt;&lt;div&gt;&lt;br /&gt;(A)&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.mathe2.uni-bayreuth.de/markus/pdf/pub/YRandQsar.pdf"&gt;Y-Randomization – A Useful Tool in QSAR Validation, or Folklore?&lt;/a&gt; - Christoph Rücker, Gerta Rücker, and Markus Meringer&lt;br /&gt;&lt;br /&gt;(B)&lt;br /&gt;&lt;span class="Apple-style-span" style="background-color: white; font-family: arial, helvetica, sans-serif; font-size: 12px; line-height: 18px;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;div class="cit" style="font-size: 0.91666em; line-height: 1.45em;"&gt;&lt;a abstractlink="yes" alsec="jour" alterm="J Chem Inf Model." href="http://www.ncbi.nlm.nih.gov/pubmed/17880194#" style="border-bottom-color: initial; border-bottom-style: initial; border-bottom-width: 0px; color: #333333; text-decoration: underline;" title="Journal of chemical information and modeling."&gt;J Chem Inf Model.&lt;/a&gt;&amp;nbsp;2007 Nov-Dec;47(6):2345-57. 
Epub 2007 Sep 20.&lt;/div&gt;&lt;h1 style="font-size: 1.3333em; font-weight: bold; line-height: 1.125em; margin-bottom: 0.375em; margin-left: 0px; margin-right: 0px; margin-top: 0.375em;"&gt;y-Randomization and its variants in QSPR/QSAR.&lt;/h1&gt;&lt;div class="auths"&gt;&lt;a href="http://www.ncbi.nlm.nih.gov/pubmed?term=%22R%C3%BCcker%20C%22%5BAuthor%5D" style="border-bottom-color: initial; border-bottom-style: initial; border-bottom-width: 0px; color: #333333; text-decoration: underline;"&gt;Rücker C&lt;/a&gt;,&amp;nbsp;&lt;a href="http://www.ncbi.nlm.nih.gov/pubmed?term=%22R%C3%BCcker%20G%22%5BAuthor%5D" style="border-bottom-color: initial; border-bottom-style: initial; border-bottom-width: 0px; color: #333333; text-decoration: underline;"&gt;Rücker G&lt;/a&gt;,&amp;nbsp;&lt;a href="http://www.ncbi.nlm.nih.gov/pubmed?term=%22Meringer%20M%22%5BAuthor%5D" style="border-bottom-color: initial; border-bottom-style: initial; border-bottom-width: 0px; color: #333333; text-decoration: underline;"&gt;Meringer M&lt;/a&gt;.&lt;/div&gt;&lt;div class="aff" style="font-size: 0.91666em; line-height: 1.0915em;"&gt;&lt;h3 class="label" style="font-size: 1em; height: 1px; left: -10000px; overflow-x: hidden; overflow-y: hidden; position: absolute; top: auto; width: 1px;"&gt;Source&lt;/h3&gt;&lt;div style="margin-bottom: 0.5em; margin-top: 0.5em;"&gt;Biozentrum, University of Basel, 4056 Basel, Switzerland. 
christoph.ruecker@uni-bayreuth.de&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5484972306101057660-2061308551001595340?l=cheminformatics-qsar.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cheminformatics-qsar.blogspot.com/feeds/2061308551001595340/comments/default' title='Enviar comentarios'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5484972306101057660&amp;postID=2061308551001595340' title='0 comentarios'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5484972306101057660/posts/default/2061308551001595340'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5484972306101057660/posts/default/2061308551001595340'/><link rel='alternate' type='text/html' href='http://cheminformatics-qsar.blogspot.com/2011/09/comprehensive-and-detailed-essay-about.html' title=''/><author><name>Gustavo Vazquez</name><uri>http://www.blogger.com/profile/11930484901165691682</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='27' src='http://1.bp.blogspot.com/_oVmBc-tEwdU/StXZg4eqfII/AAAAAAAAI5Q/u7Kq3r_5LY8/S220/cara1.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5484972306101057660.post-3399505079917021960</id><published>2011-07-20T07:26:00.000-07:00</published><updated>2011-07-20T07:26:31.485-07:00</updated><title type='text'>Cheminformatics</title><content type='html'>Cheminformatics for the development of new materials (by Ricardo Stefani, in Portuguese)&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;div id="__ss_7337220" style="width: 425px;"&gt;&lt;strong style="display: block; margin: 12px 0 4px;"&gt;&lt;a href="http://www.slideshare.net/ricstefani/quimioinformatica" target="_blank" 
title="Quimioinformatica"&gt;Quimioinformatica&lt;/a&gt;&lt;/strong&gt; &lt;iframe frameborder="0" height="355" marginheight="0" marginwidth="0" scrolling="no" src="http://www.slideshare.net/slideshow/embed_code/7337220" width="425"&gt;&lt;/iframe&gt; &lt;br /&gt;&lt;div style="padding: 5px 0 12px;"&gt;View more &lt;a href="http://www.slideshare.net/" target="_blank"&gt;presentations&lt;/a&gt; from &lt;a href="http://www.slideshare.net/ricstefani" target="_blank"&gt;Ricardo Stefani&lt;/a&gt; &lt;/div&gt;&lt;/div&gt;&lt;br /&gt;&lt;div&gt;&lt;div style="width: 425px;"&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5484972306101057660-3399505079917021960?l=cheminformatics-qsar.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cheminformatics-qsar.blogspot.com/feeds/3399505079917021960/comments/default' title='Enviar comentarios'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5484972306101057660&amp;postID=3399505079917021960' title='0 comentarios'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5484972306101057660/posts/default/3399505079917021960'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5484972306101057660/posts/default/3399505079917021960'/><link rel='alternate' type='text/html' href='http://cheminformatics-qsar.blogspot.com/2011/07/cheminformatics.html' title='Cheminformatics'/><author><name>Gustavo Vazquez</name><uri>http://www.blogger.com/profile/11930484901165691682</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='27' 
src='http://1.bp.blogspot.com/_oVmBc-tEwdU/StXZg4eqfII/AAAAAAAAI5Q/u7Kq3r_5LY8/S220/cara1.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5484972306101057660.post-6434410854595729161</id><published>2011-07-19T05:32:00.000-07:00</published><updated>2011-07-19T05:32:33.994-07:00</updated><title type='text'>comp.ai.neural-nets FAQ, Part 1 of 7: Introduction</title><content type='html'>&lt;div&gt;Neural Networks FAQ&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;a href="http://www.faqs.org/faqs/ai-faq/neural-nets/part1/preamble.html"&gt;comp.ai.neural-nets FAQ, Part 1 of 7: Introduction&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5484972306101057660-6434410854595729161?l=cheminformatics-qsar.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://www.faqs.org/faqs/ai-faq/neural-nets/part1/preamble.html' title='comp.ai.neural-nets FAQ, Part 1 of 7: Introduction'/><link rel='replies' type='application/atom+xml' href='http://cheminformatics-qsar.blogspot.com/feeds/6434410854595729161/comments/default' title='Enviar comentarios'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5484972306101057660&amp;postID=6434410854595729161' title='0 comentarios'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5484972306101057660/posts/default/6434410854595729161'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5484972306101057660/posts/default/6434410854595729161'/><link rel='alternate' type='text/html' href='http://cheminformatics-qsar.blogspot.com/2011/07/compaineural-nets-faq-part-1-of-7.html' title='comp.ai.neural-nets FAQ, Part 1 of 7: Introduction'/><author><name>Gustavo Vazquez</name><uri>http://www.blogger.com/profile/11930484901165691682</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='27' src='http://1.bp.blogspot.com/_oVmBc-tEwdU/StXZg4eqfII/AAAAAAAAI5Q/u7Kq3r_5LY8/S220/cara1.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5484972306101057660.post-4863887928702252531</id><published>2011-06-10T10:39:00.000-07:00</published><updated>2011-06-10T10:42:41.972-07:00</updated><title type='text'>The Importance of Being Earnest: Validation is the Absolute Essential for Successful Application and Interpretation of QSPR Models - Tropsha - 2003 - QSAR &amp; Combinatorial Science - Wiley Online Library</title><content type='html'>The Importance of Being Earnest:&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Validation is the Absolute Essential for Successful Application and Interpretation of QSPR Models&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Alexander Tropsha1,†, Paola Gramatica2, Vijay K. Gombar3&lt;br /&gt;Article first published online: 16 APR 2003&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, 'Lucida Grande', Geneva, Verdana, Helvetica, sans-serif; font-size: 10px; line-height: 10px;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;h3 style="background-attachment: initial; background-clip: initial; background-color: transparent; background-image: initial; background-origin: initial; background-position: initial initial; background-repeat: initial initial; border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; color: black; font-size: 1.8em; line-height: 21px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; vertical-align: baseline;"&gt;Abstract&lt;/h3&gt;&lt;div class="para" style="background-attachment: initial; background-clip: initial; background-color: 
transparent; background-image: initial; background-origin: initial; background-position: initial initial; background-repeat: initial initial; border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; clear: both; font-size: 10px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; vertical-align: baseline;"&gt;&lt;div style="background-attachment: initial; background-clip: initial; background-color: transparent; background-image: initial; background-origin: initial; background-position: initial initial; background-repeat: initial initial; border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; vertical-align: baseline;"&gt;This paper emphasizes the importance of rigorous validation as a crucial, integral component of Quantitative Structure Property Relationship (QSPR) model development. We consider some examples of published QSPR models, which in spite of their high fitted accuracy for the training sets and apparent mechanistic appeal, fail rigorous validation tests, and, thus, may lack practical utility as reliable screening tools. We present a set of simple guidelines for developing validated and predictive QSPR models. 
To this end, we discuss several validation strategies including (1) randomization of the modelled property, also called Y-scrambling, (2) multiple leave-many-out cross-validations, and (3) external validation using rational division of a dataset into training and test sets. We also highlight the need to establish the domain of model applicability in the chemical space to flag molecules for which predictions may be unreliable, and discuss some algorithms that can be used for this purpose. We advocate the broad use of these guidelines in the development of predictive QSPR models.&lt;/div&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;DOI: &lt;a href="http://onlinelibrary.wiley.com/doi/10.1002/qsar.200390007/abstract"&gt;10.1002/qsar.200390007&lt;/a&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="color: #a3a2a2; font-family: Arial, 'Lucida Grande', Geneva, Verdana, Helvetica, sans-serif; font-size: 11px; line-height: 14px;"&gt;Copyright © 2003 WILEY-VCH Verlag GmbH &amp;amp; Co. KGaA, Weinheim&lt;/span&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5484972306101057660-4863887928702252531?l=cheminformatics-qsar.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cheminformatics-qsar.blogspot.com/feeds/4863887928702252531/comments/default' title='Enviar comentarios'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5484972306101057660&amp;postID=4863887928702252531' title='0 comentarios'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5484972306101057660/posts/default/4863887928702252531'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5484972306101057660/posts/default/4863887928702252531'/><link rel='alternate' type='text/html' href='http://cheminformatics-qsar.blogspot.com/2011/06/importance-of-being-earnest-validation.html' 
title='The Importance of Being Earnest: Validation is the Absolute Essential for Successful Application and Interpretation of QSPR Models - Tropsha - 2003 - QSAR &amp; Combinatorial Science - Wiley Online Library'/><author><name>Gustavo Vazquez</name><uri>http://www.blogger.com/profile/11930484901165691682</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='27' src='http://1.bp.blogspot.com/_oVmBc-tEwdU/StXZg4eqfII/AAAAAAAAI5Q/u7Kq3r_5LY8/S220/cara1.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5484972306101057660.post-1877992294483970641</id><published>2011-06-06T07:45:00.000-07:00</published><updated>2011-06-06T07:45:22.955-07:00</updated><title type='text'>Extreme Learning Machine Survey</title><content type='html'>&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5484972306101057660-1877992294483970641?l=cheminformatics-qsar.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://www.ntu.edu.sg/home/egbhuang/ELM-Survey.pdf' title='Extreme Learning Machine Survey'/><link rel='replies' type='application/atom+xml' href='http://cheminformatics-qsar.blogspot.com/feeds/1877992294483970641/comments/default' title='Enviar comentarios'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5484972306101057660&amp;postID=1877992294483970641' title='0 comentarios'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5484972306101057660/posts/default/1877992294483970641'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5484972306101057660/posts/default/1877992294483970641'/><link rel='alternate' type='text/html' href='http://cheminformatics-qsar.blogspot.com/2011/06/extreme-learning-machine-survey.html' title='Extreme Learning Machine Survey'/><author><name>Gustavo 
Vazquez</name><uri>http://www.blogger.com/profile/11930484901165691682</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='27' src='http://1.bp.blogspot.com/_oVmBc-tEwdU/StXZg4eqfII/AAAAAAAAI5Q/u7Kq3r_5LY8/S220/cara1.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5484972306101057660.post-1564052013997105580</id><published>2011-05-18T08:56:00.000-07:00</published><updated>2011-05-18T09:00:12.289-07:00</updated><title type='text'>A Risk Assessment Perspective of Current Practice in Characterizing Uncertainties in QSAR Regression Predictions - Sahlin - 2011 - Molecular Informatics - Wiley Online Library</title><content type='html'>&lt;a href="http://onlinelibrary.wiley.com/doi/10.1002/minf.201000177/abstract;jsessionid=8259590C553D52E0067BBC61038ABCE9.d03t04?systemMessage=Wiley+Online+Library+will+be+disrupted+21+May+from+10-12+BST+for+monthly+maintenance"&gt;A Risk Assessment Perspective of Current Practice in Characterizing Uncertainties in QSAR Regression Predictions - Sahlin - 2011 - Molecular Informatics - Wiley Online Library&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5484972306101057660-1564052013997105580?l=cheminformatics-qsar.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cheminformatics-qsar.blogspot.com/feeds/1564052013997105580/comments/default' title='Enviar comentarios'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5484972306101057660&amp;postID=1564052013997105580' title='0 comentarios'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5484972306101057660/posts/default/1564052013997105580'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5484972306101057660/posts/default/1564052013997105580'/><link 
rel='alternate' type='text/html' href='http://cheminformatics-qsar.blogspot.com/2011/05/risk-assessment-perspective-of-current.html' title='A Risk Assessment Perspective of Current Practice in Characterizing Uncertainties in QSAR Regression Predictions - Sahlin - 2011 - Molecular Informatics - Wiley Online Library'/><author><name>Gustavo Vazquez</name><uri>http://www.blogger.com/profile/11930484901165691682</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='27' src='http://1.bp.blogspot.com/_oVmBc-tEwdU/StXZg4eqfII/AAAAAAAAI5Q/u7Kq3r_5LY8/S220/cara1.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5484972306101057660.post-5236323045549821855</id><published>2011-03-29T08:04:00.001-07:00</published><updated>2011-03-29T08:07:44.515-07:00</updated><title type='text'>Best Practices for QSAR Modelling</title><content type='html'>&lt;p&gt;Interesting papers about the validation and predictivity of QSAR Models:&lt;/p&gt;  &lt;p&gt;&lt;a href="www.kubinyi.de/istanbul-2004-manuscript.pdf" target="_blank"&gt;Validation and Predictivity of QSAR Models&lt;/a&gt; - Hugo Kubinyi (&lt;a href="www.kubinyi.de/istanbul-2004-lecture.pdf" target="_blank"&gt;lecture slides&lt;/a&gt;)&lt;/p&gt;  &lt;p&gt;&lt;a href="http://onlinelibrary.wiley.com/doi/10.1002/minf.201000061/abstract" target="_blank"&gt;Best Practices for QSAR Model Development, Validation, and Exploitation&lt;/a&gt; - Alexander Tropsha - (2010), Best Practices for QSAR Model Development, Validation, and Exploitation. Molecular Informatics, 29: 476–488. 
doi: 10.1002/minf.201000061&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5484972306101057660-5236323045549821855?l=cheminformatics-qsar.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cheminformatics-qsar.blogspot.com/feeds/5236323045549821855/comments/default' title='Enviar comentarios'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5484972306101057660&amp;postID=5236323045549821855' title='0 comentarios'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5484972306101057660/posts/default/5236323045549821855'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5484972306101057660/posts/default/5236323045549821855'/><link rel='alternate' type='text/html' href='http://cheminformatics-qsar.blogspot.com/2011/03/best-practices-for-qsar-modelling.html' title='Best Practices for QSAR Modelling'/><author><name>Gustavo Vazquez</name><uri>http://www.blogger.com/profile/11930484901165691682</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='27' src='http://1.bp.blogspot.com/_oVmBc-tEwdU/StXZg4eqfII/AAAAAAAAI5Q/u7Kq3r_5LY8/S220/cara1.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5484972306101057660.post-3368356444787399407</id><published>2011-01-21T23:19:00.001-08:00</published><updated>2011-01-21T23:19:04.847-08:00</updated><title type='text'>Which MATLAB functions benefit from multithreaded computation?</title><content type='html'>&lt;p&gt;&lt;a title="http://www.mathworks.com/support/solutions/en/data/1-4PG4AN/?solution=1-4PG4AN" href="http://www.mathworks.com/support/solutions/en/data/1-4PG4AN/?solution=1-4PG4AN"&gt;http://www.mathworks.com/support/solutions/en/data/1-4PG4AN/?solution=1-4PG4AN&lt;/a&gt;&lt;/p&gt;  &lt;div 
class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5484972306101057660-3368356444787399407?l=cheminformatics-qsar.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cheminformatics-qsar.blogspot.com/feeds/3368356444787399407/comments/default' title='Enviar comentarios'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5484972306101057660&amp;postID=3368356444787399407' title='0 comentarios'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5484972306101057660/posts/default/3368356444787399407'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5484972306101057660/posts/default/3368356444787399407'/><link rel='alternate' type='text/html' href='http://cheminformatics-qsar.blogspot.com/2011/01/which-matlab-functions-benefit-from.html' title='Which MATLAB functions benefit from multithreaded computation?'/><author><name>Gustavo Vazquez</name><uri>http://www.blogger.com/profile/11930484901165691682</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='27' src='http://1.bp.blogspot.com/_oVmBc-tEwdU/StXZg4eqfII/AAAAAAAAI5Q/u7Kq3r_5LY8/S220/cara1.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5484972306101057660.post-7371667820925678828</id><published>2010-08-17T12:02:00.001-07:00</published><updated>2010-08-17T12:02:23.639-07:00</updated><title type='text'>GA Multiple response models</title><content type='html'>&lt;p&gt;“An important characteristic of the GA–VSS method is that a single model is not necessarily obtained but the result usually is a population of acceptable models; this characteristic, sometimes considered a disadvantage, provides an opportunity to make an evaluation of the relationships with the response from different points of view. 
A theoretical disadvantage is that the absolute best model could be not present in the final population. However, after a careful selection of the best models, a consensus analysis can be performed contemporarily using the selected models and estimating the response as weighted average of the responses of the single models.”&lt;/p&gt;  &lt;p&gt;Molecular Descriptors for Chemoinformatics,   &lt;br /&gt;Volumes I &amp;amp; II    &lt;br /&gt;Roberto Todeschini    &lt;br /&gt;Viviana Consonni&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5484972306101057660-7371667820925678828?l=cheminformatics-qsar.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cheminformatics-qsar.blogspot.com/feeds/7371667820925678828/comments/default' title='Enviar comentarios'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5484972306101057660&amp;postID=7371667820925678828' title='0 comentarios'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5484972306101057660/posts/default/7371667820925678828'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5484972306101057660/posts/default/7371667820925678828'/><link rel='alternate' type='text/html' href='http://cheminformatics-qsar.blogspot.com/2010/08/ga-multiple-response-models.html' title='GA Multiple response models'/><author><name>Gustavo Vazquez</name><uri>http://www.blogger.com/profile/11930484901165691682</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='27' 
src='http://1.bp.blogspot.com/_oVmBc-tEwdU/StXZg4eqfII/AAAAAAAAI5Q/u7Kq3r_5LY8/S220/cara1.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5484972306101057660.post-7455354832625865079</id><published>2009-10-14T11:30:00.000-07:00</published><updated>2009-10-14T11:30:14.004-07:00</updated><title type='text'>Data pre-processing - Normalization</title><content type='html'>Dogra, Shaillay K., "Normalization." From QSARWorld--A Strand Life Sciences Web Resource. &lt;br /&gt;http://www.qsarworld.com/qsar-statistics-normalization.php&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;&lt;span style="font-size: large;"&gt;Normalization&lt;/span&gt;&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;Computed descriptors usually lie on different scales, so one may need to normalize them before proceeding with further statistical analysis. Whether this is necessary depends mostly on the machine learning algorithm that will subsequently be run on the data. &lt;br /&gt;&lt;br /&gt;Algorithms like Decision Trees, Regression Forest, Decision Forest and Naïve Bayes do not require normalized data as input. For Linear Regression, normalization is a recommended step. For Neural Networks and Support Vector Machines, whether used for classification or regression, normalization of the data is required.&lt;br /&gt;&lt;br /&gt;In the context of cheminformatics, a standard way to normalize data is mean shifting followed by autoscaling. After this transformation, each descriptor column has a mean of 0 and a standard deviation of 1.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;&lt;span style="font-size: large;"&gt;Mean Shifting&lt;/span&gt;&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;As part of normalization, the mean of a given descriptor (taken over all values in its column) is subtracted from each of that descriptor's values. 
As a result, the new mean becomes 0. This is done for every descriptor, so all descriptors now share the same mean, 0, and hence the same central location of their distributions. However, the 'spread' or 'variation' of the data about the mean is unchanged from the original data. This is taken care of next by scaling the values with the standard deviation. &lt;br /&gt;&lt;br /&gt;This is best illustrated with an example. Consider the numbers 1, 2, 3, 4, and 5. Their total is 15 and their mean is 3. Subtracting the mean from each value gives the transformed numbers -2, -1, 0, 1, and 2. The new total is 0, and thus the new mean is 0. Note, however, that the standard deviation is the same as for the original values (√2, using the population formula that divides by n). Dividing the values by the standard deviation then makes the new standard deviation 1.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;strong&gt;Autoscaling&lt;/strong&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;For a given set of values, the standard deviation can be made equal to 1 by dividing all the values by the original standard deviation. This is a standard step in the normalization of data.&lt;br /&gt;&lt;br /&gt;Say the values are 1, 2, 3, 4 and 5. The standard deviation is √2. Dividing each value by the standard deviation gives the transformed data 1/√2, √2, 3/√2, 2√2 and 5/√2. The new standard deviation for this set of values is 1.&lt;br /&gt;&lt;br /&gt;(The above principle is better demonstrated algebraically.)&lt;br /&gt;&lt;br /&gt;A value x belonging to a distribution with mean 'x_mean' and standard deviation 's' can be transformed to a standard score, or z-score, in the following manner: &lt;br /&gt;&lt;br /&gt;z = (x - x_mean)/s&lt;br /&gt;&lt;br /&gt;The mean of standard scores is zero. 
When values are standardized, they are expressed in units of the standard deviation, s. For the standardized scores, the standard deviation becomes 1 (and so does the variance). The standard score of a given value is interpreted as the number of standard deviations by which that value lies above or below the mean of the original distribution. &lt;br /&gt;&lt;br /&gt;So, the standardization of a set of values involves two steps. First, the mean is subtracted from every value, which shifts the central location of the distribution to 0. Then the mean-shifted values are divided by the standard deviation, s, which makes the standard deviation 1.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5484972306101057660-7455354832625865079?l=cheminformatics-qsar.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cheminformatics-qsar.blogspot.com/feeds/7455354832625865079/comments/default' title='Enviar comentarios'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5484972306101057660&amp;postID=7455354832625865079' title='1 comentarios'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5484972306101057660/posts/default/7455354832625865079'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5484972306101057660/posts/default/7455354832625865079'/><link rel='alternate' type='text/html' href='http://cheminformatics-qsar.blogspot.com/2009/10/data-pre-processing-normalization.html' title='Data pre-processing - Normalization'/><author><name>Gustavo Vazquez</name><uri>http://www.blogger.com/profile/11930484901165691682</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='27' 
src='http://1.bp.blogspot.com/_oVmBc-tEwdU/StXZg4eqfII/AAAAAAAAI5Q/u7Kq3r_5LY8/S220/cara1.JPG'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5484972306101057660.post-5007882771522727308</id><published>2009-10-14T06:53:00.000-07:00</published><updated>2009-10-15T10:11:58.503-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='CUDA'/><category scheme='http://www.blogger.com/atom/ns#' term='parallel programming'/><title type='text'>Some Interesting Links about CUDA Programming</title><content type='html'>&lt;a href="http://llpanorama.wordpress.com/2008/04/24/getting-started-with-cuda/"&gt;Getting started with CUDA&lt;/a&gt;&lt;br /&gt;&lt;a href="http://llpanorama.wordpress.com/cuda-tutorial/"&gt;CUDA Tutorial&lt;/a&gt;&lt;br /&gt;&lt;a href="http://www.hpcwire.com/"&gt;HPCWire.com&lt;/a&gt;&lt;br /&gt;&lt;a href="http://multicoreinfo.com/"&gt;multicoreinfo.com&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;strong&gt;Videos&lt;/strong&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Scalable Parallel Programming with CUDA on Manycore GPUs&lt;br /&gt;&lt;span style="font-size: x-small;"&gt;Stanford University Computer Systems Colloquium (EE 380).&amp;nbsp;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size: x-small;"&gt;John Nickolls - NVIDIA&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;object height="285" style="clear: left; float: left;" width="340"&gt;&lt;param name="movie" value="http://www.youtube.com/v/nlGnKPpOpbE&amp;hl=es&amp;fs=1&amp;rel=0&amp;border=1"&gt;&lt;/param&gt;&lt;param name="allowFullScreen" value="true"&gt;&lt;/param&gt;&lt;param name="allowscriptaccess" value="always"&gt;&lt;/param&gt;&lt;embed src="http://www.youtube.com/v/nlGnKPpOpbE&amp;hl=es&amp;fs=1&amp;rel=0&amp;border=1" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="340" 
height="285"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5484972306101057660-5007882771522727308?l=cheminformatics-qsar.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cheminformatics-qsar.blogspot.com/feeds/5007882771522727308/comments/default' title='Enviar comentarios'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5484972306101057660&amp;postID=5007882771522727308' title='0 comentarios'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5484972306101057660/posts/default/5007882771522727308'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5484972306101057660/posts/default/5007882771522727308'/><link rel='alternate' type='text/html' href='http://cheminformatics-qsar.blogspot.com/2009/10/some-interesting-links-about-cuda.html' title='Some Interesting Links about CUDA Programming'/><author><name>Gustavo Vazquez</name><uri>http://www.blogger.com/profile/11930484901165691682</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='27' src='http://1.bp.blogspot.com/_oVmBc-tEwdU/StXZg4eqfII/AAAAAAAAI5Q/u7Kq3r_5LY8/S220/cara1.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5484972306101057660.post-7279963322345788828</id><published>2009-10-14T05:56:00.000-07:00</published><updated>2009-10-14T05:56:00.925-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='QSAR'/><category scheme='http://www.blogger.com/atom/ns#' term='Descriptor-based similarity'/><category scheme='http://www.blogger.com/atom/ns#' term='Tanimoto index'/><category scheme='http://www.blogger.com/atom/ns#' term='Model Applicability'/><category scheme='http://www.blogger.com/atom/ns#' term='QSPR'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Structure-based similarity'/><category scheme='http://www.blogger.com/atom/ns#' term='Chemical Space'/><title type='text'>Model Applicability</title><content type='html'>Dogra, Shaillay K., "Model Applicability" From QSARWorld--A Strand Life Sciences Web Resource.&lt;br /&gt;http://www.qsarworld.com/insilico-chemistry-model-applicability.php&lt;br /&gt;&lt;br /&gt;When a model is used to predict the value(s) for some unknown compound(s), it is necessary to assess whether the model is applicable to the compound(s) under study. Different approaches exist, all of which in some sense check whether the structure-, chemical-, or descriptor-based properties of the ‘unknown’ compound lie in a similar ‘space’ to those of the compounds in the training set used for building the model. This matters because the basic assumption of QSAR modeling is that similar compounds have similar activity/property; hence, given an unknown compound, its activity/property can be predicted with confidence if it is ‘similar’ to the compounds that were used for building the model.&lt;br /&gt;&lt;br /&gt;Whether the compound(s) under study are ‘similar’ to the training set compounds can be assessed in various ways:&lt;br /&gt;&lt;br /&gt;1) Structure-based similarity: Tanimoto coefficient values, obtained by comparing MACCS fingerprints, can be used to assess structural similarity. 
If any of the training set compounds has a Tanimoto coefficient value &amp;gt; 0.85 when compared against the compound under study, this can be taken as an indication of high structural similarity, and the given model can be considered applicable in this case.&lt;br /&gt;&lt;br /&gt;2) Descriptor-based similarity: Similarity of the compound under study to the compounds in the training set can also be estimated by computing the Euclidean distances, over the descriptors that were used in training the model, between the unknown compound(s) and the training set compounds. This distance lies between 0 and ∞; the smaller the distance, the more similar the compounds.&lt;br /&gt;&lt;br /&gt;3) Chemical Space: Comparing the ‘chemical space’ of the ‘unknown’ compounds against that of the compounds in the training set (used for building the model) is another way to assess model applicability. One way to do this is to run a Principal Components Analysis (PCA) on the descriptors used in the model, for both the training set and the ‘unknown’ compounds, and then plot the first two components. In the figure below, the training set compounds are shown in red while the ‘unknown’ compounds, for which the predictions need to be made, are depicted in green.  
Thus, it can be seen at a glance whether the ‘unknown’ compounds belong to the same distribution or ‘space’ as the ones used for deriving the model (and a decision can be made for or against using the given model).&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_oVmBc-tEwdU/StXKIGpfpoI/AAAAAAAAI5E/Jl17e6i-Ptw/s1600-h/Model-Applicability.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://3.bp.blogspot.com/_oVmBc-tEwdU/StXKIGpfpoI/AAAAAAAAI5E/Jl17e6i-Ptw/s320/Model-Applicability.jpg" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;4) Statistical Measures: The model can, in any case, be used for predicting the values for the ‘unknown’ compounds. The predicted values usually also have a measure of statistical significance associated with them. &lt;br /&gt;&lt;br /&gt;In the case of regression models (prediction of a continuous value), this measure is expressed as a standard error. A simple interpretation of the standard error, assuming approximately normally distributed errors, is that, according to the model, the predicted value lies in an interval bounded by about +/- 2 standard errors with a 95% confidence. Say the predicted value is x and the associated standard error is y; then the value is estimated to lie in the interval x-2y to x+2y with about 95% confidence (and in x-y to x+y with only about 68% confidence). (This, however, does not imply that there exists some finite interval wherein the confidence would be 100%.)&lt;br /&gt;&lt;br /&gt;In the case of classification models (prediction of a categorical value), the statistical significance is expressed as a confidence measure. This lies on a 0-1 scale and can be interpreted as the % confidence that the underlying algorithm (in the model) has when it predicts that a given compound belongs to a particular class. 
Say an ‘unknown’ compound is predicted by the model to belong to a particular class, and the model associates a confidence measure of 0.90 with that prediction; this implies that the algorithm is 90% confident about this prediction. In other words, statistically, in the long run, when the algorithm makes a large enough number of such predictions, about 90% of them would be expected to turn out correct, provided the confidence measure is well calibrated.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5484972306101057660-7279963322345788828?l=cheminformatics-qsar.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cheminformatics-qsar.blogspot.com/feeds/7279963322345788828/comments/default' title='Enviar comentarios'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5484972306101057660&amp;postID=7279963322345788828' title='0 comentarios'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5484972306101057660/posts/default/7279963322345788828'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5484972306101057660/posts/default/7279963322345788828'/><link rel='alternate' type='text/html' href='http://cheminformatics-qsar.blogspot.com/2009/10/model-applicability.html' title='Model Applicability'/><author><name>Gustavo Vazquez</name><uri>http://www.blogger.com/profile/11930484901165691682</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='27' src='http://1.bp.blogspot.com/_oVmBc-tEwdU/StXZg4eqfII/AAAAAAAAI5Q/u7Kq3r_5LY8/S220/cara1.JPG'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_oVmBc-tEwdU/StXKIGpfpoI/AAAAAAAAI5E/Jl17e6i-Ptw/s72-c/Model-Applicability.jpg' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5484972306101057660.post-4402664324453628319</id><published>2008-10-13T18:46:00.000-07:00</published><updated>2008-10-13T22:38:55.751-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Definition'/><category scheme='http://www.blogger.com/atom/ns#' term='Cheminformatics'/><title type='text'>Cheminformatics - Definition</title><content type='html'>Cheminformatics (also known as chemoinformatics and chemical informatics) is the use of computer and informational techniques, applied to a range of problems in the field of chemistry. These in silico techniques are used in pharmaceutical companies in the process of drug discovery. These methods can also be used in chemical and allied industries in various other forms.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5484972306101057660-4402664324453628319?l=cheminformatics-qsar.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cheminformatics-qsar.blogspot.com/feeds/4402664324453628319/comments/default' title='Enviar comentarios'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5484972306101057660&amp;postID=4402664324453628319' title='0 comentarios'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5484972306101057660/posts/default/4402664324453628319'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5484972306101057660/posts/default/4402664324453628319'/><link rel='alternate' type='text/html' href='http://cheminformatics-qsar.blogspot.com/2008/10/cheminformatics-definition.html' title='Cheminformatics - Definition'/><author><name>Gustavo Vazquez</name><uri>http://www.blogger.com/profile/11930484901165691682</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='27' src='http://1.bp.blogspot.com/_oVmBc-tEwdU/StXZg4eqfII/AAAAAAAAI5Q/u7Kq3r_5LY8/S220/cara1.JPG'/></author><thr:total>0</thr:total></entry></feed>
