
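  • Deng Xiaolong, Chen Yuan, Tan Siqiao, Yuan Zheming. QSAR study on toxicities of alcohol and phenol compounds [J]. Acta Scientiae Circumstantiae, 2016, 36(12): 4490-4499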

  • QSAR study on the toxicities of alcohol and phenol compounds
  • Funding: Doctoral Program Fund of the Ministry of Education of China (No. 20124320110002); Changsha Science and Technology Plan Project (No. K1406018-21)
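  • Authors and affiliations
  • Deng Xiaolong: 1 Hunan Provincial Key Laboratory of Crop Germplasm Innovation and Utilization, Hunan Agricultural University, Changsha 410128; 2 Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha 410128
  • Chen Yuan: Hunan Provincial Key Laboratory of Crop Germplasm Innovation and Utilization, Hunan Agricultural University, Changsha 410128
  • Tan Siqiao: College of Information Science and Technology, Hunan Agricultural University, Changsha 410128
  • Yuan Zheming: 1 Hunan Provincial Key Laboratory of Crop Germplasm Innovation and Utilization, Hunan Agricultural University, Changsha 410128; 2 Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha 410128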
  • Abstract: The relationship between compound toxicity and molecular descriptors is generally non-linear, and the molecular descriptors computed by quantum chemistry methods contain many irrelevant and redundant features. Minimal redundancy maximal relevance (mRMR) is a widely used feature selection method, but in its current form it is not applicable to continuous dependent variables, and its relevance and redundancy measures are not comparable. In quantitative structure-activity relationship (QSAR) studies, both the dependent variable (toxicity) and the independent variables (molecular descriptors) are usually continuous. We therefore replaced the linear Pearson correlation coefficient (R) with the non-linear distance correlation (dCor), which makes the relevance and redundancy measures comparable under non-linear conditions, and propose a new feature selection method, mRMR-dCor. On three QSAR datasets of alcohol and phenol compound toxicity, support vector regression (SVR) models built on features selected by mRMR-dCor achieved independent-prediction Q² values of 0.954, 0.941 and 0.981, respectively, clearly outperforming the reference models and previously reported results. Most of the molecular descriptors retained by mRMR-dCor are supported by earlier reports in the literature. mRMR-dCor has broad application prospects in QSAR, quantitative structure-pharmacokinetics relationship and related studies.
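The abstract describes mRMR-dCor only at a high level, so the following is a minimal sketch of the idea rather than the authors' implementation. It computes the sample distance correlation with NumPy and uses it for both the relevance term (feature vs. toxicity) and the redundancy term (feature vs. already selected features) in a greedy search. The function names `dcor` and `mrmr_dcor`, the difference-form score (relevance minus mean redundancy), and the fixed feature count `k` are all assumptions made for illustration; the paper may use a different criterion or stopping rule.

```python
import numpy as np


def _centered_dist(v):
    """Double-centered pairwise Euclidean distance matrix of a sample."""
    v = np.asarray(v, dtype=float)
    v = v.reshape(len(v), -1)  # accept 1-D or 2-D input
    d = np.sqrt(((v[:, None, :] - v[None, :, :]) ** 2).sum(axis=-1))
    return (d - d.mean(axis=0, keepdims=True)
              - d.mean(axis=1, keepdims=True) + d.mean())


def dcor(x, y):
    """Sample distance correlation between x and y, a value in [0, 1]."""
    a, b = _centered_dist(x), _centered_dist(y)
    dcov2 = (a * b).mean()                  # squared distance covariance
    dvar2 = (a * a).mean() * (b * b).mean() # product of squared distance variances
    if dvar2 <= 0:
        return 0.0                          # a constant variable carries no signal
    return float(np.sqrt(max(dcov2, 0.0) / np.sqrt(dvar2)))


def mrmr_dcor(X, y, k):
    """Greedy selection of k feature indices (assumed difference-form score).

    score(j) = dCor(x_j, y) - mean over selected s of dCor(x_j, x_s)
    Both terms use dCor, so they are on the same [0, 1] scale.
    """
    X = np.asarray(X, dtype=float)
    n_feat = X.shape[1]
    if k < 1 or n_feat == 0:
        return []
    relevance = np.array([dcor(X[:, j], y) for j in range(n_feat)])
    selected = [int(np.argmax(relevance))]  # start from the most relevant feature
    redundancy_sum = np.zeros(n_feat)
    while len(selected) < min(k, n_feat):
        last = selected[-1]
        # Accumulate redundancy against the most recently selected feature only.
        redundancy_sum += np.array(
            [dcor(X[:, j], X[:, last]) for j in range(n_feat)]
        )
        score = relevance - redundancy_sum / len(selected)
        score[selected] = -np.inf           # never pick a feature twice
        selected.append(int(np.argmax(score)))
    return selected
```

The selected columns, for example `X[:, mrmr_dcor(X, y, k)]`, could then be passed to any regressor such as an SVR. Because relevance and redundancy are both distance correlations, subtracting one from the other compares like with like, which is the comparability property the abstract emphasizes.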
