molenc
Molecular encoder/featurizer using rdkit and OCaml
Chemical fingerprints are lossy encodings of molecules. molenc encodes molecules as unfolded-counted fingerprints, i.e. potentially very long but sparse vectors of positive integers.
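To make "unfolded-counted" concrete, here is a minimal Python sketch. It is not molenc's code or file format: the feature strings and the `encode` helper are hypothetical. It assumes each distinct feature receives its own index from a growing dictionary (rather than being hashed into a fixed-length vector), and that the stored value is the number of times the feature occurs in the molecule.

```python
from collections import Counter


def encode(features, dictionary):
    """Map a list of feature strings to a sparse {index: count} vector.

    `dictionary` maps each feature to a unique integer index and grows as
    new features are seen, so the vector is never folded to a fixed length.
    """
    counts = Counter(features)
    sparse = {}
    for feat, n in counts.items():
        # setdefault assigns the next free index to a previously unseen feature
        idx = dictionary.setdefault(feat, len(dictionary))
        sparse[idx] = n
    return sparse


# Hypothetical feature strings for two molecules (illustration only)
dictionary = {}
mol_a = encode(["C-C", "C-O", "C-C"], dictionary)
mol_b = encode(["C-O", "C-N"], dictionary)
```

Only non-zero counts are stored, which is why the vector can be very long in principle yet cheap to keep in memory.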
Currently, Faulon fingerprints are supported. In the future, atom pair fingerprints might be added. Currently, atom types are the quadruplet (#pi-electrons, element symbol, #HA neighbors, formal charge). In the future, pharmacophore features might be supported (a more abstract/fuzzy atom typing scheme).
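The atom-type quadruplet can be pictured as a small immutable record. The sketch below is an illustration under stated assumptions, not molenc's implementation: the `AtomType` class, its field names, and the `label` text format are hypothetical, and "#HA neighbors" is read here as the number of heavy-atom (non-hydrogen) neighbors. The example values are assigned by hand, assuming each atom of a double bond contributes one pi electron.

```python
from typing import NamedTuple


class AtomType(NamedTuple):
    pi_electrons: int     # number of pi electrons on the atom
    symbol: str           # element symbol, e.g. "C", "N", "O"
    heavy_neighbors: int  # number of heavy-atom (non-hydrogen) neighbors
    formal_charge: int    # formal charge of the atom

    def label(self) -> str:
        # Hypothetical text rendering; the real on-disk format may differ
        return (f"{self.pi_electrons}-{self.symbol}-"
                f"{self.heavy_neighbors}-{self.formal_charge}")


# Hand-assigned types for the carbonyl group of acetone, CC(=O)C
carbonyl_c = AtomType(1, "C", 3, 0)
carbonyl_o = AtomType(1, "O", 1, 0)
```

Because the record is hashable, two atoms with the same quadruplet compare equal and can be counted under the same dictionary key.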
Bibliography:
Carhart, R. E., Smith, D. H., & Venkataraghavan, R. (1985). Atom pairs as molecular features in structure-activity studies: definition and applications. Journal of Chemical Information and Computer Sciences, 25(2), 64-73.
Kearsley, S. K., Sallamack, S., Fluder, E. M., Andose, J. D., Mosley, R. T., & Sheridan, R. P. (1996). Chemical similarity using physiochemical property descriptors. Journal of Chemical Information and Computer Sciences, 36(1), 118-127.
Faulon, J. L., Visco, D. P., & Pophale, R. S. (2003). The signature molecular descriptor. 1. Using extended valence sequences in QSAR and QSPR studies. Journal of Chemical Information and Computer Sciences, 43(3), 707-720.
James, C. A., et al. (2016). OpenSMILES specification, v1.0, 2016-05-15. http://opensmiles.org/opensmiles.html
Author | Francois Berenger |
---|---|
License | BSD-3-Clause |
Published | |
Homepage | https://github.com/UnixJunkie/molenc |
Issue Tracker | https://github.com/UnixJunkie/molenc/issues |
Maintainer | unixjunkie@sdf.org |
Dependencies | |
Source [http] | https://github.com/UnixJunkie/molenc/archive/v5.0.1.tar.gz sha256=e5e665156ce7a4bf7cea63d95f753ef328f9fbc0bce02170bc60ed10c0a3642a md5=c665b8e27de72f2b7ccf5f54d758ed28 |