Designing Optical Molecules With Generative AI
In a previous post, I covered the prediction of UV/Vis spectra for synthetic molecules (e.g. nanoparticles). Simulations and machine learning can help with that: the input is the molecule, the output is its UV/Vis spectrum.
We can reverse that process and design molecules that have desired optical properties. Here the input is the set of desired properties, e.g. a specific absorption or emission wavelength, and the output is a molecule that has these properties.
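One way to make the inverse problem concrete, as a general formulation rather than the paper's exact objective, is to use a forward predictor \(f\) (molecule → properties) as a scoring function and search for the molecule whose predicted properties lie closest to the targets. The symbols below are introduced here for illustration: \(\mathcal{M}\) is the set of candidate molecules, \(f_k(m)\) the predicted value of property \(k\) (e.g. \(\lambda_{emi}\) or log \(\varepsilon\)), \(p_k^{\mathrm{target}}\) its desired value, \(s_k\) a scale that makes nm and log units comparable, and \(w_k\) optional weights.

$$
m^{*} = \arg\min_{m \in \mathcal{M}} \sum_{k} w_k \left( \frac{f_k(m) - p_k^{\mathrm{target}}}{s_k} \right)^{2}
$$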
Figure 1 illustrates both approaches to computational molecular design.
Figure 1: Forward prediction (molecule structure → simulations/ML model → UV/Vis spectrum) and inverse design (desired optical properties → generative AI model → molecule structure).

Organic molecules with tailored properties
Today I read “Generative Deep Learning-Based Efficient Design of Organic Molecules with Tailored Properties” by Minhi Han, Joonyoung F. Joung, Minseok Jeong, Dong Hoon Choi, and Sungnam Park.
Their aim is to develop new molecules with specific optical properties, such as fluorescence and phosphorescence, which are crucial for applications in organic light-emitting diodes (OLEDs), organic solar cells (OSCs), and organic photodetectors (OPDs). Further applications include biological imaging, sensors, and other fields.
Application areas
Organic molecules with tailored optical properties find applications across many fields, and the required optical properties vary by application.
Organic Electronics
- OLEDs (Organic Light-Emitting Diodes): Require materials with high photoluminescence quantum yields (\(\Phi\)) and narrow emission bands for high-efficiency, color-pure displays
- OPVs (Organic Photovoltaics): Need broad absorption spectra and high extinction coefficients (\(\varepsilon\)) to maximize light harvesting efficiency in solar cells
- OPDs (Organic Photodetectors): Benefit from materials with specific absorption wavelengths and fast response times for sensors and imaging devices
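As a rough illustration of why OPV materials benefit from a high extinction coefficient, the Beer–Lambert law \(A = \varepsilon c l\) gives the fraction of light absorbed at the band maximum as \(1 - 10^{-A}\). The sketch below compares log \(\varepsilon\) = 4 and 5; `fraction_absorbed` is a hypothetical helper, and the 1 µM concentration and 1 cm path length are arbitrary illustrative assumptions, not values from the paper.

```python
def fraction_absorbed(log_eps: float, conc_mol_per_L: float, path_cm: float) -> float:
    """Fraction of incident light absorbed at the band maximum (Beer-Lambert law)."""
    eps = 10.0 ** log_eps  # molar extinction coefficient in L mol^-1 cm^-1 (= dm^3 mol^-1 cm^-1)
    absorbance = eps * conc_mol_per_L * path_cm  # A = eps * c * l
    return 1.0 - 10.0 ** (-absorbance)  # transmitted fraction is 10^-A


# Illustrative (assumed) conditions: 1 uM solution in a 1 cm cuvette.
for log_eps in (4.0, 5.0):
    print(log_eps, round(fraction_absorbed(log_eps, 1e-6, 1.0), 3))
# prints:
# 4.0 0.023
# 5.0 0.206
```

At such low absorbance, a tenfold larger \(\varepsilon\) absorbs roughly nine times more light, which is why thin OPV layers favor strongly absorbing chromophores.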
Biomedical Applications
- Fluorescence Microscopy: NIR fluorophores enable deep tissue imaging with minimal autofluorescence interference
- In Vivo Imaging: Molecules with large Stokes shifts reduce self-quenching and improve signal-to-noise ratios
- Photodynamic Therapy: Photosensitizers with specific absorption wavelengths for targeted cancer treatment
- Biosensors: Fluorescent probes that respond to specific biological conditions or molecules
Advanced Materials
- Laser Dyes: Materials with narrow emission bands and high quantum yields for tunable laser applications
- Optical Storage: Photochromic materials that change properties upon light exposure
- Nonlinear Optics: Molecules with large nonlinear optical coefficients for frequency conversion and optical switching
The ability to computationally design molecules with specific optical properties accelerates the discovery of new materials for these applications, potentially reducing the time and cost of traditional trial-and-error approaches.
Re-inventing useful molecules
The authors demonstrate that their generative model can “re-invent” useful molecules by generating compounds with specific target properties that match known, practically useful molecules from the literature. This serves as validation that the model can generate meaningful molecules beyond its training data. Figure 2 shows an example of a molecule generated by their model alongside its predicted spectrum.
1. NIR fluorophores for high-resolution microscopy
Target properties: \(\lambda_{emi}\) (emission maximum) = 760 nm, log \(\varepsilon\) (extinction coefficient at the absorption maximum) = 5
The Gen-DL model generated a near-infrared (NIR) imaging dye with \(\lambda_{emi}\) = 766 nm and log \(\varepsilon\) = 5.34. This molecule was originally developed by Sletten and co-workers for high-resolution fluorescence microscopy applications. NIR fluorophores are particularly valuable for microscopy because they minimize autofluorescence interference and enable deeper tissue penetration.
2. Imaging dyes with large Stokes shifts
Target properties: \(\lambda_{abs}\) (absorption maximum) = 570 nm, \(\lambda_{emi}\) = 660 nm, Stokes shift = 90 nm, log \(\varepsilon\) = 5
The model generated a fluorophore originally developed by Ren and co-workers for fluorescence microscopy and in vivo imaging. The actual properties were \(\lambda_{abs}\) = 571 nm, \(\lambda_{emi}\) = 651 nm with a Stokes shift of 80 nm and log \(\varepsilon\) = 5. Large Stokes shifts are crucial for in vivo imaging as they reduce self-quenching and allow for better separation of excitation and emission light.
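The Stokes shift quoted above is the difference between the emission and absorption maxima in nanometers. Spectroscopists often prefer energy units, so a small helper can also report it in wavenumbers (\(\tilde\nu = 10^{7}/\lambda\) with \(\lambda\) in nm). The function names are hypothetical; the inputs are the target and reported values from this example.

```python
def stokes_shift_nm(abs_max_nm: float, emi_max_nm: float) -> float:
    """Stokes shift as a wavelength difference (nm)."""
    return emi_max_nm - abs_max_nm


def stokes_shift_cm1(abs_max_nm: float, emi_max_nm: float) -> float:
    """Stokes shift as an energy difference in wavenumbers (cm^-1)."""
    # 1 nm = 1e-7 cm, so wavenumber (cm^-1) = 1e7 / wavelength (nm)
    return 1e7 / abs_max_nm - 1e7 / emi_max_nm


print(stokes_shift_nm(570, 660), round(stokes_shift_cm1(570, 660)))  # target:   90 2392
print(stokes_shift_nm(571, 651), round(stokes_shift_cm1(571, 651)))  # reported: 80 2152
```

In energy terms, the 10 nm shortfall against the target corresponds to roughly 240 cm⁻¹.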
3. Narrowband emitters for OLEDs
Target properties: \(\lambda_{emi}\) = 520 nm, \(\sigma_{emi}\) (emission bandwidth, FWHM) = 1500 cm⁻¹, \(\Phi\) (photoluminescence quantum yield) = 0.9
For OLED applications, the model generated a green emitter originally developed by Zhang and co-workers. The actual molecule showed \(\lambda_{emi}\) = 500 nm with a narrow bandwidth (\(\sigma_{emi}\) = 25 nm at 500 nm) and a high photoluminescence quantum yield (\(\Phi\) = 0.887). Narrowband emitters are essential for achieving high color purity in OLED displays.
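The target bandwidth is given in wavenumbers but the reported one in nanometers, so comparing them needs a unit conversion. For a band much narrower than its center wavelength, the first-order relation \(\Delta\tilde\nu \approx \Delta\lambda/\lambda^{2}\) is enough. Taking the reported bandwidth as 25 nm at 500 nm:

$$
\Delta\tilde{\nu} \approx \frac{\Delta\lambda}{\lambda^{2}} = \frac{25\times10^{-7}\,\mathrm{cm}}{\left(500\times10^{-7}\,\mathrm{cm}\right)^{2}} = 1000\ \mathrm{cm^{-1}}
$$

By the same relation, the 1500 cm⁻¹ target corresponds to about 41 nm at 520 nm, so the reported emitter is narrower than requested.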
4. Organic photovoltaic materials
Target properties: \(\lambda_{abs}\) = 500 nm, \(\sigma_{abs}\) (absorption bandwidth, FWHM) = 4200 cm⁻¹, log \(\varepsilon\) = 5
The model generated a molecule originally developed by Liang and co-workers for small-molecule photovoltaic applications. The actual properties were \(\lambda_{abs}\) = 569 nm, \(\sigma_{abs}\) = 4031 cm⁻¹ (∼120 nm at 570 nm), and log \(\varepsilon\) = 4.74. Broad absorption spectra with large extinction coefficients are crucial for efficient light harvesting in organic solar cells.
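The dataset reports bandwidths both in cm⁻¹ and in nm, and converting between them depends on the band center. A minimal sketch, assuming the band is symmetric in wavenumber around its maximum (an assumption; real bands are often asymmetric), converts a FWHM exactly and compares it with the first-order \(\lambda^{2}\) approximation. Both function names are hypothetical.

```python
def fwhm_cm1_to_nm(center_nm: float, fwhm_cm1: float) -> float:
    """Convert a FWHM in cm^-1 to nm, assuming a band symmetric in wavenumber."""
    center_cm1 = 1e7 / center_nm
    short_edge_nm = 1e7 / (center_cm1 + fwhm_cm1 / 2)  # high-energy half-maximum point
    long_edge_nm = 1e7 / (center_cm1 - fwhm_cm1 / 2)   # low-energy half-maximum point
    return long_edge_nm - short_edge_nm


def fwhm_cm1_to_nm_approx(center_nm: float, fwhm_cm1: float) -> float:
    """First-order approximation: delta_lambda ~= lambda^2 * delta_nu (1 cm^-1 = 1e-7 nm^-1)."""
    return center_nm ** 2 * fwhm_cm1 * 1e-7


print(round(fwhm_cm1_to_nm(570, 4031)), round(fwhm_cm1_to_nm_approx(570, 4031)))  # 133 131
```

Both estimates land somewhat above the ∼120 nm quoted for this molecule. A width read directly off a measured, asymmetric spectrum need not match a converted value, which is one reason to keep both units.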
Dataset
The authors draw on their previous work, Joung et al. (2020), published in Scientific Data, which provides a comprehensive experimental dataset of optical properties of organic molecules in various solvents.
They downloaded a total of 1,358 articles containing organic compounds from journals of Nature Research, the American Chemical Society, the Royal Society of Chemistry, Springer, and Elsevier by searching for keywords such as fluorescence, luminescence, emission, OLED, fluorescence lifetime, or PLQY Joung et al. (2020).
Absorption and emission spectra in those publications were extracted, checked for quality, and used to derive the optical properties of the molecules, yielding a dataset of 71,424 molecule/solvent pairs. The structures of both the chromophores and the solvents are stored as SMILES strings.
The dataset contains the following columns:
No. | Column name | Unit | Data type | Description |
---|---|---|---|---|
1 | Tag | — | Float | The numbering of data points |
2 | Chromophore | — | String | SMILES of chromophore structure |
3 | Solvent | — | String | SMILES of solvent structure |
4 | Absorption max (nm) | nm | Float | Maximum absorption wavelength, λabs,max |
5 | Emission max (nm) | nm | Float | Maximum emission wavelength, λemi,max |
6 | Lifetime (ns) | ns | Float | Fluorescence lifetime, τflu |
7 | Quantum yield | — | Float | Photoluminescence quantum yield, ΦQY |
8 | log(e/mol-1 dm3 cm-1) | — | Float | Extinction coefficient at λabs,max, log (ε) |
9 | abs FWHM (cm-1) | cm⁻¹ | Float | Absorption bandwidth (FWHM), σabs |
10 | emi FWHM (cm-1) | cm⁻¹ | Float | Emission bandwidth (FWHM), σemi |
11 | abs FWHM (nm) | nm | Float | Absorption bandwidth (FWHM), σabs |
12 | emi FWHM (nm) | nm | Float | Emission bandwidth (FWHM), σemi |
13 | Molecular weight (g mol-1) | g mol⁻¹ | Float | Molecular weight of chromophore |
14 | Reference | — | String | Source document DOI |
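A minimal sketch for working with a CSV export of this table using only the standard library. It assumes the file uses exactly the column names listed above and that missing values are empty or `NaN`; the function `load_pairs` and these formatting details are assumptions, not documented properties of the released file.

```python
import csv
import math
from typing import IO


def load_pairs(fh: IO[str]) -> list[dict]:
    """Return molecule/solvent pairs that report both absorption and emission maxima."""
    rows = []
    for row in csv.DictReader(fh):
        try:
            abs_max = float(row["Absorption max (nm)"])
            emi_max = float(row["Emission max (nm)"])
            chromophore, solvent = row["Chromophore"], row["Solvent"]  # SMILES strings
        except (KeyError, ValueError, TypeError):
            continue  # missing column or non-numeric value: skip this row
        if math.isnan(abs_max) or math.isnan(emi_max):
            continue  # float("NaN") parses without error, so filter it explicitly
        rows.append({
            "chromophore": chromophore,
            "solvent": solvent,
            "abs_max_nm": abs_max,
            "emi_max_nm": emi_max,
            "stokes_shift_nm": emi_max - abs_max,
        })
    return rows
```

Usage would look like `with open("optical_dataset.csv", newline="") as fh: pairs = load_pairs(fh)`, where the file name is a placeholder; from there one could, for example, keep only pairs with a Stokes shift above 80 nm.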
Model
The model uses graph neural networks to generate the molecular structures. The molecules are generated in steps, starting from an initial seed molecule and refining it at each step.
The algorithm uses a greedy search that selects the next atom to add based on the predicted properties, as illustrated in Figure 3.
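The greedy strategy can be sketched generically. This is not the authors' implementation: the molecule representation (plain strings), the `expand` function that proposes one-step extensions and the `predict` property model are hypothetical stand-ins. Only the control flow illustrates the idea: extend the current molecule step by step, keep the candidate whose predicted properties are closest to the target, and stop when no candidate improves.

```python
from typing import Callable, Dict, List

Properties = Dict[str, float]


def score(pred: Properties, target: Properties, scale: Properties) -> float:
    """Scaled squared distance between predicted and target properties (lower is better)."""
    return sum(((pred[k] - target[k]) / scale[k]) ** 2 for k in target)


def greedy_generate(
    seed: str,
    expand: Callable[[str], List[str]],    # hypothetical: all one-step extensions of a molecule
    predict: Callable[[str], Properties],  # hypothetical: property predictor (e.g. a GNN)
    target: Properties,
    scale: Properties,
    max_steps: int = 50,
) -> str:
    current = seed
    best = score(predict(current), target, scale)
    for _ in range(max_steps):
        candidates = expand(current)
        if not candidates:
            break
        # Greedy choice: score every extension and keep only the best one.
        cand_score, cand = min((score(predict(c), target, scale), c) for c in candidates)
        if cand_score >= best:
            break  # no extension moves the prediction closer to the target
        current, best = cand, cand_score
    return current


# Toy stand-ins (made up): each added atom red-shifts the emission by 20 nm.
def toy_expand(m: str) -> List[str]:
    return [m + "C", m + "N"]


def toy_predict(m: str) -> Properties:
    return {"lambda_emi": 300.0 + 20.0 * len(m)}


print(greedy_generate("C", toy_expand, toy_predict, {"lambda_emi": 400.0}, {"lambda_emi": 10.0}))
# prints: CCCCC
```

A greedy search is cheap because it scores only the immediate neighbors of the current molecule, but it can stall in a local optimum; beam search or sampling would be common alternatives.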

The code is hosted on GitHub under the MIT license.
An interactive version is available at http://deep4chem.korea.ac.kr/DeepMoleculeGen
Conclusion
Where large language models (LLMs) generate text, the authors use a graph neural network (GNN) to generate molecules. To obtain training data, the authors scraped the literature to extract optical properties of organic molecules and their structures.
It is interesting to see that this approach does produce meaningful molecules that match known compounds with useful properties.