Designing Optical Molecules With Generative AI

Published

July 16, 2025

Previously, I covered the prediction of UV/Vis spectra for synthetic molecules (e.g. nanoparticles). Simulations and machine learning can help with that: the input is the molecule, and the output is its UV/Vis spectrum.

We can reverse that process and design molecules that have desired optical properties. The input is then the desired properties, e.g. a specific absorption or emission wavelength, and the output is a molecule that has these properties.

Figure 1 illustrates both approaches to computational molecular design.

graph TD
    A[Molecule<br/>Structure] --> B[Simulations/<br/>ML Model]
    B --> C[UV/Vis<br/>Spectrum]
    
    style A fill:#1f77b4,stroke:#333,stroke-width:2px,color:#fff
    style B fill:#e6f3ff,stroke:#333,stroke-width:2px
    style C fill:#ff7f0e,stroke:#333,stroke-width:2px,color:#fff

Forward process: predicting spectra from molecular structure

graph TD
    A[Desired Optical<br/>Properties] --> B[Generative<br/>AI Model]
    B --> C[Molecule<br/>Structure]
    
    style A fill:#ff7f0e,stroke:#333,stroke-width:2px,color:#fff
    style B fill:#e6f3ff,stroke:#333,stroke-width:2px
    style C fill:#1f77b4,stroke:#333,stroke-width:2px,color:#fff

Reverse process: generating molecules from desired properties

Figure 1: Computational molecular design workflows for optical properties
Figure 2: Designing a molecule with a target spectrum, from Han et al. (2025)

Organic molecules with tailored properties

Today I read “Generative Deep Learning-Based Efficient Design of Organic Molecules with Tailored Properties” by Minhi Han, Joonyoung F. Joung, Minseok Jeong, Dong Hoon Choi, and Sungnam Park.

Their aim is to develop new molecules with specific optical properties, such as fluorescence and phosphorescence, which are crucial for applications in organic light-emitting diodes (OLEDs), organic solar cells (OSCs), and organic photodetectors (OPDs). Such molecules are also used in biological imaging, sensors, and other fields.

Application areas

Organic molecules with tailored optical properties find applications across many fields, and each application requires different optical properties.

Organic Electronics

  • OLEDs (Organic Light-Emitting Diodes): Require materials with high photoluminescence quantum yields (\(\Phi\)) and narrow emission bands for high-efficiency, color-pure displays
  • OPVs (Organic Photovoltaics): Need broad absorption spectra and high extinction coefficients (\(\varepsilon\)) to maximize light harvesting efficiency in solar cells
  • OPDs (Organic Photodetectors): Benefit from materials with specific absorption wavelengths and fast response times for sensors and imaging devices

Biomedical Applications

  • Fluorescence Microscopy: NIR fluorophores enable deep tissue imaging with minimal autofluorescence interference
  • In Vivo Imaging: Molecules with large Stokes shifts reduce self-quenching and improve signal-to-noise ratios
  • Photodynamic Therapy: Photosensitizers with specific absorption wavelengths for targeted cancer treatment
  • Biosensors: Fluorescent probes that respond to specific biological conditions or molecules

Advanced Materials

  • Laser Dyes: Materials with narrow emission bands and high quantum yields for tunable laser applications
  • Optical Storage: Photochromic materials that change properties upon light exposure
  • Nonlinear Optics: Molecules with large nonlinear optical coefficients for frequency conversion and optical switching

The ability to computationally design molecules with specific optical properties accelerates the discovery of new materials for these applications, potentially reducing the time and cost of traditional trial-and-error approaches.

Re-inventing useful molecules

The authors demonstrate that their generative model can “re-invent” useful molecules by generating compounds with specific target properties that match known, practically useful molecules from the literature. This serves as validation that the model can generate meaningful molecules beyond its training data. Figure 2 shows an example of a molecule generated by their model alongside its predicted spectrum.

1. NIR fluorophores for high-resolution microscopy

Target properties: \(\lambda_{emi}\)¹ = 760 nm, log \(\varepsilon\)² = 5

The Gen-DL model generated a near-infrared (NIR) imaging dye with \(\lambda_{emi}\) = 766 nm and log \(\varepsilon\) = 5.34. This molecule was originally developed by Sletten and co-workers for high-resolution fluorescence microscopy applications. NIR fluorophores are particularly valuable for microscopy because they minimize autofluorescence interference and enable deeper tissue penetration.
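The targets are stated as log ε, which hides how large these values are. Here is a minimal sketch that converts them back to a linear scale, assuming ε is the molar extinction coefficient in M⁻¹ cm⁻¹ (the dataset's mol⁻¹ dm³ cm⁻¹) and that the Beer–Lambert law A = ε·c·l applies. The 1 µM concentration and 1 cm path length are illustrative choices, not values from the paper.

```python
def epsilon_from_log(log_eps: float) -> float:
    """Convert log10(epsilon) back to epsilon in M^-1 cm^-1."""
    return 10.0 ** log_eps


def absorbance(log_eps: float, conc_molar: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law: A = epsilon * c * l (A is dimensionless)."""
    return epsilon_from_log(log_eps) * conc_molar * path_cm


if __name__ == "__main__":
    # Target value (log eps = 5) vs. the generated NIR dye (log eps = 5.34)
    for log_eps in (5.0, 5.34):
        eps = epsilon_from_log(log_eps)
        a = absorbance(log_eps, conc_molar=1e-6)  # illustrative: 1 uM solution, 1 cm cuvette
        print(f"log eps = {log_eps:.2f}: eps = {eps:.3g} M^-1 cm^-1, A = {a:.3f}")
```

A difference of 0.34 in log ε is a factor of 10^0.34 ≈ 2.2 in ε, so the generated dye absorbs roughly twice as strongly as the target asks for.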

2. Imaging dyes with large Stokes shifts

Target properties: \(\lambda_{abs}\)³ = 570 nm, \(\lambda_{emi}\) = 660 nm, Stokes shift = 90 nm, log \(\varepsilon\) = 5

The model generated a fluorophore originally developed by Ren and co-workers for fluorescence microscopy and in vivo imaging. The actual properties were \(\lambda_{abs}\) = 571 nm, \(\lambda_{emi}\) = 651 nm with a Stokes shift of 80 nm and log \(\varepsilon\) = 5. Large Stokes shifts are crucial for in vivo imaging as they reduce self-quenching and allow for better separation of excitation and emission light.
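The Stokes shift is the gap between the absorption and emission maxima. It is quoted in nm above, but it is often more meaningful in wavenumbers (cm⁻¹), which scale with photon energy. A short sketch of both, using the wavelengths from this example; the function names are my own.

```python
def stokes_shift_nm(lambda_abs_nm: float, lambda_emi_nm: float) -> float:
    """Stokes shift as a wavelength difference (nm)."""
    return lambda_emi_nm - lambda_abs_nm


def stokes_shift_cm1(lambda_abs_nm: float, lambda_emi_nm: float) -> float:
    """Stokes shift as a wavenumber (energy) difference; 1e7 / nm gives cm^-1."""
    return 1e7 / lambda_abs_nm - 1e7 / lambda_emi_nm


if __name__ == "__main__":
    for label, (l_abs, l_emi) in {"target": (570.0, 660.0), "generated": (571.0, 651.0)}.items():
        print(f"{label}: {stokes_shift_nm(l_abs, l_emi):.0f} nm, "
              f"{stokes_shift_cm1(l_abs, l_emi):.0f} cm^-1")
```

On the wavenumber scale the target and the generated dye come out at about 2390 and 2150 cm⁻¹, respectively.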

3. Narrowband emitters for OLEDs

Target properties: \(\lambda_{emi}\) = 520 nm, \(\sigma_{emi}\)⁴ = 1500 cm⁻¹, \(\Phi\)⁵ = 0.9

For OLED applications, the model generated a green emitter originally developed by Zhang and co-workers. The actual molecule showed \(\lambda_{emi}\) = 500 nm with a narrow bandwidth (\(\sigma_{emi}\) = 25 nm at 500 nm) and a high photoluminescence quantum yield (\(\Phi\) = 0.887). Narrowband emitters are essential for achieving high color purity in OLED displays.
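Bandwidths appear both in cm⁻¹ and in nm. Since wavenumber is the reciprocal of wavelength, a given width in nm corresponds to a different width in cm⁻¹ depending on where the band is centred. A minimal sketch of the conversion, assuming the band edges sit symmetrically around the centre wavelength (in nm for one direction, in cm⁻¹ for the other). Real bands are asymmetric, so values computed from measured spectra can differ from these estimates.

```python
def fwhm_nm_to_cm1(center_nm: float, fwhm_nm: float) -> float:
    """FWHM in nm -> cm^-1, assuming edges at center +/- fwhm/2 on the nm scale."""
    short_edge = center_nm - fwhm_nm / 2.0
    long_edge = center_nm + fwhm_nm / 2.0
    return 1e7 / short_edge - 1e7 / long_edge


def fwhm_cm1_to_nm(center_nm: float, fwhm_cm1: float) -> float:
    """FWHM in cm^-1 -> nm, assuming edges at center +/- fwhm/2 on the cm^-1 scale."""
    center_cm1 = 1e7 / center_nm
    high = center_cm1 + fwhm_cm1 / 2.0  # short-wavelength edge
    low = center_cm1 - fwhm_cm1 / 2.0   # long-wavelength edge
    return 1e7 / low - 1e7 / high


if __name__ == "__main__":
    print(f"25 nm at 500 nm   -> {fwhm_nm_to_cm1(500.0, 25.0):.0f} cm^-1")
    print(f"1500 cm^-1 at 520 nm -> {fwhm_cm1_to_nm(520.0, 1500.0):.1f} nm")
```

Under these assumptions, a 25 nm FWHM at 500 nm corresponds to about 1000 cm⁻¹, and the 1500 cm⁻¹ target at 520 nm to about 41 nm.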

4. Organic photovoltaic materials

Target properties: \(\lambda_{abs}\) = 500 nm, \(\sigma_{abs}\)⁶ = 4200 cm⁻¹, log \(\varepsilon\) = 5

The model generated a molecule originally developed by Liang and co-workers for small-molecule photovoltaics. The actual properties were \(\lambda_{abs}\) = 569 nm, \(\sigma_{abs}\) = 4031 cm⁻¹ (∼120 nm at 570 nm), and log \(\varepsilon\) = 4.74. Broad absorption spectra with large extinction coefficients are crucial for efficient light harvesting in organic solar cells.

Dataset

The authors draw on their previous work published in Scientific Data, Joung et al. (2020), which provides a comprehensive database of optical properties of organic molecules in various solvents.

They downloaded a total of 1,358 articles containing organic compounds from journals of Nature Research, the American Chemical Society, the Royal Society of Chemistry, Springer, and Elsevier, searching for keywords such as fluorescence, luminescence, emission, OLED, fluorescence lifetime, or PLQY Joung et al. (2020).

Absorption and emission spectra from those publications were extracted, checked for quality, and used to derive the optical properties of the molecules. The result is a dataset of 71,424 molecule/solvent pairs. The structures of both the molecules and the solvents are stored as SMILES strings.
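I have not reproduced the authors' extraction procedure. As an illustration, here is a minimal sketch of one common way to read a band maximum and its width (FWHM) off a sampled spectrum, assuming an ascending wavelength grid with a single dominant band: take the maximum, then locate the two half-maximum crossings by linear interpolation. The Gaussian band is a synthetic test signal, not data from the dataset.

```python
import math
from typing import List, Sequence, Tuple


def peak_and_fwhm(wl: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (wavelength of maximum, FWHM), both in the units of `wl`.

    Assumes an ascending wavelength grid and a single dominant band that
    drops below half maximum on both sides within the grid.
    """
    if len(wl) != len(y) or len(wl) < 3:
        raise ValueError("need matching sequences with at least 3 points")
    i_max = max(range(len(y)), key=lambda i: y[i])
    half = y[i_max] / 2.0

    def crossing(i_in: int, i_out: int) -> float:
        # Linearly interpolate where the curve passes through `half`
        # between a point at/above it (i_in) and a point below it (i_out).
        return wl[i_in] + (half - y[i_in]) * (wl[i_out] - wl[i_in]) / (y[i_out] - y[i_in])

    left, right = i_max, i_max
    while left > 0 and y[left - 1] >= half:
        left -= 1
    while right < len(y) - 1 and y[right + 1] >= half:
        right += 1
    if left == 0 or right == len(y) - 1:
        raise ValueError("band is not fully contained in the wavelength range")
    return wl[i_max], crossing(right, right + 1) - crossing(left, left - 1)


def gaussian_band(center: float, fwhm: float, wl: Sequence[float]) -> List[float]:
    """Synthetic Gaussian band with the given FWHM (test signal only)."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return [math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2)) for x in wl]


if __name__ == "__main__":
    grid = [400.0 + 0.5 * i for i in range(401)]  # 400-600 nm in 0.5 nm steps
    peak, width = peak_and_fwhm(grid, gaussian_band(500.0, 25.0, grid))
    print(f"lambda_max = {peak:.1f} nm, FWHM = {width:.2f} nm")
```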

Each entry in the dataset has the following columns:

| No. | Column name | Unit | Data type | Description |
|---|---|---|---|---|
| 1 | Tag | | Float | The numbering of data points |
| 2 | Chromophore | | String | SMILES of chromophore structure |
| 3 | Solvent | | String | SMILES of solvent structure |
| 4 | Absorption max (nm) | nm | Float | Maximum absorption wavelength, \(\lambda_{abs,max}\) |
| 5 | Emission max (nm) | nm | Float | Maximum emission wavelength, \(\lambda_{emi,max}\) |
| 6 | Lifetime (ns) | ns | Float | Fluorescence lifetime, \(\tau_{flu}\) |
| 7 | Quantum yield | | Float | Photoluminescence quantum yield, \(\Phi_{QY}\) |
| 8 | log(e/mol-1 dm3 cm-1) | | Float | Extinction coefficient at \(\lambda_{abs,max}\), \(\log(\varepsilon)\) |
| 9 | abs FWHM (cm-1) | cm⁻¹ | Float | Absorption bandwidth (FWHM), \(\sigma_{abs}\) |
| 10 | emi FWHM (cm-1) | cm⁻¹ | Float | Emission bandwidth (FWHM), \(\sigma_{emi}\) |
| 11 | abs FWHM (nm) | nm | Float | Absorption bandwidth (FWHM), \(\sigma_{abs}\) |
| 12 | emi FWHM (nm) | nm | Float | Emission bandwidth (FWHM), \(\sigma_{emi}\) |
| 13 | Molecular weight (g mol-1) | g mol⁻¹ | Float | Molecular weight of chromophore |
| 14 | Reference | | String | Source document DOI |
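To explore such a database, a small script can select molecules by their optical properties, for example emitters between 700 and 800 nm with log ε ≥ 5, similar to the first target above. This sketch assumes the data is a CSV file whose header uses the column names from the table and that missing values appear as empty, non-numeric, or NaN cells; the file name in the usage comment is a hypothetical placeholder.

```python
import csv
import math
from typing import Dict, Iterable, List, Optional, Tuple

# Column names as listed in the table above
EMI_MAX = "Emission max (nm)"
LOG_EPS = "log(e/mol-1 dm3 cm-1)"


def to_float(cell: Optional[str]) -> Optional[float]:
    """Parse a numeric cell; empty, non-numeric or NaN cells become None."""
    try:
        value = float(cell)
    except (TypeError, ValueError):
        return None
    return None if math.isnan(value) else value


def load_rows(path: str) -> List[Dict[str, str]]:
    """Read the CSV into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))


def select_emitters(rows: Iterable[Dict[str, str]],
                    emi_range: Tuple[float, float] = (700.0, 800.0),
                    min_log_eps: float = 5.0) -> List[Dict[str, str]]:
    """Keep rows whose emission maximum lies in emi_range and whose log(eps) >= min_log_eps."""
    selected = []
    for row in rows:
        emi = to_float(row.get(EMI_MAX))
        log_eps = to_float(row.get(LOG_EPS))
        if emi is None or log_eps is None:
            continue  # skip rows where either property was not reported
        if emi_range[0] <= emi <= emi_range[1] and log_eps >= min_log_eps:
            selected.append(row)
    return selected


# Usage (hypothetical file name):
# rows = load_rows("optical_db.csv")
# nir = select_emitters(rows)
# print(len(nir), "rows selected")
```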

Model

The model uses graph neural networks to generate the molecular structures. Molecules are built step by step: starting from an initial seed molecule, the structure is refined at each step.

The algorithm uses a greedy search that selects the next atom to add based on the predicted properties, as illustrated in Figure 3.

Figure 3: Greedy search over molecules supported by graph neural network
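The greedy loop itself can be sketched in a few lines. This is a minimal, generic illustration, not the authors' implementation: `expand` (proposes one-step modifications such as adding an atom) and `predict` (stands in for the GNN property predictor) are hypothetical callables, and the scale-normalised squared error against the target is an assumed objective. The toy string "molecules" and the linear predictor only make the loop runnable and have no chemical meaning.

```python
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

M = TypeVar("M")  # molecule representation (a graph in practice, a string in the toy)
Properties = Dict[str, float]


def target_loss(pred: Properties, target: Properties, scale: Properties) -> float:
    """Sum of squared deviations from the target, each divided by a per-property scale."""
    return sum(((pred[k] - target[k]) / scale[k]) ** 2 for k in target)


def greedy_search(seed: M,
                  expand: Callable[[M], Iterable[M]],
                  predict: Callable[[M], Properties],
                  target: Properties,
                  scale: Properties,
                  max_steps: int = 50) -> Tuple[M, float, List[Tuple[M, float]]]:
    """Repeatedly move to the best-scoring one-step modification of the current
    molecule; stop when no candidate lowers the loss or max_steps is reached."""
    current = seed
    best = target_loss(predict(current), target, scale)
    history = [(current, best)]
    for _ in range(max_steps):
        scored = [(target_loss(predict(c), target, scale), c) for c in expand(current)]
        if not scored:
            break
        loss, candidate = min(scored, key=lambda pair: pair[0])
        if loss >= best:
            break  # local optimum: greedy search does not backtrack
        current, best = candidate, loss
        history.append((current, best))
    return current, best, history


# --- Toy stand-ins (hypothetical, no chemical meaning) ---
def toy_expand(mol: str) -> List[str]:
    """Propose adding one 'atom' at the end of a string 'molecule'."""
    return [mol + atom for atom in ("C", "N", "O")]


def toy_predict(mol: str) -> Properties:
    """Made-up linear 'predictor' standing in for a GNN."""
    return {"lambda_emi": 400.0 + 20.0 * len(mol) + 15.0 * mol.count("N")}


if __name__ == "__main__":
    mol, loss, path = greedy_search("C", toy_expand, toy_predict,
                                    target={"lambda_emi": 520.0},
                                    scale={"lambda_emi": 10.0})
    for step_mol, step_loss in path:
        print(step_mol, toy_predict(step_mol)["lambda_emi"], round(step_loss, 3))
```

With the toy stand-ins the path is C → CN → CNN → CNNN (toy prediction 525 nm for a 520 nm target); every further addition overshoots, so the search stops there. This also shows the trade-off of a greedy strategy: each step is cheap, but the search commits to the locally best move and cannot backtrack.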

The code is hosted on GitHub under the MIT licence.

An interactive version is available at http://deep4chem.korea.ac.kr/DeepMoleculeGen

Conclusion

Just as large language models (LLMs) generate text, the authors use a graph neural network (GNN) to generate molecules. To obtain training data, they mined the literature for the optical properties and structures of organic molecules.

It is interesting to see that this approach does produce meaningful molecules that match known compounds with useful properties.

References

Han, Minhi, Joonyoung F. Joung, Minseok Jeong, Dong Hoon Choi, and Sungnam Park. 2025. “Generative Deep Learning-Based Efficient Design of Organic Molecules with Tailored Properties.” ACS Central Science 11 (2): 219–27. https://doi.org/10.1021/acscentsci.4c00656.
Joung, Joonyoung F., Minhi Han, Minseok Jeong, and Sungnam Park. 2020. “Experimental Database of Optical Properties of Organic Compounds.” Scientific Data 7 (1): 295. https://doi.org/10.1038/s41597-020-00634-8.

Footnotes

  1. \(\lambda_{emi}\): emission wavelength

  2. \(\varepsilon\): extinction coefficient

  3. \(\lambda_{abs}\): absorption wavelength

  4. \(\sigma_{emi}\): emission bandwidth (FWHM)

  5. \(\Phi\): photoluminescence quantum yield

  6. \(\sigma_{abs}\): absorption bandwidth (FWHM)