
FragmentNet: Adaptive Graph Fragmentation for Graph-to-Sequence Molecular Representation Learning

arxiv.org


Abstract: Molecular representation learning methods typically tokenize molecules as individual atoms or use rigid, rule-based fragment decompositions, limiting their ability to capture meaningful chemical substructure context. We introduce FragmentNet, a graph-to-sequence model built around a novel adaptive, learned tokenizer that decomposes molecular graphs into chemically valid fragments of adjustable granularity, complemented by chemically aware spatial positional encodings that preserve molecular topology in the resulting sequence. Extending masked pre-training strategies from natural language processing to the molecular domain, we mask and reconstruct molecules at the level of chemically meaningful fragments rather than individual atoms. Evaluating across multiple property prediction benchmarks, we find that pre-training at fragment granularity leads to improved downstream performance on the majority of tasks, demonstrating that tokenization granularity is an important design choice for molecular representation learning.
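
The fragment-level masking idea in the abstract can be illustrated with a minimal sketch. This is not FragmentNet's implementation: the function name `mask_fragments`, the `[MASK]` token, the masking ratio, and the hand-written three-fragment split of aspirin are all assumptions made for illustration, and the learned tokenizer and positional encodings are not modeled here.

```python
import random

MASK = "[MASK]"  # hypothetical mask token, not taken from the paper


def mask_fragments(fragments, mask_ratio=0.15, seed=None):
    """Replace whole fragment tokens with MASK.

    Returns the masked sequence and a dict mapping each masked
    position to the original fragment (the reconstruction target).
    """
    if not fragments:
        return [], {}
    rng = random.Random(seed)
    # Always mask at least one fragment so there is something to reconstruct.
    n_mask = max(1, round(len(fragments) * mask_ratio))
    positions = sorted(rng.sample(range(len(fragments)), n_mask))
    masked = list(fragments)
    targets = {}
    for i in positions:
        targets[i] = masked[i]
        masked[i] = MASK
    return masked, targets


# Hand-written, illustrative fragmentation of aspirin (acetoxy group,
# benzene ring, carboxylic acid); a learned tokenizer may split differently.
fragments = ["CC(=O)O", "c1ccccc1", "C(=O)O"]
masked, targets = mask_fragments(fragments, mask_ratio=0.34, seed=0)
print(masked)
print(targets)
```

Under this sketch, a model would be trained to predict each entry of `targets` from the surrounding unmasked fragments, so a single prediction covers an entire substructure rather than one atom.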

Submission history

From: Ankur Samanta
[v1] Mon, 3 Feb 2025 09:21:49 UTC (24,068 KB)
[v2] Mon, 25 May 2026 05:20:26 UTC (11,891 KB)