Footnotes
-
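The Effectiveness of Local Updates for Decentralized Learning under Data Heterogeneity
https://arxiv.org/abs/2403.15654
Adding multiple local update steps to decentralized learning reduces communication complexity, which cuts communication cost effectively when data heterogeneity is low and the network is well connected.
We revisit two fundamental decentralized optimization methods, Decentralized Gradient Tracking (DGT) and Decentralized Gradient Descent (DGD), with multiple local update steps. We consider two settings and show that incorporating $K > 1$ local update steps reduces communication complexity. Specifically, for $\mu$-strongly convex and $L$-smooth loss functions, we prove that local DGT achieves communication complexity $\tilde{\mathcal{O}} \Big(\frac{L}{\mu K} + \frac{\delta}{\mu (1 - \rho)} + \frac{\rho }{(1 - \rho)^2} \cdot \frac{L+ \delta}{\mu}\Big)$, where $\rho$ measures network connectivity and $\delta$ measures the second-order heterogeneity of the local losses. The result reveals a tradeoff between communication and computation and shows that increasing $K$ effectively reduces communication cost when data heterogeneity is low and the network is well connected.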
arXiv:2403.15654v1 Announce Type: new Abstract: We revisit two fundamental decentralized optimization methods, Decentralized Gradient Tracking (DGT) and Decentralized Gradient Descent (DGD), with multiple local updates. We consider two settings and demonstrate that incorporating $K > 1$ local update steps can reduce communication complexity. Specifically, for $\mu$-strongly convex and $L$-smooth loss functions, we proved that local DGT achieves communication complexity $\tilde{\mathcal{O}} \Big(\frac{L}{\mu K} + \frac{\delta}{\mu (1 - \rho)} + \frac{\rho }{(1 - \rho)^2} \cdot \frac{L+ \delta}{\mu}\Big)$, where $\rho$ measures the network connectivity and $\delta$ measures the second-order heterogeneity of the local loss. Our result reveals the tradeoff between communication and computation and shows increasing $K$ can effectively reduce communication costs when the data heterogeneity is low and the network is well-connected. We then consider the over-parameterization regime where the ↩
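To make the communication/computation tradeoff in the bound above concrete, the following minimal sketch evaluates its three terms for a few values of $K$. It assumes all hidden constants are 1 and drops the logarithmic factors hidden by $\tilde{\mathcal{O}}$; the function name `local_dgt_comm_bound` and the numeric values of $L$, $\mu$, $\delta$, and $\rho$ are hypothetical choices for illustration, not values from the paper.

```python
def local_dgt_comm_bound(K, L, mu, delta, rho):
    """Sum of the three terms inside the O-tilde bound (constants and log factors dropped)."""
    if K < 1 or not 0.0 <= rho < 1.0:
        raise ValueError("need K >= 1 and 0 <= rho < 1")
    local_term = L / (mu * K)                                  # shrinks as K grows
    heterogeneity_term = delta / (mu * (1.0 - rho))            # independent of K
    network_term = rho / (1.0 - rho) ** 2 * (L + delta) / mu   # independent of K
    return local_term + heterogeneity_term + network_term


# Hypothetical problem constants, chosen only for illustration.
L, mu, delta, rho = 100.0, 1.0, 1.0, 0.1
for K in (1, 10, 100):
    print(K, round(local_dgt_comm_bound(K, L, mu, delta, rho), 2))
```

Only the first term depends on $K$, so the benefit of extra local steps saturates once the two $K$-independent terms dominate. Those terms stay small when $\delta$ is small and $\rho$ is close to 0, which matches the abstract's statement about low heterogeneity and well-connected networks.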
-
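ICLN: Input Convex Loss Network for Decision Focused Learning
https://arxiv.org/abs/2403.01875
Proposes the Input Convex Loss Network (ICLN), which learns the task loss with Input Convex Neural Networks and provides a global surrogate loss for decision-focused learning.
In decision-making under uncertainty, predicting the unknown parameters is often treated as independent of the optimization step. Decision-focused learning (DFL) is a task-oriented framework that integrates prediction and optimization by adapting the predictive model to yield better decisions for the corresponding task. This paper proposes the Input Convex Loss Network (ICLN), a novel global surrogate loss that can be implemented in a general DFL paradigm. ICLN learns the task loss via Input Convex Neural Networks, which are guaranteed to be convex with respect to some of their inputs.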
arXiv:2403.01875v1 Announce Type: cross Abstract: In decision-making problem under uncertainty, predicting unknown parameters is often considered independent of the optimization part. Decision-focused Learning (DFL) is a task-oriented framework to integrate prediction and optimization by adapting predictive model to give better decision for the corresponding task. Here, an inevitable challenge arises when computing gradients of the optimal decision with respect to the parameters. Existing researches cope this issue by smoothly reforming surrogate optimization or construct surrogate loss function that mimic task loss. However, they are applied to restricted optimization domain or build functions in a local manner leading a large computational time. In this paper, we propose Input Convex Loss Network (ICLN), a novel global surrogate loss which can be implemented in a general DFL paradigm. ICLN learns task loss via Input Convex Neural Networks which is guaranteed to be convex for some in ↩
-
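Advancing Additive Manufacturing through Deep Learning: A Comprehensive Review of Current Progress and Future Challenges
https://arxiv.org/abs/2403.00669
Deep learning shows great promise for additive manufacturing: it can handle the challenges posed by high-dimensional data and is driving continued progress in the field.
Additive manufacturing (AM) has proved to be a potential alternative to widely used subtractive manufacturing because it can produce highly customized products with minimal material waste. However, it faces major inherent challenges, including complex and dynamic process interactions that are sometimes hard to fully understand even with traditional machine learning, because they involve high-dimensional data such as images, point clouds, and voxels. The recent emergence of deep learning (DL) shows great promise in overcoming many of these challenges, since DL can automatically capture complex relationships from high-dimensional data without hand-crafted feature extraction. As a result, the volume of research at the intersection of AM and DL is growing exponentially each year, which may push additive manufacturing into broader application areas.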
arXiv:2403.00669v1 Announce Type: new Abstract: Additive manufacturing (AM) has already proved itself to be the potential alternative to widely-used subtractive manufacturing due to its extraordinary capacity of manufacturing highly customized products with minimum material wastage. Nevertheless, it is still not being considered as the primary choice for the industry due to some of its major inherent challenges, including complex and dynamic process interactions, which are sometimes difficult to fully understand even with traditional machine learning because of the involvement of high-dimensional data such as images, point clouds, and voxels. However, the recent emergence of deep learning (DL) is showing great promise in overcoming many of these challenges as DL can automatically capture complex relationships from high-dimensional data without hand-crafted feature extraction. Therefore, the volume of research in the intersection of AM and DL is exponentially growing each year which ma ↩
-
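A Phase Transition in Diffusion Models Reveals the Hierarchical Nature of Data
https://arxiv.org/abs/2402.16991
In a hierarchical generative model of data, diffusion models exhibit a phase transition at a threshold time, and this transition affects the reconstruction of high-level and low-level features differently.
Understanding the structure of real data is paramount in advancing modern deep-learning methods. Natural data such as images are believed to be composed of features organized in a hierarchical and combinatorial manner, which neural networks capture during learning. Recent advances show that diffusion models can generate high-quality images, hinting at their ability to capture this underlying structure. We study this phenomenon in a hierarchical generative model of data. We find that the backward diffusion process acting after a time $t$ is governed by a phase transition at some threshold time, where the probability of reconstructing high-level features, such as the class of an image, suddenly drops. In contrast, the reconstruction of low-level features, such as specific details of an image, evolves smoothly across the whole diffusion process. This result implies that at times beyond the transition, the class has changed but the …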
arXiv:2402.16991v1 Announce Type: cross Abstract: Understanding the structure of real data is paramount in advancing modern deep-learning methodologies. Natural data such as images are believed to be composed of features organised in a hierarchical and combinatorial manner, which neural networks capture during learning. Recent advancements show that diffusion models can generate high-quality images, hinting at their ability to capture this underlying structure. We study this phenomenon in a hierarchical generative model of data. We find that the backward diffusion process acting after a time $t$ is governed by a phase transition at some threshold time, where the probability of reconstructing high-level features, like the class of an image, suddenly drops. Instead, the reconstruction of low-level features, such as specific details of an image, evolves smoothly across the whole diffusion process. This result implies that at times beyond the transition, the class has changed but the gene ↩
-
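Tracking Changing Probabilities via Dynamic Learners
https://arxiv.org/abs/2402.10142
Introduces dynamic learners for tracking changing probabilities: the predictor forecasts the next item in a stream of discrete items by outputting candidate items together with their probabilities.
Consider a predictor, a learner, whose input is a stream of discrete items. At every time point the predictor's task is probabilistic multiclass prediction: it predicts which item may occur next by outputting zero or more candidate items, each with a probability, after which the actual item is revealed and the predictor learns from it. To output probabilities, the predictor keeps track of the proportions of the items it has seen. The predictor has constant (limited) space, and we seek efficient prediction and update techniques: the stream is unbounded, the set of items is unknown to the predictor, and the total number of items can also grow without bound. Moreover, there is non-stationarity: the underlying frequencies of items may change substantially from time to time. For instance, new items may start appearing and a few currently frequent items may cease to occur again. Because it is space-bounded, the predictor need only provide probabilities …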
arXiv:2402.10142v1 Announce Type: cross Abstract: Consider a predictor, a learner, whose input is a stream of discrete items. The predictor's task, at every time point, is probabilistic multiclass prediction, i.e., to predict which item may occur next by outputting zero or more candidate items, each with a probability, after which the actual item is revealed and the predictor learns from this observation. To output probabilities, the predictor keeps track of the proportions of the items it has seen. The predictor has constant (limited) space and we seek efficient prediction and update techniques: The stream is unbounded, the set of items is unknown to the predictor and their totality can also grow unbounded. Moreover, there is non-stationarity: the underlying frequencies of items may change, substantially, from time to time. For instance, new items may start appearing and a few currently frequent items may cease to occur again. The predictor, being space-bounded, need only provide pro ↩
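The setting described above (bounded space, unknown item set, non-stationary frequencies) can be illustrated with a small sketch. This is not the paper's algorithm, which the truncated abstract does not specify; it is a generic exponential-moving-average tracker with a fixed-size table, and the class name `EmaPredictor` and the parameters `rate` and `capacity` are assumptions introduced here.

```python
class EmaPredictor:
    """Bounded-space next-item predictor using exponential moving averages.

    Illustrative sketch only: a generic technique, not the method of the paper.
    """

    def __init__(self, rate=0.1, capacity=4):
        self.rate = rate          # how quickly old evidence is forgotten
        self.capacity = capacity  # maximum number of tracked items
        self.probs = {}           # item -> estimated probability

    def predict(self):
        """Return candidate items with probabilities (may sum to less than 1)."""
        return dict(self.probs)

    def update(self, item):
        """Learn from the revealed item."""
        for key in self.probs:
            self.probs[key] *= 1.0 - self.rate   # decay every tracked estimate
        self.probs[item] = self.probs.get(item, 0.0) + self.rate
        if len(self.probs) > self.capacity:
            # Enforce the space bound by dropping the least likely item.
            weakest = min(self.probs, key=self.probs.get)
            del self.probs[weakest]


predictor = EmaPredictor(rate=0.1, capacity=3)
for item in ["a"] * 50 + ["b"] * 50:   # the frequent item changes mid-stream
    predictor.update(item)
print(predictor.predict())             # "b" now carries most of the mass
```

The decay step is what lets the estimates follow a change in the underlying frequencies, and evicting the weakest entry keeps memory constant even when new items keep appearing. The total probability mass never exceeds 1, so the leftover mass can be read as "some item not currently tracked".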
-
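Prompted Contextual Vectors for Spear-Phishing Detection
https://arxiv.org/abs/2402.08309
Using a novel document vectorization method, the approach employs large language models to detect spear-phishing emails and performs well in experiments.
Spear-phishing attacks are a significant security challenge, and large language models (LLMs) escalate the threat by generating convincing emails and facilitating target reconnaissance. To address this, we propose a detection approach based on a novel document vectorization method that uses an ensemble of LLMs to create representation vectors. By prompting LLMs to reason about and answer human-crafted questions, we quantify the presence of common persuasion principles in the email's content, producing prompted contextual document vectors for a downstream supervised machine learning model. We evaluate the method on a unique dataset generated by a proprietary system that automates target reconnaissance and spear-phishing email creation. The method achieves a 91% F1 score in identifying LLM-generated spear-phishing emails, with a training set that contains only traditional phishing and benign emails. Key contributions include an innovative document vectorization method.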
Spear-phishing attacks present a significant security challenge, with large language models (LLMs) escalating the threat by generating convincing emails and facilitating target reconnaissance. To address this, we propose a detection approach based on a novel document vectorization method that utilizes an ensemble of LLMs to create representation vectors. By prompting LLMs to reason and respond to human-crafted questions, we quantify the presence of common persuasion principles in the email's content, producing prompted contextual document vectors for a downstream supervised machine learning model. We evaluate our method using a unique dataset generated by a proprietary system that automates target reconnaissance and spear-phishing email creation. Our method achieves a 91% F1 score in identifying LLM-generated spear-phishing emails, with the training set comprising only traditional phishing and benign emails. Key contributions include an innovative document vectorization method utilizin ↩
-
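Bandit Convex Optimisation
https://arxiv.org/abs/2402.06535
These notes present the basic framework of bandit convex optimisation and the many tools used to tackle it. There is not much that is truly new, but applying existing tools in novel ways yields new algorithms and slightly improves some bounds.
Bandit convex optimisation is a fundamental framework for studying zeroth-order convex optimisation. The notes cover many of the tools used for the problem, including cutting plane methods, interior point methods, continuous exponential weights, gradient descent, and online Newton step, and explain the nuances between the many assumptions and setups. Although little here is truly new, some existing tools are applied in novel ways to obtain new algorithms, and a few bounds are improved slightly.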
Bandit convex optimisation is a fundamental framework for studying zeroth-order convex optimisation. These notes cover the many tools used for this problem, including cutting plane methods, interior point methods, continuous exponential weights, gradient descent and online Newton step. The nuances between the many assumptions and setups are explained. Although there is not much truly new here, some existing tools are applied in novel ways to obtain new algorithms. A few bounds are improved in minor ways. ↩
-
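Learning Mutual Excitation for Hand-to-Hand and Human-to-Human Interaction Recognition
https://arxiv.org/abs/2402.02431
Introduces a mutual excitation graph convolutional network (me-GCN) for hand-to-hand and human-to-human interaction recognition. By stacking mutual excitation graph convolution (me-GC) layers, the network adaptively models the mutual constraints between pairwise entities and extracts and merges deep features.
Recognizing interactive actions, including hand-to-hand and human-to-human interaction, has wide applications in video analysis and human-robot interaction. Given the success of graph convolution in modeling topology-aware features from skeleton data, recent methods usually apply graph convolution to separate entities and use late fusion for interactive action recognition, which can barely model the mutual semantic relationships between pairwise entities. To this end, we propose a mutual excitation graph convolutional network (me-GCN) built by stacking mutual excitation graph convolution (me-GC) layers. Specifically, me-GC uses a mutual topology excitation module to first extract adjacency matrices from the individual entities and then adaptively model the mutual constraints between them. In addition, me-GC uses a mutual feature excitation module to extract and merge deep features from pairwise entities.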
Recognizing interactive actions, including hand-to-hand interaction and human-to-human interaction, has attracted increasing attention for various applications in the field of video analysis and human-robot interaction. Considering the success of graph convolution in modeling topology-aware features from skeleton data, recent methods commonly operate graph convolution on separate entities and use late fusion for interactive action recognition, which can barely model the mutual semantic relationships between pairwise entities. To this end, we propose a mutual excitation graph convolutional network (me-GCN) by stacking mutual excitation graph convolution (me-GC) layers. Specifically, me-GC uses a mutual topology excitation module to firstly extract adjacency matrices from individual entities and then adaptively model the mutual constraints between them. Moreover, me-GC extends the above idea and further uses a mutual feature excitation module to extract and merge deep features from pairw ↩
-
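Conquering the Communication Constraints to Enable Large Pre-Trained Models in Federated Learning
https://arxiv.org/abs/2210.01708
Studies how to overcome communication constraints in federated learning so that powerful pre-trained models can be used in FL while the communication burden is reduced.
Federated learning (FL) has emerged as a promising paradigm for training models collaboratively on local devices without centralized access to the raw data. In the typical FL paradigm (e.g., FedAvg), model weights are sent to the participating clients and back to the server in every round. Recently, small pre-trained models have been shown to be effective for federated learning optimization and for improving convergence. However, recent state-of-the-art pre-trained models are becoming more capable but also have more parameters. In conventional FL, sharing the enormous model weights can quickly place a massive communication burden on the system, especially if more capable models are employed. Can we find a solution that enables these strong, readily available pre-trained models in FL to achieve excellent performance while reducing the communication burden? To this end, we investigate the use of parameter-efficient …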
arXiv:2210.01708v3 Announce Type: replace Abstract: Federated learning (FL) has emerged as a promising paradigm for enabling the collaborative training of models without centralized access to the raw data on local devices. In the typical FL paradigm (e.g., FedAvg), model weights are sent to and from the server each round to participating clients. Recently, the use of small pre-trained models has been shown effective in federated learning optimization and improving convergence. However, recent state-of-the-art pre-trained models are getting more capable but also have more parameters. In conventional FL, sharing the enormous model weights can quickly put a massive communication burden on the system, especially if more capable models are employed. Can we find a solution to enable those strong and readily-available pre-trained models in FL to achieve excellent performance while simultaneously reducing the communication burden? To this end, we investigate the use of parameter-efficient fin ↩
-
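ProCNS: Progressive Prototype Calibration and Noise Suppression for Weakly-Supervised Medical Image Segmentation. (arXiv:2401.14074v1 [cs.CV])
http://arxiv.org/abs/2401.14074
ProCNS is a new approach to weakly-supervised medical image segmentation that applies the principles of progressive prototype calibration and noise suppression to address problems in existing methods.
Weakly-supervised segmentation (WSS) has emerged as a way to ease the conflict between annotation cost and model performance by adopting sparse annotation formats (e.g., point, scribble, block). Typical approaches try to exploit anatomy and topology priors to expand sparse annotations directly into pseudo-labels. However, because they pay too little attention to the ambiguous edges in medical images and explore sparse supervision insufficiently, existing approaches tend to generate erroneous and overconfident pseudo proposals in noisy regions, leading to cumulative model error and performance degradation. In this work, we propose a novel WSS approach, named ProCNS, comprising two synergistic modules designed around the principles of progressive prototype calibration and noise suppression. Specifically, we design a Prototype-based Regional Spatial Affinity (PRSA) loss that maximizes the pair-wise affinities between spatial and semantic elements, providing our model of interest …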
Weakly-supervised segmentation (WSS) has emerged as a solution to mitigate the conflict between annotation cost and model performance by adopting sparse annotation formats (e.g., point, scribble, block, etc.). Typical approaches attempt to exploit anatomy and topology priors to directly expand sparse annotations into pseudo-labels. However, due to a lack of attention to the ambiguous edges in medical images and insufficient exploration of sparse supervision, existing approaches tend to generate erroneous and overconfident pseudo proposals in noisy regions, leading to cumulative model error and performance degradation. In this work, we propose a novel WSS approach, named ProCNS, encompassing two synergistic modules devised with the principles of progressive prototype calibration and noise suppression. Specifically, we design a Prototype-based Regional Spatial Affinity (PRSA) loss to maximize the pair-wise affinities between spatial and semantic elements, providing our model of interest ↩
-
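AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents. (arXiv:2401.13178v1 [cs.CL])
http://arxiv.org/abs/2401.13178
AgentBoard is a comprehensive benchmark and evaluation framework designed for the analytical evaluation of LLM agents. It addresses the challenge of benchmarking agent performance in multi-round interactions and partially observable environments, and provides a fine-grained progress rate metric and an evaluation toolkit.
Evaluating large language models (LLMs) as general-purpose agents is essential for understanding their capabilities and for integrating them into practical applications. The evaluation process, however, poses substantial challenges. One primary obstacle is benchmarking agent performance across diverse scenarios within a unified framework, especially in maintaining partially observable environments and ensuring multi-round interactions. In addition, current evaluation frameworks focus mostly on the final success rate, offer few insights into the process, and fail to give a deep understanding of model abilities. To address these challenges, we introduce AgentBoard, a pioneering comprehensive benchmark and accompanying open-source evaluation framework tailored to the analytical evaluation of LLM agents. AgentBoard offers a fine-grained progress rate metric that captures incremental advancements, together with a comprehensive evaluation toolkit that makes it easy to assess and analyze model abilities.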
Evaluating large language models (LLMs) as general-purpose agents is essential for understanding their capabilities and facilitating their integration into practical applications. However, the evaluation process presents substantial challenges. A primary obstacle is the benchmarking of agent performance across diverse scenarios within a unified framework, especially in maintaining partially-observable environments and ensuring multi-round interactions. Moreover, current evaluation frameworks mostly focus on the final success rate, revealing few insights during the process and failing to provide a deep understanding of the model abilities. To address these challenges, we introduce AgentBoard, a pioneering comprehensive benchmark and accompanied open-source evaluation framework tailored to analytical evaluation of LLM agents. AgentBoard offers a fine-grained progress rate metric that captures incremental advancements as well as a comprehensive evaluation toolkit that features easy assess ↩
-
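Perfect Alignment May be Poisonous to Graph Contrastive Learning. (arXiv:2310.03977v1 [cs.LG])
http://arxiv.org/abs/2310.03977
Examines the relationship between augmentation and downstream performance in graph contrastive learning, and finds that graph contrastive learning contributes to downstream tasks mainly by separating nodes of different classes.
Graph contrastive learning (GCL) aims to learn node representations by aligning positive pairs and separating negative ones. However, there has been little research on the principles behind the specific augmentations used in graph-based learning. What kind of augmentation helps downstream performance? How does contrastive learning actually influence downstream tasks? Why does the magnitude of augmentation matter? This paper seeks to answer these questions by establishing a connection between augmentation and downstream performance and by investigating the generalization of contrastive learning. Our findings show that GCL contributes to downstream tasks mainly by separating different classes rather than by gathering nodes of the same class. Perfect alignment and augmentation overlap, which draw all intra-class samples together, therefore cannot explain the success of contrastive learning. To understand how augmentation aids the contrastive learning process, we conduct further investigation.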
Graph Contrastive Learning (GCL) aims to learn node representations by aligning positive pairs and separating negative ones. However, limited research has been conducted on the inner law behind specific augmentations used in graph-based learning. What kind of augmentation will help downstream performance, how does contrastive learning actually influence downstream tasks, and why the magnitude of augmentation matters? This paper seeks to address these questions by establishing a connection between augmentation and downstream performance, as well as by investigating the generalization of contrastive learning. Our findings reveal that GCL contributes to downstream tasks mainly by separating different classes rather than gathering nodes of the same class. So perfect alignment and augmentation overlap which draw all intra-class samples the same can not explain the success of contrastive learning. Then in order to comprehend how augmentation aids the contrastive learning process, we conduct ↩
-
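Conditioning Score-Based Generative Models by Neuro-Symbolic Constraints. (arXiv:2308.16534v1 [cs.LG])
http://arxiv.org/abs/2308.16534
Proposes a method for conditioning score-based generative models through neuro-symbolic constraints. It enforces arbitrary logical constraints on an unconditional generative model and yields an effective conditional sampling algorithm that needs no additional training.
Score-based and diffusion models have emerged as effective approaches for both conditional and unconditional generation. Conditional generation, however, relies either on a specifically trained conditional model or on classifier guidance, which requires training a noise-dependent classifier even when a classifier for uncorrupted data is already available. We propose an approach for sampling from unconditional score-based generative models that enforces arbitrary logical constraints without any additional training. First, we show how to manipulate the learned score in order to sample from an un-normalized distribution conditional on a user-defined constraint. Then, we define a flexible and numerically stable neuro-symbolic framework for encoding soft logical constraints. Combining these two ingredients, we obtain a general but approximate conditional sampling algorithm. We further develop effective heuristics to improve the approximation. Finally, we demonstrate the effectiveness of our approach.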
Score-based and diffusion models have emerged as effective approaches for both conditional and unconditional generation. Still conditional generation is based on either a specific training of a conditional model or classifier guidance, which requires training a noise-dependent classifier, even when the classifier for uncorrupted data is given. We propose an approach to sample from unconditional score-based generative models enforcing arbitrary logical constraints, without any additional training. Firstly, we show how to manipulate the learned score in order to sample from an un-normalized distribution conditional on a user-defined constraint. Then, we define a flexible and numerically stable neuro-symbolic framework for encoding soft logical constraints. Combining these two ingredients we obtain a general, but approximate, conditional sampling algorithm. We further developed effective heuristics aimed at improving the approximation. Finally, we show the effectiveness of our approach fo ↩
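One standard way to read "manipulating the learned score" is through the product rule for log-densities. The block below is a sketch under assumptions: $p(x)$ denotes the unconditional data density, $s_\theta(x) \approx \nabla_x \log p(x)$ the learned score, and $c(x) > 0$ a hypothetical differentiable soft-constraint function. None of these symbols appear in the abstract, and the paper's actual formulation (in particular at noised time steps) may differ.

```latex
\tilde{p}(x) \propto p(x)\, c(x)
\quad\Longrightarrow\quad
\nabla_x \log \tilde{p}(x)
  = \nabla_x \log p(x) + \nabla_x \log c(x)
  \approx s_\theta(x) + \nabla_x \log c(x)
```

The normalizing constant of $\tilde{p}$ vanishes under the gradient, which is consistent with the abstract's claim that sampling from the un-normalized constrained distribution needs no additional training.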
-
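Optimizing Convolutional Neural Networks for Chronic Obstructive Pulmonary Disease Detection in Clinical Computed Tomography Imaging. (arXiv:2303.07189v2 [eess.IV] UPDATED)
http://arxiv.org/abs/2303.07189
Aims to detect chronic obstructive pulmonary disease in clinical computed tomography images with convolutional neural networks by comparing manually adjusted and automated window-setting optimization. The results indicate that automated window-setting optimization, implemented by adding a customized layer, can improve detection performance.
Purpose: To optimize the binary detection of chronic obstructive pulmonary disease (COPD) with convolutional neural networks (CNNs) on lung computed tomography (CT) images by comparing manually adjusted and automated window-setting optimization (WSO). Methods: 7,194 CT images (3,597 with COPD; 3,597 healthy controls) from 78 subjects (43 with COPD; 35 healthy controls) were selected retrospectively (October 2018 to December 2019). For each image, intensity values were manually clipped to the emphysema window setting and to a baseline "full-range" window setting. The class-balanced training, validation, and test sets contained 3,392, 1,114, and 2,688 images. The network backbone was optimized by comparing several CNN architectures, and automated WSO was implemented by adding a customized layer to the model. Performance was evaluated with the image-level area under the receiver operating characteristic curve (AUC) and with P-values.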
Purpose: To optimize the binary detection of Chronic Obstructive Pulmonary Disease (COPD) based on emphysema presence in the lung with convolutional neural networks (CNN) by exploring manually adjusted versus automated window-setting optimization (WSO) on computed tomography (CT) images. Methods: 7,194 CT images (3,597 with COPD; 3,597 healthy controls) from 78 subjects (43 with COPD; 35 healthy controls) were selected retrospectively (10.2018-12.2019) and preprocessed. For each image, intensity values were manually clipped to the emphysema window setting and a baseline 'full-range' window setting. Class-balanced train, validation, and test sets contained 3,392, 1,114, and 2,688 images. The network backbone was optimized by comparing various CNN architectures. Furthermore, automated WSO was implemented by adding a customized layer to the model. The image-level area under the Receiver Operating Characteristics curve (AUC) [lower, upper limit 95% confidence] and P-values calculated from ↩
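The manual window setting described in the Methods amounts to clipping CT intensities to a fixed range before they reach the network. The sketch below shows that clipping step on a handful of values, assuming intensities in Hounsfield units. The function name `apply_window` and the `level` and `width` values are placeholders chosen for illustration; they are not the emphysema window used in the study, which the abstract does not state.

```python
def apply_window(hu_values, level, width):
    """Clip intensities to [level - width/2, level + width/2] and rescale to [0, 1]."""
    if width <= 0:
        raise ValueError("width must be positive")
    lo = level - width / 2.0
    hi = level + width / 2.0
    # min/max performs the clipping; the division maps the window onto [0, 1].
    return [(min(max(v, lo), hi) - lo) / (hi - lo) for v in hu_values]


# Placeholder window (level and width in Hounsfield units), for illustration only.
pixels = [-2000, -1000, -800, -600, 0]
print(apply_window(pixels, level=-800, width=400))
```

Values below the window collapse to 0 and values above it to 1, so the network only sees contrast inside the chosen range. An automated approach, as the abstract describes, would learn the window parameters inside the model instead of fixing them by hand.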
-
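The Numerical Stability of Hyperbolic Representation Learning. (arXiv:2211.00181v2 [cs.LG] UPDATED)
http://arxiv.org/abs/2211.00181
Studies numerical instability in hyperbolic representation learning by comparing two popular models of hyperbolic space, the Poincaré ball and the Lorentz model. It finds that the Poincaré ball has a larger capacity for correctly representing points while the Lorentz model is superior from an optimization perspective, and it proposes a Euclidean optimization scheme as another option for hyperbolic learning.
Because the volume of a ball grows exponentially with its radius, hyperbolic space can embed trees with arbitrarily small distortion and has therefore received wide attention for representing hierarchical datasets. However, this exponential growth often causes numerical instability: training hyperbolic learning models sometimes leads to catastrophic NaN problems, where values that cannot be represented in floating-point arithmetic are encountered. In this paper, we carefully analyze the limitations of two widely used models of hyperbolic space, the Poincaré ball and the Lorentz model. We first show that, under 64-bit arithmetic, the Poincaré ball has a relatively larger capacity than the Lorentz model for correctly representing points. We then theoretically validate the superiority of the Lorentz model over the Poincaré ball from the perspective of optimization. Given the numerical limitations of both models, we identify a Euclidean optimization scheme that offers a further option for hyperbolic learning beyond the Poincaré ball and the Lorentz model.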
Given the exponential growth of the volume of the ball w.r.t. its radius, the hyperbolic space is capable of embedding trees with arbitrarily small distortion and hence has received wide attention for representing hierarchical datasets. However, this exponential growth property comes at a price of numerical instability such that training hyperbolic learning models will sometimes lead to catastrophic NaN problems, encountering unrepresentable values in floating point arithmetic. In this work, we carefully analyze the limitation of two popular models for the hyperbolic space, namely, the Poincaré ball and the Lorentz model. We first show that, under the 64 bit arithmetic system, the Poincaré ball has a relatively larger capacity than the Lorentz model for correctly representing points. Then, we theoretically validate the superiority of the Lorentz model over the Poincaré ball from the perspective of optimization. Given the numerical limitations of both models, we identify one Eucli ↩
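The representation limits discussed in this entry can be observed directly in 64-bit floating point. The sketch below is an illustration under assumptions, not the paper's analysis: it uses curvature $-1$, places a point at hyperbolic distance `d` from the origin along one axis, and checks whether each model still represents it faithfully. The function names are hypothetical.

```python
import math


def poincare_norm(d):
    """Euclidean norm of the Poincare-ball point at hyperbolic distance d from the origin."""
    return math.tanh(d / 2.0)


def lorentz_constraint(d):
    """x0^2 - x1^2 for the Lorentz-model point (cosh d, sinh d).

    In exact arithmetic this equals 1 for every d.
    """
    x0, x1 = math.cosh(d), math.sinh(d)
    return x0 * x0 - x1 * x1


for d in (1.0, 30.0, 40.0):
    inside_ball = poincare_norm(d) < 1.0                    # point must lie strictly inside the ball
    on_hyperboloid = abs(lorentz_constraint(d) - 1.0) < 1e-6
    print(d, inside_ball, on_hyperboloid)
```

At a moderate distance both representations are valid. At `d = 30` the Poincaré norm is still strictly below 1, while the Lorentz constraint has already been lost to rounding because it subtracts two very large, nearly equal squares. At `d = 40` the Poincaré norm rounds to exactly 1, so the point falls on the boundary. This ordering is consistent with the abstract's statement that the Poincaré ball has the larger capacity for correctly representing points under 64-bit arithmetic.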
-
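Level Up with RealAEs: Leveraging Domain Constraints in Feature Space to Strengthen Robustness of Android Malware Detection. (arXiv:2205.15128v3 [cs.LG] UPDATED)
http://arxiv.org/abs/2205.15128
Proposes generating RealAEs in the feature space by interpreting Android domain constraints as boundaries in that space. Experiments show that the approach improves the robustness of detection models.
To address the vulnerability of machine-learning-based Android malware detection to adversarial examples, this paper proposes a new solution: generating Realizable Adversarial Examples (RealAEs), i.e., adversarial examples that satisfy the domain constraints, in the feature space. Against realistic attacks, RealAEs are more effective than unrealizable adversarial examples. The paper also proposes a way to interpret Android domain constraints in the feature space: the method first learns features and then interprets the domain constraints as boundaries in the feature space. Experiments show that the approach improves model robustness without degrading detection performance.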
The vulnerability to adversarial examples remains one major obstacle for Machine Learning (ML)-based Android malware detection. Realistic attacks in the Android malware domain create Realizable Adversarial Examples (RealAEs), i.e., AEs that satisfy the domain constraints of Android malware. Recent studies have shown that using such RealAEs in Adversarial Training (AT) is more effective in defending against realistic attacks than using unrealizable AEs (unRealAEs). This is because RealAEs allow defenders to explore certain pockets in the feature space that are vulnerable to realistic attacks. However, existing defenses commonly generate RealAEs in the problem space, which is known to be time-consuming and impractical for AT. In this paper, we propose to generate RealAEs in the feature space, leading to a simpler and more efficient solution. Our approach is driven by a novel interpretation of Android domain constraints in the feature space. More concretely, our defense first learns featu ↩