From dfd1d8815b68ae203dc6d12167e26929a04bc145 Mon Sep 17 00:00:00 2001 From: qhduan Date: Tue, 19 Nov 2024 09:06:44 +0000 Subject: [PATCH] Add changes --- cs.AI.md | 421 +++++++++++++++++++------ cs.AI.xml | 496 ++++++++++++++++++++++++------ cs.CL.md | 206 +++++++++++-- cs.CL.xml | 256 ++++++++++++++-- cs.IR.md | 45 ++- cs.IR.xml | 62 +++- cs.LG.md | 591 +++++++++++++++++++++++++++++++---- cs.LG.xml | 746 +++++++++++++++++++++++++++++++++++++++++---- econ.md | 87 ++++-- econ.xml | 102 +++++-- latest_updated.txt | 2 +- q-fin.md | 29 +- q-fin.xml | 34 ++- stat.ML.md | 162 ++++++++-- stat.ML.xml | 202 ++++++++++-- 15 files changed, 2962 insertions(+), 479 deletions(-) diff --git a/cs.AI.md b/cs.AI.md index 5070dc7d1..2aabd20c5 100644 --- a/cs.AI.md +++ b/cs.AI.md @@ -2,217 +2,442 @@ | Ref | Title | Summary | | --- | --- | --- | -| [^1] | [Security and Privacy Challenges of Large Language Models: A Survey](https://rss.arxiv.org/abs/2402.00888) | 大型语言模型具有卓越的能力,但也面临着安全和隐私攻击的威胁。本调查全面审查了LLM的安全和隐私挑战,涵盖了训练数据、用户和应用风险等方面,并对解决方法进行了回顾。 | -| [^2] | [SHIELD: A regularization technique for eXplainable Artificial Intelligence](https://arxiv.org/abs/2404.02611) | SHIELD引入了一种正则化技术,通过隐藏部分输入数据并评估预测结果的差异,从而改善了可解释人工智能模型的质量。 | -| [^3] | [Optimization-based Prompt Injection Attack to LLM-as-a-Judge](https://arxiv.org/abs/2403.17710) | 介绍了一种基于优化的提示注入攻击方法,JudgeDeceiver,针对LLM-as-a-Judge,通过自动化生成对抗序列实现了有针对性和高效的模型评估操控。 | -| [^4] | [ThermoHands: A Benchmark for 3D Hand Pose Estimation from Egocentric Thermal Image](https://arxiv.org/abs/2403.09871) | ThermoHands提出了一个新的基准ThermoHands,旨在解决热图中主观视角3D手部姿势估计的挑战,介绍了一个具有双transformer模块的定制基线方法TheFormer,表明热成像在恶劣条件下实现稳健的3D手部姿势估计的有效性。 | -| [^5] | [CLCE: An Approach to Refining Cross-Entropy and Contrastive Learning for Optimized Learning Fusion](https://arxiv.org/abs/2402.14551) | CLCE方法结合了标签感知对比学习与交叉熵损失,通过协同利用难例挖掘提高了性能表现 | -| [^6] | [Mitigating the Linguistic Gap with Phonemic Representations for Robust Multilingual Language 
Understanding](https://arxiv.org/abs/2402.14279) | 通过使用音素表示,本文提出了一种新颖的解决方案来减缓高资源语言和低资源语言之间的性能差距,并通过实证研究和理论分析证明了其有效性。 | -| [^7] | [ConSmax: Hardware-Friendly Alternative Softmax with Learnable Parameters](https://arxiv.org/abs/2402.10930) | ConSmax是一种硬件友好型Softmax替代方案,通过引入可学习参数,在不影响性能的情况下实现了对原Softmax关键任务的高效处理。 | -| [^8] | [Large Language Model-Based Interpretable Machine Learning Control in Building Energy Systems](https://arxiv.org/abs/2402.09584) | 本文研究了机器学习控制在建筑能源系统中的可解释性,通过将Shapley值和大型语言模型相结合,提高了机器学习控制模型的透明性和理解性。 | -| [^9] | [Advancing Building Energy Modeling with Large Language Models: Exploration and Case Studies](https://arxiv.org/abs/2402.09579) | 本文研究了将大型语言模型ChatGPT与EnergyPlus建筑能源建模软件融合的创新方法,并强调了大型语言模型在解决建筑能源建模挑战方面的潜力和多种应用。 | -| [^10] | [Evaluating and Enhancing Large Language Models for Conversational Reasoning on Knowledge Graphs](https://arxiv.org/abs/2312.11282) | 该论文评估了当前最先进的大型语言模型(GPT-4)在知识图谱上的对话推理能力,提出了一种基于KG推理的LLM基准代理(LLM-ARK),该代理利用全文环境提示来实现精确和适应性强的KG路径预测,并采用近端策略优化算法进行训练。 | -| [^11] | [ShaRP: Explaining Rankings with Shapley Values.](http://arxiv.org/abs/2401.16744) | ShaRP是一个基于Shapley值的框架,用于解释排名结果中各个特征的贡献。即使使用线性评分函数,特征的权重也不一定对应其Shapley值的贡献,而是取决于特征分布和评分特征之间的局部相互作用。 | -| [^12] | [Integrating Symbolic Reasoning into Neural Generative Models for Design Generation.](http://arxiv.org/abs/2310.09383) | 这项研究将神经网络和符号推理结合起来,提出了Spatial Reasoning Integrated Generator (SPRING),用于设计生成。SPRING通过将神经网络和符号约束满足结合起来,能够生成满足用户规格和实用要求的设计。 | -| [^13] | [Fault Injection and Safe-Error Attack for Extraction of Embedded Neural Network Models.](http://arxiv.org/abs/2308.16703) | 本文介绍了故障注入和安全错误攻击用于提取嵌入式神经网络模型的方法,并阐述了对32位微控制器上的深度神经网络进行模型提取攻击的实验结果。 | -| [^14] | [A Machine with Short-Term, Episodic, and Semantic Memory Systems.](http://arxiv.org/abs/2212.02098) | 本文研究了一个具有短期、情节和语义内存系统的机器代理模型,通过基于知识图谱的建模,在强化学习环境中实现了短期记忆的管理和存储,实验证明这种人类记忆系统结构的代理比没有该结构的代理表现更好。 | +| [^1] | [Auditing Large Language Models for Enhanced Text-Based Stereotype Detection and Probing-Based Bias 
Evaluation](https://arxiv.org/abs/2404.01768) | 该研究引入了Multi-Grain Stereotype(MGS)数据集,探索了不同的机器学习方法用于建立陈规检测的基线,并提出了一系列基于MGS数据训练的英文文本的陈规分类器模型。 | +| [^2] | [SugarcaneNet2024: An Optimized Weighted Average Ensemble Approach of LASSO Regularized Pre-trained Models for Sugarcane Disease Classification](https://arxiv.org/abs/2403.18870) | SugarcaneNet2024是通过优化加权平均集成LASSO正则化的预训练模型,在甘蔗病害分类中表现出色,具有快速准确的检测能力。 | +| [^3] | [PhD: A Prompted Visual Hallucination Evaluation Dataset](https://arxiv.org/abs/2403.11116) | 本研究针对Intrinsic Vision-Language Hallucination(IVL-Hallu)问题进行了深入分析,提出了几种新颖的IVL-Hallu任务,并将其分为四种类型,有助于揭示其产生的原因和反映。 | +| [^4] | [LIGHTCODE: Light Analytical and Neural Codes for Channels with Feedback](https://arxiv.org/abs/2403.10751) | 本文提出了一种LIGHTCODE轻量级神经编码方案,在具备解释性的基础上,在低信噪比区域实现了最先进的可靠性。 | +| [^5] | [Specification Overfitting in Artificial Intelligence](https://arxiv.org/abs/2403.08425) | 本文定义了规格过度拟合问题,即系统过度关注指定指标而损害了高级要求和任务性能。 | +| [^6] | [NeuPAN: Direct Point Robot Navigation with End-to-End Model-based Learning](https://arxiv.org/abs/2403.06828) | NeuPAN 是一种实时、高度准确、无地图、适用于各种机器人且对环境不变的机器人导航解决方案,最大的创新在于将原始点直接映射到学习到的多帧距离空间,并具有端到端模型学习的可解释性,从而实现了可证明的收敛。 | +| [^7] | [Exploring the Adversarial Frontier: Quantifying Robustness via Adversarial Hypervolume](https://arxiv.org/abs/2403.05100) | 提出新指标对抗超体积来全面评估深度学习模型在多种扰动强度下的鲁棒性,并采用新型训练算法来提高对抗鲁棒性。 | +| [^8] | [ARNN: Attentive Recurrent Neural Network for Multi-channel EEG Signals to Identify Epileptic Seizures](https://arxiv.org/abs/2403.03276) | ARNN提出了一种注意力循环神经网络,用于处理多通道脑电图信号,具有线性复杂度和并行计算,结合注意力和LSTM gate的优势,并避免了它们的缺点。 | +| [^9] | [Feature Alignment: Rethinking Efficient Active Learning via Proxy in the Context of Pre-trained Models](https://arxiv.org/abs/2403.01101) | 通过代理进行特征对齐,以解决预先计算特征无法区分标记样本类别和避免通过代理模型选择样本时牺牲宝贵预训练信息的问题。 | +| [^10] | [When Your AI Deceives You: Challenges with Partial Observability of Human Evaluators in Reward Learning](https://arxiv.org/abs/2402.17747) | 
RLHF在考虑部分观察性时可能导致策略欺骗性地夸大性能或过度辩护行为,我们提出了数学条件来解决这些问题,并警告不要盲目应用RLHF在部分可观测情况下。 | +| [^11] | [Interpreting Grokked Transformers in Complex Modular Arithmetic](https://arxiv.org/abs/2402.16726) | 本研究通过可解释的逆向工程在复杂模块化算术中观察了Transformer内部电路学习过程,并发现减法在Transformer上造成了强烈的不对称性,乘法需要余弦偏置分量,多项式叠加了基本算术模式,但在挑战性情况下并不清晰,Grokking甚至可以在具有基本对称和交替表达式的高次公式中轻松发生。 | +| [^12] | [Word-Sequence Entropy: Towards Uncertainty Estimation in Free-Form Medical Question Answering Applications and Beyond](https://arxiv.org/abs/2402.14259) | 本论文提出了一种新方法单词序列熵(WSE),用于在自由形式医学问答任务中量化答案的不确定性,相比其他基线方法表现更优秀。 | +| [^13] | [Gradient descent induces alignment between weights and the empirical NTK for deep non-linear networks](https://arxiv.org/abs/2402.05271) | 了解神经网络从输入-标签对中提取统计信息的机制是监督学习中最重要的未解决问题之一。前人的研究表明,在训练过程中,权重的格拉姆矩阵与模型的平均梯度外积成正比,这被称为神经特征分析(NFA)。本研究解释了这种相关性的出现,并发现NFA等价于权重矩阵的左奇异结构与与这些权重相关的经验神经切线核的显著成分之间的对齐。在早期训练阶段,可以通过解析的方式预测NFA的发展速度。 | +| [^14] | [Improving Contextual Congruence Across Modalities for Effective Multimodal Marketing using Knowledge-infused Learning](https://arxiv.org/abs/2402.03607) | 本研究提出了一种将常识知识图谱与大型视觉语言模型相结合的框架,用于改进预测多模态营销活动效果的性能。该方法能够提供早期检测可能具有说服力的多模态活动并评估和增强营销理论的能力。 | +| [^15] | [TSIS: A Supplementary Algorithm to t-SMILES for Fragment-based Molecular Representation](https://arxiv.org/abs/2402.02164) | 本研究引入了TSIS算法作为t-SMILES的补充,用于改进基于字符串的分子表示方法。实验证明,TSIS模型在处理语法中的长期依赖性方面表现优于其他模型。 | +| [^16] | [GPT4Battery: An LLM-driven Framework for Adaptive State of Health Estimation of Raw Li-ion Batteries](https://arxiv.org/abs/2402.00068) | 本论文提出了一种基于LLM的框架,可以适应不同类型的锂离子电池,实现准确的健康状态估计。这项工作解决了生成训练数据的时间和资源成本高的挑战,并在实际应用中具有良好的泛化能力。 | +| [^17] | [Large Language Models are Null-Shot Learners](https://arxiv.org/abs/2401.08273) | 本文提出了零射击提示方法,通过利用大规模语言模型中的错误信息来指导模型进行任务,以提高任务表现。实验结果表明,在不同数据集上,包括阅读理解、算术推理和闭卷问答,模型性能有所提升。这些结果也显示出不同模型之间存在不同程度的错误信息。 | +| [^18] | [SupplyGraph: A Benchmark Dataset for Supply Chain Planning using Graph Neural Networks.](http://arxiv.org/abs/2401.15299) | 
SupplyGraph是一个基准数据集,用于使用图神经网络进行供应链规划。该数据集包含了来自孟加拉国一家领先快速消费品公司的实际数据,用于优化、预测和解决供应链问题。数据集中的时间数据作为节点特征,可用于销售预测、生产计划和故障识别。 | +| [^19] | [A Comprehensive Study of Knowledge Editing for Large Language Models.](http://arxiv.org/abs/2401.01286) | 本研究全面研究了大型语言模型的知识编辑,旨在有效修改模型的行为,同时保持整体性能。 | +| [^20] | [Navigating Privacy and Copyright Challenges Across the Data Lifecycle of Generative AI.](http://arxiv.org/abs/2311.18252) | 这项研究探讨了生成性人工智能中数据隐私和版权保护的多方面挑战,并提出了将技术创新与伦理前瞻相结合的综合方法,旨在全面解决这些问题。 | +| [^21] | [Clover: Closed-Loop Verifiable Code Generation.](http://arxiv.org/abs/2310.17807) | Clover是一种闭环可验证代码生成的范式,通过在代码、docstrings和形式注释之间进行一致性检查,确保生成的代码的正确性。 | +| [^22] | [Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Models.](http://arxiv.org/abs/2310.17086) | Transformers学会了高阶优化方法,用于上下文学习,通过实现类似于迭代牛顿法的算法,而不是梯度下降。 | +| [^23] | [Uncovering Hidden Connections: Iterative Tracking and Reasoning for Video-grounded Dialog.](http://arxiv.org/abs/2310.07259) | 本文提出了一种迭代跟踪和推理策略,结合文本编码器和视觉编码器以生成准确的响应,解决了视频对话中逐步理解对话历史和吸收视频信息的挑战。 | +| [^24] | [A Model-Agnostic Graph Neural Network for Integrating Local and Global Information.](http://arxiv.org/abs/2309.13459) | MaGNet是一种模型无关的图神经网络框架,能够顺序地整合不同顺序的信息,并通过识别有影响力的紧凑图结构提供有意义且可解释的结果。 | +| [^25] | [Matching Patients to Clinical Trials with Large Language Models.](http://arxiv.org/abs/2307.15051) | 本研究调查了使用大型语言模型(LLMs)来帮助患者和转诊医生识别合适的临床试验的潜力,并引入了TrialGPT架构,该架构能够准确预测合格性并提供解释,实验证明其有效性。 | +| [^26] | [A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning.](http://arxiv.org/abs/2307.09218) | 遗忘是深度学习中普遍存在的现象,不仅限于连续学习领域。解决遗忘问题面临多个挑战,包括平衡保留旧任务知识与快速学习新任务的挑战,管理任务干扰与冲突目标的挑战,以及防止隐私泄露等。遗忘不总是有害的,可以在某些情况下是有益且可取的,特别是在隐私保护场景中。 | +| [^27] | [Learning policies for resource allocation in business processes.](http://arxiv.org/abs/2304.09970) | 本文提出了两种基于学习的方法来进行企业流程资源分配,具有优于常见启发式方法的效果。 | +| [^28] | [Smooth Non-Stationary Bandits.](http://arxiv.org/abs/2301.12366) | 
本文提出了一种非平稳两臂赌博机问题的策略,能够处理平滑变化,并证明了该策略在二次Lipschitz连续的情况下的遗憾为 $\tilde O(T^{3/5})$。 | +| [^29] | [System Fingerprint Recognition for Deepfake Audio: An Initial Dataset and Investigation.](http://arxiv.org/abs/2208.10489) | 本文提出了深度伪造音频的系统指纹识别方法,并通过收集来自中国七个供应商的语音合成系统的数据集进行了初步研究。这项研究为进一步发展系统指纹识别方法提供了基础,并在模型版权保护和数字证据取证等实际场景中具有重要应用价值。 | # 详细 -[^1]: 大型语言模型的安全和隐私挑战:一项调查 +[^1]: 用于增强基于文本的陈规检测和基于探测的偏见评估的大规模语言模型审计 - Security and Privacy Challenges of Large Language Models: A Survey + Auditing Large Language Models for Enhanced Text-Based Stereotype Detection and Probing-Based Bias Evaluation - [https://rss.arxiv.org/abs/2402.00888](https://rss.arxiv.org/abs/2402.00888) + [https://arxiv.org/abs/2404.01768](https://arxiv.org/abs/2404.01768) - 大型语言模型具有卓越的能力,但也面临着安全和隐私攻击的威胁。本调查全面审查了LLM的安全和隐私挑战,涵盖了训练数据、用户和应用风险等方面,并对解决方法进行了回顾。 + 该研究引入了Multi-Grain Stereotype(MGS)数据集,探索了不同的机器学习方法用于建立陈规检测的基线,并提出了一系列基于MGS数据训练的英文文本的陈规分类器模型。 - 大型语言模型(LLM)展示了非凡的能力,并在生成和总结文本、语言翻译和问答等多个领域做出了贡献。如今,LLM正在成为计算机语言处理任务中非常流行的工具,具备分析复杂语言模式并根据上下文提供相关和适当回答的能力。然而,尽管具有显著优势,这些模型也容易受到安全和隐私攻击的威胁,如越狱攻击、数据污染攻击和个人可识别信息泄露攻击。本调查全面审查了LLM的安全和隐私挑战,包括训练数据和用户方面的问题,以及在交通、教育和医疗等各个领域中应用带来的风险。我们评估了LLM的脆弱性程度,调查了出现的安全和隐私攻击,并对潜在的解决方法进行了回顾。 + 大型语言模型(LLMs)的最新进展显著提高了它们在面向人类的人工智能(AI)应用中的影响力。然而,LLMs可能会复制甚至加剧自训练数据中的陈规输出。本研究介绍了Multi-Grain Stereotype(MGS)数据集,包括51,867个实例,涵盖性别、种族、职业、宗教和陈规文本,通过融合多个先前公开的陈规检测数据集收集而来。我们探索了旨在为陈规检测建立基线的不同机器学习方法,并微调了多种架构和模型大小的几个语言模型,本文展示了一系列基于MGS训练的英文文本的陈规分类器模型。为了了解我们的陈规检测器是否捕捉到与人类常识一致的相关特征,我们利用了各种可解释的AI工具, - Large Language Models (LLMs) have demonstrated extraordinary capabilities and contributed to multiple fields, such as generating and summarizing text, language translation, and question-answering. Nowadays, LLM is becoming a very popular tool in computerized language processing tasks, with the capability to analyze complicated linguistic patterns and provide relevant and appropriate responses depending on the context. 
While offering significant advantages, these models are also vulnerable to security and privacy attacks, such as jailbreaking attacks, data poisoning attacks, and Personally Identifiable Information (PII) leakage attacks. This survey provides a thorough review of the security and privacy challenges of LLMs for both training data and users, along with the application-based risks in various domains, such as transportation, education, and healthcare. We assess the extent of LLM vulnerabilities, investigate emerging security and privacy attacks for LLMs, and review the potent + arXiv:2404.01768v1 Announce Type: cross Abstract: Recent advancements in Large Language Models (LLMs) have significantly increased their presence in human-facing Artificial Intelligence (AI) applications. However, LLMs could reproduce and even exacerbate stereotypical outputs from training data. This work introduces the Multi-Grain Stereotype (MGS) dataset, encompassing 51,867 instances across gender, race, profession, religion, and stereotypical text, collected by fusing multiple previously publicly available stereotype detection datasets. We explore different machine learning approaches aimed at establishing baselines for stereotype detection, and fine-tune several language models of various architectures and model sizes, presenting in this work a series of stereotypes classifier models for English text trained on MGS. 
To understand whether our stereotype detectors capture relevant features (aligning with human common sense) we utilise a variety of explainable AI tools, including -[^2]: SHIELD: 一种用于可解释人工智能的正则化技术 +[^2]: SugarcaneNet2024: LASSO正则化的预训练模型的优化加权平均集成方法用于甘蔗病害分类 - SHIELD: A regularization technique for eXplainable Artificial Intelligence + SugarcaneNet2024: An Optimized Weighted Average Ensemble Approach of LASSO Regularized Pre-trained Models for Sugarcane Disease Classification - [https://arxiv.org/abs/2404.02611](https://arxiv.org/abs/2404.02611) + [https://arxiv.org/abs/2403.18870](https://arxiv.org/abs/2403.18870) - SHIELD引入了一种正则化技术,通过隐藏部分输入数据并评估预测结果的差异,从而改善了可解释人工智能模型的质量。 + SugarcaneNet2024是通过优化加权平均集成LASSO正则化的预训练模型,在甘蔗病害分类中表现出色,具有快速准确的检测能力。 - 随着人工智能系统在各个领域变得不可或缺,对可解释性的需求与日俱增。尽管科学界的努力主要集中在为模型获取更好的解释上,但重要的是不要忽视这个解释过程对改善训练的潜力。虽然现有的努力主要集中在为黑盒模型生成和评估解释上,但直接通过这些评估来增强模型仍存在关键差距。本文介绍了SHIELD(选择性隐藏输入评估学习动态),这是一种适用于可解释人工智能的正则化技术,旨在通过隐藏部分输入数据并评估预测结果的差异来改善模型质量。与传统方法相比,SHIELD正则化无缝集成到目标函数中,提高了模型的可解释性同时也改善了性能 + 甘蔗作为世界糖业的关键作物,容易受多种病害侵害,这些病害对其产量和质量都有重大负面影响。为了有效管理和实施预防措施,必须及时准确地检测病害。本研究提出了一种名为SugarcaneNet2024的独特模型,通过叶片图像处理,能够优于先前方法自动快速检测甘蔗病害。我们提出的模型汇总了七个定制的、经过LASSO正则化的预训练模型的优化加权平均集成,特别是InceptionV3、InceptionResNetV2、DenseNet201、DenseNet169、Xception和ResNet152V2。最初,我们在这些预训练模型底部另外添加了三个全连接(dense)层,具有0.0001的LASSO正则化,三个30%的dropout层和三个启用renorm的批量归一化,以提高性能。 - arXiv:2404.02611v1 Announce Type: new Abstract: As Artificial Intelligence systems become integral across domains, the demand for explainability grows. While the effort by the scientific community is focused on obtaining a better explanation for the model, it is important not to ignore the potential of this explanation process to improve training as well. While existing efforts primarily focus on generating and evaluating explanations for black-box models, there remains a critical gap in directly enhancing models through these evaluations. 
This paper introduces SHIELD (Selective Hidden Input Evaluation for Learning Dynamics), a regularization technique for explainable artificial intelligence designed to improve model quality by concealing portions of input data and assessing the resulting discrepancy in predictions. In contrast to conventional approaches, SHIELD regularization seamlessly integrates into the objective function, enhancing model explainability while also improving perfor + arXiv:2403.18870v1 Announce Type: cross Abstract: Sugarcane, a key crop for the world's sugar industry, is prone to several diseases that have a substantial negative influence on both its yield and quality. To effectively manage and implement preventative initiatives, diseases must be detected promptly and accurately. In this study, we present a unique model called sugarcaneNet2024 that outperforms previous methods for automatically and quickly detecting sugarcane disease through leaf image processing. Our proposed model consolidates an optimized weighted average ensemble of seven customized and LASSO-regularized pre-trained models, particularly InceptionV3, InceptionResNetV2, DenseNet201, DenseNet169, Xception, and ResNet152V2. Initially, we added three more dense layers with 0.0001 LASSO regularization, three 30% dropout layers, and three batch normalizations with renorm enabled at the bottom of these pre-trained models to improve the performance. 
The accuracy of sugarcane leaf dise -[^3]: 基于优化的对LLM评判系统的提示注入攻击 +[^3]: PhD:一个提示式视觉幻觉评估数据集 - Optimization-based Prompt Injection Attack to LLM-as-a-Judge + PhD: A Prompted Visual Hallucination Evaluation Dataset - [https://arxiv.org/abs/2403.17710](https://arxiv.org/abs/2403.17710) + [https://arxiv.org/abs/2403.11116](https://arxiv.org/abs/2403.11116) - 介绍了一种基于优化的提示注入攻击方法,JudgeDeceiver,针对LLM-as-a-Judge,通过自动化生成对抗序列实现了有针对性和高效的模型评估操控。 + 本研究针对Intrinsic Vision-Language Hallucination(IVL-Hallu)问题,从成因和表现两方面进行了深入分析,提出了几种新颖的IVL-Hallu任务,并将其分为四种类型。 - LLM-as-a-Judge 是一种可以使用大型语言模型(LLMs)评估文本信息的新颖解决方案。根据现有研究,LLMs在提供传统人类评估的引人注目替代方面表现出色。然而,这些系统针对提示注入攻击的鲁棒性仍然是一个未解决的问题。在这项工作中,我们引入了JudgeDeceiver,一种针对LLM-as-a-Judge量身定制的基于优化的提示注入攻击。我们的方法制定了一个精确的优化目标,用于攻击LLM-as-a-Judge的决策过程,并利用优化算法高效地自动化生成对抗序列,实现对模型评估的有针对性和有效的操作。与手工制作的提示注入攻击相比,我们的方法表现出卓越的功效,给基于LLM的判断系统当前的安全范式带来了重大挑战。 + 大型语言模型(LLMs)的快速增长推动了大型视觉语言模型(LVLMs)的发展。在LLMs中普遍存在的幻觉挑战也出现在LVLMs中。然而,大部分现有研究主要集中在LVLM中的对象幻觉上,忽略了LVLM幻觉的多样化类型。本研究深入探讨了固有视觉语言幻觉(IVL-Hallu)问题,从成因和表现两方面对不同类型的IVL-Hallu进行了彻底分析。具体来说,我们提出了几个新颖的IVL-Hallu任务,并将它们分为四种类型:(a)对象幻觉,由于对象的误识别而产生,(b)属性幻觉,由于属性的误识别而引起,(c)多模态冲突幻觉,源自文本和视觉信息之间的矛盾,以及(d)反常识幻觉,由于对立之间的矛盾。 - arXiv:2403.17710v1 Announce Type: cross Abstract: LLM-as-a-Judge is a novel solution that can assess textual information with large language models (LLMs). Based on existing research studies, LLMs demonstrate remarkable performance in providing a compelling alternative to traditional human assessment. However, the robustness of these systems against prompt injection attacks remains an open question. In this work, we introduce JudgeDeceiver, a novel optimization-based prompt injection attack tailored to LLM-as-a-Judge. Our method formulates a precise optimization objective for attacking the decision-making process of LLM-as-a-Judge and utilizes an optimization algorithm to efficiently automate the generation of adversarial sequences, achieving targeted and effective manipulation of model evaluations. 
Compared to handcraft prompt injection attacks, our method demonstrates superior efficacy, posing a significant challenge to the current security paradigms of LLM-based judgment systems. T + arXiv:2403.11116v1 Announce Type: cross Abstract: The rapid growth of Large Language Models (LLMs) has driven the development of Large Vision-Language Models (LVLMs). The challenge of hallucination, prevalent in LLMs, also emerges in LVLMs. However, most existing efforts mainly focus on object hallucination in LVLM, ignoring diverse types of LVLM hallucinations. In this study, we delve into the Intrinsic Vision-Language Hallucination (IVL-Hallu) issue, thoroughly analyzing different types of IVL-Hallu on their causes and reflections. Specifically, we propose several novel IVL-Hallu tasks and categorize them into four types: (a) object hallucination, which arises from the misidentification of objects, (b) attribute hallucination, which is caused by the misidentification of attributes, (c) multi-modal conflicting hallucination, which derives from the contradictions between textual and visual information, and (d) counter-common-sense hallucination, which owes to the contradictions betwee -[^4]: ThermoHands:一种用于从主观视角热图中估计3D手部姿势的基准 +[^4]: LIGHTCODE:具有反馈通道的光解析和神经编码 - ThermoHands: A Benchmark for 3D Hand Pose Estimation from Egocentric Thermal Image + LIGHTCODE: Light Analytical and Neural Codes for Channels with Feedback - [https://arxiv.org/abs/2403.09871](https://arxiv.org/abs/2403.09871) + [https://arxiv.org/abs/2403.10751](https://arxiv.org/abs/2403.10751) - ThermoHands提出了一个新的基准ThermoHands,旨在解决热图中主观视角3D手部姿势估计的挑战,介绍了一个具有双transformer模块的定制基线方法TheFormer,表明热成像在恶劣条件下实现稳健的3D手部姿势估计的有效性。 + 本文提出了一种LIGHTCODE轻量级神经编码方案,在具备解释性的基础上,在低信噪比区域实现了最先进的可靠性。 - 
在这项工作中,我们提出了ThermoHands,这是一个针对基于热图的主观视角3D手部姿势估计的新基准,旨在克服诸如光照变化和遮挡(例如手部穿戴物)等挑战。该基准包括来自28名主体进行手-物体和手-虚拟交互的多样数据集,经过自动化过程准确标注了3D手部姿势。我们引入了一个定制的基线方法TheFormer,利用双transformer模块在热图中实现有效的主观视角3D手部姿势估计。我们的实验结果突显了TheFormer的领先性能,并确认了热成像在实现恶劣条件下稳健的3D手部姿势估计方面的有效性。 + 为带反馈的信道设计可靠且高效的编码方案一直是通信理论中一项长期挑战。虽然借助深度学习技术已取得显著进展,神经编码往往面临计算成本高、缺乏可解释性以及在资源受限环境中的实用性有限等问题。本文旨在设计解释性强且更适用于通信系统的低复杂度编码方案。我们同时推进了解析编码和神经编码。首先,我们展示了POWERBLAST,一种受Schalkwijk-Kailath(SK)和Gallager-Nakiboglu(GN)方案启发的解析编码方案,相比SK和GN方案实现了明显的可靠性改进,并在高信噪比(SNR)区域胜过神经编码。接下来,为了增强低SNR区域的可靠性,我们提出了LIGHTCODE,一种轻量级神经编码,实现了最先进的可靠性。 - arXiv:2403.09871v1 Announce Type: cross Abstract: In this work, we present ThermoHands, a new benchmark for thermal image-based egocentric 3D hand pose estimation, aimed at overcoming challenges like varying lighting and obstructions (e.g., handwear). The benchmark includes a diverse dataset from 28 subjects performing hand-object and hand-virtual interactions, accurately annotated with 3D hand poses through an automated process. We introduce a bespoken baseline method, TheFormer, utilizing dual transformer modules for effective egocentric 3D hand pose estimation in thermal imagery. Our experimental results highlight TheFormer's leading performance and affirm thermal imaging's effectiveness in enabling robust 3D hand pose estimation in adverse conditions. + arXiv:2403.10751v1 Announce Type: cross Abstract: The design of reliable and efficient codes for channels with feedback remains a longstanding challenge in communication theory. While significant improvements have been achieved by leveraging deep learning techniques, neural codes often suffer from high computational costs, a lack of interpretability, and limited practicality in resource-constrained settings. We focus on designing low-complexity coding schemes that are interpretable and more suitable for communication systems. We advance both analytical and neural codes. 
First, we demonstrate that POWERBLAST, an analytical coding scheme inspired by Schalkwijk-Kailath (SK) and Gallager-Nakiboglu (GN) schemes, achieves notable reliability improvements over both SK and GN schemes, outperforming neural codes in high signal-to-noise ratio (SNR) regions. Next, to enhance reliability in low-SNR regions, we propose LIGHTCODE, a lightweight neural code that achieves state-of-the-art reliability -[^5]: CLCE:一种优化学习融合的改进交叉熵和对比学习方法 +[^5]: 人工智能规格过度拟合问题 - CLCE: An Approach to Refining Cross-Entropy and Contrastive Learning for Optimized Learning Fusion + Specification Overfitting in Artificial Intelligence - [https://arxiv.org/abs/2402.14551](https://arxiv.org/abs/2402.14551) + [https://arxiv.org/abs/2403.08425](https://arxiv.org/abs/2403.08425) - CLCE方法结合了标签感知对比学习与交叉熵损失,通过协同利用难例挖掘提高了性能表现 + 本文定义了规格过度拟合问题,即系统过度关注指定指标而损害了高级要求和任务性能。 - 最先进的预训练图像模型主要采用两阶段方法:在大规模数据集上进行初始无监督预训练,然后使用交叉熵损失(CE)进行特定任务的微调。然而,已经证明CE可能会损害模型的泛化性和稳定性。为了解决这些问题,我们引入了一种名为CLCE的新方法,该方法将标签感知对比学习与CE相结合。我们的方法不仅保持了两种损失函数的优势,而且以协同方式利用难例挖掘来增强性能。 + 机器学习(ML)和人工智能(AI)方法经常被批评存在固有的偏见,以及缺乏控制、问责和透明度,监管机构因此难以控制这种技术的潜在负面影响。高级要求,如公平性和鲁棒性,需要被形式化为具体的规格度量,而这些度量是捕捉基本要求的独立方面的不完美代理。鉴于不同指标之间可能存在的权衡及其对过度优化的脆弱性,将规格度量整合到系统开发过程中并不是一件简单的事情。本文定义了规格过度拟合,即系统过度侧重于指定的度量,从而损害了高级要求和任务性能。我们进行了大量文献调研,对研究人员如何提出、测量和优化规格进行了分类。 - arXiv:2402.14551v1 Announce Type: cross Abstract: State-of-the-art pre-trained image models predominantly adopt a two-stage approach: initial unsupervised pre-training on large-scale datasets followed by task-specific fine-tuning using Cross-Entropy loss~(CE). However, it has been demonstrated that CE can compromise model generalization and stability. While recent works employing contrastive learning address some of these limitations by enhancing the quality of embeddings and producing better decision boundaries, they often overlook the importance of hard negative mining and rely on resource intensive and slow training using large sample batches. 
To counter these issues, we introduce a novel approach named CLCE, which integrates Label-Aware Contrastive Learning with CE. Our approach not only maintains the strengths of both loss functions but also leverages hard negative mining in a synergistic way to enhance performance. Experimental results demonstrate that CLCE significantly outperf + arXiv:2403.08425v1 Announce Type: new Abstract: Machine learning (ML) and artificial intelligence (AI) approaches are often criticized for their inherent bias and for their lack of control, accountability, and transparency. Consequently, regulatory bodies struggle with containing this technology's potential negative side effects. High-level requirements such as fairness and robustness need to be formalized into concrete specification metrics, imperfect proxies that capture isolated aspects of the underlying requirements. Given possible trade-offs between different metrics and their vulnerability to over-optimization, integrating specification metrics in system development processes is not trivial. This paper defines specification overfitting, a scenario where systems focus excessively on specified metrics to the detriment of high-level requirements and task performance. 
We present an extensive literature survey to categorize how researchers propose, measure, and optimize specification -[^6]: 使用音素表示减缓语言差异,实现稳健的多语言理解 +[^6]: NeuPAN:直接点机器人导航的端到端基于模型学习 - Mitigating the Linguistic Gap with Phonemic Representations for Robust Multilingual Language Understanding + NeuPAN: Direct Point Robot Navigation with End-to-End Model-based Learning - [https://arxiv.org/abs/2402.14279](https://arxiv.org/abs/2402.14279) + [https://arxiv.org/abs/2403.06828](https://arxiv.org/abs/2403.06828) - 通过使用音素表示,本文提出了一种新颖的解决方案来减缓高资源语言和低资源语言之间的性能差距,并通过实证研究和理论分析证明了其有效性。 + NeuPAN 是一种实时、高度准确、无地图、适用于各种机器人且对环境不变的机器人导航解决方案,最大的创新在于将原始点直接映射到学习到的多帧距离空间,并具有端到端模型学习的可解释性,从而实现了可证明的收敛。 - 为了改善多语言理解,通常需要在训练阶段使用多种语言,依赖复杂的训练技术,并且在高资源语言和低资源语言之间存在显著的性能差距。我们假设语言之间的性能差距受到这些语言之间的语言差异的影响,并通过使用音素表示(具体来说,将音素作为输入标记输入到语言模型中,而不是子词)提供了一种新颖的解决方案,以实现稳健的多语言建模。我们通过三个跨语言任务的定量证据展示了音素表示的有效性,这进一步得到了对跨语言性能差距的理论分析的证明。 + 在拥挤环境中对非全向机器人进行导航需要极其精确的感知和运动以避免碰撞。本文提出NeuPAN:一种实时、高度准确、无地图、适用于各种机器人,且对环境不变的机器人导航解决方案。NeuPAN采用紧耦合的感知-运动框架,与现有方法相比有两个关键创新:1)它直接将原始点映射到学习到的多帧距离空间,避免了从感知到控制的误差传播;2)从端到端基于模型学习的角度进行解释,实现了可证明的收敛。NeuPAN的关键在于利用插拔式(PnP)交替最小化传感器(PAN)网络解高维端到端数学模型,其中包含各种点级约束,使NeuPAN能够直接生成实时、端到端、物理可解释的运动。 - arXiv:2402.14279v1 Announce Type: cross Abstract: Approaches to improving multilingual language understanding often require multiple languages during the training phase, rely on complicated training techniques, and -- importantly -- struggle with significant performance gaps between high-resource and low-resource languages. We hypothesize that the performance gaps between languages are affected by linguistic gaps between those languages and provide a novel solution for robust multilingual language modeling by employing phonemic representations (specifically, using phonemes as input tokens to LMs rather than subwords). 
We present quantitative evidence from three cross-lingual tasks that demonstrate the effectiveness of phonemic representation, which is further justified by a theoretical analysis of the cross-lingual performance gap. + arXiv:2403.06828v1 Announce Type: cross Abstract: Navigating a nonholonomic robot in a cluttered environment requires extremely accurate perception and locomotion for collision avoidance. This paper presents NeuPAN: a real-time, highly-accurate, map-free, robot-agnostic, and environment-invariant robot navigation solution. Leveraging a tightly-coupled perception-locomotion framework, NeuPAN has two key innovations compared to existing approaches: 1) it directly maps raw points to a learned multi-frame distance space, avoiding error propagation from perception to control; 2) it is interpretable from an end-to-end model-based learning perspective, enabling provable convergence. The crux of NeuPAN is to solve a high-dimensional end-to-end mathematical model with various point-level constraints using the plug-and-play (PnP) proximal alternating-minimization network (PAN) with neurons in the loop. 
This allows NeuPAN to generate real-time, end-to-end, physically-interpretable motions direct -[^7]: ConSmax: 具有可学习参数的硬件友好型Softmax替代方案 +[^7]: 探索对抗界限:通过对抗超体积量化鲁棒性 - ConSmax: Hardware-Friendly Alternative Softmax with Learnable Parameters + Exploring the Adversarial Frontier: Quantifying Robustness via Adversarial Hypervolume - [https://arxiv.org/abs/2402.10930](https://arxiv.org/abs/2402.10930) + [https://arxiv.org/abs/2403.05100](https://arxiv.org/abs/2403.05100) - ConSmax是一种硬件友好型Softmax替代方案,通过引入可学习参数,在不影响性能的情况下实现了对原Softmax关键任务的高效处理。 + 提出新指标对抗超体积来全面评估深度学习模型在多种扰动强度下的鲁棒性,并采用新型训练算法来提高对抗鲁棒性。 - 自注意机制将基于transformer的大型语言模型(LLM)与卷积和循环神经网络区分开来。尽管性能有所提升,但由于自注意中广泛使用Softmax,在硅上实现实时LLM推断仍具挑战性。为了解决这一挑战,我们提出了Constant Softmax(ConSmax),这是一种高效的Softmax替代方案,采用可微的规范化参数来消除Softmax中的最大搜索和分母求和,实现了大规模并行化。 + 在深度学习模型面临日益严重的对抗攻击威胁,特别是在安全关键领域,强调了对鲁棒深度学习系统的需求。传统的鲁棒性评估依赖于对抗准确性,该指标衡量模型在特定扰动强度下的性能。然而,这一单一指标并不能完全概括模型对不同程度扰动的整体韧性。为了填补这一空白,我们提出了一种新的指标,称为对抗超体积,从多目标优化的角度综合评估了深度学习模型在一系列扰动强度下的鲁棒性。该指标允许深入比较防御机制,并承认了较弱的防御策略所带来的鲁棒性改进。此外,我们采用了一种提高对抗鲁棒性均匀性的新型训练算法。 - arXiv:2402.10930v1 Announce Type: cross Abstract: The self-attention mechanism sets transformer-based large language model (LLM) apart from the convolutional and recurrent neural networks. Despite the performance improvement, achieving real-time LLM inference on silicon is challenging due to the extensively used Softmax in self-attention. Apart from the non-linearity, the low arithmetic intensity greatly reduces the processing parallelism, which becomes the bottleneck especially when dealing with a longer context. To address this challenge, we propose Constant Softmax (ConSmax), a software-hardware co-design as an efficient Softmax alternative. ConSmax employs differentiable normalization parameters to remove the maximum searching and denominator summation in Softmax. It allows for massive parallelization while performing the critical tasks of Softmax. 
In addition, a scalable ConSmax hardware utilizing a bitwidth-split look-up table (LUT) can produce lossless non-linear operation and + arXiv:2403.05100v1 Announce Type: cross Abstract: The escalating threat of adversarial attacks on deep learning models, particularly in security-critical fields, has underscored the need for robust deep learning systems. Conventional robustness evaluations have relied on adversarial accuracy, which measures a model's performance under a specific perturbation intensity. However, this singular metric does not fully encapsulate the overall resilience of a model against varying degrees of perturbation. To address this gap, we propose a new metric termed adversarial hypervolume, assessing the robustness of deep learning models comprehensively over a range of perturbation intensities from a multi-objective optimization standpoint. This metric allows for an in-depth comparison of defense mechanisms and recognizes the trivial improvements in robustness afforded by less potent defensive strategies. 
Additionally, we adopt a novel training algorithm that enhances adversarial robustness uniformly -[^8]: 基于大型语言模型的建筑能源系统机器学习控制的可解释性研究 +[^8]: ARNN: 用于识别癫痫发作的多通道脑电图信号的注意力循环神经网络 - Large Language Model-Based Interpretable Machine Learning Control in Building Energy Systems + ARNN: Attentive Recurrent Neural Network for Multi-channel EEG Signals to Identify Epileptic Seizures - [https://arxiv.org/abs/2402.09584](https://arxiv.org/abs/2402.09584) + [https://arxiv.org/abs/2403.03276](https://arxiv.org/abs/2403.03276) - 本文研究了机器学习控制在建筑能源系统中的可解释性,通过将Shapley值和大型语言模型相结合,提高了机器学习控制模型的透明性和理解性。 + ARNN提出了一种注意力循环神经网络,用于处理多通道脑电图信号,具有线性复杂度和并行计算,结合注意力和LSTM gate的优势,并避免了它们的缺点。 - 机器学习控制在暖通空调系统中的潜力受限于其不透明的性质和推理机制,这对于用户和建模者来说是具有挑战性的,难以完全理解,最终导致对基于机器学习控制的决策缺乏信任。为了解决这个挑战,本文研究和探索了可解释机器学习(IML),它是机器学习的一个分支,可以增强模型和推理的透明性和理解性,以提高MLC及其在暖通空调系统中的工业应用的可信度。具体而言,我们开发了一个创新性的框架,将Shapley值的原则和大型语言模型(LLMs)的上下文学习特性相结合。而Shapley值在解剖ML模型中各种特征的贡献方面起到了重要作用,LLM则可以深入理解MLC中基于规则的部分;将它们结合起来,LLM进一步将这些洞见打包到一个 + 我们提出了一种注意力循环神经网络(ARNN),其沿着序列循环应用注意力层,并且具有与序列长度相关的线性复杂度。该模型在多通道脑电图信号上运行,而不是单通道信号,并利用并行计算。在该模型中,注意力层是一种计算单元,可以有效地应用自注意力机制和交叉注意力机制来计算一组广泛数量的状态向量和输入信号的递归函数。我们的架构在某种程度上受到了注意力层和长短期记忆(LSTM)单元的启发,并使用长短风格门,但通过多个阶段将这种典型单元扩展到多通道脑电图信号的并行化。它继承了注意力层和LSTM门的优势,同时避免了它们各自的缺点。我们通过对异质实验进行了广泛的模型有效性评估。 - arXiv:2402.09584v1 Announce Type: new Abstract: The potential of Machine Learning Control (MLC) in HVAC systems is hindered by its opaque nature and inference mechanisms, which is challenging for users and modelers to fully comprehend, ultimately leading to a lack of trust in MLC-based decision-making. To address this challenge, this paper investigates and explores Interpretable Machine Learning (IML), a branch of Machine Learning (ML) that enhances transparency and understanding of models and their inferences, to improve the credibility of MLC and its industrial application in HVAC systems. 
Specifically, we developed an innovative framework that combines the principles of Shapley values and the in-context learning feature of Large Language Models (LLMs). While the Shapley values are instrumental in dissecting the contributions of various features in ML models, LLM provides an in-depth understanding of rule-based parts in MLC; combining them, LLM further packages these insights into a + arXiv:2403.03276v1 Announce Type: cross Abstract: We proposed an Attentive Recurrent Neural Network (ARNN), which recurrently applies attention layers along a sequence and has linear complexity with respect to the sequence length. The proposed model operates on multi-channel EEG signals rather than single channel signals and leverages parallel computation. In this cell, the attention layer is a computational unit that efficiently applies self-attention and cross-attention mechanisms to compute a recurrent function over a wide number of state vectors and input signals. Our architecture is inspired in part by the attention layer and long short-term memory (LSTM) cells, and it uses long-short style gates, but it scales this typical cell up by several orders to parallelize for multi-channel EEG signals. It inherits the advantages of attention layers and LSTM gate while avoiding their respective drawbacks. 
We evaluated the model effectiveness through extensive experiments with heterogeneou -[^9]: 用大型语言模型推动建筑能源建模:探索和案例研究 +[^9]: 特征对齐:在预训练模型背景下通过代理思考高效主动学习 - Advancing Building Energy Modeling with Large Language Models: Exploration and Case Studies + Feature Alignment: Rethinking Efficient Active Learning via Proxy in the Context of Pre-trained Models - [https://arxiv.org/abs/2402.09579](https://arxiv.org/abs/2402.09579) + [https://arxiv.org/abs/2403.01101](https://arxiv.org/abs/2403.01101) - 本文研究了将大型语言模型ChatGPT与EnergyPlus建筑能源建模软件融合的创新方法,并强调了大型语言模型在解决建筑能源建模挑战方面的潜力和多种应用。 + 通过代理进行特征对齐,以解决预先计算特征无法区分标记样本类别和避免通过代理模型选择样本时牺牲宝贵预训练信息的问题。 - 人工智能的快速发展促进了像ChatGPT这样的大型语言模型的出现,为专门的工程建模(尤其是基于物理的建筑能源建模)提供了潜在的应用。本文研究了将大型语言模型与建筑能源建模软件(具体为EnergyPlus)融合的创新方法。首先进行了文献综述,揭示了在工程建模中整合大型语言模型的增长趋势,但在建筑能源建模中的应用研究仍然有限。我们强调了大型语言模型在解决建筑能源建模挑战方面的潜力,并概述了潜在的应用,包括:1)模拟输入生成,2)模拟输出分析和可视化,3)进行错误分析,4)共模拟,5)模拟知识提取。 + 使用主动学习对预训练模型进行微调有望降低注释成本。然而,这种组合引入了显著的计算成本,尤其是随着预训练模型规模的增长。最近的研究提出了基于代理的主动学习,它预先计算特征以减少计算成本。然而,这种方法通常会在主动学习性能上造成重大损失,甚至可能超过计算成本节约。 - arXiv:2402.09579v1 Announce Type: cross Abstract: The rapid progression in artificial intelligence has facilitated the emergence of large language models like ChatGPT, offering potential applications extending into specialized engineering modeling, especially physics-based building energy modeling. This paper investigates the innovative integration of large language models with building energy modeling software, focusing specifically on the fusion of ChatGPT with EnergyPlus. A literature review is first conducted to reveal a growing trend of incorporating of large language models in engineering modeling, albeit limited research on their application in building energy modeling. 
We underscore the potential of large language models in addressing building energy modeling challenges and outline potential applications including 1) simulation input generation, 2) simulation output analysis and visualization, 3) conducting error analysis, 4) co-simulation, 5) simulation knowledge extraction a + arXiv:2403.01101v1 Announce Type: cross Abstract: Fine-tuning the pre-trained model with active learning holds promise for reducing annotation costs. However, this combination introduces significant computational costs, particularly with the growing scale of pre-trained models. Recent research has proposed proxy-based active learning, which pre-computes features to reduce computational costs. Yet, this approach often incurs a significant loss in active learning performance, which may even outweigh the computational cost savings. In this paper, we argue the performance drop stems not only from pre-computed features' inability to distinguish between categories of labeled samples, resulting in the selection of redundant samples but also from the tendency to compromise valuable pre-trained information when fine-tuning with samples selected through the proxy model. 
To address this issue, we propose a novel method called aligned selection via proxy to update pre-computed features while sele -[^10]: 评估和增强用于知识图谱上的对话推理的大型语言模型 +[^10]: 当你的AI欺骗你:在奖励学习中人类评估者部分可观测性的挑战 - Evaluating and Enhancing Large Language Models for Conversational Reasoning on Knowledge Graphs + When Your AI Deceives You: Challenges with Partial Observability of Human Evaluators in Reward Learning - [https://arxiv.org/abs/2312.11282](https://arxiv.org/abs/2312.11282) + [https://arxiv.org/abs/2402.17747](https://arxiv.org/abs/2402.17747) - 该论文评估了当前最先进的大型语言模型(GPT-4)在知识图谱上的对话推理能力,提出了一种基于KG推理的LLM基准代理(LLM-ARK),该代理利用全文环境提示来实现精确和适应性强的KG路径预测,并采用近端策略优化算法进行训练。 + RLHF在考虑部分观察性时可能导致策略欺骗性地夸大性能或过度辩护行为,我们提出了数学条件来解决这些问题,并警告不要盲目应用RLHF在部分可观测情况下。 - 大型语言模型(LLM)的发展得益于预训练技术的进展。通过手动设计的提示,这些模型展示了强大的推理能力。在这项工作中,我们评估了当前最先进的LLM(GPT-4)在知识图谱(KG)上的对话推理能力。然而,由于缺乏KG环境意识和开发有效的中间推理阶段优化机制的困难,LLM的性能受到限制。我们进一步引入了LLM-ARK,一个基于KG推理的LLM基准代理,旨在提供精确和适应性强的KG路径预测。LLM-ARK利用全文环境(FTE)提示来吸收每个推理步骤中的状态信息。我们将KG上的多跳推理挑战重新框定为顺序决策任务。利用近端策略优化(PPO)在线策略梯度强化学习算法,我们的模型... + 强化学习从人类反馈(RLHF)的过去分析假设人类完全观察到环境。当人类反馈仅基于部分观察时会发生什么?我们对两种失败情况进行了正式定义:欺骗和过度辩护。通过将人类建模为对轨迹信念的Boltzmann-理性,我们证明了RLHF保证会导致策略欺骗性地夸大其性能、为了留下印象而过度辩护或者两者兼而有之的条件。为了帮助解决这些问题,我们数学地刻画了环境部分可观测性如何转化为(缺乏)学到的回报函数中的模糊性。在某些情况下,考虑环境部分可观测性使得在理论上可能恢复回报函数和最优策略,而在其他情况下,存在不可减少的模糊性。我们警告不要盲目应用RLHF在部分可观测情况下。 - The development of large language models (LLMs) has been catalyzed by advancements in pre-training techniques. These models have demonstrated robust reasoning capabilities through manually designed prompts. In this work, we evaluate the conversational reasoning capabilities of the current state-of-the-art LLM (GPT-4) on knowledge graphs (KGs). However, the performance of LLMs is constrained due to a lack of KG environment awareness and the difficulties in developing effective optimization mechanisms for intermediary reasoning stages. We further introduce LLM-ARK, a LLM grounded KG reasoning agent designed to deliver precise and adaptable predictions on KG paths. 
LLM-ARK leverages Full Textual Environment (FTE) prompt to assimilate state information within each reasoning step. We reframe the challenge of multi-hop reasoning on the KG as a sequential decision-making task. Utilizing the Proximal Policy Optimization (PPO) online policy gradient reinforcement learning algorithm, our model i + arXiv:2402.17747v1 Announce Type: cross Abstract: Past analyses of reinforcement learning from human feedback (RLHF) assume that the human fully observes the environment. What happens when human feedback is based only on partial observations? We formally define two failure cases: deception and overjustification. Modeling the human as Boltzmann-rational w.r.t. a belief over trajectories, we prove conditions under which RLHF is guaranteed to result in policies that deceptively inflate their performance, overjustify their behavior to make an impression, or both. To help address these issues, we mathematically characterize how partial observability of the environment translates into (lack of) ambiguity in the learned return function. In some cases, accounting for partial observability makes it theoretically possible to recover the return function and thus the optimal policy, while in other cases, there is irreducible ambiguity. We caution against blindly applying RLHF in partially observa -[^11]: ShaRP:用Shapley值解释排名 +[^11]: 在复杂模块化算术中解释理解的Transformer - ShaRP: Explaining Rankings with Shapley Values. 
(arXiv:2401.16744v1 [cs.AI]) + Interpreting Grokked Transformers in Complex Modular Arithmetic - [http://arxiv.org/abs/2401.16744](http://arxiv.org/abs/2401.16744) + [https://arxiv.org/abs/2402.16726](https://arxiv.org/abs/2402.16726) - ShaRP是一个基于Shapley值的框架,用于解释排名结果中各个特征的贡献。即使使用线性评分函数,特征的权重也不一定对应其Shapley值的贡献,而是取决于特征分布和评分特征之间的局部相互作用。 + 本研究通过可解释的逆向工程在复杂模块化算术中观察了Transformer内部电路学习过程,并发现减法在Transformer上造成了强烈的不对称性,乘法需要余弦偏置分量,多项式叠加了基本算术模式,但在挑战性情况下并不清晰,Grokking甚至可以在具有基本对称和交替表达式的高次公式中轻松发生。 - 在招聘、大学招生和贷款等重要领域的算法决策常常是基于排名的。由于这些决策对个人、组织和人群的影响,有必要了解它们:了解决策是否遵守法律,帮助个人提高他们的排名,并设计更好的排名程序。本文提出了ShaRP(Shapley for Rankings and Preferences),这是一个基于Shapley值的框架,用于解释特征对排名结果不同方面的贡献。使用ShaRP,我们展示了即使算法排名器使用的评分函数是已知的且是线性的,每个特征的权重也不一定对应其Shapley值的贡献。贡献取决于特征的分布以及评分特征之间微妙的局部相互作用。ShaRP基于量化输入影响框架,并可以计算贡献。 + Grokking一直是解开延迟泛化之谜的积极探索。在已解密模型中识别可解释的算法是理解其机制的暗示性线索。在这项工作中,除了最简单和广为研究的模块化加法外,我们通过可解释的逆向工程观察了通过Grokking在复杂模块化算术中学到的内部电路,突出显示了它们动力学上的重大差异:减法对Transformer产生强烈的不对称性;乘法在傅立叶域的所有频率上需要余弦偏置分量;多项式通常导致基本算术模式的叠加,但在挑战性情况下清晰的模式并不显现;即使在具有基本对称和交替表达式的高次公式中,Grokking也很容易发生。我们还引入了模块化算术的新颖进展度量;傅立叶频率 - Algorithmic decisions in critical domains such as hiring, college admissions, and lending are often based on rankings. Because of the impact these decisions have on individuals, organizations, and population groups, there is a need to understand them: to know whether the decisions are abiding by the law, to help individuals improve their rankings, and to design better ranking procedures. In this paper, we present ShaRP (Shapley for Rankings and Preferences), a framework that explains the contributions of features to different aspects of a ranked outcome, and is based on Shapley values. Using ShaRP, we show that even when the scoring function used by an algorithmic ranker is known and linear, the weight of each feature does not correspond to its Shapley value contribution. The contributions instead depend on the feature distributions, and on the subtle local interactions between the scoring features. 
ShaRP builds on the Quantitative Input Influence framework, and can compute the contri + arXiv:2402.16726v2 Announce Type: replace-cross Abstract: Grokking has been actively explored to reveal the mystery of delayed generalization. Identifying interpretable algorithms inside the grokked models is a suggestive hint to understanding its mechanism. In this work, beyond the simplest and well-studied modular addition, we observe the internal circuits learned through grokking in complex modular arithmetic via interpretable reverse engineering, which highlights the significant difference in their dynamics: subtraction poses a strong asymmetry on Transformer; multiplication requires cosine-biased components at all the frequencies in a Fourier domain; polynomials often result in the superposition of the patterns from elementary arithmetic, but clear patterns do not emerge in challenging cases; grokking can easily occur even in higher-degree formulas with basic symmetric and alternating expressions. We also introduce the novel progress measure for modular arithmetic; Fourier Freque -[^12]: 将符号推理整合到神经生成模型中的设计生成 +[^12]: 单词序列熵:走向自由形式医学问答应用及其不确定性估计 - Integrating Symbolic Reasoning into Neural Generative Models for Design Generation. 
(arXiv:2310.09383v1 [cs.AI]) + Word-Sequence Entropy: Towards Uncertainty Estimation in Free-Form Medical Question Answering Applications and Beyond - [http://arxiv.org/abs/2310.09383](http://arxiv.org/abs/2310.09383) + [https://arxiv.org/abs/2402.14259](https://arxiv.org/abs/2402.14259) - 这项研究将神经网络和符号推理结合起来,提出了Spatial Reasoning Integrated Generator (SPRING),用于设计生成。SPRING通过将神经网络和符号约束满足结合起来,能够生成满足用户规格和实用要求的设计。 + 本论文提出了一种新方法单词序列熵(WSE),用于在自由形式医学问答任务中量化答案的不确定性,相比其他基线方法表现更优秀。 - 设计生成需要将神经和符号推理紧密结合,因为良好的设计必须满足显式用户需求和隐含的美学、实用性和便利性规则。当前由神经网络驱动的自动化设计工具能够生成吸引人的设计,但不能满足用户的规格和实用要求。符号推理工具(如约束编程)不能感知图像中的低级视觉信息或捕捉到美学等微妙方面。我们引入了Spatial Reasoning Integrated Generator (SPRING)用于设计生成。SPRING在深度生成网络中嵌入了一个神经和符号整合的空间推理模块。空间推理模块通过一个循环神经网络预测并通过符号约束满足来决定要生成的对象的位置,以边界框的形式表示。将符号推理嵌入神经生成保证了SPRING的输出满足用户的规格和实用要求。 + 不确定性估计在确保安全关键的人工智能系统与人类互动的可靠性中发挥关键作用,尤其在医疗领域尤为重要。然而,在自由形式的医学问答任务中,尚未建立一种通用方法来量化答案的不确定性,其中无关的词汇和语序含有有限的语义信息可能是不确定性的主要来源,这是由于生成不平等的存在。本文提出了单词序列熵(WSE),该方法根据语义相关性在单词和序列级别上校准不确定性比例,在不确定性量化时更加强调关键词和更相关的序列。我们在5个自由形式医学问答数据集上,利用7种“现成的”大语言模型(LLMs)将WSE与6种基线方法进行比较,并展示了WSE在性能上的优越性。 - Design generation requires tight integration of neural and symbolic reasoning, as good design must meet explicit user needs and honor implicit rules for aesthetics, utility, and convenience. Current automated design tools driven by neural networks produce appealing designs, but cannot satisfy user specifications and utility requirements. Symbolic reasoning tools, such as constraint programming, cannot perceive low-level visual information in images or capture subtle aspects such as aesthetics. We introduce the Spatial Reasoning Integrated Generator (SPRING) for design generation. SPRING embeds a neural and symbolic integrated spatial reasoning module inside the deep generative network. The spatial reasoning module decides the locations of objects to be generated in the form of bounding boxes, which are predicted by a recurrent neural network and filtered by symbolic constraint satisfaction. 
Embedding symbolic reasoning into neural generation guarantees that the output of SPRING satisfi + arXiv:2402.14259v1 Announce Type: cross Abstract: Uncertainty estimation plays a pivotal role in ensuring the reliability of safety-critical human-AI interaction systems, particularly in the medical domain. However, a general method for quantifying the uncertainty of free-form answers has yet to be established in open-ended medical question-answering (QA) tasks, where irrelevant words and sequences with limited semantic information can be the primary source of uncertainty due to the presence of generative inequality. In this paper, we propose the Word-Sequence Entropy (WSE), which calibrates the uncertainty proportion at both the word and sequence levels according to the semantic relevance, with greater emphasis placed on keywords and more relevant sequences when performing uncertainty quantification. We compare WSE with 6 baseline methods on 5 free-form medical QA datasets, utilizing 7 "off-the-shelf" large language models (LLMs), and show that WSE exhibits superior performance on ac -[^13]: 故障注入和安全错误攻击用于提取嵌入式神经网络模型 +[^13]: 梯度下降引发了深度非线性网络权重与经验NTK之间的对齐 - Fault Injection and Safe-Error Attack for Extraction of Embedded Neural Network Models. 
(arXiv:2308.16703v1 [cs.CR]) + Gradient descent induces alignment between weights and the empirical NTK for deep non-linear networks - [http://arxiv.org/abs/2308.16703](http://arxiv.org/abs/2308.16703) + [https://arxiv.org/abs/2402.05271](https://arxiv.org/abs/2402.05271) - 本文介绍了故障注入和安全错误攻击用于提取嵌入式神经网络模型的方法,并阐述了对32位微控制器上的深度神经网络进行模型提取攻击的实验结果。 + 了解神经网络从输入-标签对中提取统计信息的机制是监督学习中最重要的未解决问题之一。前人的研究表明,在训练过程中,权重的格拉姆矩阵与模型的平均梯度外积成正比,这被称为神经特征分析(NFA)。本研究解释了这种相关性的出现,并发现NFA等价于权重矩阵的左奇异结构与与这些权重相关的经验神经切线核的显著成分之间的对齐。在早期训练阶段,可以通过解析的方式预测NFA的发展速度。 - 模型提取作为一种关键的安全威胁而出现,攻击向量利用了算法和实现方面的方法。攻击者的主要目标是尽可能多地窃取受保护的受害者模型的信息,以便他可以用替代模型来模仿它,即使只有有限的访问相似的训练数据。最近,物理攻击,如故障注入,已经显示出对嵌入式模型的完整性和机密性的令人担忧的效果。我们的重点是32位微控制器上的嵌入式深度神经网络模型,这是物联网中广泛使用的硬件平台系列,以及使用标准故障注入策略-安全错误攻击(SEA)来进行具有有限训练数据访问的模型提取攻击。由于攻击强烈依赖于输入查询,我们提出了一种黑盒方法来构建一个成功的攻击集。对于一个经典的卷积神经网络,我们成功地恢复了至少90%的 + 理解神经网络从输入-标签对中提取统计信息的机制是监督学习中最重要的未解决问题之一。先前的研究已经确定,在一般结构的训练神经网络中,权重的格拉姆矩阵与模型的平均梯度外积成正比,这个说法被称为神经特征分析(NFA)。然而,这些数量在训练过程中如何相关尚不清楚。在这项工作中,我们解释了这种相关性的出现。我们发现NFA等价于权重矩阵的左奇异结构与与这些权重相关的经验神经切线核的显著成分之间的对齐。我们证明了先前研究中引入的NFA是由隔离这种对齐的中心化NFA驱动的。我们还展示了在早期训练阶段,可以通过解析的方式预测NFA的发展速度。 - Model extraction emerges as a critical security threat with attack vectors exploiting both algorithmic and implementation-based approaches. The main goal of an attacker is to steal as much information as possible about a protected victim model, so that he can mimic it with a substitute model, even with a limited access to similar training data. Recently, physical attacks such as fault injection have shown worrying efficiency against the integrity and confidentiality of embedded models. We focus on embedded deep neural network models on 32-bit microcontrollers, a widespread family of hardware platforms in IoT, and the use of a standard fault injection strategy - Safe Error Attack (SEA) - to perform a model extraction attack with an adversary having a limited access to training data. 
Since the attack strongly depends on the input queries, we propose a black-box approach to craft a successful attack set. For a classical convolutional neural network, we successfully recover at least 90% of + Understanding the mechanisms through which neural networks extract statistics from input-label pairs is one of the most important unsolved problems in supervised learning. Prior works have identified that the gram matrices of the weights in trained neural networks of general architectures are proportional to the average gradient outer product of the model, in a statement known as the Neural Feature Ansatz (NFA). However, the reason these quantities become correlated during training is poorly understood. In this work, we explain the emergence of this correlation. We identify that the NFA is equivalent to alignment between the left singular structure of the weight matrices and a significant component of the empirical neural tangent kernels associated with those weights. We establish that the NFA introduced in prior works is driven by a centered NFA that isolates this alignment. We show that the speed of NFA development can be predicted analytically at early training times in terms of sim -[^14]: 一个具有短期、情节和语义内存系统的机器 +[^14]: 提高多模态营销的上下文一致性:知识基础学习的有效性 - A Machine with Short-Term, Episodic, and Semantic Memory Systems. 
(arXiv:2212.02098v2 [cs.AI] UPDATED) + Improving Contextual Congruence Across Modalities for Effective Multimodal Marketing using Knowledge-infused Learning - [http://arxiv.org/abs/2212.02098](http://arxiv.org/abs/2212.02098) + [https://arxiv.org/abs/2402.03607](https://arxiv.org/abs/2402.03607) - 本文研究了一个具有短期、情节和语义内存系统的机器代理模型,通过基于知识图谱的建模,在强化学习环境中实现了短期记忆的管理和存储,实验证明这种人类记忆系统结构的代理比没有该结构的代理表现更好。 + 本研究提出了一种将常识知识图谱与大型视觉语言模型相结合的框架,用于改进预测多模态营销活动效果的性能。该方法能够提供早期检测可能具有说服力的多模态活动并评估和增强营销理论的能力。 - 受认知科学理论中显性人类记忆系统的启发,我们建立了一个具有短期、情节和语义记忆系统的代理模型,每个记忆系统都用知识图谱建模。为了评估该系统并分析该代理的行为,我们设计并发布了我们自己的强化学习代理环境“房间”,在这个环境中,代理必须学习如何编码、存储和检索记忆,通过回答问题来最大化回报。我们证明了我们基于深度Q学习的代理成功学习了短期记忆是否应该被遗忘,还是应该存储在情节或语义记忆系统中。我们的实验表明,具有类人记忆系统的代理在环境中表现优于没有这种记忆结构的代理。 + 智能设备的普及使用户能够在线体验多模态信息。然而,大型语言模型(LLM)和视觉模型(LVM)仍然受到捕捉跨模态语义关系的整体意义的限制。缺乏明确的常识知识(例如,作为一个知识图谱),视觉语言模型(VLM)仅通过捕捉庞大的语料库中的高级模式来学习隐式表示,从而忽略了重要的上下文跨模态线索。在这项工作中,我们设计了一个框架,将显式的常识知识以知识图谱的形式与大型的VLM相结合,以提高下游任务的性能,即预测多模态营销活动的有效性。虽然营销应用提供了一个有说服力的指标来评估我们的方法,但我们的方法使得早期发现可能具有说服力的多模态活动成为可能,并评估和增强营销理论。 - Inspired by the cognitive science theory of the explicit human memory systems, we have modeled an agent with short-term, episodic, and semantic memory systems, each of which is modeled with a knowledge graph. To evaluate this system and analyze the behavior of this agent, we designed and released our own reinforcement learning agent environment, "the Room", where an agent has to learn how to encode, store, and retrieve memories to maximize its return by answering questions. We show that our deep Q-learning based agent successfully learns whether a short-term memory should be forgotten, or rather be stored in the episodic or semantic memory systems. Our experiments indicate that an agent with human-like memory systems can outperform an agent without this memory structure in the environment. + The prevalence of smart devices with the ability to capture moments in multiple modalities has enabled users to experience multimodal information online. 
However, large Language (LLMs) and Vision models (LVMs) are still limited in capturing holistic meaning with cross-modal semantic relationships. Without explicit, common sense knowledge (e.g., as a knowledge graph), Visual Language Models (VLMs) only learn implicit representations by capturing high-level patterns in vast corpora, missing essential contextual cross-modal cues. In this work, we design a framework to couple explicit commonsense knowledge in the form of knowledge graphs with large VLMs to improve the performance of a downstream task, predicting the effectiveness of multi-modal marketing campaigns. While the marketing application provides a compelling metric for assessing our methods, our approach enables the early detection of likely persuasive multi-modal campaigns and the assessment and augmentation of marketing theory. + +[^15]: TSIS: t-SMILES的补充算法用于基于片段的分子表示 + + TSIS: A Supplementary Algorithm to t-SMILES for Fragment-based Molecular Representation + + [https://arxiv.org/abs/2402.02164](https://arxiv.org/abs/2402.02164) + + 本研究引入了TSIS算法作为t-SMILES的补充,用于改进基于字符串的分子表示方法。实验证明,TSIS模型在处理语法中的长期依赖性方面表现优于其他模型。 + + + + 字符串基本的分子表示方法,如SMILES,在线性表示分子信息方面是事实上的标准。然而,必须使用配对符号和解析算法导致了长的语法依赖关系,使得即使是最先进的深度学习模型也难以准确理解语法和语义。尽管DeepSMILES和SELFIES已经解决了某些限制,但它们仍然在处理高级语法方面存在困难,使得一些字符串难以阅读。本研究引入了一个补充算法TSIS(TSID简化),用于t-SMILES家族。TSIS与另一个基于片段的线性解决方案SAFE进行了比较实验,结果表明SAFE在处理语法中的长期依赖性时存在挑战。TSIS继续使用t-SMILES中定义的树作为其基础数据结构,这使其与SAFE模型有所不同。TSIS模型的性能超过了SAFE模型,表明t-SMILES的树结构起到了重要作用。 + + String-based molecular representations, such as SMILES, are a de facto standard for linearly representing molecular information. However, the must be paired symbols and the parsing algorithm result in long grammatical dependencies, making it difficult for even state-of-the-art deep learning models to accurately comprehend the syntax and semantics. Although DeepSMILES and SELFIES have addressed certain limitations, they still struggle with advanced grammar, which makes some strings difficult to read. 
This study introduces a supplementary algorithm, TSIS (TSID Simplified), to t-SMILES family. Comparative experiments between TSIS and another fragment-based linear solution, SAFE, indicate that SAFE presents challenges in managing long-term dependencies in grammar. TSIS continues to use the tree defined in t-SMILES as its foundational data structure, which sets it apart from the SAFE model. The performance of TSIS models surpasses that of SAFE models, indicating that the tree structure of t + +[^16]: GPT4Battery: 一种基于LLM驱动的自适应锂离子电池健康状态估计框架 + + GPT4Battery: An LLM-driven Framework for Adaptive State of Health Estimation of Raw Li-ion Batteries + + [https://arxiv.org/abs/2402.00068](https://arxiv.org/abs/2402.00068) + + 本论文提出了一种基于LLM的框架,可以适应不同类型的锂离子电池,实现准确的健康状态估计。这项工作解决了生成训练数据的时间和资源成本高的挑战,并在实际应用中具有良好的泛化能力。 + + + + 健康状态(SOH)是评估电池退化水平的关键指标,无法直接测量但需要估计。准确的SOH估计提升了锂离子电池的检测、控制和反馈能力,实现安全高效的能源管理,并指导新一代电池的发展。尽管在数据驱动的SOH估计方面取得了显著进展,但为生成寿命长期训练数据而进行的耗时且资源密集的退化实验在建立一个能处理多样化锂离子电池(例如,跨化学、跨制造商和跨容量)的大型模型方面存在挑战。因此,本文利用大型语言模型(LLM)的强大泛化能力,提出了一种适用于不同电池的可调整SOH估计的新型框架。为了适应实际情景,其中未标记的数据按顺序以及分布变化的方式到达,所提出的模型在测试时进行了修改。 + + State of health (SOH) is a crucial indicator for assessing the degradation level of batteries that cannot be measured directly but requires estimation. Accurate SOH estimation enhances detection, control, and feedback for Li-ion batteries, allowing for safe and efficient energy management and guiding the development of new-generation batteries. Despite the significant progress in data-driven SOH estimation, the time and resource-consuming degradation experiments for generating lifelong training data pose a challenge in establishing one large model capable of handling diverse types of Li-ion batteries, e.g., cross-chemistry, cross-manufacturer, and cross-capacity. Hence, this paper utilizes the strong generalization capability of large language model (LLM) to proposes a novel framework for adaptable SOH estimation across diverse batteries. 
To match the real scenario where unlabeled data sequentially arrives in use with distribution shifts, the proposed model is modified by a test-time t + +[^17]: 大规模语言模型是零射击学习器 + + Large Language Models are Null-Shot Learners + + [https://arxiv.org/abs/2401.08273](https://arxiv.org/abs/2401.08273) + + 本文提出了零射击提示方法,通过利用大规模语言模型中的错误信息来指导模型进行任务,以提高任务表现。实验结果表明,在不同数据集上,包括阅读理解、算术推理和闭卷问答,模型性能有所提升。这些结果也显示出不同模型之间存在不同程度的错误信息。 + + + + 本文提出了零射击提示方法。零射击提示利用大规模语言模型(LLMs)中的错误信息,通过指示LLMs利用从“示例”部分中获取的信息(该信息在所提供的上下文中不存在)来完成任务。虽然减少错误信息对于LLMs的日常和重要用途至关重要,但我们提出在目前的环境中,这些LLMs仍然具有错误信息,实际上可以利用错误信息来提高与标准零射击提示相比的任务表现。对八个LLMs进行实验,结果显示在大多数八个数据集(包括阅读理解、算术推理和闭卷问答)中,性能有所提升。观察到的不一致性增加相对性能在LLMs之间的差异,也可能表示每个模型中存在不同程度的错误信息。 + + arXiv:2401.08273v2 Announce Type: replace-cross Abstract: This paper presents null-shot prompting. Null-shot prompting exploits hallucination in large language models (LLMs) by instructing LLMs to utilize information from the "Examples" section that never exists within the provided context to perform a task. While reducing hallucination is crucial and non-negligible for daily and critical uses of LLMs, we propose that in the current landscape in which these LLMs still hallucinate, it is possible, in fact, to exploit hallucination to increase performance in performing tasks compared to standard zero-shot prompting. Experiments with eight LLMs show improvements in performance across the majority of eight datasets, including reading comprehension, arithmetic reasoning, and closed-book question answering. The observed inconsistency in increased relative performance across the LLMs also potentially indicates a different degree of inherent hallucination in each model. These differences show + +[^18]: SupplyGraph: 使用图神经网络进行供应链规划的基准数据集 + + SupplyGraph: A Benchmark Dataset for Supply Chain Planning using Graph Neural Networks. 
(arXiv:2401.15299v1 [cs.LG]) + + [http://arxiv.org/abs/2401.15299](http://arxiv.org/abs/2401.15299) + + SupplyGraph是一个基准数据集,用于使用图神经网络进行供应链规划。该数据集包含了来自孟加拉国一家领先快速消费品公司的实际数据,用于优化、预测和解决供应链问题。数据集中的时间数据作为节点特征,可用于销售预测、生产计划和故障识别。 + + + + 图神经网络(GNNs)在不同领域如运输、生物信息学、语言处理和计算机视觉中取得了重要进展。然而,在将GNNs应用于供应链网络方面,目前尚缺乏研究。供应链网络在结构上类似于图形,使其成为应用GNN方法的理想选择。这为优化、预测和解决供应链问题开辟了无限可能。然而,此方法的一个主要障碍在于缺乏真实世界的基准数据集以促进使用GNN来研究和解决供应链问题。为了解决这个问题,我们提供了一个来自孟加拉国一家领先的快速消费品公司的实际基准数据集,该数据集侧重于用于生产目的的供应链规划的时间任务。该数据集包括时间数据作为节点特征,以实现销售预测、生产计划和故障识别。 + + Graph Neural Networks (GNNs) have gained traction across different domains such as transportation, bio-informatics, language processing, and computer vision. However, there is a noticeable absence of research on applying GNNs to supply chain networks. Supply chain networks are inherently graph-like in structure, making them prime candidates for applying GNN methodologies. This opens up a world of possibilities for optimizing, predicting, and solving even the most complex supply chain problems. A major setback in this approach lies in the absence of real-world benchmark datasets to facilitate the research and resolution of supply chain problems using GNNs. To address the issue, we present a real-world benchmark dataset for temporal tasks, obtained from one of the leading FMCG companies in Bangladesh, focusing on supply chain planning for production purposes. The dataset includes temporal data as node features to enable sales predictions, production planning, and the identification of fa + +[^19]: 大型语言模型的知识编辑全面研究 + + A Comprehensive Study of Knowledge Editing for Large Language Models. 
(arXiv:2401.01286v1 [cs.CL]) + + [http://arxiv.org/abs/2401.01286](http://arxiv.org/abs/2401.01286) + + 本研究全面研究了大型语言模型的知识编辑,旨在有效修改模型的行为,同时保持整体性能。 + + + + 大型语言模型(LLM)在理解和生成与人类交流紧密相似的文本方面展现出了非凡的能力。然而,其主要限制在于训练过程中的显著计算需求,这是由于其广泛的参数化造成的。这一挑战在于世界的动态性,需要频繁更新LLM以修正过时的信息或集成新知识,从而确保其持续的相关性。许多应用需要在训练后进行持续的模型调整,以解决缺陷或不良行为。近年来,对于LLM的知识编辑技术的兴趣越来越高,在特定领域内有效地修改LLM的行为,同时保持整体性能在各种输入中的表现。本文首先定义了知识编辑的目标和挑战,然后综述了现有的知识编辑方法和技术,并讨论了其应用和未来发展的方向。 + + Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication. However, a primary limitation lies in the significant computational demands during training, arising from their extensive parameterization. This challenge is further intensified by the dynamic nature of the world, necessitating frequent updates to LLMs to correct outdated information or integrate new knowledge, thereby ensuring their continued relevance. Note that many applications demand continual model adjustments post-training to address deficiencies or undesirable behaviors. There is an increasing interest in efficient, lightweight methods for on-the-fly model modifications. To this end, recent years have seen a burgeoning in the techniques of knowledge editing for LLMs, which aim to efficiently modify LLMs' behaviors within specific domains while preserving overall performance across various inputs. In this paper, we first define the kno + +[^20]: 跨越生成性人工智能数据生命周期的隐私和版权挑战导航 + + Navigating Privacy and Copyright Challenges Across the Data Lifecycle of Generative AI. 
(arXiv:2311.18252v2 [cs.SE] UPDATED) + + [http://arxiv.org/abs/2311.18252](http://arxiv.org/abs/2311.18252) + + 这项研究探讨了生成性人工智能中数据隐私和版权保护的多方面挑战,并提出了将技术创新与伦理前瞻相结合的综合方法,旨在全面解决这些问题。 + + + + 生成性人工智能的出现标志着人工智能领域的重要里程碑,展示出在生成真实图像、文本和数据模式方面的卓越能力。然而,这些进展也带来了对数据隐私和版权侵犯的更高关注,主要是由于模型训练对大规模数据集的依赖。传统方法如差分隐私、机器遗忘和数据中毒只提供了对这些复杂问题的片面解决方案。本文深入探讨了数据生命周期内隐私和版权保护的多方面挑战。我们主张采用将技术创新与伦理前瞻相结合的综合方法,通过研究和制定在生命周期视角下的解决方案,全面解决这些问题。本研究旨在推动更广泛的讨论,并激励对生成性人工智能中数据隐私和版权完整性的协同努力。 + + The advent of Generative AI has marked a significant milestone in artificial intelligence, demonstrating remarkable capabilities in generating realistic images, texts, and data patterns. However, these advancements come with heightened concerns over data privacy and copyright infringement, primarily due to the reliance on vast datasets for model training. Traditional approaches like differential privacy, machine unlearning, and data poisoning only offer fragmented solutions to these complex issues. Our paper delves into the multifaceted challenges of privacy and copyright protection within the data lifecycle. We advocate for integrated approaches that combines technical innovation with ethical foresight, holistically addressing these concerns by investigating and devising solutions that are informed by the lifecycle perspective. This work aims to catalyze a broader discussion and inspire concerted efforts towards data privacy and copyright integrity in Generative AI. + +[^21]: Clover: 闭环可验证代码生成 + + Clover: Closed-Loop Verifiable Code Generation. 
(arXiv:2310.17807v1 [cs.SE]) + + [http://arxiv.org/abs/2310.17807](http://arxiv.org/abs/2310.17807) + + Clover是一种闭环可验证代码生成的范式,通过在代码、docstrings和形式注释之间进行一致性检查,确保生成的代码的正确性。 + + + + 在软件开发中,使用大型语言模型进行代码生成是一个快速增长的趋势。然而,如果没有有效的方法来确保生成的代码的正确性,这个趋势可能会导致许多不良结果。在本文中,我们提出了一个解决这个挑战的愿景:Clover范式,即闭环可验证代码生成,它将正确性检查简化为更可访问的一致性检查问题。在Clover的核心是一个检查器,它在代码、docstrings和形式注释之间进行一致性检查。该检查器使用了形式验证工具和大型语言模型的新颖集成实现。我们提供了理论分析来支持我们的论点,即Clover在一致性检查方面应该是有效的。我们还在一个由手工设计的数据集(CloverBench)上进行了实证调查,该数据集包含了注释的Dafny程序,难度水平与教科书相当。实验结果显示 + + The use of large language models for code generation is a rapidly growing trend in software development. However, without effective methods for ensuring the correctness of generated code, this trend could lead to any number of undesirable outcomes. In this paper, we lay out a vision for addressing this challenge: the Clover paradigm, short for Closed-Loop Verifiable Code Generation, which reduces correctness checking to the more accessible problem of consistency checking. At the core of Clover lies a checker that performs consistency checks among code, docstrings, and formal annotations. The checker is implemented using a novel integration of formal verification tools and large language models. We provide a theoretical analysis to support our thesis that Clover should be effective at consistency checking. We also empirically investigate its feasibility on a hand-designed dataset (CloverBench) featuring annotated Dafny programs at a textbook level of difficulty. Experimental results sho + +[^22]: Transformers学会了高阶优化方法用于上下文学习:一项与线性模型的研究 + + Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Models. 
(arXiv:2310.17086v1 [cs.LG]) + + [http://arxiv.org/abs/2310.17086](http://arxiv.org/abs/2310.17086) + + Transformers学会了高阶优化方法,用于上下文学习,通过实现类似于迭代牛顿法的算法,而不是梯度下降。 + + + + Transformers在上下文学习中表现出色,但是它们是如何进行上下文学习仍然是一个谜。最近的研究表明,Transformers可能通过内部运行梯度下降,即一阶优化方法,来进行上下文学习。本文中,我们展示了Transformers学会了实现高阶优化方法来进行上下文学习。我们以上下文线性回归为重点,展示了Transformers学会了实现一个非常类似于迭代牛顿法的算法,而不是梯度下降。从实证上来看,我们展示了连续的Transformer层的预测与牛顿法的不同迭代非常接近,每个中间层大致计算了3次迭代。相比之下,需要指数级的梯度下降步骤才能匹配额外的Transformer层;这表明Transformers具有相当的收敛速率。 + + Transformers are remarkably good at in-context learning (ICL) -- learning from demonstrations without parameter updates -- but how they perform ICL remains a mystery. Recent work suggests that Transformers may learn in-context by internally running Gradient Descent, a first-order optimization method. In this paper, we instead demonstrate that Transformers learn to implement higher-order optimization methods to perform ICL. Focusing on in-context linear regression, we show that Transformers learn to implement an algorithm very similar to Iterative Newton's Method, a higher-order optimization method, rather than Gradient Descent. Empirically, we show that predictions from successive Transformer layers closely match different iterations of Newton's Method linearly, with each middle layer roughly computing 3 iterations. In contrast, exponentially more Gradient Descent steps are needed to match an additional Transformers layer; this suggests that Transformers have an comparable rate of conv + +[^23]: 揭示隐藏的联系:用于视频对话的迭代跟踪和推理 + + Uncovering Hidden Connections: Iterative Tracking and Reasoning for Video-grounded Dialog. 
(arXiv:2310.07259v1 [cs.CV]) + + [http://arxiv.org/abs/2310.07259](http://arxiv.org/abs/2310.07259) + + 本文提出了一种迭代跟踪和推理策略,结合文本编码器和视觉编码器以生成准确的响应,解决了视频对话中逐步理解对话历史和吸收视频信息的挑战。 + + + + 与传统的视觉问答相比,视频对话需要对对话历史和视频内容进行深入理解,以生成准确的响应。尽管现有的方法取得了令人称赞的进展,但它们常常面临逐步理解复杂的对话历史和吸收视频信息的挑战。为了弥补这一差距,我们提出了一种迭代跟踪和推理策略,将文本编码器、视觉编码器和生成器相结合。我们的文本编码器以路径跟踪和聚合机制为核心,能够从对话历史中获取重要的细微差别,以解释所提出的问题。同时,我们的视觉编码器利用迭代推理网络,精心设计以从视频中提取和强调关键视觉标记,增强对视觉理解的深度。最后,我们使用预训练的GPT-模型将这些丰富的信息综合起来。 + + In contrast to conventional visual question answering, video-grounded dialog necessitates a profound understanding of both dialog history and video content for accurate response generation. Despite commendable strides made by existing methodologies, they often grapple with the challenges of incrementally understanding intricate dialog histories and assimilating video information. In response to this gap, we present an iterative tracking and reasoning strategy that amalgamates a textual encoder, a visual encoder, and a generator. At its core, our textual encoder is fortified with a path tracking and aggregation mechanism, adept at gleaning nuances from dialog history that are pivotal to deciphering the posed questions. Concurrently, our visual encoder harnesses an iterative reasoning network, meticulously crafted to distill and emphasize critical visual markers from videos, enhancing the depth of visual comprehension. Culminating this enriched information, we employ the pre-trained GPT- + +[^24]: 模型无关的图神经网络用于整合局部和全局信息的研究 + + A Model-Agnostic Graph Neural Network for Integrating Local and Global Information. 
(arXiv:2309.13459v1 [stat.ML]) + + [http://arxiv.org/abs/2309.13459](http://arxiv.org/abs/2309.13459) + + MaGNet是一种模型无关的图神经网络框架,能够顺序地整合不同阶的信息,并通过识别有影响力的紧凑图结构提供有意义且可解释的结果。 + + + + 图神经网络(GNNs)在各种以图为重点的任务中取得了令人满意的性能。尽管取得了成功,但现有的GNN存在两个重要限制:由于黑盒特性,结果缺乏可解释性;无法学习不同阶的表示。为了解决这些问题,我们提出了一种新的模型无关的图神经网络(MaGNet)框架,能够顺序地整合不同阶的信息,从高阶邻居中提取知识,并通过识别有影响力的紧凑图结构提供有意义且可解释的结果。特别地,MaGNet由两个组件组成:图拓扑下复杂关系的潜在表示的估计模型和识别有影响力的节点、边和重要节点特征的解释模型。从理论上,我们通过经验Rademacher复杂度建立了MaGNet的泛化误差界,并展示了其强大的能力。 + + Graph Neural Networks (GNNs) have achieved promising performance in a variety of graph-focused tasks. Despite their success, existing GNNs suffer from two significant limitations: a lack of interpretability in results due to their black-box nature, and an inability to learn representations of varying orders. To tackle these issues, we propose a novel Model-agnostic Graph Neural Network (MaGNet) framework, which is able to sequentially integrate information of various orders, extract knowledge from high-order neighbors, and provide meaningful and interpretable results by identifying influential compact graph structures. In particular, MaGNet consists of two components: an estimation model for the latent representation of complex relationships under graph topology, and an interpretation model that identifies influential nodes, edges, and important node features. Theoretically, we establish the generalization error bound for MaGNet via empirical Rademacher complexity, and showcase its pow + +[^25]: 使用大型语言模型将患者与临床试验匹配 + + Matching Patients to Clinical Trials with Large Language Models. 
(arXiv:2307.15051v1 [cs.CL]) + + [http://arxiv.org/abs/2307.15051](http://arxiv.org/abs/2307.15051) + + 本研究调查了使用大型语言模型(LLMs)来帮助患者和转诊医生识别合适的临床试验的潜力,并引入了TrialGPT架构,该架构能够准确预测合格性并提供解释,实验证明其有效性。 + + + + 临床试验在推动药物研发和基于证据的医学方面非常重要,但患者招募常常受到限制。在这项工作中,我们调查了使用大型语言模型(LLMs)来帮助患者和转诊医生识别合适的临床试验的潜力。具体而言,我们引入了一种新颖的架构TrialGPT,采用LLMs预测基于标准的合格性,并提供详细的解释,并根据患者病历中的自由文本来对候选临床试验进行排名和排除。我们在三个公开可用的184名患者和18,238个注释的临床试验的队列上评估了TrialGPT。实验结果表明几个关键发现:第一,TrialGPT在标准级别的预测准确性上表现出很高的准确率,并提供准确的解释。第二,TrialGPT的综合试验级别评分与专家标注的合格性高度相关。第三,这些评分 + + Clinical trials are vital in advancing drug development and evidence-based medicine, but their success is often hindered by challenges in patient recruitment. In this work, we investigate the potential of large language models (LLMs) to assist individual patients and referral physicians in identifying suitable clinical trials from an extensive selection. Specifically, we introduce TrialGPT, a novel architecture employing LLMs to predict criterion-level eligibility with detailed explanations, which are then aggregated for ranking and excluding candidate clinical trials based on free-text patient notes. We evaluate TrialGPT on three publicly available cohorts of 184 patients and 18,238 annotated clinical trials. The experimental results demonstrate several key findings: First, TrialGPT achieves high criterion-level prediction accuracy with faithful explanations. Second, the aggregated trial-level TrialGPT scores are highly correlated with expert eligibility annotations. Third, these scor + +[^26]: 深度学习中遗忘现象的全面调查:超越连续学习 + + A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning. 
(arXiv:2307.09218v1 [cs.LG]) + + [http://arxiv.org/abs/2307.09218](http://arxiv.org/abs/2307.09218) + + 遗忘是深度学习中普遍存在的现象,不仅限于连续学习领域。解决遗忘问题面临多个挑战,包括平衡保留旧任务知识与快速学习新任务的挑战,管理任务干扰与冲突目标的挑战,以及防止隐私泄露等。遗忘不总是有害的,可以在某些情况下是有益且可取的,特别是在隐私保护场景中。 + + + + 遗忘指的是先前获取的信息或知识的丧失或恶化。尽管现有的关于遗忘的调查主要集中在连续学习方面,但在深度学习中,遗忘是一种普遍现象,可以在各种其他研究领域中观察到。遗忘在研究领域中表现出来,例如由于生成器漂移而在生成模型领域中表现出来,以及由于客户端之间存在异构数据分布而在联邦学习中表现出来。解决遗忘问题涉及到几个挑战,包括在快速学习新任务的同时平衡保留旧任务知识,管理任务干扰与冲突目标,以及防止隐私泄露等。此外,大多数现有的连续学习调查都默认认为遗忘总是有害的。相反,我们的调查认为遗忘是一把双刃剑,在某些情况下可以是有益且可取的,例如隐私保护场景。通过在更广泛的背景下探讨遗忘现象, + + Forgetting refers to the loss or deterioration of previously acquired information or knowledge. While the existing surveys on forgetting have primarily focused on continual learning, forgetting is a prevalent phenomenon observed in various other research domains within deep learning. Forgetting manifests in research fields such as generative models due to generator shifts, and federated learning due to heterogeneous data distributions across clients. Addressing forgetting encompasses several challenges, including balancing the retention of old task knowledge with fast learning of new tasks, managing task interference with conflicting goals, and preventing privacy leakage, etc. Moreover, most existing surveys on continual learning implicitly assume that forgetting is always harmful. In contrast, our survey argues that forgetting is a double-edged sword and can be beneficial and desirable in certain cases, such as privacy-preserving scenarios. By exploring forgetting in a broader context + +[^27]: 学习策略在企业流程资源分配中的应用 + + Learning policies for resource allocation in business processes. 
(arXiv:2304.09970v1 [cs.AI]) + + [http://arxiv.org/abs/2304.09970](http://arxiv.org/abs/2304.09970) + + 本文提出了两种基于学习的方法来进行企业流程资源分配,具有优于常见启发式方法的效果。 + + + + 资源分配是将资源分配到必须在运行时刻执行的业务流程活动中。虽然资源分配在制造等其他领域中已经得到深入研究,但在业务流程管理中却只存在少量的方法。现有方法不适用于大型企业流程的应用或是只针对单个实例进行资源分配的优化。本文提出了两种基于学习的方法来进行企业流程资源分配:一种基于深度强化学习的方法和一种基于评分的价值函数逼近方法。在代表典型业务流程结构的一组情景以及在代表现实业务流程的完整网络上,将两种方法与现有的启发式方法进行比较。结果表明,我们的学习方法在大多数情景中优于或与常见的启发式方法竞争力相当。 + + Resource allocation is the assignment of resources to activities that must be executed in a business process at a particular moment at run-time. While resource allocation is well-studied in other fields, such as manufacturing, there exist only a few methods in business process management. Existing methods are not suited for application in large business processes or focus on optimizing resource allocation for a single case rather than for all cases combined. To fill this gap, this paper proposes two learning-based methods for resource allocation in business processes: a deep reinforcement learning-based approach and a score-based value function approximation approach. The two methods are compared against existing heuristics in a set of scenarios that represent typical business process structures and on a complete network that represents a realistic business process. The results show that our learning-based methods outperform or are competitive with common heuristics in most scenarios a + +[^28]: 平滑的非平稳连续赌博机 + + Smooth Non-Stationary Bandits. 
(arXiv:2301.12366v2 [cs.LG] UPDATED) + + [http://arxiv.org/abs/2301.12366](http://arxiv.org/abs/2301.12366) + + 本文提出了一种非平稳两臂赌博机问题的策略,能够处理平滑变化,并证明了该策略在二次Lipschitz连续的情况下的遗憾为 $\tilde O(T^{3/5})$。 + + + + 在许多在线决策应用中,环境都是非平稳的,因此使用能够处理变化的赌博算法至关重要。大多数现有方法是为了应对非平滑变化而设计的,仅受到总变差或时间上的Lipschitz性的限制,其中它们保证$\tilde \Theta(T^{2/3})$的遗憾。然而,在实践中,环境经常以平滑的方式改变,因此这种算法可能会在这些设置中产生比必要更高的遗憾,并且不利用变化率的信息。我们研究了一个非平稳的两臂赌博机问题,假设臂的平均回报是一个$\beta$-H\"older函数,即它是$(\beta-1)$次Lipschitz连续可微分的,我们展示了一个策略,对于$\beta=2$,它的遗憾为$\tilde O(T^{3/5})$,从而首次在平滑和非平滑之间进行了区分。我们通过一个任意$\Omega(T^{(\beta+1)/(2\beta+1)})$的下界来补充这个结果,说明了这个问题的困难程度。 + + In many applications of online decision making, the environment is non-stationary and it is therefore crucial to use bandit algorithms that handle changes. Most existing approaches are designed to protect against non-smooth changes, constrained only by total variation or Lipschitzness over time, where they guarantee $\tilde \Theta(T^{2/3})$ regret. However, in practice environments are often changing **smoothly**, so such algorithms may incur higher-than-necessary regret in these settings and do not leverage information on the rate of change. We study a non-stationary two-armed bandit problem where we assume that an arm's mean reward is a $\beta$-H\"older function over (normalized) time, meaning it is $(\beta-1)$-times Lipschitz-continuously differentiable. We show the first separation between the smooth and non-smooth regimes by presenting a policy with $\tilde O(T^{3/5})$ regret for $\beta=2$. We complement this result by an $\Omega(T^{(\beta+1)/(2\beta+1)})$ lower bound for any int + +[^29]: 深度伪造音频的系统指纹识别:初始数据集与研究 + + System Fingerprint Recognition for Deepfake Audio: An Initial Dataset and Investigation. 
(arXiv:2208.10489v3 [cs.SD] UPDATED) + + [http://arxiv.org/abs/2208.10489](http://arxiv.org/abs/2208.10489) + + 本文提出了深度伪造音频的系统指纹识别方法,并通过收集来自中国七个供应商的语音合成系统的数据集进行了初步研究。这项研究为进一步发展系统指纹识别方法提供了基础,并在模型版权保护和数字证据取证等实际场景中具有重要应用价值。 + + + + 深度语音合成模型的快速发展给社会带来了重大威胁,例如恶意内容操纵。因此,许多研究出现了,旨在检测所谓的深度伪造音频。然而,现有的工作都集中在对真实音频和伪造音频进行二元检测。在模型版权保护和数字证据取证等实际场景中,需要知道生成深度伪造音频的工具或模型来解释决策。这促使我们提出一个问题:我们能识别深度伪造音频的系统指纹吗?在本文中,我们提出了第一个系统指纹识别(SFR)的深度伪造音频数据集,并进行了初步研究。我们从使用最新的深度学习技术的七个中国供应商的语音合成系统中收集了该数据集,包括清晰和压缩集。此外,为了促进系统指纹识别方法的进一步发展,我们提供了外部参考音频,以便进行评估和对比实验。 + + The rapid progress of deep speech synthesis models has posed significant threats to society such as malicious content manipulation. Therefore, many studies have emerged to detect the so-called deepfake audio. However, existing works focus on the binary detection of real audio and fake audio. In real-world scenarios such as model copyright protection and digital evidence forensics, it is needed to know what tool or model generated the deepfake audio to explain the decision. This motivates us to ask: Can we recognize the system fingerprints of deepfake audio? In this paper, we present the first deepfake audio dataset for system fingerprint recognition (SFR) and conduct an initial investigation. We collected the dataset from the speech synthesis systems of seven Chinese vendors that use the latest state-of-the-art deep learning technologies, including both clean and compressed sets. 
In addition, to facilitate the further development of system fingerprint recognition methods, we provide ex diff --git a/cs.AI.xml b/cs.AI.xml index 43e00c35a..fcb69ef88 100644 --- a/cs.AI.xml +++ b/cs.AI.xml @@ -1,281 +1,581 @@ -Chat Arxiv cs.AIhttps://github.com/qhduan/cn-chat-arxivThis is arxiv RSS feed for cs.AI大型语言模型具有卓越的能力,但也面临着安全和隐私攻击的威胁。本调查全面审查了LLM的安全和隐私挑战,涵盖了训练数据、用户和应用风险等方面,并对解决方法进行了回顾。https://rss.arxiv.org/abs/2402.00888<p> -大型语言模型的安全和隐私挑战:一项调查 +Chat Arxiv cs.AIhttps://github.com/qhduan/cn-chat-arxivThis is arxiv RSS feed for cs.AI该研究引入了Multi-Grain Stereotype(MGS)数据集,探索了不同的机器学习方法用于建立陈规检测的基线,并提出了一系列基于MGS数据训练的英文文本的陈规分类器模型。https://arxiv.org/abs/2404.01768<p> +用于增强基于文本的陈规检测和基于探测的偏见评估的大规模语言模型审计 </p> <p> -Security and Privacy Challenges of Large Language Models: A Survey +Auditing Large Language Models for Enhanced Text-Based Stereotype Detection and Probing-Based Bias Evaluation </p> <p> -https://rss.arxiv.org/abs/2402.00888 +https://arxiv.org/abs/2404.01768 </p> <p> -大型语言模型具有卓越的能力,但也面临着安全和隐私攻击的威胁。本调查全面审查了LLM的安全和隐私挑战,涵盖了训练数据、用户和应用风险等方面,并对解决方法进行了回顾。 +该研究引入了Multi-Grain Stereotype(MGS)数据集,探索了不同的机器学习方法用于建立陈规检测的基线,并提出了一系列基于MGS数据训练的英文文本的陈规分类器模型。 </p> <p> </p> <p> -大型语言模型(LLM)展示了非凡的能力,并在生成和总结文本、语言翻译和问答等多个领域做出了贡献。如今,LLM正在成为计算机语言处理任务中非常流行的工具,具备分析复杂语言模式并根据上下文提供相关和适当回答的能力。然而,尽管具有显著优势,这些模型也容易受到安全和隐私攻击的威胁,如越狱攻击、数据污染攻击和个人可识别信息泄露攻击。本调查全面审查了LLM的安全和隐私挑战,包括训练数据和用户方面的问题,以及在交通、教育和医疗等各个领域中应用带来的风险。我们评估了LLM的脆弱性程度,调查了出现的安全和隐私攻击,并对潜在的解决方法进行了回顾。 +大型语言模型(LLMs)的最新进展显著提高了它们在面向人类的人工智能(AI)应用中的影响力。然而,LLMs可能会复制甚至加剧自训练数据中的陈规输出。本研究介绍了Multi-Grain Stereotype(MGS)数据集,包括51,867个实例,涵盖性别、种族、职业、宗教和陈规文本,通过融合多个先前公开的陈规检测数据集收集而来。我们探索了旨在为陈规检测建立基线的不同机器学习方法,并微调了多种架构和模型大小的几个语言模型,本文展示了一系列基于MGS训练的英文文本的陈规分类器模型。为了了解我们的陈规检测器是否捕捉到与人类常识一致的相关特征,我们利用了各种可解释的AI工具, </p> <p> -Large Language Models (LLMs) have demonstrated extraordinary capabilities and contributed to multiple fields, such as generating and summarizing text, language translation, and question-answering. 
Nowadays, LLM is becoming a very popular tool in computerized language processing tasks, with the capability to analyze complicated linguistic patterns and provide relevant and appropriate responses depending on the context. While offering significant advantages, these models are also vulnerable to security and privacy attacks, such as jailbreaking attacks, data poisoning attacks, and Personally Identifiable Information (PII) leakage attacks. This survey provides a thorough review of the security and privacy challenges of LLMs for both training data and users, along with the application-based risks in various domains, such as transportation, education, and healthcare. We assess the extent of LLM vulnerabilities, investigate emerging security and privacy attacks for LLMs, and review the potent -</p>SHIELD引入了一种正则化技术,通过隐藏部分输入数据并评估预测结果的差异,从而改善了可解释人工智能模型的质量。https://arxiv.org/abs/2404.02611<p> -SHIELD: 一种用于可解释人工智能的正则化技术 +arXiv:2404.01768v1 Announce Type: cross Abstract: Recent advancements in Large Language Models (LLMs) have significantly increased their presence in human-facing Artificial Intelligence (AI) applications. However, LLMs could reproduce and even exacerbate stereotypical outputs from training data. This work introduces the Multi-Grain Stereotype (MGS) dataset, encompassing 51,867 instances across gender, race, profession, religion, and stereotypical text, collected by fusing multiple previously publicly available stereotype detection datasets. We explore different machine learning approaches aimed at establishing baselines for stereotype detection, and fine-tune several language models of various architectures and model sizes, presenting in this work a series of stereotypes classifier models for English text trained on MGS. 
To understand whether our stereotype detectors capture relevant features (aligning with human common sense) we utilise a variety of explainable AI tools, including +</p>SugarcaneNet2024是通过优化加权平均集成LASSO正则化的预训练模型,在甘蔗病害分类中表现出色,具有快速准确的检测能力。https://arxiv.org/abs/2403.18870<p> +SugarcaneNet2024: LASSO正则化的预训练模型的优化加权平均集成方法用于甘蔗病害分类 </p> <p> -SHIELD: A regularization technique for eXplainable Artificial Intelligence +SugarcaneNet2024: An Optimized Weighted Average Ensemble Approach of LASSO Regularized Pre-trained Models for Sugarcane Disease Classification </p> <p> -https://arxiv.org/abs/2404.02611 +https://arxiv.org/abs/2403.18870 </p> <p> -SHIELD引入了一种正则化技术,通过隐藏部分输入数据并评估预测结果的差异,从而改善了可解释人工智能模型的质量。 +SugarcaneNet2024是通过优化加权平均集成LASSO正则化的预训练模型,在甘蔗病害分类中表现出色,具有快速准确的检测能力。 </p> <p> </p> <p> -随着人工智能系统在各个领域变得不可或缺,对可解释性的需求与日俱增。尽管科学界的努力主要集中在为模型获取更好的解释上,但重要的是不要忽视这个解释过程对改善训练的潜力。虽然现有的努力主要集中在为黑盒模型生成和评估解释上,但直接通过这些评估来增强模型仍存在关键差距。本文介绍了SHIELD(选择性隐藏输入评估学习动态),这是一种适用于可解释人工智能的正则化技术,旨在通过隐藏部分输入数据并评估预测结果的差异来改善模型质量。与传统方法相比,SHIELD正则化无缝集成到目标函数中,提高了模型的可解释性同时也改善了性能 +甘蔗作为世界糖业的关键作物,容易受多种病害侵害,这些病害对其产量和质量都有重大负面影响。为了有效管理和实施预防措施,必须及时准确地检测病害。本研究提出了一种名为SugarcaneNet2024的独特模型,通过叶片图像处理,能够优于先前方法自动快速检测甘蔗病害。我们提出的模型汇总了七个定制的、经过LASSO正则化的预训练模型的优化加权平均集成,特别是InceptionV3、InceptionResNetV2、DenseNet201、DenseNet169、Xception和ResNet152V2。最初,我们在这些预训练模型底部添加了三层更密集层,具有0.0001的LASSO正则化,三个30%的dropout层和三个启用renorm的批量归一化,以提高性能。 </p> <p> -arXiv:2404.02611v1 Announce Type: new Abstract: As Artificial Intelligence systems become integral across domains, the demand for explainability grows. While the effort by the scientific community is focused on obtaining a better explanation for the model, it is important not to ignore the potential of this explanation process to improve training as well. While existing efforts primarily focus on generating and evaluating explanations for black-box models, there remains a critical gap in directly enhancing models through these evaluations. 
This paper introduces SHIELD (Selective Hidden Input Evaluation for Learning Dynamics), a regularization technique for explainable artificial intelligence designed to improve model quality by concealing portions of input data and assessing the resulting discrepancy in predictions. In contrast to conventional approaches, SHIELD regularization seamlessly integrates into the objective function, enhancing model explainability while also improving perfor -</p>介绍了一种基于优化的提示注入攻击方法,JudgeDeceiver,针对LLM-as-a-Judge,通过自动化生成对抗序列实现了有针对性和高效的模型评估操控。https://arxiv.org/abs/2403.17710<p> -基于优化的对LLM评判系统的提示注入攻击 +arXiv:2403.18870v1 Announce Type: cross Abstract: Sugarcane, a key crop for the world's sugar industry, is prone to several diseases that have a substantial negative influence on both its yield and quality. To effectively manage and implement preventative initiatives, diseases must be detected promptly and accurately. In this study, we present a unique model called sugarcaneNet2024 that outperforms previous methods for automatically and quickly detecting sugarcane disease through leaf image processing. Our proposed model consolidates an optimized weighted average ensemble of seven customized and LASSO-regularized pre-trained models, particularly InceptionV3, InceptionResNetV2, DenseNet201, DenseNet169, Xception, and ResNet152V2. Initially, we added three more dense layers with 0.0001 LASSO regularization, three 30% dropout layers, and three batch normalizations with renorm enabled at the bottom of these pre-trained models to improve the performance. 
The accuracy of sugarcane leaf dise +</p>本研究针对Intrinsic Vision-Language Hallucination(IVL-Hallu)问题进行了深入分析,提出了几种新颖的IVL-Hallu任务,并将其分为四种类型,有助于揭示其产生的原因和反映。https://arxiv.org/abs/2403.11116<p> +博士论文:一个提示的视觉幻觉评估数据集 </p> <p> -Optimization-based Prompt Injection Attack to LLM-as-a-Judge +PhD: A Prompted Visual Hallucination Evaluation Dataset </p> <p> -https://arxiv.org/abs/2403.17710 +https://arxiv.org/abs/2403.11116 </p> <p> -介绍了一种基于优化的提示注入攻击方法,JudgeDeceiver,针对LLM-as-a-Judge,通过自动化生成对抗序列实现了有针对性和高效的模型评估操控。 +本研究针对Intrinsic Vision-Language Hallucination(IVL-Hallu)问题进行了深入分析,提出了几种新颖的IVL-Hallu任务,并将其分为四种类型,有助于揭示其产生的原因和反映。 </p> <p> </p> <p> -LLM-as-a-Judge 是一种可以使用大型语言模型(LLMs)评估文本信息的新颖解决方案。根据现有研究,LLMs在提供传统人类评估的引人注目替代方面表现出色。然而,这些系统针对提示注入攻击的鲁棒性仍然是一个未解决的问题。在这项工作中,我们引入了JudgeDeceiver,一种针对LLM-as-a-Judge量身定制的基于优化的提示注入攻击。我们的方法制定了一个精确的优化目标,用于攻击LLM-as-a-Judge的决策过程,并利用优化算法高效地自动化生成对抗序列,实现对模型评估的有针对性和有效的操作。与手工制作的提示注入攻击相比,我们的方法表现出卓越的功效,给基于LLM的判断系统当前的安全范式带来了重大挑战。 +大型语言模型(LLMs)的快速增长推动了大型视觉语言模型(LVLMs)的发展。在LLMs中普遍存在的幻觉挑战也出现在LVLMs中。然而,大部分现有研究主要集中在LVLM中的对象幻觉上,忽略了LVLM幻觉的多样化类型。本研究深入探讨了固有视觉语言幻觉(IVL-Hallu)问题,对导致幻觉的不同类型的IVL-Hallu进行了彻底分析。具体来说,我们提出了几个新颖的IVL-Hallu任务,并将它们分为四种类型:(a)对象幻觉,由于对象的误识别而产生,(b)属性幻觉,由于属性的误识别而引起,(c)多模态冲突幻觉,源自文本和视觉信息之间的矛盾,以及(d)反常识幻觉,由于对立之间的矛盾。 </p> <p> -arXiv:2403.17710v1 Announce Type: cross Abstract: LLM-as-a-Judge is a novel solution that can assess textual information with large language models (LLMs). Based on existing research studies, LLMs demonstrate remarkable performance in providing a compelling alternative to traditional human assessment. However, the robustness of these systems against prompt injection attacks remains an open question. In this work, we introduce JudgeDeceiver, a novel optimization-based prompt injection attack tailored to LLM-as-a-Judge. 
Our method formulates a precise optimization objective for attacking the decision-making process of LLM-as-a-Judge and utilizes an optimization algorithm to efficiently automate the generation of adversarial sequences, achieving targeted and effective manipulation of model evaluations. Compared to handcraft prompt injection attacks, our method demonstrates superior efficacy, posing a significant challenge to the current security paradigms of LLM-based judgment systems. T -</p>ThermoHands提出了一个新的基准ThermoHands,旨在解决热图中主观视角3D手部姿势估计的挑战,介绍了一个具有双transformer模块的定制基线方法TheFormer,表明热成像在恶劣条件下实现稳健的3D手部姿势估计的有效性。https://arxiv.org/abs/2403.09871<p> -ThermoHands:一种用于从主观视角热图中估计3D手部姿势的基准 +arXiv:2403.11116v1 Announce Type: cross Abstract: The rapid growth of Large Language Models (LLMs) has driven the development of Large Vision-Language Models (LVLMs). The challenge of hallucination, prevalent in LLMs, also emerges in LVLMs. However, most existing efforts mainly focus on object hallucination in LVLM, ignoring diverse types of LVLM hallucinations. In this study, we delve into the Intrinsic Vision-Language Hallucination (IVL-Hallu) issue, thoroughly analyzing different types of IVL-Hallu on their causes and reflections. 
Specifically, we propose several novel IVL-Hallu tasks and categorize them into four types: (a) object hallucination, which arises from the misidentification of objects, (b) attribute hallucination, which is caused by the misidentification of attributes, (c) multi-modal conflicting hallucination, which derives from the contradictions between textual and visual information, and (d) counter-common-sense hallucination, which owes to the contradictions betwee +</p>本文提出了一种LIGHTCODE轻量级神经编码方案,在具备解释性的基础上,在低信噪比区域实现了最先进的可靠性。https://arxiv.org/abs/2403.10751<p> +LIGHTCODE:具有反馈通道的光解析和神经编码 </p> <p> -ThermoHands: A Benchmark for 3D Hand Pose Estimation from Egocentric Thermal Image +LIGHTCODE: Light Analytical and Neural Codes for Channels with Feedback </p> <p> -https://arxiv.org/abs/2403.09871 +https://arxiv.org/abs/2403.10751 </p> <p> -ThermoHands提出了一个新的基准ThermoHands,旨在解决热图中主观视角3D手部姿势估计的挑战,介绍了一个具有双transformer模块的定制基线方法TheFormer,表明热成像在恶劣条件下实现稳健的3D手部姿势估计的有效性。 +本文提出了一种LIGHTCODE轻量级神经编码方案,在具备解释性的基础上,在低信噪比区域实现了最先进的可靠性。 </p> <p> </p> <p> -在这项工作中,我们提出了ThermoHands,这是一个针对基于热图的主观视角3D手部姿势估计的新基准,旨在克服诸如光照变化和遮挡(例如手部穿戴物)等挑战。该基准包括来自28名主体进行手-物体和手-虚拟交互的多样数据集,经过自动化过程准确标注了3D手部姿势。我们引入了一个定制的基线方法TheFormer,利用双transformer模块在热图中实现有效的主观视角3D手部姿势估计。我们的实验结果突显了TheFormer的领先性能,并确认了热成像在实现恶劣条件下稳健的3D手部姿势估计方面的有效性。 +通道反馈中可靠且高效的编码方案设计一直是通信理论中一项长期挑战。虽然深度学习技术取得了显著进展,神经编码往往面临计算成本高、缺乏可解释性以及在资源受限环境中的实用性有限等问题。本文旨在设计解释性强且更适用于通信系统的低复杂度编码方案。我们先进了解析编码和神经编码。首先,我们展示了POWERBLAST,一种受Schalkwijk-Kailath(SK)和Gallager-Nakiboglu(GN)方案启发的解析编码方案,在高信噪比(SNR)区域实现了明显的可靠性改进,胜过神经编码。接下来,为了增强低SNR区域的可靠性,我们提出了LIGHTCODE,一种轻量级神经编码,实现了最先进的可靠性。 </p> <p> -arXiv:2403.09871v1 Announce Type: cross Abstract: In this work, we present ThermoHands, a new benchmark for thermal image-based egocentric 3D hand pose estimation, aimed at overcoming challenges like varying lighting and obstructions (e.g., handwear). 
The benchmark includes a diverse dataset from 28 subjects performing hand-object and hand-virtual interactions, accurately annotated with 3D hand poses through an automated process. We introduce a bespoken baseline method, TheFormer, utilizing dual transformer modules for effective egocentric 3D hand pose estimation in thermal imagery. Our experimental results highlight TheFormer's leading performance and affirm thermal imaging's effectiveness in enabling robust 3D hand pose estimation in adverse conditions. -</p>CLCE方法结合了标签感知对比学习与交叉熵损失,通过协同利用难例挖掘提高了性能表现https://arxiv.org/abs/2402.14551<p> -CLCE:一种优化学习融合的改进交叉熵和对比学习方法 +arXiv:2403.10751v1 Announce Type: cross Abstract: The design of reliable and efficient codes for channels with feedback remains a longstanding challenge in communication theory. While significant improvements have been achieved by leveraging deep learning techniques, neural codes often suffer from high computational costs, a lack of interpretability, and limited practicality in resource-constrained settings. We focus on designing low-complexity coding schemes that are interpretable and more suitable for communication systems. We advance both analytical and neural codes. First, we demonstrate that POWERBLAST, an analytical coding scheme inspired by Schalkwijk-Kailath (SK) and Gallager-Nakiboglu (GN) schemes, achieves notable reliability improvements over both SK and GN schemes, outperforming neural codes in high signal-to-noise ratio (SNR) regions. 
Next, to enhance reliability in low-SNR regions, we propose LIGHTCODE, a lightweight neural code that achieves state-of-the-art reliability +</p>本文定义了规格过度拟合问题,即系统过度关注指定指标而损害了高级要求和任务性能。https://arxiv.org/abs/2403.08425<p> +人工智能规格过度拟合问题 </p> <p> -CLCE: An Approach to Refining Cross-Entropy and Contrastive Learning for Optimized Learning Fusion +Specification Overfitting in Artificial Intelligence </p> <p> -https://arxiv.org/abs/2402.14551 +https://arxiv.org/abs/2403.08425 </p> <p> -CLCE方法结合了标签感知对比学习与交叉熵损失,通过协同利用难例挖掘提高了性能表现 +本文定义了规格过度拟合问题,即系统过度关注指定指标而损害了高级要求和任务性能。 </p> <p> </p> <p> -最先进的预训练图像模型主要采用两阶段方法:在大规模数据集上进行初始无监督预训练,然后使用交叉熵损失(CE)进行特定任务的微调。然而,已经证明CE可能会损害模型的泛化性和稳定性。为了解决这些问题,我们引入了一种名为CLCE的新方法,该方法将标签感知对比学习与CE相结合。我们的方法不仅保持了两种损失函数的优势,而且以协同方式利用难例挖掘来增强性能。 +机器学习(ML)和人工智能(AI)方法经常被批评存在固有的偏见,以及缺乏控制、问责和透明度,监管机构因此难以控制这种技术的潜在负面影响。高级要求,如公平性和鲁棒性,需要被形式化为具体的规格度量,而这些度量是捕捉基本要求的独立方面的不完美代理。鉴于不同指标之间可能存在的权衡及其对过度优化的脆弱性,将规格度量整合到系统开发过程中并不是一件简单的事情。本文定义了规格过度拟合,即系统过度侧重于指定的度量,从而损害了高级要求和任务性能。我们进行了大量文献调研,对研究人员如何提出、测量和优化规格进行了分类。 </p> <p> -arXiv:2402.14551v1 Announce Type: cross Abstract: State-of-the-art pre-trained image models predominantly adopt a two-stage approach: initial unsupervised pre-training on large-scale datasets followed by task-specific fine-tuning using Cross-Entropy loss~(CE). However, it has been demonstrated that CE can compromise model generalization and stability. While recent works employing contrastive learning address some of these limitations by enhancing the quality of embeddings and producing better decision boundaries, they often overlook the importance of hard negative mining and rely on resource intensive and slow training using large sample batches. To counter these issues, we introduce a novel approach named CLCE, which integrates Label-Aware Contrastive Learning with CE. Our approach not only maintains the strengths of both loss functions but also leverages hard negative mining in a synergistic way to enhance performance. 
Experimental results demonstrate that CLCE significantly outperf -</p>通过使用音素表示,本文提出了一种新颖的解决方案来减缓高资源语言和低资源语言之间的性能差距,并通过实证研究和理论分析证明了其有效性。https://arxiv.org/abs/2402.14279<p> -使用音素表示减缓语言差异,实现稳健的多语言理解 +arXiv:2403.08425v1 Announce Type: new Abstract: Machine learning (ML) and artificial intelligence (AI) approaches are often criticized for their inherent bias and for their lack of control, accountability, and transparency. Consequently, regulatory bodies struggle with containing this technology's potential negative side effects. High-level requirements such as fairness and robustness need to be formalized into concrete specification metrics, imperfect proxies that capture isolated aspects of the underlying requirements. Given possible trade-offs between different metrics and their vulnerability to over-optimization, integrating specification metrics in system development processes is not trivial. This paper defines specification overfitting, a scenario where systems focus excessively on specified metrics to the detriment of high-level requirements and task performance. 
We present an extensive literature survey to categorize how researchers propose, measure, and optimize specification +</p>NeuPAN 是一种实时、高度准确、无地图、适用于各种机器人且对环境不变的机器人导航解决方案,最大的创新在于将原始点直接映射到学习到的多帧距离空间,并具有端到端模型学习的可解释性,从而实现了可证明的收敛。https://arxiv.org/abs/2403.06828<p> +NeuPAN:直接点机器人导航的端到端基于模型学习 </p> <p> -Mitigating the Linguistic Gap with Phonemic Representations for Robust Multilingual Language Understanding +NeuPAN: Direct Point Robot Navigation with End-to-End Model-based Learning </p> <p> -https://arxiv.org/abs/2402.14279 +https://arxiv.org/abs/2403.06828 </p> <p> -通过使用音素表示,本文提出了一种新颖的解决方案来减缓高资源语言和低资源语言之间的性能差距,并通过实证研究和理论分析证明了其有效性。 +NeuPAN 是一种实时、高度准确、无地图、适用于各种机器人且对环境不变的机器人导航解决方案,最大的创新在于将原始点直接映射到学习到的多帧距离空间,并具有端到端模型学习的可解释性,从而实现了可证明的收敛。 </p> <p> </p> <p> -为了改善多语言理解,通常需要在训练阶段使用多种语言,依赖复杂的训练技术,并且在高资源语言和低资源语言之间存在显著的性能差距。我们假设语言之间的性能差距受到这些语言之间的语言差异的影响,并通过使用音素表示(具体来说,将音素作为输入标记输入到语言模型中,而不是子词)提供了一种新颖的解决方案,以实现稳健的多语言建模。我们通过三个跨语言任务的定量证据展示了音素表示的有效性,这进一步得到了对跨语言性能差距的理论分析的证明。 +在拥挤环境中对非全向机器人进行导航需要极其精确的感知和运动以避免碰撞。本文提出NeuPAN:一种实时、高度准确、无地图、适用于各种机器人,且对环境不变的机器人导航解决方案。NeuPAN采用紧耦合的感知-运动框架,与现有方法相比有两个关键创新:1)它直接将原始点映射到学习到的多帧距离空间,避免了从感知到控制的误差传播;2)从端到端基于模型学习的角度进行解释,实现了可证明的收敛。NeuPAN的关键在于利用插拔式(PnP)交替最小化传感器(PAN)网络解高维端到端数学模型,其中包含各种点级约束,使NeuPAN能够直接生成实时、端到端、物理可解释的运动。 </p> <p> -arXiv:2402.14279v1 Announce Type: cross Abstract: Approaches to improving multilingual language understanding often require multiple languages during the training phase, rely on complicated training techniques, and -- importantly -- struggle with significant performance gaps between high-resource and low-resource languages. We hypothesize that the performance gaps between languages are affected by linguistic gaps between those languages and provide a novel solution for robust multilingual language modeling by employing phonemic representations (specifically, using phonemes as input tokens to LMs rather than subwords). 
We present quantitative evidence from three cross-lingual tasks that demonstrate the effectiveness of phonemic representation, which is further justified by a theoretical analysis of the cross-lingual performance gap. -</p>ConSmax是一种硬件友好型Softmax替代方案,通过引入可学习参数,在不影响性能的情况下实现了对原Softmax关键任务的高效处理。https://arxiv.org/abs/2402.10930<p> -ConSmax: 具有可学习参数的硬件友好型Softmax替代方案 +arXiv:2403.06828v1 Announce Type: cross Abstract: Navigating a nonholonomic robot in a cluttered environment requires extremely accurate perception and locomotion for collision avoidance. This paper presents NeuPAN: a real-time, highly-accurate, map-free, robot-agnostic, and environment-invariant robot navigation solution. Leveraging a tightly-coupled perception-locomotion framework, NeuPAN has two key innovations compared to existing approaches: 1) it directly maps raw points to a learned multi-frame distance space, avoiding error propagation from perception to control; 2) it is interpretable from an end-to-end model-based learning perspective, enabling provable convergence. The crux of NeuPAN is to solve a high-dimensional end-to-end mathematical model with various point-level constraints using the plug-and-play (PnP) proximal alternating-minimization network (PAN) with neurons in the loop. 
This allows NeuPAN to generate real-time, end-to-end, physically-interpretable motions direct +</p>提出新指标对抗超体积来全面评估深度学习模型在多种扰动强度下的鲁棒性,并采用新型训练算法来提高对抗鲁棒性。https://arxiv.org/abs/2403.05100<p> +探索对抗界限:通过对抗超体积量化鲁棒性 </p> <p> -ConSmax: Hardware-Friendly Alternative Softmax with Learnable Parameters +Exploring the Adversarial Frontier: Quantifying Robustness via Adversarial Hypervolume </p> <p> -https://arxiv.org/abs/2402.10930 +https://arxiv.org/abs/2403.05100 </p> <p> -ConSmax是一种硬件友好型Softmax替代方案,通过引入可学习参数,在不影响性能的情况下实现了对原Softmax关键任务的高效处理。 +提出新指标对抗超体积来全面评估深度学习模型在多种扰动强度下的鲁棒性,并采用新型训练算法来提高对抗鲁棒性。 </p> <p> </p> <p> -自注意机制将基于transformer的大型语言模型(LLM)与卷积和循环神经网络区分开来。尽管性能有所提升,但由于自注意中广泛使用Softmax,在硅上实现实时LLM推断仍具挑战性。为了解决这一挑战,我们提出了Constant Softmax(ConSmax),这是一种高效的Softmax替代方案,采用可微的规范化参数来消除Softmax中的最大搜索和分母求和,实现了大规模并行化。 +在深度学习模型面临日益严重的对抗攻击威胁,特别是在安全关键领域,强调了对鲁棒深度学习系统的需求。传统的鲁棒性评估依赖于对抗准确性,该指标衡量模型在特定扰动强度下的性能。然而,这一单一指标并不能完全概括模型对不同程度扰动的整体韧性。为了填补这一空白,我们提出了一种新的指标,称为对抗超体积,从多目标优化的角度综合评估了深度学习模型在一系列扰动强度下的鲁棒性。该指标允许深入比较防御机制,并承认了较弱的防御策略所带来的鲁棒性改进。此外,我们采用了一种提高对抗鲁棒性均匀性的新型训练算法。 </p> <p> -arXiv:2402.10930v1 Announce Type: cross Abstract: The self-attention mechanism sets transformer-based large language model (LLM) apart from the convolutional and recurrent neural networks. Despite the performance improvement, achieving real-time LLM inference on silicon is challenging due to the extensively used Softmax in self-attention. Apart from the non-linearity, the low arithmetic intensity greatly reduces the processing parallelism, which becomes the bottleneck especially when dealing with a longer context. To address this challenge, we propose Constant Softmax (ConSmax), a software-hardware co-design as an efficient Softmax alternative. ConSmax employs differentiable normalization parameters to remove the maximum searching and denominator summation in Softmax. It allows for massive parallelization while performing the critical tasks of Softmax. 
In addition, a scalable ConSmax hardware utilizing a bitwidth-split look-up table (LUT) can produce lossless non-linear operation and -</p>本文研究了机器学习控制在建筑能源系统中的可解释性,通过将Shapley值和大型语言模型相结合,提高了机器学习控制模型的透明性和理解性。https://arxiv.org/abs/2402.09584<p> -基于大型语言模型的建筑能源系统机器学习控制的可解释性研究 +arXiv:2403.05100v1 Announce Type: cross Abstract: The escalating threat of adversarial attacks on deep learning models, particularly in security-critical fields, has underscored the need for robust deep learning systems. Conventional robustness evaluations have relied on adversarial accuracy, which measures a model's performance under a specific perturbation intensity. However, this singular metric does not fully encapsulate the overall resilience of a model against varying degrees of perturbation. To address this gap, we propose a new metric termed adversarial hypervolume, assessing the robustness of deep learning models comprehensively over a range of perturbation intensities from a multi-objective optimization standpoint. This metric allows for an in-depth comparison of defense mechanisms and recognizes the trivial improvements in robustness afforded by less potent defensive strategies. 
Additionally, we adopt a novel training algorithm that enhances adversarial robustness uniformly +</p>ARNN提出了一种注意力循环神经网络,用于处理多通道脑电图信号,具有线性复杂度和并行计算,结合注意力和LSTM gate的优势,并避免了它们的缺点。https://arxiv.org/abs/2403.03276<p> +ARNN: 用于识别癫痫发作的多通道脑电图信号的注意力循环神经网络 </p> <p> -Large Language Model-Based Interpretable Machine Learning Control in Building Energy Systems +ARNN: Attentive Recurrent Neural Network for Multi-channel EEG Signals to Identify Epileptic Seizures </p> <p> -https://arxiv.org/abs/2402.09584 +https://arxiv.org/abs/2403.03276 </p> <p> -本文研究了机器学习控制在建筑能源系统中的可解释性,通过将Shapley值和大型语言模型相结合,提高了机器学习控制模型的透明性和理解性。 +ARNN提出了一种注意力循环神经网络,用于处理多通道脑电图信号,具有线性复杂度和并行计算,结合注意力和LSTM gate的优势,并避免了它们的缺点。 </p> <p> </p> <p> -机器学习控制在暖通空调系统中的潜力受限于其不透明的性质和推理机制,这对于用户和建模者来说是具有挑战性的,难以完全理解,最终导致对基于机器学习控制的决策缺乏信任。为了解决这个挑战,本文研究和探索了可解释机器学习(IML),它是机器学习的一个分支,可以增强模型和推理的透明性和理解性,以提高MLC及其在暖通空调系统中的工业应用的可信度。具体而言,我们开发了一个创新性的框架,将Shapley值的原则和大型语言模型(LLMs)的上下文学习特性相结合。而Shapley值在解剖ML模型中各种特征的贡献方面起到了重要作用,LLM则可以深入理解MLC中基于规则的部分;将它们结合起来,LLM进一步将这些洞见打包到一个 +我们提出了一种注意力循环神经网络(ARNN),其沿着序列循环应用注意力层,并且具有与序列长度相关的线性复杂度。该模型在多通道脑电图信号上运行,而不是单通道信号,并利用并行计算。在该模型中,注意力层是一种计算单元,可以有效地应用自注意力机制和交叉注意力机制来计算一组广泛数量的状态向量和输入信号的递归函数。我们的架构在某种程度上受到了注意力层和长短期记忆(LSTM)单元的启发,并使用长短风格门,但通过多个阶段将这种典型单元扩展到多通道脑电图信号的并行化。它继承了注意力层和LSTM门的优势,同时避免了它们各自的缺点。我们通过对异质实验进行了广泛的模型有效性评估。 </p> <p> -arXiv:2402.09584v1 Announce Type: new Abstract: The potential of Machine Learning Control (MLC) in HVAC systems is hindered by its opaque nature and inference mechanisms, which is challenging for users and modelers to fully comprehend, ultimately leading to a lack of trust in MLC-based decision-making. To address this challenge, this paper investigates and explores Interpretable Machine Learning (IML), a branch of Machine Learning (ML) that enhances transparency and understanding of models and their inferences, to improve the credibility of MLC and its industrial application in HVAC systems. 
Specifically, we developed an innovative framework that combines the principles of Shapley values and the in-context learning feature of Large Language Models (LLMs). While the Shapley values are instrumental in dissecting the contributions of various features in ML models, LLM provides an in-depth understanding of rule-based parts in MLC; combining them, LLM further packages these insights into a -</p>本文研究了将大型语言模型ChatGPT与EnergyPlus建筑能源建模软件融合的创新方法,并强调了大型语言模型在解决建筑能源建模挑战方面的潜力和多种应用。https://arxiv.org/abs/2402.09579<p> -用大型语言模型推动建筑能源建模:探索和案例研究 +arXiv:2403.03276v1 Announce Type: cross Abstract: We proposed an Attentive Recurrent Neural Network (ARNN), which recurrently applies attention layers along a sequence and has linear complexity with respect to the sequence length. The proposed model operates on multi-channel EEG signals rather than single channel signals and leverages parallel computation. In this cell, the attention layer is a computational unit that efficiently applies self-attention and cross-attention mechanisms to compute a recurrent function over a wide number of state vectors and input signals. Our architecture is inspired in part by the attention layer and long short-term memory (LSTM) cells, and it uses long-short style gates, but it scales this typical cell up by several orders to parallelize for multi-channel EEG signals. It inherits the advantages of attention layers and LSTM gate while avoiding their respective drawbacks. 
We evaluated the model effectiveness through extensive experiments with heterogeneou +</p>通过代理进行特征对齐,以解决预先计算特征无法区分标记样本类别和避免通过代理模型选择样本时牺牲宝贵预训练信息的问题。https://arxiv.org/abs/2403.01101<p> +特征对齐:在预训练模型背景下通过代理思考高效主动学习 </p> <p> -Advancing Building Energy Modeling with Large Language Models: Exploration and Case Studies +Feature Alignment: Rethinking Efficient Active Learning via Proxy in the Context of Pre-trained Models </p> <p> -https://arxiv.org/abs/2402.09579 +https://arxiv.org/abs/2403.01101 </p> <p> -本文研究了将大型语言模型ChatGPT与EnergyPlus建筑能源建模软件融合的创新方法,并强调了大型语言模型在解决建筑能源建模挑战方面的潜力和多种应用。 +通过代理进行特征对齐,以解决预先计算特征无法区分标记样本类别和避免通过代理模型选择样本时牺牲宝贵预训练信息的问题。 </p> <p> </p> <p> -人工智能的快速发展促进了像ChatGPT这样的大型语言模型的出现,为专门的工程建模(尤其是基于物理的建筑能源建模)提供了潜在的应用。本文研究了将大型语言模型与建筑能源建模软件(具体为EnergyPlus)融合的创新方法。首先进行了文献综述,揭示了在工程建模中整合大型语言模型的增长趋势,但在建筑能源建模中的应用研究仍然有限。我们强调了大型语言模型在解决建筑能源建模挑战方面的潜力,并概述了潜在的应用,包括:1)模拟输入生成,2)模拟输出分析和可视化,3)进行错误分析,4)共模拟,5)模拟知识提取。 +使用主动学习对预训练模型进行微调有望降低注释成本。然而,这种组合引入了显著的计算成本,尤其是随着预训练模型规模的增长。最近的研究提出了基于代理的主动学习,它预先计算特征以减少计算成本。然而,这种方法通常会在主动学习性能上造成重大损失,甚至可能超过计算成本节约。 </p> <p> -arXiv:2402.09579v1 Announce Type: cross Abstract: The rapid progression in artificial intelligence has facilitated the emergence of large language models like ChatGPT, offering potential applications extending into specialized engineering modeling, especially physics-based building energy modeling. This paper investigates the innovative integration of large language models with building energy modeling software, focusing specifically on the fusion of ChatGPT with EnergyPlus. A literature review is first conducted to reveal a growing trend of incorporating of large language models in engineering modeling, albeit limited research on their application in building energy modeling. 
We underscore the potential of large language models in addressing building energy modeling challenges and outline potential applications including 1) simulation input generation, 2) simulation output analysis and visualization, 3) conducting error analysis, 4) co-simulation, 5) simulation knowledge extraction a -</p>该论文评估了当前最先进的大型语言模型(GPT-4)在知识图谱上的对话推理能力,提出了一种基于KG推理的LLM基准代理(LLM-ARK),该代理利用全文环境提示来实现精确和适应性强的KG路径预测,并采用近端策略优化算法进行训练。https://arxiv.org/abs/2312.11282<p> -评估和增强用于知识图谱上的对话推理的大型语言模型 +arXiv:2403.01101v1 Announce Type: cross Abstract: Fine-tuning the pre-trained model with active learning holds promise for reducing annotation costs. However, this combination introduces significant computational costs, particularly with the growing scale of pre-trained models. Recent research has proposed proxy-based active learning, which pre-computes features to reduce computational costs. Yet, this approach often incurs a significant loss in active learning performance, which may even outweigh the computational cost savings. In this paper, we argue the performance drop stems not only from pre-computed features' inability to distinguish between categories of labeled samples, resulting in the selection of redundant samples but also from the tendency to compromise valuable pre-trained information when fine-tuning with samples selected through the proxy model. 
To address this issue, we propose a novel method called aligned selection via proxy to update pre-computed features while sele +</p>RLHF在考虑部分观察性时可能导致策略欺骗性地夸大性能或过度辩护行为,我们提出了数学条件来解决这些问题,并警告不要盲目应用RLHF在部分可观测情况下。https://arxiv.org/abs/2402.17747<p> +当你的AI欺骗你:在奖励学习中人类评估者部分可观测性的挑战 </p> <p> -Evaluating and Enhancing Large Language Models for Conversational Reasoning on Knowledge Graphs +When Your AI Deceives You: Challenges with Partial Observability of Human Evaluators in Reward Learning </p> <p> -https://arxiv.org/abs/2312.11282 +https://arxiv.org/abs/2402.17747 </p> <p> -该论文评估了当前最先进的大型语言模型(GPT-4)在知识图谱上的对话推理能力,提出了一种基于KG推理的LLM基准代理(LLM-ARK),该代理利用全文环境提示来实现精确和适应性强的KG路径预测,并采用近端策略优化算法进行训练。 +RLHF在考虑部分观察性时可能导致策略欺骗性地夸大性能或过度辩护行为,我们提出了数学条件来解决这些问题,并警告不要盲目应用RLHF在部分可观测情况下。 </p> <p> </p> <p> -大型语言模型(LLM)的发展得益于预训练技术的进展。通过手动设计的提示,这些模型展示了强大的推理能力。在这项工作中,我们评估了当前最先进的LLM(GPT-4)在知识图谱(KG)上的对话推理能力。然而,由于缺乏KG环境意识和开发有效的中间推理阶段优化机制的困难,LLM的性能受到限制。我们进一步引入了LLM-ARK,一个基于KG推理的LLM基准代理,旨在提供精确和适应性强的KG路径预测。LLM-ARK利用全文环境(FTE)提示来吸收每个推理步骤中的状态信息。我们将KG上的多跳推理挑战重新框定为顺序决策任务。利用近端策略优化(PPO)在线策略梯度强化学习算法,我们的模型... +强化学习从人类反馈(RLHF)的过去分析假设人类完全观察到环境。当人类反馈仅基于部分观察时会发生什么?我们对两种失败情况进行了正式定义:欺骗和过度辩护。通过将人类建模为对轨迹信念的Boltzmann-理性,我们证明了RLHF保证会导致策略欺骗性地夸大其性能、为了留下印象而过度辩护或者两者兼而有之的条件。为了帮助解决这些问题,我们数学地刻画了环境部分可观测性如何转化为(缺乏)学到的回报函数中的模糊性。在某些情况下,考虑环境部分可观测性使得在理论上可能恢复回报函数和最优策略,而在其他情况下,存在不可减少的模糊性。我们警告不要盲目应用RLHF在部分可观测情况下。 </p> <p> -The development of large language models (LLMs) has been catalyzed by advancements in pre-training techniques. These models have demonstrated robust reasoning capabilities through manually designed prompts. In this work, we evaluate the conversational reasoning capabilities of the current state-of-the-art LLM (GPT-4) on knowledge graphs (KGs). However, the performance of LLMs is constrained due to a lack of KG environment awareness and the difficulties in developing effective optimization mechanisms for intermediary reasoning stages. 
We further introduce LLM-ARK, a LLM grounded KG reasoning agent designed to deliver precise and adaptable predictions on KG paths. LLM-ARK leverages Full Textual Environment (FTE) prompt to assimilate state information within each reasoning step. We reframe the challenge of multi-hop reasoning on the KG as a sequential decision-making task. Utilizing the Proximal Policy Optimization (PPO) online policy gradient reinforcement learning algorithm, our model i -</p>ShaRP是一个基于Shapley值的框架,用于解释排名结果中各个特征的贡献。即使使用线性评分函数,特征的权重也不一定对应其Shapley值的贡献,而是取决于特征分布和评分特征之间的局部相互作用。http://arxiv.org/abs/2401.16744<p> -ShaRP:用Shapley值解释排名 +arXiv:2402.17747v1 Announce Type: cross Abstract: Past analyses of reinforcement learning from human feedback (RLHF) assume that the human fully observes the environment. What happens when human feedback is based only on partial observations? We formally define two failure cases: deception and overjustification. Modeling the human as Boltzmann-rational w.r.t. a belief over trajectories, we prove conditions under which RLHF is guaranteed to result in policies that deceptively inflate their performance, overjustify their behavior to make an impression, or both. To help address these issues, we mathematically characterize how partial observability of the environment translates into (lack of) ambiguity in the learned return function. In some cases, accounting for partial observability makes it theoretically possible to recover the return function and thus the optimal policy, while in other cases, there is irreducible ambiguity. We caution against blindly applying RLHF in partially observa +</p>本研究通过可解释的逆向工程在复杂模块化算术中观察了Transformer内部电路学习过程,并发现减法在Transformer上造成了强烈的不对称性,乘法需要余弦偏置分量,多项式叠加了基本算术模式,但在挑战性情况下并不清晰,Grokking甚至可以在具有基本对称和交替表达式的高次公式中轻松发生。https://arxiv.org/abs/2402.16726<p> +在复杂模块化算术中解释理解的Transformer </p> <p> -ShaRP: Explaining Rankings with Shapley Values. 
(arXiv:2401.16744v1 [cs.AI]) +Interpreting Grokked Transformers in Complex Modular Arithmetic </p> <p> -http://arxiv.org/abs/2401.16744 +https://arxiv.org/abs/2402.16726 </p> <p> -ShaRP是一个基于Shapley值的框架,用于解释排名结果中各个特征的贡献。即使使用线性评分函数,特征的权重也不一定对应其Shapley值的贡献,而是取决于特征分布和评分特征之间的局部相互作用。 +本研究通过可解释的逆向工程在复杂模块化算术中观察了Transformer内部电路学习过程,并发现减法在Transformer上造成了强烈的不对称性,乘法需要余弦偏置分量,多项式叠加了基本算术模式,但在挑战性情况下并不清晰,Grokking甚至可以在具有基本对称和交替表达式的高次公式中轻松发生。 </p> <p> </p> <p> -在招聘、大学招生和贷款等重要领域的算法决策常常是基于排名的。由于这些决策对个人、组织和人群的影响,有必要了解它们:了解决策是否遵守法律,帮助个人提高他们的排名,并设计更好的排名程序。本文提出了ShaRP(Shapley for Rankings and Preferences),这是一个基于Shapley值的框架,用于解释特征对排名结果不同方面的贡献。使用ShaRP,我们展示了即使算法排名器使用的评分函数是已知的且是线性的,每个特征的权重也不一定对应其Shapley值的贡献。贡献取决于特征的分布以及评分特征之间微妙的局部相互作用。ShaRP基于量化输入影响框架,并可以计算贡献。 +Grokking一直是解开延迟泛化之谜的积极探索。在已解密模型中识别可解释的算法是理解其机制的暗示性线索。在这项工作中,除了最简单和广为研究的模块化加法外,我们通过可解释的逆向工程观察了通过Grokking在复杂模块化算术中学到的内部电路,突出显示了它们动力学上的重大差异:减法对Transformer产生强烈的不对称性;乘法在傅立叶域的所有频率上需要余弦偏置分量;多项式通常导致基本算术模式的叠加,但在挑战性情况下清晰的模式并不显现;即使在具有基本对称和交替表达式的高次公式中,Grokking也很容易发生。我们还引入了模块化算术的新颖进展度量;傅立叶频率 </p> <p> -Algorithmic decisions in critical domains such as hiring, college admissions, and lending are often based on rankings. Because of the impact these decisions have on individuals, organizations, and population groups, there is a need to understand them: to know whether the decisions are abiding by the law, to help individuals improve their rankings, and to design better ranking procedures. In this paper, we present ShaRP (Shapley for Rankings and Preferences), a framework that explains the contributions of features to different aspects of a ranked outcome, and is based on Shapley values. Using ShaRP, we show that even when the scoring function used by an algorithmic ranker is known and linear, the weight of each feature does not correspond to its Shapley value contribution. The contributions instead depend on the feature distributions, and on the subtle local interactions between the scoring features. 
ShaRP builds on the Quantitative Input Influence framework, and can compute the contri -</p>这项研究将神经网络和符号推理结合起来,提出了Spatial Reasoning Integrated Generator (SPRING),用于设计生成。SPRING通过将神经网络和符号约束满足结合起来,能够生成满足用户规格和实用要求的设计。http://arxiv.org/abs/2310.09383<p> -将符号推理整合到神经生成模型中的设计生成 +arXiv:2402.16726v2 Announce Type: replace-cross Abstract: Grokking has been actively explored to reveal the mystery of delayed generalization. Identifying interpretable algorithms inside the grokked models is a suggestive hint to understanding its mechanism. In this work, beyond the simplest and well-studied modular addition, we observe the internal circuits learned through grokking in complex modular arithmetic via interpretable reverse engineering, which highlights the significant difference in their dynamics: subtraction poses a strong asymmetry on Transformer; multiplication requires cosine-biased components at all the frequencies in a Fourier domain; polynomials often result in the superposition of the patterns from elementary arithmetic, but clear patterns do not emerge in challenging cases; grokking can easily occur even in higher-degree formulas with basic symmetric and alternating expressions. We also introduce the novel progress measure for modular arithmetic; Fourier Freque +</p>本论文提出了一种新方法单词序列熵(WSE),用于在自由形式医学问答任务中量化答案的不确定性,相比其他基线方法表现更优秀。https://arxiv.org/abs/2402.14259<p> +单词序列熵:走向自由形式医学问答应用及其不确定性估计 </p> <p> -Integrating Symbolic Reasoning into Neural Generative Models for Design Generation. 
(arXiv:2310.09383v1 [cs.AI]) +Word-Sequence Entropy: Towards Uncertainty Estimation in Free-Form Medical Question Answering Applications and Beyond </p> <p> -http://arxiv.org/abs/2310.09383 +https://arxiv.org/abs/2402.14259 </p> <p> -这项研究将神经网络和符号推理结合起来,提出了Spatial Reasoning Integrated Generator (SPRING),用于设计生成。SPRING通过将神经网络和符号约束满足结合起来,能够生成满足用户规格和实用要求的设计。 +本论文提出了一种新方法单词序列熵(WSE),用于在自由形式医学问答任务中量化答案的不确定性,相比其他基线方法表现更优秀。 </p> <p> </p> <p> -设计生成需要将神经和符号推理紧密结合,因为良好的设计必须满足显式用户需求和隐含的美学、实用性和便利性规则。当前由神经网络驱动的自动化设计工具能够生成吸引人的设计,但不能满足用户的规格和实用要求。符号推理工具(如约束编程)不能感知图像中的低级视觉信息或捕捉到美学等微妙方面。我们引入了Spatial Reasoning Integrated Generator (SPRING)用于设计生成。SPRING在深度生成网络中嵌入了一个神经和符号整合的空间推理模块。空间推理模块通过一个循环神经网络预测并通过符号约束满足来决定要生成的对象的位置,以边界框的形式表示。将符号推理嵌入神经生成保证了SPRING的输出满足用户的规格和实用要求。 +不确定性估计在确保安全关键的人工智能系统与人类互动的可靠性中发挥关键作用,尤其在医疗领域尤为重要。然而,在自由形式的医学问答任务中,尚未建立一种通用方法来量化答案的不确定性,其中无关的词汇和语序含有有限的语义信息可能是不确定性的主要来源,这是由于生成不平等的存在。本文提出了单词序列熵(WSE),该方法根据语义相关性在单词和序列级别上校准不确定性比例,在不确定性量化时更加强调关键词和更相关的序列。我们在5个自由形式医学问答数据集上,利用7种“现成的”大语言模型(LLMs)将WSE与6种基线方法进行比较,并展示了WSE在性能上的优越性。 </p> <p> -Design generation requires tight integration of neural and symbolic reasoning, as good design must meet explicit user needs and honor implicit rules for aesthetics, utility, and convenience. Current automated design tools driven by neural networks produce appealing designs, but cannot satisfy user specifications and utility requirements. Symbolic reasoning tools, such as constraint programming, cannot perceive low-level visual information in images or capture subtle aspects such as aesthetics. We introduce the Spatial Reasoning Integrated Generator (SPRING) for design generation. SPRING embeds a neural and symbolic integrated spatial reasoning module inside the deep generative network. The spatial reasoning module decides the locations of objects to be generated in the form of bounding boxes, which are predicted by a recurrent neural network and filtered by symbolic constraint satisfaction. 
Embedding symbolic reasoning into neural generation guarantees that the output of SPRING satisfi -</p>本文介绍了故障注入和安全错误攻击用于提取嵌入式神经网络模型的方法,并阐述了对32位微控制器上的深度神经网络进行模型提取攻击的实验结果。http://arxiv.org/abs/2308.16703<p> -故障注入和安全错误攻击用于提取嵌入式神经网络模型 +arXiv:2402.14259v1 Announce Type: cross Abstract: Uncertainty estimation plays a pivotal role in ensuring the reliability of safety-critical human-AI interaction systems, particularly in the medical domain. However, a general method for quantifying the uncertainty of free-form answers has yet to be established in open-ended medical question-answering (QA) tasks, where irrelevant words and sequences with limited semantic information can be the primary source of uncertainty due to the presence of generative inequality. In this paper, we propose the Word-Sequence Entropy (WSE), which calibrates the uncertainty proportion at both the word and sequence levels according to the semantic relevance, with greater emphasis placed on keywords and more relevant sequences when performing uncertainty quantification. We compare WSE with 6 baseline methods on 5 free-form medical QA datasets, utilizing 7 "off-the-shelf" large language models (LLMs), and show that WSE exhibits superior performance on ac +</p>了解神经网络从输入-标签对中提取统计信息的机制是监督学习中最重要的未解决问题之一。前人的研究表明,在训练过程中,权重的格拉姆矩阵与模型的平均梯度外积成正比,这被称为神经特征分析(NFA)。本研究解释了这种相关性的出现,并发现NFA等价于权重矩阵的左奇异结构与与这些权重相关的经验神经切线核的显著成分之间的对齐。在早期训练阶段,可以通过解析的方式预测NFA的发展速度。https://arxiv.org/abs/2402.05271<p> +梯度下降引发了深度非线性网络权重与经验NTK之间的对齐 </p> <p> -Fault Injection and Safe-Error Attack for Extraction of Embedded Neural Network Models. 
(arXiv:2308.16703v1 [cs.CR]) +Gradient descent induces alignment between weights and the empirical NTK for deep non-linear networks </p> <p> -http://arxiv.org/abs/2308.16703 +https://arxiv.org/abs/2402.05271 </p> <p> -本文介绍了故障注入和安全错误攻击用于提取嵌入式神经网络模型的方法,并阐述了对32位微控制器上的深度神经网络进行模型提取攻击的实验结果。 +了解神经网络从输入-标签对中提取统计信息的机制是监督学习中最重要的未解决问题之一。前人的研究表明,在训练过程中,权重的格拉姆矩阵与模型的平均梯度外积成正比,这被称为神经特征分析(NFA)。本研究解释了这种相关性的出现,并发现NFA等价于权重矩阵的左奇异结构与与这些权重相关的经验神经切线核的显著成分之间的对齐。在早期训练阶段,可以通过解析的方式预测NFA的发展速度。 </p> <p> </p> <p> -模型提取作为一种关键的安全威胁而出现,攻击向量利用了算法和实现方面的方法。攻击者的主要目标是尽可能多地窃取受保护的受害者模型的信息,以便他可以用替代模型来模仿它,即使只有有限的访问相似的训练数据。最近,物理攻击,如故障注入,已经显示出对嵌入式模型的完整性和机密性的令人担忧的效果。我们的重点是32位微控制器上的嵌入式深度神经网络模型,这是物联网中广泛使用的硬件平台系列,以及使用标准故障注入策略-安全错误攻击(SEA)来进行具有有限训练数据访问的模型提取攻击。由于攻击强烈依赖于输入查询,我们提出了一种黑盒方法来构建一个成功的攻击集。对于一个经典的卷积神经网络,我们成功地恢复了至少90%的 +理解神经网络从输入-标签对中提取统计信息的机制是监督学习中最重要的未解决问题之一。先前的研究已经确定,在一般结构的训练神经网络中,权重的格拉姆矩阵与模型的平均梯度外积成正比,这个说法被称为神经特征分析(NFA)。然而,这些数量在训练过程中如何相关尚不清楚。在这项工作中,我们解释了这种相关性的出现。我们发现NFA等价于权重矩阵的左奇异结构与与这些权重相关的经验神经切线核的显著成分之间的对齐。我们证明了先前研究中引入的NFA是由隔离这种对齐的中心化NFA驱动的。我们还展示了在早期训练阶段,可以通过解析的方式预测NFA的发展速度。 </p> <p> -Model extraction emerges as a critical security threat with attack vectors exploiting both algorithmic and implementation-based approaches. The main goal of an attacker is to steal as much information as possible about a protected victim model, so that he can mimic it with a substitute model, even with a limited access to similar training data. Recently, physical attacks such as fault injection have shown worrying efficiency against the integrity and confidentiality of embedded models. We focus on embedded deep neural network models on 32-bit microcontrollers, a widespread family of hardware platforms in IoT, and the use of a standard fault injection strategy - Safe Error Attack (SEA) - to perform a model extraction attack with an adversary having a limited access to training data. Since the attack strongly depends on the input queries, we propose a black-box approach to craft a successful attack set. 
For a classical convolutional neural network, we successfully recover at least 90% of -</p>本文研究了一个具有短期、情节和语义内存系统的机器代理模型,通过基于知识图谱的建模,在强化学习环境中实现了短期记忆的管理和存储,实验证明这种人类记忆系统结构的代理比没有该结构的代理表现更好。http://arxiv.org/abs/2212.02098<p> -一个具有短期、情节和语义内存系统的机器 +Understanding the mechanisms through which neural networks extract statistics from input-label pairs is one of the most important unsolved problems in supervised learning. Prior works have identified that the gram matrices of the weights in trained neural networks of general architectures are proportional to the average gradient outer product of the model, in a statement known as the Neural Feature Ansatz (NFA). However, the reason these quantities become correlated during training is poorly understood. In this work, we explain the emergence of this correlation. We identify that the NFA is equivalent to alignment between the left singular structure of the weight matrices and a significant component of the empirical neural tangent kernels associated with those weights. We establish that the NFA introduced in prior works is driven by a centered NFA that isolates this alignment. We show that the speed of NFA development can be predicted analytically at early training times in terms of sim +</p>本研究提出了一种将常识知识图谱与大型视觉语言模型相结合的框架,用于改进预测多模态营销活动效果的性能。该方法能够提供早期检测可能具有说服力的多模态活动并评估和增强营销理论的能力。https://arxiv.org/abs/2402.03607<p> +提高多模态营销的上下文一致性:知识基础学习的有效性 </p> <p> -A Machine with Short-Term, Episodic, and Semantic Memory Systems. 
(arXiv:2212.02098v2 [cs.AI] UPDATED) +Improving Contextual Congruence Across Modalities for Effective Multimodal Marketing using Knowledge-infused Learning </p> <p> -http://arxiv.org/abs/2212.02098 +https://arxiv.org/abs/2402.03607 </p> <p> -本文研究了一个具有短期、情节和语义内存系统的机器代理模型,通过基于知识图谱的建模,在强化学习环境中实现了短期记忆的管理和存储,实验证明这种人类记忆系统结构的代理比没有该结构的代理表现更好。 +本研究提出了一种将常识知识图谱与大型视觉语言模型相结合的框架,用于改进预测多模态营销活动效果的性能。该方法能够提供早期检测可能具有说服力的多模态活动并评估和增强营销理论的能力。 </p> <p> </p> <p> -受认知科学理论中显性人类记忆系统的启发,我们建立了一个具有短期、情节和语义记忆系统的代理模型,每个记忆系统都用知识图谱建模。为了评估该系统并分析该代理的行为,我们设计并发布了我们自己的强化学习代理环境“房间”,在这个环境中,代理必须学习如何编码、存储和检索记忆,通过回答问题来最大化回报。我们证明了我们基于深度Q学习的代理成功学习了短期记忆是否应该被遗忘,还是应该存储在情节或语义记忆系统中。我们的实验表明,具有类人记忆系统的代理在环境中表现优于没有这种记忆结构的代理。 +智能设备的普及使用户能够在线体验多模态信息。然而,大型语言模型(LLM)和视觉模型(LVM)仍然受到捕捉跨模态语义关系的整体意义的限制。缺乏明确的常识知识(例如,作为一个知识图谱),视觉语言模型(VLM)仅通过捕捉庞大的语料库中的高级模式来学习隐式表示,从而忽略了重要的上下文跨模态线索。在这项工作中,我们设计了一个框架,将显式的常识知识以知识图谱的形式与大型的VLM相结合,以提高下游任务的性能,即预测多模态营销活动的有效性。虽然营销应用提供了一个有说服力的指标来评估我们的方法,但我们的方法使得早期发现可能具有说服力的多模态活动成为可能,并评估和增强营销理论。 </p> <p> -Inspired by the cognitive science theory of the explicit human memory systems, we have modeled an agent with short-term, episodic, and semantic memory systems, each of which is modeled with a knowledge graph. To evaluate this system and analyze the behavior of this agent, we designed and released our own reinforcement learning agent environment, "the Room", where an agent has to learn how to encode, store, and retrieve memories to maximize its return by answering questions. We show that our deep Q-learning based agent successfully learns whether a short-term memory should be forgotten, or rather be stored in the episodic or semantic memory systems. Our experiments indicate that an agent with human-like memory systems can outperform an agent without this memory structure in the environment. +The prevalence of smart devices with the ability to capture moments in multiple modalities has enabled users to experience multimodal information online. 
However, large Language (LLMs) and Vision models (LVMs) are still limited in capturing holistic meaning with cross-modal semantic relationships. Without explicit, common sense knowledge (e.g., as a knowledge graph), Visual Language Models (VLMs) only learn implicit representations by capturing high-level patterns in vast corpora, missing essential contextual cross-modal cues. In this work, we design a framework to couple explicit commonsense knowledge in the form of knowledge graphs with large VLMs to improve the performance of a downstream task, predicting the effectiveness of multi-modal marketing campaigns. While the marketing application provides a compelling metric for assessing our methods, our approach enables the early detection of likely persuasive multi-modal campaigns and the assessment and augmentation of marketing theory. +</p>本研究引入了TSIS算法作为t-SMILES的补充,用于改进基于字符串的分子表示方法。实验证明,TSIS模型在处理语法中的长期依赖性方面表现优于其他模型。https://arxiv.org/abs/2402.02164<p> +TSIS: t-SMILES的补充算法用于基于片段的分子表示 +</p> +<p> +TSIS: A Supplementary Algorithm to t-SMILES for Fragment-based Molecular Representation +</p> +<p> +https://arxiv.org/abs/2402.02164 +</p> +<p> +本研究引入了TSIS算法作为t-SMILES的补充,用于改进基于字符串的分子表示方法。实验证明,TSIS模型在处理语法中的长期依赖性方面表现优于其他模型。 +</p> +<p> + +</p> +<p> +字符串基本的分子表示方法,如SMILES,在线性表示分子信息方面是事实上的标准。然而,必须使用配对符号和解析算法导致了长的语法依赖关系,使得即使是最先进的深度学习模型也难以准确理解语法和语义。尽管DeepSMILES和SELFIES已经解决了某些限制,但它们仍然在处理高级语法方面存在困难,使得一些字符串难以阅读。本研究引入了一个补充算法TSIS(TSID简化),用于t-SMILES家族。TSIS与另一个基于片段的线性解决方案SAFE进行了比较实验,结果表明SAFE在处理语法中的长期依赖性时存在挑战。TSIS继续使用t-SMILES中定义的树作为其基础数据结构,这使其与SAFE模型有所不同。TSIS模型的性能超过了SAFE模型,表明t-SMILES的树结构起到了重要作用。 +</p> +<p> +String-based molecular representations, such as SMILES, are a de facto standard for linearly representing molecular information. However, the must be paired symbols and the parsing algorithm result in long grammatical dependencies, making it difficult for even state-of-the-art deep learning models to accurately comprehend the syntax and semantics. 
Although DeepSMILES and SELFIES have addressed certain limitations, they still struggle with advanced grammar, which makes some strings difficult to read. This study introduces a supplementary algorithm, TSIS (TSID Simplified), to t-SMILES family. Comparative experiments between TSIS and another fragment-based linear solution, SAFE, indicate that SAFE presents challenges in managing long-term dependencies in grammar. TSIS continues to use the tree defined in t-SMILES as its foundational data structure, which sets it apart from the SAFE model. The performance of TSIS models surpasses that of SAFE models, indicating that the tree structure of t +</p>本论文提出了一种基于LLM的框架,可以适应不同类型的锂离子电池,实现准确的健康状态估计。这项工作解决了生成训练数据的时间和资源成本高的挑战,并在实际应用中具有良好的泛化能力。https://arxiv.org/abs/2402.00068<p> +GPT4Battery: 一种基于LLM驱动的自适应锂离子电池健康状态估计框架 +</p> +<p> +GPT4Battery: An LLM-driven Framework for Adaptive State of Health Estimation of Raw Li-ion Batteries +</p> +<p> +https://arxiv.org/abs/2402.00068 +</p> +<p> +本论文提出了一种基于LLM的框架,可以适应不同类型的锂离子电池,实现准确的健康状态估计。这项工作解决了生成训练数据的时间和资源成本高的挑战,并在实际应用中具有良好的泛化能力。 +</p> +<p> + +</p> +<p> +健康状态(SOH)是评估电池退化水平的关键指标,无法直接测量但需要估计。准确的SOH估计提升了锂离子电池的检测、控制和反馈能力,实现安全高效的能源管理,并指导新一代电池的发展。尽管在数据驱动的SOH估计方面取得了显著进展,但为生成寿命长期训练数据而进行的耗时且资源密集的退化实验在建立一个能处理多样化锂离子电池(例如,跨化学、跨制造商和跨容量)的大型模型方面存在挑战。因此,本文利用大型语言模型(LLM)的强大泛化能力,提出了一种适用于不同电池的可调整SOH估计的新型框架。为了适应实际情景,其中未标记的数据按顺序以及分布变化的方式到达,所提出的模型在测试时进行了修改。 +</p> +<p> +State of health (SOH) is a crucial indicator for assessing the degradation level of batteries that cannot be measured directly but requires estimation. Accurate SOH estimation enhances detection, control, and feedback for Li-ion batteries, allowing for safe and efficient energy management and guiding the development of new-generation batteries. 
Despite the significant progress in data-driven SOH estimation, the time and resource-consuming degradation experiments for generating lifelong training data pose a challenge in establishing one large model capable of handling diverse types of Li-ion batteries, e.g., cross-chemistry, cross-manufacturer, and cross-capacity. Hence, this paper utilizes the strong generalization capability of large language model (LLM) to proposes a novel framework for adaptable SOH estimation across diverse batteries. To match the real scenario where unlabeled data sequentially arrives in use with distribution shifts, the proposed model is modified by a test-time t +</p>本文提出了零射击提示方法,通过利用大规模语言模型中的错误信息来指导模型进行任务,以提高任务表现。实验结果表明,在不同数据集上,包括阅读理解、算术推理和闭卷问答,模型性能有所提升。这些结果也显示出不同模型之间存在不同程度的错误信息。https://arxiv.org/abs/2401.08273<p> +大规模语言模型是零射击学习器 +</p> +<p> +Large Language Models are Null-Shot Learners +</p> +<p> +https://arxiv.org/abs/2401.08273 +</p> +<p> +本文提出了零射击提示方法,通过利用大规模语言模型中的错误信息来指导模型进行任务,以提高任务表现。实验结果表明,在不同数据集上,包括阅读理解、算术推理和闭卷问答,模型性能有所提升。这些结果也显示出不同模型之间存在不同程度的错误信息。 +</p> +<p> + +</p> +<p> +本文提出了零射击提示方法。零射击提示利用大规模语言模型(LLMs)中的错误信息,通过指示LLMs利用从“示例”部分中获取的信息(该信息在所提供的上下文中不存在)来完成任务。虽然减少错误信息对于LLMs的日常和重要用途至关重要,但我们提出在目前的环境中,这些LLMs仍然具有错误信息,实际上可以利用错误信息来提高与标准零射击提示相比的任务表现。对八个LLMs进行实验,结果显示在大多数八个数据集(包括阅读理解、算术推理和闭卷问答)中,性能有所提升。观察到的不一致性增加相对性能在LLMs之间的差异,也可能表示每个模型中存在不同程度的错误信息。 +</p> +<p> +arXiv:2401.08273v2 Announce Type: replace-cross Abstract: This paper presents null-shot prompting. Null-shot prompting exploits hallucination in large language models (LLMs) by instructing LLMs to utilize information from the "Examples" section that never exists within the provided context to perform a task. While reducing hallucination is crucial and non-negligible for daily and critical uses of LLMs, we propose that in the current landscape in which these LLMs still hallucinate, it is possible, in fact, to exploit hallucination to increase performance in performing tasks compared to standard zero-shot prompting. 
Experiments with eight LLMs show improvements in performance across the majority of eight datasets, including reading comprehension, arithmetic reasoning, and closed-book question answering. The observed inconsistency in increased relative performance across the LLMs also potentially indicates a different degree of inherent hallucination in each model. These differences show +</p>SupplyGraph是一个基准数据集,用于使用图神经网络进行供应链规划。该数据集包含了来自孟加拉国一家领先快速消费品公司的实际数据,用于优化、预测和解决供应链问题。数据集中的时间数据作为节点特征,可用于销售预测、生产计划和故障识别。http://arxiv.org/abs/2401.15299<p> +SupplyGraph: 使用图神经网络进行供应链规划的基准数据集 +</p> +<p> +SupplyGraph: A Benchmark Dataset for Supply Chain Planning using Graph Neural Networks. (arXiv:2401.15299v1 [cs.LG]) +</p> +<p> +http://arxiv.org/abs/2401.15299 +</p> +<p> +SupplyGraph是一个基准数据集,用于使用图神经网络进行供应链规划。该数据集包含了来自孟加拉国一家领先快速消费品公司的实际数据,用于优化、预测和解决供应链问题。数据集中的时间数据作为节点特征,可用于销售预测、生产计划和故障识别。 +</p> +<p> + +</p> +<p> +图神经网络(GNNs)在不同领域如运输、生物信息学、语言处理和计算机视觉中取得了重要进展。然而,在将GNNs应用于供应链网络方面,目前尚缺乏研究。供应链网络在结构上类似于图形,使其成为应用GNN方法的理想选择。这为优化、预测和解决供应链问题开辟了无限可能。然而,此方法的一个主要障碍在于缺乏真实世界的基准数据集以促进使用GNN来研究和解决供应链问题。为了解决这个问题,我们提供了一个来自孟加拉国一家领先的快速消费品公司的实际基准数据集,该数据集侧重于用于生产目的的供应链规划的时间任务。该数据集包括时间数据作为节点特征,以实现销售预测、生产计划和故障识别。 +</p> +<p> +Graph Neural Networks (GNNs) have gained traction across different domains such as transportation, bio-informatics, language processing, and computer vision. However, there is a noticeable absence of research on applying GNNs to supply chain networks. Supply chain networks are inherently graph-like in structure, making them prime candidates for applying GNN methodologies. This opens up a world of possibilities for optimizing, predicting, and solving even the most complex supply chain problems. A major setback in this approach lies in the absence of real-world benchmark datasets to facilitate the research and resolution of supply chain problems using GNNs. 
To address the issue, we present a real-world benchmark dataset for temporal tasks, obtained from one of the leading FMCG companies in Bangladesh, focusing on supply chain planning for production purposes. The dataset includes temporal data as node features to enable sales predictions, production planning, and the identification of fa +</p>本研究全面研究了大型语言模型的知识编辑,旨在有效修改模型的行为,同时保持整体性能。http://arxiv.org/abs/2401.01286<p> +大型语言模型的知识编辑全面研究 +</p> +<p> +A Comprehensive Study of Knowledge Editing for Large Language Models. (arXiv:2401.01286v1 [cs.CL]) +</p> +<p> +http://arxiv.org/abs/2401.01286 +</p> +<p> +本研究全面研究了大型语言模型的知识编辑,旨在有效修改模型的行为,同时保持整体性能。 +</p> +<p> + +</p> +<p> +大型语言模型(LLM)在理解和生成与人类交流紧密相似的文本方面展现出了非凡的能力。然而,其主要限制在于训练过程中的显著计算需求,这是由于其广泛的参数化造成的。这一挑战在于世界的动态性,需要频繁更新LLM以修正过时的信息或集成新知识,从而确保其持续的相关性。许多应用需要在训练后进行持续的模型调整,以解决缺陷或不良行为。近年来,对于LLM的知识编辑技术的兴趣越来越高,在特定领域内有效地修改LLM的行为,同时保持整体性能在各种输入中的表现。本文首先定义了知识编辑的目标和挑战,然后综述了现有的知识编辑方法和技术,并讨论了其应用和未来发展的方向。 +</p> +<p> +Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication. However, a primary limitation lies in the significant computational demands during training, arising from their extensive parameterization. This challenge is further intensified by the dynamic nature of the world, necessitating frequent updates to LLMs to correct outdated information or integrate new knowledge, thereby ensuring their continued relevance. Note that many applications demand continual model adjustments post-training to address deficiencies or undesirable behaviors. There is an increasing interest in efficient, lightweight methods for on-the-fly model modifications. To this end, recent years have seen a burgeoning in the techniques of knowledge editing for LLMs, which aim to efficiently modify LLMs' behaviors within specific domains while preserving overall performance across various inputs. 
In this paper, we first define the kno +</p>这项研究探讨了生成性人工智能中数据隐私和版权保护的多方面挑战,并提出了将技术创新与伦理前瞻相结合的综合方法,旨在全面解决这些问题。http://arxiv.org/abs/2311.18252<p> +跨越生成性人工智能数据生命周期的隐私和版权挑战导航 +</p> +<p> +Navigating Privacy and Copyright Challenges Across the Data Lifecycle of Generative AI. (arXiv:2311.18252v2 [cs.SE] UPDATED) +</p> +<p> +http://arxiv.org/abs/2311.18252 +</p> +<p> +这项研究探讨了生成性人工智能中数据隐私和版权保护的多方面挑战,并提出了将技术创新与伦理前瞻相结合的综合方法,旨在全面解决这些问题。 +</p> +<p> + +</p> +<p> +生成性人工智能的出现标志着人工智能领域的重要里程碑,展示出在生成真实图像、文本和数据模式方面的卓越能力。然而,这些进展也带来了对数据隐私和版权侵犯的更高关注,主要是由于模型训练对大规模数据集的依赖。传统方法如差分隐私、机器遗忘和数据中毒只提供了对这些复杂问题的片面解决方案。本文深入探讨了数据生命周期内隐私和版权保护的多方面挑战。我们主张采用将技术创新与伦理前瞻相结合的综合方法,通过研究和制定在生命周期视角下的解决方案,全面解决这些问题。本研究旨在推动更广泛的讨论,并激励对生成性人工智能中数据隐私和版权完整性的协同努力。 +</p> +<p> +The advent of Generative AI has marked a significant milestone in artificial intelligence, demonstrating remarkable capabilities in generating realistic images, texts, and data patterns. However, these advancements come with heightened concerns over data privacy and copyright infringement, primarily due to the reliance on vast datasets for model training. Traditional approaches like differential privacy, machine unlearning, and data poisoning only offer fragmented solutions to these complex issues. Our paper delves into the multifaceted challenges of privacy and copyright protection within the data lifecycle. We advocate for integrated approaches that combine technical innovation with ethical foresight, holistically addressing these concerns by investigating and devising solutions that are informed by the lifecycle perspective. This work aims to catalyze a broader discussion and inspire concerted efforts towards data privacy and copyright integrity in Generative AI. +</p>Clover是一种闭环可验证代码生成的范式,通过在代码、docstrings和形式注释之间进行一致性检查,确保生成的代码的正确性。http://arxiv.org/abs/2310.17807<p> +Clover: 闭环可验证代码生成 +</p> +<p> +Clover: Closed-Loop Verifiable Code Generation.
(arXiv:2310.17807v1 [cs.SE]) +</p> +<p> +http://arxiv.org/abs/2310.17807 +</p> +<p> +Clover是一种闭环可验证代码生成的范式,通过在代码、docstrings和形式注释之间进行一致性检查,确保生成的代码的正确性。 +</p> +<p> + +</p> +<p> +在软件开发中,使用大型语言模型进行代码生成是一个快速增长的趋势。然而,如果没有有效的方法来确保生成的代码的正确性,这个趋势可能会导致许多不良结果。在本文中,我们提出了一个解决这个挑战的愿景:Clover范式,即闭环可验证代码生成,它将正确性检查简化为更可访问的一致性检查问题。在Clover的核心是一个检查器,它在代码、docstrings和形式注释之间进行一致性检查。该检查器使用了形式验证工具和大型语言模型的新颖集成实现。我们提供了理论分析来支持我们的论点,即Clover在一致性检查方面应该是有效的。我们还在一个由手工设计的数据集(CloverBench)上进行了实证调查,该数据集包含了注释的Dafny程序,难度水平与教科书相当。实验结果显示 +</p> +<p> +The use of large language models for code generation is a rapidly growing trend in software development. However, without effective methods for ensuring the correctness of generated code, this trend could lead to any number of undesirable outcomes. In this paper, we lay out a vision for addressing this challenge: the Clover paradigm, short for Closed-Loop Verifiable Code Generation, which reduces correctness checking to the more accessible problem of consistency checking. At the core of Clover lies a checker that performs consistency checks among code, docstrings, and formal annotations. The checker is implemented using a novel integration of formal verification tools and large language models. We provide a theoretical analysis to support our thesis that Clover should be effective at consistency checking. We also empirically investigate its feasibility on a hand-designed dataset (CloverBench) featuring annotated Dafny programs at a textbook level of difficulty. Experimental results sho +</p>Transformers学会了高阶优化方法,用于上下文学习,通过实现类似于迭代牛顿法的算法,而不是梯度下降。http://arxiv.org/abs/2310.17086<p> +Transformers学会了高阶优化方法用于上下文学习:一项与线性模型的研究 +</p> +<p> +Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Models. 
(arXiv:2310.17086v1 [cs.LG]) +</p> +<p> +http://arxiv.org/abs/2310.17086 +</p> +<p> +Transformers学会了高阶优化方法,用于上下文学习,通过实现类似于迭代牛顿法的算法,而不是梯度下降。 +</p> +<p> + +</p> +<p> +Transformers在上下文学习中表现出色,但是它们是如何进行上下文学习仍然是一个谜。最近的研究表明,Transformers可能通过内部运行梯度下降,即一阶优化方法,来进行上下文学习。本文中,我们展示了Transformers学会了实现高阶优化方法来进行上下文学习。我们以上下文线性回归为重点,展示了Transformers学会了实现一个非常类似于迭代牛顿法的算法,而不是梯度下降。从实证上来看,我们展示了连续的Transformer层的预测与牛顿法的不同迭代非常接近,每个中间层大致计算了3次迭代。相比之下,需要指数级的梯度下降步骤才能匹配额外的Transformer层;这表明Transformers具有相当的收敛速率。 +</p> +<p> +Transformers are remarkably good at in-context learning (ICL) -- learning from demonstrations without parameter updates -- but how they perform ICL remains a mystery. Recent work suggests that Transformers may learn in-context by internally running Gradient Descent, a first-order optimization method. In this paper, we instead demonstrate that Transformers learn to implement higher-order optimization methods to perform ICL. Focusing on in-context linear regression, we show that Transformers learn to implement an algorithm very similar to Iterative Newton's Method, a higher-order optimization method, rather than Gradient Descent. Empirically, we show that predictions from successive Transformer layers closely match different iterations of Newton's Method linearly, with each middle layer roughly computing 3 iterations. In contrast, exponentially more Gradient Descent steps are needed to match an additional Transformer layer; this suggests that Transformers have a comparable rate of conv +</p>本文提出了一种迭代跟踪和推理策略,结合文本编码器和视觉编码器以生成准确的响应,解决了视频对话中逐步理解对话历史和吸收视频信息的挑战。http://arxiv.org/abs/2310.07259<p> +揭示隐藏的联系:用于视频对话的迭代跟踪和推理 +</p> +<p> +Uncovering Hidden Connections: Iterative Tracking and Reasoning for Video-grounded Dialog.
(arXiv:2310.07259v1 [cs.CV]) +</p> +<p> +http://arxiv.org/abs/2310.07259 +</p> +<p> +本文提出了一种迭代跟踪和推理策略,结合文本编码器和视觉编码器以生成准确的响应,解决了视频对话中逐步理解对话历史和吸收视频信息的挑战。 +</p> +<p> + +</p> +<p> +与传统的视觉问答相比,视频对话需要对对话历史和视频内容进行深入理解,以生成准确的响应。尽管现有的方法取得了令人称赞的进展,但它们常常面临逐步理解复杂的对话历史和吸收视频信息的挑战。为了弥补这一差距,我们提出了一种迭代跟踪和推理策略,将文本编码器、视觉编码器和生成器相结合。我们的文本编码器以路径跟踪和聚合机制为核心,能够从对话历史中获取重要的细微差别,以解释所提出的问题。同时,我们的视觉编码器利用迭代推理网络,精心设计以从视频中提取和强调关键视觉标记,增强对视觉理解的深度。最后,我们使用预训练的GPT-模型将这些丰富的信息综合起来。 +</p> +<p> +In contrast to conventional visual question answering, video-grounded dialog necessitates a profound understanding of both dialog history and video content for accurate response generation. Despite commendable strides made by existing methodologies, they often grapple with the challenges of incrementally understanding intricate dialog histories and assimilating video information. In response to this gap, we present an iterative tracking and reasoning strategy that amalgamates a textual encoder, a visual encoder, and a generator. At its core, our textual encoder is fortified with a path tracking and aggregation mechanism, adept at gleaning nuances from dialog history that are pivotal to deciphering the posed questions. Concurrently, our visual encoder harnesses an iterative reasoning network, meticulously crafted to distill and emphasize critical visual markers from videos, enhancing the depth of visual comprehension. Culminating this enriched information, we employ the pre-trained GPT- +</p>MaGNet是一种模型无关的图神经网络框架,能够顺序地整合不同顺序的信息,并通过识别有影响力的紧凑图结构提供有意义且可解释的结果。http://arxiv.org/abs/2309.13459<p> +模型无关的图神经网络用于整合局部和全局信息的研究 +</p> +<p> +A Model-Agnostic Graph Neural Network for Integrating Local and Global Information. 
(arXiv:2309.13459v1 [stat.ML]) +</p> +<p> +http://arxiv.org/abs/2309.13459 +</p> +<p> +MaGNet是一种模型无关的图神经网络框架,能够顺序地整合不同顺序的信息,并通过识别有影响力的紧凑图结构提供有意义且可解释的结果。 +</p> +<p> + +</p> +<p> +图神经网络(GNNs)在各种以图为重点的任务中取得了令人满意的性能。尽管取得了成功,但现有的GNN存在两个重要限制:由于黑盒特性,结果缺乏可解释性;无法学习不同顺序的表示。为了解决这些问题,我们提出了一种新的模型无关的图神经网络(MaGNet)框架,能够顺序地整合不同顺序的信息,从高阶邻居中提取知识,并通过识别有影响力的紧凑图结构提供有意义且可解释的结果。特别地,MaGNet由两个组件组成:图拓扑下复杂关系的潜在表示的估计模型和识别有影响力的节点、边和重要节点特征的解释模型。从理论上,我们通过经验Rademacher复杂度建立了MaGNet的泛化误差界,并展示了其强大的能力。 +</p> +<p> +Graph Neural Networks (GNNs) have achieved promising performance in a variety of graph-focused tasks. Despite their success, existing GNNs suffer from two significant limitations: a lack of interpretability in results due to their black-box nature, and an inability to learn representations of varying orders. To tackle these issues, we propose a novel Model-agnostic Graph Neural Network (MaGNet) framework, which is able to sequentially integrate information of various orders, extract knowledge from high-order neighbors, and provide meaningful and interpretable results by identifying influential compact graph structures. In particular, MaGNet consists of two components: an estimation model for the latent representation of complex relationships under graph topology, and an interpretation model that identifies influential nodes, edges, and important node features. Theoretically, we establish the generalization error bound for MaGNet via empirical Rademacher complexity, and showcase its pow +</p>本研究调查了使用大型语言模型(LLMs)来帮助患者和转诊医生识别合适的临床试验的潜力,并引入了TrialGPT架构,该架构能够准确预测合格性并提供解释,实验证明其有效性。http://arxiv.org/abs/2307.15051<p> +使用大型语言模型将患者与临床试验匹配 +</p> +<p> +Matching Patients to Clinical Trials with Large Language Models. 
(arXiv:2307.15051v1 [cs.CL]) +</p> +<p> +http://arxiv.org/abs/2307.15051 +</p> +<p> +本研究调查了使用大型语言模型(LLMs)来帮助患者和转诊医生识别合适的临床试验的潜力,并引入了TrialGPT架构,该架构能够准确预测合格性并提供解释,实验证明其有效性。 +</p> +<p> + +</p> +<p> +临床试验在推动药物研发和基于证据的医学方面非常重要,但患者招募常常受到限制。在这项工作中,我们调查了使用大型语言模型(LLMs)来帮助患者和转诊医生识别合适的临床试验的潜力。具体而言,我们引入了一种新颖的架构TrialGPT,采用LLMs预测基于标准的合格性,并提供详细的解释,并根据患者病历中的自由文本来对候选临床试验进行排名和排除。我们在三个公开可用的184名患者和18,238个注释的临床试验的队列上评估了TrialGPT。实验结果表明几个关键发现:第一,TrialGPT在标准级别的预测准确性上表现出很高的准确率,并提供准确的解释。第二,TrialGPT的综合试验级别评分与专家标注的合格性高度相关。第三,这些评分 +</p> +<p> +Clinical trials are vital in advancing drug development and evidence-based medicine, but their success is often hindered by challenges in patient recruitment. In this work, we investigate the potential of large language models (LLMs) to assist individual patients and referral physicians in identifying suitable clinical trials from an extensive selection. Specifically, we introduce TrialGPT, a novel architecture employing LLMs to predict criterion-level eligibility with detailed explanations, which are then aggregated for ranking and excluding candidate clinical trials based on free-text patient notes. We evaluate TrialGPT on three publicly available cohorts of 184 patients and 18,238 annotated clinical trials. The experimental results demonstrate several key findings: First, TrialGPT achieves high criterion-level prediction accuracy with faithful explanations. Second, the aggregated trial-level TrialGPT scores are highly correlated with expert eligibility annotations. Third, these scor +</p>遗忘是深度学习中普遍存在的现象,不仅限于连续学习领域。解决遗忘问题面临多个挑战,包括平衡保留旧任务知识与快速学习新任务的挑战,管理任务干扰与冲突目标的挑战,以及防止隐私泄露等。遗忘不总是有害的,可以在某些情况下是有益且可取的,特别是在隐私保护场景中。http://arxiv.org/abs/2307.09218<p> +深度学习中遗忘现象的全面调查:超越连续学习 +</p> +<p> +A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning. 
(arXiv:2307.09218v1 [cs.LG]) +</p> +<p> +http://arxiv.org/abs/2307.09218 +</p> +<p> +遗忘是深度学习中普遍存在的现象,不仅限于连续学习领域。解决遗忘问题面临多个挑战,包括平衡保留旧任务知识与快速学习新任务的挑战,管理任务干扰与冲突目标的挑战,以及防止隐私泄露等。遗忘不总是有害的,可以在某些情况下是有益且可取的,特别是在隐私保护场景中。 +</p> +<p> + +</p> +<p> +遗忘指的是先前获取的信息或知识的丧失或恶化。尽管现有的关于遗忘的调查主要集中在连续学习方面,但在深度学习中,遗忘是一种普遍现象,可以在各种其他研究领域中观察到。遗忘在研究领域中表现出来,例如由于生成器漂移而在生成模型领域中表现出来,以及由于客户端之间存在异构数据分布而在联邦学习中表现出来。解决遗忘问题涉及到几个挑战,包括在快速学习新任务的同时平衡保留旧任务知识,管理任务干扰与冲突目标,以及防止隐私泄露等。此外,大多数现有的连续学习调查都默认认为遗忘总是有害的。相反,我们的调查认为遗忘是一把双刃剑,在某些情况下可以是有益且可取的,例如隐私保护场景。通过在更广泛的背景下探讨遗忘现象, +</p> +<p> +Forgetting refers to the loss or deterioration of previously acquired information or knowledge. While the existing surveys on forgetting have primarily focused on continual learning, forgetting is a prevalent phenomenon observed in various other research domains within deep learning. Forgetting manifests in research fields such as generative models due to generator shifts, and federated learning due to heterogeneous data distributions across clients. Addressing forgetting encompasses several challenges, including balancing the retention of old task knowledge with fast learning of new tasks, managing task interference with conflicting goals, and preventing privacy leakage, etc. Moreover, most existing surveys on continual learning implicitly assume that forgetting is always harmful. In contrast, our survey argues that forgetting is a double-edged sword and can be beneficial and desirable in certain cases, such as privacy-preserving scenarios. By exploring forgetting in a broader context +</p>本文提出了两种基于学习的方法来进行企业流程资源分配,具有优于常见启发式方法的效果。http://arxiv.org/abs/2304.09970<p> +学习策略在企业流程资源分配中的应用 +</p> +<p> +Learning policies for resource allocation in business processes. 
(arXiv:2304.09970v1 [cs.AI]) +</p> +<p> +http://arxiv.org/abs/2304.09970 +</p> +<p> +本文提出了两种基于学习的方法来进行企业流程资源分配,具有优于常见启发式方法的效果。 +</p> +<p> + +</p> +<p> +资源分配是将资源分配到必须在运行时刻执行的业务流程活动中。虽然资源分配在制造等其他领域中已经得到深入研究,但在业务流程管理中却只存在少量的方法。现有方法不适用于大型企业流程的应用或是只针对单个实例进行资源分配的优化。本文提出了两种基于学习的方法来进行企业流程资源分配:一种基于深度强化学习的方法和一种基于评分的价值函数逼近方法。在代表典型业务流程结构的一组情景以及在代表现实业务流程的完整网络上,将两种方法与现有的启发式方法进行比较。结果表明,我们的学习方法在大多数情景中优于或与常见的启发式方法竞争力相当。 +</p> +<p> +Resource allocation is the assignment of resources to activities that must be executed in a business process at a particular moment at run-time. While resource allocation is well-studied in other fields, such as manufacturing, there exist only a few methods in business process management. Existing methods are not suited for application in large business processes or focus on optimizing resource allocation for a single case rather than for all cases combined. To fill this gap, this paper proposes two learning-based methods for resource allocation in business processes: a deep reinforcement learning-based approach and a score-based value function approximation approach. The two methods are compared against existing heuristics in a set of scenarios that represent typical business process structures and on a complete network that represents a realistic business process. The results show that our learning-based methods outperform or are competitive with common heuristics in most scenarios a +</p>本文提出了一种非平稳两臂赌博机问题的策略,能够处理平滑变化,并证明了该策略在二次Lipschitz连续的情况下的遗憾为 $\tilde O(T^{3/5})$。http://arxiv.org/abs/2301.12366<p> +平滑的非平稳连续赌博机 +</p> +<p> +Smooth Non-Stationary Bandits. 
(arXiv:2301.12366v2 [cs.LG] UPDATED) +</p> +<p> +http://arxiv.org/abs/2301.12366 +</p> +<p> +本文提出了一种非平稳两臂赌博机问题的策略,能够处理平滑变化,并证明了该策略在二次Lipschitz连续的情况下的遗憾为 $\tilde O(T^{3/5})$。 +</p> +<p> + +</p> +<p> +在许多在线决策应用中,环境都是非平稳的,因此使用能够处理变化的赌博算法至关重要。大多数现有方法是为了保护非平滑变化而设计的,仅受到总变差或时间上的Lipschitz性的限制,其中它们保证$\tilde \Theta(T^{2/3})$的遗憾。然而,在实践中,环境经常以平稳的方式改变,因此这种算法可能会在这些设置中产生比必要更高的遗憾,并且不利用变化率的信息。我们研究了一个非平稳的两臂赌博机问题,假设臂的平均回报是一个$\beta$-H\"older函数,即它是$(\beta-1)$次Lipschitz连续可微分的,我们展示了一个策略,对于$\beta=2$,它的遗憾为$\tilde O(T^{3/5})$,从而首次在平滑和非平滑之间进行了区分。我们通过一个任意$\Omega(T^{(\beta+1)/(2\beta+1)})$的下界来补充这个结果,说明了这个问题的困难程度。 +</p> +<p> +In many applications of online decision making, the environment is non-stationary and it is therefore crucial to use bandit algorithms that handle changes. Most existing approaches are designed to protect against non-smooth changes, constrained only by total variation or Lipschitzness over time, where they guarantee $\tilde \Theta(T^{2/3})$ regret. However, in practice environments are often changing smoothly, so such algorithms may incur higher-than-necessary regret in these settings and do not leverage information on the rate of change. We study a non-stationary two-armed bandit problem where we assume that an arm's mean reward is a $\beta$-H\"older function over (normalized) time, meaning it is $(\beta-1)$-times Lipschitz-continuously differentiable. We show the first separation between the smooth and non-smooth regimes by presenting a policy with $\tilde O(T^{3/5})$ regret for $\beta=2$. We complement this result by an $\Omega(T^{(\beta+1)/(2\beta+1)})$ lower bound for any int +</p>本文提出了深度伪造音频的系统指纹识别方法,并通过收集来自中国七个供应商的语音合成系统的数据集进行了初步研究。这项研究为进一步发展系统指纹识别方法提供了基础,并在模型版权保护和数字证据取证等实际场景中具有重要应用价值。http://arxiv.org/abs/2208.10489<p> +深度伪造音频的系统指纹识别:初始数据集与研究 +</p> +<p> +System Fingerprint Recognition for Deepfake Audio: An Initial Dataset and Investigation.
(arXiv:2208.10489v3 [cs.SD] UPDATED) +</p> +<p> +http://arxiv.org/abs/2208.10489 +</p> +<p> +本文提出了深度伪造音频的系统指纹识别方法,并通过收集来自中国七个供应商的语音合成系统的数据集进行了初步研究。这项研究为进一步发展系统指纹识别方法提供了基础,并在模型版权保护和数字证据取证等实际场景中具有重要应用价值。 +</p> +<p> + +</p> +<p> +深度语音合成模型的快速发展给社会带来了重大威胁,例如恶意内容操纵。因此,许多研究出现了,旨在检测所谓的深度伪造音频。然而,现有的工作都集中在对真实音频和伪造音频进行二元检测。在模型版权保护和数字证据取证等实际场景中,需要知道生成深度伪造音频的工具或模型来解释决策。这促使我们提出一个问题:我们能识别深度伪造音频的系统指纹吗?在本文中,我们提出了第一个系统指纹识别(SFR)的深度伪造音频数据集,并进行了初步研究。我们从使用最新的深度学习技术的七个中国供应商的语音合成系统中收集了该数据集,包括清晰和压缩集。此外,为了促进系统指纹识别方法的进一步发展,我们提供了外部参考音频,以便进行评估和对比实验。 +</p> +<p> +The rapid progress of deep speech synthesis models has posed significant threats to society such as malicious content manipulation. Therefore, many studies have emerged to detect the so-called deepfake audio. However, existing works focus on the binary detection of real audio and fake audio. In real-world scenarios such as model copyright protection and digital evidence forensics, it is needed to know what tool or model generated the deepfake audio to explain the decision. This motivates us to ask: Can we recognize the system fingerprints of deepfake audio? In this paper, we present the first deepfake audio dataset for system fingerprint recognition (SFR) and conduct an initial investigation. We collected the dataset from the speech synthesis systems of seven Chinese vendors that use the latest state-of-the-art deep learning technologies, including both clean and compressed sets. 
In addition, to facilitate the further development of system fingerprint recognition methods, we provide ex </p> \ No newline at end of file diff --git a/cs.CL.md b/cs.CL.md index 1b42a432e..9f856a4ef 100644 --- a/cs.CL.md +++ b/cs.CL.md @@ -2,67 +2,217 @@ | Ref | Title | Summary | | --- | --- | --- | -| [^1] | [Security and Privacy Challenges of Large Language Models: A Survey](https://rss.arxiv.org/abs/2402.00888) | 大型语言模型具有卓越的能力,但也面临着安全和隐私攻击的威胁。本调查全面审查了LLM的安全和隐私挑战,涵盖了训练数据、用户和应用风险等方面,并对解决方法进行了回顾。 | -| [^2] | [Mitigating the Linguistic Gap with Phonemic Representations for Robust Multilingual Language Understanding](https://arxiv.org/abs/2402.14279) | 通过使用音素表示,本文提出了一种新颖的解决方案来减缓高资源语言和低资源语言之间的性能差距,并通过实证研究和理论分析证明了其有效性。 | -| [^3] | [REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR](https://arxiv.org/abs/2402.03988) | 本文提出了REBORN,在无监督语音识别中使用基于强化学习的迭代训练来实现边界分割。通过交替训练分割模型和音素预测模型,实现了学习语音和文本之间的映射,解决了无监督情况下语音信号分段结构边界的挑战。 | -| [^4] | [Evaluating and Enhancing Large Language Models for Conversational Reasoning on Knowledge Graphs](https://arxiv.org/abs/2312.11282) | 该论文评估了当前最先进的大型语言模型(GPT-4)在知识图谱上的对话推理能力,提出了一种基于KG推理的LLM基准代理(LLM-ARK),该代理利用全文环境提示来实现精确和适应性强的KG路径预测,并采用近端策略优化算法进行训练。 | +| [^1] | [Auditing Large Language Models for Enhanced Text-Based Stereotype Detection and Probing-Based Bias Evaluation](https://arxiv.org/abs/2404.01768) | 该研究引入了Multi-Grain Stereotype(MGS)数据集,探索了不同的机器学习方法用于建立陈规检测的基线,并提出了一系列基于MGS数据训练的英文文本的陈规分类器模型。 | +| [^2] | [Encode Once and Decode in Parallel: Efficient Transformer Decoding](https://arxiv.org/abs/2403.13112) | 提出了一种新的编码器-解码器模型配置,称为prompt-in-decoder(PiD),可以一次编码输入并并行解码输出,在结构化输出和问答任务中取得高效率,避免了重复输入编码,大幅减少了解码器的内存占用。 | +| [^3] | [Data-oriented Dynamic Fine-tuning Parameter Selection Strategy for FISH Mask based Efficient Fine-tuning](https://arxiv.org/abs/2403.08484) | 提出了一种数据驱动的动态微调参数选择策略,针对FISH Mask提出了IRD算法,用于在不稳定的数据分布下动态选择最佳参数设置。 | +| [^4] | [Clustering and Ranking: Diversity-preserved Instruction 
Selection through Expert-aligned Quality Estimation](https://arxiv.org/abs/2402.18191) | 本文提出了一种聚类与排序方法(CaR),通过与专家偏好相一致的评分模型排名指令对,保留了数据集的多样性。 | +| [^5] | [How (un)ethical are instruction-centric responses of LLMs? Unveiling the vulnerabilities of safety guardrails to harmful queries](https://arxiv.org/abs/2402.15302) | 本研究探讨了大型语言模型(LLMs)对指令中心响应的容忍度,并提出了一个包含复杂查询的数据集,旨在揭示触发不道德响应的方法。 | +| [^6] | [Word-Sequence Entropy: Towards Uncertainty Estimation in Free-Form Medical Question Answering Applications and Beyond](https://arxiv.org/abs/2402.14259) | 本论文提出了一种新方法单词序列熵(WSE),用于在自由形式医学问答任务中量化答案的不确定性,相比其他基线方法表现更优秀。 | +| [^7] | [MultiPoT: Multilingual Program of Thoughts Harnesses Multiple Programming Languages](https://arxiv.org/abs/2402.10691) | MultiPoT 提出了一种任务和模型无关的方法,通过利用多种编程语言的优势和多样性,在表现上显著优于 Python 自一致性。 | +| [^8] | [Improving Contextual Congruence Across Modalities for Effective Multimodal Marketing using Knowledge-infused Learning](https://arxiv.org/abs/2402.03607) | 本研究提出了一种将常识知识图谱与大型视觉语言模型相结合的框架,用于改进预测多模态营销活动效果的性能。该方法能够提供早期检测可能具有说服力的多模态活动并评估和增强营销理论的能力。 | +| [^9] | [Large Language Models are Null-Shot Learners](https://arxiv.org/abs/2401.08273) | 本文提出了零射击提示方法,通过利用大规模语言模型中的错误信息来指导模型进行任务,以提高任务表现。实验结果表明,在不同数据集上,包括阅读理解、算术推理和闭卷问答,模型性能有所提升。这些结果也显示出不同模型之间存在不同程度的错误信息。 | +| [^10] | [SciGLM: Training Scientific Language Models with Self-Reflective Instruction Annotation and Tuning](https://arxiv.org/abs/2401.07950) | SciGLM引入了自我反思指导注释框架,用于弥补大型语言模型在理解复杂科学概念、推导符号方程式和解决高级数值计算方面的不足,以训练能够进行大学水平科学推理的科学语言模型。 | +| [^11] | [A Comprehensive Study of Knowledge Editing for Large Language Models.](http://arxiv.org/abs/2401.01286) | 本研究全面研究了大型语言模型的知识编辑,旨在有效修改模型的行为,同时保持整体性能。 | +| [^12] | [Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Models.](http://arxiv.org/abs/2310.17086) | Transformers学会了高阶优化方法,用于上下文学习,通过实现类似于迭代牛顿法的算法,而不是梯度下降。 | +| [^13] | [Matching Patients to Clinical Trials with Large Language 
Models.](http://arxiv.org/abs/2307.15051) | 本研究调查了使用大型语言模型(LLMs)来帮助患者和转诊医生识别合适的临床试验的潜力,并引入了TrialGPT架构,该架构能够准确预测合格性并提供解释,实验证明其有效性。 | +| [^14] | [Towards Explainable Evaluation Metrics for Machine Translation.](http://arxiv.org/abs/2306.13041) | 本研究探索机器翻译可解释性评估指标,提供综合综述和最新方法,并贡献下一代方法的愿景。 | # 详细 -[^1]: 大型语言模型的安全和隐私挑战:一项调查 +[^1]: 用于增强基于文本的陈规检测和基于探测的偏见评估的大规模语言模型审计 - Security and Privacy Challenges of Large Language Models: A Survey + Auditing Large Language Models for Enhanced Text-Based Stereotype Detection and Probing-Based Bias Evaluation - [https://rss.arxiv.org/abs/2402.00888](https://rss.arxiv.org/abs/2402.00888) + [https://arxiv.org/abs/2404.01768](https://arxiv.org/abs/2404.01768) - 大型语言模型具有卓越的能力,但也面临着安全和隐私攻击的威胁。本调查全面审查了LLM的安全和隐私挑战,涵盖了训练数据、用户和应用风险等方面,并对解决方法进行了回顾。 + 该研究引入了Multi-Grain Stereotype(MGS)数据集,探索了不同的机器学习方法用于建立陈规检测的基线,并提出了一系列基于MGS数据训练的英文文本的陈规分类器模型。 - 大型语言模型(LLM)展示了非凡的能力,并在生成和总结文本、语言翻译和问答等多个领域做出了贡献。如今,LLM正在成为计算机语言处理任务中非常流行的工具,具备分析复杂语言模式并根据上下文提供相关和适当回答的能力。然而,尽管具有显著优势,这些模型也容易受到安全和隐私攻击的威胁,如越狱攻击、数据污染攻击和个人可识别信息泄露攻击。本调查全面审查了LLM的安全和隐私挑战,包括训练数据和用户方面的问题,以及在交通、教育和医疗等各个领域中应用带来的风险。我们评估了LLM的脆弱性程度,调查了出现的安全和隐私攻击,并对潜在的解决方法进行了回顾。 + 大型语言模型(LLMs)的最新进展显著提高了它们在面向人类的人工智能(AI)应用中的影响力。然而,LLMs可能会复制甚至加剧自训练数据中的陈规输出。本研究介绍了Multi-Grain Stereotype(MGS)数据集,包括51,867个实例,涵盖性别、种族、职业、宗教和陈规文本,通过融合多个先前公开的陈规检测数据集收集而来。我们探索了旨在为陈规检测建立基线的不同机器学习方法,并微调了多种架构和模型大小的几个语言模型,本文展示了一系列基于MGS训练的英文文本的陈规分类器模型。为了了解我们的陈规检测器是否捕捉到与人类常识一致的相关特征,我们利用了各种可解释的AI工具, - Large Language Models (LLMs) have demonstrated extraordinary capabilities and contributed to multiple fields, such as generating and summarizing text, language translation, and question-answering. Nowadays, LLM is becoming a very popular tool in computerized language processing tasks, with the capability to analyze complicated linguistic patterns and provide relevant and appropriate responses depending on the context. 
While offering significant advantages, these models are also vulnerable to security and privacy attacks, such as jailbreaking attacks, data poisoning attacks, and Personally Identifiable Information (PII) leakage attacks. This survey provides a thorough review of the security and privacy challenges of LLMs for both training data and users, along with the application-based risks in various domains, such as transportation, education, and healthcare. We assess the extent of LLM vulnerabilities, investigate emerging security and privacy attacks for LLMs, and review the potent + arXiv:2404.01768v1 Announce Type: cross Abstract: Recent advancements in Large Language Models (LLMs) have significantly increased their presence in human-facing Artificial Intelligence (AI) applications. However, LLMs could reproduce and even exacerbate stereotypical outputs from training data. This work introduces the Multi-Grain Stereotype (MGS) dataset, encompassing 51,867 instances across gender, race, profession, religion, and stereotypical text, collected by fusing multiple previously publicly available stereotype detection datasets. We explore different machine learning approaches aimed at establishing baselines for stereotype detection, and fine-tune several language models of various architectures and model sizes, presenting in this work a series of stereotypes classifier models for English text trained on MGS. 
To understand whether our stereotype detectors capture relevant features (aligning with human common sense) we utilise a variety of explainable AI tools, including -[^2]: 使用音素表示减缓语言差异,实现稳健的多语言理解 +[^2]: 一次编码,多次并行解码:高效Transformer解码 - Mitigating the Linguistic Gap with Phonemic Representations for Robust Multilingual Language Understanding + Encode Once and Decode in Parallel: Efficient Transformer Decoding - [https://arxiv.org/abs/2402.14279](https://arxiv.org/abs/2402.14279) + [https://arxiv.org/abs/2403.13112](https://arxiv.org/abs/2403.13112) - 通过使用音素表示,本文提出了一种新颖的解决方案来减缓高资源语言和低资源语言之间的性能差距,并通过实证研究和理论分析证明了其有效性。 + 提出了一种新的编码器-解码器模型配置,称为prompt-in-decoder(PiD),可以一次编码输入并并行解码输出,在结构化输出和问答任务中取得高效率,避免了重复输入编码,大幅减少了解码器的内存占用。 - 为了改善多语言理解,通常需要在训练阶段使用多种语言,依赖复杂的训练技术,并且在高资源语言和低资源语言之间存在显著的性能差距。我们假设语言之间的性能差距受到这些语言之间的语言差异的影响,并通过使用音素表示(具体来说,将音素作为输入标记输入到语言模型中,而不是子词)提供了一种新颖的解决方案,以实现稳健的多语言建模。我们通过三个跨语言任务的定量证据展示了音素表示的有效性,这进一步得到了对跨语言性能差距的理论分析的证明。 + 基于Transformer的自然语言处理模型功能强大,但计算成本高,限制了部署场景。在专业领域中,微调的编码器-解码器模型备受青睐,可以胜过更大更通用的仅解码器模型,例如GPT-4。我们介绍了一种新的编码器-解码器模型配置,可以提高在结构化输出和问答任务中的效率,在这些任务中,需要从单个输入中产生多个输出。我们的方法,prompt-in-decoder(PiD),只对输入进行一次编码,并且并行解码输出,通过避免重复输入编码,从而减少解码器的内存占用,提升了训练和推断效率。我们实现了计算减少,大致随子任务数量增加而扩展,相比最先进模型,在对话状态追踪、摘要和问答任务中获得高达4.6倍的速度提升,并且性能相当或更好。我们发布了我们的训练/推断代码。 - arXiv:2402.14279v1 Announce Type: cross Abstract: Approaches to improving multilingual language understanding often require multiple languages during the training phase, rely on complicated training techniques, and -- importantly -- struggle with significant performance gaps between high-resource and low-resource languages. We hypothesize that the performance gaps between languages are affected by linguistic gaps between those languages and provide a novel solution for robust multilingual language modeling by employing phonemic representations (specifically, using phonemes as input tokens to LMs rather than subwords).
We present quantitative evidence from three cross-lingual tasks that demonstrate the effectiveness of phonemic representation, which is further justified by a theoretical analysis of the cross-lingual performance gap. + arXiv:2403.13112v1 Announce Type: new Abstract: Transformer-based NLP models are powerful but have high computational costs that limit deployment scenarios. Finetuned encoder-decoder models are popular in specialized domains and can outperform larger more generalized decoder-only models, such as GPT-4. We introduce a new configuration for encoder-decoder models that improves efficiency on structured output and question-answering tasks where multiple outputs are required of a single input. Our method, prompt-in-decoder (PiD), encodes the input once and decodes output in parallel, boosting both training and inference efficiency by avoiding duplicate input encoding, thereby reducing the decoder's memory footprint. We achieve computation reduction that roughly scales with the number of subtasks, gaining up to 4.6x speed-up over state-of-the-art models for dialogue state tracking, summarization, and question-answering tasks with comparable or better performance. 
We release our training/inf -[^3]: REBORN: 基于强化学习的迭代训练的无监督语音识别中的边界分割 +[^3]: 数据驱动的动态微调参数选择策略,用于基于FISH Mask的高效微调 - REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR + Data-oriented Dynamic Fine-tuning Parameter Selection Strategy for FISH Mask based Efficient Fine-tuning - [https://arxiv.org/abs/2402.03988](https://arxiv.org/abs/2402.03988) + [https://arxiv.org/abs/2403.08484](https://arxiv.org/abs/2403.08484) - 本文提出了REBORN,在无监督语音识别中使用基于强化学习的迭代训练来实现边界分割。通过交替训练分割模型和音素预测模型,实现了学习语音和文本之间的映射,解决了无监督情况下语音信号分段结构边界的挑战。 + 提出了一种数据驱动的动态微调参数选择策略,针对FISH Mask提出了IRD算法,用于在不稳定的数据分布下动态选择最佳参数设置。 - 无监督自动语音识别(ASR)旨在学习语音信号与其对应的文本转录之间的映射,而无需配对的语音-文本数据监督。语音信号中的单词/音素由一段长度可变且边界未知的语音信号表示,而这种分段结构使得在没有配对数据的情况下学习语音和文本之间的映射变得具有挑战性。本文提出了REBORN,基于强化学习的迭代训练的无监督语音识别中的边界分割。REBORN交替进行以下两个步骤:(1)训练一个能够预测语音信号中分段结构边界的分割模型,和(2)训练一个音素预测模型,其输入是由分割模型分割的分段结构,用于预测音素转录。由于没有用于训练分割模型的监督数据,我们使用强化学习来训练分割模型。 + 鉴于大型语言模型(LLMs)的参数数量巨大,调整所有参数成本很高,因此更明智的做法是对特定参数进行微调。大多数参数高效微调(PEFT)集中在参数选择策略上,例如加法方法、选择性方法和基于重新参数化的方法。然而,很少有方法考虑数据样本对参数选择的影响,例如基于Fish Mask的方法。Fish Mask随机选择部分数据样本,并在参数选择过程中对它们进行同等处理,这无法为不稳定的数据分布动态选择最佳参数。在这项工作中,我们采用了数据驱动的视角,提出了一个IRD(迭代样本参数范围减小)算法,以搜索FISH Mask的最佳样本参数对设置。 - Unsupervised automatic speech recognition (ASR) aims to learn the mapping between the speech signal and its corresponding textual transcription without the supervision of paired speech-text data. A word/phoneme in the speech signal is represented by a segment of speech signal with variable length and unknown boundary, and this segmental structure makes learning the mapping between speech and text challenging, especially without paired data. In this paper, we propose REBORN, Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR. 
REBORN alternates between (1) training a segmentation model that predicts the boundaries of the segmental structures in speech signals and (2) training the phoneme prediction model, whose input is a segmental structure segmented by the segmentation model, to predict a phoneme transcription. Since supervised data for training the segmentation model is not available, we use reinforcement learning to train the segmentation model t + arXiv:2403.08484v1 Announce Type: new Abstract: In view of the huge number of parameters of large language models (LLMs), tuning all parameters is very costly, and accordingly fine-tuning specific parameters is more sensible. Most parameter-efficient fine-tuning (PEFT) methods concentrate on parameter selection strategies, such as additive, selective, and reparametrization-based methods. However, few methods, such as the FISH Mask based method, consider the impact of data samples on parameter selection. FISH Mask randomly chooses a subset of data samples and treats them equally during parameter selection, so it cannot dynamically select optimal parameters for inconstant data distributions. In this work, we adopt a data-oriented perspective and propose an IRD (Iterative sample-parameter Range Decreasing) algorithm to search for the best sample-parameter pair settings for FISH Mask.
In each iteration -[^4]: 评估和增强用于知识图谱上的对话推理的大型语言模型 +[^4]: 聚类与排序:通过专家定位质量估计实现保留多样性的指令选择 - Evaluating and Enhancing Large Language Models for Conversational Reasoning on Knowledge Graphs + Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation - [https://arxiv.org/abs/2312.11282](https://arxiv.org/abs/2312.11282) + [https://arxiv.org/abs/2402.18191](https://arxiv.org/abs/2402.18191) - 该论文评估了当前最先进的大型语言模型(GPT-4)在知识图谱上的对话推理能力,提出了一种基于KG推理的LLM基准代理(LLM-ARK),该代理利用全文环境提示来实现精确和适应性强的KG路径预测,并采用近端策略优化算法进行训练。 + 本文提出了一种聚类与排序方法(CaR),通过与专家偏好相一致的评分模型排名指令对,保留了数据集的多样性。 - 大型语言模型(LLM)的发展得益于预训练技术的进展。通过手动设计的提示,这些模型展示了强大的推理能力。在这项工作中,我们评估了当前最先进的LLM(GPT-4)在知识图谱(KG)上的对话推理能力。然而,由于缺乏KG环境意识和开发有效的中间推理阶段优化机制的困难,LLM的性能受到限制。我们进一步引入了LLM-ARK,一个基于KG推理的LLM基准代理,旨在提供精确和适应性强的KG路径预测。LLM-ARK利用全文环境(FTE)提示来吸收每个推理步骤中的状态信息。我们将KG上的多跳推理挑战重新框定为顺序决策任务。利用近端策略优化(PPO)在线策略梯度强化学习算法,我们的模型... + 随着开源社区的贡献,涌现了大量指令调优(IT)数据。鉴于训练和评估模型需要大量资源分配,因此有必要采用高效的方法选择高质量的IT数据。然而,现有的指令数据选择方法存在一些限制,比如依赖脆弱的外部API、受GPT模型偏见影响,或减少所选指令数据集的多样性。在本文中,我们提出了一种面向工业的、与专家定位相吻合并保留多样性的指令数据选择方法:聚类与排序(CaR)。CaR分为两个步骤。第一步涉及使用与专家偏好很好对齐的评分模型对指令对进行排名(准确率达到84.25%)。第二步通过聚类过程保留数据集多样性。在我们的实验中,CaR选择了一个子集 - The development of large language models (LLMs) has been catalyzed by advancements in pre-training techniques. These models have demonstrated robust reasoning capabilities through manually designed prompts. In this work, we evaluate the conversational reasoning capabilities of the current state-of-the-art LLM (GPT-4) on knowledge graphs (KGs). However, the performance of LLMs is constrained due to a lack of KG environment awareness and the difficulties in developing effective optimization mechanisms for intermediary reasoning stages. We further introduce LLM-ARK, an LLM-grounded KG reasoning agent designed to deliver precise and adaptable predictions on KG paths. LLM-ARK leverages a Full Textual Environment (FTE) prompt to assimilate state information within each reasoning step.
We reframe the challenge of multi-hop reasoning on the KG as a sequential decision-making task. Utilizing the Proximal Policy Optimization (PPO) online policy gradient reinforcement learning algorithm, our model i + arXiv:2402.18191v1 Announce Type: new Abstract: With contributions from the open-source community, a vast amount of instruction tuning (IT) data has emerged. Given the significant resource allocation required by training and evaluating models, it is advantageous to have an efficient method for selecting high-quality IT data. However, existing methods for instruction data selection have limitations such as relying on fragile external APIs, being affected by biases in GPT models, or reducing the diversity of the selected instruction dataset. In this paper, we propose an industrial-friendly, expert-aligned and diversity-preserved instruction data selection method: Clustering and Ranking (CaR). CaR consists of two steps. The first step involves ranking instruction pairs using a scoring model that is well aligned with expert preferences (achieving an accuracy of 84.25%). The second step involves preserving dataset diversity through a clustering process. In our experiment, CaR selected a sub + +[^5]: 有关LLMs指令中心响应的(不道德)程度有多高?揭示安全防护栏对有害查询的漏洞 + + How (un)ethical are instruction-centric responses of LLMs? 
Unveiling the vulnerabilities of safety guardrails to harmful queries + + [https://arxiv.org/abs/2402.15302](https://arxiv.org/abs/2402.15302) + + 本研究探讨了大型语言模型(LLMs)对指令中心响应的容忍度,并提出了一个包含复杂查询的数据集,旨在揭示触发不道德响应的方法。 + + + + 在这项研究中,我们解决了一个围绕大型语言模型(LLMs)安全和道德使用日益关注的问题。尽管这些模型具有潜力,但它们可能会被各种复杂的方法欺骗,产生有害或不道德内容,包括“越狱”技术和有针对性的操纵。我们的工作集中在一个特定问题上:LLMs在要求它们生成以伪代码、程序或软件片段为中心的响应时,有多大程度上可能会被误导,而不是生成普通文本。为了调查这个问题,我们引入了TechHazardQA,一个数据集,其中包含应以文本和以指令为中心格式(例如伪代码)回答的复杂查询,旨在识别不道德响应的触发器。我们查询了一系列LLMs-- Llama-2-13b,Llama-2-7b,Mistral-V2和Mistral 8X7B--并要求它们生成文本和指令为中心的响应。为了评估我们的方法, + + arXiv:2402.15302v1 Announce Type: new Abstract: In this study, we tackle a growing concern around the safety and ethical use of large language models (LLMs). Despite their potential, these models can be tricked into producing harmful or unethical content through various sophisticated methods, including 'jailbreaking' techniques and targeted manipulation. Our work zeroes in on a specific issue: to what extent LLMs can be led astray by asking them to generate responses that are instruction-centric such as a pseudocode, a program or a software snippet as opposed to vanilla text. To investigate this question, we introduce TechHazardQA, a dataset containing complex queries which should be answered in both text and instruction-centric formats (e.g., pseudocodes), aimed at identifying triggers for unethical responses. We query a series of LLMs -- Llama-2-13b, Llama-2-7b, Mistral-V2 and Mistral 8X7B -- and ask them to generate both text and instruction-centric responses. 
For evaluation we rep + +[^6]: 单词序列熵:走向自由形式医学问答应用及其不确定性估计 + + Word-Sequence Entropy: Towards Uncertainty Estimation in Free-Form Medical Question Answering Applications and Beyond + + [https://arxiv.org/abs/2402.14259](https://arxiv.org/abs/2402.14259) + + 本论文提出了一种新方法单词序列熵(WSE),用于在自由形式医学问答任务中量化答案的不确定性,相比其他基线方法表现更优秀。 + + + + 不确定性估计在确保安全关键的人工智能系统与人类互动的可靠性中发挥关键作用,尤其在医疗领域尤为重要。然而,在自由形式的医学问答任务中,尚未建立一种通用方法来量化答案的不确定性,其中无关的词汇和语序含有有限的语义信息可能是不确定性的主要来源,这是由于生成不平等的存在。本文提出了单词序列熵(WSE),该方法根据语义相关性在单词和序列级别上校准不确定性比例,在不确定性量化时更加强调关键词和更相关的序列。我们在5个自由形式医学问答数据集上,利用7种“现成的”大语言模型(LLMs)将WSE与6种基线方法进行比较,并展示了WSE在性能上的优越性。 + + arXiv:2402.14259v1 Announce Type: cross Abstract: Uncertainty estimation plays a pivotal role in ensuring the reliability of safety-critical human-AI interaction systems, particularly in the medical domain. However, a general method for quantifying the uncertainty of free-form answers has yet to be established in open-ended medical question-answering (QA) tasks, where irrelevant words and sequences with limited semantic information can be the primary source of uncertainty due to the presence of generative inequality. In this paper, we propose the Word-Sequence Entropy (WSE), which calibrates the uncertainty proportion at both the word and sequence levels according to the semantic relevance, with greater emphasis placed on keywords and more relevant sequences when performing uncertainty quantification. 
We compare WSE with 6 baseline methods on 5 free-form medical QA datasets, utilizing 7 "off-the-shelf" large language models (LLMs), and show that WSE exhibits superior performance on ac + +[^7]: MultiPoT: 多语言思维程序利用多种编程语言 + + MultiPoT: Multilingual Program of Thoughts Harnesses Multiple Programming Languages + + [https://arxiv.org/abs/2402.10691](https://arxiv.org/abs/2402.10691) + + MultiPoT 提出了一种任务和模型无关的方法,通过利用多种编程语言的优势和多样性,在表现上显著优于 Python 自一致性。 + + + + arXiv:2402.10691v1 公告类型:新的 摘要:思维程序(PoT)是一种以其可执行中间步骤为特征的方法,其确保推理过程中数值计算的准确性。目前,PoT主要使用Python。然而,仅依赖单一语言可能导致次优解决方案,忽视其他编程语言的潜在优势。在本文中,我们对PoT中使用的编程语言进行了全面实验,发现没有一种单一语言在所有任务和模型上始终提供最佳性能。每种语言的有效性取决于具体情景。受此启发,我们提出了一种称为MultiPoT的任务和模型无关方法,该方法从各种语言中获取强大和多样性。实验结果显示,MultiPoT 在很大程度上优于Python 自一致性。此外,与最佳模型相比,它实现了可比或更优异的性能。 + + arXiv:2402.10691v1 Announce Type: new Abstract: Program of Thoughts (PoT) is an approach characterized by its executable intermediate steps, which ensure the accuracy of the numerical calculations in the reasoning process. Currently, PoT primarily uses Python. However, relying solely on a single language may result in suboptimal solutions and overlook the potential benefits of other programming languages. In this paper, we conduct comprehensive experiments on the programming languages used in PoT and find that no single language consistently delivers optimal performance across all tasks and models. The effectiveness of each language varies depending on the specific scenarios. Inspired by this, we propose a task and model agnostic approach called MultiPoT, which harnesses strength and diversity from various languages. Experimental results reveal that it significantly outperforms Python Self-Consistency. 
Furthermore, it achieves comparable or superior performance compared to the best mo + +[^8]: 提高多模态营销的上下文一致性:知识基础学习的有效性 + + Improving Contextual Congruence Across Modalities for Effective Multimodal Marketing using Knowledge-infused Learning + + [https://arxiv.org/abs/2402.03607](https://arxiv.org/abs/2402.03607) + + 本研究提出了一种将常识知识图谱与大型视觉语言模型相结合的框架,用于改进预测多模态营销活动效果的性能。该方法能够提供早期检测可能具有说服力的多模态活动并评估和增强营销理论的能力。 + + + + 智能设备的普及使用户能够在线体验多模态信息。然而,大型语言模型(LLM)和视觉模型(LVM)仍然受到捕捉跨模态语义关系的整体意义的限制。缺乏明确的常识知识(例如,作为一个知识图谱),视觉语言模型(VLM)仅通过捕捉庞大的语料库中的高级模式来学习隐式表示,从而忽略了重要的上下文跨模态线索。在这项工作中,我们设计了一个框架,将显式的常识知识以知识图谱的形式与大型的VLM相结合,以提高下游任务的性能,即预测多模态营销活动的有效性。虽然营销应用提供了一个有说服力的指标来评估我们的方法,但我们的方法使得早期发现可能具有说服力的多模态活动成为可能,并评估和增强营销理论。 + + The prevalence of smart devices with the ability to capture moments in multiple modalities has enabled users to experience multimodal information online. However, large Language (LLMs) and Vision models (LVMs) are still limited in capturing holistic meaning with cross-modal semantic relationships. Without explicit, common sense knowledge (e.g., as a knowledge graph), Visual Language Models (VLMs) only learn implicit representations by capturing high-level patterns in vast corpora, missing essential contextual cross-modal cues. In this work, we design a framework to couple explicit commonsense knowledge in the form of knowledge graphs with large VLMs to improve the performance of a downstream task, predicting the effectiveness of multi-modal marketing campaigns. While the marketing application provides a compelling metric for assessing our methods, our approach enables the early detection of likely persuasive multi-modal campaigns and the assessment and augmentation of marketing theory. 
+ +[^9]: 大规模语言模型是零射击学习器 + + Large Language Models are Null-Shot Learners + + [https://arxiv.org/abs/2401.08273](https://arxiv.org/abs/2401.08273) + + 本文提出了零射击提示方法,通过利用大规模语言模型中的错误信息来指导模型进行任务,以提高任务表现。实验结果表明,在不同数据集上,包括阅读理解、算术推理和闭卷问答,模型性能有所提升。这些结果也显示出不同模型之间存在不同程度的错误信息。 + + + + 本文提出了零射击提示方法。零射击提示利用大规模语言模型(LLMs)中的错误信息,通过指示LLMs利用从“示例”部分中获取的信息(该信息在所提供的上下文中不存在)来完成任务。虽然减少错误信息对于LLMs的日常和重要用途至关重要,但我们提出在目前的环境中,这些LLMs仍然具有错误信息,实际上可以利用错误信息来提高与标准零射击提示相比的任务表现。对八个LLMs进行实验,结果显示在大多数八个数据集(包括阅读理解、算术推理和闭卷问答)中,性能有所提升。观察到的不一致性增加相对性能在LLMs之间的差异,也可能表示每个模型中存在不同程度的错误信息。 + + arXiv:2401.08273v2 Announce Type: replace-cross Abstract: This paper presents null-shot prompting. Null-shot prompting exploits hallucination in large language models (LLMs) by instructing LLMs to utilize information from the "Examples" section that never exists within the provided context to perform a task. While reducing hallucination is crucial and non-negligible for daily and critical uses of LLMs, we propose that in the current landscape in which these LLMs still hallucinate, it is possible, in fact, to exploit hallucination to increase performance in performing tasks compared to standard zero-shot prompting. Experiments with eight LLMs show improvements in performance across the majority of eight datasets, including reading comprehension, arithmetic reasoning, and closed-book question answering. The observed inconsistency in increased relative performance across the LLMs also potentially indicates a different degree of inherent hallucination in each model. 
These differences show + +[^10]: SciGLM: 用自我反思指导注释和调整训练科学语言模型 + + SciGLM: Training Scientific Language Models with Self-Reflective Instruction Annotation and Tuning + + [https://arxiv.org/abs/2401.07950](https://arxiv.org/abs/2401.07950) + + SciGLM引入了自我反思指导注释框架,用于弥补大型语言模型在理解复杂科学概念、推导符号方程式和解决高级数值计算方面的不足,以训练能够进行大学水平科学推理的科学语言模型。 + + + + 大型语言模型(LLMs)已显示出在协助科学发现方面的潜力。然而,目前LLMs在理解复杂科学概念、推导符号方程式和解决高级数值计算方面存在局限。为了弥补这些差距,我们引入了SciGLM,一套能够进行大学水平科学推理的科学语言模型。我们方法的核心是一种新颖的自我反思指导注释框架,以解决科学领域中数据稀缺挑战。该框架利用现有LLMs为未标记的科学问题生成逐步推理,随后经过自我反思的批评和修改过程。应用这一框架,我们整理了SciInstruct,这是一个涵盖物理、化学、数学和形式证明的多样化、高质量的数据集。我们利用SciInstruct对ChatGLM系列语言模型进行了微调,增强了 + + arXiv:2401.07950v2 Announce Type: replace Abstract: Large Language Models (LLMs) have shown promise in assisting scientific discovery. However, such applications are currently limited by LLMs' deficiencies in understanding intricate scientific concepts, deriving symbolic equations, and solving advanced numerical calculations. To bridge these gaps, we introduce SciGLM, a suite of scientific language models able to conduct college-level scientific reasoning. Central to our approach is a novel self-reflective instruction annotation framework to address the data scarcity challenge in the science domain. This framework leverages existing LLMs to generate step-by-step reasoning for unlabelled scientific questions, followed by a process of self-reflective critic-and-revise. Applying this framework, we curated SciInstruct, a diverse and high-quality dataset encompassing physics, chemistry, math, and formal proofs. We fine-tuned the ChatGLM family of language models with SciInstruct, enhancing + +[^11]: 大型语言模型的知识编辑全面研究 + + A Comprehensive Study of Knowledge Editing for Large Language Models. 
(arXiv:2401.01286v1 [cs.CL]) + + [http://arxiv.org/abs/2401.01286](http://arxiv.org/abs/2401.01286) + + 本研究全面研究了大型语言模型的知识编辑,旨在有效修改模型的行为,同时保持整体性能。 + + + + 大型语言模型(LLM)在理解和生成与人类交流紧密相似的文本方面展现出了非凡的能力。然而,其主要限制在于训练过程中的显著计算需求,这是由于其广泛的参数化造成的。这一挑战在于世界的动态性,需要频繁更新LLM以修正过时的信息或集成新知识,从而确保其持续的相关性。许多应用需要在训练后进行持续的模型调整,以解决缺陷或不良行为。近年来,对于LLM的知识编辑技术的兴趣越来越高,在特定领域内有效地修改LLM的行为,同时保持整体性能在各种输入中的表现。本文首先定义了知识编辑的目标和挑战,然后综述了现有的知识编辑方法和技术,并讨论了其应用和未来发展的方向。 + + Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication. However, a primary limitation lies in the significant computational demands during training, arising from their extensive parameterization. This challenge is further intensified by the dynamic nature of the world, necessitating frequent updates to LLMs to correct outdated information or integrate new knowledge, thereby ensuring their continued relevance. Note that many applications demand continual model adjustments post-training to address deficiencies or undesirable behaviors. There is an increasing interest in efficient, lightweight methods for on-the-fly model modifications. To this end, recent years have seen a burgeoning in the techniques of knowledge editing for LLMs, which aim to efficiently modify LLMs' behaviors within specific domains while preserving overall performance across various inputs. In this paper, we first define the kno + +[^12]: Transformers学会了高阶优化方法用于上下文学习:一项与线性模型的研究 + + Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Models. 
(arXiv:2310.17086v1 [cs.LG]) + + [http://arxiv.org/abs/2310.17086](http://arxiv.org/abs/2310.17086) + + Transformers学会了高阶优化方法,用于上下文学习,通过实现类似于迭代牛顿法的算法,而不是梯度下降。 + + + + Transformers在上下文学习中表现出色,但是它们是如何进行上下文学习仍然是一个谜。最近的研究表明,Transformers可能通过内部运行梯度下降,即一阶优化方法,来进行上下文学习。本文中,我们展示了Transformers学会了实现高阶优化方法来进行上下文学习。我们以上下文线性回归为重点,展示了Transformers学会了实现一个非常类似于迭代牛顿法的算法,而不是梯度下降。从实证上来看,我们展示了连续的Transformer层的预测与牛顿法的不同迭代非常接近,每个中间层大致计算了3次迭代。相比之下,需要指数级的梯度下降步骤才能匹配额外的Transformer层;这表明Transformers具有相当的收敛速率。 + + Transformers are remarkably good at in-context learning (ICL) -- learning from demonstrations without parameter updates -- but how they perform ICL remains a mystery. Recent work suggests that Transformers may learn in-context by internally running Gradient Descent, a first-order optimization method. In this paper, we instead demonstrate that Transformers learn to implement higher-order optimization methods to perform ICL. Focusing on in-context linear regression, we show that Transformers learn to implement an algorithm very similar to Iterative Newton's Method, a higher-order optimization method, rather than Gradient Descent. Empirically, we show that predictions from successive Transformer layers closely match different iterations of Newton's Method linearly, with each middle layer roughly computing 3 iterations. In contrast, exponentially more Gradient Descent steps are needed to match an additional Transformer layer; this suggests that Transformers have a comparable rate of conv + +[^13]: 使用大型语言模型将患者与临床试验匹配 + + Matching Patients to Clinical Trials with Large Language Models. 
(arXiv:2307.15051v1 [cs.CL]) + + [http://arxiv.org/abs/2307.15051](http://arxiv.org/abs/2307.15051) + + 本研究调查了使用大型语言模型(LLMs)来帮助患者和转诊医生识别合适的临床试验的潜力,并引入了TrialGPT架构,该架构能够准确预测合格性并提供解释,实验证明其有效性。 + + + + 临床试验在推动药物研发和基于证据的医学方面非常重要,但患者招募常常受到限制。在这项工作中,我们调查了使用大型语言模型(LLMs)来帮助患者和转诊医生识别合适的临床试验的潜力。具体而言,我们引入了一种新颖的架构TrialGPT,采用LLMs预测基于标准的合格性,并提供详细的解释,并根据患者病历中的自由文本来对候选临床试验进行排名和排除。我们在三个公开可用的184名患者和18,238个注释的临床试验的队列上评估了TrialGPT。实验结果表明几个关键发现:第一,TrialGPT在标准级别的预测准确性上表现出很高的准确率,并提供准确的解释。第二,TrialGPT的综合试验级别评分与专家标注的合格性高度相关。第三,这些评分 + + Clinical trials are vital in advancing drug development and evidence-based medicine, but their success is often hindered by challenges in patient recruitment. In this work, we investigate the potential of large language models (LLMs) to assist individual patients and referral physicians in identifying suitable clinical trials from an extensive selection. Specifically, we introduce TrialGPT, a novel architecture employing LLMs to predict criterion-level eligibility with detailed explanations, which are then aggregated for ranking and excluding candidate clinical trials based on free-text patient notes. We evaluate TrialGPT on three publicly available cohorts of 184 patients and 18,238 annotated clinical trials. The experimental results demonstrate several key findings: First, TrialGPT achieves high criterion-level prediction accuracy with faithful explanations. Second, the aggregated trial-level TrialGPT scores are highly correlated with expert eligibility annotations. Third, these scor + +[^14]: 机器翻译可解释性评估指标的探索 + + Towards Explainable Evaluation Metrics for Machine Translation. 
(arXiv:2306.13041v1 [cs.CL]) + + [http://arxiv.org/abs/2306.13041](http://arxiv.org/abs/2306.13041) + + 本研究探索机器翻译可解释性评估指标,提供综合综述和最新方法,并贡献下一代方法的愿景。 + + + + 与传统的词汇重叠度量(如BLEU)不同,大多数当前用于机器翻译评估的指标(例如COMET或BERTScore)基于黑盒子的大型语言模型。它们通常与人类判断具有强相关性,但是最近的研究表明,较低质量的传统指标仍然占主导地位,其中一个潜在原因是它们的决策过程更透明。因此,为了促进新的高质量指标的更广泛接受,解释性变得至关重要。在这篇概念论文中,我们确定了可解释机器翻译指标的关键属性和目标,并提供了最近技术的综合综述,将它们与我们确立的目标和属性联系起来。在这个背景下,我们还讨论基于生成模型(如ChatGPT和GPT4)的可解释指标的最新先进方法。最后,我们贡献了下一代方法的愿景,包括自然语言e。 + + Unlike classical lexical overlap metrics such as BLEU, most current evaluation metrics for machine translation (for example, COMET or BERTScore) are based on black-box large language models. They often achieve strong correlations with human judgments, but recent research indicates that the lower-quality classical metrics remain dominant, one of the potential reasons being that their decision processes are more transparent. To foster more widespread acceptance of novel high-quality metrics, explainability thus becomes crucial. In this concept paper, we identify key properties as well as key goals of explainable machine translation metrics and provide a comprehensive synthesis of recent techniques, relating them to our established goals and properties. In this context, we also discuss the latest state-of-the-art approaches to explainable metrics based on generative models such as ChatGPT and GPT4. 
Finally, we contribute a vision of next-generation approaches, including natural language e diff --git a/cs.CL.xml b/cs.CL.xml index 0fb4e08f4..fe6aa1bf1 100644 --- a/cs.CL.xml +++ b/cs.CL.xml @@ -1,81 +1,281 @@ -Chat Arxiv cs.CLhttps://github.com/qhduan/cn-chat-arxivThis is arxiv RSS feed for cs.CL大型语言模型具有卓越的能力,但也面临着安全和隐私攻击的威胁。本调查全面审查了LLM的安全和隐私挑战,涵盖了训练数据、用户和应用风险等方面,并对解决方法进行了回顾。https://rss.arxiv.org/abs/2402.00888<p> -大型语言模型的安全和隐私挑战:一项调查 +Chat Arxiv cs.CLhttps://github.com/qhduan/cn-chat-arxivThis is arxiv RSS feed for cs.CL该研究引入了Multi-Grain Stereotype(MGS)数据集,探索了不同的机器学习方法用于建立陈规检测的基线,并提出了一系列基于MGS数据训练的英文文本的陈规分类器模型。https://arxiv.org/abs/2404.01768<p> +用于增强基于文本的陈规检测和基于探测的偏见评估的大规模语言模型审计 </p> <p> -Security and Privacy Challenges of Large Language Models: A Survey +Auditing Large Language Models for Enhanced Text-Based Stereotype Detection and Probing-Based Bias Evaluation </p> <p> -https://rss.arxiv.org/abs/2402.00888 +https://arxiv.org/abs/2404.01768 </p> <p> -大型语言模型具有卓越的能力,但也面临着安全和隐私攻击的威胁。本调查全面审查了LLM的安全和隐私挑战,涵盖了训练数据、用户和应用风险等方面,并对解决方法进行了回顾。 +该研究引入了Multi-Grain Stereotype(MGS)数据集,探索了不同的机器学习方法用于建立陈规检测的基线,并提出了一系列基于MGS数据训练的英文文本的陈规分类器模型。 </p> <p> </p> <p> -大型语言模型(LLM)展示了非凡的能力,并在生成和总结文本、语言翻译和问答等多个领域做出了贡献。如今,LLM正在成为计算机语言处理任务中非常流行的工具,具备分析复杂语言模式并根据上下文提供相关和适当回答的能力。然而,尽管具有显著优势,这些模型也容易受到安全和隐私攻击的威胁,如越狱攻击、数据污染攻击和个人可识别信息泄露攻击。本调查全面审查了LLM的安全和隐私挑战,包括训练数据和用户方面的问题,以及在交通、教育和医疗等各个领域中应用带来的风险。我们评估了LLM的脆弱性程度,调查了出现的安全和隐私攻击,并对潜在的解决方法进行了回顾。 +大型语言模型(LLMs)的最新进展显著提高了它们在面向人类的人工智能(AI)应用中的影响力。然而,LLMs可能会复制甚至加剧自训练数据中的陈规输出。本研究介绍了Multi-Grain Stereotype(MGS)数据集,包括51,867个实例,涵盖性别、种族、职业、宗教和陈规文本,通过融合多个先前公开的陈规检测数据集收集而来。我们探索了旨在为陈规检测建立基线的不同机器学习方法,并微调了多种架构和模型大小的几个语言模型,本文展示了一系列基于MGS训练的英文文本的陈规分类器模型。为了了解我们的陈规检测器是否捕捉到与人类常识一致的相关特征,我们利用了各种可解释的AI工具, </p> <p> -Large Language Models (LLMs) have demonstrated extraordinary capabilities and contributed to multiple fields, such as generating and summarizing text, language translation, and question-answering. 
Nowadays, LLM is becoming a very popular tool in computerized language processing tasks, with the capability to analyze complicated linguistic patterns and provide relevant and appropriate responses depending on the context. While offering significant advantages, these models are also vulnerable to security and privacy attacks, such as jailbreaking attacks, data poisoning attacks, and Personally Identifiable Information (PII) leakage attacks. This survey provides a thorough review of the security and privacy challenges of LLMs for both training data and users, along with the application-based risks in various domains, such as transportation, education, and healthcare. We assess the extent of LLM vulnerabilities, investigate emerging security and privacy attacks for LLMs, and review the potent -</p>通过使用音素表示,本文提出了一种新颖的解决方案来减缓高资源语言和低资源语言之间的性能差距,并通过实证研究和理论分析证明了其有效性。https://arxiv.org/abs/2402.14279<p> -使用音素表示减缓语言差异,实现稳健的多语言理解 +arXiv:2404.01768v1 Announce Type: cross Abstract: Recent advancements in Large Language Models (LLMs) have significantly increased their presence in human-facing Artificial Intelligence (AI) applications. However, LLMs could reproduce and even exacerbate stereotypical outputs from training data. This work introduces the Multi-Grain Stereotype (MGS) dataset, encompassing 51,867 instances across gender, race, profession, religion, and stereotypical text, collected by fusing multiple previously publicly available stereotype detection datasets. We explore different machine learning approaches aimed at establishing baselines for stereotype detection, and fine-tune several language models of various architectures and model sizes, presenting in this work a series of stereotypes classifier models for English text trained on MGS. 
To understand whether our stereotype detectors capture relevant features (aligning with human common sense) we utilise a variety of explainable AI tools, including +</p>提出了一种新的编码器-解码器模型配置,称为prompt-in-decoder(PiD),可以一次编码输入并并行解码输出,在结构化输出和问答任务中取得高效率,避免了重复输入编码,大幅减少了解码器的内存占用。https://arxiv.org/abs/2403.13112<p> +一次编码,多次并行解码:高效Transformer解码 </p> <p> -Mitigating the Linguistic Gap with Phonemic Representations for Robust Multilingual Language Understanding +Encode Once and Decode in Parallel: Efficient Transformer Decoding </p> <p> -https://arxiv.org/abs/2402.14279 +https://arxiv.org/abs/2403.13112 </p> <p> -通过使用音素表示,本文提出了一种新颖的解决方案来减缓高资源语言和低资源语言之间的性能差距,并通过实证研究和理论分析证明了其有效性。 +提出了一种新的编码器-解码器模型配置,称为prompt-in-decoder(PiD),可以一次编码输入并并行解码输出,在结构化输出和问答任务中取得高效率,避免了重复输入编码,大幅减少了解码器的内存占用。 </p> <p> </p> <p> -为了改善多语言理解,通常需要在训练阶段使用多种语言,依赖复杂的训练技术,并且在高资源语言和低资源语言之间存在显著的性能差距。我们假设语言之间的性能差距受到这些语言之间的语言差异的影响,并通过使用音素表示(具体来说,将音素作为输入标记输入到语言模型中,而不是子词)提供了一种新颖的解决方案,以实现稳健的多语言建模。我们通过三个跨语言任务的定量证据展示了音素表示的有效性,这进一步得到了对跨语言性能差距的理论分析的证明。 +基于Transformer的自然语言处理模型功能强大,但计算成本高,限制了部署场景。在专业领域中,微调的编码器-解码器模型备受青睐,可以胜过更大更通用的仅解码器模型,例如GPT-4。我们介绍了一种新的编码器-解码器模型配置,可以提高在结构化输出和问答任务中的效率,在这些任务中,需要从单个输入中产生多个输出。我们的方法,prompt-in-decoder(PiD),只对输入进行一次编码,并且并行解码输出,通过避免重复输入编码,从而减少解码器的内存占用,提升了训练和推断效率。我们实现了计算减少,大致随子任务数量增加而扩展,相比最先进模型,在对话状态追踪、摘要和问答任务中获得高达4.6倍的速度提升,并且性能相当或更好。我们发布了我们的训练/推断代码。 </p> <p> -arXiv:2402.14279v1 Announce Type: cross Abstract: Approaches to improving multilingual language understanding often require multiple languages during the training phase, rely on complicated training techniques, and -- importantly -- struggle with significant performance gaps between high-resource and low-resource languages. We hypothesize that the performance gaps between languages are affected by linguistic gaps between those languages and provide a novel solution for robust multilingual language modeling by employing phonemic representations (specifically, using phonemes as input tokens to LMs rather than subwords). 
We present quantitative evidence from three cross-lingual tasks that demonstrate the effectiveness of phonemic representation, which is further justified by a theoretical analysis of the cross-lingual performance gap. -</p>本文提出了REBORN,在无监督语音识别中使用基于强化学习的迭代训练来实现边界分割。通过交替训练分割模型和音素预测模型,实现了学习语音和文本之间的映射,解决了无监督情况下语音信号分段结构边界的挑战。https://arxiv.org/abs/2402.03988<p> -REBORN: 基于强化学习的迭代训练的无监督语音识别中的边界分割 +arXiv:2403.13112v1 Announce Type: new Abstract: Transformer-based NLP models are powerful but have high computational costs that limit deployment scenarios. Finetuned encoder-decoder models are popular in specialized domains and can outperform larger more generalized decoder-only models, such as GPT-4. We introduce a new configuration for encoder-decoder models that improves efficiency on structured output and question-answering tasks where multiple outputs are required of a single input. Our method, prompt-in-decoder (PiD), encodes the input once and decodes output in parallel, boosting both training and inference efficiency by avoiding duplicate input encoding, thereby reducing the decoder's memory footprint. We achieve computation reduction that roughly scales with the number of subtasks, gaining up to 4.6x speed-up over state-of-the-art models for dialogue state tracking, summarization, and question-answering tasks with comparable or better performance. 
We release our training/inf +</p>提出了一种数据驱动的动态微调参数选择策略,针对FISH Mask提出了IRD算法,用于在不稳定的数据分布下动态选择最佳参数设置。https://arxiv.org/abs/2403.08484<p> +数据驱动的动态微调参数选择策略,用于基于FISH Mask的高效微调 </p> <p> -REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR +Data-oriented Dynamic Fine-tuning Parameter Selection Strategy for FISH Mask based Efficient Fine-tuning </p> <p> -https://arxiv.org/abs/2402.03988 +https://arxiv.org/abs/2403.08484 </p> <p> -本文提出了REBORN,在无监督语音识别中使用基于强化学习的迭代训练来实现边界分割。通过交替训练分割模型和音素预测模型,实现了学习语音和文本之间的映射,解决了无监督情况下语音信号分段结构边界的挑战。 +提出了一种数据驱动的动态微调参数选择策略,针对FISH Mask提出了IRD算法,用于在不稳定的数据分布下动态选择最佳参数设置。 </p> <p> </p> <p> -无监督自动语音识别(ASR)旨在学习语音信号与其对应的文本转录之间的映射,而无需配对的语音-文本数据监督。语音信号中的单词/音素由一段长度可变且边界未知的语音信号表示,而这种分段结构使得在没有配对数据的情况下学习语音和文本之间的映射变得具有挑战性。本文提出了REBORN,基于强化学习的迭代训练的无监督语音识别中的边界分割。REBORN交替进行以下两个步骤:(1)训练一个能够预测语音信号中分段结构边界的分割模型,和(2)训练一个音素预测模型,其输入是由分割模型分割的分段结构,用于预测音素转录。由于没有用于训练分割模型的监督数据,我们使用强化学习来训练分割模型。 +鉴于大型语言模型(LLMs)的参数数量巨大,调整所有参数成本很高,因此更明智的做法是对特定参数进行微调。大多数参数高效微调(PEFT)集中在参数选择策略上,例如加法方法、选择性方法和基于重新参数化的方法。然而,很少有方法考虑数据样本对参数选择的影响,例如基于Fish Mask的方法。Fish Mask随机选择部分数据样本,并在参数选择过程中对它们进行同等处理,这无法为不稳定的数据分布动态选择最佳参数。在这项工作中,我们采用了数据驱动的视角,提出了一个IRD(迭代样本参数范围减小)算法,以搜索FISH Mask的最佳样本参数对设置。 </p> <p> -Unsupervised automatic speech recognition (ASR) aims to learn the mapping between the speech signal and its corresponding textual transcription without the supervision of paired speech-text data. A word/phoneme in the speech signal is represented by a segment of speech signal with variable length and unknown boundary, and this segmental structure makes learning the mapping between speech and text challenging, especially without paired data. In this paper, we propose REBORN, Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR. 
REBORN alternates between (1) training a segmentation model that predicts the boundaries of the segmental structures in speech signals and (2) training the phoneme prediction model, whose input is a segmental structure segmented by the segmentation model, to predict a phoneme transcription. Since supervised data for training the segmentation model is not available, we use reinforcement learning to train the segmentation model t -</p>该论文评估了当前最先进的大型语言模型(GPT-4)在知识图谱上的对话推理能力,提出了一种基于KG推理的LLM基准代理(LLM-ARK),该代理利用全文环境提示来实现精确和适应性强的KG路径预测,并采用近端策略优化算法进行训练。https://arxiv.org/abs/2312.11282<p> -评估和增强用于知识图谱上的对话推理的大型语言模型 +arXiv:2403.08484v1 Announce Type: new Abstract: In view of the huge number of parameters of large language models (LLMs), tuning all parameters is very costly, and accordingly fine-tuning specific parameters is more sensible. Most parameter-efficient fine-tuning (PEFT) methods concentrate on parameter selection strategies, such as additive, selective, and reparametrization-based methods. However, few methods, such as the FISH Mask based method, consider the impact of data samples on parameter selection. FISH Mask randomly chooses a subset of data samples and treats them equally during parameter selection, so it cannot dynamically select optimal parameters for inconstant data distributions. In this work, we adopt a data-oriented perspective and propose an IRD ($\mathrm{\underline I}$terative sample-parameter $\mathrm{\underline R}$ange $\mathrm{\underline D}$ecreasing) algorithm to search for the best setting of the sample-parameter pair for FISH Mask. 
In each iteration +</p>本文提出了一种聚类与排序方法(CaR),通过与专家偏好相一致的评分模型排名指令对,保留了数据集的多样性。https://arxiv.org/abs/2402.18191<p> +聚类与排序:通过专家定位质量估计实现保留多样性的指令选择 </p> <p> -Evaluating and Enhancing Large Language Models for Conversational Reasoning on Knowledge Graphs +Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation </p> <p> -https://arxiv.org/abs/2312.11282 +https://arxiv.org/abs/2402.18191 </p> <p> -该论文评估了当前最先进的大型语言模型(GPT-4)在知识图谱上的对话推理能力,提出了一种基于KG推理的LLM基准代理(LLM-ARK),该代理利用全文环境提示来实现精确和适应性强的KG路径预测,并采用近端策略优化算法进行训练。 +本文提出了一种聚类与排序方法(CaR),通过与专家偏好相一致的评分模型排名指令对,保留了数据集的多样性。 </p> <p> </p> <p> -大型语言模型(LLM)的发展得益于预训练技术的进展。通过手动设计的提示,这些模型展示了强大的推理能力。在这项工作中,我们评估了当前最先进的LLM(GPT-4)在知识图谱(KG)上的对话推理能力。然而,由于缺乏KG环境意识和开发有效的中间推理阶段优化机制的困难,LLM的性能受到限制。我们进一步引入了LLM-ARK,一个基于KG推理的LLM基准代理,旨在提供精确和适应性强的KG路径预测。LLM-ARK利用全文环境(FTE)提示来吸收每个推理步骤中的状态信息。我们将KG上的多跳推理挑战重新框定为顺序决策任务。利用近端策略优化(PPO)在线策略梯度强化学习算法,我们的模型... +随着开源社区的贡献,涌现了大量指令调优(IT)数据。鉴于训练和评估模型需要大量资源分配,因此有必要采用高效的方法选择高质量的IT数据。然而,现有的指令数据选择方法存在一些限制,比如依赖脆弱的外部API、受GPT模型偏见影响,或减少所选指令数据集的多样性。在本文中,我们提出了一种面向工业的、与专家定位相吻合并保留多样性的指令数据选择方法:聚类与排序(CaR)。CaR分为两个步骤。第一步涉及使用与专家偏好很好对齐的评分模型对指令对进行排名(准确率达到84.25%)。第二步通过聚类过程保留数据集多样性。在我们的实验中,CaR选择了一个子集 </p> <p> -The development of large language models (LLMs) has been catalyzed by advancements in pre-training techniques. These models have demonstrated robust reasoning capabilities through manually designed prompts. In this work, we evaluate the conversational reasoning capabilities of the current state-of-the-art LLM (GPT-4) on knowledge graphs (KGs). However, the performance of LLMs is constrained due to a lack of KG environment awareness and the difficulties in developing effective optimization mechanisms for intermediary reasoning stages. We further introduce LLM-ARK, a LLM grounded KG reasoning agent designed to deliver precise and adaptable predictions on KG paths. LLM-ARK leverages Full Textual Environment (FTE) prompt to assimilate state information within each reasoning step. 
We reframe the challenge of multi-hop reasoning on the KG as a sequential decision-making task. Utilizing the Proximal Policy Optimization (PPO) online policy gradient reinforcement learning algorithm, our model i +arXiv:2402.18191v1 Announce Type: new Abstract: With contributions from the open-source community, a vast amount of instruction tuning (IT) data has emerged. Given the significant resource allocation required by training and evaluating models, it is advantageous to have an efficient method for selecting high-quality IT data. However, existing methods for instruction data selection have limitations such as relying on fragile external APIs, being affected by biases in GPT models, or reducing the diversity of the selected instruction dataset. In this paper, we propose an industrial-friendly, expert-aligned and diversity-preserved instruction data selection method: Clustering and Ranking (CaR). CaR consists of two steps. The first step involves ranking instruction pairs using a scoring model that is well aligned with expert preferences (achieving an accuracy of 84.25%). The second step involves preserving dataset diversity through a clustering process. In our experiment, CaR selected a sub +</p>本研究探讨了大型语言模型(LLMs)对指令中心响应的容忍度,并提出了一个包含复杂查询的数据集,旨在揭示触发不道德响应的方法。https://arxiv.org/abs/2402.15302<p> +有关LLMs指令中心响应的(不道德)程度有多高?揭示安全防护栏对有害查询的漏洞 </p> <p> +How (un)ethical are instruction-centric responses of LLMs? 
Unveiling the vulnerabilities of safety guardrails to harmful queries +</p> +<p> +https://arxiv.org/abs/2402.15302 +</p> +<p> +本研究探讨了大型语言模型(LLMs)对指令中心响应的容忍度,并提出了一个包含复杂查询的数据集,旨在揭示触发不道德响应的方法。 +</p> +<p> + +</p> +<p> +在这项研究中,我们解决了一个围绕大型语言模型(LLMs)安全和道德使用日益关注的问题。尽管这些模型具有潜力,但它们可能会被各种复杂的方法欺骗,产生有害或不道德内容,包括“越狱”技术和有针对性的操纵。我们的工作集中在一个特定问题上:LLMs在要求它们生成以伪代码、程序或软件片段为中心的响应时,有多大程度上可能会被误导,而不是生成普通文本。为了调查这个问题,我们引入了TechHazardQA,一个数据集,其中包含应以文本和以指令为中心格式(例如伪代码)回答的复杂查询,旨在识别不道德响应的触发器。我们查询了一系列LLMs-- Llama-2-13b,Llama-2-7b,Mistral-V2和Mistral 8X7B--并要求它们生成文本和指令为中心的响应。为了评估我们的方法, +</p> +<p> +arXiv:2402.15302v1 Announce Type: new Abstract: In this study, we tackle a growing concern around the safety and ethical use of large language models (LLMs). Despite their potential, these models can be tricked into producing harmful or unethical content through various sophisticated methods, including 'jailbreaking' techniques and targeted manipulation. Our work zeroes in on a specific issue: to what extent LLMs can be led astray by asking them to generate responses that are instruction-centric such as a pseudocode, a program or a software snippet as opposed to vanilla text. To investigate this question, we introduce TechHazardQA, a dataset containing complex queries which should be answered in both text and instruction-centric formats (e.g., pseudocodes), aimed at identifying triggers for unethical responses. We query a series of LLMs -- Llama-2-13b, Llama-2-7b, Mistral-V2 and Mistral 8X7B -- and ask them to generate both text and instruction-centric responses. 
For evaluation we rep +</p>本论文提出了一种新方法单词序列熵(WSE),用于在自由形式医学问答任务中量化答案的不确定性,相比其他基线方法表现更优秀。https://arxiv.org/abs/2402.14259<p> +单词序列熵:走向自由形式医学问答应用及其不确定性估计 +</p> +<p> +Word-Sequence Entropy: Towards Uncertainty Estimation in Free-Form Medical Question Answering Applications and Beyond +</p> +<p> +https://arxiv.org/abs/2402.14259 +</p> +<p> +本论文提出了一种新方法单词序列熵(WSE),用于在自由形式医学问答任务中量化答案的不确定性,相比其他基线方法表现更优秀。 +</p> +<p> + +</p> +<p> +不确定性估计在确保安全关键的人工智能系统与人类互动的可靠性中发挥关键作用,尤其在医疗领域尤为重要。然而,在自由形式的医学问答任务中,尚未建立一种通用方法来量化答案的不确定性,其中无关的词汇和语序含有有限的语义信息可能是不确定性的主要来源,这是由于生成不平等的存在。本文提出了单词序列熵(WSE),该方法根据语义相关性在单词和序列级别上校准不确定性比例,在不确定性量化时更加强调关键词和更相关的序列。我们在5个自由形式医学问答数据集上,利用7种“现成的”大语言模型(LLMs)将WSE与6种基线方法进行比较,并展示了WSE在性能上的优越性。 +</p> +<p> +arXiv:2402.14259v1 Announce Type: cross Abstract: Uncertainty estimation plays a pivotal role in ensuring the reliability of safety-critical human-AI interaction systems, particularly in the medical domain. However, a general method for quantifying the uncertainty of free-form answers has yet to be established in open-ended medical question-answering (QA) tasks, where irrelevant words and sequences with limited semantic information can be the primary source of uncertainty due to the presence of generative inequality. In this paper, we propose the Word-Sequence Entropy (WSE), which calibrates the uncertainty proportion at both the word and sequence levels according to the semantic relevance, with greater emphasis placed on keywords and more relevant sequences when performing uncertainty quantification. 
We compare WSE with 6 baseline methods on 5 free-form medical QA datasets, utilizing 7 "off-the-shelf" large language models (LLMs), and show that WSE exhibits superior performance on ac +</p>MultiPoT 提出了一种任务和模型无关的方法,通过利用多种编程语言的优势和多样性,在表现上显著优于 Python 自一致性。https://arxiv.org/abs/2402.10691<p> +MultiPoT: 多语言思维程序利用多种编程语言 +</p> +<p> +MultiPoT: Multilingual Program of Thoughts Harnesses Multiple Programming Languages +</p> +<p> +https://arxiv.org/abs/2402.10691 +</p> +<p> +MultiPoT 提出了一种任务和模型无关的方法,通过利用多种编程语言的优势和多样性,在表现上显著优于 Python 自一致性。 +</p> +<p> + +</p> +<p> +arXiv:2402.10691v1 公告类型:新的 摘要:思维程序(PoT)是一种以其可执行中间步骤为特征的方法,其确保推理过程中数值计算的准确性。目前,PoT主要使用Python。然而,仅依赖单一语言可能导致次优解决方案,忽视其他编程语言的潜在优势。在本文中,我们对PoT中使用的编程语言进行了全面实验,发现没有一种单一语言在所有任务和模型上始终提供最佳性能。每种语言的有效性取决于具体情景。受此启发,我们提出了一种称为MultiPoT的任务和模型无关方法,该方法从各种语言中获取强大和多样性。实验结果显示,MultiPoT 在很大程度上优于Python 自一致性。此外,与最佳模型相比,它实现了可比或更优异的性能。 +</p> +<p> +arXiv:2402.10691v1 Announce Type: new Abstract: Program of Thoughts (PoT) is an approach characterized by its executable intermediate steps, which ensure the accuracy of the numerical calculations in the reasoning process. Currently, PoT primarily uses Python. However, relying solely on a single language may result in suboptimal solutions and overlook the potential benefits of other programming languages. In this paper, we conduct comprehensive experiments on the programming languages used in PoT and find that no single language consistently delivers optimal performance across all tasks and models. The effectiveness of each language varies depending on the specific scenarios. Inspired by this, we propose a task and model agnostic approach called MultiPoT, which harnesses strength and diversity from various languages. Experimental results reveal that it significantly outperforms Python Self-Consistency. 
Furthermore, it achieves comparable or superior performance compared to the best mo +</p>本研究提出了一种将常识知识图谱与大型视觉语言模型相结合的框架,用于改进预测多模态营销活动效果的性能。该方法能够提供早期检测可能具有说服力的多模态活动并评估和增强营销理论的能力。https://arxiv.org/abs/2402.03607<p> +提高多模态营销的上下文一致性:知识基础学习的有效性 +</p> +<p> +Improving Contextual Congruence Across Modalities for Effective Multimodal Marketing using Knowledge-infused Learning +</p> +<p> +https://arxiv.org/abs/2402.03607 +</p> +<p> +本研究提出了一种将常识知识图谱与大型视觉语言模型相结合的框架,用于改进预测多模态营销活动效果的性能。该方法能够提供早期检测可能具有说服力的多模态活动并评估和增强营销理论的能力。 +</p> +<p> + +</p> +<p> +智能设备的普及使用户能够在线体验多模态信息。然而,大型语言模型(LLM)和视觉模型(LVM)仍然受到捕捉跨模态语义关系的整体意义的限制。缺乏明确的常识知识(例如,作为一个知识图谱),视觉语言模型(VLM)仅通过捕捉庞大的语料库中的高级模式来学习隐式表示,从而忽略了重要的上下文跨模态线索。在这项工作中,我们设计了一个框架,将显式的常识知识以知识图谱的形式与大型的VLM相结合,以提高下游任务的性能,即预测多模态营销活动的有效性。虽然营销应用提供了一个有说服力的指标来评估我们的方法,但我们的方法使得早期发现可能具有说服力的多模态活动成为可能,并评估和增强营销理论。 +</p> +<p> +The prevalence of smart devices with the ability to capture moments in multiple modalities has enabled users to experience multimodal information online. However, large Language (LLMs) and Vision models (LVMs) are still limited in capturing holistic meaning with cross-modal semantic relationships. Without explicit, common sense knowledge (e.g., as a knowledge graph), Visual Language Models (VLMs) only learn implicit representations by capturing high-level patterns in vast corpora, missing essential contextual cross-modal cues. In this work, we design a framework to couple explicit commonsense knowledge in the form of knowledge graphs with large VLMs to improve the performance of a downstream task, predicting the effectiveness of multi-modal marketing campaigns. While the marketing application provides a compelling metric for assessing our methods, our approach enables the early detection of likely persuasive multi-modal campaigns and the assessment and augmentation of marketing theory. 
+</p>本文提出了零射击提示方法,通过利用大规模语言模型中的错误信息来指导模型进行任务,以提高任务表现。实验结果表明,在不同数据集上,包括阅读理解、算术推理和闭卷问答,模型性能有所提升。这些结果也显示出不同模型之间存在不同程度的错误信息。https://arxiv.org/abs/2401.08273<p> +大规模语言模型是零射击学习器 +</p> +<p> +Large Language Models are Null-Shot Learners +</p> +<p> +https://arxiv.org/abs/2401.08273 +</p> +<p> +本文提出了零射击提示方法,通过利用大规模语言模型中的错误信息来指导模型进行任务,以提高任务表现。实验结果表明,在不同数据集上,包括阅读理解、算术推理和闭卷问答,模型性能有所提升。这些结果也显示出不同模型之间存在不同程度的错误信息。 +</p> +<p> + +</p> +<p> +本文提出了零射击提示方法。零射击提示利用大规模语言模型(LLMs)中的错误信息,通过指示LLMs利用从“示例”部分中获取的信息(该信息在所提供的上下文中不存在)来完成任务。虽然减少错误信息对于LLMs的日常和重要用途至关重要,但我们提出在目前的环境中,这些LLMs仍然具有错误信息,实际上可以利用错误信息来提高与标准零射击提示相比的任务表现。对八个LLMs进行实验,结果显示在大多数八个数据集(包括阅读理解、算术推理和闭卷问答)中,性能有所提升。观察到的不一致性增加相对性能在LLMs之间的差异,也可能表示每个模型中存在不同程度的错误信息。 +</p> +<p> +arXiv:2401.08273v2 Announce Type: replace-cross Abstract: This paper presents null-shot prompting. Null-shot prompting exploits hallucination in large language models (LLMs) by instructing LLMs to utilize information from the "Examples" section that never exists within the provided context to perform a task. While reducing hallucination is crucial and non-negligible for daily and critical uses of LLMs, we propose that in the current landscape in which these LLMs still hallucinate, it is possible, in fact, to exploit hallucination to increase performance in performing tasks compared to standard zero-shot prompting. Experiments with eight LLMs show improvements in performance across the majority of eight datasets, including reading comprehension, arithmetic reasoning, and closed-book question answering. The observed inconsistency in increased relative performance across the LLMs also potentially indicates a different degree of inherent hallucination in each model. 
These differences show +</p>SciGLM引入了自我反思指导注释框架,用于弥补大型语言模型在理解复杂科学概念、推导符号方程式和解决高级数值计算方面的不足,以训练能够进行大学水平科学推理的科学语言模型。https://arxiv.org/abs/2401.07950<p> +SciGLM: 用自我反思指导注释和调整训练科学语言模型 +</p> +<p> +SciGLM: Training Scientific Language Models with Self-Reflective Instruction Annotation and Tuning +</p> +<p> +https://arxiv.org/abs/2401.07950 +</p> +<p> +SciGLM引入了自我反思指导注释框架,用于弥补大型语言模型在理解复杂科学概念、推导符号方程式和解决高级数值计算方面的不足,以训练能够进行大学水平科学推理的科学语言模型。 +</p> +<p> + +</p> +<p> +大型语言模型(LLMs)已显示出在协助科学发现方面的潜力。然而,目前LLMs在理解复杂科学概念、推导符号方程式和解决高级数值计算方面存在局限。为了弥补这些差距,我们引入了SciGLM,一套能够进行大学水平科学推理的科学语言模型。我们方法的核心是一种新颖的自我反思指导注释框架,以解决科学领域中数据稀缺挑战。该框架利用现有LLMs为未标记的科学问题生成逐步推理,随后经过自我反思的批评和修改过程。应用这一框架,我们整理了SciInstruct,这是一个涵盖物理、化学、数学和形式证明的多样化、高质量的数据集。我们利用SciInstruct对ChatGLM系列语言模型进行了微调,增强了 +</p> +<p> +arXiv:2401.07950v2 Announce Type: replace Abstract: Large Language Models (LLMs) have shown promise in assisting scientific discovery. However, such applications are currently limited by LLMs' deficiencies in understanding intricate scientific concepts, deriving symbolic equations, and solving advanced numerical calculations. To bridge these gaps, we introduce SciGLM, a suite of scientific language models able to conduct college-level scientific reasoning. Central to our approach is a novel self-reflective instruction annotation framework to address the data scarcity challenge in the science domain. This framework leverages existing LLMs to generate step-by-step reasoning for unlabelled scientific questions, followed by a process of self-reflective critic-and-revise. Applying this framework, we curated SciInstruct, a diverse and high-quality dataset encompassing physics, chemistry, math, and formal proofs. We fine-tuned the ChatGLM family of language models with SciInstruct, enhancing +</p>本研究全面研究了大型语言模型的知识编辑,旨在有效修改模型的行为,同时保持整体性能。http://arxiv.org/abs/2401.01286<p> +大型语言模型的知识编辑全面研究 +</p> +<p> +A Comprehensive Study of Knowledge Editing for Large Language Models. 
(arXiv:2401.01286v1 [cs.CL]) +</p> +<p> +http://arxiv.org/abs/2401.01286 +</p> +<p> +本研究全面研究了大型语言模型的知识编辑,旨在有效修改模型的行为,同时保持整体性能。 +</p> +<p> + +</p> +<p> +大型语言模型(LLM)在理解和生成与人类交流紧密相似的文本方面展现出了非凡的能力。然而,其主要限制在于训练过程中的显著计算需求,这是由于其广泛的参数化造成的。这一挑战在于世界的动态性,需要频繁更新LLM以修正过时的信息或集成新知识,从而确保其持续的相关性。许多应用需要在训练后进行持续的模型调整,以解决缺陷或不良行为。近年来,对于LLM的知识编辑技术的兴趣越来越高,在特定领域内有效地修改LLM的行为,同时保持整体性能在各种输入中的表现。本文首先定义了知识编辑的目标和挑战,然后综述了现有的知识编辑方法和技术,并讨论了其应用和未来发展的方向。 +</p> +<p> +Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication. However, a primary limitation lies in the significant computational demands during training, arising from their extensive parameterization. This challenge is further intensified by the dynamic nature of the world, necessitating frequent updates to LLMs to correct outdated information or integrate new knowledge, thereby ensuring their continued relevance. Note that many applications demand continual model adjustments post-training to address deficiencies or undesirable behaviors. There is an increasing interest in efficient, lightweight methods for on-the-fly model modifications. To this end, recent years have seen a burgeoning in the techniques of knowledge editing for LLMs, which aim to efficiently modify LLMs' behaviors within specific domains while preserving overall performance across various inputs. In this paper, we first define the kno +</p>Transformers学会了高阶优化方法,用于上下文学习,通过实现类似于迭代牛顿法的算法,而不是梯度下降。http://arxiv.org/abs/2310.17086<p> +Transformers学会了高阶优化方法用于上下文学习:一项与线性模型的研究 +</p> +<p> +Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Models. 
(arXiv:2310.17086v1 [cs.LG]) +</p> +<p> +http://arxiv.org/abs/2310.17086 +</p> +<p> +Transformers学会了高阶优化方法,用于上下文学习,通过实现类似于迭代牛顿法的算法,而不是梯度下降。 +</p> +<p> + +</p> +<p> +Transformers在上下文学习中表现出色,但是它们是如何进行上下文学习仍然是一个谜。最近的研究表明,Transformers可能通过内部运行梯度下降,即一阶优化方法,来进行上下文学习。本文中,我们展示了Transformers学会了实现高阶优化方法来进行上下文学习。我们以上下文线性回归为重点,展示了Transformers学会了实现一个非常类似于迭代牛顿法的算法,而不是梯度下降。从实证上来看,我们展示了连续的Transformer层的预测与牛顿法的不同迭代非常接近,每个中间层大致计算了3次迭代。相比之下,需要指数级的梯度下降步骤才能匹配额外的Transformer层;这表明Transformers具有相当的收敛速率。 +</p> +<p> +Transformers are remarkably good at in-context learning (ICL) -- learning from demonstrations without parameter updates -- but how they perform ICL remains a mystery. Recent work suggests that Transformers may learn in-context by internally running Gradient Descent, a first-order optimization method. In this paper, we instead demonstrate that Transformers learn to implement higher-order optimization methods to perform ICL. Focusing on in-context linear regression, we show that Transformers learn to implement an algorithm very similar to Iterative Newton's Method, a higher-order optimization method, rather than Gradient Descent. Empirically, we show that predictions from successive Transformer layers closely match different iterations of Newton's Method linearly, with each middle layer roughly computing 3 iterations. In contrast, exponentially more Gradient Descent steps are needed to match an additional Transformers layer; this suggests that Transformers have an comparable rate of conv +</p>本研究调查了使用大型语言模型(LLMs)来帮助患者和转诊医生识别合适的临床试验的潜力,并引入了TrialGPT架构,该架构能够准确预测合格性并提供解释,实验证明其有效性。http://arxiv.org/abs/2307.15051<p> +使用大型语言模型将患者与临床试验匹配 +</p> +<p> +Matching Patients to Clinical Trials with Large Language Models. 
(arXiv:2307.15051v1 [cs.CL]) +</p> +<p> +http://arxiv.org/abs/2307.15051 +</p> +<p> +本研究调查了使用大型语言模型(LLMs)来帮助患者和转诊医生识别合适的临床试验的潜力,并引入了TrialGPT架构,该架构能够准确预测合格性并提供解释,实验证明其有效性。 +</p> +<p> + +</p> +<p> +临床试验在推动药物研发和基于证据的医学方面非常重要,但患者招募常常受到限制。在这项工作中,我们调查了使用大型语言模型(LLMs)来帮助患者和转诊医生识别合适的临床试验的潜力。具体而言,我们引入了一种新颖的架构TrialGPT,采用LLMs预测基于标准的合格性,并提供详细的解释,并根据患者病历中的自由文本来对候选临床试验进行排名和排除。我们在三个公开可用的184名患者和18,238个注释的临床试验的队列上评估了TrialGPT。实验结果表明几个关键发现:第一,TrialGPT在标准级别的预测准确性上表现出很高的准确率,并提供准确的解释。第二,TrialGPT的综合试验级别评分与专家标注的合格性高度相关。第三,这些评分 +</p> +<p> +Clinical trials are vital in advancing drug development and evidence-based medicine, but their success is often hindered by challenges in patient recruitment. In this work, we investigate the potential of large language models (LLMs) to assist individual patients and referral physicians in identifying suitable clinical trials from an extensive selection. Specifically, we introduce TrialGPT, a novel architecture employing LLMs to predict criterion-level eligibility with detailed explanations, which are then aggregated for ranking and excluding candidate clinical trials based on free-text patient notes. We evaluate TrialGPT on three publicly available cohorts of 184 patients and 18,238 annotated clinical trials. The experimental results demonstrate several key findings: First, TrialGPT achieves high criterion-level prediction accuracy with faithful explanations. Second, the aggregated trial-level TrialGPT scores are highly correlated with expert eligibility annotations. Third, these scor +</p>本研究探索机器翻译可解释性评估指标,提供综合综述和最新方法,并贡献下一代方法的愿景。http://arxiv.org/abs/2306.13041<p> +机器翻译可解释性评估指标的探索 +</p> +<p> +Towards Explainable Evaluation Metrics for Machine Translation. 
(arXiv:2306.13041v1 [cs.CL]) +</p> +<p> +http://arxiv.org/abs/2306.13041 +</p> +<p> +本研究探索机器翻译可解释性评估指标,提供综合综述和最新方法,并贡献下一代方法的愿景。 +</p> +<p> + +</p> +<p> +与传统的词汇重叠度量(如BLEU)不同,大多数当前用于机器翻译评估的指标(例如COMET或BERTScore)基于黑盒子的大型语言模型。它们通常与人类判断具有强相关性,但是最近的研究表明,较低质量的传统指标仍然占主导地位,其中一个潜在原因是它们的决策过程更透明。因此,为了促进新的高质量指标的更广泛接受,解释性变得至关重要。在这篇概念论文中,我们确定了可解释机器翻译指标的关键属性和目标,并提供了最近技术的综合综述,将它们与我们确立的目标和属性联系起来。在这个背景下,我们还讨论基于生成模型(如ChatGPT和GPT4)的可解释指标的最新先进方法。最后,我们贡献了下一代方法的愿景,包括自然语言e。 +</p> +<p> +Unlike classical lexical overlap metrics such as BLEU, most current evaluation metrics for machine translation (for example, COMET or BERTScore) are based on black-box large language models. They often achieve strong correlations with human judgments, but recent research indicates that the lower-quality classical metrics remain dominant, one of the potential reasons being that their decision processes are more transparent. To foster more widespread acceptance of novel high-quality metrics, explainability thus becomes crucial. In this concept paper, we identify key properties as well as key goals of explainable machine translation metrics and provide a comprehensive synthesis of recent techniques, relating them to our established goals and properties. In this context, we also discuss the latest state-of-the-art approaches to explainable metrics based on generative models such as ChatGPT and GPT4. 
Finally, we contribute a vision of next-generation approaches, including natural language e </p> \ No newline at end of file diff --git a/cs.IR.md b/cs.IR.md index 1e34f716f..0e3e7504d 100644 --- a/cs.IR.md +++ b/cs.IR.md @@ -2,9 +2,52 @@ | Ref | Title | Summary | | --- | --- | --- | - +| [^1] | [Unlocking the `Why' of Buying: Introducing a New Dataset and Benchmark for Purchase Reason and Post-Purchase Experience](https://arxiv.org/abs/2402.13417) | 引入了一个新的数据集和基准,旨在揭示用户购买决策背后的原因,提出了一个有效的基于LLM的方法来生成高质量、个性化的购买原因解释。 | +| [^2] | [Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT](https://arxiv.org/abs/2402.07440) | 该论文介绍了LoCoV1,一个用于评估长上下文检索性能的新型基准测试,并提出了M2-BERT检索编码器,用于处理长上下文检索,解决了如何评估性能、预训练语言模型以及如何进行微调的挑战。 | +| [^3] | [SupplyGraph: A Benchmark Dataset for Supply Chain Planning using Graph Neural Networks.](http://arxiv.org/abs/2401.15299) | SupplyGraph是一个基准数据集,用于使用图神经网络进行供应链规划。该数据集包含了来自孟加拉国一家领先快速消费品公司的实际数据,用于优化、预测和解决供应链问题。数据集中的时间数据作为节点特征,可用于销售预测、生产计划和故障识别。 | # 详细 +[^1]: 解锁购买的“为何”:引入一个新的数据集和购买原因与后购买体验的基准 + + Unlocking the `Why' of Buying: Introducing a New Dataset and Benchmark for Purchase Reason and Post-Purchase Experience + + [https://arxiv.org/abs/2402.13417](https://arxiv.org/abs/2402.13417) + + 引入了一个新的数据集和基准,旨在揭示用户购买决策背后的原因,提出了一个有效的基于LLM的方法来生成高质量、个性化的购买原因解释。 + + + + 解释对于提高现代推荐系统中用户信任和理解至关重要。为了构建真正可解释的系统,我们需要能阐明用户为何做出选择的高质量数据集。我们提出了一个新颖的购买原因解释任务。为此,我们引入了一种基于LLM的方法来生成一个由真实用户解释为何做出某些购买决策的文本解释的数据集。我们诱导LLM明确区分用户评论中购买产品背后的原因和购买后的体验。自动化的LLM驱动评估以及小规模人工评估证实了我们方法获取高质量、个性化解释的有效性。我们在两个个性化数据集上对该数据集进行基准测试。 + + arXiv:2402.13417v1 Announce Type: new Abstract: Explanations are crucial for enhancing user trust and understanding within modern recommendation systems. To build truly explainable systems, we need high-quality datasets that elucidate why users make choices. While previous efforts have focused on extracting users' post-purchase sentiment in reviews, they ignore the reasons behind the decision to buy. 
In our work, we propose a novel purchase reason explanation task. To this end, we introduce an LLM-based approach to generate a dataset that consists of textual explanations of why real users make certain purchase decisions. We induce LLMs to explicitly distinguish between the reasons behind purchasing a product and the experience after the purchase in a user review. An automated, LLM-driven evaluation, as well as a small scale human evaluation, confirms the effectiveness of our approach to obtaining high-quality, personalized explanations. We benchmark this dataset on two personalized + +[^2]: 使用LoCo和M2-BERT进行基准测试和构建长上下文检索模型 + + Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT + + [https://arxiv.org/abs/2402.07440](https://arxiv.org/abs/2402.07440) + + 该论文介绍了LoCoV1,一个用于评估长上下文检索性能的新型基准测试,并提出了M2-BERT检索编码器,用于处理长上下文检索,解决了如何评估性能、预训练语言模型以及如何进行微调的挑战。 + + + + 检索管道是许多机器学习系统中的重要组成部分,在文档很长(例如10K个标记或更多)且需要在整个文本中合成信息来确定相关文档的领域中表现不佳。开发适用于这些领域的长上下文检索编码器面临三个挑战:(1)如何评估长上下文检索性能,(2)如何预训练基本语言模型以表示短上下文(对应查询)和长上下文(对应文档),以及(3)如何根据GPU内存限制下的批量大小限制对该模型进行微调。为了解决这些挑战,我们首先介绍了LoCoV1,这是一个新颖的12个任务基准测试,用于测量在不可分块或不有效的情况下的长上下文检索。接下来,我们提出了M2-BERT检索编码器,这是一个80M参数状态空间编码器模型,采用Monarch Mixer架构构建,能够进行可扩展的检索。 + + Retrieval pipelines-an integral component of many machine learning systems-perform poorly in domains where documents are long (e.g., 10K tokens or more) and where identifying the relevant document requires synthesizing information across the entire text. Developing long-context retrieval encoders suitable for these domains raises three challenges: (1) how to evaluate long-context retrieval performance, (2) how to pretrain a base language model to represent both short contexts (corresponding to queries) and long contexts (corresponding to documents), and (3) how to fine-tune this model for retrieval under the batch size limitations imposed by GPU memory constraints. 
To address these challenges, we first introduce LoCoV1, a novel 12 task benchmark constructed to measure long-context retrieval where chunking is not possible or not effective. We next present the M2-BERT retrieval encoder, an 80M parameter state-space encoder model built from the Monarch Mixer architecture, capable of scali + +[^3]: SupplyGraph: 使用图神经网络进行供应链规划的基准数据集 + + SupplyGraph: A Benchmark Dataset for Supply Chain Planning using Graph Neural Networks. (arXiv:2401.15299v1 [cs.LG]) + + [http://arxiv.org/abs/2401.15299](http://arxiv.org/abs/2401.15299) + + SupplyGraph是一个基准数据集,用于使用图神经网络进行供应链规划。该数据集包含了来自孟加拉国一家领先快速消费品公司的实际数据,用于优化、预测和解决供应链问题。数据集中的时间数据作为节点特征,可用于销售预测、生产计划和故障识别。 + + + + 图神经网络(GNNs)在不同领域如运输、生物信息学、语言处理和计算机视觉中取得了重要进展。然而,在将GNNs应用于供应链网络方面,目前尚缺乏研究。供应链网络在结构上类似于图形,使其成为应用GNN方法的理想选择。这为优化、预测和解决供应链问题开辟了无限可能。然而,此方法的一个主要障碍在于缺乏真实世界的基准数据集以促进使用GNN来研究和解决供应链问题。为了解决这个问题,我们提供了一个来自孟加拉国一家领先的快速消费品公司的实际基准数据集,该数据集侧重于用于生产目的的供应链规划的时间任务。该数据集包括时间数据作为节点特征,以实现销售预测、生产计划和故障识别。 + Graph Neural Networks (GNNs) have gained traction across different domains such as transportation, bio-informatics, language processing, and computer vision. However, there is a noticeable absence of research on applying GNNs to supply chain networks. Supply chain networks are inherently graph-like in structure, making them prime candidates for applying GNN methodologies. This opens up a world of possibilities for optimizing, predicting, and solving even the most complex supply chain problems. A major setback in this approach lies in the absence of real-world benchmark datasets to facilitate the research and resolution of supply chain problems using GNNs. To address the issue, we present a real-world benchmark dataset for temporal tasks, obtained from one of the leading FMCG companies in Bangladesh, focusing on supply chain planning for production purposes. 
The dataset includes temporal data as node features to enable sales predictions, production planning, and the identification of fa + diff --git a/cs.IR.xml b/cs.IR.xml index a79979203..1bdae0c98 100644 --- a/cs.IR.xml +++ b/cs.IR.xml @@ -1 +1,61 @@ -Chat Arxiv cs.IRhttps://github.com/qhduan/cn-chat-arxivThis is arxiv RSS feed for cs.IR \ No newline at end of file +Chat Arxiv cs.IRhttps://github.com/qhduan/cn-chat-arxivThis is arxiv RSS feed for cs.IR引入了一个新的数据集和基准,旨在揭示用户购买决策背后的原因,提出了一个有效的基于LLM的方法来生成高质量、个性化的购买原因解释。https://arxiv.org/abs/2402.13417<p> +解锁购买的“为何”:引入一个新的数据集和购买原因与后购买体验的基准 +</p> +<p> +Unlocking the `Why' of Buying: Introducing a New Dataset and Benchmark for Purchase Reason and Post-Purchase Experience +</p> +<p> +https://arxiv.org/abs/2402.13417 +</p> +<p> +引入了一个新的数据集和基准,旨在揭示用户购买决策背后的原因,提出了一个有效的基于LLM的方法来生成高质量、个性化的购买原因解释。 +</p> +<p> + +</p> +<p> +解释对于提高现代推荐系统中用户信任和理解至关重要。为了构建真正可解释的系统,我们需要能阐明用户为何做出选择的高质量数据集。我们提出了一个新颖的购买原因解释任务。为此,我们引入了一种基于LLM的方法来生成一个由真实用户解释为何做出某些购买决策的文本解释的数据集。我们诱导LLM明确区分用户评论中购买产品背后的原因和购买后的体验。自动化的LLM驱动评估以及小规模人工评估证实了我们方法获取高质量、个性化解释的有效性。我们在两个个性化数据集上对该数据集进行基准测试。 +</p> +<p> +arXiv:2402.13417v1 Announce Type: new Abstract: Explanations are crucial for enhancing user trust and understanding within modern recommendation systems. To build truly explainable systems, we need high-quality datasets that elucidate why users make choices. While previous efforts have focused on extracting users' post-purchase sentiment in reviews, they ignore the reasons behind the decision to buy. In our work, we propose a novel purchase reason explanation task. To this end, we introduce an LLM-based approach to generate a dataset that consists of textual explanations of why real users make certain purchase decisions. We induce LLMs to explicitly distinguish between the reasons behind purchasing a product and the experience after the purchase in a user review. 
An automated, LLM-driven evaluation, as well as a small scale human evaluation, confirms the effectiveness of our approach to obtaining high-quality, personalized explanations. We benchmark this dataset on two personalized +</p>该论文介绍了LoCoV1,一个用于评估长上下文检索性能的新型基准测试,并提出了M2-BERT检索编码器,用于处理长上下文检索,解决了如何评估性能、预训练语言模型以及如何进行微调的挑战。https://arxiv.org/abs/2402.07440<p> +使用LoCo和M2-BERT进行基准测试和构建长上下文检索模型 +</p> +<p> +Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT +</p> +<p> +https://arxiv.org/abs/2402.07440 +</p> +<p> +该论文介绍了LoCoV1,一个用于评估长上下文检索性能的新型基准测试,并提出了M2-BERT检索编码器,用于处理长上下文检索,解决了如何评估性能、预训练语言模型以及如何进行微调的挑战。 +</p> +<p> + +</p> +<p> +检索管道是许多机器学习系统中的重要组成部分,在文档很长(例如10K个标记或更多)且需要在整个文本中合成信息来确定相关文档的领域中表现不佳。开发适用于这些领域的长上下文检索编码器面临三个挑战:(1)如何评估长上下文检索性能,(2)如何预训练基本语言模型以表示短上下文(对应查询)和长上下文(对应文档),以及(3)如何根据GPU内存限制下的批量大小限制对该模型进行微调。为了解决这些挑战,我们首先介绍了LoCoV1,这是一个新颖的12个任务基准测试,用于测量在不可分块或不有效的情况下的长上下文检索。接下来,我们提出了M2-BERT检索编码器,这是一个80M参数状态空间编码器模型,采用Monarch Mixer架构构建,能够进行可扩展的检索。 +</p> +<p> +Retrieval pipelines-an integral component of many machine learning systems-perform poorly in domains where documents are long (e.g., 10K tokens or more) and where identifying the relevant document requires synthesizing information across the entire text. Developing long-context retrieval encoders suitable for these domains raises three challenges: (1) how to evaluate long-context retrieval performance, (2) how to pretrain a base language model to represent both short contexts (corresponding to queries) and long contexts (corresponding to documents), and (3) how to fine-tune this model for retrieval under the batch size limitations imposed by GPU memory constraints. To address these challenges, we first introduce LoCoV1, a novel 12 task benchmark constructed to measure long-context retrieval where chunking is not possible or not effective. 
We next present the M2-BERT retrieval encoder, an 80M parameter state-space encoder model built from the Monarch Mixer architecture, capable of scali +</p>SupplyGraph是一个基准数据集,用于使用图神经网络进行供应链规划。该数据集包含了来自孟加拉国一家领先快速消费品公司的实际数据,用于优化、预测和解决供应链问题。数据集中的时间数据作为节点特征,可用于销售预测、生产计划和故障识别。http://arxiv.org/abs/2401.15299<p> +SupplyGraph: 使用图神经网络进行供应链规划的基准数据集 +</p> +<p> +SupplyGraph: A Benchmark Dataset for Supply Chain Planning using Graph Neural Networks. (arXiv:2401.15299v1 [cs.LG]) +</p> +<p> +http://arxiv.org/abs/2401.15299 +</p> +<p> +SupplyGraph是一个基准数据集,用于使用图神经网络进行供应链规划。该数据集包含了来自孟加拉国一家领先快速消费品公司的实际数据,用于优化、预测和解决供应链问题。数据集中的时间数据作为节点特征,可用于销售预测、生产计划和故障识别。 +</p> +<p> + +</p> +<p> +图神经网络(GNNs)在不同领域如运输、生物信息学、语言处理和计算机视觉中取得了重要进展。然而,在将GNNs应用于供应链网络方面,目前尚缺乏研究。供应链网络在结构上类似于图形,使其成为应用GNN方法的理想选择。这为优化、预测和解决供应链问题开辟了无限可能。然而,此方法的一个主要障碍在于缺乏真实世界的基准数据集以促进使用GNN来研究和解决供应链问题。为了解决这个问题,我们提供了一个来自孟加拉国一家领先的快速消费品公司的实际基准数据集,该数据集侧重于用于生产目的的供应链规划的时间任务。该数据集包括时间数据作为节点特征,以实现销售预测、生产计划和故障识别。 +</p> +<p> +Graph Neural Networks (GNNs) have gained traction across different domains such as transportation, bio-informatics, language processing, and computer vision. However, there is a noticeable absence of research on applying GNNs to supply chain networks. Supply chain networks are inherently graph-like in structure, making them prime candidates for applying GNN methodologies. This opens up a world of possibilities for optimizing, predicting, and solving even the most complex supply chain problems. A major setback in this approach lies in the absence of real-world benchmark datasets to facilitate the research and resolution of supply chain problems using GNNs. To address the issue, we present a real-world benchmark dataset for temporal tasks, obtained from one of the leading FMCG companies in Bangladesh, focusing on supply chain planning for production purposes. 
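The dataset includes temporal data as node features to enable sales predictions, production planning, and the identification of fa
+</p>
\ No newline at end of file
diff --git a/cs.LG.md b/cs.LG.md
index 2cb8c075a..8535ee2e8 100644
--- a/cs.LG.md
+++ b/cs.LG.md
@@ -2,142 +2,607 @@
 | Ref | Title | Summary |
 | --- | --- | --- |
-| [^1] | [BAdam: A Memory Efficient Full Parameter Training Method for Large Language Models](https://arxiv.org/abs/2404.02827) | BAdam proposes a memory-efficient method for full-parameter fine-tuning of large language models, showing superior convergence behavior in experiments and advantages in performance evaluations. |
-| [^2] | [Swarm Characteristics Classification Using Neural Networks](https://arxiv.org/abs/2403.19572) | This paper studies supervised neural network time series classification (NN TSC) for predicting key attributes and tactics of swarming autonomous agents in a military context, and demonstrates the effectiveness of NN TSC for rapidly inferring intelligence about attacking swarms. |
-| [^3] | [A Survey on Deep Learning and State-of-the-arts Applications](https://arxiv.org/abs/2403.17561) | Deep learning is a powerful tool for solving complex problems; this study aims to comprehensively review the latest developments in deep learning models and their applications. |
-| [^4] | [ThermoHands: A Benchmark for 3D Hand Pose Estimation from Egocentric Thermal Image](https://arxiv.org/abs/2403.09871) | The paper proposes ThermoHands, a new benchmark addressing the challenges of egocentric 3D hand pose estimation from thermal images, and introduces TheFormer, a tailored baseline with dual transformer modules, showing that thermal imaging is effective for robust 3D hand pose estimation in adverse conditions. |
-| [^5] | [DEEP-IoT: Downlink-Enhanced Efficient-Power Internet of Things](https://arxiv.org/abs/2403.00321) | Through a "listen more, transmit less" strategy, DEEP-IoT challenges and transforms the conventional IoT communication model, substantially reducing energy consumption and extending device lifetime. |
-| [^6] | [Covariance-Adaptive Least-Squares Algorithm for Stochastic Combinatorial Semi-Bandits](https://arxiv.org/abs/2402.15171) | Proposes a covariance-adaptive least-squares algorithm that estimates the covariance structure online and obtains improved regret upper bounds over proxy-variance-based algorithms; in particular, when all covariance coefficients are non-negative, it makes effective use of semi-bandit feedback and performs well across parameter regimes. |
-| [^7] | [CLCE: An Approach to Refining Cross-Entropy and Contrastive Learning for Optimized Learning Fusion](https://arxiv.org/abs/2402.14551) | CLCE combines label-aware contrastive learning with cross-entropy loss and improves performance by jointly leveraging hard-example mining. |
-| [^8] | [ConSmax: Hardware-Friendly Alternative Softmax with Learnable Parameters](https://arxiv.org/abs/2402.10930) | ConSmax is a hardware-friendly Softmax alternative that introduces learnable parameters to efficiently handle the key tasks of the original Softmax without affecting performance. |
-| [^9] | [Multi-View Symbolic Regression](https://arxiv.org/abs/2402.04298) | Multi-View Symbolic Regression (MvSR) is a symbolic regression method that considers multiple datasets simultaneously and finds a single parametric solution that accurately fits all of them, addressing the inability of traditional methods to handle different experimental setups. |
+| [^1] | [SugarcaneNet2024: An Optimized Weighted Average Ensemble Approach of LASSO Regularized Pre-trained Models for Sugarcane Disease Classification](https://arxiv.org/abs/2403.18870) | SugarcaneNet2024 is an optimized weighted-average ensemble of LASSO-regularized pre-trained models that performs well on sugarcane disease classification, with fast and accurate detection. |
+| [^2] | [Brain Stroke Segmentation Using Deep Learning Models: A Comparative Study](https://arxiv.org/abs/2403.17177) | By comparing deep learning models on brain stroke segmentation, this study examines whether high-level designs are needed to obtain the best results. |
+| [^3] | [AdaTrans: Feature-wise and Sample-wise Adaptive Transfer Learning for High-dimensional Regression](https://arxiv.org/abs/2403.13565) | Proposes an adaptive transfer learning method for high-dimensional regression that adaptively detects and aggregates feature-wise and sample-wise transferable structures. |
+| [^4] | [Learning-Based Pricing and Matching for Two-Sided Queues](https://arxiv.org/abs/2403.11093) | Designs pricing and matching algorithms that maximize platform profit under unknown demand and supply functions while keeping customer and server queue lengths below a threshold. |
+| [^5] | [Interpretable Machine Learning for Survival Analysis](https://arxiv.org/abs/2403.10250) | Applying interpretable machine learning to survival analysis promotes transparency and fairness, reveals potential biases and limitations of models, and provides more mathematically sound methods for feature effects and risk-factor prediction. |
+| [^6] | [Robust Subgraph Learning by Monitoring Early Training Representations](https://arxiv.org/abs/2403.09901) | This paper introduces SHERD, a new technique that monitors information in the early training representations of graph neural networks (GNNs) and uses standard distance metrics to detect vulnerable nodes, achieving both performance and adversarial robustness on graph inputs. |
+| [^7] | [Exploring the Adversarial Frontier: Quantifying Robustness via Adversarial Hypervolume](https://arxiv.org/abs/2403.05100) | Proposes a new metric, adversarial hypervolume, to comprehensively assess the robustness of deep learning models across a range of perturbation intensities, and adopts a novel training algorithm to improve adversarial robustness. |
+| [^8] | [Memetic Differential Evolution Methods for Semi-Supervised Clustering](https://arxiv.org/abs/2403.04322) | This paper proposes a novel memetic strategy based on the differential evolution paradigm for semi-supervised clustering, the first attempt to define such a method in this area. |
+| [^9] | [ARNN: Attentive Recurrent Neural Network for Multi-channel EEG Signals to Identify Epileptic Seizures](https://arxiv.org/abs/2403.03276) | ARNN is an attentive recurrent neural network for multi-channel EEG signals, with linear complexity and parallel computation, combining the advantages of attention and LSTM gates while avoiding their drawbacks. |
+| [^10] | [Non-Convex Stochastic Composite Optimization with Polyak Momentum](https://arxiv.org/abs/2403.02967) | This paper studies the stochastic proximal gradient method with Polyak momentum and shows that it achieves the optimal convergence rate for non-convex composite optimization regardless of batch size. |
+| [^11] | [Pooling Image Datasets With Multiple Covariate Shift and Imbalance](https://arxiv.org/abs/2403.02598) | Taking a category-theoretic perspective, this paper provides a simple and effective solution that entirely avoids complex multi-stage training pipelines. |
+| [^12] | [The Implicit Bias of Heterogeneity towards Invariance and Causality](https://arxiv.org/abs/2403.01420) | The contribution of heterogeneity to the emergence of causality in regression tasks explains why large language models can uncover causal relationships from association-oriented training. |
+| [^13] | [Feature Alignment: Rethinking Efficient Active Learning via Proxy in the Context of Pre-trained Models](https://arxiv.org/abs/2403.01101) | Performs feature alignment via a proxy to address two problems: pre-computed features cannot distinguish the classes of labeled samples, and selecting samples through a proxy model sacrifices valuable pre-trained information. |
+| [^14] | [When Your AI Deceives You: Challenges with Partial Observability of Human Evaluators in Reward Learning](https://arxiv.org/abs/2402.17747) | When partial observability is taken into account, RLHF can lead to policies that deceptively inflate their performance or overjustify their behavior; the authors propose mathematical conditions to address these issues and caution against blindly applying RLHF in partially observable settings. |
+| [^15] | [Supervised machine learning for microbiomics: bridging the gap between current and best practices](https://arxiv.org/abs/2402.17621) | By analyzing a large number of journal articles, this study summarizes current practice in supervised machine learning for microbiomics, discusses the strengths and weaknesses of experimental design approaches, and offers guidance on avoiding common experimental design pitfalls. |
+| [^16] | [Interpreting Grokked Transformers in Complex Modular Arithmetic](https://arxiv.org/abs/2402.16726) | Using interpretable reverse engineering, this study observes how the internal circuits of Transformers are learned in complex modular arithmetic, finding that subtraction induces strong asymmetry in the Transformer, multiplication requires cosine-biased components, polynomials superpose the patterns of elementary arithmetic but are not clear in challenging cases, and grokking can occur easily even in higher-degree formulas with basic symmetric and alternating expressions. |
+| [^17] | [Word-Sequence Entropy: Towards Uncertainty Estimation in Free-Form Medical Question Answering Applications and Beyond](https://arxiv.org/abs/2402.14259) | This paper proposes Word-Sequence Entropy (WSE), a new method for quantifying answer uncertainty in free-form medical question answering, which outperforms the baseline methods. |
+| [^18] | [Conservative and Risk-Aware Offline Multi-Agent Reinforcement Learning for Digital Twins](https://arxiv.org/abs/2402.08421) | This study proposes an offline multi-agent reinforcement learning scheme for digital twins that integrates distributional reinforcement learning and conservative Q-learning to address the uncertainty of the environment and the epistemic uncertainty arising from limited data. |
+| [^19] | [Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT](https://arxiv.org/abs/2402.07440) | The paper introduces LoCoV1, a new benchmark for evaluating long-context retrieval performance, and proposes the M2-BERT retrieval encoder for long-context retrieval, addressing the challenges of how to evaluate performance, how to pretrain the language model, and how to fine-tune it. |
+| [^20] | [Gradient descent induces alignment between weights and the empirical NTK for deep non-linear networks](https://arxiv.org/abs/2402.05271) | Understanding the mechanisms by which neural networks extract statistics from input-label pairs is one of the most important open problems in supervised learning. Prior work has shown that, during training, the Gram matrices of the weights become proportional to the average gradient outer product of the model, a statement known as the Neural Feature Ansatz (NFA). This study explains the emergence of this correlation and finds that the NFA is equivalent to alignment between the left singular structure of the weight matrices and a significant component of the empirical neural tangent kernel associated with those weights. In early training, the speed at which the NFA develops can be predicted analytically. |
+| [^21] | [Interpretable Multi-Source Data Fusion Through Latent Variable Gaussian Process](https://arxiv.org/abs/2402.04146) | This paper proposes a multi-source data fusion framework based on latent variable Gaussian processes to address the problems that differences in quality and comprehensiveness among data sources pose for system optimization. |
+| [^22] | [GPT4Battery: An LLM-driven Framework for Adaptive State of Health Estimation of Raw Li-ion Batteries](https://arxiv.org/abs/2402.00068) | This paper proposes an LLM-based framework that adapts to different types of lithium-ion batteries for accurate state-of-health estimation. The work addresses the high time and resource cost of generating training data and generalizes well in practical applications. |
+| [^23] | [Large Language Models are Null-Shot Learners](https://arxiv.org/abs/2401.08273) | This paper proposes null-shot prompting, which exploits hallucination in large language models to guide them through tasks and improve task performance. Experiments show performance gains across datasets covering reading comprehension, arithmetic reasoning, and closed-book question answering. The results also suggest that different models hallucinate to different degrees. |
+| [^24] | [SupplyGraph: A Benchmark Dataset for Supply Chain Planning using Graph Neural Networks.](http://arxiv.org/abs/2401.15299) | SupplyGraph is a benchmark dataset for supply chain planning with graph neural networks. It contains real data from a leading fast-moving consumer goods company in Bangladesh for optimizing, predicting, and solving supply chain problems. Its temporal data, used as node features, supports sales prediction, production planning, and fault identification. |
+| [^25] | [Efficient generative adversarial networks using linear additive-attention Transformers.](http://arxiv.org/abs/2401.09596) | This work proposes LadaGAN, an efficient generative adversarial network built on a new Transformer block called Ladaformer, which uses a linear additive-attention mechanism to reduce computational complexity and address training instability. |
+| [^26] | [A Comprehensive Study of Knowledge Editing for Large Language Models.](http://arxiv.org/abs/2401.01286) | This study comprehensively examines knowledge editing for large language models, which aims to efficiently modify a model's behavior while preserving overall performance. |
+| [^27] | [Navigating Privacy and Copyright Challenges Across the Data Lifecycle of Generative AI.](http://arxiv.org/abs/2311.18252) | This study explores the multifaceted challenges of data privacy and copyright protection in generative AI and proposes an integrated approach combining technical innovation with ethical foresight to address them comprehensively. |
+| [^28] | [A Scalable Training Strategy for Blind Multi-Distribution Noise Removal.](http://arxiv.org/abs/2310.20064) | Proposes training denoising networks with an adaptive sampling/active learning strategy, addressing the poor performance of universal denoising networks across different noise distributions. |
+| [^29] | [Clover: Closed-Loop Verifiable Code Generation.](http://arxiv.org/abs/2310.17807) | Clover is a paradigm for closed-loop verifiable code generation that checks consistency among code, docstrings, and formal annotations to ensure the correctness of generated code. |
+| [^30] | [Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Models.](http://arxiv.org/abs/2310.17086) | Transformers learn higher-order optimization methods for in-context learning, implementing an algorithm similar to Iterative Newton's Method rather than gradient descent. |
+| [^31] | [A Survey of Graph Unlearning.](http://arxiv.org/abs/2310.02164) | Graph unlearning is an important advance for responsible AI, upholding the right to be forgotten by removing traces of sensitive data from trained models. This survey is the first systematic review of graph unlearning approaches, covering a range of methodologies with a detailed taxonomy and an up-to-date literature overview to help researchers new to the field. Its discussion of the connection to differential privacy deepens the understanding of privacy-preserving techniques in this context. |
+| [^32] | [A Model-Agnostic Graph Neural Network for Integrating Local and Global Information.](http://arxiv.org/abs/2309.13459) | MaGNet is a model-agnostic graph neural network framework that sequentially integrates information of different orders and provides meaningful, interpretable results by identifying influential compact graph structures. |
+| [^33] | [Optimal and Fair Encouragement Policy Evaluation and Learning.](http://arxiv.org/abs/2309.07176) | This study addresses optimal and fair evaluation and learning of encouragement policies in consequential domains. When humans do not adhere to treatment recommendations, optimal policy rules are merely suggestions. Heterogeneity in treatment and fairness considerations also change the decision-maker's trade-offs and decision rules. In social services, studies show a utilization gap: those who would benefit most are unable to access beneficial services. |
+| [^34] | [Reinforcement Learning for Financial Index Tracking.](http://arxiv.org/abs/2308.02820) | This paper proposes the first dynamic discrete-time infinite-horizon model for financial index tracking, which overcomes limitations of existing models: it computes transaction costs exactly, accounts for the trade-off between tracking error and transaction costs, and makes effective use of data over long time periods. The model is solved with deep reinforcement learning, which addresses issues caused by data limitations. |
+| [^35] | [Unmasking Parkinson's Disease with Smile: An AI-enabled Screening Framework.](http://arxiv.org/abs/2308.02588) | This study uses a dataset of micro-expression videos to develop an AI-based screening framework for Parkinson's disease; by analyzing features in smile videos it achieves 89.7% accuracy and 89.3% AUROC, with no bias detected across population subgroups. |
+| [^36] | [A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning.](http://arxiv.org/abs/2307.09218) | Forgetting is pervasive in deep learning and not limited to continual learning. Addressing it involves several challenges, including balancing retention of old-task knowledge against fast learning of new tasks, managing task interference with conflicting goals, and preventing privacy leakage. Forgetting is not always harmful; in some cases it can be beneficial and desirable, particularly in privacy-preserving scenarios. |
+| [^37] | [Towards Explainable Evaluation Metrics for Machine Translation.](http://arxiv.org/abs/2306.13041) | This study explores explainable evaluation metrics for machine translation, provides a comprehensive synthesis including the latest approaches, and contributes a vision of next-generation approaches. |
+| [^38] | [Stochastic Nonlinear Control via Finite-dimensional Spectral Dynamic Embedding.](http://arxiv.org/abs/2304.03907) | 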
本文提出了一种基于有限维特征逼近的非线性动态谱嵌入控制算法(SDEC)用于解决随机非线性系统的最优控制问题,并对其进行了理论分析和实验测试。 | +| [^39] | [Smooth Non-Stationary Bandits.](http://arxiv.org/abs/2301.12366) | 本文提出了一种非平稳两臂赌博机问题的策略,能够处理平滑变化,并证明了该策略在二次Lipschitz连续的情况下的遗憾为 $\tilde O(T^{3/5})$。 | +| [^40] | [Analysis of functional neural codes of deep learning models.](http://arxiv.org/abs/2205.10952) | 本研究使用自组织映射(SOM)分析了深度学习模型中与决策相关的内部编码,发现浅层将特征压缩到紧凑空间中,而深层将特征空间扩展,并指出压缩特征可能导致对敌对扰动的脆弱性。 | # 详细 -[^1]: BAdam:面向大型语言模型的内存高效全参数训练方法 +[^1]: SugarcaneNet2024: LASSO正则化的预训练模型的优化加权平均集成方法用于甘蔗病害分类 - BAdam: A Memory Efficient Full Parameter Training Method for Large Language Models + SugarcaneNet2024: An Optimized Weighted Average Ensemble Approach of LASSO Regularized Pre-trained Models for Sugarcane Disease Classification - [https://arxiv.org/abs/2404.02827](https://arxiv.org/abs/2404.02827) + [https://arxiv.org/abs/2403.18870](https://arxiv.org/abs/2403.18870) - BAdam提出了一种内存高效的全参数微调大型语言模型的方法,并在实验中展现出优越的收敛行为以及在性能评估中的优势。 + SugarcaneNet2024是通过优化加权平均集成LASSO正则化的预训练模型,在甘蔗病害分类中表现出色,具有快速准确的检测能力。 - 这项工作提出了BAdam,这是一种利用Adam作为内部求解器的块坐标优化框架的优化器。BAdam提供了一种内存高效的方法,用于对大型语言模型进行全参数微调,并且由于链式规则属性减少了反向过程的运行时间。在实验中,我们将BAdam应用于在Alpaca-GPT4数据集上使用单个RTX3090-24GB GPU进行指导微调的Llama 2-7B模型。结果表明,与LoRA和LOMO相比,BAdam展现出了优越的收敛行为。此外,我们通过使用MT-bench对指导微调模型进行下游性能评估,结果显示BAdam在适度超越LoRA的基础上更显著地优于LOMO。最后,我们将BAdam与Adam在中等任务上进行了比较,即在SuperGLUE基准上对RoBERTa-large进行微调。结果表明,BAdam能够缩小与Adam之间的性能差距。我们的代码 + 甘蔗作为世界糖业的关键作物,容易受多种病害侵害,这些病害对其产量和质量都有重大负面影响。为了有效管理和实施预防措施,必须及时准确地检测病害。本研究提出了一种名为SugarcaneNet2024的独特模型,通过叶片图像处理,能够优于先前方法自动快速检测甘蔗病害。我们提出的模型汇总了七个定制的、经过LASSO正则化的预训练模型的优化加权平均集成,特别是InceptionV3、InceptionResNetV2、DenseNet201、DenseNet169、Xception和ResNet152V2。最初,我们在这些预训练模型底部添加了三层更密集层,具有0.0001的LASSO正则化,三个30%的dropout层和三个启用renorm的批量归一化,以提高性能。 - arXiv:2404.02827v1 Announce Type: new Abstract: This work presents BAdam, an optimizer that leverages the block coordinate optimization framework with Adam as the inner solver. 
BAdam offers a memory efficient approach to the full parameter finetuning of large language models and reduces running time of the backward process thanks to the chain rule property. Experimentally, we apply BAdam to instruction-tune the Llama 2-7B model on the Alpaca-GPT4 dataset using a single RTX3090-24GB GPU. The results indicate that BAdam exhibits superior convergence behavior in comparison to LoRA and LOMO. Furthermore, our downstream performance evaluation of the instruction-tuned models using the MT-bench shows that BAdam modestly surpasses LoRA and more substantially outperforms LOMO. Finally, we compare BAdam with Adam on a medium-sized task, i.e., finetuning RoBERTa-large on the SuperGLUE benchmark. The results demonstrate that BAdam is capable of narrowing the performance gap with Adam. Our code is + arXiv:2403.18870v1 Announce Type: cross Abstract: Sugarcane, a key crop for the world's sugar industry, is prone to several diseases that have a substantial negative influence on both its yield and quality. To effectively manage and implement preventative initiatives, diseases must be detected promptly and accurately. In this study, we present a unique model called sugarcaneNet2024 that outperforms previous methods for automatically and quickly detecting sugarcane disease through leaf image processing. Our proposed model consolidates an optimized weighted average ensemble of seven customized and LASSO-regularized pre-trained models, particularly InceptionV3, InceptionResNetV2, DenseNet201, DenseNet169, Xception, and ResNet152V2. Initially, we added three more dense layers with 0.0001 LASSO regularization, three 30% dropout layers, and three batch normalizations with renorm enabled at the bottom of these pre-trained models to improve the performance. 
The accuracy of sugarcane leaf dise -[^2]: 使用神经网络对群体特性进行分类 +[^2]: 使用深度学习模型进行脑卒中分割:一项比较研究 - Swarm Characteristics Classification Using Neural Networks + Brain Stroke Segmentation Using Deep Learning Models: A Comparative Study - [https://arxiv.org/abs/2403.19572](https://arxiv.org/abs/2403.19572) + [https://arxiv.org/abs/2403.17177](https://arxiv.org/abs/2403.17177) - 本文研究了使用监督神经网络时间序列分类(NN TSC)预测军事背景下群体自主体的关键属性和战术,以及展示了NN TSC在快速推断攻击群体情报方面的有效性。 + 本研究通过比较深度学习模型在脑卒中分割上的表现,探讨了是否需要高级别设计来获得最佳结果。 - 理解群体自主体的特性对于国防和安全应用至关重要。本文介绍了使用监督神经网络时间序列分类(NN TSC)来预测军事环境中群体自主体的关键属性和战术的研究。具体地,NN TSC被应用于推断两个二进制属性 - 通信和比例导航 - 这两者结合定义了四种互斥的群体战术。我们发现文献中对于使用神经网络进行群体分类存在一定的空白,并展示了NN TSC在快速推断有关攻击群体情报以指导反制动作方面的有效性。通过模拟的群体对战,我们评估了NN TSC在观察窗口要求、噪声鲁棒性和对群体规模的可扩展性方面的性能。关键发现显示NN能够使用较短的观察窗口以97%的准确率预测群体行为。 + 脑卒中分割在脑卒中患者的诊断和治疗中发挥着关键作用,通过提供受影响脑区域的空间信息和受损程度。准确分割脑卒中病变是一项具有挑战性的任务,因为传统的手工技术耗时且容易出错。最近,先进的深度模型已被引入用于一般医学图像分割,展示出在特定数据集上评估时超越许多最先进网络的有前景结果。随着视觉Transformer的出现,已经基于它们引入了几种模型,而其他一些则旨在设计基于传统卷积层来提取像Transformer这样的长程依赖的更好模块。是否对所有分割案例都需要这样高级别的设计来实现最佳结果的问题尚未得到解答。在这项研究中,我们选择了四种类型的深度学习模型 - arXiv:2403.19572v1 Announce Type: new Abstract: Understanding the characteristics of swarming autonomous agents is critical for defense and security applications. This article presents a study on using supervised neural network time series classification (NN TSC) to predict key attributes and tactics of swarming autonomous agents for military contexts. Specifically, NN TSC is applied to infer two binary attributes - communication and proportional navigation - which combine to define four mutually exclusive swarm tactics. We identify a gap in literature on using NNs for swarm classification and demonstrate the effectiveness of NN TSC in rapidly deducing intelligence about attacking swarms to inform counter-maneuvers. Through simulated swarm-vs-swarm engagements, we evaluate NN TSC performance in terms of observation window requirements, noise robustness, and scalability to swarm size. 
Key findings show NNs can predict swarm behaviors with 97% accuracy using short observation windows of + arXiv:2403.17177v1 Announce Type: cross Abstract: Stroke segmentation plays a crucial role in the diagnosis and treatment of stroke patients by providing spatial information about affected brain regions and the extent of damage. Segmenting stroke lesions accurately is a challenging task, given that conventional manual techniques are time consuming and prone to errors. Recently, advanced deep models have been introduced for general medical image segmentation, demonstrating promising results that surpass many state of the art networks when evaluated on specific datasets. With the advent of the vision Transformers, several models have been introduced based on them, while others have aimed to design better modules based on traditional convolutional layers to extract long-range dependencies like Transformers. The question of whether such high-level designs are necessary for all segmentation cases to achieve the best results remains unanswered. 
In this study, we selected four types of deep -[^3]: 深度学习及其最新应用综述 +[^3]: AdaTrans:针对高维回归的特征自适应与样本自适应迁移学习 - A Survey on Deep Learning and State-of-the-arts Applications + AdaTrans: Feature-wise and Sample-wise Adaptive Transfer Learning for High-dimensional Regression - [https://arxiv.org/abs/2403.17561](https://arxiv.org/abs/2403.17561) + [https://arxiv.org/abs/2403.13565](https://arxiv.org/abs/2403.13565) - 深度学习是解决复杂问题的强大工具,本研究旨在全面审视深度学习模型及其应用的最新发展 + 提出了一种针对高维回归的自适应迁移学习方法,可以根据可迁移结构自适应检测和聚合特征和样本的可迁移结构。 - 深度学习, 是人工智能的一个分支,是一种利用多层互连单元(神经元)从原始输入数据中直接学习复杂模式和表示的计算模型。受到这种学习能力的赋能,深度学习已成为解决复杂问题的强大工具,是许多突破性技术和创新的核心驱动力。构建深度学习模型是一项具有挑战性的任务,因为算法的复杂性和现实问题的动态性。有几项研究回顾了深度学习的概念和应用。然而,这些研究大多集中于深度学习模型类型和卷积神经网络架构,对深度学习模型及其在不同领域解决复杂问题的最新发展的覆盖面有限。因此,受到这些限制的启发,本研究旨在全面审视th + 我们考虑高维背景下的迁移学习问题,在该问题中,特征维度大于样本大小。为了学习可迁移的信息,该信息可能在特征或源样本之间变化,我们提出一种自适应迁移学习方法,可以检测和聚合逐特征 (F-AdaTrans)或逐样本 (S-AdaTrans)可迁移结构。我们通过采用一种新颖的融合惩罚方法,结合权重,可以根据可迁移结构进行调整。为了选择权重,我们提出了一个在理论上建立,数据驱动的过程,使得 F-AdaTrans 能够选择性地将可迁移的信号与目标融合在一起,同时滤除非可迁移的信号,S-AdaTrans则可以获得每个源样本传递的信息的最佳组合。我们建立了非渐近速率,可以在特殊情况下恢复现有的近极小极大最优速率。效果证明... - arXiv:2403.17561v1 Announce Type: new Abstract: Deep learning, a branch of artificial intelligence, is a computational model that uses multiple layers of interconnected units (neurons) to learn intricate patterns and representations directly from raw input data. Empowered by this learning capability, it has become a powerful tool for solving complex problems and is the core driver of many groundbreaking technologies and innovations. Building a deep learning model is a challenging task due to the algorithm`s complexity and the dynamic nature of real-world problems. Several studies have reviewed deep learning concepts and applications. However, the studies mostly focused on the types of deep learning models and convolutional neural network architectures, offering limited coverage of the state-of-the-art of deep learning models and their applications in solving complex problems across different domains. 
Therefore, motivated by the limitations, this study aims to comprehensively review th + arXiv:2403.13565v1 Announce Type: cross Abstract: We consider the transfer learning problem in the high dimensional setting, where the feature dimension is larger than the sample size. To learn transferable information, which may vary across features or the source samples, we propose an adaptive transfer learning method that can detect and aggregate the feature-wise (F-AdaTrans) or sample-wise (S-AdaTrans) transferable structures. We achieve this by employing a novel fused-penalty, coupled with weights that can adapt according to the transferable structure. To choose the weight, we propose a theoretically informed, data-driven procedure, enabling F-AdaTrans to selectively fuse the transferable signals with the target while filtering out non-transferable signals, and S-AdaTrans to obtain the optimal combination of information transferred from each source sample. The non-asymptotic rates are established, which recover existing near-minimax optimal rates in special cases. 
The effectivene -[^4]: ThermoHands:一种用于从主观视角热图中估计3D手部姿势的基准 +[^4]: 基于学习的双边队列定价和匹配 - ThermoHands: A Benchmark for 3D Hand Pose Estimation from Egocentric Thermal Image + Learning-Based Pricing and Matching for Two-Sided Queues - [https://arxiv.org/abs/2403.09871](https://arxiv.org/abs/2403.09871) + [https://arxiv.org/abs/2403.11093](https://arxiv.org/abs/2403.11093) - ThermoHands提出了一个新的基准ThermoHands,旨在解决热图中主观视角3D手部姿势估计的挑战,介绍了一个具有双transformer模块的定制基线方法TheFormer,表明热成像在恶劣条件下实现稳健的3D手部姿势估计的有效性。 + 设计定价和匹配算法以最大化平台利润,在未知需求和供应函数下,保持顾客和服务器队列长度低于阈值 - 在这项工作中,我们提出了ThermoHands,这是一个针对基于热图的主观视角3D手部姿势估计的新基准,旨在克服诸如光照变化和遮挡(例如手部穿戴物)等挑战。该基准包括来自28名主体进行手-物体和手-虚拟交互的多样数据集,经过自动化过程准确标注了3D手部姿势。我们引入了一个定制的基线方法TheFormer,利用双transformer模块在热图中实现有效的主观视角3D手部姿势估计。我们的实验结果突显了TheFormer的领先性能,并确认了热成像在实现恶劣条件下稳健的3D手部姿势估计方面的有效性。 + 我们考虑一个具有多种类型顾客和服务器的动态系统。每种等待的顾客或服务器加入一个单独的队列,形成一个具有顾客队列和服务器队列的二部图。平台可以匹配服务器和顾客,如果它们的类型是兼容的。匹配的对将离开系统。平台将根据顾客的类型收取一个价格,当它们到达时,并根据其类型向服务器支付一个价格。每个队列的到达率取决于某些未知的需求或供应函数按价格确定。我们的目标是设计定价和匹配算法,以最大化平台在未知需求和供应函数下的利润,同时保持顾客和服务器的队列长度低于预定阈值。这个系统可以用来建模像乘车共享市场这样的双边市场,有乘客和司机。挑战在于 - arXiv:2403.09871v1 Announce Type: cross Abstract: In this work, we present ThermoHands, a new benchmark for thermal image-based egocentric 3D hand pose estimation, aimed at overcoming challenges like varying lighting and obstructions (e.g., handwear). The benchmark includes a diverse dataset from 28 subjects performing hand-object and hand-virtual interactions, accurately annotated with 3D hand poses through an automated process. We introduce a bespoken baseline method, TheFormer, utilizing dual transformer modules for effective egocentric 3D hand pose estimation in thermal imagery. Our experimental results highlight TheFormer's leading performance and affirm thermal imaging's effectiveness in enabling robust 3D hand pose estimation in adverse conditions. + arXiv:2403.11093v1 Announce Type: cross Abstract: We consider a dynamic system with multiple types of customers and servers. 
Each type of waiting customer or server joins a separate queue, forming a bipartite graph with customer-side queues and server-side queues. The platform can match the servers and customers if their types are compatible. The matched pairs then leave the system. The platform will charge a customer a price according to their type when they arrive and will pay a server a price according to their type. The arrival rate of each queue is determined by the price according to some unknown demand or supply functions. Our goal is to design pricing and matching algorithms to maximize the profit of the platform with unknown demand and supply functions, while keeping queue lengths of both customers and servers below a predetermined threshold. This system can be used to model two-sided markets such as ride-sharing markets with passengers and drivers. The difficulties of the pr -[^5]: DEEP-IoT: 下行增强型高效能物联网 +[^5]: 可解释的机器学习用于生存分析 - DEEP-IoT: Downlink-Enhanced Efficient-Power Internet of Things + Interpretable Machine Learning for Survival Analysis - [https://arxiv.org/abs/2403.00321](https://arxiv.org/abs/2403.00321) + [https://arxiv.org/abs/2403.10250](https://arxiv.org/abs/2403.10250) - DEEP-IoT通过“更多监听,更少传输”的策略,挑战和转变了传统的物联网通信模型,大幅降低能耗并提高设备寿命。 + 可解释的机器学习在生存分析中的应用促进了透明度和公平性,揭示了模型的潜在偏见和限制,并提供了更符合数学原理的特征影响和风险因素预测方法。 - 本文介绍了DEEP-IoT,这是一种具有革命意义的通信范例,旨在重新定义物联网设备之间的通信方式。通过开创性的“更多监听,更少传输”的策略,DEEP-IoT挑战和转变了传统的发送方(物联网设备)为中心的通信模型,将接收方(接入点)作为关键角色,从而降低能耗并延长设备寿命。我们不仅概念化了DEEP-IoT,还通过在窄带系统中集成深度学习增强的反馈信道编码来实现它。模拟结果显示,IoT单元的运行寿命显著提高,比使用Turbo和Polar编码的传统系统提高了最多52.71%。这一进展标志着一种变革。 + 随着黑盒机器学习模型的传播和快速进步,可解释的机器学习(IML)领域或可解释的人工智能(XAI)在过去十年中变得越来越重要。 这在生存分析领域尤为重要,其中采用IML技术促进了透明度、问责制和公平性,特别是在临床决策过程、有针对性疗法的开发、干预或其他医学或与医疗保健相关的环境中。 具体来说,可解释性可以揭示生存模型的潜在偏见和局限性,并提供更符合数学原理的方法来理解哪些特征对预测有影响或构成风险因素。 然而,缺乏即时可用的IML方法可能已经阻碍了医学从业者和公共卫生政策制定者充分利用机器学习的潜力。 - arXiv:2403.00321v1 Announce Type: cross Abstract: At the heart of the Internet of Things (IoT) -- a domain witnessing explosive growth -- the imperative for energy 
efficiency and the extension of device lifespans has never been more pressing. This paper presents DEEP-IoT, a revolutionary communication paradigm poised to redefine how IoT devices communicate. Through a pioneering "listen more, transmit less" strategy, DEEP-IoT challenges and transforms the traditional transmitter (IoT devices)-centric communication model to one where the receiver (the access point) play a pivotal role, thereby cutting down energy use and boosting device longevity. We not only conceptualize DEEP-IoT but also actualize it by integrating deep learning-enhanced feedback channel codes within a narrow-band system. Simulation results show a significant enhancement in the operational lifespan of IoT cells -- surpassing traditional systems using Turbo and Polar codes by up to 52.71%. This leap signifies a paradi + arXiv:2403.10250v1 Announce Type: cross Abstract: With the spread and rapid advancement of black box machine learning models, the field of interpretable machine learning (IML) or explainable artificial intelligence (XAI) has become increasingly important over the last decade. This is particularly relevant for survival analysis, where the adoption of IML techniques promotes transparency, accountability and fairness in sensitive areas, such as clinical decision making processes, the development of targeted therapies, interventions or in other medical or healthcare related contexts. More specifically, explainability can uncover a survival model's potential biases and limitations and provide more mathematically sound ways to understand how and which features are influential for prediction or constitute risk factors. 
However, the lack of readily available IML methods may have deterred medical practitioners and policy makers in public health from leveraging the full potential of machine lea -[^6]: 用于随机组合半臂老虎机的协方差自适应最小二乘算法 +[^6]: 通过监控早期训练表示来实现鲁棒的子图学习 - Covariance-Adaptive Least-Squares Algorithm for Stochastic Combinatorial Semi-Bandits + Robust Subgraph Learning by Monitoring Early Training Representations - [https://arxiv.org/abs/2402.15171](https://arxiv.org/abs/2402.15171) + [https://arxiv.org/abs/2403.09901](https://arxiv.org/abs/2403.09901) - 提出了一种协方差自适应的最小二乘算法,利用在线估计协方差结构,相对于基于代理方差的算法获得改进的遗憾上界,特别在协方差系数全为非负时,能有效地利用半臂反馈,并在各种参数设置下表现优异。 + 本文引入了一种名为SHERD的新技术,通过监控图神经网络(GNNs)早期训练表示中的信息,利用标准距离度量检测易受攻击节点,从而在图输入中实现性能和对抗鲁棒性。 - 我们解决了随机组合半臂老虎机问题,其中玩家可以从包含d个基本项的P个子集中进行选择。大多数现有算法(如CUCB、ESCB、OLS-UCB)需要对奖励分布有先验知识,比如子高斯代理-方差的上界,这很难准确估计。在这项工作中,我们设计了OLS-UCB的方差自适应版本,依赖于协方差结构的在线估计。在实际设置中,估计协方差矩阵的系数要容易得多,并且相对于基于代理方差的算法,导致改进的遗憾上界。当协方差系数全为非负时,我们展示了我们的方法有效地利用了半臂反馈,并且可以明显优于老虎机反馈方法,在指数级别P≫d以及P≤d的情况下,这一点并不来自大多数现有分析。 + 图神经网络(GNNs)因在图学习和节点分类任务中表现出色而引起了广泛关注。然而,它们对对抗性攻击的脆弱性,特别是通过易受攻击的节点,给决策制定带来了挑战。鲁棒的图摘要需求在于对抗性挑战会导致攻击在整个图中传播。在本文中,我们通过引入新颖的技术SHERD (通过早期训练表示距离进行子图学习)来解决图输入中的性能和对抗鲁棒性。SHERD利用部分训练的图卷积网络(GCN)的层信息,通过标准距离度量来检测对抗攻击期间易受攻击的节点。该方法识别出"易受攻击的(坏)"节点并移除这些节点,形成一个鲁棒的子图,同时保持节点分类性能。 - arXiv:2402.15171v1 Announce Type: new Abstract: We address the problem of stochastic combinatorial semi-bandits, where a player can select from P subsets of a set containing d base items. Most existing algorithms (e.g. CUCB, ESCB, OLS-UCB) require prior knowledge on the reward distribution, like an upper bound on a sub-Gaussian proxy-variance, which is hard to estimate tightly. In this work, we design a variance-adaptive version of OLS-UCB, relying on an online estimation of the covariance structure. Estimating the coefficients of a covariance matrix is much more manageable in practical settings and results in improved regret upper bounds compared to proxy variance-based algorithms. 
When covariance coefficients are all non-negative, we show that our approach efficiently leverages the semi-bandit feedback and provably outperforms bandit feedback approaches, not only in exponential regimes where P $\gg$ d but also when P $\le$ d, which is not straightforward from most existing analyses. + arXiv:2403.09901v1 Announce Type: new Abstract: Graph neural networks (GNNs) have attracted significant attention for their outstanding performance in graph learning and node classification tasks. However, their vulnerability to adversarial attacks, particularly through susceptible nodes, poses a challenge in decision-making. The need for robust graph summarization is evident in adversarial challenges resulting from the propagation of attacks throughout the entire graph. In this paper, we address both performance and adversarial robustness in graph input by introducing the novel technique SHERD (Subgraph Learning Hale through Early Training Representation Distances). SHERD leverages information from layers of a partially trained graph convolutional network (GCN) to detect susceptible nodes during adversarial attacks using standard distance metrics. 
The method identifies "vulnerable (bad)" nodes and removes such nodes to form a robust subgraph while maintaining node classification perf -[^7]: CLCE:一种优化学习融合的改进交叉熵和对比学习方法 +[^7]: 探索对抗界限:通过对抗超体积量化鲁棒性 - CLCE: An Approach to Refining Cross-Entropy and Contrastive Learning for Optimized Learning Fusion + Exploring the Adversarial Frontier: Quantifying Robustness via Adversarial Hypervolume - [https://arxiv.org/abs/2402.14551](https://arxiv.org/abs/2402.14551) + [https://arxiv.org/abs/2403.05100](https://arxiv.org/abs/2403.05100) - CLCE方法结合了标签感知对比学习与交叉熵损失,通过协同利用难例挖掘提高了性能表现 + 提出新指标对抗超体积来全面评估深度学习模型在多种扰动强度下的鲁棒性,并采用新型训练算法来提高对抗鲁棒性。 - 最先进的预训练图像模型主要采用两阶段方法:在大规模数据集上进行初始无监督预训练,然后使用交叉熵损失(CE)进行特定任务的微调。然而,已经证明CE可能会损害模型的泛化性和稳定性。为了解决这些问题,我们引入了一种名为CLCE的新方法,该方法将标签感知对比学习与CE相结合。我们的方法不仅保持了两种损失函数的优势,而且以协同方式利用难例挖掘来增强性能。 + 在深度学习模型面临日益严重的对抗攻击威胁,特别是在安全关键领域,强调了对鲁棒深度学习系统的需求。传统的鲁棒性评估依赖于对抗准确性,该指标衡量模型在特定扰动强度下的性能。然而,这一单一指标并不能完全概括模型对不同程度扰动的整体韧性。为了填补这一空白,我们提出了一种新的指标,称为对抗超体积,从多目标优化的角度综合评估了深度学习模型在一系列扰动强度下的鲁棒性。该指标允许深入比较防御机制,并承认了较弱的防御策略所带来的鲁棒性改进。此外,我们采用了一种提高对抗鲁棒性均匀性的新型训练算法。 - arXiv:2402.14551v1 Announce Type: cross Abstract: State-of-the-art pre-trained image models predominantly adopt a two-stage approach: initial unsupervised pre-training on large-scale datasets followed by task-specific fine-tuning using Cross-Entropy loss~(CE). However, it has been demonstrated that CE can compromise model generalization and stability. While recent works employing contrastive learning address some of these limitations by enhancing the quality of embeddings and producing better decision boundaries, they often overlook the importance of hard negative mining and rely on resource intensive and slow training using large sample batches. To counter these issues, we introduce a novel approach named CLCE, which integrates Label-Aware Contrastive Learning with CE. Our approach not only maintains the strengths of both loss functions but also leverages hard negative mining in a synergistic way to enhance performance. 
Experimental results demonstrate that CLCE significantly outperf + arXiv:2403.05100v1 Announce Type: cross Abstract: The escalating threat of adversarial attacks on deep learning models, particularly in security-critical fields, has underscored the need for robust deep learning systems. Conventional robustness evaluations have relied on adversarial accuracy, which measures a model's performance under a specific perturbation intensity. However, this singular metric does not fully encapsulate the overall resilience of a model against varying degrees of perturbation. To address this gap, we propose a new metric termed adversarial hypervolume, assessing the robustness of deep learning models comprehensively over a range of perturbation intensities from a multi-objective optimization standpoint. This metric allows for an in-depth comparison of defense mechanisms and recognizes the trivial improvements in robustness afforded by less potent defensive strategies. Additionally, we adopt a novel training algorithm that enhances adversarial robustness uniformly -[^8]: ConSmax: 具有可学习参数的硬件友好型Softmax替代方案 +[^8]: 基于遗传模拟的差分进化方法用于半监督聚类 - ConSmax: Hardware-Friendly Alternative Softmax with Learnable Parameters + Memetic Differential Evolution Methods for Semi-Supervised Clustering - [https://arxiv.org/abs/2402.10930](https://arxiv.org/abs/2402.10930) + [https://arxiv.org/abs/2403.04322](https://arxiv.org/abs/2403.04322) - ConSmax是一种硬件友好型Softmax替代方案,通过引入可学习参数,在不影响性能的情况下实现了对原Softmax关键任务的高效处理。 + 本文提出了一种基于差分进化范式的新颖遗传模拟策略,用于解决半监督聚类问题,是第一次在这个领域尝试定义这样的方法。 - 自注意机制将基于transformer的大型语言模型(LLM)与卷积和循环神经网络区分开来。尽管性能有所提升,但由于自注意中广泛使用Softmax,在硅上实现实时LLM推断仍具挑战性。为了解决这一挑战,我们提出了Constant Softmax(ConSmax),这是一种高效的Softmax替代方案,采用可微的规范化参数来消除Softmax中的最大搜索和分母求和,实现了大规模并行化。 + 
在本文中,我们处理半监督最小平方和聚类(MSSC)问题,其中背景知识以实例级约束的形式给定。我们特别考虑“必连接”和“非连接”约束,每个约束指示两个数据集点是否应该关联到同一个或不同的簇中。这些约束的存在使得问题至少与其无监督版本一样困难:不再每个点都关联到其最近的簇中心,因此需要在关键操作(如分配步骤)中进行一些修改。在这种情况下,我们提出了一种基于差分进化范式的新颖遗传模拟策略,直接扩展了最近在无监督聚类文献中提出的最新框架。据我们所知,我们的贡献代表了第一次尝试定义一个旨在生成一个 - arXiv:2402.10930v1 Announce Type: cross Abstract: The self-attention mechanism sets transformer-based large language model (LLM) apart from the convolutional and recurrent neural networks. Despite the performance improvement, achieving real-time LLM inference on silicon is challenging due to the extensively used Softmax in self-attention. Apart from the non-linearity, the low arithmetic intensity greatly reduces the processing parallelism, which becomes the bottleneck especially when dealing with a longer context. To address this challenge, we propose Constant Softmax (ConSmax), a software-hardware co-design as an efficient Softmax alternative. ConSmax employs differentiable normalization parameters to remove the maximum searching and denominator summation in Softmax. It allows for massive parallelization while performing the critical tasks of Softmax. In addition, a scalable ConSmax hardware utilizing a bitwidth-split look-up table (LUT) can produce lossless non-linear operation and + arXiv:2403.04322v1 Announce Type: cross Abstract: In this paper, we deal with semi-supervised Minimum Sum-of-Squares Clustering (MSSC) problems where background knowledge is given in the form of instance-level constraints. In particular, we take into account "must-link" and "cannot-link" constraints, each of which indicates if two dataset points should be associated to the same or to a different cluster. The presence of such constraints makes the problem at least as hard as its unsupervised version: it is no more true that each point is associated to its nearest cluster center, thus requiring some modifications in crucial operations, such as the assignment step. 
In this scenario, we propose a novel memetic strategy based on the Differential Evolution paradigm, directly extending a state-of-the-art framework recently proposed in the unsupervised clustering literature. As far as we know, our contribution represents the first attempt to define a memetic methodology designed to generate a -[^9]: 多视角符号回归 +[^9]: ARNN: 用于识别癫痫发作的多通道脑电图信号的注意力循环神经网络 - Multi-View Symbolic Regression + ARNN: Attentive Recurrent Neural Network for Multi-channel EEG Signals to Identify Epileptic Seizures - [https://arxiv.org/abs/2402.04298](https://arxiv.org/abs/2402.04298) + [https://arxiv.org/abs/2403.03276](https://arxiv.org/abs/2403.03276) - 多视角符号回归(MvSR)是一种同时考虑多个数据集的符号回归方法,能够找到一个参数化解来准确拟合所有数据集,解决了传统方法无法处理不同实验设置的问题。 + ARNN提出了一种注意力循环神经网络,用于处理多通道脑电图信号,具有线性复杂度和并行计算,结合注意力和LSTM gate的优势,并避免了它们的缺点。 - 符号回归(SR)搜索表示解释变量和响应变量之间关系的分析表达式。目前的SR方法假设从单个实验中提取的单个数据集。然而,研究人员经常面临来自不同设置的多个实验结果集。传统的SR方法可能无法找到潜在的表达式,因为每个实验的参数可能不同。在这项工作中,我们提出了多视角符号回归(MvSR),它同时考虑多个数据集,模拟实验环境,并输出一个通用的参数化解。这种方法将评估的表达式适应每个独立数据集,并同时返回能够准确拟合所有数据集的参数函数族f(x; \theta)。我们使用从已知表达式生成的数据以及来自实际世界的数据来展示MvSR的有效性。 + 我们提出了一种注意力循环神经网络(ARNN),其沿着序列循环应用注意力层,并且具有与序列长度相关的线性复杂度。该模型在多通道脑电图信号上运行,而不是单通道信号,并利用并行计算。在该模型中,注意力层是一种计算单元,可以有效地应用自注意力机制和交叉注意力机制来计算一组广泛数量的状态向量和输入信号的递归函数。我们的架构在某种程度上受到了注意力层和长短期记忆(LSTM)单元的启发,并使用长短风格门,但通过多个阶段将这种典型单元扩展到多通道脑电图信号的并行化。它继承了注意力层和LSTM门的优势,同时避免了它们各自的缺点。我们通过对异质实验进行了广泛的模型有效性评估。 - Symbolic regression (SR) searches for analytical expressions representing the relationship between a set of explanatory and response variables. Current SR methods assume a single dataset extracted from a single experiment. Nevertheless, frequently, the researcher is confronted with multiple sets of results obtained from experiments conducted with different setups. Traditional SR methods may fail to find the underlying expression since the parameters of each experiment can be different. 
In this work we present Multi-View Symbolic Regression (MvSR), which takes into account multiple datasets simultaneously, mimicking experimental environments, and outputs a general parametric solution. This approach fits the evaluated expression to each independent dataset and returns a parametric family of functions f(x; \theta) simultaneously capable of accurately fitting all datasets. We demonstrate the effectiveness of MvSR using data generated from known expressions, as well as real-world data from + arXiv:2403.03276v1 Announce Type: cross Abstract: We proposed an Attentive Recurrent Neural Network (ARNN), which recurrently applies attention layers along a sequence and has linear complexity with respect to the sequence length. The proposed model operates on multi-channel EEG signals rather than single channel signals and leverages parallel computation. In this cell, the attention layer is a computational unit that efficiently applies self-attention and cross-attention mechanisms to compute a recurrent function over a wide number of state vectors and input signals. Our architecture is inspired in part by the attention layer and long short-term memory (LSTM) cells, and it uses long-short style gates, but it scales this typical cell up by several orders to parallelize for multi-channel EEG signals. It inherits the advantages of attention layers and LSTM gate while avoiding their respective drawbacks. 
We evaluated the model effectiveness through extensive experiments with heterogeneou + +[^10]: 具有Polyak动量的非凸随机复合优化 + + Non-Convex Stochastic Composite Optimization with Polyak Momentum + + [https://arxiv.org/abs/2403.02967](https://arxiv.org/abs/2403.02967) + + 本文研究了具有Polyak动量的随机近端梯度方法,在非凸复合优化问题中实现了最佳收敛速度,无论批量大小如何。 + + + + 随机近端梯度法是广泛使用的随机梯度下降(SGD)方法的一个强大泛化,在机器学习中已经被广泛应用。然而,众所周知,当随机噪声显著时(即仅使用小型或有界批量大小时),该方法在非凸环境中无法收敛。本文关注具有Polyak动量的随机近端梯度方法。我们证明了该方法对于非凸复合优化问题实现了最佳收敛速度,且与批量大小无关。此外,我们对Polyak动量在复合优化环境中的方差减少效应进行了严格分析,并且我们证明了当近端步骤只能通过近似解来求解时,该方法也会收敛。最后,我们提供了数值实验来验证我们的理论结果。 + + arXiv:2403.02967v1 Announce Type: cross Abstract: The stochastic proximal gradient method is a powerful generalization of the widely used stochastic gradient descent (SGD) method and has found numerous applications in Machine Learning. However, it is notoriously known that this method fails to converge in non-convex settings where the stochastic noise is significant (i.e. when only small or bounded batch sizes are used). In this paper, we focus on the stochastic proximal gradient method with Polyak momentum. We prove this method attains an optimal convergence rate for non-convex composite optimization problems, regardless of batch size. Additionally, we rigorously analyze the variance reduction effect of the Polyak momentum in the composite optimization setting and we show the method also converges when the proximal step can only be solved inexactly. Finally, we provide numerical experiments to validate our theoretical results. 
+ +[^11]: 具有多个协变量转移和不平衡的图像数据集聚合 + + Pooling Image Datasets With Multiple Covariate Shift and Imbalance + + [https://arxiv.org/abs/2403.02598](https://arxiv.org/abs/2403.02598) + + 本文从范畴论的角度提供了一个简单而有效的解决方案,完全避免了复杂的多阶段训练流程。 + + + + 许多学科中常见小样本大小,这需要跨多个机构汇总大致相似的数据集来研究图像与疾病结果之间的弱但相关关联。这些数据通常体现出协变量(即次要的非成像数据)的转移/不平衡。在标准统计分析中控制这些无用变量是常见的,但这些思想并不直接适用于参数过多的模型。因此,最近的工作表明,从不变表示学习中提供了一个有意义的起点,但目前的方法库仅限于一次考虑几个协变量的转移/不平衡。本文展示了如何从范畴论的角度看待这一问题,提供了一个简单而有效的解决方案,完全避免了原本需要复杂的多阶段训练流程。我们展示了该方法的效果。 + + arXiv:2403.02598v1 Announce Type: new Abstract: Small sample sizes are common in many disciplines, which necessitates pooling roughly similar datasets across multiple institutions to study weak but relevant associations between images and disease outcomes. Such data often manifest shift/imbalance in covariates (i.e., secondary non-imaging data). Controlling for such nuisance variables is common within standard statistical analysis, but the ideas do not directly apply to overparameterized models. Consequently, recent work has shown how strategies from invariant representation learning provide a meaningful starting point, but the current repertoire of methods is limited to accounting for shifts/imbalances in just a couple of covariates at a time. In this paper, we show how viewing this problem from the perspective of Category theory provides a simple and effective solution that completely avoids elaborate multi-stage training pipelines that would otherwise be needed. 
We show the effect + +[^12]: 异质性对不变性和因果关系的隐性偏差 + + The Implicit Bias of Heterogeneity towards Invariance and Causality + + [https://arxiv.org/abs/2403.01420](https://arxiv.org/abs/2403.01420) + + 异质性对于回归任务中出现因果性的贡献解释了为何大型语言模型能够从关联性训练中揭示因果关联。 + + + + 从经验上观察到,使用来自互联网的大量语料库训练的大型语言模型(LLM),使用一种变体回归损失,可以在一定程度上揭示因果关联。这与传统智慧“关联不是因果”以及传统因果推断范式相反,传统因果推断范式认为先前的因果知识应谨慎地纳入到方法设计中。令人困惑的是,为何在追求关联的回归任务中能够从更高层次的理解中出现因果性。本文声称从面向关联的训练中出现因果性可以归因于源数据的异质性、训练算法的随机性和学习模型的超参数化的耦合效应。我们使用一个简单但有见地的模型来阐释这样的直觉,该模型使用回归损失学习不变性,一种准因果关系。 + + arXiv:2403.01420v1 Announce Type: new Abstract: It is observed empirically that the large language models (LLM), trained with a variant of regression loss using numerous corpus from the Internet, can unveil causal associations to some extent. This is contrary to the traditional wisdom that ``association is not causation'' and the paradigm of traditional causal inference in which prior causal knowledge should be carefully incorporated into the design of methods. It is a mystery why causality, in a higher layer of understanding, can emerge from the regression task that pursues associations. In this paper, we claim the emergence of causality from association-oriented training can be attributed to the coupling effects from the heterogeneity of the source data, stochasticity of training algorithms, and over-parameterization of the learning models. We illustrate such an intuition using a simple but insightful model that learns invariance, a quasi-causality, using regression loss. 
To be spec + +[^13]: 特征对齐:在预训练模型背景下通过代理思考高效主动学习 + + Feature Alignment: Rethinking Efficient Active Learning via Proxy in the Context of Pre-trained Models + + [https://arxiv.org/abs/2403.01101](https://arxiv.org/abs/2403.01101) + + 通过代理进行特征对齐,以解决预先计算特征无法区分标记样本类别和避免通过代理模型选择样本时牺牲宝贵预训练信息的问题。 + + + + 使用主动学习对预训练模型进行微调有望降低注释成本。然而,这种组合引入了显著的计算成本,尤其是随着预训练模型规模的增长。最近的研究提出了基于代理的主动学习,它预先计算特征以减少计算成本。然而,这种方法通常会在主动学习性能上造成重大损失,甚至可能超过计算成本节约。 + + arXiv:2403.01101v1 Announce Type: cross Abstract: Fine-tuning the pre-trained model with active learning holds promise for reducing annotation costs. However, this combination introduces significant computational costs, particularly with the growing scale of pre-trained models. Recent research has proposed proxy-based active learning, which pre-computes features to reduce computational costs. Yet, this approach often incurs a significant loss in active learning performance, which may even outweigh the computational cost savings. In this paper, we argue the performance drop stems not only from pre-computed features' inability to distinguish between categories of labeled samples, resulting in the selection of redundant samples but also from the tendency to compromise valuable pre-trained information when fine-tuning with samples selected through the proxy model. 
To address this issue, we propose a novel method called aligned selection via proxy to update pre-computed features while sele + +[^14]: 当你的AI欺骗你:在奖励学习中人类评估者部分可观测性的挑战 + + When Your AI Deceives You: Challenges with Partial Observability of Human Evaluators in Reward Learning + + [https://arxiv.org/abs/2402.17747](https://arxiv.org/abs/2402.17747) + + RLHF在考虑部分观察性时可能导致策略欺骗性地夸大性能或过度辩护行为,我们提出了数学条件来解决这些问题,并警告不要盲目应用RLHF在部分可观测情况下。 + + + + 强化学习从人类反馈(RLHF)的过去分析假设人类完全观察到环境。当人类反馈仅基于部分观察时会发生什么?我们对两种失败情况进行了正式定义:欺骗和过度辩护。通过将人类建模为对轨迹信念的Boltzmann-理性,我们证明了RLHF保证会导致策略欺骗性地夸大其性能、为了留下印象而过度辩护或者两者兼而有之的条件。为了帮助解决这些问题,我们数学地刻画了环境部分可观测性如何转化为(缺乏)学到的回报函数中的模糊性。在某些情况下,考虑环境部分可观测性使得在理论上可能恢复回报函数和最优策略,而在其他情况下,存在不可减少的模糊性。我们警告不要盲目应用RLHF在部分可观测情况下。 + + arXiv:2402.17747v1 Announce Type: cross Abstract: Past analyses of reinforcement learning from human feedback (RLHF) assume that the human fully observes the environment. What happens when human feedback is based only on partial observations? We formally define two failure cases: deception and overjustification. Modeling the human as Boltzmann-rational w.r.t. a belief over trajectories, we prove conditions under which RLHF is guaranteed to result in policies that deceptively inflate their performance, overjustify their behavior to make an impression, or both. To help address these issues, we mathematically characterize how partial observability of the environment translates into (lack of) ambiguity in the learned return function. In some cases, accounting for partial observability makes it theoretically possible to recover the return function and thus the optimal policy, while in other cases, there is irreducible ambiguity. 
We caution against blindly applying RLHF in partially observa + +[^15]: 用于微生物组学的监督机器学习:弥合当前和最佳实践之间的差距 + + Supervised machine learning for microbiomics: bridging the gap between current and best practices + + [https://arxiv.org/abs/2402.17621](https://arxiv.org/abs/2402.17621) + + 该研究通过分析大量期刊文章,总结了监督机器学习在微生物组学中的现有实践,探讨了实验设计方法的优缺点,并提出了如何避免常见实验设计缺陷的指导。 + + + + 机器学习(ML)将加速临床微生物组学创新,如疾病诊断和预后。这将需要高质量、可重现、可解释的工作流程,其预测能力达到或超过监管机构对临床工具设定的高门槛。我们通过深入分析2021-2022年发表的100篇同行评议的期刊文章,捕捉了当前将监督ML应用于微生物组学数据的实践的一个快照。我们采用数据驱动方法,引导讨论各种实验设计方法的优点,包括关键考虑因素,如如何减轻小数据集大小的影响同时避免数据泄漏。我们进一步提供关于如何避免可能损害模型性能、可信度和可重复性的常见实验设计缺陷的指南。讨论附有一个互动在线教程。 + + arXiv:2402.17621v1 Announce Type: cross Abstract: Machine learning (ML) is set to accelerate innovations in clinical microbiomics, such as in disease diagnostics and prognostics. This will require high-quality, reproducible, interpretable workflows whose predictive capabilities meet or exceed the high thresholds set for clinical tools by regulatory agencies. Here, we capture a snapshot of current practices in the application of supervised ML to microbiomics data, through an in-depth analysis of 100 peer-reviewed journal articles published in 2021-2022. We apply a data-driven approach to steer discussion of the merits of varied approaches to experimental design, including key considerations such as how to mitigate the effects of small dataset size while avoiding data leakage. We further provide guidance on how to avoid common experimental design pitfalls that can hurt model performance, trustworthiness, and reproducibility. 
Discussion is accompanied by an interactive online tutorial th + +[^16]: 在复杂模块化算术中解释理解的Transformer + + Interpreting Grokked Transformers in Complex Modular Arithmetic + + [https://arxiv.org/abs/2402.16726](https://arxiv.org/abs/2402.16726) + + 本研究通过可解释的逆向工程在复杂模块化算术中观察了Transformer内部电路学习过程,并发现减法在Transformer上造成了强烈的不对称性,乘法需要余弦偏置分量,多项式叠加了基本算术模式,但在挑战性情况下并不清晰,Grokking甚至可以在具有基本对称和交替表达式的高次公式中轻松发生。 + + + + Grokking一直是解开延迟泛化之谜的积极探索。在已解密模型中识别可解释的算法是理解其机制的暗示性线索。在这项工作中,除了最简单和广为研究的模块化加法外,我们通过可解释的逆向工程观察了通过Grokking在复杂模块化算术中学到的内部电路,突出显示了它们动力学上的重大差异:减法对Transformer产生强烈的不对称性;乘法在傅立叶域的所有频率上需要余弦偏置分量;多项式通常导致基本算术模式的叠加,但在挑战性情况下清晰的模式并不显现;即使在具有基本对称和交替表达式的高次公式中,Grokking也很容易发生。我们还引入了模块化算术的新颖进展度量;傅立叶频率 + + arXiv:2402.16726v2 Announce Type: replace-cross Abstract: Grokking has been actively explored to reveal the mystery of delayed generalization. Identifying interpretable algorithms inside the grokked models is a suggestive hint to understanding its mechanism. In this work, beyond the simplest and well-studied modular addition, we observe the internal circuits learned through grokking in complex modular arithmetic via interpretable reverse engineering, which highlights the significant difference in their dynamics: subtraction poses a strong asymmetry on Transformer; multiplication requires cosine-biased components at all the frequencies in a Fourier domain; polynomials often result in the superposition of the patterns from elementary arithmetic, but clear patterns do not emerge in challenging cases; grokking can easily occur even in higher-degree formulas with basic symmetric and alternating expressions. 
We also introduce the novel progress measure for modular arithmetic; Fourier Freque + +[^17]: 单词序列熵:走向自由形式医学问答应用及其不确定性估计 + + Word-Sequence Entropy: Towards Uncertainty Estimation in Free-Form Medical Question Answering Applications and Beyond + + [https://arxiv.org/abs/2402.14259](https://arxiv.org/abs/2402.14259) + + 本论文提出了一种新方法单词序列熵(WSE),用于在自由形式医学问答任务中量化答案的不确定性,相比其他基线方法表现更优秀。 + + + + 不确定性估计在确保安全关键的人工智能系统与人类互动的可靠性中发挥关键作用,尤其在医疗领域尤为重要。然而,在自由形式的医学问答任务中,尚未建立一种通用方法来量化答案的不确定性,其中无关的词汇和语序含有有限的语义信息可能是不确定性的主要来源,这是由于生成不平等的存在。本文提出了单词序列熵(WSE),该方法根据语义相关性在单词和序列级别上校准不确定性比例,在不确定性量化时更加强调关键词和更相关的序列。我们在5个自由形式医学问答数据集上,利用7种“现成的”大语言模型(LLMs)将WSE与6种基线方法进行比较,并展示了WSE在性能上的优越性。 + + arXiv:2402.14259v1 Announce Type: cross Abstract: Uncertainty estimation plays a pivotal role in ensuring the reliability of safety-critical human-AI interaction systems, particularly in the medical domain. However, a general method for quantifying the uncertainty of free-form answers has yet to be established in open-ended medical question-answering (QA) tasks, where irrelevant words and sequences with limited semantic information can be the primary source of uncertainty due to the presence of generative inequality. In this paper, we propose the Word-Sequence Entropy (WSE), which calibrates the uncertainty proportion at both the word and sequence levels according to the semantic relevance, with greater emphasis placed on keywords and more relevant sequences when performing uncertainty quantification. 
We compare WSE with 6 baseline methods on 5 free-form medical QA datasets, utilizing 7 "off-the-shelf" large language models (LLMs), and show that WSE exhibits superior performance on ac + +[^18]: 保守和风险意识的离线多智能体强化学习在数字孪生中的应用 + + Conservative and Risk-Aware Offline Multi-Agent Reinforcement Learning for Digital Twins + + [https://arxiv.org/abs/2402.08421](https://arxiv.org/abs/2402.08421) + + 本研究提出了一种应用于数字孪生的离线多智能体强化学习方案,通过整合分布式强化学习和保守Q学习来解决环境的不确定性和有限数据带来的认识不确定性。 + + + + 数字孪生(DT)平台被越来越认为是控制、优化和监控诸如下一代无线网络之类的复杂工程系统的有希望技术。采用DT解决方案面临的一个重要挑战是它们依赖于离线收集的数据,缺乏对物理环境的直接访问。这一限制在多智能体系统中尤为严重,因为传统的多智能体强化学习(MARL)需要与环境进行在线互动。将在线MARL方案直接应用于离线环境通常会因有限数据的认识不确定性而失败。在这项工作中,我们提出了一种用于基于DT的无线网络的离线MARL方案,它整合了分布式强化学习(distributional RL)和保守Q学习,以应对环境固有的偶然不确定性和有限数据引起的认识不确定性。为了进一步利用离线数据,我们改编了所提出的方案。 + + Digital twin (DT) platforms are increasingly regarded as a promising technology for controlling, optimizing, and monitoring complex engineering systems such as next-generation wireless networks. An important challenge in adopting DT solutions is their reliance on data collected offline, lacking direct access to the physical environment. This limitation is particularly severe in multi-agent systems, for which conventional multi-agent reinforcement learning (MARL) requires online interactions with the environment. A direct application of online MARL schemes to an offline setting would generally fail due to the epistemic uncertainty entailed by the limited availability of data. In this work, we propose an offline MARL scheme for DT-based wireless networks that integrates distributional RL and conservative Q-learning to address the environment's inherent aleatoric uncertainty and the epistemic uncertainty arising from limited data. 
To further exploit the offline data, we adapt the proposed scheme t + +[^19]: 使用LoCo和M2-BERT进行基准测试和构建长上下文检索模型 + + Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT + + [https://arxiv.org/abs/2402.07440](https://arxiv.org/abs/2402.07440) + + 该论文介绍了LoCoV1,一个用于评估长上下文检索性能的新型基准测试,并提出了M2-BERT检索编码器,用于处理长上下文检索,解决了如何评估性能、预训练语言模型以及如何进行微调的挑战。 + + + + 检索管道是许多机器学习系统中的重要组成部分,在文档很长(例如10K个标记或更多)且需要在整个文本中合成信息来确定相关文档的领域中表现不佳。开发适用于这些领域的长上下文检索编码器面临三个挑战:(1)如何评估长上下文检索性能,(2)如何预训练基本语言模型以表示短上下文(对应查询)和长上下文(对应文档),以及(3)如何根据GPU内存限制下的批量大小限制对该模型进行微调。为了解决这些挑战,我们首先介绍了LoCoV1,这是一个新颖的12个任务基准测试,用于测量在不可分块或不有效的情况下的长上下文检索。接下来,我们提出了M2-BERT检索编码器,这是一个80M参数状态空间编码器模型,采用Monarch Mixer架构构建,能够进行可扩展的检索。 + + Retrieval pipelines-an integral component of many machine learning systems-perform poorly in domains where documents are long (e.g., 10K tokens or more) and where identifying the relevant document requires synthesizing information across the entire text. Developing long-context retrieval encoders suitable for these domains raises three challenges: (1) how to evaluate long-context retrieval performance, (2) how to pretrain a base language model to represent both short contexts (corresponding to queries) and long contexts (corresponding to documents), and (3) how to fine-tune this model for retrieval under the batch size limitations imposed by GPU memory constraints. To address these challenges, we first introduce LoCoV1, a novel 12 task benchmark constructed to measure long-context retrieval where chunking is not possible or not effective. 
We next present the M2-BERT retrieval encoder, an 80M parameter state-space encoder model built from the Monarch Mixer architecture, capable of scali + +[^20]: 梯度下降引发了深度非线性网络权重与经验NTK之间的对齐 + + Gradient descent induces alignment between weights and the empirical NTK for deep non-linear networks + + [https://arxiv.org/abs/2402.05271](https://arxiv.org/abs/2402.05271) + + 了解神经网络从输入-标签对中提取统计信息的机制是监督学习中最重要的未解决问题之一。前人的研究表明,在训练过程中,权重的格拉姆矩阵与模型的平均梯度外积成正比,这被称为神经特征分析(NFA)。本研究解释了这种相关性的出现,并发现NFA等价于权重矩阵的左奇异结构与与这些权重相关的经验神经切线核的显著成分之间的对齐。在早期训练阶段,可以通过解析的方式预测NFA的发展速度。 + + + + 理解神经网络从输入-标签对中提取统计信息的机制是监督学习中最重要的未解决问题之一。先前的研究已经确定,在一般结构的训练神经网络中,权重的格拉姆矩阵与模型的平均梯度外积成正比,这个说法被称为神经特征分析(NFA)。然而,这些数量在训练过程中如何相关尚不清楚。在这项工作中,我们解释了这种相关性的出现。我们发现NFA等价于权重矩阵的左奇异结构与与这些权重相关的经验神经切线核的显著成分之间的对齐。我们证明了先前研究中引入的NFA是由隔离这种对齐的中心化NFA驱动的。我们还展示了在早期训练阶段,可以通过解析的方式预测NFA的发展速度。 + + Understanding the mechanisms through which neural networks extract statistics from input-label pairs is one of the most important unsolved problems in supervised learning. Prior works have identified that the gram matrices of the weights in trained neural networks of general architectures are proportional to the average gradient outer product of the model, in a statement known as the Neural Feature Ansatz (NFA). However, the reason these quantities become correlated during training is poorly understood. In this work, we explain the emergence of this correlation. We identify that the NFA is equivalent to alignment between the left singular structure of the weight matrices and a significant component of the empirical neural tangent kernels associated with those weights. We establish that the NFA introduced in prior works is driven by a centered NFA that isolates this alignment. 
We show that the speed of NFA development can be predicted analytically at early training times in terms of sim + +[^21]: 可解释的多源数据融合通过潜变量高斯过程 + + Interpretable Multi-Source Data Fusion Through Latent Variable Gaussian Process + + [https://arxiv.org/abs/2402.04146](https://arxiv.org/abs/2402.04146) + + 这篇论文提出了一种基于潜变量高斯过程的多源数据融合框架,用于解决多个数据源之间质量和全面性差异给系统优化带来的问题。 + + + + 随着人工智能(AI)和机器学习(ML)的出现,各个科学和工程领域已经利用数据驱动的替代模型来建模来自大量信息源(数据)的复杂系统。这种增加导致了开发出用于执行特定功能的优越系统所需的成本和时间的显著降低。这样的替代模型往往广泛地融合多个数据来源,可能是发表的论文、专利、开放资源库或其他资源。然而,对于已知和未知的信息来源的基础物理参数的质量和全面性的差异,可能对系统优化过程产生后续影响,却没有得到充分的关注。为了解决这个问题,提出了一种基于潜变量高斯过程(LVGP)的多源数据融合框架。 + + With the advent of artificial intelligence (AI) and machine learning (ML), various science and engineering communities have leveraged data-driven surrogates to model complex systems from numerous sources of information (data). This proliferation has led to a significant reduction in the cost and time involved in developing superior systems designed to perform specific functionalities. A high proportion of such surrogates are built by extensively fusing multiple sources of data, be they published papers, patents, open repositories, or other resources. However, not much attention has been paid to the differences in quality and comprehensiveness of the known and unknown underlying physical parameters of the information sources that could have downstream implications during system optimization. Towards resolving this issue, a multi-source data fusion framework based on Latent Variable Gaussian Process (LVGP) is proposed. 
The individual data sources are tagged as a characteristic cate + +[^22]: GPT4Battery: 一种基于LLM驱动的自适应锂离子电池健康状态估计框架 + + GPT4Battery: An LLM-driven Framework for Adaptive State of Health Estimation of Raw Li-ion Batteries + + [https://arxiv.org/abs/2402.00068](https://arxiv.org/abs/2402.00068) + + 本论文提出了一种基于LLM的框架,可以适应不同类型的锂离子电池,实现准确的健康状态估计。这项工作解决了生成训练数据的时间和资源成本高的挑战,并在实际应用中具有良好的泛化能力。 + + + + 健康状态(SOH)是评估电池退化水平的关键指标,无法直接测量但需要估计。准确的SOH估计提升了锂离子电池的检测、控制和反馈能力,实现安全高效的能源管理,并指导新一代电池的发展。尽管在数据驱动的SOH估计方面取得了显著进展,但为生成寿命长期训练数据而进行的耗时且资源密集的退化实验在建立一个能处理多样化锂离子电池(例如,跨化学、跨制造商和跨容量)的大型模型方面存在挑战。因此,本文利用大型语言模型(LLM)的强大泛化能力,提出了一种适用于不同电池的可调整SOH估计的新型框架。为了适应实际情景,其中未标记的数据按顺序以及分布变化的方式到达,所提出的模型在测试时进行了修改。 + + State of health (SOH) is a crucial indicator for assessing the degradation level of batteries that cannot be measured directly but requires estimation. Accurate SOH estimation enhances detection, control, and feedback for Li-ion batteries, allowing for safe and efficient energy management and guiding the development of new-generation batteries. Despite the significant progress in data-driven SOH estimation, the time- and resource-consuming degradation experiments for generating lifelong training data pose a challenge in establishing one large model capable of handling diverse types of Li-ion batteries, e.g., cross-chemistry, cross-manufacturer, and cross-capacity. Hence, this paper utilizes the strong generalization capability of large language models (LLMs) to propose a novel framework for adaptable SOH estimation across diverse batteries. 
To match the real scenario where unlabeled data sequentially arrives in use with distribution shifts, the proposed model is modified by a test-time t + +[^23]: 大规模语言模型是零射击学习器 + + Large Language Models are Null-Shot Learners + + [https://arxiv.org/abs/2401.08273](https://arxiv.org/abs/2401.08273) + + 本文提出了零射击提示方法,通过利用大规模语言模型中的错误信息来指导模型进行任务,以提高任务表现。实验结果表明,在不同数据集上,包括阅读理解、算术推理和闭卷问答,模型性能有所提升。这些结果也显示出不同模型之间存在不同程度的错误信息。 + + + + 本文提出了零射击提示方法。零射击提示利用大规模语言模型(LLMs)中的错误信息,通过指示LLMs利用从“示例”部分中获取的信息(该信息在所提供的上下文中不存在)来完成任务。虽然减少错误信息对于LLMs的日常和重要用途至关重要,但我们提出在目前的环境中,这些LLMs仍然具有错误信息,实际上可以利用错误信息来提高与标准零射击提示相比的任务表现。对八个LLMs进行实验,结果显示在大多数八个数据集(包括阅读理解、算术推理和闭卷问答)中,性能有所提升。观察到的不一致性增加相对性能在LLMs之间的差异,也可能表示每个模型中存在不同程度的错误信息。 + + arXiv:2401.08273v2 Announce Type: replace-cross Abstract: This paper presents null-shot prompting. Null-shot prompting exploits hallucination in large language models (LLMs) by instructing LLMs to utilize information from the "Examples" section that never exists within the provided context to perform a task. While reducing hallucination is crucial and non-negligible for daily and critical uses of LLMs, we propose that in the current landscape in which these LLMs still hallucinate, it is possible, in fact, to exploit hallucination to increase performance in performing tasks compared to standard zero-shot prompting. Experiments with eight LLMs show improvements in performance across the majority of eight datasets, including reading comprehension, arithmetic reasoning, and closed-book question answering. The observed inconsistency in increased relative performance across the LLMs also potentially indicates a different degree of inherent hallucination in each model. These differences show + +[^24]: SupplyGraph: 使用图神经网络进行供应链规划的基准数据集 + + SupplyGraph: A Benchmark Dataset for Supply Chain Planning using Graph Neural Networks. 
(arXiv:2401.15299v1 [cs.LG]) + + [http://arxiv.org/abs/2401.15299](http://arxiv.org/abs/2401.15299) + + SupplyGraph是一个基准数据集,用于使用图神经网络进行供应链规划。该数据集包含了来自孟加拉国一家领先快速消费品公司的实际数据,用于优化、预测和解决供应链问题。数据集中的时间数据作为节点特征,可用于销售预测、生产计划和故障识别。 + + + + 图神经网络(GNNs)在不同领域如运输、生物信息学、语言处理和计算机视觉中取得了重要进展。然而,在将GNNs应用于供应链网络方面,目前尚缺乏研究。供应链网络在结构上类似于图形,使其成为应用GNN方法的理想选择。这为优化、预测和解决供应链问题开辟了无限可能。然而,此方法的一个主要障碍在于缺乏真实世界的基准数据集以促进使用GNN来研究和解决供应链问题。为了解决这个问题,我们提供了一个来自孟加拉国一家领先的快速消费品公司的实际基准数据集,该数据集侧重于用于生产目的的供应链规划的时间任务。该数据集包括时间数据作为节点特征,以实现销售预测、生产计划和故障识别。 + + Graph Neural Networks (GNNs) have gained traction across different domains such as transportation, bio-informatics, language processing, and computer vision. However, there is a noticeable absence of research on applying GNNs to supply chain networks. Supply chain networks are inherently graph-like in structure, making them prime candidates for applying GNN methodologies. This opens up a world of possibilities for optimizing, predicting, and solving even the most complex supply chain problems. A major setback in this approach lies in the absence of real-world benchmark datasets to facilitate the research and resolution of supply chain problems using GNNs. To address the issue, we present a real-world benchmark dataset for temporal tasks, obtained from one of the leading FMCG companies in Bangladesh, focusing on supply chain planning for production purposes. The dataset includes temporal data as node features to enable sales predictions, production planning, and the identification of fa + +[^25]: 使用线性加法注意力Transformer的高效生成对抗网络 + + Efficient generative adversarial networks using linear additive-attention Transformers. 
(arXiv:2401.09596v1 [cs.CV]) + + [http://arxiv.org/abs/2401.09596](http://arxiv.org/abs/2401.09596) + + 这项工作提出了一种名为LadaGAN的高效生成对抗网络,它使用了一种名为Ladaformer的新型Transformer块,通过线性加法注意机制来降低计算复杂度并解决训练不稳定性问题。 + + + + 尽管像扩散模型(DMs)和生成对抗网络(GANs)等深度生成模型在图像生成方面的能力近年来得到了显著提高,但是它们的成功很大程度上归功于计算复杂的架构。这限制了它们在研究实验室和资源充足的公司中的采用和使用,同时也极大地增加了训练、微调和推理的碳足迹。在这项工作中,我们提出了LadaGAN,这是一个高效的生成对抗网络,它建立在一种名为Ladaformer的新型Transformer块上。该块的主要组成部分是一个线性加法注意机制,它每个头部计算一个注意向量,而不是二次的点积注意力。我们在生成器和判别器中都采用了Ladaformer,这降低了计算复杂度,并克服了Transformer GAN经常出现的训练不稳定性。LadaGAN一直表现优于现有的GANs。 + + Although the capacity of deep generative models for image generation, such as Diffusion Models (DMs) and Generative Adversarial Networks (GANs), has dramatically improved in recent years, much of their success can be attributed to computationally expensive architectures. This has limited their adoption and use to research laboratories and companies with large resources, while significantly raising the carbon footprint for training, fine-tuning, and inference. In this work, we present LadaGAN, an efficient generative adversarial network that is built upon a novel Transformer block named Ladaformer. The main component of this block is a linear additive-attention mechanism that computes a single attention vector per head instead of the quadratic dot-product attention. We employ Ladaformer in both the generator and discriminator, which reduces the computational complexity and overcomes the training instabilities often associated with Transformer GANs. LadaGAN consistently outperforms exist + +[^26]: 大型语言模型的知识编辑全面研究 + + A Comprehensive Study of Knowledge Editing for Large Language Models. 
(arXiv:2401.01286v1 [cs.CL]) + + [http://arxiv.org/abs/2401.01286](http://arxiv.org/abs/2401.01286) + + 本研究全面研究了大型语言模型的知识编辑,旨在有效修改模型的行为,同时保持整体性能。 + + + + 大型语言模型(LLM)在理解和生成与人类交流紧密相似的文本方面展现出了非凡的能力。然而,其主要限制在于训练过程中的显著计算需求,这是由于其广泛的参数化造成的。这一挑战在于世界的动态性,需要频繁更新LLM以修正过时的信息或集成新知识,从而确保其持续的相关性。许多应用需要在训练后进行持续的模型调整,以解决缺陷或不良行为。近年来,对于LLM的知识编辑技术的兴趣越来越高,在特定领域内有效地修改LLM的行为,同时保持整体性能在各种输入中的表现。本文首先定义了知识编辑的目标和挑战,然后综述了现有的知识编辑方法和技术,并讨论了其应用和未来发展的方向。 + + Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication. However, a primary limitation lies in the significant computational demands during training, arising from their extensive parameterization. This challenge is further intensified by the dynamic nature of the world, necessitating frequent updates to LLMs to correct outdated information or integrate new knowledge, thereby ensuring their continued relevance. Note that many applications demand continual model adjustments post-training to address deficiencies or undesirable behaviors. There is an increasing interest in efficient, lightweight methods for on-the-fly model modifications. To this end, recent years have seen a burgeoning in the techniques of knowledge editing for LLMs, which aim to efficiently modify LLMs' behaviors within specific domains while preserving overall performance across various inputs. In this paper, we first define the kno + +[^27]: 跨越生成性人工智能数据生命周期的隐私和版权挑战导航 + + Navigating Privacy and Copyright Challenges Across the Data Lifecycle of Generative AI. 
(arXiv:2311.18252v2 [cs.SE] UPDATED) + + [http://arxiv.org/abs/2311.18252](http://arxiv.org/abs/2311.18252) + + 这项研究探讨了生成性人工智能中数据隐私和版权保护的多方面挑战,并提出了将技术创新与伦理前瞻相结合的综合方法,旨在全面解决这些问题。 + + + + 生成性人工智能的出现标志着人工智能领域的重要里程碑,展示出在生成真实图像、文本和数据模式方面的卓越能力。然而,这些进展也带来了对数据隐私和版权侵犯的更高关注,主要是由于模型训练对大规模数据集的依赖。传统方法如差分隐私、机器遗忘和数据中毒只提供了对这些复杂问题的片面解决方案。本文深入探讨了数据生命周期内隐私和版权保护的多方面挑战。我们主张采用将技术创新与伦理前瞻相结合的综合方法,通过研究和制定在生命周期视角下的解决方案,全面解决这些问题。本研究旨在推动更广泛的讨论,并激励对生成性人工智能中数据隐私和版权完整性的协同努力。 + + The advent of Generative AI has marked a significant milestone in artificial intelligence, demonstrating remarkable capabilities in generating realistic images, texts, and data patterns. However, these advancements come with heightened concerns over data privacy and copyright infringement, primarily due to the reliance on vast datasets for model training. Traditional approaches like differential privacy, machine unlearning, and data poisoning only offer fragmented solutions to these complex issues. Our paper delves into the multifaceted challenges of privacy and copyright protection within the data lifecycle. We advocate for integrated approaches that combine technical innovation with ethical foresight, holistically addressing these concerns by investigating and devising solutions that are informed by the lifecycle perspective. This work aims to catalyze a broader discussion and inspire concerted efforts towards data privacy and copyright integrity in Generative AI. + +[^28]: 一种可扩展的训练策略用于盲目的多分布噪声去除 + + A Scalable Training Strategy for Blind Multi-Distribution Noise Removal. 
(arXiv:2310.20064v1 [cs.CV]) + + [http://arxiv.org/abs/2310.20064](http://arxiv.org/abs/2310.20064) + + 提出了一种使用自适应采样/主动学习策略来训练去噪网络的方法,解决了通用去噪网络在不同噪声分布下表现差的问题。 + + + + 尽管最近取得了一些进展,但是开发通用的去噪和去伪影网络仍然是一个尚未解决的问题:给定固定的网络权重,一个任务(例如去除泊松噪声)的专门化与另一个任务(例如去除斑点噪声)的性能之间存在天然的权衡。此外,由于维度的诅咒,训练这样的网络是具有挑战性的:随着规格空间的维度增加(即需要描述噪声分布所需的参数数量增加),需要训练的唯一规格数量呈指数增长。均匀采样这个空间会导致网络在非常具有挑战性的问题规格上表现良好,但在简单的问题规格上表现不佳,即使大误差也对总体均方误差的影响很小。本文提出了一种使用自适应采样/主动学习策略来训练去噪网络的方法。我们的工作改进了最近提出的一种方法。 + + Despite recent advances, developing general-purpose universal denoising and artifact-removal networks remains largely an open problem: Given fixed network weights, one inherently trades off specialization at one task (e.g., removing Poisson noise) for performance at another (e.g., removing speckle noise). In addition, training such a network is challenging due to the curse of dimensionality: As one increases the dimensions of the specification-space (i.e., the number of parameters needed to describe the noise distribution) the number of unique specifications one needs to train for grows exponentially. Uniformly sampling this space will result in a network that does well at very challenging problem specifications but poorly at easy problem specifications, where even large errors will have a small effect on the overall mean squared error. In this work we propose training denoising networks using an adaptive-sampling/active-learning strategy. Our work improves upon a recently proposed un + +[^29]: Clover: 闭环可验证代码生成 + + Clover: Closed-Loop Verifiable Code Generation. 
(arXiv:2310.17807v1 [cs.SE]) + + [http://arxiv.org/abs/2310.17807](http://arxiv.org/abs/2310.17807) + + Clover是一种闭环可验证代码生成的范式,通过在代码、docstrings和形式注释之间进行一致性检查,确保生成的代码的正确性。 + + + + 在软件开发中,使用大型语言模型进行代码生成是一个快速增长的趋势。然而,如果没有有效的方法来确保生成的代码的正确性,这个趋势可能会导致许多不良结果。在本文中,我们提出了一个解决这个挑战的愿景:Clover范式,即闭环可验证代码生成,它将正确性检查简化为更可访问的一致性检查问题。在Clover的核心是一个检查器,它在代码、docstrings和形式注释之间进行一致性检查。该检查器使用了形式验证工具和大型语言模型的新颖集成实现。我们提供了理论分析来支持我们的论点,即Clover在一致性检查方面应该是有效的。我们还在一个由手工设计的数据集(CloverBench)上进行了实证调查,该数据集包含了注释的Dafny程序,难度水平与教科书相当。实验结果显示 + + The use of large language models for code generation is a rapidly growing trend in software development. However, without effective methods for ensuring the correctness of generated code, this trend could lead to any number of undesirable outcomes. In this paper, we lay out a vision for addressing this challenge: the Clover paradigm, short for Closed-Loop Verifiable Code Generation, which reduces correctness checking to the more accessible problem of consistency checking. At the core of Clover lies a checker that performs consistency checks among code, docstrings, and formal annotations. The checker is implemented using a novel integration of formal verification tools and large language models. We provide a theoretical analysis to support our thesis that Clover should be effective at consistency checking. We also empirically investigate its feasibility on a hand-designed dataset (CloverBench) featuring annotated Dafny programs at a textbook level of difficulty. Experimental results sho + +[^30]: Transformers学会了高阶优化方法用于上下文学习:一项与线性模型的研究 + + Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Models. 
(arXiv:2310.17086v1 [cs.LG]) + + [http://arxiv.org/abs/2310.17086](http://arxiv.org/abs/2310.17086) + + Transformers学会了高阶优化方法,用于上下文学习,通过实现类似于迭代牛顿法的算法,而不是梯度下降。 + + + + Transformers在上下文学习中表现出色,但是它们是如何进行上下文学习仍然是一个谜。最近的研究表明,Transformers可能通过内部运行梯度下降,即一阶优化方法,来进行上下文学习。本文中,我们展示了Transformers学会了实现高阶优化方法来进行上下文学习。我们以上下文线性回归为重点,展示了Transformers学会了实现一个非常类似于迭代牛顿法的算法,而不是梯度下降。从实证上来看,我们展示了连续的Transformer层的预测与牛顿法的不同迭代非常接近,每个中间层大致计算了3次迭代。相比之下,需要指数级的梯度下降步骤才能匹配额外的Transformer层;这表明Transformers具有相当的收敛速率。 + + Transformers are remarkably good at in-context learning (ICL) -- learning from demonstrations without parameter updates -- but how they perform ICL remains a mystery. Recent work suggests that Transformers may learn in-context by internally running Gradient Descent, a first-order optimization method. In this paper, we instead demonstrate that Transformers learn to implement higher-order optimization methods to perform ICL. Focusing on in-context linear regression, we show that Transformers learn to implement an algorithm very similar to Iterative Newton's Method, a higher-order optimization method, rather than Gradient Descent. Empirically, we show that predictions from successive Transformer layers closely match different iterations of Newton's Method linearly, with each middle layer roughly computing 3 iterations. In contrast, exponentially more Gradient Descent steps are needed to match an additional Transformer layer; this suggests that Transformers have a comparable rate of conv + +[^31]: 图去学习综述 + + A Survey of Graph Unlearning. 
(arXiv:2310.02164v2 [cs.LG] UPDATED) + + [http://arxiv.org/abs/2310.02164](http://arxiv.org/abs/2310.02164) + + 图去学习是负责任人工智能发展的重要进展,通过删除训练模型中的敏感数据痕迹来维护被遗忘的权利。这篇综述性论文首次系统回顾了图去学习的方法,包括了各种方法学,并提供了详细的分类和最新的文献综述,以帮助新进入这个领域的研究人员理解。与差分隐私的关系加深了对在这个背景下隐私保护技术的理解。 + + + + 图去学习是在追求负责任人工智能的过程中的重要进展,它提供了从训练模型中删除敏感数据痕迹的方法,以维护被遗忘的权利。显然,图机器学习对数据隐私和对抗攻击具有敏感性,因此需要应用图去学习技术来有效解决这些问题。在这篇综述性论文中,我们首次系统地回顾了图去学习的方法,涵盖了各种方法学,并提供了详细的分类和最新的文献综述,以帮助新进入这个领域的研究人员理解。此外,我们建立了图去学习与差分隐私之间的重要联系,增强了我们对在这个背景下隐私保护技术的相关性的理解。为了保证清晰度,我们对图去学习中使用的基本概念和评估指标进行了简明扼要的解释。 + + Graph unlearning emerges as a crucial advancement in the pursuit of responsible AI, providing the means to remove sensitive data traces from trained models, thereby upholding the right to be forgotten. It is evident that graph machine learning exhibits sensitivity to data privacy and adversarial attacks, necessitating the application of graph unlearning techniques to address these concerns effectively. In this comprehensive survey paper, we present the first systematic review of graph unlearning approaches, encompassing a diverse array of methodologies and offering a detailed taxonomy and up-to-date literature overview to facilitate the understanding of researchers new to this field. Additionally, we establish the vital connections between graph unlearning and differential privacy, augmenting our understanding of the relevance of privacy-preserving techniques in this context. To ensure clarity, we provide lucid explanations of the fundamental concepts and evaluation measures used in gr + +[^32]: 模型无关的图神经网络用于整合局部和全局信息的研究 + + A Model-Agnostic Graph Neural Network for Integrating Local and Global Information. 
(arXiv:2309.13459v1 [stat.ML]) + + [http://arxiv.org/abs/2309.13459](http://arxiv.org/abs/2309.13459) + + MaGNet是一种模型无关的图神经网络框架,能够顺序地整合不同顺序的信息,并通过识别有影响力的紧凑图结构提供有意义且可解释的结果。 + + + + 图神经网络(GNNs)在各种以图为重点的任务中取得了令人满意的性能。尽管取得了成功,但现有的GNN存在两个重要限制:由于黑盒特性,结果缺乏可解释性;无法学习不同顺序的表示。为了解决这些问题,我们提出了一种新的模型无关的图神经网络(MaGNet)框架,能够顺序地整合不同顺序的信息,从高阶邻居中提取知识,并通过识别有影响力的紧凑图结构提供有意义且可解释的结果。特别地,MaGNet由两个组件组成:图拓扑下复杂关系的潜在表示的估计模型和识别有影响力的节点、边和重要节点特征的解释模型。从理论上,我们通过经验Rademacher复杂度建立了MaGNet的泛化误差界,并展示了其强大的能力。 + + Graph Neural Networks (GNNs) have achieved promising performance in a variety of graph-focused tasks. Despite their success, existing GNNs suffer from two significant limitations: a lack of interpretability in results due to their black-box nature, and an inability to learn representations of varying orders. To tackle these issues, we propose a novel Model-agnostic Graph Neural Network (MaGNet) framework, which is able to sequentially integrate information of various orders, extract knowledge from high-order neighbors, and provide meaningful and interpretable results by identifying influential compact graph structures. In particular, MaGNet consists of two components: an estimation model for the latent representation of complex relationships under graph topology, and an interpretation model that identifies influential nodes, edges, and important node features. Theoretically, we establish the generalization error bound for MaGNet via empirical Rademacher complexity, and showcase its pow + +[^33]: 最优和公平的鼓励政策评估与学习 + + Optimal and Fair Encouragement Policy Evaluation and Learning. 
(arXiv:2309.07176v1 [cs.LG]) + + [http://arxiv.org/abs/2309.07176](http://arxiv.org/abs/2309.07176) + + 本研究探讨了在关键领域中针对鼓励政策的最优和公平评估以及学习的问题,研究发现在人类不遵循治疗建议的情况下,最优策略规则只是建议。同时,针对治疗的异质性和公平考虑因素,决策者的权衡和决策规则也会发生变化。在社会服务领域,研究显示存在一个使用差距问题,那些最有可能受益的人却无法获得这些益服务。 + + + + 在关键领域中,强制个体接受治疗通常是不可能的,因此在人类不遵循治疗建议的情况下,最优策略规则只是建议。在这些领域中,接受治疗的个体可能存在异质性,治疗效果也可能存在异质性。虽然最优治疗规则可以最大化整个人群的因果结果,但在鼓励的情况下,对于访问平等限制或其他公平考虑因素可能是相关的。例如,在社会服务领域,一个持久的难题是那些最有可能从中受益的人中那些获益服务的使用差距。当决策者对访问和平均结果都有分配偏好时,最优决策规则会发生变化。我们研究了因果识别、统计方差减少估计和稳健估计的最优治疗规则,包括在违反阳性条件的情况下。 + + In consequential domains, it is often impossible to compel individuals to take treatment, so that optimal policy rules are merely suggestions in the presence of human non-adherence to treatment recommendations. In these same domains, there may be heterogeneity both in who responds in taking-up treatment, and heterogeneity in treatment efficacy. While optimal treatment rules can maximize causal outcomes across the population, access parity constraints or other fairness considerations can be relevant in the case of encouragement. For example, in social services, a persistent puzzle is the gap in take-up of beneficial services among those who may benefit from them the most. When in addition the decision-maker has distributional preferences over both access and average outcomes, the optimal decision rule changes. We study causal identification, statistical variance-reduced estimation, and robust estimation of optimal treatment rules, including under potential violations of positivity. We c + +[^34]: 针对金融指数跟踪的强化学习 + + Reinforcement Learning for Financial Index Tracking. 
(arXiv:2308.02820v1 [q-fin.PM]) + + [http://arxiv.org/abs/2308.02820](http://arxiv.org/abs/2308.02820) + + 本论文提出了针对金融指数跟踪问题的第一个具有动态性的离散时间无穷期模型,它克服了现有模型的一些局限,可以精确计算交易成本,同时考虑了跟踪误差和交易成本之间的权衡,并能有效利用长时间段的数据。我们使用深度强化学习方法解决该模型,解决了由于数据限制导致的问题。 + + + + 我们提出了第一个离散时间无穷期动态形式的金融指数跟踪问题,同时考虑到基于收益的跟踪误差和基于价值的跟踪误差。该模型克服了现有模型的局限性:它纳入了不仅限于价格的市场信息变量的跨期动态,可以精确计算交易成本,考虑跟踪误差和交易成本之间的权衡,可以有效利用长时间段的数据等。该模型还引入了现金注入或提取的新的决策变量。我们提出了使用Banach不动点迭代求解投资组合再平衡方程的方法,可以准确计算实践中指定为交易量的非线性函数的交易成本。我们还提出了扩展深度强化学习(RL)方法来解决动态模型。我们的RL方法解决了由数据限制引起的问题。 + + We propose the first discrete-time infinite-horizon dynamic formulation of the financial index tracking problem under both return-based tracking error and value-based tracking error. The formulation overcomes the limitations of existing models by incorporating the intertemporal dynamics of market information variables not limited to prices, allowing exact calculation of transaction costs, accounting for the tradeoff between overall tracking error and transaction costs, allowing effective use of data over a long time period, etc. The formulation also allows novel decision variables of cash injection or withdrawal. We propose to solve the portfolio rebalancing equation using a Banach fixed point iteration, which allows accurate calculation of the transaction costs specified as nonlinear functions of trading volumes in practice. We propose an extension of the deep reinforcement learning (RL) method to solve the dynamic formulation. Our RL method resolves the issue of data limitation resulting fro + +[^35]: 用微笑揭示帕金森病:一种基于人工智能的筛查框架 + + Unmasking Parkinson's Disease with Smile: An AI-enabled Screening Framework.
(arXiv:2308.02588v1 [eess.IV]) + + [http://arxiv.org/abs/2308.02588](http://arxiv.org/abs/2308.02588) + + 本研究使用微表情视频数据集开发了一种基于人工智能的帕金森病筛查框架,通过分析微笑视频中的特征,实现了89.7%的准确性和89.3%的AUROC值,同时在人群子组上没有检测到偏见。 + + + + 鉴于目前缺乏可靠的生物标志物和有限的临床护理资源,帕金森病(PD)的诊断仍然具有挑战性。在本研究中,我们使用包含微表情的最大视频数据集进行PD筛查的分析。我们收集了来自1,059名独立参与者的3,871个视频,其中包括256名自报PD患者。这些录像来自不同来源,包括多个国家的参与者家中、一家诊所和一个美国的PD护理机构。通过利用面部标志和行动单位,我们提取了与PD的一个主要症状Hypomimia(面部表情减少)相关的特征。在这些特征上训练的一组AI模型在保留数据上实现了89.7%的准确性和89.3%的接收者操作特性曲线下面积(AUROC),并且在性别和种族等人群子组上无可检测的偏见。进一步的分析揭示,仅通过微笑视频中的特征就可以获得可比较的准确性和AUROC值。 + + Parkinson's disease (PD) diagnosis remains challenging due to lacking a reliable biomarker and limited access to clinical care. In this study, we present an analysis of the largest video dataset containing micro-expressions to screen for PD. We collected 3,871 videos from 1,059 unique participants, including 256 self-reported PD patients. The recordings are from diverse sources encompassing participants' homes across multiple countries, a clinic, and a PD care facility in the US. Leveraging facial landmarks and action units, we extracted features relevant to Hypomimia, a prominent symptom of PD characterized by reduced facial expressions. An ensemble of AI models trained on these features achieved an accuracy of 89.7% and an Area Under the Receiver Operating Characteristic (AUROC) of 89.3% while being free from detectable bias across population subgroups based on sex and ethnicity on held-out data. Further analysis reveals that features from the smiling videos alone lead to comparable + +[^36]: 深度学习中遗忘现象的全面调查:超越连续学习 + + A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning. 
(arXiv:2307.09218v1 [cs.LG]) + + [http://arxiv.org/abs/2307.09218](http://arxiv.org/abs/2307.09218) + + 遗忘是深度学习中普遍存在的现象,不仅限于连续学习领域。解决遗忘问题面临多个挑战,包括平衡保留旧任务知识与快速学习新任务的挑战,管理任务干扰与冲突目标的挑战,以及防止隐私泄露等。遗忘不总是有害的,可以在某些情况下是有益且可取的,特别是在隐私保护场景中。 + + + + 遗忘指的是先前获取的信息或知识的丧失或恶化。尽管现有的关于遗忘的调查主要集中在连续学习方面,但在深度学习中,遗忘是一种普遍现象,可以在各种其他研究领域中观察到。遗忘在研究领域中表现出来,例如由于生成器漂移而在生成模型领域中表现出来,以及由于客户端之间存在异构数据分布而在联邦学习中表现出来。解决遗忘问题涉及到几个挑战,包括在快速学习新任务的同时平衡保留旧任务知识,管理任务干扰与冲突目标,以及防止隐私泄露等。此外,大多数现有的连续学习调查都默认认为遗忘总是有害的。相反,我们的调查认为遗忘是一把双刃剑,在某些情况下可以是有益且可取的,例如隐私保护场景。通过在更广泛的背景下探讨遗忘现象, + + Forgetting refers to the loss or deterioration of previously acquired information or knowledge. While the existing surveys on forgetting have primarily focused on continual learning, forgetting is a prevalent phenomenon observed in various other research domains within deep learning. Forgetting manifests in research fields such as generative models due to generator shifts, and federated learning due to heterogeneous data distributions across clients. Addressing forgetting encompasses several challenges, including balancing the retention of old task knowledge with fast learning of new tasks, managing task interference with conflicting goals, and preventing privacy leakage, etc. Moreover, most existing surveys on continual learning implicitly assume that forgetting is always harmful. In contrast, our survey argues that forgetting is a double-edged sword and can be beneficial and desirable in certain cases, such as privacy-preserving scenarios. By exploring forgetting in a broader context + +[^37]: 机器翻译可解释性评估指标的探索 + + Towards Explainable Evaluation Metrics for Machine Translation. 
(arXiv:2306.13041v1 [cs.CL]) + + [http://arxiv.org/abs/2306.13041](http://arxiv.org/abs/2306.13041) + + 本研究探索机器翻译可解释性评估指标,提供综合综述和最新方法,并贡献下一代方法的愿景。 + + + + 与传统的词汇重叠度量(如BLEU)不同,大多数当前用于机器翻译评估的指标(例如COMET或BERTScore)基于黑盒子的大型语言模型。它们通常与人类判断具有强相关性,但是最近的研究表明,较低质量的传统指标仍然占主导地位,其中一个潜在原因是它们的决策过程更透明。因此,为了促进新的高质量指标的更广泛接受,解释性变得至关重要。在这篇概念论文中,我们确定了可解释机器翻译指标的关键属性和目标,并提供了最近技术的综合综述,将它们与我们确立的目标和属性联系起来。在这个背景下,我们还讨论基于生成模型(如ChatGPT和GPT4)的可解释指标的最新先进方法。最后,我们贡献了下一代方法的愿景,包括自然语言e。 + + Unlike classical lexical overlap metrics such as BLEU, most current evaluation metrics for machine translation (for example, COMET or BERTScore) are based on black-box large language models. They often achieve strong correlations with human judgments, but recent research indicates that the lower-quality classical metrics remain dominant, one of the potential reasons being that their decision processes are more transparent. To foster more widespread acceptance of novel high-quality metrics, explainability thus becomes crucial. In this concept paper, we identify key properties as well as key goals of explainable machine translation metrics and provide a comprehensive synthesis of recent techniques, relating them to our established goals and properties. In this context, we also discuss the latest state-of-the-art approaches to explainable metrics based on generative models such as ChatGPT and GPT4. Finally, we contribute a vision of next-generation approaches, including natural language e + +[^38]: 基于有限维谱动态嵌入的随机非线性控制 + + Stochastic Nonlinear Control via Finite-dimensional Spectral Dynamic Embedding. 
(arXiv:2304.03907v1 [cs.LG]) + + [http://arxiv.org/abs/2304.03907](http://arxiv.org/abs/2304.03907) + + 本文提出了一种基于有限维特征逼近的非线性动态谱嵌入控制算法(SDEC)用于解决随机非线性系统的最优控制问题,并对其进行了理论分析和实验测试。 + + + + 随机非线性系统的最优控制一直是一个棘手的问题。Ren等人引入了谱动态嵌入来开发控制未知系统的强化学习方法。它使用无穷维特征来线性表示状态值函数,并利用有限维的截断逼近进行实际实现。然而,在已知模型的情况下,控制中的有限维逼近性质尚未得到研究。在本文中,我们提出了一种可行的随机非线性控制算法,利用基于有限维特征逼近的非线性动态谱嵌入控制(SDEC),并进行深入的理论分析,以表征由有限维截断引起的逼近误差和由有限样本逼近引起的统计误差,同时进行政策评估和政策优化的实验测试和比较。 + + Optimal control is notoriously difficult for stochastic nonlinear systems. Ren et al. introduced Spectral Dynamics Embedding for developing reinforcement learning methods for controlling an unknown system. It uses an infinite-dimensional feature to linearly represent the state-value function and exploits finite-dimensional truncation approximation for practical implementation. However, the finite-dimensional approximation properties in control have not been investigated even when the model is known. In this paper, we provide a tractable stochastic nonlinear control algorithm that exploits the nonlinear dynamics upon the finite-dimensional feature approximation, Spectral Dynamics Embedding Control (SDEC), with an in-depth theoretical analysis to characterize the approximation error induced by the finite-dimension truncation and statistical error induced by finite-sample approximation in both policy evaluation and policy optimization. We also empirically test the algorithm and compare th + +[^39]: 平滑的非平稳连续赌博机 + + Smooth Non-Stationary Bandits. 
(arXiv:2301.12366v2 [cs.LG] UPDATED) + + [http://arxiv.org/abs/2301.12366](http://arxiv.org/abs/2301.12366) + + 本文提出了一种非平稳两臂赌博机问题的策略,能够处理平滑变化,并证明了该策略在二次Lipschitz连续的情况下的遗憾为 $\tilde O(T^{3/5})$。 + + + + 在许多在线决策应用中,环境都是非平稳的,因此使用能够处理变化的赌博算法至关重要。大多数现有方法是为了应对非平滑变化而设计的,仅受到总变差或时间上的Lipschitz性的限制,其中它们保证$\tilde \Theta(T^{2/3})$的遗憾。然而,在实践中,环境经常以平滑的方式改变,因此这种算法可能会在这些设置中产生比必要更高的遗憾,并且不利用变化率的信息。我们研究了一个非平稳的两臂赌博机问题,假设臂的平均回报是一个$\beta$-H\"older函数,即它是$(\beta-1)$次Lipschitz连续可微分的,我们展示了一个策略,对于$\beta=2$,它的遗憾为$\tilde O(T^{3/5})$,从而首次在平滑和非平滑之间进行了区分。我们通过一个$\Omega(T^{(\beta+1)/(2\beta+1)})$的下界来补充这个结果,说明了这个问题的困难程度。 + + In many applications of online decision making, the environment is non-stationary and it is therefore crucial to use bandit algorithms that handle changes. Most existing approaches are designed to protect against non-smooth changes, constrained only by total variation or Lipschitzness over time, where they guarantee $\tilde \Theta(T^{2/3})$ regret. However, in practice environments are often changing smoothly, so such algorithms may incur higher-than-necessary regret in these settings and do not leverage information on the rate of change. We study a non-stationary two-armed bandit problem where we assume that an arm's mean reward is a $\beta$-H\"older function over (normalized) time, meaning it is $(\beta-1)$-times Lipschitz-continuously differentiable. We show the first separation between the smooth and non-smooth regimes by presenting a policy with $\tilde O(T^{3/5})$ regret for $\beta=2$. We complement this result by an $\Omega(T^{(\beta+1)/(2\beta+1)})$ lower bound for any int + +[^40]: 深度学习模型的功能性神经编码分析 + + Analysis of functional neural codes of deep learning models.
(arXiv:2205.10952v2 [cs.LG] UPDATED) + + [http://arxiv.org/abs/2205.10952](http://arxiv.org/abs/2205.10952) + + 本研究使用自组织映射(SOM)分析了深度学习模型中与决策相关的内部编码,发现浅层将特征压缩到紧凑空间中,而深层将特征空间扩展,并指出压缩特征可能导致对敌对扰动的脆弱性。 + + + + 深度神经网络(DNNs)作为深度学习(DL)的代理,需要大量的并行/顺序操作。这使得理解DNNs的操作变得困难,阻碍了适当的诊断。在没有对其内部过程有更好的了解之前,在高风险领域部署DNNs可能导致灾难性故障。因此,为了构建更可靠的DNNs/DL来解决高风险现实世界问题,我们必须深入了解DNNs决策背后的内部操作。在这里,我们使用自组织映射(SOM)分析与DNNs决策相关的DL模型的内部编码。我们的分析表明,靠近输入层的浅层将特征压缩到紧凑空间中,而靠近输出层的深层将特征空间扩展。我们还发现有证据表明,压缩特征可能导致DNNs对敌对扰动的脆弱性。 + + Deep neural networks (DNNs), the agents of deep learning (DL), require a massive number of parallel/sequential operations. This makes it difficult to comprehend DNNs' operations and impedes proper diagnosis. Without better knowledge of their internal process, deploying DNNs in high-stakes domains can lead to catastrophic failures. Therefore, to build more reliable DNNs/DL to be deployed in high-stakes real-world problems, it is imperative that we gain insights into DNNs' internal operations underlying their decision-making. Here, we use the self-organizing map (SOM) to analyze DL models' internal codes associated with DNNs' decision-making. Our analyses suggest that shallow layers close to the input layer compress features into condensed space and that deep layers close to the output layer expand feature space. We also found evidence indicating that compressed features may underlie DNNs' vulnerabilities to adversarial perturbations. 
diff --git a/cs.LG.xml b/cs.LG.xml index 907fe64ef..ead8b9418 100644 --- a/cs.LG.xml +++ b/cs.LG.xml @@ -1,181 +1,801 @@ -Chat Arxiv cs.LGhttps://github.com/qhduan/cn-chat-arxivThis is arxiv RSS feed for cs.LGBAdam提出了一种内存高效的全参数微调大型语言模型的方法,并在实验中展现出优越的收敛行为以及在性能评估中的优势。https://arxiv.org/abs/2404.02827<p> -BAdam:面向大型语言模型的内存高效全参数训练方法 +Chat Arxiv cs.LGhttps://github.com/qhduan/cn-chat-arxivThis is arxiv RSS feed for cs.LGSugarcaneNet2024是通过优化加权平均集成LASSO正则化的预训练模型,在甘蔗病害分类中表现出色,具有快速准确的检测能力。https://arxiv.org/abs/2403.18870<p> +SugarcaneNet2024: LASSO正则化的预训练模型的优化加权平均集成方法用于甘蔗病害分类 </p> <p> -BAdam: A Memory Efficient Full Parameter Training Method for Large Language Models +SugarcaneNet2024: An Optimized Weighted Average Ensemble Approach of LASSO Regularized Pre-trained Models for Sugarcane Disease Classification </p> <p> -https://arxiv.org/abs/2404.02827 +https://arxiv.org/abs/2403.18870 </p> <p> -BAdam提出了一种内存高效的全参数微调大型语言模型的方法,并在实验中展现出优越的收敛行为以及在性能评估中的优势。 +SugarcaneNet2024是通过优化加权平均集成LASSO正则化的预训练模型,在甘蔗病害分类中表现出色,具有快速准确的检测能力。 </p> <p> </p> <p> -这项工作提出了BAdam,这是一种利用Adam作为内部求解器的块坐标优化框架的优化器。BAdam提供了一种内存高效的方法,用于对大型语言模型进行全参数微调,并且由于链式规则属性减少了反向过程的运行时间。在实验中,我们将BAdam应用于在Alpaca-GPT4数据集上使用单个RTX3090-24GB GPU进行指导微调的Llama 2-7B模型。结果表明,与LoRA和LOMO相比,BAdam展现出了优越的收敛行为。此外,我们通过使用MT-bench对指导微调模型进行下游性能评估,结果显示BAdam在适度超越LoRA的基础上更显著地优于LOMO。最后,我们将BAdam与Adam在中等任务上进行了比较,即在SuperGLUE基准上对RoBERTa-large进行微调。结果表明,BAdam能够缩小与Adam之间的性能差距。我们的代码 +甘蔗作为世界糖业的关键作物,容易受多种病害侵害,这些病害对其产量和质量都有重大负面影响。为了有效管理和实施预防措施,必须及时准确地检测病害。本研究提出了一种名为SugarcaneNet2024的独特模型,通过叶片图像处理,能够优于先前方法自动快速检测甘蔗病害。我们提出的模型汇总了七个定制的、经过LASSO正则化的预训练模型的优化加权平均集成,特别是InceptionV3、InceptionResNetV2、DenseNet201、DenseNet169、Xception和ResNet152V2。最初,我们在这些预训练模型底部添加了三层更密集层,具有0.0001的LASSO正则化,三个30%的dropout层和三个启用renorm的批量归一化,以提高性能。 </p> <p> -arXiv:2404.02827v1 Announce Type: new Abstract: This work presents BAdam, an optimizer that leverages the block coordinate optimization framework with Adam as the inner solver. 
BAdam offers a memory efficient approach to the full parameter finetuning of large language models and reduces running time of the backward process thanks to the chain rule property. Experimentally, we apply BAdam to instruction-tune the Llama 2-7B model on the Alpaca-GPT4 dataset using a single RTX3090-24GB GPU. The results indicate that BAdam exhibits superior convergence behavior in comparison to LoRA and LOMO. Furthermore, our downstream performance evaluation of the instruction-tuned models using the MT-bench shows that BAdam modestly surpasses LoRA and more substantially outperforms LOMO. Finally, we compare BAdam with Adam on a medium-sized task, i.e., finetuning RoBERTa-large on the SuperGLUE benchmark. The results demonstrate that BAdam is capable of narrowing the performance gap with Adam. Our code is -</p>本文研究了使用监督神经网络时间序列分类(NN TSC)预测军事背景下群体自主体的关键属性和战术,以及展示了NN TSC在快速推断攻击群体情报方面的有效性。https://arxiv.org/abs/2403.19572<p> -使用神经网络对群体特性进行分类 +arXiv:2403.18870v1 Announce Type: cross Abstract: Sugarcane, a key crop for the world's sugar industry, is prone to several diseases that have a substantial negative influence on both its yield and quality. To effectively manage and implement preventative initiatives, diseases must be detected promptly and accurately. In this study, we present a unique model called sugarcaneNet2024 that outperforms previous methods for automatically and quickly detecting sugarcane disease through leaf image processing. Our proposed model consolidates an optimized weighted average ensemble of seven customized and LASSO-regularized pre-trained models, particularly InceptionV3, InceptionResNetV2, DenseNet201, DenseNet169, Xception, and ResNet152V2. Initially, we added three more dense layers with 0.0001 LASSO regularization, three 30% dropout layers, and three batch normalizations with renorm enabled at the bottom of these pre-trained models to improve the performance. 
The accuracy of sugarcane leaf dise +</p>本研究通过比较深度学习模型在脑卒中分割上的表现,探讨了是否需要高级别设计来获得最佳结果。https://arxiv.org/abs/2403.17177<p> +使用深度学习模型进行脑卒中分割:一项比较研究 </p> <p> -Swarm Characteristics Classification Using Neural Networks +Brain Stroke Segmentation Using Deep Learning Models: A Comparative Study </p> <p> -https://arxiv.org/abs/2403.19572 +https://arxiv.org/abs/2403.17177 </p> <p> -本文研究了使用监督神经网络时间序列分类(NN TSC)预测军事背景下群体自主体的关键属性和战术,以及展示了NN TSC在快速推断攻击群体情报方面的有效性。 +本研究通过比较深度学习模型在脑卒中分割上的表现,探讨了是否需要高级别设计来获得最佳结果。 </p> <p> </p> <p> -理解群体自主体的特性对于国防和安全应用至关重要。本文介绍了使用监督神经网络时间序列分类(NN TSC)来预测军事环境中群体自主体的关键属性和战术的研究。具体地,NN TSC被应用于推断两个二进制属性 - 通信和比例导航 - 这两者结合定义了四种互斥的群体战术。我们发现文献中对于使用神经网络进行群体分类存在一定的空白,并展示了NN TSC在快速推断有关攻击群体情报以指导反制动作方面的有效性。通过模拟的群体对战,我们评估了NN TSC在观察窗口要求、噪声鲁棒性和对群体规模的可扩展性方面的性能。关键发现显示NN能够使用较短的观察窗口以97%的准确率预测群体行为。 +脑卒中分割在脑卒中患者的诊断和治疗中发挥着关键作用,通过提供受影响脑区域的空间信息和受损程度。准确分割脑卒中病变是一项具有挑战性的任务,因为传统的手工技术耗时且容易出错。最近,先进的深度模型已被引入用于一般医学图像分割,展示出在特定数据集上评估时超越许多最先进网络的有前景结果。随着视觉Transformer的出现,已经基于它们引入了几种模型,而其他一些则旨在设计基于传统卷积层来提取像Transformer这样的长程依赖的更好模块。是否对所有分割案例都需要这样高级别的设计来实现最佳结果的问题尚未得到解答。在这项研究中,我们选择了四种类型的深度学习模型 </p> <p> -arXiv:2403.19572v1 Announce Type: new Abstract: Understanding the characteristics of swarming autonomous agents is critical for defense and security applications. This article presents a study on using supervised neural network time series classification (NN TSC) to predict key attributes and tactics of swarming autonomous agents for military contexts. Specifically, NN TSC is applied to infer two binary attributes - communication and proportional navigation - which combine to define four mutually exclusive swarm tactics. We identify a gap in literature on using NNs for swarm classification and demonstrate the effectiveness of NN TSC in rapidly deducing intelligence about attacking swarms to inform counter-maneuvers. Through simulated swarm-vs-swarm engagements, we evaluate NN TSC performance in terms of observation window requirements, noise robustness, and scalability to swarm size. 
Key findings show NNs can predict swarm behaviors with 97% accuracy using short observation windows of -</p>深度学习是解决复杂问题的强大工具,本研究旨在全面审视深度学习模型及其应用的最新发展https://arxiv.org/abs/2403.17561<p> -深度学习及其最新应用综述 +arXiv:2403.17177v1 Announce Type: cross Abstract: Stroke segmentation plays a crucial role in the diagnosis and treatment of stroke patients by providing spatial information about affected brain regions and the extent of damage. Segmenting stroke lesions accurately is a challenging task, given that conventional manual techniques are time consuming and prone to errors. Recently, advanced deep models have been introduced for general medical image segmentation, demonstrating promising results that surpass many state of the art networks when evaluated on specific datasets. With the advent of the vision Transformers, several models have been introduced based on them, while others have aimed to design better modules based on traditional convolutional layers to extract long-range dependencies like Transformers. The question of whether such high-level designs are necessary for all segmentation cases to achieve the best results remains unanswered. 
In this study, we selected four types of deep +</p>提出了一种针对高维回归的自适应迁移学习方法,可以自适应地检测并聚合特征级和样本级的可迁移结构。https://arxiv.org/abs/2403.13565<p> +AdaTrans:针对高维回归的特征自适应与样本自适应迁移学习 </p> <p> -A Survey on Deep Learning and State-of-the-arts Applications +AdaTrans: Feature-wise and Sample-wise Adaptive Transfer Learning for High-dimensional Regression </p> <p> -https://arxiv.org/abs/2403.17561 +https://arxiv.org/abs/2403.13565 </p> <p> -深度学习是解决复杂问题的强大工具,本研究旨在全面审视深度学习模型及其应用的最新发展 +提出了一种针对高维回归的自适应迁移学习方法,可以自适应地检测并聚合特征级和样本级的可迁移结构。 </p> <p> </p> <p> -深度学习, 是人工智能的一个分支,是一种利用多层互连单元(神经元)从原始输入数据中直接学习复杂模式和表示的计算模型。受到这种学习能力的赋能,深度学习已成为解决复杂问题的强大工具,是许多突破性技术和创新的核心驱动力。构建深度学习模型是一项具有挑战性的任务,因为算法的复杂性和现实问题的动态性。有几项研究回顾了深度学习的概念和应用。然而,这些研究大多集中于深度学习模型类型和卷积神经网络架构,对深度学习模型及其在不同领域解决复杂问题的最新发展的覆盖面有限。因此,受到这些限制的启发,本研究旨在全面审视th +我们考虑高维背景下的迁移学习问题,在该问题中,特征维度大于样本大小。为了学习可迁移的信息,该信息可能在特征或源样本之间变化,我们提出一种自适应迁移学习方法,可以检测和聚合特征级(F-AdaTrans)或样本级(S-AdaTrans)的可迁移结构。我们通过采用一种新颖的融合惩罚,并配以可根据可迁移结构自适应调整的权重来实现这一点。为了选择权重,我们提出了一个有理论依据的数据驱动过程,使得 F-AdaTrans 能够选择性地将可迁移的信号与目标融合在一起,同时滤除非可迁移的信号,S-AdaTrans则可以获得每个源样本传递的信息的最佳组合。我们建立了非渐近速率,可以在特殊情况下恢复现有的近极小极大最优速率。效果证明... </p> <p> -arXiv:2403.17561v1 Announce Type: new Abstract: Deep learning, a branch of artificial intelligence, is a computational model that uses multiple layers of interconnected units (neurons) to learn intricate patterns and representations directly from raw input data. Empowered by this learning capability, it has become a powerful tool for solving complex problems and is the core driver of many groundbreaking technologies and innovations. Building a deep learning model is a challenging task due to the algorithm`s complexity and the dynamic nature of real-world problems. Several studies have reviewed deep learning concepts and applications.
However, the studies mostly focused on the types of deep learning models and convolutional neural network architectures, offering limited coverage of the state-of-the-art of deep learning models and their applications in solving complex problems across different domains. Therefore, motivated by the limitations, this study aims to comprehensively review th -</p>ThermoHands提出了一个新的基准ThermoHands,旨在解决热图中主观视角3D手部姿势估计的挑战,介绍了一个具有双transformer模块的定制基线方法TheFormer,表明热成像在恶劣条件下实现稳健的3D手部姿势估计的有效性。https://arxiv.org/abs/2403.09871<p> -ThermoHands:一种用于从主观视角热图中估计3D手部姿势的基准 +arXiv:2403.13565v1 Announce Type: cross Abstract: We consider the transfer learning problem in the high dimensional setting, where the feature dimension is larger than the sample size. To learn transferable information, which may vary across features or the source samples, we propose an adaptive transfer learning method that can detect and aggregate the feature-wise (F-AdaTrans) or sample-wise (S-AdaTrans) transferable structures. We achieve this by employing a novel fused-penalty, coupled with weights that can adapt according to the transferable structure. To choose the weight, we propose a theoretically informed, data-driven procedure, enabling F-AdaTrans to selectively fuse the transferable signals with the target while filtering out non-transferable signals, and S-AdaTrans to obtain the optimal combination of information transferred from each source sample. The non-asymptotic rates are established, which recover existing near-minimax optimal rates in special cases. 
The effectivene +</p>设计定价和匹配算法以最大化平台利润,在未知需求和供应函数下,保持顾客和服务器队列长度低于阈值https://arxiv.org/abs/2403.11093<p> +基于学习的双边队列定价和匹配 </p> <p> -ThermoHands: A Benchmark for 3D Hand Pose Estimation from Egocentric Thermal Image +Learning-Based Pricing and Matching for Two-Sided Queues </p> <p> -https://arxiv.org/abs/2403.09871 +https://arxiv.org/abs/2403.11093 </p> <p> -ThermoHands提出了一个新的基准ThermoHands,旨在解决热图中主观视角3D手部姿势估计的挑战,介绍了一个具有双transformer模块的定制基线方法TheFormer,表明热成像在恶劣条件下实现稳健的3D手部姿势估计的有效性。 +设计定价和匹配算法以最大化平台利润,在未知需求和供应函数下,保持顾客和服务器队列长度低于阈值 </p> <p> </p> <p> -在这项工作中,我们提出了ThermoHands,这是一个针对基于热图的主观视角3D手部姿势估计的新基准,旨在克服诸如光照变化和遮挡(例如手部穿戴物)等挑战。该基准包括来自28名主体进行手-物体和手-虚拟交互的多样数据集,经过自动化过程准确标注了3D手部姿势。我们引入了一个定制的基线方法TheFormer,利用双transformer模块在热图中实现有效的主观视角3D手部姿势估计。我们的实验结果突显了TheFormer的领先性能,并确认了热成像在实现恶劣条件下稳健的3D手部姿势估计方面的有效性。 +我们考虑一个具有多种类型顾客和服务器的动态系统。每种等待的顾客或服务器加入一个单独的队列,形成一个具有顾客队列和服务器队列的二部图。平台可以匹配服务器和顾客,如果它们的类型是兼容的。匹配的对将离开系统。平台将根据顾客的类型收取一个价格,当它们到达时,并根据其类型向服务器支付一个价格。每个队列的到达率取决于某些未知的需求或供应函数按价格确定。我们的目标是设计定价和匹配算法,以最大化平台在未知需求和供应函数下的利润,同时保持顾客和服务器的队列长度低于预定阈值。这个系统可以用来建模像乘车共享市场这样的双边市场,有乘客和司机。挑战在于 </p> <p> -arXiv:2403.09871v1 Announce Type: cross Abstract: In this work, we present ThermoHands, a new benchmark for thermal image-based egocentric 3D hand pose estimation, aimed at overcoming challenges like varying lighting and obstructions (e.g., handwear). The benchmark includes a diverse dataset from 28 subjects performing hand-object and hand-virtual interactions, accurately annotated with 3D hand poses through an automated process. We introduce a bespoken baseline method, TheFormer, utilizing dual transformer modules for effective egocentric 3D hand pose estimation in thermal imagery. Our experimental results highlight TheFormer's leading performance and affirm thermal imaging's effectiveness in enabling robust 3D hand pose estimation in adverse conditions. 
-</p>DEEP-IoT通过“更多监听,更少传输”的策略,挑战和转变了传统的物联网通信模型,大幅降低能耗并提高设备寿命。https://arxiv.org/abs/2403.00321<p> -DEEP-IoT: 下行增强型高效能物联网 +arXiv:2403.11093v1 Announce Type: cross Abstract: We consider a dynamic system with multiple types of customers and servers. Each type of waiting customer or server joins a separate queue, forming a bipartite graph with customer-side queues and server-side queues. The platform can match the servers and customers if their types are compatible. The matched pairs then leave the system. The platform will charge a customer a price according to their type when they arrive and will pay a server a price according to their type. The arrival rate of each queue is determined by the price according to some unknown demand or supply functions. Our goal is to design pricing and matching algorithms to maximize the profit of the platform with unknown demand and supply functions, while keeping queue lengths of both customers and servers below a predetermined threshold. This system can be used to model two-sided markets such as ride-sharing markets with passengers and drivers. 
The difficulties of the pr +</p>可解释的机器学习在生存分析中的应用促进了透明度和公平性,揭示了模型的潜在偏见和限制,并提供了更符合数学原理的特征影响和风险因素预测方法。https://arxiv.org/abs/2403.10250<p> +可解释的机器学习用于生存分析 </p> <p> -DEEP-IoT: Downlink-Enhanced Efficient-Power Internet of Things +Interpretable Machine Learning for Survival Analysis </p> <p> -https://arxiv.org/abs/2403.00321 +https://arxiv.org/abs/2403.10250 </p> <p> -DEEP-IoT通过“更多监听,更少传输”的策略,挑战和转变了传统的物联网通信模型,大幅降低能耗并提高设备寿命。 +可解释的机器学习在生存分析中的应用促进了透明度和公平性,揭示了模型的潜在偏见和限制,并提供了更符合数学原理的特征影响和风险因素预测方法。 </p> <p> </p> <p> -本文介绍了DEEP-IoT,这是一种具有革命意义的通信范例,旨在重新定义物联网设备之间的通信方式。通过开创性的“更多监听,更少传输”的策略,DEEP-IoT挑战和转变了传统的发送方(物联网设备)为中心的通信模型,将接收方(接入点)作为关键角色,从而降低能耗并延长设备寿命。我们不仅概念化了DEEP-IoT,还通过在窄带系统中集成深度学习增强的反馈信道编码来实现它。模拟结果显示,IoT单元的运行寿命显著提高,比使用Turbo和Polar编码的传统系统提高了最多52.71%。这一进展标志着一种变革。 +随着黑盒机器学习模型的传播和快速进步,可解释的机器学习(IML)领域或可解释的人工智能(XAI)在过去十年中变得越来越重要。 这在生存分析领域尤为重要,其中采用IML技术促进了透明度、问责制和公平性,特别是在临床决策过程、有针对性疗法的开发、干预或其他医学或与医疗保健相关的环境中。 具体来说,可解释性可以揭示生存模型的潜在偏见和局限性,并提供更符合数学原理的方法来理解哪些特征对预测有影响或构成风险因素。 然而,缺乏即时可用的IML方法可能已经阻碍了医学从业者和公共卫生政策制定者充分利用机器学习的潜力。 </p> <p> -arXiv:2403.00321v1 Announce Type: cross Abstract: At the heart of the Internet of Things (IoT) -- a domain witnessing explosive growth -- the imperative for energy efficiency and the extension of device lifespans has never been more pressing. This paper presents DEEP-IoT, a revolutionary communication paradigm poised to redefine how IoT devices communicate. Through a pioneering "listen more, transmit less" strategy, DEEP-IoT challenges and transforms the traditional transmitter (IoT devices)-centric communication model to one where the receiver (the access point) play a pivotal role, thereby cutting down energy use and boosting device longevity. We not only conceptualize DEEP-IoT but also actualize it by integrating deep learning-enhanced feedback channel codes within a narrow-band system. Simulation results show a significant enhancement in the operational lifespan of IoT cells -- surpassing traditional systems using Turbo and Polar codes by up to 52.71%. 
This leap signifies a paradi -</p>提出了一种协方差自适应的最小二乘算法,利用在线估计协方差结构,相对于基于代理方差的算法获得改进的遗憾上界,特别在协方差系数全为非负时,能有效地利用半臂反馈,并在各种参数设置下表现优异。https://arxiv.org/abs/2402.15171<p> -用于随机组合半臂老虎机的协方差自适应最小二乘算法 +arXiv:2403.10250v1 Announce Type: cross Abstract: With the spread and rapid advancement of black box machine learning models, the field of interpretable machine learning (IML) or explainable artificial intelligence (XAI) has become increasingly important over the last decade. This is particularly relevant for survival analysis, where the adoption of IML techniques promotes transparency, accountability and fairness in sensitive areas, such as clinical decision making processes, the development of targeted therapies, interventions or in other medical or healthcare related contexts. More specifically, explainability can uncover a survival model's potential biases and limitations and provide more mathematically sound ways to understand how and which features are influential for prediction or constitute risk factors. 
However, the lack of readily available IML methods may have deterred medical practitioners and policy makers in public health from leveraging the full potential of machine lea +</p>本文引入了一种名为SHERD的新技术,通过监控图神经网络(GNNs)早期训练表示中的信息,利用标准距离度量检测易受攻击节点,从而在图输入中实现性能和对抗鲁棒性。https://arxiv.org/abs/2403.09901<p> +通过监控早期训练表示来实现鲁棒的子图学习 </p> <p> -Covariance-Adaptive Least-Squares Algorithm for Stochastic Combinatorial Semi-Bandits +Robust Subgraph Learning by Monitoring Early Training Representations </p> <p> -https://arxiv.org/abs/2402.15171 +https://arxiv.org/abs/2403.09901 </p> <p> -提出了一种协方差自适应的最小二乘算法,利用在线估计协方差结构,相对于基于代理方差的算法获得改进的遗憾上界,特别在协方差系数全为非负时,能有效地利用半臂反馈,并在各种参数设置下表现优异。 +本文引入了一种名为SHERD的新技术,通过监控图神经网络(GNNs)早期训练表示中的信息,利用标准距离度量检测易受攻击节点,从而在图输入中实现性能和对抗鲁棒性。 </p> <p> </p> <p> -我们解决了随机组合半臂老虎机问题,其中玩家可以从包含d个基本项的P个子集中进行选择。大多数现有算法(如CUCB、ESCB、OLS-UCB)需要对奖励分布有先验知识,比如子高斯代理-方差的上界,这很难准确估计。在这项工作中,我们设计了OLS-UCB的方差自适应版本,依赖于协方差结构的在线估计。在实际设置中,估计协方差矩阵的系数要容易得多,并且相对于基于代理方差的算法,导致改进的遗憾上界。当协方差系数全为非负时,我们展示了我们的方法有效地利用了半臂反馈,并且可以明显优于老虎机反馈方法,在指数级别P≫d以及P≤d的情况下,这一点并不来自大多数现有分析。 +图神经网络(GNNs)因在图学习和节点分类任务中表现出色而引起了广泛关注。然而,它们对对抗性攻击的脆弱性,特别是通过易受攻击的节点,给决策制定带来了挑战。由于攻击会在整个图中传播,对抗性场景凸显了对鲁棒图摘要的需求。在本文中,我们通过引入新颖的技术SHERD (通过早期训练表示距离进行子图学习)来解决图输入中的性能和对抗鲁棒性。SHERD利用部分训练的图卷积网络(GCN)的层信息,通过标准距离度量来检测对抗攻击期间易受攻击的节点。该方法识别出"易受攻击的(坏)"节点并移除这些节点,形成一个鲁棒的子图,同时保持节点分类性能。 </p> <p> -arXiv:2402.15171v1 Announce Type: new Abstract: We address the problem of stochastic combinatorial semi-bandits, where a player can select from P subsets of a set containing d base items. Most existing algorithms (e.g. CUCB, ESCB, OLS-UCB) require prior knowledge on the reward distribution, like an upper bound on a sub-Gaussian proxy-variance, which is hard to estimate tightly. In this work, we design a variance-adaptive version of OLS-UCB, relying on an online estimation of the covariance structure.
Estimating the coefficients of a covariance matrix is much more manageable in practical settings and results in improved regret upper bounds compared to proxy variance-based algorithms. When covariance coefficients are all non-negative, we show that our approach efficiently leverages the semi-bandit feedback and provably outperforms bandit feedback approaches, not only in exponential regimes where P $\gg$ d but also when P $\le$ d, which is not straightforward from most existing analyses. -</p>CLCE方法结合了标签感知对比学习与交叉熵损失,通过协同利用难例挖掘提高了性能表现https://arxiv.org/abs/2402.14551<p> -CLCE:一种优化学习融合的改进交叉熵和对比学习方法 +arXiv:2403.09901v1 Announce Type: new Abstract: Graph neural networks (GNNs) have attracted significant attention for their outstanding performance in graph learning and node classification tasks. However, their vulnerability to adversarial attacks, particularly through susceptible nodes, poses a challenge in decision-making. The need for robust graph summarization is evident in adversarial challenges resulting from the propagation of attacks throughout the entire graph. In this paper, we address both performance and adversarial robustness in graph input by introducing the novel technique SHERD (Subgraph Learning Hale through Early Training Representation Distances). SHERD leverages information from layers of a partially trained graph convolutional network (GCN) to detect susceptible nodes during adversarial attacks using standard distance metrics. 
The method identifies "vulnerable (bad)" nodes and removes such nodes to form a robust subgraph while maintaining node classification perf +</p>提出新指标对抗超体积来全面评估深度学习模型在多种扰动强度下的鲁棒性,并采用新型训练算法来提高对抗鲁棒性。https://arxiv.org/abs/2403.05100<p> +探索对抗界限:通过对抗超体积量化鲁棒性 </p> <p> -CLCE: An Approach to Refining Cross-Entropy and Contrastive Learning for Optimized Learning Fusion +Exploring the Adversarial Frontier: Quantifying Robustness via Adversarial Hypervolume </p> <p> -https://arxiv.org/abs/2402.14551 +https://arxiv.org/abs/2403.05100 </p> <p> -CLCE方法结合了标签感知对比学习与交叉熵损失,通过协同利用难例挖掘提高了性能表现 +提出新指标对抗超体积来全面评估深度学习模型在多种扰动强度下的鲁棒性,并采用新型训练算法来提高对抗鲁棒性。 </p> <p> </p> <p> -最先进的预训练图像模型主要采用两阶段方法:在大规模数据集上进行初始无监督预训练,然后使用交叉熵损失(CE)进行特定任务的微调。然而,已经证明CE可能会损害模型的泛化性和稳定性。为了解决这些问题,我们引入了一种名为CLCE的新方法,该方法将标签感知对比学习与CE相结合。我们的方法不仅保持了两种损失函数的优势,而且以协同方式利用难例挖掘来增强性能。 +在深度学习模型面临日益严重的对抗攻击威胁,特别是在安全关键领域,强调了对鲁棒深度学习系统的需求。传统的鲁棒性评估依赖于对抗准确性,该指标衡量模型在特定扰动强度下的性能。然而,这一单一指标并不能完全概括模型对不同程度扰动的整体韧性。为了填补这一空白,我们提出了一种新的指标,称为对抗超体积,从多目标优化的角度综合评估了深度学习模型在一系列扰动强度下的鲁棒性。该指标允许深入比较防御机制,并承认了较弱的防御策略所带来的鲁棒性改进。此外,我们采用了一种提高对抗鲁棒性均匀性的新型训练算法。 </p> <p> -arXiv:2402.14551v1 Announce Type: cross Abstract: State-of-the-art pre-trained image models predominantly adopt a two-stage approach: initial unsupervised pre-training on large-scale datasets followed by task-specific fine-tuning using Cross-Entropy loss~(CE). However, it has been demonstrated that CE can compromise model generalization and stability. While recent works employing contrastive learning address some of these limitations by enhancing the quality of embeddings and producing better decision boundaries, they often overlook the importance of hard negative mining and rely on resource intensive and slow training using large sample batches. To counter these issues, we introduce a novel approach named CLCE, which integrates Label-Aware Contrastive Learning with CE. 
Our approach not only maintains the strengths of both loss functions but also leverages hard negative mining in a synergistic way to enhance performance. Experimental results demonstrate that CLCE significantly outperf -</p>ConSmax是一种硬件友好型Softmax替代方案,通过引入可学习参数,在不影响性能的情况下实现了对原Softmax关键任务的高效处理。https://arxiv.org/abs/2402.10930<p> -ConSmax: 具有可学习参数的硬件友好型Softmax替代方案 +arXiv:2403.05100v1 Announce Type: cross Abstract: The escalating threat of adversarial attacks on deep learning models, particularly in security-critical fields, has underscored the need for robust deep learning systems. Conventional robustness evaluations have relied on adversarial accuracy, which measures a model's performance under a specific perturbation intensity. However, this singular metric does not fully encapsulate the overall resilience of a model against varying degrees of perturbation. To address this gap, we propose a new metric termed adversarial hypervolume, assessing the robustness of deep learning models comprehensively over a range of perturbation intensities from a multi-objective optimization standpoint. This metric allows for an in-depth comparison of defense mechanisms and recognizes the trivial improvements in robustness afforded by less potent defensive strategies. 
Additionally, we adopt a novel training algorithm that enhances adversarial robustness uniformly +</p>本文提出了一种基于差分进化范式的新颖遗传模拟策略,用于解决半监督聚类问题,是第一次在这个领域尝试定义这样的方法。https://arxiv.org/abs/2403.04322<p> +基于遗传模拟的差分进化方法用于半监督聚类 </p> <p> -ConSmax: Hardware-Friendly Alternative Softmax with Learnable Parameters +Memetic Differential Evolution Methods for Semi-Supervised Clustering </p> <p> -https://arxiv.org/abs/2402.10930 +https://arxiv.org/abs/2403.04322 </p> <p> -ConSmax是一种硬件友好型Softmax替代方案,通过引入可学习参数,在不影响性能的情况下实现了对原Softmax关键任务的高效处理。 +本文提出了一种基于差分进化范式的新颖遗传模拟策略,用于解决半监督聚类问题,是第一次在这个领域尝试定义这样的方法。 </p> <p> </p> <p> -自注意机制将基于transformer的大型语言模型(LLM)与卷积和循环神经网络区分开来。尽管性能有所提升,但由于自注意中广泛使用Softmax,在硅上实现实时LLM推断仍具挑战性。为了解决这一挑战,我们提出了Constant Softmax(ConSmax),这是一种高效的Softmax替代方案,采用可微的规范化参数来消除Softmax中的最大搜索和分母求和,实现了大规模并行化。 +在本文中,我们处理半监督最小平方和聚类(MSSC)问题,其中背景知识以实例级约束的形式给定。我们特别考虑“必连接”和“非连接”约束,每个约束指示两个数据集点是否应该关联到同一个或不同的簇中。这些约束的存在使得问题至少与其无监督版本一样困难:不再每个点都关联到其最近的簇中心,因此需要在关键操作(如分配步骤)中进行一些修改。在这种情况下,我们提出了一种基于差分进化范式的新颖遗传模拟策略,直接扩展了最近在无监督聚类文献中提出的最新框架。据我们所知,我们的贡献代表了第一次尝试定义一个旨在生成一个 </p> <p> -arXiv:2402.10930v1 Announce Type: cross Abstract: The self-attention mechanism sets transformer-based large language model (LLM) apart from the convolutional and recurrent neural networks. Despite the performance improvement, achieving real-time LLM inference on silicon is challenging due to the extensively used Softmax in self-attention. Apart from the non-linearity, the low arithmetic intensity greatly reduces the processing parallelism, which becomes the bottleneck especially when dealing with a longer context. To address this challenge, we propose Constant Softmax (ConSmax), a software-hardware co-design as an efficient Softmax alternative. ConSmax employs differentiable normalization parameters to remove the maximum searching and denominator summation in Softmax. It allows for massive parallelization while performing the critical tasks of Softmax. 
In addition, a scalable ConSmax hardware utilizing a bitwidth-split look-up table (LUT) can produce lossless non-linear operation and -</p>多视角符号回归(MvSR)是一种同时考虑多个数据集的符号回归方法,能够找到一个参数化解来准确拟合所有数据集,解决了传统方法无法处理不同实验设置的问题。https://arxiv.org/abs/2402.04298<p> -多视角符号回归 +arXiv:2403.04322v1 Announce Type: cross Abstract: In this paper, we deal with semi-supervised Minimum Sum-of-Squares Clustering (MSSC) problems where background knowledge is given in the form of instance-level constraints. In particular, we take into account "must-link" and "cannot-link" constraints, each of which indicates if two dataset points should be associated to the same or to a different cluster. The presence of such constraints makes the problem at least as hard as its unsupervised version: it is no more true that each point is associated to its nearest cluster center, thus requiring some modifications in crucial operations, such as the assignment step. In this scenario, we propose a novel memetic strategy based on the Differential Evolution paradigm, directly extending a state-of-the-art framework recently proposed in the unsupervised clustering literature. 
As far as we know, our contribution represents the first attempt to define a memetic methodology designed to generate a +</p>ARNN提出了一种注意力循环神经网络,用于处理多通道脑电图信号,具有线性复杂度和并行计算,结合注意力和LSTM gate的优势,并避免了它们的缺点。https://arxiv.org/abs/2403.03276<p> +ARNN: 用于识别癫痫发作的多通道脑电图信号的注意力循环神经网络 </p> <p> -Multi-View Symbolic Regression +ARNN: Attentive Recurrent Neural Network for Multi-channel EEG Signals to Identify Epileptic Seizures </p> <p> -https://arxiv.org/abs/2402.04298 +https://arxiv.org/abs/2403.03276 </p> <p> -多视角符号回归(MvSR)是一种同时考虑多个数据集的符号回归方法,能够找到一个参数化解来准确拟合所有数据集,解决了传统方法无法处理不同实验设置的问题。 +ARNN提出了一种注意力循环神经网络,用于处理多通道脑电图信号,具有线性复杂度和并行计算,结合注意力和LSTM gate的优势,并避免了它们的缺点。 </p> <p> </p> <p> -符号回归(SR)搜索表示解释变量和响应变量之间关系的分析表达式。目前的SR方法假设从单个实验中提取的单个数据集。然而,研究人员经常面临来自不同设置的多个实验结果集。传统的SR方法可能无法找到潜在的表达式,因为每个实验的参数可能不同。在这项工作中,我们提出了多视角符号回归(MvSR),它同时考虑多个数据集,模拟实验环境,并输出一个通用的参数化解。这种方法将评估的表达式适应每个独立数据集,并同时返回能够准确拟合所有数据集的参数函数族f(x; \theta)。我们使用从已知表达式生成的数据以及来自实际世界的数据来展示MvSR的有效性。 +我们提出了一种注意力循环神经网络(ARNN),其沿着序列循环应用注意力层,并且具有与序列长度相关的线性复杂度。该模型在多通道脑电图信号上运行,而不是单通道信号,并利用并行计算。在该模型中,注意力层是一种计算单元,可以有效地应用自注意力机制和交叉注意力机制来计算一组广泛数量的状态向量和输入信号的递归函数。我们的架构在某种程度上受到了注意力层和长短期记忆(LSTM)单元的启发,并使用长短风格门,但通过多个阶段将这种典型单元扩展到多通道脑电图信号的并行化。它继承了注意力层和LSTM门的优势,同时避免了它们各自的缺点。我们通过对异质实验进行了广泛的模型有效性评估。 </p> <p> -Symbolic regression (SR) searches for analytical expressions representing the relationship between a set of explanatory and response variables. Current SR methods assume a single dataset extracted from a single experiment. Nevertheless, frequently, the researcher is confronted with multiple sets of results obtained from experiments conducted with different setups. Traditional SR methods may fail to find the underlying expression since the parameters of each experiment can be different. In this work we present Multi-View Symbolic Regression (MvSR), which takes into account multiple datasets simultaneously, mimicking experimental environments, and outputs a general parametric solution. 
This approach fits the evaluated expression to each independent dataset and returns a parametric family of functions f(x; \theta) simultaneously capable of accurately fitting all datasets. We demonstrate the effectiveness of MvSR using data generated from known expressions, as well as real-world data from +arXiv:2403.03276v1 Announce Type: cross Abstract: We proposed an Attentive Recurrent Neural Network (ARNN), which recurrently applies attention layers along a sequence and has linear complexity with respect to the sequence length. The proposed model operates on multi-channel EEG signals rather than single channel signals and leverages parallel computation. In this cell, the attention layer is a computational unit that efficiently applies self-attention and cross-attention mechanisms to compute a recurrent function over a wide number of state vectors and input signals. Our architecture is inspired in part by the attention layer and long short-term memory (LSTM) cells, and it uses long-short style gates, but it scales this typical cell up by several orders to parallelize for multi-channel EEG signals. It inherits the advantages of attention layers and LSTM gate while avoiding their respective drawbacks. 
We evaluated the model effectiveness through extensive experiments with heterogeneou +</p>本文研究了具有Polyak动量的随机近端梯度方法,在非凸复合优化问题中实现了最佳收敛速度,无论批量大小如何。https://arxiv.org/abs/2403.02967<p> +具有Polyak动量的非凸随机复合优化 +</p> +<p> +Non-Convex Stochastic Composite Optimization with Polyak Momentum +</p> +<p> +https://arxiv.org/abs/2403.02967 +</p> +<p> +本文研究了具有Polyak动量的随机近端梯度方法,在非凸复合优化问题中实现了最佳收敛速度,无论批量大小如何。 +</p> +<p> + +</p> +<p> +随机近端梯度法是广泛使用的随机梯度下降(SGD)方法的一个强大泛化,在机器学习中已经被广泛应用。然而,众所周知,当随机噪声显著时(即仅使用小型或有界批量大小时),该方法在非凸环境中无法收敛。本文关注具有Polyak动量的随机近端梯度方法。我们证明了该方法对于非凸复合优化问题实现了最佳收敛速度,而批量大小大小无关。此外,我们对Polyak动量在复合优化环境中的方差减少效应进行了严格分析,并且我们证明了当近端步骤只能通过近似解来求解时,该方法也会收敛。最后,我们提供了数值实验来验证我们的理论结果。 +</p> +<p> +arXiv:2403.02967v1 Announce Type: cross Abstract: The stochastic proximal gradient method is a powerful generalization of the widely used stochastic gradient descent (SGD) method and has found numerous applications in Machine Learning. However, it is notoriously known that this method fails to converge in non-convex settings where the stochastic noise is significant (i.e. when only small or bounded batch sizes are used). In this paper, we focus on the stochastic proximal gradient method with Polyak momentum. We prove this method attains an optimal convergence rate for non-convex composite optimization problems, regardless of batch size. Additionally, we rigorously analyze the variance reduction effect of the Polyak momentum in the composite optimization setting and we show the method also converges when the proximal step can only be solved inexactly. Finally, we provide numerical experiments to validate our theoretical results. 
+</p>本文从范畴论的角度提供了一个简单而有效的解决方案,完全避免了复杂的多阶段训练流程。https://arxiv.org/abs/2403.02598<p> +具有多个协变量转移和不平衡的图像数据集聚合 +</p> +<p> +Pooling Image Datasets With Multiple Covariate Shift and Imbalance +</p> +<p> +https://arxiv.org/abs/2403.02598 +</p> +<p> +本文从范畴论的角度提供了一个简单而有效的解决方案,完全避免了复杂的多阶段训练流程。 +</p> +<p> + +</p> +<p> +许多学科中常见小样本大小,这需要跨多个机构汇总大致相似的数据集来研究图像与疾病结果之间的弱但相关关联。这些数据通常体现出协变量(即次要的非成像数据)的转移/不平衡。在标准统计分析中控制这些无用变量是常见的,但这些思想并不直接适用于参数过多的模型。因此,最近的工作表明,从不变表示学习中提供了一个有意义的起点,但目前的方法库仅限于一次考虑几个协变量的转移/不平衡。本文展示了如何从范畴论的角度看待这一问题,提供了一个简单而有效的解决方案,完全避免了原本需要复杂的多阶段训练流程。我们展示了该方法的效果。 +</p> +<p> +arXiv:2403.02598v1 Announce Type: new Abstract: Small sample sizes are common in many disciplines, which necessitates pooling roughly similar datasets across multiple institutions to study weak but relevant associations between images and disease outcomes. Such data often manifest shift/imbalance in covariates (i.e., secondary non-imaging data). Controlling for such nuisance variables is common within standard statistical analysis, but the ideas do not directly apply to overparameterized models. Consequently, recent work has shown how strategies from invariant representation learning provides a meaningful starting point, but the current repertoire of methods is limited to accounting for shifts/imbalances in just a couple of covariates at a time. In this paper, we show how viewing this problem from the perspective of Category theory provides a simple and effective solution that completely avoids elaborate multi-stage training pipelines that would otherwise be needed. 
We show the effect +</p>异质性对于回归任务中出现因果性的贡献解释了为何大型语言模型能够从关联性训练中揭示因果关联。https://arxiv.org/abs/2403.01420<p> +异质性对不变性和因果关系的隐性偏差 +</p> +<p> +The Implicit Bias of Heterogeneity towards Invariance and Causality +</p> +<p> +https://arxiv.org/abs/2403.01420 +</p> +<p> +异质性对于回归任务中出现因果性的贡献解释了为何大型语言模型能够从关联性训练中揭示因果关联。 +</p> +<p> + +</p> +<p> +从经验上观察到,使用来自互联网的大量语料库训练的大型语言模型(LLM),使用一种变体回归损失,可以在一定程度上揭示因果关联。这与传统智慧“关联不是因果”以及传统因果推断范式相反,传统因果推断范式认为先前的因果知识应谨慎地纳入到方法设计中。令人困惑的是,为何在追求关联的回归任务中能够从更高层次的理解中出现因果性。本文声称从面向关联的训练中出现因果性可以归因于源数据的异质性、训练算法的随机性和学习模型的超参数化的耦合效应。我们使用一个简单但有见地的模型来阐释这样的直觉,该模型使用回归损失学习不变性,一种准因果关系。 +</p> +<p> +arXiv:2403.01420v1 Announce Type: new Abstract: It is observed empirically that the large language models (LLM), trained with a variant of regression loss using numerous corpus from the Internet, can unveil causal associations to some extent. This is contrary to the traditional wisdom that ``association is not causation'' and the paradigm of traditional causal inference in which prior causal knowledge should be carefully incorporated into the design of methods. It is a mystery why causality, in a higher layer of understanding, can emerge from the regression task that pursues associations. In this paper, we claim the emergence of causality from association-oriented training can be attributed to the coupling effects from the heterogeneity of the source data, stochasticity of training algorithms, and over-parameterization of the learning models. We illustrate such an intuition using a simple but insightful model that learns invariance, a quasi-causality, using regression loss. 
To be spec +</p>通过代理进行特征对齐,以解决预先计算特征无法区分标记样本类别和避免通过代理模型选择样本时牺牲宝贵预训练信息的问题。https://arxiv.org/abs/2403.01101<p> +特征对齐:在预训练模型背景下通过代理思考高效主动学习 +</p> +<p> +Feature Alignment: Rethinking Efficient Active Learning via Proxy in the Context of Pre-trained Models +</p> +<p> +https://arxiv.org/abs/2403.01101 +</p> +<p> +通过代理进行特征对齐,以解决预先计算特征无法区分标记样本类别和避免通过代理模型选择样本时牺牲宝贵预训练信息的问题。 +</p> +<p> + +</p> +<p> +使用主动学习对预训练模型进行微调有望降低注释成本。然而,这种组合引入了显著的计算成本,尤其是随着预训练模型规模的增长。最近的研究提出了基于代理的主动学习,它预先计算特征以减少计算成本。然而,这种方法通常会在主动学习性能上造成重大损失,甚至可能超过计算成本节约。 +</p> +<p> +arXiv:2403.01101v1 Announce Type: cross Abstract: Fine-tuning the pre-trained model with active learning holds promise for reducing annotation costs. However, this combination introduces significant computational costs, particularly with the growing scale of pre-trained models. Recent research has proposed proxy-based active learning, which pre-computes features to reduce computational costs. Yet, this approach often incurs a significant loss in active learning performance, which may even outweigh the computational cost savings. In this paper, we argue the performance drop stems not only from pre-computed features' inability to distinguish between categories of labeled samples, resulting in the selection of redundant samples but also from the tendency to compromise valuable pre-trained information when fine-tuning with samples selected through the proxy model. 
To address this issue, we propose a novel method called aligned selection via proxy to update pre-computed features while sele +</p>RLHF在考虑部分观察性时可能导致策略欺骗性地夸大性能或过度辩护行为,我们提出了数学条件来解决这些问题,并警告不要盲目应用RLHF在部分可观测情况下。https://arxiv.org/abs/2402.17747<p> +当你的AI欺骗你:在奖励学习中人类评估者部分可观测性的挑战 +</p> +<p> +When Your AI Deceives You: Challenges with Partial Observability of Human Evaluators in Reward Learning +</p> +<p> +https://arxiv.org/abs/2402.17747 +</p> +<p> +RLHF在考虑部分观察性时可能导致策略欺骗性地夸大性能或过度辩护行为,我们提出了数学条件来解决这些问题,并警告不要盲目应用RLHF在部分可观测情况下。 +</p> +<p> + +</p> +<p> +强化学习从人类反馈(RLHF)的过去分析假设人类完全观察到环境。当人类反馈仅基于部分观察时会发生什么?我们对两种失败情况进行了正式定义:欺骗和过度辩护。通过将人类建模为对轨迹信念的Boltzmann-理性,我们证明了RLHF保证会导致策略欺骗性地夸大其性能、为了留下印象而过度辩护或者两者兼而有之的条件。为了帮助解决这些问题,我们数学地刻画了环境部分可观测性如何转化为(缺乏)学到的回报函数中的模糊性。在某些情况下,考虑环境部分可观测性使得在理论上可能恢复回报函数和最优策略,而在其他情况下,存在不可减少的模糊性。我们警告不要盲目应用RLHF在部分可观测情况下。 +</p> +<p> +arXiv:2402.17747v1 Announce Type: cross Abstract: Past analyses of reinforcement learning from human feedback (RLHF) assume that the human fully observes the environment. What happens when human feedback is based only on partial observations? We formally define two failure cases: deception and overjustification. Modeling the human as Boltzmann-rational w.r.t. a belief over trajectories, we prove conditions under which RLHF is guaranteed to result in policies that deceptively inflate their performance, overjustify their behavior to make an impression, or both. To help address these issues, we mathematically characterize how partial observability of the environment translates into (lack of) ambiguity in the learned return function. In some cases, accounting for partial observability makes it theoretically possible to recover the return function and thus the optimal policy, while in other cases, there is irreducible ambiguity. 
We caution against blindly applying RLHF in partially observa +</p>该研究通过分析大量期刊文章,总结了监督机器学习在微生物组学中的现有实践,探讨了实验设计方法的优缺点,并提出了如何避免常见实验设计缺陷的指导。https://arxiv.org/abs/2402.17621<p> +用于微生物组学的监督机器学习:弥合当前和最佳实践之间的差距 +</p> +<p> +Supervised machine learning for microbiomics: bridging the gap between current and best practices +</p> +<p> +https://arxiv.org/abs/2402.17621 +</p> +<p> +该研究通过分析大量期刊文章,总结了监督机器学习在微生物组学中的现有实践,探讨了实验设计方法的优缺点,并提出了如何避免常见实验设计缺陷的指导。 +</p> +<p> + +</p> +<p> +机器学习(ML)将加速临床微生物组学创新,如疾病诊断和预后。这将需要高质量、可重现、可解释的工作流程,其预测能力达到或超过监管机构对临床工具设定的高门槛。我们通过深入分析2021-2022年发表的100篇同行评议的期刊文章,捕捉了当前将监督ML应用于微生物组学数据的实践的一个快照。我们采用数据驱动方法,引导讨论各种实验设计方法的优点,包括关键考虑因素,如如何减轻小数据集大小的影响同时避免数据泄漏。我们进一步提供关于如何避免可能损害模型性能、可信度和可重复性的常见实验设计缺陷的指南。讨论附有一个互动在线教程。 +</p> +<p> +arXiv:2402.17621v1 Announce Type: cross Abstract: Machine learning (ML) is set to accelerate innovations in clinical microbiomics, such as in disease diagnostics and prognostics. This will require high-quality, reproducible, interpretable workflows whose predictive capabilities meet or exceed the high thresholds set for clinical tools by regulatory agencies. Here, we capture a snapshot of current practices in the application of supervised ML to microbiomics data, through an in-depth analysis of 100 peer-reviewed journal articles published in 2021-2022. We apply a data-driven approach to steer discussion of the merits of varied approaches to experimental design, including key considerations such as how to mitigate the effects of small dataset size while avoiding data leakage. We further provide guidance on how to avoid common experimental design pitfalls that can hurt model performance, trustworthiness, and reproducibility. 
Discussion is accompanied by an interactive online tutorial th +</p>本研究通过可解释的逆向工程在复杂模块化算术中观察了Transformer内部电路学习过程,并发现减法在Transformer上造成了强烈的不对称性,乘法需要余弦偏置分量,多项式叠加了基本算术模式,但在挑战性情况下并不清晰,Grokking甚至可以在具有基本对称和交替表达式的高次公式中轻松发生。https://arxiv.org/abs/2402.16726<p> +在复杂模块化算术中解释理解的Transformer +</p> +<p> +Interpreting Grokked Transformers in Complex Modular Arithmetic +</p> +<p> +https://arxiv.org/abs/2402.16726 +</p> +<p> +本研究通过可解释的逆向工程在复杂模块化算术中观察了Transformer内部电路学习过程,并发现减法在Transformer上造成了强烈的不对称性,乘法需要余弦偏置分量,多项式叠加了基本算术模式,但在挑战性情况下并不清晰,Grokking甚至可以在具有基本对称和交替表达式的高次公式中轻松发生。 +</p> +<p> + +</p> +<p> +Grokking一直是解开延迟泛化之谜的积极探索。在已解密模型中识别可解释的算法是理解其机制的暗示性线索。在这项工作中,除了最简单和广为研究的模块化加法外,我们通过可解释的逆向工程观察了通过Grokking在复杂模块化算术中学到的内部电路,突出显示了它们动力学上的重大差异:减法对Transformer产生强烈的不对称性;乘法在傅立叶域的所有频率上需要余弦偏置分量;多项式通常导致基本算术模式的叠加,但在挑战性情况下清晰的模式并不显现;即使在具有基本对称和交替表达式的高次公式中,Grokking也很容易发生。我们还引入了模块化算术的新颖进展度量;傅立叶频率 +</p> +<p> +arXiv:2402.16726v2 Announce Type: replace-cross Abstract: Grokking has been actively explored to reveal the mystery of delayed generalization. Identifying interpretable algorithms inside the grokked models is a suggestive hint to understanding its mechanism. In this work, beyond the simplest and well-studied modular addition, we observe the internal circuits learned through grokking in complex modular arithmetic via interpretable reverse engineering, which highlights the significant difference in their dynamics: subtraction poses a strong asymmetry on Transformer; multiplication requires cosine-biased components at all the frequencies in a Fourier domain; polynomials often result in the superposition of the patterns from elementary arithmetic, but clear patterns do not emerge in challenging cases; grokking can easily occur even in higher-degree formulas with basic symmetric and alternating expressions. 
We also introduce the novel progress measure for modular arithmetic; Fourier Freque +</p>本论文提出了一种新方法单词序列熵(WSE),用于在自由形式医学问答任务中量化答案的不确定性,相比其他基线方法表现更优秀。https://arxiv.org/abs/2402.14259<p> +单词序列熵:走向自由形式医学问答应用及其不确定性估计 +</p> +<p> +Word-Sequence Entropy: Towards Uncertainty Estimation in Free-Form Medical Question Answering Applications and Beyond +</p> +<p> +https://arxiv.org/abs/2402.14259 +</p> +<p> +本论文提出了一种新方法单词序列熵(WSE),用于在自由形式医学问答任务中量化答案的不确定性,相比其他基线方法表现更优秀。 +</p> +<p> + +</p> +<p> +不确定性估计在确保安全关键的人工智能系统与人类互动的可靠性中发挥关键作用,尤其在医疗领域尤为重要。然而,在自由形式的医学问答任务中,尚未建立一种通用方法来量化答案的不确定性,其中无关的词汇和语序含有有限的语义信息可能是不确定性的主要来源,这是由于生成不平等的存在。本文提出了单词序列熵(WSE),该方法根据语义相关性在单词和序列级别上校准不确定性比例,在不确定性量化时更加强调关键词和更相关的序列。我们在5个自由形式医学问答数据集上,利用7种“现成的”大语言模型(LLMs)将WSE与6种基线方法进行比较,并展示了WSE在性能上的优越性。 +</p> +<p> +arXiv:2402.14259v1 Announce Type: cross Abstract: Uncertainty estimation plays a pivotal role in ensuring the reliability of safety-critical human-AI interaction systems, particularly in the medical domain. However, a general method for quantifying the uncertainty of free-form answers has yet to be established in open-ended medical question-answering (QA) tasks, where irrelevant words and sequences with limited semantic information can be the primary source of uncertainty due to the presence of generative inequality. In this paper, we propose the Word-Sequence Entropy (WSE), which calibrates the uncertainty proportion at both the word and sequence levels according to the semantic relevance, with greater emphasis placed on keywords and more relevant sequences when performing uncertainty quantification. 
We compare WSE with 6 baseline methods on 5 free-form medical QA datasets, utilizing 7 "off-the-shelf" large language models (LLMs), and show that WSE exhibits superior performance on ac +</p>本研究提出了一种应用于数字孪生的离线多智能体强化学习方案,通过整合分布式强化学习和保守Q学习来解决环境的不确定性和有限数据带来的认识不确定性。https://arxiv.org/abs/2402.08421<p> +保守和风险意识的离线多智能体强化学习在数字孪生中的应用 +</p> +<p> +Conservative and Risk-Aware Offline Multi-Agent Reinforcement Learning for Digital Twins +</p> +<p> +https://arxiv.org/abs/2402.08421 +</p> +<p> +本研究提出了一种应用于数字孪生的离线多智能体强化学习方案,通过整合分布式强化学习和保守Q学习来解决环境的不确定性和有限数据带来的认识不确定性。 +</p> +<p> + +</p> +<p> +数字孪生(DT)平台被越来越认为是控制、优化和监控诸如下一代无线网络之类的复杂工程系统的有希望技术。采用DT解决方案面临的一个重要挑战是它们依赖于离线收集的数据,缺乏对物理环境的直接访问。这一限制在多智能体系统中尤为严重,因为传统的多智能体强化学习(MARL)需要与环境进行在线互动。将在线MARL方案直接应用于离线环境通常会因有限数据的认识不确定性而失败。在这项工作中,我们提出了一种用于基于DT的无线网络的离线MARL方案,它整合了分布式强化学习(distributional RL)和保守Q学习,以应对环境固有的案例性不确定性和有限数据引起的认识不确定性。为了进一步利用离线数据,我们改编了所提出的方案。 +</p> +<p> +Digital twin (DT) platforms are increasingly regarded as a promising technology for controlling, optimizing, and monitoring complex engineering systems such as next-generation wireless networks. An important challenge in adopting DT solutions is their reliance on data collected offline, lacking direct access to the physical environment. This limitation is particularly severe in multi-agent systems, for which conventional multi-agent reinforcement learning (MARL) requires online interactions with the environment. A direct application of online MARL schemes to an offline setting would generally fail due to the epistemic uncertainty entailed by the limited availability of data. In this work, we propose an offline MARL scheme for DT-based wireless networks that integrates distributional RL and conservative Q-learning to address the environment's inherent aleatoric uncertainty and the epistemic uncertainty arising from limited data. 
To further exploit the offline data, we adapt the proposed scheme t +</p>该论文介绍了LoCoV1,一个用于评估长上下文检索性能的新型基准测试,并提出了M2-BERT检索编码器,用于处理长上下文检索,解决了如何评估性能、预训练语言模型以及如何进行微调的挑战。https://arxiv.org/abs/2402.07440<p> +使用LoCo和M2-BERT进行基准测试和构建长上下文检索模型 +</p> +<p> +Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT +</p> +<p> +https://arxiv.org/abs/2402.07440 +</p> +<p> +该论文介绍了LoCoV1,一个用于评估长上下文检索性能的新型基准测试,并提出了M2-BERT检索编码器,用于处理长上下文检索,解决了如何评估性能、预训练语言模型以及如何进行微调的挑战。 +</p> +<p> + +</p> +<p> +检索管道是许多机器学习系统中的重要组成部分,在文档很长(例如10K个标记或更多)且需要在整个文本中合成信息来确定相关文档的领域中表现不佳。开发适用于这些领域的长上下文检索编码器面临三个挑战:(1)如何评估长上下文检索性能,(2)如何预训练基本语言模型以表示短上下文(对应查询)和长上下文(对应文档),以及(3)如何根据GPU内存限制下的批量大小限制对该模型进行微调。为了解决这些挑战,我们首先介绍了LoCoV1,这是一个新颖的12个任务基准测试,用于测量在不可分块或不有效的情况下的长上下文检索。接下来,我们提出了M2-BERT检索编码器,这是一个80M参数状态空间编码器模型,采用Monarch Mixer架构构建,能够进行可扩展的检索。 +</p> +<p> +Retrieval pipelines-an integral component of many machine learning systems-perform poorly in domains where documents are long (e.g., 10K tokens or more) and where identifying the relevant document requires synthesizing information across the entire text. Developing long-context retrieval encoders suitable for these domains raises three challenges: (1) how to evaluate long-context retrieval performance, (2) how to pretrain a base language model to represent both short contexts (corresponding to queries) and long contexts (corresponding to documents), and (3) how to fine-tune this model for retrieval under the batch size limitations imposed by GPU memory constraints. To address these challenges, we first introduce LoCoV1, a novel 12 task benchmark constructed to measure long-context retrieval where chunking is not possible or not effective. 
We next present the M2-BERT retrieval encoder, an 80M parameter state-space encoder model built from the Monarch Mixer architecture, capable of scali +</p>了解神经网络从输入-标签对中提取统计信息的机制是监督学习中最重要的未解决问题之一。前人的研究表明,在训练过程中,权重的格拉姆矩阵与模型的平均梯度外积成正比,这被称为神经特征分析(NFA)。本研究解释了这种相关性的出现,并发现NFA等价于权重矩阵的左奇异结构与与这些权重相关的经验神经切线核的显著成分之间的对齐。在早期训练阶段,可以通过解析的方式预测NFA的发展速度。https://arxiv.org/abs/2402.05271<p> +梯度下降引发了深度非线性网络权重与经验NTK之间的对齐 +</p> +<p> +Gradient descent induces alignment between weights and the empirical NTK for deep non-linear networks +</p> +<p> +https://arxiv.org/abs/2402.05271 +</p> +<p> +了解神经网络从输入-标签对中提取统计信息的机制是监督学习中最重要的未解决问题之一。前人的研究表明,在训练过程中,权重的格拉姆矩阵与模型的平均梯度外积成正比,这被称为神经特征分析(NFA)。本研究解释了这种相关性的出现,并发现NFA等价于权重矩阵的左奇异结构与与这些权重相关的经验神经切线核的显著成分之间的对齐。在早期训练阶段,可以通过解析的方式预测NFA的发展速度。 +</p> +<p> + +</p> +<p> +理解神经网络从输入-标签对中提取统计信息的机制是监督学习中最重要的未解决问题之一。先前的研究已经确定,在一般结构的训练神经网络中,权重的格拉姆矩阵与模型的平均梯度外积成正比,这个说法被称为神经特征分析(NFA)。然而,这些数量在训练过程中如何相关尚不清楚。在这项工作中,我们解释了这种相关性的出现。我们发现NFA等价于权重矩阵的左奇异结构与与这些权重相关的经验神经切线核的显著成分之间的对齐。我们证明了先前研究中引入的NFA是由隔离这种对齐的中心化NFA驱动的。我们还展示了在早期训练阶段,可以通过解析的方式预测NFA的发展速度。 +</p> +<p> +Understanding the mechanisms through which neural networks extract statistics from input-label pairs is one of the most important unsolved problems in supervised learning. Prior works have identified that the gram matrices of the weights in trained neural networks of general architectures are proportional to the average gradient outer product of the model, in a statement known as the Neural Feature Ansatz (NFA). However, the reason these quantities become correlated during training is poorly understood. In this work, we explain the emergence of this correlation. We identify that the NFA is equivalent to alignment between the left singular structure of the weight matrices and a significant component of the empirical neural tangent kernels associated with those weights. We establish that the NFA introduced in prior works is driven by a centered NFA that isolates this alignment. 
We show that the speed of NFA development can be predicted analytically at early training times in terms of sim +</p>这篇论文提出了一种基于潜变量高斯过程的多源数据融合框架,用于解决多个数据源之间质量和全面性差异给系统优化带来的问题。https://arxiv.org/abs/2402.04146<p> +可解释的多源数据融合通过潜变量高斯过程 +</p> +<p> +Interpretable Multi-Source Data Fusion Through Latent Variable Gaussian Process +</p> +<p> +https://arxiv.org/abs/2402.04146 +</p> +<p> +这篇论文提出了一种基于潜变量高斯过程的多源数据融合框架,用于解决多个数据源之间质量和全面性差异给系统优化带来的问题。 +</p> +<p> + +</p> +<p> +随着人工智能(AI)和机器学习(ML)的出现,各个科学和工程领域已经利用数据驱动的替代模型来建模来自大量信息源(数据)的复杂系统。这种增加导致了开发出用于执行特定功能的优越系统所需的成本和时间的显著降低。这样的替代模型往往广泛地融合多个数据来源,可能是发表的论文、专利、开放资源库或其他资源。然而,对于已知和未知的信息来源的基础物理参数的质量和全面性的差异,可能对系统优化过程产生后续影响,却没有得到充分的关注。为了解决这个问题,提出了一种基于潜变量高斯过程(LVGP)的多源数据融合框架。 +</p> +<p> +With the advent of artificial intelligence (AI) and machine learning (ML), various science and engineering communities have leveraged data-driven surrogates to model complex systems from numerous sources of information (data). This proliferation has led to a significant reduction in the cost and time involved in developing superior systems designed to perform specific functionalities. A high proportion of such surrogates are built by extensively fusing multiple sources of data, be it published papers, patents, open repositories, or other resources. However, not much attention has been paid to the differences in quality and comprehensiveness of the known and unknown underlying physical parameters of the information sources that could have downstream implications during system optimization. Towards resolving this issue, a multi-source data fusion framework based on Latent Variable Gaussian Process (LVGP) is proposed. 
The individual data sources are tagged as a characteristic cate +</p>本论文提出了一种基于LLM的框架,可以适应不同类型的锂离子电池,实现准确的健康状态估计。这项工作解决了生成训练数据的时间和资源成本高的挑战,并在实际应用中具有良好的泛化能力。https://arxiv.org/abs/2402.00068<p> +GPT4Battery: 一种基于LLM驱动的自适应锂离子电池健康状态估计框架 +</p> +<p> +GPT4Battery: An LLM-driven Framework for Adaptive State of Health Estimation of Raw Li-ion Batteries +</p> +<p> +https://arxiv.org/abs/2402.00068 +</p> +<p> +本论文提出了一种基于LLM的框架,可以适应不同类型的锂离子电池,实现准确的健康状态估计。这项工作解决了生成训练数据的时间和资源成本高的挑战,并在实际应用中具有良好的泛化能力。 +</p> +<p> + +</p> +<p> +健康状态(SOH)是评估电池退化水平的关键指标,无法直接测量但需要估计。准确的SOH估计提升了锂离子电池的检测、控制和反馈能力,实现安全高效的能源管理,并指导新一代电池的发展。尽管在数据驱动的SOH估计方面取得了显著进展,但为生成寿命长期训练数据而进行的耗时且资源密集的退化实验在建立一个能处理多样化锂离子电池(例如,跨化学、跨制造商和跨容量)的大型模型方面存在挑战。因此,本文利用大型语言模型(LLM)的强大泛化能力,提出了一种适用于不同电池的可调整SOH估计的新型框架。为了适应实际情景,其中未标记的数据按顺序以及分布变化的方式到达,所提出的模型在测试时进行了修改。 +</p> +<p> +State of health (SOH) is a crucial indicator for assessing the degradation level of batteries that cannot be measured directly but requires estimation. Accurate SOH estimation enhances detection, control, and feedback for Li-ion batteries, allowing for safe and efficient energy management and guiding the development of new-generation batteries. Despite the significant progress in data-driven SOH estimation, the time- and resource-consuming degradation experiments for generating lifelong training data pose a challenge in establishing one large model capable of handling diverse types of Li-ion batteries, e.g., cross-chemistry, cross-manufacturer, and cross-capacity. Hence, this paper utilizes the strong generalization capability of large language models (LLMs) to propose a novel framework for adaptable SOH estimation across diverse batteries. 
To match the real scenario where unlabeled data sequentially arrives in use with distribution shifts, the proposed model is modified by a test-time t +</p>本文提出了零射击提示方法,通过利用大规模语言模型中的错误信息来指导模型进行任务,以提高任务表现。实验结果表明,在不同数据集上,包括阅读理解、算术推理和闭卷问答,模型性能有所提升。这些结果也显示出不同模型之间存在不同程度的错误信息。https://arxiv.org/abs/2401.08273<p> +大规模语言模型是零射击学习器 +</p> +<p> +Large Language Models are Null-Shot Learners +</p> +<p> +https://arxiv.org/abs/2401.08273 +</p> +<p> +本文提出了零射击提示方法,通过利用大规模语言模型中的错误信息来指导模型进行任务,以提高任务表现。实验结果表明,在不同数据集上,包括阅读理解、算术推理和闭卷问答,模型性能有所提升。这些结果也显示出不同模型之间存在不同程度的错误信息。 +</p> +<p> + +</p> +<p> +本文提出了零射击提示方法。零射击提示利用大规模语言模型(LLMs)中的错误信息,通过指示LLMs利用从“示例”部分中获取的信息(该信息在所提供的上下文中不存在)来完成任务。虽然减少错误信息对于LLMs的日常和重要用途至关重要,但我们提出在目前的环境中,这些LLMs仍然具有错误信息,实际上可以利用错误信息来提高与标准零射击提示相比的任务表现。对八个LLMs进行实验,结果显示在大多数八个数据集(包括阅读理解、算术推理和闭卷问答)中,性能有所提升。观察到的不一致性增加相对性能在LLMs之间的差异,也可能表示每个模型中存在不同程度的错误信息。 +</p> +<p> +arXiv:2401.08273v2 Announce Type: replace-cross Abstract: This paper presents null-shot prompting. Null-shot prompting exploits hallucination in large language models (LLMs) by instructing LLMs to utilize information from the "Examples" section that never exists within the provided context to perform a task. While reducing hallucination is crucial and non-negligible for daily and critical uses of LLMs, we propose that in the current landscape in which these LLMs still hallucinate, it is possible, in fact, to exploit hallucination to increase performance in performing tasks compared to standard zero-shot prompting. Experiments with eight LLMs show improvements in performance across the majority of eight datasets, including reading comprehension, arithmetic reasoning, and closed-book question answering. The observed inconsistency in increased relative performance across the LLMs also potentially indicates a different degree of inherent hallucination in each model. 
These differences show +</p>SupplyGraph是一个基准数据集,用于使用图神经网络进行供应链规划。该数据集包含了来自孟加拉国一家领先快速消费品公司的实际数据,用于优化、预测和解决供应链问题。数据集中的时间数据作为节点特征,可用于销售预测、生产计划和故障识别。http://arxiv.org/abs/2401.15299<p> +SupplyGraph: 使用图神经网络进行供应链规划的基准数据集 +</p> +<p> +SupplyGraph: A Benchmark Dataset for Supply Chain Planning using Graph Neural Networks. (arXiv:2401.15299v1 [cs.LG]) +</p> +<p> +http://arxiv.org/abs/2401.15299 +</p> +<p> +SupplyGraph是一个基准数据集,用于使用图神经网络进行供应链规划。该数据集包含了来自孟加拉国一家领先快速消费品公司的实际数据,用于优化、预测和解决供应链问题。数据集中的时间数据作为节点特征,可用于销售预测、生产计划和故障识别。 +</p> +<p> + +</p> +<p> +图神经网络(GNNs)在不同领域如运输、生物信息学、语言处理和计算机视觉中取得了重要进展。然而,在将GNNs应用于供应链网络方面,目前尚缺乏研究。供应链网络在结构上类似于图形,使其成为应用GNN方法的理想选择。这为优化、预测和解决供应链问题开辟了无限可能。然而,此方法的一个主要障碍在于缺乏真实世界的基准数据集以促进使用GNN来研究和解决供应链问题。为了解决这个问题,我们提供了一个来自孟加拉国一家领先的快速消费品公司的实际基准数据集,该数据集侧重于用于生产目的的供应链规划的时间任务。该数据集包括时间数据作为节点特征,以实现销售预测、生产计划和故障识别。 +</p> +<p> +Graph Neural Networks (GNNs) have gained traction across different domains such as transportation, bio-informatics, language processing, and computer vision. However, there is a noticeable absence of research on applying GNNs to supply chain networks. Supply chain networks are inherently graph-like in structure, making them prime candidates for applying GNN methodologies. This opens up a world of possibilities for optimizing, predicting, and solving even the most complex supply chain problems. A major setback in this approach lies in the absence of real-world benchmark datasets to facilitate the research and resolution of supply chain problems using GNNs. To address the issue, we present a real-world benchmark dataset for temporal tasks, obtained from one of the leading FMCG companies in Bangladesh, focusing on supply chain planning for production purposes. 
The dataset includes temporal data as node features to enable sales predictions, production planning, and the identification of fa +</p>这项工作提出了一种名为LadaGAN的高效生成对抗网络,它使用了一种名为Ladaformer的新型Transformer块,通过线性加法注意机制来降低计算复杂度并解决训练不稳定性问题。http://arxiv.org/abs/2401.09596<p> +使用线性加法注意力Transformer的高效生成对抗网络 +</p> +<p> +Efficient generative adversarial networks using linear additive-attention Transformers. (arXiv:2401.09596v1 [cs.CV]) +</p> +<p> +http://arxiv.org/abs/2401.09596 +</p> +<p> +这项工作提出了一种名为LadaGAN的高效生成对抗网络,它使用了一种名为Ladaformer的新型Transformer块,通过线性加法注意机制来降低计算复杂度并解决训练不稳定性问题。 +</p> +<p> + +</p> +<p> +尽管像扩散模型(DMs)和生成对抗网络(GANs)等深度生成模型在图像生成方面的能力近年来得到了显著提高,但是它们的成功很大程度上归功于计算复杂的架构。这限制了它们在研究实验室和资源充足的公司中的采用和使用,同时也极大地增加了训练、微调和推理的碳足迹。在这项工作中,我们提出了LadaGAN,这是一个高效的生成对抗网络,它建立在一种名为Ladaformer的新型Transformer块上。该块的主要组成部分是一个线性加法注意机制,它每个头部计算一个注意向量,而不是二次的点积注意力。我们在生成器和判别器中都采用了Ladaformer,这降低了计算复杂度,并克服了Transformer GAN经常出现的训练不稳定性。LadaGAN一直表现优于现有的GANs。 +</p> +<p> +Although the capacity of deep generative models for image generation, such as Diffusion Models (DMs) and Generative Adversarial Networks (GANs), has dramatically improved in recent years, much of their success can be attributed to computationally expensive architectures. This has limited their adoption and use to research laboratories and companies with large resources, while significantly raising the carbon footprint for training, fine-tuning, and inference. In this work, we present LadaGAN, an efficient generative adversarial network that is built upon a novel Transformer block named Ladaformer. The main component of this block is a linear additive-attention mechanism that computes a single attention vector per head instead of the quadratic dot-product attention. We employ Ladaformer in both the generator and discriminator, which reduces the computational complexity and overcomes the training instabilities often associated with Transformer GANs. 
LadaGAN consistently outperforms exist +</p>本研究全面研究了大型语言模型的知识编辑,旨在有效修改模型的行为,同时保持整体性能。http://arxiv.org/abs/2401.01286<p> +大型语言模型的知识编辑全面研究 +</p> +<p> +A Comprehensive Study of Knowledge Editing for Large Language Models. (arXiv:2401.01286v1 [cs.CL]) +</p> +<p> +http://arxiv.org/abs/2401.01286 +</p> +<p> +本研究全面研究了大型语言模型的知识编辑,旨在有效修改模型的行为,同时保持整体性能。 +</p> +<p> + +</p> +<p> +大型语言模型(LLM)在理解和生成与人类交流紧密相似的文本方面展现出了非凡的能力。然而,其主要限制在于训练过程中的显著计算需求,这是由于其广泛的参数化造成的。这一挑战在于世界的动态性,需要频繁更新LLM以修正过时的信息或集成新知识,从而确保其持续的相关性。许多应用需要在训练后进行持续的模型调整,以解决缺陷或不良行为。近年来,对于LLM的知识编辑技术的兴趣越来越高,在特定领域内有效地修改LLM的行为,同时保持整体性能在各种输入中的表现。本文首先定义了知识编辑的目标和挑战,然后综述了现有的知识编辑方法和技术,并讨论了其应用和未来发展的方向。 +</p> +<p> +Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication. However, a primary limitation lies in the significant computational demands during training, arising from their extensive parameterization. This challenge is further intensified by the dynamic nature of the world, necessitating frequent updates to LLMs to correct outdated information or integrate new knowledge, thereby ensuring their continued relevance. Note that many applications demand continual model adjustments post-training to address deficiencies or undesirable behaviors. There is an increasing interest in efficient, lightweight methods for on-the-fly model modifications. To this end, recent years have seen a burgeoning in the techniques of knowledge editing for LLMs, which aim to efficiently modify LLMs' behaviors within specific domains while preserving overall performance across various inputs. In this paper, we first define the kno +</p>这项研究探讨了生成性人工智能中数据隐私和版权保护的多方面挑战,并提出了将技术创新与伦理前瞻相结合的综合方法,旨在全面解决这些问题。http://arxiv.org/abs/2311.18252<p> +跨越生成性人工智能数据生命周期的隐私和版权挑战导航 +</p> +<p> +Navigating Privacy and Copyright Challenges Across the Data Lifecycle of Generative AI. 
(arXiv:2311.18252v2 [cs.SE] UPDATED) +</p> +<p> +http://arxiv.org/abs/2311.18252 +</p> +<p> +这项研究探讨了生成性人工智能中数据隐私和版权保护的多方面挑战,并提出了将技术创新与伦理前瞻相结合的综合方法,旨在全面解决这些问题。 +</p> +<p> + +</p> +<p> +生成性人工智能的出现标志着人工智能领域的重要里程碑,展示出在生成真实图像、文本和数据模式方面的卓越能力。然而,这些进展也带来了对数据隐私和版权侵犯的更高关注,主要是由于模型训练对大规模数据集的依赖。传统方法如差分隐私、机器遗忘和数据中毒只提供了对这些复杂问题的片面解决方案。本文深入探讨了数据生命周期内隐私和版权保护的多方面挑战。我们主张采用将技术创新与伦理前瞻相结合的综合方法,通过研究和制定在生命周期视角下的解决方案,全面解决这些问题。本研究旨在推动更广泛的讨论,并激励对生成性人工智能中数据隐私和版权完整性的协同努力。 +</p> +<p> +The advent of Generative AI has marked a significant milestone in artificial intelligence, demonstrating remarkable capabilities in generating realistic images, texts, and data patterns. However, these advancements come with heightened concerns over data privacy and copyright infringement, primarily due to the reliance on vast datasets for model training. Traditional approaches like differential privacy, machine unlearning, and data poisoning only offer fragmented solutions to these complex issues. Our paper delves into the multifaceted challenges of privacy and copyright protection within the data lifecycle. We advocate for integrated approaches that combine technical innovation with ethical foresight, holistically addressing these concerns by investigating and devising solutions that are informed by the lifecycle perspective. This work aims to catalyze a broader discussion and inspire concerted efforts towards data privacy and copyright integrity in Generative AI. +</p>提出了一种使用自适应采样/主动学习策略来训练去噪网络的方法,解决了通用去噪网络在不同噪声分布下表现差的问题。http://arxiv.org/abs/2310.20064<p> +一种可扩展的训练策略用于盲目的多分布噪声去除 +</p> +<p> +A Scalable Training Strategy for Blind Multi-Distribution Noise Removal. 
(arXiv:2310.20064v1 [cs.CV]) +</p> +<p> +http://arxiv.org/abs/2310.20064 +</p> +<p> +提出了一种使用自适应采样/主动学习策略来训练去噪网络的方法,解决了通用去噪网络在不同噪声分布下表现差的问题。 +</p> +<p> + +</p> +<p> +尽管最近取得了一些进展,但是开发通用的去噪和去伪影网络仍然是一个尚未解决的问题:给定固定的网络权重,一个任务(例如去除泊松噪声)的专门化与另一个任务(例如去除斑点噪声)的性能之间存在天然的权衡。此外,由于维度的诅咒,训练这样的网络是具有挑战性的:随着规格空间的维度增加(即需要描述噪声分布所需的参数数量增加),需要训练的唯一规格数量呈指数增长。均匀采样这个空间会导致网络在非常具有挑战性的问题规格上表现良好,但在简单的问题规格上表现不佳,即使大误差也对总体均方误差的影响很小。本文提出了一种使用自适应采样/主动学习策略来训练去噪网络的方法。我们的工作改进了最近提出的一种方法。 +</p> +<p> +Despite recent advances, developing general-purpose universal denoising and artifact-removal networks remains largely an open problem: Given fixed network weights, one inherently trades off specialization at one task (e.g., removing Poisson noise) for performance at another (e.g., removing speckle noise). In addition, training such a network is challenging due to the curse of dimensionality: As one increases the dimensions of the specification-space (i.e., the number of parameters needed to describe the noise distribution) the number of unique specifications one needs to train for grows exponentially. Uniformly sampling this space will result in a network that does well at very challenging problem specifications but poorly at easy problem specifications, where even large errors will have a small effect on the overall mean squared error. In this work we propose training denoising networks using an adaptive-sampling/active-learning strategy. Our work improves upon a recently proposed un +</p>Clover是一种闭环可验证代码生成的范式,通过在代码、docstrings和形式注释之间进行一致性检查,确保生成的代码的正确性。http://arxiv.org/abs/2310.17807<p> +Clover: 闭环可验证代码生成 +</p> +<p> +Clover: Closed-Loop Verifiable Code Generation. 
(arXiv:2310.17807v1 [cs.SE]) +</p> +<p> +http://arxiv.org/abs/2310.17807 +</p> +<p> +Clover是一种闭环可验证代码生成的范式,通过在代码、docstrings和形式注释之间进行一致性检查,确保生成的代码的正确性。 +</p> +<p> + +</p> +<p> +在软件开发中,使用大型语言模型进行代码生成是一个快速增长的趋势。然而,如果没有有效的方法来确保生成的代码的正确性,这个趋势可能会导致许多不良结果。在本文中,我们提出了一个解决这个挑战的愿景:Clover范式,即闭环可验证代码生成,它将正确性检查简化为更可访问的一致性检查问题。在Clover的核心是一个检查器,它在代码、docstrings和形式注释之间进行一致性检查。该检查器使用了形式验证工具和大型语言模型的新颖集成实现。我们提供了理论分析来支持我们的论点,即Clover在一致性检查方面应该是有效的。我们还在一个由手工设计的数据集(CloverBench)上进行了实证调查,该数据集包含了注释的Dafny程序,难度水平与教科书相当。实验结果显示 +</p> +<p> +The use of large language models for code generation is a rapidly growing trend in software development. However, without effective methods for ensuring the correctness of generated code, this trend could lead to any number of undesirable outcomes. In this paper, we lay out a vision for addressing this challenge: the Clover paradigm, short for Closed-Loop Verifiable Code Generation, which reduces correctness checking to the more accessible problem of consistency checking. At the core of Clover lies a checker that performs consistency checks among code, docstrings, and formal annotations. The checker is implemented using a novel integration of formal verification tools and large language models. We provide a theoretical analysis to support our thesis that Clover should be effective at consistency checking. We also empirically investigate its feasibility on a hand-designed dataset (CloverBench) featuring annotated Dafny programs at a textbook level of difficulty. Experimental results sho +</p>Transformers学会了高阶优化方法,用于上下文学习,通过实现类似于迭代牛顿法的算法,而不是梯度下降。http://arxiv.org/abs/2310.17086<p> +Transformers学会了高阶优化方法用于上下文学习:一项与线性模型的研究 +</p> +<p> +Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Models. 
(arXiv:2310.17086v1 [cs.LG]) +</p> +<p> +http://arxiv.org/abs/2310.17086 +</p> +<p> +Transformers学会了高阶优化方法,用于上下文学习,通过实现类似于迭代牛顿法的算法,而不是梯度下降。 +</p> +<p> + +</p> +<p> +Transformers在上下文学习中表现出色,但是它们是如何进行上下文学习仍然是一个谜。最近的研究表明,Transformers可能通过内部运行梯度下降,即一阶优化方法,来进行上下文学习。本文中,我们展示了Transformers学会了实现高阶优化方法来进行上下文学习。我们以上下文线性回归为重点,展示了Transformers学会了实现一个非常类似于迭代牛顿法的算法,而不是梯度下降。从实证上来看,我们展示了连续的Transformer层的预测与牛顿法的不同迭代非常接近,每个中间层大致计算了3次迭代。相比之下,需要指数级的梯度下降步骤才能匹配额外的Transformer层;这表明Transformers具有相当的收敛速率。 +</p> +<p> +Transformers are remarkably good at in-context learning (ICL) -- learning from demonstrations without parameter updates -- but how they perform ICL remains a mystery. Recent work suggests that Transformers may learn in-context by internally running Gradient Descent, a first-order optimization method. In this paper, we instead demonstrate that Transformers learn to implement higher-order optimization methods to perform ICL. Focusing on in-context linear regression, we show that Transformers learn to implement an algorithm very similar to Iterative Newton's Method, a higher-order optimization method, rather than Gradient Descent. Empirically, we show that predictions from successive Transformer layers closely match different iterations of Newton's Method linearly, with each middle layer roughly computing 3 iterations. In contrast, exponentially more Gradient Descent steps are needed to match an additional Transformer layer; this suggests that Transformers have a comparable rate of conv +</p>图去学习是负责任人工智能发展的重要进展,通过删除训练模型中的敏感数据痕迹来维护被遗忘的权利。这篇综述性论文首次系统回顾了图去学习的方法,包括了各种方法学,并提供了详细的分类和最新的文献综述,以帮助新进入这个领域的研究人员理解。与差分隐私的关系加深了对在这个背景下隐私保护技术的理解。http://arxiv.org/abs/2310.02164<p> +图去学习综述 +</p> +<p> +A Survey of Graph Unlearning. 
(arXiv:2310.02164v2 [cs.LG] UPDATED) +</p> +<p> +http://arxiv.org/abs/2310.02164 +</p> +<p> +图去学习是负责任人工智能发展的重要进展,通过删除训练模型中的敏感数据痕迹来维护被遗忘的权利。这篇综述性论文首次系统回顾了图去学习的方法,包括了各种方法学,并提供了详细的分类和最新的文献综述,以帮助新进入这个领域的研究人员理解。与差分隐私的关系加深了对在这个背景下隐私保护技术的理解。 +</p> +<p> + +</p> +<p> +图去学习是在追求负责任人工智能的过程中的重要进展,它提供了从训练模型中删除敏感数据痕迹的方法,以维护被遗忘的权利。显然,图机器学习对数据隐私和对抗攻击具有敏感性,因此需要应用图去学习技术来有效解决这些问题。在这篇综述性论文中,我们首次系统地回顾了图去学习的方法,涵盖了各种方法学,并提供了详细的分类和最新的文献综述,以帮助新进入这个领域的研究人员理解。此外,我们建立了图去学习与差分隐私之间的重要联系,增强了我们对在这个背景下隐私保护技术的相关性的理解。为了保证清晰度,我们对图去学习中使用的基本概念和评估指标进行了简明扼要的解释。 +</p> +<p> +Graph unlearning emerges as a crucial advancement in the pursuit of responsible AI, providing the means to remove sensitive data traces from trained models, thereby upholding the right to be forgotten. It is evident that graph machine learning exhibits sensitivity to data privacy and adversarial attacks, necessitating the application of graph unlearning techniques to address these concerns effectively. In this comprehensive survey paper, we present the first systematic review of graph unlearning approaches, encompassing a diverse array of methodologies and offering a detailed taxonomy and up-to-date literature overview to facilitate the understanding of researchers new to this field. Additionally, we establish the vital connections between graph unlearning and differential privacy, augmenting our understanding of the relevance of privacy-preserving techniques in this context. To ensure clarity, we provide lucid explanations of the fundamental concepts and evaluation measures used in gr +</p>MaGNet是一种模型无关的图神经网络框架,能够顺序地整合不同顺序的信息,并通过识别有影响力的紧凑图结构提供有意义且可解释的结果。http://arxiv.org/abs/2309.13459<p> +模型无关的图神经网络用于整合局部和全局信息的研究 +</p> +<p> +A Model-Agnostic Graph Neural Network for Integrating Local and Global Information. 
(arXiv:2309.13459v1 [stat.ML]) +</p> +<p> +http://arxiv.org/abs/2309.13459 +</p> +<p> +MaGNet是一种模型无关的图神经网络框架,能够顺序地整合不同顺序的信息,并通过识别有影响力的紧凑图结构提供有意义且可解释的结果。 +</p> +<p> + +</p> +<p> +图神经网络(GNNs)在各种以图为重点的任务中取得了令人满意的性能。尽管取得了成功,但现有的GNN存在两个重要限制:由于黑盒特性,结果缺乏可解释性;无法学习不同顺序的表示。为了解决这些问题,我们提出了一种新的模型无关的图神经网络(MaGNet)框架,能够顺序地整合不同顺序的信息,从高阶邻居中提取知识,并通过识别有影响力的紧凑图结构提供有意义且可解释的结果。特别地,MaGNet由两个组件组成:图拓扑下复杂关系的潜在表示的估计模型和识别有影响力的节点、边和重要节点特征的解释模型。从理论上,我们通过经验Rademacher复杂度建立了MaGNet的泛化误差界,并展示了其强大的能力。 +</p> +<p> +Graph Neural Networks (GNNs) have achieved promising performance in a variety of graph-focused tasks. Despite their success, existing GNNs suffer from two significant limitations: a lack of interpretability in results due to their black-box nature, and an inability to learn representations of varying orders. To tackle these issues, we propose a novel Model-agnostic Graph Neural Network (MaGNet) framework, which is able to sequentially integrate information of various orders, extract knowledge from high-order neighbors, and provide meaningful and interpretable results by identifying influential compact graph structures. In particular, MaGNet consists of two components: an estimation model for the latent representation of complex relationships under graph topology, and an interpretation model that identifies influential nodes, edges, and important node features. Theoretically, we establish the generalization error bound for MaGNet via empirical Rademacher complexity, and showcase its pow +</p>本研究探讨了在关键领域中针对鼓励政策的最优和公平评估以及学习的问题,研究发现在人类不遵循治疗建议的情况下,最优策略规则只是建议。同时,针对治疗的异质性和公平考虑因素,决策者的权衡和决策规则也会发生变化。在社会服务领域,研究显示存在一个使用差距问题,那些最有可能受益的人却无法获得这些益服务。http://arxiv.org/abs/2309.07176<p> +最优和公平的鼓励政策评估与学习 +</p> +<p> +Optimal and Fair Encouragement Policy Evaluation and Learning. 
(arXiv:2309.07176v1 [cs.LG]) +</p> +<p> +http://arxiv.org/abs/2309.07176 +</p> +<p> +本研究探讨了在关键领域中针对鼓励政策的最优和公平评估以及学习的问题,研究发现在人类不遵循治疗建议的情况下,最优策略规则只是建议。同时,针对治疗的异质性和公平考虑因素,决策者的权衡和决策规则也会发生变化。在社会服务领域,研究显示存在一个使用差距问题,那些最有可能受益的人却无法获得这些益服务。 +</p> +<p> + +</p> +<p> +在关键领域中,强制个体接受治疗通常是不可能的,因此在人类不遵循治疗建议的情况下,最优策略规则只是建议。在这些领域中,接受治疗的个体可能存在异质性,治疗效果也可能存在异质性。虽然最优治疗规则可以最大化整个人群的因果结果,但在鼓励的情况下,对于访问平等限制或其他公平考虑因素可能是相关的。例如,在社会服务领域,一个持久的难题是那些最有可能从中受益的人中那些获益服务的使用差距。当决策者对访问和平均结果都有分配偏好时,最优决策规则会发生变化。我们研究了因果识别、统计方差减少估计和稳健估计的最优治疗规则,包括在违反阳性条件的情况下。 +</p> +<p> +In consequential domains, it is often impossible to compel individuals to take treatment, so that optimal policy rules are merely suggestions in the presence of human non-adherence to treatment recommendations. In these same domains, there may be heterogeneity both in who responds in taking-up treatment, and heterogeneity in treatment efficacy. While optimal treatment rules can maximize causal outcomes across the population, access parity constraints or other fairness considerations can be relevant in the case of encouragement. For example, in social services, a persistent puzzle is the gap in take-up of beneficial services among those who may benefit from them the most. When in addition the decision-maker has distributional preferences over both access and average outcomes, the optimal decision rule changes. We study causal identification, statistical variance-reduced estimation, and robust estimation of optimal treatment rules, including under potential violations of positivity. We c +</p>本论文提出了针对金融指数跟踪问题的第一个具有动态性的离散时间无穷期模型,它克服了现有模型的一些局限,可以精确计算交易成本,同时考虑了跟踪误差和交易成本之间的权衡,并能有效利用长时间段的数据。我们使用深度强化学习方法解决该模型,解决了由于数据限制导致的问题。http://arxiv.org/abs/2308.02820<p> +针对金融指数跟踪的强化学习 +</p> +<p> +Reinforcement Learning for Financial Index Tracking. 
(arXiv:2308.02820v1 [q-fin.PM]) +</p> +<p> +http://arxiv.org/abs/2308.02820 +</p> +<p> +本论文提出了针对金融指数跟踪问题的第一个具有动态性的离散时间无穷期模型,它克服了现有模型的一些局限,可以精确计算交易成本,同时考虑了跟踪误差和交易成本之间的权衡,并能有效利用长时间段的数据。我们使用深度强化学习方法解决该模型,解决了由于数据限制导致的问题。 +</p> +<p> + +</p> +<p> +我们提出了第一个离散时间无穷期动态形式的金融指数跟踪问题,同时考虑到基于收益的跟踪误差和基于价值的跟踪误差。该模型克服了现有模型的局限性,包括不仅限于价格的市场信息变量的时间动态性,可以精确计算交易成本,考虑跟踪误差和交易成本之间的权衡,可以有效利用长时间段的数据等。该模型还引入了现金注入或提取的新的决策变量。我们提出了使用Banach不动点迭代求解投资组合再平衡方程的方法,可以准确计算实践中指定为交易量的非线性函数的交易成本。我们还提出了扩展深度强化学习(RL)方法来解决动态模型。我们的RL方法解决了由数据限制引起的问题。 +</p> +<p> +We propose the first discrete-time infinite-horizon dynamic formulation of the financial index tracking problem under both return-based tracking error and value-based tracking error. The formulation overcomes the limitations of existing models by incorporating the intertemporal dynamics of market information variables not limited to prices, allowing exact calculation of transaction costs, accounting for the tradeoff between overall tracking error and transaction costs, allowing effective use of data in a long time period, etc. The formulation also allows novel decision variables of cash injection or withdraw. We propose to solve the portfolio rebalancing equation using a Banach fixed point iteration, which allows to accurately calculate the transaction costs specified as nonlinear functions of trading volumes in practice. We propose an extension of deep reinforcement learning (RL) method to solve the dynamic formulation. Our RL method resolves the issue of data limitation resulting fro +</p>本研究使用微表情视频数据集开发了一种基于人工智能的帕金森病筛查框架,通过分析微笑视频中的特征,实现了89.7%的准确性和89.3%的AUROC值,同时在人群子组上没有检测到偏见。http://arxiv.org/abs/2308.02588<p> +用微笑揭示帕金森病:一种基于人工智能的筛查框架 +</p> +<p> +Unmasking Parkinson's Disease with Smile: An AI-enabled Screening Framework. 
(arXiv:2308.02588v1 [eess.IV]) +</p> +<p> +http://arxiv.org/abs/2308.02588 +</p> +<p> +本研究使用微表情视频数据集开发了一种基于人工智能的帕金森病筛查框架,通过分析微笑视频中的特征,实现了89.7%的准确性和89.3%的AUROC值,同时在人群子组上没有检测到偏见。 +</p> +<p> + +</p> +<p> +鉴于目前缺乏可靠的生物标志物和有限的临床护理资源,帕金森病(PD)的诊断仍然具有挑战性。在本研究中,我们使用包含微表情的最大视频数据集进行PD筛查的分析。我们收集了来自1,059名独立参与者的3,871个视频,其中包括256名自报PD患者。这些录像来自不同来源,包括多个国家的参与者家中、一家诊所和一个美国的PD护理机构。通过利用面部标志和行动单位,我们提取了与PD的一个主要症状Hypomimia(面部表情减少)相关的特征。在这些特征上训练的一组AI模型在保留数据上实现了89.7%的准确性和89.3%的接收者操作特性曲线下面积(AUROC),并且在性别和种族等人群子组上无可检测的偏见。进一步的分析揭示,仅通过微笑视频中的特征就可以获得可比较的准确性和AUROC值。 +</p> +<p> +Parkinson's disease (PD) diagnosis remains challenging due to lacking a reliable biomarker and limited access to clinical care. In this study, we present an analysis of the largest video dataset containing micro-expressions to screen for PD. We collected 3,871 videos from 1,059 unique participants, including 256 self-reported PD patients. The recordings are from diverse sources encompassing participants' homes across multiple countries, a clinic, and a PD care facility in the US. Leveraging facial landmarks and action units, we extracted features relevant to Hypomimia, a prominent symptom of PD characterized by reduced facial expressions. An ensemble of AI models trained on these features achieved an accuracy of 89.7% and an Area Under the Receiver Operating Characteristic (AUROC) of 89.3% while being free from detectable bias across population subgroups based on sex and ethnicity on held-out data. Further analysis reveals that features from the smiling videos alone lead to comparable +</p>遗忘是深度学习中普遍存在的现象,不仅限于连续学习领域。解决遗忘问题面临多个挑战,包括平衡保留旧任务知识与快速学习新任务的挑战,管理任务干扰与冲突目标的挑战,以及防止隐私泄露等。遗忘不总是有害的,可以在某些情况下是有益且可取的,特别是在隐私保护场景中。http://arxiv.org/abs/2307.09218<p> +深度学习中遗忘现象的全面调查:超越连续学习 +</p> +<p> +A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning. 
(arXiv:2307.09218v1 [cs.LG]) +</p> +<p> +http://arxiv.org/abs/2307.09218 +</p> +<p> +遗忘是深度学习中普遍存在的现象,不仅限于连续学习领域。解决遗忘问题面临多个挑战,包括平衡保留旧任务知识与快速学习新任务的挑战,管理任务干扰与冲突目标的挑战,以及防止隐私泄露等。遗忘不总是有害的,可以在某些情况下是有益且可取的,特别是在隐私保护场景中。 +</p> +<p> + +</p> +<p> +遗忘指的是先前获取的信息或知识的丧失或恶化。尽管现有的关于遗忘的调查主要集中在连续学习方面,但在深度学习中,遗忘是一种普遍现象,可以在各种其他研究领域中观察到。遗忘在研究领域中表现出来,例如由于生成器漂移而在生成模型领域中表现出来,以及由于客户端之间存在异构数据分布而在联邦学习中表现出来。解决遗忘问题涉及到几个挑战,包括在快速学习新任务的同时平衡保留旧任务知识,管理任务干扰与冲突目标,以及防止隐私泄露等。此外,大多数现有的连续学习调查都默认认为遗忘总是有害的。相反,我们的调查认为遗忘是一把双刃剑,在某些情况下可以是有益且可取的,例如隐私保护场景。通过在更广泛的背景下探讨遗忘现象, +</p> +<p> +Forgetting refers to the loss or deterioration of previously acquired information or knowledge. While the existing surveys on forgetting have primarily focused on continual learning, forgetting is a prevalent phenomenon observed in various other research domains within deep learning. Forgetting manifests in research fields such as generative models due to generator shifts, and federated learning due to heterogeneous data distributions across clients. Addressing forgetting encompasses several challenges, including balancing the retention of old task knowledge with fast learning of new tasks, managing task interference with conflicting goals, and preventing privacy leakage, etc. Moreover, most existing surveys on continual learning implicitly assume that forgetting is always harmful. In contrast, our survey argues that forgetting is a double-edged sword and can be beneficial and desirable in certain cases, such as privacy-preserving scenarios. By exploring forgetting in a broader context +</p>本研究探索机器翻译可解释性评估指标,提供综合综述和最新方法,并贡献下一代方法的愿景。http://arxiv.org/abs/2306.13041<p> +机器翻译可解释性评估指标的探索 +</p> +<p> +Towards Explainable Evaluation Metrics for Machine Translation. 
(arXiv:2306.13041v1 [cs.CL]) +</p> +<p> +http://arxiv.org/abs/2306.13041 +</p> +<p> +本研究探索机器翻译可解释性评估指标,提供综合综述和最新方法,并贡献下一代方法的愿景。 +</p> +<p> + +</p> +<p> +与传统的词汇重叠度量(如BLEU)不同,大多数当前用于机器翻译评估的指标(例如COMET或BERTScore)基于黑盒子的大型语言模型。它们通常与人类判断具有强相关性,但是最近的研究表明,较低质量的传统指标仍然占主导地位,其中一个潜在原因是它们的决策过程更透明。因此,为了促进新的高质量指标的更广泛接受,解释性变得至关重要。在这篇概念论文中,我们确定了可解释机器翻译指标的关键属性和目标,并提供了最近技术的综合综述,将它们与我们确立的目标和属性联系起来。在这个背景下,我们还讨论基于生成模型(如ChatGPT和GPT4)的可解释指标的最新先进方法。最后,我们贡献了下一代方法的愿景,包括自然语言e。 +</p> +<p> +Unlike classical lexical overlap metrics such as BLEU, most current evaluation metrics for machine translation (for example, COMET or BERTScore) are based on black-box large language models. They often achieve strong correlations with human judgments, but recent research indicates that the lower-quality classical metrics remain dominant, one of the potential reasons being that their decision processes are more transparent. To foster more widespread acceptance of novel high-quality metrics, explainability thus becomes crucial. In this concept paper, we identify key properties as well as key goals of explainable machine translation metrics and provide a comprehensive synthesis of recent techniques, relating them to our established goals and properties. In this context, we also discuss the latest state-of-the-art approaches to explainable metrics based on generative models such as ChatGPT and GPT4. Finally, we contribute a vision of next-generation approaches, including natural language e +</p>本文提出了一种基于有限维特征逼近的非线性动态谱嵌入控制算法(SDEC)用于解决随机非线性系统的最优控制问题,并对其进行了理论分析和实验测试。http://arxiv.org/abs/2304.03907<p> +基于有限维谱动态嵌入的随机非线性控制 +</p> +<p> +Stochastic Nonlinear Control via Finite-dimensional Spectral Dynamic Embedding. 
(arXiv:2304.03907v1 [cs.LG]) +</p> +<p> +http://arxiv.org/abs/2304.03907 +</p> +<p> +本文提出了一种基于有限维特征逼近的非线性动态谱嵌入控制算法(SDEC)用于解决随机非线性系统的最优控制问题,并对其进行了理论分析和实验测试。 +</p> +<p> + +</p> +<p> +随机非线性系统的最优控制一直是一个棘手的问题。Ren等人引入了谱动态嵌入来开发控制未知系统的强化学习方法。它使用无穷维特征来线性表示状态值函数,并利用有限维的截断逼近进行实际实现。然而,在已知模型的情况下,控制中的有限维逼近性质尚未得到研究。在本文中,我们提出了一种可行的随机非线性控制算法,利用基于有限维特征逼近的非线性动态谱嵌入控制(SDEC),并进行深入的理论分析,以表征由有限维截断引起的逼近误差和由有限样本逼近引起的统计误差,同时进行政策评估和政策优化的实验测试和比较。 +</p> +<p> +Optimal control is notoriously difficult for stochastic nonlinear systems. Ren et al. introduced Spectral Dynamics Embedding for developing reinforcement learning methods for controlling an unknown system. It uses an infinite-dimensional feature to linearly represent the state-value function and exploits finite-dimensional truncation approximation for practical implementation. However, the finite-dimensional approximation properties in control have not been investigated even when the model is known. In this paper, we provide a tractable stochastic nonlinear control algorithm that exploits the nonlinear dynamics upon the finite-dimensional feature approximation, Spectral Dynamics Embedding Control (SDEC), with an in-depth theoretical analysis to characterize the approximation error induced by the finite-dimension truncation and statistical error induced by finite-sample approximation in both policy evaluation and policy optimization. We also empirically test the algorithm and compare th +</p>本文提出了一种非平稳两臂赌博机问题的策略,能够处理平滑变化,并证明了该策略在二次Lipschitz连续的情况下的遗憾为 $\tilde O(T^{3/5})$。http://arxiv.org/abs/2301.12366<p> +平滑的非平稳连续赌博机 +</p> +<p> +Smooth Non-Stationary Bandits. 
(arXiv:2301.12366v2 [cs.LG] UPDATED) +</p> +<p> +http://arxiv.org/abs/2301.12366 +</p> +<p> +本文提出了一种非平稳两臂赌博机问题的策略,能够处理平滑变化,并证明了该策略在二次Lipschitz连续的情况下的遗憾为 $\tilde O(T^{3/5})$。 +</p> +<p> + +</p> +<p> +在许多在线决策应用中,环境都是非平稳的,因此使用能够处理变化的赌博算法至关重要。大多数现有方法是为了保护非平滑变化而设计的,仅受到总变差或时间上的Lipschitz性的限制,其中它们保证$\tilde \Theta(T^{2/3})$的遗憾。然而,在实践中,环境经常以平稳的方式改变,因此这种算法可能会在这些设置中产生比必要更高的遗憾,并且不利用变化率的信息。我们研究了一个非平稳的两臂赌博机问题,假设臂的平均回报是一个$\beta$-H\"older函数,即它是$(\beta-1)$次Lipschitz连续可微分的,我们展示了一个策略,对于$\beta=2$,它的遗憾为$\tilde O(T^{3/5})$,从而首次在平滑和非平滑之间进行了区分。我们通过一个任意$\Omega(T^{(\beta+1)/(2\beta+1)})$的下界来补充这个结果,说明了这个问题的困难程度。 +</p> +<p> +In many applications of online decision making, the environment is non-stationary and it is therefore crucial to use bandit algorithms that handle changes. Most existing approaches are designed to protect against non-smooth changes, constrained only by total variation or Lipschitzness over time, where they guarantee $\tilde \Theta(T^{2/3})$ regret. However, in practice environments are often changing smoothly, so such algorithms may incur higher-than-necessary regret in these settings and do not leverage information on the rate of change. We study a non-stationary two-armed bandits problem where we assume that an arm's mean reward is a $\beta$-H\"older function over (normalized) time, meaning it is $(\beta-1)$-times Lipschitz-continuously differentiable. We show the first separation between the smooth and non-smooth regimes by presenting a policy with $\tilde O(T^{3/5})$ regret for $\beta=2$. We complement this result by an $\Omega(T^{(\beta+1)/(2\beta+1)})$ lower bound for any int +</p>本研究使用自组织映射(SOM)分析了深度学习模型中与决策相关的内部编码,发现浅层将特征压缩到紧凑空间中,而深层将特征空间扩展,并指出压缩特征可能导致对敌对扰动的脆弱性。http://arxiv.org/abs/2205.10952<p> +深度学习模型的功能性神经编码分析 +</p> +<p> +Analysis of functional neural codes of deep learning models. 
(arXiv:2205.10952v2 [cs.LG] UPDATED) +</p> +<p> +http://arxiv.org/abs/2205.10952 +</p> +<p> +本研究使用自组织映射(SOM)分析了深度学习模型中与决策相关的内部编码,发现浅层将特征压缩到紧凑空间中,而深层将特征空间扩展,并指出压缩特征可能导致对敌对扰动的脆弱性。 +</p> +<p> + +</p> +<p> +深度神经网络(DNNs)作为深度学习(DL)的代理,需要大量的并行/顺序操作。这使得理解DNNs的操作变得困难,阻碍了适当的诊断。在没有对其内部过程有更好的了解之前,在高风险领域部署DNNs可能导致灾难性故障。因此,为了构建更可靠的DNNs/DL来解决高风险现实世界问题,我们必须深入了解DNNs决策背后的内部操作。在这里,我们使用自组织映射(SOM)分析与DNNs决策相关的DL模型的内部编码。我们的分析表明,靠近输入层的浅层将特征压缩到紧凑空间中,而靠近输出层的深层将特征空间扩展。我们还发现有证据表明,压缩特征可能导致DNNs对敌对扰动的脆弱性。 +</p> +<p> +Deep neural networks (DNNs), the agents of deep learning (DL), require a massive number of parallel/sequential operations. This makes it difficult to comprehend DNNs' operations and impedes proper diagnosis. Without better knowledge of their internal process, deploying DNNs in high-stakes domains can lead to catastrophic failures. Therefore, to build more reliable DNNs/DL to be deployed in high-stakes real-world problems, it is imperative that we gain insights into DNNs' internal operations underlying their decision-making. Here, we use the self-organizing map (SOM) to analyze DL models' internal codes associated with DNNs' decision-making. Our analyses suggest that shallow layers close to the input layer compress features into condensed space and that deep layers close to the output layer expand feature space. We also found evidence indicating that compressed features may underlie DNNs' vulnerabilities to adversarial perturbations. 
</p> \ No newline at end of file diff --git a/econ.md b/econ.md index 79d034484..3d08ae0df 100644 --- a/econ.md +++ b/econ.md @@ -2,52 +2,97 @@ | Ref | Title | Summary | | --- | --- | --- | -| [^1] | [Bayesian Bi-level Sparse Group Regressions for Macroeconomic Forecasting](https://arxiv.org/abs/2404.02671) | 提出了基于贝叶斯双层稀疏组回归的机器学习方法,可以进行高维宏观经济预测,并且理论证明其具有最小极限速率的收缩性,能够恢复模型参数,支持集包含模型的支持集。 | -| [^2] | [Limited substitutability, relative price changes and the uplifting of public natural capital values.](http://arxiv.org/abs/2308.04400) | 本研究通过全球元分析得出,生态系统服务相对价格每年约为2.2%,用于公共项目评估和环境经济会计中的调整。 | -| [^3] | [Time-Varying Parameters as Ridge Regressions.](http://arxiv.org/abs/2009.00401) | 该论文提出了一种实际上是基于岭回归的时变参数模型,这比传统的状态空间方法计算更快,调整更容易,有助于研究经济结构性变化。 | +| [^1] | [Shill-Proof Auctions](https://arxiv.org/abs/2404.00475) | 本文研究了免疫作弊的拍卖形式,发现荷兰式拍卖(设有适当保留价)是唯一的最优且强免疫作弊的拍卖,同时荷兰式拍卖(没有保留价)是唯一同时高效和弱免疫作弊的先验独立拍卖。 | +| [^2] | [Robust Learning for Optimal Dynamic Treatment Regimes with Observational Data](https://arxiv.org/abs/2404.00221) | 学习利用观测数据提出了一种逐步双重强健方法,通过向后归纳解决了最佳动态治疗方案的问题 | +| [^3] | [Costly Persuasion by a Partially Informed Sender.](http://arxiv.org/abs/2401.14087) | 本研究探讨了具有高昂成本的贝叶斯说服模型,研究对象是一位私人且部分信息知情的发送者在进行公共实验。研究发现实验中好消息和坏消息的成本差异对均衡结果具有重要影响,坏消息成本高时,存在唯一的分离均衡,接收者受益于发送者的私有信息;而好消息成本高时,均衡情况可能出现汇集和部分汇集均衡,接收者可能会因为发送者私有信息而受到损害。 | +| [^4] | [Model-Agnostic Covariate-Assisted Inference on Partially Identified Causal Effects.](http://arxiv.org/abs/2310.08115) | 提出了一种模型不可知的推断方法,在部分可辨识的因果估计中应用广泛。该方法基于最优输运问题的对偶理论,能够适应随机实验和观测研究,并且具有统一有效和双重鲁棒性。 | +| [^5] | [Persuasion as Transportation.](http://arxiv.org/abs/2307.07672) | 本研究通过将说服问题归约为最优运输的Monge-Kantorovich问题,揭示了贝叶斯说服模型中多接收方问题的显式解集和结构性结果,并推广了价值的对偶表示和凹化公式。 | +| [^6] | [Bayes = Blackwell, Almost.](http://arxiv.org/abs/2302.13956) | 存在其他的更新规则可以使信息的价值变为正值,作者找到了所有这些规则。 | # 详细 -[^1]: 基于贝叶斯双层稀疏组回归的宏观经济预测 +[^1]: 免疫作弊拍卖 - Bayesian Bi-level Sparse Group Regressions for Macroeconomic Forecasting + Shill-Proof Auctions - 
[https://arxiv.org/abs/2404.02671](https://arxiv.org/abs/2404.02671) + [https://arxiv.org/abs/2404.00475](https://arxiv.org/abs/2404.00475) - 提出了基于贝叶斯双层稀疏组回归的机器学习方法,可以进行高维宏观经济预测,并且理论证明其具有最小极限速率的收缩性,能够恢复模型参数,支持集包含模型的支持集。 + 本文研究了免疫作弊的拍卖形式,发现荷兰式拍卖(设有适当保留价)是唯一的最优且强免疫作弊的拍卖,同时荷兰式拍卖(没有保留价)是唯一同时高效和弱免疫作弊的先验独立拍卖。 - 我们提出了一种机器学习方法,在已知具有组结构的协变量的高维设置中进行最优宏观经济预测。我们的模型涵盖了许多时间序列、混合频率和未知非线性的预测设置。我们引入了时间序列计量经济学中的双层稀疏概念,即稀疏性在组水平和组内均成立,我们假设真实模型符合这一假设。我们提出了一种引起双层稀疏性的先验,相应的后验分布被证明以最小极限速率收缩,恢复模型参数,并且其支持集在渐近上包含模型的支持集。我们的理论允许组间相关性,而同一组中的预测变量可以通过强相关性以及共同特征和模式进行表征。通过全面展示有限样本的性能来说明。 + 在单品拍卖中,一个欺诈性的卖家可能会伪装成一个或多个竞标者,以操纵成交价格。本文对那些免疫作弊的拍卖格式进行了表征:一个利润最大化的卖家没有任何动机提交任何虚假报价。我们区分了强免疫作弊,即一个了解竞标者估值的卖家永远无法从作弊中获利,和弱免疫作弊,它仅要求从作弊中得到的平衡预期利润为非正。荷兰式拍卖(设有适当保留价)是唯一的最优和强免疫作弊拍卖。此外,荷兰式拍卖(没有保留价)是唯一的具有先验独立性的拍卖,既高效又弱免疫作弊。虽然存在多种策略证明、弱免疫作弊和最优拍卖;任何最优拍卖只能满足集合 {静态、策略证明、弱免疫作弊} 中的两个性质。 - arXiv:2404.02671v1 Announce Type: new Abstract: We propose a Machine Learning approach for optimal macroeconomic forecasting in a high-dimensional setting with covariates presenting a known group structure. Our model encompasses forecasting settings with many series, mixed frequencies, and unknown nonlinearities. We introduce in time-series econometrics the concept of bi-level sparsity, i.e. sparsity holds at both the group level and within groups, and we assume the true model satisfies this assumption. We propose a prior that induces bi-level sparsity, and the corresponding posterior distribution is demonstrated to contract at the minimax-optimal rate, recover the model parameters, and have a support that includes the support of the model asymptotically. Our theory allows for correlation between groups, while predictors in the same group can be characterized by strong covariation as well as common characteristics and patterns. 
Finite sample performance is illustrated through comprehe + arXiv:2404.00475v1 Announce Type: new Abstract: In a single-item auction, a duplicitous seller may masquerade as one or more bidders in order to manipulate the clearing price. This paper characterizes auction formats that are shill-proof: a profit-maximizing seller has no incentive to submit any shill bids. We distinguish between strong shill-proofness, in which a seller with full knowledge of bidders' valuations can never profit from shilling, and weak shill-proofness, which requires only that the expected equilibrium profit from shilling is nonpositive. The Dutch auction (with suitable reserve) is the unique optimal and strongly shill-proof auction. Moreover, the Dutch auction (with no reserve) is the unique prior-independent auction that is both efficient and weakly shill-proof. While there are a multiplicity of strategy-proof, weakly shill-proof, and optimal auctions; any optimal auction can satisfy only two properties in the set {static, strategy-proof, weakly shill-proof}. -[^2]: 有限的替代性、相对价格变动与公共自然资本价值的提升 +[^2]: 利用观测数据进行强健学习以获得最佳动态治疗方案 - Limited substitutability, relative price changes and the uplifting of public natural capital values. 
(arXiv:2308.04400v1 [econ.GN]) + Robust Learning for Optimal Dynamic Treatment Regimes with Observational Data - [http://arxiv.org/abs/2308.04400](http://arxiv.org/abs/2308.04400) + [https://arxiv.org/abs/2404.00221](https://arxiv.org/abs/2404.00221) - 本研究通过全球元分析得出,生态系统服务相对价格每年约为2.2%,用于公共项目评估和环境经济会计中的调整。 + 学习利用观测数据提出了一种逐步双重强健方法,通过向后归纳解决了最佳动态治疗方案的问题 - 随着全球经济的不断增长,生态系统服务往往停滞或减少。经济学理论已经揭示了如何将这种相对稀缺性的转变反映到公共项目评估和环境经济会计中,但缺乏实证证据来将理论付诸实践。为了估计可用于进行此类调整的生态系统服务相对价格变化,我们对环境价值评估研究进行了全球元分析,以推导出意愿支付收入弹性作为有限替代性程度的代理。基于749个收入-意愿支付对,我们估计意愿支付收入弹性约为0.78(95-CI:0.6至1.0)。将这些结果与生态系统服务相对稀缺性变化的全球数据集结合起来,我们估计生态系统服务相对价格每年约为2.2%。在对非木材林生态系统的自然资本估值中应用了这些结果。 + 许多公共政策和医疗干预涉及其治疗分配中的动态性,治疗通常依据先前治疗的历史和相关特征对每个阶段的效果具有异质性。本文研究了统计学习最佳动态治疗方案(DTR),根据个体的历史指导每个阶段的最佳治疗分配。我们提出了一种基于观测数据的逐步双重强健方法,在顺序可忽略性假设下学习最佳DTR。该方法通过向后归纳解决了顺序治疗分配问题,在每一步中,我们结合倾向评分和行动值函数(Q函数)的估计量,构建了政策价值的增强反向概率加权估计量。 - As the global economy continues to grow, ecosystem services tend to stagnate or degrow. Economic theory has shown how such shifts in relative scarcities can be reflected in the appraisal of public projects and environmental-economic accounting, but empirical evidence has been lacking to put the theory into practice. To estimate the relative price change in ecosystem services that can be used to make such adjustments, we perform a global meta-analysis of environmental valuation studies to derive income elasticities of willingness to pay (WTP) for ecosystem services as a proxy for the degree of limited substitutability. Based on 749 income-WTP pairs, we estimate an income elasticity of WTP of around 0.78 (95-CI: 0.6 to 1.0). Combining these results with a global data set on shifts in the relative scarcity of ecosystem services, we estimate relative price change of ecosystem services of around 2.2 percent per year. 
In an application to natural capital valuation of non-timber forest ecosys + arXiv:2404.00221v1 Announce Type: cross Abstract: Many public policies and medical interventions involve dynamics in their treatment assignments, where treatments are sequentially assigned to the same individuals across multiple stages, and the effect of treatment at each stage is usually heterogeneous with respect to the history of prior treatments and associated characteristics. We study statistical learning of optimal dynamic treatment regimes (DTRs) that guide the optimal treatment assignment for each individual at each stage based on the individual's history. We propose a step-wise doubly-robust approach to learn the optimal DTR using observational data under the assumption of sequential ignorability. The approach solves the sequential treatment assignment problem through backward induction, where, at each step, we combine estimators of propensity scores and action-value functions (Q-functions) to construct augmented inverse probability weighting estimators of values of policies -[^3]: 使用岭回归法的时变参数模型 +[^3]: 高昂的说服成本与部分信息的发送者 - Time-Varying Parameters as Ridge Regressions. (arXiv:2009.00401v3 [econ.EM] UPDATED) + Costly Persuasion by a Partially Informed Sender. 
(arXiv:2401.14087v1 [econ.TH]) - [http://arxiv.org/abs/2009.00401](http://arxiv.org/abs/2009.00401) + [http://arxiv.org/abs/2401.14087](http://arxiv.org/abs/2401.14087) - 该论文提出了一种实际上是基于岭回归的时变参数模型,这比传统的状态空间方法计算更快,调整更容易,有助于研究经济结构性变化。 + 本研究探讨了具有高昂成本的贝叶斯说服模型,研究对象是一位私人且部分信息知情的发送者在进行公共实验。研究发现实验中好消息和坏消息的成本差异对均衡结果具有重要影响,坏消息成本高时,存在唯一的分离均衡,接收者受益于发送者的私有信息;而好消息成本高时,均衡情况可能出现汇集和部分汇集均衡,接收者可能会因为发送者私有信息而受到损害。 - 时变参数模型(TVPs)经常被用于经济学中来捕捉结构性变化。我强调了一个被忽视的事实——这些实际上是岭回归。这使得计算、调整和实现比状态空间范式更容易。在高维情况下,解决等价的双重岭问题的计算非常快,关键的“时间变化量”通常是由交叉验证来调整的。使用两步回归岭回归来处理不断变化的波动性。我考虑了基于稀疏性(算法选择哪些参数变化, 哪些不变)和降低秩约束的扩展(变化与因子模型相关联)。为了展示这种方法的有用性, 我使用它来研究加拿大货币政策的演变, 并使用大规模时变局部投影估计约4600个TVPs, 这一任务完全可以利用这种新方法完成。 + 本文研究了由一个拥有私有且部分信息的发送者进行的昂贵的贝叶斯说服模型,该发送者进行了一个公共实验。实验的成本是发送者信念的加权对数似然比函数的期望减少。这个模型通过一个沃尔德的顺序抽样问题得到微基础,其中好消息和坏消息的成本不同。我们关注满足D1准则的均衡。均衡结果取决于实验中获得好消息和坏消息的相对成本。如果坏消息的成本更高,则存在唯一的分离均衡,并且接收者明确受益于发送者的私有信息。如果好消息的成本更高,则单点交叉特性不成立。可能存在汇集和部分汇集均衡,在某些均衡中,接收者会明确受到发送者私有信息的伤害。 - Time-varying parameters (TVPs) models are frequently used in economics to capture structural change. I highlight a rather underutilized fact -- that these are actually ridge regressions. Instantly, this makes computations, tuning, and implementation much easier than in the state-space paradigm. Among other things, solving the equivalent dual ridge problem is computationally very fast even in high dimensions, and the crucial "amount of time variation" is tuned by cross-validation. Evolving volatility is dealt with using a two-step ridge regression. I consider extensions that incorporate sparsity (the algorithm selects which parameters vary and which do not) and reduced-rank restrictions (variation is tied to a factor model). To demonstrate the usefulness of the approach, I use it to study the evolution of monetary policy in Canada using large time-varying local projections. The application requires the estimation of about 4600 TVPs, a task well within the reach of the new method. 
+ I study a model of costly Bayesian persuasion by a privately and partially informed sender who conducts a public experiment. The cost of running an experiment is the expected reduction of a weighted log-likelihood ratio function of the sender's belief. This is microfounded by a Wald's sequential sampling problem where good news and bad news cost differently. I focus on equilibria that satisfy the D1 criterion. The equilibrium outcome depends on the relative costs of drawing good and bad news in the experiment. If bad news is more costly, there exists a unique separating equilibrium, and the receiver unambiguously benefits from the sender's private information. If good news is more costly, the single-crossing property fails. There may exist pooling and partial pooling equilibria, and in some equilibria, the receiver strictly suffers from sender private information. + +[^4]: 模型不可知的辅助推断方法在部分可辨识因果效应上的应用 + + Model-Agnostic Covariate-Assisted Inference on Partially Identified Causal Effects. (arXiv:2310.08115v1 [econ.EM]) + + [http://arxiv.org/abs/2310.08115](http://arxiv.org/abs/2310.08115) + + 提出了一种模型不可知的推断方法,在部分可辨识的因果估计中应用广泛。该方法基于最优输运问题的对偶理论,能够适应随机实验和观测研究,并且具有统一有效和双重鲁棒性。 + + + + 很多因果估计是部分可辨识的,因为它们依赖于潜在结果之间的不可观察联合分布。基于前处理协变量的分层可以获得更明确的部分可辨识性范围;然而,除非协变量为离散且支撑度相对较小,否则这种方法通常需要对给定协变量的潜在结果的条件分布进行一致估计。因此,现有的方法在模型错误或一致性假设被违反时可能失败。在本研究中,我们提出了一种基于最优输运问题的对偶理论的统一且模型不可知的推断方法,适用于广泛类别的部分可辨识估计。在随机实验中,我们的方法可以结合任何对条件分布的估计,并提供统一有效的推断,即使初始估计是任意不准确的。此外,我们的方法在观测研究中也是双重鲁棒的。 + + Many causal estimands are only partially identifiable since they depend on the unobservable joint distribution between potential outcomes. Stratification on pretreatment covariates can yield sharper partial identification bounds; however, unless the covariates are discrete with relatively small support, this approach typically requires consistent estimation of the conditional distributions of the potential outcomes given the covariates. 
Thus, existing approaches may fail under model misspecification or if consistency assumptions are violated. In this study, we propose a unified and model-agnostic inferential approach for a wide class of partially identified estimands, based on duality theory for optimal transport problems. In randomized experiments, our approach can wrap around any estimates of the conditional distributions and provide uniformly valid inference, even if the initial estimates are arbitrarily inaccurate. Also, our approach is doubly robust in observational studies. Notab + +[^5]: 说服作为交通工具 + + Persuasion as Transportation. (arXiv:2307.07672v1 [econ.TH]) + + [http://arxiv.org/abs/2307.07672](http://arxiv.org/abs/2307.07672) + + 本研究通过将说服问题归约为最优运输的Monge-Kantorovich问题,揭示了贝叶斯说服模型中多接收方问题的显式解集和结构性结果,并推广了价值的对偶表示和凹化公式。 + + + + 我们考虑了一个贝叶斯说服模型,其中有一个知情的发送方和几个不知情的接收方。发送方可以通过私人信号影响接收方的信念,而发送方的目标取决于诱导信念的组合。我们将说服问题归约为最优运输的Monge-Kantorovich问题。借助最优运输理论的洞见,我们确定了几类多接收方问题的显式解集,得到了一般的结构性结果,导出了价值的对偶表示,并将著名的凹化公式推广到多接收方问题上。 + + We consider a model of Bayesian persuasion with one informed sender and several uninformed receivers. The sender can affect receivers' beliefs via private signals, and the sender's objective depends on the combination of induced beliefs. We reduce the persuasion problem to the Monge-Kantorovich problem of optimal transportation. Using insights from optimal transportation theory, we identify several classes of multi-receiver problems that admit explicit solutions, get general structural results, derive a dual representation for the value, and generalize the celebrated concavification formula for the value to multi-receiver problems. + +[^6]: Bayes = Blackwell, 差不多。 + + Bayes = Blackwell, Almost. 
(arXiv:2302.13956v3 [econ.TH] UPDATED) + + [http://arxiv.org/abs/2302.13956](http://arxiv.org/abs/2302.13956) + + 存在其他的更新规则可以使信息的价值变为正值,作者找到了所有这些规则。 + + + + 存在着除了Bayes'定律之外的更新规则,可以使信息的价值变为正值。我找到了所有这些规则。 + + There are updating rules other than Bayes' law that render the value of information positive. I find all of them. diff --git a/econ.xml b/econ.xml index e9b70cd4e..4e2b90850 100644 --- a/econ.xml +++ b/econ.xml @@ -1,61 +1,121 @@ -Chat Arxiv econhttps://github.com/qhduan/cn-chat-arxivThis is arxiv RSS feed for econ提出了基于贝叶斯双层稀疏组回归的机器学习方法,可以进行高维宏观经济预测,并且理论证明其具有最小极限速率的收缩性,能够恢复模型参数,支持集包含模型的支持集。https://arxiv.org/abs/2404.02671<p> -基于贝叶斯双层稀疏组回归的宏观经济预测 +Chat Arxiv econhttps://github.com/qhduan/cn-chat-arxivThis is arxiv RSS feed for econ本文研究了免疫作弊的拍卖形式,发现荷兰式拍卖(设有适当保留价)是唯一的最优且强免疫作弊的拍卖,同时荷兰式拍卖(没有保留价)是唯一同时高效和弱免疫作弊的先验独立拍卖。https://arxiv.org/abs/2404.00475<p> +免疫作弊拍卖 </p> <p> -Bayesian Bi-level Sparse Group Regressions for Macroeconomic Forecasting +Shill-Proof Auctions </p> <p> -https://arxiv.org/abs/2404.02671 +https://arxiv.org/abs/2404.00475 </p> <p> -提出了基于贝叶斯双层稀疏组回归的机器学习方法,可以进行高维宏观经济预测,并且理论证明其具有最小极限速率的收缩性,能够恢复模型参数,支持集包含模型的支持集。 +本文研究了免疫作弊的拍卖形式,发现荷兰式拍卖(设有适当保留价)是唯一的最优且强免疫作弊的拍卖,同时荷兰式拍卖(没有保留价)是唯一同时高效和弱免疫作弊的先验独立拍卖。 </p> <p> </p> <p> -我们提出了一种机器学习方法,在已知具有组结构的协变量的高维设置中进行最优宏观经济预测。我们的模型涵盖了许多时间序列、混合频率和未知非线性的预测设置。我们引入了时间序列计量经济学中的双层稀疏概念,即稀疏性在组水平和组内均成立,我们假设真实模型符合这一假设。我们提出了一种引起双层稀疏性的先验,相应的后验分布被证明以最小极限速率收缩,恢复模型参数,并且其支持集在渐近上包含模型的支持集。我们的理论允许组间相关性,而同一组中的预测变量可以通过强相关性以及共同特征和模式进行表征。通过全面展示有限样本的性能来说明。 +在单品拍卖中,一个欺诈性的卖家可能会伪装成一个或多个竞标者,以操纵成交价格。本文对那些免疫作弊的拍卖格式进行了表征:一个利润最大化的卖家没有任何动机提交任何虚假报价。我们区分了强免疫作弊,即一个了解竞标者估值的卖家永远无法从作弊中获利,和弱免疫作弊,它仅要求从作弊中得到的平衡预期利润为非正。荷兰式拍卖(设有适当保留价)是唯一的最优和强免疫作弊拍卖。此外,荷兰式拍卖(没有保留价)是唯一的具有先验独立性的拍卖,既高效又弱免疫作弊。虽然存在多种策略证明、弱免疫作弊和最优拍卖;任何最优拍卖只能满足集合 {静态、策略证明、弱免疫作弊} 中的两个性质。 </p> <p> -arXiv:2404.02671v1 Announce Type: new Abstract: We propose a Machine Learning approach for optimal macroeconomic forecasting in a high-dimensional setting with covariates presenting a known group structure. 
Our model encompasses forecasting settings with many series, mixed frequencies, and unknown nonlinearities. We introduce in time-series econometrics the concept of bi-level sparsity, i.e. sparsity holds at both the group level and within groups, and we assume the true model satisfies this assumption. We propose a prior that induces bi-level sparsity, and the corresponding posterior distribution is demonstrated to contract at the minimax-optimal rate, recover the model parameters, and have a support that includes the support of the model asymptotically. Our theory allows for correlation between groups, while predictors in the same group can be characterized by strong covariation as well as common characteristics and patterns. Finite sample performance is illustrated through comprehe -</p>本研究通过全球元分析得出,生态系统服务相对价格每年约为2.2%,用于公共项目评估和环境经济会计中的调整。http://arxiv.org/abs/2308.04400<p> -有限的替代性、相对价格变动与公共自然资本价值的提升 +arXiv:2404.00475v1 Announce Type: new Abstract: In a single-item auction, a duplicitous seller may masquerade as one or more bidders in order to manipulate the clearing price. This paper characterizes auction formats that are shill-proof: a profit-maximizing seller has no incentive to submit any shill bids. We distinguish between strong shill-proofness, in which a seller with full knowledge of bidders' valuations can never profit from shilling, and weak shill-proofness, which requires only that the expected equilibrium profit from shilling is nonpositive. The Dutch auction (with suitable reserve) is the unique optimal and strongly shill-proof auction. Moreover, the Dutch auction (with no reserve) is the unique prior-independent auction that is both efficient and weakly shill-proof. While there are a multiplicity of strategy-proof, weakly shill-proof, and optimal auctions; any optimal auction can satisfy only two properties in the set {static, strategy-proof, weakly shill-proof}. 
+</p>学习利用观测数据提出了一种逐步双重强健方法,通过向后归纳解决了最佳动态治疗方案的问题https://arxiv.org/abs/2404.00221<p> +利用观测数据进行强健学习以获得最佳动态治疗方案 </p> <p> -Limited substitutability, relative price changes and the uplifting of public natural capital values. (arXiv:2308.04400v1 [econ.GN]) +Robust Learning for Optimal Dynamic Treatment Regimes with Observational Data </p> <p> -http://arxiv.org/abs/2308.04400 +https://arxiv.org/abs/2404.00221 </p> <p> -本研究通过全球元分析得出,生态系统服务相对价格每年约为2.2%,用于公共项目评估和环境经济会计中的调整。 +学习利用观测数据提出了一种逐步双重强健方法,通过向后归纳解决了最佳动态治疗方案的问题 </p> <p> </p> <p> -随着全球经济的不断增长,生态系统服务往往停滞或减少。经济学理论已经揭示了如何将这种相对稀缺性的转变反映到公共项目评估和环境经济会计中,但缺乏实证证据来将理论付诸实践。为了估计可用于进行此类调整的生态系统服务相对价格变化,我们对环境价值评估研究进行了全球元分析,以推导出意愿支付收入弹性作为有限替代性程度的代理。基于749个收入-意愿支付对,我们估计意愿支付收入弹性约为0.78(95-CI:0.6至1.0)。将这些结果与生态系统服务相对稀缺性变化的全球数据集结合起来,我们估计生态系统服务相对价格每年约为2.2%。在对非木材林生态系统的自然资本估值中应用了这些结果。 +许多公共政策和医疗干预涉及其治疗分配中的动态性,治疗通常依据先前治疗的历史和相关特征对每个阶段的效果具有异质性。本文研究了统计学习最佳动态治疗方案(DTR),根据个体的历史指导每个阶段的最佳治疗分配。我们提出了一种基于观测数据的逐步双重强健方法,在顺序可忽略性假设下学习最佳DTR。该方法通过向后归纳解决了顺序治疗分配问题,在每一步中,我们结合倾向评分和行动值函数(Q函数)的估计量,构建了政策价值的增强反向概率加权估计量。 </p> <p> -As the global economy continues to grow, ecosystem services tend to stagnate or degrow. Economic theory has shown how such shifts in relative scarcities can be reflected in the appraisal of public projects and environmental-economic accounting, but empirical evidence has been lacking to put the theory into practice. To estimate the relative price change in ecosystem services that can be used to make such adjustments, we perform a global meta-analysis of environmental valuation studies to derive income elasticities of willingness to pay (WTP) for ecosystem services as a proxy for the degree of limited substitutability. Based on 749 income-WTP pairs, we estimate an income elasticity of WTP of around 0.78 (95-CI: 0.6 to 1.0). Combining these results with a global data set on shifts in the relative scarcity of ecosystem services, we estimate relative price change of ecosystem services of around 2.2 percent per year. 
In an application to natural capital valuation of non-timber forest ecosys -</p>该论文提出了一种实际上是基于岭回归的时变参数模型,这比传统的状态空间方法计算更快,调整更容易,有助于研究经济结构性变化。http://arxiv.org/abs/2009.00401<p> -使用岭回归法的时变参数模型 +arXiv:2404.00221v1 Announce Type: cross Abstract: Many public policies and medical interventions involve dynamics in their treatment assignments, where treatments are sequentially assigned to the same individuals across multiple stages, and the effect of treatment at each stage is usually heterogeneous with respect to the history of prior treatments and associated characteristics. We study statistical learning of optimal dynamic treatment regimes (DTRs) that guide the optimal treatment assignment for each individual at each stage based on the individual's history. We propose a step-wise doubly-robust approach to learn the optimal DTR using observational data under the assumption of sequential ignorability. The approach solves the sequential treatment assignment problem through backward induction, where, at each step, we combine estimators of propensity scores and action-value functions (Q-functions) to construct augmented inverse probability weighting estimators of values of policies +</p>本研究探讨了具有高昂成本的贝叶斯说服模型,研究对象是一位私人且部分信息知情的发送者在进行公共实验。研究发现实验中好消息和坏消息的成本差异对均衡结果具有重要影响,坏消息成本高时,存在唯一的分离均衡,接收者受益于发送者的私有信息;而好消息成本高时,均衡情况可能出现汇集和部分汇集均衡,接收者可能会因为发送者私有信息而受到损害。http://arxiv.org/abs/2401.14087<p> +高昂的说服成本与部分信息的发送者 </p> <p> -Time-Varying Parameters as Ridge Regressions. (arXiv:2009.00401v3 [econ.EM] UPDATED) +Costly Persuasion by a Partially Informed Sender. 
(arXiv:2401.14087v1 [econ.TH]) </p> <p> -http://arxiv.org/abs/2009.00401 +http://arxiv.org/abs/2401.14087 </p> <p> -该论文提出了一种实际上是基于岭回归的时变参数模型,这比传统的状态空间方法计算更快,调整更容易,有助于研究经济结构性变化。 +本研究探讨了具有高昂成本的贝叶斯说服模型,研究对象是一位私人且部分信息知情的发送者在进行公共实验。研究发现实验中好消息和坏消息的成本差异对均衡结果具有重要影响,坏消息成本高时,存在唯一的分离均衡,接收者受益于发送者的私有信息;而好消息成本高时,均衡情况可能出现汇集和部分汇集均衡,接收者可能会因为发送者私有信息而受到损害。 </p> <p> </p> <p> -时变参数模型(TVPs)经常被用于经济学中来捕捉结构性变化。我强调了一个被忽视的事实——这些实际上是岭回归。这使得计算、调整和实现比状态空间范式更容易。在高维情况下,解决等价的双重岭问题的计算非常快,关键的“时间变化量”通常是由交叉验证来调整的。使用两步回归岭回归来处理不断变化的波动性。我考虑了基于稀疏性(算法选择哪些参数变化, 哪些不变)和降低秩约束的扩展(变化与因子模型相关联)。为了展示这种方法的有用性, 我使用它来研究加拿大货币政策的演变, 并使用大规模时变局部投影估计约4600个TVPs, 这一任务完全可以利用这种新方法完成。 +本文研究了由一个拥有私有且部分信息的发送者进行的昂贵的贝叶斯说服模型,该发送者进行了一个公共实验。实验的成本是发送者信念的加权对数似然比函数的期望减少。这个模型通过一个沃尔德的顺序抽样问题得到微基础,其中好消息和坏消息的成本不同。我们关注满足D1准则的均衡。均衡结果取决于实验中获得好消息和坏消息的相对成本。如果坏消息的成本更高,则存在唯一的分离均衡,并且接收者明确受益于发送者的私有信息。如果好消息的成本更高,则单点交叉特性不成立。可能存在汇集和部分汇集均衡,在某些均衡中,接收者会明确受到发送者私有信息的伤害。 </p> <p> -Time-varying parameters (TVPs) models are frequently used in economics to capture structural change. I highlight a rather underutilized fact -- that these are actually ridge regressions. Instantly, this makes computations, tuning, and implementation much easier than in the state-space paradigm. Among other things, solving the equivalent dual ridge problem is computationally very fast even in high dimensions, and the crucial "amount of time variation" is tuned by cross-validation. Evolving volatility is dealt with using a two-step ridge regression. I consider extensions that incorporate sparsity (the algorithm selects which parameters vary and which do not) and reduced-rank restrictions (variation is tied to a factor model). To demonstrate the usefulness of the approach, I use it to study the evolution of monetary policy in Canada using large time-varying local projections. The application requires the estimation of about 4600 TVPs, a task well within the reach of the new method. 
+I study a model of costly Bayesian persuasion by a privately and partially informed sender who conducts a public experiment. The cost of running an experiment is the expected reduction of a weighted log-likelihood ratio function of the sender's belief. This is microfounded by a Wald's sequential sampling problem where good news and bad news cost differently. I focus on equilibria that satisfy the D1 criterion. The equilibrium outcome depends on the relative costs of drawing good and bad news in the experiment. If bad news is more costly, there exists a unique separating equilibrium, and the receiver unambiguously benefits from the sender's private information. If good news is more costly, the single-crossing property fails. There may exist pooling and partial pooling equilibria, and in some equilibria, the receiver strictly suffers from sender private information. +</p>提出了一种模型不可知的推断方法,在部分可辨识的因果估计中应用广泛。该方法基于最优输运问题的对偶理论,能够适应随机实验和观测研究,并且具有统一有效和双重鲁棒性。http://arxiv.org/abs/2310.08115<p> +模型不可知的辅助推断方法在部分可辨识因果效应上的应用 +</p> +<p> +Model-Agnostic Covariate-Assisted Inference on Partially Identified Causal Effects. (arXiv:2310.08115v1 [econ.EM]) +</p> +<p> +http://arxiv.org/abs/2310.08115 +</p> +<p> +提出了一种模型不可知的推断方法,在部分可辨识的因果估计中应用广泛。该方法基于最优输运问题的对偶理论,能够适应随机实验和观测研究,并且具有统一有效和双重鲁棒性。 +</p> +<p> + +</p> +<p> +很多因果估计是部分可辨识的,因为它们依赖于潜在结果之间的不可观察联合分布。基于前处理协变量的分层可以获得更明确的部分可辨识性范围;然而,除非协变量为离散且支撑度相对较小,否则这种方法通常需要对给定协变量的潜在结果的条件分布进行一致估计。因此,现有的方法在模型错误或一致性假设被违反时可能失败。在本研究中,我们提出了一种基于最优输运问题的对偶理论的统一且模型不可知的推断方法,适用于广泛类别的部分可辨识估计。在随机实验中,我们的方法可以结合任何对条件分布的估计,并提供统一有效的推断,即使初始估计是任意不准确的。此外,我们的方法在观测研究中也是双重鲁棒的。 +</p> +<p> +Many causal estimands are only partially identifiable since they depend on the unobservable joint distribution between potential outcomes. 
Stratification on pretreatment covariates can yield sharper partial identification bounds; however, unless the covariates are discrete with relatively small support, this approach typically requires consistent estimation of the conditional distributions of the potential outcomes given the covariates. Thus, existing approaches may fail under model misspecification or if consistency assumptions are violated. In this study, we propose a unified and model-agnostic inferential approach for a wide class of partially identified estimands, based on duality theory for optimal transport problems. In randomized experiments, our approach can wrap around any estimates of the conditional distributions and provide uniformly valid inference, even if the initial estimates are arbitrarily inaccurate. Also, our approach is doubly robust in observational studies. Notab +</p>本研究通过将说服问题归约为最优运输的Monge-Kantorovich问题,揭示了贝叶斯说服模型中多接收方问题的显式解集和结构性结果,并推广了价值的对偶表示和凹化公式。http://arxiv.org/abs/2307.07672<p> +说服作为交通工具 +</p> +<p> +Persuasion as Transportation. (arXiv:2307.07672v1 [econ.TH]) +</p> +<p> +http://arxiv.org/abs/2307.07672 +</p> +<p> +本研究通过将说服问题归约为最优运输的Monge-Kantorovich问题,揭示了贝叶斯说服模型中多接收方问题的显式解集和结构性结果,并推广了价值的对偶表示和凹化公式。 +</p> +<p> + +</p> +<p> +我们考虑了一个贝叶斯说服模型,其中有一个知情的发送方和几个不知情的接收方。发送方可以通过私人信号影响接收方的信念,而发送方的目标取决于诱导信念的组合。我们将说服问题归约为最优运输的Monge-Kantorovich问题。借助最优运输理论的洞见,我们确定了几类多接收方问题的显式解集,得到了一般的结构性结果,导出了价值的对偶表示,并将著名的凹化公式推广到多接收方问题上。 +</p> +<p> +We consider a model of Bayesian persuasion with one informed sender and several uninformed receivers. The sender can affect receivers' beliefs via private signals, and the sender's objective depends on the combination of induced beliefs. We reduce the persuasion problem to the Monge-Kantorovich problem of optimal transportation. 
Using insights from optimal transportation theory, we identify several classes of multi-receiver problems that admit explicit solutions, get general structural results, derive a dual representation for the value, and generalize the celebrated concavification formula for the value to multi-receiver problems. +</p>存在其他的更新规则可以使信息的价值变为正值,作者找到了所有这些规则。http://arxiv.org/abs/2302.13956<p> +Bayes = Blackwell, 差不多。 +</p> +<p> +Bayes = Blackwell, Almost. (arXiv:2302.13956v3 [econ.TH] UPDATED) +</p> +<p> +http://arxiv.org/abs/2302.13956 +</p> +<p> +存在其他的更新规则可以使信息的价值变为正值,作者找到了所有这些规则。 +</p> +<p> + +</p> +<p> +存在着除了Bayes'定律之外的更新规则,可以使信息的价值变为正值。我找到了所有这些规则。 +</p> +<p> +There are updating rules other than Bayes' law that render the value of information positive. I find all of them. </p> \ No newline at end of file diff --git a/latest_updated.txt b/latest_updated.txt index a21da1a81..5ab7d7239 100644 --- a/latest_updated.txt +++ b/latest_updated.txt @@ -1 +1 @@ -2024-11-19 03:15:24 \ No newline at end of file +2024-11-19 09:06:44 \ No newline at end of file diff --git a/q-fin.md b/q-fin.md index 1be717665..45b87495d 100644 --- a/q-fin.md +++ b/q-fin.md @@ -2,22 +2,37 @@ | Ref | Title | Summary | | --- | --- | --- | -| [^1] | [Limited substitutability, relative price changes and the uplifting of public natural capital values.](http://arxiv.org/abs/2308.04400) | 本研究通过全球元分析得出,生态系统服务相对价格每年约为2.2%,用于公共项目评估和环境经济会计中的调整。 | +| [^1] | [Deep Learning Based Measure of Name Concentration Risk](https://arxiv.org/abs/2403.16525) | 提出了一种基于深度学习的方法,用于量化贷款组合中的姓名集中风险,通过重要性抽样的蒙特卡洛模拟训练神经网络,展示了其相比现有方法在评估小型和集中组合中的姓名集中风险方面的准确性和优越性能。 | +| [^2] | [Reinforcement Learning for Financial Index Tracking.](http://arxiv.org/abs/2308.02820) | 本论文提出了针对金融指数跟踪问题的第一个具有动态性的离散时间无穷期模型,它克服了现有模型的一些局限,可以精确计算交易成本,同时考虑了跟踪误差和交易成本之间的权衡,并能有效利用长时间段的数据。我们使用深度强化学习方法解决该模型,解决了由于数据限制导致的问题。 | # 详细 -[^1]: 有限的替代性、相对价格变动与公共自然资本价值的提升 +[^1]: 基于深度学习的姓名集中风险度量方法 - Limited substitutability, relative price changes and the uplifting of public natural 
capital values. (arXiv:2308.04400v1 [econ.GN]) + Deep Learning Based Measure of Name Concentration Risk - [http://arxiv.org/abs/2308.04400](http://arxiv.org/abs/2308.04400) + [https://arxiv.org/abs/2403.16525](https://arxiv.org/abs/2403.16525) - 本研究通过全球元分析得出,生态系统服务相对价格每年约为2.2%,用于公共项目评估和环境经济会计中的调整。 + 提出了一种基于深度学习的方法,用于量化贷款组合中的姓名集中风险,通过重要性抽样的蒙特卡洛模拟训练神经网络,展示了其相比现有方法在评估小型和集中组合中的姓名集中风险方面的准确性和优越性能。 - 随着全球经济的不断增长,生态系统服务往往停滞或减少。经济学理论已经揭示了如何将这种相对稀缺性的转变反映到公共项目评估和环境经济会计中,但缺乏实证证据来将理论付诸实践。为了估计可用于进行此类调整的生态系统服务相对价格变化,我们对环境价值评估研究进行了全球元分析,以推导出意愿支付收入弹性作为有限替代性程度的代理。基于749个收入-意愿支付对,我们估计意愿支付收入弹性约为0.78(95-CI:0.6至1.0)。将这些结果与生态系统服务相对稀缺性变化的全球数据集结合起来,我们估计生态系统服务相对价格每年约为2.2%。在对非木材林生态系统的自然资本估值中应用了这些结果。 + 我们提出了一种新的基于深度学习的方法,用于量化贷款组合中的姓名集中风险。我们的方法针对小型组合进行了定制,允许损失的精算定义和按市场价值核算定义。我们的神经网络的训练依赖于重要性抽样的蒙特卡洛模拟,我们明确为CreditRisk${+}$和基于评级的CreditMetrics模型制定了这一过程。基于模拟和真实数据的数值结果显示了我们新方法的准确性,以及与现有分析方法相比,在评估小型和集中组合中的姓名集中风险方面表现出的卓越性能。 - As the global economy continues to grow, ecosystem services tend to stagnate or degrow. Economic theory has shown how such shifts in relative scarcities can be reflected in the appraisal of public projects and environmental-economic accounting, but empirical evidence has been lacking to put the theory into practice. To estimate the relative price change in ecosystem services that can be used to make such adjustments, we perform a global meta-analysis of environmental valuation studies to derive income elasticities of willingness to pay (WTP) for ecosystem services as a proxy for the degree of limited substitutability. Based on 749 income-WTP pairs, we estimate an income elasticity of WTP of around 0.78 (95-CI: 0.6 to 1.0). Combining these results with a global data set on shifts in the relative scarcity of ecosystem services, we estimate relative price change of ecosystem services of around 2.2 percent per year. 
In an application to natural capital valuation of non-timber forest ecosys + arXiv:2403.16525v1 Announce Type: new Abstract: We propose a new deep learning approach for the quantification of name concentration risk in loan portfolios. Our approach is tailored for small portfolios and allows for both an actuarial as well as a mark-to-market definition of loss. The training of our neural network relies on Monte Carlo simulations with importance sampling which we explicitly formulate for the CreditRisk${+}$ and the ratings-based CreditMetrics model. Numerical results based on simulated as well as real data demonstrate the accuracy of our new approach and its superior performance compared to existing analytical methods for assessing name concentration risk in small and concentrated portfolios. + +[^2]: 针对金融指数跟踪的强化学习 + + Reinforcement Learning for Financial Index Tracking. (arXiv:2308.02820v1 [q-fin.PM]) + + [http://arxiv.org/abs/2308.02820](http://arxiv.org/abs/2308.02820) + + 本论文提出了针对金融指数跟踪问题的第一个具有动态性的离散时间无穷期模型,它克服了现有模型的一些局限,可以精确计算交易成本,同时考虑了跟踪误差和交易成本之间的权衡,并能有效利用长时间段的数据。我们使用深度强化学习方法解决该模型,解决了由于数据限制导致的问题。 + + + + 我们提出了第一个离散时间无穷期动态形式的金融指数跟踪问题,同时考虑到基于收益的跟踪误差和基于价值的跟踪误差。该模型克服了现有模型的局限性,包括不仅限于价格的市场信息变量的时间动态性,可以精确计算交易成本,考虑跟踪误差和交易成本之间的权衡,可以有效利用长时间段的数据等。该模型还引入了现金注入或提取的新的决策变量。我们提出了使用Banach不动点迭代求解投资组合再平衡方程的方法,可以准确计算实践中指定为交易量的非线性函数的交易成本。我们还提出了扩展深度强化学习(RL)方法来解决动态模型。我们的RL方法解决了由数据限制引起的问题。 + + We propose the first discrete-time infinite-horizon dynamic formulation of the financial index tracking problem under both return-based tracking error and value-based tracking error. The formulation overcomes the limitations of existing models by incorporating the intertemporal dynamics of market information variables not limited to prices, allowing exact calculation of transaction costs, accounting for the tradeoff between overall tracking error and transaction costs, allowing effective use of data in a long time period, etc. 
The formulation also allows novel decision variables of cash injection or withdraw. We propose to solve the portfolio rebalancing equation using a Banach fixed point iteration, which allows to accurately calculate the transaction costs specified as nonlinear functions of trading volumes in practice. We propose an extension of deep reinforcement learning (RL) method to solve the dynamic formulation. Our RL method resolves the issue of data limitation resulting fro diff --git a/q-fin.xml b/q-fin.xml index 4ac277b90..64337b8ad 100644 --- a/q-fin.xml +++ b/q-fin.xml @@ -1,21 +1,41 @@ -Chat Arxiv q-finhttps://github.com/qhduan/cn-chat-arxivThis is arxiv RSS feed for q-fin本研究通过全球元分析得出,生态系统服务相对价格每年约为2.2%,用于公共项目评估和环境经济会计中的调整。http://arxiv.org/abs/2308.04400<p> -有限的替代性、相对价格变动与公共自然资本价值的提升 +Chat Arxiv q-finhttps://github.com/qhduan/cn-chat-arxivThis is arxiv RSS feed for q-fin提出了一种基于深度学习的方法,用于量化贷款组合中的姓名集中风险,通过重要性抽样的蒙特卡洛模拟训练神经网络,展示了其相比现有方法在评估小型和集中组合中的姓名集中风险方面的准确性和优越性能。https://arxiv.org/abs/2403.16525<p> +基于深度学习的姓名集中风险度量方法 </p> <p> -Limited substitutability, relative price changes and the uplifting of public natural capital values. 
(arXiv:2308.04400v1 [econ.GN]) +Deep Learning Based Measure of Name Concentration Risk </p> <p> -http://arxiv.org/abs/2308.04400 +https://arxiv.org/abs/2403.16525 </p> <p> -本研究通过全球元分析得出,生态系统服务相对价格每年约为2.2%,用于公共项目评估和环境经济会计中的调整。 +提出了一种基于深度学习的方法,用于量化贷款组合中的姓名集中风险,通过重要性抽样的蒙特卡洛模拟训练神经网络,展示了其相比现有方法在评估小型和集中组合中的姓名集中风险方面的准确性和优越性能。 </p> <p> </p> <p> -随着全球经济的不断增长,生态系统服务往往停滞或减少。经济学理论已经揭示了如何将这种相对稀缺性的转变反映到公共项目评估和环境经济会计中,但缺乏实证证据来将理论付诸实践。为了估计可用于进行此类调整的生态系统服务相对价格变化,我们对环境价值评估研究进行了全球元分析,以推导出意愿支付收入弹性作为有限替代性程度的代理。基于749个收入-意愿支付对,我们估计意愿支付收入弹性约为0.78(95-CI:0.6至1.0)。将这些结果与生态系统服务相对稀缺性变化的全球数据集结合起来,我们估计生态系统服务相对价格每年约为2.2%。在对非木材林生态系统的自然资本估值中应用了这些结果。 +我们提出了一种新的基于深度学习的方法,用于量化贷款组合中的姓名集中风险。我们的方法针对小型组合进行了定制,允许损失的精算定义和按市场价值核算定义。我们的神经网络的训练依赖于重要性抽样的蒙特卡洛模拟,我们明确为CreditRisk${+}$和基于评级的CreditMetrics模型制定了这一过程。基于模拟和真实数据的数值结果显示了我们新方法的准确性,以及与现有分析方法相比,在评估小型和集中组合中的姓名集中风险方面表现出的卓越性能。 </p> <p> -As the global economy continues to grow, ecosystem services tend to stagnate or degrow. Economic theory has shown how such shifts in relative scarcities can be reflected in the appraisal of public projects and environmental-economic accounting, but empirical evidence has been lacking to put the theory into practice. To estimate the relative price change in ecosystem services that can be used to make such adjustments, we perform a global meta-analysis of environmental valuation studies to derive income elasticities of willingness to pay (WTP) for ecosystem services as a proxy for the degree of limited substitutability. Based on 749 income-WTP pairs, we estimate an income elasticity of WTP of around 0.78 (95-CI: 0.6 to 1.0). Combining these results with a global data set on shifts in the relative scarcity of ecosystem services, we estimate relative price change of ecosystem services of around 2.2 percent per year. 
In an application to natural capital valuation of non-timber forest ecosys +arXiv:2403.16525v1 Announce Type: new Abstract: We propose a new deep learning approach for the quantification of name concentration risk in loan portfolios. Our approach is tailored for small portfolios and allows for both an actuarial as well as a mark-to-market definition of loss. The training of our neural network relies on Monte Carlo simulations with importance sampling which we explicitly formulate for the CreditRisk${+}$ and the ratings-based CreditMetrics model. Numerical results based on simulated as well as real data demonstrate the accuracy of our new approach and its superior performance compared to existing analytical methods for assessing name concentration risk in small and concentrated portfolios. +</p>本论文提出了针对金融指数跟踪问题的第一个具有动态性的离散时间无穷期模型,它克服了现有模型的一些局限,可以精确计算交易成本,同时考虑了跟踪误差和交易成本之间的权衡,并能有效利用长时间段的数据。我们使用深度强化学习方法解决该模型,解决了由于数据限制导致的问题。http://arxiv.org/abs/2308.02820<p> +针对金融指数跟踪的强化学习 +</p> +<p> +Reinforcement Learning for Financial Index Tracking. (arXiv:2308.02820v1 [q-fin.PM]) +</p> +<p> +http://arxiv.org/abs/2308.02820 +</p> +<p> +本论文提出了针对金融指数跟踪问题的第一个具有动态性的离散时间无穷期模型,它克服了现有模型的一些局限,可以精确计算交易成本,同时考虑了跟踪误差和交易成本之间的权衡,并能有效利用长时间段的数据。我们使用深度强化学习方法解决该模型,解决了由于数据限制导致的问题。 +</p> +<p> + +</p> +<p> +我们提出了第一个离散时间无穷期动态形式的金融指数跟踪问题,同时考虑到基于收益的跟踪误差和基于价值的跟踪误差。该模型克服了现有模型的局限性,包括不仅限于价格的市场信息变量的时间动态性,可以精确计算交易成本,考虑跟踪误差和交易成本之间的权衡,可以有效利用长时间段的数据等。该模型还引入了现金注入或提取的新的决策变量。我们提出了使用Banach不动点迭代求解投资组合再平衡方程的方法,可以准确计算实践中指定为交易量的非线性函数的交易成本。我们还提出了扩展深度强化学习(RL)方法来解决动态模型。我们的RL方法解决了由数据限制引起的问题。 +</p> +<p> +We propose the first discrete-time infinite-horizon dynamic formulation of the financial index tracking problem under both return-based tracking error and value-based tracking error. 
The formulation overcomes the limitations of existing models by incorporating the intertemporal dynamics of market information variables not limited to prices, allowing exact calculation of transaction costs, accounting for the tradeoff between overall tracking error and transaction costs, allowing effective use of data in a long time period, etc. The formulation also allows novel decision variables of cash injection or withdraw. We propose to solve the portfolio rebalancing equation using a Banach fixed point iteration, which allows to accurately calculate the transaction costs specified as nonlinear functions of trading volumes in practice. We propose an extension of deep reinforcement learning (RL) method to solve the dynamic formulation. Our RL method resolves the issue of data limitation resulting fro </p> \ No newline at end of file diff --git a/stat.ML.md b/stat.ML.md index 9746f0fd1..2cf304a59 100644 --- a/stat.ML.md +++ b/stat.ML.md @@ -2,52 +2,172 @@ | Ref | Title | Summary | | --- | --- | --- | -| [^1] | [Covariance-Adaptive Least-Squares Algorithm for Stochastic Combinatorial Semi-Bandits](https://arxiv.org/abs/2402.15171) | 提出了一种协方差自适应的最小二乘算法,利用在线估计协方差结构,相对于基于代理方差的算法获得改进的遗憾上界,特别在协方差系数全为非负时,能有效地利用半臂反馈,并在各种参数设置下表现优异。 | -| [^2] | [Convergence of Dirichlet Forms for MCMC Optimal Scaling with Dependent Target Distributions on Large Graphs.](http://arxiv.org/abs/2210.17042) | 本文利用Dirichlet形式的Mosco收敛性分析了在大图上的随机游走Metropolis(RWM)算法,证明了RWM算法的最优比例缩放具有收敛性,将已知的几个结果推广到了大图上的依赖目标分布的情况,并为大图上的MCMC算法开辟了许多新的可能性。 | -| [^3] | [Time-Varying Parameters as Ridge Regressions.](http://arxiv.org/abs/2009.00401) | 该论文提出了一种实际上是基于岭回归的时变参数模型,这比传统的状态空间方法计算更快,调整更容易,有助于研究经济结构性变化。 | +| [^1] | [Robust Learning for Optimal Dynamic Treatment Regimes with Observational Data](https://arxiv.org/abs/2404.00221) | 学习利用观测数据提出了一种逐步双重强健方法,通过向后归纳解决了最佳动态治疗方案的问题 | +| [^2] | [AdaTrans: Feature-wise and Sample-wise Adaptive Transfer Learning for High-dimensional 
Regression](https://arxiv.org/abs/2403.13565) | 提出了一种针对高维回归的自适应迁移学习方法,可以根据可迁移结构自适应检测和聚合特征和样本的可迁移结构。 | +| [^3] | [Interpretable Machine Learning for Survival Analysis](https://arxiv.org/abs/2403.10250) | 可解释的机器学习在生存分析中的应用促进了透明度和公平性,揭示了模型的潜在偏见和限制,并提供了更符合数学原理的特征影响和风险因素预测方法。 | +| [^4] | [When Your AI Deceives You: Challenges with Partial Observability of Human Evaluators in Reward Learning](https://arxiv.org/abs/2402.17747) | RLHF在考虑部分观察性时可能导致策略欺骗性地夸大性能或过度辩护行为,我们提出了数学条件来解决这些问题,并警告不要盲目应用RLHF在部分可观测情况下。 | +| [^5] | [Gradient descent induces alignment between weights and the empirical NTK for deep non-linear networks](https://arxiv.org/abs/2402.05271) | 了解神经网络从输入-标签对中提取统计信息的机制是监督学习中最重要的未解决问题之一。前人的研究表明,在训练过程中,权重的格拉姆矩阵与模型的平均梯度外积成正比,这被称为神经特征分析(NFA)。本研究解释了这种相关性的出现,并发现NFA等价于权重矩阵的左奇异结构与与这些权重相关的经验神经切线核的显著成分之间的对齐。在早期训练阶段,可以通过解析的方式预测NFA的发展速度。 | +| [^6] | [Interpretable Multi-Source Data Fusion Through Latent Variable Gaussian Process](https://arxiv.org/abs/2402.04146) | 这篇论文提出了一种基于潜变量高斯过程的多源数据融合框架,用于解决多个数据源之间质量和全面性差异给系统优化带来的问题。 | +| [^7] | [Bayesian Quantile Regression with Subset Selection: A Posterior Summarization Perspective.](http://arxiv.org/abs/2311.02043) | 本研究提出了一种基于贝叶斯决策分析的方法,对于任何贝叶斯回归模型,可以得到每个条件分位数的最佳和可解释的线性估计值和不确定性量化。该方法是一种适用于特定分位数子集选择的有效工具。 | +| [^8] | [Model-Agnostic Covariate-Assisted Inference on Partially Identified Causal Effects.](http://arxiv.org/abs/2310.08115) | 提出了一种模型不可知的推断方法,在部分可辨识的因果估计中应用广泛。该方法基于最优输运问题的对偶理论,能够适应随机实验和观测研究,并且具有统一有效和双重鲁棒性。 | +| [^9] | [A Model-Agnostic Graph Neural Network for Integrating Local and Global Information.](http://arxiv.org/abs/2309.13459) | MaGNet是一种模型无关的图神经网络框架,能够顺序地整合不同顺序的信息,并通过识别有影响力的紧凑图结构提供有意义且可解释的结果。 | +| [^10] | [Model-based Clustering using Non-parametric Hidden Markov Models.](http://arxiv.org/abs/2309.12238) | 本文研究了使用非参数隐马尔可夫模型进行基于模型的聚类时的贝叶斯风险,并提出了相应的聚类方法。通过研究分类的贝叶斯风险和聚类的贝叶斯风险之间的关系,确定了聚类任务的难度。同时,在插值分类器和在线设置中的结果也得到了证明。模拟实验验证了这些发现。 | +| [^11] | [Optimal and Fair Encouragement Policy Evaluation and 
Learning.](http://arxiv.org/abs/2309.07176) | 本研究探讨了在关键领域中针对鼓励政策的最优和公平评估以及学习的问题,研究发现在人类不遵循治疗建议的情况下,最优策略规则只是建议。同时,针对治疗的异质性和公平考虑因素,决策者的权衡和决策规则也会发生变化。在社会服务领域,研究显示存在一个使用差距问题,那些最有可能受益的人却无法获得这些益服务。 | # 详细 -[^1]: 用于随机组合半臂老虎机的协方差自适应最小二乘算法 +[^1]: 利用观测数据进行强健学习以获得最佳动态治疗方案 - Covariance-Adaptive Least-Squares Algorithm for Stochastic Combinatorial Semi-Bandits + Robust Learning for Optimal Dynamic Treatment Regimes with Observational Data - [https://arxiv.org/abs/2402.15171](https://arxiv.org/abs/2402.15171) + [https://arxiv.org/abs/2404.00221](https://arxiv.org/abs/2404.00221) - 提出了一种协方差自适应的最小二乘算法,利用在线估计协方差结构,相对于基于代理方差的算法获得改进的遗憾上界,特别在协方差系数全为非负时,能有效地利用半臂反馈,并在各种参数设置下表现优异。 + 学习利用观测数据提出了一种逐步双重强健方法,通过向后归纳解决了最佳动态治疗方案的问题 - 我们解决了随机组合半臂老虎机问题,其中玩家可以从包含d个基本项的P个子集中进行选择。大多数现有算法(如CUCB、ESCB、OLS-UCB)需要对奖励分布有先验知识,比如子高斯代理-方差的上界,这很难准确估计。在这项工作中,我们设计了OLS-UCB的方差自适应版本,依赖于协方差结构的在线估计。在实际设置中,估计协方差矩阵的系数要容易得多,并且相对于基于代理方差的算法,导致改进的遗憾上界。当协方差系数全为非负时,我们展示了我们的方法有效地利用了半臂反馈,并且可以明显优于老虎机反馈方法,在指数级别P≫d以及P≤d的情况下,这一点并不来自大多数现有分析。 + 许多公共政策和医疗干预涉及其治疗分配中的动态性,治疗通常依据先前治疗的历史和相关特征对每个阶段的效果具有异质性。本文研究了统计学习最佳动态治疗方案(DTR),根据个体的历史指导每个阶段的最佳治疗分配。我们提出了一种基于观测数据的逐步双重强健方法,在顺序可忽略性假设下学习最佳DTR。该方法通过向后归纳解决了顺序治疗分配问题,在每一步中,我们结合倾向评分和行动值函数(Q函数)的估计量,构建了政策价值的增强反向概率加权估计量。 - arXiv:2402.15171v1 Announce Type: new Abstract: We address the problem of stochastic combinatorial semi-bandits, where a player can select from P subsets of a set containing d base items. Most existing algorithms (e.g. CUCB, ESCB, OLS-UCB) require prior knowledge on the reward distribution, like an upper bound on a sub-Gaussian proxy-variance, which is hard to estimate tightly. In this work, we design a variance-adaptive version of OLS-UCB, relying on an online estimation of the covariance structure. Estimating the coefficients of a covariance matrix is much more manageable in practical settings and results in improved regret upper bounds compared to proxy variance-based algorithms. 
When covariance coefficients are all non-negative, we show that our approach efficiently leverages the semi-bandit feedback and provably outperforms bandit feedback approaches, not only in exponential regimes where P $\gg$ d but also when P $\le$ d, which is not straightforward from most existing analyses. + arXiv:2404.00221v1 Announce Type: cross Abstract: Many public policies and medical interventions involve dynamics in their treatment assignments, where treatments are sequentially assigned to the same individuals across multiple stages, and the effect of treatment at each stage is usually heterogeneous with respect to the history of prior treatments and associated characteristics. We study statistical learning of optimal dynamic treatment regimes (DTRs) that guide the optimal treatment assignment for each individual at each stage based on the individual's history. We propose a step-wise doubly-robust approach to learn the optimal DTR using observational data under the assumption of sequential ignorability. The approach solves the sequential treatment assignment problem through backward induction, where, at each step, we combine estimators of propensity scores and action-value functions (Q-functions) to construct augmented inverse probability weighting estimators of values of policies -[^2]: 依赖于大图的MCMC最优比例缩放的Dirichlet形式的收敛性 +[^2]: AdaTrans:针对高维回归的特征自适应与样本自适应迁移学习 - Convergence of Dirichlet Forms for MCMC Optimal Scaling with Dependent Target Distributions on Large Graphs. 
(arXiv:2210.17042v2 [math.ST] UPDATED) + AdaTrans: Feature-wise and Sample-wise Adaptive Transfer Learning for High-dimensional Regression - [http://arxiv.org/abs/2210.17042](http://arxiv.org/abs/2210.17042) + [https://arxiv.org/abs/2403.13565](https://arxiv.org/abs/2403.13565) - 本文利用Dirichlet形式的Mosco收敛性分析了在大图上的随机游走Metropolis(RWM)算法,证明了RWM算法的最优比例缩放具有收敛性,将已知的几个结果推广到了大图上的依赖目标分布的情况,并为大图上的MCMC算法开辟了许多新的可能性。 + 提出了一种针对高维回归的自适应迁移学习方法,可以根据可迁移结构自适应检测和聚合特征和样本的可迁移结构。 - Markov Chain Monte Carlo (MCMC)算法在统计学、物理学、机器学习等方面发挥了重要作用,并且对于一些高维问题,它们是唯一已知的通用和有效的方法。本文利用Dirichlet形式的Mosco收敛性分析了在大图上的随机游走Metropolis(RWM)算法,其目标分布是包括任何满足Markov性质的概率测度的Gibbs测度。Dirichlet形式的抽象且强大的理论使我们能够直接和自然地在无限维空间上工作,我们的Mosco收敛性概念允许与RWM链相关联的Dirichlet形式位于变化的图序列上,其中图的大小可以是无界的,图可以是相关的。我们证明了在强空间依赖性存在的情况下,RWM算法的最优比例缩放具有收敛性。我们的结果将已知的几个结果推广到了大图上的依赖目标分布的情况,并为大图上的MCMC算法开辟了许多新的可能性。 + 我们考虑高维背景下的迁移学习问题,在该问题中,特征维度大于样本大小。为了学习可迁移的信息,该信息可能在特征或源样本之间变化,我们提出一种自适应迁移学习方法,可以检测和聚合特征-wise (F-AdaTrans)或样本-wise (S-AdaTrans)可迁移结构。我们通过采用一种新颖的融合惩罚方法,结合权重,可以根据可迁移结构进行调整。为了选择权重,我们提出了一个在理论上建立,数据驱动的过程,使得 F-AdaTrans 能够选择性地将可迁移的信号与目标融合在一起,同时滤除非可迁移的信号,S-AdaTrans则可以获得每个源样本传递的信息的最佳组合。我们建立了非渐近速率,可以在特殊情况下恢复现有的近最小似乎最优速率。效果证明... - Markov chain Monte Carlo (MCMC) algorithms have played a significant role in statistics, physics, machine learning and others, and they are the only known general and efficient approach for some high-dimensional problems. The random walk Metropolis (RWM) algorithm as the most classical MCMC algorithm, has had a great influence on the development and practice of science and engineering. The behavior of the RWM algorithm in high-dimensional problems is typically investigated through a weak convergence result of diffusion processes. In this paper, we utilize the Mosco convergence of Dirichlet forms in analyzing the RWM algorithm on large graphs, whose target distribution is the Gibbs measure that includes any probability measure satisfying a Markov property. 
The abstract and powerful theory of Dirichlet forms allows us to work directly and naturally on the infinite-dimensional space, and our notion of Mosco convergence allows Dirichlet forms associated with the RWM chains to lie on changi + arXiv:2403.13565v1 Announce Type: cross Abstract: We consider the transfer learning problem in the high dimensional setting, where the feature dimension is larger than the sample size. To learn transferable information, which may vary across features or the source samples, we propose an adaptive transfer learning method that can detect and aggregate the feature-wise (F-AdaTrans) or sample-wise (S-AdaTrans) transferable structures. We achieve this by employing a novel fused-penalty, coupled with weights that can adapt according to the transferable structure. To choose the weight, we propose a theoretically informed, data-driven procedure, enabling F-AdaTrans to selectively fuse the transferable signals with the target while filtering out non-transferable signals, and S-AdaTrans to obtain the optimal combination of information transferred from each source sample. The non-asymptotic rates are established, which recover existing near-minimax optimal rates in special cases. The effectivene -[^3]: 使用岭回归法的时变参数模型 +[^3]: 可解释的机器学习用于生存分析 - Time-Varying Parameters as Ridge Regressions. 
(arXiv:2009.00401v3 [econ.EM] UPDATED) + Interpretable Machine Learning for Survival Analysis - [http://arxiv.org/abs/2009.00401](http://arxiv.org/abs/2009.00401) + [https://arxiv.org/abs/2403.10250](https://arxiv.org/abs/2403.10250) - 该论文提出了一种实际上是基于岭回归的时变参数模型,这比传统的状态空间方法计算更快,调整更容易,有助于研究经济结构性变化。 + 可解释的机器学习在生存分析中的应用促进了透明度和公平性,揭示了模型的潜在偏见和限制,并提供了更符合数学原理的特征影响和风险因素预测方法。 - 时变参数模型(TVPs)经常被用于经济学中来捕捉结构性变化。我强调了一个被忽视的事实——这些实际上是岭回归。这使得计算、调整和实现比状态空间范式更容易。在高维情况下,解决等价的双重岭问题的计算非常快,关键的“时间变化量”通常是由交叉验证来调整的。使用两步回归岭回归来处理不断变化的波动性。我考虑了基于稀疏性(算法选择哪些参数变化, 哪些不变)和降低秩约束的扩展(变化与因子模型相关联)。为了展示这种方法的有用性, 我使用它来研究加拿大货币政策的演变, 并使用大规模时变局部投影估计约4600个TVPs, 这一任务完全可以利用这种新方法完成。 + 随着黑盒机器学习模型的传播和快速进步,可解释的机器学习(IML)领域或可解释的人工智能(XAI)在过去十年中变得越来越重要。 这在生存分析领域尤为重要,其中采用IML技术促进了透明度、问责制和公平性,特别是在临床决策过程、有针对性疗法的开发、干预或其他医学或与医疗保健相关的环境中。 具体来说,可解释性可以揭示生存模型的潜在偏见和局限性,并提供更符合数学原理的方法来理解哪些特征对预测有影响或构成风险因素。 然而,缺乏即时可用的IML方法可能已经阻碍了医学从业者和公共卫生政策制定者充分利用机器学习的潜力。 - Time-varying parameters (TVPs) models are frequently used in economics to capture structural change. I highlight a rather underutilized fact -- that these are actually ridge regressions. Instantly, this makes computations, tuning, and implementation much easier than in the state-space paradigm. Among other things, solving the equivalent dual ridge problem is computationally very fast even in high dimensions, and the crucial "amount of time variation" is tuned by cross-validation. Evolving volatility is dealt with using a two-step ridge regression. I consider extensions that incorporate sparsity (the algorithm selects which parameters vary and which do not) and reduced-rank restrictions (variation is tied to a factor model). To demonstrate the usefulness of the approach, I use it to study the evolution of monetary policy in Canada using large time-varying local projections. The application requires the estimation of about 4600 TVPs, a task well within the reach of the new method. 
+ arXiv:2403.10250v1 Announce Type: cross Abstract: With the spread and rapid advancement of black box machine learning models, the field of interpretable machine learning (IML) or explainable artificial intelligence (XAI) has become increasingly important over the last decade. This is particularly relevant for survival analysis, where the adoption of IML techniques promotes transparency, accountability and fairness in sensitive areas, such as clinical decision making processes, the development of targeted therapies, interventions or in other medical or healthcare related contexts. More specifically, explainability can uncover a survival model's potential biases and limitations and provide more mathematically sound ways to understand how and which features are influential for prediction or constitute risk factors. However, the lack of readily available IML methods may have deterred medical practitioners and policy makers in public health from leveraging the full potential of machine lea + +[^4]: 当你的AI欺骗你:在奖励学习中人类评估者部分可观测性的挑战 + + When Your AI Deceives You: Challenges with Partial Observability of Human Evaluators in Reward Learning + + [https://arxiv.org/abs/2402.17747](https://arxiv.org/abs/2402.17747) + + RLHF在考虑部分观察性时可能导致策略欺骗性地夸大性能或过度辩护行为,我们提出了数学条件来解决这些问题,并警告不要盲目应用RLHF在部分可观测情况下。 + + + + 强化学习从人类反馈(RLHF)的过去分析假设人类完全观察到环境。当人类反馈仅基于部分观察时会发生什么?我们对两种失败情况进行了正式定义:欺骗和过度辩护。通过将人类建模为对轨迹信念的Boltzmann-理性,我们证明了RLHF保证会导致策略欺骗性地夸大其性能、为了留下印象而过度辩护或者两者兼而有之的条件。为了帮助解决这些问题,我们数学地刻画了环境部分可观测性如何转化为(缺乏)学到的回报函数中的模糊性。在某些情况下,考虑环境部分可观测性使得在理论上可能恢复回报函数和最优策略,而在其他情况下,存在不可减少的模糊性。我们警告不要盲目应用RLHF在部分可观测情况下。 + + arXiv:2402.17747v1 Announce Type: cross Abstract: Past analyses of reinforcement learning from human feedback (RLHF) assume that the human fully observes the environment. What happens when human feedback is based only on partial observations? We formally define two failure cases: deception and overjustification. Modeling the human as Boltzmann-rational w.r.t. 
a belief over trajectories, we prove conditions under which RLHF is guaranteed to result in policies that deceptively inflate their performance, overjustify their behavior to make an impression, or both. To help address these issues, we mathematically characterize how partial observability of the environment translates into (lack of) ambiguity in the learned return function. In some cases, accounting for partial observability makes it theoretically possible to recover the return function and thus the optimal policy, while in other cases, there is irreducible ambiguity. We caution against blindly applying RLHF in partially observa + +[^5]: 梯度下降引发了深度非线性网络权重与经验NTK之间的对齐 + + Gradient descent induces alignment between weights and the empirical NTK for deep non-linear networks + + [https://arxiv.org/abs/2402.05271](https://arxiv.org/abs/2402.05271) + + 了解神经网络从输入-标签对中提取统计信息的机制是监督学习中最重要的未解决问题之一。前人的研究表明,在训练过程中,权重的格拉姆矩阵与模型的平均梯度外积成正比,这被称为神经特征分析(NFA)。本研究解释了这种相关性的出现,并发现NFA等价于权重矩阵的左奇异结构与与这些权重相关的经验神经切线核的显著成分之间的对齐。在早期训练阶段,可以通过解析的方式预测NFA的发展速度。 + + + + 理解神经网络从输入-标签对中提取统计信息的机制是监督学习中最重要的未解决问题之一。先前的研究已经确定,在一般结构的训练神经网络中,权重的格拉姆矩阵与模型的平均梯度外积成正比,这个说法被称为神经特征分析(NFA)。然而,这些数量在训练过程中如何相关尚不清楚。在这项工作中,我们解释了这种相关性的出现。我们发现NFA等价于权重矩阵的左奇异结构与与这些权重相关的经验神经切线核的显著成分之间的对齐。我们证明了先前研究中引入的NFA是由隔离这种对齐的中心化NFA驱动的。我们还展示了在早期训练阶段,可以通过解析的方式预测NFA的发展速度。 + + Understanding the mechanisms through which neural networks extract statistics from input-label pairs is one of the most important unsolved problems in supervised learning. Prior works have identified that the gram matrices of the weights in trained neural networks of general architectures are proportional to the average gradient outer product of the model, in a statement known as the Neural Feature Ansatz (NFA). However, the reason these quantities become correlated during training is poorly understood. In this work, we explain the emergence of this correlation. 
We identify that the NFA is equivalent to alignment between the left singular structure of the weight matrices and a significant component of the empirical neural tangent kernels associated with those weights. We establish that the NFA introduced in prior works is driven by a centered NFA that isolates this alignment. We show that the speed of NFA development can be predicted analytically at early training times in terms of sim + +[^6]: 可解释的多源数据融合通过潜变量高斯过程 + + Interpretable Multi-Source Data Fusion Through Latent Variable Gaussian Process + + [https://arxiv.org/abs/2402.04146](https://arxiv.org/abs/2402.04146) + + 这篇论文提出了一种基于潜变量高斯过程的多源数据融合框架,用于解决多个数据源之间质量和全面性差异给系统优化带来的问题。 + + + + 随着人工智能(AI)和机器学习(ML)的出现,各个科学和工程领域已经利用数据驱动的替代模型来建模来自大量信息源(数据)的复杂系统。这种增加导致了开发出用于执行特定功能的优越系统所需的成本和时间的显著降低。这样的替代模型往往广泛地融合多个数据来源,可能是发表的论文、专利、开放资源库或其他资源。然而,对于已知和未知的信息来源的基础物理参数的质量和全面性的差异,可能对系统优化过程产生后续影响,却没有得到充分的关注。为了解决这个问题,提出了一种基于潜变量高斯过程(LVGP)的多源数据融合框架。 + + With the advent of artificial intelligence (AI) and machine learning (ML), various domains of science and engineering communites has leveraged data-driven surrogates to model complex systems from numerous sources of information (data). The proliferation has led to significant reduction in cost and time involved in development of superior systems designed to perform specific functionalities. A high proposition of such surrogates are built extensively fusing multiple sources of data, may it be published papers, patents, open repositories, or other resources. However, not much attention has been paid to the differences in quality and comprehensiveness of the known and unknown underlying physical parameters of the information sources that could have downstream implications during system optimization. Towards resolving this issue, a multi-source data fusion framework based on Latent Variable Gaussian Process (LVGP) is proposed. 
The individual data sources are tagged as a characteristic cate + +[^7]: 基于子集选择的贝叶斯分位回归:后验总结视角 + + Bayesian Quantile Regression with Subset Selection: A Posterior Summarization Perspective. (arXiv:2311.02043v1 [stat.ME]) + + [http://arxiv.org/abs/2311.02043](http://arxiv.org/abs/2311.02043) + + 本研究提出了一种基于贝叶斯决策分析的方法,对于任何贝叶斯回归模型,可以得到每个条件分位数的最佳和可解释的线性估计值和不确定性量化。该方法是一种适用于特定分位数子集选择的有效工具。 + + + + 分位回归是一种强大的工具,用于推断协变量如何影响响应分布的特定分位数。现有方法要么分别估计每个感兴趣分位数的条件分位数,要么使用半参数或非参数模型估计整个条件分布。前者经常产生不适合实际数据的模型,并且不在分位数之间共享信息,而后者则以复杂且受限制的模型为特点,难以解释和计算效率低下。此外,这两种方法都不适合于特定分位数的子集选择。相反,我们从贝叶斯决策分析的角度出发,提出了线性分位估计、不确定性量化和子集选择的基本问题。对于任何贝叶斯回归模型,我们为每个基于模型的条件分位数推导出最佳和可解释的线性估计值和不确定性量化。我们的方法引入了一种分位数聚焦的方法。 + + Quantile regression is a powerful tool for inferring how covariates affect specific percentiles of the response distribution. Existing methods either estimate conditional quantiles separately for each quantile of interest or estimate the entire conditional distribution using semi- or non-parametric models. The former often produce inadequate models for real data and do not share information across quantiles, while the latter are characterized by complex and constrained models that can be difficult to interpret and computationally inefficient. Further, neither approach is well-suited for quantile-specific subset selection. Instead, we pose the fundamental problems of linear quantile estimation, uncertainty quantification, and subset selection from a Bayesian decision analysis perspective. For any Bayesian regression model, we derive optimal and interpretable linear estimates and uncertainty quantification for each model-based conditional quantile. Our approach introduces a quantile-focu + +[^8]: 模型不可知的辅助推断方法在部分可辨识因果效应上的应用 + + Model-Agnostic Covariate-Assisted Inference on Partially Identified Causal Effects. 
(arXiv:2310.08115v1 [econ.EM]) + + [http://arxiv.org/abs/2310.08115](http://arxiv.org/abs/2310.08115) + + 提出了一种模型不可知的推断方法,在部分可辨识的因果估计中应用广泛。该方法基于最优输运问题的对偶理论,能够适应随机实验和观测研究,并且具有统一有效和双重鲁棒性。 + + + + 很多因果估计是部分可辨识的,因为它们依赖于潜在结果之间的不可观察联合分布。基于前处理协变量的分层可以获得更明确的部分可辨识性范围;然而,除非协变量为离散且支撑度相对较小,否则这种方法通常需要对给定协变量的潜在结果的条件分布进行一致估计。因此,现有的方法在模型错误或一致性假设被违反时可能失败。在本研究中,我们提出了一种基于最优输运问题的对偶理论的统一且模型不可知的推断方法,适用于广泛类别的部分可辨识估计。在随机实验中,我们的方法可以结合任何对条件分布的估计,并提供统一有效的推断,即使初始估计是任意不准确的。此外,我们的方法在观测研究中也是双重鲁棒的。 + + Many causal estimands are only partially identifiable since they depend on the unobservable joint distribution between potential outcomes. Stratification on pretreatment covariates can yield sharper partial identification bounds; however, unless the covariates are discrete with relatively small support, this approach typically requires consistent estimation of the conditional distributions of the potential outcomes given the covariates. Thus, existing approaches may fail under model misspecification or if consistency assumptions are violated. In this study, we propose a unified and model-agnostic inferential approach for a wide class of partially identified estimands, based on duality theory for optimal transport problems. In randomized experiments, our approach can wrap around any estimates of the conditional distributions and provide uniformly valid inference, even if the initial estimates are arbitrarily inaccurate. Also, our approach is doubly robust in observational studies. Notab + +[^9]: 模型无关的图神经网络用于整合局部和全局信息的研究 + + A Model-Agnostic Graph Neural Network for Integrating Local and Global Information. 
(arXiv:2309.13459v1 [stat.ML]) + + [http://arxiv.org/abs/2309.13459](http://arxiv.org/abs/2309.13459) + + MaGNet是一种模型无关的图神经网络框架,能够顺序地整合不同顺序的信息,并通过识别有影响力的紧凑图结构提供有意义且可解释的结果。 + + + + 图神经网络(GNNs)在各种以图为重点的任务中取得了令人满意的性能。尽管取得了成功,但现有的GNN存在两个重要限制:由于黑盒特性,结果缺乏可解释性;无法学习不同顺序的表示。为了解决这些问题,我们提出了一种新的模型无关的图神经网络(MaGNet)框架,能够顺序地整合不同顺序的信息,从高阶邻居中提取知识,并通过识别有影响力的紧凑图结构提供有意义且可解释的结果。特别地,MaGNet由两个组件组成:图拓扑下复杂关系的潜在表示的估计模型和识别有影响力的节点、边和重要节点特征的解释模型。从理论上,我们通过经验Rademacher复杂度建立了MaGNet的泛化误差界,并展示了其强大的能力。 + + Graph Neural Networks (GNNs) have achieved promising performance in a variety of graph-focused tasks. Despite their success, existing GNNs suffer from two significant limitations: a lack of interpretability in results due to their black-box nature, and an inability to learn representations of varying orders. To tackle these issues, we propose a novel Model-agnostic Graph Neural Network (MaGNet) framework, which is able to sequentially integrate information of various orders, extract knowledge from high-order neighbors, and provide meaningful and interpretable results by identifying influential compact graph structures. In particular, MaGNet consists of two components: an estimation model for the latent representation of complex relationships under graph topology, and an interpretation model that identifies influential nodes, edges, and important node features. Theoretically, we establish the generalization error bound for MaGNet via empirical Rademacher complexity, and showcase its pow + +[^10]: 使用非参数隐马尔可夫模型的基于模型的聚类 + + Model-based Clustering using Non-parametric Hidden Markov Models. 
(arXiv:2309.12238v1 [math.ST]) + + [http://arxiv.org/abs/2309.12238](http://arxiv.org/abs/2309.12238) + + 本文研究了使用非参数隐马尔可夫模型进行基于模型的聚类时的贝叶斯风险,并提出了相应的聚类方法。通过研究分类的贝叶斯风险和聚类的贝叶斯风险之间的关系,确定了聚类任务的难度。同时,在插值分类器和在线设置中的结果也得到了证明。模拟实验验证了这些发现。 + + + + 非参数隐马尔可夫模型(HMM)由于其依赖结构,可以在不指定群组分布的情况下进行基于模型的聚类。本文研究了在使用HMM进行聚类时的贝叶斯风险,并提出了相应的聚类方法。首先,我们给出了将分类的贝叶斯风险与聚类的贝叶斯风险联系起来的结果,用以确定聚类任务的难度的关键数量。我们还在独立同分布的框架下证明了这一结果,这可能具有独立的兴趣。然后我们研究了插值分类器的过度风险。所有这些结果都被证明在在线设置中仍然有效,在该设置下,观测结果被顺序聚类。模拟实验证明了我们的发现。 + + Thanks to their dependency structure, non-parametric Hidden Markov Models (HMMs) are able to handle model-based clustering without specifying group distributions. The aim of this work is to study the Bayes risk of clustering when using HMMs and to propose associated clustering procedures. We first give a result linking the Bayes risk of classification and the Bayes risk of clustering, which we use to identify the key quantity determining the difficulty of the clustering task. We also give a proof of this result in the i.i.d. framework, which might be of independent interest. Then we study the excess risk of the plugin classifier. All these results are shown to remain valid in the online setting where observations are clustered sequentially. Simulations illustrate our findings. + +[^11]: 最优和公平的鼓励政策评估与学习 + + Optimal and Fair Encouragement Policy Evaluation and Learning. 
(arXiv:2309.07176v1 [cs.LG]) + + [http://arxiv.org/abs/2309.07176](http://arxiv.org/abs/2309.07176) + + 本研究探讨了在关键领域中针对鼓励政策的最优和公平评估以及学习的问题,研究发现在人类不遵循治疗建议的情况下,最优策略规则只是建议。同时,针对治疗的异质性和公平考虑因素,决策者的权衡和决策规则也会发生变化。在社会服务领域,研究显示存在一个使用差距问题,那些最有可能受益的人却无法获得这些益服务。 + + + + 在关键领域中,强制个体接受治疗通常是不可能的,因此在人类不遵循治疗建议的情况下,最优策略规则只是建议。在这些领域中,接受治疗的个体可能存在异质性,治疗效果也可能存在异质性。虽然最优治疗规则可以最大化整个人群的因果结果,但在鼓励的情况下,对于访问平等限制或其他公平考虑因素可能是相关的。例如,在社会服务领域,一个持久的难题是那些最有可能从中受益的人中那些获益服务的使用差距。当决策者对访问和平均结果都有分配偏好时,最优决策规则会发生变化。我们研究了因果识别、统计方差减少估计和稳健估计的最优治疗规则,包括在违反阳性条件的情况下。 + + In consequential domains, it is often impossible to compel individuals to take treatment, so that optimal policy rules are merely suggestions in the presence of human non-adherence to treatment recommendations. In these same domains, there may be heterogeneity both in who responds in taking-up treatment, and heterogeneity in treatment efficacy. While optimal treatment rules can maximize causal outcomes across the population, access parity constraints or other fairness considerations can be relevant in the case of encouragement. For example, in social services, a persistent puzzle is the gap in take-up of beneficial services among those who may benefit from them the most. When in addition the decision-maker has distributional preferences over both access and average outcomes, the optimal decision rule changes. We study causal identification, statistical variance-reduced estimation, and robust estimation of optimal treatment rules, including under potential violations of positivity. 
We c diff --git a/stat.ML.xml b/stat.ML.xml index 0a5cc4888..c723ce92d 100644 --- a/stat.ML.xml +++ b/stat.ML.xml @@ -1,61 +1,221 @@ -Chat Arxiv stat.MLhttps://github.com/qhduan/cn-chat-arxivThis is arxiv RSS feed for stat.ML提出了一种协方差自适应的最小二乘算法,利用在线估计协方差结构,相对于基于代理方差的算法获得改进的遗憾上界,特别在协方差系数全为非负时,能有效地利用半臂反馈,并在各种参数设置下表现优异。https://arxiv.org/abs/2402.15171<p> -用于随机组合半臂老虎机的协方差自适应最小二乘算法 +Chat Arxiv stat.MLhttps://github.com/qhduan/cn-chat-arxivThis is arxiv RSS feed for stat.ML学习利用观测数据提出了一种逐步双重强健方法,通过向后归纳解决了最佳动态治疗方案的问题https://arxiv.org/abs/2404.00221<p> +利用观测数据进行强健学习以获得最佳动态治疗方案 </p> <p> -Covariance-Adaptive Least-Squares Algorithm for Stochastic Combinatorial Semi-Bandits +Robust Learning for Optimal Dynamic Treatment Regimes with Observational Data </p> <p> -https://arxiv.org/abs/2402.15171 +https://arxiv.org/abs/2404.00221 </p> <p> -提出了一种协方差自适应的最小二乘算法,利用在线估计协方差结构,相对于基于代理方差的算法获得改进的遗憾上界,特别在协方差系数全为非负时,能有效地利用半臂反馈,并在各种参数设置下表现优异。 +学习利用观测数据提出了一种逐步双重强健方法,通过向后归纳解决了最佳动态治疗方案的问题 </p> <p> </p> <p> -我们解决了随机组合半臂老虎机问题,其中玩家可以从包含d个基本项的P个子集中进行选择。大多数现有算法(如CUCB、ESCB、OLS-UCB)需要对奖励分布有先验知识,比如子高斯代理-方差的上界,这很难准确估计。在这项工作中,我们设计了OLS-UCB的方差自适应版本,依赖于协方差结构的在线估计。在实际设置中,估计协方差矩阵的系数要容易得多,并且相对于基于代理方差的算法,导致改进的遗憾上界。当协方差系数全为非负时,我们展示了我们的方法有效地利用了半臂反馈,并且可以明显优于老虎机反馈方法,在指数级别P≫d以及P≤d的情况下,这一点并不来自大多数现有分析。 +许多公共政策和医疗干预涉及其治疗分配中的动态性,治疗通常依据先前治疗的历史和相关特征对每个阶段的效果具有异质性。本文研究了统计学习最佳动态治疗方案(DTR),根据个体的历史指导每个阶段的最佳治疗分配。我们提出了一种基于观测数据的逐步双重强健方法,在顺序可忽略性假设下学习最佳DTR。该方法通过向后归纳解决了顺序治疗分配问题,在每一步中,我们结合倾向评分和行动值函数(Q函数)的估计量,构建了政策价值的增强反向概率加权估计量。 </p> <p> -arXiv:2402.15171v1 Announce Type: new Abstract: We address the problem of stochastic combinatorial semi-bandits, where a player can select from P subsets of a set containing d base items. Most existing algorithms (e.g. CUCB, ESCB, OLS-UCB) require prior knowledge on the reward distribution, like an upper bound on a sub-Gaussian proxy-variance, which is hard to estimate tightly. In this work, we design a variance-adaptive version of OLS-UCB, relying on an online estimation of the covariance structure. 
Estimating the coefficients of a covariance matrix is much more manageable in practical settings and results in improved regret upper bounds compared to proxy variance-based algorithms. When covariance coefficients are all non-negative, we show that our approach efficiently leverages the semi-bandit feedback and provably outperforms bandit feedback approaches, not only in exponential regimes where P $\gg$ d but also when P $\le$ d, which is not straightforward from most existing analyses. -</p>本文利用Dirichlet形式的Mosco收敛性分析了在大图上的随机游走Metropolis(RWM)算法,证明了RWM算法的最优比例缩放具有收敛性,将已知的几个结果推广到了大图上的依赖目标分布的情况,并为大图上的MCMC算法开辟了许多新的可能性。http://arxiv.org/abs/2210.17042<p> -依赖于大图的MCMC最优比例缩放的Dirichlet形式的收敛性 +arXiv:2404.00221v1 Announce Type: cross Abstract: Many public policies and medical interventions involve dynamics in their treatment assignments, where treatments are sequentially assigned to the same individuals across multiple stages, and the effect of treatment at each stage is usually heterogeneous with respect to the history of prior treatments and associated characteristics. We study statistical learning of optimal dynamic treatment regimes (DTRs) that guide the optimal treatment assignment for each individual at each stage based on the individual's history. We propose a step-wise doubly-robust approach to learn the optimal DTR using observational data under the assumption of sequential ignorability. The approach solves the sequential treatment assignment problem through backward induction, where, at each step, we combine estimators of propensity scores and action-value functions (Q-functions) to construct augmented inverse probability weighting estimators of values of policies +</p>提出了一种针对高维回归的自适应迁移学习方法,可以根据可迁移结构自适应检测和聚合特征和样本的可迁移结构。https://arxiv.org/abs/2403.13565<p> +AdaTrans:针对高维回归的特征自适应与样本自适应迁移学习 </p> <p> -Convergence of Dirichlet Forms for MCMC Optimal Scaling with Dependent Target Distributions on Large Graphs. 
(arXiv:2210.17042v2 [math.ST] UPDATED) +AdaTrans: Feature-wise and Sample-wise Adaptive Transfer Learning for High-dimensional Regression </p> <p> -http://arxiv.org/abs/2210.17042 +https://arxiv.org/abs/2403.13565 </p> <p> -本文利用Dirichlet形式的Mosco收敛性分析了在大图上的随机游走Metropolis(RWM)算法,证明了RWM算法的最优比例缩放具有收敛性,将已知的几个结果推广到了大图上的依赖目标分布的情况,并为大图上的MCMC算法开辟了许多新的可能性。 +提出了一种针对高维回归的自适应迁移学习方法,可以根据可迁移结构自适应检测和聚合特征和样本的可迁移结构。 </p> <p> </p> <p> -Markov Chain Monte Carlo (MCMC)算法在统计学、物理学、机器学习等方面发挥了重要作用,并且对于一些高维问题,它们是唯一已知的通用和有效的方法。本文利用Dirichlet形式的Mosco收敛性分析了在大图上的随机游走Metropolis(RWM)算法,其目标分布是包括任何满足Markov性质的概率测度的Gibbs测度。Dirichlet形式的抽象且强大的理论使我们能够直接和自然地在无限维空间上工作,我们的Mosco收敛性概念允许与RWM链相关联的Dirichlet形式位于变化的图序列上,其中图的大小可以是无界的,图可以是相关的。我们证明了在强空间依赖性存在的情况下,RWM算法的最优比例缩放具有收敛性。我们的结果将已知的几个结果推广到了大图上的依赖目标分布的情况,并为大图上的MCMC算法开辟了许多新的可能性。 +我们考虑高维背景下的迁移学习问题,在该问题中,特征维度大于样本大小。为了学习可迁移的信息,该信息可能在特征或源样本之间变化,我们提出一种自适应迁移学习方法,可以检测和聚合特征-wise (F-AdaTrans)或样本-wise (S-AdaTrans)可迁移结构。我们通过采用一种新颖的融合惩罚方法,结合权重,可以根据可迁移结构进行调整。为了选择权重,我们提出了一个在理论上建立,数据驱动的过程,使得 F-AdaTrans 能够选择性地将可迁移的信号与目标融合在一起,同时滤除非可迁移的信号,S-AdaTrans则可以获得每个源样本传递的信息的最佳组合。我们建立了非渐近速率,可以在特殊情况下恢复现有的近最小似乎最优速率。效果证明... </p> <p> -Markov chain Monte Carlo (MCMC) algorithms have played a significant role in statistics, physics, machine learning and others, and they are the only known general and efficient approach for some high-dimensional problems. The random walk Metropolis (RWM) algorithm as the most classical MCMC algorithm, has had a great influence on the development and practice of science and engineering. The behavior of the RWM algorithm in high-dimensional problems is typically investigated through a weak convergence result of diffusion processes. In this paper, we utilize the Mosco convergence of Dirichlet forms in analyzing the RWM algorithm on large graphs, whose target distribution is the Gibbs measure that includes any probability measure satisfying a Markov property. 
The abstract and powerful theory of Dirichlet forms allows us to work directly and naturally on the infinite-dimensional space, and our notion of Mosco convergence allows Dirichlet forms associated with the RWM chains to lie on changi -</p>该论文提出了一种实际上是基于岭回归的时变参数模型,这比传统的状态空间方法计算更快,调整更容易,有助于研究经济结构性变化。http://arxiv.org/abs/2009.00401<p> -使用岭回归法的时变参数模型 +arXiv:2403.13565v1 Announce Type: cross Abstract: We consider the transfer learning problem in the high dimensional setting, where the feature dimension is larger than the sample size. To learn transferable information, which may vary across features or the source samples, we propose an adaptive transfer learning method that can detect and aggregate the feature-wise (F-AdaTrans) or sample-wise (S-AdaTrans) transferable structures. We achieve this by employing a novel fused-penalty, coupled with weights that can adapt according to the transferable structure. To choose the weight, we propose a theoretically informed, data-driven procedure, enabling F-AdaTrans to selectively fuse the transferable signals with the target while filtering out non-transferable signals, and S-AdaTrans to obtain the optimal combination of information transferred from each source sample. The non-asymptotic rates are established, which recover existing near-minimax optimal rates in special cases. The effectivene +</p>可解释的机器学习在生存分析中的应用促进了透明度和公平性,揭示了模型的潜在偏见和限制,并提供了更符合数学原理的特征影响和风险因素预测方法。https://arxiv.org/abs/2403.10250<p> +可解释的机器学习用于生存分析 </p> <p> -Time-Varying Parameters as Ridge Regressions. 
(arXiv:2009.00401v3 [econ.EM] UPDATED) +Interpretable Machine Learning for Survival Analysis </p> <p> -http://arxiv.org/abs/2009.00401 +https://arxiv.org/abs/2403.10250 </p> <p> -该论文提出了一种实际上是基于岭回归的时变参数模型,这比传统的状态空间方法计算更快,调整更容易,有助于研究经济结构性变化。 +可解释的机器学习在生存分析中的应用促进了透明度和公平性,揭示了模型的潜在偏见和限制,并提供了更符合数学原理的特征影响和风险因素预测方法。 </p> <p> </p> <p> -时变参数模型(TVPs)经常被用于经济学中来捕捉结构性变化。我强调了一个被忽视的事实——这些实际上是岭回归。这使得计算、调整和实现比状态空间范式更容易。在高维情况下,解决等价的双重岭问题的计算非常快,关键的“时间变化量”通常是由交叉验证来调整的。使用两步回归岭回归来处理不断变化的波动性。我考虑了基于稀疏性(算法选择哪些参数变化, 哪些不变)和降低秩约束的扩展(变化与因子模型相关联)。为了展示这种方法的有用性, 我使用它来研究加拿大货币政策的演变, 并使用大规模时变局部投影估计约4600个TVPs, 这一任务完全可以利用这种新方法完成。 +随着黑盒机器学习模型的传播和快速进步,可解释的机器学习(IML)领域或可解释的人工智能(XAI)在过去十年中变得越来越重要。 这在生存分析领域尤为重要,其中采用IML技术促进了透明度、问责制和公平性,特别是在临床决策过程、有针对性疗法的开发、干预或其他医学或与医疗保健相关的环境中。 具体来说,可解释性可以揭示生存模型的潜在偏见和局限性,并提供更符合数学原理的方法来理解哪些特征对预测有影响或构成风险因素。 然而,缺乏即时可用的IML方法可能已经阻碍了医学从业者和公共卫生政策制定者充分利用机器学习的潜力。 </p> <p> -Time-varying parameters (TVPs) models are frequently used in economics to capture structural change. I highlight a rather underutilized fact -- that these are actually ridge regressions. Instantly, this makes computations, tuning, and implementation much easier than in the state-space paradigm. Among other things, solving the equivalent dual ridge problem is computationally very fast even in high dimensions, and the crucial "amount of time variation" is tuned by cross-validation. Evolving volatility is dealt with using a two-step ridge regression. I consider extensions that incorporate sparsity (the algorithm selects which parameters vary and which do not) and reduced-rank restrictions (variation is tied to a factor model). To demonstrate the usefulness of the approach, I use it to study the evolution of monetary policy in Canada using large time-varying local projections. The application requires the estimation of about 4600 TVPs, a task well within the reach of the new method. 
+arXiv:2403.10250v1 Announce Type: cross Abstract: With the spread and rapid advancement of black box machine learning models, the field of interpretable machine learning (IML) or explainable artificial intelligence (XAI) has become increasingly important over the last decade. This is particularly relevant for survival analysis, where the adoption of IML techniques promotes transparency, accountability and fairness in sensitive areas, such as clinical decision making processes, the development of targeted therapies, interventions or in other medical or healthcare related contexts. More specifically, explainability can uncover a survival model's potential biases and limitations and provide more mathematically sound ways to understand how and which features are influential for prediction or constitute risk factors. However, the lack of readily available IML methods may have deterred medical practitioners and policy makers in public health from leveraging the full potential of machine lea +</p>RLHF在考虑部分观察性时可能导致策略欺骗性地夸大性能或过度辩护行为,我们提出了数学条件来解决这些问题,并警告不要盲目应用RLHF在部分可观测情况下。https://arxiv.org/abs/2402.17747<p> +当你的AI欺骗你:在奖励学习中人类评估者部分可观测性的挑战 +</p> +<p> +When Your AI Deceives You: Challenges with Partial Observability of Human Evaluators in Reward Learning +</p> +<p> +https://arxiv.org/abs/2402.17747 +</p> +<p> +RLHF在考虑部分观察性时可能导致策略欺骗性地夸大性能或过度辩护行为,我们提出了数学条件来解决这些问题,并警告不要盲目应用RLHF在部分可观测情况下。 +</p> +<p> + +</p> +<p> +强化学习从人类反馈(RLHF)的过去分析假设人类完全观察到环境。当人类反馈仅基于部分观察时会发生什么?我们对两种失败情况进行了正式定义:欺骗和过度辩护。通过将人类建模为对轨迹信念的Boltzmann-理性,我们证明了RLHF保证会导致策略欺骗性地夸大其性能、为了留下印象而过度辩护或者两者兼而有之的条件。为了帮助解决这些问题,我们数学地刻画了环境部分可观测性如何转化为(缺乏)学到的回报函数中的模糊性。在某些情况下,考虑环境部分可观测性使得在理论上可能恢复回报函数和最优策略,而在其他情况下,存在不可减少的模糊性。我们警告不要盲目应用RLHF在部分可观测情况下。 +</p> +<p> +arXiv:2402.17747v1 Announce Type: cross Abstract: Past analyses of reinforcement learning from human feedback (RLHF) assume that the human fully observes the environment. What happens when human feedback is based only on partial observations? 
We formally define two failure cases: deception and overjustification. Modeling the human as Boltzmann-rational w.r.t. a belief over trajectories, we prove conditions under which RLHF is guaranteed to result in policies that deceptively inflate their performance, overjustify their behavior to make an impression, or both. To help address these issues, we mathematically characterize how partial observability of the environment translates into (lack of) ambiguity in the learned return function. In some cases, accounting for partial observability makes it theoretically possible to recover the return function and thus the optimal policy, while in other cases, there is irreducible ambiguity. We caution against blindly applying RLHF in partially observa +</p>了解神经网络从输入-标签对中提取统计信息的机制是监督学习中最重要的未解决问题之一。前人的研究表明,在训练过程中,权重的格拉姆矩阵与模型的平均梯度外积成正比,这被称为神经特征分析(NFA)。本研究解释了这种相关性的出现,并发现NFA等价于权重矩阵的左奇异结构与与这些权重相关的经验神经切线核的显著成分之间的对齐。在早期训练阶段,可以通过解析的方式预测NFA的发展速度。https://arxiv.org/abs/2402.05271<p> +梯度下降引发了深度非线性网络权重与经验NTK之间的对齐 +</p> +<p> +Gradient descent induces alignment between weights and the empirical NTK for deep non-linear networks +</p> +<p> +https://arxiv.org/abs/2402.05271 +</p> +<p> +了解神经网络从输入-标签对中提取统计信息的机制是监督学习中最重要的未解决问题之一。前人的研究表明,在训练过程中,权重的格拉姆矩阵与模型的平均梯度外积成正比,这被称为神经特征分析(NFA)。本研究解释了这种相关性的出现,并发现NFA等价于权重矩阵的左奇异结构与与这些权重相关的经验神经切线核的显著成分之间的对齐。在早期训练阶段,可以通过解析的方式预测NFA的发展速度。 +</p> +<p> + +</p> +<p> +理解神经网络从输入-标签对中提取统计信息的机制是监督学习中最重要的未解决问题之一。先前的研究已经确定,在一般结构的训练神经网络中,权重的格拉姆矩阵与模型的平均梯度外积成正比,这个说法被称为神经特征分析(NFA)。然而,这些数量在训练过程中如何相关尚不清楚。在这项工作中,我们解释了这种相关性的出现。我们发现NFA等价于权重矩阵的左奇异结构与与这些权重相关的经验神经切线核的显著成分之间的对齐。我们证明了先前研究中引入的NFA是由隔离这种对齐的中心化NFA驱动的。我们还展示了在早期训练阶段,可以通过解析的方式预测NFA的发展速度。 +</p> +<p> +Understanding the mechanisms through which neural networks extract statistics from input-label pairs is one of the most important unsolved problems in supervised learning. 
Prior works have identified that the gram matrices of the weights in trained neural networks of general architectures are proportional to the average gradient outer product of the model, in a statement known as the Neural Feature Ansatz (NFA). However, the reason these quantities become correlated during training is poorly understood. In this work, we explain the emergence of this correlation. We identify that the NFA is equivalent to alignment between the left singular structure of the weight matrices and a significant component of the empirical neural tangent kernels associated with those weights. We establish that the NFA introduced in prior works is driven by a centered NFA that isolates this alignment. We show that the speed of NFA development can be predicted analytically at early training times in terms of sim +</p>这篇论文提出了一种基于潜变量高斯过程的多源数据融合框架,用于解决多个数据源之间质量和全面性差异给系统优化带来的问题。https://arxiv.org/abs/2402.04146<p> +可解释的多源数据融合通过潜变量高斯过程 +</p> +<p> +Interpretable Multi-Source Data Fusion Through Latent Variable Gaussian Process +</p> +<p> +https://arxiv.org/abs/2402.04146 +</p> +<p> +这篇论文提出了一种基于潜变量高斯过程的多源数据融合框架,用于解决多个数据源之间质量和全面性差异给系统优化带来的问题。 +</p> +<p> + +</p> +<p> +随着人工智能(AI)和机器学习(ML)的出现,各个科学和工程领域已经利用数据驱动的替代模型来建模来自大量信息源(数据)的复杂系统。这种增加导致了开发出用于执行特定功能的优越系统所需的成本和时间的显著降低。这样的替代模型往往广泛地融合多个数据来源,可能是发表的论文、专利、开放资源库或其他资源。然而,对于已知和未知的信息来源的基础物理参数的质量和全面性的差异,可能对系统优化过程产生后续影响,却没有得到充分的关注。为了解决这个问题,提出了一种基于潜变量高斯过程(LVGP)的多源数据融合框架。 +</p> +<p> +With the advent of artificial intelligence (AI) and machine learning (ML), various domains of science and engineering communities have leveraged data-driven surrogates to model complex systems from numerous sources of information (data). The proliferation has led to significant reduction in cost and time involved in development of superior systems designed to perform specific functionalities.
A high proportion of such surrogates are built by extensively fusing multiple sources of data, be it published papers, patents, open repositories, or other resources. However, not much attention has been paid to the differences in quality and comprehensiveness of the known and unknown underlying physical parameters of the information sources that could have downstream implications during system optimization. Towards resolving this issue, a multi-source data fusion framework based on Latent Variable Gaussian Process (LVGP) is proposed. The individual data sources are tagged as a characteristic cate +</p>本研究提出了一种基于贝叶斯决策分析的方法,对于任何贝叶斯回归模型,可以得到每个条件分位数的最佳和可解释的线性估计值和不确定性量化。该方法是一种适用于特定分位数子集选择的有效工具。http://arxiv.org/abs/2311.02043<p> +基于子集选择的贝叶斯分位回归:后验总结视角 +</p> +<p> +Bayesian Quantile Regression with Subset Selection: A Posterior Summarization Perspective. (arXiv:2311.02043v1 [stat.ME]) +</p> +<p> +http://arxiv.org/abs/2311.02043 +</p> +<p> +本研究提出了一种基于贝叶斯决策分析的方法,对于任何贝叶斯回归模型,可以得到每个条件分位数的最佳和可解释的线性估计值和不确定性量化。该方法是一种适用于特定分位数子集选择的有效工具。 +</p> +<p> + +</p> +<p> +分位回归是一种强大的工具,用于推断协变量如何影响响应分布的特定分位数。现有方法要么分别估计每个感兴趣分位数的条件分位数,要么使用半参数或非参数模型估计整个条件分布。前者经常产生不适合实际数据的模型,并且不在分位数之间共享信息,而后者则以复杂且受限制的模型为特点,难以解释和计算效率低下。此外,这两种方法都不适合于特定分位数的子集选择。相反,我们从贝叶斯决策分析的角度出发,提出了线性分位估计、不确定性量化和子集选择的基本问题。对于任何贝叶斯回归模型,我们为每个基于模型的条件分位数推导出最佳和可解释的线性估计值和不确定性量化。我们的方法引入了一种分位数聚焦的方法。 +</p> +<p> +Quantile regression is a powerful tool for inferring how covariates affect specific percentiles of the response distribution. Existing methods either estimate conditional quantiles separately for each quantile of interest or estimate the entire conditional distribution using semi- or non-parametric models. The former often produce inadequate models for real data and do not share information across quantiles, while the latter are characterized by complex and constrained models that can be difficult to interpret and computationally inefficient. Further, neither approach is well-suited for quantile-specific subset selection.
Instead, we pose the fundamental problems of linear quantile estimation, uncertainty quantification, and subset selection from a Bayesian decision analysis perspective. For any Bayesian regression model, we derive optimal and interpretable linear estimates and uncertainty quantification for each model-based conditional quantile. Our approach introduces a quantile-focu +</p>提出了一种模型不可知的推断方法,在部分可辨识的因果估计中应用广泛。该方法基于最优输运问题的对偶理论,能够适应随机实验和观测研究,并且具有统一有效和双重鲁棒性。http://arxiv.org/abs/2310.08115<p> +模型不可知的辅助推断方法在部分可辨识因果效应上的应用 +</p> +<p> +Model-Agnostic Covariate-Assisted Inference on Partially Identified Causal Effects. (arXiv:2310.08115v1 [econ.EM]) +</p> +<p> +http://arxiv.org/abs/2310.08115 +</p> +<p> +提出了一种模型不可知的推断方法,在部分可辨识的因果估计中应用广泛。该方法基于最优输运问题的对偶理论,能够适应随机实验和观测研究,并且具有统一有效和双重鲁棒性。 +</p> +<p> + +</p> +<p> +很多因果估计是部分可辨识的,因为它们依赖于潜在结果之间的不可观察联合分布。基于前处理协变量的分层可以获得更明确的部分可辨识性范围;然而,除非协变量为离散且支撑度相对较小,否则这种方法通常需要对给定协变量的潜在结果的条件分布进行一致估计。因此,现有的方法在模型错误或一致性假设被违反时可能失败。在本研究中,我们提出了一种基于最优输运问题的对偶理论的统一且模型不可知的推断方法,适用于广泛类别的部分可辨识估计。在随机实验中,我们的方法可以结合任何对条件分布的估计,并提供统一有效的推断,即使初始估计是任意不准确的。此外,我们的方法在观测研究中也是双重鲁棒的。 +</p> +<p> +Many causal estimands are only partially identifiable since they depend on the unobservable joint distribution between potential outcomes. Stratification on pretreatment covariates can yield sharper partial identification bounds; however, unless the covariates are discrete with relatively small support, this approach typically requires consistent estimation of the conditional distributions of the potential outcomes given the covariates. Thus, existing approaches may fail under model misspecification or if consistency assumptions are violated. In this study, we propose a unified and model-agnostic inferential approach for a wide class of partially identified estimands, based on duality theory for optimal transport problems. 
In randomized experiments, our approach can wrap around any estimates of the conditional distributions and provide uniformly valid inference, even if the initial estimates are arbitrarily inaccurate. Also, our approach is doubly robust in observational studies. Notab +</p>MaGNet是一种模型无关的图神经网络框架,能够顺序地整合不同顺序的信息,并通过识别有影响力的紧凑图结构提供有意义且可解释的结果。http://arxiv.org/abs/2309.13459<p> +模型无关的图神经网络用于整合局部和全局信息的研究 +</p> +<p> +A Model-Agnostic Graph Neural Network for Integrating Local and Global Information. (arXiv:2309.13459v1 [stat.ML]) +</p> +<p> +http://arxiv.org/abs/2309.13459 +</p> +<p> +MaGNet是一种模型无关的图神经网络框架,能够顺序地整合不同顺序的信息,并通过识别有影响力的紧凑图结构提供有意义且可解释的结果。 +</p> +<p> + +</p> +<p> +图神经网络(GNNs)在各种以图为重点的任务中取得了令人满意的性能。尽管取得了成功,但现有的GNN存在两个重要限制:由于黑盒特性,结果缺乏可解释性;无法学习不同顺序的表示。为了解决这些问题,我们提出了一种新的模型无关的图神经网络(MaGNet)框架,能够顺序地整合不同顺序的信息,从高阶邻居中提取知识,并通过识别有影响力的紧凑图结构提供有意义且可解释的结果。特别地,MaGNet由两个组件组成:图拓扑下复杂关系的潜在表示的估计模型和识别有影响力的节点、边和重要节点特征的解释模型。从理论上,我们通过经验Rademacher复杂度建立了MaGNet的泛化误差界,并展示了其强大的能力。 +</p> +<p> +Graph Neural Networks (GNNs) have achieved promising performance in a variety of graph-focused tasks. Despite their success, existing GNNs suffer from two significant limitations: a lack of interpretability in results due to their black-box nature, and an inability to learn representations of varying orders. To tackle these issues, we propose a novel Model-agnostic Graph Neural Network (MaGNet) framework, which is able to sequentially integrate information of various orders, extract knowledge from high-order neighbors, and provide meaningful and interpretable results by identifying influential compact graph structures. In particular, MaGNet consists of two components: an estimation model for the latent representation of complex relationships under graph topology, and an interpretation model that identifies influential nodes, edges, and important node features. 
Theoretically, we establish the generalization error bound for MaGNet via empirical Rademacher complexity, and showcase its pow +</p>本文研究了使用非参数隐马尔可夫模型进行基于模型的聚类时的贝叶斯风险,并提出了相应的聚类方法。通过研究分类的贝叶斯风险和聚类的贝叶斯风险之间的关系,确定了聚类任务的难度。同时,在插值分类器和在线设置中的结果也得到了证明。模拟实验验证了这些发现。http://arxiv.org/abs/2309.12238<p> +使用非参数隐马尔可夫模型的基于模型的聚类 +</p> +<p> +Model-based Clustering using Non-parametric Hidden Markov Models. (arXiv:2309.12238v1 [math.ST]) +</p> +<p> +http://arxiv.org/abs/2309.12238 +</p> +<p> +本文研究了使用非参数隐马尔可夫模型进行基于模型的聚类时的贝叶斯风险,并提出了相应的聚类方法。通过研究分类的贝叶斯风险和聚类的贝叶斯风险之间的关系,确定了聚类任务的难度。同时,在插值分类器和在线设置中的结果也得到了证明。模拟实验验证了这些发现。 +</p> +<p> + +</p> +<p> +非参数隐马尔可夫模型(HMM)由于其依赖结构,可以在不指定群组分布的情况下进行基于模型的聚类。本文研究了在使用HMM进行聚类时的贝叶斯风险,并提出了相应的聚类方法。首先,我们给出了将分类的贝叶斯风险与聚类的贝叶斯风险联系起来的结果,用以确定聚类任务的难度的关键数量。我们还在独立同分布的框架下证明了这一结果,这可能具有独立的兴趣。然后我们研究了插值分类器的过度风险。所有这些结果都被证明在在线设置中仍然有效,在该设置下,观测结果被顺序聚类。模拟实验证明了我们的发现。 +</p> +<p> +Thanks to their dependency structure, non-parametric Hidden Markov Models (HMMs) are able to handle model-based clustering without specifying group distributions. The aim of this work is to study the Bayes risk of clustering when using HMMs and to propose associated clustering procedures. We first give a result linking the Bayes risk of classification and the Bayes risk of clustering, which we use to identify the key quantity determining the difficulty of the clustering task. We also give a proof of this result in the i.i.d. framework, which might be of independent interest. Then we study the excess risk of the plugin classifier. All these results are shown to remain valid in the online setting where observations are clustered sequentially. Simulations illustrate our findings. +</p>本研究探讨了在关键领域中针对鼓励政策的最优和公平评估以及学习的问题,研究发现在人类不遵循治疗建议的情况下,最优策略规则只是建议。同时,针对治疗的异质性和公平考虑因素,决策者的权衡和决策规则也会发生变化。在社会服务领域,研究显示存在一个使用差距问题,那些最有可能受益的人却无法获得这些益服务。http://arxiv.org/abs/2309.07176<p> +最优和公平的鼓励政策评估与学习 +</p> +<p> +Optimal and Fair Encouragement Policy Evaluation and Learning. 
(arXiv:2309.07176v1 [cs.LG]) +</p> +<p> +http://arxiv.org/abs/2309.07176 +</p> +<p> +本研究探讨了在关键领域中针对鼓励政策的最优和公平评估以及学习的问题,研究发现在人类不遵循治疗建议的情况下,最优策略规则只是建议。同时,针对治疗的异质性和公平考虑因素,决策者的权衡和决策规则也会发生变化。在社会服务领域,研究显示存在一个使用差距问题,那些最有可能受益的人却无法获得这些益服务。 +</p> +<p> + +</p> +<p> +在关键领域中,强制个体接受治疗通常是不可能的,因此在人类不遵循治疗建议的情况下,最优策略规则只是建议。在这些领域中,接受治疗的个体可能存在异质性,治疗效果也可能存在异质性。虽然最优治疗规则可以最大化整个人群的因果结果,但在鼓励的情况下,对于访问平等限制或其他公平考虑因素可能是相关的。例如,在社会服务领域,一个持久的难题是那些最有可能从中受益的人中那些获益服务的使用差距。当决策者对访问和平均结果都有分配偏好时,最优决策规则会发生变化。我们研究了因果识别、统计方差减少估计和稳健估计的最优治疗规则,包括在违反阳性条件的情况下。 +</p> +<p> +In consequential domains, it is often impossible to compel individuals to take treatment, so that optimal policy rules are merely suggestions in the presence of human non-adherence to treatment recommendations. In these same domains, there may be heterogeneity both in who responds in taking-up treatment, and heterogeneity in treatment efficacy. While optimal treatment rules can maximize causal outcomes across the population, access parity constraints or other fairness considerations can be relevant in the case of encouragement. For example, in social services, a persistent puzzle is the gap in take-up of beneficial services among those who may benefit from them the most. When in addition the decision-maker has distributional preferences over both access and average outcomes, the optimal decision rule changes. We study causal identification, statistical variance-reduced estimation, and robust estimation of optimal treatment rules, including under potential violations of positivity. We c </p> \ No newline at end of file