From 909dbfc41d64f3c476da9c49ea334f7c7056078c Mon Sep 17 00:00:00 2001
From: qhduan
Date: Tue, 10 Dec 2024 09:07:17 +0000
Subject: [PATCH] Add changes

---
 cs.AI.md | 503 +++++++++++++++++++++++++---
 cs.AI.xml | 638 ++++++++++++++++++++++++++++++++---
 cs.CL.md | 374 ++++++++++++++++++++-
 cs.CL.xml | 494 ++++++++++++++++++++++++++-
 cs.IR.md | 29 +-
 cs.IR.xml | 34 +-
 cs.LG.md | 648 +++++++++++++++++++++++++++++++-----
 cs.LG.xml | 808 ++++++++++++++++++++++++++++++++++++++++-----
 econ.md | 102 ++++--
 econ.xml | 122 +++++--
 latest_updated.txt | 2 +-
 q-fin.md | 28 +-
 q-fin.xml | 28 +-
 stat.ML.md | 146 ++++++--
 stat.ML.xml | 176 ++++++++--
 15 files changed, 3676 insertions(+), 456 deletions(-)

diff --git a/cs.AI.md b/cs.AI.md
index 58f4539f9..c6066574b 100644
--- a/cs.AI.md
+++ b/cs.AI.md
@@ -2,112 +2,517 @@
 | Ref | Title | Summary |
 | --- | --- | --- |
-| [^1] | [Beyond the Answers: Reviewing the Rationality of Multiple Choice Question Answering for the Evaluation of Large Language Models](https://rss.arxiv.org/abs/2402.01349) | 对于评估大型语言模型中多选题回答的合理性进行了回顾,发现当前基于多选题回答的基准可能无法充分捕捉大型语言模型的真实能力。 |
-| [^2] | [Colour and Brush Stroke Pattern Recognition in Abstract Art using Modified Deep Convolutional Generative Adversarial Networks](https://arxiv.org/abs/2403.18397) | 本文通过引入改进的深度卷积生成对抗网络(mDCGAN),针对高质量艺术品生成进行了研究,解决了普遍训练问题,有效探索抽象绘画中的颜色和笔触模式。 |
-| [^3] | [All-in-One: Heterogeneous Interaction Modeling for Cold-Start Rating Prediction](https://arxiv.org/abs/2403.17740) | 提出了异质交互评分网络(HIRE)框架,通过异质交互模块(HIM)来共同建模异质交互并直接推断重要特征 |
-| [^4] | [Masked Attention is All You Need for Graphs](https://arxiv.org/abs/2402.10793) | 提出了一种在图上学习的简单替代方法,称为掩码注意力(MAG),其利用注意力矩阵来创建定制的注意力模式,在长距离任务上表现出色并胜过其他方法。 |
-| [^5] | [Graph Inference Acceleration by Learning MLPs on Graphs without Supervision](https://arxiv.org/abs/2402.08918) | 该论文提出了一个简单而有效的框架SimMLP,通过在图上无监督学习MLPs,提高了在延迟敏感的应用中的泛化能力。 |
-| [^6] | [PAC Privacy Preserving Diffusion Models](https://arxiv.org/abs/2312.01201) | 提出了一种PAC隐私保护扩散模型,通过将私有分类器指导集成到采样过程中增强隐私保护,并发展了一种新的度量标准来衡量隐私水平,在保护性能方面表现出卓越表现。 |
-| [^7] | [A Simple Data Augmentation for Feature Distribution Skewed Federated Learning.](http://arxiv.org/abs/2306.09363) | 本文针对特征分布偏斜的联邦学习提出了FedRDN方法,在输入层级上实现了数据增强,将整个联邦数据集的统计信息注入到本地客户端数据中,以缓解特征漂移问题。 |
+| [^1] | [Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models](https://arxiv.org/abs/2404.02657) | 本研究重新思考了大型语言模型知识蒸馏中对Kullback-Leibler散度的应用,发现逆Kullback-Leibler和正向Kullback-Leibler散度在优化目标上相似,为此提出了一种自适应Kullback-Leiber散度方法。 |
+| [^2] | [A Methodology for Improving Accuracy of Embedded Spiking Neural Networks through Kernel Size Scaling](https://arxiv.org/abs/2404.01685) | 通过核大小缩放提高嵌入式脉冲神经网络准确性的方法学在实验中表现出更高的准确性。 |
+| [^3] | [A Survey on Multilingual Large Language Models: Corpora, Alignment, and Bias](https://arxiv.org/abs/2404.00929) | 该论文对多语言大型语言模型进行了全面分析,深入讨论了关键问题,包括多语言语料库、对齐和偏见。 |
+| [^4] | [Croissant: A Metadata Format for ML-Ready Datasets](https://arxiv.org/abs/2403.19546) | Croissant是一种面向机器学习数据集的元数据格式,使数据集更易发现、可移植和互操作,有助于解决ML数据管理和负责任AI中的重要挑战。 |
+| [^5] | [Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation](https://arxiv.org/abs/2403.19103) | PRISM是一种算法,可以自动识别人类可解释且易传递的提示,从而有效生成所需概念,仅使用黑盒访问T2I模型。 |
+| [^6] | [Can ChatGPT predict article retraction based on Twitter mentions?](https://arxiv.org/abs/2403.16851) | 本研究探讨了ChatGPT是否能够基于Twitter提及来预测文章的撤回,研究发现在预测未来被撤回的有问题文章方面是具有一定潜力的。 |
+| [^7] | [A Benchmark of Domain-Adapted Large Language Models for Generating Brief Hospital Course Summaries](https://arxiv.org/abs/2403.05720) | 介绍了一个新的基准测试,评估了用于生成简要住院病程摘要的大语言模型在健康保健领域中的性能并提出相应的自适应策略 |
+| [^8] | [Remove that Square Root: A New Efficient Scale-Invariant Version of AdaGrad](https://arxiv.org/abs/2403.02648) | KATE是一种新的优化算法,提出了一种与AdaGrad标度不变的适应方法,并在广义线性模型和一般的非凸问题中证明了其标度不变性。数值实验结果表明,KATE在各种场景中均优于AdaGrad并与Adam性能匹配/超越。 |
+| [^9] | [Large Language Models and Games: A Survey and Roadmap](https://arxiv.org/abs/2402.18659) | 这项研究调查了大型语言模型在游戏领域中的多种应用及其角色,指出了未开发领域和未来发展方向,同时探讨了在游戏领域中大型语言模型的潜力和限制。 |
+| [^10] | [ToMBench: Benchmarking Theory of Mind in Large Language Models](https://arxiv.org/abs/2402.15052) | 提出了ToMBench框架,在大型语言模型中进行心灵理论性能评估,发现最先进的模型仍然落后于人类表现超过10%。 |
+| [^11] | [RealDex: Towards Human-like Grasping for Robotic Dexterous Hand](https://arxiv.org/abs/2402.13853) | RealDex数据集捕捉了真实的灵巧手抓取动作,利用多模态数据使得训练灵巧手更加自然和精确,同时提出了一种先进的灵巧抓取动作生成框架,有效利用多模态大型语言模型,在类人机器人的自动感知、认知和操纵方面具有巨大潜力。 |
+| [^12] | [Query-Based Adversarial Prompt Generation](https://arxiv.org/abs/2402.12329) | 该研究提出了一种基于查询的对抗性攻击方法,通过利用远程语言模型的 API 访问构造对抗性示例,使模型以更高概率发出有害字符串,而非仅仅基于模型之间的转移性攻击。 |
+| [^13] | [CHEMREASONER: Heuristic Search over a Large Language Model's Knowledge Space using Quantum-Chemical Feedback](https://arxiv.org/abs/2402.10980) | 通过将大型语言模型推理与量子化学反馈相结合,我们引入了一个AI引导的计算筛选框架,将催化剂发现形式化为一个不确定环境,从而实现高效催化剂的积极搜索 |
+| [^14] | [Learning Interpretable Concepts: Unifying Causal Representation Learning and Foundation Models](https://arxiv.org/abs/2402.09236) | 本研究将因果表示学习和基础模型相结合,研究了如何从数据中学习人类可解释的概念。实验证明了这一统一方法的实用性。 |
+| [^15] | [Explainable Adversarial Learning Framework on Physical Layer Secret Keys Combating Malicious Reconfigurable Intelligent Surface](https://arxiv.org/abs/2402.06663) | 本文提出了一个对抗学习框架,用于合法参与方间的物理层密钥生成,在恶意可重构智能面干扰下提供了一个可解释的解决方案。 |
+| [^16] | [The role of the metaverse in calibrating an embodied artificial general intelligence](https://arxiv.org/abs/2402.06660) | 本文研究了具有肉身的人工通用智能(AGI)的概念及其与人类意识的关系,强调了元宇宙在促进这一关系中的关键作用。通过结合不同理论框架和技术工具,论文总结出实现具有肉身的AGI的关键要素和发展阶段。 |
+| [^17] | [InkSight: Offline-to-Online Handwriting Conversion by Learning to Read and Write](https://arxiv.org/abs/2402.05804) | InkSight是一个可以将离线手写转换为在线手写的系统,通过结合阅读和书写先验知识,在多样化的照片中有效地Derendering手写文本。 |
+| [^18] | [CIC: A framework for Culturally-aware Image Captioning](https://arxiv.org/abs/2402.05374) | CIC是一种面向文化感知图像字幕的框架,通过结合视觉问答和大型语言模型,它能够生成能描述图像中文化元素的详细字幕。 |
+| [^19] | [Personalized Language Modeling from Personalized Human Feedback](https://arxiv.org/abs/2402.05133) | 该论文提出了一个个性化语言模型的方法,通过在于用户的反馈数据中引入个性化特征来解决强化学习框架在多样化用户偏好下存在的问题。 |
+| [^20] | [TopoX: A Suite of Python Packages for Machine Learning on Topological Domains](https://arxiv.org/abs/2402.02441) | TopoX是一个用于在拓扑域上进行机器学习的Python软件包套件,包含了构建、计算和嵌入拓扑域的功能,并提供了一套全面的高阶消息传递功能工具箱。 |
+| [^21] | [GeoSAM: Fine-tuning SAM with Sparse and Dense Visual Prompting for Automated Segmentation of Mobility Infrastructure](https://arxiv.org/abs/2311.11319) | GeoSAM是一个基于SAM的新框架,使用了来自零样本学习和预训练CNN分割模型的视觉提示,提高了地理图像分割的性能。 |
+| [^22] | [ZS4C: Zero-Shot Synthesis of Compilable Code for Incomplete Code Snippets using ChatGPT.](http://arxiv.org/abs/2401.14279) | ZS4C提出了一种使用ChatGPT进行零射击合成可编译代码的轻量级方法,帮助用户重用或分析不完整的Q&A代码片段,通过识别缺失的导入语句并修复编译错误来实现。 |
+| [^23] | [Crowdsourced Adaptive Surveys.](http://arxiv.org/abs/2401.12986) | 众包自适应调查方法(CSAS)结合自然语言处理和自适应算法,能够根据用户输入演变问题库,并在调查中适应新的问题,应用在拉丁裔信息环境和议题重要性领域,能够识别难以通过传统方法跟踪的主张或问题。 |
+| [^24] | [xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein.](http://arxiv.org/abs/2401.06199) | xTrimoPGLM是一个统一的100亿规模预训练蛋白质语言模型,能够同时处理蛋白质理解和生成任务,通过创新的预训练框架和大规模的参数训练,显著优于其他先进模型,在18个蛋白理解基准测试中取得了成功,并能够实现对蛋白质结构的原子分辨率观察。 |
+| [^25] | [Multi-Agent Quantum Reinforcement Learning using Evolutionary Optimization.](http://arxiv.org/abs/2311.05546) | 本研究提出了三种基于变分量子线路的进化优化多智能体强化学习变体,并在Coin Game环境中证明了这些方法相比于经典方法表现显著更好。 |
+| [^26] | [Domain Generalization for Medical Image Analysis: A Survey.](http://arxiv.org/abs/2310.08598) | 本综述详细回顾了针对医学图像分析的领域泛化研究,探讨了在DL模型在真实世界应用中遇到的挑战,以及如何解决分布漂移问题和实现稳健性。同时,考虑了领域泛化技术对整个MedIA工作流程的操作影响。 |
+| [^27] | [Split and Merge: Aligning Position Biases in Large Language Model based Evaluators.](http://arxiv.org/abs/2310.01432) | PORTIA是一个旨在校准大型语言模型评估器的位置偏差的对齐系统,通过将答案分割成多个片段,并对其进行对齐,然后将其合并回一个单一的提示,以提高评估的准确性和公正性。 |
+| [^28] | [Statistical Tests for Replacing Human Decision Makers with Algorithms.](http://arxiv.org/abs/2306.11689) |
本文提出了一种利用人工智能改善人类决策的统计框架,通过基准测试与机器预测,替换部分人类决策者的决策制定,并经过实验检验得出算法具有更高的真阳性率和更低的假阳性率,尤其是来自农村地区的医生的诊断更容易被替代。 | +| [^29] | [Using LLM-assisted Annotation for Corpus Linguistics: A Case Study of Local Grammar Analysis.](http://arxiv.org/abs/2305.08339) | 本文研究了使用基于大语言模型的聊天机器人自动标注文本的潜力,重点考察了从本地语法角度观察道歉言语行为构成的功能元素的程度,并比较了不同模型在注释任务中的表现,结果表明Bing聊天机器人在任务中表现优于ChatGPT和人类标注员。 | +| [^30] | [Optimal partition of feature using Bayesian classifier.](http://arxiv.org/abs/2304.14537) | 本文通过提出一种名为“共单调独立分类器”(CIBer)的新技术,专注于特征的最优分区,旨在克服朴素贝叶斯方法带来的挑战,并且证明该技术在不同数据集上具有更高的准确率和更低的错误率。 | +| [^31] | [Using Offline Data to Speed-up Reinforcement Learning in Procedurally Generated Environments.](http://arxiv.org/abs/2304.09825) | 本研究旨在提高程序生成环境中强化学习的样本效率。研究证明,使用模仿学习进行预训练和同时进行模仿学习和在线强化学习的方法可以提高效率。 | +| [^32] | [Edge-free but Structure-aware: Prototype-Guided Knowledge Distillation from GNNs to MLPs.](http://arxiv.org/abs/2303.13763) | 本文提出了一种原型引导知识蒸馏(PGKD)方法,它不需要图形边缘,但可以在不考虑边缘的情况下学习结构感知的MLP。 | +| [^33] | [Language Control Diffusion: Efficiently Scaling through Space, Time, and Tasks.](http://arxiv.org/abs/2210.15629) | 本文提出一种利用语言控制扩散模型的分层规划器,有效而高效地扩展扩散模型,解决长时间跨度自然语言指令下的控制问题,实现了较高的单任务和多任务成功率,并极大地提高计算效率。 | +| [^34] | [Implications of Distance over Redistricting Maps: Central and Outlier Maps.](http://arxiv.org/abs/2203.00872) | 本文提出了一种可解释且可操作的选区划分图距离测量方法,并定义了一种“最典型”的中心图。这种方法可以帮助我们深入研究一系列约束条件下选区划分图的应用。 | # 详细 -[^1]: 超越答案:对于评估大型语言模型中多选题回答的合理性的回顾 +[^1]: 在大型语言模型知识蒸馏中重新思考Kullback-Leibler散度 - Beyond the Answers: Reviewing the Rationality of Multiple Choice Question Answering for the Evaluation of Large Language Models + Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models - [https://rss.arxiv.org/abs/2402.01349](https://rss.arxiv.org/abs/2402.01349) + [https://arxiv.org/abs/2404.02657](https://arxiv.org/abs/2404.02657) - 对于评估大型语言模型中多选题回答的合理性进行了回顾,发现当前基于多选题回答的基准可能无法充分捕捉大型语言模型的真实能力。 + 
本研究重新思考了大型语言模型知识蒸馏中对Kullback-Leibler散度的应用,发现逆Kullback-Leibler和正向Kullback-Leibler散度在优化目标上相似,为此提出了一种自适应Kullback-Leiber散度方法。 - 在自然语言处理领域,大型语言模型(LLMs)引发了一场范式转变,显著提升了自然语言生成任务的性能。尽管取得了这些进展,对LLMs的全面评估仍然是社区面临的必然挑战。最近,将多选题回答(MCQA)作为LLMs的基准已经引起了广泛关注。本研究调查了MCQA作为LLMs评估方法的合理性。如果LLMs真正理解问题的语义,它们的性能应该在从相同问题派生的各种配置上表现一致。然而,我们的实证结果表明LLMs的响应一致性存在显著差异,我们将之定义为LLMs的响应可变性综合征(REVAS),这表明目前基于MCQA的基准可能无法充分捕捉LLMs的真实能力,强调了对更合适的评估方法的需要。 + Kullback-Leibler散度在知识蒸馏中被广泛应用于压缩大型语言模型。本研究从经验和理论上证明了,在LLMs的知识蒸馏中,与之前断言的逆Kullback-Leibler(RKL)散度寻找模式并因此优于寻找平均值的正向Kullback-Leibler(FKL)散度相反,实际上在知识蒸馏中都没有体现出寻找模式或寻找平均值的特性。相反,发现RKL和FKL具有相同的优化目标,并在足够数量的时代之后都会收敛。然而,由于实际约束,LLMs很少被训练如此多的时代。同时,我们进一步发现,RKL在分布的尾部,而FKL在开始时代侧重于分布的头部。因此,我们提出了一种简单而有效的自适应Kullback-Leiber(AKL)散度方法,该方法自适应地分配权重来组合F - In the field of natural language processing (NLP), Large Language Models (LLMs) have precipitated a paradigm shift, markedly enhancing performance in natural language generation tasks. Despite these advancements, the comprehensive evaluation of LLMs remains an inevitable challenge for the community. Recently, the utilization of Multiple Choice Question Answering (MCQA) as a benchmark for LLMs has gained considerable traction. This study investigates the rationality of MCQA as an evaluation method for LLMs. If LLMs genuinely understand the semantics of questions, their performance should exhibit consistency across the varied configurations derived from the same questions. Contrary to this expectation, our empirical findings suggest a notable disparity in the consistency of LLM responses, which we define as REsponse VAriability Syndrome (REVAS) of the LLMs, indicating that current MCQA-based benchmarks may not adequately capture the true capabilities of LLMs, which underscores the need f + arXiv:2404.02657v1 Announce Type: cross Abstract: Kullback-Leiber divergence has been widely used in Knowledge Distillation (KD) to compress Large Language Models (LLMs). 
Contrary to prior assertions that reverse Kullback-Leibler (RKL) divergence is mode-seeking and thus preferable over the mean-seeking forward Kullback-Leibler (FKL) divergence, this study empirically and theoretically demonstrates that neither mode-seeking nor mean-seeking properties manifest in KD for LLMs. Instead, RKL and FKL are found to share the same optimization objective and both converge after a sufficient number of epochs. However, due to practical constraints, LLMs are seldom trained for such an extensive number of epochs. Meanwhile, we further find that RKL focuses on the tail part of the distributions, while FKL focuses on the head part at the beginning epochs. Consequently, we propose a simple yet effective Adaptive Kullback-Leiber (AKL) divergence method, which adaptively allocates weights to combine F -[^2]: 使用改进的深度卷积生成对抗网络在抽象艺术中进行颜色和笔触模式识别 +[^2]: 通过核大小缩放提高嵌入式脉冲神经网络准确性的方法学 - Colour and Brush Stroke Pattern Recognition in Abstract Art using Modified Deep Convolutional Generative Adversarial Networks + A Methodology for Improving Accuracy of Embedded Spiking Neural Networks through Kernel Size Scaling - [https://arxiv.org/abs/2403.18397](https://arxiv.org/abs/2403.18397) + [https://arxiv.org/abs/2404.01685](https://arxiv.org/abs/2404.01685) - 本文通过引入改进的深度卷积生成对抗网络(mDCGAN),针对高质量艺术品生成进行了研究,解决了普遍训练问题,有效探索抽象绘画中的颜色和笔触模式。 + 通过核大小缩放提高嵌入式脉冲神经网络准确性的方法学在实验中表现出更高的准确性。 - 抽象艺术是一种广受欢迎、被广泛讨论的艺术形式,通常能够描绘出艺术家的情感。许多研究人员尝试使用机器学习和深度学习的边缘检测、笔触和情感识别算法来研究抽象艺术。本文描述了使用生成对抗神经网络(GAN)对广泛分布的抽象绘画进行研究。 GAN具有学习和再现分布的能力,使研究人员能够有效地探索和研究生成的图像空间。然而,挑战在于开发一种能够克服常见训练问题的高效GAN架构。本文通过引入专门设计用于高质量艺术品生成的改进DCGAN(mDCGAN)来解决这一挑战。该方法涉及对所做修改的深入探讨,深入研究DCGAN的复杂工作。 + 脉冲神经网络(SNNs)由于其稀疏的基于脉冲的操作而能为基于机器学习的应用提供超低功耗/能耗。目前,大多数SNN架构需要更大的模型大小才能实现更高的准确性,这对资源受限的嵌入式应用不太适合。因此,迫切需要开发能够以可接受的内存占用实现高准确性的SNNs。为此,我们提出了一种通过核大小缩放提高SNNs准确性的新方法学。其关键步骤包括调查不同核大小对准确性的影响,设计新的核大小集合,基于选定的核大小生成SNN架构,并分析SNN模型选择的准确性-内存折衷。实验结果表明,我们的方法学在准确性方面优于最先进的方法(对于CIFAR10有93.24%的准确度) - arXiv:2403.18397v1 Announce Type: cross Abstract: Abstract Art 
is an immensely popular, discussed form of art that often has the ability to depict the emotions of an artist. Many researchers have made attempts to study abstract art in the form of edge detection, brush stroke and emotion recognition algorithms using machine and deep learning. This papers describes the study of a wide distribution of abstract paintings using Generative Adversarial Neural Networks(GAN). GANs have the ability to learn and reproduce a distribution enabling researchers and scientists to effectively explore and study the generated image space. However, the challenge lies in developing an efficient GAN architecture that overcomes common training pitfalls. This paper addresses this challenge by introducing a modified-DCGAN (mDCGAN) specifically designed for high-quality artwork generation. The approach involves a thorough exploration of the modifications made, delving into the intricate workings of DCGANs, opt + arXiv:2404.01685v1 Announce Type: cross Abstract: Spiking Neural Networks (SNNs) can offer ultra low power/ energy consumption for machine learning-based applications due to their sparse spike-based operations. Currently, most of the SNN architectures need a significantly larger model size to achieve higher accuracy, which is not suitable for resource-constrained embedded applications. Therefore, developing SNNs that can achieve high accuracy with acceptable memory footprint is highly needed. Toward this, we propose a novel methodology that improves the accuracy of SNNs through kernel size scaling. Its key steps include investigating the impact of different kernel sizes on the accuracy, devising new sets of kernel sizes, generating SNN architectures based on the selected kernel sizes, and analyzing the accuracy-memory trade-offs for SNN model selection. 
The experimental results show that our methodology achieves higher accuracy than state-of-the-art (93.24% accuracy for CIFAR10 and 70 -[^3]: 一体化:异质交互建模用于冷启动评分预测 +[^3]: 多语言大型语言模型:语料库、对齐和偏见综述 - All-in-One: Heterogeneous Interaction Modeling for Cold-Start Rating Prediction + A Survey on Multilingual Large Language Models: Corpora, Alignment, and Bias - [https://arxiv.org/abs/2403.17740](https://arxiv.org/abs/2403.17740) + [https://arxiv.org/abs/2404.00929](https://arxiv.org/abs/2404.00929) - 提出了异质交互评分网络(HIRE)框架,通过异质交互模块(HIM)来共同建模异质交互并直接推断重要特征 + 该论文对多语言大型语言模型进行了全面分析,深入讨论了关键问题,包括多语言语料库、对齐和偏见。 - 冷启动评分预测是推荐系统中一个基本问题,已得到广泛研究。许多方法已经被提出,利用现有数据之间的显式关系,例如协同过滤、社交推荐和异构信息网络,以缓解冷启动用户和物品的数据不足问题。然而,基于不同角色之间的数据构建的显式关系可能不可靠且无关,从而限制了特定推荐任务的性能上限。受此启发,本文提出了一个灵活的框架,名为异质交互评分网络(HIRE)。HIRE不仅仅依赖于预先定义的交互模式或手动构建的异构信息网络。相反,我们设计了一个异质交互模块(HIM),来共同建模异质交互并直接推断重要特征。 + 基于大型语言模型(LLMs)的基础上,发展了多语言大型语言模型(MLLMs)来解决多语言自然语言处理任务的挑战,希望实现从高资源到低资源语言的知识转移。然而,仍然存在重要限制和挑战,比如语言不平衡、多语言对齐和固有偏见。本文旨在对MLLMs进行全面分析,深入讨论围绕这些关键问题的议题。 - arXiv:2403.17740v1 Announce Type: cross Abstract: Cold-start rating prediction is a fundamental problem in recommender systems that has been extensively studied. Many methods have been proposed that exploit explicit relations among existing data, such as collaborative filtering, social recommendations and heterogeneous information network, to alleviate the data insufficiency issue for cold-start users and items. However, the explicit relations constructed based on data between different roles may be unreliable and irrelevant, which limits the performance ceiling of the specific recommendation task. Motivated by this, in this paper, we propose a flexible framework dubbed heterogeneous interaction rating network (HIRE). HIRE dose not solely rely on the pre-defined interaction pattern or the manually constructed heterogeneous information network. 
Instead, we devise a Heterogeneous Interaction Module (HIM) to jointly model the heterogeneous interactions and directly infer the important in + arXiv:2404.00929v1 Announce Type: cross Abstract: Based on the foundation of Large Language Models (LLMs), Multilingual Large Language Models (MLLMs) have been developed to address the challenges of multilingual natural language processing tasks, hoping to achieve knowledge transfer from high-resource to low-resource languages. However, significant limitations and challenges still exist, such as language imbalance, multilingual alignment, and inherent bias. In this paper, we aim to provide a comprehensive analysis of MLLMs, delving deeply into discussions surrounding these critical issues. First of all, we start by presenting an overview of MLLMs, covering their evolution, key techniques, and multilingual capacities. Secondly, we explore widely utilized multilingual corpora for MLLMs' training and multilingual datasets oriented for downstream tasks that are crucial for enhancing the cross-lingual capability of MLLMs. 
Thirdly, we survey the existing studies on multilingual representati -[^4]: 掩码注意力是图的关键 +[^4]: Croissant:一种面向机器学习数据集的元数据格式 - Masked Attention is All You Need for Graphs + Croissant: A Metadata Format for ML-Ready Datasets - [https://arxiv.org/abs/2402.10793](https://arxiv.org/abs/2402.10793) + [https://arxiv.org/abs/2403.19546](https://arxiv.org/abs/2403.19546) - 提出了一种在图上学习的简单替代方法,称为掩码注意力(MAG),其利用注意力矩阵来创建定制的注意力模式,在长距离任务上表现出色并胜过其他方法。 + Croissant是一种面向机器学习数据集的元数据格式,使数据集更易发现、可移植和互操作,有助于解决ML数据管理和负责任AI中的重要挑战。 - 图神经网络(GNNs)和消息传递算法的变种主要用于在图上学习,这在很大程度上归功于它们的灵活性、速度和令人满意的性能。然而,设计强大而通用的GNNs需要大量的研究工作,通常依赖于精心选择的手工制作的消息传递操作符。受此启发,我们提出了一种在图上学习的非常简单的替代方法,它完全依赖于注意力。图被表示为节点或边集,并通过掩码注意权重矩阵来强制它们的连接,有效地为每个图创建定制的注意力模式。尽管其简单性,用于图的掩码注意力(MAG)在长距离任务上表现出色,并在55多个节点和图级任务上优于强消息传递基线和更复杂的基于注意力的方法。 + 数据是机器学习(ML)的关键资源,但处理数据仍然是一个主要的摩擦点。本文介绍了Croissant,一种用于数据集的元数据格式,简化了数据被ML工具和框架使用的方式。Croissant使数据集更易发现、可移植和互操作,从而解决了ML数据管理和负责任AI中的重要挑战。Croissant已得到几个流行数据集库的支持,涵盖数十万个数据集,可以加载到最流行的ML框架中。 - arXiv:2402.10793v1 Announce Type: cross Abstract: Graph neural networks (GNNs) and variations of the message passing algorithm are the predominant means for learning on graphs, largely due to their flexibility, speed, and satisfactory performance. The design of powerful and general purpose GNNs, however, requires significant research efforts and often relies on handcrafted, carefully-chosen message passing operators. Motivated by this, we propose a remarkably simple alternative for learning on graphs that relies exclusively on attention. Graphs are represented as node or edge sets and their connectivity is enforced by masking the attention weight matrix, effectively creating custom attention patterns for each graph. Despite its simplicity, masked attention for graphs (MAG) has state-of-the-art performance on long-range tasks and outperforms strong message passing baselines and much more involved attention-based methods on over 55 node and graph-level tasks. 
We also show significantly + arXiv:2403.19546v1 Announce Type: cross Abstract: Data is a critical resource for Machine Learning (ML), yet working with data remains a key friction point. This paper introduces Croissant, a metadata format for datasets that simplifies how data is used by ML tools and frameworks. Croissant makes datasets more discoverable, portable and interoperable, thereby addressing significant challenges in ML data management and responsible AI. Croissant is already supported by several popular dataset repositories, spanning hundreds of thousands of datasets, ready to be loaded into the most popular ML frameworks. -[^5]: 通过无监督在图上学习多层感知机(MLP)加速图推理 +[^5]: 用于个性化文本到图像生成的自动化黑盒提示工程 - Graph Inference Acceleration by Learning MLPs on Graphs without Supervision + Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation - [https://arxiv.org/abs/2402.08918](https://arxiv.org/abs/2402.08918) + [https://arxiv.org/abs/2403.19103](https://arxiv.org/abs/2403.19103) - 该论文提出了一个简单而有效的框架SimMLP,通过在图上无监督学习MLPs,提高了在延迟敏感的应用中的泛化能力。 + PRISM是一种算法,可以自动识别人类可解释且易传递的提示,从而有效生成所需概念,仅使用黑盒访问T2I模型。 - 图神经网络(GNNs)已经在各种图学习任务中展示出了有效性,但是它们对消息传递的依赖限制了它们在延迟敏感的应用中的部署,比如金融欺诈检测。最近的研究探索了从GNNs中提取知识到多层感知机(MLPs)来加速推理。然而,这种任务特定的有监督蒸馏限制了对未见节点的泛化,而在延迟敏感的应用中这种情况很常见。为此,我们提出了一种简单而有效的框架SimMLP,用于在图上无监督学习MLPs,以增强泛化能力。SimMLP利用自监督对齐GNNs和MLPs之间的节点特征和图结构之间的精细和泛化的相关性,并提出了两种策略来减轻平凡解的风险。从理论上讲, + 提示工程对于控制文本到图像(T2I)生成模型的输出是有效的,但由于需要手动制作提示而导致工作繁重。这一挑战促使了自动提示生成算法的发展。然而,这些方法通常在T2I模型之间的可传递性方面遇到困难,需要对基础模型进行白盒访问,并产生非直观的提示。在这项工作中,我们介绍了PRISM,这是一种算法,可以仅使用黑盒访问T2I模型就自动识别人类可解释且易传递的提示,从而有效生成所需概念。受大型语言模型(LLM)越狱的启发,PRISM利用LLM的上下文学习能力来迭代地改进给定参考图像的候选提示分布。我们的实验展示了PRISM在为对象、样式等生成准确提示方面的多样性和有效性。 - arXiv:2402.08918v1 Announce Type: cross Abstract: Graph Neural Networks (GNNs) have demonstrated effectiveness in various graph learning tasks, yet their reliance on message-passing constraints their deployment in latency-sensitive applications such as financial fraud detection. 
Recent works have explored distilling knowledge from GNNs to Multi-Layer Perceptrons (MLPs) to accelerate inference. However, this task-specific supervised distillation limits generalization to unseen nodes, which are prevalent in latency-sensitive applications. To this end, we present \textbf{\textsc{SimMLP}}, a \textbf{\textsc{Sim}}ple yet effective framework for learning \textbf{\textsc{MLP}}s on graphs without supervision, to enhance generalization. \textsc{SimMLP} employs self-supervised alignment between GNNs and MLPs to capture the fine-grained and generalizable correlation between node features and graph structures, and proposes two strategies to alleviate the risk of trivial solutions. Theoretically, w + arXiv:2403.19103v1 Announce Type: cross Abstract: Prompt engineering is effective for controlling the output of text-to-image (T2I) generative models, but it is also laborious due to the need for manually crafted prompts. This challenge has spurred the development of algorithms for automated prompt generation. However, these methods often struggle with transferability across T2I models, require white-box access to the underlying model, and produce non-intuitive prompts. In this work, we introduce PRISM, an algorithm that automatically identifies human-interpretable and transferable prompts that can effectively generate desired concepts given only black-box access to T2I models. Inspired by large language model (LLM) jailbreaking, PRISM leverages the in-context learning ability of LLMs to iteratively refine the candidate prompts distribution for given reference images. Our experiments demonstrate the versatility and effectiveness of PRISM in generating accurate prompts for objects, sty -[^6]: PAC隐私保护扩散模型 +[^6]: ChatGPT是否能够基于Twitter提及来预测文章的撤回? - PAC Privacy Preserving Diffusion Models + Can ChatGPT predict article retraction based on Twitter mentions? 
- [https://arxiv.org/abs/2312.01201](https://arxiv.org/abs/2312.01201) + [https://arxiv.org/abs/2403.16851](https://arxiv.org/abs/2403.16851) - 提出了一种PAC隐私保护扩散模型,通过将私有分类器指导集成到采样过程中增强隐私保护,并发展了一种新的度量标准来衡量隐私水平,在保护性能方面表现出卓越表现。 + 本研究探讨了ChatGPT是否能够基于Twitter提及来预测文章的撤回,研究发现在预测未来被撤回的有问题文章方面是具有一定潜力的。 - 数据隐私保护正在引起研究人员的越来越多的关注。扩散模型(DMs),尤其是具有严格的差分隐私,有可能生成既具有高隐私性又具有良好视觉质量的图像。然而,挑战在于确保在私有化特定数据属性时的强大保护,当前模型在这些方面经常存在不足。为了解决这些挑战,我们引入了PAC隐私保护扩散模型,这是一种利用扩散原理并确保“可能大致正确(PAC)”隐私性的模型。我们通过将私有分类器指导集成到Langevin采样过程中来增强隐私保护。此外,认识到在衡量模型隐私性方面存在差距,我们开发了一种新的度量标准来衡量隐私水平。我们的模型通过这个新度量标准评估,并通过高斯矩阵计算支持PAC界限,表现出更优异的隐私性能。 + 检测有问题的研究文章具有重要意义,本研究探讨了根据被撤回文章在Twitter上的提及是否能够在文章被撤回前发出信号,从而在预测未来被撤回的有问题文章方面发挥作用。分析了包括3,505篇已撤回文章及其相关Twitter提及在内的数据集,以及使用粗糙精确匹配方法获取的具有类似特征的3,505篇未撤回文章。通过四种预测方法评估了Twitter提及在预测文章撤回方面的有效性,包括手动标注、关键词识别、机器学习模型和ChatGPT。手动标注的结果表明,的确有被撤回的文章,其Twitter提及包含在撤回前发出信号的可识别证据,尽管它们只占所有被撤回文章的一小部分。 - arXiv:2312.01201v2 Announce Type: replace-cross Abstract: Data privacy protection is garnering increased attention among researchers. Diffusion models (DMs), particularly with strict differential privacy, can potentially produce images with both high privacy and visual quality. However, challenges arise such as in ensuring robust protection in privatizing specific data attributes, areas where current models often fall short. To address these challenges, we introduce the PAC Privacy Preserving Diffusion Model, a model leverages diffusion principles and ensure Probably Approximately Correct (PAC) privacy. We enhance privacy protection by integrating a private classifier guidance into the Langevin Sampling Process. Additionally, recognizing the gap in measuring the privacy of models, we have developed a novel metric to gauge privacy levels. Our model, assessed with this new metric and supported by Gaussian matrix computations for the PAC bound, has shown superior performance in privacy p + arXiv:2403.16851v1 Announce Type: cross Abstract: Detecting problematic research articles timely is a vital task. 
This study explores whether Twitter mentions of retracted articles can signal potential problems with the articles prior to retraction, thereby playing a role in predicting future retraction of problematic articles. A dataset comprising 3,505 retracted articles and their associated Twitter mentions is analyzed, alongside 3,505 non-retracted articles with similar characteristics obtained using the Coarsened Exact Matching method. The effectiveness of Twitter mentions in predicting article retraction is evaluated by four prediction methods, including manual labelling, keyword identification, machine learning models, and ChatGPT. Manual labelling results indicate that there are indeed retracted articles with their Twitter mentions containing recognizable evidence signaling problems before retraction, although they represent only a limited share of all retracted articles with -[^7]: 一种简单的面向特征分布偏斜联邦学习的数据增强方法 +[^7]: 用于生成简要住院病程摘要的领域自适应大语言模型的基准测试 - A Simple Data Augmentation for Feature Distribution Skewed Federated Learning. 
(arXiv:2306.09363v1 [cs.LG]) + A Benchmark of Domain-Adapted Large Language Models for Generating Brief Hospital Course Summaries - [http://arxiv.org/abs/2306.09363](http://arxiv.org/abs/2306.09363) + [https://arxiv.org/abs/2403.05720](https://arxiv.org/abs/2403.05720) - 本文针对特征分布偏斜的联邦学习提出了FedRDN方法,在输入层级上实现了数据增强,将整个联邦数据集的统计信息注入到本地客户端数据中,以缓解特征漂移问题。 + 介绍了一个新的基准测试,评估了用于生成简要住院病程摘要的大语言模型在健康保健领域中的性能并提出相应的自适应策略 - 联邦学习(FL)是一种分布式协作学习方法,可以确保隐私保护。然而,由于数据异构性(即非独立同分布数据),它的性能必然受到影响。本文针对特征分布偏斜的FL场景展开研究,提出了一种通用的数据增强方法,以减轻由本地数据集之间潜在分布不同导致的特征漂移问题。 + 简要住院病程(BHC)摘要是通过总结临床记录而生成的常见临床文件。虽然大型语言模型(LLMs)在自动化实际任务方面展现出显著能力,但它们在医疗应用(如BHC合成)中的能力尚未得到展示。为了使LLMs能够适应BHC合成,我们引入了一个新颖的基准测试,其中包含从MIMIC-IV记录中提取的经过预处理的数据集,封装了临床记录和简要住院病程(BHC)对。我们评估了两个通用LLMs和三个医疗领域适应的LLMs的性能,以改进从临床记录生成BHC。我们使用临床记录作为输入来生成BHC,采用基于提示的(使用上下文学习)和基于微调的自适应策略来应用于三个开源LLMs(Clinical-T5-Large,Llama2-13B,FLAN-UL2)和两个专有LLMs(GPT-3.5,GPT-4)。我们定量评估了性能。 - Federated learning (FL) facilitates collaborative learning among multiple clients in a distributed manner, while ensuring privacy protection. However, its performance is inevitably degraded as suffering data heterogeneity, i.e., non-IID data. In this paper, we focus on the feature distribution skewed FL scenario, which is widespread in real-world applications. The main challenge lies in the feature shift caused by the different underlying distributions of local datasets. While the previous attempts achieved progress, few studies pay attention to the data itself, the root of this issue. Therefore, the primary goal of this paper is to develop a general data augmentation technique at the input level, to mitigate the feature shift. To achieve this goal, we propose FedRDN, a simple yet remarkably effective data augmentation method for feature distribution skewed FL, which randomly injects the statistics of the dataset from the entire federation into the client's data. 
By this, our method ca + arXiv:2403.05720v1 Announce Type: cross Abstract: Brief hospital course (BHC) summaries are common clinical documents generated by summarizing clinical notes. While large language models (LLMs) depict remarkable capabilities in automating real-world tasks, their capabilities for healthcare applications such as BHC synthesis have not been shown. To enable the adaptation of LLMs for BHC synthesis, we introduce a novel benchmark consisting of a pre-processed dataset extracted from MIMIC-IV notes, encapsulating clinical note, and brief hospital course (BHC) pairs. We assess the performance of two general-purpose LLMs and three healthcare-adapted LLMs to improve BHC synthesis from clinical notes. Using clinical notes as input for generating BHCs, we apply prompting-based (using in-context learning) and fine-tuning-based adaptation strategies to three open-source LLMs (Clinical-T5-Large, Llama2-13B, FLAN-UL2) and two proprietary LLMs (GPT-3.5, GPT-4). We quantitatively evaluate the performa + +[^8]: 移除平方根:一种新的高效标度不变版本的AdaGrad + + Remove that Square Root: A New Efficient Scale-Invariant Version of AdaGrad + + [https://arxiv.org/abs/2403.02648](https://arxiv.org/abs/2403.02648) + + KATE是一种新的优化算法,提出了一种与AdaGrad标度不变的适应方法,并在广义线性模型和一般的非凸问题中证明了其标度不变性。数值实验结果表明,KATE在各种场景中均优于AdaGrad并与Adam性能匹配/超越。 + + + + 自适应方法在机器学习中非常流行,因为它们可以降低学习速率调整的成本。本文引入了一种名为KATE的新型优化算法,它提出了一个著名的AdaGrad算法的标度不变适应。我们证明了KATE在广义线性模型案例中的标度不变性。此外,对于一般的光滑非凸问题,我们为KATE建立了一个收敛速率为$O \left(\frac{\log T}{\sqrt{T}} \right)$,与AdaGrad和Adam的最佳收敛速率相匹配。我们还通过不同问题的数值实验将KATE与其他最先进的自适应算法Adam和AdaGrad进行了比较,包括在真实数据上进行图像分类和文本分类等复杂机器学习任务。结果表明,在所有考虑到的场景中,KATE始终胜过AdaGrad,并且在性能上匹配/超越Adam。 + + arXiv:2403.02648v1 Announce Type: cross Abstract: Adaptive methods are extremely popular in machine learning as they make learning rate tuning less expensive. This paper introduces a novel optimization algorithm named KATE, which presents a scale-invariant adaptation of the well-known AdaGrad algorithm. 
We prove the scale-invariance of KATE for the case of Generalized Linear Models. Moreover, for general smooth non-convex problems, we establish a convergence rate of $O \left(\frac{\log T}{\sqrt{T}} \right)$ for KATE, matching the best-known ones for AdaGrad and Adam. We also compare KATE to other state-of-the-art adaptive algorithms Adam and AdaGrad in numerical experiments with different problems, including complex machine learning tasks like image classification and text classification on real data. The results indicate that KATE consistently outperforms AdaGrad and matches/surpasses the performance of Adam in all considered scenarios. + +[^9]: 大型语言模型与游戏:调研与路线图 + + Large Language Models and Games: A Survey and Roadmap + + [https://arxiv.org/abs/2402.18659](https://arxiv.org/abs/2402.18659) + + 这项研究调查了大型语言模型在游戏领域中的多种应用及其角色,指出了未开发领域和未来发展方向,同时探讨了在游戏领域中大型语言模型的潜力和限制。 + + + + 近年来,大型语言模型(LLMs)的研究急剧增加,并伴随着公众对该主题的参与。尽管起初是自然语言处理中的一小部分,LLMs在广泛的应用和领域中展现出显著潜力,包括游戏。本文调查了LLMs在游戏中及为游戏提供支持的各种应用的最新技术水平,并明确了LLMs在游戏中可以扮演的不同角色。重要的是,我们讨论了尚未开发的领域和LLMs在游戏中未来应用的有前途的方向,以及在游戏领域中LLMs的潜力和限制。作为LLMs和游戏交叉领域的第一份综合调查和路线图,我们希望本文能够成为这一激动人心的新领域的开创性研究和创新的基础。 + + arXiv:2402.18659v1 Announce Type: cross Abstract: Recent years have seen an explosive increase in research on large language models (LLMs), and accompanying public engagement on the topic. While starting as a niche area within natural language processing, LLMs have shown remarkable potential across a broad range of applications and domains, including games. This paper surveys the current state of the art across the various applications of LLMs in and for games, and identifies the different roles LLMs can take within a game. Importantly, we discuss underexplored areas and promising directions for future uses of LLMs in games and we reconcile the potential and limitations of LLMs within the games domain. 
As the first comprehensive survey and roadmap at the intersection of LLMs and games, we are hopeful that this paper will serve as the basis for groundbreaking research and innovation in this exciting new field. + +[^10]: 在大型语言模型中基准测试心灵理论 + + ToMBench: Benchmarking Theory of Mind in Large Language Models + + [https://arxiv.org/abs/2402.15052](https://arxiv.org/abs/2402.15052) + + 提出了ToMBench框架,在大型语言模型中进行心灵理论性能评估,发现最先进的模型仍然落后于人类表现超过10%。 + + + + 心灵理论(ToM)是指感知和归因自己以及他人的心理状态的认知能力。最近的研究引发了关于大型语言模型(LLMs)是否表现出一种形式的心灵理论的争论。然而,现有的心灵理论评估受到诸如受限范围、主观判断和意外污染等挑战的制约,导致评估不足。为了填补这一空白,我们引入了ToMBench,具有三个关键特征:系统评估框架涵盖社会认知中的8项任务和31项能力,多项选择题格式以支持自动化和无偏见的评估,以及基于双语清单的从头构建,严格避免数据泄漏。基于ToMBench,我们进行了大量实验,评估了10个流行LLMs在任务和能力方面的心灵理论表现。我们发现,即使像GPT-4这样的最先进的LLMs也比人类表现落后超过10个百分点。 + + arXiv:2402.15052v1 Announce Type: cross Abstract: Theory of Mind (ToM) is the cognitive capability to perceive and ascribe mental states to oneself and others. Recent research has sparked a debate over whether large language models (LLMs) exhibit a form of ToM. However, existing ToM evaluations are hindered by challenges such as constrained scope, subjective judgment, and unintended contamination, yielding inadequate assessments. To address this gap, we introduce ToMBench with three key characteristics: a systematic evaluation framework encompassing 8 tasks and 31 abilities in social cognition, a multiple-choice question format to support automated and unbiased evaluation, and a build-from-scratch bilingual inventory to strictly avoid data leakage. Based on ToMBench, we conduct extensive experiments to evaluate the ToM performance of 10 popular LLMs across tasks and abilities. 
We find that even the most advanced LLMs like GPT-4 lag behind human performance by over 10% points, indicati + +[^11]: RealDex: 实现机器人灵巧手类人式抓取 + + RealDex: Towards Human-like Grasping for Robotic Dexterous Hand + + [https://arxiv.org/abs/2402.13853](https://arxiv.org/abs/2402.13853) + + RealDex数据集捕捉了真实的灵巧手抓取动作,利用多模态数据使得训练灵巧手更加自然和精确,同时提出了一种先进的灵巧抓取动作生成框架,有效利用多模态大型语言模型,在类人机器人的自动感知、认知和操纵方面具有巨大潜力。 + + + + 在本文中,我们介绍了RealDex,一个开创性的数据集,捕捉了融入了人类行为模式的真实灵巧手抓取动作,同时通过多视角和多模态视觉数据进行了丰富。利用远程操作系统,我们可以实时无缝同步人-机器人手姿势。这些类人动作的集合对于训练灵巧手更自然、更精确地模仿人类动作至关重要。RealDex在推动类人机器人在真实场景中自动感知、认知和操纵方面具有巨大潜力。此外,我们介绍了一种前沿的灵巧抓取动作生成框架,该框架符合人类经验,并通过有效利用多模态大型语言模型增强了在现实世界中的适用性。广泛的实验证明了我们的方法在RealDex和其他开放数据集上的优越性能。完整的数据集和代码将会公开发布。 + + arXiv:2402.13853v1 Announce Type: cross Abstract: In this paper, we introduce RealDex, a pioneering dataset capturing authentic dexterous hand grasping motions infused with human behavioral patterns, enriched by multi-view and multimodal visual data. Utilizing a teleoperation system, we seamlessly synchronize human-robot hand poses in real time. This collection of human-like motions is crucial for training dexterous hands to mimic human movements more naturally and precisely. RealDex holds immense promise in advancing humanoid robot for automated perception, cognition, and manipulation in real-world scenarios. Moreover, we introduce a cutting-edge dexterous grasping motion generation framework, which aligns with human experience and enhances real-world applicability through effectively utilizing Multimodal Large Language Models. Extensive experiments have demonstrated the superior performance of our method on RealDex and other open datasets. 
The complete dataset and code will be made + +[^12]: 基于查询的对抗性提示生成 + + Query-Based Adversarial Prompt Generation + + [https://arxiv.org/abs/2402.12329](https://arxiv.org/abs/2402.12329) + + 该研究提出了一种基于查询的对抗性攻击方法,通过利用远程语言模型的 API 访问构造对抗性示例,使模型以更高概率发出有害字符串,而非仅仅基于模型之间的转移性攻击。 + + + + 最近的研究表明,可以构造对抗性示例,导致一个对其进行了调整的语言模型产生有害字符串或执行有害行为。现有的攻击要么在白盒设置中(完全访问模型权重),要么通过可转移性:一种现象,即在一个模型上精心设计的对抗性示例通常在其他模型上仍然有效。我们通过基于查询的攻击改进以前的工作,利用 API 访问远程语言模型来构造对抗性示例,使模型以(明显)更高的概率发出有害字符串,而不能仅仅使用转移攻击。我们在 GPT-3.5 和 OpenAI 的安全分类器上验证了我们的攻击;我们能够让 GPT-3.5 发出有害字符串,而目前的转移攻击失败了,并且我们几乎以 100% 的概率规避了安全分类器。 + + arXiv:2402.12329v1 Announce Type: cross Abstract: Recent work has shown it is possible to construct adversarial examples that cause an aligned language model to emit harmful strings or perform harmful behavior. Existing attacks work either in the white-box setting (with full access to the model weights), or through transferability: the phenomenon that adversarial examples crafted on one model often remain effective on other models. We improve on prior work with a query-based attack that leverages API access to a remote language model to construct adversarial examples that cause the model to emit harmful strings with (much) higher probability than with transfer-only attacks. We validate our attack on GPT-3.5 and OpenAI's safety classifier; we can cause GPT-3.5 to emit harmful strings that current transfer attacks fail at, and we can evade the safety classifier with nearly 100% probability. 
+ +[^13]: CHEMREASONER:使用量子化学反馈在大型语言模型的知识空间中进行启发式搜索 + + CHEMREASONER: Heuristic Search over a Large Language Model's Knowledge Space using Quantum-Chemical Feedback + + [https://arxiv.org/abs/2402.10980](https://arxiv.org/abs/2402.10980) + + 通过将大型语言模型推理与量子化学反馈相结合,我们引入了一个AI引导的计算筛选框架,将催化剂发现形式化为一个不确定环境,从而实现对高效催化剂的主动搜索。 + + + + 发现新的催化剂对于设计新的更高效的化学过程至关重要,以实现向可持续未来的过渡。我们引入了一种人工智能引导的计算筛选框架,将语言推理与基于量子化学的三维原子表示的反馈统一起来。我们的方法将催化剂发现构建为一个不确定环境,其中一个智能体通过大型语言模型(LLM)推导的假设与基于原子图神经网络(GNN)的反馈的迭代组合,主动搜索高效催化剂。在中间搜索步骤确定的催化剂经过基于空间定向、反应途径和稳定性的结构评估。基于吸附能和势垒的评分函数引导在LLM的知识空间中向能量有利、高效的催化剂探索。我们引入了可以自动规划的方法 + + arXiv:2402.10980v1 Announce Type: cross Abstract: The discovery of new catalysts is essential for the design of new and more efficient chemical processes in order to transition to a sustainable future. We introduce an AI-guided computational screening framework unifying linguistic reasoning with quantum-chemistry based feedback from 3D atomistic representations. Our approach formulates catalyst discovery as an uncertain environment where an agent actively searches for highly effective catalysts via the iterative combination of large language model (LLM)-derived hypotheses and atomistic graph neural network (GNN)-derived feedback. Identified catalysts in intermediate search steps undergo structural evaluation based on spatial orientation, reaction pathways, and stability. Scoring functions based on adsorption energies and barriers steer the exploration in the LLM's knowledge space toward energetically favorable, high-efficiency catalysts.
We introduce planning methods that automaticall + +[^14]: 学习可解释概念:统一因果表示学习与基础模型 + + Learning Interpretable Concepts: Unifying Causal Representation Learning and Foundation Models + + [https://arxiv.org/abs/2402.09236](https://arxiv.org/abs/2402.09236) + + 本研究将因果表示学习和基础模型相结合,研究了如何从数据中学习人类可解释的概念。实验证明了这一统一方法的实用性。 + + + + 构建智能机器学习系统有两种广泛的方法。一种方法是构建天生可解释的模型,这是因果表示学习领域的努力方向。另一种方法是构建高性能的基础模型,然后投入努力去理解它们的工作原理。本研究将这两种方法联系起来,研究如何从数据中学习人类可解释的概念。通过结合这两个领域的思想,我们正式定义了概念的概念,并展示了它们可以从多样的数据中被可靠地恢复出来。对于合成数据和大型语言模型的实验证明了我们统一方法的实用性。 + + arXiv:2402.09236v1 Announce Type: cross Abstract: To build intelligent machine learning systems, there are two broad approaches. One approach is to build inherently interpretable models, as endeavored by the growing field of causal representation learning. The other approach is to build highly-performant foundation models and then invest efforts into understanding how they work. In this work, we relate these two approaches and study how to learn human-interpretable concepts from data. Weaving together ideas from both fields, we formally define a notion of concepts and show that they can be provably recovered from diverse data. Experiments on synthetic data and large language models show the utility of our unified approach. 
+ +[^15]: 物理层密钥对抗恶意可重构智能面的可解释对抗学习框架 + + Explainable Adversarial Learning Framework on Physical Layer Secret Keys Combating Malicious Reconfigurable Intelligent Surface + + [https://arxiv.org/abs/2402.06663](https://arxiv.org/abs/2402.06663) + + 本文提出了一个对抗学习框架,用于合法参与方间的物理层密钥生成,在恶意可重构智能面干扰下提供了一个可解释的解决方案。 + + + + 可重构智能面(RIS)的发展对物理层安全(PLS)是一把双刃剑。合法的RIS可以产生有益的影响,包括增加信道的随机性,增强物理层密钥生成(PL-SKG),而恶意的RIS可以破坏合法信道并破解大部分现有的PL-SKG。在这项工作中,我们提出了一个合法参与方(即爱丽丝和鲍勃)之间的对抗学习框架,以解决中间人恶意RIS(MITM-RIS)窃听问题。首先,我们推导了合法配对和MITM-RIS之间的理论互信息差距。然后,爱丽丝和鲍勃利用生成对抗网络(GAN)学习实现一个与MITM-RIS没有互信息重叠的共同特征面。接下来,我们使用符号可解释AI(xAI)表示对黑盒神经网络进行信号处理解释。这些主导神经元的符号术语有助于特征工程。 + + The development of reconfigurable intelligent surfaces (RIS) is a double-edged sword to physical layer security (PLS). Whilst a legitimate RIS can yield beneficial impacts including increased channel randomness to enhance physical layer secret key generation (PL-SKG), malicious RIS can poison legitimate channels and crack most of existing PL-SKGs. In this work, we propose an adversarial learning framework between legitimate parties (namely Alice and Bob) to address this Man-in-the-middle malicious RIS (MITM-RIS) eavesdropping. First, the theoretical mutual information gap between legitimate pairs and MITM-RIS is deduced. Then, Alice and Bob leverage generative adversarial networks (GANs) to learn to achieve a common feature surface that does not have mutual information overlap with MITM-RIS. Next, we aid signal processing interpretation of black-box neural networks by using a symbolic explainable AI (xAI) representation. 
These symbolic terms of dominant neurons aid feature engineering- + +[^16]: 元宇宙在校准具有肉身的人工通用智能中的作用 + + The role of the metaverse in calibrating an embodied artificial general intelligence + + [https://arxiv.org/abs/2402.06660](https://arxiv.org/abs/2402.06660) + + 本文研究了具有肉身的人工通用智能(AGI)的概念及其与人类意识的关系,强调了元宇宙在促进这一关系中的关键作用。通过结合不同理论框架和技术工具,论文总结出实现具有肉身的AGI的关键要素和发展阶段。 + + + + 本文探讨了具有肉身的人工通用智能(AGI)的概念,它与人类意识的关系,以及元宇宙在促进这种关系中的关键作用。本文利用融入认知、Michael Levin的计算边界"Self"、Donald D. Hoffman的感知界面理论以及Bernardo Kastrup的分析唯心主义等理论框架来构建实现具有肉身的AGI的论证。它认为我们所感知的外部现实是一种内在存在的交替状态的象征性表示,而AGI可以具有更大计算边界的更高意识。本文进一步讨论了AGI的发展阶段、实现具有肉身的AGI的要求、为AGI校准象征性界面的重要性,以及元宇宙、去中心化系统、开源区块链技术以及开源人工智能研究所扮演的关键角色。它还探讨了新的沟通机制和用于加强对元宇宙的理解的技术工具,以帮助实现具有肉身的AGI。 + + This paper examines the concept of embodied artificial general intelligence (AGI), its relationship to human consciousness, and the key role of the metaverse in facilitating this relationship. The paper leverages theoretical frameworks such as embodied cognition, Michael Levin's computational boundary of a "Self," Donald D. Hoffman's Interface Theory of Perception, and Bernardo Kastrup's analytical idealism to build the argument for achieving embodied AGI. It contends that our perceived outer reality is a symbolic representation of alternate inner states of being, and that AGI could embody a higher consciousness with a larger computational boundary. The paper further discusses the developmental stages of AGI, the requirements for the emergence of an embodied AGI, the importance of a calibrated symbolic interface for AGI, and the key role played by the metaverse, decentralized systems, open-source blockchain technology, as well as open-source AI research. 
It also explores the idea of a + +[^17]: InkSight:通过学习阅读和书写实现离线到在线手写转换 + + InkSight: Offline-to-Online Handwriting Conversion by Learning to Read and Write + + [https://arxiv.org/abs/2402.05804](https://arxiv.org/abs/2402.05804) + + InkSight是一个可以将离线手写转换为在线手写的系统,通过结合阅读和书写先验知识,在多样化的照片中有效地Derendering手写文本。 + + + + 数字笔记正在变得越来越受欢迎,提供了一种耐用、可编辑和易于索引的存储笔记的方式,即矢量化形式的数字墨水。然而,这种笔记方式与传统的纸笔记方式之间仍存在显著差距,而传统纸笔记方式仍受到绝大多数人的青睐。我们的工作InkSight旨在弥合这种差距,使实体笔记者能够轻松地将他们的作品(离线手写)转换为数字墨水(在线手写),这个过程我们称之为Derendering。之前关于此主题的研究集中在图像的几何属性上,导致了在训练领域之外的有限泛化能力。我们的方法结合了阅读和书写的先验知识,允许在缺乏大量配对样本的情况下训练模型,而这些配对样本很难获取。据我们所知,这是第一个有效地对具有多样化视觉特征和背景的任意照片中的手写文本进行Derendering的工作。 + + Digital note-taking is gaining popularity, offering a durable, editable, and easily indexable way of storing notes in the vectorized form, known as digital ink. However, a substantial gap remains between this way of note-taking and traditional pen-and-paper note-taking, a practice still favored by a vast majority. Our work, InkSight, aims to bridge the gap by empowering physical note-takers to effortlessly convert their work (offline handwriting) to digital ink (online handwriting), a process we refer to as Derendering. Prior research on the topic has focused on the geometric properties of images, resulting in limited generalization beyond their training domains. Our approach combines reading and writing priors, allowing training a model in the absence of large amounts of paired samples, which are difficult to obtain. To our knowledge, this is the first work that effectively derenders handwritten text in arbitrary photos with diverse visual characteristics and backgrounds. 
Furthermore, + +[^18]: CIC:一种面向文化感知图像字幕的框架 + + CIC: A framework for Culturally-aware Image Captioning + + [https://arxiv.org/abs/2402.05374](https://arxiv.org/abs/2402.05374) + + CIC是一种面向文化感知图像字幕的框架,通过结合视觉问答和大型语言模型,它能够生成能描述图像中文化元素的详细字幕。 + + + + 图像字幕通过使用视觉-语言预训练模型(VLPs)如BLIP从图像生成描述性句子,这种方法已经取得了很大的改进。然而,当前的方法缺乏对图像中所描绘的文化元素(例如亚洲文化群体的传统服装)生成详细描述性字幕的能力。在本文中,我们提出了一种新的框架,\textbf{面向文化感知图像字幕(CIC)},该框架能够从代表不同文化的图像中生成字幕并描述文化元素。受到将视觉模态和大型语言模型(LLMs)通过适当的提示进行组合的方法的启发,我们的框架(1)根据图像中的文化类别生成问题,(2)利用生成的问题从视觉问答(VQA)中提取文化视觉元素,(3)使用带有提示的LLMs生成文化感知字幕。我们在4个不同大学的45名参与者上进行了人工评估。 + + Image Captioning generates descriptive sentences from images using Vision-Language Pre-trained models (VLPs) such as BLIP, which has improved greatly. However, current methods lack the generation of detailed descriptive captions for the cultural elements depicted in the images, such as the traditional clothing worn by people from Asian cultural groups. In this paper, we propose a new framework, \textbf{Culturally-aware Image Captioning (CIC)}, that generates captions and describes cultural elements extracted from cultural visual elements in images representing cultures. Inspired by methods combining visual modality and Large Language Models (LLMs) through appropriate prompts, our framework (1) generates questions based on cultural categories from images, (2) extracts cultural visual elements from Visual Question Answering (VQA) using generated questions, and (3) generates culturally-aware captions using LLMs with the prompts. 
Our human evaluation conducted on 45 participants from 4 dif + +[^19]: 基于个性化人类反馈的个性化语言建模 + + Personalized Language Modeling from Personalized Human Feedback + + [https://arxiv.org/abs/2402.05133](https://arxiv.org/abs/2402.05133) + + 该论文提出了个性化RLHF(P-RLHF)框架,通过联合学习用户模型与语言(或奖励)模型,解决标准RLHF在用户偏好多样化时存在的问题。 + + + + 基于人类反馈的强化学习(RLHF)是目前主流的框架,用于微调大型语言模型以更好地符合人类偏好。然而,当人类反馈中所编码的用户偏好多样化时,在这个框架下开发的算法的基本前提可能会出现问题。在本文中,我们旨在通过开发构建个性化语言模型的方法来解决这个问题。我们首先正式介绍了从个性化人类反馈中学习的任务,并解释了为什么在这种情况下普通的RLHF可能会存在问题。然后,我们提出了一个通用的个性化RLHF(P-RLHF)框架,需要同时学习用户模型和语言(或奖励)模型。用户模型接收用户信息并输出用户表示。其结构编码了我们对反馈数据背后用户偏好的假设。我们为个性化奖励建模和个性化直接偏好优化开发了新的学习目标。 + + Reinforcement Learning from Human Feedback (RLHF) is the current dominating framework to fine-tune large language models to better align with human preferences. However, the underlying premise of algorithms developed under this framework can be problematic when user preferences encoded in human feedback are diverse. In this work, we aim to address this problem by developing methods for building personalized language models. We first formally introduce the task of learning from personalized human feedback and explain why vanilla RLHF can be problematic in this context. We then propose a general Personalized-RLHF (P-RLHF) framework, which requires one to jointly learn a user model and a language (or reward) model. The user model takes in user information and outputs user representations. Its structure encodes our assumptions about user preferences underlying the feedback data.
We develop new learning objectives for personalized reward modeling and personalized Direct Preference Optimizat + +[^20]: TopoX: 一个用于拓扑域上的机器学习的Python软件包套件 + + TopoX: A Suite of Python Packages for Machine Learning on Topological Domains + + [https://arxiv.org/abs/2402.02441](https://arxiv.org/abs/2402.02441) + + TopoX是一个用于在拓扑域上进行机器学习的Python软件包套件,包含了构建、计算和嵌入拓扑域的功能,并提供了一套全面的高阶消息传递功能工具箱。 + + + + 我们介绍了topox,一个提供可靠且用户友好的Python软件包套件,用于在拓扑域(扩展了图的领域)上进行计算和机器学习:超图、单纯、胞腔、路径和组合复合体。topox由三个软件包组成:toponetx用于构建和计算这些域,包括节点、边和高阶单元的处理;topoembedx提供了将拓扑域嵌入到向量空间的方法,类似于流行的基于图的嵌入算法,如node2vec;topomodelx建立在PyTorch之上,为拓扑域上的神经网络提供了一套全面的高阶消息传递功能工具箱。topox的源代码经过广泛的文档化和单元测试,并在https://github.com/pyt-team以MIT许可证的形式提供。 + + We introduce topox, a Python software suite that provides reliable and user-friendly building blocks for computing and machine learning on topological domains that extend graphs: hypergraphs, simplicial, cellular, path and combinatorial complexes. topox consists of three packages: toponetx facilitates constructing and computing on these domains, including working with nodes, edges and higher-order cells; topoembedx provides methods to embed topological domains into vector spaces, akin to popular graph-based embedding algorithms such as node2vec; topomodelx is built on top of PyTorch and offers a comprehensive toolbox of higher-order message passing functions for neural networks on topological domains. The extensively documented and unit-tested source code of topox is available under MIT license at https://github.com/pyt-team. 
+ +[^21]: GeoSAM: 使用稀疏和密集的视觉提示对SAM进行改进,实现自动化的移动基础设施分割 + + GeoSAM: Fine-tuning SAM with Sparse and Dense Visual Prompting for Automated Segmentation of Mobility Infrastructure + + [https://arxiv.org/abs/2311.11319](https://arxiv.org/abs/2311.11319) + + GeoSAM是一个基于SAM的新框架,使用了来自零样本学习和预训练CNN分割模型的视觉提示,提高了地理图像分割的性能。 + + + + 当应用于自然图像分割时,Segment Anything Model (SAM)已经展现出了令人印象深刻的性能。然而,它在地理图像(如航拍和卫星图像)中面临困难,特别是在分割道路、人行道和人行横道等移动基础设施时。这种较差的性能源于这些对象的窄小特征,它们的纹理融入环境中,以及树木、建筑物、车辆和行人等物体的干扰,这些都可能使模型失去定向产生不准确的分割图。为了解决这些挑战,我们提出了地理SAM(GeoSAM),这是一个基于SAM的新框架,它使用来自零样本学习的密集视觉提示和预训练CNN分割模型的稀疏视觉提示实施了细调策略。所提出的GeoSAM在地理图像分割方面优于现有方法,特别是对于道路基础设施、行人基础设施的分割性能提升了26%、7%和17%。 + + The Segment Anything Model (SAM) has shown impressive performance when applied to natural image segmentation. However, it struggles with geographical images like aerial and satellite imagery, especially when segmenting mobility infrastructure including roads, sidewalks, and crosswalks. This inferior performance stems from the narrow features of these objects, their textures blending into the surroundings, and interference from objects like trees, buildings, vehicles, and pedestrians - all of which can disorient the model to produce inaccurate segmentation maps. To address these challenges, we propose Geographical SAM (GeoSAM), a novel SAM-based framework that implements a fine-tuning strategy using the dense visual prompt from zero-shot learning, and the sparse visual prompt from a pre-trained CNN segmentation model. The proposed GeoSAM outperforms existing approaches for geographical image segmentation, specifically by 26%, 7%, and 17% for road infrastructure, pedestrian infrastructur + +[^22]: ZS4C: 使用ChatGPT进行零射击合成不完整代码片段的可编译代码 + + ZS4C: Zero-Shot Synthesis of Compilable Code for Incomplete Code Snippets using ChatGPT. 
(arXiv:2401.14279v1 [cs.SE] CROSS LISTED) + + [http://arxiv.org/abs/2401.14279](http://arxiv.org/abs/2401.14279) + + ZS4C提出了一种使用ChatGPT进行零射击合成可编译代码的轻量级方法,帮助用户重用或分析不完整的Q&A代码片段,通过识别缺失的导入语句并修复编译错误来实现。 + + + + 技术问答(Q&A)网站如Stack Overflow已成为软件开发者寻求知识的重要来源。然而,Q&A网站上的代码片段通常由于未解析的类型和缺失的依赖库而无法编译和语义上不完整,这增加了用户重用或分析Q&A代码片段的障碍。之前的方法要么不适用于合成可编译代码,要么编译成功率低。为了解决这个问题,我们提出了ZS4C,一种使用大型语言模型(LLM)从不完整的代码片段中进行零射击合成可编译代码的轻量级方法。ZS4C分为两个阶段。在第一阶段,ZS4C利用一个LLM,即ChatGPT,根据我们设计的专用任务提示模板,为给定的代码片段识别缺失的导入语句。在第二阶段,ZS4C通过修复由于不正确的导入语句和语法错误引起的编译错误来修复代码。 + + Technical question and answering (Q&A) sites such as Stack Overflow have become an important source for software developers to seek knowledge. However, code snippets on Q&A sites are usually uncompilable and semantically incomplete for compilation due to unresolved types and missing dependent libraries, which raises the obstacle for users to reuse or analyze Q&A code snippets. Prior approaches either are not designed for synthesizing compilable code or suffer from a low compilation success rate. To address this problem, we propose ZS4C, a lightweight approach to perform zero-shot synthesis of compilable code from incomplete code snippets using Large Language Model (LLM). ZS4C operates in two stages. In the first stage, ZS4C utilizes an LLM, i.e., ChatGPT, to identify missing import statements for a given code snippet, leveraging our designed task-specific prompt template. In the second stage, ZS4C fixes compilation errors caused by incorrect import statements and syntax errors through + +[^23]: 众包自适应调查 + + Crowdsourced Adaptive Surveys. 
(arXiv:2401.12986v1 [cs.CL]) + + [http://arxiv.org/abs/2401.12986](http://arxiv.org/abs/2401.12986) + + 众包自适应调查方法(CSAS)结合自然语言处理和自适应算法,能够根据用户输入演变问题库,并在调查中适应新的问题,应用在拉丁裔信息环境和议题重要性领域,能够识别难以通过传统方法跟踪的主张或问题。 + + + + 公众舆论调查对于民主决策至关重要,但对于传统调查方法来说,应对快速变化的信息环境以及在小众社区中衡量观点可能具有挑战性。本文介绍了一种众包自适应调查方法(CSAS),它将自然语言处理和自适应算法的进展结合起来,生成随着用户输入不断演变的问题库。CSAS方法将参与者提供的开放式文本转换为Likert式项目,并应用多臂老虎机算法来确定哪些用户提供的问题应在调查中优先呈现。该方法的自适应性允许探索新的调查问题,同时只给调查长度带来极小的成本。在拉丁裔信息环境和议题重要性领域的应用展示了CSAS识别可能难以通过标准方法跟踪的主张或问题的能力。 + + Public opinion surveys are vital for informing democratic decision-making, but responding to rapidly changing information environments and measuring beliefs within niche communities can be challenging for traditional survey methods. This paper introduces a crowdsourced adaptive survey methodology (CSAS) that unites advances in natural language processing and adaptive algorithms to generate question banks that evolve with user input. The CSAS method converts open-ended text provided by participants into Likert-style items and applies a multi-armed bandit algorithm to determine user-provided questions that should be prioritized in the survey. The method's adaptive nature allows for the exploration of new survey questions, while imposing minimal costs in survey length. Applications in the domains of Latino information environments and issue importance showcase CSAS's ability to identify claims or issues that might otherwise be difficult to track using standard approaches. I conclude by di + +[^24]: xTrimoPGLM: 统一的千亿(100B)规模预训练Transformer,用于解读蛋白质的语言 + + xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein.
(arXiv:2401.06199v1 [q-bio.QM]) + + [http://arxiv.org/abs/2401.06199](http://arxiv.org/abs/2401.06199) + + xTrimoPGLM是一个统一的1000亿参数规模预训练蛋白质语言模型,通过创新的预训练框架同时处理蛋白质理解和生成任务,在四个类别的18个蛋白质理解基准测试中显著优于其他先进基线,并有助于以原子分辨率观察蛋白质结构。 + + + + 蛋白质语言模型在学习蛋白质序列中的生物信息方面显示出显著的成功。然而,大多数现有模型局限于自编码或自回归的预训练目标,这使得它们难以同时处理蛋白质理解和生成任务。我们提出了一个统一的蛋白质语言模型xTrimoPGLM,通过创新的预训练框架同时解决这两类任务。我们的关键技术贡献是探索这两类目标的兼容性和联合优化的潜力,并由此得到一种以前所未有的规模(1000亿参数和1万亿训练标记)训练xTrimoPGLM的策略。我们广泛的实验表明,1)xTrimoPGLM在四个类别的18个蛋白质理解基准测试中明显优于其他先进基线。该模型还有助于以原子分辨率观察蛋白质结构。 + + Protein language models have shown remarkable success in learning biological information from protein sequences. However, most existing models are limited by either autoencoding or autoregressive pre-training objectives, which makes them struggle to handle protein understanding and generation tasks concurrently. We propose a unified protein language model, xTrimoPGLM, to address these two types of tasks simultaneously through an innovative pre-training framework. Our key technical contribution is an exploration of the compatibility and the potential for joint optimization of the two types of objectives, which has led to a strategy for training xTrimoPGLM at an unprecedented scale of 100 billion parameters and 1 trillion training tokens. Our extensive experiments reveal that 1) xTrimoPGLM significantly outperforms other advanced baselines in 18 protein understanding benchmarks across four categories. The model also facilitates an atomic-resolution view of protein structures, leading to + +[^25]: 基于进化优化的多智能体量子强化学习 + + Multi-Agent Quantum Reinforcement Learning using Evolutionary Optimization.
(arXiv:2311.05546v2 [quant-ph] UPDATED) + + [http://arxiv.org/abs/2311.05546](http://arxiv.org/abs/2311.05546) + + 本研究提出了三种基于变分量子线路的进化优化多智能体强化学习变体,并在Coin Game环境中证明了这些方法相比于经典方法表现显著更好。 + + + + 多智能体强化学习在自动驾驶和其他智能产业应用方面变得越来越重要。与此同时,利用量子力学的固有属性,采用新的有希望的强化学习方法,显著减少模型的可训练参数。然而,基于梯度的多智能体量子强化学习方法常常面临贫瘠平台问题,阻碍了它们与经典方法性能的匹配。我们在现有的无梯度量子强化学习方法基础上构建,并提出了三种基于变分量子线路的进化优化多智能体强化学习变体。我们在Coin Game环境中评估了我们的遗传变种,并与经典方法进行了比较。我们证明了我们的变分量子线路方法相比于具有类似参数数量的神经网络表现显著更好。 + + Multi-Agent Reinforcement Learning is becoming increasingly more important in times of autonomous driving and other smart industrial applications. Simultaneously a promising new approach to Reinforcement Learning arises using the inherent properties of quantum mechanics, reducing the trainable parameters of a model significantly. However, gradient-based Multi-Agent Quantum Reinforcement Learning methods often have to struggle with barren plateaus, holding them back from matching the performance of classical approaches. We build upon an existing approach for gradient free Quantum Reinforcement Learning and propose three genetic variations with Variational Quantum Circuits for Multi-Agent Reinforcement Learning using evolutionary optimization. We evaluate our genetic variations in the Coin Game environment and also compare them to classical approaches. We showed that our Variational Quantum Circuit approaches perform significantly better compared to a neural network with a similar amount + +[^26]: 医学图像分析的领域泛化:综述 + + Domain Generalization for Medical Image Analysis: A Survey. 
(arXiv:2310.08598v1 [eess.IV]) + + [http://arxiv.org/abs/2310.08598](http://arxiv.org/abs/2310.08598) + + 本综述详细回顾了针对医学图像分析的领域泛化研究,探讨了在DL模型在真实世界应用中遇到的挑战,以及如何解决分布漂移问题和实现稳健性。同时,考虑了领域泛化技术对整个MedIA工作流程的操作影响。 + + + + 医学图像分析(MedIA)已成为医学和保健领域的重要工具,在疾病诊断、预后和治疗规划方面起到了很大的作用,深度学习(DL)的最新成功为其进展做出了重要贡献。然而,MedIA的DL模型在现实世界中的部署仍然具有挑战性,在训练和测试样本之间的分布差异下很难泛化,这被称为分布漂移问题。研究人员致力于开发各种DL方法,使其能够适应并在未知和超出分布的数据分布上稳健地运行。本文综合评述了专门针对MedIA的领域泛化研究。我们提供了领域泛化技术在更大范围MedIA系统内的交互方式的整体视图,不仅仅考虑方法学,还考虑了对整个MedIA工作流程的操作影响。具体而言,我们将领域泛化方法分为数据层次的方法… + + Medical Image Analysis (MedIA) has become an essential tool in medicine and healthcare, aiding in disease diagnosis, prognosis, and treatment planning, and recent successes in deep learning (DL) have made significant contributions to its advances. However, DL models for MedIA remain challenging to deploy in real-world situations, failing for generalization under the distributional gap between training and testing samples, known as a distribution shift problem. Researchers have dedicated their efforts to developing various DL methods to adapt and perform robustly on unknown and out-of-distribution data distributions. This paper comprehensively reviews domain generalization studies specifically tailored for MedIA. We provide a holistic view of how domain generalization techniques interact within the broader MedIA system, going beyond methodologies to consider the operational implications on the entire MedIA workflow. Specifically, we categorize domain generalization methods into data-lev + +[^27]: 分割与合并:对大型语言模型的位置偏差进行校准 + + Split and Merge: Aligning Position Biases in Large Language Model based Evaluators. 
(arXiv:2310.01432v1 [cs.CL]) + + [http://arxiv.org/abs/2310.01432](http://arxiv.org/abs/2310.01432) + + PORTIA是一个旨在校准大型语言模型评估器位置偏差的对齐系统:它将答案分割成多个片段,对齐各候选答案中的相似内容,再将其合并回一个单一的提示,从而提高评估的一致性。 + + + + 大型语言模型(LLMs)已显示出作为自动化评估器的潜力,可用于评估AI系统生成的答案的质量。然而,这些基于LLM的评估器在成对比较中评估候选答案时存在位置偏差或不一致性,无视内容而偏向于第一个或第二个答案。为了解决这个问题,我们提出了PORTIA,这是一个基于对齐的系统,旨在模拟人类的比较策略,以轻量级但有效的方式校准位置偏差。具体而言,PORTIA将答案分割成多个片段,对齐各候选答案中的相似内容,并将它们合并回一个单一的提示,以供LLMs评估。我们使用六种不同的LLM进行了大量实验,评估了11,520个答案对。我们的结果表明,PORTIA显著提高了所有被测模型和比较形式的一致性率,平均相对改进率达到47.46%。 + + Large language models (LLMs) have shown promise as automated evaluators for assessing the quality of answers generated by AI systems. However, these LLM-based evaluators exhibit position bias, or inconsistency, when used to evaluate candidate answers in pairwise comparisons, favoring either the first or second answer regardless of content. To address this limitation, we propose PORTIA, an alignment-based system designed to mimic human comparison strategies to calibrate position bias in a lightweight yet effective manner. Specifically, PORTIA splits the answers into multiple segments, aligns similar content across candidate answers, and then merges them back into a single prompt for evaluation by LLMs. We conducted extensive experiments with six diverse LLMs to evaluate 11,520 answer pairs. Our results show that PORTIA markedly enhances the consistency rates for all the models and comparison forms tested, achieving an average relative improvement of 47.46%. Remarkably, PORTIA enables le + +[^28]: 用算法替代人类决策者的统计检验 + + Statistical Tests for Replacing Human Decision Makers with Algorithms.
(arXiv:2306.11689v1 [econ.EM]) + + [http://arxiv.org/abs/2306.11689](http://arxiv.org/abs/2306.11689) + + 本文提出了一种利用人工智能改善人类决策的统计框架:先将每位人类决策者的表现与机器预测进行基准比较,再用算法的建议替换其中一部分决策者的决策。在测试数据集上,该算法的真阳性率更高、假阳性率更低,且来自农村地区的医生的诊断更常被替代。 + + + + 本文提出了一个统计框架,可以通过人工智能来改善人类的决策。首先将每个人类决策者的表现与机器预测进行基准测试;然后用所提出的人工智能算法的建议替换决策制定者的一个子集所做出的决策。利用一个全国性的大型数据集(包含妊娠结局和育龄夫妇孕前检查的医生诊断),我们试验了一种启发式频率学派方法以及一种贝叶斯后验损失函数方法,并将其应用于异常出生检测。我们发现,我们的算法在一个测试数据集上的结果比仅由医生诊断的结果具有更高的总体真阳性率和更低的假阳性率。我们还发现,来自农村地区的医生的诊断更常被替代,这表明人工智能辅助决策往往更能提高精确度。 + + This paper proposes a statistical framework with which artificial intelligence can improve human decision making. The performance of each human decision maker is first benchmarked against machine predictions; we then replace the decisions made by a subset of the decision makers with the recommendation from the proposed artificial intelligence algorithm. Using a large nationwide dataset of pregnancy outcomes and doctor diagnoses from prepregnancy checkups of reproductive age couples, we experimented with both a heuristic frequentist approach and a Bayesian posterior loss function approach with an application to abnormal birth detection. We find that our algorithm on a test dataset results in a higher overall true positive rate and a lower false positive rate than the diagnoses made by doctors only. We also find that the diagnoses of doctors from rural areas are more frequently replaceable, suggesting that artificial intelligence assisted decision making tends to improve precision more i + +[^29]: 使用LLM辅助注释进行语料库语言学研究:本地语法分析案例研究 + + Using LLM-assisted Annotation for Corpus Linguistics: A Case Study of Local Grammar Analysis.
(arXiv:2305.08339v2 [cs.CL] UPDATED) + + [http://arxiv.org/abs/2305.08339](http://arxiv.org/abs/2305.08339) + + 本文研究了使用基于大语言模型的聊天机器人自动标注文本的潜力,重点考察了从本地语法角度观察道歉言语行为构成的功能元素的程度,并比较了不同模型在注释任务中的表现,结果表明Bing聊天机器人在任务中表现优于ChatGPT和人类标注员。 + + + + 基于大语言模型(LLMs)的聊天机器人在语言理解方面表现出很强的能力。本研究探索LLMs在协助基于语料库的语言学研究方面的潜力,通过将文本自动标注为特定语言信息类别。具体而言,我们研究了从本地语法的角度观察道歉言语行为构成的功能元素的程度,通过比较基于GPT-3.5的ChatGPT、基于GPT-4的Bing聊天机器人和人类编码器在注释任务中的表现。结果表明,Bing聊天机器人在任务中表现显着优于ChatGPT。与人类标注员相比,Bing聊天机器人的整体表现略低于人类标注员的表现,但已经取得了较高的F1得分:道歉标记99.95%,原因标记91.91%,道歉者标记95.35%,被道歉者标记89.74%和加强标记96.47%。这表明,在语言类别清晰且可以轻松识别的情况下,使用LLM辅助注释进行语料库语言学研究是可行的。 + + Chatbots based on Large Language Models (LLMs) have shown strong capabilities in language understanding. In this study, we explore the potential of LLMs in assisting corpus-based linguistic studies through automatic annotation of texts with specific categories of linguistic information. Specifically, we examined to what extent LLMs understand the functional elements constituting the speech act of apology from a local grammar perspective, by comparing the performance of ChatGPT (powered by GPT-3.5), the Bing chatbot (powered by GPT-4), and a human coder in the annotation task. The results demonstrate that the Bing chatbot significantly outperformed ChatGPT in the task. Compared to human annotator, the overall performance of the Bing chatbot was slightly less satisfactory. However, it already achieved high F1 scores: 99.95% for the tag of APOLOGISING, 91.91% for REASON, 95.35% for APOLOGISER, 89.74% for APOLOGISEE, and 96.47% for INTENSIFIER. This suggests that it is feasible to use LLM- + +[^30]: 基于贝叶斯分类器的特征最优分区研究 + + Optimal partition of feature using Bayesian classifier. 
(arXiv:2304.14537v1 [cs.LG]) + + [http://arxiv.org/abs/2304.14537](http://arxiv.org/abs/2304.14537) + + 本文通过提出一种名为“共单调独立分类器”(CIBer)的新技术,专注于特征的最优分区,旨在克服朴素贝叶斯方法带来的挑战,并且证明该技术在不同数据集上具有更高的准确率和更低的错误率。 + + + + 朴素贝叶斯分类器是一种应用贝叶斯原理的流行分类方法,尽管输入变量之间的条件依赖关系听起来很好,但实际上会导致大多数投票风格的行为。朴素贝叶斯算法中的某些特征被称为独立特征,因为在预测分类时它们没有条件相关性或依赖性。本文通过提出一种名为“共单调独立分类器”(CIBer)的新技术,专注于特征的最优分区,旨在克服朴素贝叶斯方法带来的挑战。在不同的数据集上,我们明确证明了我们的技术的有效性,在错误率更低、准确率更高或相当的情况下,与随机森林和XGBoost等模型相比。 + + The Naive Bayesian classifier is a popular classification method employing the Bayesian paradigm. The concept of having conditional dependence among input variables sounds good in theory but can lead to a majority vote style behaviour. Achieving conditional independence is often difficult, and they introduce decision biases in the estimates. In Naive Bayes, certain features are called independent features as they have no conditional correlation or dependency when predicting a classification. In this paper, we focus on the optimal partition of features by proposing a novel technique called the Comonotone-Independence Classifier (CIBer) which is able to overcome the challenges posed by the Naive Bayes method. For different datasets, we clearly demonstrate the efficacy of our technique, where we achieve lower error rates and higher or equivalent accuracy compared to models such as Random Forests and XGBoost. + +[^31]: 利用离线数据加速程序生成环境中的强化学习 + + Using Offline Data to Speed-up Reinforcement Learning in Procedurally Generated Environments. 
(arXiv:2304.09825v1 [cs.LG]) + + [http://arxiv.org/abs/2304.09825](http://arxiv.org/abs/2304.09825) + + 本研究旨在提高程序生成环境中强化学习的样本效率。研究证明,使用模仿学习进行预训练和同时进行模仿学习和在线强化学习的方法可以提高效率。 + + + + 强化学习面临的主要挑战之一是代理能够将其学习策略推广到未见过的环境中。此外,训练强化学习代理需要与环境进行大量交互。受离线强化学习和模仿学习的最近成功启发,我们进行了一项研究,以调查代理是否可以利用轨迹的离线数据来提高程序生成环境中的样本效率。我们考虑了两种使用离线数据的模仿学习方法:(1)在在线强化学习训练之前预训练策略和(2)同时训练在线强化学习和来自离线数据的模仿学习。我们分析了可用的离线轨迹的质量(轨迹的最佳性)和多样性(轨迹数量和覆盖级别)对两种方法有效性的影响。在MiniGrid环境中的四个知名稀疏奖励任务中,我们发现使用模仿学习进行预训练和同时进行模仿学习和在线强化学习的方法可以提供更高的样本效率。 + + One of the key challenges of Reinforcement Learning (RL) is the ability of agents to generalise their learned policy to unseen settings. Moreover, training RL agents requires large numbers of interactions with the environment. Motivated by the recent success of Offline RL and Imitation Learning (IL), we conduct a study to investigate whether agents can leverage offline data in the form of trajectories to improve the sample-efficiency in procedurally generated environments. We consider two settings of using IL from offline data for RL: (1) pre-training a policy before online RL training and (2) concurrently training a policy with online RL and IL from offline data. We analyse the impact of the quality (optimality of trajectories) and diversity (number of trajectories and covered level) of available offline trajectories on the effectiveness of both approaches. Across four well-known sparse reward tasks in the MiniGrid environment, we find that using IL for pre-training and concurrently d + +[^32]: 无需边缘但具有结构感知性:从GNN到MLP的原型引导知识蒸馏。 + + Edge-free but Structure-aware: Prototype-Guided Knowledge Distillation from GNNs to MLPs. 
(arXiv:2303.13763v1 [cs.LG]) + + [http://arxiv.org/abs/2303.13763](http://arxiv.org/abs/2303.13763) + + 本文提出了一种原型引导知识蒸馏(PGKD)方法,它不需要图形边缘,但可以在不考虑边缘的情况下学习结构感知的MLP。 + + + + 将高精度的图神经网络(GNN)在图任务中压缩成低延迟的多层感知器(MLP)已成为热门研究课题。以前的方法会将图的边缘处理成额外的输入给MLP,但这样的图结构对于各种场景可能无法获得。因此,我们提出了一种原型引导知识蒸馏(PGKD)方法,它不需要图形边缘,但可以在不考虑边缘的情况下学习结构感知的MLP。具体而言,我们分析了GNN教师中的图形结构信息,并通过原型在无边缘设置中从GNN到MLP进行了知识蒸馏。在流行的图形基准实验中的实验结果表明了所提出的PGKD方法的有效性和鲁棒性。 + + Distilling high-accuracy Graph Neural Networks~(GNNs) to low-latency multilayer perceptrons~(MLPs) on graph tasks has become a hot research topic. However, MLPs rely exclusively on the node features and fail to capture the graph structural information. Previous methods address this issue by processing graph edges into extra inputs for MLPs, but such graph structures may be unavailable for various scenarios. To this end, we propose a Prototype-Guided Knowledge Distillation~(PGKD) method, which does not require graph edges~(edge-free) yet learns structure-aware MLPs. Specifically, we analyze the graph structural information in GNN teachers, and distill such information from GNNs to MLPs via prototypes in an edge-free setting. Experimental results on popular graph benchmarks demonstrate the effectiveness and robustness of the proposed PGKD. + +[^33]: 语言控制扩散:通过空间、时间和任务高效扩展 + + Language Control Diffusion: Efficiently Scaling through Space, Time, and Tasks. 
(arXiv:2210.15629v2 [cs.LG] UPDATED) + + [http://arxiv.org/abs/2210.15629](http://arxiv.org/abs/2210.15629) + + 本文提出一种利用语言控制扩散模型的分层规划器,有效而高效地扩展扩散模型,解决长时间跨度自然语言指令下的控制问题,实现了较高的单任务和多任务成功率,并极大地提高计算效率。 + + + + 训练通用型智能体在各个方面都很困难,需要处理高维输入(空间)、长时间跨度(时间)和多个新任务。最近的结构方面的进展使得我们可以沿着其中一个或两个维度提高扩展性能力,但计算成本仍然很高。本文提出使用语言控制扩散模型作为一种基于自然语言条件的分层规划器(LCD)来应对这三个方面。我们有效而高效地扩展扩散模型,以应对时间、状态和任务空间维度的长时间跨度控制问题。我们在CALVIN语言机器人基准测试中将LCD与其他最先进的模型进行比较,发现LCD在多任务成功率方面优于其他最先进的方法,而单任务成功率(SR)为88.7%,远高于以前的最佳成绩82.6%,大大提高了计算效率。 + + Training generalist agents is difficult across several axes, requiring us to deal with high-dimensional inputs (space), long horizons (time), and multiple and new tasks. Recent advances with architectures have allowed for improved scaling along one or two of these dimensions, but are still prohibitive computationally. In this paper, we propose to address all three axes by leveraging Language to Control Diffusion models as a hierarchical planner conditioned on language (LCD). We effectively and efficiently scale diffusion models for planning in extended temporal, state, and task dimensions to tackle long horizon control problems conditioned on natural language instructions. We compare LCD with other state-of-the-art models on the CALVIN language robotics benchmark and find that LCD outperforms other SOTA methods in multi task success rates while dramatically improving computational efficiency with a single task success rate (SR) of 88.7% against the previous best of 82.6%. We show that + +[^34]: 距离对选区划分图的影响:中心和异常地图的应用 + + Implications of Distance over Redistricting Maps: Central and Outlier Maps. 
(arXiv:2203.00872v4 [cs.GT] UPDATED) + + [http://arxiv.org/abs/2203.00872](http://arxiv.org/abs/2203.00872) + + 本文提出了一种可解释且可操作的选区划分图距离测量方法,并定义了一种“最典型”的中心图。这种方法可以帮助我们深入研究一系列约束条件下选区划分图的应用。 + + + + 在代议制民主中,选区划分图用于将选民划分为一组选区,每个区选出一个代表。有效的划分图必须满足一系列约束条件,例如紧凑性、连续性、以及几乎相等的人口分布。然而,这些加强的限制条件仍然不足以限制有效选区划分图的数量。本文提出了一种可解释且可操作的距离测量方法,以此研究在一系列约束条件下选区划分图的应用。具体而言,我们定义了一种被认为是“最典型”的中心图,并通过展示它在一个委员会场景中反映了Kemeny(凯门耶)排名的良好性来给出了严格的证明。 + + In representative democracy, a redistricting map is chosen to partition an electorate into a collection of districts each of which elects a representative. A valid redistricting map must satisfy a collection of constraints such as being compact, contiguous, and of almost equal population. However, these imposed constraints are still loose enough to enable an enormous ensemble of valid redistricting maps. This fact introduces a difficulty in drawing redistricting maps and it also enables a partisan legislature to possibly gerrymander by choosing a map which unfairly favors it. In this paper, we introduce an interpretable and tractable distance measure over redistricting maps which does not use election results and study its implications over the ensemble of redistricting maps. 
Specifically, we define a central map which may be considered as being "most typical" and give a rigorous justification for it by showing that it mirrors the Kemeny ranking in a scenario where we have a committee diff --git a/cs.AI.xml b/cs.AI.xml index d59dc1685..430859125 100644 --- a/cs.AI.xml +++ b/cs.AI.xml @@ -1,141 +1,681 @@ -Chat Arxiv cs.AIhttps://github.com/qhduan/cn-chat-arxivThis is arxiv RSS feed for cs.AI对于评估大型语言模型中多选题回答的合理性进行了回顾,发现当前基于多选题回答的基准可能无法充分捕捉大型语言模型的真实能力。https://rss.arxiv.org/abs/2402.01349<p> -超越答案:对于评估大型语言模型中多选题回答的合理性的回顾 +Chat Arxiv cs.AIhttps://github.com/qhduan/cn-chat-arxivThis is arxiv RSS feed for cs.AI本研究重新思考了大型语言模型知识蒸馏中对Kullback-Leibler散度的应用,发现逆Kullback-Leibler和正向Kullback-Leibler散度在优化目标上相似,为此提出了一种自适应Kullback-Leiber散度方法。https://arxiv.org/abs/2404.02657<p> +在大型语言模型知识蒸馏中重新思考Kullback-Leibler散度 </p> <p> -Beyond the Answers: Reviewing the Rationality of Multiple Choice Question Answering for the Evaluation of Large Language Models +Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models </p> <p> -https://rss.arxiv.org/abs/2402.01349 +https://arxiv.org/abs/2404.02657 </p> <p> -对于评估大型语言模型中多选题回答的合理性进行了回顾,发现当前基于多选题回答的基准可能无法充分捕捉大型语言模型的真实能力。 +本研究重新思考了大型语言模型知识蒸馏中对Kullback-Leibler散度的应用,发现逆Kullback-Leibler和正向Kullback-Leibler散度在优化目标上相似,为此提出了一种自适应Kullback-Leiber散度方法。 </p> <p> </p> <p> -在自然语言处理领域,大型语言模型(LLMs)引发了一场范式转变,显著提升了自然语言生成任务的性能。尽管取得了这些进展,对LLMs的全面评估仍然是社区面临的必然挑战。最近,将多选题回答(MCQA)作为LLMs的基准已经引起了广泛关注。本研究调查了MCQA作为LLMs评估方法的合理性。如果LLMs真正理解问题的语义,它们的性能应该在从相同问题派生的各种配置上表现一致。然而,我们的实证结果表明LLMs的响应一致性存在显著差异,我们将之定义为LLMs的响应可变性综合征(REVAS),这表明目前基于MCQA的基准可能无法充分捕捉LLMs的真实能力,强调了对更合适的评估方法的需要。 +Kullback-Leibler散度在知识蒸馏中被广泛应用于压缩大型语言模型。本研究从经验和理论上证明了,在LLMs的知识蒸馏中,与之前断言的逆Kullback-Leibler(RKL)散度寻找模式并因此优于寻找平均值的正向Kullback-Leibler(FKL)散度相反,实际上在知识蒸馏中都没有体现出寻找模式或寻找平均值的特性。相反,发现RKL和FKL具有相同的优化目标,并在足够数量的时代之后都会收敛。然而,由于实际约束,LLMs很少被训练如此多的时代。同时,我们进一步发现,RKL在分布的尾部,而FKL在开始时代侧重于分布的头部。因此,我们提出了一种简单而有效的自适应Kullback-Leiber(AKL)散度方法,该方法自适应地分配权重来组合F </p> <p> -In the field 
of natural language processing (NLP), Large Language Models (LLMs) have precipitated a paradigm shift, markedly enhancing performance in natural language generation tasks. Despite these advancements, the comprehensive evaluation of LLMs remains an inevitable challenge for the community. Recently, the utilization of Multiple Choice Question Answering (MCQA) as a benchmark for LLMs has gained considerable traction. This study investigates the rationality of MCQA as an evaluation method for LLMs. If LLMs genuinely understand the semantics of questions, their performance should exhibit consistency across the varied configurations derived from the same questions. Contrary to this expectation, our empirical findings suggest a notable disparity in the consistency of LLM responses, which we define as REsponse VAriability Syndrome (REVAS) of the LLMs, indicating that current MCQA-based benchmarks may not adequately capture the true capabilities of LLMs, which underscores the need f -</p>本文通过引入改进的深度卷积生成对抗网络(mDCGAN),针对高质量艺术品生成进行了研究,解决了普遍训练问题,有效探索抽象绘画中的颜色和笔触模式。https://arxiv.org/abs/2403.18397<p> -使用改进的深度卷积生成对抗网络在抽象艺术中进行颜色和笔触模式识别 +arXiv:2404.02657v1 Announce Type: cross Abstract: Kullback-Leiber divergence has been widely used in Knowledge Distillation (KD) to compress Large Language Models (LLMs). Contrary to prior assertions that reverse Kullback-Leibler (RKL) divergence is mode-seeking and thus preferable over the mean-seeking forward Kullback-Leibler (FKL) divergence, this study empirically and theoretically demonstrates that neither mode-seeking nor mean-seeking properties manifest in KD for LLMs. Instead, RKL and FKL are found to share the same optimization objective and both converge after a sufficient number of epochs. However, due to practical constraints, LLMs are seldom trained for such an extensive number of epochs. Meanwhile, we further find that RKL focuses on the tail part of the distributions, while FKL focuses on the head part at the beginning epochs. 
Consequently, we propose a simple yet effective Adaptive Kullback-Leiber (AKL) divergence method, which adaptively allocates weights to combine F +</p>通过核大小缩放提高嵌入式脉冲神经网络准确性的方法学在实验中表现出更高的准确性。https://arxiv.org/abs/2404.01685<p> +通过核大小缩放提高嵌入式脉冲神经网络准确性的方法学 </p> <p> -Colour and Brush Stroke Pattern Recognition in Abstract Art using Modified Deep Convolutional Generative Adversarial Networks +A Methodology for Improving Accuracy of Embedded Spiking Neural Networks through Kernel Size Scaling </p> <p> -https://arxiv.org/abs/2403.18397 +https://arxiv.org/abs/2404.01685 </p> <p> -本文通过引入改进的深度卷积生成对抗网络(mDCGAN),针对高质量艺术品生成进行了研究,解决了普遍训练问题,有效探索抽象绘画中的颜色和笔触模式。 +通过核大小缩放提高嵌入式脉冲神经网络准确性的方法学在实验中表现出更高的准确性。 </p> <p> </p> <p> -抽象艺术是一种广受欢迎、被广泛讨论的艺术形式,通常能够描绘出艺术家的情感。许多研究人员尝试使用机器学习和深度学习的边缘检测、笔触和情感识别算法来研究抽象艺术。本文描述了使用生成对抗神经网络(GAN)对广泛分布的抽象绘画进行研究。 GAN具有学习和再现分布的能力,使研究人员能够有效地探索和研究生成的图像空间。然而,挑战在于开发一种能够克服常见训练问题的高效GAN架构。本文通过引入专门设计用于高质量艺术品生成的改进DCGAN(mDCGAN)来解决这一挑战。该方法涉及对所做修改的深入探讨,深入研究DCGAN的复杂工作。 +脉冲神经网络(SNNs)由于其稀疏的基于脉冲的操作而能为基于机器学习的应用提供超低功耗/能耗。目前,大多数SNN架构需要更大的模型大小才能实现更高的准确性,这对资源受限的嵌入式应用不太适合。因此,迫切需要开发能够以可接受的内存占用实现高准确性的SNNs。为此,我们提出了一种通过核大小缩放提高SNNs准确性的新方法学。其关键步骤包括调查不同核大小对准确性的影响,设计新的核大小集合,基于选定的核大小生成SNN架构,并分析SNN模型选择的准确性-内存折衷。实验结果表明,我们的方法学在准确性方面优于最先进的方法(对于CIFAR10有93.24%的准确度) </p> <p> -arXiv:2403.18397v1 Announce Type: cross Abstract: Abstract Art is an immensely popular, discussed form of art that often has the ability to depict the emotions of an artist. Many researchers have made attempts to study abstract art in the form of edge detection, brush stroke and emotion recognition algorithms using machine and deep learning. This papers describes the study of a wide distribution of abstract paintings using Generative Adversarial Neural Networks(GAN). GANs have the ability to learn and reproduce a distribution enabling researchers and scientists to effectively explore and study the generated image space. However, the challenge lies in developing an efficient GAN architecture that overcomes common training pitfalls. 
This paper addresses this challenge by introducing a modified-DCGAN (mDCGAN) specifically designed for high-quality artwork generation. The approach involves a thorough exploration of the modifications made, delving into the intricate workings of DCGANs, opt -</p>提出了异质交互评分网络(HIRE)框架,通过异质交互模块(HIM)来共同建模异质交互并直接推断重要特征https://arxiv.org/abs/2403.17740<p> -一体化:异质交互建模用于冷启动评分预测 +arXiv:2404.01685v1 Announce Type: cross Abstract: Spiking Neural Networks (SNNs) can offer ultra low power/ energy consumption for machine learning-based applications due to their sparse spike-based operations. Currently, most of the SNN architectures need a significantly larger model size to achieve higher accuracy, which is not suitable for resource-constrained embedded applications. Therefore, developing SNNs that can achieve high accuracy with acceptable memory footprint is highly needed. Toward this, we propose a novel methodology that improves the accuracy of SNNs through kernel size scaling. Its key steps include investigating the impact of different kernel sizes on the accuracy, devising new sets of kernel sizes, generating SNN architectures based on the selected kernel sizes, and analyzing the accuracy-memory trade-offs for SNN model selection. 
The experimental results show that our methodology achieves higher accuracy than state-of-the-art (93.24% accuracy for CIFAR10 and 70 +</p>该论文对多语言大型语言模型进行了全面分析,深入讨论了关键问题,包括多语言语料库、对齐和偏见。https://arxiv.org/abs/2404.00929<p> +多语言大型语言模型:语料库、对齐和偏见综述 </p> <p> -All-in-One: Heterogeneous Interaction Modeling for Cold-Start Rating Prediction +A Survey on Multilingual Large Language Models: Corpora, Alignment, and Bias </p> <p> -https://arxiv.org/abs/2403.17740 +https://arxiv.org/abs/2404.00929 </p> <p> -提出了异质交互评分网络(HIRE)框架,通过异质交互模块(HIM)来共同建模异质交互并直接推断重要特征 +该论文对多语言大型语言模型进行了全面分析,深入讨论了关键问题,包括多语言语料库、对齐和偏见。 </p> <p> </p> <p> -冷启动评分预测是推荐系统中一个基本问题,已得到广泛研究。许多方法已经被提出,利用现有数据之间的显式关系,例如协同过滤、社交推荐和异构信息网络,以缓解冷启动用户和物品的数据不足问题。然而,基于不同角色之间的数据构建的显式关系可能不可靠且无关,从而限制了特定推荐任务的性能上限。受此启发,本文提出了一个灵活的框架,名为异质交互评分网络(HIRE)。HIRE不仅仅依赖于预先定义的交互模式或手动构建的异构信息网络。相反,我们设计了一个异质交互模块(HIM),来共同建模异质交互并直接推断重要特征。 +基于大型语言模型(LLMs)的基础上,发展了多语言大型语言模型(MLLMs)来解决多语言自然语言处理任务的挑战,希望实现从高资源到低资源语言的知识转移。然而,仍然存在重要限制和挑战,比如语言不平衡、多语言对齐和固有偏见。本文旨在对MLLMs进行全面分析,深入讨论围绕这些关键问题的议题。 </p> <p> -arXiv:2403.17740v1 Announce Type: cross Abstract: Cold-start rating prediction is a fundamental problem in recommender systems that has been extensively studied. Many methods have been proposed that exploit explicit relations among existing data, such as collaborative filtering, social recommendations and heterogeneous information network, to alleviate the data insufficiency issue for cold-start users and items. However, the explicit relations constructed based on data between different roles may be unreliable and irrelevant, which limits the performance ceiling of the specific recommendation task. Motivated by this, in this paper, we propose a flexible framework dubbed heterogeneous interaction rating network (HIRE). HIRE dose not solely rely on the pre-defined interaction pattern or the manually constructed heterogeneous information network. 
Instead, we devise a Heterogeneous Interaction Module (HIM) to jointly model the heterogeneous interactions and directly infer the important in -</p>提出了一种在图上学习的简单替代方法,称为掩码注意力(MAG),其利用注意力矩阵来创建定制的注意力模式,在长距离任务上表现出色并胜过其他方法。https://arxiv.org/abs/2402.10793<p> -掩码注意力是图的关键 +arXiv:2404.00929v1 Announce Type: cross Abstract: Based on the foundation of Large Language Models (LLMs), Multilingual Large Language Models (MLLMs) have been developed to address the challenges of multilingual natural language processing tasks, hoping to achieve knowledge transfer from high-resource to low-resource languages. However, significant limitations and challenges still exist, such as language imbalance, multilingual alignment, and inherent bias. In this paper, we aim to provide a comprehensive analysis of MLLMs, delving deeply into discussions surrounding these critical issues. First of all, we start by presenting an overview of MLLMs, covering their evolution, key techniques, and multilingual capacities. Secondly, we explore widely utilized multilingual corpora for MLLMs' training and multilingual datasets oriented for downstream tasks that are crucial for enhancing the cross-lingual capability of MLLMs. 
Thirdly, we survey the existing studies on multilingual representati +</p>Croissant是一种面向机器学习数据集的元数据格式,使数据集更易发现、可移植和互操作,有助于解决ML数据管理和负责任AI中的重要挑战。https://arxiv.org/abs/2403.19546<p> +Croissant:一种面向机器学习数据集的元数据格式 </p> <p> -Masked Attention is All You Need for Graphs +Croissant: A Metadata Format for ML-Ready Datasets </p> <p> -https://arxiv.org/abs/2402.10793 +https://arxiv.org/abs/2403.19546 </p> <p> -提出了一种在图上学习的简单替代方法,称为掩码注意力(MAG),其利用注意力矩阵来创建定制的注意力模式,在长距离任务上表现出色并胜过其他方法。 +Croissant是一种面向机器学习数据集的元数据格式,使数据集更易发现、可移植和互操作,有助于解决ML数据管理和负责任AI中的重要挑战。 </p> <p> </p> <p> -图神经网络(GNNs)和消息传递算法的变种主要用于在图上学习,这在很大程度上归功于它们的灵活性、速度和令人满意的性能。然而,设计强大而通用的GNNs需要大量的研究工作,通常依赖于精心选择的手工制作的消息传递操作符。受此启发,我们提出了一种在图上学习的非常简单的替代方法,它完全依赖于注意力。图被表示为节点或边集,并通过掩码注意权重矩阵来强制它们的连接,有效地为每个图创建定制的注意力模式。尽管其简单性,用于图的掩码注意力(MAG)在长距离任务上表现出色,并在55多个节点和图级任务上优于强消息传递基线和更复杂的基于注意力的方法。 +数据是机器学习(ML)的关键资源,但处理数据仍然是一个主要的摩擦点。本文介绍了Croissant,一种用于数据集的元数据格式,简化了数据被ML工具和框架使用的方式。Croissant使数据集更易发现、可移植和互操作,从而解决了ML数据管理和负责任AI中的重要挑战。Croissant已得到几个流行数据集库的支持,涵盖数十万个数据集,可以加载到最流行的ML框架中。 </p> <p> -arXiv:2402.10793v1 Announce Type: cross Abstract: Graph neural networks (GNNs) and variations of the message passing algorithm are the predominant means for learning on graphs, largely due to their flexibility, speed, and satisfactory performance. The design of powerful and general purpose GNNs, however, requires significant research efforts and often relies on handcrafted, carefully-chosen message passing operators. Motivated by this, we propose a remarkably simple alternative for learning on graphs that relies exclusively on attention. Graphs are represented as node or edge sets and their connectivity is enforced by masking the attention weight matrix, effectively creating custom attention patterns for each graph. Despite its simplicity, masked attention for graphs (MAG) has state-of-the-art performance on long-range tasks and outperforms strong message passing baselines and much more involved attention-based methods on over 55 node and graph-level tasks. 
We also show significantly -</p>该论文提出了一个简单而有效的框架SimMLP,通过在图上无监督学习MLPs,提高了在延迟敏感的应用中的泛化能力。https://arxiv.org/abs/2402.08918<p> -通过无监督在图上学习多层感知机(MLP)加速图推理 +arXiv:2403.19546v1 Announce Type: cross Abstract: Data is a critical resource for Machine Learning (ML), yet working with data remains a key friction point. This paper introduces Croissant, a metadata format for datasets that simplifies how data is used by ML tools and frameworks. Croissant makes datasets more discoverable, portable and interoperable, thereby addressing significant challenges in ML data management and responsible AI. Croissant is already supported by several popular dataset repositories, spanning hundreds of thousands of datasets, ready to be loaded into the most popular ML frameworks. +</p>PRISM是一种算法,可以自动识别人类可解释且易传递的提示,从而有效生成所需概念,仅使用黑盒访问T2I模型。https://arxiv.org/abs/2403.19103<p> +用于个性化文本到图像生成的自动化黑盒提示工程 </p> <p> -Graph Inference Acceleration by Learning MLPs on Graphs without Supervision +Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation </p> <p> -https://arxiv.org/abs/2402.08918 +https://arxiv.org/abs/2403.19103 </p> <p> -该论文提出了一个简单而有效的框架SimMLP,通过在图上无监督学习MLPs,提高了在延迟敏感的应用中的泛化能力。 +PRISM是一种算法,可以自动识别人类可解释且易传递的提示,从而有效生成所需概念,仅使用黑盒访问T2I模型。 </p> <p> </p> <p> -图神经网络(GNNs)已经在各种图学习任务中展示出了有效性,但是它们对消息传递的依赖限制了它们在延迟敏感的应用中的部署,比如金融欺诈检测。最近的研究探索了从GNNs中提取知识到多层感知机(MLPs)来加速推理。然而,这种任务特定的有监督蒸馏限制了对未见节点的泛化,而在延迟敏感的应用中这种情况很常见。为此,我们提出了一种简单而有效的框架SimMLP,用于在图上无监督学习MLPs,以增强泛化能力。SimMLP利用自监督对齐GNNs和MLPs之间的节点特征和图结构之间的精细和泛化的相关性,并提出了两种策略来减轻平凡解的风险。从理论上讲, +提示工程对于控制文本到图像(T2I)生成模型的输出是有效的,但由于需要手动制作提示而导致工作繁重。这一挑战促使了自动提示生成算法的发展。然而,这些方法通常在T2I模型之间的可传递性方面遇到困难,需要对基础模型进行白盒访问,并产生非直观的提示。在这项工作中,我们介绍了PRISM,这是一种算法,可以仅使用黑盒访问T2I模型就自动识别人类可解释且易传递的提示,从而有效生成所需概念。受大型语言模型(LLM)越狱的启发,PRISM利用LLM的上下文学习能力来迭代地改进给定参考图像的候选提示分布。我们的实验展示了PRISM在为对象、样式等生成准确提示方面的多样性和有效性。 </p> <p> -arXiv:2402.08918v1 Announce Type: cross Abstract: Graph Neural Networks (GNNs) have demonstrated effectiveness in various graph learning tasks, yet their reliance on 
message-passing constraints their deployment in latency-sensitive applications such as financial fraud detection. Recent works have explored distilling knowledge from GNNs to Multi-Layer Perceptrons (MLPs) to accelerate inference. However, this task-specific supervised distillation limits generalization to unseen nodes, which are prevalent in latency-sensitive applications. To this end, we present \textbf{\textsc{SimMLP}}, a \textbf{\textsc{Sim}}ple yet effective framework for learning \textbf{\textsc{MLP}}s on graphs without supervision, to enhance generalization. \textsc{SimMLP} employs self-supervised alignment between GNNs and MLPs to capture the fine-grained and generalizable correlation between node features and graph structures, and proposes two strategies to alleviate the risk of trivial solutions. Theoretically, w -</p>提出了一种PAC隐私保护扩散模型,通过将私有分类器指导集成到采样过程中增强隐私保护,并发展了一种新的度量标准来衡量隐私水平,在保护性能方面表现出卓越表现。https://arxiv.org/abs/2312.01201<p> -PAC隐私保护扩散模型 +arXiv:2403.19103v1 Announce Type: cross Abstract: Prompt engineering is effective for controlling the output of text-to-image (T2I) generative models, but it is also laborious due to the need for manually crafted prompts. This challenge has spurred the development of algorithms for automated prompt generation. However, these methods often struggle with transferability across T2I models, require white-box access to the underlying model, and produce non-intuitive prompts. In this work, we introduce PRISM, an algorithm that automatically identifies human-interpretable and transferable prompts that can effectively generate desired concepts given only black-box access to T2I models. Inspired by large language model (LLM) jailbreaking, PRISM leverages the in-context learning ability of LLMs to iteratively refine the candidate prompts distribution for given reference images. 
Our experiments demonstrate the versatility and effectiveness of PRISM in generating accurate prompts for objects, sty +</p>本研究探讨了ChatGPT是否能够基于Twitter提及来预测文章的撤回,研究发现在预测未来被撤回的有问题文章方面是具有一定潜力的。https://arxiv.org/abs/2403.16851<p> +ChatGPT是否能够基于Twitter提及来预测文章的撤回? </p> <p> -PAC Privacy Preserving Diffusion Models +Can ChatGPT predict article retraction based on Twitter mentions? </p> <p> -https://arxiv.org/abs/2312.01201 +https://arxiv.org/abs/2403.16851 </p> <p> -提出了一种PAC隐私保护扩散模型,通过将私有分类器指导集成到采样过程中增强隐私保护,并发展了一种新的度量标准来衡量隐私水平,在保护性能方面表现出卓越表现。 +本研究探讨了ChatGPT是否能够基于Twitter提及来预测文章的撤回,研究发现在预测未来被撤回的有问题文章方面是具有一定潜力的。 </p> <p> </p> <p> -数据隐私保护正在引起研究人员的越来越多的关注。扩散模型(DMs),尤其是具有严格的差分隐私,有可能生成既具有高隐私性又具有良好视觉质量的图像。然而,挑战在于确保在私有化特定数据属性时的强大保护,当前模型在这些方面经常存在不足。为了解决这些挑战,我们引入了PAC隐私保护扩散模型,这是一种利用扩散原理并确保“可能大致正确(PAC)”隐私性的模型。我们通过将私有分类器指导集成到Langevin采样过程中来增强隐私保护。此外,认识到在衡量模型隐私性方面存在差距,我们开发了一种新的度量标准来衡量隐私水平。我们的模型通过这个新度量标准评估,并通过高斯矩阵计算支持PAC界限,表现出更优异的隐私性能。 +检测有问题的研究文章具有重要意义,本研究探讨了根据被撤回文章在Twitter上的提及是否能够在文章被撤回前发出信号,从而在预测未来被撤回的有问题文章方面发挥作用。分析了包括3,505篇已撤回文章及其相关Twitter提及在内的数据集,以及使用粗糙精确匹配方法获取的具有类似特征的3,505篇未撤回文章。通过四种预测方法评估了Twitter提及在预测文章撤回方面的有效性,包括手动标注、关键词识别、机器学习模型和ChatGPT。手动标注的结果表明,的确有被撤回的文章,其Twitter提及包含在撤回前发出信号的可识别证据,尽管它们只占所有被撤回文章的一小部分。 </p> <p> -arXiv:2312.01201v2 Announce Type: replace-cross Abstract: Data privacy protection is garnering increased attention among researchers. Diffusion models (DMs), particularly with strict differential privacy, can potentially produce images with both high privacy and visual quality. However, challenges arise such as in ensuring robust protection in privatizing specific data attributes, areas where current models often fall short. To address these challenges, we introduce the PAC Privacy Preserving Diffusion Model, a model leverages diffusion principles and ensure Probably Approximately Correct (PAC) privacy. We enhance privacy protection by integrating a private classifier guidance into the Langevin Sampling Process. 
Additionally, recognizing the gap in measuring the privacy of models, we have developed a novel metric to gauge privacy levels. Our model, assessed with this new metric and supported by Gaussian matrix computations for the PAC bound, has shown superior performance in privacy p -</p>本文针对特征分布偏斜的联邦学习提出了FedRDN方法,在输入层级上实现了数据增强,将整个联邦数据集的统计信息注入到本地客户端数据中,以缓解特征漂移问题。http://arxiv.org/abs/2306.09363<p> -一种简单的面向特征分布偏斜联邦学习的数据增强方法 +arXiv:2403.16851v1 Announce Type: cross Abstract: Detecting problematic research articles timely is a vital task. This study explores whether Twitter mentions of retracted articles can signal potential problems with the articles prior to retraction, thereby playing a role in predicting future retraction of problematic articles. A dataset comprising 3,505 retracted articles and their associated Twitter mentions is analyzed, alongside 3,505 non-retracted articles with similar characteristics obtained using the Coarsened Exact Matching method. The effectiveness of Twitter mentions in predicting article retraction is evaluated by four prediction methods, including manual labelling, keyword identification, machine learning models, and ChatGPT. Manual labelling results indicate that there are indeed retracted articles with their Twitter mentions containing recognizable evidence signaling problems before retraction, although they represent only a limited share of all retracted articles with +</p>介绍了一个新的基准测试,评估了用于生成简要住院病程摘要的大语言模型在健康保健领域中的性能并提出相应的自适应策略https://arxiv.org/abs/2403.05720<p> +用于生成简要住院病程摘要的领域自适应大语言模型的基准测试 </p> <p> -A Simple Data Augmentation for Feature Distribution Skewed Federated Learning. 
(arXiv:2306.09363v1 [cs.LG]) +A Benchmark of Domain-Adapted Large Language Models for Generating Brief Hospital Course Summaries </p> <p> -http://arxiv.org/abs/2306.09363 +https://arxiv.org/abs/2403.05720 </p> <p> -本文针对特征分布偏斜的联邦学习提出了FedRDN方法,在输入层级上实现了数据增强,将整个联邦数据集的统计信息注入到本地客户端数据中,以缓解特征漂移问题。 +介绍了一个新的基准测试,评估了用于生成简要住院病程摘要的大语言模型在健康保健领域中的性能并提出相应的自适应策略 </p> <p> </p> <p> -联邦学习(FL)是一种分布式协作学习方法,可以确保隐私保护。然而,由于数据异构性(即非独立同分布数据),它的性能必然受到影响。本文针对特征分布偏斜的FL场景展开研究,提出了一种通用的数据增强方法,以减轻由本地数据集之间潜在分布不同导致的特征漂移问题。 +简要住院病程(BHC)摘要是通过总结临床记录而生成的常见临床文件。虽然大型语言模型(LLMs)在自动化实际任务方面展现出显著能力,但它们在医疗应用(如BHC合成)中的能力尚未得到展示。为了使LLMs能够适应BHC合成,我们引入了一个新颖的基准测试,其中包含从MIMIC-IV记录中提取的经过预处理的数据集,封装了临床记录和简要住院病程(BHC)对。我们评估了两个通用LLMs和三个医疗领域适应的LLMs的性能,以改进从临床记录生成BHC。我们使用临床记录作为输入来生成BHC,采用基于提示的(使用上下文学习)和基于微调的自适应策略来应用于三个开源LLMs(Clinical-T5-Large,Llama2-13B,FLAN-UL2)和两个专有LLMs(GPT-3.5,GPT-4)。我们定量评估了性能。 </p> <p> -Federated learning (FL) facilitates collaborative learning among multiple clients in a distributed manner, while ensuring privacy protection. However, its performance is inevitably degraded as suffering data heterogeneity, i.e., non-IID data. In this paper, we focus on the feature distribution skewed FL scenario, which is widespread in real-world applications. The main challenge lies in the feature shift caused by the different underlying distributions of local datasets. While the previous attempts achieved progress, few studies pay attention to the data itself, the root of this issue. Therefore, the primary goal of this paper is to develop a general data augmentation technique at the input level, to mitigate the feature shift. To achieve this goal, we propose FedRDN, a simple yet remarkably effective data augmentation method for feature distribution skewed FL, which randomly injects the statistics of the dataset from the entire federation into the client's data. 
By this, our method ca +arXiv:2403.05720v1 Announce Type: cross Abstract: Brief hospital course (BHC) summaries are common clinical documents generated by summarizing clinical notes. While large language models (LLMs) depict remarkable capabilities in automating real-world tasks, their capabilities for healthcare applications such as BHC synthesis have not been shown. To enable the adaptation of LLMs for BHC synthesis, we introduce a novel benchmark consisting of a pre-processed dataset extracted from MIMIC-IV notes, encapsulating clinical note, and brief hospital course (BHC) pairs. We assess the performance of two general-purpose LLMs and three healthcare-adapted LLMs to improve BHC synthesis from clinical notes. Using clinical notes as input for generating BHCs, we apply prompting-based (using in-context learning) and fine-tuning-based adaptation strategies to three open-source LLMs (Clinical-T5-Large, Llama2-13B, FLAN-UL2) and two proprietary LLMs (GPT-3.5, GPT-4). We quantitatively evaluate the performa +</p>KATE是一种新的优化算法,提出了一种与AdaGrad标度不变的适应方法,并在广义线性模型和一般的非凸问题中证明了其标度不变性。数值实验结果表明,KATE在各种场景中均优于AdaGrad并与Adam性能匹配/超越。https://arxiv.org/abs/2403.02648<p> +移除平方根:一种新的高效标度不变版本的AdaGrad +</p> +<p> +Remove that Square Root: A New Efficient Scale-Invariant Version of AdaGrad +</p> +<p> +https://arxiv.org/abs/2403.02648 +</p> +<p> +KATE是一种新的优化算法,提出了一种与AdaGrad标度不变的适应方法,并在广义线性模型和一般的非凸问题中证明了其标度不变性。数值实验结果表明,KATE在各种场景中均优于AdaGrad并与Adam性能匹配/超越。 +</p> +<p> + +</p> +<p> +自适应方法在机器学习中非常流行,因为它们可以降低学习速率调整的成本。本文引入了一种名为KATE的新型优化算法,它提出了一个著名的AdaGrad算法的标度不变适应。我们证明了KATE在广义线性模型案例中的标度不变性。此外,对于一般的光滑非凸问题,我们为KATE建立了一个收敛速率为$O \left(\frac{\log T}{\sqrt{T}} \right)$,与AdaGrad和Adam的最佳收敛速率相匹配。我们还通过不同问题的数值实验将KATE与其他最先进的自适应算法Adam和AdaGrad进行了比较,包括在真实数据上进行图像分类和文本分类等复杂机器学习任务。结果表明,在所有考虑到的场景中,KATE始终胜过AdaGrad,并且在性能上匹配/超越Adam。 +</p> +<p> +arXiv:2403.02648v1 Announce Type: cross Abstract: Adaptive methods are extremely popular in machine learning as they make learning rate tuning less expensive. 
This paper introduces a novel optimization algorithm named KATE, which presents a scale-invariant adaptation of the well-known AdaGrad algorithm. We prove the scale-invariance of KATE for the case of Generalized Linear Models. Moreover, for general smooth non-convex problems, we establish a convergence rate of $O \left(\frac{\log T}{\sqrt{T}} \right)$ for KATE, matching the best-known ones for AdaGrad and Adam. We also compare KATE to other state-of-the-art adaptive algorithms Adam and AdaGrad in numerical experiments with different problems, including complex machine learning tasks like image classification and text classification on real data. The results indicate that KATE consistently outperforms AdaGrad and matches/surpasses the performance of Adam in all considered scenarios. +</p>这项研究调查了大型语言模型在游戏领域中的多种应用及其角色,指出了未开发领域和未来发展方向,同时探讨了在游戏领域中大型语言模型的潜力和限制。https://arxiv.org/abs/2402.18659<p> +大型语言模型与游戏:调研与路线图 +</p> +<p> +Large Language Models and Games: A Survey and Roadmap +</p> +<p> +https://arxiv.org/abs/2402.18659 +</p> +<p> +这项研究调查了大型语言模型在游戏领域中的多种应用及其角色,指出了未开发领域和未来发展方向,同时探讨了在游戏领域中大型语言模型的潜力和限制。 +</p> +<p> + +</p> +<p> +近年来,大型语言模型(LLMs)的研究急剧增加,并伴随着公众对该主题的参与。尽管起初是自然语言处理中的一小部分,LLMs在广泛的应用和领域中展现出显著潜力,包括游戏。本文调查了LLMs在游戏中及为游戏提供支持的各种应用的最新技术水平,并明确了LLMs在游戏中可以扮演的不同角色。重要的是,我们讨论了尚未开发的领域和LLMs在游戏中未来应用的有前途的方向,以及在游戏领域中LLMs的潜力和限制。作为LLMs和游戏交叉领域的第一份综合调查和路线图,我们希望本文能够成为这一激动人心的新领域的开创性研究和创新的基础。 +</p> +<p> +arXiv:2402.18659v1 Announce Type: cross Abstract: Recent years have seen an explosive increase in research on large language models (LLMs), and accompanying public engagement on the topic. While starting as a niche area within natural language processing, LLMs have shown remarkable potential across a broad range of applications and domains, including games. This paper surveys the current state of the art across the various applications of LLMs in and for games, and identifies the different roles LLMs can take within a game. 
Importantly, we discuss underexplored areas and promising directions for future uses of LLMs in games and we reconcile the potential and limitations of LLMs within the games domain. As the first comprehensive survey and roadmap at the intersection of LLMs and games, we are hopeful that this paper will serve as the basis for groundbreaking research and innovation in this exciting new field. +</p>提出了ToMBench框架,在大型语言模型中进行心灵理论性能评估,发现最先进的模型仍然落后于人类表现超过10%。https://arxiv.org/abs/2402.15052<p> +在大型语言模型中基准测试心灵理论 +</p> +<p> +ToMBench: Benchmarking Theory of Mind in Large Language Models +</p> +<p> +https://arxiv.org/abs/2402.15052 +</p> +<p> +提出了ToMBench框架,在大型语言模型中进行心灵理论性能评估,发现最先进的模型仍然落后于人类表现超过10%。 +</p> +<p> + +</p> +<p> +心灵理论(ToM)是指感知和归因自己以及他人的心理状态的认知能力。最近的研究引发了关于大型语言模型(LLMs)是否表现出一种形式的心灵理论的争论。然而,现有的心灵理论评估受到诸如受限范围、主观判断和意外污染等挑战的制约,导致评估不足。为了填补这一空白,我们引入了ToMBench,具有三个关键特征:系统评估框架涵盖社会认知中的8项任务和31项能力,多项选择题格式以支持自动化和无偏见的评估,以及基于双语清单的从头构建,严格避免数据泄漏。基于ToMBench,我们进行了大量实验,评估了10个流行LLMs在任务和能力方面的心灵理论表现。我们发现,即使像GPT-4这样的最先进的LLMs也比人类表现落后超过10个百分点。 +</p> +<p> +arXiv:2402.15052v1 Announce Type: cross Abstract: Theory of Mind (ToM) is the cognitive capability to perceive and ascribe mental states to oneself and others. Recent research has sparked a debate over whether large language models (LLMs) exhibit a form of ToM. However, existing ToM evaluations are hindered by challenges such as constrained scope, subjective judgment, and unintended contamination, yielding inadequate assessments. To address this gap, we introduce ToMBench with three key characteristics: a systematic evaluation framework encompassing 8 tasks and 31 abilities in social cognition, a multiple-choice question format to support automated and unbiased evaluation, and a build-from-scratch bilingual inventory to strictly avoid data leakage. Based on ToMBench, we conduct extensive experiments to evaluate the ToM performance of 10 popular LLMs across tasks and abilities. 
We find that even the most advanced LLMs like GPT-4 lag behind human performance by over 10% points, indicati +</p>RealDex数据集捕捉了真实的灵巧手抓取动作,利用多模态数据使得训练灵巧手更加自然和精确,同时提出了一种先进的灵巧抓取动作生成框架,有效利用多模态大型语言模型,在类人机器人的自动感知、认知和操纵方面具有巨大潜力。https://arxiv.org/abs/2402.13853<p> +RealDex: 实现机器人灵巧手类人式抓取 +</p> +<p> +RealDex: Towards Human-like Grasping for Robotic Dexterous Hand +</p> +<p> +https://arxiv.org/abs/2402.13853 +</p> +<p> +RealDex数据集捕捉了真实的灵巧手抓取动作,利用多模态数据使得训练灵巧手更加自然和精确,同时提出了一种先进的灵巧抓取动作生成框架,有效利用多模态大型语言模型,在类人机器人的自动感知、认知和操纵方面具有巨大潜力。 +</p> +<p> + +</p> +<p> +在本文中,我们介绍了RealDex,一个开创性的数据集,捕捉了融入了人类行为模式的真实灵巧手抓取动作,同时通过多视角和多模态视觉数据进行了丰富。利用远程操作系统,我们可以实时无缝同步人-机器人手姿势。这些类人动作的集合对于训练灵巧手更自然、更精确地模仿人类动作至关重要。RealDex在推动类人机器人在真实场景中自动感知、认知和操纵方面具有巨大潜力。此外,我们介绍了一种前沿的灵巧抓取动作生成框架,该框架符合人类经验,并通过有效利用多模态大型语言模型增强了在现实世界中的适用性。广泛的实验证明了我们的方法在RealDex和其他开放数据集上的优越性能。完整的数据集和代码将会公开发布。 +</p> +<p> +arXiv:2402.13853v1 Announce Type: cross Abstract: In this paper, we introduce RealDex, a pioneering dataset capturing authentic dexterous hand grasping motions infused with human behavioral patterns, enriched by multi-view and multimodal visual data. Utilizing a teleoperation system, we seamlessly synchronize human-robot hand poses in real time. This collection of human-like motions is crucial for training dexterous hands to mimic human movements more naturally and precisely. RealDex holds immense promise in advancing humanoid robot for automated perception, cognition, and manipulation in real-world scenarios. Moreover, we introduce a cutting-edge dexterous grasping motion generation framework, which aligns with human experience and enhances real-world applicability through effectively utilizing Multimodal Large Language Models. Extensive experiments have demonstrated the superior performance of our method on RealDex and other open datasets. 
The complete dataset and code will be made +</p>该研究提出了一种基于查询的对抗性攻击方法,通过利用远程语言模型的 API 访问构造对抗性示例,使模型发出有害字符串的概率高于仅依赖模型间可迁移性的攻击。https://arxiv.org/abs/2402.12329<p> +基于查询的对抗性提示生成 +</p> +<p> +Query-Based Adversarial Prompt Generation +</p> +<p> +https://arxiv.org/abs/2402.12329 +</p> +<p> +该研究提出了一种基于查询的对抗性攻击方法,通过利用远程语言模型的 API 访问构造对抗性示例,使模型发出有害字符串的概率高于仅依赖模型间可迁移性的攻击。 +</p> +<p> + +</p> +<p> +最近的研究表明,可以构造对抗性示例,使经过对齐的语言模型产生有害字符串或执行有害行为。现有的攻击要么在白盒设置下进行(可完全访问模型权重),要么依赖可迁移性,即在一个模型上精心设计的对抗性示例通常在其他模型上仍然有效。我们通过基于查询的攻击改进了以前的工作,利用对远程语言模型的 API 访问来构造对抗性示例,使模型发出有害字符串的概率(远)高于仅使用迁移攻击时的概率。我们在 GPT-3.5 和 OpenAI 的安全分类器上验证了我们的攻击;我们能够让 GPT-3.5 发出当前迁移攻击无法诱导出的有害字符串,并且能够以接近 100% 的概率规避安全分类器。 +</p> +<p> +arXiv:2402.12329v1 Announce Type: cross Abstract: Recent work has shown it is possible to construct adversarial examples that cause an aligned language model to emit harmful strings or perform harmful behavior. Existing attacks work either in the white-box setting (with full access to the model weights), or through transferability: the phenomenon that adversarial examples crafted on one model often remain effective on other models. We improve on prior work with a query-based attack that leverages API access to a remote language model to construct adversarial examples that cause the model to emit harmful strings with (much) higher probability than with transfer-only attacks. We validate our attack on GPT-3.5 and OpenAI's safety classifier; we can cause GPT-3.5 to emit harmful strings that current transfer attacks fail at, and we can evade the safety classifier with nearly 100% probability.
+</p>通过将大型语言模型推理与量子化学反馈相结合,我们引入了一个AI引导的计算筛选框架,将催化剂发现形式化为一个不确定环境,从而实现高效催化剂的积极搜索https://arxiv.org/abs/2402.10980<p> +CHEMREASONER:使用量子化学反馈在大型语言模型的知识空间中进行启发式搜索 +</p> +<p> +CHEMREASONER: Heuristic Search over a Large Language Model's Knowledge Space using Quantum-Chemical Feedback +</p> +<p> +https://arxiv.org/abs/2402.10980 +</p> +<p> +通过将大型语言模型推理与量子化学反馈相结合,我们引入了一个AI引导的计算筛选框架,将催化剂发现形式化为一个不确定环境,从而实现高效催化剂的积极搜索 +</p> +<p> + +</p> +<p> +arXiv:2402.10980v1 类型公告:跨领域 摘要:发现新的催化剂对于设计新的更高效的化学过程至关重要,以实现向可持续未来的过渡。我们引入了一种人工智能引导的计算筛选框架,将语言推理与基于量子化学的三维原子表示的反馈统一起来。我们的方法将催化剂发现构建为一个不确定环境,其中一个代理通过大型语言模型(LLM)推导的假设与基于原子图神经网络(GNN)的反馈的迭代组合,积极搜索高效催化剂。在中间搜索步骤确定的催化剂经过基于空间定向、反应途径和稳定性的结构评估。基于吸附能和势垒的评分函数引导在LLM的知识空间中向能量有利、高效的催化剂探索。我们引入了可以自动规划的方法 +</p> +<p> +arXiv:2402.10980v1 Announce Type: cross Abstract: The discovery of new catalysts is essential for the design of new and more efficient chemical processes in order to transition to a sustainable future. We introduce an AI-guided computational screening framework unifying linguistic reasoning with quantum-chemistry based feedback from 3D atomistic representations. Our approach formulates catalyst discovery as an uncertain environment where an agent actively searches for highly effective catalysts via the iterative combination of large language model (LLM)-derived hypotheses and atomistic graph neural network (GNN)-derived feedback. Identified catalysts in intermediate search steps undergo structural evaluation based on spatial orientation, reaction pathways, and stability. Scoring functions based on adsorption energies and barriers steer the exploration in the LLM's knowledge space toward energetically favorable, high-efficiency catalysts. 
We introduce planning methods that automaticall +</p>本研究将因果表示学习和基础模型相结合,研究了如何从数据中学习人类可解释的概念。实验证明了这一统一方法的实用性。https://arxiv.org/abs/2402.09236<p> +学习可解释概念:统一因果表示学习与基础模型 +</p> +<p> +Learning Interpretable Concepts: Unifying Causal Representation Learning and Foundation Models +</p> +<p> +https://arxiv.org/abs/2402.09236 +</p> +<p> +本研究将因果表示学习和基础模型相结合,研究了如何从数据中学习人类可解释的概念。实验证明了这一统一方法的实用性。 +</p> +<p> + +</p> +<p> +构建智能机器学习系统有两种广泛的方法。一种方法是构建天生可解释的模型,这是因果表示学习领域的努力方向。另一种方法是构建高性能的基础模型,然后投入努力去理解它们的工作原理。本研究将这两种方法联系起来,研究如何从数据中学习人类可解释的概念。通过结合这两个领域的思想,我们正式定义了概念的概念,并展示了它们可以从多样的数据中被可靠地恢复出来。对于合成数据和大型语言模型的实验证明了我们统一方法的实用性。 +</p> +<p> +arXiv:2402.09236v1 Announce Type: cross Abstract: To build intelligent machine learning systems, there are two broad approaches. One approach is to build inherently interpretable models, as endeavored by the growing field of causal representation learning. The other approach is to build highly-performant foundation models and then invest efforts into understanding how they work. In this work, we relate these two approaches and study how to learn human-interpretable concepts from data. Weaving together ideas from both fields, we formally define a notion of concepts and show that they can be provably recovered from diverse data. Experiments on synthetic data and large language models show the utility of our unified approach. 
+</p>本文提出了一个对抗学习框架,用于合法参与方间的物理层密钥生成,在恶意可重构智能面干扰下提供了一个可解释的解决方案。https://arxiv.org/abs/2402.06663<p> +物理层密钥对抗恶意可重构智能面的可解释对抗学习框架 +</p> +<p> +Explainable Adversarial Learning Framework on Physical Layer Secret Keys Combating Malicious Reconfigurable Intelligent Surface +</p> +<p> +https://arxiv.org/abs/2402.06663 +</p> +<p> +本文提出了一个对抗学习框架,用于合法参与方间的物理层密钥生成,在恶意可重构智能面干扰下提供了一个可解释的解决方案。 +</p> +<p> + +</p> +<p> +可重构智能面(RIS)的发展对物理层安全(PLS)是一把双刃剑。合法的RIS可以产生有益的影响,包括增加信道的随机性,增强物理层密钥生成(PL-SKG),而恶意的RIS可以破坏合法信道并破解大部分现有的PL-SKG。在这项工作中,我们提出了一个合法参与方(即爱丽丝和鲍勃)之间的对抗学习框架,以解决中间人恶意RIS(MITM-RIS)窃听问题。首先,我们推导了合法配对和MITM-RIS之间的理论互信息差距。然后,爱丽丝和鲍勃利用生成对抗网络(GAN)学习实现一个与MITM-RIS没有互信息重叠的共同特征面。接下来,我们使用符号可解释AI(xAI)表示对黑盒神经网络进行信号处理解释。这些主导神经元的符号术语有助于特征工程。 +</p> +<p> +The development of reconfigurable intelligent surfaces (RIS) is a double-edged sword to physical layer security (PLS). Whilst a legitimate RIS can yield beneficial impacts including increased channel randomness to enhance physical layer secret key generation (PL-SKG), malicious RIS can poison legitimate channels and crack most of existing PL-SKGs. In this work, we propose an adversarial learning framework between legitimate parties (namely Alice and Bob) to address this Man-in-the-middle malicious RIS (MITM-RIS) eavesdropping. First, the theoretical mutual information gap between legitimate pairs and MITM-RIS is deduced. Then, Alice and Bob leverage generative adversarial networks (GANs) to learn to achieve a common feature surface that does not have mutual information overlap with MITM-RIS. Next, we aid signal processing interpretation of black-box neural networks by using a symbolic explainable AI (xAI) representation. 
These symbolic terms of dominant neurons aid feature engineering- +</p>本文研究了具有肉身的人工通用智能(AGI)的概念及其与人类意识的关系,强调了元宇宙在促进这一关系中的关键作用。通过结合不同理论框架和技术工具,论文总结出实现具有肉身的AGI的关键要素和发展阶段。https://arxiv.org/abs/2402.06660<p> +元宇宙在校准具有肉身的人工通用智能中的作用 +</p> +<p> +The role of the metaverse in calibrating an embodied artificial general intelligence +</p> +<p> +https://arxiv.org/abs/2402.06660 +</p> +<p> +本文研究了具有肉身的人工通用智能(AGI)的概念及其与人类意识的关系,强调了元宇宙在促进这一关系中的关键作用。通过结合不同理论框架和技术工具,论文总结出实现具有肉身的AGI的关键要素和发展阶段。 +</p> +<p> + +</p> +<p> +本文探讨了具有肉身的人工通用智能(AGI)的概念,它与人类意识的关系,以及元宇宙在促进这种关系中的关键作用。本文利用融入认知、Michael Levin的计算边界"Self"、Donald D. Hoffman的感知界面理论以及Bernardo Kastrup的分析唯心主义等理论框架来构建实现具有肉身的AGI的论证。它认为我们所感知的外部现实是一种内在存在的交替状态的象征性表示,而AGI可以具有更大计算边界的更高意识。本文进一步讨论了AGI的发展阶段、实现具有肉身的AGI的要求、为AGI校准象征性界面的重要性,以及元宇宙、去中心化系统、开源区块链技术以及开源人工智能研究所扮演的关键角色。它还探讨了新的沟通机制和用于加强对元宇宙的理解的技术工具,以帮助实现具有肉身的AGI。 +</p> +<p> +This paper examines the concept of embodied artificial general intelligence (AGI), its relationship to human consciousness, and the key role of the metaverse in facilitating this relationship. The paper leverages theoretical frameworks such as embodied cognition, Michael Levin's computational boundary of a "Self," Donald D. Hoffman's Interface Theory of Perception, and Bernardo Kastrup's analytical idealism to build the argument for achieving embodied AGI. It contends that our perceived outer reality is a symbolic representation of alternate inner states of being, and that AGI could embody a higher consciousness with a larger computational boundary. The paper further discusses the developmental stages of AGI, the requirements for the emergence of an embodied AGI, the importance of a calibrated symbolic interface for AGI, and the key role played by the metaverse, decentralized systems, open-source blockchain technology, as well as open-source AI research. 
It also explores the idea of a +</p>InkSight是一个可以将离线手写转换为在线手写的系统,通过结合阅读和书写先验知识,在多样化的照片中有效地Derendering手写文本。https://arxiv.org/abs/2402.05804<p> +InkSight:通过学习阅读和书写实现离线到在线手写转换 +</p> +<p> +InkSight: Offline-to-Online Handwriting Conversion by Learning to Read and Write +</p> +<p> +https://arxiv.org/abs/2402.05804 +</p> +<p> +InkSight是一个可以将离线手写转换为在线手写的系统,通过结合阅读和书写先验知识,在多样化的照片中有效地Derendering手写文本。 +</p> +<p> + +</p> +<p> +数字笔记正在变得越来越受欢迎,提供了一种耐用、可编辑和易于索引的存储笔记的方式,即矢量化形式的数字墨水。然而,这种笔记方式与传统的纸笔记方式之间仍存在显著差距,而传统纸笔记方式仍受到绝大多数人的青睐。我们的工作InkSight旨在弥合这种差距,使实体笔记者能够轻松地将他们的作品(离线手写)转换为数字墨水(在线手写),这个过程我们称之为Derendering。之前关于此主题的研究集中在图像的几何属性上,导致了在训练领域之外的有限泛化能力。我们的方法结合了阅读和书写的先验知识,允许在缺乏大量配对样本的情况下训练模型,而这些配对样本很难获取。据我们所知,这是第一个有效地对具有多样化视觉特征和背景的任意照片中的手写文本进行Derendering的工作。 +</p> +<p> +Digital note-taking is gaining popularity, offering a durable, editable, and easily indexable way of storing notes in the vectorized form, known as digital ink. However, a substantial gap remains between this way of note-taking and traditional pen-and-paper note-taking, a practice still favored by a vast majority. Our work, InkSight, aims to bridge the gap by empowering physical note-takers to effortlessly convert their work (offline handwriting) to digital ink (online handwriting), a process we refer to as Derendering. Prior research on the topic has focused on the geometric properties of images, resulting in limited generalization beyond their training domains. Our approach combines reading and writing priors, allowing training a model in the absence of large amounts of paired samples, which are difficult to obtain. To our knowledge, this is the first work that effectively derenders handwritten text in arbitrary photos with diverse visual characteristics and backgrounds. 
Furthermore, +</p>CIC是一种面向文化感知图像字幕的框架,通过结合视觉问答和大型语言模型,它能够生成能描述图像中文化元素的详细字幕。https://arxiv.org/abs/2402.05374<p> +CIC:一种面向文化感知图像字幕的框架 +</p> +<p> +CIC: A framework for Culturally-aware Image Captioning +</p> +<p> +https://arxiv.org/abs/2402.05374 +</p> +<p> +CIC是一种面向文化感知图像字幕的框架,通过结合视觉问答和大型语言模型,它能够生成能描述图像中文化元素的详细字幕。 +</p> +<p> + +</p> +<p> +图像字幕通过使用视觉-语言预训练模型(VLPs)如BLIP从图像生成描述性句子,这种方法已经取得了很大的改进。然而,当前的方法缺乏对图像中所描绘的文化元素(例如亚洲文化群体的传统服装)生成详细描述性字幕的能力。在本文中,我们提出了一种新的框架,\textbf{面向文化感知图像字幕(CIC)},该框架能够从代表不同文化的图像中生成字幕并描述文化元素。受到将视觉模态和大型语言模型(LLMs)通过适当的提示进行组合的方法的启发,我们的框架(1)根据图像中的文化类别生成问题,(2)利用生成的问题从视觉问答(VQA)中提取文化视觉元素,(3)使用带有提示的LLMs生成文化感知字幕。我们在4个不同大学的45名参与者上进行了人工评估。 +</p> +<p> +Image Captioning generates descriptive sentences from images using Vision-Language Pre-trained models (VLPs) such as BLIP, which has improved greatly. However, current methods lack the generation of detailed descriptive captions for the cultural elements depicted in the images, such as the traditional clothing worn by people from Asian cultural groups. In this paper, we propose a new framework, \textbf{Culturally-aware Image Captioning (CIC)}, that generates captions and describes cultural elements extracted from cultural visual elements in images representing cultures. Inspired by methods combining visual modality and Large Language Models (LLMs) through appropriate prompts, our framework (1) generates questions based on cultural categories from images, (2) extracts cultural visual elements from Visual Question Answering (VQA) using generated questions, and (3) generates culturally-aware captions using LLMs with the prompts. 
Our human evaluation conducted on 45 participants from 4 dif +</p>该论文提出了构建个性化语言模型的方法,通过在用户的反馈数据中引入个性化特征来解决RLHF框架在用户偏好多样化时存在的问题。https://arxiv.org/abs/2402.05133<p> +基于个性化人类反馈的个性化语言建模 +</p> +<p> +Personalized Language Modeling from Personalized Human Feedback +</p> +<p> +https://arxiv.org/abs/2402.05133 +</p> +<p> +该论文提出了构建个性化语言模型的方法,通过在用户的反馈数据中引入个性化特征来解决RLHF框架在用户偏好多样化时存在的问题。 +</p> +<p> + +</p> +<p> +基于人类反馈的强化学习(RLHF)是目前主流的框架,用于微调大型语言模型以更好地符合人类偏好。然而,当人类反馈中编码的用户偏好多样化时,在这个框架下开发的算法的基本前提可能会出现问题。在本文中,我们旨在通过开发构建个性化语言模型的方法来解决这个问题。我们首先正式介绍了从个性化人类反馈中学习的任务,并解释了为什么在这种情况下普通的RLHF可能会存在问题。然后,我们提出了一个通用的个性化RLHF(P-RLHF)框架,需要同时学习用户模型和语言(或奖励)模型。用户模型接收用户信息并输出用户表示。其结构编码了我们对反馈数据背后的用户偏好的假设。我们为个性化奖励建模和个性化直接偏好优化开发了新的学习目标。 +</p> +<p> +Reinforcement Learning from Human Feedback (RLHF) is the current dominating framework to fine-tune large language models to better align with human preferences. However, the underlying premise of algorithms developed under this framework can be problematic when user preferences encoded in human feedback are diverse. In this work, we aim to address this problem by developing methods for building personalized language models. We first formally introduce the task of learning from personalized human feedback and explain why vanilla RLHF can be problematic in this context. We then propose a general Personalized-RLHF (P-RLHF) framework, which requires one to jointly learn a user model and a language (or reward) model. The user model takes in user information and outputs user representations. Its structure encodes our assumptions about user preferences underlying the feedback data.
We develop new learning objectives for personalized reward modeling and personalized Direct Preference Optimizat +</p>TopoX是一个用于在拓扑域上进行机器学习的Python软件包套件,包含了构建、计算和嵌入拓扑域的功能,并提供了一套全面的高阶消息传递功能工具箱。https://arxiv.org/abs/2402.02441<p> +TopoX: 一个用于拓扑域上的机器学习的Python软件包套件 +</p> +<p> +TopoX: A Suite of Python Packages for Machine Learning on Topological Domains +</p> +<p> +https://arxiv.org/abs/2402.02441 +</p> +<p> +TopoX是一个用于在拓扑域上进行机器学习的Python软件包套件,包含了构建、计算和嵌入拓扑域的功能,并提供了一套全面的高阶消息传递功能工具箱。 +</p> +<p> + +</p> +<p> +我们介绍了topox,一个提供可靠且用户友好的Python软件包套件,用于在拓扑域(扩展了图的领域)上进行计算和机器学习:超图、单纯、胞腔、路径和组合复合体。topox由三个软件包组成:toponetx用于构建和计算这些域,包括节点、边和高阶单元的处理;topoembedx提供了将拓扑域嵌入到向量空间的方法,类似于流行的基于图的嵌入算法,如node2vec;topomodelx建立在PyTorch之上,为拓扑域上的神经网络提供了一套全面的高阶消息传递功能工具箱。topox的源代码经过广泛的文档化和单元测试,并在https://github.com/pyt-team以MIT许可证的形式提供。 +</p> +<p> +We introduce topox, a Python software suite that provides reliable and user-friendly building blocks for computing and machine learning on topological domains that extend graphs: hypergraphs, simplicial, cellular, path and combinatorial complexes. topox consists of three packages: toponetx facilitates constructing and computing on these domains, including working with nodes, edges and higher-order cells; topoembedx provides methods to embed topological domains into vector spaces, akin to popular graph-based embedding algorithms such as node2vec; topomodelx is built on top of PyTorch and offers a comprehensive toolbox of higher-order message passing functions for neural networks on topological domains. The extensively documented and unit-tested source code of topox is available under MIT license at https://github.com/pyt-team. 
+</p>GeoSAM是一个基于SAM的新框架,使用了来自零样本学习和预训练CNN分割模型的视觉提示,提高了地理图像分割的性能。https://arxiv.org/abs/2311.11319<p> +GeoSAM: 使用稀疏和密集的视觉提示对SAM进行改进,实现自动化的移动基础设施分割 +</p> +<p> +GeoSAM: Fine-tuning SAM with Sparse and Dense Visual Prompting for Automated Segmentation of Mobility Infrastructure +</p> +<p> +https://arxiv.org/abs/2311.11319 +</p> +<p> +GeoSAM是一个基于SAM的新框架,使用了来自零样本学习和预训练CNN分割模型的视觉提示,提高了地理图像分割的性能。 +</p> +<p> + +</p> +<p> +当应用于自然图像分割时,Segment Anything Model (SAM)已经展现出了令人印象深刻的性能。然而,它在地理图像(如航拍和卫星图像)中面临困难,特别是在分割道路、人行道和人行横道等移动基础设施时。这种较差的性能源于这些对象的窄小特征,它们的纹理融入环境中,以及树木、建筑物、车辆和行人等物体的干扰,这些都可能使模型失去定向产生不准确的分割图。为了解决这些挑战,我们提出了地理SAM(GeoSAM),这是一个基于SAM的新框架,它使用来自零样本学习的密集视觉提示和预训练CNN分割模型的稀疏视觉提示实施了细调策略。所提出的GeoSAM在地理图像分割方面优于现有方法,特别是对于道路基础设施、行人基础设施的分割性能提升了26%、7%和17%。 +</p> +<p> +The Segment Anything Model (SAM) has shown impressive performance when applied to natural image segmentation. However, it struggles with geographical images like aerial and satellite imagery, especially when segmenting mobility infrastructure including roads, sidewalks, and crosswalks. This inferior performance stems from the narrow features of these objects, their textures blending into the surroundings, and interference from objects like trees, buildings, vehicles, and pedestrians - all of which can disorient the model to produce inaccurate segmentation maps. To address these challenges, we propose Geographical SAM (GeoSAM), a novel SAM-based framework that implements a fine-tuning strategy using the dense visual prompt from zero-shot learning, and the sparse visual prompt from a pre-trained CNN segmentation model. 
The proposed GeoSAM outperforms existing approaches for geographical image segmentation, specifically by 26%, 7%, and 17% for road infrastructure, pedestrian infrastructur +</p>ZS4C提出了一种使用ChatGPT进行零射击合成可编译代码的轻量级方法,帮助用户重用或分析不完整的Q&A代码片段,通过识别缺失的导入语句并修复编译错误来实现。http://arxiv.org/abs/2401.14279<p> +ZS4C: 使用ChatGPT进行零射击合成不完整代码片段的可编译代码 +</p> +<p> +ZS4C: Zero-Shot Synthesis of Compilable Code for Incomplete Code Snippets using ChatGPT. (arXiv:2401.14279v1 [cs.SE] CROSS LISTED) +</p> +<p> +http://arxiv.org/abs/2401.14279 +</p> +<p> +ZS4C提出了一种使用ChatGPT进行零射击合成可编译代码的轻量级方法,帮助用户重用或分析不完整的Q&A代码片段,通过识别缺失的导入语句并修复编译错误来实现。 +</p> +<p> + +</p> +<p> +技术问答(Q&A)网站如Stack Overflow已成为软件开发者寻求知识的重要来源。然而,Q&A网站上的代码片段通常由于未解析的类型和缺失的依赖库而无法编译和语义上不完整,这增加了用户重用或分析Q&A代码片段的障碍。之前的方法要么不适用于合成可编译代码,要么编译成功率低。为了解决这个问题,我们提出了ZS4C,一种使用大型语言模型(LLM)从不完整的代码片段中进行零射击合成可编译代码的轻量级方法。ZS4C分为两个阶段。在第一阶段,ZS4C利用一个LLM,即ChatGPT,根据我们设计的专用任务提示模板,为给定的代码片段识别缺失的导入语句。在第二阶段,ZS4C通过修复由于不正确的导入语句和语法错误引起的编译错误来修复代码。 +</p> +<p> +Technical question and answering (Q&A) sites such as Stack Overflow have become an important source for software developers to seek knowledge. However, code snippets on Q&A sites are usually uncompilable and semantically incomplete for compilation due to unresolved types and missing dependent libraries, which raises the obstacle for users to reuse or analyze Q&A code snippets. Prior approaches either are not designed for synthesizing compilable code or suffer from a low compilation success rate. To address this problem, we propose ZS4C, a lightweight approach to perform zero-shot synthesis of compilable code from incomplete code snippets using Large Language Model (LLM). ZS4C operates in two stages. In the first stage, ZS4C utilizes an LLM, i.e., ChatGPT, to identify missing import statements for a given code snippet, leveraging our designed task-specific prompt template. 
In the second stage, ZS4C fixes compilation errors caused by incorrect import statements and syntax errors through +</p>众包自适应调查方法(CSAS)结合自然语言处理和自适应算法,能够根据用户输入演变问题库,并在调查中适应新的问题,应用在拉丁裔信息环境和议题重要性领域,能够识别难以通过传统方法跟踪的主张或问题。http://arxiv.org/abs/2401.12986<p> +众包自适应调查 +</p> +<p> +Crowdsourced Adaptive Surveys. (arXiv:2401.12986v1 [cs.CL]) +</p> +<p> +http://arxiv.org/abs/2401.12986 +</p> +<p> +众包自适应调查方法(CSAS)结合自然语言处理和自适应算法,能够根据用户输入演变问题库,并在调查中适应新的问题,应用在拉丁裔信息环境和议题重要性领域,能够识别难以通过传统方法跟踪的主张或问题。 +</p> +<p> + +</p> +<p> +公众舆论调查对于民主决策至关重要,但对于传统调查方法来说,快速变化的信息环境和在小众社区中衡量观点可能是具有挑战性的。本文介绍了一种众包自适应调查方法(CSAS),它将自然语言处理和自适应算法的进展结合起来,生成随着用户输入不断演变的问题库。CSAS方法将参与者提供的开放式文本转换为Likert式项目,并应用多臂赌博算法来确定应优先考虑在调查中的用户提供问题。该方法的自适应性允许探索新的调查问题,同时在调查长度上施加最小的成本。在拉丁裔信息环境和议题重要性领域的应用展示了CSAS识别可能难以通过标准方法跟踪的主张或问题的能力。最后,我提出 Conclusion by di的结束语。 +</p> +<p> +Public opinion surveys are vital for informing democratic decision-making, but responding to rapidly changing information environments and measuring beliefs within niche communities can be challenging for traditional survey methods. This paper introduces a crowdsourced adaptive survey methodology (CSAS) that unites advances in natural language processing and adaptive algorithms to generate question banks that evolve with user input. The CSAS method converts open-ended text provided by participants into Likert-style items and applies a multi-armed bandit algorithm to determine user-provided questions that should be prioritized in the survey. The method's adaptive nature allows for the exploration of new survey questions, while imposing minimal costs in survey length. Applications in the domains of Latino information environments and issue importance showcase CSAS's ability to identify claims or issues that might otherwise be difficult to track using standard approaches. 
I conclude by di +</p>xTrimoPGLM是一个统一的1000亿规模预训练蛋白质语言模型,能够同时处理蛋白质理解和生成任务,通过创新的预训练框架和大规模的参数训练,显著优于其他先进模型,在18个蛋白理解基准测试中取得了成功,并能够实现对蛋白质结构的原子分辨率观察。http://arxiv.org/abs/2401.06199<p> +xTrimoPGLM: 统一的千亿规模预训练蛋白质语言模型,用于解析蛋白质的语言 +</p> +<p> +xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein. (arXiv:2401.06199v1 [q-bio.QM]) +</p> +<p> +http://arxiv.org/abs/2401.06199 +</p> +<p> +xTrimoPGLM是一个统一的1000亿规模预训练蛋白质语言模型,能够同时处理蛋白质理解和生成任务,通过创新的预训练框架和大规模的参数训练,显著优于其他先进模型,在18个蛋白理解基准测试中取得了成功,并能够实现对蛋白质结构的原子分辨率观察。 +</p> +<p> + +</p> +<p> +蛋白质语言模型在学习蛋白质序列中的生物信息方面显示出显著的成功。然而,大多数现有模型局限于自编码或自回归的预训练目标,这使得它们在处理蛋白质理解和生成任务时很难同时进行。我们提出了一个统一的蛋白质语言模型,xTrimoPGLM,通过创新的预训练框架同时解决这两类任务。我们的关键技术贡献是探索这两类目标的兼容性和联合优化的潜力,从而导致了一个以前所未有的规模,使用1000亿参数和1万亿训练标记来训练xTrimoPGLM的策略。我们广泛的实验证明,1)xTrimoPGLM在四个类别的18个蛋白理解基准测试中明显优于其他先进基线。该模型还有助于对蛋白质结构进行原子分辨率的观察,从而实现了对蛋白质结构的理解和生成。 +</p> +<p> +Protein language models have shown remarkable success in learning biological information from protein sequences. However, most existing models are limited by either autoencoding or autoregressive pre-training objectives, which makes them struggle to handle protein understanding and generation tasks concurrently. We propose a unified protein language model, xTrimoPGLM, to address these two types of tasks simultaneously through an innovative pre-training framework. Our key technical contribution is an exploration of the compatibility and the potential for joint optimization of the two types of objectives, which has led to a strategy for training xTrimoPGLM at an unprecedented scale of 100 billion parameters and 1 trillion training tokens. Our extensive experiments reveal that 1) xTrimoPGLM significantly outperforms other advanced baselines in 18 protein understanding benchmarks across four categories.
The model also facilitates an atomic-resolution view of protein structures, leading to +</p>本研究提出了三种基于变分量子线路的进化优化多智能体强化学习变体,并在Coin Game环境中证明了这些方法相比于经典方法表现显著更好。http://arxiv.org/abs/2311.05546<p> +多智能体量子强化学习使用进化优化 +</p> +<p> +Multi-Agent Quantum Reinforcement Learning using Evolutionary Optimization. (arXiv:2311.05546v2 [quant-ph] UPDATED) +</p> +<p> +http://arxiv.org/abs/2311.05546 +</p> +<p> +本研究提出了三种基于变分量子线路的进化优化多智能体强化学习变体,并在Coin Game环境中证明了这些方法相比于经典方法表现显著更好。 +</p> +<p> + +</p> +<p> +多智能体强化学习在自动驾驶和其他智能产业应用方面变得越来越重要。与此同时,利用量子力学的固有属性,采用新的有希望的强化学习方法,显著减少模型的可训练参数。然而,基于梯度的多智能体量子强化学习方法常常面临贫瘠平台问题,阻碍了它们与经典方法性能的匹配。我们在现有的无梯度量子强化学习方法基础上构建,并提出了三种基于变分量子线路的进化优化多智能体强化学习变体。我们在Coin Game环境中评估了我们的遗传变种,并与经典方法进行了比较。我们证明了我们的变分量子线路方法相比于具有类似参数数量的神经网络表现显著更好。 +</p> +<p> +Multi-Agent Reinforcement Learning is becoming increasingly more important in times of autonomous driving and other smart industrial applications. Simultaneously a promising new approach to Reinforcement Learning arises using the inherent properties of quantum mechanics, reducing the trainable parameters of a model significantly. However, gradient-based Multi-Agent Quantum Reinforcement Learning methods often have to struggle with barren plateaus, holding them back from matching the performance of classical approaches. We build upon an existing approach for gradient free Quantum Reinforcement Learning and propose three genetic variations with Variational Quantum Circuits for Multi-Agent Reinforcement Learning using evolutionary optimization. We evaluate our genetic variations in the Coin Game environment and also compare them to classical approaches. We showed that our Variational Quantum Circuit approaches perform significantly better compared to a neural network with a similar amount +</p>本综述详细回顾了针对医学图像分析的领域泛化研究,探讨了在DL模型在真实世界应用中遇到的挑战,以及如何解决分布漂移问题和实现稳健性。同时,考虑了领域泛化技术对整个MedIA工作流程的操作影响。http://arxiv.org/abs/2310.08598<p> +医学图像分析的领域泛化:综述 +</p> +<p> +Domain Generalization for Medical Image Analysis: A Survey. 
(arXiv:2310.08598v1 [eess.IV]) +</p> +<p> +http://arxiv.org/abs/2310.08598 +</p> +<p> +本综述详细回顾了针对医学图像分析的领域泛化研究,探讨了在DL模型在真实世界应用中遇到的挑战,以及如何解决分布漂移问题和实现稳健性。同时,考虑了领域泛化技术对整个MedIA工作流程的操作影响。 +</p> +<p> + +</p> +<p> +医学图像分析(MedIA)已成为医学和保健领域的重要工具,在疾病诊断、预后和治疗规划方面起到了很大的作用,深度学习(DL)的最新成功为其进展做出了重要贡献。然而,MedIA的DL模型在现实世界中的部署仍然具有挑战性,在训练和测试样本之间的分布差异下很难泛化,这被称为分布漂移问题。研究人员致力于开发各种DL方法,使其能够适应并在未知和超出分布的数据分布上稳健地运行。本文综合评述了专门针对MedIA的领域泛化研究。我们提供了领域泛化技术在更大范围MedIA系统内的交互方式的整体视图,不仅仅考虑方法学,还考虑了对整个MedIA工作流程的操作影响。具体而言,我们将领域泛化方法分为数据层次的方法… +</p> +<p> +Medical Image Analysis (MedIA) has become an essential tool in medicine and healthcare, aiding in disease diagnosis, prognosis, and treatment planning, and recent successes in deep learning (DL) have made significant contributions to its advances. However, DL models for MedIA remain challenging to deploy in real-world situations, failing for generalization under the distributional gap between training and testing samples, known as a distribution shift problem. Researchers have dedicated their efforts to developing various DL methods to adapt and perform robustly on unknown and out-of-distribution data distributions. This paper comprehensively reviews domain generalization studies specifically tailored for MedIA. We provide a holistic view of how domain generalization techniques interact within the broader MedIA system, going beyond methodologies to consider the operational implications on the entire MedIA workflow. Specifically, we categorize domain generalization methods into data-lev +</p>PORTIA是一个旨在校准大型语言模型评估器的位置偏差的对齐系统,通过将答案分割成多个片段,并对其进行对齐,然后将其合并回一个单一的提示,以提高评估的准确性和公正性。http://arxiv.org/abs/2310.01432<p> +分割与合并:对大型语言模型的位置偏差进行校准 +</p> +<p> +Split and Merge: Aligning Position Biases in Large Language Model based Evaluators. 
(arXiv:2310.01432v1 [cs.CL]) +</p> +<p> +http://arxiv.org/abs/2310.01432 +</p> +<p> +PORTIA是一个旨在校准大型语言模型评估器的位置偏差的对齐系统,通过将答案分割成多个片段,并对其进行对齐,然后将其合并回一个单一的提示,以提高评估的准确性和公正性。 +</p> +<p> + +</p> +<p> +大型语言模型(LLMs)已被证明可以作为自动化评估器,用于评估AI系统生成的答案的质量。然而,这些基于LLM的评估器在使用对比评估候选答案时存在位置偏差或不一致性,无视内容而偏向于第一个或第二个答案。为了解决这个问题,我们提出了PORTIA,这是一个基于对齐的系统,旨在模拟人类的比较策略,以轻量级但有效的方式校准位置偏差。具体而言,PORTIA将答案分割成多个片段,对比候选答案中的相似内容进行对齐,并将它们合并回一个单一的提示,以供LLMs评估。我们使用六种不同的LLM进行了大量实验,评估了11,520个答案对。我们的结果表明,PORTIA显著提高了所有模型和对比形式的一致性率,平均相对改进率达到47.46%。引人注目的是,PORTIA使得LLMs能够评估中对位置偏差进行校准的创新方法,从而提高了评估的准确性和公正性。 +</p> +<p> +Large language models (LLMs) have shown promise as automated evaluators for assessing the quality of answers generated by AI systems. However, these LLM-based evaluators exhibit position bias, or inconsistency, when used to evaluate candidate answers in pairwise comparisons, favoring either the first or second answer regardless of content. To address this limitation, we propose PORTIA, an alignment-based system designed to mimic human comparison strategies to calibrate position bias in a lightweight yet effective manner. Specifically, PORTIA splits the answers into multiple segments, aligns similar content across candidate answers, and then merges them back into a single prompt for evaluation by LLMs. We conducted extensive experiments with six diverse LLMs to evaluate 11,520 answer pairs. Our results show that PORTIA markedly enhances the consistency rates for all the models and comparison forms tested, achieving an average relative improvement of 47.46%. Remarkably, PORTIA enables le +</p>本文提出了一种利用人工智能改善人类决策的统计框架,通过基准测试与机器预测,替换部分人类决策者的决策制定,并经过实验检验得出算法具有更高的真阳性率和更低的假阳性率,尤其是来自农村地区的医生的诊断更容易被替代。http://arxiv.org/abs/2306.11689<p> +统计测试替代人类决策者的算法 +</p> +<p> +Statistical Tests for Replacing Human Decision Makers with Algorithms. 
(arXiv:2306.11689v1 [econ.EM]) +</p> +<p> +http://arxiv.org/abs/2306.11689 +</p> +<p> +本文提出了一种利用人工智能改善人类决策的统计框架,通过基准测试与机器预测,替换部分人类决策者的决策制定,并经过实验检验得出算法具有更高的真阳性率和更低的假阳性率,尤其是来自农村地区的医生的诊断更容易被替代。 +</p> +<p> + +</p> +<p> +本文提出了一个统计框架,可以通过人工智能来改善人类的决策。首先将每个人类决策者的表现与机器预测进行基准测试;然后用所提出的人工智能算法的建议替换决策制定者的一个子集所做出的决策。利用全国大型孕产结果和繁殖年龄夫妇孕前检查的医生诊断数据集,我们试验了一种启发式高频率方法以及一种贝叶斯后验损失函数方法,并将其应用于异常出生检测。我们发现,我们的算法在一个测试数据集上的结果比仅由医生诊断的结果具有更高的总体真阳性率和更低的假阳性率。我们还发现,来自农村地区的医生的诊断更容易被替代,这表明人工智能辅助决策制定更容易提高精确度。 +</p> +<p> +This paper proposes a statistical framework with which artificial intelligence can improve human decision making. The performance of each human decision maker is first benchmarked against machine predictions; we then replace the decisions made by a subset of the decision makers with the recommendation from the proposed artificial intelligence algorithm. Using a large nationwide dataset of pregnancy outcomes and doctor diagnoses from prepregnancy checkups of reproductive age couples, we experimented with both a heuristic frequentist approach and a Bayesian posterior loss function approach with an application to abnormal birth detection. We find that our algorithm on a test dataset results in a higher overall true positive rate and a lower false positive rate than the diagnoses made by doctors only. We also find that the diagnoses of doctors from rural areas are more frequently replaceable, suggesting that artificial intelligence assisted decision making tends to improve precision more i +</p>本文研究了使用基于大语言模型的聊天机器人自动标注文本的潜力,重点考察了从本地语法角度观察道歉言语行为构成的功能元素的程度,并比较了不同模型在注释任务中的表现,结果表明Bing聊天机器人在任务中表现优于ChatGPT和人类标注员。http://arxiv.org/abs/2305.08339<p> +使用LLM辅助注释进行语料库语言学研究:本地语法分析案例研究 +</p> +<p> +Using LLM-assisted Annotation for Corpus Linguistics: A Case Study of Local Grammar Analysis. 
(arXiv:2305.08339v2 [cs.CL] UPDATED) +</p> +<p> +http://arxiv.org/abs/2305.08339 +</p> +<p> +本文研究了使用基于大语言模型的聊天机器人自动标注文本的潜力,重点考察了从本地语法角度观察道歉言语行为构成的功能元素的程度,并比较了不同模型在注释任务中的表现,结果表明Bing聊天机器人在任务中表现优于ChatGPT和人类标注员。 +</p> +<p> + +</p> +<p> +基于大语言模型(LLMs)的聊天机器人在语言理解方面表现出很强的能力。本研究探索LLMs在协助基于语料库的语言学研究方面的潜力,通过将文本自动标注为特定语言信息类别。具体而言,我们研究了从本地语法的角度观察道歉言语行为构成的功能元素的程度,通过比较基于GPT-3.5的ChatGPT、基于GPT-4的Bing聊天机器人和人类编码器在注释任务中的表现。结果表明,Bing聊天机器人在任务中表现显着优于ChatGPT。与人类标注员相比,Bing聊天机器人的整体表现略低于人类标注员的表现,但已经取得了较高的F1得分:道歉标记99.95%,原因标记91.91%,道歉者标记95.35%,被道歉者标记89.74%和加强标记96.47%。这表明,在语言类别清晰且可以轻松识别的情况下,使用LLM辅助注释进行语料库语言学研究是可行的。 +</p> +<p> +Chatbots based on Large Language Models (LLMs) have shown strong capabilities in language understanding. In this study, we explore the potential of LLMs in assisting corpus-based linguistic studies through automatic annotation of texts with specific categories of linguistic information. Specifically, we examined to what extent LLMs understand the functional elements constituting the speech act of apology from a local grammar perspective, by comparing the performance of ChatGPT (powered by GPT-3.5), the Bing chatbot (powered by GPT-4), and a human coder in the annotation task. The results demonstrate that the Bing chatbot significantly outperformed ChatGPT in the task. Compared to human annotator, the overall performance of the Bing chatbot was slightly less satisfactory. However, it already achieved high F1 scores: 99.95% for the tag of APOLOGISING, 91.91% for REASON, 95.35% for APOLOGISER, 89.74% for APOLOGISEE, and 96.47% for INTENSIFIER. This suggests that it is feasible to use LLM- +</p>本文通过提出一种名为“共单调独立分类器”(CIBer)的新技术,专注于特征的最优分区,旨在克服朴素贝叶斯方法带来的挑战,并且证明该技术在不同数据集上具有更高的准确率和更低的错误率。http://arxiv.org/abs/2304.14537<p> +基于贝叶斯分类器的特征最优分区研究 +</p> +<p> +Optimal partition of feature using Bayesian classifier. 
(arXiv:2304.14537v1 [cs.LG]) +</p> +<p> +http://arxiv.org/abs/2304.14537 +</p> +<p> +本文通过提出一种名为“共单调独立分类器”(CIBer)的新技术,专注于特征的最优分区,旨在克服朴素贝叶斯方法带来的挑战,并且证明该技术在不同数据集上具有更高的准确率和更低的错误率。 +</p> +<p> + +</p> +<p> +朴素贝叶斯分类器是一种应用贝叶斯原理的流行分类方法,尽管输入变量之间的条件依赖关系听起来很好,但实际上会导致大多数投票风格的行为。朴素贝叶斯算法中的某些特征被称为独立特征,因为在预测分类时它们没有条件相关性或依赖性。本文通过提出一种名为“共单调独立分类器”(CIBer)的新技术,专注于特征的最优分区,旨在克服朴素贝叶斯方法带来的挑战。在不同的数据集上,我们明确证明了我们的技术的有效性,在错误率更低、准确率更高或相当的情况下,与随机森林和XGBoost等模型相比。 +</p> +<p> +The Naive Bayesian classifier is a popular classification method employing the Bayesian paradigm. The concept of having conditional dependence among input variables sounds good in theory but can lead to a majority vote style behaviour. Achieving conditional independence is often difficult, and they introduce decision biases in the estimates. In Naive Bayes, certain features are called independent features as they have no conditional correlation or dependency when predicting a classification. In this paper, we focus on the optimal partition of features by proposing a novel technique called the Comonotone-Independence Classifier (CIBer) which is able to overcome the challenges posed by the Naive Bayes method. For different datasets, we clearly demonstrate the efficacy of our technique, where we achieve lower error rates and higher or equivalent accuracy compared to models such as Random Forests and XGBoost. +</p>本研究旨在提高程序生成环境中强化学习的样本效率。研究证明,使用模仿学习进行预训练和同时进行模仿学习和在线强化学习的方法可以提高效率。http://arxiv.org/abs/2304.09825<p> +利用离线数据加速程序生成环境中的强化学习 +</p> +<p> +Using Offline Data to Speed-up Reinforcement Learning in Procedurally Generated Environments. 
(arXiv:2304.09825v1 [cs.LG]) +</p> +<p> +http://arxiv.org/abs/2304.09825 +</p> +<p> +本研究旨在提高程序生成环境中强化学习的样本效率。研究证明,使用模仿学习进行预训练和同时进行模仿学习和在线强化学习的方法可以提高效率。 +</p> +<p> + +</p> +<p> +强化学习面临的主要挑战之一是代理能够将其学习策略推广到未见过的环境中。此外,训练强化学习代理需要与环境进行大量交互。受离线强化学习和模仿学习的最近成功启发,我们进行了一项研究,以调查代理是否可以利用轨迹的离线数据来提高程序生成环境中的样本效率。我们考虑了两种使用离线数据的模仿学习方法:(1)在在线强化学习训练之前预训练策略和(2)同时训练在线强化学习和来自离线数据的模仿学习。我们分析了可用的离线轨迹的质量(轨迹的最佳性)和多样性(轨迹数量和覆盖级别)对两种方法有效性的影响。在MiniGrid环境中的四个知名稀疏奖励任务中,我们发现使用模仿学习进行预训练和同时进行模仿学习和在线强化学习的方法可以提供更高的样本效率。 +</p> +<p> +One of the key challenges of Reinforcement Learning (RL) is the ability of agents to generalise their learned policy to unseen settings. Moreover, training RL agents requires large numbers of interactions with the environment. Motivated by the recent success of Offline RL and Imitation Learning (IL), we conduct a study to investigate whether agents can leverage offline data in the form of trajectories to improve the sample-efficiency in procedurally generated environments. We consider two settings of using IL from offline data for RL: (1) pre-training a policy before online RL training and (2) concurrently training a policy with online RL and IL from offline data. We analyse the impact of the quality (optimality of trajectories) and diversity (number of trajectories and covered level) of available offline trajectories on the effectiveness of both approaches. Across four well-known sparse reward tasks in the MiniGrid environment, we find that using IL for pre-training and concurrently d +</p>本文提出了一种原型引导知识蒸馏(PGKD)方法,它不需要图形边缘,但可以在不考虑边缘的情况下学习结构感知的MLP。http://arxiv.org/abs/2303.13763<p> +无需边缘但具有结构感知性:从GNN到MLP的原型引导知识蒸馏。 +</p> +<p> +Edge-free but Structure-aware: Prototype-Guided Knowledge Distillation from GNNs to MLPs. 
(arXiv:2303.13763v1 [cs.LG]) +</p> +<p> +http://arxiv.org/abs/2303.13763 +</p> +<p> +本文提出了一种原型引导知识蒸馏(PGKD)方法,它不需要图形边缘,但可以在不考虑边缘的情况下学习结构感知的MLP。 +</p> +<p> + +</p> +<p> +将高精度的图神经网络(GNN)在图任务中压缩成低延迟的多层感知器(MLP)已成为热门研究课题。以前的方法会将图的边缘处理成额外的输入给MLP,但这样的图结构对于各种场景可能无法获得。因此,我们提出了一种原型引导知识蒸馏(PGKD)方法,它不需要图形边缘,但可以在不考虑边缘的情况下学习结构感知的MLP。具体而言,我们分析了GNN教师中的图形结构信息,并通过原型在无边缘设置中从GNN到MLP进行了知识蒸馏。在流行的图形基准实验中的实验结果表明了所提出的PGKD方法的有效性和鲁棒性。 +</p> +<p> +Distilling high-accuracy Graph Neural Networks~(GNNs) to low-latency multilayer perceptrons~(MLPs) on graph tasks has become a hot research topic. However, MLPs rely exclusively on the node features and fail to capture the graph structural information. Previous methods address this issue by processing graph edges into extra inputs for MLPs, but such graph structures may be unavailable for various scenarios. To this end, we propose a Prototype-Guided Knowledge Distillation~(PGKD) method, which does not require graph edges~(edge-free) yet learns structure-aware MLPs. Specifically, we analyze the graph structural information in GNN teachers, and distill such information from GNNs to MLPs via prototypes in an edge-free setting. Experimental results on popular graph benchmarks demonstrate the effectiveness and robustness of the proposed PGKD. +</p>本文提出一种利用语言控制扩散模型的分层规划器,有效而高效地扩展扩散模型,解决长时间跨度自然语言指令下的控制问题,实现了较高的单任务和多任务成功率,并极大地提高计算效率。http://arxiv.org/abs/2210.15629<p> +语言控制扩散:通过空间、时间和任务高效扩展 +</p> +<p> +Language Control Diffusion: Efficiently Scaling through Space, Time, and Tasks. 
(arXiv:2210.15629v2 [cs.LG] UPDATED) +</p> +<p> +http://arxiv.org/abs/2210.15629 +</p> +<p> +本文提出一种利用语言控制扩散模型的分层规划器,有效而高效地扩展扩散模型,解决长时间跨度自然语言指令下的控制问题,实现了较高的单任务和多任务成功率,并极大地提高计算效率。 +</p> +<p> + +</p> +<p> +训练通用型智能体在各个方面都很困难,需要处理高维输入(空间)、长时间跨度(时间)和多个新任务。最近的结构方面的进展使得我们可以沿着其中一个或两个维度提高扩展性能力,但计算成本仍然很高。本文提出使用语言控制扩散模型作为一种基于自然语言条件的分层规划器(LCD)来应对这三个方面。我们有效而高效地扩展扩散模型,以应对时间、状态和任务空间维度的长时间跨度控制问题。我们在CALVIN语言机器人基准测试中将LCD与其他最先进的模型进行比较,发现LCD在多任务成功率方面优于其他最先进的方法,而单任务成功率(SR)为88.7%,远高于以前的最佳成绩82.6%,大大提高了计算效率。 +</p> +<p> +Training generalist agents is difficult across several axes, requiring us to deal with high-dimensional inputs (space), long horizons (time), and multiple and new tasks. Recent advances with architectures have allowed for improved scaling along one or two of these dimensions, but are still prohibitive computationally. In this paper, we propose to address all three axes by leveraging Language to Control Diffusion models as a hierarchical planner conditioned on language (LCD). We effectively and efficiently scale diffusion models for planning in extended temporal, state, and task dimensions to tackle long horizon control problems conditioned on natural language instructions. We compare LCD with other state-of-the-art models on the CALVIN language robotics benchmark and find that LCD outperforms other SOTA methods in multi task success rates while dramatically improving computational efficiency with a single task success rate (SR) of 88.7% against the previous best of 82.6%. We show that +</p>本文提出了一种可解释且可操作的选区划分图距离测量方法,并定义了一种“最典型”的中心图。这种方法可以帮助我们深入研究一系列约束条件下选区划分图的应用。http://arxiv.org/abs/2203.00872<p> +距离对选区划分图的影响:中心和异常地图的应用 +</p> +<p> +Implications of Distance over Redistricting Maps: Central and Outlier Maps. 
(arXiv:2203.00872v4 [cs.GT] UPDATED) +</p> +<p> +http://arxiv.org/abs/2203.00872 +</p> +<p> +本文提出了一种可解释且可操作的选区划分图距离测量方法,并定义了一种“最典型”的中心图。这种方法可以帮助我们深入研究一系列约束条件下选区划分图的应用。 +</p> +<p> + +</p> +<p> +在代议制民主中,选区划分图用于将选民划分为一组选区,每个区选出一个代表。有效的划分图必须满足一系列约束条件,例如紧凑性、连续性、以及几乎相等的人口分布。然而,这些加强的限制条件仍然不足以限制有效选区划分图的数量。本文提出了一种可解释且可操作的距离测量方法,以此研究在一系列约束条件下选区划分图的应用。具体而言,我们定义了一种被认为是“最典型”的中心图,并通过展示它在一个委员会场景中反映了Kemeny(凯门耶)排名的良好性来给出了严格的证明。 +</p> +<p> +In representative democracy, a redistricting map is chosen to partition an electorate into a collection of districts each of which elects a representative. A valid redistricting map must satisfy a collection of constraints such as being compact, contiguous, and of almost equal population. However, these imposed constraints are still loose enough to enable an enormous ensemble of valid redistricting maps. This fact introduces a difficulty in drawing redistricting maps and it also enables a partisan legislature to possibly gerrymander by choosing a map which unfairly favors it. In this paper, we introduce an interpretable and tractable distance measure over redistricting maps which does not use election results and study its implications over the ensemble of redistricting maps. 
Specifically, we define a central map which may be considered as being "most typical" and give a rigorous justification for it by showing that it mirrors the Kemeny ranking in a scenario where we have a committee </p> \ No newline at end of file diff --git a/cs.CL.md b/cs.CL.md index da362dac1..37feb9da9 100644 --- a/cs.CL.md +++ b/cs.CL.md @@ -2,22 +2,382 @@ | Ref | Title | Summary | | --- | --- | --- | -| [^1] | [Beyond the Answers: Reviewing the Rationality of Multiple Choice Question Answering for the Evaluation of Large Language Models](https://rss.arxiv.org/abs/2402.01349) | 对于评估大型语言模型中多选题回答的合理性进行了回顾,发现当前基于多选题回答的基准可能无法充分捕捉大型语言模型的真实能力。 | +| [^1] | [Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models](https://arxiv.org/abs/2404.02657) | 本研究重新思考了大型语言模型知识蒸馏中对Kullback-Leibler散度的应用,发现逆Kullback-Leibler和正向Kullback-Leibler散度在优化目标上相似,为此提出了一种自适应Kullback-Leiber散度方法。 | +| [^2] | [A Survey on Multilingual Large Language Models: Corpora, Alignment, and Bias](https://arxiv.org/abs/2404.00929) | 该论文对多语言大型语言模型进行了全面分析,深入讨论了关键问题,包括多语言语料库、对齐和偏见。 | +| [^3] | [Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation](https://arxiv.org/abs/2403.19103) | PRISM是一种算法,可以自动识别人类可解释且易传递的提示,从而有效生成所需概念,仅使用黑盒访问T2I模型。 | +| [^4] | [Can ChatGPT predict article retraction based on Twitter mentions?](https://arxiv.org/abs/2403.16851) | 本研究探讨了ChatGPT是否能够基于Twitter提及来预测文章的撤回,研究发现在预测未来被撤回的有问题文章方面是具有一定潜力的。 | +| [^5] | [Quality-Aware Image-Text Alignment for Real-World Image Quality Assessment](https://arxiv.org/abs/2403.11176) | 提出了一种基于CLIP的自监督方法QualiCLIP,通过质量感知的图像-文本对齐策略,实现了图像质量评估不需要标记MOS的问题 | +| [^6] | [Speech Robust Bench: A Robustness Benchmark For Speech Recognition](https://arxiv.org/abs/2403.07937) | 提出了一个全面基准(SRB),用于评估自动语音识别(ASR)模型对各种破坏的鲁棒性,发现模型大小和某些建模选择有助于提高鲁棒性,并观察到在不同人口亚组上模型的鲁棒性存在明显差异。 | +| [^7] | [A Benchmark of Domain-Adapted Large Language Models for Generating Brief Hospital Course Summaries](https://arxiv.org/abs/2403.05720) 
| 介绍了一个新的基准测试,评估了用于生成简要住院病程摘要的大语言模型在健康保健领域中的性能并提出相应的自适应策略 | +| [^8] | [KorMedMCQA: Multi-Choice Question Answering Benchmark for Korean Healthcare Professional Licensing Examinations](https://arxiv.org/abs/2403.01469) | KorMedMCQA是首个从韩国医疗专业执业考试中衍生的多项选择题问答基准,提供了多种大型语言模型的基线实验结果,并在HuggingFace上公开了数据,为韩国医疗环境中的进一步研究和发展提供了可能性。 | +| [^9] | [Large Language Models and Games: A Survey and Roadmap](https://arxiv.org/abs/2402.18659) | 这项研究调查了大型语言模型在游戏领域中的多种应用及其角色,指出了未开发领域和未来发展方向,同时探讨了在游戏领域中大型语言模型的潜力和限制。 | +| [^10] | [RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval](https://arxiv.org/abs/2402.18510) | 本文研究了RNNs和Transformer在处理算法问题时的表现能力差距,发现RNNs存在关键瓶颈,即无法完美地从上下文中检索信息,导致无法像Transformer那样轻松解决需要这种能力的任务。 | +| [^11] | [ToMBench: Benchmarking Theory of Mind in Large Language Models](https://arxiv.org/abs/2402.15052) | 提出了ToMBench框架,在大型语言模型中进行心灵理论性能评估,发现最先进的模型仍然落后于人类表现超过10%。 | +| [^12] | [Query-Based Adversarial Prompt Generation](https://arxiv.org/abs/2402.12329) | 该研究提出了一种基于查询的对抗性攻击方法,通过利用远程语言模型的 API 访问构造对抗性示例,使模型以更高概率发出有害字符串,而非仅仅基于模型之间的转移性攻击。 | +| [^13] | [FIPO: Free-form Instruction-oriented Prompt Optimization with Preference Dataset and Modular Fine-tuning Schema](https://arxiv.org/abs/2402.11811) | FIPO提出了基于自由形式指导的提示优化方法,结合偏好数据集和模块化微调模式,重新构思了优化过程并实现了灵活的任务提示生成。 | +| [^14] | [Time Series Forecasting with LLMs: Understanding and Enhancing Model Capabilities](https://arxiv.org/abs/2402.10835) | 本研究通过比较LLMs与传统模型,发现了LLMs在时间序列预测中的优势和局限性,指出LLMs在预测具有明显模式和趋势的时间序列方面表现出色,但在缺乏周期性的数据集方面面临挑战,同时指出融入外部知识和采用自然语言释义有助于提升LLMs在时间序列预测中的性能。 | +| [^15] | [Lying Blindly: Bypassing ChatGPT's Safeguards to Generate Hard-to-Detect Disinformation Claims at Scale](https://arxiv.org/abs/2402.08467) | 本研究探索了ChatGPT在生成关于乌克兰战争的虚假信息方面的能力,发现它可以以较低成本、快速且大规模地生成逼真的定制虚假信息,而且这些虚假信息很难被人类读者和现有的自动化工具可靠地区分出来。 | +| [^16] | [EntGPT: Linking Generative Large Language Models with Knowledge Bases](https://arxiv.org/abs/2402.06738) | 本文介绍了一种名为EntGPT的模型,通过Entity 
Disambiguation(ED)任务,连接了生成型大型语言模型与知识库。通过提示工程和指令调整,该模型在没有有监督微调的情况下,显著提高了LLMs的性能,并在实体消歧任务上取得了可比较的性能。 | +| [^17] | [CIC: A framework for Culturally-aware Image Captioning](https://arxiv.org/abs/2402.05374) | CIC是一种面向文化感知图像字幕的框架,通过结合视觉问答和大型语言模型,它能够生成能描述图像中文化元素的详细字幕。 | +| [^18] | [Personalized Language Modeling from Personalized Human Feedback](https://arxiv.org/abs/2402.05133) | 该论文提出了一个个性化语言模型的方法,通过在于用户的反馈数据中引入个性化特征来解决强化学习框架在多样化用户偏好下存在的问题。 | +| [^19] | [Crowdsourced Adaptive Surveys.](http://arxiv.org/abs/2401.12986) | 众包自适应调查方法(CSAS)结合自然语言处理和自适应算法,能够根据用户输入演变问题库,并在调查中适应新的问题,应用在拉丁裔信息环境和议题重要性领域,能够识别难以通过传统方法跟踪的主张或问题。 | +| [^20] | [Natural Language Processing for Dialects of a Language: A Survey.](http://arxiv.org/abs/2401.05632) | 这项调查研究了自然语言处理中针对方言的方法和问题,强调了方言对于NLP模型性能和语言技术公平性的影响,并提供了关于方言相关任务和语言的全面综述。 | +| [^21] | [Split and Merge: Aligning Position Biases in Large Language Model based Evaluators.](http://arxiv.org/abs/2310.01432) | PORTIA是一个旨在校准大型语言模型评估器的位置偏差的对齐系统,通过将答案分割成多个片段,并对其进行对齐,然后将其合并回一个单一的提示,以提高评估的准确性和公正性。 | +| [^22] | [FinEval: A Chinese Financial Domain Knowledge Evaluation Benchmark for Large Language Models.](http://arxiv.org/abs/2308.09975) | 本论文提出了一个专门用于评估大型语言模型在金融领域知识上的基准FinEval。通过在FinEval上评估中英文LLMs,结果显示只有GPT-4在不同提示设置下实现了接近70%的准确率,展示了LLMs在金融领域知识中的显著增长潜力。 | +| [^23] | [Questioning the Survey Responses of Large Language Models.](http://arxiv.org/abs/2306.07951) | 本文使用美国人口普查局建立的全美社区调查(ACS)评估了十几个不同大小的语言模型,发现小型模型具有显著的位置和标签偏差,而模型大小的增加能减轻这种偏差,但无法根据US群体或任何可识别的群体趋势进行调整。 | +| [^24] | [Using LLM-assisted Annotation for Corpus Linguistics: A Case Study of Local Grammar Analysis.](http://arxiv.org/abs/2305.08339) | 本文研究了使用基于大语言模型的聊天机器人自动标注文本的潜力,重点考察了从本地语法角度观察道歉言语行为构成的功能元素的程度,并比较了不同模型在注释任务中的表现,结果表明Bing聊天机器人在任务中表现优于ChatGPT和人类标注员。 | +| [^25] | [Language Control Diffusion: Efficiently Scaling through Space, Time, and Tasks.](http://arxiv.org/abs/2210.15629) | 本文提出一种利用语言控制扩散模型的分层规划器,有效而高效地扩展扩散模型,解决长时间跨度自然语言指令下的控制问题,实现了较高的单任务和多任务成功率,并极大地提高计算效率。 | # 详细 -[^1]: 
超越答案:对于评估大型语言模型中多选题回答的合理性的回顾 +[^1]: 在大型语言模型知识蒸馏中重新思考Kullback-Leibler散度 - Beyond the Answers: Reviewing the Rationality of Multiple Choice Question Answering for the Evaluation of Large Language Models + Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models - [https://rss.arxiv.org/abs/2402.01349](https://rss.arxiv.org/abs/2402.01349) + [https://arxiv.org/abs/2404.02657](https://arxiv.org/abs/2404.02657) - 对于评估大型语言模型中多选题回答的合理性进行了回顾,发现当前基于多选题回答的基准可能无法充分捕捉大型语言模型的真实能力。 + 本研究重新思考了大型语言模型知识蒸馏中对Kullback-Leibler散度的应用,发现逆Kullback-Leibler和正向Kullback-Leibler散度在优化目标上相似,为此提出了一种自适应Kullback-Leiber散度方法。 - 在自然语言处理领域,大型语言模型(LLMs)引发了一场范式转变,显著提升了自然语言生成任务的性能。尽管取得了这些进展,对LLMs的全面评估仍然是社区面临的必然挑战。最近,将多选题回答(MCQA)作为LLMs的基准已经引起了广泛关注。本研究调查了MCQA作为LLMs评估方法的合理性。如果LLMs真正理解问题的语义,它们的性能应该在从相同问题派生的各种配置上表现一致。然而,我们的实证结果表明LLMs的响应一致性存在显著差异,我们将之定义为LLMs的响应可变性综合征(REVAS),这表明目前基于MCQA的基准可能无法充分捕捉LLMs的真实能力,强调了对更合适的评估方法的需要。 + Kullback-Leibler散度在知识蒸馏中被广泛应用于压缩大型语言模型。本研究从经验和理论上证明了,在LLMs的知识蒸馏中,与之前断言的逆Kullback-Leibler(RKL)散度寻找模式并因此优于寻找平均值的正向Kullback-Leibler(FKL)散度相反,实际上在知识蒸馏中都没有体现出寻找模式或寻找平均值的特性。相反,发现RKL和FKL具有相同的优化目标,并在足够数量的时代之后都会收敛。然而,由于实际约束,LLMs很少被训练如此多的时代。同时,我们进一步发现,RKL在分布的尾部,而FKL在开始时代侧重于分布的头部。因此,我们提出了一种简单而有效的自适应Kullback-Leiber(AKL)散度方法,该方法自适应地分配权重来组合F - In the field of natural language processing (NLP), Large Language Models (LLMs) have precipitated a paradigm shift, markedly enhancing performance in natural language generation tasks. Despite these advancements, the comprehensive evaluation of LLMs remains an inevitable challenge for the community. Recently, the utilization of Multiple Choice Question Answering (MCQA) as a benchmark for LLMs has gained considerable traction. This study investigates the rationality of MCQA as an evaluation method for LLMs. If LLMs genuinely understand the semantics of questions, their performance should exhibit consistency across the varied configurations derived from the same questions. 
Contrary to this expectation, our empirical findings suggest a notable disparity in the consistency of LLM responses, which we define as REsponse VAriability Syndrome (REVAS) of the LLMs, indicating that current MCQA-based benchmarks may not adequately capture the true capabilities of LLMs, which underscores the need f + arXiv:2404.02657v1 Announce Type: cross Abstract: Kullback-Leiber divergence has been widely used in Knowledge Distillation (KD) to compress Large Language Models (LLMs). Contrary to prior assertions that reverse Kullback-Leibler (RKL) divergence is mode-seeking and thus preferable over the mean-seeking forward Kullback-Leibler (FKL) divergence, this study empirically and theoretically demonstrates that neither mode-seeking nor mean-seeking properties manifest in KD for LLMs. Instead, RKL and FKL are found to share the same optimization objective and both converge after a sufficient number of epochs. However, due to practical constraints, LLMs are seldom trained for such an extensive number of epochs. Meanwhile, we further find that RKL focuses on the tail part of the distributions, while FKL focuses on the head part at the beginning epochs. 
Consequently, we propose a simple yet effective Adaptive Kullback-Leiber (AKL) divergence method, which adaptively allocates weights to combine F + +[^2]: 多语言大型语言模型:语料库、对齐和偏见综述 + + A Survey on Multilingual Large Language Models: Corpora, Alignment, and Bias + + [https://arxiv.org/abs/2404.00929](https://arxiv.org/abs/2404.00929) + + 该论文对多语言大型语言模型进行了全面分析,深入讨论了关键问题,包括多语言语料库、对齐和偏见。 + + + + 基于大型语言模型(LLMs)的基础上,发展了多语言大型语言模型(MLLMs)来解决多语言自然语言处理任务的挑战,希望实现从高资源到低资源语言的知识转移。然而,仍然存在重要限制和挑战,比如语言不平衡、多语言对齐和固有偏见。本文旨在对MLLMs进行全面分析,深入讨论围绕这些关键问题的议题。 + + arXiv:2404.00929v1 Announce Type: cross Abstract: Based on the foundation of Large Language Models (LLMs), Multilingual Large Language Models (MLLMs) have been developed to address the challenges of multilingual natural language processing tasks, hoping to achieve knowledge transfer from high-resource to low-resource languages. However, significant limitations and challenges still exist, such as language imbalance, multilingual alignment, and inherent bias. In this paper, we aim to provide a comprehensive analysis of MLLMs, delving deeply into discussions surrounding these critical issues. First of all, we start by presenting an overview of MLLMs, covering their evolution, key techniques, and multilingual capacities. Secondly, we explore widely utilized multilingual corpora for MLLMs' training and multilingual datasets oriented for downstream tasks that are crucial for enhancing the cross-lingual capability of MLLMs. 
Thirdly, we survey the existing studies on multilingual representati + +[^3]: 用于个性化文本到图像生成的自动化黑盒提示工程 + + Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation + + [https://arxiv.org/abs/2403.19103](https://arxiv.org/abs/2403.19103) + + PRISM是一种算法,可以自动识别人类可解释且易传递的提示,从而有效生成所需概念,仅使用黑盒访问T2I模型。 + + + + 提示工程对于控制文本到图像(T2I)生成模型的输出是有效的,但由于需要手动制作提示而导致工作繁重。这一挑战促使了自动提示生成算法的发展。然而,这些方法通常在T2I模型之间的可传递性方面遇到困难,需要对基础模型进行白盒访问,并产生非直观的提示。在这项工作中,我们介绍了PRISM,这是一种算法,可以仅使用黑盒访问T2I模型就自动识别人类可解释且易传递的提示,从而有效生成所需概念。受大型语言模型(LLM)越狱的启发,PRISM利用LLM的上下文学习能力来迭代地改进给定参考图像的候选提示分布。我们的实验展示了PRISM在为对象、样式等生成准确提示方面的多样性和有效性。 + + arXiv:2403.19103v1 Announce Type: cross Abstract: Prompt engineering is effective for controlling the output of text-to-image (T2I) generative models, but it is also laborious due to the need for manually crafted prompts. This challenge has spurred the development of algorithms for automated prompt generation. However, these methods often struggle with transferability across T2I models, require white-box access to the underlying model, and produce non-intuitive prompts. In this work, we introduce PRISM, an algorithm that automatically identifies human-interpretable and transferable prompts that can effectively generate desired concepts given only black-box access to T2I models. Inspired by large language model (LLM) jailbreaking, PRISM leverages the in-context learning ability of LLMs to iteratively refine the candidate prompts distribution for given reference images. Our experiments demonstrate the versatility and effectiveness of PRISM in generating accurate prompts for objects, sty + +[^4]: ChatGPT是否能够基于Twitter提及来预测文章的撤回? + + Can ChatGPT predict article retraction based on Twitter mentions? 
+ + [https://arxiv.org/abs/2403.16851](https://arxiv.org/abs/2403.16851) + + 本研究探讨了ChatGPT是否能够基于Twitter提及来预测文章的撤回,研究发现在预测未来被撤回的有问题文章方面是具有一定潜力的。 + + + + 检测有问题的研究文章具有重要意义,本研究探讨了根据被撤回文章在Twitter上的提及是否能够在文章被撤回前发出信号,从而在预测未来被撤回的有问题文章方面发挥作用。分析了包括3,505篇已撤回文章及其相关Twitter提及在内的数据集,以及使用粗糙精确匹配方法获取的具有类似特征的3,505篇未撤回文章。通过四种预测方法评估了Twitter提及在预测文章撤回方面的有效性,包括手动标注、关键词识别、机器学习模型和ChatGPT。手动标注的结果表明,的确有被撤回的文章,其Twitter提及包含在撤回前发出信号的可识别证据,尽管它们只占所有被撤回文章的一小部分。 + + arXiv:2403.16851v1 Announce Type: cross Abstract: Detecting problematic research articles timely is a vital task. This study explores whether Twitter mentions of retracted articles can signal potential problems with the articles prior to retraction, thereby playing a role in predicting future retraction of problematic articles. A dataset comprising 3,505 retracted articles and their associated Twitter mentions is analyzed, alongside 3,505 non-retracted articles with similar characteristics obtained using the Coarsened Exact Matching method. The effectiveness of Twitter mentions in predicting article retraction is evaluated by four prediction methods, including manual labelling, keyword identification, machine learning models, and ChatGPT. 
Manual labelling results indicate that there are indeed retracted articles with their Twitter mentions containing recognizable evidence signaling problems before retraction, although they represent only a limited share of all retracted articles with + +[^5]: 面向现实世界图像质量评估的质量感知图像-文本对齐 + + Quality-Aware Image-Text Alignment for Real-World Image Quality Assessment + + [https://arxiv.org/abs/2403.11176](https://arxiv.org/abs/2403.11176) + + 提出了一种基于CLIP的自监督方法QualiCLIP,通过质量感知的图像-文本对齐策略,实现了图像质量评估不需要标记MOS的问题 + + + + 无参考图像质量评估(NR-IQA)致力于设计一种在没有高质量参考图像的情况下测量图像质量的方法,以符合人类感知,大部分最先进的NR-IQA方法中依赖标注的主观评分(MOS)限制了它们在真实场景中的可扩展性和广泛适用性。为了克服这一限制,我们提出了QualiCLIP(Quality-aware CLIP),这是一种基于CLIP的自监督不需要标记MOS的方法。具体来说,我们引入了一种质量感知的图像-文本对齐策略,使得CLIP生成的表示与图像固有质量相关。从原始图像开始,我们使用不断增加的强度合成地劣化它们。然后,我们训练CLIP根据其与质量相关的反义文本提示的相似性对这些降解图像进行排名,同时保证一致的表达 + + arXiv:2403.11176v1 Announce Type: cross Abstract: No-Reference Image Quality Assessment (NR-IQA) focuses on designing methods to measure image quality in alignment with human perception when a high-quality reference image is unavailable. The reliance on annotated Mean Opinion Scores (MOS) in the majority of state-of-the-art NR-IQA approaches limits their scalability and broader applicability to real-world scenarios. To overcome this limitation, we propose QualiCLIP (Quality-aware CLIP), a CLIP-based self-supervised opinion-unaware method that does not require labeled MOS. In particular, we introduce a quality-aware image-text alignment strategy to make CLIP generate representations that correlate with the inherent quality of the images. Starting from pristine images, we synthetically degrade them with increasing levels of intensity. 
Then, we train CLIP to rank these degraded images based on their similarity to quality-related antonym text prompts, while guaranteeing consistent represe + +[^6]: 语音鲁棒基准:用于语音识别的鲁棒性基准 + + Speech Robust Bench: A Robustness Benchmark For Speech Recognition + + [https://arxiv.org/abs/2403.07937](https://arxiv.org/abs/2403.07937) + + 提出了一个全面基准(SRB),用于评估自动语音识别(ASR)模型对各种破坏的鲁棒性,发现模型大小和某些建模选择有助于提高鲁棒性,并观察到在不同人口亚组上模型的鲁棒性存在明显差异。 + + + + 随着自动语音识别(ASR)模型变得越来越普遍,确保它们在物理世界和数字世界中的各种破坏下进行可靠预测变得愈发重要。我们提出了语音鲁棒基准(SRB),这是一个用于评估ASR模型对各种破坏的鲁棒性的全面基准。SRB由69个输入扰动组成,旨在模拟ASR模型可能在物理世界和数字世界中遇到的各种破坏。我们使用SRB来评估几种最先进的ASR模型的鲁棒性,并观察到模型大小和某些建模选择(如离散表示和自我训练)似乎有助于提高鲁棒性。我们将此分析扩展到衡量ASR模型在来自各种人口亚组的数据上的鲁棒性,即英语和西班牙语使用者以及男性和女性,并观察到模型的鲁棒性在不同亚组之间存在明显差异。 + + arXiv:2403.07937v1 Announce Type: cross Abstract: As Automatic Speech Recognition (ASR) models become ever more pervasive, it is important to ensure that they make reliable predictions under corruptions present in the physical and digital world. We propose Speech Robust Bench (SRB), a comprehensive benchmark for evaluating the robustness of ASR models to diverse corruptions. SRB is composed of 69 input perturbations which are intended to simulate various corruptions that ASR models may encounter in the physical and digital world. We use SRB to evaluate the robustness of several state-of-the-art ASR models and observe that model size and certain modeling choices such as discrete representations, and self-training appear to be conducive to robustness. 
We extend this analysis to measure the robustness of ASR models on data from various demographic subgroups, namely English and Spanish speakers, and males and females, and observed noticeable disparities in the model's robustness across su + +[^7]: 用于生成简要住院病程摘要的领域自适应大语言模型的基准测试 + + A Benchmark of Domain-Adapted Large Language Models for Generating Brief Hospital Course Summaries + + [https://arxiv.org/abs/2403.05720](https://arxiv.org/abs/2403.05720) + + 介绍了一个新的基准测试,评估了用于生成简要住院病程摘要的大语言模型在健康保健领域中的性能并提出相应的自适应策略 + + + + 简要住院病程(BHC)摘要是通过总结临床记录而生成的常见临床文件。虽然大型语言模型(LLMs)在自动化实际任务方面展现出显著能力,但它们在医疗应用(如BHC合成)中的能力尚未得到展示。为了使LLMs能够适应BHC合成,我们引入了一个新颖的基准测试,其中包含从MIMIC-IV记录中提取的经过预处理的数据集,封装了临床记录和简要住院病程(BHC)对。我们评估了两个通用LLMs和三个医疗领域适应的LLMs的性能,以改进从临床记录生成BHC。我们使用临床记录作为输入来生成BHC,采用基于提示的(使用上下文学习)和基于微调的自适应策略来应用于三个开源LLMs(Clinical-T5-Large,Llama2-13B,FLAN-UL2)和两个专有LLMs(GPT-3.5,GPT-4)。我们定量评估了性能。 + + arXiv:2403.05720v1 Announce Type: cross Abstract: Brief hospital course (BHC) summaries are common clinical documents generated by summarizing clinical notes. While large language models (LLMs) depict remarkable capabilities in automating real-world tasks, their capabilities for healthcare applications such as BHC synthesis have not been shown. To enable the adaptation of LLMs for BHC synthesis, we introduce a novel benchmark consisting of a pre-processed dataset extracted from MIMIC-IV notes, encapsulating clinical note, and brief hospital course (BHC) pairs. We assess the performance of two general-purpose LLMs and three healthcare-adapted LLMs to improve BHC synthesis from clinical notes. Using clinical notes as input for generating BHCs, we apply prompting-based (using in-context learning) and fine-tuning-based adaptation strategies to three open-source LLMs (Clinical-T5-Large, Llama2-13B, FLAN-UL2) and two proprietary LLMs (GPT-3.5, GPT-4). 
We quantitatively evaluate the performa + +[^8]: KorMedMCQA: 韩国医疗专业执业考试的多项选择题问答基准 + + KorMedMCQA: Multi-Choice Question Answering Benchmark for Korean Healthcare Professional Licensing Examinations + + [https://arxiv.org/abs/2403.01469](https://arxiv.org/abs/2403.01469) + + KorMedMCQA是首个从韩国医疗专业执业考试中衍生的多项选择题问答基准,提供了多种大型语言模型的基线实验结果,并在HuggingFace上公开了数据,为韩国医疗环境中的进一步研究和发展提供了可能性。 + + + + 我们介绍了KorMedMCQA,这是首个源自韩国医疗专业执业考试的韩语多项选择题问答(MCQA)基准,涵盖了从2012年到2023年的考试内容。该数据集包括医生、护士和药剂师执照考试中的一部分问题,涵盖多种学科。我们对各种大型语言模型进行了基线实验,包括专有/开源、多语言/韩语附加预训练和临床背景预训练模型,突显了进一步增强潜力。我们在HuggingFace上公开了我们的数据,并通过LM-Harness提供了一个评估脚本,邀请在韩国医疗环境中进行进一步探索和发展。 + + arXiv:2403.01469v1 Announce Type: new Abstract: We introduce KorMedMCQA, the first Korean multiple-choice question answering (MCQA) benchmark derived from Korean healthcare professional licensing examinations, covering from the year 2012 to year 2023. This dataset consists of a selection of questions from the license examinations for doctors, nurses, and pharmacists, featuring a diverse array of subjects. We conduct baseline experiments on various large language models, including proprietary/open-source, multilingual/Korean-additional pretrained, and clinical context pretrained models, highlighting the potential for further enhancements. We make our data publicly available on HuggingFace and provide a evaluation script via LM-Harness, inviting further exploration and advancement in Korean healthcare environments. 
+ +[^9]: 大型语言模型与游戏:调研与路线图 + + Large Language Models and Games: A Survey and Roadmap + + [https://arxiv.org/abs/2402.18659](https://arxiv.org/abs/2402.18659) + + 这项研究调查了大型语言模型在游戏领域中的多种应用及其角色,指出了未开发领域和未来发展方向,同时探讨了在游戏领域中大型语言模型的潜力和限制。 + + + + 近年来,大型语言模型(LLMs)的研究急剧增加,并伴随着公众对该主题的参与。尽管起初是自然语言处理中的一小部分,LLMs在广泛的应用和领域中展现出显著潜力,包括游戏。本文调查了LLMs在游戏中及为游戏提供支持的各种应用的最新技术水平,并明确了LLMs在游戏中可以扮演的不同角色。重要的是,我们讨论了尚未开发的领域和LLMs在游戏中未来应用的有前途的方向,以及在游戏领域中LLMs的潜力和限制。作为LLMs和游戏交叉领域的第一份综合调查和路线图,我们希望本文能够成为这一激动人心的新领域的开创性研究和创新的基础。 + + arXiv:2402.18659v1 Announce Type: cross Abstract: Recent years have seen an explosive increase in research on large language models (LLMs), and accompanying public engagement on the topic. While starting as a niche area within natural language processing, LLMs have shown remarkable potential across a broad range of applications and domains, including games. This paper surveys the current state of the art across the various applications of LLMs in and for games, and identifies the different roles LLMs can take within a game. Importantly, we discuss underexplored areas and promising directions for future uses of LLMs in games and we reconcile the potential and limitations of LLMs within the games domain. As the first comprehensive survey and roadmap at the intersection of LLMs and games, we are hopeful that this paper will serve as the basis for groundbreaking research and innovation in this exciting new field. 
+ +[^10]: RNNs还不是Transformer:在上下文检索中的关键瓶颈 + + RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval + + [https://arxiv.org/abs/2402.18510](https://arxiv.org/abs/2402.18510) + + 本文研究了RNNs和Transformer在处理算法问题时的表现能力差距,发现RNNs存在关键瓶颈,即无法完美地从上下文中检索信息,导致无法像Transformer那样轻松解决需要这种能力的任务。 + + + + 本文探讨循环神经网络(RNNs)和Transformer在解决算法问题时的表示能力差距。我们重点关注RNNs是否能在处理长序列时,通过Chain-of-Thought (CoT)提示,与Transformer的性能相匹配。我们的理论分析显示CoT可以改进RNNs,但无法弥补与Transformer之间的差距。关键瓶颈在于RNNs无法完全从上下文中检索信息,即使经过CoT的增强:对于几个明确或隐式需要这种能力的任务,如联想召回和确定图是否为树,我们证明RNNs表达能力不足以解决这些任务,而Transformer可以轻松解决。相反,我们证明采用增强RNNs上下文检索能力的技术,包括 + + arXiv:2402.18510v1 Announce Type: cross Abstract: This paper investigates the gap in representation powers of Recurrent Neural Networks (RNNs) and Transformers in the context of solving algorithmic problems. We focus on understanding whether RNNs, known for their memory efficiency in handling long sequences, can match the performance of Transformers, particularly when enhanced with Chain-of-Thought (CoT) prompting. Our theoretical analysis reveals that CoT improves RNNs but is insufficient to close the gap with Transformers. A key bottleneck lies in the inability of RNNs to perfectly retrieve information from the context, even with CoT: for several tasks that explicitly or implicitly require this capability, such as associative recall and determining if a graph is a tree, we prove that RNNs are not expressive enough to solve the tasks while Transformers can solve them with ease. 
Conversely, we prove that adopting techniques to enhance the in-context retrieval capability of RNNs, inclu + +[^11]: 在大型语言模型中基准测试心灵理论 + + ToMBench: Benchmarking Theory of Mind in Large Language Models + + [https://arxiv.org/abs/2402.15052](https://arxiv.org/abs/2402.15052) + + 提出了ToMBench框架,在大型语言模型中进行心灵理论性能评估,发现最先进的模型仍然落后于人类表现超过10%。 + + + + 心灵理论(ToM)是指感知和归因自己以及他人的心理状态的认知能力。最近的研究引发了关于大型语言模型(LLMs)是否表现出一种形式的心灵理论的争论。然而,现有的心灵理论评估受到诸如受限范围、主观判断和意外污染等挑战的制约,导致评估不足。为了填补这一空白,我们引入了ToMBench,具有三个关键特征:系统评估框架涵盖社会认知中的8项任务和31项能力,多项选择题格式以支持自动化和无偏见的评估,以及基于双语清单的从头构建,严格避免数据泄漏。基于ToMBench,我们进行了大量实验,评估了10个流行LLMs在任务和能力方面的心灵理论表现。我们发现,即使像GPT-4这样的最先进的LLMs也比人类表现落后超过10个百分点。 + + arXiv:2402.15052v1 Announce Type: cross Abstract: Theory of Mind (ToM) is the cognitive capability to perceive and ascribe mental states to oneself and others. Recent research has sparked a debate over whether large language models (LLMs) exhibit a form of ToM. However, existing ToM evaluations are hindered by challenges such as constrained scope, subjective judgment, and unintended contamination, yielding inadequate assessments. To address this gap, we introduce ToMBench with three key characteristics: a systematic evaluation framework encompassing 8 tasks and 31 abilities in social cognition, a multiple-choice question format to support automated and unbiased evaluation, and a build-from-scratch bilingual inventory to strictly avoid data leakage. Based on ToMBench, we conduct extensive experiments to evaluate the ToM performance of 10 popular LLMs across tasks and abilities. 
We find that even the most advanced LLMs like GPT-4 lag behind human performance by over 10% points, indicati + +[^12]: 基于查询的对抗性提示生成 + + Query-Based Adversarial Prompt Generation + + [https://arxiv.org/abs/2402.12329](https://arxiv.org/abs/2402.12329) + + 该研究提出了一种基于查询的对抗性攻击方法,通过利用远程语言模型的 API 访问构造对抗性示例,使模型发出有害字符串的概率明显高于仅依赖迁移的攻击。 + + + + 最近的研究表明,可以构造对抗性示例,导致经过对齐的语言模型产生有害字符串或执行有害行为。现有的攻击要么在白盒设置下进行(可完全访问模型权重),要么依赖可迁移性,即在一个模型上构造的对抗性示例通常在其他模型上仍然有效的现象。我们通过基于查询的攻击改进以前的工作,利用对远程语言模型的 API 访问来构造对抗性示例,使模型发出有害字符串的概率(远)高于仅使用迁移攻击的情况。我们在 GPT-3.5 和 OpenAI 的安全分类器上验证了我们的攻击;我们能够让 GPT-3.5 发出当前迁移攻击无法诱导出的有害字符串,并且我们几乎以 100% 的概率规避了安全分类器。 + + arXiv:2402.12329v1 Announce Type: cross Abstract: Recent work has shown it is possible to construct adversarial examples that cause an aligned language model to emit harmful strings or perform harmful behavior. Existing attacks work either in the white-box setting (with full access to the model weights), or through transferability: the phenomenon that adversarial examples crafted on one model often remain effective on other models. We improve on prior work with a query-based attack that leverages API access to a remote language model to construct adversarial examples that cause the model to emit harmful strings with (much) higher probability than with transfer-only attacks. We validate our attack on GPT-3.5 and OpenAI's safety classifier; we can cause GPT-3.5 to emit harmful strings that current transfer attacks fail at, and we can evade the safety classifier with nearly 100% probability. 
+ +[^13]: FIPO:基于自由形式指导的提示优化与偏好数据集和模块化微调模式 + + FIPO: Free-form Instruction-oriented Prompt Optimization with Preference Dataset and Modular Fine-tuning Schema + + [https://arxiv.org/abs/2402.11811](https://arxiv.org/abs/2402.11811) + + FIPO提出了基于自由形式指导的提示优化方法,结合偏好数据集和模块化微调模式,重新构思了优化过程并实现了灵活的任务提示生成。 + + + + 在促进大语言模型在最终用户-机器人交互中的深度智能方面,提示创作的艺术被视为普通用户的一项关键但复杂的任务。与之前基于模型而不考虑指导的自动提示优化方法形成对比,这些方法为预定义目标模型产生了光滑的结果,但在使用开箱即用模型时容易快速退化,我们提出了基于自由形式指导的提示优化(FIPO)。这种方法得到我们的大规模提示偏好数据集的支持,并采用模块化微调模式。FIPO模式重新构思了优化过程,将其分解为可管理的模块,以动态调整内容的元提示为锚点。这允许灵活整合原始任务指导、可选指导响应和可选真实值,以生成经过精心优化的任务提示。 + + arXiv:2402.11811v1 Announce Type: new Abstract: In the quest to facilitate the deep intelligence of Large Language Models (LLMs) accessible in final-end user-bot interactions, the art of prompt crafting emerges as a critical yet complex task for the average user. Contrast to previous model-oriented yet instruction-agnostic Automatic Prompt Optimization methodologies, yielding polished results for predefined target models while suffering rapid degradation with out-of-box models, we present Free-form Instruction-oriented Prompt Optimization (FIPO). This approach is supported by our large-scale prompt preference dataset and employs a modular fine-tuning schema. The FIPO schema reimagines the optimization process into manageable modules, anchored by a meta prompt that dynamically adapts content. This allows for the flexible integration of the raw task instruction, the optional instruction response, and the optional ground truth to produce finely optimized task prompts. 
The FIPO preference + +[^14]: LLMs下的时间序列预测:理解和增强模型能力 + + Time Series Forecasting with LLMs: Understanding and Enhancing Model Capabilities + + [https://arxiv.org/abs/2402.10835](https://arxiv.org/abs/2402.10835) + + 本研究通过比较LLMs与传统模型,发现了LLMs在时间序列预测中的优势和局限性,指出LLMs在预测具有明显模式和趋势的时间序列方面表现出色,但在缺乏周期性的数据集方面面临挑战,同时指出融入外部知识和采用自然语言释义有助于提升LLMs在时间序列预测中的性能。 + + + + 大语言模型(LLMs)近年来在许多领域得到迅速发展。作为一种经典的机器学习任务,时间序列预测最近从LLMs中获得了推动。然而,在这一领域,LLMs的偏好存在研究空白。通过将LLMs与传统模型进行比较,发现了LLMs在时间序列预测中的许多特性。例如,我们的研究表明,LLMs在预测具有明显模式和趋势的时间序列方面表现出色,但在缺乏周期性的数据集方面面临挑战。我们通过设计提示要求LLMs告知数据集的周期来解释我们的发现。此外,本文还研究了输入策略,发现融入外部知识和采用自然语言释义积极影响了LLMs在时间序列预测中的预测性能。总的来说,这项研究有助于洞察LLMs在时间序列预测中的优势和局限性。 + + arXiv:2402.10835v1 Announce Type: new Abstract: Large language models (LLMs) have been applied in many fields with rapid development in recent years. As a classic machine learning task, time series forecasting has recently received a boost from LLMs. However, there is a research gap in the LLMs' preferences in this field. In this paper, by comparing LLMs with traditional models, many properties of LLMs in time series prediction are found. For example, our study shows that LLMs excel in predicting time series with clear patterns and trends but face challenges with datasets lacking periodicity. We explain our findings through designing prompts to require LLMs to tell the period of the datasets. In addition, the input strategy is investigated, and it is found that incorporating external knowledge and adopting natural language paraphrases positively affects the predictive performance of LLMs for time series. 
Overall, this study contributes to insight into the advantages and limitations of + +[^15]: 胡乱造谣:绕过ChatGPT的防护措施,大规模生成难以检测的虚假信息声明 + + Lying Blindly: Bypassing ChatGPT's Safeguards to Generate Hard-to-Detect Disinformation Claims at Scale + + [https://arxiv.org/abs/2402.08467](https://arxiv.org/abs/2402.08467) + + 本研究探索了ChatGPT在生成关于乌克兰战争的虚假信息方面的能力,发现它可以以较低成本、快速且大规模地生成逼真的定制虚假信息,而且这些虚假信息很难被人类读者和现有的自动化工具可靠地区分出来。 + + + + 随着大型语言模型(LLM)变得越来越熟练,它们在大规模病毒式虚假信息活动中的滥用成为一个越来越严重的问题。本研究探讨了ChatGPT生成关于乌克兰战争的无条件声明的能力,这是一个超出其知识界限的事件,并评估这些声明是否可以被人类读者和自动化工具与人类编写的声明区分出来。我们比较了ClaimReview中关于战争的声明,这些声明是由IFCN注册的事实核查员撰写的,以及ChatGPT生成的类似的短篇内容。我们证明,ChatGPT可以快速、廉价且规模化地生成逼真且针对特定目标的虚假信息,而且这些声明人类和现有的自动化工具无法可靠地区分出来。 + + As Large Language Models (LLMs) become more proficient, their misuse in large-scale viral disinformation campaigns is a growing concern. This study explores the capability of ChatGPT to generate unconditioned claims about the war in Ukraine, an event beyond its knowledge cutoff, and evaluates whether such claims can be differentiated by human readers and automated tools from human-written ones. We compare war-related claims from ClaimReview, authored by IFCN-registered fact-checkers, and similar short-form content generated by ChatGPT. We demonstrate that ChatGPT can produce realistic, target-specific disinformation cheaply, fast, and at scale, and that these claims cannot be reliably distinguished by humans or existing automated tools. 
+ +[^16]: EntGPT: 将生成型大型语言模型与知识库相连接 + + EntGPT: Linking Generative Large Language Models with Knowledge Bases + + [https://arxiv.org/abs/2402.06738](https://arxiv.org/abs/2402.06738) + + 本文介绍了一种名为EntGPT的模型,通过Entity Disambiguation(ED)任务,连接了生成型大型语言模型与知识库。通过提示工程和指令调整,该模型在没有有监督微调的情况下,显著提高了LLMs的性能,并在实体消歧任务上取得了可比较的性能。 + + + + 由于训练和推理过程中缺乏事实核实和知识基础,大型语言模型(LLM)生成的事实正确输出的能力相对较少被研究。在这项工作中,我们通过Entity Disambiguation(ED)任务来解决这一挑战。我们首先考虑了提示工程,并设计了一个三步硬提示方法,以在没有有监督微调(SFT)的情况下探测LLM的ED性能。总体而言,该提示方法显著提高了原始基准模型的微F_1得分,在某些情况下提高了36%甚至更高,并在10个数据集上与现有的SFT方法相比,获得了可比较的性能。我们通过使用类似的提示和响应进行指令调整(IT)进一步提高了知识基础。指令调整的模型在受监督实体消歧任务上不仅实现了更高的微F1得分性能,而且平均微F_1提高了。 + + The ability of Large Language Models (LLMs) to generate factually correct output remains relatively unexplored due to the lack of fact-checking and knowledge grounding during training and inference. In this work, we aim to address this challenge through the Entity Disambiguation (ED) task. We first consider prompt engineering, and design a three-step hard-prompting method to probe LLMs' ED performance without supervised fine-tuning (SFT). Overall, the prompting method improves the micro-F_1 score of the original vanilla models by a large margin, on some cases up to 36% and higher, and obtains comparable performance across 10 datasets when compared to existing methods with SFT. We further improve the knowledge grounding ability through instruction tuning (IT) with similar prompts and responses. 
The instruction-tuned model not only achieves higher micro-F1 score performance as compared to several baseline methods on supervised entity disambiguation tasks with an average micro-F_1 improve + +[^17]: CIC:一种面向文化感知图像字幕的框架 + + CIC: A framework for Culturally-aware Image Captioning + + [https://arxiv.org/abs/2402.05374](https://arxiv.org/abs/2402.05374) + + CIC是一种面向文化感知图像字幕的框架,通过结合视觉问答和大型语言模型,它能够生成能描述图像中文化元素的详细字幕。 + + + + 图像字幕任务使用BLIP等视觉-语言预训练模型(VLPs)从图像生成描述性句子,这类方法已经取得了很大的改进。然而,当前的方法缺乏对图像中所描绘的文化元素(例如亚洲文化群体的传统服装)生成详细描述性字幕的能力。在本文中,我们提出了一种新的框架,面向文化感知图像字幕(CIC),该框架能够从代表不同文化的图像中生成字幕并描述文化元素。受到将视觉模态和大型语言模型(LLMs)通过适当的提示进行组合的方法的启发,我们的框架(1)根据图像中的文化类别生成问题,(2)利用生成的问题从视觉问答(VQA)中提取文化视觉元素,(3)使用带有提示的LLMs生成文化感知字幕。我们对来自4个不同...的45名参与者进行了人工评估。 + + Image Captioning generates descriptive sentences from images using Vision-Language Pre-trained models (VLPs) such as BLIP, which has improved greatly. However, current methods lack the generation of detailed descriptive captions for the cultural elements depicted in the images, such as the traditional clothing worn by people from Asian cultural groups. In this paper, we propose a new framework, Culturally-aware Image Captioning (CIC), that generates captions and describes cultural elements extracted from cultural visual elements in images representing cultures. Inspired by methods combining visual modality and Large Language Models (LLMs) through appropriate prompts, our framework (1) generates questions based on cultural categories from images, (2) extracts cultural visual elements from Visual Question Answering (VQA) using generated questions, and (3) generates culturally-aware captions using LLMs with the prompts. 
Our human evaluation conducted on 45 participants from 4 dif + +[^18]: 基于个性化人类反馈的个性化语言建模 + + Personalized Language Modeling from Personalized Human Feedback + + [https://arxiv.org/abs/2402.05133](https://arxiv.org/abs/2402.05133) + + 该论文提出了一种构建个性化语言模型的方法,通过在用户反馈数据中引入个性化特征来解决强化学习框架在多样化用户偏好下存在的问题。 + + + + 基于人类反馈的强化学习(RLHF)是目前主流的框架,用于微调大型语言模型以更好地符合人类偏好。然而,当人类反馈中所编码的用户偏好多样化时,在这个框架下开发的算法的基本前提可能会出现问题。在本文中,我们旨在通过开发构建个性化语言模型的方法来解决这个问题。我们首先正式介绍了从个性化人类反馈中学习的任务,并解释了为什么在这种情况下普通的RLHF可能会存在问题。然后,我们提出了一个通用的个性化RLHF(P-RLHF)框架,需要同时学习用户模型和语言(或奖励)模型。用户模型接收用户信息并输出用户表示。其结构编码了我们对反馈数据背后用户偏好的假设。我们为个性化奖励建模和个性化直接偏好优化开发了新的学习目标。 + + Reinforcement Learning from Human Feedback (RLHF) is the current dominating framework to fine-tune large language models to better align with human preferences. However, the underlying premise of algorithms developed under this framework can be problematic when user preferences encoded in human feedback are diverse. In this work, we aim to address this problem by developing methods for building personalized language models. We first formally introduce the task of learning from personalized human feedback and explain why vanilla RLHF can be problematic in this context. We then propose a general Personalized-RLHF (P-RLHF) framework, which requires one to jointly learn a user model and a language (or reward) model. The user model takes in user information and outputs user representations. Its structure encodes our assumptions about user preferences underlying the feedback data. We develop new learning objectives for personalized reward modeling and personalized Direct Preference Optimizat + +[^19]: 众包自适应调查 + + Crowdsourced Adaptive Surveys. 
(arXiv:2401.12986v1 [cs.CL]) + + [http://arxiv.org/abs/2401.12986](http://arxiv.org/abs/2401.12986) + + 众包自适应调查方法(CSAS)结合自然语言处理和自适应算法,能够根据用户输入演变问题库,并在调查中适应新的问题,应用在拉丁裔信息环境和议题重要性领域,能够识别难以通过传统方法跟踪的主张或问题。 + + + + 公众舆论调查对于民主决策至关重要,但对于传统调查方法来说,应对快速变化的信息环境以及在小众社区中衡量观点可能具有挑战性。本文介绍了一种众包自适应调查方法(CSAS),它将自然语言处理和自适应算法的进展结合起来,生成随着用户输入不断演变的问题库。CSAS方法将参与者提供的开放式文本转换为Likert式项目,并应用多臂老虎机算法来确定调查中应优先呈现哪些用户提供的问题。该方法的自适应性允许探索新的调查问题,同时只给调查长度带来极小的成本。在拉丁裔信息环境和议题重要性领域的应用展示了CSAS识别可能难以通过标准方法跟踪的主张或问题的能力。最后,我以...作结。 + + Public opinion surveys are vital for informing democratic decision-making, but responding to rapidly changing information environments and measuring beliefs within niche communities can be challenging for traditional survey methods. This paper introduces a crowdsourced adaptive survey methodology (CSAS) that unites advances in natural language processing and adaptive algorithms to generate question banks that evolve with user input. The CSAS method converts open-ended text provided by participants into Likert-style items and applies a multi-armed bandit algorithm to determine user-provided questions that should be prioritized in the survey. The method's adaptive nature allows for the exploration of new survey questions, while imposing minimal costs in survey length. Applications in the domains of Latino information environments and issue importance showcase CSAS's ability to identify claims or issues that might otherwise be difficult to track using standard approaches. I conclude by di + +[^20]: 面向语言方言的自然语言处理:综述 + + Natural Language Processing for Dialects of a Language: A Survey. 
(arXiv:2401.05632v1 [cs.CL]) + + [http://arxiv.org/abs/2401.05632](http://arxiv.org/abs/2401.05632) + + 这项调查研究了自然语言处理中针对方言的方法和问题,强调了方言对于NLP模型性能和语言技术公平性的影响,并提供了关于方言相关任务和语言的全面综述。 + + + + 最先进的自然语言处理(NLP)模型是在大规模训练语料库上训练的,并在评估数据集上展现出卓越的性能。本调查探讨了这些数据集的一个重要属性:语言方言。考虑到针对方言数据集的NLP模型性能下降及其对语言技术公平性的影响,我们调查了有关方言NLP的过去研究,包括数据集和方法。我们从两个类别的视角描述了各种NLP任务:自然语言理解(NLU)(如方言分类、情感分析、解析和NLU基准测试)和自然语言生成(NLG)(如摘要、机器翻译和对话系统)。这项调查还广泛涵盖了英语、阿拉伯语、德语等多种语言。我们观察到,有关方言的过去NLP工作不止于方言分类,而是... + + State-of-the-art natural language processing (NLP) models are trained on massive training corpora, and report a superlative performance on evaluation datasets. This survey delves into an important attribute of these datasets: the dialect of a language. Motivated by the performance degradation of NLP models for dialectic datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets, and approaches. We describe a wide range of NLP tasks in terms of two categories: natural language understanding (NLU) (for tasks such as dialect classification, sentiment analysis, parsing, and NLU benchmarks) and natural language generation (NLG) (for summarisation, machine translation, and dialogue systems). The survey is also broad in its coverage of languages which include English, Arabic, German among others. We observe that past work in NLP concerning dialects goes deeper than mere dialect classification, and . This includes ear + +[^21]: 分割与合并:对大型语言模型的位置偏差进行校准 + + Split and Merge: Aligning Position Biases in Large Language Model based Evaluators. 
(arXiv:2310.01432v1 [cs.CL]) + + [http://arxiv.org/abs/2310.01432](http://arxiv.org/abs/2310.01432) + + PORTIA是一个旨在校准大型语言模型评估器的位置偏差的对齐系统,通过将答案分割成多个片段,并对其进行对齐,然后将其合并回一个单一的提示,以提高评估的一致性。 + + + + 大型语言模型(LLMs)已展现出作为自动化评估器的潜力,可用于评估AI系统生成的答案的质量。然而,这些基于LLM的评估器在成对比较候选答案时存在位置偏差或不一致性,无视内容而偏向于第一个或第二个答案。为了解决这个问题,我们提出了PORTIA,这是一个基于对齐的系统,旨在模拟人类的比较策略,以轻量级但有效的方式校准位置偏差。具体而言,PORTIA将答案分割成多个片段,对候选答案之间的相似内容进行对齐,并将它们合并回一个单一的提示,以供LLMs评估。我们使用六种不同的LLM进行了大量实验,评估了11,520个答案对。我们的结果表明,PORTIA显著提高了所有受测模型和比较形式的一致性率,平均相对改进率达到47.46%。值得注意的是,PORTIA使得... + + Large language models (LLMs) have shown promise as automated evaluators for assessing the quality of answers generated by AI systems. However, these LLM-based evaluators exhibit position bias, or inconsistency, when used to evaluate candidate answers in pairwise comparisons, favoring either the first or second answer regardless of content. To address this limitation, we propose PORTIA, an alignment-based system designed to mimic human comparison strategies to calibrate position bias in a lightweight yet effective manner. Specifically, PORTIA splits the answers into multiple segments, aligns similar content across candidate answers, and then merges them back into a single prompt for evaluation by LLMs. We conducted extensive experiments with six diverse LLMs to evaluate 11,520 answer pairs. Our results show that PORTIA markedly enhances the consistency rates for all the models and comparison forms tested, achieving an average relative improvement of 47.46%. Remarkably, PORTIA enables le + +[^22]: FinEval:一个用于大型语言模型的中文金融领域知识评估基准 + + FinEval: A Chinese Financial Domain Knowledge Evaluation Benchmark for Large Language Models. 
(arXiv:2308.09975v1 [cs.CL]) + + [http://arxiv.org/abs/2308.09975](http://arxiv.org/abs/2308.09975) + + 本论文提出了一个专门用于评估大型语言模型在金融领域知识上的基准FinEval。通过在FinEval上评估中英文LLMs,结果显示只有GPT-4在不同提示设置下实现了接近70%的准确率,展示了LLMs在金融领域知识中的显著增长潜力。 + + + + 大型语言模型(LLMs)在各种自然语言处理任务中展示出了出色的性能,但是它们在更具挑战性和专业领域的任务中的效果尚未得到深入研究。本文提出了FinEval,这是一个专门为LLMs中的金融领域知识设计的评估基准。FinEval是一个包含了金融、经济、会计和证书等34个学术科目的高质量多项选择题的集合,总计包含了4,661道题目。为了确保对模型性能进行全面评估,FinEval使用了多种提示类型,包括零样本和少样本提示,以及仅答案提示和思路链式提示。通过在FinEval上评估最先进的中文和英文LLMs,结果显示只有GPT-4在不同的提示设置下实现了接近70%的准确率,表明LLMs在金融领域知识中具有显著的增长潜力。我们的工作为金融领域的知识评估提供了更全面的基准。 + + Large language models (LLMs) have demonstrated exceptional performance in various natural language processing tasks, yet their efficacy in more challenging and domain-specific tasks remains largely unexplored. This paper presents FinEval, a benchmark specifically designed for the financial domain knowledge in the LLMs. FinEval is a collection of high-quality multiple-choice questions covering Finance, Economy, Accounting, and Certificate. It includes 4,661 questions spanning 34 different academic subjects. To ensure a comprehensive model performance evaluation, FinEval employs a range of prompt types, including zero-shot and few-shot prompts, as well as answer-only and chain-of-thought prompts. Evaluating state-of-the-art Chinese and English LLMs on FinEval, the results show that only GPT-4 achieved an accuracy close to 70% in different prompt settings, indicating significant growth potential for LLMs in the financial domain knowledge. Our work offers a more comprehensive financial kno + +[^23]: 对大型语言模型调查响应的质疑 + + Questioning the Survey Responses of Large Language Models. 
(arXiv:2306.07951v1 [cs.CL]) + + [http://arxiv.org/abs/2306.07951](http://arxiv.org/abs/2306.07951) + + 本文使用美国人口普查局的美国社区调查(ACS)评估了十几个不同大小的语言模型,发现小型模型具有显著的位置和标签偏差,模型规模的增加能缓慢减轻这种偏差,但模型的回答并不趋近于美国人口或任何可识别人群的统计数据。 + + + + 随着大型语言模型的能力增强,研究人员开始出于各种科学动机对这些模型进行问卷调查。本文以美国人口普查局成熟的美国社区调查(ACS)为基础,探究我们能从模型的调查响应中了解到什么。我们对十几个不同大小的模型进行了评估,这些模型的参数范围从几亿到一百亿不等,每个模型都使用ACS的问题进行了数十万次测试,系统地得出了两个主要模式。首先,小型模型存在明显的位置和标签偏差,例如偏向于选择标记为“A”的调查响应。随着模型尺寸的增加,这种A-偏差会减弱,但减弱得很缓慢。其次,即使通过随机化答案顺序来调整这种标签偏差,模型仍然不会趋向于美国人口统计数据或任何可识别人群的统计数据。相反,各种模型趋向于均匀随机化。 + + As large language models increase in capability, researchers have started to conduct surveys of all kinds on these models with varying scientific motivations. In this work, we examine what we can learn from a model's survey responses on the basis of the well-established American Community Survey (ACS) by the U.S. Census Bureau. Evaluating more than a dozen different models, varying in size from a few hundred million to ten billion parameters, hundreds of thousands of times each on questions from the ACS, we systematically establish two dominant patterns. First, smaller models have a significant position and labeling bias, for example, towards survey responses labeled with the letter "A". This A-bias diminishes, albeit slowly, as model size increases. Second, when adjusting for this labeling bias through randomized answer ordering, models still do not trend toward US population statistics or those of any cognizable population. Rather, models across the board trend toward uniformly rando + +[^24]: 使用LLM辅助注释进行语料库语言学研究:本地语法分析案例研究 + + Using LLM-assisted Annotation for Corpus Linguistics: A Case Study of Local Grammar Analysis. 
(arXiv:2305.08339v2 [cs.CL] UPDATED) + + [http://arxiv.org/abs/2305.08339](http://arxiv.org/abs/2305.08339) + + 本文研究了使用基于大语言模型的聊天机器人自动标注文本的潜力,重点考察了从本地语法角度观察道歉言语行为构成的功能元素的程度,并比较了不同模型在注释任务中的表现,结果表明Bing聊天机器人在任务中表现优于ChatGPT和人类标注员。 + + + + 基于大语言模型(LLMs)的聊天机器人在语言理解方面表现出很强的能力。本研究探索LLMs在协助基于语料库的语言学研究方面的潜力,通过将文本自动标注为特定语言信息类别。具体而言,我们研究了从本地语法的角度观察道歉言语行为构成的功能元素的程度,通过比较基于GPT-3.5的ChatGPT、基于GPT-4的Bing聊天机器人和人类编码器在注释任务中的表现。结果表明,Bing聊天机器人在任务中表现显着优于ChatGPT。与人类标注员相比,Bing聊天机器人的整体表现略低于人类标注员的表现,但已经取得了较高的F1得分:道歉标记99.95%,原因标记91.91%,道歉者标记95.35%,被道歉者标记89.74%和加强标记96.47%。这表明,在语言类别清晰且可以轻松识别的情况下,使用LLM辅助注释进行语料库语言学研究是可行的。 + + Chatbots based on Large Language Models (LLMs) have shown strong capabilities in language understanding. In this study, we explore the potential of LLMs in assisting corpus-based linguistic studies through automatic annotation of texts with specific categories of linguistic information. Specifically, we examined to what extent LLMs understand the functional elements constituting the speech act of apology from a local grammar perspective, by comparing the performance of ChatGPT (powered by GPT-3.5), the Bing chatbot (powered by GPT-4), and a human coder in the annotation task. The results demonstrate that the Bing chatbot significantly outperformed ChatGPT in the task. Compared to human annotator, the overall performance of the Bing chatbot was slightly less satisfactory. However, it already achieved high F1 scores: 99.95% for the tag of APOLOGISING, 91.91% for REASON, 95.35% for APOLOGISER, 89.74% for APOLOGISEE, and 96.47% for INTENSIFIER. This suggests that it is feasible to use LLM- + +[^25]: 语言控制扩散:通过空间、时间和任务高效扩展 + + Language Control Diffusion: Efficiently Scaling through Space, Time, and Tasks. 
(arXiv:2210.15629v2 [cs.LG] UPDATED) + + [http://arxiv.org/abs/2210.15629](http://arxiv.org/abs/2210.15629) + + 本文提出一种利用语言控制扩散模型的分层规划器,有效而高效地扩展扩散模型,解决长时间跨度自然语言指令下的控制问题,实现了较高的单任务和多任务成功率,并极大地提高计算效率。 + + + + 训练通用型智能体在各个方面都很困难,需要处理高维输入(空间)、长时间跨度(时间)和多个新任务。最近的结构方面的进展使得我们可以沿着其中一个或两个维度提高扩展性能力,但计算成本仍然很高。本文提出使用语言控制扩散模型作为一种基于自然语言条件的分层规划器(LCD)来应对这三个方面。我们有效而高效地扩展扩散模型,以应对时间、状态和任务空间维度的长时间跨度控制问题。我们在CALVIN语言机器人基准测试中将LCD与其他最先进的模型进行比较,发现LCD在多任务成功率方面优于其他最先进的方法,而单任务成功率(SR)为88.7%,远高于以前的最佳成绩82.6%,大大提高了计算效率。 + + Training generalist agents is difficult across several axes, requiring us to deal with high-dimensional inputs (space), long horizons (time), and multiple and new tasks. Recent advances with architectures have allowed for improved scaling along one or two of these dimensions, but are still prohibitive computationally. In this paper, we propose to address all three axes by leveraging Language to Control Diffusion models as a hierarchical planner conditioned on language (LCD). We effectively and efficiently scale diffusion models for planning in extended temporal, state, and task dimensions to tackle long horizon control problems conditioned on natural language instructions. We compare LCD with other state-of-the-art models on the CALVIN language robotics benchmark and find that LCD outperforms other SOTA methods in multi task success rates while dramatically improving computational efficiency with a single task success rate (SR) of 88.7% against the previous best of 82.6%. 
We show that diff --git a/cs.CL.xml b/cs.CL.xml index 304b3a9dd..21bf59fc6 100644 --- a/cs.CL.xml +++ b/cs.CL.xml @@ -1,21 +1,501 @@ -Chat Arxiv cs.CLhttps://github.com/qhduan/cn-chat-arxivThis is arxiv RSS feed for cs.CL对于评估大型语言模型中多选题回答的合理性进行了回顾,发现当前基于多选题回答的基准可能无法充分捕捉大型语言模型的真实能力。https://rss.arxiv.org/abs/2402.01349<p> -超越答案:对于评估大型语言模型中多选题回答的合理性的回顾 +Chat Arxiv cs.CLhttps://github.com/qhduan/cn-chat-arxivThis is arxiv RSS feed for cs.CL本研究重新思考了大型语言模型知识蒸馏中对Kullback-Leibler散度的应用,发现逆Kullback-Leibler和正向Kullback-Leibler散度在优化目标上相似,为此提出了一种自适应Kullback-Leiber散度方法。https://arxiv.org/abs/2404.02657<p> +在大型语言模型知识蒸馏中重新思考Kullback-Leibler散度 </p> <p> -Beyond the Answers: Reviewing the Rationality of Multiple Choice Question Answering for the Evaluation of Large Language Models +Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models </p> <p> -https://rss.arxiv.org/abs/2402.01349 +https://arxiv.org/abs/2404.02657 </p> <p> -对于评估大型语言模型中多选题回答的合理性进行了回顾,发现当前基于多选题回答的基准可能无法充分捕捉大型语言模型的真实能力。 +本研究重新思考了大型语言模型知识蒸馏中对Kullback-Leibler散度的应用,发现逆Kullback-Leibler和正向Kullback-Leibler散度在优化目标上相似,为此提出了一种自适应Kullback-Leiber散度方法。 </p> <p> </p> <p> -在自然语言处理领域,大型语言模型(LLMs)引发了一场范式转变,显著提升了自然语言生成任务的性能。尽管取得了这些进展,对LLMs的全面评估仍然是社区面临的必然挑战。最近,将多选题回答(MCQA)作为LLMs的基准已经引起了广泛关注。本研究调查了MCQA作为LLMs评估方法的合理性。如果LLMs真正理解问题的语义,它们的性能应该在从相同问题派生的各种配置上表现一致。然而,我们的实证结果表明LLMs的响应一致性存在显著差异,我们将之定义为LLMs的响应可变性综合征(REVAS),这表明目前基于MCQA的基准可能无法充分捕捉LLMs的真实能力,强调了对更合适的评估方法的需要。 +Kullback-Leibler散度在知识蒸馏中被广泛应用于压缩大型语言模型。本研究从经验和理论上证明了,在LLMs的知识蒸馏中,与之前断言的逆Kullback-Leibler(RKL)散度寻找模式并因此优于寻找平均值的正向Kullback-Leibler(FKL)散度相反,实际上在知识蒸馏中都没有体现出寻找模式或寻找平均值的特性。相反,发现RKL和FKL具有相同的优化目标,并在足够数量的时代之后都会收敛。然而,由于实际约束,LLMs很少被训练如此多的时代。同时,我们进一步发现,RKL在分布的尾部,而FKL在开始时代侧重于分布的头部。因此,我们提出了一种简单而有效的自适应Kullback-Leiber(AKL)散度方法,该方法自适应地分配权重来组合F </p> <p> -In the field of natural language processing (NLP), Large Language Models (LLMs) have precipitated a paradigm shift, markedly enhancing performance in natural language generation tasks. 
Despite these advancements, the comprehensive evaluation of LLMs remains an inevitable challenge for the community. Recently, the utilization of Multiple Choice Question Answering (MCQA) as a benchmark for LLMs has gained considerable traction. This study investigates the rationality of MCQA as an evaluation method for LLMs. If LLMs genuinely understand the semantics of questions, their performance should exhibit consistency across the varied configurations derived from the same questions. Contrary to this expectation, our empirical findings suggest a notable disparity in the consistency of LLM responses, which we define as REsponse VAriability Syndrome (REVAS) of the LLMs, indicating that current MCQA-based benchmarks may not adequately capture the true capabilities of LLMs, which underscores the need f +arXiv:2404.02657v1 Announce Type: cross Abstract: Kullback-Leibler divergence has been widely used in Knowledge Distillation (KD) to compress Large Language Models (LLMs). Contrary to prior assertions that reverse Kullback-Leibler (RKL) divergence is mode-seeking and thus preferable over the mean-seeking forward Kullback-Leibler (FKL) divergence, this study empirically and theoretically demonstrates that neither mode-seeking nor mean-seeking properties manifest in KD for LLMs. Instead, RKL and FKL are found to share the same optimization objective and both converge after a sufficient number of epochs. However, due to practical constraints, LLMs are seldom trained for such an extensive number of epochs. Meanwhile, we further find that RKL focuses on the tail part of the distributions, while FKL focuses on the head part at the beginning epochs. 
Consequently, we propose a simple yet effective Adaptive Kullback-Leibler (AKL) divergence method, which adaptively allocates weights to combine F +</p>该论文对多语言大型语言模型进行了全面分析,深入讨论了关键问题,包括多语言语料库、对齐和偏见。https://arxiv.org/abs/2404.00929<p> +多语言大型语言模型:语料库、对齐和偏见综述 +</p> +<p> +A Survey on Multilingual Large Language Models: Corpora, Alignment, and Bias +</p> +<p> +https://arxiv.org/abs/2404.00929 +</p> +<p> +该论文对多语言大型语言模型进行了全面分析,深入讨论了关键问题,包括多语言语料库、对齐和偏见。 +</p> +<p> + +</p> +<p> +在大型语言模型(LLMs)的基础上,研究者发展了多语言大型语言模型(MLLMs)来应对多语言自然语言处理任务的挑战,希望实现从高资源语言到低资源语言的知识迁移。然而,仍然存在重要限制和挑战,比如语言不平衡、多语言对齐和固有偏见。本文旨在对MLLMs进行全面分析,深入讨论围绕这些关键问题的议题。 +</p> +<p> +arXiv:2404.00929v1 Announce Type: cross Abstract: Based on the foundation of Large Language Models (LLMs), Multilingual Large Language Models (MLLMs) have been developed to address the challenges of multilingual natural language processing tasks, hoping to achieve knowledge transfer from high-resource to low-resource languages. However, significant limitations and challenges still exist, such as language imbalance, multilingual alignment, and inherent bias. In this paper, we aim to provide a comprehensive analysis of MLLMs, delving deeply into discussions surrounding these critical issues. First of all, we start by presenting an overview of MLLMs, covering their evolution, key techniques, and multilingual capacities. Secondly, we explore widely utilized multilingual corpora for MLLMs' training and multilingual datasets oriented for downstream tasks that are crucial for enhancing the cross-lingual capability of MLLMs. 
Thirdly, we survey the existing studies on multilingual representati +</p>PRISM是一种算法,可以自动识别人类可解释且易传递的提示,从而有效生成所需概念,仅使用黑盒访问T2I模型。https://arxiv.org/abs/2403.19103<p> +用于个性化文本到图像生成的自动化黑盒提示工程 +</p> +<p> +Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation +</p> +<p> +https://arxiv.org/abs/2403.19103 +</p> +<p> +PRISM是一种算法,可以自动识别人类可解释且易传递的提示,从而有效生成所需概念,仅使用黑盒访问T2I模型。 +</p> +<p> + +</p> +<p> +提示工程对于控制文本到图像(T2I)生成模型的输出是有效的,但由于需要手动制作提示而导致工作繁重。这一挑战促使了自动提示生成算法的发展。然而,这些方法通常在T2I模型之间的可传递性方面遇到困难,需要对基础模型进行白盒访问,并产生非直观的提示。在这项工作中,我们介绍了PRISM,这是一种算法,可以仅使用黑盒访问T2I模型就自动识别人类可解释且易传递的提示,从而有效生成所需概念。受大型语言模型(LLM)越狱的启发,PRISM利用LLM的上下文学习能力来迭代地改进给定参考图像的候选提示分布。我们的实验展示了PRISM在为对象、样式等生成准确提示方面的多样性和有效性。 +</p> +<p> +arXiv:2403.19103v1 Announce Type: cross Abstract: Prompt engineering is effective for controlling the output of text-to-image (T2I) generative models, but it is also laborious due to the need for manually crafted prompts. This challenge has spurred the development of algorithms for automated prompt generation. However, these methods often struggle with transferability across T2I models, require white-box access to the underlying model, and produce non-intuitive prompts. In this work, we introduce PRISM, an algorithm that automatically identifies human-interpretable and transferable prompts that can effectively generate desired concepts given only black-box access to T2I models. Inspired by large language model (LLM) jailbreaking, PRISM leverages the in-context learning ability of LLMs to iteratively refine the candidate prompts distribution for given reference images. Our experiments demonstrate the versatility and effectiveness of PRISM in generating accurate prompts for objects, sty +</p>本研究探讨了ChatGPT是否能够基于Twitter提及来预测文章的撤回,研究发现在预测未来被撤回的有问题文章方面是具有一定潜力的。https://arxiv.org/abs/2403.16851<p> +ChatGPT是否能够基于Twitter提及来预测文章的撤回? +</p> +<p> +Can ChatGPT predict article retraction based on Twitter mentions? 
+</p> +<p> +https://arxiv.org/abs/2403.16851 +</p> +<p> +本研究探讨了ChatGPT是否能够基于Twitter提及来预测文章的撤回,研究发现在预测未来被撤回的有问题文章方面是具有一定潜力的。 +</p> +<p> + +</p> +<p> +检测有问题的研究文章具有重要意义,本研究探讨了根据被撤回文章在Twitter上的提及是否能够在文章被撤回前发出信号,从而在预测未来被撤回的有问题文章方面发挥作用。分析了包括3,505篇已撤回文章及其相关Twitter提及在内的数据集,以及使用粗糙精确匹配方法获取的具有类似特征的3,505篇未撤回文章。通过四种预测方法评估了Twitter提及在预测文章撤回方面的有效性,包括手动标注、关键词识别、机器学习模型和ChatGPT。手动标注的结果表明,的确有被撤回的文章,其Twitter提及包含在撤回前发出信号的可识别证据,尽管它们只占所有被撤回文章的一小部分。 +</p> +<p> +arXiv:2403.16851v1 Announce Type: cross Abstract: Detecting problematic research articles timely is a vital task. This study explores whether Twitter mentions of retracted articles can signal potential problems with the articles prior to retraction, thereby playing a role in predicting future retraction of problematic articles. A dataset comprising 3,505 retracted articles and their associated Twitter mentions is analyzed, alongside 3,505 non-retracted articles with similar characteristics obtained using the Coarsened Exact Matching method. The effectiveness of Twitter mentions in predicting article retraction is evaluated by four prediction methods, including manual labelling, keyword identification, machine learning models, and ChatGPT. 
Manual labelling results indicate that there are indeed retracted articles with their Twitter mentions containing recognizable evidence signaling problems before retraction, although they represent only a limited share of all retracted articles with +</p>提出了一种基于CLIP的自监督方法QualiCLIP,通过质量感知的图像-文本对齐策略,实现了图像质量评估不需要标记MOS的问题https://arxiv.org/abs/2403.11176<p> +面向现实世界图像质量评估的质量感知图像-文本对齐 +</p> +<p> +Quality-Aware Image-Text Alignment for Real-World Image Quality Assessment +</p> +<p> +https://arxiv.org/abs/2403.11176 +</p> +<p> +提出了一种基于CLIP的自监督方法QualiCLIP,通过质量感知的图像-文本对齐策略,实现了图像质量评估不需要标记MOS的问题 +</p> +<p> + +</p> +<p> +无参考图像质量评估(NR-IQA)致力于设计一种在没有高质量参考图像的情况下测量图像质量的方法,以符合人类感知,大部分最先进的NR-IQA方法中依赖标注的主观评分(MOS)限制了它们在真实场景中的可扩展性和广泛适用性。为了克服这一限制,我们提出了QualiCLIP(Quality-aware CLIP),这是一种基于CLIP的自监督不需要标记MOS的方法。具体来说,我们引入了一种质量感知的图像-文本对齐策略,使得CLIP生成的表示与图像固有质量相关。从原始图像开始,我们使用不断增加的强度合成地劣化它们。然后,我们训练CLIP根据其与质量相关的反义文本提示的相似性对这些降解图像进行排名,同时保证一致的表达 +</p> +<p> +arXiv:2403.11176v1 Announce Type: cross Abstract: No-Reference Image Quality Assessment (NR-IQA) focuses on designing methods to measure image quality in alignment with human perception when a high-quality reference image is unavailable. The reliance on annotated Mean Opinion Scores (MOS) in the majority of state-of-the-art NR-IQA approaches limits their scalability and broader applicability to real-world scenarios. To overcome this limitation, we propose QualiCLIP (Quality-aware CLIP), a CLIP-based self-supervised opinion-unaware method that does not require labeled MOS. In particular, we introduce a quality-aware image-text alignment strategy to make CLIP generate representations that correlate with the inherent quality of the images. Starting from pristine images, we synthetically degrade them with increasing levels of intensity. 
Then, we train CLIP to rank these degraded images based on their similarity to quality-related antonym text prompts, while guaranteeing consistent represe +</p>提出了一个全面基准(SRB),用于评估自动语音识别(ASR)模型对各种破坏的鲁棒性,发现模型大小和某些建模选择有助于提高鲁棒性,并观察到在不同人口亚组上模型的鲁棒性存在明显差异。https://arxiv.org/abs/2403.07937<p> +语音鲁棒基准:用于语音识别的鲁棒性基准 +</p> +<p> +Speech Robust Bench: A Robustness Benchmark For Speech Recognition +</p> +<p> +https://arxiv.org/abs/2403.07937 +</p> +<p> +提出了一个全面基准(SRB),用于评估自动语音识别(ASR)模型对各种破坏的鲁棒性,发现模型大小和某些建模选择有助于提高鲁棒性,并观察到在不同人口亚组上模型的鲁棒性存在明显差异。 +</p> +<p> + +</p> +<p> +随着自动语音识别(ASR)模型变得越来越普遍,确保它们在物理世界和数字世界中的各种破坏下进行可靠预测变得愈发重要。我们提出了语音鲁棒基准(SRB),这是一个用于评估ASR模型对各种破坏的鲁棒性的全面基准。SRB由69个输入扰动组成,旨在模拟ASR模型可能在物理世界和数字世界中遇到的各种破坏。我们使用SRB来评估几种最先进的ASR模型的鲁棒性,并观察到模型大小和某些建模选择(如离散表示和自我训练)似乎有助于提高鲁棒性。我们将此分析扩展到衡量ASR模型在来自各种人口亚组的数据上的鲁棒性,即英语和西班牙语使用者以及男性和女性,并观察到模型的鲁棒性在不同亚组之间存在明显差异。 +</p> +<p> +arXiv:2403.07937v1 Announce Type: cross Abstract: As Automatic Speech Recognition (ASR) models become ever more pervasive, it is important to ensure that they make reliable predictions under corruptions present in the physical and digital world. We propose Speech Robust Bench (SRB), a comprehensive benchmark for evaluating the robustness of ASR models to diverse corruptions. SRB is composed of 69 input perturbations which are intended to simulate various corruptions that ASR models may encounter in the physical and digital world. We use SRB to evaluate the robustness of several state-of-the-art ASR models and observe that model size and certain modeling choices such as discrete representations, and self-training appear to be conducive to robustness. 
We extend this analysis to measure the robustness of ASR models on data from various demographic subgroups, namely English and Spanish speakers, and males and females, and observed noticeable disparities in the model's robustness across su +</p>介绍了一个新的基准测试,评估了用于生成简要住院病程摘要的大语言模型在健康保健领域中的性能并提出相应的自适应策略https://arxiv.org/abs/2403.05720<p> +用于生成简要住院病程摘要的领域自适应大语言模型的基准测试 +</p> +<p> +A Benchmark of Domain-Adapted Large Language Models for Generating Brief Hospital Course Summaries +</p> +<p> +https://arxiv.org/abs/2403.05720 +</p> +<p> +介绍了一个新的基准测试,评估了用于生成简要住院病程摘要的大语言模型在健康保健领域中的性能并提出相应的自适应策略 +</p> +<p> + +</p> +<p> +简要住院病程(BHC)摘要是通过总结临床记录而生成的常见临床文件。虽然大型语言模型(LLMs)在自动化实际任务方面展现出显著能力,但它们在医疗应用(如BHC合成)中的能力尚未得到展示。为了使LLMs能够适应BHC合成,我们引入了一个新颖的基准测试,其中包含从MIMIC-IV记录中提取的经过预处理的数据集,封装了临床记录和简要住院病程(BHC)对。我们评估了两个通用LLMs和三个医疗领域适应的LLMs的性能,以改进从临床记录生成BHC。我们使用临床记录作为输入来生成BHC,采用基于提示的(使用上下文学习)和基于微调的自适应策略来应用于三个开源LLMs(Clinical-T5-Large,Llama2-13B,FLAN-UL2)和两个专有LLMs(GPT-3.5,GPT-4)。我们定量评估了性能。 +</p> +<p> +arXiv:2403.05720v1 Announce Type: cross Abstract: Brief hospital course (BHC) summaries are common clinical documents generated by summarizing clinical notes. While large language models (LLMs) depict remarkable capabilities in automating real-world tasks, their capabilities for healthcare applications such as BHC synthesis have not been shown. To enable the adaptation of LLMs for BHC synthesis, we introduce a novel benchmark consisting of a pre-processed dataset extracted from MIMIC-IV notes, encapsulating clinical note, and brief hospital course (BHC) pairs. We assess the performance of two general-purpose LLMs and three healthcare-adapted LLMs to improve BHC synthesis from clinical notes. Using clinical notes as input for generating BHCs, we apply prompting-based (using in-context learning) and fine-tuning-based adaptation strategies to three open-source LLMs (Clinical-T5-Large, Llama2-13B, FLAN-UL2) and two proprietary LLMs (GPT-3.5, GPT-4). 
We quantitatively evaluate the performa +</p>KorMedMCQA是首个从韩国医疗专业执业考试中衍生的多项选择题问答基准,提供了多种大型语言模型的基线实验结果,并在HuggingFace上公开了数据,为韩国医疗环境中的进一步研究和发展提供了可能性。https://arxiv.org/abs/2403.01469<p> +KorMedMCQA: 韩国医疗专业执业考试的多项选择题问答基准 +</p> +<p> +KorMedMCQA: Multi-Choice Question Answering Benchmark for Korean Healthcare Professional Licensing Examinations +</p> +<p> +https://arxiv.org/abs/2403.01469 +</p> +<p> +KorMedMCQA是首个从韩国医疗专业执业考试中衍生的多项选择题问答基准,提供了多种大型语言模型的基线实验结果,并在HuggingFace上公开了数据,为韩国医疗环境中的进一步研究和发展提供了可能性。 +</p> +<p> + +</p> +<p> +我们介绍了KorMedMCQA,这是首个源自韩国医疗专业执业考试的韩语多项选择题问答(MCQA)基准,涵盖了从2012年到2023年的考试内容。该数据集包括医生、护士和药剂师执照考试中的一部分问题,涵盖多种学科。我们对各种大型语言模型进行了基线实验,包括专有/开源、多语言/韩语附加预训练和临床背景预训练模型,突显了进一步增强潜力。我们在HuggingFace上公开了我们的数据,并通过LM-Harness提供了一个评估脚本,邀请在韩国医疗环境中进行进一步探索和发展。 +</p> +<p> +arXiv:2403.01469v1 Announce Type: new Abstract: We introduce KorMedMCQA, the first Korean multiple-choice question answering (MCQA) benchmark derived from Korean healthcare professional licensing examinations, covering from the year 2012 to year 2023. This dataset consists of a selection of questions from the license examinations for doctors, nurses, and pharmacists, featuring a diverse array of subjects. We conduct baseline experiments on various large language models, including proprietary/open-source, multilingual/Korean-additional pretrained, and clinical context pretrained models, highlighting the potential for further enhancements. We make our data publicly available on HuggingFace and provide a evaluation script via LM-Harness, inviting further exploration and advancement in Korean healthcare environments. 
+</p>这项研究调查了大型语言模型在游戏领域中的多种应用及其角色,指出了未开发领域和未来发展方向,同时探讨了在游戏领域中大型语言模型的潜力和限制。https://arxiv.org/abs/2402.18659<p> +大型语言模型与游戏:调研与路线图 +</p> +<p> +Large Language Models and Games: A Survey and Roadmap +</p> +<p> +https://arxiv.org/abs/2402.18659 +</p> +<p> +这项研究调查了大型语言模型在游戏领域中的多种应用及其角色,指出了未开发领域和未来发展方向,同时探讨了在游戏领域中大型语言模型的潜力和限制。 +</p> +<p> + +</p> +<p> +近年来,大型语言模型(LLMs)的研究急剧增加,并伴随着公众对该主题的参与。尽管起初是自然语言处理中的一小部分,LLMs在广泛的应用和领域中展现出显著潜力,包括游戏。本文调查了LLMs在游戏中及为游戏提供支持的各种应用的最新技术水平,并明确了LLMs在游戏中可以扮演的不同角色。重要的是,我们讨论了尚未开发的领域和LLMs在游戏中未来应用的有前途的方向,以及在游戏领域中LLMs的潜力和限制。作为LLMs和游戏交叉领域的第一份综合调查和路线图,我们希望本文能够成为这一激动人心的新领域的开创性研究和创新的基础。 +</p> +<p> +arXiv:2402.18659v1 Announce Type: cross Abstract: Recent years have seen an explosive increase in research on large language models (LLMs), and accompanying public engagement on the topic. While starting as a niche area within natural language processing, LLMs have shown remarkable potential across a broad range of applications and domains, including games. This paper surveys the current state of the art across the various applications of LLMs in and for games, and identifies the different roles LLMs can take within a game. Importantly, we discuss underexplored areas and promising directions for future uses of LLMs in games and we reconcile the potential and limitations of LLMs within the games domain. As the first comprehensive survey and roadmap at the intersection of LLMs and games, we are hopeful that this paper will serve as the basis for groundbreaking research and innovation in this exciting new field. 
+</p>本文研究了RNNs和Transformer在处理算法问题时的表现能力差距,发现RNNs存在关键瓶颈,即无法完美地从上下文中检索信息,导致无法像Transformer那样轻松解决需要这种能力的任务。https://arxiv.org/abs/2402.18510<p> +RNNs还不是Transformer:在上下文检索中的关键瓶颈 +</p> +<p> +RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval +</p> +<p> +https://arxiv.org/abs/2402.18510 +</p> +<p> +本文研究了RNNs和Transformer在处理算法问题时的表现能力差距,发现RNNs存在关键瓶颈,即无法完美地从上下文中检索信息,导致无法像Transformer那样轻松解决需要这种能力的任务。 +</p> +<p> + +</p> +<p> +本文探讨循环神经网络(RNNs)和Transformer在解决算法问题时的表示能力差距。我们重点关注RNNs是否能在处理长序列时,通过Chain-of-Thought (CoT)提示,与Transformer的性能相匹配。我们的理论分析显示CoT可以改进RNNs,但无法弥补与Transformer之间的差距。关键瓶颈在于RNNs无法完全从上下文中检索信息,即使经过CoT的增强:对于几个明确或隐式需要这种能力的任务,如联想召回和确定图是否为树,我们证明RNNs表达能力不足以解决这些任务,而Transformer可以轻松解决。相反,我们证明采用增强RNNs上下文检索能力的技术,包括 +</p> +<p> +arXiv:2402.18510v1 Announce Type: cross Abstract: This paper investigates the gap in representation powers of Recurrent Neural Networks (RNNs) and Transformers in the context of solving algorithmic problems. We focus on understanding whether RNNs, known for their memory efficiency in handling long sequences, can match the performance of Transformers, particularly when enhanced with Chain-of-Thought (CoT) prompting. Our theoretical analysis reveals that CoT improves RNNs but is insufficient to close the gap with Transformers. A key bottleneck lies in the inability of RNNs to perfectly retrieve information from the context, even with CoT: for several tasks that explicitly or implicitly require this capability, such as associative recall and determining if a graph is a tree, we prove that RNNs are not expressive enough to solve the tasks while Transformers can solve them with ease. 
Conversely, we prove that adopting techniques to enhance the in-context retrieval capability of RNNs, inclu +</p>提出了ToMBench框架,在大型语言模型中进行心灵理论性能评估,发现最先进的模型仍然落后于人类表现超过10%。https://arxiv.org/abs/2402.15052<p> +在大型语言模型中基准测试心灵理论 +</p> +<p> +ToMBench: Benchmarking Theory of Mind in Large Language Models +</p> +<p> +https://arxiv.org/abs/2402.15052 +</p> +<p> +提出了ToMBench框架,在大型语言模型中进行心灵理论性能评估,发现最先进的模型仍然落后于人类表现超过10%。 +</p> +<p> + +</p> +<p> +心灵理论(ToM)是指感知和归因自己以及他人的心理状态的认知能力。最近的研究引发了关于大型语言模型(LLMs)是否表现出一种形式的心灵理论的争论。然而,现有的心灵理论评估受到诸如受限范围、主观判断和意外污染等挑战的制约,导致评估不足。为了填补这一空白,我们引入了ToMBench,具有三个关键特征:系统评估框架涵盖社会认知中的8项任务和31项能力,多项选择题格式以支持自动化和无偏见的评估,以及基于双语清单的从头构建,严格避免数据泄漏。基于ToMBench,我们进行了大量实验,评估了10个流行LLMs在任务和能力方面的心灵理论表现。我们发现,即使像GPT-4这样的最先进的LLMs也比人类表现落后超过10个百分点。 +</p> +<p> +arXiv:2402.15052v1 Announce Type: cross Abstract: Theory of Mind (ToM) is the cognitive capability to perceive and ascribe mental states to oneself and others. Recent research has sparked a debate over whether large language models (LLMs) exhibit a form of ToM. However, existing ToM evaluations are hindered by challenges such as constrained scope, subjective judgment, and unintended contamination, yielding inadequate assessments. To address this gap, we introduce ToMBench with three key characteristics: a systematic evaluation framework encompassing 8 tasks and 31 abilities in social cognition, a multiple-choice question format to support automated and unbiased evaluation, and a build-from-scratch bilingual inventory to strictly avoid data leakage. Based on ToMBench, we conduct extensive experiments to evaluate the ToM performance of 10 popular LLMs across tasks and abilities. 
We find that even the most advanced LLMs like GPT-4 lag behind human performance by over 10% points, indicati +</p>该研究提出了一种基于查询的对抗性攻击方法,通过利用远程语言模型的 API 访问构造对抗性示例,使模型以更高概率发出有害字符串,而非仅仅基于模型之间的转移性攻击。https://arxiv.org/abs/2402.12329<p> +基于查询的对抗性提示生成 +</p> +<p> +Query-Based Adversarial Prompt Generation +</p> +<p> +https://arxiv.org/abs/2402.12329 +</p> +<p> +该研究提出了一种基于查询的对抗性攻击方法,通过利用远程语言模型的 API 访问构造对抗性示例,使模型以更高概率发出有害字符串,而非仅仅基于模型之间的转移性攻击。 +</p> +<p> + +</p> +<p> +最近的研究表明,可以构造对抗性示例,导致一个对其进行了调整的语言模型产生有害字符串或执行有害行为。现有的攻击要么在白盒设置中(完全访问模型权重),要么通过可转移性:一种现象,即在一个模型上精心设计的对抗性示例通常在其他模型上仍然有效。我们通过基于查询的攻击改进以前的工作,利用 API 访问远程语言模型来构造对抗性示例,使模型以(明显)更高的概率发出有害字符串,而不能仅仅使用转移攻击。我们在 GPT-3.5 和 OpenAI 的安全分类器上验证了我们的攻击;我们能够让 GPT-3.5 发出有害字符串,而目前的转移攻击失败了,并且我们几乎以 100% 的概率规避了安全分类器。 +</p> +<p> +arXiv:2402.12329v1 Announce Type: cross Abstract: Recent work has shown it is possible to construct adversarial examples that cause an aligned language model to emit harmful strings or perform harmful behavior. Existing attacks work either in the white-box setting (with full access to the model weights), or through transferability: the phenomenon that adversarial examples crafted on one model often remain effective on other models. We improve on prior work with a query-based attack that leverages API access to a remote language model to construct adversarial examples that cause the model to emit harmful strings with (much) higher probability than with transfer-only attacks. We validate our attack on GPT-3.5 and OpenAI's safety classifier; we can cause GPT-3.5 to emit harmful strings that current transfer attacks fail at, and we can evade the safety classifier with nearly 100% probability. 
+</p>FIPO提出了基于自由形式指导的提示优化方法,结合偏好数据集和模块化微调模式,重新构思了优化过程并实现了灵活的任务提示生成。https://arxiv.org/abs/2402.11811<p> +FIPO:基于自由形式指导的提示优化与偏好数据集和模块化微调模式 +</p> +<p> +FIPO: Free-form Instruction-oriented Prompt Optimization with Preference Dataset and Modular Fine-tuning Schema +</p> +<p> +https://arxiv.org/abs/2402.11811 +</p> +<p> +FIPO提出了基于自由形式指导的提示优化方法,结合偏好数据集和模块化微调模式,重新构思了优化过程并实现了灵活的任务提示生成。 +</p> +<p> + +</p> +<p> +在促进大语言模型在最终用户-机器人交互中的深度智能方面,提示创作的艺术被视为普通用户的一项关键但复杂的任务。与之前基于模型而不考虑指导的自动提示优化方法形成对比,这些方法为预定义目标模型产生了光滑的结果,但在使用开箱即用模型时容易快速退化,我们提出了基于自由形式指导的提示优化(FIPO)。这种方法得到我们的大规模提示偏好数据集的支持,并采用模块化微调模式。FIPO模式重新构思了优化过程,将其分解为可管理的模块,以动态调整内容的元提示为锚点。这允许灵活整合原始任务指导、可选指导响应和可选真实值,以生成经过精心优化的任务提示。 +</p> +<p> +arXiv:2402.11811v1 Announce Type: new Abstract: In the quest to facilitate the deep intelligence of Large Language Models (LLMs) accessible in final-end user-bot interactions, the art of prompt crafting emerges as a critical yet complex task for the average user. Contrast to previous model-oriented yet instruction-agnostic Automatic Prompt Optimization methodologies, yielding polished results for predefined target models while suffering rapid degradation with out-of-box models, we present Free-form Instruction-oriented Prompt Optimization (FIPO). This approach is supported by our large-scale prompt preference dataset and employs a modular fine-tuning schema. The FIPO schema reimagines the optimization process into manageable modules, anchored by a meta prompt that dynamically adapts content. This allows for the flexible integration of the raw task instruction, the optional instruction response, and the optional ground truth to produce finely optimized task prompts. 
The FIPO preference +</p>本研究通过比较LLMs与传统模型,发现了LLMs在时间序列预测中的优势和局限性,指出LLMs在预测具有明显模式和趋势的时间序列方面表现出色,但在缺乏周期性的数据集方面面临挑战,同时指出融入外部知识和采用自然语言释义有助于提升LLMs在时间序列预测中的性能。https://arxiv.org/abs/2402.10835<p> +LLMs下的时间序列预测:理解和增强模型能力 +</p> +<p> +Time Series Forecasting with LLMs: Understanding and Enhancing Model Capabilities +</p> +<p> +https://arxiv.org/abs/2402.10835 +</p> +<p> +本研究通过比较LLMs与传统模型,发现了LLMs在时间序列预测中的优势和局限性,指出LLMs在预测具有明显模式和趋势的时间序列方面表现出色,但在缺乏周期性的数据集方面面临挑战,同时指出融入外部知识和采用自然语言释义有助于提升LLMs在时间序列预测中的性能。 +</p> +<p> + +</p> +<p> +大语言模型(LLMs)近年来在许多领域得到迅速发展。作为一种经典的机器学习任务,时间序列预测最近从LLMs中获得了推动。然而,在这一领域,LLMs的偏好存在研究空白。通过将LLMs与传统模型进行比较,发现了LLMs在时间序列预测中的许多特性。例如,我们的研究表明,LLMs在预测具有明显模式和趋势的时间序列方面表现出色,但在缺乏周期性的数据集方面面临挑战。我们通过设计提示要求LLMs告知数据集的周期来解释我们的发现。此外,本文还研究了输入策略,发现融入外部知识和采用自然语言释义积极影响了LLMs在时间序列预测中的预测性能。总的来说,这项研究有助于洞察LLMs在时间序列预测中的优势和局限性。 +</p> +<p> +arXiv:2402.10835v1 Announce Type: new Abstract: Large language models (LLMs) have been applied in many fields with rapid development in recent years. As a classic machine learning task, time series forecasting has recently received a boost from LLMs. However, there is a research gap in the LLMs' preferences in this field. In this paper, by comparing LLMs with traditional models, many properties of LLMs in time series prediction are found. For example, our study shows that LLMs excel in predicting time series with clear patterns and trends but face challenges with datasets lacking periodicity. We explain our findings through designing prompts to require LLMs to tell the period of the datasets. In addition, the input strategy is investigated, and it is found that incorporating external knowledge and adopting natural language paraphrases positively affects the predictive performance of LLMs for time series. 
Overall, this study contributes to insight into the advantages and limitations of +</p>本研究探索了ChatGPT在生成关于乌克兰战争的虚假信息方面的能力,发现它可以以较低成本、快速且大规模地生成逼真的定制虚假信息,而且这些虚假信息很难被人类读者和现有的自动化工具可靠地区分出来。https://arxiv.org/abs/2402.08467<p> +胡乱造谣:绕过ChatGPT的防护措施,大规模生成难以检测的虚假信息声明 +</p> +<p> +Lying Blindly: Bypassing ChatGPT's Safeguards to Generate Hard-to-Detect Disinformation Claims at Scale +</p> +<p> +https://arxiv.org/abs/2402.08467 +</p> +<p> +本研究探索了ChatGPT在生成关于乌克兰战争的虚假信息方面的能力,发现它可以以较低成本、快速且大规模地生成逼真的定制虚假信息,而且这些虚假信息很难被人类读者和现有的自动化工具可靠地区分出来。 +</p> +<p> + +</p> +<p> +随着大型语言模型(LLM)变得越来越熟练,它们在大规模病毒式虚假信息活动中的滥用成为一个越来越严重的问题。本研究探讨了ChatGPT生成关于乌克兰战争的无条件声明的能力,这是一个超出其知识界限的事件,并评估这些声明是否可以被人类读者和自动化工具与人类编写的声明区分出来。我们比较了ClaimReview中关于战争的声明,这些声明是由IFCN注册的事实核查员撰写的,以及ChatGPT生成的类似的短篇内容。我们证明,ChatGPT可以快速、廉价且规模化地生成逼真且针对特定目标的虚假信息,而且这些声明人类和现有的自动化工具无法可靠地区分出来。 +</p> +<p> +As Large Language Models (LLMs) become more proficient, their misuse in large-scale viral disinformation campaigns is a growing concern. This study explores the capability of ChatGPT to generate unconditioned claims about the war in Ukraine, an event beyond its knowledge cutoff, and evaluates whether such claims can be differentiated by human readers and automated tools from human-written ones. We compare war-related claims from ClaimReview, authored by IFCN-registered fact-checkers, and similar short-form content generated by ChatGPT. We demonstrate that ChatGPT can produce realistic, target-specific disinformation cheaply, fast, and at scale, and that these claims cannot be reliably distinguished by humans or existing automated tools. 
+</p>本文介绍了一种名为EntGPT的模型,通过Entity Disambiguation(ED)任务,连接了生成型大型语言模型与知识库。通过提示工程和指令调整,该模型在没有有监督微调的情况下,显著提高了LLMs的性能,并在实体消歧任务上取得了可比较的性能。https://arxiv.org/abs/2402.06738<p> +EntGPT: 将生成型大型语言模型与知识库相连接 +</p> +<p> +EntGPT: Linking Generative Large Language Models with Knowledge Bases +</p> +<p> +https://arxiv.org/abs/2402.06738 +</p> +<p> +本文介绍了一种名为EntGPT的模型,通过Entity Disambiguation(ED)任务,连接了生成型大型语言模型与知识库。通过提示工程和指令调整,该模型在没有有监督微调的情况下,显著提高了LLMs的性能,并在实体消歧任务上取得了可比较的性能。 +</p> +<p> + +</p> +<p> +由于训练和推理过程中缺乏事实核实和知识基础,大型语言模型(LLM)生成的事实正确输出的能力相对较少被研究。在这项工作中,我们通过Entity Disambiguation(ED)任务来解决这一挑战。我们首先考虑了提示工程,并设计了一个三步硬提示方法,以在没有有监督微调(SFT)的情况下探测LLM的ED性能。总体而言,该提示方法显著提高了原始基准模型的微F_1得分,在某些情况下提高了36%甚至更高,并在10个数据集上与现有的SFT方法相比,获得了可比较的性能。我们通过使用类似的提示和响应进行指令调整(IT)进一步提高了知识基础。指令调整的模型在受监督实体消歧任务上不仅实现了更高的微F1得分性能,而且平均微F_1提高了。 +</p> +<p> +The ability of Large Language Models (LLMs) to generate factually correct output remains relatively unexplored due to the lack of fact-checking and knowledge grounding during training and inference. In this work, we aim to address this challenge through the Entity Disambiguation (ED) task. We first consider prompt engineering, and design a three-step hard-prompting method to probe LLMs' ED performance without supervised fine-tuning (SFT). Overall, the prompting method improves the micro-F_1 score of the original vanilla models by a large margin, on some cases up to 36% and higher, and obtains comparable performance across 10 datasets when compared to existing methods with SFT. We further improve the knowledge grounding ability through instruction tuning (IT) with similar prompts and responses. 
The instruction-tuned model not only achieves higher micro-F1 score performance as compared to several baseline methods on supervised entity disambiguation tasks with an average micro-F_1 improve +</p>CIC是一种面向文化感知图像字幕的框架,通过结合视觉问答和大型语言模型,它能够生成能描述图像中文化元素的详细字幕。https://arxiv.org/abs/2402.05374<p> +CIC:一种面向文化感知图像字幕的框架 +</p> +<p> +CIC: A framework for Culturally-aware Image Captioning +</p> +<p> +https://arxiv.org/abs/2402.05374 +</p> +<p> +CIC是一种面向文化感知图像字幕的框架,通过结合视觉问答和大型语言模型,它能够生成能描述图像中文化元素的详细字幕。 +</p> +<p> + +</p> +<p> +图像字幕通过使用视觉-语言预训练模型(VLPs)如BLIP从图像生成描述性句子,这种方法已经取得了很大的改进。然而,当前的方法缺乏对图像中所描绘的文化元素(例如亚洲文化群体的传统服装)生成详细描述性字幕的能力。在本文中,我们提出了一种新的框架,\textbf{面向文化感知图像字幕(CIC)},该框架能够从代表不同文化的图像中生成字幕并描述文化元素。受到将视觉模态和大型语言模型(LLMs)通过适当的提示进行组合的方法的启发,我们的框架(1)根据图像中的文化类别生成问题,(2)利用生成的问题从视觉问答(VQA)中提取文化视觉元素,(3)使用带有提示的LLMs生成文化感知字幕。我们在4个不同大学的45名参与者上进行了人工评估。 +</p> +<p> +Image Captioning generates descriptive sentences from images using Vision-Language Pre-trained models (VLPs) such as BLIP, which has improved greatly. However, current methods lack the generation of detailed descriptive captions for the cultural elements depicted in the images, such as the traditional clothing worn by people from Asian cultural groups. In this paper, we propose a new framework, \textbf{Culturally-aware Image Captioning (CIC)}, that generates captions and describes cultural elements extracted from cultural visual elements in images representing cultures. Inspired by methods combining visual modality and Large Language Models (LLMs) through appropriate prompts, our framework (1) generates questions based on cultural categories from images, (2) extracts cultural visual elements from Visual Question Answering (VQA) using generated questions, and (3) generates culturally-aware captions using LLMs with the prompts. 
Our human evaluation conducted on 45 participants from 4 dif +</p>该论文提出了构建个性化语言模型的方法,通过联合学习用户模型和语言(或奖励)模型,解决RLHF框架在用户偏好多样化时存在的问题。https://arxiv.org/abs/2402.05133<p> +基于个性化人类反馈的个性化语言建模 +</p> +<p> +Personalized Language Modeling from Personalized Human Feedback +</p> +<p> +https://arxiv.org/abs/2402.05133 +</p> +<p> +该论文提出了构建个性化语言模型的方法,通过联合学习用户模型和语言(或奖励)模型,解决RLHF框架在用户偏好多样化时存在的问题。 +</p> +<p> + +</p> +<p> +基于人类反馈的强化学习(RLHF)是目前主流的框架,用于微调大型语言模型以更好地符合人类偏好。然而,当人类反馈中所编码的用户偏好多样化时,在这个框架下开发的算法的基本前提可能会出现问题。在本文中,我们旨在通过开发构建个性化语言模型的方法来解决这个问题。我们首先正式介绍了从个性化人类反馈中学习的任务,并解释了为什么在这种情况下普通的RLHF可能会存在问题。然后,我们提出了一个通用的个性化RLHF(P-RLHF)框架,需要联合学习用户模型和语言(或奖励)模型。用户模型接收用户信息并输出用户表示。其结构编码了我们对反馈数据背后用户偏好的假设。我们为个性化奖励建模和个性化直接偏好优化开发了新的学习目标。 +</p> +<p> +Reinforcement Learning from Human Feedback (RLHF) is the current dominating framework to fine-tune large language models to better align with human preferences. However, the underlying premise of algorithms developed under this framework can be problematic when user preferences encoded in human feedback are diverse. In this work, we aim to address this problem by developing methods for building personalized language models. We first formally introduce the task of learning from personalized human feedback and explain why vanilla RLHF can be problematic in this context. We then propose a general Personalized-RLHF (P-RLHF) framework, which requires one to jointly learn a user model and a language (or reward) model. The user model takes in user information and outputs user representations. Its structure encodes our assumptions about user preferences underlying the feedback data. We develop new learning objectives for personalized reward modeling and personalized Direct Preference Optimizat +</p>众包自适应调查方法(CSAS)结合自然语言处理和自适应算法,能够根据用户输入演变问题库,并在调查中适应新的问题,应用在拉丁裔信息环境和议题重要性领域,能够识别难以通过传统方法跟踪的主张或问题。http://arxiv.org/abs/2401.12986<p> +众包自适应调查 +</p> +<p> +Crowdsourced Adaptive Surveys. 
(arXiv:2401.12986v1 [cs.CL]) +</p> +<p> +http://arxiv.org/abs/2401.12986 +</p> +<p> +众包自适应调查方法(CSAS)结合自然语言处理和自适应算法,能够根据用户输入演变问题库,并在调查中适应新的问题,应用在拉丁裔信息环境和议题重要性领域,能够识别难以通过传统方法跟踪的主张或问题。 +</p> +<p> + +</p> +<p> +公众舆论调查对于民主决策至关重要,但对于传统调查方法来说,应对快速变化的信息环境以及衡量小众社区中的观点可能具有挑战性。本文介绍了一种众包自适应调查方法(CSAS),它将自然语言处理和自适应算法的进展结合起来,生成随着用户输入不断演变的问题库。CSAS方法将参与者提供的开放式文本转换为Likert式项目,并应用多臂老虎机算法来确定应在调查中优先呈现哪些由用户提供的问题。该方法的自适应性允许探索新的调查问题,同时只给调查长度带来极小的成本。在拉丁裔信息环境和议题重要性领域的应用展示了CSAS识别可能难以通过标准方法跟踪的主张或问题的能力。最后,我给出了结论。 +</p> +<p> +Public opinion surveys are vital for informing democratic decision-making, but responding to rapidly changing information environments and measuring beliefs within niche communities can be challenging for traditional survey methods. This paper introduces a crowdsourced adaptive survey methodology (CSAS) that unites advances in natural language processing and adaptive algorithms to generate question banks that evolve with user input. The CSAS method converts open-ended text provided by participants into Likert-style items and applies a multi-armed bandit algorithm to determine user-provided questions that should be prioritized in the survey. The method's adaptive nature allows for the exploration of new survey questions, while imposing minimal costs in survey length. Applications in the domains of Latino information environments and issue importance showcase CSAS's ability to identify claims or issues that might otherwise be difficult to track using standard approaches. I conclude by di +</p>这项调查研究了自然语言处理中针对方言的方法和问题,强调了方言对于NLP模型性能和语言技术公平性的影响,并提供了关于方言相关任务和语言的全面综述。http://arxiv.org/abs/2401.05632<p> +面向语言方言的自然语言处理:综述 +</p> +<p> +Natural Language Processing for Dialects of a Language: A Survey. 
(arXiv:2401.05632v1 [cs.CL]) +</p> +<p> +http://arxiv.org/abs/2401.05632 +</p> +<p> +这项调查研究了自然语言处理中针对方言的方法和问题,强调了方言对于NLP模型性能和语言技术公平性的影响,并提供了关于方言相关任务和语言的全面综述。 +</p> +<p> + +</p> +<p> +最先进的自然语言处理(NLP)模型是在大规模训练语料库上训练的,并在评估数据集上展现出卓越的性能。本调查探讨了这些数据集的一个重要属性:语言方言。考虑到针对方言数据集的NLP模型性能下降及其对语言技术公平性的影响,我们调查了有关方言NLP的过去研究,包括数据集和方法。我们从两个类别的视角描述了各种NLP任务:自然语言理解(NLU)(如方言分类、情感分析、解析和NLU基准测试)和自然语言生成(NLG)(如摘要、机器翻译和对话系统)。这项调查还广泛涵盖了英语、阿拉伯语、德语等多种语言。我们观察到,有关方言的过去NLP工作不止于方言分类,而是... +</p> +<p> +State-of-the-art natural language processing (NLP) models are trained on massive training corpora, and report a superlative performance on evaluation datasets. This survey delves into an important attribute of these datasets: the dialect of a language. Motivated by the performance degradation of NLP models for dialectic datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets, and approaches. We describe a wide range of NLP tasks in terms of two categories: natural language understanding (NLU) (for tasks such as dialect classification, sentiment analysis, parsing, and NLU benchmarks) and natural language generation (NLG) (for summarisation, machine translation, and dialogue systems). The survey is also broad in its coverage of languages which include English, Arabic, German among others. We observe that past work in NLP concerning dialects goes deeper than mere dialect classification, and . This includes ear +</p>PORTIA是一个旨在校准大型语言模型评估器的位置偏差的对齐系统,通过将答案分割成多个片段,并对其进行对齐,然后将其合并回一个单一的提示,以提高评估的准确性和公正性。http://arxiv.org/abs/2310.01432<p> +分割与合并:对大型语言模型的位置偏差进行校准 +</p> +<p> +Split and Merge: Aligning Position Biases in Large Language Model based Evaluators. 
(arXiv:2310.01432v1 [cs.CL]) +</p> +<p> +http://arxiv.org/abs/2310.01432 +</p> +<p> +PORTIA是一个旨在校准大型语言模型评估器的位置偏差的对齐系统,通过将答案分割成多个片段,并对其进行对齐,然后将其合并回一个单一的提示,以提高评估的准确性和公正性。 +</p> +<p> + +</p> +<p> +大型语言模型(LLMs)已展现出作为自动化评估器的潜力,可用于评估AI系统生成的答案的质量。然而,这些基于LLM的评估器在成对比较中评估候选答案时存在位置偏差(即不一致性),会无视内容而偏向于第一个或第二个答案。为了解决这个问题,我们提出了PORTIA,这是一个基于对齐的系统,旨在模拟人类的比较策略,以轻量级但有效的方式校准位置偏差。具体而言,PORTIA将答案分割成多个片段,对齐各候选答案中的相似内容,然后将它们合并回一个单一的提示,以供LLMs评估。我们使用六种不同的LLM进行了大量实验,评估了11,520个答案对。我们的结果表明,PORTIA显著提高了所有被测模型和比较形式的一致性率,平均相对改进率达到47.46%。 +</p> +<p> +Large language models (LLMs) have shown promise as automated evaluators for assessing the quality of answers generated by AI systems. However, these LLM-based evaluators exhibit position bias, or inconsistency, when used to evaluate candidate answers in pairwise comparisons, favoring either the first or second answer regardless of content. To address this limitation, we propose PORTIA, an alignment-based system designed to mimic human comparison strategies to calibrate position bias in a lightweight yet effective manner. Specifically, PORTIA splits the answers into multiple segments, aligns similar content across candidate answers, and then merges them back into a single prompt for evaluation by LLMs. We conducted extensive experiments with six diverse LLMs to evaluate 11,520 answer pairs. Our results show that PORTIA markedly enhances the consistency rates for all the models and comparison forms tested, achieving an average relative improvement of 47.46%. Remarkably, PORTIA enables le +</p>本论文提出了一个专门用于评估大型语言模型在金融领域知识上的基准FinEval。通过在FinEval上评估中英文LLMs,结果显示只有GPT-4在不同提示设置下实现了接近70%的准确率,展示了LLMs在金融领域知识中的显著增长潜力。http://arxiv.org/abs/2308.09975<p> +FinEval:一个用于大型语言模型的中文金融领域知识评估基准 +</p> +<p> +FinEval: A Chinese Financial Domain Knowledge Evaluation Benchmark for Large Language Models. 
(arXiv:2308.09975v1 [cs.CL]) +</p> +<p> +http://arxiv.org/abs/2308.09975 +</p> +<p> +本论文提出了一个专门用于评估大型语言模型在金融领域知识上的基准FinEval。通过在FinEval上评估中英文LLMs,结果显示只有GPT-4在不同提示设置下实现了接近70%的准确率,展示了LLMs在金融领域知识中的显著增长潜力。 +</p> +<p> + +</p> +<p> +大型语言模型(LLMs)在各种自然语言处理任务中展示出了出色的性能,但是它们在更具挑战性和专业领域的任务中的效果尚未得到深入研究。本文提出了FinEval,这是一个专门为LLMs中的金融领域知识设计的评估基准。FinEval是一个包含了金融、经济、会计和证书等34个学术科目的高质量多项选择题的集合,总计包含了4,661道题目。为了确保对模型性能进行全面评估,FinEval使用了多种提示类型,包括零样本和少样本提示,以及仅答案提示和思路链式提示。通过在FinEval上评估最先进的中文和英文LLMs,结果显示只有GPT-4在不同的提示设置下实现了接近70%的准确率,表明LLMs在金融领域知识中具有显著的增长潜力。我们的工作为金融领域的知识评估提供了更全面的基准。 +</p> +<p> +Large language models (LLMs) have demonstrated exceptional performance in various natural language processing tasks, yet their efficacy in more challenging and domain-specific tasks remains largely unexplored. This paper presents FinEval, a benchmark specifically designed for the financial domain knowledge in the LLMs. FinEval is a collection of high-quality multiple-choice questions covering Finance, Economy, Accounting, and Certificate. It includes 4,661 questions spanning 34 different academic subjects. To ensure a comprehensive model performance evaluation, FinEval employs a range of prompt types, including zero-shot and few-shot prompts, as well as answer-only and chain-of-thought prompts. Evaluating state-of-the-art Chinese and English LLMs on FinEval, the results show that only GPT-4 achieved an accuracy close to 70% in different prompt settings, indicating significant growth potential for LLMs in the financial domain knowledge. Our work offers a more comprehensive financial kno +</p>本文使用美国人口普查局建立的全美社区调查(ACS)评估了十几个不同大小的语言模型,发现小型模型具有显著的位置和标签偏差,而模型大小的增加能减轻这种偏差,但无法根据US群体或任何可识别的群体趋势进行调整。http://arxiv.org/abs/2306.07951<p> +对大型语言模型调查响应的质疑 +</p> +<p> +Questioning the Survey Responses of Large Language Models. 
(arXiv:2306.07951v1 [cs.CL]) +</p> +<p> +http://arxiv.org/abs/2306.07951 +</p> +<p> +本文使用美国人口普查局建立的全美社区调查(ACS)评估了十几个不同大小的语言模型,发现小型模型具有显著的位置和标签偏差,而模型大小的增加能减轻这种偏差,但无法根据US群体或任何可识别的群体趋势进行调整。 +</p> +<p> + +</p> +<p> +随着大型语言模型能力的增强,研究人员开始出于各种科学动机对这些模型开展各类问卷调查。本文基于美国人口普查局成熟的全美社区调查(ACS),探究我们能从模型的调查回答中了解到什么。我们评估了十几个不同规模的模型,参数量从几亿到一百亿不等,每个模型在ACS问题上测试了数十万次,并系统地得出两个主要规律。首先,较小的模型存在明显的位置和标签偏差,例如偏向于标记为“A”的调查回答。随着模型规模的增大,这种A偏差会减弱,但减弱得很缓慢。其次,即使通过随机化答案顺序来校正这种标签偏差,模型的回答仍然不会趋近于美国人口统计数据或任何可识别人群的统计数据。相反,各模型普遍趋向于均匀随机。 +</p> +<p> +As large language models increase in capability, researchers have started to conduct surveys of all kinds on these models with varying scientific motivations. In this work, we examine what we can learn from a model's survey responses on the basis of the well-established American Community Survey (ACS) by the U.S. Census Bureau. Evaluating more than a dozen different models, varying in size from a few hundred million to ten billion parameters, hundreds of thousands of times each on questions from the ACS, we systematically establish two dominant patterns. First, smaller models have a significant position and labeling bias, for example, towards survey responses labeled with the letter "A". This A-bias diminishes, albeit slowly, as model size increases. Second, when adjusting for this labeling bias through randomized answer ordering, models still do not trend toward US population statistics or those of any cognizable population. Rather, models across the board trend toward uniformly rando +</p>本文研究了使用基于大语言模型的聊天机器人自动标注文本的潜力,重点考察了LLM在多大程度上能从局部语法角度理解构成道歉言语行为的功能元素,并比较了ChatGPT、Bing聊天机器人和人类标注员在标注任务中的表现,结果表明Bing聊天机器人的表现显著优于ChatGPT,整体略逊于人类标注员。http://arxiv.org/abs/2305.08339<p> +使用LLM辅助标注进行语料库语言学研究:局部语法分析案例研究 +</p> +<p> +Using LLM-assisted Annotation for Corpus Linguistics: A Case Study of Local Grammar Analysis. 
(arXiv:2305.08339v2 [cs.CL] UPDATED) +</p> +<p> +http://arxiv.org/abs/2305.08339 +</p> +<p> +本文研究了使用基于大语言模型的聊天机器人自动标注文本的潜力,重点考察了LLM在多大程度上能从局部语法角度理解构成道歉言语行为的功能元素,并比较了ChatGPT、Bing聊天机器人和人类标注员在标注任务中的表现,结果表明Bing聊天机器人的表现显著优于ChatGPT,整体略逊于人类标注员。 +</p> +<p> + +</p> +<p> +基于大语言模型(LLMs)的聊天机器人在语言理解方面表现出很强的能力。本研究探索LLMs通过为文本自动标注特定类别的语言信息来协助基于语料库的语言学研究的潜力。具体而言,我们通过比较ChatGPT(基于GPT-3.5)、Bing聊天机器人(基于GPT-4)和一名人类编码员在标注任务中的表现,考察了LLMs在多大程度上能从局部语法角度理解构成道歉言语行为的功能元素。结果表明,Bing聊天机器人在该任务中的表现显著优于ChatGPT。与人类标注员相比,Bing聊天机器人的整体表现略逊一筹,但已经取得了较高的F1得分:APOLOGISING(道歉)标签99.95%,REASON(原因)标签91.91%,APOLOGISER(道歉者)标签95.35%,APOLOGISEE(被道歉者)标签89.74%,INTENSIFIER(强调语)标签96.47%。这表明,使用LLM辅助标注是可行的。 +</p> +<p> +Chatbots based on Large Language Models (LLMs) have shown strong capabilities in language understanding. In this study, we explore the potential of LLMs in assisting corpus-based linguistic studies through automatic annotation of texts with specific categories of linguistic information. Specifically, we examined to what extent LLMs understand the functional elements constituting the speech act of apology from a local grammar perspective, by comparing the performance of ChatGPT (powered by GPT-3.5), the Bing chatbot (powered by GPT-4), and a human coder in the annotation task. The results demonstrate that the Bing chatbot significantly outperformed ChatGPT in the task. Compared to human annotator, the overall performance of the Bing chatbot was slightly less satisfactory. However, it already achieved high F1 scores: 99.95% for the tag of APOLOGISING, 91.91% for REASON, 95.35% for APOLOGISER, 89.74% for APOLOGISEE, and 96.47% for INTENSIFIER. This suggests that it is feasible to use LLM- +</p>本文提出一种利用语言控制扩散模型的分层规划器,有效而高效地扩展扩散模型,解决长时间跨度自然语言指令下的控制问题,实现了较高的单任务和多任务成功率,并极大地提高计算效率。http://arxiv.org/abs/2210.15629<p> +语言控制扩散:通过空间、时间和任务高效扩展 +</p> +<p> +Language Control Diffusion: Efficiently Scaling through Space, Time, and Tasks. 
(arXiv:2210.15629v2 [cs.LG] UPDATED) +</p> +<p> +http://arxiv.org/abs/2210.15629 +</p> +<p> +本文提出一种利用语言控制扩散模型的分层规划器,有效而高效地扩展扩散模型,解决长时间跨度自然语言指令下的控制问题,实现了较高的单任务和多任务成功率,并极大地提高计算效率。 +</p> +<p> + +</p> +<p> +训练通用型智能体在各个方面都很困难,需要处理高维输入(空间)、长时间跨度(时间)和多个新任务。最近的结构方面的进展使得我们可以沿着其中一个或两个维度提高扩展性能力,但计算成本仍然很高。本文提出使用语言控制扩散模型作为一种基于自然语言条件的分层规划器(LCD)来应对这三个方面。我们有效而高效地扩展扩散模型,以应对时间、状态和任务空间维度的长时间跨度控制问题。我们在CALVIN语言机器人基准测试中将LCD与其他最先进的模型进行比较,发现LCD在多任务成功率方面优于其他最先进的方法,而单任务成功率(SR)为88.7%,远高于以前的最佳成绩82.6%,大大提高了计算效率。 +</p> +<p> +Training generalist agents is difficult across several axes, requiring us to deal with high-dimensional inputs (space), long horizons (time), and multiple and new tasks. Recent advances with architectures have allowed for improved scaling along one or two of these dimensions, but are still prohibitive computationally. In this paper, we propose to address all three axes by leveraging Language to Control Diffusion models as a hierarchical planner conditioned on language (LCD). We effectively and efficiently scale diffusion models for planning in extended temporal, state, and task dimensions to tackle long horizon control problems conditioned on natural language instructions. We compare LCD with other state-of-the-art models on the CALVIN language robotics benchmark and find that LCD outperforms other SOTA methods in multi task success rates while dramatically improving computational efficiency with a single task success rate (SR) of 88.7% against the previous best of 82.6%. 
We show that </p> \ No newline at end of file diff --git a/cs.IR.md b/cs.IR.md index 46ecd3853..1b7cf101b 100644 --- a/cs.IR.md +++ b/cs.IR.md @@ -2,37 +2,22 @@ | Ref | Title | Summary | | --- | --- | --- | -| [^1] | [All-in-One: Heterogeneous Interaction Modeling for Cold-Start Rating Prediction](https://arxiv.org/abs/2403.17740) | 提出了异质交互评分网络(HIRE)框架,通过异质交互模块(HIM)来共同建模异质交互并直接推断重要特征 | -| [^2] | [TPRF: A Transformer-based Pseudo-Relevance Feedback Model for Efficient and Effective Retrieval.](http://arxiv.org/abs/2401.13509) | 本文提出一种基于Transformer的伪相关反馈模型(TPRF),适用于资源受限的环境。TPRF相比其他深度语言模型在内存占用和推理时间方面具备更小的开销,并能有效地结合来自稠密文具表示的相关反馈信号。 | +| [^1] | [Croissant: A Metadata Format for ML-Ready Datasets](https://arxiv.org/abs/2403.19546) | Croissant是一种面向机器学习数据集的元数据格式,使数据集更易发现、可移植和互操作,有助于解决ML数据管理和负责任AI中的重要挑战。 | # 详细 -[^1]: 一体化:异质交互建模用于冷启动评分预测 +[^1]: Croissant:一种面向机器学习数据集的元数据格式 - All-in-One: Heterogeneous Interaction Modeling for Cold-Start Rating Prediction + Croissant: A Metadata Format for ML-Ready Datasets - [https://arxiv.org/abs/2403.17740](https://arxiv.org/abs/2403.17740) + [https://arxiv.org/abs/2403.19546](https://arxiv.org/abs/2403.19546) - 提出了异质交互评分网络(HIRE)框架,通过异质交互模块(HIM)来共同建模异质交互并直接推断重要特征 + Croissant是一种面向机器学习数据集的元数据格式,使数据集更易发现、可移植和互操作,有助于解决ML数据管理和负责任AI中的重要挑战。 - 冷启动评分预测是推荐系统中一个基本问题,已得到广泛研究。许多方法已经被提出,利用现有数据之间的显式关系,例如协同过滤、社交推荐和异构信息网络,以缓解冷启动用户和物品的数据不足问题。然而,基于不同角色之间的数据构建的显式关系可能不可靠且无关,从而限制了特定推荐任务的性能上限。受此启发,本文提出了一个灵活的框架,名为异质交互评分网络(HIRE)。HIRE不仅仅依赖于预先定义的交互模式或手动构建的异构信息网络。相反,我们设计了一个异质交互模块(HIM),来共同建模异质交互并直接推断重要特征。 + 数据是机器学习(ML)的关键资源,但处理数据仍然是一个主要的摩擦点。本文介绍了Croissant,一种用于数据集的元数据格式,简化了数据被ML工具和框架使用的方式。Croissant使数据集更易发现、可移植和互操作,从而解决了ML数据管理和负责任AI中的重要挑战。Croissant已得到几个流行数据集库的支持,涵盖数十万个数据集,可以加载到最流行的ML框架中。 - arXiv:2403.17740v1 Announce Type: cross Abstract: Cold-start rating prediction is a fundamental problem in recommender systems that has been extensively studied. 
Many methods have been proposed that exploit explicit relations among existing data, such as collaborative filtering, social recommendations and heterogeneous information network, to alleviate the data insufficiency issue for cold-start users and items. However, the explicit relations constructed based on data between different roles may be unreliable and irrelevant, which limits the performance ceiling of the specific recommendation task. Motivated by this, in this paper, we propose a flexible framework dubbed heterogeneous interaction rating network (HIRE). HIRE dose not solely rely on the pre-defined interaction pattern or the manually constructed heterogeneous information network. Instead, we devise a Heterogeneous Interaction Module (HIM) to jointly model the heterogeneous interactions and directly infer the important in - -[^2]: TPRF:一种基于Transformer的伪相关反馈模型,用于高效且有效的检索。 - - TPRF: A Transformer-based Pseudo-Relevance Feedback Model for Efficient and Effective Retrieval. (arXiv:2401.13509v1 [cs.IR]) - - [http://arxiv.org/abs/2401.13509](http://arxiv.org/abs/2401.13509) - - 本文提出一种基于Transformer的伪相关反馈模型(TPRF),适用于资源受限的环境。TPRF相比其他深度语言模型在内存占用和推理时间方面具备更小的开销,并能有效地结合来自稠密文具表示的相关反馈信号。 - - - - 本文考虑在资源受限的环境中,如廉价云实例或嵌入式系统(如智能手机和智能手表)中,针对稠密检索器的伪相关反馈(PRF)方法,其中内存和CPU受限,没有GPU。为此,我们提出了一种基于Transformer的PRF方法(TPRF),与采用PRF机制的其他深度语言模型相比,具有更小的内存占用和更快的推理时间,较小的效果损失。TPRF学习如何有效地结合来自稠密文具表示的相关反馈信号。具体而言,TPRF提供了一种建模查询和相关反馈信号之间关系和权重的机制。该方法对所使用的具体稠密表示不加偏见,因此可以广泛应用于任何稠密检索器。 - - This paper considers Pseudo-Relevance Feedback (PRF) methods for dense retrievers in a resource constrained environment such as that of cheap cloud instances or embedded systems (e.g., smartphones and smartwatches), where memory and CPU are limited and GPUs are not present. For this, we propose a transformer-based PRF method (TPRF), which has a much smaller memory footprint and faster inference time compared to other deep language models that employ PRF mechanisms, with a marginal effectiveness loss. 
TPRF learns how to effectively combine the relevance feedback signals from dense passage representations. Specifically, TPRF provides a mechanism for modelling relationships and weights between the query and the relevance feedback signals. The method is agnostic to the specific dense representation used and thus can be generally applied to any dense retriever. + arXiv:2403.19546v1 Announce Type: cross Abstract: Data is a critical resource for Machine Learning (ML), yet working with data remains a key friction point. This paper introduces Croissant, a metadata format for datasets that simplifies how data is used by ML tools and frameworks. Croissant makes datasets more discoverable, portable and interoperable, thereby addressing significant challenges in ML data management and responsible AI. Croissant is already supported by several popular dataset repositories, spanning hundreds of thousands of datasets, ready to be loaded into the most popular ML frameworks. diff --git a/cs.IR.xml b/cs.IR.xml index f030e5c37..531efcfb6 100644 --- a/cs.IR.xml +++ b/cs.IR.xml @@ -1,41 +1,21 @@ -Chat Arxiv cs.IRhttps://github.com/qhduan/cn-chat-arxivThis is arxiv RSS feed for cs.IR提出了异质交互评分网络(HIRE)框架,通过异质交互模块(HIM)来共同建模异质交互并直接推断重要特征https://arxiv.org/abs/2403.17740<p> -一体化:异质交互建模用于冷启动评分预测 +Chat Arxiv cs.IRhttps://github.com/qhduan/cn-chat-arxivThis is arxiv RSS feed for cs.IRCroissant是一种面向机器学习数据集的元数据格式,使数据集更易发现、可移植和互操作,有助于解决ML数据管理和负责任AI中的重要挑战。https://arxiv.org/abs/2403.19546<p> +Croissant:一种面向机器学习数据集的元数据格式 </p> <p> -All-in-One: Heterogeneous Interaction Modeling for Cold-Start Rating Prediction +Croissant: A Metadata Format for ML-Ready Datasets </p> <p> -https://arxiv.org/abs/2403.17740 +https://arxiv.org/abs/2403.19546 </p> <p> -提出了异质交互评分网络(HIRE)框架,通过异质交互模块(HIM)来共同建模异质交互并直接推断重要特征 +Croissant是一种面向机器学习数据集的元数据格式,使数据集更易发现、可移植和互操作,有助于解决ML数据管理和负责任AI中的重要挑战。 </p> <p> </p> <p> 
-冷启动评分预测是推荐系统中一个基本问题,已得到广泛研究。许多方法已经被提出,利用现有数据之间的显式关系,例如协同过滤、社交推荐和异构信息网络,以缓解冷启动用户和物品的数据不足问题。然而,基于不同角色之间的数据构建的显式关系可能不可靠且无关,从而限制了特定推荐任务的性能上限。受此启发,本文提出了一个灵活的框架,名为异质交互评分网络(HIRE)。HIRE不仅仅依赖于预先定义的交互模式或手动构建的异构信息网络。相反,我们设计了一个异质交互模块(HIM),来共同建模异质交互并直接推断重要特征。 +数据是机器学习(ML)的关键资源,但处理数据仍然是一个主要的摩擦点。本文介绍了Croissant,一种用于数据集的元数据格式,简化了数据被ML工具和框架使用的方式。Croissant使数据集更易发现、可移植和互操作,从而解决了ML数据管理和负责任AI中的重要挑战。Croissant已得到几个流行数据集库的支持,涵盖数十万个数据集,可以加载到最流行的ML框架中。 </p> <p> -arXiv:2403.17740v1 Announce Type: cross Abstract: Cold-start rating prediction is a fundamental problem in recommender systems that has been extensively studied. Many methods have been proposed that exploit explicit relations among existing data, such as collaborative filtering, social recommendations and heterogeneous information network, to alleviate the data insufficiency issue for cold-start users and items. However, the explicit relations constructed based on data between different roles may be unreliable and irrelevant, which limits the performance ceiling of the specific recommendation task. Motivated by this, in this paper, we propose a flexible framework dubbed heterogeneous interaction rating network (HIRE). HIRE dose not solely rely on the pre-defined interaction pattern or the manually constructed heterogeneous information network. Instead, we devise a Heterogeneous Interaction Module (HIM) to jointly model the heterogeneous interactions and directly infer the important in -</p>本文提出一种基于Transformer的伪相关反馈模型(TPRF),适用于资源受限的环境。TPRF相比其他深度语言模型在内存占用和推理时间方面具备更小的开销,并能有效地结合来自稠密文具表示的相关反馈信号。http://arxiv.org/abs/2401.13509<p> -TPRF:一种基于Transformer的伪相关反馈模型,用于高效且有效的检索。 -</p> -<p> -TPRF: A Transformer-based Pseudo-Relevance Feedback Model for Efficient and Effective Retrieval. 
(arXiv:2401.13509v1 [cs.IR]) -</p> -<p> -http://arxiv.org/abs/2401.13509 -</p> -<p> -本文提出一种基于Transformer的伪相关反馈模型(TPRF),适用于资源受限的环境。TPRF相比其他深度语言模型在内存占用和推理时间方面具备更小的开销,并能有效地结合来自稠密文具表示的相关反馈信号。 -</p> -<p> - -</p> -<p> -本文考虑在资源受限的环境中,如廉价云实例或嵌入式系统(如智能手机和智能手表)中,针对稠密检索器的伪相关反馈(PRF)方法,其中内存和CPU受限,没有GPU。为此,我们提出了一种基于Transformer的PRF方法(TPRF),与采用PRF机制的其他深度语言模型相比,具有更小的内存占用和更快的推理时间,较小的效果损失。TPRF学习如何有效地结合来自稠密文具表示的相关反馈信号。具体而言,TPRF提供了一种建模查询和相关反馈信号之间关系和权重的机制。该方法对所使用的具体稠密表示不加偏见,因此可以广泛应用于任何稠密检索器。 -</p> -<p> -This paper considers Pseudo-Relevance Feedback (PRF) methods for dense retrievers in a resource constrained environment such as that of cheap cloud instances or embedded systems (e.g., smartphones and smartwatches), where memory and CPU are limited and GPUs are not present. For this, we propose a transformer-based PRF method (TPRF), which has a much smaller memory footprint and faster inference time compared to other deep language models that employ PRF mechanisms, with a marginal effectiveness loss. TPRF learns how to effectively combine the relevance feedback signals from dense passage representations. Specifically, TPRF provides a mechanism for modelling relationships and weights between the query and the relevance feedback signals. The method is agnostic to the specific dense representation used and thus can be generally applied to any dense retriever. +arXiv:2403.19546v1 Announce Type: cross Abstract: Data is a critical resource for Machine Learning (ML), yet working with data remains a key friction point. This paper introduces Croissant, a metadata format for datasets that simplifies how data is used by ML tools and frameworks. Croissant makes datasets more discoverable, portable and interoperable, thereby addressing significant challenges in ML data management and responsible AI. Croissant is already supported by several popular dataset repositories, spanning hundreds of thousands of datasets, ready to be loaded into the most popular ML frameworks. 
</p> \ No newline at end of file diff --git a/cs.LG.md b/cs.LG.md index 9aaeb8ac2..e5ef6076e 100644 --- a/cs.LG.md +++ b/cs.LG.md @@ -2,187 +2,667 @@ | Ref | Title | Summary | | --- | --- | --- | -| [^1] | [Colour and Brush Stroke Pattern Recognition in Abstract Art using Modified Deep Convolutional Generative Adversarial Networks](https://arxiv.org/abs/2403.18397) | 本文通过引入改进的深度卷积生成对抗网络(mDCGAN),针对高质量艺术品生成进行了研究,解决了普遍训练问题,有效探索抽象绘画中的颜色和笔触模式。 | -| [^2] | [Machine Unlearning by Suppressing Sample Contribution](https://arxiv.org/abs/2402.15109) | 本文提出了一种机器遗忘方法,通过最小化输入敏感度来抑制遗忘数据的贡献,并在实验中表现出优异的性能。 | -| [^3] | [Mixup Barcodes: Quantifying Geometric-Topological Interactions between Point Clouds](https://arxiv.org/abs/2402.15058) | 提出了一种名为混合条形码的新方法,利用标准持久同调与图像持久同调结合,可以量化任意维度两个点集之间的几何-拓扑相互作用,以及引入简单的统计量来量化这种相互作用的复杂性。 | -| [^4] | [Masked Attention is All You Need for Graphs](https://arxiv.org/abs/2402.10793) | 提出了一种在图上学习的简单替代方法,称为掩码注意力(MAG),其利用注意力矩阵来创建定制的注意力模式,在长距离任务上表现出色并胜过其他方法。 | -| [^5] | [Graph Inference Acceleration by Learning MLPs on Graphs without Supervision](https://arxiv.org/abs/2402.08918) | 该论文提出了一个简单而有效的框架SimMLP,通过在图上无监督学习MLPs,提高了在延迟敏感的应用中的泛化能力。 | -| [^6] | [Voronoi Candidates for Bayesian Optimization](https://arxiv.org/abs/2402.04922) | 使用Voronoi候选点边界可以在贝叶斯优化中有效地优化黑盒函数,提高了多起始连续搜索的执行时间。 | -| [^7] | [PAC Privacy Preserving Diffusion Models](https://arxiv.org/abs/2312.01201) | 提出了一种PAC隐私保护扩散模型,通过将私有分类器指导集成到采样过程中增强隐私保护,并发展了一种新的度量标准来衡量隐私水平,在保护性能方面表现出卓越表现。 | -| [^8] | [Iterative Methods for Vecchia-Laplace Approximations for Latent Gaussian Process Models.](http://arxiv.org/abs/2310.12000) | 这篇文章介绍了用于潜在高斯过程模型中的Vecchia-Laplace近似法的迭代方法,相比于传统的Cholesky分解方法,可以显著加快计算速度。 | -| [^9] | [Memorization with neural nets: going beyond the worst case.](http://arxiv.org/abs/2310.00327) | 本文研究了神经网络的插值问题,提出了一种简单的随机算法,在给定的数据集和两个类的情况下,能够以很高的概率构建一个插值的神经网络。这些结果与训练数据规模无关。 | -| [^10] | [What can we learn from quantum convolutional neural networks?.](http://arxiv.org/abs/2308.16664) | 
通过分析量子卷积神经网络(QCNNs),我们发现它们通过隐藏特征映射嵌入物理系统参数,并且利用量子临界性生成适合的基函数集,池化层选择能够形成高性能决策边界的基函数,而模型的泛化性能依赖于嵌入类型。 | -| [^11] | [A Simple Data Augmentation for Feature Distribution Skewed Federated Learning.](http://arxiv.org/abs/2306.09363) | 本文针对特征分布偏斜的联邦学习提出了FedRDN方法,在输入层级上实现了数据增强,将整个联邦数据集的统计信息注入到本地客户端数据中,以缓解特征漂移问题。 | -| [^12] | [The Score-Difference Flow for Implicit Generative Modeling.](http://arxiv.org/abs/2304.12906) | 本文提出了一种新的评分差异流模型(SD flow),它可以最优地减少两个分布之间的散度,同时解决Schr​​ödinger桥问题。与去噪扩散模型不同,它没有对先验分布施加任何限制,在一些基准数据集中优于其他方法。 | +| [^1] | [A Methodology for Improving Accuracy of Embedded Spiking Neural Networks through Kernel Size Scaling](https://arxiv.org/abs/2404.01685) | 通过核大小缩放提高嵌入式脉冲神经网络准确性的方法学在实验中表现出更高的准确性。 | +| [^2] | [Functional Bilevel Optimization for Machine Learning](https://arxiv.org/abs/2403.20233) | 介绍了机器学习中的函数双层优化问题,提出了不依赖于强凸假设的方法,并展示了在仪表回归和强化学习任务中使用神经网络的优势。 | +| [^3] | [Croissant: A Metadata Format for ML-Ready Datasets](https://arxiv.org/abs/2403.19546) | Croissant是一种面向机器学习数据集的元数据格式,使数据集更易发现、可移植和互操作,有助于解决ML数据管理和负责任AI中的重要挑战。 | +| [^4] | [Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation](https://arxiv.org/abs/2403.19103) | PRISM是一种算法,可以自动识别人类可解释且易传递的提示,从而有效生成所需概念,仅使用黑盒访问T2I模型。 | +| [^5] | [Can ChatGPT predict article retraction based on Twitter mentions?](https://arxiv.org/abs/2403.16851) | 本研究探讨了ChatGPT是否能够基于Twitter提及来预测文章的撤回,研究发现在预测未来被撤回的有问题文章方面是具有一定潜力的。 | +| [^6] | [Auditing Fairness under Unobserved Confounding](https://arxiv.org/abs/2403.14713) | 在未观测混杂因素的情况下,本文展示了即使在放宽或甚至在排除所有相关风险因素被观测到的假设的情况下,仍然可以给出对高风险个体分配率的信息丰富的界限。 | +| [^7] | [Stochastic Rounding Implicitly Regularizes Tall-and-Thin Matrices](https://arxiv.org/abs/2403.12278) | 随机舍入技术能有效隐式正则化高瘦矩阵,确保舍入后的矩阵具有完整的列秩。 | +| [^8] | [Early Directional Convergence in Deep Homogeneous Neural Networks for Small Initializations](https://arxiv.org/abs/2403.08121) | 本文研究了训练深度齐次神经网络时梯度流动力学的动态性,发现在足够小的初始化下,神经网络的权重在训练早期阶段保持较小规范,并且沿着神经相关函数的KKT点方向近似收敛。 | +| [^9] | [Speech Robust Bench: 
A Robustness Benchmark For Speech Recognition](https://arxiv.org/abs/2403.07937) | 提出了一个全面基准(SRB),用于评估自动语音识别(ASR)模型对各种破坏的鲁棒性,发现模型大小和某些建模选择有助于提高鲁棒性,并观察到在不同人口亚组上模型的鲁棒性存在明显差异。 | +| [^10] | [A Benchmark of Domain-Adapted Large Language Models for Generating Brief Hospital Course Summaries](https://arxiv.org/abs/2403.05720) | 介绍了一个新的基准测试,评估了用于生成简要住院病程摘要的大语言模型在健康保健领域中的性能并提出相应的自适应策略 | +| [^11] | [SPEAR:Exact Gradient Inversion of Batches in Federated Learning](https://arxiv.org/abs/2403.03945) | 该论文提出了第一个能够精确重构批量$b >1$的算法,在联邦学习中解决了梯度反演攻击的问题。 | +| [^12] | [Non-Convex Stochastic Composite Optimization with Polyak Momentum](https://arxiv.org/abs/2403.02967) | 本文研究了具有Polyak动量的随机近端梯度方法,在非凸复合优化问题中实现了最佳收敛速度,无论批量大小如何。 | +| [^13] | [Remove that Square Root: A New Efficient Scale-Invariant Version of AdaGrad](https://arxiv.org/abs/2403.02648) | KATE是一种新的优化算法,提出了一种与AdaGrad标度不变的适应方法,并在广义线性模型和一般的非凸问题中证明了其标度不变性。数值实验结果表明,KATE在各种场景中均优于AdaGrad并与Adam性能匹配/超越。 | +| [^14] | [RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval](https://arxiv.org/abs/2402.18510) | 本文研究了RNNs和Transformer在处理算法问题时的表现能力差距,发现RNNs存在关键瓶颈,即无法完美地从上下文中检索信息,导致无法像Transformer那样轻松解决需要这种能力的任务。 | +| [^15] | [Fusion Encoder Networks](https://arxiv.org/abs/2402.15883) | FENs是一种神经网络算法,具有对数深度且可以在线性时间内处理序列,关键创新在于通过训练大致线性数量的常深度神经网络并行学习。 | +| [^16] | [On the Stability of Gradient Descent for Large Learning Rate](https://arxiv.org/abs/2402.13108) | 本文研究了线性神经网络在二次损失函数下的优化问题,证明了梯度下降映射的非奇异性以及全局最小值点集的光滑流形特性,为理解大学习率下梯度下降的稳定性提供了重要线索。 | +| [^17] | [Query-Based Adversarial Prompt Generation](https://arxiv.org/abs/2402.12329) | 该研究提出了一种基于查询的对抗性攻击方法,通过利用远程语言模型的 API 访问构造对抗性示例,使模型以更高概率发出有害字符串,而非仅仅基于模型之间的转移性攻击。 | +| [^18] | [Bayesian Parameter-Efficient Fine-Tuning for Overcoming Catastrophic Forgetting](https://arxiv.org/abs/2402.12220) | 这项研究展示了如何利用贝叶斯学习技术应用于参数高效微调,以防止灾难性遗忘,实现了预训练知识的保留,并在语言建模和语音合成任务中取得成功。 | +| [^19] | [CHEMREASONER: Heuristic Search over a Large Language Model's Knowledge Space using Quantum-Chemical 
Feedback](https://arxiv.org/abs/2402.10980) | 通过将大型语言模型推理与量子化学反馈相结合,我们引入了一个AI引导的计算筛选框架,将催化剂发现形式化为一个不确定环境,从而实现高效催化剂的积极搜索 | +| [^20] | [Momentum Approximation in Asynchronous Private Federated Learning](https://arxiv.org/abs/2402.09247) | 本文提出了动量近似方法,在异步私有联邦学习(FL)中有效结合了动量和异步协议的技术,通过最小化动量更新的偏差来改进模型性能。实证研究证明了动量近似在基准FL数据集上的有效性。 | +| [^21] | [Learning Interpretable Concepts: Unifying Causal Representation Learning and Foundation Models](https://arxiv.org/abs/2402.09236) | 本研究将因果表示学习和基础模型相结合,研究了如何从数据中学习人类可解释的概念。实验证明了这一统一方法的实用性。 | +| [^22] | [Personalized Language Modeling from Personalized Human Feedback](https://arxiv.org/abs/2402.05133) | 该论文提出了一个个性化语言模型的方法,通过在于用户的反馈数据中引入个性化特征来解决强化学习框架在多样化用户偏好下存在的问题。 | +| [^23] | [ActiveAnno3D - An Active Learning Framework for Multi-Modal 3D Object Detection](https://arxiv.org/abs/2402.03235) | 这项工作提出了一种用于多模态3D物体检测的主动学习框架ActiveAnno3D。通过选择最具信息量的训练数据样本进行标注,我们能够在使用一半的训练数据时实现与传统方法相近的检测性能。 | +| [^24] | [FuseMoE: Mixture-of-Experts Transformers for Fleximodal Fusion](https://arxiv.org/abs/2402.03226) | 本论文提出了一种名为FuseMoE的专家混合Transformer框架,通过创新的门控函数实现灵活融合多模态数据,能够有效地处理缺失模态和不规则采样数据,同时改善模型的预测性能,在临床风险预测任务中具有实际应用价值。 | +| [^25] | [TopoX: A Suite of Python Packages for Machine Learning on Topological Domains](https://arxiv.org/abs/2402.02441) | TopoX是一个用于在拓扑域上进行机器学习的Python软件包套件,包含了构建、计算和嵌入拓扑域的功能,并提供了一套全面的高阶消息传递功能工具箱。 | +| [^26] | [Dynamic Incremental Optimization for Best Subset Selection](https://arxiv.org/abs/2402.02322) | 本文研究了一类$\ell_0$正则化问题的对偶形式,并提出了一种高效的原对偶算法,通过充分利用对偶范围估计和增量策略,提高了最佳子集选择问题的解决方案的效率和统计性质。 | +| [^27] | [GD-CAF: Graph Dual-stream Convolutional Attention Fusion for Precipitation Nowcasting](https://arxiv.org/abs/2401.07958) | GD-CAF提出了一种新颖的方法,将降水预报作为一个时空图序列预报问题,利用图形双流卷积注意力融合来学习历史降水图并在不同空间位置上预测未来的降水。 | +| [^28] | [Controlling Multiple Errors Simultaneously with a PAC-Bayes Bound](https://arxiv.org/abs/2202.05560) | 该研究提出了一种PAC-Bayes界限,能够同时控制多个错误,并提供丰富的信息,适用于回归中测试损失分布或分类中不同错误分类的概率。 | +| [^29] | [Comparative Study of Causal Discovery 
Methods for Cyclic Models with Hidden Confounders.](http://arxiv.org/abs/2401.13009) | 对于循环模型中含有隐藏因变量的因果发现,已经出现了能够处理这种情况的多种技术方法。 | +| [^30] | [Binary Feature Mask Optimization for Feature Selection.](http://arxiv.org/abs/2401.12644) | 这个论文提出了一种新颖的特征选择框架,通过使用特征屏蔽方法来消除特征,而不是从数据集中移除它们。这种方法不需要重新训练机器学习模型,可以综合考虑特征子集的重要性,为通用机器学习模型的特征选择问题提供了一种新的解决方案。 | +| [^31] | [xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein.](http://arxiv.org/abs/2401.06199) | xTrimoPGLM是一个统一的100亿规模预训练蛋白质语言模型,能够同时处理蛋白质理解和生成任务,通过创新的预训练框架和大规模的参数训练,显著优于其他先进模型,在18个蛋白理解基准测试中取得了成功,并能够实现对蛋白质结构的原子分辨率观察。 | +| [^32] | [Transportation Market Rate Forecast Using Signature Transform.](http://arxiv.org/abs/2401.04857) | 本论文提出了一种基于特征变换的新型统计方法,用于解决交通市场利率的预测挑战。该方法具有通用的非线性属性和特征变换核函数,能够高效生成特征,并在预测过程中准确识别季节性和制度转换。 | +| [^33] | [IoT in the Era of Generative AI: Vision and Challenges.](http://arxiv.org/abs/2401.01923) | 在生成式人工智能时代的物联网,Generative AI的进展带来了巨大的希望,同时也面临着高资源需求、及时工程、设备端推理、安全等关键挑战。 | +| [^34] | [Adapt On-the-Go: Behavior Modulation for Single-Life Robot Deployment.](http://arxiv.org/abs/2311.01059) | 本研究提出了一种名为ROAM的方法,通过利用先前学习到的行为来实时调节机器人在部署过程中应对未曾见过的情况。在测试中,ROAM可以在单个阶段内实现快速适应,并且在模拟环境和真实场景中取得了成功,具有较高的效率和适应性。 | +| [^35] | [Learning State-Augmented Policies for Information Routing in Communication Networks.](http://arxiv.org/abs/2310.00248) | 本论文研究了在通信网络中的信息路由问题,提出了一种新颖的状态增强策略,通过部署图神经网络架构,利用图卷积来最大化源节点的聚合信息,从而有效地将所需信息路由到目标节点。 | +| [^36] | [Comprehensive Assessment of the Performance of Deep Learning Classifiers Reveals a Surprising Lack of Robustness.](http://arxiv.org/abs/2308.04137) | 通过综合评估深度学习分类器的性能,发现它们缺乏稳定性和可靠性,并建议采用广泛的数据类型和统一的评估指标进行性能基准测试。 | +| [^37] | [Your Room is not Private: Gradient Inversion Attack on Reinforcement Learning.](http://arxiv.org/abs/2306.09273) | 这篇论文提出了一种针对值函数算法和梯度算法的攻击方法,利用梯度反转重建状态、动作和监督信号,以解决嵌入式人工智能中的隐私泄露问题。 | +| [^38] | [Data Augmentation for Seizure Prediction with Generative Diffusion Model.](http://arxiv.org/abs/2306.08256) | 
该论文提出了一种基于扩散模型的数据增强方法DiffEEG,可以有效地提高癫痫预测的性能,超过了现有的数据扩增方法。 | +| [^39] | [Leveraging the Triple Exponential Moving Average for Fast-Adaptive Moment Estimation.](http://arxiv.org/abs/2306.01423) | 本文提出了一种新的深度优化器FAME,使用三重指数移动平均值(TEMA)来估计梯度矩,提供更丰富和准确的数据变化和趋势信息,可以提高计算机视觉等领域中模型的性能表现。 | +| [^40] | [Optimal partition of feature using Bayesian classifier.](http://arxiv.org/abs/2304.14537) | 本文通过提出一种名为“共单调独立分类器”(CIBer)的新技术,专注于特征的最优分区,旨在克服朴素贝叶斯方法带来的挑战,并且证明该技术在不同数据集上具有更高的准确率和更低的错误率。 | +| [^41] | [Using Offline Data to Speed-up Reinforcement Learning in Procedurally Generated Environments.](http://arxiv.org/abs/2304.09825) | 本研究旨在提高程序生成环境中强化学习的样本效率。研究证明,使用模仿学习进行预训练和同时进行模仿学习和在线强化学习的方法可以提高效率。 | +| [^42] | [Edge-free but Structure-aware: Prototype-Guided Knowledge Distillation from GNNs to MLPs.](http://arxiv.org/abs/2303.13763) | 本文提出了一种原型引导知识蒸馏(PGKD)方法,它不需要图形边缘,但可以在不考虑边缘的情况下学习结构感知的MLP。 | +| [^43] | [Revisiting DeepFool: generalization and improvement.](http://arxiv.org/abs/2303.12481) | 本文提出了一种新的对抗性攻击,该攻击是广义了DeepFool攻击,既有效又计算效率高,适用于评估大型深度神经网络的鲁棒性。 | +| [^44] | [Language Control Diffusion: Efficiently Scaling through Space, Time, and Tasks.](http://arxiv.org/abs/2210.15629) | 本文提出一种利用语言控制扩散模型的分层规划器,有效而高效地扩展扩散模型,解决长时间跨度自然语言指令下的控制问题,实现了较高的单任务和多任务成功率,并极大地提高计算效率。 | # 详细 -[^1]: 使用改进的深度卷积生成对抗网络在抽象艺术中进行颜色和笔触模式识别 +[^1]: 通过核大小缩放提高嵌入式脉冲神经网络准确性的方法学 - Colour and Brush Stroke Pattern Recognition in Abstract Art using Modified Deep Convolutional Generative Adversarial Networks + A Methodology for Improving Accuracy of Embedded Spiking Neural Networks through Kernel Size Scaling - [https://arxiv.org/abs/2403.18397](https://arxiv.org/abs/2403.18397) + [https://arxiv.org/abs/2404.01685](https://arxiv.org/abs/2404.01685) - 本文通过引入改进的深度卷积生成对抗网络(mDCGAN),针对高质量艺术品生成进行了研究,解决了普遍训练问题,有效探索抽象绘画中的颜色和笔触模式。 + 通过核大小缩放提高嵌入式脉冲神经网络准确性的方法学在实验中表现出更高的准确性。 - 抽象艺术是一种广受欢迎、被广泛讨论的艺术形式,通常能够描绘出艺术家的情感。许多研究人员尝试使用机器学习和深度学习的边缘检测、笔触和情感识别算法来研究抽象艺术。本文描述了使用生成对抗神经网络(GAN)对广泛分布的抽象绘画进行研究。 
GAN具有学习和再现分布的能力,使研究人员能够有效地探索和研究生成的图像空间。然而,挑战在于开发一种能够克服常见训练问题的高效GAN架构。本文通过引入专门设计用于高质量艺术品生成的改进DCGAN(mDCGAN)来解决这一挑战。该方法涉及对所做修改的深入探讨,深入研究DCGAN的复杂工作。 + 脉冲神经网络(SNNs)由于其稀疏的基于脉冲的操作而能为基于机器学习的应用提供超低功耗/能耗。目前,大多数SNN架构需要更大的模型大小才能实现更高的准确性,这对资源受限的嵌入式应用不太适合。因此,迫切需要开发能够以可接受的内存占用实现高准确性的SNNs。为此,我们提出了一种通过核大小缩放提高SNNs准确性的新方法学。其关键步骤包括调查不同核大小对准确性的影响,设计新的核大小集合,基于选定的核大小生成SNN架构,并分析SNN模型选择的准确性-内存折衷。实验结果表明,我们的方法学在准确性方面优于最先进的方法(对于CIFAR10有93.24%的准确度) - arXiv:2403.18397v1 Announce Type: cross Abstract: Abstract Art is an immensely popular, discussed form of art that often has the ability to depict the emotions of an artist. Many researchers have made attempts to study abstract art in the form of edge detection, brush stroke and emotion recognition algorithms using machine and deep learning. This papers describes the study of a wide distribution of abstract paintings using Generative Adversarial Neural Networks(GAN). GANs have the ability to learn and reproduce a distribution enabling researchers and scientists to effectively explore and study the generated image space. However, the challenge lies in developing an efficient GAN architecture that overcomes common training pitfalls. This paper addresses this challenge by introducing a modified-DCGAN (mDCGAN) specifically designed for high-quality artwork generation. The approach involves a thorough exploration of the modifications made, delving into the intricate workings of DCGANs, opt + arXiv:2404.01685v1 Announce Type: cross Abstract: Spiking Neural Networks (SNNs) can offer ultra low power/ energy consumption for machine learning-based applications due to their sparse spike-based operations. Currently, most of the SNN architectures need a significantly larger model size to achieve higher accuracy, which is not suitable for resource-constrained embedded applications. Therefore, developing SNNs that can achieve high accuracy with acceptable memory footprint is highly needed. 
Toward this, we propose a novel methodology that improves the accuracy of SNNs through kernel size scaling. Its key steps include investigating the impact of different kernel sizes on the accuracy, devising new sets of kernel sizes, generating SNN architectures based on the selected kernel sizes, and analyzing the accuracy-memory trade-offs for SNN model selection. The experimental results show that our methodology achieves higher accuracy than state-of-the-art (93.24% accuracy for CIFAR10 and 70 -[^2]: 抑制样本贡献的机器遗忘 +[^2]: 机器学习中的函数双层优化 - Machine Unlearning by Suppressing Sample Contribution + Functional Bilevel Optimization for Machine Learning - [https://arxiv.org/abs/2402.15109](https://arxiv.org/abs/2402.15109) + [https://arxiv.org/abs/2403.20233](https://arxiv.org/abs/2403.20233) - 本文提出了一种机器遗忘方法,通过最小化输入敏感度来抑制遗忘数据的贡献,并在实验中表现出优异的性能。 + 介绍了机器学习中的函数双层优化问题,提出了不依赖于强凸假设的方法,并展示了在仪表回归和强化学习任务中使用神经网络的优势。 - 机器遗忘(MU)是指从经过良好训练的模型中删除数据,这在实践中非常重要,因为涉及“被遗忘的权利”。本文从训练数据和未见数据对模型贡献的基本区别入手:训练数据对最终模型有贡献,而未见数据没有。我们理论上发现输入敏感度可以近似衡量贡献,并实际设计了一种算法,称为MU-Mis(通过最小化输入敏感度进行机器遗忘),来抑制遗忘数据的贡献。实验结果表明,MU-Mis明显优于最先进的MU方法。此外,MU-Mis与MU的应用更加密切,因为它不需要使用剩余数据。 + 在本文中,我们介绍了针对机器学习中的双层优化问题的一种新的函数视角,其中内部目标在函数空间上被最小化。这些类型的问题通常通过在参数设置下开发的方法来解决,其中内部目标对于预测函数的参数强凸。函数视角不依赖于此假设,特别允许使用超参数化的神经网络作为内部预测函数。我们提出了可扩展和高效的算法来解决函数双层优化问题,并展示了我们方法在适合自然函数双层结构的仪表回归和强化学习任务上的优势。 - arXiv:2402.15109v1 Announce Type: new Abstract: Machine Unlearning (MU) is to forget data from a well-trained model, which is practically important due to the "right to be forgotten". In this paper, we start from the fundamental distinction between training data and unseen data on their contribution to the model: the training data contributes to the final model while the unseen data does not. We theoretically discover that the input sensitivity can approximately measure the contribution and practically design an algorithm, called MU-Mis (machine unlearning via minimizing input sensitivity), to suppress the contribution of the forgetting data. 
Experimental results demonstrate that MU-Mis outperforms state-of-the-art MU methods significantly. Additionally, MU-Mis aligns more closely with the application of MU as it does not require the use of remaining data. + arXiv:2403.20233v1 Announce Type: cross Abstract: In this paper, we introduce a new functional point of view on bilevel optimization problems for machine learning, where the inner objective is minimized over a function space. These types of problems are most often solved by using methods developed in the parametric setting, where the inner objective is strongly convex with respect to the parameters of the prediction function. The functional point of view does not rely on this assumption and notably allows using over-parameterized neural networks as the inner prediction function. We propose scalable and efficient algorithms for the functional bilevel optimization problem and illustrate the benefits of our approach on instrumental regression and reinforcement learning tasks, which admit natural functional bilevel structures. 
-[^3]: 混合条形码:量化点云之间的几何-拓扑相互作用 +[^3]: Croissant:一种面向机器学习数据集的元数据格式 - Mixup Barcodes: Quantifying Geometric-Topological Interactions between Point Clouds + Croissant: A Metadata Format for ML-Ready Datasets - [https://arxiv.org/abs/2402.15058](https://arxiv.org/abs/2402.15058) + [https://arxiv.org/abs/2403.19546](https://arxiv.org/abs/2403.19546) - 提出了一种名为混合条形码的新方法,利用标准持久同调与图像持久同调结合,可以量化任意维度两个点集之间的几何-拓扑相互作用,以及引入简单的统计量来量化这种相互作用的复杂性。 + Croissant是一种面向机器学习数据集的元数据格式,使数据集更易发现、可移植和互操作,有助于解决ML数据管理和负责任AI中的重要挑战。 - 我们将标准持久同调与图像持久同调相结合,定义了一种新颖的表征形状和它们之间相互作用的方法。具体而言,我们介绍了:(1)混合条形码,捕捉任意维度两个点集之间的几何-拓扑相互作用(混合);(2)简单的总混合和总百分比混合统计量,作为一个单一数字来量化相互作用的复杂性;(3)一个用于操作上述工具的软件工具。作为一个概念验证,我们将该工具应用到一个源自机器学习的问题上。具体地,我们研究了不同类别嵌入的可分离性。结果表明,拓扑混合是一种用于表征低维和高维数据交互的有效方法。与持久同调的典型用法相比,这个新工具对于拓扑特征的几何位置更为敏感,这通常是可取的。 + 数据是机器学习(ML)的关键资源,但处理数据仍然是一个主要的摩擦点。本文介绍了Croissant,一种用于数据集的元数据格式,简化了数据被ML工具和框架使用的方式。Croissant使数据集更易发现、可移植和互操作,从而解决了ML数据管理和负责任AI中的重要挑战。Croissant已得到几个流行数据集库的支持,涵盖数十万个数据集,可以加载到最流行的ML框架中。 - arXiv:2402.15058v1 Announce Type: cross Abstract: We combine standard persistent homology with image persistent homology to define a novel way of characterizing shapes and interactions between them. In particular, we introduce: (1) a mixup barcode, which captures geometric-topological interactions (mixup) between two point sets in arbitrary dimension; (2) simple summary statistics, total mixup and total percentage mixup, which quantify the complexity of the interactions as a single number; (3) a software tool for playing with the above. As a proof of concept, we apply this tool to a problem arising from machine learning. In particular, we study the disentanglement in embeddings of different classes. The results suggest that topological mixup is a useful method for characterizing interactions for low and high-dimensional data. 
Compared to the typical usage of persistent homology, the new tool is sensitive to the geometric locations of the topological features, which is often desirabl + arXiv:2403.19546v1 Announce Type: cross Abstract: Data is a critical resource for Machine Learning (ML), yet working with data remains a key friction point. This paper introduces Croissant, a metadata format for datasets that simplifies how data is used by ML tools and frameworks. Croissant makes datasets more discoverable, portable and interoperable, thereby addressing significant challenges in ML data management and responsible AI. Croissant is already supported by several popular dataset repositories, spanning hundreds of thousands of datasets, ready to be loaded into the most popular ML frameworks. -[^4]: 掩码注意力是图的关键 +[^4]: 用于个性化文本到图像生成的自动化黑盒提示工程 - Masked Attention is All You Need for Graphs + Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation - [https://arxiv.org/abs/2402.10793](https://arxiv.org/abs/2402.10793) + [https://arxiv.org/abs/2403.19103](https://arxiv.org/abs/2403.19103) - 提出了一种在图上学习的简单替代方法,称为掩码注意力(MAG),其利用注意力矩阵来创建定制的注意力模式,在长距离任务上表现出色并胜过其他方法。 + PRISM是一种算法,可以自动识别人类可解释且易传递的提示,从而有效生成所需概念,仅使用黑盒访问T2I模型。 - 图神经网络(GNNs)和消息传递算法的变种主要用于在图上学习,这在很大程度上归功于它们的灵活性、速度和令人满意的性能。然而,设计强大而通用的GNNs需要大量的研究工作,通常依赖于精心选择的手工制作的消息传递操作符。受此启发,我们提出了一种在图上学习的非常简单的替代方法,它完全依赖于注意力。图被表示为节点或边集,并通过掩码注意权重矩阵来强制它们的连接,有效地为每个图创建定制的注意力模式。尽管其简单性,用于图的掩码注意力(MAG)在长距离任务上表现出色,并在55多个节点和图级任务上优于强消息传递基线和更复杂的基于注意力的方法。 + 提示工程对于控制文本到图像(T2I)生成模型的输出是有效的,但由于需要手动制作提示而导致工作繁重。这一挑战促使了自动提示生成算法的发展。然而,这些方法通常在T2I模型之间的可传递性方面遇到困难,需要对基础模型进行白盒访问,并产生非直观的提示。在这项工作中,我们介绍了PRISM,这是一种算法,可以仅使用黑盒访问T2I模型就自动识别人类可解释且易传递的提示,从而有效生成所需概念。受大型语言模型(LLM)越狱的启发,PRISM利用LLM的上下文学习能力来迭代地改进给定参考图像的候选提示分布。我们的实验展示了PRISM在为对象、样式等生成准确提示方面的多样性和有效性。 - arXiv:2402.10793v1 Announce Type: cross Abstract: Graph neural networks (GNNs) and variations of the message passing algorithm are the predominant means for learning on graphs, largely due to their flexibility, speed, and 
satisfactory performance. The design of powerful and general purpose GNNs, however, requires significant research efforts and often relies on handcrafted, carefully-chosen message passing operators. Motivated by this, we propose a remarkably simple alternative for learning on graphs that relies exclusively on attention. Graphs are represented as node or edge sets and their connectivity is enforced by masking the attention weight matrix, effectively creating custom attention patterns for each graph. Despite its simplicity, masked attention for graphs (MAG) has state-of-the-art performance on long-range tasks and outperforms strong message passing baselines and much more involved attention-based methods on over 55 node and graph-level tasks. We also show significantly + arXiv:2403.19103v1 Announce Type: cross Abstract: Prompt engineering is effective for controlling the output of text-to-image (T2I) generative models, but it is also laborious due to the need for manually crafted prompts. This challenge has spurred the development of algorithms for automated prompt generation. However, these methods often struggle with transferability across T2I models, require white-box access to the underlying model, and produce non-intuitive prompts. In this work, we introduce PRISM, an algorithm that automatically identifies human-interpretable and transferable prompts that can effectively generate desired concepts given only black-box access to T2I models. Inspired by large language model (LLM) jailbreaking, PRISM leverages the in-context learning ability of LLMs to iteratively refine the candidate prompts distribution for given reference images. Our experiments demonstrate the versatility and effectiveness of PRISM in generating accurate prompts for objects, sty -[^5]: 通过无监督在图上学习多层感知机(MLP)加速图推理 +[^5]: ChatGPT是否能够基于Twitter提及来预测文章的撤回? - Graph Inference Acceleration by Learning MLPs on Graphs without Supervision + Can ChatGPT predict article retraction based on Twitter mentions? 
- [https://arxiv.org/abs/2402.08918](https://arxiv.org/abs/2402.08918) + [https://arxiv.org/abs/2403.16851](https://arxiv.org/abs/2403.16851) - 该论文提出了一个简单而有效的框架SimMLP,通过在图上无监督学习MLPs,提高了在延迟敏感的应用中的泛化能力。 + 本研究探讨了ChatGPT是否能够基于Twitter提及来预测文章的撤回,研究发现在预测未来被撤回的有问题文章方面是具有一定潜力的。 - 图神经网络(GNNs)已经在各种图学习任务中展示出了有效性,但是它们对消息传递的依赖限制了它们在延迟敏感的应用中的部署,比如金融欺诈检测。最近的研究探索了从GNNs中提取知识到多层感知机(MLPs)来加速推理。然而,这种任务特定的有监督蒸馏限制了对未见节点的泛化,而在延迟敏感的应用中这种情况很常见。为此,我们提出了一种简单而有效的框架SimMLP,用于在图上无监督学习MLPs,以增强泛化能力。SimMLP利用自监督对齐GNNs和MLPs之间的节点特征和图结构之间的精细和泛化的相关性,并提出了两种策略来减轻平凡解的风险。从理论上讲, + 检测有问题的研究文章具有重要意义,本研究探讨了根据被撤回文章在Twitter上的提及是否能够在文章被撤回前发出信号,从而在预测未来被撤回的有问题文章方面发挥作用。分析了包括3,505篇已撤回文章及其相关Twitter提及在内的数据集,以及使用粗糙精确匹配方法获取的具有类似特征的3,505篇未撤回文章。通过四种预测方法评估了Twitter提及在预测文章撤回方面的有效性,包括手动标注、关键词识别、机器学习模型和ChatGPT。手动标注的结果表明,的确有被撤回的文章,其Twitter提及包含在撤回前发出信号的可识别证据,尽管它们只占所有被撤回文章的一小部分。 - arXiv:2402.08918v1 Announce Type: cross Abstract: Graph Neural Networks (GNNs) have demonstrated effectiveness in various graph learning tasks, yet their reliance on message-passing constraints their deployment in latency-sensitive applications such as financial fraud detection. Recent works have explored distilling knowledge from GNNs to Multi-Layer Perceptrons (MLPs) to accelerate inference. However, this task-specific supervised distillation limits generalization to unseen nodes, which are prevalent in latency-sensitive applications. To this end, we present \textbf{\textsc{SimMLP}}, a \textbf{\textsc{Sim}}ple yet effective framework for learning \textbf{\textsc{MLP}}s on graphs without supervision, to enhance generalization. \textsc{SimMLP} employs self-supervised alignment between GNNs and MLPs to capture the fine-grained and generalizable correlation between node features and graph structures, and proposes two strategies to alleviate the risk of trivial solutions. Theoretically, w + arXiv:2403.16851v1 Announce Type: cross Abstract: Detecting problematic research articles timely is a vital task. 
This study explores whether Twitter mentions of retracted articles can signal potential problems with the articles prior to retraction, thereby playing a role in predicting future retraction of problematic articles. A dataset comprising 3,505 retracted articles and their associated Twitter mentions is analyzed, alongside 3,505 non-retracted articles with similar characteristics obtained using the Coarsened Exact Matching method. The effectiveness of Twitter mentions in predicting article retraction is evaluated by four prediction methods, including manual labelling, keyword identification, machine learning models, and ChatGPT. Manual labelling results indicate that there are indeed retracted articles with their Twitter mentions containing recognizable evidence signaling problems before retraction, although they represent only a limited share of all retracted articles with -[^6]: Voronoi Candidates用于贝叶斯优化 +[^6]: 在未观测混杂因素下审计公平性 - Voronoi Candidates for Bayesian Optimization + Auditing Fairness under Unobserved Confounding - [https://arxiv.org/abs/2402.04922](https://arxiv.org/abs/2402.04922) + [https://arxiv.org/abs/2403.14713](https://arxiv.org/abs/2403.14713) - 使用Voronoi候选点边界可以在贝叶斯优化中有效地优化黑盒函数,提高了多起始连续搜索的执行时间。 + 在未观测混杂因素的情况下,本文展示了即使在放宽或甚至在排除所有相关风险因素被观测到的假设的情况下,仍然可以给出对高风险个体分配率的信息丰富的界限。 - 贝叶斯优化(BO)为高效优化黑盒函数提供了一种优雅的方法。然而,采集准则需要进行具有挑战性的内部优化,这可能引起很大的开销。许多实际的BO方法,尤其是在高维情况下,不采用对采集函数进行形式化连续优化,而是在有限的空间填充候选集上进行离散搜索。在这里,我们提议使用候选点,其位于当前设计点的Voronoi镶嵌边界上,因此它们与两个或多个设计点等距离。我们讨论了通过直接采样Voronoi边界而不明确生成镶嵌的策略,从而适应高维度中的大设计。通过使用高斯过程和期望改进来对一组测试问题进行优化,我们的方法在不损失准确性的情况下显著提高了多起始连续搜索的执行时间。 + 决策系统中的一个基本问题是跨越人口统计线存在不公平性。然而,不公平性可能难以量化,特别是如果我们对公平性的理解依赖于难以衡量的风险等观念(例如,对于那些没有其治疗就会死亡的人平等获得治疗)。审计这种不公平性需要准确测量个体风险,而在未观测混杂的现实环境中,难以估计。在这些未观测到的因素“解释”明显差异的情况下,我们可能低估或高估不公平性。在本文中,我们展示了即使在放宽或(令人惊讶地)甚至在排除所有相关风险因素被观测到的假设的情况下,仍然可以对高风险个体的分配率给出信息丰富的界限。我们利用了在许多实际环境中(例如引入新型治疗)我们拥有在任何分配之前的数据的事实。 - Bayesian optimization (BO) offers an elegant approach for efficiently optimizing black-box functions. 
However, acquisition criteria demand their own challenging inner-optimization, which can induce significant overhead. Many practical BO methods, particularly in high dimension, eschew a formal, continuous optimization of the acquisition function and instead search discretely over a finite set of space-filling candidates. Here, we propose to use candidates which lie on the boundary of the Voronoi tessellation of the current design points, so they are equidistant to two or more of them. We discuss strategies for efficient implementation by directly sampling the Voronoi boundary without explicitly generating the tessellation, thus accommodating large designs in high dimension. On a battery of test problems optimized via Gaussian processes with expected improvement, our proposed approach significantly improves the execution time of a multi-start continuous search without a loss in accuracy + arXiv:2403.14713v1 Announce Type: cross Abstract: A fundamental problem in decision-making systems is the presence of inequity across demographic lines. However, inequity can be difficult to quantify, particularly if our notion of equity relies on hard-to-measure notions like risk (e.g., equal access to treatment for those who would die without it). Auditing such inequity requires accurate measurements of individual risk, which is difficult to estimate in the realistic setting of unobserved confounding. In the case that these unobservables "explain" an apparent disparity, we may understate or overstate inequity. In this paper, we show that one can still give informative bounds on allocation rates among high-risk individuals, even while relaxing or (surprisingly) even when eliminating the assumption that all relevant risk factors are observed. 
We utilize the fact that in many real-world settings (e.g., the introduction of a novel treatment) we have data from a period prior to any alloc -[^7]: PAC隐私保护扩散模型 +[^7]: 随机舍入隐式正则化高瘦矩阵 - PAC Privacy Preserving Diffusion Models + Stochastic Rounding Implicitly Regularizes Tall-and-Thin Matrices - [https://arxiv.org/abs/2312.01201](https://arxiv.org/abs/2312.01201) + [https://arxiv.org/abs/2403.12278](https://arxiv.org/abs/2403.12278) - 提出了一种PAC隐私保护扩散模型,通过将私有分类器指导集成到采样过程中增强隐私保护,并发展了一种新的度量标准来衡量隐私水平,在保护性能方面表现出卓越表现。 + 随机舍入技术能有效隐式正则化高瘦矩阵,确保舍入后的矩阵具有完整的列秩。 - 数据隐私保护正在引起研究人员的越来越多的关注。扩散模型(DMs),尤其是具有严格的差分隐私,有可能生成既具有高隐私性又具有良好视觉质量的图像。然而,挑战在于确保在私有化特定数据属性时的强大保护,当前模型在这些方面经常存在不足。为了解决这些挑战,我们引入了PAC隐私保护扩散模型,这是一种利用扩散原理并确保“可能大致正确(PAC)”隐私性的模型。我们通过将私有分类器指导集成到Langevin采样过程中来增强隐私保护。此外,认识到在衡量模型隐私性方面存在差距,我们开发了一种新的度量标准来衡量隐私水平。我们的模型通过这个新度量标准评估,并通过高斯矩阵计算支持PAC界限,表现出更优异的隐私性能。 + 受到随机舍入在机器学习和大规模深度神经网络模型训练中的流行,我们考虑实矩阵$\mathbf{A}$的随机近似舍入,其中行数远远多于列数。我们提供了新颖的理论证据,并通过大量实验评估支持,高概率下,随机舍入矩阵的最小奇异值远离零--无论$\mathbf{A}$接近奇异还是$\mathbf{A}$奇异。换句话说,随机舍入\textit{隐式正则化}高瘦矩阵$\mathbf{A}$,使得舍入后的版本具有完整的列秩。我们的证明利用了随机矩阵理论中的有力结果,以及随机舍入误差不集中在低维列空间的思想。 - arXiv:2312.01201v2 Announce Type: replace-cross Abstract: Data privacy protection is garnering increased attention among researchers. Diffusion models (DMs), particularly with strict differential privacy, can potentially produce images with both high privacy and visual quality. However, challenges arise such as in ensuring robust protection in privatizing specific data attributes, areas where current models often fall short. To address these challenges, we introduce the PAC Privacy Preserving Diffusion Model, a model leverages diffusion principles and ensure Probably Approximately Correct (PAC) privacy. We enhance privacy protection by integrating a private classifier guidance into the Langevin Sampling Process. Additionally, recognizing the gap in measuring the privacy of models, we have developed a novel metric to gauge privacy levels. 
Our model, assessed with this new metric and supported by Gaussian matrix computations for the PAC bound, has shown superior performance in privacy p + arXiv:2403.12278v1 Announce Type: new Abstract: Motivated by the popularity of stochastic rounding in the context of machine learning and the training of large-scale deep neural network models, we consider stochastic nearness rounding of real matrices $\mathbf{A}$ with many more rows than columns. We provide novel theoretical evidence, supported by extensive experimental evaluation that, with high probability, the smallest singular value of a stochastically rounded matrix is well bounded away from zero -- regardless of how close $\mathbf{A}$ is to being rank deficient and even if $\mathbf{A}$ is rank-deficient. In other words, stochastic rounding \textit{implicitly regularizes} tall and skinny matrices $\mathbf{A}$ so that the rounded version has full column rank. Our proofs leverage powerful results in random matrix theory, and the idea that stochastic rounding errors do not concentrate in low-dimensional column spaces. -[^8]: Vecchia-Laplace近似法在潜在高斯过程模型中的迭代方法 +[^8]: 早期方向性收敛在深度齐次神经网络中进行小初始化时的分析 - Iterative Methods for Vecchia-Laplace Approximations for Latent Gaussian Process Models. 
(arXiv:2310.12000v1 [stat.ME]) + Early Directional Convergence in Deep Homogeneous Neural Networks for Small Initializations - [http://arxiv.org/abs/2310.12000](http://arxiv.org/abs/2310.12000) + [https://arxiv.org/abs/2403.08121](https://arxiv.org/abs/2403.08121) - 这篇文章介绍了用于潜在高斯过程模型中的Vecchia-Laplace近似法的迭代方法,相比于传统的Cholesky分解方法,可以显著加快计算速度。 + 本文研究了训练深度齐次神经网络时梯度流动力学的动态性,发现在足够小的初始化下,神经网络的权重在训练早期阶段保持较小规范,并且沿着神经相关函数的KKT点方向近似收敛。 - 潜在高斯过程(GP)模型是灵活的概率非参数函数模型。Vecchia近似是用于克服大数据计算瓶颈的准确近似方法,Laplace近似是一种快速方法,可以近似非高斯似然函数的边缘似然和后验预测分布,并具有渐近收敛保证。然而,当与直接求解方法(如Cholesky分解)结合使用时,Vecchia-Laplace近似的计算复杂度增长超线性地随样本大小增加。因此,与Vecchia-Laplace近似计算相关的运算在通常情况下是最准确的大型数据集时会变得非常缓慢。在本文中,我们提出了几种用于Vecchia-Laplace近似推断的迭代方法,相比于基于Cholesky的计算,可以大大加快计算速度。我们对我们的方法进行了分析。 + 本文研究了训练深度齐次神经网络时梯度流动力学的动态性,这些网络从小初始化开始。本文考虑到具有局部Lipschitz梯度和阶数严格大于两的神经网络。文章证明了对于足够小的初始化,在训练的早期阶段,神经网络的权重保持规范较小,并且在Karush-Kuhn-Tucker (KKT)点处近似沿着神经相关函数的方向收敛。此外,对于平方损失并在神经网络权重上进行可分离假设的情况下,还展示了在损失函数的某些鞍点附近梯度流动动态的类似方向性收敛。 - Latent Gaussian process (GP) models are flexible probabilistic non-parametric function models. Vecchia approximations are accurate approximations for GPs to overcome computational bottlenecks for large data, and the Laplace approximation is a fast method with asymptotic convergence guarantees to approximate marginal likelihoods and posterior predictive distributions for non-Gaussian likelihoods. Unfortunately, the computational complexity of combined Vecchia-Laplace approximations grows faster than linearly in the sample size when used in combination with direct solver methods such as the Cholesky decomposition. Computations with Vecchia-Laplace approximations thus become prohibitively slow precisely when the approximations are usually the most accurate, i.e., on large data sets. In this article, we present several iterative methods for inference with Vecchia-Laplace approximations which make computations considerably faster compared to Cholesky-based calculations. 
We analyze our propo + arXiv:2403.08121v1 Announce Type: new Abstract: This paper studies the gradient flow dynamics that arise when training deep homogeneous neural networks, starting with small initializations. The present work considers neural networks that are assumed to have locally Lipschitz gradients and an order of homogeneity strictly greater than two. This paper demonstrates that for sufficiently small initializations, during the early stages of training, the weights of the neural network remain small in norm and approximately converge in direction along the Karush-Kuhn-Tucker (KKT) points of the neural correlation function introduced in [1]. Additionally, for square loss and under a separability assumption on the weights of neural networks, a similar directional convergence of gradient flow dynamics is shown near certain saddle points of the loss function. -[^9]: 神经网络的记忆化:超越最坏情况 +[^9]: 语音鲁棒基准:用于语音识别的鲁棒性基准 - Memorization with neural nets: going beyond the worst case. (arXiv:2310.00327v1 [stat.ML]) + Speech Robust Bench: A Robustness Benchmark For Speech Recognition - [http://arxiv.org/abs/2310.00327](http://arxiv.org/abs/2310.00327) + [https://arxiv.org/abs/2403.07937](https://arxiv.org/abs/2403.07937) - 本文研究了神经网络的插值问题,提出了一种简单的随机算法,在给定的数据集和两个类的情况下,能够以很高的概率构建一个插值的神经网络。这些结果与训练数据规模无关。 + 提出了一个全面基准(SRB),用于评估自动语音识别(ASR)模型对各种破坏的鲁棒性,发现模型大小和某些建模选择有助于提高鲁棒性,并观察到在不同人口亚组上模型的鲁棒性存在明显差异。 - 在实践中,深度神经网络通常能够轻松地插值其训练数据。为了理解这一现象,许多研究都旨在量化神经网络架构的记忆能力:即在任意放置这些点并任意分配标签的情况下,架构能够插值的最大点数。然而,对于实际数据,人们直觉地期望存在一种良性结构,使得插值在比记忆能力建议的较小网络尺寸上已经发生。在本文中,我们通过采用实例特定的观点来研究插值。我们引入了一个简单的随机算法,它可以在多项式时间内给定一个固定的有限数据集和两个类的情况下,以很高的概率构建出一个插值三层神经网络。所需的参数数量与这两个类的几何特性及其相互排列有关。因此,我们获得了与训练数据规模无关的保证。 + 随着自动语音识别(ASR)模型变得越来越普遍,确保它们在物理世界和数字世界中的各种破坏下进行可靠预测变得愈发重要。我们提出了语音鲁棒基准(SRB),这是一个用于评估ASR模型对各种破坏的鲁棒性的全面基准。SRB由69个输入扰动组成,旨在模拟ASR模型可能在物理世界和数字世界中遇到的各种破坏。我们使用SRB来评估几种最先进的ASR模型的鲁棒性,并观察到模型大小和某些建模选择(如离散表示和自我训练)似乎有助于提高鲁棒性。我们将此分析扩展到衡量ASR模型在来自各种人口亚组的数据上的鲁棒性,即英语和西班牙语使用者以及男性和女性,并观察到模型的鲁棒性在不同亚组之间存在明显差异。 - In practice, deep 
neural networks are often able to easily interpolate their training data. To understand this phenomenon, many works have aimed to quantify the memorization capacity of a neural network architecture: the largest number of points such that the architecture can interpolate any placement of these points with any assignment of labels. For real-world data, however, one intuitively expects the presence of a benign structure so that interpolation already occurs at a smaller network size than suggested by memorization capacity. In this paper, we investigate interpolation by adopting an instance-specific viewpoint. We introduce a simple randomized algorithm that, given a fixed finite dataset with two classes, with high probability constructs an interpolating three-layer neural network in polynomial time. The required number of parameters is linked to geometric properties of the two classes and their mutual arrangement. As a result, we obtain guarantees that are independent of t + arXiv:2403.07937v1 Announce Type: cross Abstract: As Automatic Speech Recognition (ASR) models become ever more pervasive, it is important to ensure that they make reliable predictions under corruptions present in the physical and digital world. We propose Speech Robust Bench (SRB), a comprehensive benchmark for evaluating the robustness of ASR models to diverse corruptions. SRB is composed of 69 input perturbations which are intended to simulate various corruptions that ASR models may encounter in the physical and digital world. We use SRB to evaluate the robustness of several state-of-the-art ASR models and observe that model size and certain modeling choices such as discrete representations, and self-training appear to be conducive to robustness. 
We extend this analysis to measure the robustness of ASR models on data from various demographic subgroups, namely English and Spanish speakers, and males and females, and observed noticeable disparities in the model's robustness across su -[^10]: 我们可以从量子卷积神经网络中学到什么? +[^10]: 用于生成简要住院病程摘要的领域自适应大语言模型的基准测试 - What can we learn from quantum convolutional neural networks?. (arXiv:2308.16664v1 [quant-ph]) + A Benchmark of Domain-Adapted Large Language Models for Generating Brief Hospital Course Summaries - [http://arxiv.org/abs/2308.16664](http://arxiv.org/abs/2308.16664) + [https://arxiv.org/abs/2403.05720](https://arxiv.org/abs/2403.05720) - 通过分析量子卷积神经网络(QCNNs),我们发现它们通过隐藏特征映射嵌入物理系统参数,并且利用量子临界性生成适合的基函数集,池化层选择能够形成高性能决策边界的基函数,而模型的泛化性能依赖于嵌入类型。 + 介绍了一个新的基准测试,评估了用于生成简要住院病程摘要的大语言模型在健康保健领域中的性能并提出相应的自适应策略 - 通过分析量子卷积神经网络(QCNNs),我们可以得出以下结论:1)通过隐藏特征映射,工作于量子数据可以被视为嵌入物理系统参数;2)对于量子相位识别,其高性能可以归因于在基态嵌入期间生成非常适合的基函数集,其中自旋模型的量子临界性导致具有快速变化特征的基函数;3)QCNN的池化层负责选择那些能够有助于形成高性能决策边界的基函数,学习过程对应于适应性测量,使得少量量子比特算符映射到整个寄存器可观测量;4)QCNN模型的泛化强烈依赖于嵌入类型,基于傅里叶基的旋转特征映射需要仔细的特征工程;5)基于有限数量的测量次数的读出的QCNN的准确性和泛化能力倾向于地面态。 + 简要住院病程(BHC)摘要是通过总结临床记录而生成的常见临床文件。虽然大型语言模型(LLMs)在自动化实际任务方面展现出显著能力,但它们在医疗应用(如BHC合成)中的能力尚未得到展示。为了使LLMs能够适应BHC合成,我们引入了一个新颖的基准测试,其中包含从MIMIC-IV记录中提取的经过预处理的数据集,封装了临床记录和简要住院病程(BHC)对。我们评估了两个通用LLMs和三个医疗领域适应的LLMs的性能,以改进从临床记录生成BHC。我们使用临床记录作为输入来生成BHC,采用基于提示的(使用上下文学习)和基于微调的自适应策略来应用于三个开源LLMs(Clinical-T5-Large,Llama2-13B,FLAN-UL2)和两个专有LLMs(GPT-3.5,GPT-4)。我们定量评估了性能。 - We can learn from analyzing quantum convolutional neural networks (QCNNs) that: 1) working with quantum data can be perceived as embedding physical system parameters through a hidden feature map; 2) their high performance for quantum phase recognition can be attributed to generation of a very suitable basis set during the ground state embedding, where quantum criticality of spin models leads to basis functions with rapidly changing features; 3) pooling layers of QCNNs are responsible for picking those basis functions that can contribute to forming a 
high-performing decision boundary, and the learning process corresponds to adapting the measurement such that few-qubit operators are mapped to full-register observables; 4) generalization of QCNN models strongly depends on the embedding type, and that rotation-based feature maps with the Fourier basis require careful feature engineering; 5) accuracy and generalization of QCNNs with readout based on a limited number of shots favor the groun + arXiv:2403.05720v1 Announce Type: cross Abstract: Brief hospital course (BHC) summaries are common clinical documents generated by summarizing clinical notes. While large language models (LLMs) depict remarkable capabilities in automating real-world tasks, their capabilities for healthcare applications such as BHC synthesis have not been shown. To enable the adaptation of LLMs for BHC synthesis, we introduce a novel benchmark consisting of a pre-processed dataset extracted from MIMIC-IV notes, encapsulating clinical note, and brief hospital course (BHC) pairs. We assess the performance of two general-purpose LLMs and three healthcare-adapted LLMs to improve BHC synthesis from clinical notes. Using clinical notes as input for generating BHCs, we apply prompting-based (using in-context learning) and fine-tuning-based adaptation strategies to three open-source LLMs (Clinical-T5-Large, Llama2-13B, FLAN-UL2) and two proprietary LLMs (GPT-3.5, GPT-4). We quantitatively evaluate the performa -[^11]: 一种简单的面向特征分布偏斜联邦学习的数据增强方法 +[^11]: SPEAR:联邦学习中批量精确梯度反演 - A Simple Data Augmentation for Feature Distribution Skewed Federated Learning. 
(arXiv:2306.09363v1 [cs.LG]) + SPEAR:Exact Gradient Inversion of Batches in Federated Learning - [http://arxiv.org/abs/2306.09363](http://arxiv.org/abs/2306.09363) + [https://arxiv.org/abs/2403.03945](https://arxiv.org/abs/2403.03945) - 本文针对特征分布偏斜的联邦学习提出了FedRDN方法,在输入层级上实现了数据增强,将整个联邦数据集的统计信息注入到本地客户端数据中,以缓解特征漂移问题。 + 该论文提出了第一个能够精确重构批量$b >1$的算法,在联邦学习中解决了梯度反演攻击的问题。 - 联邦学习(FL)是一种分布式协作学习方法,可以确保隐私保护。然而,由于数据异构性(即非独立同分布数据),它的性能必然受到影响。本文针对特征分布偏斜的FL场景展开研究,提出了一种通用的数据增强方法,以减轻由本地数据集之间潜在分布不同导致的特征漂移问题。 + 联邦学习是一种流行的协作机器学习框架,在这个框架中,多个客户端仅与服务器共享他们本地数据的梯度更新,而不是实际数据。不幸的是,最近发现梯度反演攻击可以从这些共享的梯度中重构出数据。现有的攻击只能在重要的诚实但好奇设置中对批量大小为$b=1$的数据进行精确重构,对于更大的批量只能进行近似重构。在这项工作中,我们提出了\emph{第一个准确重建批量$b >1$的算法}。这种方法结合了对梯度显式低秩结构的数学见解和基于采样的算法。关键的是,我们利用ReLU诱导的梯度稀疏性,精确地过滤掉大量错误的样本,使最终的重建步骤可行。我们为全连接提供了高效的GPU实现 - Federated learning (FL) facilitates collaborative learning among multiple clients in a distributed manner, while ensuring privacy protection. However, its performance is inevitably degraded as suffering data heterogeneity, i.e., non-IID data. In this paper, we focus on the feature distribution skewed FL scenario, which is widespread in real-world applications. The main challenge lies in the feature shift caused by the different underlying distributions of local datasets. While the previous attempts achieved progress, few studies pay attention to the data itself, the root of this issue. Therefore, the primary goal of this paper is to develop a general data augmentation technique at the input level, to mitigate the feature shift. To achieve this goal, we propose FedRDN, a simple yet remarkably effective data augmentation method for feature distribution skewed FL, which randomly injects the statistics of the dataset from the entire federation into the client's data. 
By this, our method ca + arXiv:2403.03945v1 Announce Type: new Abstract: Federated learning is a popular framework for collaborative machine learning where multiple clients only share gradient updates on their local data with the server and not the actual data. Unfortunately, it was recently shown that gradient inversion attacks can reconstruct this data from these shared gradients. Existing attacks enable exact reconstruction only for a batch size of $b=1$ in the important honest-but-curious setting, with larger batches permitting only approximate reconstruction. In this work, we propose \emph{the first algorithm reconstructing whole batches with $b >1$ exactly}. This approach combines mathematical insights into the explicit low-rank structure of gradients with a sampling-based algorithm. Crucially, we leverage ReLU-induced gradient sparsity to precisely filter out large numbers of incorrect samples, making a final reconstruction step tractable. We provide an efficient GPU implementation for fully connected -[^12]: 评分差值流模型用于隐式生成建模 +[^12]: 具有Polyak动量的非凸随机复合优化 - The Score-Difference Flow for Implicit Generative Modeling. 
(arXiv:2304.12906v1 [cs.LG]) + Non-Convex Stochastic Composite Optimization with Polyak Momentum - [http://arxiv.org/abs/2304.12906](http://arxiv.org/abs/2304.12906) + [https://arxiv.org/abs/2403.02967](https://arxiv.org/abs/2403.02967) - 本文提出了一种新的评分差异流模型(SD flow),它可以最优地减少两个分布之间的散度,同时解决Schr​​ödinger桥问题。与去噪扩散模型不同,它没有对先验分布施加任何限制,在一些基准数据集中优于其他方法。 + 本文研究了具有Polyak动量的随机近端梯度方法,在非凸复合优化问题中实现了最佳收敛速度,无论批量大小如何。 - 隐式生成建模(IGM)旨在生成符合目标数据分布特征的合成数据样本。最近的研究(例如评分匹配网络、扩散模型)从通过环境空间中的动态扰动或流将合成源数据推向目标分布的角度解决了IGM问题。我们引入了任意目标和源分布之间的评分差异(SD)作为流,它可以最优地减少它们之间的Kullback-Leibler散度,同时解决Schr​​ödinger桥问题。我们将SD流应用于方便的代理分布,当且仅当原始分布对齐时,它们是对齐的。我们在某些条件下展示了这种公式与去噪扩散模型的形式一致性。然而,与扩散模型不同,SD流没有对先验分布施加任何限制。我们还表明,在无限辨别器能力的极限下,生成对抗网络的训练包含SD流。我们的实验表明,SD流在几个基准数据集上优于先前的最新技术。 + 随机近端梯度法是广泛使用的随机梯度下降(SGD)方法的一个强大泛化,在机器学习中已经被广泛应用。然而,众所周知,当随机噪声显著时(即仅使用小型或有界批量大小时),该方法在非凸环境中无法收敛。本文关注具有Polyak动量的随机近端梯度方法。我们证明了该方法对于非凸复合优化问题实现了最佳收敛速度,而批量大小大小无关。此外,我们对Polyak动量在复合优化环境中的方差减少效应进行了严格分析,并且我们证明了当近端步骤只能通过近似解来求解时,该方法也会收敛。最后,我们提供了数值实验来验证我们的理论结果。 - Implicit generative modeling (IGM) aims to produce samples of synthetic data matching the characteristics of a target data distribution. Recent work (e.g. score-matching networks, diffusion models) has approached the IGM problem from the perspective of pushing synthetic source data toward the target distribution via dynamical perturbations or flows in the ambient space. We introduce the score difference (SD) between arbitrary target and source distributions as a flow that optimally reduces the Kullback-Leibler divergence between them while also solving the Schr\"odinger bridge problem. We apply the SD flow to convenient proxy distributions, which are aligned if and only if the original distributions are aligned. We demonstrate the formal equivalence of this formulation to denoising diffusion models under certain conditions. However, unlike diffusion models, SD flow places no restrictions on the prior distribution. 
We also show that the training of generative adversarial networks includ + arXiv:2403.02967v1 Announce Type: cross Abstract: The stochastic proximal gradient method is a powerful generalization of the widely used stochastic gradient descent (SGD) method and has found numerous applications in Machine Learning. However, it is notoriously known that this method fails to converge in non-convex settings where the stochastic noise is significant (i.e. when only small or bounded batch sizes are used). In this paper, we focus on the stochastic proximal gradient method with Polyak momentum. We prove this method attains an optimal convergence rate for non-convex composite optimization problems, regardless of batch size. Additionally, we rigorously analyze the variance reduction effect of the Polyak momentum in the composite optimization setting and we show the method also converges when the proximal step can only be solved inexactly. Finally, we provide numerical experiments to validate our theoretical results. + +[^13]: 移除平方根:一种新的高效标度不变版本的AdaGrad + + Remove that Square Root: A New Efficient Scale-Invariant Version of AdaGrad + + [https://arxiv.org/abs/2403.02648](https://arxiv.org/abs/2403.02648) + + KATE是一种新的优化算法,提出了一种与AdaGrad标度不变的适应方法,并在广义线性模型和一般的非凸问题中证明了其标度不变性。数值实验结果表明,KATE在各种场景中均优于AdaGrad并与Adam性能匹配/超越。 + + + + 自适应方法在机器学习中非常流行,因为它们可以降低学习速率调整的成本。本文引入了一种名为KATE的新型优化算法,它提出了一个著名的AdaGrad算法的标度不变适应。我们证明了KATE在广义线性模型案例中的标度不变性。此外,对于一般的光滑非凸问题,我们为KATE建立了一个收敛速率为$O \left(\frac{\log T}{\sqrt{T}} \right)$,与AdaGrad和Adam的最佳收敛速率相匹配。我们还通过不同问题的数值实验将KATE与其他最先进的自适应算法Adam和AdaGrad进行了比较,包括在真实数据上进行图像分类和文本分类等复杂机器学习任务。结果表明,在所有考虑到的场景中,KATE始终胜过AdaGrad,并且在性能上匹配/超越Adam。 + + arXiv:2403.02648v1 Announce Type: cross Abstract: Adaptive methods are extremely popular in machine learning as they make learning rate tuning less expensive. This paper introduces a novel optimization algorithm named KATE, which presents a scale-invariant adaptation of the well-known AdaGrad algorithm. 
We prove the scale-invariance of KATE for the case of Generalized Linear Models. Moreover, for general smooth non-convex problems, we establish a convergence rate of $O \left(\frac{\log T}{\sqrt{T}} \right)$ for KATE, matching the best-known ones for AdaGrad and Adam. We also compare KATE to other state-of-the-art adaptive algorithms Adam and AdaGrad in numerical experiments with different problems, including complex machine learning tasks like image classification and text classification on real data. The results indicate that KATE consistently outperforms AdaGrad and matches/surpasses the performance of Adam in all considered scenarios. + +[^14]: RNNs还不是Transformer:在上下文检索中的关键瓶颈 + + RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval + + [https://arxiv.org/abs/2402.18510](https://arxiv.org/abs/2402.18510) + + 本文研究了RNNs和Transformer在处理算法问题时的表现能力差距,发现RNNs存在关键瓶颈,即无法完美地从上下文中检索信息,导致无法像Transformer那样轻松解决需要这种能力的任务。 + + + + 本文探讨循环神经网络(RNNs)和Transformer在解决算法问题时的表示能力差距。我们重点关注RNNs是否能在处理长序列时,通过Chain-of-Thought (CoT)提示,与Transformer的性能相匹配。我们的理论分析显示CoT可以改进RNNs,但无法弥补与Transformer之间的差距。关键瓶颈在于RNNs无法完全从上下文中检索信息,即使经过CoT的增强:对于几个明确或隐式需要这种能力的任务,如联想召回和确定图是否为树,我们证明RNNs表达能力不足以解决这些任务,而Transformer可以轻松解决。相反,我们证明采用增强RNNs上下文检索能力的技术,包括 + + arXiv:2402.18510v1 Announce Type: cross Abstract: This paper investigates the gap in representation powers of Recurrent Neural Networks (RNNs) and Transformers in the context of solving algorithmic problems. We focus on understanding whether RNNs, known for their memory efficiency in handling long sequences, can match the performance of Transformers, particularly when enhanced with Chain-of-Thought (CoT) prompting. Our theoretical analysis reveals that CoT improves RNNs but is insufficient to close the gap with Transformers. 
A key bottleneck lies in the inability of RNNs to perfectly retrieve information from the context, even with CoT: for several tasks that explicitly or implicitly require this capability, such as associative recall and determining if a graph is a tree, we prove that RNNs are not expressive enough to solve the tasks while Transformers can solve them with ease. Conversely, we prove that adopting techniques to enhance the in-context retrieval capability of RNNs, inclu + +[^15]: 融合编码器网络 + + Fusion Encoder Networks + + [https://arxiv.org/abs/2402.15883](https://arxiv.org/abs/2402.15883) + + FENs是一种神经网络算法,具有对数深度且可以在线性时间内处理序列,关键创新在于通过训练大致线性数量的常深度神经网络并行学习。 + + + + 在本文中,我们提出了一种名为融合编码器网络(FENs)的算法类:用于创建将固定长度序列映射到输出的神经网络。生成的神经网络仅具有对数深度(减轻数据在网络中传播时的退化),可以在线性时间内处理序列(或者在具有线性处理器数量的对数时间内)。FENs的关键属性是它们通过训练大致线性数量的常深度神经网络并行学习。这些网络具有常深度意味着反向传播效果良好。需要注意的是,目前FENs的性能仅仅是推测,因为我们尚未实现它们。 + + arXiv:2402.15883v1 Announce Type: new Abstract: In this paper we present fusion encoder networks (FENs): a class of algorithms for creating neural networks that map fixed-length sequences to outputs. The resulting neural network has only logarithmic depth (alleviating the degradation of data as it propagates through the network) and can process sequences in linear time (or in logarithmic time with a linear number of processors). The crucial property of FENs is that they learn by training a quasi-linear number of constant-depth neural networks in parallel. The fact that these networks are constant depth means that backpropagation works well. We note that currently the performance of FENs is only conjectured as we are yet to implement them. 
+ +[^16]: 关于大学习率下梯度下降的稳定性 + + On the Stability of Gradient Descent for Large Learning Rate + + [https://arxiv.org/abs/2402.13108](https://arxiv.org/abs/2402.13108) + + 本文研究了线性神经网络在二次损失函数下的优化问题,证明了梯度下降映射的非奇异性以及全局最小值点集的光滑流形特性,为理解大学习率下梯度下降的稳定性提供了重要线索。 + + + + 目前对理解“稳定性边缘(EoS)”现象存在着相当大的兴趣,这一现象在神经网络训练中被观察到,其特点是损失函数在不同纪元间的非单调下降,而损失的陡峭度(Hessian的谱范数)逐渐接近并稳定在2/(学习率)附近。最近有人提出了使用梯度下降训练时出现EoS的原因——沿梯度下降轨迹附近缺乏平坦的极小值点,同时存在紧致的正向不变集。在本文中,我们证明了在二次损失函数下优化的线性神经网络满足第一个假设以及第二个假设的一个必要条件。更具体地,我们证明了梯度下降映射是非奇异的,损失函数的全局最小值点集构成一个光滑流形,并且稳定的极小值构成有界子集。 + + arXiv:2402.13108v1 Announce Type: new Abstract: There currently is a significant interest in understanding the Edge of Stability (EoS) phenomenon, which has been observed in neural networks training, characterized by a non-monotonic decrease of the loss function over epochs, while the sharpness of the loss (spectral norm of the Hessian) progressively approaches and stabilizes around 2/(learning rate). Reasons for the existence of EoS when training using gradient descent have recently been proposed -- a lack of flat minima near the gradient descent trajectory together with the presence of compact forward-invariant sets. In this paper, we show that linear neural networks optimized under a quadratic loss function satisfy the first assumption and also a necessary condition for the second assumption. 
More precisely, we prove that the gradient descent map is non-singular, the set of global minimizers of the loss function forms a smooth manifold, and the stable minima form a bounded subset i + +[^17]: 基于查询的对抗性提示生成 + + Query-Based Adversarial Prompt Generation + + [https://arxiv.org/abs/2402.12329](https://arxiv.org/abs/2402.12329) + + 该研究提出了一种基于查询的对抗性攻击方法,通过利用远程语言模型的 API 访问构造对抗性示例,使模型以更高概率发出有害字符串,而非仅仅基于模型之间的转移性攻击。 + + + + 最近的研究表明,可以构造对抗性示例,导致一个对其进行了调整的语言模型产生有害字符串或执行有害行为。现有的攻击要么在白盒设置中(完全访问模型权重),要么通过可转移性:一种现象,即在一个模型上精心设计的对抗性示例通常在其他模型上仍然有效。我们通过基于查询的攻击改进以前的工作,利用 API 访问远程语言模型来构造对抗性示例,使模型以(明显)更高的概率发出有害字符串,而不能仅仅使用转移攻击。我们在 GPT-3.5 和 OpenAI 的安全分类器上验证了我们的攻击;我们能够让 GPT-3.5 发出有害字符串,而目前的转移攻击失败了,并且我们几乎以 100% 的概率规避了安全分类器。 + + arXiv:2402.12329v1 Announce Type: cross Abstract: Recent work has shown it is possible to construct adversarial examples that cause an aligned language model to emit harmful strings or perform harmful behavior. Existing attacks work either in the white-box setting (with full access to the model weights), or through transferability: the phenomenon that adversarial examples crafted on one model often remain effective on other models. We improve on prior work with a query-based attack that leverages API access to a remote language model to construct adversarial examples that cause the model to emit harmful strings with (much) higher probability than with transfer-only attacks. We validate our attack on GPT-3.5 and OpenAI's safety classifier; we can cause GPT-3.5 to emit harmful strings that current transfer attacks fail at, and we can evade the safety classifier with nearly 100% probability. 
+ +[^18]: 贝叶斯参数高效微调以克服灾难性遗忘 + + Bayesian Parameter-Efficient Fine-Tuning for Overcoming Catastrophic Forgetting + + [https://arxiv.org/abs/2402.12220](https://arxiv.org/abs/2402.12220) + + 这项研究展示了如何利用贝叶斯学习技术应用于参数高效微调,以防止灾难性遗忘,实现了预训练知识的保留,并在语言建模和语音合成任务中取得成功。 + + + + 虽然最初是被文本转语音合成模型的自适应所激发,但我们认为更通用的参数高效微调(PEFT)是进行这种自适应的适当框架。然而,灾难性遗忘仍然是PEFT面临的问题,它损害了预训练模型固有的能力。我们证明现有的贝叶斯学习技术可以应用于PEFT,以防止灾难性遗忘,只要能够可微地计算微调层的参数转换。在一系列关于语言建模和语音合成任务的基础性实验中,我们利用建立的拉普拉斯近似,包括对角线和Kronecker分解方法,来正则化PEFT与低秩适应(LoRA)并比较它们在保留预训练知识方面的性能。我们的结果表明,我们的方法可以克服灾难性遗忘,而不会降低微调性能。 + + arXiv:2402.12220v1 Announce Type: cross Abstract: Although motivated by the adaptation of text-to-speech synthesis models, we argue that more generic parameter-efficient fine-tuning (PEFT) is an appropriate framework to do such adaptation. However, catastrophic forgetting remains an issue with PEFT, damaging the pre-trained model's inherent capabilities. We demonstrate that existing Bayesian learning techniques can be applied to PEFT to prevent catastrophic forgetting as long as the parameter shift of the fine-tuned layers can be calculated differentiably. In a principled series of experiments on language modeling and speech synthesis tasks, we utilize established Laplace approximations, including diagonal and Kronecker factored approaches, to regularize PEFT with the low-rank adaptation (LoRA) and compare their performance in pre-training knowledge preservation. 
Our results demonstrate that catastrophic forgetting can be overcome by our methods without degrading the fine-tuning perfo + +[^19]: CHEMREASONER:使用量子化学反馈在大型语言模型的知识空间中进行启发式搜索 + + CHEMREASONER: Heuristic Search over a Large Language Model's Knowledge Space using Quantum-Chemical Feedback + + [https://arxiv.org/abs/2402.10980](https://arxiv.org/abs/2402.10980) + + 通过将大型语言模型推理与量子化学反馈相结合,我们引入了一个AI引导的计算筛选框架,将催化剂发现形式化为一个不确定环境,从而实现高效催化剂的积极搜索 + + + + arXiv:2402.10980v1 类型公告:跨领域 摘要:发现新的催化剂对于设计新的更高效的化学过程至关重要,以实现向可持续未来的过渡。我们引入了一种人工智能引导的计算筛选框架,将语言推理与基于量子化学的三维原子表示的反馈统一起来。我们的方法将催化剂发现构建为一个不确定环境,其中一个代理通过大型语言模型(LLM)推导的假设与基于原子图神经网络(GNN)的反馈的迭代组合,积极搜索高效催化剂。在中间搜索步骤确定的催化剂经过基于空间定向、反应途径和稳定性的结构评估。基于吸附能和势垒的评分函数引导在LLM的知识空间中向能量有利、高效的催化剂探索。我们引入了可以自动规划的方法 + + arXiv:2402.10980v1 Announce Type: cross Abstract: The discovery of new catalysts is essential for the design of new and more efficient chemical processes in order to transition to a sustainable future. We introduce an AI-guided computational screening framework unifying linguistic reasoning with quantum-chemistry based feedback from 3D atomistic representations. Our approach formulates catalyst discovery as an uncertain environment where an agent actively searches for highly effective catalysts via the iterative combination of large language model (LLM)-derived hypotheses and atomistic graph neural network (GNN)-derived feedback. Identified catalysts in intermediate search steps undergo structural evaluation based on spatial orientation, reaction pathways, and stability. Scoring functions based on adsorption energies and barriers steer the exploration in the LLM's knowledge space toward energetically favorable, high-efficiency catalysts. 
We introduce planning methods that automaticall + +[^20]: 异步私有联邦学习中的动量近似 + + Momentum Approximation in Asynchronous Private Federated Learning + + [https://arxiv.org/abs/2402.09247](https://arxiv.org/abs/2402.09247) + + 本文提出了动量近似方法,在异步私有联邦学习(FL)中有效结合了动量和异步协议的技术,通过最小化动量更新的偏差来改进模型性能。实证研究证明了动量近似在基准FL数据集上的有效性。 + + + + 异步协议已被证明能够提高大规模客户端联邦学习(FL)的可扩展性。同时,基于动量的方法可以在同步FL中实现最佳模型质量。然而,在异步FL算法中简单地应用动量会导致收敛速度变慢和模型性能下降。如何有效地结合这两种技术以实现双赢目前尚不清楚。在本文中,我们发现异步性引入了对动量更新的隐含偏差。为了解决这个问题,我们提出了动量近似,通过找到所有历史模型更新的最佳加权平均值来最小化偏差。动量近似与安全聚合和差分隐私是兼容的,并且可以在生产的FL系统中很容易地集成,只需较小的通信和存储成本。我们在基准FL数据集上进行了实证研究,证明了动量近似在性能上的改进效果。 + + arXiv:2402.09247v1 Announce Type: new Abstract: Asynchronous protocols have been shown to improve the scalability of federated learning (FL) with a massive number of clients. Meanwhile, momentum-based methods can achieve the best model quality in synchronous FL. However, naively applying momentum in asynchronous FL algorithms leads to slower convergence and degraded model performance. It is still unclear how to effectively combine these two techniques to achieve a win-win. In this paper, we find that asynchrony introduces implicit bias to momentum updates. In order to address this problem, we propose momentum approximation that minimizes the bias by finding an optimal weighted average of all historical model updates. Momentum approximation is compatible with secure aggregation as well as differential privacy, and can be easily integrated in production FL systems with a minor communication and storage cost. 
We empirically demonstrate that on benchmark FL datasets, momentum appro + +[^21]: 学习可解释概念:统一因果表示学习与基础模型 + + Learning Interpretable Concepts: Unifying Causal Representation Learning and Foundation Models + + [https://arxiv.org/abs/2402.09236](https://arxiv.org/abs/2402.09236) + + 本研究将因果表示学习和基础模型相结合,研究了如何从数据中学习人类可解释的概念。实验证明了这一统一方法的实用性。 + + + + 构建智能机器学习系统有两种广泛的方法。一种方法是构建天生可解释的模型,这是因果表示学习领域的努力方向。另一种方法是构建高性能的基础模型,然后投入努力去理解它们的工作原理。本研究将这两种方法联系起来,研究如何从数据中学习人类可解释的概念。通过结合这两个领域的思想,我们正式定义了“概念”这一概念,并展示了它们可以从多样的数据中被可证明地恢复出来。对于合成数据和大型语言模型的实验证明了我们统一方法的实用性。 + + arXiv:2402.09236v1 Announce Type: cross Abstract: To build intelligent machine learning systems, there are two broad approaches. One approach is to build inherently interpretable models, as endeavored by the growing field of causal representation learning. The other approach is to build highly-performant foundation models and then invest efforts into understanding how they work. In this work, we relate these two approaches and study how to learn human-interpretable concepts from data. Weaving together ideas from both fields, we formally define a notion of concepts and show that they can be provably recovered from diverse data. Experiments on synthetic data and large language models show the utility of our unified approach. + +[^22]: 基于个性化人类反馈的个性化语言建模 + + Personalized Language Modeling from Personalized Human Feedback + + [https://arxiv.org/abs/2402.05133](https://arxiv.org/abs/2402.05133) + + 该论文提出了一个个性化语言模型的方法,通过在用户的反馈数据中引入个性化特征来解决强化学习框架在多样化用户偏好下存在的问题。 + + + + 从人类反馈中进行强化学习(RLHF)是目前主流的框架,用于微调大型语言模型以更好地符合人类偏好。然而,在这个框架下开发的算法的基本前提在用户偏好多样化的情况下可能会出现问题。在本文中,我们旨在通过开发个性化语言模型的方法来解决这个问题。我们首先正式介绍了从个性化人类反馈中学习的任务,并解释了为什么在这种情况下普通的RLHF可能会存在问题。然后,我们提出了一个通用的个性化-RLHF(P-RLHF)框架,需要同时学习用户模型和语言(或奖励)模型。用户模型接收用户信息并输出用户表示。其结构编码了我们对反馈数据中用户偏好的假设。我们为个性化奖励建模和个性化直接偏好优化开发了新的学习目标。 + + Reinforcement Learning from Human Feedback (RLHF) is the current dominating framework to fine-tune large language models to better align with human preferences. 
However, the underlying premise of algorithms developed under this framework can be problematic when user preferences encoded in human feedback are diverse. In this work, we aim to address this problem by developing methods for building personalized language models. We first formally introduce the task of learning from personalized human feedback and explain why vanilla RLHF can be problematic in this context. We then propose a general Personalized-RLHF (P-RLHF) framework, which requires one to jointly learn a user model and a language (or reward) model. The user model takes in user information and outputs user representations. Its structure encodes our assumptions about user preferences underlying the feedback data. We develop new learning objectives for personalized reward modeling and personalized Direct Preference Optimizat + +[^23]: ActiveAnno3D - 一种用于多模态3D物体检测的主动学习框架 + + ActiveAnno3D - An Active Learning Framework for Multi-Modal 3D Object Detection + + [https://arxiv.org/abs/2402.03235](https://arxiv.org/abs/2402.03235) + + 这项工作提出了一种用于多模态3D物体检测的主动学习框架ActiveAnno3D。通过选择最具信息量的训练数据样本进行标注,我们能够在使用一半的训练数据时实现与传统方法相近的检测性能。 + + + + 大规模数据集的筛选仍然需要大量的时间和资源,数据通常需要人工标注,创建高质量数据集的难题依然存在。在这项工作中,我们使用主动学习的方法来解决多模态3D物体检测中的研究空白。我们提出了ActiveAnno3D,一个用于选择最具信息量的训练数据样本进行标注的主动学习框架。我们探索了各种连续训练方法,并集成了在计算需求和检测性能方面最高效的方法。此外,我们对nuScenes和TUM Traffic Intersection数据集进行了大量实验和消融研究,使用BEVFusion和PV-RCNN进行了测试。我们展示了当仅使用TUM Traffic Intersection数据集的一半训练数据(77.25 mAP相比于83.50 mAP)时,使用PV-RCNN和基于熵的查询策略几乎可以达到相同的性能,而BEVFusion则在使用一半的训练数据时获得了64.31的mAP。 + + The curation of large-scale datasets is still costly and requires much time and resources. Data is often manually labeled, and the challenge of creating high-quality datasets remains. In this work, we fill the research gap using active learning for multi-modal 3D object detection. We propose ActiveAnno3D, an active learning framework to select data samples for labeling that are of maximum informativeness for training. 
We explore various continuous training methods and integrate the most efficient method regarding computational demand and detection performance. Furthermore, we perform extensive experiments and ablation studies with BEVFusion and PV-RCNN on the nuScenes and TUM Traffic Intersection dataset. We show that we can achieve almost the same performance with PV-RCNN and the entropy-based query strategy when using only half of the training data (77.25 mAP compared to 83.50 mAP) of the TUM Traffic Intersection dataset. BEVFusion achieved an mAP of 64.31 when using half of the trai + +[^24]: FuseMoE:用于灵活多模态融合的专家混合Transformer + + FuseMoE: Mixture-of-Experts Transformers for Fleximodal Fusion + + [https://arxiv.org/abs/2402.03226](https://arxiv.org/abs/2402.03226) + + 本论文提出了一种名为FuseMoE的专家混合Transformer框架,通过创新的门控函数实现灵活融合多模态数据,能够有效地处理缺失模态和不规则采样数据,同时改善模型的预测性能,在临床风险预测任务中具有实际应用价值。 + + + + 随着机器学习模型在关键领域越来越多地处理多模态数据,它们面临处理多种模态的双重挑战,这些模态经常因缺失元素而不完整,以及收集样本的时间不规则性和稀疏性。成功利用这种复杂数据,同时克服高质量训练样本的稀缺性,是提高这些模型预测性能的关键。我们引入了``FuseMoE'',这是一个集成创新门控函数的专家混合框架。FuseMoE旨在整合多种模态,并且在处理缺失模态和不规则采样数据轨迹的情况下非常有效。在理论上,我们独特的门控函数有助于提高收敛速度,在多个下游任务中表现更好。FuseMoE的实际实用性通过一系列具有挑战性的临床风险预测任务得到验证。 + + As machine learning models in critical fields increasingly grapple with multimodal data, they face the dual challenges of handling a wide array of modalities, often incomplete due to missing elements, and the temporal irregularity and sparsity of collected samples. Successfully leveraging this complex data, while overcoming the scarcity of high-quality training samples, is key to improving these models' predictive performance. We introduce ``FuseMoE'', a mixture-of-experts framework incorporated with an innovative gating function. Designed to integrate a diverse number of modalities, FuseMoE is effective in managing scenarios with missing modalities and irregularly sampled data trajectories. 
Theoretically, our unique gating function contributes to enhanced convergence rates, leading to better performance in multiple downstream tasks. The practical utility of FuseMoE in real world is validated by a challenging set of clinical risk prediction tasks. + +[^25]: TopoX: 一个用于拓扑域上的机器学习的Python软件包套件 + + TopoX: A Suite of Python Packages for Machine Learning on Topological Domains + + [https://arxiv.org/abs/2402.02441](https://arxiv.org/abs/2402.02441) + + TopoX是一个用于在拓扑域上进行机器学习的Python软件包套件,包含了构建、计算和嵌入拓扑域的功能,并提供了一套全面的高阶消息传递功能工具箱。 + + + + 我们介绍了topox,一个提供可靠且用户友好的Python软件包套件,用于在拓扑域(扩展了图的领域)上进行计算和机器学习:超图、单纯、胞腔、路径和组合复合体。topox由三个软件包组成:toponetx用于构建和计算这些域,包括节点、边和高阶单元的处理;topoembedx提供了将拓扑域嵌入到向量空间的方法,类似于流行的基于图的嵌入算法,如node2vec;topomodelx建立在PyTorch之上,为拓扑域上的神经网络提供了一套全面的高阶消息传递功能工具箱。topox的源代码经过广泛的文档化和单元测试,并在https://github.com/pyt-team以MIT许可证的形式提供。 + + We introduce topox, a Python software suite that provides reliable and user-friendly building blocks for computing and machine learning on topological domains that extend graphs: hypergraphs, simplicial, cellular, path and combinatorial complexes. topox consists of three packages: toponetx facilitates constructing and computing on these domains, including working with nodes, edges and higher-order cells; topoembedx provides methods to embed topological domains into vector spaces, akin to popular graph-based embedding algorithms such as node2vec; topomodelx is built on top of PyTorch and offers a comprehensive toolbox of higher-order message passing functions for neural networks on topological domains. The extensively documented and unit-tested source code of topox is available under MIT license at https://github.com/pyt-team. 
+ +[^26]: 动态增量优化用于最佳子集选择 + + Dynamic Incremental Optimization for Best Subset Selection + + [https://arxiv.org/abs/2402.02322](https://arxiv.org/abs/2402.02322) + + 本文研究了一类$\ell_0$正则化问题的对偶形式,并提出了一种高效的原对偶算法,通过充分利用对偶范围估计和增量策略,提高了最佳子集选择问题的解决方案的效率和统计性质。 + + + + 最佳子集选择被认为是稀疏学习问题的“黄金标准”。已经提出了各种优化技术来攻击这个非光滑非凸问题。本文研究了一类$\ell_0$正则化问题的对偶形式。基于原始问题和对偶问题的结构,我们提出了一种高效的原对偶算法。通过充分利用对偶范围估计和增量策略,我们的算法潜在地减少了冗余计算并改进了最佳子集选择的解决方案。理论分析和对合成和真实数据集的实验验证了所提出解决方案的效率和统计性质。 + + Best subset selection is considered the `gold standard' for many sparse learning problems. A variety of optimization techniques have been proposed to attack this non-smooth non-convex problem. In this paper, we investigate the dual forms of a family of $\ell_0$-regularized problems. An efficient primal-dual algorithm is developed based on the primal and dual problem structures. By leveraging the dual range estimation along with the incremental strategy, our algorithm potentially reduces redundant computation and improves the solutions of best subset selection. Theoretical analysis and experiments on synthetic and real-world datasets validate the efficiency and statistical properties of the proposed solutions. + +[^27]: GD-CAF:用于降水预报的图形双流卷积注意力融合 + + GD-CAF: Graph Dual-stream Convolutional Attention Fusion for Precipitation Nowcasting + + [https://arxiv.org/abs/2401.07958](https://arxiv.org/abs/2401.07958) + + GD-CAF提出了一种新颖的方法,将降水预报作为一个时空图序列预报问题,利用图形双流卷积注意力融合来学习历史降水图并在不同空间位置上预测未来的降水。 + + + + 精确的降水预报对于各种应用至关重要,包括洪水预测、灾害管理、优化农业活动、管理交通路线和可再生能源。本文将降水预报形式化为时空图序列预报问题,提出了一种名为图形双流卷积注意力融合(GD-CAF)的新方法,旨在从历史降水图的时空图中学习,并预测未来不同空间位置的降水。 + + arXiv:2401.07958v2 Announce Type: replace Abstract: Accurate precipitation nowcasting is essential for various applications, including flood prediction, disaster management, optimizing agricultural activities, managing transportation routes and renewable energy. 
While several studies have addressed this challenging task from a sequence-to-sequence perspective, most of them have focused on a single area without considering the existing correlation between multiple disjoint regions. In this paper, we formulate precipitation nowcasting as a spatiotemporal graph sequence nowcasting problem. In particular, we introduce Graph Dual-stream Convolutional Attention Fusion (GD-CAF), a novel approach designed to learn from historical spatiotemporal graph of precipitation maps and nowcast future time step ahead precipitation at different spatial locations. GD-CAF consists of spatio-temporal convolutional attention as well as gated fusion modules which are equipped with depthwise-separable convolut + +[^28]: 使用PAC-Bayes界限同时控制多个错误 + + Controlling Multiple Errors Simultaneously with a PAC-Bayes Bound + + [https://arxiv.org/abs/2202.05560](https://arxiv.org/abs/2202.05560) + + 该研究提出了一种PAC-Bayes界限,能够同时控制多个错误,并提供丰富的信息,适用于回归中测试损失分布或分类中不同错误分类的概率。 + + + + 当前的PAC-Bayes泛化界限仅限于性能的标量度量,如损失或错误率。我们提供了第一个能够提供丰富信息的PAC-Bayes界限,通过界定一组M种错误类型的经验概率与真实概率之间的Kullback-Leibler散度来控制可能结果的整个分布。 + + arXiv:2202.05560v2 Announce Type: replace-cross Abstract: Current PAC-Bayes generalisation bounds are restricted to scalar metrics of performance, such as the loss or error rate. However, one ideally wants more information-rich certificates that control the entire distribution of possible outcomes, such as the distribution of the test loss in regression, or the probabilities of different misclassifications. We provide the first PAC-Bayes bound capable of providing such rich information by bounding the Kullback-Leibler divergence between the empirical and true probabilities of a set of M error types, which can either be discretized loss values for regression, or the elements of the confusion matrix (or a partition thereof) for classification. We transform our bound into a differentiable training objective. 
Our bound is especially useful in cases where the severity of different mis-classifications may change over time; existing PAC-Bayes bounds can only bound a particular pre-decided w + +[^29]: 循环模型中含有隐藏混杂因子的因果发现方法的比较研究 + + Comparative Study of Causal Discovery Methods for Cyclic Models with Hidden Confounders. (arXiv:2401.13009v1 [cs.LG]) + + [http://arxiv.org/abs/2401.13009](http://arxiv.org/abs/2401.13009) + + 对于循环模型中含有隐藏混杂因子的因果发现,已经出现了能够处理这种情况的多种技术方法。 + + + + 如今,对因果发现的需求无处不在。理解系统中部分之间的随机依赖性以及实际的因果关系对科学的各个部分都至关重要。因此,寻找可靠的方法来检测因果方向的需求不断增长。在过去的50年里,出现了许多因果发现算法,但大多数仅适用于系统没有反馈环路并且具有因果充分性的假设,即没有未测量的子系统能够影响多个已测量变量。这是不幸的,因为这些限制在实践中往往不能假定。反馈是许多过程的一个重要特性,现实世界的系统很少是完全隔离和完全测量的。幸运的是,在最近几年中,已经发展了几种能够处理循环的、因果不充分的系统的技术。随着多种方法的出现,一种实际的应用方法开始变得可能。 + + Nowadays, the need for causal discovery is ubiquitous. A better understanding of not just the stochastic dependencies between parts of a system, but also the actual cause-effect relations, is essential for all parts of science. Thus, the need for reliable methods to detect causal directions is growing constantly. In the last 50 years, many causal discovery algorithms have emerged, but most of them are applicable only under the assumption that the systems have no feedback loops and that they are causally sufficient, i.e. that there are no unmeasured subsystems that can affect multiple measured variables. This is unfortunate since those restrictions can often not be presumed in practice. Feedback is an integral feature of many processes, and real-world systems are rarely completely isolated and fully measured. Fortunately, in recent years, several techniques, that can cope with cyclic, causally insufficient systems, have been developed. And with multiple methods available, a practical ap + +[^30]: 二进制特征屏蔽优化用于特征选择 + + Binary Feature Mask Optimization for Feature Selection. 
(arXiv:2401.12644v1 [cs.LG]) + + [http://arxiv.org/abs/2401.12644](http://arxiv.org/abs/2401.12644) + + 这个论文提出了一种新颖的特征选择框架,通过使用特征屏蔽方法来消除特征,而不是从数据集中移除它们。这种方法不需要重新训练机器学习模型,可以综合考虑特征子集的重要性,为通用机器学习模型的特征选择问题提供了一种新的解决方案。 + + + + 我们研究了通用机器学习模型的特征选择问题。我们引入了一种新颖的框架,该框架考虑了模型的预测结果来选择特征。我们的框架通过使用一种新颖的特征屏蔽方法,在特征选择过程中消除特征,而不是从数据集中完全移除它们。这使我们能够在特征选择过程中使用相同的机器学习模型,而不像其他特征选择方法那样需要在每次迭代中重新训练机器学习模型,因为数据集的维度不同。我们使用机器学习模型的预测结果来获取屏蔽操作符,这为模型的预测性能提供了对特征子集的全面观察。特征选择文献中存在各种方法。然而,没有研究引入一个针对通用机器学习模型的无需训练的框架,以整体考虑特征子集的重要性,而不是只关注单个特征的重要性。 + + We investigate feature selection problem for generic machine learning (ML) models. We introduce a novel framework that selects features considering the predictions of the model. Our framework innovates by using a novel feature masking approach to eliminate the features during the selection process, instead of completely removing them from the dataset. This allows us to use the same ML model during feature selection, unlike other feature selection methods where we need to train the ML model again as the dataset has different dimensions on each iteration. We obtain the mask operator using the predictions of the ML model, which offers a comprehensive view on the subsets of the features essential for the predictive performance of the model. A variety of approaches exist in the feature selection literature. However, no study has introduced a training-free framework for a generic ML model to select features while considering the importance of the feature subsets as a whole, instead of focusi + +[^31]: xTrimoPGLM: 统一的百亿规模预训练蛋白质语言模型,用于解析蛋白质的语言 + + xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein. 
(arXiv:2401.06199v1 [q-bio.QM]) + + [http://arxiv.org/abs/2401.06199](http://arxiv.org/abs/2401.06199) + + xTrimoPGLM是一个统一的100亿规模预训练蛋白质语言模型,能够同时处理蛋白质理解和生成任务,通过创新的预训练框架和大规模的参数训练,显著优于其他先进模型,在18个蛋白理解基准测试中取得了成功,并能够实现对蛋白质结构的原子分辨率观察。 + + + + 蛋白质语言模型在学习蛋白质序列中的生物信息方面显示出显著的成功。然而,大多数现有模型局限于自编码或自回归的预训练目标,这使得它们在处理蛋白质理解和生成任务时很难同时进行。我们提出了一个统一的蛋白质语言模型,xTrimoPGLM,通过创新的预训练框架同时解决这两类任务。我们的关键技术贡献是探索这两类目标的兼容性和联合优化的潜力,从而导致了一个以前所未有的规模,使用1000亿参数和1万亿训练标记来训练xTrimoPGLM的策略。我们广泛的实验证明,1)xTrimoPGLM在四个类别的18个蛋白理解基准测试中明显优于其他先进基线。该模型还有助于对蛋白质结构进行原子分辨率的观察,从而实现了对蛋白质结构的理解和生成。 + + Protein language models have shown remarkable success in learning biological information from protein sequences. However, most existing models are limited by either autoencoding or autoregressive pre-training objectives, which makes them struggle to handle protein understanding and generation tasks concurrently. We propose a unified protein language model, xTrimoPGLM, to address these two types of tasks simultaneously through an innovative pre-training framework. Our key technical contribution is an exploration of the compatibility and the potential for joint optimization of the two types of objectives, which has led to a strategy for training xTrimoPGLM at an unprecedented scale of 100 billion parameters and 1 trillion training tokens. Our extensive experiments reveal that 1) xTrimoPGLM significantly outperforms other advanced baselines in 18 protein understanding benchmarks across four categories. The model also facilitates an atomic-resolution view of protein structures, leading to + +[^32]: 使用特征变换进行交通市场利率预测 + + Transportation Market Rate Forecast Using Signature Transform. 
(arXiv:2401.04857v1 [cs.LG]) + + [http://arxiv.org/abs/2401.04857](http://arxiv.org/abs/2401.04857) + + 本论文提出了一种基于特征变换的新型统计方法,用于解决交通市场利率的预测挑战。该方法具有通用的非线性属性和特征变换核函数,能够高效生成特征,并在预测过程中准确识别季节性和制度转换。 + + + + 目前,亚马逊在交通市场利率预测上依赖第三方,尽管这些预测质量差且缺乏可解释性。虽然交通市场利率通常很难准确预测,但我们开发了一种基于特征变换的新型统计技术来解决这些挑战,并构建了一个预测和自适应模型来预测市场利率。这种新技术基于特征变换的两个关键属性。第一个是其通用的非线性,它线性化特征空间,从而将预测问题转化为线性回归分析;第二个是特征变换核函数,它允许在时间序列数据之间进行计算有效的相似性比较。结合起来,这些属性允许进行高效的特征生成,并在预测过程中更精确地识别季节性和制度转换。模型的初步结果显示,这种新方法可以改善市场利率的预测性能。 + + Currently, Amazon relies on third parties for transportation marketplace rate forecasts, despite the poor quality and lack of interpretability of these forecasts. While transportation marketplace rates are typically very challenging to forecast accurately, we have developed a novel signature-based statistical technique to address these challenges and built a predictive and adaptive model to forecast marketplace rates. This novel technique is based on two key properties of the signature transform. The first is its universal nonlinearity which linearizes the feature space and hence translates the forecasting problem into a linear regression analysis; the second is the signature kernel which allows for comparing computationally efficiently similarities between time series data. Combined, these properties allow for efficient feature generation and more precise identification of seasonality and regime switching in the forecasting process. Preliminary result by the model shows that this new + +[^33]: 在生成式人工智能时代的物联网: 视野与挑战 + + IoT in the Era of Generative AI: Vision and Challenges. 
(arXiv:2401.01923v1 [cs.DC]) + + [http://arxiv.org/abs/2401.01923](http://arxiv.org/abs/2401.01923) + + 在生成式人工智能时代的物联网,Generative AI的进展带来了巨大的希望,同时也面临着高资源需求、提示工程、设备端推理、安全等关键挑战。 + + + + 带有感知、网络和计算能力的物联网设备,如智能手机、可穿戴设备、智能音箱和家庭机器人,已经无缝地融入到我们的日常生活中。最近生成式人工智能(Generative AI)的进展,如GPT、LLaMA、DALL-E和Stable Diffusion等,给物联网的发展带来了巨大的希望。本文分享了我们对Generative AI在物联网中带来的好处的看法和愿景,并讨论了Generative AI在物联网相关领域的一些重要应用。充分利用Generative AI在物联网中是一个复杂的挑战。我们确定了一些最关键的挑战,包括Generative AI模型的高资源需求、提示工程、设备端推理、卸载、设备端微调、联邦学习、安全以及开发工具和基准,并讨论了当前存在的差距以及使Generative AI在物联网中实现的有希望的机会。我们希望这篇文章能够激发新的研究和创新。 + + Equipped with sensing, networking, and computing capabilities, Internet of Things (IoT) such as smartphones, wearables, smart speakers, and household robots have been seamlessly woven into our daily lives. Recent advancements in Generative AI exemplified by GPT, LLaMA, DALL-E, and Stable Diffusion hold immense promise to push IoT to the next level. In this article, we share our vision and views on the benefits that Generative AI brings to IoT, and discuss some of the most important applications of Generative AI in IoT-related domains. Fully harnessing Generative AI in IoT is a complex challenge. We identify some of the most critical challenges including high resource demands of the Generative AI models, prompt engineering, on-device inference, offloading, on-device fine-tuning, federated learning, security, as well as development tools and benchmarks, and discuss current gaps as well as promising opportunities on enabling Generative AI for IoT. We hope this article can inspire new res + +[^34]: 在部署时进行实时调节:用于单机器人部署的行为调控 + + Adapt On-the-Go: Behavior Modulation for Single-Life Robot Deployment. 
(arXiv:2311.01059v1 [cs.RO]) + + [http://arxiv.org/abs/2311.01059](http://arxiv.org/abs/2311.01059) + + 本研究提出了一种名为ROAM的方法,通过利用先前学习到的行为来实时调节机器人在部署过程中应对未曾见过的情况。在测试中,ROAM可以在单个阶段内实现快速适应,并且在模拟环境和真实场景中取得了成功,具有较高的效率和适应性。 + + + + 为了在现实世界中取得成功,机器人必须应对训练过程中未曾见过的情况。本研究探讨了在部署过程中针对这些新场景的实时调节问题,通过利用先前学习到的多样化行为库。我们的方法,RObust Autonomous Modulation(ROAM),引入了基于预训练行为的感知价值的机制,以在特定情况下选择和调整预训练行为。关键是,这种调节过程在测试时的单个阶段内完成,无需任何人类监督。我们对选择机制进行了理论分析,并证明了ROAM使得机器人能够在模拟环境和真实的四足动物Go1上快速适应动态变化,甚至在脚上套着滚轮滑鞋的情况下成功前进。与现有方法相比,我们的方法在面对各种分布情况的部署时能够以超过2倍的效率进行调节,通过有效选择来实现适应。 + + To succeed in the real world, robots must cope with situations that differ from those seen during training. We study the problem of adapting on-the-fly to such novel scenarios during deployment, by drawing upon a diverse repertoire of previously learned behaviors. Our approach, RObust Autonomous Modulation (ROAM), introduces a mechanism based on the perceived value of pre-trained behaviors to select and adapt pre-trained behaviors to the situation at hand. Crucially, this adaptation process all happens within a single episode at test time, without any human supervision. We provide theoretical analysis of our selection mechanism and demonstrate that ROAM enables a robot to adapt rapidly to changes in dynamics both in simulation and on a real Go1 quadruped, even successfully moving forward with roller skates on its feet. Our approach adapts over 2x as efficiently compared to existing methods when facing a variety of out-of-distribution situations during deployment by effectively choosing + +[^35]: 在通信网络中学习增强状态策略进行信息路由 + + Learning State-Augmented Policies for Information Routing in Communication Networks. 
(arXiv:2310.00248v2 [cs.NI] UPDATED) + + [http://arxiv.org/abs/2310.00248](http://arxiv.org/abs/2310.00248) + + 本论文研究了在通信网络中的信息路由问题,提出了一种新颖的状态增强策略,通过部署图神经网络架构,利用图卷积来最大化源节点的聚合信息,从而有效地将所需信息路由到目标节点。 + + + + 本文研究了在大规模通信网络中的信息路由问题,该问题可以被形式化为一个只能访问局部信息的约束统计学习问题。我们提出了一种新颖的状态增强(SA)策略,通过在通信网络的拓扑链路上部署图神经网络(GNN)架构,利用图卷积来最大化源节点的聚合信息。所提出的技术仅利用每个节点上的局部信息,并有效地将所需的信息路由到目标节点。我们利用无监督学习过程将GNN架构的输出转换为最优的信息路由策略。实验中,我们对实时网络拓扑进行评估,以验证我们的算法。数值仿真结果显示出与基线算法相比,所提出的方法在训练GNN参数化方面的性能有所提高。 + + This paper examines the problem of information routing in a large-scale communication network, which can be formulated as a constrained statistical learning problem having access to only local information. We delineate a novel State Augmentation (SA) strategy to maximize the aggregate information at source nodes using graph neural network (GNN) architectures, by deploying graph convolutions over the topological links of the communication network. The proposed technique leverages only the local information available at each node and efficiently routes desired information to the destination nodes. We leverage an unsupervised learning procedure to convert the output of the GNN architecture to optimal information routing strategies. In the experiments, we perform the evaluation on real-time network topologies to validate our algorithms. Numerical simulations depict the improved performance of the proposed method in training a GNN parameterization as compared to baseline algorithms. + +[^36]: 深度学习分类器性能的综合评估揭示出惊人的缺乏稳定性 + + Comprehensive Assessment of the Performance of Deep Learning Classifiers Reveals a Surprising Lack of Robustness. 
(arXiv:2308.04137v1 [cs.LG]) + + [http://arxiv.org/abs/2308.04137](http://arxiv.org/abs/2308.04137) + + 通过综合评估深度学习分类器的性能,发现它们缺乏稳定性和可靠性,并建议采用广泛的数据类型和统一的评估指标进行性能基准测试。 + + + + 可靠而稳健的评估方法是开发本身稳健可靠的机器学习模型的必要第一步。然而,目前用于评估分类器的常规评估协议在综合评估性能方面存在不足,因为它们往往依赖于有限类型的测试数据,忽视其他类型的数据。例如,使用标准测试数据无法评估分类器对于未经训练的类别样本的预测。另一方面,使用包含未知类别样本的数据进行测试无法评估分类器对于已知类别标签的预测能力。本文提倡使用各种不同类型的数据进行性能基准测试,并使用一种可应用于所有这些数据类型的单一指标,以产生一致的性能评估结果。通过这样的基准测试发现,目前的深度神经网络,包括使用认为是全面的方法进行训练的网络,也存在缺乏稳定性的问题。 + + Reliable and robust evaluation methods are a necessary first step towards developing machine learning models that are themselves robust and reliable. Unfortunately, current evaluation protocols typically used to assess classifiers fail to comprehensively evaluate performance as they tend to rely on limited types of test data, and ignore others. For example, using the standard test data fails to evaluate the predictions made by the classifier to samples from classes it was not trained on. On the other hand, testing with data containing samples from unknown classes fails to evaluate how well the classifier can predict the labels for known classes. This article advocates bench-marking performance using a wide range of different types of data and using a single metric that can be applied to all such data types to produce a consistent evaluation of performance. Using such a benchmark it is found that current deep neural networks, including those trained with methods that are believed to pro + +[^37]: 你的房间不是私密的:关于强化学习的梯度反转攻击 + + Your Room is not Private: Gradient Inversion Attack on Reinforcement Learning. 
(arXiv:2306.09273v2 [cs.RO] UPDATED) + + [http://arxiv.org/abs/2306.09273](http://arxiv.org/abs/2306.09273) + + 这篇论文提出了一种针对值函数算法和梯度算法的攻击方法,利用梯度反转重建状态、动作和监督信号,以解决嵌入式人工智能中的隐私泄露问题。 + + + + 嵌入式人工智能的显著发展吸引了人们的极大关注,该技术使得机器人可以在虚拟环境中导航、感知和互动。由于计算机视觉和大型语言模型方面的显著进展,隐私问题在嵌入式人工智能领域变得至关重要,因为机器人可以访问大量个人信息。然而,关于强化学习算法中的隐私泄露问题,尤其是关于值函数算法和梯度算法的问题,在研究中尚未得到充分考虑。本文旨在通过提出一种攻击值函数算法和梯度算法的方法,利用梯度反转重建状态、动作和监督信号,来解决这一问题。选择使用梯度进行攻击是因为常用的联邦学习技术仅利用基于私人用户数据计算的梯度来优化模型,而不存储或传输用户数据。 + + The prominence of embodied Artificial Intelligence (AI), which empowers robots to navigate, perceive, and engage within virtual environments, has attracted significant attention, owing to the remarkable advancements in computer vision and large language models. Privacy emerges as a pivotal concern within the realm of embodied AI, as the robot accesses substantial personal information. However, the issue of privacy leakage in embodied AI tasks, particularly in relation to reinforcement learning algorithms, has not received adequate consideration in research. This paper aims to address this gap by proposing an attack on the value-based algorithm and the gradient-based algorithm, utilizing gradient inversion to reconstruct states, actions, and supervision signals. The choice of using gradients for the attack is motivated by the fact that commonly employed federated learning techniques solely utilize gradients computed based on private user data to optimize models, without storing or trans + +[^38]: 基于生成扩散模型的癫痫预测数据增强方法 + + Data Augmentation for Seizure Prediction with Generative Diffusion Model. 
(arXiv:2306.08256v1 [eess.SP]) + + [http://arxiv.org/abs/2306.08256](http://arxiv.org/abs/2306.08256) + + 该论文提出了一种基于扩散模型的数据增强方法DiffEEG,可以有效地提高癫痫预测的性能,超过了现有的数据扩增方法。 + + + + 目标:癫痫预测对于改善患者生活质量具有重要意义,重点在于区分发作前状态与发作间期状态。随着机器学习的发展,癫痫预测方法取得了显著进展。然而,发作前与发作间期状态数据之间的严重不平衡仍然是一个巨大的挑战,限制了分类器的性能。数据扩增是解决这个问题的一个直观方法。现有的数据扩增方法通过重叠或重新组合数据来生成样本。由于这些转换无法完全探索特征空间并提供新信息,所以生成的样本分布受到原始数据的限制。由于癫痫脑电图表示在不同发作之间具有差异性,这些生成的样本不能提供足够的多样性以在新的癫痫发作中实现高性能。因此,我们提出了一种使用扩散模型的新型数据增强方法DiffEEG。方法:扩散模型是一种建模数据分布的强大工具,我们使用此模型来对原始脑电图数据进行转换以生成多样性的样本,进而提高分类器的性能。结果:DiffEEG在神经网络和SVM模型上进行的实验表明,它可以有效地提高癫痫预测的性能,超过了现有的数据扩增方法。 + + Objective: Seizure prediction is of great importance to improve the life of patients. The focal point is to distinguish preictal states from interictal ones. With the development of machine learning, seizure prediction methods have achieved significant progress. However, the severe imbalance problem between preictal and interictal data still poses a great challenge, restricting the performance of classifiers. Data augmentation is an intuitive way to solve this problem. Existing data augmentation methods generate samples by overlapping or recombining data. The distribution of generated samples is limited by original data, because such transformations cannot fully explore the feature space and offer new information. As the epileptic EEG representation varies among seizures, these generated samples cannot provide enough diversity to achieve high performance on a new seizure. As a consequence, we propose a novel data augmentation method with diffusion model called DiffEEG. Methods: Diffusi + +[^39]: 利用三重指数移动平均值实现快速自适应矩估计 + + Leveraging the Triple Exponential Moving Average for Fast-Adaptive Moment Estimation. 
(arXiv:2306.01423v1 [cs.CV]) + + [http://arxiv.org/abs/2306.01423](http://arxiv.org/abs/2306.01423) + + 本文提出了一种新的深度优化器FAME,使用三重指数移动平均值(TEMA)来估计梯度矩,提供更丰富和准确的数据变化和趋势信息,可以提高计算机视觉等领域中模型的性能表现。 + + + + 网络优化是深度学习领域中的一个关键步骤,直接影响计算机视觉等多种领域中模型的性能。虽然多种优化器已经被开发出来,但目前的方法在准确快速地识别梯度趋势方面仍然有限,这可能会导致网络性能不佳。本文提出了一种新的深度优化器,称为快速自适应矩估计(FAME),它首次使用三重指数移动平均值(TEMA)来估计梯度矩。将TEMA纳入优化过程中,可以提供更丰富和准确的数据变化和趋势信息,与目前所有主要自适应优化方法中使用的标准指数移动平均值相比。我们提出的FAME优化器已经在广泛的基准测试中得到了验证,包括CIFAR-10,CIFAR-100,PASCAL-VOC,MS-COCO和Cityscapes。 + + Network optimization is a crucial step in the field of deep learning, as it directly affects the performance of models in various domains such as computer vision. Despite the numerous optimizers that have been developed over the years, the current methods are still limited in their ability to accurately and quickly identify gradient trends, which can lead to sub-optimal network performance. In this paper, we propose a novel deep optimizer called Fast-Adaptive Moment Estimation (FAME), which for the first time estimates gradient moments using a Triple Exponential Moving Average (TEMA). Incorporating TEMA into the optimization process provides richer and more accurate information on data changes and trends, as compared to the standard Exponential Moving Average used in essentially all current leading adaptive optimization methods. Our proposed FAME optimizer has been extensively validated through a wide range of benchmarks, including CIFAR-10, CIFAR-100, PASCAL-VOC, MS-COCO, and Cityscap + +[^40]: 基于贝叶斯分类器的特征最优分区研究 + + Optimal partition of feature using Bayesian classifier. 
(arXiv:2304.14537v1 [cs.LG]) + + [http://arxiv.org/abs/2304.14537](http://arxiv.org/abs/2304.14537) + + 本文通过提出一种名为“共单调独立分类器”(CIBer)的新技术,专注于特征的最优分区,旨在克服朴素贝叶斯方法带来的挑战,并且证明该技术在不同数据集上具有更高的准确率和更低的错误率。 + + + + 朴素贝叶斯分类器是一种应用贝叶斯原理的流行分类方法,尽管输入变量之间的条件依赖关系听起来很好,但实际上会导致大多数投票风格的行为。朴素贝叶斯算法中的某些特征被称为独立特征,因为在预测分类时它们没有条件相关性或依赖性。本文通过提出一种名为“共单调独立分类器”(CIBer)的新技术,专注于特征的最优分区,旨在克服朴素贝叶斯方法带来的挑战。在不同的数据集上,我们明确证明了我们的技术的有效性,在错误率更低、准确率更高或相当的情况下,与随机森林和XGBoost等模型相比。 + + The Naive Bayesian classifier is a popular classification method employing the Bayesian paradigm. The concept of having conditional dependence among input variables sounds good in theory but can lead to a majority vote style behaviour. Achieving conditional independence is often difficult, and they introduce decision biases in the estimates. In Naive Bayes, certain features are called independent features as they have no conditional correlation or dependency when predicting a classification. In this paper, we focus on the optimal partition of features by proposing a novel technique called the Comonotone-Independence Classifier (CIBer) which is able to overcome the challenges posed by the Naive Bayes method. For different datasets, we clearly demonstrate the efficacy of our technique, where we achieve lower error rates and higher or equivalent accuracy compared to models such as Random Forests and XGBoost. + +[^41]: 利用离线数据加速程序生成环境中的强化学习 + + Using Offline Data to Speed-up Reinforcement Learning in Procedurally Generated Environments. 
(arXiv:2304.09825v1 [cs.LG]) + + [http://arxiv.org/abs/2304.09825](http://arxiv.org/abs/2304.09825) + + 本研究旨在提高程序生成环境中强化学习的样本效率。研究证明,使用模仿学习进行预训练和同时进行模仿学习和在线强化学习的方法可以提高效率。 + + + + 强化学习面临的主要挑战之一是代理能够将其学习策略推广到未见过的环境中。此外,训练强化学习代理需要与环境进行大量交互。受离线强化学习和模仿学习的最近成功启发,我们进行了一项研究,以调查代理是否可以利用轨迹的离线数据来提高程序生成环境中的样本效率。我们考虑了两种使用离线数据的模仿学习方法:(1)在在线强化学习训练之前预训练策略和(2)同时训练在线强化学习和来自离线数据的模仿学习。我们分析了可用的离线轨迹的质量(轨迹的最佳性)和多样性(轨迹数量和覆盖级别)对两种方法有效性的影响。在MiniGrid环境中的四个知名稀疏奖励任务中,我们发现使用模仿学习进行预训练和同时进行模仿学习和在线强化学习的方法可以提供更高的样本效率。 + + One of the key challenges of Reinforcement Learning (RL) is the ability of agents to generalise their learned policy to unseen settings. Moreover, training RL agents requires large numbers of interactions with the environment. Motivated by the recent success of Offline RL and Imitation Learning (IL), we conduct a study to investigate whether agents can leverage offline data in the form of trajectories to improve the sample-efficiency in procedurally generated environments. We consider two settings of using IL from offline data for RL: (1) pre-training a policy before online RL training and (2) concurrently training a policy with online RL and IL from offline data. We analyse the impact of the quality (optimality of trajectories) and diversity (number of trajectories and covered level) of available offline trajectories on the effectiveness of both approaches. Across four well-known sparse reward tasks in the MiniGrid environment, we find that using IL for pre-training and concurrently d + +[^42]: 无需边缘但具有结构感知性:从GNN到MLP的原型引导知识蒸馏。 + + Edge-free but Structure-aware: Prototype-Guided Knowledge Distillation from GNNs to MLPs. 
(arXiv:2303.13763v1 [cs.LG]) + + [http://arxiv.org/abs/2303.13763](http://arxiv.org/abs/2303.13763) + + 本文提出了一种原型引导知识蒸馏(PGKD)方法,它不需要图形边缘,但可以在不考虑边缘的情况下学习结构感知的MLP。 + + + + 将高精度的图神经网络(GNN)在图任务中压缩成低延迟的多层感知器(MLP)已成为热门研究课题。以前的方法会将图的边缘处理成额外的输入给MLP,但这样的图结构对于各种场景可能无法获得。因此,我们提出了一种原型引导知识蒸馏(PGKD)方法,它不需要图形边缘,但可以在不考虑边缘的情况下学习结构感知的MLP。具体而言,我们分析了GNN教师中的图形结构信息,并通过原型在无边缘设置中从GNN到MLP进行了知识蒸馏。在流行的图形基准实验中的实验结果表明了所提出的PGKD方法的有效性和鲁棒性。 + + Distilling high-accuracy Graph Neural Networks~(GNNs) to low-latency multilayer perceptrons~(MLPs) on graph tasks has become a hot research topic. However, MLPs rely exclusively on the node features and fail to capture the graph structural information. Previous methods address this issue by processing graph edges into extra inputs for MLPs, but such graph structures may be unavailable for various scenarios. To this end, we propose a Prototype-Guided Knowledge Distillation~(PGKD) method, which does not require graph edges~(edge-free) yet learns structure-aware MLPs. Specifically, we analyze the graph structural information in GNN teachers, and distill such information from GNNs to MLPs via prototypes in an edge-free setting. Experimental results on popular graph benchmarks demonstrate the effectiveness and robustness of the proposed PGKD. + +[^43]: 重新审视DeepFool:泛化和改进 + + Revisiting DeepFool: generalization and improvement. (arXiv:2303.12481v1 [cs.LG]) + + [http://arxiv.org/abs/2303.12481](http://arxiv.org/abs/2303.12481) + + 本文提出了一种新的对抗性攻击,该攻击是广义了DeepFool攻击,既有效又计算效率高,适用于评估大型深度神经网络的鲁棒性。 + + + + 深度神经网络被已知容易受到对抗样本的攻击,这些输入稍加修改便会导致网络做出错误的预测。这导致了大量研究,以评估这些网络对此类扰动的鲁棒性度量。最小l2对抗扰动的鲁棒性,是一种特别重要的鲁棒性度量。然而,现有的用于评估此类鲁棒性度量的方法,要么计算成本高,要么不太准确。在本文中,我们引入了一种新的对抗性攻击方法,它在效果和计算效率之间保持平衡。我们提出的攻击是广义了深度欺骗(DeepFool)攻击,但它们仍然易于理解和实现。我们展示了我们的攻击在效果和计算效率方面均优于现有方法。我们提出的攻击也适用于评估大型深度神经网络的鲁棒性。 + + Deep neural networks have been known to be vulnerable to adversarial examples, which are inputs that are modified slightly to fool the network into making incorrect predictions. 
This has led to a significant amount of research on evaluating the robustness of these networks against such perturbations. One particularly important robustness metric is the robustness to minimal l2 adversarial perturbations. However, existing methods for evaluating this robustness metric are either computationally expensive or not very accurate. In this paper, we introduce a new family of adversarial attacks that strike a balance between effectiveness and computational efficiency. Our proposed attacks are generalizations of the well-known DeepFool (DF) attack, while they remain simple to understand and implement. We demonstrate that our attacks outperform existing methods in terms of both effectiveness and computational efficiency. Our proposed attacks are also suitable for evaluating the robustness of large + +[^44]: 语言控制扩散:通过空间、时间和任务高效扩展 + + Language Control Diffusion: Efficiently Scaling through Space, Time, and Tasks. (arXiv:2210.15629v2 [cs.LG] UPDATED) + + [http://arxiv.org/abs/2210.15629](http://arxiv.org/abs/2210.15629) + + 本文提出一种利用语言控制扩散模型的分层规划器,有效而高效地扩展扩散模型,解决长时间跨度自然语言指令下的控制问题,实现了较高的单任务和多任务成功率,并极大地提高计算效率。 + + + + 训练通用型智能体在各个方面都很困难,需要处理高维输入(空间)、长时间跨度(时间)和多个新任务。最近的结构方面的进展使得我们可以沿着其中一个或两个维度提高扩展性能力,但计算成本仍然很高。本文提出使用语言控制扩散模型作为一种基于自然语言条件的分层规划器(LCD)来应对这三个方面。我们有效而高效地扩展扩散模型,以应对时间、状态和任务空间维度的长时间跨度控制问题。我们在CALVIN语言机器人基准测试中将LCD与其他最先进的模型进行比较,发现LCD在多任务成功率方面优于其他最先进的方法,而单任务成功率(SR)为88.7%,远高于以前的最佳成绩82.6%,大大提高了计算效率。 + + Training generalist agents is difficult across several axes, requiring us to deal with high-dimensional inputs (space), long horizons (time), and multiple and new tasks. Recent advances with architectures have allowed for improved scaling along one or two of these dimensions, but are still prohibitive computationally. In this paper, we propose to address all three axes by leveraging Language to Control Diffusion models as a hierarchical planner conditioned on language (LCD). 
We effectively and efficiently scale diffusion models for planning in extended temporal, state, and task dimensions to tackle long horizon control problems conditioned on natural language instructions. We compare LCD with other state-of-the-art models on the CALVIN language robotics benchmark and find that LCD outperforms other SOTA methods in multi task success rates while dramatically improving computational efficiency with a single task success rate (SR) of 88.7% against the previous best of 82.6%. We show that diff --git a/cs.LG.xml b/cs.LG.xml index 364b9519c..cdbb93d09 100644 --- a/cs.LG.xml +++ b/cs.LG.xml @@ -1,241 +1,881 @@ -Chat Arxiv cs.LGhttps://github.com/qhduan/cn-chat-arxivThis is arxiv RSS feed for cs.LG本文通过引入改进的深度卷积生成对抗网络(mDCGAN),针对高质量艺术品生成进行了研究,解决了普遍训练问题,有效探索抽象绘画中的颜色和笔触模式。https://arxiv.org/abs/2403.18397<p> -使用改进的深度卷积生成对抗网络在抽象艺术中进行颜色和笔触模式识别 +Chat Arxiv cs.LGhttps://github.com/qhduan/cn-chat-arxivThis is arxiv RSS feed for cs.LG通过核大小缩放提高嵌入式脉冲神经网络准确性的方法学在实验中表现出更高的准确性。https://arxiv.org/abs/2404.01685<p> +通过核大小缩放提高嵌入式脉冲神经网络准确性的方法学 </p> <p> -Colour and Brush Stroke Pattern Recognition in Abstract Art using Modified Deep Convolutional Generative Adversarial Networks +A Methodology for Improving Accuracy of Embedded Spiking Neural Networks through Kernel Size Scaling </p> <p> -https://arxiv.org/abs/2403.18397 +https://arxiv.org/abs/2404.01685 </p> <p> -本文通过引入改进的深度卷积生成对抗网络(mDCGAN),针对高质量艺术品生成进行了研究,解决了普遍训练问题,有效探索抽象绘画中的颜色和笔触模式。 +通过核大小缩放提高嵌入式脉冲神经网络准确性的方法学在实验中表现出更高的准确性。 </p> <p> </p> <p> -抽象艺术是一种广受欢迎、被广泛讨论的艺术形式,通常能够描绘出艺术家的情感。许多研究人员尝试使用机器学习和深度学习的边缘检测、笔触和情感识别算法来研究抽象艺术。本文描述了使用生成对抗神经网络(GAN)对广泛分布的抽象绘画进行研究。 GAN具有学习和再现分布的能力,使研究人员能够有效地探索和研究生成的图像空间。然而,挑战在于开发一种能够克服常见训练问题的高效GAN架构。本文通过引入专门设计用于高质量艺术品生成的改进DCGAN(mDCGAN)来解决这一挑战。该方法涉及对所做修改的深入探讨,深入研究DCGAN的复杂工作。 
+脉冲神经网络(SNNs)由于其稀疏的基于脉冲的操作而能为基于机器学习的应用提供超低功耗/能耗。目前,大多数SNN架构需要更大的模型大小才能实现更高的准确性,这对资源受限的嵌入式应用不太适合。因此,迫切需要开发能够以可接受的内存占用实现高准确性的SNNs。为此,我们提出了一种通过核大小缩放提高SNNs准确性的新方法学。其关键步骤包括调查不同核大小对准确性的影响,设计新的核大小集合,基于选定的核大小生成SNN架构,并分析SNN模型选择的准确性-内存折衷。实验结果表明,我们的方法学在准确性方面优于最先进的方法(对于CIFAR10有93.24%的准确度) </p> <p> -arXiv:2403.18397v1 Announce Type: cross Abstract: Abstract Art is an immensely popular, discussed form of art that often has the ability to depict the emotions of an artist. Many researchers have made attempts to study abstract art in the form of edge detection, brush stroke and emotion recognition algorithms using machine and deep learning. This papers describes the study of a wide distribution of abstract paintings using Generative Adversarial Neural Networks(GAN). GANs have the ability to learn and reproduce a distribution enabling researchers and scientists to effectively explore and study the generated image space. However, the challenge lies in developing an efficient GAN architecture that overcomes common training pitfalls. This paper addresses this challenge by introducing a modified-DCGAN (mDCGAN) specifically designed for high-quality artwork generation. The approach involves a thorough exploration of the modifications made, delving into the intricate workings of DCGANs, opt -</p>本文提出了一种机器遗忘方法,通过最小化输入敏感度来抑制遗忘数据的贡献,并在实验中表现出优异的性能。https://arxiv.org/abs/2402.15109<p> -抑制样本贡献的机器遗忘 +arXiv:2404.01685v1 Announce Type: cross Abstract: Spiking Neural Networks (SNNs) can offer ultra low power/ energy consumption for machine learning-based applications due to their sparse spike-based operations. Currently, most of the SNN architectures need a significantly larger model size to achieve higher accuracy, which is not suitable for resource-constrained embedded applications. Therefore, developing SNNs that can achieve high accuracy with acceptable memory footprint is highly needed. Toward this, we propose a novel methodology that improves the accuracy of SNNs through kernel size scaling. 
Its key steps include investigating the impact of different kernel sizes on the accuracy, devising new sets of kernel sizes, generating SNN architectures based on the selected kernel sizes, and analyzing the accuracy-memory trade-offs for SNN model selection. The experimental results show that our methodology achieves higher accuracy than state-of-the-art (93.24% accuracy for CIFAR10 and 70 +</p>介绍了机器学习中的函数双层优化问题,提出了不依赖于强凸假设的方法,并展示了在仪表回归和强化学习任务中使用神经网络的优势。https://arxiv.org/abs/2403.20233<p> +机器学习中的函数双层优化 </p> <p> -Machine Unlearning by Suppressing Sample Contribution +Functional Bilevel Optimization for Machine Learning </p> <p> -https://arxiv.org/abs/2402.15109 +https://arxiv.org/abs/2403.20233 </p> <p> -本文提出了一种机器遗忘方法,通过最小化输入敏感度来抑制遗忘数据的贡献,并在实验中表现出优异的性能。 +介绍了机器学习中的函数双层优化问题,提出了不依赖于强凸假设的方法,并展示了在仪表回归和强化学习任务中使用神经网络的优势。 </p> <p> </p> <p> -机器遗忘(MU)是指从经过良好训练的模型中删除数据,这在实践中非常重要,因为涉及“被遗忘的权利”。本文从训练数据和未见数据对模型贡献的基本区别入手:训练数据对最终模型有贡献,而未见数据没有。我们理论上发现输入敏感度可以近似衡量贡献,并实际设计了一种算法,称为MU-Mis(通过最小化输入敏感度进行机器遗忘),来抑制遗忘数据的贡献。实验结果表明,MU-Mis明显优于最先进的MU方法。此外,MU-Mis与MU的应用更加密切,因为它不需要使用剩余数据。 +在本文中,我们介绍了针对机器学习中的双层优化问题的一种新的函数视角,其中内部目标在函数空间上被最小化。这些类型的问题通常通过在参数设置下开发的方法来解决,其中内部目标对于预测函数的参数强凸。函数视角不依赖于此假设,特别允许使用超参数化的神经网络作为内部预测函数。我们提出了可扩展和高效的算法来解决函数双层优化问题,并展示了我们方法在适合自然函数双层结构的仪表回归和强化学习任务上的优势。 </p> <p> -arXiv:2402.15109v1 Announce Type: new Abstract: Machine Unlearning (MU) is to forget data from a well-trained model, which is practically important due to the "right to be forgotten". In this paper, we start from the fundamental distinction between training data and unseen data on their contribution to the model: the training data contributes to the final model while the unseen data does not. We theoretically discover that the input sensitivity can approximately measure the contribution and practically design an algorithm, called MU-Mis (machine unlearning via minimizing input sensitivity), to suppress the contribution of the forgetting data. 
Experimental results demonstrate that MU-Mis outperforms state-of-the-art MU methods significantly. Additionally, MU-Mis aligns more closely with the application of MU as it does not require the use of remaining data. -</p>提出了一种名为混合条形码的新方法,利用标准持久同调与图像持久同调结合,可以量化任意维度两个点集之间的几何-拓扑相互作用,以及引入简单的统计量来量化这种相互作用的复杂性。https://arxiv.org/abs/2402.15058<p> -混合条形码:量化点云之间的几何-拓扑相互作用 +arXiv:2403.20233v1 Announce Type: cross Abstract: In this paper, we introduce a new functional point of view on bilevel optimization problems for machine learning, where the inner objective is minimized over a function space. These types of problems are most often solved by using methods developed in the parametric setting, where the inner objective is strongly convex with respect to the parameters of the prediction function. The functional point of view does not rely on this assumption and notably allows using over-parameterized neural networks as the inner prediction function. We propose scalable and efficient algorithms for the functional bilevel optimization problem and illustrate the benefits of our approach on instrumental regression and reinforcement learning tasks, which admit natural functional bilevel structures. 
+</p>Croissant是一种面向机器学习数据集的元数据格式,使数据集更易发现、可移植和互操作,有助于解决ML数据管理和负责任AI中的重要挑战。https://arxiv.org/abs/2403.19546<p> +Croissant:一种面向机器学习数据集的元数据格式 </p> <p> -Mixup Barcodes: Quantifying Geometric-Topological Interactions between Point Clouds +Croissant: A Metadata Format for ML-Ready Datasets </p> <p> -https://arxiv.org/abs/2402.15058 +https://arxiv.org/abs/2403.19546 </p> <p> -提出了一种名为混合条形码的新方法,利用标准持久同调与图像持久同调结合,可以量化任意维度两个点集之间的几何-拓扑相互作用,以及引入简单的统计量来量化这种相互作用的复杂性。 +Croissant是一种面向机器学习数据集的元数据格式,使数据集更易发现、可移植和互操作,有助于解决ML数据管理和负责任AI中的重要挑战。 </p> <p> </p> <p> -我们将标准持久同调与图像持久同调相结合,定义了一种新颖的表征形状和它们之间相互作用的方法。具体而言,我们介绍了:(1)混合条形码,捕捉任意维度两个点集之间的几何-拓扑相互作用(混合);(2)简单的总混合和总百分比混合统计量,作为一个单一数字来量化相互作用的复杂性;(3)一个用于操作上述工具的软件工具。作为一个概念验证,我们将该工具应用到一个源自机器学习的问题上。具体地,我们研究了不同类别嵌入的可分离性。结果表明,拓扑混合是一种用于表征低维和高维数据交互的有效方法。与持久同调的典型用法相比,这个新工具对于拓扑特征的几何位置更为敏感,这通常是可取的。 +数据是机器学习(ML)的关键资源,但处理数据仍然是一个主要的摩擦点。本文介绍了Croissant,一种用于数据集的元数据格式,简化了数据被ML工具和框架使用的方式。Croissant使数据集更易发现、可移植和互操作,从而解决了ML数据管理和负责任AI中的重要挑战。Croissant已得到几个流行数据集库的支持,涵盖数十万个数据集,可以加载到最流行的ML框架中。 </p> <p> -arXiv:2402.15058v1 Announce Type: cross Abstract: We combine standard persistent homology with image persistent homology to define a novel way of characterizing shapes and interactions between them. In particular, we introduce: (1) a mixup barcode, which captures geometric-topological interactions (mixup) between two point sets in arbitrary dimension; (2) simple summary statistics, total mixup and total percentage mixup, which quantify the complexity of the interactions as a single number; (3) a software tool for playing with the above. As a proof of concept, we apply this tool to a problem arising from machine learning. In particular, we study the disentanglement in embeddings of different classes. The results suggest that topological mixup is a useful method for characterizing interactions for low and high-dimensional data. 
Compared to the typical usage of persistent homology, the new tool is sensitive to the geometric locations of the topological features, which is often desirabl -</p>提出了一种在图上学习的简单替代方法,称为掩码注意力(MAG),其利用注意力矩阵来创建定制的注意力模式,在长距离任务上表现出色并胜过其他方法。https://arxiv.org/abs/2402.10793<p> -掩码注意力是图的关键 +arXiv:2403.19546v1 Announce Type: cross Abstract: Data is a critical resource for Machine Learning (ML), yet working with data remains a key friction point. This paper introduces Croissant, a metadata format for datasets that simplifies how data is used by ML tools and frameworks. Croissant makes datasets more discoverable, portable and interoperable, thereby addressing significant challenges in ML data management and responsible AI. Croissant is already supported by several popular dataset repositories, spanning hundreds of thousands of datasets, ready to be loaded into the most popular ML frameworks. +</p>PRISM是一种算法,可以自动识别人类可解释且易传递的提示,从而有效生成所需概念,仅使用黑盒访问T2I模型。https://arxiv.org/abs/2403.19103<p> +用于个性化文本到图像生成的自动化黑盒提示工程 </p> <p> -Masked Attention is All You Need for Graphs +Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation </p> <p> -https://arxiv.org/abs/2402.10793 +https://arxiv.org/abs/2403.19103 </p> <p> -提出了一种在图上学习的简单替代方法,称为掩码注意力(MAG),其利用注意力矩阵来创建定制的注意力模式,在长距离任务上表现出色并胜过其他方法。 +PRISM是一种算法,可以自动识别人类可解释且易传递的提示,从而有效生成所需概念,仅使用黑盒访问T2I模型。 </p> <p> </p> <p> -图神经网络(GNNs)和消息传递算法的变种主要用于在图上学习,这在很大程度上归功于它们的灵活性、速度和令人满意的性能。然而,设计强大而通用的GNNs需要大量的研究工作,通常依赖于精心选择的手工制作的消息传递操作符。受此启发,我们提出了一种在图上学习的非常简单的替代方法,它完全依赖于注意力。图被表示为节点或边集,并通过掩码注意权重矩阵来强制它们的连接,有效地为每个图创建定制的注意力模式。尽管其简单性,用于图的掩码注意力(MAG)在长距离任务上表现出色,并在55多个节点和图级任务上优于强消息传递基线和更复杂的基于注意力的方法。 +提示工程对于控制文本到图像(T2I)生成模型的输出是有效的,但由于需要手动制作提示而导致工作繁重。这一挑战促使了自动提示生成算法的发展。然而,这些方法通常在T2I模型之间的可传递性方面遇到困难,需要对基础模型进行白盒访问,并产生非直观的提示。在这项工作中,我们介绍了PRISM,这是一种算法,可以仅使用黑盒访问T2I模型就自动识别人类可解释且易传递的提示,从而有效生成所需概念。受大型语言模型(LLM)越狱的启发,PRISM利用LLM的上下文学习能力来迭代地改进给定参考图像的候选提示分布。我们的实验展示了PRISM在为对象、样式等生成准确提示方面的多样性和有效性。 </p> <p> -arXiv:2402.10793v1 Announce Type: cross Abstract: Graph 
neural networks (GNNs) and variations of the message passing algorithm are the predominant means for learning on graphs, largely due to their flexibility, speed, and satisfactory performance. The design of powerful and general purpose GNNs, however, requires significant research efforts and often relies on handcrafted, carefully-chosen message passing operators. Motivated by this, we propose a remarkably simple alternative for learning on graphs that relies exclusively on attention. Graphs are represented as node or edge sets and their connectivity is enforced by masking the attention weight matrix, effectively creating custom attention patterns for each graph. Despite its simplicity, masked attention for graphs (MAG) has state-of-the-art performance on long-range tasks and outperforms strong message passing baselines and much more involved attention-based methods on over 55 node and graph-level tasks. We also show significantly -</p>该论文提出了一个简单而有效的框架SimMLP,通过在图上无监督学习MLPs,提高了在延迟敏感的应用中的泛化能力。https://arxiv.org/abs/2402.08918<p> -通过无监督在图上学习多层感知机(MLP)加速图推理 +arXiv:2403.19103v1 Announce Type: cross Abstract: Prompt engineering is effective for controlling the output of text-to-image (T2I) generative models, but it is also laborious due to the need for manually crafted prompts. This challenge has spurred the development of algorithms for automated prompt generation. However, these methods often struggle with transferability across T2I models, require white-box access to the underlying model, and produce non-intuitive prompts. In this work, we introduce PRISM, an algorithm that automatically identifies human-interpretable and transferable prompts that can effectively generate desired concepts given only black-box access to T2I models. Inspired by large language model (LLM) jailbreaking, PRISM leverages the in-context learning ability of LLMs to iteratively refine the candidate prompts distribution for given reference images. 
Our experiments demonstrate the versatility and effectiveness of PRISM in generating accurate prompts for objects, sty +</p>本研究探讨了ChatGPT是否能够基于Twitter提及来预测文章的撤回,研究发现在预测未来被撤回的有问题文章方面是具有一定潜力的。https://arxiv.org/abs/2403.16851<p> +ChatGPT是否能够基于Twitter提及来预测文章的撤回? </p> <p> -Graph Inference Acceleration by Learning MLPs on Graphs without Supervision +Can ChatGPT predict article retraction based on Twitter mentions? </p> <p> -https://arxiv.org/abs/2402.08918 +https://arxiv.org/abs/2403.16851 </p> <p> -该论文提出了一个简单而有效的框架SimMLP,通过在图上无监督学习MLPs,提高了在延迟敏感的应用中的泛化能力。 +本研究探讨了ChatGPT是否能够基于Twitter提及来预测文章的撤回,研究发现在预测未来被撤回的有问题文章方面是具有一定潜力的。 </p> <p> </p> <p> -图神经网络(GNNs)已经在各种图学习任务中展示出了有效性,但是它们对消息传递的依赖限制了它们在延迟敏感的应用中的部署,比如金融欺诈检测。最近的研究探索了从GNNs中提取知识到多层感知机(MLPs)来加速推理。然而,这种任务特定的有监督蒸馏限制了对未见节点的泛化,而在延迟敏感的应用中这种情况很常见。为此,我们提出了一种简单而有效的框架SimMLP,用于在图上无监督学习MLPs,以增强泛化能力。SimMLP利用自监督对齐GNNs和MLPs之间的节点特征和图结构之间的精细和泛化的相关性,并提出了两种策略来减轻平凡解的风险。从理论上讲, +检测有问题的研究文章具有重要意义,本研究探讨了根据被撤回文章在Twitter上的提及是否能够在文章被撤回前发出信号,从而在预测未来被撤回的有问题文章方面发挥作用。分析了包括3,505篇已撤回文章及其相关Twitter提及在内的数据集,以及使用粗糙精确匹配方法获取的具有类似特征的3,505篇未撤回文章。通过四种预测方法评估了Twitter提及在预测文章撤回方面的有效性,包括手动标注、关键词识别、机器学习模型和ChatGPT。手动标注的结果表明,的确有被撤回的文章,其Twitter提及包含在撤回前发出信号的可识别证据,尽管它们只占所有被撤回文章的一小部分。 </p> <p> -arXiv:2402.08918v1 Announce Type: cross Abstract: Graph Neural Networks (GNNs) have demonstrated effectiveness in various graph learning tasks, yet their reliance on message-passing constraints their deployment in latency-sensitive applications such as financial fraud detection. Recent works have explored distilling knowledge from GNNs to Multi-Layer Perceptrons (MLPs) to accelerate inference. However, this task-specific supervised distillation limits generalization to unseen nodes, which are prevalent in latency-sensitive applications. To this end, we present \textbf{\textsc{SimMLP}}, a \textbf{\textsc{Sim}}ple yet effective framework for learning \textbf{\textsc{MLP}}s on graphs without supervision, to enhance generalization. 
\textsc{SimMLP} employs self-supervised alignment between GNNs and MLPs to capture the fine-grained and generalizable correlation between node features and graph structures, and proposes two strategies to alleviate the risk of trivial solutions. Theoretically, w -</p>使用Voronoi候选点边界可以在贝叶斯优化中有效地优化黑盒函数,提高了多起始连续搜索的执行时间。https://arxiv.org/abs/2402.04922<p> -Voronoi Candidates用于贝叶斯优化 +arXiv:2403.16851v1 Announce Type: cross Abstract: Detecting problematic research articles timely is a vital task. This study explores whether Twitter mentions of retracted articles can signal potential problems with the articles prior to retraction, thereby playing a role in predicting future retraction of problematic articles. A dataset comprising 3,505 retracted articles and their associated Twitter mentions is analyzed, alongside 3,505 non-retracted articles with similar characteristics obtained using the Coarsened Exact Matching method. The effectiveness of Twitter mentions in predicting article retraction is evaluated by four prediction methods, including manual labelling, keyword identification, machine learning models, and ChatGPT. 
Manual labelling results indicate that there are indeed retracted articles with their Twitter mentions containing recognizable evidence signaling problems before retraction, although they represent only a limited share of all retracted articles with +</p>在未观测混杂因素的情况下,本文展示了即使在放宽或甚至在排除所有相关风险因素被观测到的假设的情况下,仍然可以给出对高风险个体分配率的信息丰富的界限。https://arxiv.org/abs/2403.14713<p> +在未观测混杂因素下审计公平性 </p> <p> -Voronoi Candidates for Bayesian Optimization +Auditing Fairness under Unobserved Confounding </p> <p> -https://arxiv.org/abs/2402.04922 +https://arxiv.org/abs/2403.14713 </p> <p> -使用Voronoi候选点边界可以在贝叶斯优化中有效地优化黑盒函数,提高了多起始连续搜索的执行时间。 +在未观测混杂因素的情况下,本文展示了即使在放宽或甚至在排除所有相关风险因素被观测到的假设的情况下,仍然可以给出对高风险个体分配率的信息丰富的界限。 </p> <p> </p> <p> -贝叶斯优化(BO)为高效优化黑盒函数提供了一种优雅的方法。然而,采集准则需要进行具有挑战性的内部优化,这可能引起很大的开销。许多实际的BO方法,尤其是在高维情况下,不采用对采集函数进行形式化连续优化,而是在有限的空间填充候选集上进行离散搜索。在这里,我们提议使用候选点,其位于当前设计点的Voronoi镶嵌边界上,因此它们与两个或多个设计点等距离。我们讨论了通过直接采样Voronoi边界而不明确生成镶嵌的策略,从而适应高维度中的大设计。通过使用高斯过程和期望改进来对一组测试问题进行优化,我们的方法在不损失准确性的情况下显著提高了多起始连续搜索的执行时间。 +决策系统中的一个基本问题是跨越人口统计线存在不公平性。然而,不公平性可能难以量化,特别是如果我们对公平性的理解依赖于难以衡量的风险等观念(例如,对于那些没有其治疗就会死亡的人平等获得治疗)。审计这种不公平性需要准确测量个体风险,而在未观测混杂的现实环境中,难以估计。在这些未观测到的因素“解释”明显差异的情况下,我们可能低估或高估不公平性。在本文中,我们展示了即使在放宽或(令人惊讶地)甚至在排除所有相关风险因素被观测到的假设的情况下,仍然可以对高风险个体的分配率给出信息丰富的界限。我们利用了在许多实际环境中(例如引入新型治疗)我们拥有在任何分配之前的数据的事实。 </p> <p> -Bayesian optimization (BO) offers an elegant approach for efficiently optimizing black-box functions. However, acquisition criteria demand their own challenging inner-optimization, which can induce significant overhead. Many practical BO methods, particularly in high dimension, eschew a formal, continuous optimization of the acquisition function and instead search discretely over a finite set of space-filling candidates. Here, we propose to use candidates which lie on the boundary of the Voronoi tessellation of the current design points, so they are equidistant to two or more of them. 
We discuss strategies for efficient implementation by directly sampling the Voronoi boundary without explicitly generating the tessellation, thus accommodating large designs in high dimension. On a battery of test problems optimized via Gaussian processes with expected improvement, our proposed approach significantly improves the execution time of a multi-start continuous search without a loss in accuracy -</p>提出了一种PAC隐私保护扩散模型,通过将私有分类器指导集成到采样过程中增强隐私保护,并发展了一种新的度量标准来衡量隐私水平,在保护性能方面表现出卓越表现。https://arxiv.org/abs/2312.01201<p> -PAC隐私保护扩散模型 +arXiv:2403.14713v1 Announce Type: cross Abstract: A fundamental problem in decision-making systems is the presence of inequity across demographic lines. However, inequity can be difficult to quantify, particularly if our notion of equity relies on hard-to-measure notions like risk (e.g., equal access to treatment for those who would die without it). Auditing such inequity requires accurate measurements of individual risk, which is difficult to estimate in the realistic setting of unobserved confounding. In the case that these unobservables "explain" an apparent disparity, we may understate or overstate inequity. In this paper, we show that one can still give informative bounds on allocation rates among high-risk individuals, even while relaxing or (surprisingly) even when eliminating the assumption that all relevant risk factors are observed. 
We utilize the fact that in many real-world settings (e.g., the introduction of a novel treatment) we have data from a period prior to any alloc +</p>随机舍入技术能有效隐式正则化高瘦矩阵,确保舍入后的矩阵具有完整的列秩。https://arxiv.org/abs/2403.12278<p> +随机舍入隐式正则化高瘦矩阵 </p> <p> -PAC Privacy Preserving Diffusion Models +Stochastic Rounding Implicitly Regularizes Tall-and-Thin Matrices </p> <p> -https://arxiv.org/abs/2312.01201 +https://arxiv.org/abs/2403.12278 </p> <p> -提出了一种PAC隐私保护扩散模型,通过将私有分类器指导集成到采样过程中增强隐私保护,并发展了一种新的度量标准来衡量隐私水平,在保护性能方面表现出卓越表现。 +随机舍入技术能有效隐式正则化高瘦矩阵,确保舍入后的矩阵具有完整的列秩。 </p> <p> </p> <p> -数据隐私保护正在引起研究人员的越来越多的关注。扩散模型(DMs),尤其是具有严格的差分隐私,有可能生成既具有高隐私性又具有良好视觉质量的图像。然而,挑战在于确保在私有化特定数据属性时的强大保护,当前模型在这些方面经常存在不足。为了解决这些挑战,我们引入了PAC隐私保护扩散模型,这是一种利用扩散原理并确保“可能大致正确(PAC)”隐私性的模型。我们通过将私有分类器指导集成到Langevin采样过程中来增强隐私保护。此外,认识到在衡量模型隐私性方面存在差距,我们开发了一种新的度量标准来衡量隐私水平。我们的模型通过这个新度量标准评估,并通过高斯矩阵计算支持PAC界限,表现出更优异的隐私性能。 +受到随机舍入在机器学习和大规模深度神经网络模型训练中的流行,我们考虑实矩阵$\mathbf{A}$的随机近似舍入,其中行数远远多于列数。我们提供了新颖的理论证据,并通过大量实验评估支持,高概率下,随机舍入矩阵的最小奇异值远离零--无论$\mathbf{A}$接近奇异还是$\mathbf{A}$奇异。换句话说,随机舍入\textit{隐式正则化}高瘦矩阵$\mathbf{A}$,使得舍入后的版本具有完整的列秩。我们的证明利用了随机矩阵理论中的有力结果,以及随机舍入误差不集中在低维列空间的思想。 </p> <p> -arXiv:2312.01201v2 Announce Type: replace-cross Abstract: Data privacy protection is garnering increased attention among researchers. Diffusion models (DMs), particularly with strict differential privacy, can potentially produce images with both high privacy and visual quality. However, challenges arise such as in ensuring robust protection in privatizing specific data attributes, areas where current models often fall short. To address these challenges, we introduce the PAC Privacy Preserving Diffusion Model, a model leverages diffusion principles and ensure Probably Approximately Correct (PAC) privacy. We enhance privacy protection by integrating a private classifier guidance into the Langevin Sampling Process. Additionally, recognizing the gap in measuring the privacy of models, we have developed a novel metric to gauge privacy levels. 
Our model, assessed with this new metric and supported by Gaussian matrix computations for the PAC bound, has shown superior performance in privacy p -</p>这篇文章介绍了用于潜在高斯过程模型中的Vecchia-Laplace近似法的迭代方法,相比于传统的Cholesky分解方法,可以显著加快计算速度。http://arxiv.org/abs/2310.12000<p> -Vecchia-Laplace近似法在潜在高斯过程模型中的迭代方法 +arXiv:2403.12278v1 Announce Type: new Abstract: Motivated by the popularity of stochastic rounding in the context of machine learning and the training of large-scale deep neural network models, we consider stochastic nearness rounding of real matrices $\mathbf{A}$ with many more rows than columns. We provide novel theoretical evidence, supported by extensive experimental evaluation that, with high probability, the smallest singular value of a stochastically rounded matrix is well bounded away from zero -- regardless of how close $\mathbf{A}$ is to being rank deficient and even if $\mathbf{A}$ is rank-deficient. In other words, stochastic rounding \textit{implicitly regularizes} tall and skinny matrices $\mathbf{A}$ so that the rounded version has full column rank. Our proofs leverage powerful results in random matrix theory, and the idea that stochastic rounding errors do not concentrate in low-dimensional column spaces. +</p>本文研究了训练深度齐次神经网络时梯度流动力学的动态性,发现在足够小的初始化下,神经网络的权重在训练早期阶段保持较小规范,并且沿着神经相关函数的KKT点方向近似收敛。https://arxiv.org/abs/2403.08121<p> +早期方向性收敛在深度齐次神经网络中进行小初始化时的分析 </p> <p> -Iterative Methods for Vecchia-Laplace Approximations for Latent Gaussian Process Models. 
(arXiv:2310.12000v1 [stat.ME]) +Early Directional Convergence in Deep Homogeneous Neural Networks for Small Initializations </p> <p> -http://arxiv.org/abs/2310.12000 +https://arxiv.org/abs/2403.08121 </p> <p> -这篇文章介绍了用于潜在高斯过程模型中的Vecchia-Laplace近似法的迭代方法,相比于传统的Cholesky分解方法,可以显著加快计算速度。 +本文研究了训练深度齐次神经网络时梯度流动力学的动态性,发现在足够小的初始化下,神经网络的权重在训练早期阶段保持较小规范,并且沿着神经相关函数的KKT点方向近似收敛。 </p> <p> </p> <p> -潜在高斯过程(GP)模型是灵活的概率非参数函数模型。Vecchia近似是用于克服大数据计算瓶颈的准确近似方法,Laplace近似是一种快速方法,可以近似非高斯似然函数的边缘似然和后验预测分布,并具有渐近收敛保证。然而,当与直接求解方法(如Cholesky分解)结合使用时,Vecchia-Laplace近似的计算复杂度增长超线性地随样本大小增加。因此,与Vecchia-Laplace近似计算相关的运算在通常情况下是最准确的大型数据集时会变得非常缓慢。在本文中,我们提出了几种用于Vecchia-Laplace近似推断的迭代方法,相比于基于Cholesky的计算,可以大大加快计算速度。我们对我们的方法进行了分析。 +本文研究了训练深度齐次神经网络时梯度流动力学的动态性,这些网络从小初始化开始。本文考虑到具有局部Lipschitz梯度和阶数严格大于两的神经网络。文章证明了对于足够小的初始化,在训练的早期阶段,神经网络的权重保持规范较小,并且在Karush-Kuhn-Tucker (KKT)点处近似沿着神经相关函数的方向收敛。此外,对于平方损失并在神经网络权重上进行可分离假设的情况下,还展示了在损失函数的某些鞍点附近梯度流动动态的类似方向性收敛。 </p> <p> -Latent Gaussian process (GP) models are flexible probabilistic non-parametric function models. Vecchia approximations are accurate approximations for GPs to overcome computational bottlenecks for large data, and the Laplace approximation is a fast method with asymptotic convergence guarantees to approximate marginal likelihoods and posterior predictive distributions for non-Gaussian likelihoods. Unfortunately, the computational complexity of combined Vecchia-Laplace approximations grows faster than linearly in the sample size when used in combination with direct solver methods such as the Cholesky decomposition. Computations with Vecchia-Laplace approximations thus become prohibitively slow precisely when the approximations are usually the most accurate, i.e., on large data sets. In this article, we present several iterative methods for inference with Vecchia-Laplace approximations which make computations considerably faster compared to Cholesky-based calculations. 
We analyze our propo -</p>本文研究了神经网络的插值问题,提出了一种简单的随机算法,在给定的数据集和两个类的情况下,能够以很高的概率构建一个插值的神经网络。这些结果与训练数据规模无关。http://arxiv.org/abs/2310.00327<p> -神经网络的记忆化:超越最坏情况 +arXiv:2403.08121v1 Announce Type: new Abstract: This paper studies the gradient flow dynamics that arise when training deep homogeneous neural networks, starting with small initializations. The present work considers neural networks that are assumed to have locally Lipschitz gradients and an order of homogeneity strictly greater than two. This paper demonstrates that for sufficiently small initializations, during the early stages of training, the weights of the neural network remain small in norm and approximately converge in direction along the Karush-Kuhn-Tucker (KKT) points of the neural correlation function introduced in [1]. Additionally, for square loss and under a separability assumption on the weights of neural networks, a similar directional convergence of gradient flow dynamics is shown near certain saddle points of the loss function. +</p>提出了一个全面基准(SRB),用于评估自动语音识别(ASR)模型对各种破坏的鲁棒性,发现模型大小和某些建模选择有助于提高鲁棒性,并观察到在不同人口亚组上模型的鲁棒性存在明显差异。https://arxiv.org/abs/2403.07937<p> +语音鲁棒基准:用于语音识别的鲁棒性基准 </p> <p> -Memorization with neural nets: going beyond the worst case. 
(arXiv:2310.00327v1 [stat.ML]) +Speech Robust Bench: A Robustness Benchmark For Speech Recognition </p> <p> -http://arxiv.org/abs/2310.00327 +https://arxiv.org/abs/2403.07937 </p> <p> -本文研究了神经网络的插值问题,提出了一种简单的随机算法,在给定的数据集和两个类的情况下,能够以很高的概率构建一个插值的神经网络。这些结果与训练数据规模无关。 +提出了一个全面基准(SRB),用于评估自动语音识别(ASR)模型对各种破坏的鲁棒性,发现模型大小和某些建模选择有助于提高鲁棒性,并观察到在不同人口亚组上模型的鲁棒性存在明显差异。 </p> <p> </p> <p> -在实践中,深度神经网络通常能够轻松地插值其训练数据。为了理解这一现象,许多研究都旨在量化神经网络架构的记忆能力:即在任意放置这些点并任意分配标签的情况下,架构能够插值的最大点数。然而,对于实际数据,人们直觉地期望存在一种良性结构,使得插值在比记忆能力建议的较小网络尺寸上已经发生。在本文中,我们通过采用实例特定的观点来研究插值。我们引入了一个简单的随机算法,它可以在多项式时间内给定一个固定的有限数据集和两个类的情况下,以很高的概率构建出一个插值三层神经网络。所需的参数数量与这两个类的几何特性及其相互排列有关。因此,我们获得了与训练数据规模无关的保证。 +随着自动语音识别(ASR)模型变得越来越普遍,确保它们在物理世界和数字世界中的各种破坏下进行可靠预测变得愈发重要。我们提出了语音鲁棒基准(SRB),这是一个用于评估ASR模型对各种破坏的鲁棒性的全面基准。SRB由69个输入扰动组成,旨在模拟ASR模型可能在物理世界和数字世界中遇到的各种破坏。我们使用SRB来评估几种最先进的ASR模型的鲁棒性,并观察到模型大小和某些建模选择(如离散表示和自我训练)似乎有助于提高鲁棒性。我们将此分析扩展到衡量ASR模型在来自各种人口亚组的数据上的鲁棒性,即英语和西班牙语使用者以及男性和女性,并观察到模型的鲁棒性在不同亚组之间存在明显差异。 </p> <p> -In practice, deep neural networks are often able to easily interpolate their training data. To understand this phenomenon, many works have aimed to quantify the memorization capacity of a neural network architecture: the largest number of points such that the architecture can interpolate any placement of these points with any assignment of labels. For real-world data, however, one intuitively expects the presence of a benign structure so that interpolation already occurs at a smaller network size than suggested by memorization capacity. In this paper, we investigate interpolation by adopting an instance-specific viewpoint. We introduce a simple randomized algorithm that, given a fixed finite dataset with two classes, with high probability constructs an interpolating three-layer neural network in polynomial time. The required number of parameters is linked to geometric properties of the two classes and their mutual arrangement. 
As a result, we obtain guarantees that are independent of t -</p>通过分析量子卷积神经网络(QCNNs),我们发现它们通过隐藏特征映射嵌入物理系统参数,并且利用量子临界性生成适合的基函数集,池化层选择能够形成高性能决策边界的基函数,而模型的泛化性能依赖于嵌入类型。http://arxiv.org/abs/2308.16664<p> -我们可以从量子卷积神经网络中学到什么? +arXiv:2403.07937v1 Announce Type: cross Abstract: As Automatic Speech Recognition (ASR) models become ever more pervasive, it is important to ensure that they make reliable predictions under corruptions present in the physical and digital world. We propose Speech Robust Bench (SRB), a comprehensive benchmark for evaluating the robustness of ASR models to diverse corruptions. SRB is composed of 69 input perturbations which are intended to simulate various corruptions that ASR models may encounter in the physical and digital world. We use SRB to evaluate the robustness of several state-of-the-art ASR models and observe that model size and certain modeling choices such as discrete representations, and self-training appear to be conducive to robustness. We extend this analysis to measure the robustness of ASR models on data from various demographic subgroups, namely English and Spanish speakers, and males and females, and observed noticeable disparities in the model's robustness across su +</p>介绍了一个新的基准测试,评估了用于生成简要住院病程摘要的大语言模型在健康保健领域中的性能并提出相应的自适应策略https://arxiv.org/abs/2403.05720<p> +用于生成简要住院病程摘要的领域自适应大语言模型的基准测试 </p> <p> -What can we learn from quantum convolutional neural networks?. 
(arXiv:2308.16664v1 [quant-ph]) +A Benchmark of Domain-Adapted Large Language Models for Generating Brief Hospital Course Summaries </p> <p> -http://arxiv.org/abs/2308.16664 +https://arxiv.org/abs/2403.05720 </p> <p> -通过分析量子卷积神经网络(QCNNs),我们发现它们通过隐藏特征映射嵌入物理系统参数,并且利用量子临界性生成适合的基函数集,池化层选择能够形成高性能决策边界的基函数,而模型的泛化性能依赖于嵌入类型。 +介绍了一个新的基准测试,评估了用于生成简要住院病程摘要的大语言模型在健康保健领域中的性能并提出相应的自适应策略 </p> <p> </p> <p> -通过分析量子卷积神经网络(QCNNs),我们可以得出以下结论:1)通过隐藏特征映射,工作于量子数据可以被视为嵌入物理系统参数;2)对于量子相位识别,其高性能可以归因于在基态嵌入期间生成非常适合的基函数集,其中自旋模型的量子临界性导致具有快速变化特征的基函数;3)QCNN的池化层负责选择那些能够有助于形成高性能决策边界的基函数,学习过程对应于适应性测量,使得少量量子比特算符映射到整个寄存器可观测量;4)QCNN模型的泛化强烈依赖于嵌入类型,基于傅里叶基的旋转特征映射需要仔细的特征工程;5)基于有限数量的测量次数的读出的QCNN的准确性和泛化能力倾向于地面态。 +简要住院病程(BHC)摘要是通过总结临床记录而生成的常见临床文件。虽然大型语言模型(LLMs)在自动化实际任务方面展现出显著能力,但它们在医疗应用(如BHC合成)中的能力尚未得到展示。为了使LLMs能够适应BHC合成,我们引入了一个新颖的基准测试,其中包含从MIMIC-IV记录中提取的经过预处理的数据集,封装了临床记录和简要住院病程(BHC)对。我们评估了两个通用LLMs和三个医疗领域适应的LLMs的性能,以改进从临床记录生成BHC。我们使用临床记录作为输入来生成BHC,采用基于提示的(使用上下文学习)和基于微调的自适应策略来应用于三个开源LLMs(Clinical-T5-Large,Llama2-13B,FLAN-UL2)和两个专有LLMs(GPT-3.5,GPT-4)。我们定量评估了性能。 </p> <p> -We can learn from analyzing quantum convolutional neural networks (QCNNs) that: 1) working with quantum data can be perceived as embedding physical system parameters through a hidden feature map; 2) their high performance for quantum phase recognition can be attributed to generation of a very suitable basis set during the ground state embedding, where quantum criticality of spin models leads to basis functions with rapidly changing features; 3) pooling layers of QCNNs are responsible for picking those basis functions that can contribute to forming a high-performing decision boundary, and the learning process corresponds to adapting the measurement such that few-qubit operators are mapped to full-register observables; 4) generalization of QCNN models strongly depends on the embedding type, and that rotation-based feature maps with the Fourier basis require careful feature engineering; 5) accuracy and generalization of QCNNs with readout based on a 
limited number of shots favor the groun -</p>本文针对特征分布偏斜的联邦学习提出了FedRDN方法,在输入层级上实现了数据增强,将整个联邦数据集的统计信息注入到本地客户端数据中,以缓解特征漂移问题。http://arxiv.org/abs/2306.09363<p> -一种简单的面向特征分布偏斜联邦学习的数据增强方法 +arXiv:2403.05720v1 Announce Type: cross Abstract: Brief hospital course (BHC) summaries are common clinical documents generated by summarizing clinical notes. While large language models (LLMs) depict remarkable capabilities in automating real-world tasks, their capabilities for healthcare applications such as BHC synthesis have not been shown. To enable the adaptation of LLMs for BHC synthesis, we introduce a novel benchmark consisting of a pre-processed dataset extracted from MIMIC-IV notes, encapsulating clinical note, and brief hospital course (BHC) pairs. We assess the performance of two general-purpose LLMs and three healthcare-adapted LLMs to improve BHC synthesis from clinical notes. Using clinical notes as input for generating BHCs, we apply prompting-based (using in-context learning) and fine-tuning-based adaptation strategies to three open-source LLMs (Clinical-T5-Large, Llama2-13B, FLAN-UL2) and two proprietary LLMs (GPT-3.5, GPT-4). We quantitatively evaluate the performa +</p>该论文提出了第一个能够精确重构批量$b >1$的算法,在联邦学习中解决了梯度反演攻击的问题。https://arxiv.org/abs/2403.03945<p> +SPEAR:联邦学习中批量精确梯度反演 </p> <p> -A Simple Data Augmentation for Feature Distribution Skewed Federated Learning. 
(arXiv:2306.09363v1 [cs.LG]) +SPEAR:Exact Gradient Inversion of Batches in Federated Learning </p> <p> -http://arxiv.org/abs/2306.09363 +https://arxiv.org/abs/2403.03945 </p> <p> -本文针对特征分布偏斜的联邦学习提出了FedRDN方法,在输入层级上实现了数据增强,将整个联邦数据集的统计信息注入到本地客户端数据中,以缓解特征漂移问题。 +该论文提出了第一个能够精确重构批量$b >1$的算法,在联邦学习中解决了梯度反演攻击的问题。 </p> <p> </p> <p> -联邦学习(FL)是一种分布式协作学习方法,可以确保隐私保护。然而,由于数据异构性(即非独立同分布数据),它的性能必然受到影响。本文针对特征分布偏斜的FL场景展开研究,提出了一种通用的数据增强方法,以减轻由本地数据集之间潜在分布不同导致的特征漂移问题。 +联邦学习是一种流行的协作机器学习框架,在这个框架中,多个客户端仅与服务器共享他们本地数据的梯度更新,而不是实际数据。不幸的是,最近发现梯度反演攻击可以从这些共享的梯度中重构出数据。现有的攻击只能在重要的诚实但好奇设置中对批量大小为$b=1$的数据进行精确重构,对于更大的批量只能进行近似重构。在这项工作中,我们提出了\emph{第一个准确重建批量$b >1$的算法}。这种方法结合了对梯度显式低秩结构的数学见解和基于采样的算法。关键的是,我们利用ReLU诱导的梯度稀疏性,精确地过滤掉大量错误的样本,使最终的重建步骤可行。我们为全连接提供了高效的GPU实现 </p> <p> -Federated learning (FL) facilitates collaborative learning among multiple clients in a distributed manner, while ensuring privacy protection. However, its performance is inevitably degraded as suffering data heterogeneity, i.e., non-IID data. In this paper, we focus on the feature distribution skewed FL scenario, which is widespread in real-world applications. The main challenge lies in the feature shift caused by the different underlying distributions of local datasets. While the previous attempts achieved progress, few studies pay attention to the data itself, the root of this issue. Therefore, the primary goal of this paper is to develop a general data augmentation technique at the input level, to mitigate the feature shift. To achieve this goal, we propose FedRDN, a simple yet remarkably effective data augmentation method for feature distribution skewed FL, which randomly injects the statistics of the dataset from the entire federation into the client's data. 
By this, our method ca -</p>本文提出了一种新的评分差异流模型(SD flow),它可以最优地减少两个分布之间的散度,同时解决Schr​​ödinger桥问题。与去噪扩散模型不同,它没有对先验分布施加任何限制,在一些基准数据集中优于其他方法。http://arxiv.org/abs/2304.12906<p> -评分差值流模型用于隐式生成建模 +arXiv:2403.03945v1 Announce Type: new Abstract: Federated learning is a popular framework for collaborative machine learning where multiple clients only share gradient updates on their local data with the server and not the actual data. Unfortunately, it was recently shown that gradient inversion attacks can reconstruct this data from these shared gradients. Existing attacks enable exact reconstruction only for a batch size of $b=1$ in the important honest-but-curious setting, with larger batches permitting only approximate reconstruction. In this work, we propose \emph{the first algorithm reconstructing whole batches with $b >1$ exactly}. This approach combines mathematical insights into the explicit low-rank structure of gradients with a sampling-based algorithm. Crucially, we leverage ReLU-induced gradient sparsity to precisely filter out large numbers of incorrect samples, making a final reconstruction step tractable. We provide an efficient GPU implementation for fully connected +</p>本文研究了具有Polyak动量的随机近端梯度方法,在非凸复合优化问题中实现了最佳收敛速度,无论批量大小如何。https://arxiv.org/abs/2403.02967<p> +具有Polyak动量的非凸随机复合优化 </p> <p> -The Score-Difference Flow for Implicit Generative Modeling. 
(arXiv:2304.12906v1 [cs.LG]) +Non-Convex Stochastic Composite Optimization with Polyak Momentum </p> <p> -http://arxiv.org/abs/2304.12906 +https://arxiv.org/abs/2403.02967 </p> <p> -本文提出了一种新的评分差异流模型(SD flow),它可以最优地减少两个分布之间的散度,同时解决Schr​​ödinger桥问题。与去噪扩散模型不同,它没有对先验分布施加任何限制,在一些基准数据集中优于其他方法。 +本文研究了具有Polyak动量的随机近端梯度方法,在非凸复合优化问题中实现了最佳收敛速度,无论批量大小如何。 </p> <p> </p> <p> -隐式生成建模(IGM)旨在生成符合目标数据分布特征的合成数据样本。最近的研究(例如评分匹配网络、扩散模型)从通过环境空间中的动态扰动或流将合成源数据推向目标分布的角度解决了IGM问题。我们引入了任意目标和源分布之间的评分差异(SD)作为流,它可以最优地减少它们之间的Kullback-Leibler散度,同时解决Schr​​ödinger桥问题。我们将SD流应用于方便的代理分布,当且仅当原始分布对齐时,它们是对齐的。我们在某些条件下展示了这种公式与去噪扩散模型的形式一致性。然而,与扩散模型不同,SD流没有对先验分布施加任何限制。我们还表明,在无限辨别器能力的极限下,生成对抗网络的训练包含SD流。我们的实验表明,SD流在几个基准数据集上优于先前的最新技术。 +随机近端梯度法是广泛使用的随机梯度下降(SGD)方法的一个强大泛化,在机器学习中已经被广泛应用。然而,众所周知,当随机噪声显著时(即仅使用小型或有界批量大小时),该方法在非凸环境中无法收敛。本文关注具有Polyak动量的随机近端梯度方法。我们证明了该方法对于非凸复合优化问题实现了最佳收敛速度,而批量大小大小无关。此外,我们对Polyak动量在复合优化环境中的方差减少效应进行了严格分析,并且我们证明了当近端步骤只能通过近似解来求解时,该方法也会收敛。最后,我们提供了数值实验来验证我们的理论结果。 </p> <p> -Implicit generative modeling (IGM) aims to produce samples of synthetic data matching the characteristics of a target data distribution. Recent work (e.g. score-matching networks, diffusion models) has approached the IGM problem from the perspective of pushing synthetic source data toward the target distribution via dynamical perturbations or flows in the ambient space. We introduce the score difference (SD) between arbitrary target and source distributions as a flow that optimally reduces the Kullback-Leibler divergence between them while also solving the Schr\"odinger bridge problem. We apply the SD flow to convenient proxy distributions, which are aligned if and only if the original distributions are aligned. We demonstrate the formal equivalence of this formulation to denoising diffusion models under certain conditions. However, unlike diffusion models, SD flow places no restrictions on the prior distribution. 
We also show that the training of generative adversarial networks includ +arXiv:2403.02967v1 Announce Type: cross Abstract: The stochastic proximal gradient method is a powerful generalization of the widely used stochastic gradient descent (SGD) method and has found numerous applications in Machine Learning. However, it is notoriously known that this method fails to converge in non-convex settings where the stochastic noise is significant (i.e. when only small or bounded batch sizes are used). In this paper, we focus on the stochastic proximal gradient method with Polyak momentum. We prove this method attains an optimal convergence rate for non-convex composite optimization problems, regardless of batch size. Additionally, we rigorously analyze the variance reduction effect of the Polyak momentum in the composite optimization setting and we show the method also converges when the proximal step can only be solved inexactly. Finally, we provide numerical experiments to validate our theoretical results. +</p>KATE是一种新的优化算法,提出了一种与AdaGrad标度不变的适应方法,并在广义线性模型和一般的非凸问题中证明了其标度不变性。数值实验结果表明,KATE在各种场景中均优于AdaGrad并与Adam性能匹配/超越。https://arxiv.org/abs/2403.02648<p> +移除平方根:一种新的高效标度不变版本的AdaGrad +</p> +<p> +Remove that Square Root: A New Efficient Scale-Invariant Version of AdaGrad +</p> +<p> +https://arxiv.org/abs/2403.02648 +</p> +<p> +KATE是一种新的优化算法,提出了一种与AdaGrad标度不变的适应方法,并在广义线性模型和一般的非凸问题中证明了其标度不变性。数值实验结果表明,KATE在各种场景中均优于AdaGrad并与Adam性能匹配/超越。 +</p> +<p> + +</p> +<p> +自适应方法在机器学习中非常流行,因为它们可以降低学习速率调整的成本。本文引入了一种名为KATE的新型优化算法,它提出了一个著名的AdaGrad算法的标度不变适应。我们证明了KATE在广义线性模型案例中的标度不变性。此外,对于一般的光滑非凸问题,我们为KATE建立了一个收敛速率为$O \left(\frac{\log T}{\sqrt{T}} \right)$,与AdaGrad和Adam的最佳收敛速率相匹配。我们还通过不同问题的数值实验将KATE与其他最先进的自适应算法Adam和AdaGrad进行了比较,包括在真实数据上进行图像分类和文本分类等复杂机器学习任务。结果表明,在所有考虑到的场景中,KATE始终胜过AdaGrad,并且在性能上匹配/超越Adam。 +</p> +<p> +arXiv:2403.02648v1 Announce Type: cross Abstract: Adaptive methods are extremely popular in machine learning as they make learning rate tuning less expensive. 
This paper introduces a novel optimization algorithm named KATE, which presents a scale-invariant adaptation of the well-known AdaGrad algorithm. We prove the scale-invariance of KATE for the case of Generalized Linear Models. Moreover, for general smooth non-convex problems, we establish a convergence rate of $O \left(\frac{\log T}{\sqrt{T}} \right)$ for KATE, matching the best-known ones for AdaGrad and Adam. We also compare KATE to other state-of-the-art adaptive algorithms Adam and AdaGrad in numerical experiments with different problems, including complex machine learning tasks like image classification and text classification on real data. The results indicate that KATE consistently outperforms AdaGrad and matches/surpasses the performance of Adam in all considered scenarios. +</p>本文研究了RNNs和Transformer在处理算法问题时的表现能力差距,发现RNNs存在关键瓶颈,即无法完美地从上下文中检索信息,导致无法像Transformer那样轻松解决需要这种能力的任务。https://arxiv.org/abs/2402.18510<p> +RNNs还不是Transformer:在上下文检索中的关键瓶颈 +</p> +<p> +RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval +</p> +<p> +https://arxiv.org/abs/2402.18510 +</p> +<p> +本文研究了RNNs和Transformer在处理算法问题时的表现能力差距,发现RNNs存在关键瓶颈,即无法完美地从上下文中检索信息,导致无法像Transformer那样轻松解决需要这种能力的任务。 +</p> +<p> + +</p> +<p> +本文探讨循环神经网络(RNNs)和Transformer在解决算法问题时的表示能力差距。我们重点关注RNNs是否能在处理长序列时,通过Chain-of-Thought (CoT)提示,与Transformer的性能相匹配。我们的理论分析显示CoT可以改进RNNs,但无法弥补与Transformer之间的差距。关键瓶颈在于RNNs无法完全从上下文中检索信息,即使经过CoT的增强:对于几个明确或隐式需要这种能力的任务,如联想召回和确定图是否为树,我们证明RNNs表达能力不足以解决这些任务,而Transformer可以轻松解决。相反,我们证明采用增强RNNs上下文检索能力的技术,包括 +</p> +<p> +arXiv:2402.18510v1 Announce Type: cross Abstract: This paper investigates the gap in representation powers of Recurrent Neural Networks (RNNs) and Transformers in the context of solving algorithmic problems. We focus on understanding whether RNNs, known for their memory efficiency in handling long sequences, can match the performance of Transformers, particularly when enhanced with Chain-of-Thought (CoT) prompting. 
Our theoretical analysis reveals that CoT improves RNNs but is insufficient to close the gap with Transformers. A key bottleneck lies in the inability of RNNs to perfectly retrieve information from the context, even with CoT: for several tasks that explicitly or implicitly require this capability, such as associative recall and determining if a graph is a tree, we prove that RNNs are not expressive enough to solve the tasks while Transformers can solve them with ease. Conversely, we prove that adopting techniques to enhance the in-context retrieval capability of RNNs, inclu +</p>FENs是一种神经网络算法,具有对数深度且可以在线性时间内处理序列,关键创新在于通过训练大致线性数量的常深度神经网络并行学习。https://arxiv.org/abs/2402.15883<p> +融合编码器网络 +</p> +<p> +Fusion Encoder Networks +</p> +<p> +https://arxiv.org/abs/2402.15883 +</p> +<p> +FENs是一种神经网络算法,具有对数深度且可以在线性时间内处理序列,关键创新在于通过训练大致线性数量的常深度神经网络并行学习。 +</p> +<p> + +</p> +<p> +在本文中,我们提出了一种名为融合编码器网络(FENs)的算法类:用于创建将固定长度序列映射到输出的神经网络。生成的神经网络仅具有对数深度(减轻数据在网络中传播时的退化),可以在线性时间内处理序列(或者在具有线性处理器数量的对数时间内)。FENs的关键属性是它们通过训练大致线性数量的常深度神经网络并行学习。这些网络具有常深度意味着反向传播效果良好。需要注意的是,目前FENs的性能仅仅是推测,因为我们尚未实现它们。 +</p> +<p> +arXiv:2402.15883v1 Announce Type: new Abstract: In this paper we present fusion encoder networks (FENs): a class of algorithms for creating neural networks that map fixed-length sequences to outputs. The resulting neural network has only logarithmic depth (alleviating the degradation of data as it propagates through the network) and can process sequences in linear time (or in logarithmic time with a linear number of processors). The crucial property of FENs is that they learn by training a quasi-linear number of constant-depth neural networks in parallel. The fact that these networks are constant depth means that backpropagation works well. We note that currently the performance of FENs is only conjectured as we are yet to implement them. 
+</p>本文研究了线性神经网络在二次损失函数下的优化问题,证明了梯度下降映射的非奇异性以及全局最小值点集的光滑流形特性,为理解大学习率下梯度下降的稳定性提供了重要线索。https://arxiv.org/abs/2402.13108<p> +关于大学习率下梯度下降的稳定性 +</p> +<p> +On the Stability of Gradient Descent for Large Learning Rate +</p> +<p> +https://arxiv.org/abs/2402.13108 +</p> +<p> +本文研究了线性神经网络在二次损失函数下的优化问题,证明了梯度下降映射的非奇异性以及全局最小值点集的光滑流形特性,为理解大学习率下梯度下降的稳定性提供了重要线索。 +</p> +<p> + +</p> +<p> +目前对理解“稳定性边缘(EoS)”现象存在着相当大的兴趣,这一现象在神经网络训练中被观察到,其特点是损失函数在不同纪元间的非单调下降,而损失的陡峭度(Hessian的谱范数)逐渐接近并稳定在2/(学习率)附近。最近有人提出了使用梯度下降训练时出现EoS的原因——沿梯度下降轨迹附近缺乏平坦的极小值点,同时存在紧致的正向不变集。在本文中,我们证明了在二次损失函数下优化的线性神经网络满足第一个假设以及第二个假设的一个必要条件。更具体地,我们证明了梯度下降映射是非奇异的,损失函数的全局最小值点集构成一个光滑流形,并且稳定的极小值构成有界子集。 +</p> +<p> +arXiv:2402.13108v1 Announce Type: new Abstract: There currently is a significant interest in understanding the Edge of Stability (EoS) phenomenon, which has been observed in neural networks training, characterized by a non-monotonic decrease of the loss function over epochs, while the sharpness of the loss (spectral norm of the Hessian) progressively approaches and stabilizes around 2/(learning rate). Reasons for the existence of EoS when training using gradient descent have recently been proposed -- a lack of flat minima near the gradient descent trajectory together with the presence of compact forward-invariant sets. In this paper, we show that linear neural networks optimized under a quadratic loss function satisfy the first assumption and also a necessary condition for the second assumption. 
More precisely, we prove that the gradient descent map is non-singular, the set of global minimizers of the loss function forms a smooth manifold, and the stable minima form a bounded subset i +</p>该研究提出了一种基于查询的对抗性攻击方法,通过利用远程语言模型的 API 访问构造对抗性示例,使模型以更高概率发出有害字符串,而非仅仅基于模型之间的转移性攻击。https://arxiv.org/abs/2402.12329<p> +基于查询的对抗性提示生成 +</p> +<p> +Query-Based Adversarial Prompt Generation +</p> +<p> +https://arxiv.org/abs/2402.12329 +</p> +<p> +该研究提出了一种基于查询的对抗性攻击方法,通过利用远程语言模型的 API 访问构造对抗性示例,使模型以更高概率发出有害字符串,而非仅仅基于模型之间的转移性攻击。 +</p> +<p> + +</p> +<p> +最近的研究表明,可以构造对抗性示例,导致一个对其进行了调整的语言模型产生有害字符串或执行有害行为。现有的攻击要么在白盒设置中(完全访问模型权重),要么通过可转移性:一种现象,即在一个模型上精心设计的对抗性示例通常在其他模型上仍然有效。我们通过基于查询的攻击改进以前的工作,利用 API 访问远程语言模型来构造对抗性示例,使模型以(明显)更高的概率发出有害字符串,而不能仅仅使用转移攻击。我们在 GPT-3.5 和 OpenAI 的安全分类器上验证了我们的攻击;我们能够让 GPT-3.5 发出有害字符串,而目前的转移攻击失败了,并且我们几乎以 100% 的概率规避了安全分类器。 +</p> +<p> +arXiv:2402.12329v1 Announce Type: cross Abstract: Recent work has shown it is possible to construct adversarial examples that cause an aligned language model to emit harmful strings or perform harmful behavior. Existing attacks work either in the white-box setting (with full access to the model weights), or through transferability: the phenomenon that adversarial examples crafted on one model often remain effective on other models. We improve on prior work with a query-based attack that leverages API access to a remote language model to construct adversarial examples that cause the model to emit harmful strings with (much) higher probability than with transfer-only attacks. We validate our attack on GPT-3.5 and OpenAI's safety classifier; we can cause GPT-3.5 to emit harmful strings that current transfer attacks fail at, and we can evade the safety classifier with nearly 100% probability. 
+</p>这项研究展示了如何利用贝叶斯学习技术应用于参数高效微调,以防止灾难性遗忘,实现了预训练知识的保留,并在语言建模和语音合成任务中取得成功。https://arxiv.org/abs/2402.12220<p> +贝叶斯参数高效微调以克服灾难性遗忘 +</p> +<p> +Bayesian Parameter-Efficient Fine-Tuning for Overcoming Catastrophic Forgetting +</p> +<p> +https://arxiv.org/abs/2402.12220 +</p> +<p> +这项研究展示了如何利用贝叶斯学习技术应用于参数高效微调,以防止灾难性遗忘,实现了预训练知识的保留,并在语言建模和语音合成任务中取得成功。 +</p> +<p> + +</p> +<p> +虽然最初是被文本转语音合成模型的自适应所激发,但我们认为更通用的参数高效微调(PEFT)是进行这种自适应的适当框架。然而,灾难性遗忘仍然是PEFT面临的问题,它损害了预训练模型固有的能力。我们证明现有的贝叶斯学习技术可以应用于PEFT,以防止灾难性遗忘,只要能够可微地计算微调层的参数转换。在一系列关于语言建模和语音合成任务的基础性实验中,我们利用建立的拉普拉斯近似,包括对角线和Kronecker分解方法,来正则化PEFT与低秩适应(LoRA)并比较它们在保留预训练知识方面的性能。我们的结果表明,我们的方法可以克服灾难性遗忘,而不会降低微调性能。 +</p> +<p> +arXiv:2402.12220v1 Announce Type: cross Abstract: Although motivated by the adaptation of text-to-speech synthesis models, we argue that more generic parameter-efficient fine-tuning (PEFT) is an appropriate framework to do such adaptation. However, catastrophic forgetting remains an issue with PEFT, damaging the pre-trained model's inherent capabilities. We demonstrate that existing Bayesian learning techniques can be applied to PEFT to prevent catastrophic forgetting as long as the parameter shift of the fine-tuned layers can be calculated differentiably. In a principled series of experiments on language modeling and speech synthesis tasks, we utilize established Laplace approximations, including diagonal and Kronecker factored approaches, to regularize PEFT with the low-rank adaptation (LoRA) and compare their performance in pre-training knowledge preservation. 
Our results demonstrate that catastrophic forgetting can be overcome by our methods without degrading the fine-tuning perfo +</p>通过将大型语言模型推理与量子化学反馈相结合,我们引入了一个AI引导的计算筛选框架,将催化剂发现形式化为一个不确定环境,从而实现高效催化剂的积极搜索https://arxiv.org/abs/2402.10980<p> +CHEMREASONER:使用量子化学反馈在大型语言模型的知识空间中进行启发式搜索 +</p> +<p> +CHEMREASONER: Heuristic Search over a Large Language Model's Knowledge Space using Quantum-Chemical Feedback +</p> +<p> +https://arxiv.org/abs/2402.10980 +</p> +<p> +通过将大型语言模型推理与量子化学反馈相结合,我们引入了一个AI引导的计算筛选框架,将催化剂发现形式化为一个不确定环境,从而实现高效催化剂的积极搜索 +</p> +<p> + +</p> +<p> +arXiv:2402.10980v1 类型公告:跨领域 摘要:发现新的催化剂对于设计新的更高效的化学过程至关重要,以实现向可持续未来的过渡。我们引入了一种人工智能引导的计算筛选框架,将语言推理与基于量子化学的三维原子表示的反馈统一起来。我们的方法将催化剂发现构建为一个不确定环境,其中一个代理通过大型语言模型(LLM)推导的假设与基于原子图神经网络(GNN)的反馈的迭代组合,积极搜索高效催化剂。在中间搜索步骤确定的催化剂经过基于空间定向、反应途径和稳定性的结构评估。基于吸附能和势垒的评分函数引导在LLM的知识空间中向能量有利、高效的催化剂探索。我们引入了可以自动规划的方法 +</p> +<p> +arXiv:2402.10980v1 Announce Type: cross Abstract: The discovery of new catalysts is essential for the design of new and more efficient chemical processes in order to transition to a sustainable future. We introduce an AI-guided computational screening framework unifying linguistic reasoning with quantum-chemistry based feedback from 3D atomistic representations. Our approach formulates catalyst discovery as an uncertain environment where an agent actively searches for highly effective catalysts via the iterative combination of large language model (LLM)-derived hypotheses and atomistic graph neural network (GNN)-derived feedback. Identified catalysts in intermediate search steps undergo structural evaluation based on spatial orientation, reaction pathways, and stability. Scoring functions based on adsorption energies and barriers steer the exploration in the LLM's knowledge space toward energetically favorable, high-efficiency catalysts. 
We introduce planning methods that automaticall +</p>本文提出了动量近似方法,在异步私有联邦学习(FL)中有效结合了动量和异步协议的技术,通过最小化动量更新的偏差来改进模型性能。实证研究证明了动量近似在基准FL数据集上的有效性。https://arxiv.org/abs/2402.09247<p> +异步私有联邦学习中的动量近似 +</p> +<p> +Momentum Approximation in Asynchronous Private Federated Learning +</p> +<p> +https://arxiv.org/abs/2402.09247 +</p> +<p> +本文提出了动量近似方法,在异步私有联邦学习(FL)中有效结合了动量和异步协议的技术,通过最小化动量更新的偏差来改进模型性能。实证研究证明了动量近似在基准FL数据集上的有效性。 +</p> +<p> + +</p> +<p> +异步协议已被证明能够提高大规模客户端联邦学习(FL)的可扩展性。同时,基于动量的方法可以在同步FL中实现最佳模型质量。然而,在异步FL算法中简单地应用动量会导致收敛速度变慢和模型性能下降。如何有效地结合这两种技术以实现双赢目前尚不清楚。在本文中,我们发现异步性引入了对动量更新的隐含偏差。为了解决这个问题,我们提出了动量近似,通过找到所有历史模型更新的最佳加权平均值来最小化偏差。动量近似与安全聚合和差分隐私是兼容的,并且可以在生产的FL系统中很容易地集成,只需较小的通信和存储成本。我们在基准FL数据集上进行了实证研究,证明了动量近似在性能上的改进效果。 +</p> +<p> +arXiv:2402.09247v1 Announce Type: new Abstract: Asynchronous protocols have been shown to improve the scalability of federated learning (FL) with a massive number of clients. Meanwhile, momentum-based methods can achieve the best model quality in synchronous FL. However, naively applying momentum in asynchronous FL algorithms leads to slower convergence and degraded model performance. It is still unclear how to effectively combine these two techniques together to achieve a win-win. In this paper, we find that asynchrony introduces implicit bias to momentum updates. In order to address this problem, we propose momentum approximation that minimizes the bias by finding an optimal weighted average of all historical model updates. Momentum approximation is compatible with secure aggregation as well as differential privacy, and can be easily integrated in production FL systems with a minor communication and storage cost.
We empirically demonstrate that on benchmark FL datasets, momentum appro +</p>本研究将因果表示学习和基础模型相结合,研究了如何从数据中学习人类可解释的概念。实验证明了这一统一方法的实用性。https://arxiv.org/abs/2402.09236<p> +学习可解释概念:统一因果表示学习与基础模型 +</p> +<p> +Learning Interpretable Concepts: Unifying Causal Representation Learning and Foundation Models +</p> +<p> +https://arxiv.org/abs/2402.09236 +</p> +<p> +本研究将因果表示学习和基础模型相结合,研究了如何从数据中学习人类可解释的概念。实验证明了这一统一方法的实用性。 +</p> +<p> + +</p> +<p> +构建智能机器学习系统有两种广泛的方法。一种方法是构建天生可解释的模型,这是因果表示学习领域的努力方向。另一种方法是构建高性能的基础模型,然后投入努力去理解它们的工作原理。本研究将这两种方法联系起来,研究如何从数据中学习人类可解释的概念。通过结合这两个领域的思想,我们正式定义了概念的概念,并展示了它们可以从多样的数据中被可靠地恢复出来。对于合成数据和大型语言模型的实验证明了我们统一方法的实用性。 +</p> +<p> +arXiv:2402.09236v1 Announce Type: cross Abstract: To build intelligent machine learning systems, there are two broad approaches. One approach is to build inherently interpretable models, as endeavored by the growing field of causal representation learning. The other approach is to build highly-performant foundation models and then invest efforts into understanding how they work. In this work, we relate these two approaches and study how to learn human-interpretable concepts from data. Weaving together ideas from both fields, we formally define a notion of concepts and show that they can be provably recovered from diverse data. Experiments on synthetic data and large language models show the utility of our unified approach. 
+</p>该论文提出了一个个性化语言模型的方法,通过在于用户的反馈数据中引入个性化特征来解决强化学习框架在多样化用户偏好下存在的问题。https://arxiv.org/abs/2402.05133<p> +个性化语言模型基于个性化人类反馈 +</p> +<p> +Personalized Language Modeling from Personalized Human Feedback +</p> +<p> +https://arxiv.org/abs/2402.05133 +</p> +<p> +该论文提出了一个个性化语言模型的方法,通过在于用户的反馈数据中引入个性化特征来解决强化学习框架在多样化用户偏好下存在的问题。 +</p> +<p> + +</p> +<p> +从个性化人类反馈中进行强化学习(RLHF)是目前主流的框架,用于调整大型语言模型以更好地符合人类偏好。然而,在这个框架下开发的算法的基本前提在用户偏好多样化的情况下可能会出现问题。在本文中,我们旨在通过开发个性化语言模型的方法来解决这个问题。我们首先正式介绍了从个性化人类反馈中学习的任务,并解释了为什么在这种情况下普通的RLHF可能会存在问题。然后,我们提出了一个通用的个性化-RLHF(P-RLHF)框架,需要同时学习用户模型和语言(或奖励)模型。用户模型接收用户信息并输出用户表示。其结构编码了我们对反馈数据中用户偏好的假设。我们为个性化奖励建模和个性化直接偏好优化开发了新的学习目标。 +</p> +<p> +Reinforcement Learning from Human Feedback (RLHF) is the current dominating framework to fine-tune large language models to better align with human preferences. However, the underlying premise of algorithms developed under this framework can be problematic when user preferences encoded in human feedback are diverse. In this work, we aim to address this problem by developing methods for building personalized language models. We first formally introduce the task of learning from personalized human feedback and explain why vanilla RLHF can be problematic in this context. We then propose a general Personalized-RLHF (P-RLHF) framework, which requires one to jointly learn a user model and a language (or reward) model. The user model takes in user information and outputs user representations. Its structure encodes our assumptions about user preferences underlying the feedback data. 
We develop new learning objectives for personalized reward modeling and personalized Direct Preference Optimizat +</p>这项工作提出了一种用于多模态3D物体检测的主动学习框架ActiveAnno3D。通过选择最具信息量的训练数据样本进行标注,我们能够在使用一半的训练数据时实现与传统方法相近的检测性能。https://arxiv.org/abs/2402.03235<p> +ActiveAnno3D - 一种用于多模态3D物体检测的主动学习框架 +</p> +<p> +ActiveAnno3D - An Active Learning Framework for Multi-Modal 3D Object Detection +</p> +<p> +https://arxiv.org/abs/2402.03235 +</p> +<p> +这项工作提出了一种用于多模态3D物体检测的主动学习框架ActiveAnno3D。通过选择最具信息量的训练数据样本进行标注,我们能够在使用一半的训练数据时实现与传统方法相近的检测性能。 +</p> +<p> + +</p> +<p> +大规模数据集的筛选仍然需要大量的时间和资源,数据通常需要人工标注,创建高质量数据集的难题依然存在。在这项工作中,我们使用主动学习的方法来解决多模态3D物体检测中的研究空白。我们提出了ActiveAnno3D,一个用于选择最具信息量的训练数据样本进行标注的主动学习框架。我们探索了各种连续训练方法,并集成了在计算需求和检测性能方面最高效的方法。此外,我们对nuScenes和TUM Traffic Intersection数据集进行了大量实验和消融研究,使用BEVFusion和PV-RCNN进行了测试。我们展示了当仅使用TUM Traffic Intersection数据集的一半训练数据(77.25 mAP相比于83.50 mAP)时,使用PV-RCNN和基于熵的查询策略几乎可以达到相同的性能,而BEVFusion则在使用一半的训练数据时获得了64.31的mAP。 +</p> +<p> +The curation of large-scale datasets is still costly and requires much time and resources. Data is often manually labeled, and the challenge of creating high-quality datasets remains. In this work, we fill the research gap using active learning for multi-modal 3D object detection. We propose ActiveAnno3D, an active learning framework to select data samples for labeling that are of maximum informativeness for training. We explore various continuous training methods and integrate the most efficient method regarding computational demand and detection performance. Furthermore, we perform extensive experiments and ablation studies with BEVFusion and PV-RCNN on the nuScenes and TUM Traffic Intersection dataset. We show that we can achieve almost the same performance with PV-RCNN and the entropy-based query strategy when using only half of the training data (77.25 mAP compared to 83.50 mAP) of the TUM Traffic Intersection dataset. 
BEVFusion achieved an mAP of 64.31 when using half of the trai +</p>本论文提出了一种名为FuseMoE的专家混合Transformer框架,通过创新的门控函数实现灵活融合多模态数据,能够有效地处理缺失模态和不规则采样数据,同时改善模型的预测性能,在临床风险预测任务中具有实际应用价值。https://arxiv.org/abs/2402.03226<p> +FuseMoE:用于灵活多模态融合的专家混合Transformer +</p> +<p> +FuseMoE: Mixture-of-Experts Transformers for Fleximodal Fusion +</p> +<p> +https://arxiv.org/abs/2402.03226 +</p> +<p> +本论文提出了一种名为FuseMoE的专家混合Transformer框架,通过创新的门控函数实现灵活融合多模态数据,能够有效地处理缺失模态和不规则采样数据,同时改善模型的预测性能,在临床风险预测任务中具有实际应用价值。 +</p> +<p> + +</p> +<p> +随着机器学习模型在关键领域越来越多地处理多模态数据,它们面临处理多种模态的双重挑战,这些模态经常因缺失元素而不完整,以及收集样本的时间不规则性和稀疏性。成功利用这种复杂数据,同时克服高质量训练样本的稀缺性,是提高这些模型预测性能的关键。我们引入了``FuseMoE'',这是一个集成创新门控函数的专家混合框架。FuseMoE旨在整合多种模态,并且在处理缺失模态和不规则采样数据轨迹的情况下非常有效。在理论上,我们独特的门控函数有助于提高收敛速度,在多个下游任务中表现更好。FuseMoE的实际实用性通过一系列具有挑战性的临床风险预测任务得到验证。 +</p> +<p> +As machine learning models in critical fields increasingly grapple with multimodal data, they face the dual challenges of handling a wide array of modalities, often incomplete due to missing elements, and the temporal irregularity and sparsity of collected samples. Successfully leveraging this complex data, while overcoming the scarcity of high-quality training samples, is key to improving these models' predictive performance. We introduce ``FuseMoE'', a mixture-of-experts framework incorporated with an innovative gating function. Designed to integrate a diverse number of modalities, FuseMoE is effective in managing scenarios with missing modalities and irregularly sampled data trajectories. Theoretically, our unique gating function contributes to enhanced convergence rates, leading to better performance in multiple downstream tasks. The practical utility of FuseMoE in real world is validated by a challenging set of clinical risk prediction tasks. 
+</p>TopoX是一个用于在拓扑域上进行机器学习的Python软件包套件,包含了构建、计算和嵌入拓扑域的功能,并提供了一套全面的高阶消息传递功能工具箱。https://arxiv.org/abs/2402.02441<p> +TopoX: 一个用于拓扑域上的机器学习的Python软件包套件 +</p> +<p> +TopoX: A Suite of Python Packages for Machine Learning on Topological Domains +</p> +<p> +https://arxiv.org/abs/2402.02441 +</p> +<p> +TopoX是一个用于在拓扑域上进行机器学习的Python软件包套件,包含了构建、计算和嵌入拓扑域的功能,并提供了一套全面的高阶消息传递功能工具箱。 +</p> +<p> + +</p> +<p> +我们介绍了topox,一个提供可靠且用户友好的Python软件包套件,用于在拓扑域(扩展了图的领域)上进行计算和机器学习:超图、单纯、胞腔、路径和组合复合体。topox由三个软件包组成:toponetx用于构建和计算这些域,包括节点、边和高阶单元的处理;topoembedx提供了将拓扑域嵌入到向量空间的方法,类似于流行的基于图的嵌入算法,如node2vec;topomodelx建立在PyTorch之上,为拓扑域上的神经网络提供了一套全面的高阶消息传递功能工具箱。topox的源代码经过广泛的文档化和单元测试,并在https://github.com/pyt-team以MIT许可证的形式提供。 +</p> +<p> +We introduce topox, a Python software suite that provides reliable and user-friendly building blocks for computing and machine learning on topological domains that extend graphs: hypergraphs, simplicial, cellular, path and combinatorial complexes. topox consists of three packages: toponetx facilitates constructing and computing on these domains, including working with nodes, edges and higher-order cells; topoembedx provides methods to embed topological domains into vector spaces, akin to popular graph-based embedding algorithms such as node2vec; topomodelx is built on top of PyTorch and offers a comprehensive toolbox of higher-order message passing functions for neural networks on topological domains. The extensively documented and unit-tested source code of topox is available under MIT license at https://github.com/pyt-team. 
+</p>本文研究了一类$\ell_0$正则化问题的对偶形式,并提出了一种高效的原对偶算法,通过充分利用对偶范围估计和增量策略,提高了最佳子集选择问题的解决方案的效率和统计性质。https://arxiv.org/abs/2402.02322<p> +动态增量优化用于最佳子集选择 +</p> +<p> +Dynamic Incremental Optimization for Best Subset Selection +</p> +<p> +https://arxiv.org/abs/2402.02322 +</p> +<p> +本文研究了一类$\ell_0$正则化问题的对偶形式,并提出了一种高效的原对偶算法,通过充分利用对偶范围估计和增量策略,提高了最佳子集选择问题的解决方案的效率和统计性质。 +</p> +<p> + +</p> +<p> +最佳子集选择被认为是稀疏学习问题的“黄金标准”。已经提出了各种优化技术来攻击这个非光滑非凸问题。本文研究了一类$\ell_0$正则化问题的对偶形式。基于原始问题和对偶问题的结构,我们提出了一种高效的原对偶算法。通过充分利用对偶范围估计和增量策略,我们的算法潜在地减少了冗余计算并改进了最佳子集选择的解决方案。理论分析和对合成和真实数据集的实验验证了所提出解决方案的效率和统计性质。 +</p> +<p> +Best subset selection is considered the `gold standard' for many sparse learning problems. A variety of optimization techniques have been proposed to attack this non-smooth non-convex problem. In this paper, we investigate the dual forms of a family of $\ell_0$-regularized problems. An efficient primal-dual algorithm is developed based on the primal and dual problem structures. By leveraging the dual range estimation along with the incremental strategy, our algorithm potentially reduces redundant computation and improves the solutions of best subset selection. Theoretical analysis and experiments on synthetic and real-world datasets validate the efficiency and statistical properties of the proposed solutions. 
+</p>GD-CAF提出了一种新颖的方法,将降水预报作为一个时空图序列预报问题,利用图形双流卷积注意力融合来学习历史降水图并在不同空间位置上预测未来的降水。https://arxiv.org/abs/2401.07958<p> +GD-CAF:用于降水预报的图形双流卷积注意力融合 +</p> +<p> +GD-CAF: Graph Dual-stream Convolutional Attention Fusion for Precipitation Nowcasting +</p> +<p> +https://arxiv.org/abs/2401.07958 +</p> +<p> +GD-CAF提出了一种新颖的方法,将降水预报作为一个时空图序列预报问题,利用图形双流卷积注意力融合来学习历史降水图并在不同空间位置上预测未来的降水。 +</p> +<p> + +</p> +<p> +精确的降水预报对于各种应用至关重要,包括洪水预测、灾害管理、优化农业活动、管理交通路线和可再生能源。本文将降水预报形式化为时空图序列预报问题,提出了一种名为图形双流卷积注意力融合(GD-CAF)的新方法,旨在从历史降水图的时空图中学习,并预测未来不同空间位置的降水。 +</p> +<p> +arXiv:2401.07958v2 Announce Type: replace Abstract: Accurate precipitation nowcasting is essential for various applications, including flood prediction, disaster management, optimizing agricultural activities, managing transportation routes and renewable energy. While several studies have addressed this challenging task from a sequence-to-sequence perspective, most of them have focused on a single area without considering the existing correlation between multiple disjoint regions. In this paper, we formulate precipitation nowcasting as a spatiotemporal graph sequence nowcasting problem. In particular, we introduce Graph Dual-stream Convolutional Attention Fusion (GD-CAF), a novel approach designed to learn from historical spatiotemporal graph of precipitation maps and nowcast future time step ahead precipitation at different spatial locations. 
GD-CAF consists of spatio-temporal convolutional attention as well as gated fusion modules which are equipped with depthwise-separable convolut +</p>该研究提出了一种PAC-Bayes界限,能够同时控制多个错误,并提供丰富的信息,适用于回归中测试损失分布或分类中不同错误分类的概率。https://arxiv.org/abs/2202.05560<p> +使用PAC-Bayes界限同时控制多个错误 +</p> +<p> +Controlling Multiple Errors Simultaneously with a PAC-Bayes Bound +</p> +<p> +https://arxiv.org/abs/2202.05560 +</p> +<p> +该研究提出了一种PAC-Bayes界限,能够同时控制多个错误,并提供丰富的信息,适用于回归中测试损失分布或分类中不同错误分类的概率。 +</p> +<p> + +</p> +<p> +当前的PAC-Bayes泛化界限仅限于性能的标量度量,如损失或错误率。我们提供了第一个能够提供丰富信息的PAC-Bayes界限,通过界定一组M种错误类型的经验概率与真实概率之间的Kullback-Leibler差异来控制可能结果的整个分布。 +</p> +<p> +arXiv:2202.05560v2 Announce Type: replace-cross Abstract: Current PAC-Bayes generalisation bounds are restricted to scalar metrics of performance, such as the loss or error rate. However, one ideally wants more information-rich certificates that control the entire distribution of possible outcomes, such as the distribution of the test loss in regression, or the probabilities of different mis classifications. We provide the first PAC-Bayes bound capable of providing such rich information by bounding the Kullback-Leibler divergence between the empirical and true probabilities of a set of M error types, which can either be discretized loss values for regression, or the elements of the confusion matrix (or a partition thereof) for classification. We transform our bound into a differentiable training objective. Our bound is especially useful in cases where the severity of different mis-classifications may change over time; existing PAC-Bayes bounds can only bound a particular pre-decided w +</p>对于循环模型中含有隐藏因变量的因果发现,已经出现了能够处理这种情况的多种技术方法。http://arxiv.org/abs/2401.13009<p> +循环模型中含有隐藏因变量的因果发现方法的比较研究 +</p> +<p> +Comparative Study of Causal Discovery Methods for Cyclic Models with Hidden Confounders. 
(arXiv:2401.13009v1 [cs.LG]) +</p> +<p> +http://arxiv.org/abs/2401.13009 +</p> +<p> +对于循环模型中含有隐藏因变量的因果发现,已经出现了能够处理这种情况的多种技术方法。 +</p> +<p> + +</p> +<p> +如今,对因果发现的需求无处不在。理解系统中部分之间的随机依赖性以及实际的因果关系对科学的各个部分都至关重要。因此,寻找可靠的方法来检测因果方向的需求不断增长。在过去的50年里,出现了许多因果发现算法,但大多数仅适用于系统没有反馈环路并且具有因果充分性的假设,即没有未测量的子系统能够影响多个已测量变量。这是不幸的,因为这些限制在实践中往往不能假定。反馈是许多过程的一个重要特性,现实世界的系统很少是完全隔离和完全测量的。幸运的是,在最近几年中,已经发展了几种能够处理循环的、因果不充分的系统的技术。随着多种方法的出现,一种实际的应用方法开始变得可能。 +</p> +<p> +Nowadays, the need for causal discovery is ubiquitous. A better understanding of not just the stochastic dependencies between parts of a system, but also the actual cause-effect relations, is essential for all parts of science. Thus, the need for reliable methods to detect causal directions is growing constantly. In the last 50 years, many causal discovery algorithms have emerged, but most of them are applicable only under the assumption that the systems have no feedback loops and that they are causally sufficient, i.e. that there are no unmeasured subsystems that can affect multiple measured variables. This is unfortunate since those restrictions can often not be presumed in practice. Feedback is an integral feature of many processes, and real-world systems are rarely completely isolated and fully measured. Fortunately, in recent years, several techniques, that can cope with cyclic, causally insufficient systems, have been developed. And with multiple methods available, a practical ap +</p>这个论文提出了一种新颖的特征选择框架,通过使用特征屏蔽方法来消除特征,而不是从数据集中移除它们。这种方法不需要重新训练机器学习模型,可以综合考虑特征子集的重要性,为通用机器学习模型的特征选择问题提供了一种新的解决方案。http://arxiv.org/abs/2401.12644<p> +二进制特征屏蔽优化用于特征选择 +</p> +<p> +Binary Feature Mask Optimization for Feature Selection. 
(arXiv:2401.12644v1 [cs.LG]) +</p> +<p> +http://arxiv.org/abs/2401.12644 +</p> +<p> +这篇论文提出了一种新颖的特征选择框架,通过使用特征屏蔽方法来消除特征,而不是从数据集中移除它们。这种方法不需要重新训练机器学习模型,可以综合考虑特征子集的重要性,为通用机器学习模型的特征选择问题提供了一种新的解决方案。 +</p> +<p> + +</p> +<p> +我们研究了通用机器学习模型的特征选择问题。我们引入了一种新颖的框架,该框架考虑了模型的预测结果来选择特征。我们的框架通过使用一种新颖的特征屏蔽方法,在特征选择过程中消除特征,而不是从数据集中完全移除它们。这使我们能够在特征选择过程中使用相同的机器学习模型,而不像其他特征选择方法那样需要在每次迭代中重新训练机器学习模型,因为数据集的维度不同。我们使用机器学习模型的预测结果来获取屏蔽操作符,这为模型的预测性能提供了对特征子集的全面观察。特征选择文献中存在各种方法。然而,没有研究引入一个针对通用机器学习模型的无需训练的框架,以整体考虑特征子集的重要性,而不是只关注单个特征的重要性。 +</p> +<p> +We investigate the feature selection problem for generic machine learning (ML) models. We introduce a novel framework that selects features considering the predictions of the model. Our framework innovates by using a novel feature masking approach to eliminate the features during the selection process, instead of completely removing them from the dataset. This allows us to use the same ML model during feature selection, unlike other feature selection methods where we need to train the ML model again as the dataset has different dimensions on each iteration. We obtain the mask operator using the predictions of the ML model, which offers a comprehensive view on the subsets of the features essential for the predictive performance of the model. A variety of approaches exist in the feature selection literature. However, no study has introduced a training-free framework for a generic ML model to select features while considering the importance of the feature subsets as a whole, instead of focusi +</p>xTrimoPGLM是一个统一的1000亿规模预训练蛋白质语言模型,能够同时处理蛋白质理解和生成任务,通过创新的预训练框架和大规模的参数训练,显著优于其他先进模型,在18个蛋白理解基准测试中取得了成功,并能够实现对蛋白质结构的原子分辨率观察。http://arxiv.org/abs/2401.06199<p> +xTrimoPGLM: 统一的千亿规模预训练蛋白质语言模型,用于解析蛋白质的语言 +</p> +<p> +xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein.
(arXiv:2401.06199v1 [q-bio.QM]) +</p> +<p> +http://arxiv.org/abs/2401.06199 +</p> +<p> +xTrimoPGLM是一个统一的1000亿规模预训练蛋白质语言模型,能够同时处理蛋白质理解和生成任务,通过创新的预训练框架和大规模的参数训练,显著优于其他先进模型,在18个蛋白理解基准测试中取得了成功,并能够实现对蛋白质结构的原子分辨率观察。 +</p> +<p> + +</p> +<p> +蛋白质语言模型在学习蛋白质序列中的生物信息方面显示出显著的成功。然而,大多数现有模型局限于自编码或自回归的预训练目标,这使得它们难以同时处理蛋白质理解和生成任务。我们提出了一个统一的蛋白质语言模型,xTrimoPGLM,通过创新的预训练框架同时解决这两类任务。我们的关键技术贡献是探索这两类目标的兼容性和联合优化的潜力,由此形成了一种以前所未有的规模(1000亿参数、1万亿训练标记)训练xTrimoPGLM的策略。我们广泛的实验证明,1)xTrimoPGLM在四个类别的18个蛋白理解基准测试中明显优于其他先进基线。该模型还有助于对蛋白质结构进行原子分辨率的观察,从而实现了对蛋白质结构的理解和生成。 +</p> +<p> +Protein language models have shown remarkable success in learning biological information from protein sequences. However, most existing models are limited by either autoencoding or autoregressive pre-training objectives, which makes them struggle to handle protein understanding and generation tasks concurrently. We propose a unified protein language model, xTrimoPGLM, to address these two types of tasks simultaneously through an innovative pre-training framework. Our key technical contribution is an exploration of the compatibility and the potential for joint optimization of the two types of objectives, which has led to a strategy for training xTrimoPGLM at an unprecedented scale of 100 billion parameters and 1 trillion training tokens. Our extensive experiments reveal that 1) xTrimoPGLM significantly outperforms other advanced baselines in 18 protein understanding benchmarks across four categories. The model also facilitates an atomic-resolution view of protein structures, leading to +</p>本论文提出了一种基于签名变换(signature transform)的新型统计方法,用于解决运输市场费率的预测挑战。该方法利用签名变换的通用非线性性质和签名核,能够高效生成特征,并在预测过程中准确识别季节性和状态切换。http://arxiv.org/abs/2401.04857<p> +使用签名变换进行运输市场费率预测 +</p> +<p> +Transportation Market Rate Forecast Using Signature Transform.
(arXiv:2401.04857v1 [cs.LG]) +</p> +<p> +http://arxiv.org/abs/2401.04857 +</p> +<p> +本论文提出了一种基于签名变换(signature transform)的新型统计方法,用于解决运输市场费率的预测挑战。该方法利用签名变换的通用非线性性质和签名核,能够高效生成特征,并在预测过程中准确识别季节性和状态切换。 +</p> +<p> + +</p> +<p> +目前,亚马逊在运输市场费率预测上依赖第三方,尽管这些预测质量差且缺乏可解释性。虽然运输市场费率通常很难准确预测,但我们开发了一种基于签名变换的新型统计技术来解决这些挑战,并构建了一个预测和自适应模型来预测市场费率。这种新技术基于签名变换的两个关键属性。第一个是其通用非线性性质,它线性化特征空间,从而将预测问题转化为线性回归分析;第二个是签名核,它允许在时间序列数据之间进行计算高效的相似性比较。结合起来,这些属性允许进行高效的特征生成,并在预测过程中更精确地识别季节性和状态切换。模型的初步结果显示,这种新方法可以改善市场费率的预测性能。 +</p> +<p> +Currently, Amazon relies on third parties for transportation marketplace rate forecasts, despite the poor quality and lack of interpretability of these forecasts. While transportation marketplace rates are typically very challenging to forecast accurately, we have developed a novel signature-based statistical technique to address these challenges and built a predictive and adaptive model to forecast marketplace rates. This novel technique is based on two key properties of the signature transform. The first is its universal nonlinearity which linearizes the feature space and hence translates the forecasting problem into a linear regression analysis; the second is the signature kernel which allows for computationally efficient comparison of similarities between time series data. Combined, these properties allow for efficient feature generation and more precise identification of seasonality and regime switching in the forecasting process. Preliminary results from the model show that this new +</p>在生成式人工智能时代的物联网,Generative AI的进展带来了巨大的希望,同时也面临着高资源需求、提示工程、设备端推理、安全等关键挑战。http://arxiv.org/abs/2401.01923<p> +在生成式人工智能时代的物联网: 愿景与挑战 +</p> +<p> +IoT in the Era of Generative AI: Vision and Challenges.
(arXiv:2401.01923v1 [cs.DC]) +</p> +<p> +http://arxiv.org/abs/2401.01923 +</p> +<p> +在生成式人工智能时代的物联网,Generative AI的进展带来了巨大的希望,同时也面临着高资源需求、提示工程、设备端推理、安全等关键挑战。 +</p> +<p> + +</p> +<p> +带有感知、网络和计算能力的物联网设备,如智能手机、可穿戴设备、智能音箱和家庭机器人,已经无缝地融入到我们的日常生活中。最近生成式人工智能(Generative AI)的进展,如GPT、LLaMA、DALL-E和Stable Diffusion等,给物联网的发展带来了巨大的希望。本文分享了我们对Generative AI在物联网中带来的好处的看法和愿景,并讨论了Generative AI在物联网相关领域的一些重要应用。在物联网中充分利用Generative AI是一项复杂的挑战。我们确定了一些最关键的挑战,包括Generative AI模型的高资源需求、提示工程、设备端推理、卸载、设备端微调、联邦学习、安全以及开发工具和基准,并讨论了当前存在的差距以及使Generative AI在物联网中实现的有希望的机会。我们希望这篇文章能够激发新的研究和创新。 +</p> +<p> +Equipped with sensing, networking, and computing capabilities, Internet of Things (IoT) devices such as smartphones, wearables, smart speakers, and household robots have been seamlessly woven into our daily lives. Recent advancements in Generative AI exemplified by GPT, LLaMA, DALL-E, and Stable Diffusion hold immense promise to push IoT to the next level. In this article, we share our vision and views on the benefits that Generative AI brings to IoT, and discuss some of the most important applications of Generative AI in IoT-related domains. Fully harnessing Generative AI in IoT is a complex challenge. We identify some of the most critical challenges including high resource demands of the Generative AI models, prompt engineering, on-device inference, offloading, on-device fine-tuning, federated learning, security, as well as development tools and benchmarks, and discuss current gaps as well as promising opportunities on enabling Generative AI for IoT. We hope this article can inspire new res +</p>本研究提出了一种名为ROAM的方法,通过利用先前学习到的行为来实时调节机器人在部署过程中应对未曾见过的情况。在测试中,ROAM可以在单个回合(episode)内实现快速适应,并且在模拟环境和真实场景中取得了成功,具有较高的效率和适应性。http://arxiv.org/abs/2311.01059<p> +在部署时进行实时调节:用于单次生命(single-life)机器人部署的行为调控 +</p> +<p> +Adapt On-the-Go: Behavior Modulation for Single-Life Robot Deployment.
(arXiv:2311.01059v1 [cs.RO]) +</p> +<p> +http://arxiv.org/abs/2311.01059 +</p> +<p> +本研究提出了一种名为ROAM的方法,通过利用先前学习到的行为来实时调节机器人在部署过程中应对未曾见过的情况。在测试中,ROAM可以在单个阶段内实现快速适应,并且在模拟环境和真实场景中取得了成功,具有较高的效率和适应性。 +</p> +<p> + +</p> +<p> +为了在现实世界中取得成功,机器人必须应对训练过程中未曾见过的情况。本研究探讨了在部署过程中针对这些新场景的实时调节问题,通过利用先前学习到的多样化行为库。我们的方法,RObust Autonomous Modulation(ROAM),引入了基于预训练行为的感知价值的机制,以在特定情况下选择和调整预训练行为。关键是,这种调节过程在测试时的单个阶段内完成,无需任何人类监督。我们对选择机制进行了理论分析,并证明了ROAM使得机器人能够在模拟环境和真实的四足动物Go1上快速适应动态变化,甚至在脚上套着滚轮滑鞋的情况下成功前进。与现有方法相比,我们的方法在面对各种分布情况的部署时能够以超过2倍的效率进行调节,通过有效选择来实现适应。 +</p> +<p> +To succeed in the real world, robots must cope with situations that differ from those seen during training. We study the problem of adapting on-the-fly to such novel scenarios during deployment, by drawing upon a diverse repertoire of previously learned behaviors. Our approach, RObust Autonomous Modulation (ROAM), introduces a mechanism based on the perceived value of pre-trained behaviors to select and adapt pre-trained behaviors to the situation at hand. Crucially, this adaptation process all happens within a single episode at test time, without any human supervision. We provide theoretical analysis of our selection mechanism and demonstrate that ROAM enables a robot to adapt rapidly to changes in dynamics both in simulation and on a real Go1 quadruped, even successfully moving forward with roller skates on its feet. Our approach adapts over 2x as efficiently compared to existing methods when facing a variety of out-of-distribution situations during deployment by effectively choosing +</p>本论文研究了在通信网络中的信息路由问题,提出了一种新颖的状态增强策略,通过部署图神经网络架构,利用图卷积来最大化源节点的聚合信息,从而有效地将所需信息路由到目标节点。http://arxiv.org/abs/2310.00248<p> +在通信网络中学习增强状态策略进行信息路由 +</p> +<p> +Learning State-Augmented Policies for Information Routing in Communication Networks. 
(arXiv:2310.00248v2 [cs.NI] UPDATED) +</p> +<p> +http://arxiv.org/abs/2310.00248 +</p> +<p> +本论文研究了在通信网络中的信息路由问题,提出了一种新颖的状态增强策略,通过部署图神经网络架构,利用图卷积来最大化源节点的聚合信息,从而有效地将所需信息路由到目标节点。 +</p> +<p> + +</p> +<p> +本文研究了在大规模通信网络中的信息路由问题,该问题可以被形式化为一个只能访问局部信息的约束统计学习问题。我们提出了一种新颖的状态增强(SA)策略,通过在通信网络的拓扑链路上部署图神经网络(GNN)架构,利用图卷积来最大化源节点的聚合信息。所提出的技术仅利用每个节点上的局部信息,并有效地将所需的信息路由到目标节点。我们利用无监督学习过程将GNN架构的输出转换为最优的信息路由策略。实验中,我们对实时网络拓扑进行评估,以验证我们的算法。数值仿真结果显示出与基线算法相比,所提出的方法在训练GNN参数化方面的性能有所提高。 +</p> +<p> +This paper examines the problem of information routing in a large-scale communication network, which can be formulated as a constrained statistical learning problem having access to only local information. We delineate a novel State Augmentation (SA) strategy to maximize the aggregate information at source nodes using graph neural network (GNN) architectures, by deploying graph convolutions over the topological links of the communication network. The proposed technique leverages only the local information available at each node and efficiently routes desired information to the destination nodes. We leverage an unsupervised learning procedure to convert the output of the GNN architecture to optimal information routing strategies. In the experiments, we perform the evaluation on real-time network topologies to validate our algorithms. Numerical simulations depict the improved performance of the proposed method in training a GNN parameterization as compared to baseline algorithms. +</p>通过综合评估深度学习分类器的性能,发现它们缺乏稳定性和可靠性,并建议采用广泛的数据类型和统一的评估指标进行性能基准测试。http://arxiv.org/abs/2308.04137<p> +深度学习分类器性能的综合评估揭示出惊人的缺乏稳定性 +</p> +<p> +Comprehensive Assessment of the Performance of Deep Learning Classifiers Reveals a Surprising Lack of Robustness. 
(arXiv:2308.04137v1 [cs.LG]) +</p> +<p> +http://arxiv.org/abs/2308.04137 +</p> +<p> +通过综合评估深度学习分类器的性能,发现它们缺乏稳定性和可靠性,并建议采用广泛的数据类型和统一的评估指标进行性能基准测试。 +</p> +<p> + +</p> +<p> +可靠而稳健的评估方法是开发本身稳健可靠的机器学习模型的必要第一步。然而,目前用于评估分类器的常规评估协议在综合评估性能方面存在不足,因为它们往往依赖于有限类型的测试数据,忽视其他类型的数据。例如,使用标准测试数据无法评估分类器对于未经训练的类别样本的预测。另一方面,使用包含未知类别样本的数据进行测试无法评估分类器对于已知类别标签的预测能力。本文提倡使用各种不同类型的数据进行性能基准测试,并使用一种可应用于所有这些数据类型的单一指标,以产生一致的性能评估结果。通过这样的基准测试发现,目前的深度神经网络,包括使用认为是全面的方法进行训练的网络,也存在缺乏稳定性的问题。 +</p> +<p> +Reliable and robust evaluation methods are a necessary first step towards developing machine learning models that are themselves robust and reliable. Unfortunately, current evaluation protocols typically used to assess classifiers fail to comprehensively evaluate performance as they tend to rely on limited types of test data, and ignore others. For example, using the standard test data fails to evaluate the predictions made by the classifier to samples from classes it was not trained on. On the other hand, testing with data containing samples from unknown classes fails to evaluate how well the classifier can predict the labels for known classes. This article advocates bench-marking performance using a wide range of different types of data and using a single metric that can be applied to all such data types to produce a consistent evaluation of performance. Using such a benchmark it is found that current deep neural networks, including those trained with methods that are believed to pro +</p>这篇论文提出了一种针对值函数算法和梯度算法的攻击方法,利用梯度反转重建状态、动作和监督信号,以解决嵌入式人工智能中的隐私泄露问题。http://arxiv.org/abs/2306.09273<p> +你的房间不是私密的:关于强化学习的梯度反转攻击 +</p> +<p> +Your Room is not Private: Gradient Inversion Attack on Reinforcement Learning. 
(arXiv:2306.09273v2 [cs.RO] UPDATED) +</p> +<p> +http://arxiv.org/abs/2306.09273 +</p> +<p> +这篇论文提出了一种针对值函数算法和梯度算法的攻击方法,利用梯度反转重建状态、动作和监督信号,以解决嵌入式人工智能中的隐私泄露问题。 +</p> +<p> + +</p> +<p> +嵌入式人工智能的显著发展吸引了人们的极大关注,该技术使得机器人可以在虚拟环境中导航、感知和互动。由于计算机视觉和大型语言模型方面的显著进展,隐私问题在嵌入式人工智能领域变得至关重要,因为机器人可以访问大量个人信息。然而,关于强化学习算法中的隐私泄露问题,尤其是关于值函数算法和梯度算法的问题,在研究中尚未得到充分考虑。本文旨在通过提出一种攻击值函数算法和梯度算法的方法,利用梯度反转重建状态、动作和监督信号,来解决这一问题。选择使用梯度进行攻击是因为常用的联邦学习技术仅利用基于私人用户数据计算的梯度来优化模型,而不存储或传输用户数据。 +</p> +<p> +The prominence of embodied Artificial Intelligence (AI), which empowers robots to navigate, perceive, and engage within virtual environments, has attracted significant attention, owing to the remarkable advancements in computer vision and large language models. Privacy emerges as a pivotal concern within the realm of embodied AI, as the robot accesses substantial personal information. However, the issue of privacy leakage in embodied AI tasks, particularly in relation to reinforcement learning algorithms, has not received adequate consideration in research. This paper aims to address this gap by proposing an attack on the value-based algorithm and the gradient-based algorithm, utilizing gradient inversion to reconstruct states, actions, and supervision signals. The choice of using gradients for the attack is motivated by the fact that commonly employed federated learning techniques solely utilize gradients computed based on private user data to optimize models, without storing or trans +</p>该论文提出了一种基于扩散模型的数据增强方法DiffEEG,可以有效地提高癫痫预测的性能,超过了现有的数据扩增方法。http://arxiv.org/abs/2306.08256<p> +基于生成扩散模型的癫痫预测数据增强方法 +</p> +<p> +Data Augmentation for Seizure Prediction with Generative Diffusion Model. 
(arXiv:2306.08256v1 [eess.SP]) +</p> +<p> +http://arxiv.org/abs/2306.08256 +</p> +<p> +该论文提出了一种基于扩散模型的数据增强方法DiffEEG,可以有效地提高癫痫预测的性能,超过了现有的数据扩增方法。 +</p> +<p> + +</p> +<p> +目标:癫痫预测对于改善患者生活质量具有重要意义,重点在于区分发作前状态与发作后状态。随着机器学习的发展,癫痫预测方法取得了显著进展。然而,发作前与发作后状态数据之间的严重不平衡仍然是一个巨大的挑战,限制了分类器的性能。数据扩增是解决这个问题的一个直观方法。现有的数据扩增方法通过重叠或重新组合数据来生成样本。由于这些转换无法完全探索特征空间并提供新信息,所以生成的样本分布受到原始数据的限制。由于癫痫脑电图表示在不同发作之间具有差异性,这些生成的样本不能提供足够的多样性以在新的癫痫发作中实现高性能。因此,我们提出了一种使用扩散模型的新型数据增强方法DiffEEG。方法:扩散模型是一种建模数据分布的强大工具,我们使用此模型来对原始脑电图数据进行转换以生成多样性的样本,进而提高分类器的性能。结果:DiffEEG在神经网络和SVM模型上进行的实验表明,它可以有效地提高癫痫预测的性能,超过了现有的数据扩增方法。 +</p> +<p> +Objective: Seizure prediction is of great importance to improve the life of patients. The focal point is to distinguish preictal states from interictal ones. With the development of machine learning, seizure prediction methods have achieved significant progress. However, the severe imbalance problem between preictal and interictal data still poses a great challenge, restricting the performance of classifiers. Data augmentation is an intuitive way to solve this problem. Existing data augmentation methods generate samples by overlapping or recombining data. The distribution of generated samples is limited by original data, because such transformations cannot fully explore the feature space and offer new information. As the epileptic EEG representation varies among seizures, these generated samples cannot provide enough diversity to achieve high performance on a new seizure. As a consequence, we propose a novel data augmentation method with diffusion model called DiffEEG. Methods: Diffusi +</p>本文提出了一种新的深度优化器FAME,使用三重指数移动平均值(TEMA)来估计梯度矩,提供更丰富和准确的数据变化和趋势信息,可以提高计算机视觉等领域中模型的性能表现。http://arxiv.org/abs/2306.01423<p> +利用三重指数移动平均值实现快速自适应矩估计 +</p> +<p> +Leveraging the Triple Exponential Moving Average for Fast-Adaptive Moment Estimation. 
(arXiv:2306.01423v1 [cs.CV]) +</p> +<p> +http://arxiv.org/abs/2306.01423 +</p> +<p> +本文提出了一种新的深度优化器FAME,使用三重指数移动平均值(TEMA)来估计梯度矩,提供更丰富和准确的数据变化和趋势信息,可以提高计算机视觉等领域中模型的性能表现。 +</p> +<p> + +</p> +<p> +网络优化是深度学习领域中的一个关键步骤,直接影响计算机视觉等多种领域中模型的性能。虽然多种优化器已经被开发出来,但目前的方法在准确快速地识别梯度趋势方面仍然有限,这可能会导致网络性能不佳。本文提出了一种新的深度优化器,称为快速自适应矩估计(FAME),它首次使用三重指数移动平均值(TEMA)来估计梯度矩。将TEMA纳入优化过程中,可以提供更丰富和准确的数据变化和趋势信息,与目前所有主要自适应优化方法中使用的标准指数移动平均值相比。我们提出的FAME优化器已经在广泛的基准测试中得到了验证,包括CIFAR-10,CIFAR-100,PASCAL-VOC,MS-COCO和Cityscapes。 +</p> +<p> +Network optimization is a crucial step in the field of deep learning, as it directly affects the performance of models in various domains such as computer vision. Despite the numerous optimizers that have been developed over the years, the current methods are still limited in their ability to accurately and quickly identify gradient trends, which can lead to sub-optimal network performance. In this paper, we propose a novel deep optimizer called Fast-Adaptive Moment Estimation (FAME), which for the first time estimates gradient moments using a Triple Exponential Moving Average (TEMA). Incorporating TEMA into the optimization process provides richer and more accurate information on data changes and trends, as compared to the standard Exponential Moving Average used in essentially all current leading adaptive optimization methods. Our proposed FAME optimizer has been extensively validated through a wide range of benchmarks, including CIFAR-10, CIFAR-100, PASCAL-VOC, MS-COCO, and Cityscap +</p>本文通过提出一种名为“共单调独立分类器”(CIBer)的新技术,专注于特征的最优分区,旨在克服朴素贝叶斯方法带来的挑战,并且证明该技术在不同数据集上具有更高的准确率和更低的错误率。http://arxiv.org/abs/2304.14537<p> +基于贝叶斯分类器的特征最优分区研究 +</p> +<p> +Optimal partition of feature using Bayesian classifier. 
(arXiv:2304.14537v1 [cs.LG]) +</p> +<p> +http://arxiv.org/abs/2304.14537 +</p> +<p> +本文通过提出一种名为“共单调独立分类器”(CIBer)的新技术,专注于特征的最优分区,旨在克服朴素贝叶斯方法带来的挑战,并且证明该技术在不同数据集上具有更高的准确率和更低的错误率。 +</p> +<p> + +</p> +<p> +朴素贝叶斯分类器是一种应用贝叶斯原理的流行分类方法,尽管输入变量之间的条件依赖关系听起来很好,但实际上会导致大多数投票风格的行为。朴素贝叶斯算法中的某些特征被称为独立特征,因为在预测分类时它们没有条件相关性或依赖性。本文通过提出一种名为“共单调独立分类器”(CIBer)的新技术,专注于特征的最优分区,旨在克服朴素贝叶斯方法带来的挑战。在不同的数据集上,我们明确证明了我们的技术的有效性,在错误率更低、准确率更高或相当的情况下,与随机森林和XGBoost等模型相比。 +</p> +<p> +The Naive Bayesian classifier is a popular classification method employing the Bayesian paradigm. The concept of having conditional dependence among input variables sounds good in theory but can lead to a majority vote style behaviour. Achieving conditional independence is often difficult, and they introduce decision biases in the estimates. In Naive Bayes, certain features are called independent features as they have no conditional correlation or dependency when predicting a classification. In this paper, we focus on the optimal partition of features by proposing a novel technique called the Comonotone-Independence Classifier (CIBer) which is able to overcome the challenges posed by the Naive Bayes method. For different datasets, we clearly demonstrate the efficacy of our technique, where we achieve lower error rates and higher or equivalent accuracy compared to models such as Random Forests and XGBoost. +</p>本研究旨在提高程序生成环境中强化学习的样本效率。研究证明,使用模仿学习进行预训练和同时进行模仿学习和在线强化学习的方法可以提高效率。http://arxiv.org/abs/2304.09825<p> +利用离线数据加速程序生成环境中的强化学习 +</p> +<p> +Using Offline Data to Speed-up Reinforcement Learning in Procedurally Generated Environments. 
(arXiv:2304.09825v1 [cs.LG]) +</p> +<p> +http://arxiv.org/abs/2304.09825 +</p> +<p> +本研究旨在提高程序生成环境中强化学习的样本效率。研究证明,使用模仿学习进行预训练和同时进行模仿学习和在线强化学习的方法可以提高效率。 +</p> +<p> + +</p> +<p> +强化学习面临的主要挑战之一是代理能够将其学习策略推广到未见过的环境中。此外,训练强化学习代理需要与环境进行大量交互。受离线强化学习和模仿学习的最近成功启发,我们进行了一项研究,以调查代理是否可以利用轨迹的离线数据来提高程序生成环境中的样本效率。我们考虑了两种使用离线数据的模仿学习方法:(1)在在线强化学习训练之前预训练策略和(2)同时训练在线强化学习和来自离线数据的模仿学习。我们分析了可用的离线轨迹的质量(轨迹的最佳性)和多样性(轨迹数量和覆盖级别)对两种方法有效性的影响。在MiniGrid环境中的四个知名稀疏奖励任务中,我们发现使用模仿学习进行预训练和同时进行模仿学习和在线强化学习的方法可以提供更高的样本效率。 +</p> +<p> +One of the key challenges of Reinforcement Learning (RL) is the ability of agents to generalise their learned policy to unseen settings. Moreover, training RL agents requires large numbers of interactions with the environment. Motivated by the recent success of Offline RL and Imitation Learning (IL), we conduct a study to investigate whether agents can leverage offline data in the form of trajectories to improve the sample-efficiency in procedurally generated environments. We consider two settings of using IL from offline data for RL: (1) pre-training a policy before online RL training and (2) concurrently training a policy with online RL and IL from offline data. We analyse the impact of the quality (optimality of trajectories) and diversity (number of trajectories and covered level) of available offline trajectories on the effectiveness of both approaches. Across four well-known sparse reward tasks in the MiniGrid environment, we find that using IL for pre-training and concurrently d +</p>本文提出了一种原型引导知识蒸馏(PGKD)方法,它不需要图形边缘,但可以在不考虑边缘的情况下学习结构感知的MLP。http://arxiv.org/abs/2303.13763<p> +无需边缘但具有结构感知性:从GNN到MLP的原型引导知识蒸馏。 +</p> +<p> +Edge-free but Structure-aware: Prototype-Guided Knowledge Distillation from GNNs to MLPs. 
(arXiv:2303.13763v1 [cs.LG]) +</p> +<p> +http://arxiv.org/abs/2303.13763 +</p> +<p> +本文提出了一种原型引导知识蒸馏(PGKD)方法,它不需要图形边缘,但可以在不考虑边缘的情况下学习结构感知的MLP。 +</p> +<p> + +</p> +<p> +将高精度的图神经网络(GNN)在图任务中压缩成低延迟的多层感知器(MLP)已成为热门研究课题。以前的方法会将图的边缘处理成额外的输入给MLP,但这样的图结构对于各种场景可能无法获得。因此,我们提出了一种原型引导知识蒸馏(PGKD)方法,它不需要图形边缘,但可以在不考虑边缘的情况下学习结构感知的MLP。具体而言,我们分析了GNN教师中的图形结构信息,并通过原型在无边缘设置中从GNN到MLP进行了知识蒸馏。在流行的图形基准实验中的实验结果表明了所提出的PGKD方法的有效性和鲁棒性。 +</p> +<p> +Distilling high-accuracy Graph Neural Networks~(GNNs) to low-latency multilayer perceptrons~(MLPs) on graph tasks has become a hot research topic. However, MLPs rely exclusively on the node features and fail to capture the graph structural information. Previous methods address this issue by processing graph edges into extra inputs for MLPs, but such graph structures may be unavailable for various scenarios. To this end, we propose a Prototype-Guided Knowledge Distillation~(PGKD) method, which does not require graph edges~(edge-free) yet learns structure-aware MLPs. Specifically, we analyze the graph structural information in GNN teachers, and distill such information from GNNs to MLPs via prototypes in an edge-free setting. Experimental results on popular graph benchmarks demonstrate the effectiveness and robustness of the proposed PGKD. +</p>本文提出了一种新的对抗性攻击,该攻击是广义了DeepFool攻击,既有效又计算效率高,适用于评估大型深度神经网络的鲁棒性。http://arxiv.org/abs/2303.12481<p> +重新审视DeepFool:泛化和改进 +</p> +<p> +Revisiting DeepFool: generalization and improvement. 
(arXiv:2303.12481v1 [cs.LG]) +</p> +<p> +http://arxiv.org/abs/2303.12481 +</p> +<p> +本文提出了一种新的对抗性攻击,该攻击是广义了DeepFool攻击,既有效又计算效率高,适用于评估大型深度神经网络的鲁棒性。 +</p> +<p> + +</p> +<p> +深度神经网络被已知容易受到对抗样本的攻击,这些输入稍加修改便会导致网络做出错误的预测。这导致了大量研究,以评估这些网络对此类扰动的鲁棒性度量。最小l2对抗扰动的鲁棒性,是一种特别重要的鲁棒性度量。然而,现有的用于评估此类鲁棒性度量的方法,要么计算成本高,要么不太准确。在本文中,我们引入了一种新的对抗性攻击方法,它在效果和计算效率之间保持平衡。我们提出的攻击是广义了深度欺骗(DeepFool)攻击,但它们仍然易于理解和实现。我们展示了我们的攻击在效果和计算效率方面均优于现有方法。我们提出的攻击也适用于评估大型深度神经网络的鲁棒性。 +</p> +<p> +Deep neural networks have been known to be vulnerable to adversarial examples, which are inputs that are modified slightly to fool the network into making incorrect predictions. This has led to a significant amount of research on evaluating the robustness of these networks against such perturbations. One particularly important robustness metric is the robustness to minimal l2 adversarial perturbations. However, existing methods for evaluating this robustness metric are either computationally expensive or not very accurate. In this paper, we introduce a new family of adversarial attacks that strike a balance between effectiveness and computational efficiency. Our proposed attacks are generalizations of the well-known DeepFool (DF) attack, while they remain simple to understand and implement. We demonstrate that our attacks outperform existing methods in terms of both effectiveness and computational efficiency. Our proposed attacks are also suitable for evaluating the robustness of large +</p>本文提出一种利用语言控制扩散模型的分层规划器,有效而高效地扩展扩散模型,解决长时间跨度自然语言指令下的控制问题,实现了较高的单任务和多任务成功率,并极大地提高计算效率。http://arxiv.org/abs/2210.15629<p> +语言控制扩散:通过空间、时间和任务高效扩展 +</p> +<p> +Language Control Diffusion: Efficiently Scaling through Space, Time, and Tasks. 
(arXiv:2210.15629v2 [cs.LG] UPDATED) +</p> +<p> +http://arxiv.org/abs/2210.15629 +</p> +<p> +本文提出一种利用语言控制扩散模型的分层规划器,有效而高效地扩展扩散模型,解决长时间跨度自然语言指令下的控制问题,实现了较高的单任务和多任务成功率,并极大地提高计算效率。 +</p> +<p> + +</p> +<p> +训练通用型智能体在各个方面都很困难,需要处理高维输入(空间)、长时间跨度(时间)和多个新任务。最近的结构方面的进展使得我们可以沿着其中一个或两个维度提高扩展性能力,但计算成本仍然很高。本文提出使用语言控制扩散模型作为一种基于自然语言条件的分层规划器(LCD)来应对这三个方面。我们有效而高效地扩展扩散模型,以应对时间、状态和任务空间维度的长时间跨度控制问题。我们在CALVIN语言机器人基准测试中将LCD与其他最先进的模型进行比较,发现LCD在多任务成功率方面优于其他最先进的方法,而单任务成功率(SR)为88.7%,远高于以前的最佳成绩82.6%,大大提高了计算效率。 +</p> +<p> +Training generalist agents is difficult across several axes, requiring us to deal with high-dimensional inputs (space), long horizons (time), and multiple and new tasks. Recent advances with architectures have allowed for improved scaling along one or two of these dimensions, but are still prohibitive computationally. In this paper, we propose to address all three axes by leveraging Language to Control Diffusion models as a hierarchical planner conditioned on language (LCD). We effectively and efficiently scale diffusion models for planning in extended temporal, state, and task dimensions to tackle long horizon control problems conditioned on natural language instructions. We compare LCD with other state-of-the-art models on the CALVIN language robotics benchmark and find that LCD outperforms other SOTA methods in multi task success rates while dramatically improving computational efficiency with a single task success rate (SR) of 88.7% against the previous best of 82.6%. 
We show that </p> \ No newline at end of file diff --git a/econ.md b/econ.md index ae96e25c4..883bbac4e 100644 --- a/econ.md +++ b/econ.md @@ -2,52 +2,112 @@ | Ref | Title | Summary | | --- | --- | --- | -| [^1] | [Quantile Granger Causality in the Presence of Instability](https://arxiv.org/abs/2402.09744) | 我们提出了一种在不稳定环境中评估分位数格兰杰因果关系的新框架,该框架具有一致性、非平凡的功效和某些重要特殊情况下的中心性。蒙特卡洛模拟显示,所提出的检验统计量具有正确的经验大小和较高的功效,即使在没有结构性变化的情况下也是如此。两个实证应用进一步证明了我们方法的适用性。 | -| [^2] | [Flexible Covariate Adjustments in Regression Discontinuity Designs.](http://arxiv.org/abs/2107.07942) | 本研究提出了一种在回归不连续设计中更有效地利用协变量信息的估计器类,可以容纳大量的离散或连续协变量,并经由机器学习、非参数回归或经典参数方法来估计。通过对结果变量适当修改,这种估计器易于实现,可以选择类似于传统RD分析的调整参数。 | -| [^3] | [Filtered and Unfiltered Treatment Effects with Targeting Instruments.](http://arxiv.org/abs/2007.10432) | 本文研究如何使用有目标工具来控制多值处理中的选择偏差,并建立了组合编译器群体的条件来确定反事实平均值和处理效果。 | +| [^1] | [Unified Merger List in the Container Shipping Industry from 1966: A Structural Estimation of the Transition of Importance of a Firm's Age, Tonnage Capacity, and Geographical Proximity on Merger Decision.](http://arxiv.org/abs/2310.09938) | 本研究构建了集装箱航运行业1966年至2022年的统一合并清单,并通过结构匹配模型研究了公司的年龄、规模和地理邻近性对合并决策的重要性的转变。研究发现,公司规模在1991年至2005年期间作为正向因素对合并激励更加重要,但在2006年至2022年期间作为负向因素起到抑制作用。同时,公司之间的地理距离对合并决策也产生影响。 | +| [^2] | [Estimating Effects of Long-Term Treatments.](http://arxiv.org/abs/2308.08152) | 本论文介绍了一个纵向替代框架,用于准确估计长期治疗的效果。论文通过分解长期治疗效果为一系列函数,考虑用户属性、短期中间指标和治疗分配等因素。 | +| [^3] | [One-step nonparametric instrumental regression using smoothing splines.](http://arxiv.org/abs/2307.14867) | 这个论文提出了一种一步非参数的方法,使用平滑样条来处理内生性和仪器变量,同时解决了单调性限制的问题,并在估计Engel曲线时表现出良好性能。 | +| [^4] | [Statistical Tests for Replacing Human Decision Makers with Algorithms.](http://arxiv.org/abs/2306.11689) | 本文提出了一种利用人工智能改善人类决策的统计框架,通过基准测试与机器预测,替换部分人类决策者的决策制定,并经过实验检验得出算法具有更高的真阳性率和更低的假阳性率,尤其是来自农村地区的医生的诊断更容易被替代。 | +| [^5] | [Characterizing the Feasible Payoff Set of OLG Repeated Games.](http://arxiv.org/abs/2303.12988) | 
本研究完整描述了OLG重复博弈的收益集合,发现随着玩家贴现因子接近1,可行收益集合变得更小。 | +| [^6] | [Efficient Public Good Provision in a Multipolar World.](http://arxiv.org/abs/2303.10514) | 本研究建立了一个具有组、位置不确定性和观察学习的公共物品博弈模型,展示了在固定的时间视野内,自私的玩家之间可以实现全面合作,并发现了各组内的同时高效提供是一个均衡策略。 | +| [^7] | [Policy Choice in Time Series by Empirical Welfare Maximization.](http://arxiv.org/abs/2205.03970) | 本文提出了一种新方法以在一定时间段内进行政策选择,通过经验福利最大化方法来估计这一政策规则,旨在达到更好的条件福利。我们表征了此方法可达到最优策略选择的条件,并用模拟研究和实证应用来证明其可行性,在宏观经济领域应用上也能够起到积极的作用。 | # 详细 -[^1]: 在不稳定环境中评估分位数格兰杰因果关系的新框架 +[^1]: 《1966年以来集装箱航运行业的统一合并清单:一家公司的年龄、吨位容量和地理邻近性对合并决策的重要性的结构估计》的翻译题目 - Quantile Granger Causality in the Presence of Instability + Unified Merger List in the Container Shipping Industry from 1966: A Structural Estimation of the Transition of Importance of a Firm's Age, Tonnage Capacity, and Geographical Proximity on Merger Decision. (arXiv:2310.09938v1 [econ.GN]) - [https://arxiv.org/abs/2402.09744](https://arxiv.org/abs/2402.09744) + [http://arxiv.org/abs/2310.09938](http://arxiv.org/abs/2310.09938) - 我们提出了一种在不稳定环境中评估分位数格兰杰因果关系的新框架,该框架具有一致性、非平凡的功效和某些重要特殊情况下的中心性。蒙特卡洛模拟显示,所提出的检验统计量具有正确的经验大小和较高的功效,即使在没有结构性变化的情况下也是如此。两个实证应用进一步证明了我们方法的适用性。 + 本研究构建了集装箱航运行业1966年至2022年的统一合并清单,并通过结构匹配模型研究了公司的年龄、规模和地理邻近性对合并决策的重要性的转变。研究发现,公司规模在1991年至2005年期间作为正向因素对合并激励更加重要,但在2006年至2022年期间作为负向因素起到抑制作用。同时,公司之间的地理距离对合并决策也产生影响。 - 我们提出了一种在不稳定环境中评估分位数格兰杰因果关系的新框架,该框架可以针对固定分位数或连续分位数水平进行评估。我们提出的检验统计量在固定备择假设下是一致的,对于局部备择假设具有非平凡的功效,并且在某些重要特殊情况下是中心的。此外,我们还展示了当渐近分布依赖于干扰参数时一种自举过程的有效性。蒙特卡洛模拟显示,所提出的检验统计量具有正确的经验大小和较高的功效,即使在没有结构性变化的情况下也是如此。最后,两个能源经济学和宏观经济学的实证应用突出了我们方法的适用性,因为新的检验提供了更强的格兰杰因果关系证据。 + 我们构建了一个新颖的全球集装箱航运行业1966年至2022年之间的统一合并清单。将该清单与专有数据结合,我们构建了一个结构匹配模型,描述了公司的年龄、规模和地理邻近性在合并决策中的历史性转变。我们发现,在1991年至2005年期间,作为正面因素,一家公司的规模在合并激励中比公司的年龄重要9.974倍。然而,在2006年至2022年期间,作为负面因素,一家公司的规模在合并激励中比公司的年龄重要0.026-0.630倍,即公司的规模起到了抑制作用。我们还发现,买方公司和卖方公司之间的距离在整个期间都起到了抑制作用,但在近年来的经济重要性已减弱到微不足道的程度。在反事实模拟中,我们观察到同一国家的公司之间合并的禁止会影响合并配置。 - arXiv:2402.09744v1 Announce Type: new Abstract: We propose a new framework for assessing Granger causality 
in quantiles in unstable environments, for a fixed quantile or over a continuum of quantile levels. Our proposed test statistics are consistent against fixed alternatives, they have nontrivial power against local alternatives, and they are pivotal in certain important special cases. In addition, we show the validity of a bootstrap procedure when asymptotic distributions depend on nuisance parameters. Monte Carlo simulations reveal that the proposed test statistics have correct empirical size and high power, even in absence of structural breaks. Finally, two empirical applications in energy economics and macroeconomics highlight the applicability of our method as the new tests provide stronger evidence of Granger causality. + We construct a novel unified merger list in the global container shipping industry between 1966 (the beginning of the industry) and 2022. Combining the list with proprietary data, we construct a structural matching model to describe the historical transition of the importance of a firm's age, size, and geographical proximity on merger decisions. We find that, as a positive factor, a firm's size is more important than a firm's age by 9.974 times as a merger incentive between 1991 and 2005. However, between 2006 and 2022, as a negative factor, a firm's size is more important than a firm's age by 0.026-0.630 times, that is, a firm's size works as a disincentive. We also find that the distance between buyer and seller firms works as a disincentive for the whole period, but the importance has dwindled to economic insignificance in recent years. In counterfactual simulations, we observe that the prohibition of mergers between firms in the same country would affect the merger configuration of -[^2]: 回归不连续设计中的灵活协变量调整 +[^2]: 估计长期治疗效果 - Flexible Covariate Adjustments in Regression Discontinuity Designs. (arXiv:2107.07942v2 [econ.EM] UPDATED) + Estimating Effects of Long-Term Treatments. 
(arXiv:2308.08152v1 [econ.EM]) - [http://arxiv.org/abs/2107.07942](http://arxiv.org/abs/2107.07942) + [http://arxiv.org/abs/2308.08152](http://arxiv.org/abs/2308.08152) - 本研究提出了一种在回归不连续设计中更有效地利用协变量信息的估计器类,可以容纳大量的离散或连续协变量,并经由机器学习、非参数回归或经典参数方法来估计。通过对结果变量适当修改,这种估计器易于实现,可以选择类似于传统RD分析的调整参数。 + 本论文介绍了一个纵向替代框架,用于准确估计长期治疗的效果。论文通过分解长期治疗效果为一系列函数,考虑用户属性、短期中间指标和治疗分配等因素。 - 实证回归不连续(RD)研究通常使用协变量来增加其估计结果的精度。本文提出了一种新颖的估计器类,比目前实践中广泛使用的线性调整估计器更有效地利用这些协变量信息。我们的方法可以容纳可能大量的离散或连续协变量。它涉及使用适当修改了的结果变量运行标准RD分析,该变量的形式为原始结果与协变量函数的差异。我们表征了导致渐近方差最小的估计器的函数,并展示了如何通过现代机器学习、非参数回归或经典参数方法来估计它。由此产生的估计器易于实现,因为可以选择类似于传统RD分析的调整参数。广泛的模拟研究说明了我们的方法在有限样本中的性能,另外,一个案例研究突出了它的实证相关性。 + 在A/B测试中,估计长期治疗的效果是一个巨大的挑战。这种治疗措施包括产品功能的更新、用户界面设计和推荐算法等,旨在在其发布后长期存在系统中。然而,由于长期试验的限制,从业者通常依赖短期实验结果来做产品发布决策。如何使用短期实验数据准确估计长期治疗效果仍然是一个未解决的问题。为了解决这个问题,我们引入了一个纵向替代框架。我们展示了,在标准假设下,长期治疗效果可以分解为一系列函数,这些函数依赖于用户属性、短期中间指标和治疗分配。我们描述了识别假设、估计策略和推理技术。 - Empirical regression discontinuity (RD) studies often use covariates to increase the precision of their estimates. In this paper, we propose a novel class of estimators that use such covariate information more efficiently than the linear adjustment estimators that are currently used widely in practice. Our approach can accommodate a possibly large number of either discrete or continuous covariates. It involves running a standard RD analysis with an appropriately modified outcome variable, which takes the form of the difference between the original outcome and a function of the covariates. We characterize the function that leads to the estimator with the smallest asymptotic variance, and show how it can be estimated via modern machine learning, nonparametric regression, or classical parametric methods. The resulting estimator is easy to implement, as tuning parameters can be chosen as in a conventional RD analysis. An extensive simulation study illustrates the performance of our approac + Estimating the effects of long-term treatments in A/B testing presents a significant challenge. 
Such treatments -- including updates to product functions, user interface designs, and recommendation algorithms -- are intended to remain in the system for a long period after their launches. On the other hand, given the constraints of conducting long-term experiments, practitioners often rely on short-term experimental results to make product launch decisions. It remains an open question how to accurately estimate the effects of long-term treatments using short-term experimental data. To address this question, we introduce a longitudinal surrogate framework. We show that, under standard assumptions, the effects of long-term treatments can be decomposed into a series of functions, which depend on the user attributes, the short-term intermediate metrics, and the treatment assignments. We describe the identification assumptions, the estimation strategies, and the inference technique under thi -[^3]: 有目标工具的过滤与未过滤处理效果 +[^3]: 一步非参数仪器回归使用平滑样条 - Filtered and Unfiltered Treatment Effects with Targeting Instruments. (arXiv:2007.10432v3 [econ.EM] UPDATED) + One-step nonparametric instrumental regression using smoothing splines. (arXiv:2307.14867v1 [econ.EM]) - [http://arxiv.org/abs/2007.10432](http://arxiv.org/abs/2007.10432) + [http://arxiv.org/abs/2307.14867](http://arxiv.org/abs/2307.14867) - 本文研究如何使用有目标工具来控制多值处理中的选择偏差,并建立了组合编译器群体的条件来确定反事实平均值和处理效果。 + 这个论文提出了一种一步非参数的方法,使用平滑样条来处理内生性和仪器变量,同时解决了单调性限制的问题,并在估计Engel曲线时表现出良好性能。 - 在应用中,多值处理是很常见的。我们探讨了在这种情况下使用离散工具来控制选择偏差的方法。我们强调了有关定位(工具定位于哪些处理)和过滤(限制分析师对给定观测的处理分配的知识)的假设作用。这允许我们建立条件,使得针对组合编译器群体,可以确定反事实平均值和处理效果。我们通过将其应用于Head Start Impact Study和Student Achievement and Retention Project的数据来说明我们框架的实用性。 + 我们将非参数回归平滑样条扩展到一种情境,即存在内生性和使用仪器变量。与流行的现有估计方法不同,结果估计器是一步的,并依赖于唯一的正则化参数。我们导出了估计器及其一阶导数的均匀收敛速率。我们还解决了在估计中施加单调性的问题。模拟结果证实了我们的估计器与两步程序相比的良好性能。当用于估计Engel曲线时,我们的方法产生了经济上有意义的结果。 - Multivalued treatments are commonplace in applications. We explore the use of discrete-valued instruments to control for selection bias in this setting. 
Our discussion stresses the role of assumptions on targeting (which instruments target which treatments) and filtering (limits on the analyst's knowledge of the treatment assigned to a given observation). It allows us to establish conditions under which counterfactual averages and treatment effects are identified for composite complier groups. We illustrate the usefulness of our framework by applying it to data from the Head Start Impact Study and the Student Achievement and Retention Project. + We extend nonparametric regression smoothing splines to a context where there is endogeneity and instrumental variables are available. Unlike popular existing estimators, the resulting estimator is one-step and relies on a unique regularization parameter. We derive uniform rates of the convergence for the estimator and its first derivative. We also address the issue of imposing monotonicity in estimation. Simulations confirm the good performances of our estimator compared to two-step procedures. Our method yields economically sensible results when used to estimate Engel curves. + +[^4]: 统计测试替代人类决策者的算法 + + Statistical Tests for Replacing Human Decision Makers with Algorithms. (arXiv:2306.11689v1 [econ.EM]) + + [http://arxiv.org/abs/2306.11689](http://arxiv.org/abs/2306.11689) + + 本文提出了一种利用人工智能改善人类决策的统计框架,通过基准测试与机器预测,替换部分人类决策者的决策制定,并经过实验检验得出算法具有更高的真阳性率和更低的假阳性率,尤其是来自农村地区的医生的诊断更容易被替代。 + + + + 本文提出了一个统计框架,可以通过人工智能来改善人类的决策。首先将每个人类决策者的表现与机器预测进行基准测试;然后用所提出的人工智能算法的建议替换决策制定者的一个子集所做出的决策。利用全国大型孕产结果和繁殖年龄夫妇孕前检查的医生诊断数据集,我们试验了一种启发式高频率方法以及一种贝叶斯后验损失函数方法,并将其应用于异常出生检测。我们发现,我们的算法在一个测试数据集上的结果比仅由医生诊断的结果具有更高的总体真阳性率和更低的假阳性率。我们还发现,来自农村地区的医生的诊断更容易被替代,这表明人工智能辅助决策制定更容易提高精确度。 + + This paper proposes a statistical framework with which artificial intelligence can improve human decision making. 
The performance of each human decision maker is first benchmarked against machine predictions; we then replace the decisions made by a subset of the decision makers with the recommendation from the proposed artificial intelligence algorithm. Using a large nationwide dataset of pregnancy outcomes and doctor diagnoses from prepregnancy checkups of reproductive age couples, we experimented with both a heuristic frequentist approach and a Bayesian posterior loss function approach with an application to abnormal birth detection. We find that our algorithm on a test dataset results in a higher overall true positive rate and a lower false positive rate than the diagnoses made by doctors only. We also find that the diagnoses of doctors from rural areas are more frequently replaceable, suggesting that artificial intelligence assisted decision making tends to improve precision more i + +[^5]: 描述OLG重复博弈可行收益集合的特征 + + Characterizing the Feasible Payoff Set of OLG Repeated Games. (arXiv:2303.12988v1 [econ.TH]) + + [http://arxiv.org/abs/2303.12988](http://arxiv.org/abs/2303.12988) + + 本研究完整描述了OLG重复博弈的收益集合,发现随着玩家贴现因子接近1,可行收益集合变得更小。 + + + + 本文研究了OLG重复博弈的可行收益集合,并给出了完整的特征描述。本文还首次提供了关于由玩家贴现因子和交互长度引起的可行收益集合的新比较静态。而令人惊讶的是,随着玩家贴现因子接近1,可行收益集合变得更小了。 + + We study the set of feasible payoffs of OLG repeated games. We first provide a complete characterization of the feasible payoffs. Second, we provide a novel comparative statics of the feasible payoff set with respect to players' discount factor and the length of interaction. Perhaps surprisingly, the feasible payoff set becomes smaller as the players' discount factor approaches to one. + +[^6]: 在多极世界中高效提供公共物品 + + Efficient Public Good Provision in a Multipolar World. 
(arXiv:2303.10514v1 [econ.TH]) + + [http://arxiv.org/abs/2303.10514](http://arxiv.org/abs/2303.10514) + + 本研究建立了一个具有组、位置不确定性和观察学习的公共物品博弈模型,展示了在固定的时间视野内,自私的玩家之间可以实现全面合作,并发现了各组内的同时高效提供是一个均衡策略。 + + + + 我们建立了一个具有组、位置不确定性和观察学习的公共物品博弈模型。各组内的贡献是同时进行的,而各组的游戏则是基于过去贡献不完整样本的观察依次进行的。我们展示了在固定的时间视野内,即使是自私的玩家之间也可以实现全面合作。位置不确定性意味着存在一个均衡,即玩家组有条件地合作以期影响后续组玩家的行动。条件合作意味着每个组成员都是关键的,因此各组内的同时高效提供是一个均衡策略。 + + We model a public goods game with groups, position uncertainty, and observational learning. Contributions are simultaneous within groups, but groups play sequentially based on their observation of an incomplete sample of past contributions. We show that full cooperation between and within groups is possible with self-interested players on a fixed horizon. Position uncertainty implies the existence of an equilibrium where groups of players conditionally cooperate in the hope of influencing further groups. Conditional cooperation implies that each group member is pivotal, so that efficient simultaneous provision within groups is an equilibrium. + +[^7]: 基于经验福利最大化的时间序列政策选择方法研究 + + Policy Choice in Time Series by Empirical Welfare Maximization. (arXiv:2205.03970v3 [econ.EM] UPDATED) + + [http://arxiv.org/abs/2205.03970](http://arxiv.org/abs/2205.03970) + + 本文提出了一种新方法以在一定时间段内进行政策选择,通过经验福利最大化方法来估计这一政策规则,旨在达到更好的条件福利。我们表征了此方法可达到最优策略选择的条件,并用模拟研究和实证应用来证明其可行性,在宏观经济领域应用上也能够起到积极的作用。 + + + + 本文提出了一种在多元时间序列条件下进行政策选择的新方法。建立在统计治疗选择框架的基础上,我们提出了时间序列经验福利最大化(T-EWM)方法,通过最大化使用非参数潜在结果时间序列构造的经验福利准则来估计当前或多个时期的最优政策规则。我们表征了T-EWM在什么条件下能够一致地学习到在时间序列历史给定条件福利下最优的政策选择。然后推导出了条件福利遗憾及其极小下界的非渐进上限。为了说明T-EWM的实现和应用,我们进行了模拟研究,并将该方法应用于从宏观经济时间序列数据中估计最优货币政策规则。 + + This paper develops a novel method for policy choice in a dynamic setting where the available data is a multivariate time series. 
Building on the statistical treatment choice framework, we propose Time-series Empirical Welfare Maximization (T-EWM) methods to estimate an optimal policy rule for the current period or over multiple periods by maximizing an empirical welfare criterion constructed using nonparametric potential outcome time-series. We characterize conditions under which T-EWM consistently learns a policy choice that is optimal in terms of conditional welfare given the time-series history. We then derive a nonasymptotic upper bound for conditional welfare regret and its minimax lower bound. To illustrate the implementation and uses of T-EWM, we perform simulation studies and apply the method to estimate optimal monetary policy rules from macroeconomic time-series data. diff --git a/econ.xml b/econ.xml index b5f3dd6d5..ce5a7f540 100644 --- a/econ.xml +++ b/econ.xml @@ -1,61 +1,141 @@ -Chat Arxiv econhttps://github.com/qhduan/cn-chat-arxivThis is arxiv RSS feed for econ我们提出了一种在不稳定环境中评估分位数格兰杰因果关系的新框架,该框架具有一致性、非平凡的功效和某些重要特殊情况下的中心性。蒙特卡洛模拟显示,所提出的检验统计量具有正确的经验大小和较高的功效,即使在没有结构性变化的情况下也是如此。两个实证应用进一步证明了我们方法的适用性。https://arxiv.org/abs/2402.09744<p> -在不稳定环境中评估分位数格兰杰因果关系的新框架 +Chat Arxiv econhttps://github.com/qhduan/cn-chat-arxivThis is arxiv RSS feed for econ本研究构建了集装箱航运行业1966年至2022年的统一合并清单,并通过结构匹配模型研究了公司的年龄、规模和地理邻近性对合并决策的重要性的转变。研究发现,公司规模在1991年至2005年期间作为正向因素对合并激励更加重要,但在2006年至2022年期间作为负向因素起到抑制作用。同时,公司之间的地理距离对合并决策也产生影响。http://arxiv.org/abs/2310.09938<p> +《1966年以来集装箱航运行业的统一合并清单:一家公司的年龄、吨位容量和地理邻近性对合并决策的重要性的结构估计》的翻译题目 </p> <p> -Quantile Granger Causality in the Presence of Instability +Unified Merger List in the Container Shipping Industry from 1966: A Structural Estimation of the Transition of Importance of a Firm's Age, Tonnage Capacity, and Geographical Proximity on Merger Decision. 
(arXiv:2310.09938v1 [econ.GN]) </p> <p> -https://arxiv.org/abs/2402.09744 +http://arxiv.org/abs/2310.09938 </p> <p> -我们提出了一种在不稳定环境中评估分位数格兰杰因果关系的新框架,该框架具有一致性、非平凡的功效和某些重要特殊情况下的中心性。蒙特卡洛模拟显示,所提出的检验统计量具有正确的经验大小和较高的功效,即使在没有结构性变化的情况下也是如此。两个实证应用进一步证明了我们方法的适用性。 +本研究构建了集装箱航运行业1966年至2022年的统一合并清单,并通过结构匹配模型研究了公司的年龄、规模和地理邻近性对合并决策的重要性的转变。研究发现,公司规模在1991年至2005年期间作为正向因素对合并激励更加重要,但在2006年至2022年期间作为负向因素起到抑制作用。同时,公司之间的地理距离对合并决策也产生影响。 </p> <p> </p> <p> -我们提出了一种在不稳定环境中评估分位数格兰杰因果关系的新框架,该框架可以针对固定分位数或连续分位数水平进行评估。我们提出的检验统计量在固定备择假设下是一致的,对于局部备择假设具有非平凡的功效,并且在某些重要特殊情况下是中心的。此外,我们还展示了当渐近分布依赖于干扰参数时一种自举过程的有效性。蒙特卡洛模拟显示,所提出的检验统计量具有正确的经验大小和较高的功效,即使在没有结构性变化的情况下也是如此。最后,两个能源经济学和宏观经济学的实证应用突出了我们方法的适用性,因为新的检验提供了更强的格兰杰因果关系证据。 +我们构建了一个新颖的全球集装箱航运行业1966年至2022年之间的统一合并清单。将该清单与专有数据结合,我们构建了一个结构匹配模型,描述了公司的年龄、规模和地理邻近性在合并决策中的历史性转变。我们发现,在1991年至2005年期间,作为正面因素,一家公司的规模在合并激励中比公司的年龄重要9.974倍。然而,在2006年至2022年期间,作为负面因素,一家公司的规模在合并激励中比公司的年龄重要0.026-0.630倍,即公司的规模起到了抑制作用。我们还发现,买方公司和卖方公司之间的距离在整个期间都起到了抑制作用,但在近年来的经济重要性已减弱到微不足道的程度。在反事实模拟中,我们观察到同一国家的公司之间合并的禁止会影响合并配置。 </p> <p> -arXiv:2402.09744v1 Announce Type: new Abstract: We propose a new framework for assessing Granger causality in quantiles in unstable environments, for a fixed quantile or over a continuum of quantile levels. Our proposed test statistics are consistent against fixed alternatives, they have nontrivial power against local alternatives, and they are pivotal in certain important special cases. In addition, we show the validity of a bootstrap procedure when asymptotic distributions depend on nuisance parameters. Monte Carlo simulations reveal that the proposed test statistics have correct empirical size and high power, even in absence of structural breaks. Finally, two empirical applications in energy economics and macroeconomics highlight the applicability of our method as the new tests provide stronger evidence of Granger causality. 
-</p>本研究提出了一种在回归不连续设计中更有效地利用协变量信息的估计器类,可以容纳大量的离散或连续协变量,并经由机器学习、非参数回归或经典参数方法来估计。通过对结果变量适当修改,这种估计器易于实现,可以选择类似于传统RD分析的调整参数。http://arxiv.org/abs/2107.07942<p> -回归不连续设计中的灵活协变量调整 +We construct a novel unified merger list in the global container shipping industry between 1966 (the beginning of the industry) and 2022. Combining the list with proprietary data, we construct a structural matching model to describe the historical transition of the importance of a firm's age, size, and geographical proximity on merger decisions. We find that, as a positive factor, a firm's size is more important than a firm's age by 9.974 times as a merger incentive between 1991 and 2005. However, between 2006 and 2022, as a negative factor, a firm's size is more important than a firm's age by 0.026-0.630 times, that is, a firm's size works as a disincentive. We also find that the distance between buyer and seller firms works as a disincentive for the whole period, but the importance has dwindled to economic insignificance in recent years. In counterfactual simulations, we observe that the prohibition of mergers between firms in the same country would affect the merger configuration of +</p>本论文介绍了一个纵向替代框架,用于准确估计长期治疗的效果。论文通过分解长期治疗效果为一系列函数,考虑用户属性、短期中间指标和治疗分配等因素。http://arxiv.org/abs/2308.08152<p> +估计长期治疗效果 </p> <p> -Flexible Covariate Adjustments in Regression Discontinuity Designs. (arXiv:2107.07942v2 [econ.EM] UPDATED) +Estimating Effects of Long-Term Treatments. 
(arXiv:2308.08152v1 [econ.EM]) </p> <p> -http://arxiv.org/abs/2107.07942 +http://arxiv.org/abs/2308.08152 </p> <p> -本研究提出了一种在回归不连续设计中更有效地利用协变量信息的估计器类,可以容纳大量的离散或连续协变量,并经由机器学习、非参数回归或经典参数方法来估计。通过对结果变量适当修改,这种估计器易于实现,可以选择类似于传统RD分析的调整参数。 +本论文介绍了一个纵向替代框架,用于准确估计长期治疗的效果。论文通过分解长期治疗效果为一系列函数,考虑用户属性、短期中间指标和治疗分配等因素。 </p> <p> </p> <p> -实证回归不连续(RD)研究通常使用协变量来增加其估计结果的精度。本文提出了一种新颖的估计器类,比目前实践中广泛使用的线性调整估计器更有效地利用这些协变量信息。我们的方法可以容纳可能大量的离散或连续协变量。它涉及使用适当修改了的结果变量运行标准RD分析,该变量的形式为原始结果与协变量函数的差异。我们表征了导致渐近方差最小的估计器的函数,并展示了如何通过现代机器学习、非参数回归或经典参数方法来估计它。由此产生的估计器易于实现,因为可以选择类似于传统RD分析的调整参数。广泛的模拟研究说明了我们的方法在有限样本中的性能,另外,一个案例研究突出了它的实证相关性。 +在A/B测试中,估计长期治疗的效果是一个巨大的挑战。这种治疗措施包括产品功能的更新、用户界面设计和推荐算法等,旨在在其发布后长期存在系统中。然而,由于长期试验的限制,从业者通常依赖短期实验结果来做产品发布决策。如何使用短期实验数据准确估计长期治疗效果仍然是一个未解决的问题。为了解决这个问题,我们引入了一个纵向替代框架。我们展示了,在标准假设下,长期治疗效果可以分解为一系列函数,这些函数依赖于用户属性、短期中间指标和治疗分配。我们描述了识别假设、估计策略和推理技术。 </p> <p> -Empirical regression discontinuity (RD) studies often use covariates to increase the precision of their estimates. In this paper, we propose a novel class of estimators that use such covariate information more efficiently than the linear adjustment estimators that are currently used widely in practice. Our approach can accommodate a possibly large number of either discrete or continuous covariates. It involves running a standard RD analysis with an appropriately modified outcome variable, which takes the form of the difference between the original outcome and a function of the covariates. We characterize the function that leads to the estimator with the smallest asymptotic variance, and show how it can be estimated via modern machine learning, nonparametric regression, or classical parametric methods. The resulting estimator is easy to implement, as tuning parameters can be chosen as in a conventional RD analysis. 
An extensive simulation study illustrates the performance of our approac -</p>本文研究如何使用有目标工具来控制多值处理中的选择偏差,并建立了组合编译器群体的条件来确定反事实平均值和处理效果。http://arxiv.org/abs/2007.10432<p> -有目标工具的过滤与未过滤处理效果 +Estimating the effects of long-term treatments in A/B testing presents a significant challenge. Such treatments -- including updates to product functions, user interface designs, and recommendation algorithms -- are intended to remain in the system for a long period after their launches. On the other hand, given the constraints of conducting long-term experiments, practitioners often rely on short-term experimental results to make product launch decisions. It remains an open question how to accurately estimate the effects of long-term treatments using short-term experimental data. To address this question, we introduce a longitudinal surrogate framework. We show that, under standard assumptions, the effects of long-term treatments can be decomposed into a series of functions, which depend on the user attributes, the short-term intermediate metrics, and the treatment assignments. We describe the identification assumptions, the estimation strategies, and the inference technique under thi +</p>这个论文提出了一种一步非参数的方法,使用平滑样条来处理内生性和仪器变量,同时解决了单调性限制的问题,并在估计Engel曲线时表现出良好性能。http://arxiv.org/abs/2307.14867<p> +一步非参数仪器回归使用平滑样条 </p> <p> -Filtered and Unfiltered Treatment Effects with Targeting Instruments. (arXiv:2007.10432v3 [econ.EM] UPDATED) +One-step nonparametric instrumental regression using smoothing splines. 
(arXiv:2307.14867v1 [econ.EM]) </p> <p> -http://arxiv.org/abs/2007.10432 +http://arxiv.org/abs/2307.14867 </p> <p> -本文研究如何使用有目标工具来控制多值处理中的选择偏差,并建立了组合编译器群体的条件来确定反事实平均值和处理效果。 +这个论文提出了一种一步非参数的方法,使用平滑样条来处理内生性和仪器变量,同时解决了单调性限制的问题,并在估计Engel曲线时表现出良好性能。 </p> <p> </p> <p> -在应用中,多值处理是很常见的。我们探讨了在这种情况下使用离散工具来控制选择偏差的方法。我们强调了有关定位(工具定位于哪些处理)和过滤(限制分析师对给定观测的处理分配的知识)的假设作用。这允许我们建立条件,使得针对组合编译器群体,可以确定反事实平均值和处理效果。我们通过将其应用于Head Start Impact Study和Student Achievement and Retention Project的数据来说明我们框架的实用性。 +我们将非参数回归平滑样条扩展到一种情境,即存在内生性和使用仪器变量。与流行的现有估计方法不同,结果估计器是一步的,并依赖于唯一的正则化参数。我们导出了估计器及其一阶导数的均匀收敛速率。我们还解决了在估计中施加单调性的问题。模拟结果证实了我们的估计器与两步程序相比的良好性能。当用于估计Engel曲线时,我们的方法产生了经济上有意义的结果。 </p> <p> -Multivalued treatments are commonplace in applications. We explore the use of discrete-valued instruments to control for selection bias in this setting. Our discussion stresses the role of assumptions on targeting (which instruments target which treatments) and filtering (limits on the analyst's knowledge of the treatment assigned to a given observation). It allows us to establish conditions under which counterfactual averages and treatment effects are identified for composite complier groups. We illustrate the usefulness of our framework by applying it to data from the Head Start Impact Study and the Student Achievement and Retention Project. +We extend nonparametric regression smoothing splines to a context where there is endogeneity and instrumental variables are available. Unlike popular existing estimators, the resulting estimator is one-step and relies on a unique regularization parameter. We derive uniform rates of the convergence for the estimator and its first derivative. We also address the issue of imposing monotonicity in estimation. Simulations confirm the good performances of our estimator compared to two-step procedures. Our method yields economically sensible results when used to estimate Engel curves. 
+</p>本文提出了一种利用人工智能改善人类决策的统计框架,通过基准测试与机器预测,替换部分人类决策者的决策制定,并经过实验检验得出算法具有更高的真阳性率和更低的假阳性率,尤其是来自农村地区的医生的诊断更容易被替代。http://arxiv.org/abs/2306.11689<p> +统计测试替代人类决策者的算法 +</p> +<p> +Statistical Tests for Replacing Human Decision Makers with Algorithms. (arXiv:2306.11689v1 [econ.EM]) +</p> +<p> +http://arxiv.org/abs/2306.11689 +</p> +<p> +本文提出了一种利用人工智能改善人类决策的统计框架,通过基准测试与机器预测,替换部分人类决策者的决策制定,并经过实验检验得出算法具有更高的真阳性率和更低的假阳性率,尤其是来自农村地区的医生的诊断更容易被替代。 +</p> +<p> + +</p> +<p> +本文提出了一个统计框架,可以通过人工智能来改善人类的决策。首先将每个人类决策者的表现与机器预测进行基准测试;然后用所提出的人工智能算法的建议替换决策制定者的一个子集所做出的决策。利用全国大型孕产结果和繁殖年龄夫妇孕前检查的医生诊断数据集,我们试验了一种启发式高频率方法以及一种贝叶斯后验损失函数方法,并将其应用于异常出生检测。我们发现,我们的算法在一个测试数据集上的结果比仅由医生诊断的结果具有更高的总体真阳性率和更低的假阳性率。我们还发现,来自农村地区的医生的诊断更容易被替代,这表明人工智能辅助决策制定更容易提高精确度。 +</p> +<p> +This paper proposes a statistical framework with which artificial intelligence can improve human decision making. The performance of each human decision maker is first benchmarked against machine predictions; we then replace the decisions made by a subset of the decision makers with the recommendation from the proposed artificial intelligence algorithm. Using a large nationwide dataset of pregnancy outcomes and doctor diagnoses from prepregnancy checkups of reproductive age couples, we experimented with both a heuristic frequentist approach and a Bayesian posterior loss function approach with an application to abnormal birth detection. We find that our algorithm on a test dataset results in a higher overall true positive rate and a lower false positive rate than the diagnoses made by doctors only. We also find that the diagnoses of doctors from rural areas are more frequently replaceable, suggesting that artificial intelligence assisted decision making tends to improve precision more i +</p>本研究完整描述了OLG重复博弈的收益集合,发现随着玩家贴现因子接近1,可行收益集合变得更小。http://arxiv.org/abs/2303.12988<p> +描述OLG重复博弈可行收益集合的特征 +</p> +<p> +Characterizing the Feasible Payoff Set of OLG Repeated Games. 
(arXiv:2303.12988v1 [econ.TH]) +</p> +<p> +http://arxiv.org/abs/2303.12988 +</p> +<p> +本研究完整描述了OLG重复博弈的收益集合,发现随着玩家贴现因子接近1,可行收益集合变得更小。 +</p> +<p> + +</p> +<p> +本文研究了OLG重复博弈的可行收益集合,并给出了完整的特征描述。本文还首次提供了关于由玩家贴现因子和交互长度引起的可行收益集合的新比较静态。而令人惊讶的是,随着玩家贴现因子接近1,可行收益集合变得更小了。 +</p> +<p> +We study the set of feasible payoffs of OLG repeated games. We first provide a complete characterization of the feasible payoffs. Second, we provide a novel comparative statics of the feasible payoff set with respect to players' discount factor and the length of interaction. Perhaps surprisingly, the feasible payoff set becomes smaller as the players' discount factor approaches to one. +</p>本研究建立了一个具有组、位置不确定性和观察学习的公共物品博弈模型,展示了在固定的时间视野内,自私的玩家之间可以实现全面合作,并发现了各组内的同时高效提供是一个均衡策略。http://arxiv.org/abs/2303.10514<p> +在多极世界中高效提供公共物品 +</p> +<p> +Efficient Public Good Provision in a Multipolar World. (arXiv:2303.10514v1 [econ.TH]) +</p> +<p> +http://arxiv.org/abs/2303.10514 +</p> +<p> +本研究建立了一个具有组、位置不确定性和观察学习的公共物品博弈模型,展示了在固定的时间视野内,自私的玩家之间可以实现全面合作,并发现了各组内的同时高效提供是一个均衡策略。 +</p> +<p> + +</p> +<p> +我们建立了一个具有组、位置不确定性和观察学习的公共物品博弈模型。各组内的贡献是同时进行的,而各组的游戏则是基于过去贡献不完整样本的观察依次进行的。我们展示了在固定的时间视野内,即使是自私的玩家之间也可以实现全面合作。位置不确定性意味着存在一个均衡,即玩家组有条件地合作以期影响后续组玩家的行动。条件合作意味着每个组成员都是关键的,因此各组内的同时高效提供是一个均衡策略。 +</p> +<p> +We model a public goods game with groups, position uncertainty, and observational learning. Contributions are simultaneous within groups, but groups play sequentially based on their observation of an incomplete sample of past contributions. We show that full cooperation between and within groups is possible with self-interested players on a fixed horizon. Position uncertainty implies the existence of an equilibrium where groups of players conditionally cooperate in the hope of influencing further groups. Conditional cooperation implies that each group member is pivotal, so that efficient simultaneous provision within groups is an equilibrium. 
+</p>本文提出了一种新方法以在一定时间段内进行政策选择,通过经验福利最大化方法来估计这一政策规则,旨在达到更好的条件福利。我们表征了此方法可达到最优策略选择的条件,并用模拟研究和实证应用来证明其可行性,在宏观经济领域应用上也能够起到积极的作用。http://arxiv.org/abs/2205.03970<p> +基于经验福利最大化的时间序列政策选择方法研究 +</p> +<p> +Policy Choice in Time Series by Empirical Welfare Maximization. (arXiv:2205.03970v3 [econ.EM] UPDATED) +</p> +<p> +http://arxiv.org/abs/2205.03970 +</p> +<p> +本文提出了一种新方法以在一定时间段内进行政策选择,通过经验福利最大化方法来估计这一政策规则,旨在达到更好的条件福利。我们表征了此方法可达到最优策略选择的条件,并用模拟研究和实证应用来证明其可行性,在宏观经济领域应用上也能够起到积极的作用。 +</p> +<p> + +</p> +<p> +本文提出了一种在多元时间序列条件下进行政策选择的新方法。建立在统计治疗选择框架的基础上,我们提出了时间序列经验福利最大化(T-EWM)方法,通过最大化使用非参数潜在结果时间序列构造的经验福利准则来估计当前或多个时期的最优政策规则。我们表征了T-EWM在什么条件下能够一致地学习到在时间序列历史给定条件福利下最优的政策选择。然后推导出了条件福利遗憾及其极小下界的非渐进上限。为了说明T-EWM的实现和应用,我们进行了模拟研究,并将该方法应用于从宏观经济时间序列数据中估计最优货币政策规则。 +</p> +<p> +This paper develops a novel method for policy choice in a dynamic setting where the available data is a multivariate time series. Building on the statistical treatment choice framework, we propose Time-series Empirical Welfare Maximization (T-EWM) methods to estimate an optimal policy rule for the current period or over multiple periods by maximizing an empirical welfare criterion constructed using nonparametric potential outcome time-series. We characterize conditions under which T-EWM consistently learns a policy choice that is optimal in terms of conditional welfare given the time-series history. We then derive a nonasymptotic upper bound for conditional welfare regret and its minimax lower bound. To illustrate the implementation and uses of T-EWM, we perform simulation studies and apply the method to estimate optimal monetary policy rules from macroeconomic time-series data. 
</p> \ No newline at end of file diff --git a/latest_updated.txt b/latest_updated.txt index 7debe8601..779c0c9cb 100644 --- a/latest_updated.txt +++ b/latest_updated.txt @@ -1 +1 @@ -2024-12-10 03:21:12 \ No newline at end of file +2024-12-10 09:07:17 \ No newline at end of file diff --git a/q-fin.md b/q-fin.md index 794140b33..922ba918d 100644 --- a/q-fin.md +++ b/q-fin.md @@ -2,37 +2,37 @@ | Ref | Title | Summary | | --- | --- | --- | -| [^1] | [Provisions and Economic Capital for Credit Losses.](http://arxiv.org/abs/2401.07728) | 基于超模性排序特性,该论文证明了信用损失的凸风险测度相对于椭圆分布潜在因素的信用-信用和信用-市场协方差是非递减的,这对于计算信用拨备、经济资本、压力测试和风险管理分析非常有帮助。 | -| [^2] | [Path Integral Method for Barrier Option Pricing Under Vasicek Model.](http://arxiv.org/abs/2307.07103) | 该论文使用路径积分方法研究了Vasicek模型中的障碍期权定价问题,通过类比量子理论中的散射问题和方势井问题,导出了定价核和期权价格表达式,并给出了标的资产价格变化对期权价格的数值结果。 | +| [^1] | [Unified Merger List in the Container Shipping Industry from 1966: A Structural Estimation of the Transition of Importance of a Firm's Age, Tonnage Capacity, and Geographical Proximity on Merger Decision.](http://arxiv.org/abs/2310.09938) | 本研究构建了集装箱航运行业1966年至2022年的统一合并清单,并通过结构匹配模型研究了公司的年龄、规模和地理邻近性对合并决策的重要性的转变。研究发现,公司规模在1991年至2005年期间作为正向因素对合并激励更加重要,但在2006年至2022年期间作为负向因素起到抑制作用。同时,公司之间的地理距离对合并决策也产生影响。 | +| [^2] | [Consistency of MLE for partially observed diffusions, with application in market microstructure modeling.](http://arxiv.org/abs/2201.07656) | 本文提出了部分观测扩散过程的极大似然估计一致性的足够条件,并在市场微观结构建模中实现了该模型的未知参数的极大似然估计量的一致性。 | # 详细 -[^1]: 信用损失的条款和经济资本 +[^1]: 《1966年以来集装箱航运行业的统一合并清单:一家公司的年龄、吨位容量和地理邻近性对合并决策的重要性的结构估计》的翻译题目 - Provisions and Economic Capital for Credit Losses. (arXiv:2401.07728v1 [q-fin.RM]) + Unified Merger List in the Container Shipping Industry from 1966: A Structural Estimation of the Transition of Importance of a Firm's Age, Tonnage Capacity, and Geographical Proximity on Merger Decision. 
(arXiv:2310.09938v1 [econ.GN]) - [http://arxiv.org/abs/2401.07728](http://arxiv.org/abs/2401.07728) + [http://arxiv.org/abs/2310.09938](http://arxiv.org/abs/2310.09938) - 基于超模性排序特性,该论文证明了信用损失的凸风险测度相对于椭圆分布潜在因素的信用-信用和信用-市场协方差是非递减的,这对于计算信用拨备、经济资本、压力测试和风险管理分析非常有帮助。 + 本研究构建了集装箱航运行业1966年至2022年的统一合并清单,并通过结构匹配模型研究了公司的年龄、规模和地理邻近性对合并决策的重要性的转变。研究发现,公司规模在1991年至2005年期间作为正向因素对合并激励更加重要,但在2006年至2022年期间作为负向因素起到抑制作用。同时,公司之间的地理距离对合并决策也产生影响。 - 基于超模性排序特性,我们证明了信用损失的凸风险测度相对于椭圆分布潜在因素的信用-信用和错误风险设置下的信用-市场协方差是非递减的。这些结果支持使用这样的设置来计算信用拨备和经济资本,进行压力测试和风险管理分析。 + 我们构建了一个新颖的全球集装箱航运行业1966年至2022年之间的统一合并清单。将该清单与专有数据结合,我们构建了一个结构匹配模型,描述了公司的年龄、规模和地理邻近性在合并决策中的历史性转变。我们发现,在1991年至2005年期间,作为正面因素,一家公司的规模在合并激励中比公司的年龄重要9.974倍。然而,在2006年至2022年期间,作为负面因素,一家公司的规模在合并激励中比公司的年龄重要0.026-0.630倍,即公司的规模起到了抑制作用。我们还发现,买方公司和卖方公司之间的距离在整个期间都起到了抑制作用,但在近年来的经济重要性已减弱到微不足道的程度。在反事实模拟中,我们观察到同一国家的公司之间合并的禁止会影响合并配置。 - Based on supermodularity ordering properties, we show that convex risk measures of credit losses are nondecreasing w.r.t. credit-credit and, in a wrong-way risk setup, credit-market, covariances of elliptically distributed latent factors. These results support the use of such setups for computing credit provisions and economic capital or for conducting stress test exercises and risk management analysis. + We construct a novel unified merger list in the global container shipping industry between 1966 (the beginning of the industry) and 2022. Combining the list with proprietary data, we construct a structural matching model to describe the historical transition of the importance of a firm's age, size, and geographical proximity on merger decisions. We find that, as a positive factor, a firm's size is more important than a firm's age by 9.974 times as a merger incentive between 1991 and 2005. However, between 2006 and 2022, as a negative factor, a firm's size is more important than a firm's age by 0.026-0.630 times, that is, a firm's size works as a disincentive. 
We also find that the distance between buyer and seller firms works as a disincentive for the whole period, but the importance has dwindled to economic insignificance in recent years. In counterfactual simulations, we observe that the prohibition of mergers between firms in the same country would affect the merger configuration of -[^2]: 使用路径积分方法对Vasicek模型中的障碍期权定价 +[^2]: 部分观测扩散过程的极大似然估计一致性及在市场微观结构建模中的应用 - Path Integral Method for Barrier Option Pricing Under Vasicek Model. (arXiv:2307.07103v1 [q-fin.PR]) + Consistency of MLE for partially observed diffusions, with application in market microstructure modeling. (arXiv:2201.07656v2 [math.ST] UPDATED) - [http://arxiv.org/abs/2307.07103](http://arxiv.org/abs/2307.07103) + [http://arxiv.org/abs/2201.07656](http://arxiv.org/abs/2201.07656) - 该论文使用路径积分方法研究了Vasicek模型中的障碍期权定价问题,通过类比量子理论中的散射问题和方势井问题,导出了定价核和期权价格表达式,并给出了标的资产价格变化对期权价格的数值结果。 + 本文提出了部分观测扩散过程的极大似然估计一致性的足够条件,并在市场微观结构建模中实现了该模型的未知参数的极大似然估计量的一致性。 - 量子理论中的路径积分方法为随时间变化的期权定价提供了一种新的思路。对于障碍期权,期权价格变化过程类似于量子力学中的无穷高势垒散射问题;对于双障碍期权,期权价格变化过程类似于粒子在一个无穷方势井中运动。利用路径积分方法,可以导出Vasicek随机利率模型下的定价核和期权价格表达式。同时还展示了期权价格随标的资产价格变化的数值结果。 + 本文提出了一个易于处理的足够条件,用于描述与完全观测的扩散过程相关的稳态分布,从而得出在未知参数值有限的情况下,极大似然估计量的一致性条件。我们将该足够条件应用于市场微观结构的潜在价格模型中,并验证了该模型下未知参数的极大似然估计量的一致性。最后,我们利用纳斯达克交易所的历史金融数据计算了这些估计值。 - Path integral method in quantum theory provides a new thinking for time dependent option pricing. For barrier options, the option price changing process is similar to the infinite high barrier scattering problem in quantum mechanics; for double barrier options, the option price changing process is analogous to a particle moving in a infinite square potential well. Using path integral method, the expressions of pricing kernel and option price under Vasicek stochastic interest rate model could be derived. Numerical results of options price as functions of underlying prices are also shown. 
+ This paper presents a tractable sufficient condition for the consistency of maximum likelihood estimators (MLEs) in partially observed diffusion models, stated in terms of stationary distribution of the associated fully observed diffusion, under the assumption that the set of unknown parameter values is finite. This sufficient condition is then verified in the context of a latent price model of market microstructure, yielding consistency of maximum likelihood estimators of the unknown parameters in this model. Finally, we compute the latter estimators using historical financial data taken from the NASDAQ exchange. diff --git a/q-fin.xml b/q-fin.xml index 6485931e6..7a33b12fb 100644 --- a/q-fin.xml +++ b/q-fin.xml @@ -1,41 +1,41 @@ -Chat Arxiv q-finhttps://github.com/qhduan/cn-chat-arxivThis is arxiv RSS feed for q-fin基于超模性排序特性,该论文证明了信用损失的凸风险测度相对于椭圆分布潜在因素的信用-信用和信用-市场协方差是非递减的,这对于计算信用拨备、经济资本、压力测试和风险管理分析非常有帮助。http://arxiv.org/abs/2401.07728<p> -信用损失的条款和经济资本 +Chat Arxiv q-finhttps://github.com/qhduan/cn-chat-arxivThis is arxiv RSS feed for q-fin本研究构建了集装箱航运行业1966年至2022年的统一合并清单,并通过结构匹配模型研究了公司的年龄、规模和地理邻近性对合并决策的重要性的转变。研究发现,公司规模在1991年至2005年期间作为正向因素对合并激励更加重要,但在2006年至2022年期间作为负向因素起到抑制作用。同时,公司之间的地理距离对合并决策也产生影响。http://arxiv.org/abs/2310.09938<p> +《1966年以来集装箱航运行业的统一合并清单:一家公司的年龄、吨位容量和地理邻近性对合并决策的重要性的结构估计》的翻译题目 </p> <p> -Provisions and Economic Capital for Credit Losses. (arXiv:2401.07728v1 [q-fin.RM]) +Unified Merger List in the Container Shipping Industry from 1966: A Structural Estimation of the Transition of Importance of a Firm's Age, Tonnage Capacity, and Geographical Proximity on Merger Decision. 
(arXiv:2310.09938v1 [econ.GN]) </p> <p> -http://arxiv.org/abs/2401.07728 +http://arxiv.org/abs/2310.09938 </p> <p> -基于超模性排序特性,该论文证明了信用损失的凸风险测度相对于椭圆分布潜在因素的信用-信用和信用-市场协方差是非递减的,这对于计算信用拨备、经济资本、压力测试和风险管理分析非常有帮助。 +本研究构建了集装箱航运行业1966年至2022年的统一合并清单,并通过结构匹配模型研究了公司的年龄、规模和地理邻近性对合并决策的重要性的转变。研究发现,公司规模在1991年至2005年期间作为正向因素对合并激励更加重要,但在2006年至2022年期间作为负向因素起到抑制作用。同时,公司之间的地理距离对合并决策也产生影响。 </p> <p> </p> <p> -基于超模性排序特性,我们证明了信用损失的凸风险测度相对于椭圆分布潜在因素的信用-信用和错误风险设置下的信用-市场协方差是非递减的。这些结果支持使用这样的设置来计算信用拨备和经济资本,进行压力测试和风险管理分析。 +我们构建了一个新颖的全球集装箱航运行业1966年至2022年之间的统一合并清单。将该清单与专有数据结合,我们构建了一个结构匹配模型,描述了公司的年龄、规模和地理邻近性在合并决策中的历史性转变。我们发现,在1991年至2005年期间,作为正面因素,一家公司的规模在合并激励中比公司的年龄重要9.974倍。然而,在2006年至2022年期间,作为负面因素,一家公司的规模在合并激励中比公司的年龄重要0.026-0.630倍,即公司的规模起到了抑制作用。我们还发现,买方公司和卖方公司之间的距离在整个期间都起到了抑制作用,但在近年来的经济重要性已减弱到微不足道的程度。在反事实模拟中,我们观察到同一国家的公司之间合并的禁止会影响合并配置。 </p> <p> -Based on supermodularity ordering properties, we show that convex risk measures of credit losses are nondecreasing w.r.t. credit-credit and, in a wrong-way risk setup, credit-market, covariances of elliptically distributed latent factors. These results support the use of such setups for computing credit provisions and economic capital or for conducting stress test exercises and risk management analysis. -</p>该论文使用路径积分方法研究了Vasicek模型中的障碍期权定价问题,通过类比量子理论中的散射问题和方势井问题,导出了定价核和期权价格表达式,并给出了标的资产价格变化对期权价格的数值结果。http://arxiv.org/abs/2307.07103<p> -使用路径积分方法对Vasicek模型中的障碍期权定价 +We construct a novel unified merger list in the global container shipping industry between 1966 (the beginning of the industry) and 2022. Combining the list with proprietary data, we construct a structural matching model to describe the historical transition of the importance of a firm's age, size, and geographical proximity on merger decisions. We find that, as a positive factor, a firm's size is more important than a firm's age by 9.974 times as a merger incentive between 1991 and 2005. 
However, between 2006 and 2022, as a negative factor, a firm's size is more important than a firm's age by 0.026-0.630 times, that is, a firm's size works as a disincentive. We also find that the distance between buyer and seller firms works as a disincentive for the whole period, but the importance has dwindled to economic insignificance in recent years. In counterfactual simulations, we observe that the prohibition of mergers between firms in the same country would affect the merger configuration of +</p>本文提出了部分观测扩散过程的极大似然估计一致性的足够条件,并在市场微观结构建模中实现了该模型的未知参数的极大似然估计量的一致性。http://arxiv.org/abs/2201.07656<p> +部分观测扩散过程的极大似然估计一致性及在市场微观结构建模中的应用 </p> <p> -Path Integral Method for Barrier Option Pricing Under Vasicek Model. (arXiv:2307.07103v1 [q-fin.PR]) +Consistency of MLE for partially observed diffusions, with application in market microstructure modeling. (arXiv:2201.07656v2 [math.ST] UPDATED) </p> <p> -http://arxiv.org/abs/2307.07103 +http://arxiv.org/abs/2201.07656 </p> <p> -该论文使用路径积分方法研究了Vasicek模型中的障碍期权定价问题,通过类比量子理论中的散射问题和方势井问题,导出了定价核和期权价格表达式,并给出了标的资产价格变化对期权价格的数值结果。 +本文提出了部分观测扩散过程的极大似然估计一致性的足够条件,并在市场微观结构建模中实现了该模型的未知参数的极大似然估计量的一致性。 </p> <p> </p> <p> -量子理论中的路径积分方法为随时间变化的期权定价提供了一种新的思路。对于障碍期权,期权价格变化过程类似于量子力学中的无穷高势垒散射问题;对于双障碍期权,期权价格变化过程类似于粒子在一个无穷方势井中运动。利用路径积分方法,可以导出Vasicek随机利率模型下的定价核和期权价格表达式。同时还展示了期权价格随标的资产价格变化的数值结果。 +本文提出了一个易于处理的足够条件,用于描述与完全观测的扩散过程相关的稳态分布,从而得出在未知参数值有限的情况下,极大似然估计量的一致性条件。我们将该足够条件应用于市场微观结构的潜在价格模型中,并验证了该模型下未知参数的极大似然估计量的一致性。最后,我们利用纳斯达克交易所的历史金融数据计算了这些估计值。 </p> <p> -Path integral method in quantum theory provides a new thinking for time dependent option pricing. For barrier options, the option price changing process is similar to the infinite high barrier scattering problem in quantum mechanics; for double barrier options, the option price changing process is analogous to a particle moving in a infinite square potential well. 
Using path integral method, the expressions of pricing kernel and option price under Vasicek stochastic interest rate model could be derived. Numerical results of options price as functions of underlying prices are also shown. +This paper presents a tractable sufficient condition for the consistency of maximum likelihood estimators (MLEs) in partially observed diffusion models, stated in terms of stationary distribution of the associated fully observed diffusion, under the assumption that the set of unknown parameter values is finite. This sufficient condition is then verified in the context of a latent price model of market microstructure, yielding consistency of maximum likelihood estimators of the unknown parameters in this model. Finally, we compute the latter estimators using historical financial data taken from the NASDAQ exchange. </p> \ No newline at end of file diff --git a/stat.ML.md b/stat.ML.md index 538be1076..6d7e6a0d2 100644 --- a/stat.ML.md +++ b/stat.ML.md @@ -2,67 +2,157 @@ | Ref | Title | Summary | | --- | --- | --- | -| [^1] | [Voronoi Candidates for Bayesian Optimization](https://arxiv.org/abs/2402.04922) | 使用Voronoi候选点边界可以在贝叶斯优化中有效地优化黑盒函数,提高了多起始连续搜索的执行时间。 | -| [^2] | [Iterative Methods for Vecchia-Laplace Approximations for Latent Gaussian Process Models.](http://arxiv.org/abs/2310.12000) | 这篇文章介绍了用于潜在高斯过程模型中的Vecchia-Laplace近似法的迭代方法,相比于传统的Cholesky分解方法,可以显著加快计算速度。 | -| [^3] | [Memorization with neural nets: going beyond the worst case.](http://arxiv.org/abs/2310.00327) | 本文研究了神经网络的插值问题,提出了一种简单的随机算法,在给定的数据集和两个类的情况下,能够以很高的概率构建一个插值的神经网络。这些结果与训练数据规模无关。 | -| [^4] | [The Score-Difference Flow for Implicit Generative Modeling.](http://arxiv.org/abs/2304.12906) | 本文提出了一种新的评分差异流模型(SD flow),它可以最优地减少两个分布之间的散度,同时解决Schr​​ödinger桥问题。与去噪扩散模型不同,它没有对先验分布施加任何限制,在一些基准数据集中优于其他方法。 | +| [^1] | [Functional Bilevel Optimization for Machine Learning](https://arxiv.org/abs/2403.20233) | 介绍了机器学习中的函数双层优化问题,提出了不依赖于强凸假设的方法,并展示了在仪表回归和强化学习任务中使用神经网络的优势。 | +| [^2] | [Auditing 
Fairness under Unobserved Confounding](https://arxiv.org/abs/2403.14713) | 在未观测混杂因素的情况下,本文展示了即使在放宽或甚至在排除所有相关风险因素被观测到的假设的情况下,仍然可以给出对高风险个体分配率的信息丰富的界限。 | +| [^3] | [Early Directional Convergence in Deep Homogeneous Neural Networks for Small Initializations](https://arxiv.org/abs/2403.08121) | 本文研究了训练深度齐次神经网络时梯度流动力学的动态性,发现在足够小的初始化下,神经网络的权重在训练早期阶段保持较小规范,并且沿着神经相关函数的KKT点方向近似收敛。 | +| [^4] | [RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval](https://arxiv.org/abs/2402.18510) | 本文研究了RNNs和Transformer在处理算法问题时的表现能力差距,发现RNNs存在关键瓶颈,即无法完美地从上下文中检索信息,导致无法像Transformer那样轻松解决需要这种能力的任务。 | +| [^5] | [Learning Interpretable Concepts: Unifying Causal Representation Learning and Foundation Models](https://arxiv.org/abs/2402.09236) | 本研究将因果表示学习和基础模型相结合,研究了如何从数据中学习人类可解释的概念。实验证明了这一统一方法的实用性。 | +| [^6] | [Dynamic Incremental Optimization for Best Subset Selection](https://arxiv.org/abs/2402.02322) | 本文研究了一类$\ell_0$正则化问题的对偶形式,并提出了一种高效的原对偶算法,通过充分利用对偶范围估计和增量策略,提高了最佳子集选择问题的解决方案的效率和统计性质。 | +| [^7] | [A flexible Bayesian g-formula for causal survival analyses with time-dependent confounding](https://arxiv.org/abs/2402.02306) | 本文提出了一种更灵活的贝叶斯g形式估计器,用于具有时变混杂的因果生存分析。它采用贝叶斯附加回归树来模拟时变生成组件,并引入了纵向平衡分数以降低模型错误规范引起的偏差。 | +| [^8] | [Controlling Multiple Errors Simultaneously with a PAC-Bayes Bound](https://arxiv.org/abs/2202.05560) | 该研究提出了一种PAC-Bayes界限,能够同时控制多个错误,并提供丰富的信息,适用于回归中测试损失分布或分类中不同错误分类的概率。 | +| [^9] | [Comparative Study of Causal Discovery Methods for Cyclic Models with Hidden Confounders.](http://arxiv.org/abs/2401.13009) | 对于循环模型中含有隐藏因变量的因果发现,已经出现了能够处理这种情况的多种技术方法。 | +| [^10] | [Statistical Tests for Replacing Human Decision Makers with Algorithms.](http://arxiv.org/abs/2306.11689) | 本文提出了一种利用人工智能改善人类决策的统计框架,通过基准测试与机器预测,替换部分人类决策者的决策制定,并经过实验检验得出算法具有更高的真阳性率和更低的假阳性率,尤其是来自农村地区的医生的诊断更容易被替代。 | # 详细 -[^1]: Voronoi Candidates用于贝叶斯优化 +[^1]: 机器学习中的函数双层优化 - Voronoi Candidates for Bayesian Optimization + Functional Bilevel Optimization for Machine Learning - 
[https://arxiv.org/abs/2402.04922](https://arxiv.org/abs/2402.04922) + [https://arxiv.org/abs/2403.20233](https://arxiv.org/abs/2403.20233) - 使用Voronoi候选点边界可以在贝叶斯优化中有效地优化黑盒函数,提高了多起始连续搜索的执行时间。 + 介绍了机器学习中的函数双层优化问题,提出了不依赖于强凸假设的方法,并展示了在仪表回归和强化学习任务中使用神经网络的优势。 - 贝叶斯优化(BO)为高效优化黑盒函数提供了一种优雅的方法。然而,采集准则需要进行具有挑战性的内部优化,这可能引起很大的开销。许多实际的BO方法,尤其是在高维情况下,不采用对采集函数进行形式化连续优化,而是在有限的空间填充候选集上进行离散搜索。在这里,我们提议使用候选点,其位于当前设计点的Voronoi镶嵌边界上,因此它们与两个或多个设计点等距离。我们讨论了通过直接采样Voronoi边界而不明确生成镶嵌的策略,从而适应高维度中的大设计。通过使用高斯过程和期望改进来对一组测试问题进行优化,我们的方法在不损失准确性的情况下显著提高了多起始连续搜索的执行时间。 + 在本文中,我们介绍了针对机器学习中的双层优化问题的一种新的函数视角,其中内部目标在函数空间上被最小化。这些类型的问题通常通过在参数设置下开发的方法来解决,其中内部目标对于预测函数的参数强凸。函数视角不依赖于此假设,特别允许使用超参数化的神经网络作为内部预测函数。我们提出了可扩展和高效的算法来解决函数双层优化问题,并展示了我们方法在适合自然函数双层结构的仪表回归和强化学习任务上的优势。 - Bayesian optimization (BO) offers an elegant approach for efficiently optimizing black-box functions. However, acquisition criteria demand their own challenging inner-optimization, which can induce significant overhead. Many practical BO methods, particularly in high dimension, eschew a formal, continuous optimization of the acquisition function and instead search discretely over a finite set of space-filling candidates. Here, we propose to use candidates which lie on the boundary of the Voronoi tessellation of the current design points, so they are equidistant to two or more of them. We discuss strategies for efficient implementation by directly sampling the Voronoi boundary without explicitly generating the tessellation, thus accommodating large designs in high dimension. On a battery of test problems optimized via Gaussian processes with expected improvement, our proposed approach significantly improves the execution time of a multi-start continuous search without a loss in accuracy + arXiv:2403.20233v1 Announce Type: cross Abstract: In this paper, we introduce a new functional point of view on bilevel optimization problems for machine learning, where the inner objective is minimized over a function space. 
These types of problems are most often solved by using methods developed in the parametric setting, where the inner objective is strongly convex with respect to the parameters of the prediction function. The functional point of view does not rely on this assumption and notably allows using over-parameterized neural networks as the inner prediction function. We propose scalable and efficient algorithms for the functional bilevel optimization problem and illustrate the benefits of our approach on instrumental regression and reinforcement learning tasks, which admit natural functional bilevel structures. -[^2]: Vecchia-Laplace近似法在潜在高斯过程模型中的迭代方法 +[^2]: 在未观测混杂因素下审计公平性 - Iterative Methods for Vecchia-Laplace Approximations for Latent Gaussian Process Models. (arXiv:2310.12000v1 [stat.ME]) + Auditing Fairness under Unobserved Confounding - [http://arxiv.org/abs/2310.12000](http://arxiv.org/abs/2310.12000) + [https://arxiv.org/abs/2403.14713](https://arxiv.org/abs/2403.14713) - 这篇文章介绍了用于潜在高斯过程模型中的Vecchia-Laplace近似法的迭代方法,相比于传统的Cholesky分解方法,可以显著加快计算速度。 + 在未观测混杂因素的情况下,本文展示了即使在放宽或甚至在排除所有相关风险因素被观测到的假设的情况下,仍然可以给出对高风险个体分配率的信息丰富的界限。 - 潜在高斯过程(GP)模型是灵活的概率非参数函数模型。Vecchia近似是用于克服大数据计算瓶颈的准确近似方法,Laplace近似是一种快速方法,可以近似非高斯似然函数的边缘似然和后验预测分布,并具有渐近收敛保证。然而,当与直接求解方法(如Cholesky分解)结合使用时,Vecchia-Laplace近似的计算复杂度增长超线性地随样本大小增加。因此,与Vecchia-Laplace近似计算相关的运算在通常情况下是最准确的大型数据集时会变得非常缓慢。在本文中,我们提出了几种用于Vecchia-Laplace近似推断的迭代方法,相比于基于Cholesky的计算,可以大大加快计算速度。我们对我们的方法进行了分析。 + 决策系统中的一个基本问题是跨越人口统计线存在不公平性。然而,不公平性可能难以量化,特别是如果我们对公平性的理解依赖于难以衡量的风险等观念(例如,对于那些没有其治疗就会死亡的人平等获得治疗)。审计这种不公平性需要准确测量个体风险,而在未观测混杂的现实环境中,难以估计。在这些未观测到的因素“解释”明显差异的情况下,我们可能低估或高估不公平性。在本文中,我们展示了即使在放宽或(令人惊讶地)甚至在排除所有相关风险因素被观测到的假设的情况下,仍然可以对高风险个体的分配率给出信息丰富的界限。我们利用了在许多实际环境中(例如引入新型治疗)我们拥有在任何分配之前的数据的事实。 - Latent Gaussian process (GP) models are flexible probabilistic non-parametric function models. 
Vecchia approximations are accurate approximations for GPs to overcome computational bottlenecks for large data, and the Laplace approximation is a fast method with asymptotic convergence guarantees to approximate marginal likelihoods and posterior predictive distributions for non-Gaussian likelihoods. Unfortunately, the computational complexity of combined Vecchia-Laplace approximations grows faster than linearly in the sample size when used in combination with direct solver methods such as the Cholesky decomposition. Computations with Vecchia-Laplace approximations thus become prohibitively slow precisely when the approximations are usually the most accurate, i.e., on large data sets. In this article, we present several iterative methods for inference with Vecchia-Laplace approximations which make computations considerably faster compared to Cholesky-based calculations. We analyze our propo + arXiv:2403.14713v1 Announce Type: cross Abstract: A fundamental problem in decision-making systems is the presence of inequity across demographic lines. However, inequity can be difficult to quantify, particularly if our notion of equity relies on hard-to-measure notions like risk (e.g., equal access to treatment for those who would die without it). Auditing such inequity requires accurate measurements of individual risk, which is difficult to estimate in the realistic setting of unobserved confounding. In the case that these unobservables "explain" an apparent disparity, we may understate or overstate inequity. In this paper, we show that one can still give informative bounds on allocation rates among high-risk individuals, even while relaxing or (surprisingly) even when eliminating the assumption that all relevant risk factors are observed. 
We utilize the fact that in many real-world settings (e.g., the introduction of a novel treatment) we have data from a period prior to any alloc -[^3]: 神经网络的记忆化:超越最坏情况 +[^3]: 早期方向性收敛在深度齐次神经网络中进行小初始化时的分析 - Memorization with neural nets: going beyond the worst case. (arXiv:2310.00327v1 [stat.ML]) + Early Directional Convergence in Deep Homogeneous Neural Networks for Small Initializations - [http://arxiv.org/abs/2310.00327](http://arxiv.org/abs/2310.00327) + [https://arxiv.org/abs/2403.08121](https://arxiv.org/abs/2403.08121) - 本文研究了神经网络的插值问题,提出了一种简单的随机算法,在给定的数据集和两个类的情况下,能够以很高的概率构建一个插值的神经网络。这些结果与训练数据规模无关。 + 本文研究了训练深度齐次神经网络时梯度流动力学的动态性,发现在足够小的初始化下,神经网络的权重在训练早期阶段保持较小规范,并且沿着神经相关函数的KKT点方向近似收敛。 - 在实践中,深度神经网络通常能够轻松地插值其训练数据。为了理解这一现象,许多研究都旨在量化神经网络架构的记忆能力:即在任意放置这些点并任意分配标签的情况下,架构能够插值的最大点数。然而,对于实际数据,人们直觉地期望存在一种良性结构,使得插值在比记忆能力建议的较小网络尺寸上已经发生。在本文中,我们通过采用实例特定的观点来研究插值。我们引入了一个简单的随机算法,它可以在多项式时间内给定一个固定的有限数据集和两个类的情况下,以很高的概率构建出一个插值三层神经网络。所需的参数数量与这两个类的几何特性及其相互排列有关。因此,我们获得了与训练数据规模无关的保证。 + 本文研究了训练深度齐次神经网络时梯度流动力学的动态性,这些网络从小初始化开始。本文考虑到具有局部Lipschitz梯度和阶数严格大于两的神经网络。文章证明了对于足够小的初始化,在训练的早期阶段,神经网络的权重保持规范较小,并且在Karush-Kuhn-Tucker (KKT)点处近似沿着神经相关函数的方向收敛。此外,对于平方损失并在神经网络权重上进行可分离假设的情况下,还展示了在损失函数的某些鞍点附近梯度流动动态的类似方向性收敛。 - In practice, deep neural networks are often able to easily interpolate their training data. To understand this phenomenon, many works have aimed to quantify the memorization capacity of a neural network architecture: the largest number of points such that the architecture can interpolate any placement of these points with any assignment of labels. For real-world data, however, one intuitively expects the presence of a benign structure so that interpolation already occurs at a smaller network size than suggested by memorization capacity. In this paper, we investigate interpolation by adopting an instance-specific viewpoint. 
We introduce a simple randomized algorithm that, given a fixed finite dataset with two classes, with high probability constructs an interpolating three-layer neural network in polynomial time. The required number of parameters is linked to geometric properties of the two classes and their mutual arrangement. As a result, we obtain guarantees that are independent of t + arXiv:2403.08121v1 Announce Type: new Abstract: This paper studies the gradient flow dynamics that arise when training deep homogeneous neural networks, starting with small initializations. The present work considers neural networks that are assumed to have locally Lipschitz gradients and an order of homogeneity strictly greater than two. This paper demonstrates that for sufficiently small initializations, during the early stages of training, the weights of the neural network remain small in norm and approximately converge in direction along the Karush-Kuhn-Tucker (KKT) points of the neural correlation function introduced in [1]. Additionally, for square loss and under a separability assumption on the weights of neural networks, a similar directional convergence of gradient flow dynamics is shown near certain saddle points of the loss function. -[^4]: 评分差值流模型用于隐式生成建模 +[^4]: RNNs还不是Transformer:在上下文检索中的关键瓶颈 - The Score-Difference Flow for Implicit Generative Modeling. 
(arXiv:2304.12906v1 [cs.LG]) + RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval - [http://arxiv.org/abs/2304.12906](http://arxiv.org/abs/2304.12906) + [https://arxiv.org/abs/2402.18510](https://arxiv.org/abs/2402.18510) - 本文提出了一种新的评分差异流模型(SD flow),它可以最优地减少两个分布之间的散度,同时解决Schr​​ödinger桥问题。与去噪扩散模型不同,它没有对先验分布施加任何限制,在一些基准数据集中优于其他方法。 + 本文研究了RNNs和Transformer在处理算法问题时的表现能力差距,发现RNNs存在关键瓶颈,即无法完美地从上下文中检索信息,导致无法像Transformer那样轻松解决需要这种能力的任务。 - 隐式生成建模(IGM)旨在生成符合目标数据分布特征的合成数据样本。最近的研究(例如评分匹配网络、扩散模型)从通过环境空间中的动态扰动或流将合成源数据推向目标分布的角度解决了IGM问题。我们引入了任意目标和源分布之间的评分差异(SD)作为流,它可以最优地减少它们之间的Kullback-Leibler散度,同时解决Schr​​ödinger桥问题。我们将SD流应用于方便的代理分布,当且仅当原始分布对齐时,它们是对齐的。我们在某些条件下展示了这种公式与去噪扩散模型的形式一致性。然而,与扩散模型不同,SD流没有对先验分布施加任何限制。我们还表明,在无限辨别器能力的极限下,生成对抗网络的训练包含SD流。我们的实验表明,SD流在几个基准数据集上优于先前的最新技术。 + 本文探讨循环神经网络(RNNs)和Transformer在解决算法问题时的表示能力差距。我们重点关注RNNs是否能在处理长序列时,通过Chain-of-Thought (CoT)提示,与Transformer的性能相匹配。我们的理论分析显示CoT可以改进RNNs,但无法弥补与Transformer之间的差距。关键瓶颈在于RNNs无法完全从上下文中检索信息,即使经过CoT的增强:对于几个明确或隐式需要这种能力的任务,如联想召回和确定图是否为树,我们证明RNNs表达能力不足以解决这些任务,而Transformer可以轻松解决。相反,我们证明采用增强RNNs上下文检索能力的技术,包括 - Implicit generative modeling (IGM) aims to produce samples of synthetic data matching the characteristics of a target data distribution. Recent work (e.g. score-matching networks, diffusion models) has approached the IGM problem from the perspective of pushing synthetic source data toward the target distribution via dynamical perturbations or flows in the ambient space. We introduce the score difference (SD) between arbitrary target and source distributions as a flow that optimally reduces the Kullback-Leibler divergence between them while also solving the Schr\"odinger bridge problem. We apply the SD flow to convenient proxy distributions, which are aligned if and only if the original distributions are aligned. We demonstrate the formal equivalence of this formulation to denoising diffusion models under certain conditions. 
However, unlike diffusion models, SD flow places no restrictions on the prior distribution. We also show that the training of generative adversarial networks includ + arXiv:2402.18510v1 Announce Type: cross Abstract: This paper investigates the gap in representation powers of Recurrent Neural Networks (RNNs) and Transformers in the context of solving algorithmic problems. We focus on understanding whether RNNs, known for their memory efficiency in handling long sequences, can match the performance of Transformers, particularly when enhanced with Chain-of-Thought (CoT) prompting. Our theoretical analysis reveals that CoT improves RNNs but is insufficient to close the gap with Transformers. A key bottleneck lies in the inability of RNNs to perfectly retrieve information from the context, even with CoT: for several tasks that explicitly or implicitly require this capability, such as associative recall and determining if a graph is a tree, we prove that RNNs are not expressive enough to solve the tasks while Transformers can solve them with ease. Conversely, we prove that adopting techniques to enhance the in-context retrieval capability of RNNs, inclu + +[^5]: 学习可解释概念:统一因果表示学习与基础模型 + + Learning Interpretable Concepts: Unifying Causal Representation Learning and Foundation Models + + [https://arxiv.org/abs/2402.09236](https://arxiv.org/abs/2402.09236) + + 本研究将因果表示学习和基础模型相结合,研究了如何从数据中学习人类可解释的概念。实验证明了这一统一方法的实用性。 + + + + 构建智能机器学习系统有两种广泛的方法。一种方法是构建天生可解释的模型,这是因果表示学习领域的努力方向。另一种方法是构建高性能的基础模型,然后投入努力去理解它们的工作原理。本研究将这两种方法联系起来,研究如何从数据中学习人类可解释的概念。通过结合这两个领域的思想,我们正式定义了概念的概念,并展示了它们可以从多样的数据中被可靠地恢复出来。对于合成数据和大型语言模型的实验证明了我们统一方法的实用性。 + + arXiv:2402.09236v1 Announce Type: cross Abstract: To build intelligent machine learning systems, there are two broad approaches. One approach is to build inherently interpretable models, as endeavored by the growing field of causal representation learning. 
The other approach is to build highly-performant foundation models and then invest efforts into understanding how they work. In this work, we relate these two approaches and study how to learn human-interpretable concepts from data. Weaving together ideas from both fields, we formally define a notion of concepts and show that they can be provably recovered from diverse data. Experiments on synthetic data and large language models show the utility of our unified approach. + +[^6]: 动态增量优化用于最佳子集选择 + + Dynamic Incremental Optimization for Best Subset Selection + + [https://arxiv.org/abs/2402.02322](https://arxiv.org/abs/2402.02322) + + 本文研究了一类$\ell_0$正则化问题的对偶形式,并提出了一种高效的原对偶算法,通过充分利用对偶范围估计和增量策略,提高了最佳子集选择问题的解决方案的效率和统计性质。 + + + + 最佳子集选择被认为是稀疏学习问题的“黄金标准”。已经提出了各种优化技术来攻击这个非光滑非凸问题。本文研究了一类$\ell_0$正则化问题的对偶形式。基于原始问题和对偶问题的结构,我们提出了一种高效的原对偶算法。通过充分利用对偶范围估计和增量策略,我们的算法潜在地减少了冗余计算并改进了最佳子集选择的解决方案。理论分析和对合成和真实数据集的实验验证了所提出解决方案的效率和统计性质。 + + Best subset selection is considered the `gold standard' for many sparse learning problems. A variety of optimization techniques have been proposed to attack this non-smooth non-convex problem. In this paper, we investigate the dual forms of a family of $\ell_0$-regularized problems. An efficient primal-dual algorithm is developed based on the primal and dual problem structures. By leveraging the dual range estimation along with the incremental strategy, our algorithm potentially reduces redundant computation and improves the solutions of best subset selection. Theoretical analysis and experiments on synthetic and real-world datasets validate the efficiency and statistical properties of the proposed solutions. 
+ +[^7]: 弹性贝叶斯g形式在具有时变混杂的因果生存分析中的应用 + + A flexible Bayesian g-formula for causal survival analyses with time-dependent confounding + + [https://arxiv.org/abs/2402.02306](https://arxiv.org/abs/2402.02306) + + 本文提出了一种更灵活的贝叶斯g形式估计器,用于具有时变混杂的因果生存分析。它采用贝叶斯附加回归树来模拟时变生成组件,并引入了纵向平衡分数以降低模型错误规范引起的偏差。 + + + + 在具有时间至事件结果的纵向观察性研究中,因果分析的常见目标是在研究群体中估计在假设干预情景下的因果生存曲线。g形式是这种分析的一个特别有用的工具。为了增强传统的参数化g形式方法,我们开发了一种更灵活的贝叶斯g形式估计器。该估计器同时支持纵向预测和因果推断。它在模拟时变生成组件的建模中引入了贝叶斯附加回归树,旨在减轻由于模型错误规范造成的偏差。具体而言,我们引入了一类更通用的离散生存数据g形式。这些公式可以引入纵向平衡分数,这在处理越来越多的时变混杂因素时是一种有效的降维方法。 + + In longitudinal observational studies with a time-to-event outcome, a common objective in causal analysis is to estimate the causal survival curve under hypothetical intervention scenarios within the study cohort. The g-formula is a particularly useful tool for this analysis. To enhance the traditional parametric g-formula approach, we developed a more adaptable Bayesian g-formula estimator. This estimator facilitates both longitudinal predictive and causal inference. It incorporates Bayesian additive regression trees in the modeling of the time-evolving generative components, aiming to mitigate bias due to model misspecification. Specifically, we introduce a more general class of g-formulas for discrete survival data. These formulas can incorporate the longitudinal balancing scores, which serve as an effective method for dimension reduction and are vital when dealing with an expanding array of time-varying confounders. 
The minimum sufficient formulation of these longitudinal balancing + +[^8]: 使用PAC-Bayes界限同时控制多个错误 + + Controlling Multiple Errors Simultaneously with a PAC-Bayes Bound + + [https://arxiv.org/abs/2202.05560](https://arxiv.org/abs/2202.05560) + + 该研究提出了一种PAC-Bayes界限,能够同时控制多个错误,并提供丰富的信息,适用于回归中测试损失分布或分类中不同错误分类的概率。 + + + + 当前的PAC-Bayes泛化界限仅限于性能的标量度量,如损失或错误率。我们提供了第一个能够提供丰富信息的PAC-Bayes界限,通过界定一组M种错误类型的经验概率与真实概率之间的Kullback-Leibler差异来控制可能结果的整个分布。 + + arXiv:2202.05560v2 Announce Type: replace-cross Abstract: Current PAC-Bayes generalisation bounds are restricted to scalar metrics of performance, such as the loss or error rate. However, one ideally wants more information-rich certificates that control the entire distribution of possible outcomes, such as the distribution of the test loss in regression, or the probabilities of different mis classifications. We provide the first PAC-Bayes bound capable of providing such rich information by bounding the Kullback-Leibler divergence between the empirical and true probabilities of a set of M error types, which can either be discretized loss values for regression, or the elements of the confusion matrix (or a partition thereof) for classification. We transform our bound into a differentiable training objective. Our bound is especially useful in cases where the severity of different mis-classifications may change over time; existing PAC-Bayes bounds can only bound a particular pre-decided w + +[^9]: 循环模型中含有隐藏因变量的因果发现方法的比较研究 + + Comparative Study of Causal Discovery Methods for Cyclic Models with Hidden Confounders. 
(arXiv:2401.13009v1 [cs.LG]) + + [http://arxiv.org/abs/2401.13009](http://arxiv.org/abs/2401.13009) + + 对于循环模型中含有隐藏因变量的因果发现,已经出现了能够处理这种情况的多种技术方法。 + + + + 如今,对因果发现的需求无处不在。理解系统中部分之间的随机依赖性以及实际的因果关系对科学的各个部分都至关重要。因此,寻找可靠的方法来检测因果方向的需求不断增长。在过去的50年里,出现了许多因果发现算法,但大多数仅适用于系统没有反馈环路并且具有因果充分性的假设,即没有未测量的子系统能够影响多个已测量变量。这是不幸的,因为这些限制在实践中往往不能假定。反馈是许多过程的一个重要特性,现实世界的系统很少是完全隔离和完全测量的。幸运的是,在最近几年中,已经发展了几种能够处理循环的、因果不充分的系统的技术。随着多种方法的出现,一种实际的应用方法开始变得可能。 + + Nowadays, the need for causal discovery is ubiquitous. A better understanding of not just the stochastic dependencies between parts of a system, but also the actual cause-effect relations, is essential for all parts of science. Thus, the need for reliable methods to detect causal directions is growing constantly. In the last 50 years, many causal discovery algorithms have emerged, but most of them are applicable only under the assumption that the systems have no feedback loops and that they are causally sufficient, i.e. that there are no unmeasured subsystems that can affect multiple measured variables. This is unfortunate since those restrictions can often not be presumed in practice. Feedback is an integral feature of many processes, and real-world systems are rarely completely isolated and fully measured. Fortunately, in recent years, several techniques, that can cope with cyclic, causally insufficient systems, have been developed. And with multiple methods available, a practical ap + +[^10]: 统计测试替代人类决策者的算法 + + Statistical Tests for Replacing Human Decision Makers with Algorithms. 
(arXiv:2306.11689v1 [econ.EM]) + + [http://arxiv.org/abs/2306.11689](http://arxiv.org/abs/2306.11689) + + 本文提出了一种利用人工智能改善人类决策的统计框架,通过基准测试与机器预测,替换部分人类决策者的决策制定,并经过实验检验得出算法具有更高的真阳性率和更低的假阳性率,尤其是来自农村地区的医生的诊断更容易被替代。 + + + + 本文提出了一个统计框架,可以通过人工智能来改善人类的决策。首先将每个人类决策者的表现与机器预测进行基准测试;然后用所提出的人工智能算法的建议替换决策制定者的一个子集所做出的决策。利用全国大型孕产结果和繁殖年龄夫妇孕前检查的医生诊断数据集,我们试验了一种启发式高频率方法以及一种贝叶斯后验损失函数方法,并将其应用于异常出生检测。我们发现,我们的算法在一个测试数据集上的结果比仅由医生诊断的结果具有更高的总体真阳性率和更低的假阳性率。我们还发现,来自农村地区的医生的诊断更容易被替代,这表明人工智能辅助决策制定更容易提高精确度。 + + This paper proposes a statistical framework with which artificial intelligence can improve human decision making. The performance of each human decision maker is first benchmarked against machine predictions; we then replace the decisions made by a subset of the decision makers with the recommendation from the proposed artificial intelligence algorithm. Using a large nationwide dataset of pregnancy outcomes and doctor diagnoses from prepregnancy checkups of reproductive age couples, we experimented with both a heuristic frequentist approach and a Bayesian posterior loss function approach with an application to abnormal birth detection. We find that our algorithm on a test dataset results in a higher overall true positive rate and a lower false positive rate than the diagnoses made by doctors only. 
We also find that the diagnoses of doctors from rural areas are more frequently replaceable, suggesting that artificial intelligence assisted decision making tends to improve precision more i diff --git a/stat.ML.xml b/stat.ML.xml index 5ed2eefcd..66bc9560c 100644 --- a/stat.ML.xml +++ b/stat.ML.xml @@ -1,81 +1,201 @@ -Chat Arxiv stat.MLhttps://github.com/qhduan/cn-chat-arxivThis is arxiv RSS feed for stat.ML使用Voronoi候选点边界可以在贝叶斯优化中有效地优化黑盒函数,提高了多起始连续搜索的执行时间。https://arxiv.org/abs/2402.04922<p> -Voronoi Candidates用于贝叶斯优化 +Chat Arxiv stat.MLhttps://github.com/qhduan/cn-chat-arxivThis is arxiv RSS feed for stat.ML介绍了机器学习中的函数双层优化问题,提出了不依赖于强凸假设的方法,并展示了在仪表回归和强化学习任务中使用神经网络的优势。https://arxiv.org/abs/2403.20233<p> +机器学习中的函数双层优化 </p> <p> -Voronoi Candidates for Bayesian Optimization +Functional Bilevel Optimization for Machine Learning </p> <p> -https://arxiv.org/abs/2402.04922 +https://arxiv.org/abs/2403.20233 </p> <p> -使用Voronoi候选点边界可以在贝叶斯优化中有效地优化黑盒函数,提高了多起始连续搜索的执行时间。 +介绍了机器学习中的函数双层优化问题,提出了不依赖于强凸假设的方法,并展示了在仪表回归和强化学习任务中使用神经网络的优势。 </p> <p> </p> <p> -贝叶斯优化(BO)为高效优化黑盒函数提供了一种优雅的方法。然而,采集准则需要进行具有挑战性的内部优化,这可能引起很大的开销。许多实际的BO方法,尤其是在高维情况下,不采用对采集函数进行形式化连续优化,而是在有限的空间填充候选集上进行离散搜索。在这里,我们提议使用候选点,其位于当前设计点的Voronoi镶嵌边界上,因此它们与两个或多个设计点等距离。我们讨论了通过直接采样Voronoi边界而不明确生成镶嵌的策略,从而适应高维度中的大设计。通过使用高斯过程和期望改进来对一组测试问题进行优化,我们的方法在不损失准确性的情况下显著提高了多起始连续搜索的执行时间。 +在本文中,我们介绍了针对机器学习中的双层优化问题的一种新的函数视角,其中内部目标在函数空间上被最小化。这些类型的问题通常通过在参数设置下开发的方法来解决,其中内部目标对于预测函数的参数强凸。函数视角不依赖于此假设,特别允许使用超参数化的神经网络作为内部预测函数。我们提出了可扩展和高效的算法来解决函数双层优化问题,并展示了我们方法在适合自然函数双层结构的仪表回归和强化学习任务上的优势。 </p> <p> -Bayesian optimization (BO) offers an elegant approach for efficiently optimizing black-box functions. However, acquisition criteria demand their own challenging inner-optimization, which can induce significant overhead. Many practical BO methods, particularly in high dimension, eschew a formal, continuous optimization of the acquisition function and instead search discretely over a finite set of space-filling candidates. 
Here, we propose to use candidates which lie on the boundary of the Voronoi tessellation of the current design points, so they are equidistant to two or more of them. We discuss strategies for efficient implementation by directly sampling the Voronoi boundary without explicitly generating the tessellation, thus accommodating large designs in high dimension. On a battery of test problems optimized via Gaussian processes with expected improvement, our proposed approach significantly improves the execution time of a multi-start continuous search without a loss in accuracy -</p>这篇文章介绍了用于潜在高斯过程模型中的Vecchia-Laplace近似法的迭代方法,相比于传统的Cholesky分解方法,可以显著加快计算速度。http://arxiv.org/abs/2310.12000<p> -Vecchia-Laplace近似法在潜在高斯过程模型中的迭代方法 +arXiv:2403.20233v1 Announce Type: cross Abstract: In this paper, we introduce a new functional point of view on bilevel optimization problems for machine learning, where the inner objective is minimized over a function space. These types of problems are most often solved by using methods developed in the parametric setting, where the inner objective is strongly convex with respect to the parameters of the prediction function. The functional point of view does not rely on this assumption and notably allows using over-parameterized neural networks as the inner prediction function. We propose scalable and efficient algorithms for the functional bilevel optimization problem and illustrate the benefits of our approach on instrumental regression and reinforcement learning tasks, which admit natural functional bilevel structures. +</p>在未观测混杂因素的情况下,本文展示了即使在放宽或甚至在排除所有相关风险因素被观测到的假设的情况下,仍然可以给出对高风险个体分配率的信息丰富的界限。https://arxiv.org/abs/2403.14713<p> +在未观测混杂因素下审计公平性 </p> <p> -Iterative Methods for Vecchia-Laplace Approximations for Latent Gaussian Process Models. 
(arXiv:2310.12000v1 [stat.ME]) +Auditing Fairness under Unobserved Confounding </p> <p> -http://arxiv.org/abs/2310.12000 +https://arxiv.org/abs/2403.14713 </p> <p> -这篇文章介绍了用于潜在高斯过程模型中的Vecchia-Laplace近似法的迭代方法,相比于传统的Cholesky分解方法,可以显著加快计算速度。 +在未观测混杂因素的情况下,本文展示了即使在放宽或甚至在排除所有相关风险因素被观测到的假设的情况下,仍然可以给出对高风险个体分配率的信息丰富的界限。 </p> <p> </p> <p> -潜在高斯过程(GP)模型是灵活的概率非参数函数模型。Vecchia近似是用于克服大数据计算瓶颈的准确近似方法,Laplace近似是一种快速方法,可以近似非高斯似然函数的边缘似然和后验预测分布,并具有渐近收敛保证。然而,当与直接求解方法(如Cholesky分解)结合使用时,Vecchia-Laplace近似的计算复杂度增长超线性地随样本大小增加。因此,与Vecchia-Laplace近似计算相关的运算在通常情况下是最准确的大型数据集时会变得非常缓慢。在本文中,我们提出了几种用于Vecchia-Laplace近似推断的迭代方法,相比于基于Cholesky的计算,可以大大加快计算速度。我们对我们的方法进行了分析。 +决策系统中的一个基本问题是跨越人口统计线存在不公平性。然而,不公平性可能难以量化,特别是如果我们对公平性的理解依赖于难以衡量的风险等观念(例如,对于那些没有其治疗就会死亡的人平等获得治疗)。审计这种不公平性需要准确测量个体风险,而在未观测混杂的现实环境中,难以估计。在这些未观测到的因素“解释”明显差异的情况下,我们可能低估或高估不公平性。在本文中,我们展示了即使在放宽或(令人惊讶地)甚至在排除所有相关风险因素被观测到的假设的情况下,仍然可以对高风险个体的分配率给出信息丰富的界限。我们利用了在许多实际环境中(例如引入新型治疗)我们拥有在任何分配之前的数据的事实。 </p> <p> -Latent Gaussian process (GP) models are flexible probabilistic non-parametric function models. Vecchia approximations are accurate approximations for GPs to overcome computational bottlenecks for large data, and the Laplace approximation is a fast method with asymptotic convergence guarantees to approximate marginal likelihoods and posterior predictive distributions for non-Gaussian likelihoods. Unfortunately, the computational complexity of combined Vecchia-Laplace approximations grows faster than linearly in the sample size when used in combination with direct solver methods such as the Cholesky decomposition. Computations with Vecchia-Laplace approximations thus become prohibitively slow precisely when the approximations are usually the most accurate, i.e., on large data sets. In this article, we present several iterative methods for inference with Vecchia-Laplace approximations which make computations considerably faster compared to Cholesky-based calculations. 
We analyze our propo -</p>本文研究了神经网络的插值问题,提出了一种简单的随机算法,在给定的数据集和两个类的情况下,能够以很高的概率构建一个插值的神经网络。这些结果与训练数据规模无关。http://arxiv.org/abs/2310.00327<p> -神经网络的记忆化:超越最坏情况 +arXiv:2403.14713v1 Announce Type: cross Abstract: A fundamental problem in decision-making systems is the presence of inequity across demographic lines. However, inequity can be difficult to quantify, particularly if our notion of equity relies on hard-to-measure notions like risk (e.g., equal access to treatment for those who would die without it). Auditing such inequity requires accurate measurements of individual risk, which is difficult to estimate in the realistic setting of unobserved confounding. In the case that these unobservables "explain" an apparent disparity, we may understate or overstate inequity. In this paper, we show that one can still give informative bounds on allocation rates among high-risk individuals, even while relaxing or (surprisingly) even when eliminating the assumption that all relevant risk factors are observed. We utilize the fact that in many real-world settings (e.g., the introduction of a novel treatment) we have data from a period prior to any alloc +</p>本文研究了训练深度齐次神经网络时梯度流动力学的动态性,发现在足够小的初始化下,神经网络的权重在训练早期阶段保持较小规范,并且沿着神经相关函数的KKT点方向近似收敛。https://arxiv.org/abs/2403.08121<p> +早期方向性收敛在深度齐次神经网络中进行小初始化时的分析 </p> <p> -Memorization with neural nets: going beyond the worst case. 
(arXiv:2310.00327v1 [stat.ML]) +Early Directional Convergence in Deep Homogeneous Neural Networks for Small Initializations </p> <p> -http://arxiv.org/abs/2310.00327 +https://arxiv.org/abs/2403.08121 </p> <p> -本文研究了神经网络的插值问题,提出了一种简单的随机算法,在给定的数据集和两个类的情况下,能够以很高的概率构建一个插值的神经网络。这些结果与训练数据规模无关。 +本文研究了训练深度齐次神经网络时梯度流动力学的动态性,发现在足够小的初始化下,神经网络的权重在训练早期阶段保持较小规范,并且沿着神经相关函数的KKT点方向近似收敛。 </p> <p> </p> <p> -在实践中,深度神经网络通常能够轻松地插值其训练数据。为了理解这一现象,许多研究都旨在量化神经网络架构的记忆能力:即在任意放置这些点并任意分配标签的情况下,架构能够插值的最大点数。然而,对于实际数据,人们直觉地期望存在一种良性结构,使得插值在比记忆能力建议的较小网络尺寸上已经发生。在本文中,我们通过采用实例特定的观点来研究插值。我们引入了一个简单的随机算法,它可以在多项式时间内给定一个固定的有限数据集和两个类的情况下,以很高的概率构建出一个插值三层神经网络。所需的参数数量与这两个类的几何特性及其相互排列有关。因此,我们获得了与训练数据规模无关的保证。 +本文研究了训练深度齐次神经网络时梯度流动力学的动态性,这些网络从小初始化开始。本文考虑到具有局部Lipschitz梯度和阶数严格大于两的神经网络。文章证明了对于足够小的初始化,在训练的早期阶段,神经网络的权重保持规范较小,并且在Karush-Kuhn-Tucker (KKT)点处近似沿着神经相关函数的方向收敛。此外,对于平方损失并在神经网络权重上进行可分离假设的情况下,还展示了在损失函数的某些鞍点附近梯度流动动态的类似方向性收敛。 </p> <p> -In practice, deep neural networks are often able to easily interpolate their training data. To understand this phenomenon, many works have aimed to quantify the memorization capacity of a neural network architecture: the largest number of points such that the architecture can interpolate any placement of these points with any assignment of labels. For real-world data, however, one intuitively expects the presence of a benign structure so that interpolation already occurs at a smaller network size than suggested by memorization capacity. In this paper, we investigate interpolation by adopting an instance-specific viewpoint. We introduce a simple randomized algorithm that, given a fixed finite dataset with two classes, with high probability constructs an interpolating three-layer neural network in polynomial time. The required number of parameters is linked to geometric properties of the two classes and their mutual arrangement. 
As a result, we obtain guarantees that are independent of t -</p>本文提出了一种新的评分差异流模型(SD flow),它可以最优地减少两个分布之间的散度,同时解决Schr​​ödinger桥问题。与去噪扩散模型不同,它没有对先验分布施加任何限制,在一些基准数据集中优于其他方法。http://arxiv.org/abs/2304.12906<p> -评分差值流模型用于隐式生成建模 +arXiv:2403.08121v1 Announce Type: new Abstract: This paper studies the gradient flow dynamics that arise when training deep homogeneous neural networks, starting with small initializations. The present work considers neural networks that are assumed to have locally Lipschitz gradients and an order of homogeneity strictly greater than two. This paper demonstrates that for sufficiently small initializations, during the early stages of training, the weights of the neural network remain small in norm and approximately converge in direction along the Karush-Kuhn-Tucker (KKT) points of the neural correlation function introduced in [1]. Additionally, for square loss and under a separability assumption on the weights of neural networks, a similar directional convergence of gradient flow dynamics is shown near certain saddle points of the loss function. +</p>本文研究了RNNs和Transformer在处理算法问题时的表现能力差距,发现RNNs存在关键瓶颈,即无法完美地从上下文中检索信息,导致无法像Transformer那样轻松解决需要这种能力的任务。https://arxiv.org/abs/2402.18510<p> +RNNs还不是Transformer:在上下文检索中的关键瓶颈 </p> <p> -The Score-Difference Flow for Implicit Generative Modeling. 
(arXiv:2304.12906v1 [cs.LG]) +RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval </p> <p> -http://arxiv.org/abs/2304.12906 +https://arxiv.org/abs/2402.18510 </p> <p> -本文提出了一种新的评分差异流模型(SD flow),它可以最优地减少两个分布之间的散度,同时解决Schr​​ödinger桥问题。与去噪扩散模型不同,它没有对先验分布施加任何限制,在一些基准数据集中优于其他方法。 +本文研究了RNNs和Transformer在处理算法问题时的表现能力差距,发现RNNs存在关键瓶颈,即无法完美地从上下文中检索信息,导致无法像Transformer那样轻松解决需要这种能力的任务。 </p> <p> </p> <p> -隐式生成建模(IGM)旨在生成符合目标数据分布特征的合成数据样本。最近的研究(例如评分匹配网络、扩散模型)从通过环境空间中的动态扰动或流将合成源数据推向目标分布的角度解决了IGM问题。我们引入了任意目标和源分布之间的评分差异(SD)作为流,它可以最优地减少它们之间的Kullback-Leibler散度,同时解决Schr​​ödinger桥问题。我们将SD流应用于方便的代理分布,当且仅当原始分布对齐时,它们是对齐的。我们在某些条件下展示了这种公式与去噪扩散模型的形式一致性。然而,与扩散模型不同,SD流没有对先验分布施加任何限制。我们还表明,在无限辨别器能力的极限下,生成对抗网络的训练包含SD流。我们的实验表明,SD流在几个基准数据集上优于先前的最新技术。 +本文探讨循环神经网络(RNNs)和Transformer在解决算法问题时的表示能力差距。我们重点关注RNNs是否能在处理长序列时,通过Chain-of-Thought (CoT)提示,与Transformer的性能相匹配。我们的理论分析显示CoT可以改进RNNs,但无法弥补与Transformer之间的差距。关键瓶颈在于RNNs无法完全从上下文中检索信息,即使经过CoT的增强:对于几个明确或隐式需要这种能力的任务,如联想召回和确定图是否为树,我们证明RNNs表达能力不足以解决这些任务,而Transformer可以轻松解决。相反,我们证明采用增强RNNs上下文检索能力的技术,包括 </p> <p> -Implicit generative modeling (IGM) aims to produce samples of synthetic data matching the characteristics of a target data distribution. Recent work (e.g. score-matching networks, diffusion models) has approached the IGM problem from the perspective of pushing synthetic source data toward the target distribution via dynamical perturbations or flows in the ambient space. We introduce the score difference (SD) between arbitrary target and source distributions as a flow that optimally reduces the Kullback-Leibler divergence between them while also solving the Schr\"odinger bridge problem. We apply the SD flow to convenient proxy distributions, which are aligned if and only if the original distributions are aligned. We demonstrate the formal equivalence of this formulation to denoising diffusion models under certain conditions. However, unlike diffusion models, SD flow places no restrictions on the prior distribution. 
We also show that the training of generative adversarial networks includ +arXiv:2402.18510v1 Announce Type: cross Abstract: This paper investigates the gap in representation powers of Recurrent Neural Networks (RNNs) and Transformers in the context of solving algorithmic problems. We focus on understanding whether RNNs, known for their memory efficiency in handling long sequences, can match the performance of Transformers, particularly when enhanced with Chain-of-Thought (CoT) prompting. Our theoretical analysis reveals that CoT improves RNNs but is insufficient to close the gap with Transformers. A key bottleneck lies in the inability of RNNs to perfectly retrieve information from the context, even with CoT: for several tasks that explicitly or implicitly require this capability, such as associative recall and determining if a graph is a tree, we prove that RNNs are not expressive enough to solve the tasks while Transformers can solve them with ease. Conversely, we prove that adopting techniques to enhance the in-context retrieval capability of RNNs, inclu +</p>本研究将因果表示学习和基础模型相结合,研究了如何从数据中学习人类可解释的概念。实验证明了这一统一方法的实用性。https://arxiv.org/abs/2402.09236<p> +学习可解释概念:统一因果表示学习与基础模型 +</p> +<p> +Learning Interpretable Concepts: Unifying Causal Representation Learning and Foundation Models +</p> +<p> +https://arxiv.org/abs/2402.09236 +</p> +<p> +本研究将因果表示学习和基础模型相结合,研究了如何从数据中学习人类可解释的概念。实验证明了这一统一方法的实用性。 +</p> +<p> + +</p> +<p> +构建智能机器学习系统有两种广泛的方法。一种方法是构建天生可解释的模型,这是因果表示学习领域的努力方向。另一种方法是构建高性能的基础模型,然后投入努力去理解它们的工作原理。本研究将这两种方法联系起来,研究如何从数据中学习人类可解释的概念。通过结合这两个领域的思想,我们正式定义了概念的概念,并展示了它们可以从多样的数据中被可靠地恢复出来。对于合成数据和大型语言模型的实验证明了我们统一方法的实用性。 +</p> +<p> +arXiv:2402.09236v1 Announce Type: cross Abstract: To build intelligent machine learning systems, there are two broad approaches. One approach is to build inherently interpretable models, as endeavored by the growing field of causal representation learning. 
The other approach is to build highly-performant foundation models and then invest efforts into understanding how they work. In this work, we relate these two approaches and study how to learn human-interpretable concepts from data. Weaving together ideas from both fields, we formally define a notion of concepts and show that they can be provably recovered from diverse data. Experiments on synthetic data and large language models show the utility of our unified approach. +</p>本文研究了一类$\ell_0$正则化问题的对偶形式,并提出了一种高效的原对偶算法,通过充分利用对偶范围估计和增量策略,提高了最佳子集选择问题的解决方案的效率和统计性质。https://arxiv.org/abs/2402.02322<p> +动态增量优化用于最佳子集选择 +</p> +<p> +Dynamic Incremental Optimization for Best Subset Selection +</p> +<p> +https://arxiv.org/abs/2402.02322 +</p> +<p> +本文研究了一类$\ell_0$正则化问题的对偶形式,并提出了一种高效的原对偶算法,通过充分利用对偶范围估计和增量策略,提高了最佳子集选择问题的解决方案的效率和统计性质。 +</p> +<p> + +</p> +<p> +最佳子集选择被认为是稀疏学习问题的“黄金标准”。已经提出了各种优化技术来攻击这个非光滑非凸问题。本文研究了一类$\ell_0$正则化问题的对偶形式。基于原始问题和对偶问题的结构,我们提出了一种高效的原对偶算法。通过充分利用对偶范围估计和增量策略,我们的算法潜在地减少了冗余计算并改进了最佳子集选择的解决方案。理论分析和对合成和真实数据集的实验验证了所提出解决方案的效率和统计性质。 +</p> +<p> +Best subset selection is considered the `gold standard' for many sparse learning problems. A variety of optimization techniques have been proposed to attack this non-smooth non-convex problem. In this paper, we investigate the dual forms of a family of $\ell_0$-regularized problems. An efficient primal-dual algorithm is developed based on the primal and dual problem structures. By leveraging the dual range estimation along with the incremental strategy, our algorithm potentially reduces redundant computation and improves the solutions of best subset selection. Theoretical analysis and experiments on synthetic and real-world datasets validate the efficiency and statistical properties of the proposed solutions. 
+</p>本文提出了一种更灵活的贝叶斯g形式估计器,用于具有时变混杂的因果生存分析。它采用贝叶斯附加回归树来模拟时变生成组件,并引入了纵向平衡分数以降低模型错误规范引起的偏差。https://arxiv.org/abs/2402.02306<p> +弹性贝叶斯g形式在具有时变混杂的因果生存分析中的应用 +</p> +<p> +A flexible Bayesian g-formula for causal survival analyses with time-dependent confounding +</p> +<p> +https://arxiv.org/abs/2402.02306 +</p> +<p> +本文提出了一种更灵活的贝叶斯g形式估计器,用于具有时变混杂的因果生存分析。它采用贝叶斯附加回归树来模拟时变生成组件,并引入了纵向平衡分数以降低模型错误规范引起的偏差。 +</p> +<p> + +</p> +<p> +在具有时间至事件结果的纵向观察性研究中,因果分析的常见目标是在研究群体中估计在假设干预情景下的因果生存曲线。g形式是这种分析的一个特别有用的工具。为了增强传统的参数化g形式方法,我们开发了一种更灵活的贝叶斯g形式估计器。该估计器同时支持纵向预测和因果推断。它在模拟时变生成组件的建模中引入了贝叶斯附加回归树,旨在减轻由于模型错误规范造成的偏差。具体而言,我们引入了一类更通用的离散生存数据g形式。这些公式可以引入纵向平衡分数,这在处理越来越多的时变混杂因素时是一种有效的降维方法。 +</p> +<p> +In longitudinal observational studies with a time-to-event outcome, a common objective in causal analysis is to estimate the causal survival curve under hypothetical intervention scenarios within the study cohort. The g-formula is a particularly useful tool for this analysis. To enhance the traditional parametric g-formula approach, we developed a more adaptable Bayesian g-formula estimator. This estimator facilitates both longitudinal predictive and causal inference. It incorporates Bayesian additive regression trees in the modeling of the time-evolving generative components, aiming to mitigate bias due to model misspecification. Specifically, we introduce a more general class of g-formulas for discrete survival data. These formulas can incorporate the longitudinal balancing scores, which serve as an effective method for dimension reduction and are vital when dealing with an expanding array of time-varying confounders. 
The minimum sufficient formulation of these longitudinal balancing +</p>该研究提出了一种PAC-Bayes界限,能够同时控制多个错误,并提供丰富的信息,适用于回归中测试损失分布或分类中不同错误分类的概率。https://arxiv.org/abs/2202.05560<p> +使用PAC-Bayes界限同时控制多个错误 +</p> +<p> +Controlling Multiple Errors Simultaneously with a PAC-Bayes Bound +</p> +<p> +https://arxiv.org/abs/2202.05560 +</p> +<p> +该研究提出了一种PAC-Bayes界限,能够同时控制多个错误,并提供丰富的信息,适用于回归中测试损失分布或分类中不同错误分类的概率。 +</p> +<p> + +</p> +<p> +当前的PAC-Bayes泛化界限仅限于性能的标量度量,如损失或错误率。我们提供了第一个能够提供丰富信息的PAC-Bayes界限,通过界定一组M种错误类型的经验概率与真实概率之间的Kullback-Leibler差异来控制可能结果的整个分布。 +</p> +<p> +arXiv:2202.05560v2 Announce Type: replace-cross Abstract: Current PAC-Bayes generalisation bounds are restricted to scalar metrics of performance, such as the loss or error rate. However, one ideally wants more information-rich certificates that control the entire distribution of possible outcomes, such as the distribution of the test loss in regression, or the probabilities of different mis classifications. We provide the first PAC-Bayes bound capable of providing such rich information by bounding the Kullback-Leibler divergence between the empirical and true probabilities of a set of M error types, which can either be discretized loss values for regression, or the elements of the confusion matrix (or a partition thereof) for classification. We transform our bound into a differentiable training objective. Our bound is especially useful in cases where the severity of different mis-classifications may change over time; existing PAC-Bayes bounds can only bound a particular pre-decided w +</p>对于循环模型中含有隐藏因变量的因果发现,已经出现了能够处理这种情况的多种技术方法。http://arxiv.org/abs/2401.13009<p> +循环模型中含有隐藏因变量的因果发现方法的比较研究 +</p> +<p> +Comparative Study of Causal Discovery Methods for Cyclic Models with Hidden Confounders. 
(arXiv:2401.13009v1 [cs.LG]) +</p> +<p> +http://arxiv.org/abs/2401.13009 +</p> +<p> +对于循环模型中含有隐藏因变量的因果发现,已经出现了能够处理这种情况的多种技术方法。 +</p> +<p> + +</p> +<p> +如今,对因果发现的需求无处不在。理解系统中部分之间的随机依赖性以及实际的因果关系对科学的各个部分都至关重要。因此,寻找可靠的方法来检测因果方向的需求不断增长。在过去的50年里,出现了许多因果发现算法,但大多数仅适用于系统没有反馈环路并且具有因果充分性的假设,即没有未测量的子系统能够影响多个已测量变量。这是不幸的,因为这些限制在实践中往往不能假定。反馈是许多过程的一个重要特性,现实世界的系统很少是完全隔离和完全测量的。幸运的是,在最近几年中,已经发展了几种能够处理循环的、因果不充分的系统的技术。随着多种方法的出现,一种实际的应用方法开始变得可能。 +</p> +<p> +Nowadays, the need for causal discovery is ubiquitous. A better understanding of not just the stochastic dependencies between parts of a system, but also the actual cause-effect relations, is essential for all parts of science. Thus, the need for reliable methods to detect causal directions is growing constantly. In the last 50 years, many causal discovery algorithms have emerged, but most of them are applicable only under the assumption that the systems have no feedback loops and that they are causally sufficient, i.e. that there are no unmeasured subsystems that can affect multiple measured variables. This is unfortunate since those restrictions can often not be presumed in practice. Feedback is an integral feature of many processes, and real-world systems are rarely completely isolated and fully measured. Fortunately, in recent years, several techniques, that can cope with cyclic, causally insufficient systems, have been developed. And with multiple methods available, a practical ap +</p>本文提出了一种利用人工智能改善人类决策的统计框架,通过基准测试与机器预测,替换部分人类决策者的决策制定,并经过实验检验得出算法具有更高的真阳性率和更低的假阳性率,尤其是来自农村地区的医生的诊断更容易被替代。http://arxiv.org/abs/2306.11689<p> +统计测试替代人类决策者的算法 +</p> +<p> +Statistical Tests for Replacing Human Decision Makers with Algorithms. 
(arXiv:2306.11689v1 [econ.EM]) +</p> +<p> +http://arxiv.org/abs/2306.11689 +</p> +<p> +本文提出了一种利用人工智能改善人类决策的统计框架,通过基准测试与机器预测,替换部分人类决策者的决策制定,并经过实验检验得出算法具有更高的真阳性率和更低的假阳性率,尤其是来自农村地区的医生的诊断更容易被替代。 +</p> +<p> + +</p> +<p> +本文提出了一个统计框架,可以通过人工智能来改善人类的决策。首先将每个人类决策者的表现与机器预测进行基准测试;然后用所提出的人工智能算法的建议替换决策制定者的一个子集所做出的决策。利用全国大型孕产结果和繁殖年龄夫妇孕前检查的医生诊断数据集,我们试验了一种启发式高频率方法以及一种贝叶斯后验损失函数方法,并将其应用于异常出生检测。我们发现,我们的算法在一个测试数据集上的结果比仅由医生诊断的结果具有更高的总体真阳性率和更低的假阳性率。我们还发现,来自农村地区的医生的诊断更容易被替代,这表明人工智能辅助决策制定更容易提高精确度。 +</p> +<p> +This paper proposes a statistical framework with which artificial intelligence can improve human decision making. The performance of each human decision maker is first benchmarked against machine predictions; we then replace the decisions made by a subset of the decision makers with the recommendation from the proposed artificial intelligence algorithm. Using a large nationwide dataset of pregnancy outcomes and doctor diagnoses from prepregnancy checkups of reproductive age couples, we experimented with both a heuristic frequentist approach and a Bayesian posterior loss function approach with an application to abnormal birth detection. We find that our algorithm on a test dataset results in a higher overall true positive rate and a lower false positive rate than the diagnoses made by doctors only. We also find that the diagnoses of doctors from rural areas are more frequently replaceable, suggesting that artificial intelligence assisted decision making tends to improve precision more i </p> \ No newline at end of file