Add changes

qhduan · Dec 10, 2024 · 909dbfc
1 parent f147a55
commit 909dbfc
Showing 15 changed files with 3,676 additions and 456 deletions.
diff --git a/cs.AI.md b/cs.AI.md
diff --git a/cs.AI.xml b/cs.AI.xml
diff --git a/cs.CL.md b/cs.CL.md
diff --git a/cs.CL.xml b/cs.CL.xml
diff --git a/cs.IR.md b/cs.IR.md
@@ -2,37 +2,22 @@
 
 | Ref | Title | Summary |
 | --- | --- | --- |
-| [^1] | [All-in-One: Heterogeneous Interaction Modeling for Cold-Start Rating Prediction](https://arxiv.org/abs/2403.17740) | 提出了异质交互评分网络（HIRE）框架，通过异质交互模块（HIM）来共同建模异质交互并直接推断重要特征 |
-| [^2] | [TPRF: A Transformer-based Pseudo-Relevance Feedback Model for Efficient and Effective Retrieval.](http://arxiv.org/abs/2401.13509) | 本文提出一种基于Transformer的伪相关反馈模型（TPRF），适用于资源受限的环境。TPRF相比其他深度语言模型在内存占用和推理时间方面具备更小的开销，并能有效地结合来自稠密文具表示的相关反馈信号。 |
+| [^1] | [Croissant: A Metadata Format for ML-Ready Datasets](https://arxiv.org/abs/2403.19546) | Croissant是一种面向机器学习数据集的元数据格式，使数据集更易发现、可移植和互操作，有助于解决ML数据管理和负责任AI中的重要挑战。 |
 
 # 详细
 
-[^1]: 一体化：异质交互建模用于冷启动评分预测
+[^1]: Croissant：一种面向机器学习数据集的元数据格式
 
-    All-in-One: Heterogeneous Interaction Modeling for Cold-Start Rating Prediction
+    Croissant: A Metadata Format for ML-Ready Datasets
 
-    [https://arxiv.org/abs/2403.17740](https://arxiv.org/abs/2403.17740)
+    [https://arxiv.org/abs/2403.19546](https://arxiv.org/abs/2403.19546)
 
-    提出了异质交互评分网络（HIRE）框架，通过异质交互模块（HIM）来共同建模异质交互并直接推断重要特征
+    Croissant是一种面向机器学习数据集的元数据格式，使数据集更易发现、可移植和互操作，有助于解决ML数据管理和负责任AI中的重要挑战。
 
 
 
-    冷启动评分预测是推荐系统中一个基本问题，已得到广泛研究。许多方法已经被提出，利用现有数据之间的显式关系，例如协同过滤、社交推荐和异构信息网络，以缓解冷启动用户和物品的数据不足问题。然而，基于不同角色之间的数据构建的显式关系可能不可靠且无关，从而限制了特定推荐任务的性能上限。受此启发，本文提出了一个灵活的框架，名为异质交互评分网络（HIRE）。HIRE不仅仅依赖于预先定义的交互模式或手动构建的异构信息网络。相反，我们设计了一个异质交互模块（HIM），来共同建模异质交互并直接推断重要特征。
+    数据是机器学习（ML）的关键资源，但处理数据仍然是一个主要的摩擦点。本文介绍了Croissant，一种用于数据集的元数据格式，简化了数据被ML工具和框架使用的方式。Croissant使数据集更易发现、可移植和互操作，从而解决了ML数据管理和负责任AI中的重要挑战。Croissant已得到几个流行数据集库的支持，涵盖数十万个数据集，可以加载到最流行的ML框架中。
 
-    arXiv:2403.17740v1 Announce Type: cross  Abstract: Cold-start rating prediction is a fundamental problem in recommender systems that has been extensively studied. Many methods have been proposed that exploit explicit relations among existing data, such as collaborative filtering, social recommendations and heterogeneous information network, to alleviate the data insufficiency issue for cold-start users and items. However, the explicit relations constructed based on data between different roles may be unreliable and irrelevant, which limits the performance ceiling of the specific recommendation task. Motivated by this, in this paper, we propose a flexible framework dubbed heterogeneous interaction rating network (HIRE). HIRE dose not solely rely on the pre-defined interaction pattern or the manually constructed heterogeneous information network. Instead, we devise a Heterogeneous Interaction Module (HIM) to jointly model the heterogeneous interactions and directly infer the important in
-
-[^2]: TPRF:一种基于Transformer的伪相关反馈模型，用于高效且有效的检索。
-
-    TPRF: A Transformer-based Pseudo-Relevance Feedback Model for Efficient and Effective Retrieval. (arXiv:2401.13509v1 [cs.IR])
-
-    [http://arxiv.org/abs/2401.13509](http://arxiv.org/abs/2401.13509)
-
-    本文提出一种基于Transformer的伪相关反馈模型（TPRF），适用于资源受限的环境。TPRF相比其他深度语言模型在内存占用和推理时间方面具备更小的开销，并能有效地结合来自稠密文具表示的相关反馈信号。
-
-
-
-    本文考虑在资源受限的环境中，如廉价云实例或嵌入式系统（如智能手机和智能手表）中，针对稠密检索器的伪相关反馈（PRF）方法，其中内存和CPU受限，没有GPU。为此，我们提出了一种基于Transformer的PRF方法（TPRF），与采用PRF机制的其他深度语言模型相比，具有更小的内存占用和更快的推理时间，较小的效果损失。TPRF学习如何有效地结合来自稠密文具表示的相关反馈信号。具体而言，TPRF提供了一种建模查询和相关反馈信号之间关系和权重的机制。该方法对所使用的具体稠密表示不加偏见，因此可以广泛应用于任何稠密检索器。
-
-    This paper considers Pseudo-Relevance Feedback (PRF) methods for dense retrievers in a resource constrained environment such as that of cheap cloud instances or embedded systems (e.g., smartphones and smartwatches), where memory and CPU are limited and GPUs are not present. For this, we propose a transformer-based PRF method (TPRF), which has a much smaller memory footprint and faster inference time compared to other deep language models that employ PRF mechanisms, with a marginal effectiveness loss. TPRF learns how to effectively combine the relevance feedback signals from dense passage representations. Specifically, TPRF provides a mechanism for modelling relationships and weights between the query and the relevance feedback signals. The method is agnostic to the specific dense representation used and thus can be generally applied to any dense retriever.
+    arXiv:2403.19546v1 Announce Type: cross  Abstract: Data is a critical resource for Machine Learning (ML), yet working with data remains a key friction point. This paper introduces Croissant, a metadata format for datasets that simplifies how data is used by ML tools and frameworks. Croissant makes datasets more discoverable, portable and interoperable, thereby addressing significant challenges in ML data management and responsible AI. Croissant is already supported by several popular dataset repositories, spanning hundreds of thousands of datasets, ready to be loaded into the most popular ML frameworks.
 
 
diff --git a/cs.IR.xml b/cs.IR.xml
@@ -1,41 +1,21 @@
-<rss version="2.0"><channel><title>Chat Arxiv cs.IR</title><link>https://github.com/qhduan/cn-chat-arxiv</link><description>This is arxiv RSS feed for cs.IR</description><item><title>&#25552;&#20986;&#20102;&#24322;&#36136;&#20132;&#20114;&#35780;&#20998;&#32593;&#32476;&#65288;HIRE&#65289;&#26694;&#26550;&#65292;&#36890;&#36807;&#24322;&#36136;&#20132;&#20114;&#27169;&#22359;&#65288;HIM&#65289;&#26469;&#20849;&#21516;&#24314;&#27169;&#24322;&#36136;&#20132;&#20114;&#24182;&#30452;&#25509;&#25512;&#26029;&#37325;&#35201;&#29305;&#24449;</title><link>https://arxiv.org/abs/2403.17740</link><description>&lt;p&gt;
-&#19968;&#20307;&#21270;&#65306;&#24322;&#36136;&#20132;&#20114;&#24314;&#27169;&#29992;&#20110;&#20919;&#21551;&#21160;&#35780;&#20998;&#39044;&#27979;
+<rss version="2.0"><channel><title>Chat Arxiv cs.IR</title><link>https://github.com/qhduan/cn-chat-arxiv</link><description>This is arxiv RSS feed for cs.IR</description><item><title>Croissant&#26159;&#19968;&#31181;&#38754;&#21521;&#26426;&#22120;&#23398;&#20064;&#25968;&#25454;&#38598;&#30340;&#20803;&#25968;&#25454;&#26684;&#24335;&#65292;&#20351;&#25968;&#25454;&#38598;&#26356;&#26131;&#21457;&#29616;&#12289;&#21487;&#31227;&#26893;&#21644;&#20114;&#25805;&#20316;&#65292;&#26377;&#21161;&#20110;&#35299;&#20915;ML&#25968;&#25454;&#31649;&#29702;&#21644;&#36127;&#36131;&#20219;AI&#20013;&#30340;&#37325;&#35201;&#25361;&#25112;&#12290;</title><link>https://arxiv.org/abs/2403.19546</link><description>&lt;p&gt;
+Croissant&#65306;&#19968;&#31181;&#38754;&#21521;&#26426;&#22120;&#23398;&#20064;&#25968;&#25454;&#38598;&#30340;&#20803;&#25968;&#25454;&#26684;&#24335;
 &lt;/p&gt;
 &lt;p&gt;
-All-in-One: Heterogeneous Interaction Modeling for Cold-Start Rating Prediction
+Croissant: A Metadata Format for ML-Ready Datasets
 &lt;/p&gt;
 &lt;p&gt;
-https://arxiv.org/abs/2403.17740
+https://arxiv.org/abs/2403.19546
 &lt;/p&gt;
 &lt;p&gt;
-&#25552;&#20986;&#20102;&#24322;&#36136;&#20132;&#20114;&#35780;&#20998;&#32593;&#32476;&#65288;HIRE&#65289;&#26694;&#26550;&#65292;&#36890;&#36807;&#24322;&#36136;&#20132;&#20114;&#27169;&#22359;&#65288;HIM&#65289;&#26469;&#20849;&#21516;&#24314;&#27169;&#24322;&#36136;&#20132;&#20114;&#24182;&#30452;&#25509;&#25512;&#26029;&#37325;&#35201;&#29305;&#24449;
+Croissant&#26159;&#19968;&#31181;&#38754;&#21521;&#26426;&#22120;&#23398;&#20064;&#25968;&#25454;&#38598;&#30340;&#20803;&#25968;&#25454;&#26684;&#24335;&#65292;&#20351;&#25968;&#25454;&#38598;&#26356;&#26131;&#21457;&#29616;&#12289;&#21487;&#31227;&#26893;&#21644;&#20114;&#25805;&#20316;&#65292;&#26377;&#21161;&#20110;&#35299;&#20915;ML&#25968;&#25454;&#31649;&#29702;&#21644;&#36127;&#36131;&#20219;AI&#20013;&#30340;&#37325;&#35201;&#25361;&#25112;&#12290;
 &lt;/p&gt;
 &lt;p&gt;
 
 &lt;/p&gt;
 &lt;p&gt;
-&#20919;&#21551;&#21160;&#35780;&#20998;&#39044;&#27979;&#26159;&#25512;&#33616;&#31995;&#32479;&#20013;&#19968;&#20010;&#22522;&#26412;&#38382;&#39064;&#65292;&#24050;&#24471;&#21040;&#24191;&#27867;&#30740;&#31350;&#12290;&#35768;&#22810;&#26041;&#27861;&#24050;&#32463;&#34987;&#25552;&#20986;&#65292;&#21033;&#29992;&#29616;&#26377;&#25968;&#25454;&#20043;&#38388;&#30340;&#26174;&#24335;&#20851;&#31995;&#65292;&#20363;&#22914;&#21327;&#21516;&#36807;&#28388;&#12289;&#31038;&#20132;&#25512;&#33616;&#21644;&#24322;&#26500;&#20449;&#24687;&#32593;&#32476;&#65292;&#20197;&#32531;&#35299;&#20919;&#21551;&#21160;&#29992;&#25143;&#21644;&#29289;&#21697;&#30340;&#25968;&#25454;&#19981;&#36275;&#38382;&#39064;&#12290;&#28982;&#32780;&#65292;&#22522;&#20110;&#19981;&#21516;&#35282;&#33394;&#20043;&#38388;&#30340;&#25968;&#25454;&#26500;&#24314;&#30340;&#26174;&#24335;&#20851;&#31995;&#21487;&#33021;&#19981;&#21487;&#38752;&#19988;&#26080;&#20851;&#65292;&#20174;&#32780;&#38480;&#21046;&#20102;&#29305;&#23450;&#25512;&#33616;&#20219;&#21153;&#30340;&#24615;&#33021;&#19978;&#38480;&#12290;&#21463;&#27492;&#21551;&#21457;&#65292;&#26412;&#25991;&#25552;&#20986;&#20102;&#19968;&#20010;&#28789;&#27963;&#30340;&#26694;&#26550;&#65292;&#21517;&#20026;&#24322;&#36136;&#20132;&#20114;&#35780;&#20998;&#32593;&#32476;&#65288;HIRE&#65289;&#12290;HIRE&#19981;&#20165;&#20165;&#20381;&#36182;&#20110;&#39044;&#20808;&#23450;&#20041;&#30340;&#20132;&#20114;&#27169;&#24335;&#25110;&#25163;&#21160;&#26500;&#24314;&#30340;&#24322;&#26500;&#20449;&#24687;&#32593;&#32476;&#12290;&#30456;&#21453;&#65292;&#25105;&#20204;&#35774;&#35745;&#20102;&#19968;&#20010;&#24322;&#36136;&#20132;&#20114;&#27169;&#22359;&#65288;HIM&#65289;&#65292;&#26469;&#20849;&#21516;&#24314;&#27169;&#24322;&#36136;&#20132;&#20114;&#24182;&#30452;&#25509;&#25512;&#26029;&#37325;&#35201;&#29305;&#24449;&#12290;
+&#25968;&#25454;&#26159;&#26426;&#22120;&#23398;&#20064;&#65288;ML&#65289;&#30340;&#20851;&#38190;&#36164;&#28304;&#65292;&#20294;&#22788;&#29702;&#25968;&#25454;&#20173;&#28982;&#26159;&#19968;&#20010;&#20027;&#35201;&#30340;&#25705;&#25830;&#28857;&#12290;&#26412;&#25991;&#20171;&#32461;&#20102;Croissant&#65292;&#19968;&#31181;&#29992;&#20110;&#25968;&#25454;&#38598;&#30340;&#20803;&#25968;&#25454;&#26684;&#24335;&#65292;&#31616;&#21270;&#20102;&#25968;&#25454;&#34987;ML&#24037;&#20855;&#21644;&#26694;&#26550;&#20351;&#29992;&#30340;&#26041;&#24335;&#12290;Croissant&#20351;&#25968;&#25454;&#38598;&#26356;&#26131;&#21457;&#29616;&#12289;&#21487;&#31227;&#26893;&#21644;&#20114;&#25805;&#20316;&#65292;&#20174;&#32780;&#35299;&#20915;&#20102;ML&#25968;&#25454;&#31649;&#29702;&#21644;&#36127;&#36131;&#20219;AI&#20013;&#30340;&#37325;&#35201;&#25361;&#25112;&#12290;Croissant&#24050;&#24471;&#21040;&#20960;&#20010;&#27969;&#34892;&#25968;&#25454;&#38598;&#24211;&#30340;&#25903;&#25345;&#65292;&#28085;&#30422;&#25968;&#21313;&#19975;&#20010;&#25968;&#25454;&#38598;&#65292;&#21487;&#20197;&#21152;&#36733;&#21040;&#26368;&#27969;&#34892;&#30340;ML&#26694;&#26550;&#20013;&#12290;
 &lt;/p&gt;
 &lt;p&gt;
-arXiv:2403.17740v1 Announce Type: cross  Abstract: Cold-start rating prediction is a fundamental problem in recommender systems that has been extensively studied. Many methods have been proposed that exploit explicit relations among existing data, such as collaborative filtering, social recommendations and heterogeneous information network, to alleviate the data insufficiency issue for cold-start users and items. However, the explicit relations constructed based on data between different roles may be unreliable and irrelevant, which limits the performance ceiling of the specific recommendation task. Motivated by this, in this paper, we propose a flexible framework dubbed heterogeneous interaction rating network (HIRE). HIRE dose not solely rely on the pre-defined interaction pattern or the manually constructed heterogeneous information network. Instead, we devise a Heterogeneous Interaction Module (HIM) to jointly model the heterogeneous interactions and directly infer the important in
-&lt;/p&gt;</description></item><item><title>&#26412;&#25991;&#25552;&#20986;&#19968;&#31181;&#22522;&#20110;Transformer&#30340;&#20266;&#30456;&#20851;&#21453;&#39304;&#27169;&#22411;&#65288;TPRF&#65289;&#65292;&#36866;&#29992;&#20110;&#36164;&#28304;&#21463;&#38480;&#30340;&#29615;&#22659;&#12290;TPRF&#30456;&#27604;&#20854;&#20182;&#28145;&#24230;&#35821;&#35328;&#27169;&#22411;&#22312;&#20869;&#23384;&#21344;&#29992;&#21644;&#25512;&#29702;&#26102;&#38388;&#26041;&#38754;&#20855;&#22791;&#26356;&#23567;&#30340;&#24320;&#38144;&#65292;&#24182;&#33021;&#26377;&#25928;&#22320;&#32467;&#21512;&#26469;&#33258;&#31264;&#23494;&#25991;&#20855;&#34920;&#31034;&#30340;&#30456;&#20851;&#21453;&#39304;&#20449;&#21495;&#12290;</title><link>http://arxiv.org/abs/2401.13509</link><description>&lt;p&gt;
-TPRF:&#19968;&#31181;&#22522;&#20110;Transformer&#30340;&#20266;&#30456;&#20851;&#21453;&#39304;&#27169;&#22411;&#65292;&#29992;&#20110;&#39640;&#25928;&#19988;&#26377;&#25928;&#30340;&#26816;&#32034;&#12290;
-&lt;/p&gt;
-&lt;p&gt;
-TPRF: A Transformer-based Pseudo-Relevance Feedback Model for Efficient and Effective Retrieval. (arXiv:2401.13509v1 [cs.IR])
-&lt;/p&gt;
-&lt;p&gt;
-http://arxiv.org/abs/2401.13509
-&lt;/p&gt;
-&lt;p&gt;
-&#26412;&#25991;&#25552;&#20986;&#19968;&#31181;&#22522;&#20110;Transformer&#30340;&#20266;&#30456;&#20851;&#21453;&#39304;&#27169;&#22411;&#65288;TPRF&#65289;&#65292;&#36866;&#29992;&#20110;&#36164;&#28304;&#21463;&#38480;&#30340;&#29615;&#22659;&#12290;TPRF&#30456;&#27604;&#20854;&#20182;&#28145;&#24230;&#35821;&#35328;&#27169;&#22411;&#22312;&#20869;&#23384;&#21344;&#29992;&#21644;&#25512;&#29702;&#26102;&#38388;&#26041;&#38754;&#20855;&#22791;&#26356;&#23567;&#30340;&#24320;&#38144;&#65292;&#24182;&#33021;&#26377;&#25928;&#22320;&#32467;&#21512;&#26469;&#33258;&#31264;&#23494;&#25991;&#20855;&#34920;&#31034;&#30340;&#30456;&#20851;&#21453;&#39304;&#20449;&#21495;&#12290;
-&lt;/p&gt;
-&lt;p&gt;
-
-&lt;/p&gt;
-&lt;p&gt;
-&#26412;&#25991;&#32771;&#34385;&#22312;&#36164;&#28304;&#21463;&#38480;&#30340;&#29615;&#22659;&#20013;&#65292;&#22914;&#24265;&#20215;&#20113;&#23454;&#20363;&#25110;&#23884;&#20837;&#24335;&#31995;&#32479;&#65288;&#22914;&#26234;&#33021;&#25163;&#26426;&#21644;&#26234;&#33021;&#25163;&#34920;&#65289;&#20013;&#65292;&#38024;&#23545;&#31264;&#23494;&#26816;&#32034;&#22120;&#30340;&#20266;&#30456;&#20851;&#21453;&#39304;&#65288;PRF&#65289;&#26041;&#27861;&#65292;&#20854;&#20013;&#20869;&#23384;&#21644;CPU&#21463;&#38480;&#65292;&#27809;&#26377;GPU&#12290;&#20026;&#27492;&#65292;&#25105;&#20204;&#25552;&#20986;&#20102;&#19968;&#31181;&#22522;&#20110;Transformer&#30340;PRF&#26041;&#27861;&#65288;TPRF&#65289;&#65292;&#19982;&#37319;&#29992;PRF&#26426;&#21046;&#30340;&#20854;&#20182;&#28145;&#24230;&#35821;&#35328;&#27169;&#22411;&#30456;&#27604;&#65292;&#20855;&#26377;&#26356;&#23567;&#30340;&#20869;&#23384;&#21344;&#29992;&#21644;&#26356;&#24555;&#30340;&#25512;&#29702;&#26102;&#38388;&#65292;&#36739;&#23567;&#30340;&#25928;&#26524;&#25439;&#22833;&#12290;TPRF&#23398;&#20064;&#22914;&#20309;&#26377;&#25928;&#22320;&#32467;&#21512;&#26469;&#33258;&#31264;&#23494;&#25991;&#20855;&#34920;&#31034;&#30340;&#30456;&#20851;&#21453;&#39304;&#20449;&#21495;&#12290;&#20855;&#20307;&#32780;&#35328;&#65292;TPRF&#25552;&#20379;&#20102;&#19968;&#31181;&#24314;&#27169;&#26597;&#35810;&#21644;&#30456;&#20851;&#21453;&#39304;&#20449;&#21495;&#20043;&#38388;&#20851;&#31995;&#21644;&#26435;&#37325;&#30340;&#26426;&#21046;&#12290;&#35813;&#26041;&#27861;&#23545;&#25152;&#20351;&#29992;&#30340;&#20855;&#20307;&#31264;&#23494;&#34920;&#31034;&#19981;&#21152;&#20559;&#35265;&#65292;&#22240;&#27492;&#21487;&#20197;&#24191;&#27867;&#24212;&#29992;&#20110;&#20219;&#20309;&#31264;&#23494;&#26816;&#32034;&#22120;&#12290;
-&lt;/p&gt;
-&lt;p&gt;
-This paper considers Pseudo-Relevance Feedback (PRF) methods for dense retrievers in a resource constrained environment such as that of cheap cloud instances or embedded systems (e.g., smartphones and smartwatches), where memory and CPU are limited and GPUs are not present. For this, we propose a transformer-based PRF method (TPRF), which has a much smaller memory footprint and faster inference time compared to other deep language models that employ PRF mechanisms, with a marginal effectiveness loss. TPRF learns how to effectively combine the relevance feedback signals from dense passage representations. Specifically, TPRF provides a mechanism for modelling relationships and weights between the query and the relevance feedback signals. The method is agnostic to the specific dense representation used and thus can be generally applied to any dense retriever.
+arXiv:2403.19546v1 Announce Type: cross  Abstract: Data is a critical resource for Machine Learning (ML), yet working with data remains a key friction point. This paper introduces Croissant, a metadata format for datasets that simplifies how data is used by ML tools and frameworks. Croissant makes datasets more discoverable, portable and interoperable, thereby addressing significant challenges in ML data management and responsible AI. Croissant is already supported by several popular dataset repositories, spanning hundreds of thousands of datasets, ready to be loaded into the most popular ML frameworks.
 &lt;/p&gt;</description></item></channel></rss>