<rss version="2.0"><channel><title>Chat Arxiv cs.CL</title><link>https://github.com/qhduan/cn-chat-arxiv</link><description>This is an arXiv RSS feed for cs.CL</description><item><title>Proposes ClapNQ, a benchmark long-form question answering dataset for the full RAG pipeline, which requires RAG models to adapt to answers that are concise, cohesive, and built from non-contiguous pieces of a passage.</title><link>https://arxiv.org/abs/2404.02103</link><description>&lt;p&gt;
CLAPNQ: Cohesive Long-form Answers from Passages in Natural Questions for RAG systems
&lt;/p&gt;
&lt;p&gt;
CLAPNQ: Cohesive Long-form Answers from Passages in Natural Questions for RAG systems
&lt;/p&gt;
&lt;p&gt;
https://arxiv.org/abs/2404.02103
&lt;/p&gt;
&lt;p&gt;
Proposes ClapNQ, a benchmark long-form question answering dataset for the full RAG pipeline, which requires RAG models to adapt to answers that are concise, cohesive, and built from non-contiguous pieces of a passage.
&lt;/p&gt;
&lt;p&gt;
Retrieval-augmented generation (RAG) has become a popular application of large language models. A successful RAG system should ideally provide accurate answers that are supported by a passage and free of hallucinations. Building a full RAG pipeline takes considerable work, and it is also necessary to be able to benchmark its performance. We present ClapNQ, a benchmark long-form question answering dataset for the full RAG pipeline. ClapNQ contains long answers with grounded gold passages from Natural Questions (NQ), along with a corpus for performing retrieval, generation, or the full RAG pipeline. ClapNQ answers are concise, 3x smaller than the full passage, and cohesive, combining multiple non-contiguous pieces of the passage. RAG models must adapt to these properties to succeed on ClapNQ. We present baseline experiments and analysis for ClapNQ that highlight areas where significant challenges remain.
&lt;/p&gt;
&lt;p&gt;
arXiv:2404.02103v1 Announce Type: new  Abstract: Retrieval Augmented Generation (RAG) has become a popular application for large language models. It is preferable that successful RAG systems provide accurate answers that are supported by being grounded in a passage without any hallucinations. While considerable work is required for building a full RAG pipeline, being able to benchmark performance is also necessary. We present ClapNQ, a benchmark Long-form Question Answering dataset for the full RAG pipeline. ClapNQ includes long answers with grounded gold passages from Natural Questions (NQ) and a corpus to perform either retrieval, generation, or the full RAG pipeline. The ClapNQ answers are concise, 3x smaller than the full passage, and cohesive, with multiple pieces of the passage that are not contiguous. RAG models must adapt to these properties to be successful at ClapNQ. We present baseline experiments and analysis for ClapNQ that highlight areas where there is still significant 
&lt;/p&gt;</description></item><item><title>This paper proposes a general and convenient method that uses a small encoder language model and cross-attention to let the original language model cover longer contexts, improving performance on open-domain question answering tasks.</title><link>https://arxiv.org/abs/2404.02022</link><description>&lt;p&gt;
Improving Retrieval-Augmented Open-Domain Question Answering with Vectorized Contexts
&lt;/p&gt;
&lt;p&gt;
Improving Retrieval Augmented Open-Domain Question-Answering with Vectorized Contexts
&lt;/p&gt;
&lt;p&gt;
https://arxiv.org/abs/2404.02022
&lt;/p&gt;
&lt;p&gt;
This paper proposes a general and convenient method that uses a small encoder language model and cross-attention to let the original language model cover longer contexts, improving performance on open-domain question answering tasks.
&lt;/p&gt;
&lt;p&gt;
In the era of large language models, techniques such as retrieval-augmented generation can better address open-domain question answering. Because of constraints such as model size and computing resources, context length is usually limited, which makes it challenging for a model to cover overlong contexts while answering open-domain questions. This paper proposes a general and convenient method for covering longer contexts in open-domain question answering tasks. It uses a small encoder language model to encode contexts effectively and applies cross-attention between the encodings and the original inputs. With our method, the original language model can cover contexts several times longer while keeping computing requirements close to the baseline. Our experiments show that, after fine-tuning, performance across two held-in datasets, four held-out datasets, and two In Context
&lt;/p&gt;
&lt;p&gt;
arXiv:2404.02022v1 Announce Type: new  Abstract: In the era of large language models, applying techniques such as Retrieval Augmented Generation can better address Open-Domain Question-Answering problems. Due to constraints including model sizes and computing resources, the length of context is often limited, and it becomes challenging to empower the model to cover overlong contexts while answering questions from open domains. This paper proposes a general and convenient method for covering longer contexts in Open-Domain Question-Answering tasks. It leverages a small encoder language model that effectively encodes contexts, and the encoding applies cross-attention with the original inputs. With our method, the original language models can cover several times longer contexts while keeping the computing requirements close to the baseline. Our experiments demonstrate that after fine-tuning, there is improved performance across two held-in datasets, four held-out datasets, and also in two In Contex
&lt;/p&gt;</description></item><item><title>This study introduces a fake news propagation simulation framework based on large language models and examines the trends and control of fake news propagation; each agent in the simulation represents an individual with a distinct personality.</title><link>https://arxiv.org/abs/2403.09498</link><description>&lt;p&gt;
From Skepticism to Acceptance: Simulating the Attitude Dynamics Toward Fake News
&lt;/p&gt;
&lt;p&gt;
From Skepticism to Acceptance: Simulating the Attitude Dynamics Toward Fake News
&lt;/p&gt;
&lt;p&gt;
https://arxiv.org/abs/2403.09498
&lt;/p&gt;
&lt;p&gt;
This study introduces a fake news propagation simulation framework based on large language models and examines the trends and control of fake news propagation; each agent in the simulation represents an individual with a distinct personality.
&lt;/p&gt;
&lt;p&gt;
In the digital era, fake news and rumors spread rapidly through social networks, posing notable societal challenges and affecting public opinion. Traditional fake news modeling typically forecasts the general popularity trends of different groups or represents opinion shifts numerically. However, these methods often oversimplify real-world complexity and overlook the rich semantic information in news text. The advent of large language models (LLMs) makes it possible to model subtle opinion dynamics. In this work, we therefore introduce an LLM-based Fake news Propagation Simulation framework (FPS) to study the trends and control of fake news propagation in detail. Specifically, each agent in the simulation represents an individual with a distinct personality. Agents are equipped with short-term and long-term memory, as well as a reflection mechanism that mimics human-like thinking. Every day,
&lt;/p&gt;
&lt;p&gt;
arXiv:2403.09498v1 Announce Type: cross  Abstract: In the digital era, the rapid propagation of fake news and rumors via social networks brings notable societal challenges and impacts public opinion regulation. Traditional fake news modeling typically forecasts the general popularity trends of different groups or numerically represents opinion shifts. However, these methods often oversimplify real-world complexities and overlook the rich semantic information of news text. The advent of large language models (LLMs) provides the possibility of modeling subtle dynamics of opinion. Consequently, in this work, we introduce a Fake news Propagation Simulation framework (FPS) based on LLM, which studies the trends and control of fake news propagation in detail. Specifically, each agent in the simulation represents an individual with a distinct personality. They are equipped with both short-term and long-term memory, as well as a reflective mechanism to mimic human-like thinking. Every day, the
&lt;/p&gt;</description></item><item><title>This paper proposes OmniPred, a framework for training language models as universal end-to-end regressors; experiments show that, when trained on multiple tasks, language models can significantly outperform traditional regression models.</title><link>https://arxiv.org/abs/2402.14547</link><description>&lt;p&gt;
OmniPred: Language Models as Universal Regressors
&lt;/p&gt;
&lt;p&gt;
OmniPred: Language Models as Universal Regressors
&lt;/p&gt;
&lt;p&gt;
https://arxiv.org/abs/2402.14547
&lt;/p&gt;
&lt;p&gt;
This paper proposes OmniPred, a framework for training language models as universal end-to-end regressors; experiments show that, when trained on multiple tasks, language models can significantly outperform traditional regression models.
&lt;/p&gt;
&lt;p&gt;
Across the broad landscape of experimental design, regression has long been a powerful tool for accurately predicting the outcome metrics of a system or model given a set of parameters, but it has traditionally been restricted to methods that apply only to a specific task. In this paper, we propose OmniPred, a framework for training language models as universal end-to-end regressors over $(x,y)$ evaluation data from diverse real-world experiments. Using data sourced from Google Vizier, one of the largest blackbox optimization databases in the world, our extensive experiments show that, through only textual representations of mathematical parameters and values, language models are capable of very precise numerical regression and, if given the opportunity to train over multiple tasks, can significantly outperform traditional regression models.
&lt;/p&gt;
&lt;p&gt;
arXiv:2402.14547v1 Announce Type: cross  Abstract: Over the broad landscape of experimental design, regression has been a powerful tool to accurately predict the outcome metrics of a system or model given a set of parameters, but has been traditionally restricted to methods which are only applicable to a specific task. In this paper, we propose OmniPred, a framework for training language models as universal end-to-end regressors over $(x,y)$ evaluation data from diverse real world experiments. Using data sourced from Google Vizier, one of the largest blackbox optimization databases in the world, our extensive experiments demonstrate that through only textual representations of mathematical parameters and values, language models are capable of very precise numerical regression, and if given the opportunity to train over multiple tasks, can significantly outperform traditional regression models.
&lt;/p&gt;</description></item><item><title>This study introduces "CosmoAgent", which uses LLMs to simulate complex interactions between human and extraterrestrial civilizations, assesses the feasibility of peaceful coexistence, and quantitatively evaluates the development trajectories of civilizations while accounting for the vast diversity among civilizations.</title><link>https://arxiv.org/abs/2402.13184</link><description>&lt;p&gt;
What if LLMs Have Different World Views: Simulating Alien Civilizations with LLM-based Agents
&lt;/p&gt;
&lt;p&gt;
What if LLMs Have Different World Views: Simulating Alien Civilizations with LLM-based Agents
&lt;/p&gt;
&lt;p&gt;
https://arxiv.org/abs/2402.13184
&lt;/p&gt;
&lt;p&gt;
This study introduces "CosmoAgent", which uses LLMs to simulate complex interactions between human and extraterrestrial civilizations, assesses the feasibility of peaceful coexistence, and quantitatively evaluates the development trajectories of civilizations while accounting for the vast diversity among civilizations.
&lt;/p&gt;
&lt;p&gt;
In this study, we introduce "CosmoAgent", an innovative artificial intelligence framework that uses large language models (LLMs) to simulate complex interactions between human and extraterrestrial civilizations, with special emphasis on Stephen Hawking's cautionary advice against sending radio signals haphazardly into the universe. The goal of the study is to assess the feasibility of peaceful coexistence while considering potential risks that could threaten well-intentioned civilizations. Using mathematical models and state transition matrices, our approach quantitatively evaluates the development trajectories of civilizations, offering insight into future decision-making at critical points of growth and saturation. In addition, the paper acknowledges that the vast diversity of potential living conditions across the universe could foster unique cosmologies, ethical codes, and worldviews among different civilizations. Recognizing the Earth--
&lt;/p&gt;
&lt;p&gt;
arXiv:2402.13184v1 Announce Type: new  Abstract: In this study, we introduce "CosmoAgent," an innovative artificial intelligence framework utilizing Large Language Models (LLMs) to simulate complex interactions between human and extraterrestrial civilizations, with a special emphasis on Stephen Hawking's cautionary advice about not sending radio signals haphazardly into the universe. The goal is to assess the feasibility of peaceful coexistence while considering potential risks that could threaten well-intentioned civilizations. Employing mathematical models and state transition matrices, our approach quantitatively evaluates the development trajectories of civilizations, offering insights into future decision-making at critical points of growth and saturation. Furthermore, the paper acknowledges the vast diversity in potential living conditions across the universe, which could foster unique cosmologies, ethical codes, and worldviews among various civilizations. Recognizing the Earth-c
&lt;/p&gt;</description></item><item><title>Proposes LinkNER, a framework that combines small fine-tuned models with large language models through RDC, an uncertainty-based linking strategy, enabling fine-tuned models to complement black-box LLMs.</title><link>https://arxiv.org/abs/2402.10573</link><description>&lt;p&gt;
LinkNER: Linking Local Named Entity Recognition Models to Large Language Models using Uncertainty
&lt;/p&gt;
&lt;p&gt;
LinkNER: Linking Local Named Entity Recognition Models to Large Language Models using Uncertainty
&lt;/p&gt;
&lt;p&gt;
https://arxiv.org/abs/2402.10573
&lt;/p&gt;
&lt;p&gt;
Proposes LinkNER, a framework that combines small fine-tuned models with large language models through RDC, an uncertainty-based linking strategy, enabling fine-tuned models to complement black-box LLMs.
&lt;/p&gt;
&lt;p&gt;
Named Entity Recognition (NER) is a fundamental task in natural language understanding, with direct implications for web content analysis, search engines, and information retrieval systems. Fine-tuned NER models perform satisfactorily on standard NER benchmarks. However, because of limited fine-tuning data and a lack of knowledge, they perform poorly at recognizing unseen entities. As a result, the usability and reliability of NER models in web-related applications are compromised. In contrast, large language models (LLMs) such as GPT-4 possess extensive external knowledge, but research indicates that they lack specialization in NER tasks. In addition, their non-public, large-scale weights make LLMs difficult to tune. To address these challenges, we propose a framework that combines small fine-tuned models with LLMs (LinkNER), together with an uncertainty-based linking strategy called RDC that enables fine-tuned models to complement black-box LLMs.
&lt;/p&gt;
&lt;p&gt;
arXiv:2402.10573v1 Announce Type: new  Abstract: Named Entity Recognition (NER) serves as a fundamental task in natural language understanding, bearing direct implications for web content analysis, search engines, and information retrieval systems. Fine-tuned NER models exhibit satisfactory performance on standard NER benchmarks. However, due to limited fine-tuning data and lack of knowledge, they perform poorly on unseen entity recognition. As a result, the usability and reliability of NER models in web-related applications are compromised. Instead, Large Language Models (LLMs) like GPT-4 possess extensive external knowledge, but research indicates that they lack specialty for NER tasks. Furthermore, non-public and large-scale weights make tuning LLMs difficult. To address these challenges, we propose a framework that combines small fine-tuned models with LLMs (LinkNER) and an uncertainty-based linking strategy called RDC that enables fine-tuned models to complement black-box LLMs, ach
&lt;/p&gt;</description></item><item><title>This study explores provable length and compositional generalization for a range of architectures, including deep sets, transformers, state space models, and simple recurrent neural networks, and argues that different architectures require different degrees of representation identification for length and compositional generalization.</title><link>https://arxiv.org/abs/2402.04875</link><description>&lt;p&gt;
On Provable Length and Compositional Generalization
&lt;/p&gt;
&lt;p&gt;
On Provable Length and Compositional Generalization
&lt;/p&gt;
&lt;p&gt;
https://arxiv.org/abs/2402.04875
&lt;/p&gt;
&lt;p&gt;
This study explores provable length and compositional generalization for a range of architectures, including deep sets, transformers, state space models, and simple recurrent neural networks, and argues that different architectures require different degrees of representation identification for length and compositional generalization.
&lt;/p&gt;
&lt;p&gt;
Length generalization, the ability to generalize to sequences longer than those seen during training, and compositional generalization, the ability to generalize to token combinations not seen during training, are important forms of out-of-distribution generalization in sequence-to-sequence models. In this work, we take the first steps toward provable length and compositional generalization for a range of architectures, including deep sets, transformers, state space models, and simple recurrent neural networks. Depending on the architecture, we prove that different degrees of representation identification, for example a linear or permutation relation with the ground-truth representation, are necessary.
&lt;/p&gt;
&lt;p&gt;
Length generalization -- the ability to generalize to longer sequences than ones seen during training, and compositional generalization -- the ability to generalize to token combinations not seen during training, are crucial forms of out-of-distribution generalization in sequence-to-sequence models. In this work, we take the first steps towards provable length and compositional generalization for a range of architectures, including deep sets, transformers, state space models, and simple recurrent neural nets. Depending on the architecture, we prove different degrees of representation identification, e.g., a linear or a permutation relation with ground truth representation, is necessary for length and compositional generalization.
&lt;/p&gt;</description></item><item><title>This paper proposes an adaptive-solver framework that adjusts the solving strategy according to problem difficulty in large language model reasoning. It addresses the rigidity of existing methods, which apply one uniform approach, and improves computational performance.</title><link>http://arxiv.org/abs/2310.01446</link><description>&lt;p&gt;
Adaptive-Solver Framework for Dynamic Strategy Selection in Large Language Model Reasoning
&lt;/p&gt;
&lt;p&gt;
Adaptive-Solver Framework for Dynamic Strategy Selection in Large Language Model Reasoning. (arXiv:2310.01446v1 [cs.CL])
&lt;/p&gt;
&lt;p&gt;
http://arxiv.org/abs/2310.01446
&lt;/p&gt;
&lt;p&gt;
This paper proposes an adaptive-solver framework that adjusts the solving strategy according to problem difficulty in large language model reasoning. It addresses the rigidity of existing methods, which apply one uniform approach, and improves computational performance.
&lt;/p&gt;
&lt;p&gt;

&lt;/p&gt;
&lt;p&gt;
Large language models (LLMs) have shown impressive ability on complex reasoning tasks. In the real world, problems vary widely in complexity, and humans instinctively adapt their problem-solving approach to the complexity of the task. Most LLM-based methods, however, take a uniform approach: they use the same model, prompting method, and degree of problem decomposition regardless of how complex the problem is. This rigidity can lead to unnecessary computational overhead or sub-optimal performance. To address this, we introduce an Adaptive-Solver framework that strategically adjusts the solving strategy according to problem difficulty. Given an initial solution, the framework uses two main modules. The initial evaluation module assesses whether the current solution is adequate. If improvement is needed, the subsequent adaptation module steps in. Within this module, there are three key adaptation strategies.
&lt;/p&gt;
&lt;p&gt;
Large Language Models (LLMs) show impressive ability in handling complex reasoning tasks. In real-world situations, problems often span a spectrum of complexities. Humans inherently adjust their problem-solving approaches based on task complexity. However, most methodologies that leverage LLMs tend to adopt a uniform approach: utilizing the same models, prompting methods, and degrees of problem decomposition, regardless of the problem complexity. This inflexibility can bring unnecessary computational overhead or sub-optimal performance. To address this problem, we introduce an Adaptive-Solver framework. It strategically modulates solving strategies based on the difficulty of the problems. Given an initial solution, the framework functions with two primary modules. The initial evaluation module assesses the adequacy of the current solution. If improvements are needed, the subsequent adaptation module comes into play. Within this module, three key adaptation strategies a
&lt;/p&gt;</description></item><item><title>This paper proposes CALM, a competence-based framework for analyzing language models. It uses targeted interventions to damage a language model's internal representations and evaluates how the model uses different representations when performing a task. The study shows that language models are somewhat inconsistent in how they use relational properties.</title><link>http://arxiv.org/abs/2303.00333</link><description>&lt;p&gt;
Competence-Based Analysis of Language Models
&lt;/p&gt;
&lt;p&gt;
Competence-Based Analysis of Language Models. (arXiv:2303.00333v2 [cs.CL] UPDATED)
&lt;/p&gt;
&lt;p&gt;
http://arxiv.org/abs/2303.00333
&lt;/p&gt;
&lt;p&gt;
This paper proposes CALM, a competence-based framework for analyzing language models. It uses targeted interventions to damage a language model's internal representations and evaluates how the model uses different representations when performing a task. The study shows that language models are somewhat inconsistent in how they use relational properties.
&lt;/p&gt;
&lt;p&gt;

&lt;/p&gt;
&lt;p&gt;
Although large pretrained language models (LMs) have achieved remarkable success on a variety of prompting tasks, they are unusually brittle to small changes in inputs or application contexts. To better understand this behavior and motivate the design of more robust LMs, we propose a general experimental framework, CALM (Competence-based Analysis of Language Models), which uses targeted causal interventions to damage an LM's internal representations of various linguistic properties in order to evaluate how it uses each representation when performing a given task. We implement these interventions as gradient-based adversarial attacks which, in contrast to prior causal probing methods, can target arbitrarily encoded relational properties, and we carry out a case study analyzing how BERT-like LMs use representations of several relational properties when performing associated relation prompting tasks. We find that, although the choice of representation affects LM performance, the models' use of certain specific relational properties is inconsistent.
&lt;/p&gt;
&lt;p&gt;
Despite the recent success of large pretrained language models (LMs) on a variety of prompting tasks, these models can be alarmingly brittle to small changes in inputs or application contexts. To better understand such behavior and motivate the design of more robust LMs, we propose a general experimental framework, CALM (Competence-based Analysis of Language Models), where targeted causal interventions are utilized to damage an LM's internal representation of various linguistic properties in order to evaluate its use of each representation in performing a given task. We implement these interventions as gradient-based adversarial attacks, which (in contrast to prior causal probing methodologies) are able to target arbitrarily-encoded representations of relational properties, and carry out a case study of this approach to analyze how BERT-like LMs use representations of several relational properties in performing associated relation prompting tasks. We find that, while the representation
&lt;/p&gt;</description></item></channel></rss>