Add changes
qhduan committed Nov 21, 2024
1 parent 300535f commit 757ef2d
Showing 15 changed files with 475 additions and 1,173 deletions.
232 changes: 56 additions & 176 deletions cs.AI.md

Large diffs are not rendered by default.

272 changes: 56 additions & 216 deletions cs.AI.xml

Large diffs are not rendered by default.

72 changes: 21 additions & 51 deletions cs.CL.md

Large diffs are not rendered by default.

82 changes: 21 additions & 61 deletions cs.CL.xml

Large diffs are not rendered by default.

15 changes: 1 addition & 14 deletions cs.IR.md
@@ -2,22 +2,9 @@

| Ref | Title | Summary |
| --- | --- | --- |
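| [^1] | [Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation.](http://arxiv.org/abs/2307.11019) | This study presents a preliminary analysis of the factual knowledge boundaries of large language models and investigates how retrieval augmentation affects large language models on open-domain question answering. The results show that large language models exhibit confidence when answering questions and answer accurately. |

# Details

[^1]: Investigating the factual knowledge boundary of large language models with retrieval augmentation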

Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation. (arXiv:2307.11019v2 [cs.CL] UPDATED)

[http://arxiv.org/abs/2307.11019](http://arxiv.org/abs/2307.11019)

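This study presents a preliminary analysis of the factual knowledge boundaries of large language models and investigates how retrieval augmentation affects large language models on open-domain question answering. The results show that large language models exhibit confidence when answering questions and answer accurately.


# Details

Knowledge-intensive tasks (e.g., open-domain question answering (QA)) require a large amount of factual knowledge and often rely on external information for assistance. Recently, large language models (LLMs) (e.g., ChatGPT) have shown remarkable ability across a variety of tasks, including knowledge-intensive ones. However, it remains unclear how well LLMs perceive the boundaries of their factual knowledge, particularly how they behave when retrieval augmentation is used. In this study, we present a preliminary analysis of the factual knowledge boundaries of LLMs and investigate how retrieval augmentation affects LLMs on open-domain QA. Specifically, we focus on three main research questions and analyze them by examining the LLMs' QA performance, a priori judgement, and a posteriori judgement. We provide evidence that LLMs are full of confidence in their ability to answer questions and in the accuracy of their answers.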

Knowledge-intensive tasks (e.g., open-domain question answering (QA)) require a substantial amount of factual knowledge and often rely on external information for assistance. Recently, large language models (LLMs) (e.g., ChatGPT), have demonstrated impressive prowess in solving a wide range of tasks with world knowledge, including knowledge-intensive tasks. However, it remains unclear how well LLMs are able to perceive their factual knowledge boundaries, particularly how they behave when incorporating retrieval augmentation. In this study, we present an initial analysis of the factual knowledge boundaries of LLMs and how retrieval augmentation affects LLMs on open-domain QA. Specially, we focus on three primary research questions and analyze them by examining QA performance, priori judgement and posteriori judgement of LLMs. We show evidence that LLMs possess unwavering confidence in their capabilities to respond to questions and the accuracy of their responses. Furthermore, retrieval


22 changes: 1 addition & 21 deletions cs.IR.xml
@@ -1,21 +1 @@
<rss version="2.0"><channel><title>Chat Arxiv cs.IR</title><link>https://github.com/qhduan/cn-chat-arxiv</link><description>This is arxiv RSS feed for cs.IR</description><item><title>This study presents a preliminary analysis of the factual knowledge boundaries of large language models and investigates how retrieval augmentation affects large language models on open-domain question answering. The results show that large language models exhibit confidence when answering questions and answer accurately.</title><link>http://arxiv.org/abs/2307.11019</link><description>&lt;p&gt;
Investigating the factual knowledge boundary of large language models with retrieval augmentation
&lt;/p&gt;
&lt;p&gt;
Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation. (arXiv:2307.11019v2 [cs.CL] UPDATED)
&lt;/p&gt;
&lt;p&gt;
http://arxiv.org/abs/2307.11019
&lt;/p&gt;
&lt;p&gt;
This study presents a preliminary analysis of the factual knowledge boundaries of large language models and investigates how retrieval augmentation affects large language models on open-domain question answering. The results show that large language models exhibit confidence when answering questions and answer accurately.
&lt;/p&gt;
&lt;p&gt;

&lt;/p&gt;
&lt;p&gt;
Knowledge-intensive tasks (e.g., open-domain question answering (QA)) require a large amount of factual knowledge and often rely on external information for assistance. Recently, large language models (LLMs) (e.g., ChatGPT) have shown remarkable ability across a variety of tasks, including knowledge-intensive ones. However, it remains unclear how well LLMs perceive the boundaries of their factual knowledge, particularly how they behave when retrieval augmentation is used. In this study, we present a preliminary analysis of the factual knowledge boundaries of LLMs and investigate how retrieval augmentation affects LLMs on open-domain QA. Specifically, we focus on three main research questions and analyze them by examining the LLMs' QA performance, a priori judgement, and a posteriori judgement. We provide evidence that LLMs are full of confidence in their ability to answer questions and in the accuracy of their answers.
&lt;/p&gt;
&lt;p&gt;
Knowledge-intensive tasks (e.g., open-domain question answering (QA)) require a substantial amount of factual knowledge and often rely on external information for assistance. Recently, large language models (LLMs) (e.g., ChatGPT), have demonstrated impressive prowess in solving a wide range of tasks with world knowledge, including knowledge-intensive tasks. However, it remains unclear how well LLMs are able to perceive their factual knowledge boundaries, particularly how they behave when incorporating retrieval augmentation. In this study, we present an initial analysis of the factual knowledge boundaries of LLMs and how retrieval augmentation affects LLMs on open-domain QA. Specially, we focus on three primary research questions and analyze them by examining QA performance, priori judgement and posteriori judgement of LLMs. We show evidence that LLMs possess unwavering confidence in their capabilities to respond to questions and the accuracy of their responses. Furthermore, retrieval
&lt;/p&gt;</description></item></channel></rss>
<rss version="2.0"><channel><title>Chat Arxiv cs.IR</title><link>https://github.com/qhduan/cn-chat-arxiv</link><description>This is arxiv RSS feed for cs.IR</description></channel></rss>