<h1>Polly</h1>
<p><strong>Note: I developed this for my personal use. Do not use it unless you
understand the ramifications of using security software which almost
certainly has bugs which black hats could exploit.</strong></p>
<p>Build a corpus of common words from messages in an IMAP folder then
use that to generate <a href="https://xkcd.com/936/">XKCD936-style passwords</a>.</p>
<p>Run "python polly.py -h" to get a brief description of how to run the
program. See polly.cfg.sample for a sample config file. It's particularly
easy to connect to a Gmail server. Generate an application password to use
in the password field, then create a filter to funnel common <em>public</em>
mailing list mail into a "polly" label, then set the folder option to polly.
Once you have the config file set up, just run it like so:</p>
<pre><code>python polly.py -c polly.cfg
</code></pre>
<p>Enter "help" at the "?" prompt to get a summary of the commands you can run
there.</p>
<h2>Motivation</h2>
<p>I got the idea from a <a href="https://mail.python.org/pipermail/python-list/2014-August/827854.html">post by Chris Angelico to
comp.lang.python</a>.
In Chris's game, Polly is a parrot who listens to the chatter of D&D players
and spits out passwords when asked. I thought it was an excellent idea, but
as I don't play Dungeons & Dragons, I needed another way to build a
dictionary of common words. It occurred to me that searching messages posted
to public mailing lists from an IMAP server for commonly used words might
work. I'm a Gmail user, so it was easy to create a new filter which labeled
messages sent to a number of public mailing lists and Internet forums as
"polly". Instant corpus! The polly program is pointed at the polly
"folder" on my Gmail account and collects common words to use as the basis
of a modified XKCD 936 passphrase generator.</p>
<p>Is this a new idea? No. It is mostly a programming exercise. Because the
corpus comes from the public mailing lists and Internet forums I subscribe
to, the dictionary from which words are chosen is probably unique: it
contains words which are familiar to me, such as "codepoints" and
"chainstay", but which are unlikely to be found in other, similar word
lists. Beyond that, it's probably not too different from other systems like
<a href="http://world.std.com/~reinhold/diceware.html">Diceware</a>, though
slightly more automated.</p>
<h2>Basic idea</h2>
<ol>
<li>Choose a set of random words (default four) from the dictionary
(basic XKCD 936
passphrase). For example: <code>correct horse battery staple</code>.</li>
<li>Optionally separate the words using punctuation or digits. For
example: <code>correct!horse^battery5staple</code>.</li>
<li>Optionally upshift individual letters in the words (with low
probability). For example: <code>corRect!horsE^battery5Staple</code>.</li>
<li>Optionally insert punctuation or digits between letters (with
even lower probability). For example: <code>corRec3t!horsE^bat_tery5Staple</code>.</li>
</ol>
<p>The user can choose to use any or all of the above tweaks in the
config file.</p>
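<p>The four steps above can be sketched roughly as follows. This is a minimal
illustration, not polly's actual code: the function name
<code>make_passphrase</code>, its parameters, and the probabilities
<code>p_upper</code> and <code>p_insert</code> are all assumptions made for
the example.</p>

```python
import random
import string


def make_passphrase(words, length=4, punctuation=True, digits=True,
                    upper=True, p_upper=0.05, p_insert=0.02, rng=None):
    """Build one passphrase from `words` using the four steps above.

    All names and default probabilities here are hypothetical.
    """
    rng = rng or random.SystemRandom()  # OS-backed randomness by default

    # Characters available as separators and insertions.
    extras = ""
    if punctuation:
        extras += string.punctuation
    if digits:
        extras += string.digits

    tweaked = []
    for _ in range(length):
        word = rng.choice(words)  # step 1: pick a random word
        chars = []
        for i, ch in enumerate(word):
            # Step 3: upshift a letter with low probability.
            if upper and rng.random() < p_upper:
                ch = ch.upper()
            chars.append(ch)
            # Step 4: insert an extra character between letters
            # (never after the last one) with even lower probability.
            if extras and i != len(word) - 1 and rng.random() < p_insert:
                chars.append(rng.choice(extras))
        tweaked.append("".join(chars))

    # Step 2: join with a random separator, or a space if none are enabled.
    result = ""
    for index, word in enumerate(tweaked):
        if index:
            result += rng.choice(extras) if extras else " "
        result += word
    return result
```

<p>With every tweak switched off, the sketch reduces to the basic XKCD 936
scheme: <code>make_passphrase(words, punctuation=False, digits=False,
upper=False)</code> returns four dictionary words separated by spaces.</p>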
<h2>Constraints</h2>
<ul>
<li>
<p>Minimum word length is configurable, but defaults to four letters.</p>
</li>
<li>
<p>Words will not be selected if they contain any character which is not
an ASCII lower case letter.</p>
</li>
<li>
<p>Processing the mail is dumb. It just tries to process "words" in the text
portions of each message it downloads.</p>
</li>
<li>
<p>The specified IMAP server is not queried by default. Once you have
generated a corpus, you can just use it to generate
passwords. Execute the "read" command to instruct polly to process
new emails from the IMAP server. It grabs the most recent 100 days'
worth of message IDs, discarding any which have already been processed.</p>
</li>
</ul>
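<p>The word-selection constraints above might look something like this in
code. It is only a sketch: the helper names <code>candidate_words</code> and
<code>most_common_words</code> are invented for illustration, and splitting
on whitespace is an assumption about what "dumb" processing means here.</p>

```python
import re
from collections import Counter

# A word qualifies only if every character is an ASCII lower-case letter.
LOWER_ASCII = re.compile(r"[a-z]+")


def candidate_words(text, minchars=4):
    """Yield whitespace-separated tokens that satisfy the constraints."""
    for token in text.split():
        if len(token) >= minchars and LOWER_ASCII.fullmatch(token):
            yield token


def most_common_words(texts, nwords=2048, minchars=4):
    """Return up to `nwords` of the most frequent qualifying words."""
    counts = Counter()
    for text in texts:
        counts.update(candidate_words(text, minchars))
    return [word for word, _ in counts.most_common(nwords)]
```

<p>Capitalized tokens, tokens with attached punctuation, and tokens with
non-ASCII letters are simply skipped rather than cleaned up, which matches
the "dumb" approach described above.</p>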
<h2>Caveats</h2>
<p>There are a number of caveats to this sort of program:</p>
<ul>
<li>
<p>The XKCD 936 password scheme needs a large enough corpus to choose
from. If your corpus is too small, the amount of entropy available
in the suggested passwords will be small. This URL might be worth
reading: http://security.stackexchange.com/questions/62832</p>
</li>
<li>
<p>I specifically set up my IMAP folder to contain only messages from
public mailing lists to which I subscribe. While adding
other sources of words is probably okay, perhaps you should think
twice before adding words from private mail to your polly
folder. Still, if you included all your email, the risk of exposing
private information is low, as all suggestions are generated by you,
and capitalized words or words containing punctuation or numbers are
avoided.</p>
</li>
<li>
<p>Polly is probably not going to be all that helpful on systems which
truncate passwords past a certain limit. Login passwords on many Unix
systems come to mind. While it appears that modern systems are catching
up, you might still find your system uses DES-based password hashing, which
limits you to just eight characters: http://stackoverflow.com/questions/2179649</p>
</li>
<li>
<p>I'm just scratching an itch here. You're welcome to do what you want
with polly, even suggest enhancements. Just don't expect any formal
support. (Fork away, all you GitHub aficionados!)</p>
</li>
<li>
<p>I allow you to cheat a little. If you're having trouble generating a
large enough corpus or simply don't want to go the IMAP route, you
can use the add command to tell polly to select a number of words at
random from the given file. As the typical Unix words file contains
many not-so-common words, I included a common-words file you can use
for this purpose. In fact, if you don't actually want to go to
the trouble of setting up the IMAP thing, just execute "add
common-words 2048". The common-words file contains a little more
than 4200 words.</p>
</li>
<li>
<p>I had never before tried to communicate with an IMAP server. I am
probably doing this inefficiently, if not downright wrong.</p>
</li>
</ul>
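<p>The first caveat can be made concrete with a little arithmetic. The sketch
below assumes words are drawn uniformly and independently from the corpus,
and that an attacker knows the corpus; it ignores whatever the optional
tweaks add. The function name is made up for this example.</p>

```python
import math


def passphrase_entropy_bits(corpus_size, length=4):
    """Entropy in bits of `length` words drawn uniformly, with
    replacement, from `corpus_size` distinct words."""
    return length * math.log2(corpus_size)


for size in (100, 2048, 4200):
    print(size, round(passphrase_entropy_bits(size), 1))
```

<p>A 2048-word corpus gives 11 bits per word, so 44 bits for four words,
which is the figure used in XKCD 936. A 100-word corpus gives under 27 bits
for the same four words, and the full common-words file of roughly 4200
words gives about 48.</p>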
<h2>Commands</h2>
<ul>
<li>
<p>add dictfile n - add n random words from dictfile</p>
</li>
<li>
<p>bad word ... - mark one or more words as bad</p>
</li>
<li>
<p>dict dictfile - report words not present in dictfile</p>
</li>
<li>
<p>exit - quit the program</p>
</li>
<li>
<p>quit - quit the program</p>
</li>
<li>
<p>good dictfile - declare the words in dictfile to be "good" when
executing the dict command.</p>
</li>
<li>
<p>help or ? - print this help</p>
</li>
<li>
<p>option - display all options and their current values</p>
</li>
<li>
<p>option name value - set option "name" to value</p>
</li>
<li>
<p>password [n] - generate n passwords (default 1)</p>
</li>
<li>
<p>read - read messages from the IMAP server in a second thread</p>
</li>
<li>
<p>rebuild - rebuild the 'good' words list</p>
</li>
<li>
<p>save - write the pickle save file and bad words file</p>
</li>
<li>
<p>stat - print some simple statistics about the collected words</p>
</li>
<li>
<p>verbose - toggle verbose flag</p>
</li>
</ul>
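<p>As an illustration of what the add command does, here is one way to pick
n random words from a dictionary file. This is a guess at the behaviour, not
polly's implementation: <code>sample_words</code> is a hypothetical name, and
whether polly filters the file's words the same way is an assumption.</p>

```python
import random


def sample_words(lines, n, minchars=4, rng=None):
    """Pick up to `n` distinct qualifying words from an iterable of lines."""
    rng = rng or random.SystemRandom()
    stripped = (line.strip() for line in lines)
    # Keep only ASCII lower-case words of at least `minchars` letters.
    pool = sorted({
        word for word in stripped
        if len(word) >= minchars
        and word.isascii() and word.isalpha() and word.islower()
    })
    return rng.sample(pool, min(n, len(pool)))
```

<p>Something like <code>sample_words(open("common-words"), 2048)</code> would
then correspond to typing "add common-words 2048" at the prompt.</p>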
<p>Readline support is enabled. The default editing mode is emacs. You can set
the edit-mode option in the config file to select vi.</p>
<h2>Options</h2>
<h3>Dictionary Construction</h3>
<ul>
<li>server - IMAP server</li>
<li>user - email address on the server</li>
<li>password - password on the server</li>
<li>folder - name of the folder to process</li>
<li>nwords - size of dictionary</li>
</ul>
<h3>Generating Passwords</h3>
<ul>
<li>punctuation - whether to include punctuation in passwords (True/False)</li>
<li>digits - whether to use digits in passwords (True/False)</li>
<li>upper - whether to randomly upcase some letters (True/False)</li>
<li>minchars - minimum word length</li>
<li>maxchars - maximum word length</li>
<li>length - number of words in a passphrase</li>
</ul>
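<p>For orientation, the options above could be read from an INI-style file
along these lines. The <code>[polly]</code> section name, the example values,
and the <code>load_options</code> helper are all assumptions; check
polly.cfg.sample for the real layout.</p>

```python
import configparser

# Hypothetical config text; option names come from the lists above,
# but the section name and every value are illustrative only.
SAMPLE = """\
[polly]
server = imap.example.com
user = me@example.com
password = app-password-here
folder = polly
nwords = 2048
punctuation = True
digits = True
upper = True
minchars = 4
maxchars = 8
length = 4
"""


def load_options(text):
    """Parse the config text into a dict with properly typed values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["polly"]
    options = {name: section.get(name)
               for name in ("server", "user", "password", "folder")}
    for name in ("nwords", "minchars", "maxchars", "length"):
        options[name] = section.getint(name)
    for name in ("punctuation", "digits", "upper"):
        options[name] = section.getboolean(name)
    return options


print(load_options(SAMPLE)["length"])
```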
<h2>Testing</h2>
<p>There's not much to the testing, just some test configs in tests/cfgs which
are run with a predictable "random" number generator. For this, a special
"unittests" option is used. Don't use it for anything else, as it completely
wrecks the random number generator.</p>
<p>To run the tests, execute:</p>
<pre><code>bash tests/runtests.sh
</code></pre>
<p>The output will be compared with tests/output/expected.out. To add new
tests, add new config files to tests/cfgs (".cfg" is the required extension)
and run the script with the --generate command line flag. The output will be
written to stdout.</p>
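<p>The idea behind a predictable "random" number generator can be shown with
a seeded <code>random.Random</code>. How the "unittests" option is actually
implemented is not described here, so treat this as a sketch;
<code>pick_words</code> and its <code>seed</code> parameter are
hypothetical.</p>

```python
import random


def pick_words(words, length=4, seed=None):
    """Pick `length` words; pass a seed only when testing."""
    if seed is not None:
        rng = random.Random(seed)    # reproducible, NOT safe for passwords
    else:
        rng = random.SystemRandom()  # OS-backed randomness for real use
    return [rng.choice(words) for _ in range(length)]


words = ["correct", "horse", "battery", "staple"]
# The same seed always yields the same sequence, so output can be
# compared against a stored expected file.
print(pick_words(words, seed=936) == pick_words(words, seed=936))
```

<p>A seeded generator makes every "random" choice repeatable, which is
exactly why it must never be used to generate real passwords.</p>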