Add option to suppress double word output? #28

Open
magnumripper opened this issue Jan 20, 2015 · 8 comments

Comments

@magnumripper
Contributor

How about an option that suppresses output words where the same element is used twice in a row, e.g. "correctcorrect" or "correcthorsebatterybatterystaple"?

Not sure how to implement it but I reckon it can be useful (at times) for limiting the keyspace.

@jsteube
Member

jsteube commented Jan 21, 2015

Yeah, I think it's a good idea as long as the option is off by default and the user has to turn it on explicitly.

Just as a reminder, prince is also about brute-force. The sorting of the keyspace creates the 'smooth transition' that leads into a pure brute-force attack in case the user uses one-letter words in the input wordlist. In brute-force we actually want the same element used twice.

About the implementation: I think we should do it the straightforward way. What I mean is that we check at chain level, against the *buf array. Take the current element position, look at positions -1 and +1, and check for the same element number. @Sc00bz any ideas here?
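
A minimal sketch of the neighbour check described above, written in Python rather than princeprocessor's C. It assumes a chain can be viewed as a list of element numbers; that list is only a stand-in for whatever the chain buffer really holds, and both function names are hypothetical.

```python
def repeats_neighbour(chain, pos):
    """True if the element at pos equals the element at pos-1 or pos+1."""
    current = chain[pos]
    if pos > 0 and chain[pos - 1] == current:
        return True
    if pos + 1 < len(chain) and chain[pos + 1] == current:
        return True
    return False


def has_adjacent_repeat(chain):
    """True if any two consecutive positions hold the same element number."""
    # Comparing each position with its successor covers every -1/+1 pair once.
    return any(chain[i] == chain[i + 1] for i in range(len(chain) - 1))
```

Checking each position only against its successor is enough to reject a whole chain, so the cost is one comparison per element boundary.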

@Sc00bz
Member

Sc00bz commented Jan 21, 2015

When I heard of PP I looked into no duplicate words because I was thinking of the "pick 4+ things near you right now" type of passwords, but even that isn't that much of a difference:
(100^4 - 100 * 99 * 98 * 97) / 100^4 = 5.8906% (no duplicate words)
(100^4 - 100 * 99 * 99 * 99) / 100^4 = 2.9701% (no double words)

With a small number of words (100) and four words of the same size next to each other, the key space is only 3% smaller. It is 6% smaller if you allow no duplicate words at all. Yes, if you look at really small words you might have less, but I don't think the complexity of this and the new skip is worth it.
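
The two percentages above can be reproduced with a short calculation. This is only a sketch of the arithmetic for one chain of equally sized word lists; the function names are made up for illustration.

```python
def share_with_duplicates(n_words, n_positions):
    """Fraction of candidates in which some word appears more than once."""
    total = n_words ** n_positions
    all_distinct = 1
    for i in range(n_positions):
        all_distinct *= n_words - i  # 100 * 99 * 98 * 97 for n=100, k=4
    return (total - all_distinct) / total


def share_with_doubles(n_words, n_positions):
    """Fraction of candidates in which two neighbouring words are equal."""
    total = n_words ** n_positions
    # First word is free, every later word must differ from its predecessor.
    no_adjacent = n_words * (n_words - 1) ** (n_positions - 1)
    return (total - no_adjacent) / total


print(f"{share_with_duplicates(100, 4):.4%}")  # 5.8906%
print(f"{share_with_doubles(100, 4):.4%}")     # 2.9701%
```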

The only real problem is that the first 1% of the key space will all be duplicate/double words. So really, if we wanted to change something, it would be the order in which passwords are output. The easiest is to just start with offsets of 3,2,1,0 (maybe there's a better method like 0/4*N, 1/4*N, 2/4*N, 3/4*N), then just "%N" and "overflow" at positions 3,2,1,0. That makes skip super simple and shouldn't slow this down much. With offsets 3,2,1,0, the second to last 1% of the key space is all duplicate/double words. So the first 98% of the key space has 2% of the key space's double words and the last 2% of the key space has 1% of the key space's.

Huh after writing that I think we should do that by default and have no other option.

Also when I say key space I'm talking about a chain's key space and not the whole key space.
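
One possible reading of the offset idea, sketched in Python. The assumptions are mine, not taken from princeprocessor: a candidate index is split into base-N digits with position 0 varying fastest, and each position's digit is rotated by a fixed per-position offset modulo N. Because the rotation is a one-to-one mapping per position, skipping to an arbitrary index stays a plain digit decomposition. All names are hypothetical.

```python
def chain_at(index, n_words, n_positions, offsets=None):
    """Element numbers of the index-th candidate (position 0 varies fastest)."""
    if offsets is None:
        offsets = [0] * n_positions
    elems = []
    for pos in range(n_positions):
        digit = index % n_words
        index //= n_words
        # Rotate this position's digit by its offset, wrapping around at N.
        elems.append((digit + offsets[pos]) % n_words)
    return elems


def has_double(elems):
    """True if two neighbouring positions hold the same element number."""
    return any(a == b for a, b in zip(elems, elems[1:]))


def count_doubles(start, stop, n_words, n_positions, offsets=None):
    """Number of candidates in [start, stop) that contain a double word."""
    return sum(
        has_double(chain_at(i, n_words, n_positions, offsets))
        for i in range(start, stop)
    )
```

With 4 words and 4 positions, `count_doubles(0, 16, 4, 4)` counts every one of the first 16 candidates as a double, while `count_doubles(0, 16, 4, 4, [3, 2, 1, 0])` counts fewer. Over the whole chain (indices 0 to 256) both orderings contain the same number of doubles, so nothing is rejected, only reordered.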

@jsteube
Member

jsteube commented Jan 22, 2015

I'm confused. With princeprocessor, there should be no such case. Maybe I've misunderstood something. OK, we expect the input wordlist to contain unique words only. But if that's the case, then there is no duplicate word.

root@et:~/princeprocessor/src# cat words
1
2
3
4
root@et:~/princeprocessor/src# ./pp64.bin --elem-cnt-min 4 --elem-cnt-max 4 < words | wc -l
256
root@et:~/princeprocessor/src# ./pp64.bin --elem-cnt-min 4 --elem-cnt-max 4 < words | sort -u | wc -l
256

If this is not what you meant, can you please give a practical example that explains what you mean?

@Sc00bz
Member

Sc00bz commented Jan 22, 2015

Repeated words in a single password:

$ ./pp64-o.bin --elem-cnt-min 4 --elem-cnt-max 4 --limit 10 < words
1111
2111
3111
4111
1211
2211
3211
4211
1311
2311

Besides all of these having multiple 1's, there's "2211", which has a double 2 and a double 1. @magnumripper is just talking about when they are next to each other, so "2121" would be fine. When I was saying no duplicates, "2121" would not be fine because there are multiple 1's and 2's.

@magnumripper
Contributor Author

My initial idea was only for two or more consecutive elements so "2121" would be just fine (but "2112" would not) from the elements "1" and "2".
I'm thinking of short wordlists producing sentences like "IloveSarah", and in that case there should almost never be two (or more) identical words (elements) in a row. In particular, we don't want even worse candidates like "III" or "IIIIIIII" within a huge number of output words just because that's a short element.

Anyway, this is only worthwhile if it can be done outside of the password generation loop or otherwise with no performance impact.
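
To make the two filters discussed in this thread concrete, here is a small Python sketch that enumerates every four-element candidate from the four-word list used earlier and applies each rule as a post-filter. It is an illustration only, not how princeprocessor generates candidates, and the function names are hypothetical.

```python
from itertools import product

WORDS = ["1", "2", "3", "4"]  # the four-word list used earlier in the thread


def has_consecutive_repeat(elems):
    """True if the same element appears twice in a row ("double word")."""
    return any(a == b for a, b in zip(elems, elems[1:]))


def has_any_duplicate(elems):
    """True if the same element appears more than once anywhere."""
    return len(set(elems)) < len(elems)


chains = list(product(WORDS, repeat=4))
no_doubles = ["".join(c) for c in chains if not has_consecutive_repeat(c)]
no_duplicates = ["".join(c) for c in chains if not has_any_duplicate(c)]

print(len(chains), len(no_doubles), len(no_duplicates))  # 256 108 24
```

Here "2121" survives the consecutive-repeat filter but not the duplicate filter, while "2112" is rejected by both.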

@jsteube
Member

jsteube commented Jan 22, 2015

So if it's really like that, then I understood it correctly from the beginning. In that case the best way to detect such a case would be to take the current element position, look at positions -1 and +1, and check for the same element number. At least I think so. But it will cost a bit of performance; it's hard to say how much.

@magnumripper
Contributor Author

I'll do some experiments when I get the time.

@magnumripper
Contributor Author

Re-found this issue after thinking about https://hashcat.net/forum/thread-5074.html. In that case, non-consecutive dupes should be rejected too.

Perhaps both options (i.e. consecutive or not) are useful, for different use cases. But after re-reading #28 (comment) above I think that's a better idea: we should try to get candidates consisting of dupe elements produced later instead of rejecting them.
