-
v1.27
-
Changed
- Set the default for
-lcc
to 3 instead of 0 to only search the 3 latest indexes for Common Crawl instead of all of them.
- Set the default for
-
-
v1.26
-
Changed
- Allow an input value of just a TLD, e.g.
.mil
. If a TLD is passed then resources for all domains with that TLD will be retrieved. NOTE: If a TLD is passed then the Alien Vault OTX source is excluded because it needs a full domain.
- Allow an input value of just a TLD, e.g.
-
-
v1.25
-
Changed
- Fix a bug that always strips the port number from URLs found. It should only remove the port if it is :80 or :443
-
-
v1.24
-
Changed
- Handle errors with the config file better. Display specific message to say if the file isn't found or if there is a formatting error. If there is any other kind of error, the error message will be displayed. THe default values will be used in the case of any of these errors.
-
-
v1.23
-
Changed
- The
-ko
/--keywords-only
argument can now be passed without a value, which will use theFILTER_KEYWORDS
inconfig.yml
as before, or passed with a Regex value that will be used instead. For example,-ko "admin"
to only get links containing the wordadmin
, or-ko "\.js(\?\|$)"
to only get JS files. The Regex check is NOT case sensitive.
- The
-
-
v1.22
-
Changed
- Fix issue xnl-h4ck3r#23. If a file is passed as input, an error would occur if any of the domains in the file contained a capital letter or ended with a full stop. The regex in
validateArgInput
has been amended to fix this, adn any.
on the end of a domain is stripped and domain converted to lowercase before processing.
- Fix issue xnl-h4ck3r#23. If a file is passed as input, an error would occur if any of the domains in the file contained a capital letter or ended with a full stop. The regex in
-
-
v1.21
-
Changed
- Fix issue xnl-h4ck3r#24. If the
FILTER_CODE
inconfig.yml
is set to one status code then it is needs to be explicitly set to a string ingetConfig()
- Fix issue xnl-h4ck3r#24. If the
-
-
v1.20
-
New
- Add argument
-fc
for filtering HTTP status codes. Using this will override theFILTER_CODE
value fromconfig.yml
. This is for specifying HTTP status codes you want to exclude from the results, and are provided in a comma separated list. - Add argument
-mc
for matching HTTP status codes. Using this will override theFILTER_CODE
value fromconfig.yml
AND the-fc
argument. This is for specifying HTTP status codes you want to match from the results, and are provided in a comma separated list.
- Add argument
-
Changed
- Changed how filters are specified in the request to the Common Crawl API. Removes the regex negative lookahead which is not needed if you use
filter=!
- Changed how filters are specified in the request to the Common Crawl API. Removes the regex negative lookahead which is not needed if you use
-
-
v1.19
-
Changed
- Bug fix - ignore any blank lines in the input file when validating if input is in the correct format
-
-
v1.18
-
Changed
- Cache the Common Crawl
collinfo.json
file locally. The file is only updated a few times per year so there is no point in requesting it every time waymore is run. Common Crawl can struggle with volume against it's API which can cause timeouts, and currently, about 10% of all requests they get are for thecollinfo.json
! - Add a HTTPAdapter specifically for Common Crawl to have
retries
andbackoff_factor
increased which seems to reduce the errors and maximize the results found.
- Cache the Common Crawl
-
-
v1.17
- Changed
- If an input file has a sub domain starting with _ or - then an error was raised, but these are valid. This bug has been fixed.
- In addition to the fix above, the error message will show what line was flagged in error so the user can raise an issue on Github about it if they believe it is an error.
- Changed
-
v1.16
- Changed
- Fix a bug that raises
ERROR processURLOutput 6: [Errno 2] No such file or directory: ''
if the value passed to-oU
has no directory specified as part of the file name.
- Fix a bug that raises
- Changed
-
v1.15
- Changed
- Fix bug that shows an error when
-v
is passed and-oU
does not specify a directory, just a filename
- Fix bug that shows an error when
- Changed
-
v1.14
- Changed
- Fix a bug with the
-c
/--config
option
- Fix a bug with the
- Changed
-
v1.13
-
New
- Added argument
-oU
/--output-urls
to allow the user to specify a filename (including path) for the URL links file when-mode U
(orB
oth) is used. If not passed, then the filewaymore.txt
will be created in theresults/{target.domain}
directory as normal. If a path is passed with the file, then any directories will be created. For example:-oU ~/Recon/Redbull/waymoreUrls.txt
- Added argument
-oR
/--output-responses
to allow the user to specify a directory (or path) where the archived responses andindex.txt
file is written when-mode R
(orB
oth) is used. If any directories in the path do not exist they will be created. For example:-oR ~/Recon/Redbull/waymoreResponses
- Added argument
-
Changed
- When removing all web archive references in the downloaded archived response, there were a few occasions this wasn't working so the regex has been changed to be more specific to ensure this works.
-
-
v1.12
- New
- Added argument
-c
/--config
to specify the full path of a YML config file. If not passed, it looks for fileconfig.yml
in the same directory as runtime filewaymore.py
- Added argument
- New
-
v1.11
- New
- Added argument
-nlf
/--new-links-file
. If passed, and you run-mode U
or-mode B
to get URLs more than once for the same target, thewaymore.txt
will still be appended with new links (unless-ow
is passed), but a new output file calledwaymore.new
will also be written. If there are no new links, the empty file will still be created. This can be used for continuous monitoring of a target. - Added a
waymore
folder containing a new__init__.py
file that contains the__version__
value. - Added argument
--verison
to display the current version. - Show better error messages if the archive.org site returns a
Blocked Site Error
.
- Added argument
- Changed
- If a file of domains is passed as input, make sure spaces are stripped from the lines.
- Change
.gitignore
to include__pycache__
. - Move images to
waymore/images
folder.
- New
-
v1.10
- New
- If
-mode U
is run for the same target again, by default new links found will be added to thewaymore.txt
file and duplicates removed. - Added argument
-ow
/--output-overwrite
that can be passed to force thewaymore.txt
file to be overwritten with newly found links instead of being appended.
- If
- Changed
- Change the README.md to reflect new changes
- New
-
v1.9
- New
- Add functionality to continue downloading archived responses if it does not complete for any reason. When downloading archived responses, a file called
responses.tmp
will be created with the links of all responses that will be downloaded. There will also be acontinueresp.tmp
that will store the index of the current response being saved. If these files exist when run again, the user will be prompted whether to continue a previous run (so new filters will be ignored) or start a new one. - Add
CONTINUE_RESPONSES_IF_PIPED
toconfig.yml
. Ifstdin
orstdout
is piped from another process, the user is not prompted whether they want a previous run of downloading responses. This value will determine whether to continue a previous run, or start a new one, in that situation.
- Add functionality to continue downloading archived responses if it does not complete for any reason. When downloading archived responses, a file called
- Changed
- Corrected the total pages shown when getting wayback URLs
- Included missing packages in the
requirements.txt
document. - Fix Issue #16 (xnl-h4ck3r#16)
- New
-
v1.8
- Changed
- When archived responses are saved as files, the extension
.xnl
will no longer be used if-url-filename
is passed. If-url-filename
is not passed then the filename is represented by a hash value. The extension of these files will be set to.xnl
only of the original file type cannot be derived from the original URL.
- When archived responses are saved as files, the extension
- Changed
-
v1.7
- New
- Added
-xwm
parameter to exclude getting URL's from Wayback Machine (archive.org) - Added
-lr
/--limit-requests
that can be used to limit the number of requests made per source (excluding Common Crawl) when getting URL's. For example, if you run waymore for-i twitter.com
it says there are 28,903,799 requests to archive.org that need to be made (that could take almost 1000 days for some people!!!). The default value for the argument is 0 (Zero) which will apply no limit as before. There is also an problem with the Wayback Machine CDX API where the number of pages returned is not correct when filters are applied and can cause issues. Setting this parameter to a sensible value can relieve that issue (it may take a while for archive.org to address the problem).
- Added
- Changed
- Make sure that filters in API URL's are escaped correctly
- Add error handling to
getMemory()
to avoid any errors ifpsutil
is not installed
- New
-
v1.6
- New
- Add a docker option to run
waymore
. Include instructions inREADME.md
and a newDockerFile
- Add a docker option to run
- Changed
- If multiple domain/URLs are passed by file or STDIN, remove
*.
from the start of any input values. - Change the default
FILTER_KEYWORDS
to include more useful words. - If a link found from an API has port 80 or 443 specified, e.g.
https://exmaple.com:80/example
then remove the:80
. Many links have this in archive.org so this could reduce the number of similar links reported. - Amend
setup.py
to includeurlparse3
that is now used to get the domain and port of links found
- If multiple domain/URLs are passed by file or STDIN, remove
- New
-
v1.5
- New
- Add argument
-ko
/--keywords-only
which if passed, will only get Links (unless-f
is passed) that have a specified keyword in the URL, and will only download responses (regardless of-f
) where the keyword is in the URL. These multiple keywords can be specified inconfig.yml
in a comma separated list. - Add a
FILTER_KEYWORDS
key/pair toconfig.yml
(and default value in code) initially set toadmin,login,logon,signin,register,dashboard,portal,ftp,cpanel
- Add argument
- Changed
- Only add to the MIME type list if the
-v
option is used because they are not displayed otherwise. - Warn the user if there is a value missing from the config.yml file
- Fixed small bug in
getURLScanUrls
that raised an error forgetSPACER
- Only add to the MIME type list if the
- New
-
v1.4
- New
- Added
-m /--memory-threshold
argument to set memory threshold percentage. If the machines memory goes above the threshold, the program will be stopped and ended gracefully before running out of memory (default: 95) - If
-v
verbose output was used, memory stats will be output at the end, and also shown on teh progress bar downloading responses. - Included
psutil
insetup.py
- Added
- Changed
- Fix some display issues not completely done in v1.3, regarding trailing spaces when errors are displayed.
- Remove line
os.kill(os.getpid(),SIGINT)
fromprocessArchiveUrl
which isn't needed and just causes more error if a user does press Ctrl-C.
- New
-
v1.3
- New
- Added functionality to allow output Links output to be piped to another program (the output file will still be written). Errors and progress bar are written to STDERR. No information about archived responses will be piped.
- Added functionality to allow input to be piped to waymore. This will be the same as passing to
-i
argument.
- Changed
- Use a better way to add trailing spaces to strings to cover up other strings (like progress bar), regardless of terminal width.
- Change the README to mention
-xus
argument and how to get a URLScan API key to add to the config file.
- New
-
v1.2
- Changed
- Removed User-Agent
Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:15.0) Gecko/20100101 Firefox/15.0.1
because it caused problems on some domains inxnLinkFinder
tool, so removing from here too. - Base the length of the progress bar (show when downloading archived responses) on the width of the terminal so it displays better and you don't get multiple lines on smaller windows.
- Amend
.gitignore
to include other unwanted files
- Removed User-Agent
- Changed
-
v1.1
- New
- Allow a file of domains/URL's to be passed as input with
-i
instead of just one.
- Allow a file of domains/URL's to be passed as input with
- Changed
- Remove version numbers from
requirements.txt
as these aren't really needed and may cause some issues.
- Remove version numbers from
- New
-
v1.0
- New
- Added URLScan as a source of URL's. Waymore now has all the same sources for URLs as gau
- Added
-xus
parameter to exclude URLScan when getting URL's. - Added
-r
parameter to specify the number of times requests are retried if they return 429, 500, 502, 503 or 504 (default: 1). - Made requests use a retry strategy using
-r
value, and also abackoff_factor
of 1 for Too Many Redirect (429) responses. - General bug fixes.
- Changed
- Fixed a bug with was preventing HTTP Status Code filtering from working on Alien Vaults requests.
- Fixed a bug that was preventing MIME type filtering from working on Common Crawl requests.
- Correctly escape all characters is strings compared in regex with re.escape instead of just changing
.
to\.
- Changed default MIME type filter to include: video/webm,video/3gpp,application/font-ttf,audio/mp3,audio/x-wav,image/pjpeg,audio/basic
- Changed default URL filter to include:/jquery,/bootstrap
- If Ctrl-C is used to end the program, try to ensure that results at that point are still saved before ending.
- New
-
v0.3
- New
- Added Alien Vault OTX as a source of URL's. Results cannot be checked against MIME filters though because that info is not available in the API response.
- Added
-xav
parameter to exclude Alien Vault when getting URL's.
- Changed
- Improved regex for the
-i
input value to ensure it's a valid domain, with or without sub domains and path, but no query string or fragments. - General tidying up and improvements
- Improved regex for the
- New
-
v0.2
- New
- Added to the TODO list on README.md of changes coming soon
- Changed
- When getting URl's from archive.org it now uses pagination. Instead of one API call (how waybackurls does it), it makes one call per page of URL's (how gau does it). This actually results in a lot more URL's being returned even though the archive.org API docs seem to imply it should be the same. So in comparison to gau it now returns the same number of URL's from archive.org
- Ensure input domain/path is URL encoded when added to the API call URL's
- New
-
v0.1 - Initial release