Monthly Reports

Table of Contents

[[TOC]]

July 2020

July 7

☑️ Past Week:

Completed the algorithm for deciding which test to run for exit relays
Added GeoIP information to produce graphs for CAPTCHA rate per country
Solved the memory leak issue
Added annotations to the data
Started versioning the codebase
Added new tests for fetching with "firefox_over_tor" and additional websites like https://www.fiverr.com
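
A minimal sketch of how the CAPTCHA rate per country could be computed once each measurement carries a country code. The record layout (`country` and `captcha` keys) is an assumption made for illustration, and the GeoIP lookup itself is left out.

```python
from collections import defaultdict


def captcha_rate_per_country(measurements):
    """Return {country_code: percentage of measurements that hit a CAPTCHA}."""
    totals = defaultdict(int)
    captchas = defaultdict(int)
    for m in measurements:
        totals[m["country"]] += 1
        if m["captcha"]:
            captchas[m["country"]] += 1
    return {country: 100.0 * captchas[country] / totals[country] for country in totals}


# Hypothetical sample data: the country is assumed to be resolved already.
sample = [
    {"country": "DE", "captcha": True},
    {"country": "DE", "captcha": False},
    {"country": "US", "captcha": False},
]
rates = captcha_rate_per_country(sample)
```

The resulting dictionary can be fed directly into a bar chart or a map widget on the dashboard.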

🔲 Week Ahead:

Finishing the implementation of the Cloudflare API module to carry out tests with different Cloudflare security levels
Sending an email to tor-dev mailing list to convey the updates on the project
Updating the dashboard to show new features like annotations, versions, etc.
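
As a rough illustration of what switching Cloudflare security levels through the API might look like, the sketch below only builds the HTTP request and does not send it. The endpoint path, the bearer-token header, and the list of level names reflect my understanding of Cloudflare's v4 API and should be checked against its documentation; the zone ID and token are placeholders, and this is not the project's actual module.

```python
import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4"
# Level names as I understand them from Cloudflare's documentation (assumption).
SECURITY_LEVELS = ("essentially_off", "low", "medium", "high", "under_attack")


def build_security_level_request(zone_id, api_token, level):
    """Build (but do not send) a PATCH request that sets a zone's security level."""
    if level not in SECURITY_LEVELS:
        raise ValueError(f"unknown security level: {level!r}")
    url = f"{API_BASE}/zones/{zone_id}/settings/security_level"
    body = json.dumps({"value": level}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="PATCH")


# Placeholder credentials; urllib.request.urlopen(req) would actually send it.
req = build_security_level_request("example-zone-id", "example-token", "high")
```

Keeping request construction separate from sending makes it easy to test the module without touching a live zone.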

🛑 Current Blockers: None

June 2020

June 30

☑️ Past Week:

Switched to using HTTP Header Live extension to collect HTTP headers instead of using seleniumwire
- Seleniumwire was triggering the MITM detection on the Cloudflare end, which caused an unrealistic increase in the CAPTCHA rate
Added support for testing with different Tor Browser versions
Added support for checking the webpage integrity
- Cloudflare sometimes inserts its own JavaScript code into the customer's webpage without letting customers know
- I check for these changes by comparing the MD5 hashes of the page content
Added 'Measurement Search' section to the dashboard to see individual data points
- Added color indicators for each row to quickly highlight the situation of the measurement
  - Green if there was no CAPTCHA and the page integrity was protected
  - Orange if CAPTCHA was detected or page integrity wasn't protected
  - Red if both CAPTCHA was detected and page integrity wasn't protected
- Added support for sharing custom searches by copying the dashboard's URL
Added an algorithm that assigns IPv6-only domains only to exit nodes that support IPv6 exiting, which increases efficiency
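
The integrity check and the row colors described above can be sketched as follows. The function names are hypothetical and chosen for illustration; only the MD5 comparison and the green/orange/red rules come from the report.

```python
import hashlib


def md5_of(content: bytes) -> str:
    """Return the hex MD5 digest of the page content."""
    return hashlib.md5(content).hexdigest()


def page_is_intact(expected_md5: str, fetched_content: bytes) -> bool:
    """True if the fetched page hashes to the known-good value."""
    return md5_of(fetched_content) == expected_md5


def row_color(captcha_detected: bool, integrity_ok: bool) -> str:
    """Green: no problem, orange: exactly one problem, red: both problems."""
    problems = int(captcha_detected) + int(not integrity_ok)
    return ("green", "orange", "red")[problems]


original = b"<html>hello</html>"
expected = md5_of(original)
tampered = original + b"<script></script>"
```

MD5 is used here only to detect changes in content served from a page I control, not as a security primitive.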

🔲 Week Ahead:

Creating the algorithm for deciding which test to run for exit relays. This algorithm will add missing tests to the queue when a new relay appears and refresh the measurements for existing relays.
Adding GeoIP information to produce graphs for CAPTCHA rate per country
Utilizing the earlier implemented Cloudflare API module to carry out tests with different Cloudflare security levels
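
One possible shape for the test-selection algorithm mentioned in this list is sketched below: tests that were never run for a relay are queued first, followed by measurements older than a chosen age, oldest first. The function name, the `max_age` threshold, and the data layout are assumptions for illustration, not the implemented design.

```python
def plan_tests(relays, test_names, last_measured, now, max_age):
    """Return (relay, test) pairs to queue: missing ones first, then stale ones."""
    missing, stale = [], []
    for relay in relays:
        for test in test_names:
            measured_at = last_measured.get((relay, test))
            if measured_at is None:
                # New relay (or new test): nothing recorded yet.
                missing.append((relay, test))
            elif now - measured_at > max_age:
                # Existing measurement is too old and should be refreshed.
                stale.append((measured_at, relay, test))
    stale.sort()  # oldest measurements are refreshed first
    return missing + [(relay, test) for _, relay, test in stale]


# Hypothetical example: relay "A" is known, relay "B" has just appeared.
last_measured = {("A", "tor_browser"): 100, ("A", "firefox_over_tor"): 900}
queue = plan_tests(
    relays=["A", "B"],
    test_names=["tor_browser", "firefox_over_tor"],
    last_measured=last_measured,
    now=1000,
    max_age=500,
)
```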

🛑 Current Blockers:

I have a memory leak issue. I don't know how I managed to have a memory leak while using Python but I did :)
Sometimes Tor Browser doesn't quit properly, and these 'zombie' instances of Tor Browser keep accumulating and occupying memory. Currently, I'm not sure if this is related to selenium, Tor Browser, or both. I need to solve this issue to keep collecting data without any downtime. Otherwise, I need to manually remove the zombie instances, which is not a good solution at all.

June 23

☑️ Past Week:

Updated the dashboard at https://dashboard.captcha.wtf/
Implemented the multiple process based parallelism mentioned last week
Started collecting data with the new code; the collected data is available on the dashboard
Moved the codebase to Tor Project's Gitlab

🔲 Week Ahead:

I will work on further decreasing the measurement times
- Using exit_policy_v6_summary tag from Onionoo to identify exit nodes that support IPv6 and using only these exit nodes for IPv6 tests
Adding the ability to use different versions/releases of the Tor Browser
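
A small sketch of the IPv6 filtering idea, assuming relay entries shaped like Onionoo details documents. As far as I understand the Onionoo protocol, `exit_policy_v6_summary` is omitted when a relay rejects all IPv6 connections, so its presence is treated here as "supports some IPv6 exiting". The helper names and the sample relays are hypothetical.

```python
def supports_ipv6_exit(relay):
    """True if the relay entry carries a non-empty IPv6 exit policy summary."""
    return bool(relay.get("exit_policy_v6_summary"))


def assign_tests(relays, domains):
    """Pair each relay with the domains it can usefully test.

    domains is a list of (domain, ipv6_only) tuples; IPv6-only domains are
    assigned only to relays that support IPv6 exiting.
    """
    plan = []
    for relay in relays:
        for domain, ipv6_only in domains:
            if ipv6_only and not supports_ipv6_exit(relay):
                continue  # this test could never succeed, so skip it
            plan.append((relay["fingerprint"], domain))
    return plan


# Hypothetical relay entries with shortened fingerprints.
relays = [
    {"fingerprint": "AAAA", "exit_policy_v6_summary": {"accept": ["80", "443"]}},
    {"fingerprint": "BBBB"},
]
domains = [("captcha.wtf", False), ("exit11.online", True)]
plan = assign_tests(relays, domains)
```

Skipping impossible relay and domain pairs removes measurements that would only time out, which is where the time saving comes from.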

🛑 Current Blockers: None

June 16

☑️ Past Week:

Updated the Stem integration to set 2 hop circuits for the measurements
- The first hop is chosen randomly and the final hop is the target exit node
- Managed to decrease individual test time to 10-14 seconds range with this update
Experimented with using the "New Identity" button instead of fully restarting the browser
- Selenium had issues with reattaching to the browser when I used the "New Identity" button
Experimented with Docker swarm to run isolated Tor and Tor Browser instances but encountered problems
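
The 2-hop path selection described above can be sketched like this. `pick_two_hop_path` is a hypothetical helper; `build_circuit` assumes an already connected and authenticated Stem `Controller` and relies on its `new_circuit` method. Attaching streams to the resulting circuit is not shown.

```python
import random


def pick_two_hop_path(relay_fingerprints, exit_fingerprint, rng=random):
    """Return [random first hop, target exit], never using the exit twice."""
    candidates = [fp for fp in relay_fingerprints if fp != exit_fingerprint]
    if not candidates:
        raise ValueError("need at least one relay other than the exit")
    return [rng.choice(candidates), exit_fingerprint]


def build_circuit(controller, path):
    """Ask Tor to build the circuit and wait for it; returns the circuit ID.

    controller is assumed to be a connected stem.control.Controller.
    """
    return controller.new_circuit(path, await_build=True)


# Shortened placeholder fingerprints; a seeded RNG keeps the example repeatable.
path = pick_two_hop_path(["A", "B", "C"], "C", rng=random.Random(0))
```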

🔲 Week Ahead: I was using Docker swarm to have multiple measurements in parallel but that method started becoming unnecessarily complex, memory consuming, and difficult to debug. I decided to use multiple processes on the host machine instead. So, I will be coding it.

🛑 Current Blockers: None

June 8

☑️ Past Week:

Integrated Tor Stem to specify exit nodes
Integrated Cloudflare API to change security levels
Added the feature to change Tor Browser's security levels
Got the dashboard and data collection system up and running
Started using the pytest framework for testing

🔲 Week Ahead: Currently, it takes about 40 hours to complete the measurements for all exit nodes. The initial plan was to perform these measurements every day. The measurements need to take less time to fit them into a day. So, I will be working on assigning different processes to different metrics to run them in parallel, which should decrease the processing time.
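
To make the arithmetic concrete: if the 40 hours of work split evenly, at least two parallel workers are needed to finish within 24 hours (40 / 2 = 20). The sketch below shows that calculation and one way to run one process per metric with the standard library. `run_metric` is a placeholder and the even split is an idealized assumption.

```python
import math
from concurrent.futures import ProcessPoolExecutor


def min_workers(total_hours, budget_hours):
    """Smallest number of equal parallel workers that fits the time budget."""
    return math.ceil(total_hours / budget_hours)


def run_metric(metric):
    """Placeholder for the real measurement of one metric."""
    return f"{metric}: done"


def run_all(metrics, workers):
    """Run each metric in its own process, at most `workers` at a time."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_metric, metrics))


workers_needed = min_workers(40, 24)
```

When used in a script, `run_all` should be called from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely.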

🛑 Current Blockers: None

June 1

☑️ Past Week:

Worked on restructuring the codebase to achieve some of the goals set earlier
Created "fetchers" for different web browsers
Worked on making seleniumwire work with the Tor Browser Bundle
- Spent time on finding correct settings to flip in the browser and finding the correct way to configure the proxy. This is the resulting script that can capture and modify HTTP headers between Tor and Tor Browser.
Wrote a test for the existing code

🔲 Week Ahead: Finally got the code for the first version to work. So, I plan to have the whole system (including the dashboard) up and running tomorrow. After that, I will work on integrating the Tor Stem and Cloudflare API into the system.

🛑 Current Blockers: None

May 2020

May 25

☑️ Past Week:

Created the trac tickets for milestones for my project
Used the community feedback to update certain aspects of the project
- Modified the previously registered domains so that each has either IPv4-only or IPv6-only records [suggested by ticket:33010#comment:2]
  - captcha.wtf -> IPv4 only
  - exit11.online -> IPv6 only
- Updated the project diagram and fixed the wrong wording about DNS & CDN usage [suggested by ticket:33010#comment:28]
- Updated the captcha string from "Attention Required! | Cloudflare" to "Cloudflare" to accommodate possible localizations by Cloudflare [suggested by ticket:33010#comment:25]
Added Let's Encrypt issued SSL certificates to the bypass subdomains on the domains
Added a Let's Encrypt issued SSL certificate to my IRC bouncer
Switched to the Docker versions of the modules/software used in the project
Switched from a Grafana dashboard to a Metabase dashboard to visualize collected data
Switched to using an SQLite database to store collected data. Previously, influxdb was used and it was a very cumbersome process to export data to other formats. Now, the SQLite database can be easily exported to other formats.
Added an SQLite example to the base project code
Created the template for the Read the Docs documentation for the project
- Connected the Read the Docs page to GitHub via webhooks to automate documentation generation process
- https://captcha-monitor.readthedocs.io/
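
As a sketch of how the shortened captcha string and the SQLite storage mentioned above could fit together, the example below marks a measurement as a CAPTCHA when the marker appears in the page title, stores it in an in-memory SQLite database, and exports the table to CSV. Checking the page title, the table schema, and the function names are all assumptions for illustration.

```python
import csv
import io
import sqlite3

CAPTCHA_MARKER = "Cloudflare"  # shortened to tolerate localized CAPTCHA pages


def has_captcha(page_title):
    """True if the marker appears anywhere in the page title (assumed check)."""
    return CAPTCHA_MARKER in page_title


def store(conn, url, page_title):
    """Insert one measurement row."""
    conn.execute(
        "INSERT INTO results (url, captcha) VALUES (?, ?)",
        (url, int(has_captcha(page_title))),
    )


def export_csv(conn):
    """Dump the results table to CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["url", "captcha"])
    writer.writerows(conn.execute("SELECT url, captcha FROM results ORDER BY id"))
    return buf.getvalue()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (id INTEGER PRIMARY KEY, url TEXT, captcha INTEGER)")
store(conn, "https://captcha.wtf", "Attention Required! | Cloudflare")
store(conn, "https://captcha.wtf", "Example page")
csv_text = export_csv(conn)
```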

🔲 Week Ahead:

Making the collected data downloadable
Having a fully working (hopefully dockerized) proof of concept
- I already had one working, but it was very poorly implemented since I was trying to do my university work at the same time
Creating better documentation for the code I have at the moment

🛑 Current Blockers: None

May 18

☑️ Past Week: I spent some of my time setting up the IRC “bouncer” infrastructure to receive IRC messages all the time. I started talking to the OONI people about my project. I also took a very long rescue flight to return home from my university location. Meanwhile, I had finals, and I’m done with my final exams, finally.

🔲 Week Ahead: I plan to actually open the trac tickets to define individual tasks for my project. I planned to do it last week, but I couldn’t do it because of the last-minute developments in my life. I will also keep discussing the details of my project with the external researchers I mentioned.

🛑 Current Blockers: None

May 11

☑️ Past Week: I spent my time getting used to IRC and getting to know my mentors. I wrote a wiki article on the Tor Project’s trac to explain my project. The wiki article can be found here. My mentors introduced me to a few external researchers that might be helpful for my project. My previous week’s blog post can be found here.

🔲 Week Ahead: I plan to open trac tickets to define individual tasks for my project so that the wider community can comment on them and watch the progress. I will also discuss the details of my project with the external researchers I mentioned.

🛑 Current Blockers: I have my university finals this week. They don’t really block my progress but they do slow it down.
