ScanEmail: Restrict Access in WeasyPrint #459

Closed
wants to merge 4 commits into from

phutelmyer
Contributor

@phutelmyer phutelmyer commented May 23, 2024

Add local_fetch_only Function to Restrict External Network Access in WeasyPrint

Description

This PR introduces a custom URL fetcher function, local_fetch_only, for WeasyPrint. It prevents external network access while WeasyPrint fetches resources: only local file paths, base64 encoded data, and relative URLs are allowed. URLs with any other scheme, including HTTP, HTTPS, and FTP (and therefore URLs that point at raw IP addresses), are blocked. Previously, WeasyPrint was observed making external calls for resources such as CSS files; these calls should be restricted.

Implementation

The local_fetch_only function is designed to:

  • Allow Base64 Encoded Data: URLs with the data scheme.
  • Allow Local File Paths: URLs with the file scheme.
  • Allow Relative URLs: URLs without a scheme.

For all other URL schemes (e.g., http, https, ftp), the function returns an empty response, effectively blocking the request.
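The scheme check described above can be illustrated with the standard library alone. This is a minimal sketch, not the PR's code: the helper name is_allowed and the sample URLs are made up for illustration, and the only assumption is that the decision depends solely on the scheme reported by urlparse.

```python
from urllib.parse import urlparse

# Schemes the fetcher is meant to let through; '' means "no scheme" (relative URL).
ALLOWED_SCHEMES = {"data", "file", ""}


def is_allowed(url):
    """Return True when the URL's scheme is one of the allowed local schemes."""
    return urlparse(url).scheme in ALLOWED_SCHEMES


# Hypothetical sample URLs, chosen only to show how each kind is classified.
samples = [
    "data:image/png;base64,iVBORw0KGgo=",
    "file:///tmp/a.png",
    "images/logo.png",
    "https://example.com/a.css",
    "ftp://example.com/a",
    "http://192.0.2.10/a.css",
]

for url in samples:
    print(("allow" if is_allowed(url) else "block"), url)
```

Note that a URL pointing at an IP address is blocked because of its http scheme, not because the host is an IP address.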

Code

from urllib.parse import urlparse
from weasyprint import default_url_fetcher

def local_fetch_only(url, *args, **kwargs):
    """
    Custom URL fetcher for WeasyPrint that prevents any external network access.

    This function allows only local file paths, base64 encoded data, and relative URLs. It blocks all other URLs,
    including HTTP, HTTPS, FTP, and IP addresses, ensuring that no external network access occurs during the fetching
    process.

    Args:
        url (str): The URL to fetch.
        *args: Additional positional arguments.
        **kwargs: Additional keyword arguments.

    Returns:
        dict: A dictionary containing an empty byte string for 'string', 'text/plain' for 'mime_type', and 'utf8' for
              'encoding' if the URL is blocked. Otherwise, the result of the default fetcher for local resources.
    """
    parsed_url = urlparse(url)

    # Allow base64 encoded data, local file paths, or relative URLs
    if parsed_url.scheme in ('data', 'file', ''):
        return default_url_fetcher(url, *args, **kwargs)

    # Block every other scheme (http, https, ftp, etc.), including URLs that point at IP addresses
    return {
        'string': b'',  # WeasyPrint documents 'string' as a byte string
        'mime_type': 'text/plain',
        'encoding': 'utf8'
    }
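For reference, one way such a fetcher might be wired into a render call is sketched below. It is not the code from this PR: make_local_fetcher and render_pdf are hypothetical names, the delegate is injected so the blocking logic can be exercised without WeasyPrint installed, and the render step assumes the url_fetcher keyword of weasyprint.HTML as it existed when this PR was opened.

```python
from urllib.parse import urlparse

ALLOWED_SCHEMES = ("data", "file", "")


def blocked_response():
    """Empty placeholder returned instead of performing a network request."""
    return {"string": b"", "mime_type": "text/plain", "encoding": "utf8"}


def make_local_fetcher(delegate):
    """Wrap a fetcher so that only data:, file: and scheme-less URLs reach it."""
    def fetcher(url, *args, **kwargs):
        if urlparse(url).scheme in ALLOWED_SCHEMES:
            return delegate(url, *args, **kwargs)
        return blocked_response()
    return fetcher


def render_pdf(html_text, base_url=None):
    """Render HTML to PDF bytes with the restricted fetcher (needs WeasyPrint)."""
    # Imported lazily so the helpers above stay usable without WeasyPrint.
    from weasyprint import HTML, default_url_fetcher

    fetcher = make_local_fetcher(default_url_fetcher)
    return HTML(string=html_text, base_url=base_url, url_fetcher=fetcher).write_pdf()
```

Injecting the delegate keeps the allow/block decision testable with a fake fetcher that records which URLs it was asked for.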

Reasoning

The primary motivation for this implementation is security. By blocking external network requests, we ensure that WeasyPrint cannot inadvertently leak data or fetch resources from untrusted sources. Some remote resources will no longer load, but the result is safer.

Control

Allowing only local file paths and base64 encoded data provides fine-grained control over the resources that can be accessed. Relative URLs are permitted to ensure that internal resources can still be referenced without specifying the full URL.
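One edge case in treating "no scheme" as local is the scheme-relative URL (for example //cdn.example.com/style.css): urlparse reports an empty scheme for it, yet it names a remote host. A file URL can likewise carry a host name. Whether WeasyPrint ever passes such a URL to the fetcher unresolved has not been verified here, so the stricter check below is only a sketch under that assumption; is_strictly_local and the sample hosts are hypothetical.

```python
from urllib.parse import urlparse


def is_strictly_local(url):
    """Allow data: URLs, plus file: and scheme-less URLs that name no host."""
    parsed = urlparse(url)
    if parsed.scheme == "data":
        return True
    if parsed.scheme in ("file", ""):
        # A non-empty netloc means the URL names a host, e.g. '//cdn.example.com/x'.
        return not parsed.netloc
    return False


for url in [
    "/images/local_image.png",
    "file:///path/to/local/image.png",
    "//cdn.example.com/style.css",
    "file://fileserver/share/image.png",
]:
    print(("allow" if is_strictly_local(url) else "block"), url)
```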

Use Cases

  1. Base64 Data URL
<img src="">
  2. Local File URL
<img src="file:///path/to/local/image.png">
  3. Relative URL
<img src="/images/local_image.png">

Describe testing procedures
An additional test with an eml file was created to exercise the retrieval of external (fake) resources. The test produces a thumbnail of an image without additional tags.
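The test added in this PR is not reproduced here. As a rough illustration of the idea, the sketch below builds a small email in memory with fake external resources, extracts its HTML part, and lists the resource URLs the scheme check would block. Everything in it (the sample message, the fake.example host, the helper names) is hypothetical, and it checks only the URL classification, not thumbnail rendering.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage
from html.parser import HTMLParser
from urllib.parse import urlparse

ALLOWED_SCHEMES = ("data", "file", "")

SAMPLE_HTML = (
    "<html><head>\n"
    '<link rel="stylesheet" href="https://fake.example/style.css">\n'
    "</head><body>\n"
    '<img src="file:///tmp/local.png">\n'
    '<img src="http://fake.example/pixel.gif">\n'
    "</body></html>\n"
)


class ResourceCollector(HTMLParser):
    """Collect src/href values from <img> and <link> tags."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag in ("img", "link"):
            for name, value in attrs:
                if name in ("src", "href") and value:
                    self.urls.append(value)


def build_sample_eml():
    """Return the bytes of a made-up multipart email with an HTML body."""
    msg = EmailMessage()
    msg["Subject"] = "fetch test"
    msg["From"] = "sender@example.com"
    msg["To"] = "rcpt@example.com"
    msg.set_content("plain fallback")
    msg.add_alternative(SAMPLE_HTML, subtype="html")
    return msg.as_bytes()


def blocked_urls(eml_bytes):
    """List resource URLs in the email's HTML part that the check would block."""
    msg = message_from_bytes(eml_bytes, policy=policy.default)
    body = msg.get_body(preferencelist=("html",))
    collector = ResourceCollector()
    collector.feed(body.get_content())
    return [u for u in collector.urls if urlparse(u).scheme not in ALLOWED_SCHEMES]


print(blocked_urls(build_sample_eml()))
```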

Sample output
If this change modifies Strelka's output, then please include a sample of the output here.

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of and tested my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings

@phutelmyer phutelmyer requested a review from skalupa May 23, 2024 15:28
@phutelmyer
Contributor Author

Closing in favor of #462.

@phutelmyer phutelmyer closed this Jun 28, 2024