Future exception was never retrieved when closing page waiting for download event #823

Closed
tcrs opened this issue Jul 27, 2021 · 3 comments · Fixed by #824

tcrs commented Jul 27, 2021

I'm using Playwright to download PDFs of URLs from RSS feeds. Some of the URLs are direct links to PDFs (mixed in with links to "normal" web pages), and I'd like to handle those by downloading the PDF. I have an implementation that works, and I've included a minimal(ish) version below which accepts a URL and a filename to write to. You can try it like this (where script.py contains the code below):

Convert a web page to PDF: python3 script.py https://arxiv.org/abs/1912.11035 a.pdf
Download a PDF: python3 script.py https://arxiv.org/pdf/1912.11035 b.pdf

The first example (converting a webpage to a PDF) outputs this:

goto success: https://arxiv.org/abs/1912.11035
download exception: https://arxiv.org/abs/1912.11035: Page closed
Future exception was never retrieved
future: <Future finished exception=Error('Target page, context or browser has been closed')>
playwright._impl._api_types.Error: Target page, context or browser has been closed

I can't figure out how to stop the "Future exception was never retrieved" warning from being printed. As you can see, the "Page closed" exception has been caught by the exception handler around await download_task.

Am I doing something wrong? Or is this an issue in the playwright code?

import sys
import asyncio
from playwright.async_api import async_playwright

async def download(url, filename):
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        context = await browser.new_context(accept_downloads = True, java_script_enabled = False)
        page = await context.new_page()
        success = False  # set to True once either the PDF render or the download succeeds

        download_task = asyncio.create_task(page.wait_for_event('download'))
        goto_task = asyncio.create_task(page.goto(url, wait_until='networkidle'))
        try:
            await goto_task
            await page.pdf(path=filename)
            print('goto success: ' + url)
            await page.close()
            success = True
        except Exception as e:
            print('goto exception: {}: {}'.format(url, e))

        try:
            download = await download_task
            await download.save_as(filename)
            print('download success: ' + url)
            await page.close()
            success = True
        except Exception as e:
            print('download exception: {}: {}'.format(url, e))

        if not success:
            await page.close()

if __name__ == '__main__':
    asyncio.run(download(sys.argv[1], sys.argv[2]))
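For readers hitting the same message: the text "Future exception was never retrieved" is produced by asyncio itself when a future that finished with an exception is garbage-collected without anyone having read that exception. The following is a minimal sketch of that mechanism using only the standard library. It is not Playwright's code and not the fix from #824; the function name `collect_warnings` and the `RuntimeError("Target closed")` are hypothetical stand-ins chosen for illustration.

```python
import asyncio
import gc


async def collect_warnings():
    """Return the messages asyncio reports for unretrieved future exceptions."""
    loop = asyncio.get_running_loop()
    reported = []
    # Capture what asyncio would otherwise log, so the effect is observable.
    loop.set_exception_handler(lambda _loop, ctx: reported.append(ctx["message"]))

    # Case 1: a future fails and nothing ever reads its result or exception.
    orphan = loop.create_future()
    orphan.set_exception(RuntimeError("Target closed"))  # hypothetical error
    del orphan
    gc.collect()  # the report is emitted when the future is destroyed

    # Case 2: the same failure, but the exception is retrieved before the
    # future goes away, so asyncio has nothing to report.
    handled = loop.create_future()
    handled.set_exception(RuntimeError("Target closed"))
    handled.exception()  # marks the exception as retrieved
    del handled
    gc.collect()

    return reported


if __name__ == "__main__":
    print(asyncio.run(collect_warnings()))
```

Only the first future is reported. That matches the output above: the exception raised by await download_task was caught, so the warning presumably refers to a different, internal future whose exception nothing retrieved.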

@mxschmitt
Member

This looks like unexpected behaviour, but since it concerns logic that is being rewritten in #820, I will check it once #820 is fixed (soon). Thanks for the great reproducible example!

Author

tcrs commented Jul 28, 2021

Thanks for the quick fix!


mkbdes commented Aug 9, 2022

Your code:

  1.     except Exception as e:
             print('download exception: {}: {}'.format(url, e))

To remove the warning:

  2.     except:
             print('download exception: {}'.format(url))

I'm using Playwright to download PDFs of URLs from RSS feeds. Some of the URLs are direct links to PDFs (mixed in with links to "normal" web pages), and I'd like to handle those by downloading the PDF. I have an implementation that works, and I've included a minimal(ish) version below which accepts a URL and a filename to write to. You can try it like this (where script.py contains the code below):

Convert a web page to PDF: python3 script.py https://arxiv.org/abs/1912.11035 a.pdf
Download a PDF: python3 script.py https://arxiv.org/pdf/1912.11035 b.pdf

The first example (converting a web page to a PDF) outputs this:

goto success: https://arxiv.org/abs/1912.11035
download exception: https://arxiv.org/abs/1912.11035: Page closed
Future exception was never retrieved
future: <Future finished exception=Error('Target page, context or browser has been closed')>
playwright._impl._api_types.Error: Target page, context or browser has been closed

I can't figure out how to stop the "Future exception was never retrieved" warning from being printed. As you can see, the "Page closed" exception has been caught by the exception handler around await download_task.

Am I doing something wrong? Or is this an issue in the Playwright code?

import sys
import asyncio
from playwright.async_api import async_playwright

async def download(url, filename):
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        context = await browser.new_context(accept_downloads = True, java_script_enabled = False)
        page = await context.new_page()
        success = False  # set to True once either the PDF render or the download succeeds

        download_task = asyncio.create_task(page.wait_for_event('download'))
        goto_task = asyncio.create_task(page.goto(url, wait_until='networkidle'))
        try:
            await goto_task
            await page.pdf(path=filename)
            print('goto success: ' + url)
            await page.close()
            success = True
        except Exception as e:
            print('goto exception: {}: {}'.format(url, e))

        try:
            download = await download_task
            await download.save_as(filename)
            print('download success: ' + url)
            await page.close()
            success = True
        except Exception as e:
            print('download exception: {}: {}'.format(url, e))

        if not success:
            await page.close()

if __name__ == '__main__':
    asyncio.run(download(sys.argv[1], sys.argv[2]))

