Future exception was never retrieved when closing page waiting for download event #823

tcrs · 2021-07-27T17:35:23Z

I'm using Playwright to download PDFs of URLs from RSS feeds. Some of the URLs are direct links to PDFs (mixed with links to "normal" webpages), and I'd like to handle those by downloading the PDFs. I have a working implementation, and I've included a minimal(ish) version below which accepts a URL and a filename to write to. You can try it like this, for example (where script.py contains the code below):

Convert a web page to PDF: python3 script.py https://arxiv.org/abs/1912.11035 a.pdf
Download a PDF: python3 script.py https://arxiv.org/pdf/1912.11035 b.pdf

The first example (converting a webpage to a PDF) outputs this:

goto success: https://arxiv.org/abs/1912.11035
download exception: https://arxiv.org/abs/1912.11035: Page closed
Future exception was never retrieved
future: <Future finished exception=Error('Target page, context or browser has been closed')>
playwright._impl._api_types.Error: Target page, context or browser has been closed

I can't figure out how to stop the "Future exception was never retrieved" warning from being printed. As you can see, the "Page closed" exception is caught by the exception handler around await download_task.

Am I doing something wrong? Or is this an issue in the playwright code?
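
For context, the warning text comes from asyncio itself: a future that finishes with an exception logs "Future exception was never retrieved" when it is garbage-collected without anything having read that exception. The following is a minimal sketch in plain asyncio with no Playwright involved. The names demo and run_demo and the RuntimeError stand-in are illustrative assumptions, and the sketch does not show where such a future is created in the script below.

```python
import asyncio
import gc


async def demo():
    loop = asyncio.get_running_loop()
    messages = []
    # Record what asyncio would otherwise write to the log.
    loop.set_exception_handler(lambda _loop, ctx: messages.append(ctx["message"]))

    # Case 1: the future fails and nothing ever reads its exception.
    orphan = loop.create_future()
    orphan.set_exception(RuntimeError("stand-in for 'page closed'"))
    del orphan      # drop the last reference
    gc.collect()    # make sure the future is finalized now

    # Case 2: the exception is read, so asyncio has nothing to report.
    handled = loop.create_future()
    handled.set_exception(RuntimeError("stand-in for 'page closed'"))
    handled.exception()   # marks the exception as retrieved
    del handled
    gc.collect()

    return messages


def run_demo():
    return asyncio.run(demo())


if __name__ == "__main__":
    print(run_demo())
```

Only the first future produces a message. Awaiting a future inside try/except also counts as retrieving its exception, so a warning that persists after the await suggests a second future that the script never touches.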

import sys
import asyncio
from playwright.async_api import async_playwright

async def download(url, filename):
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        context = await browser.new_context(accept_downloads = True, java_script_enabled = False)
        page = await context.new_page()
        success = False  # initialised so the final check works when both attempts fail

        download_task = asyncio.create_task(page.wait_for_event('download'))
        goto_task = asyncio.create_task(page.goto(url, wait_until='networkidle'))
        try:
            await goto_task
            await page.pdf(path=filename)
            print('goto success: ' + url)
            await page.close()
            success = True
        except Exception as e:
            print('goto exception: {}: {}'.format(url, e))

        try:
            download = await download_task
            await download.save_as(filename)
            print('download success: ' + url)
            await page.close()
            success = True
        except Exception as e:
            print('download exception: {}: {}'.format(url, e))

        if not success:
            await page.close()

if __name__ == '__main__':
    asyncio.run(download(sys.argv[1], sys.argv[2]))
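
One way to keep the script's own tasks tidy is to cancel the pending download waiter once navigation succeeds, and then await it so that nothing is left unretrieved. The sketch below shows that pattern in plain asyncio. fake_goto, fake_wait_for_download, drain, fetch and run_fetch are hypothetical stand-ins, not Playwright APIs, and whether this also silences a warning raised from inside Playwright is not something the sketch can show.

```python
import asyncio


async def fake_goto(is_pdf):
    # Hypothetical stand-in for page.goto(): fails when the URL is a download.
    await asyncio.sleep(0.01)
    if is_pdf:
        raise RuntimeError("navigation aborted by download")
    return "page loaded"


async def fake_wait_for_download(is_pdf):
    # Hypothetical stand-in for page.wait_for_event('download').
    if not is_pdf:
        await asyncio.sleep(3600)  # the event never fires for a normal page
    await asyncio.sleep(0.01)
    return "download"


async def drain(task):
    # Cancel the task, then wait for it; gather(..., return_exceptions=True)
    # collects a CancelledError or any other error instead of raising it.
    task.cancel()
    await asyncio.gather(task, return_exceptions=True)


async def fetch(is_pdf):
    download_task = asyncio.create_task(fake_wait_for_download(is_pdf))
    try:
        result = await fake_goto(is_pdf)
    except RuntimeError:
        # Navigation failed, so fall back to the download waiter.
        return await download_task
    # Navigation succeeded: the waiter is no longer needed.
    await drain(download_task)
    return result


def run_fetch(is_pdf):
    return asyncio.run(fetch(is_pdf))


if __name__ == "__main__":
    print(run_fetch(False), run_fetch(True))
```

The design choice here is to decide the outcome of navigation first and only then either consume or discard the waiter, so every task the script creates is awaited exactly once.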

mxschmitt · 2021-07-27T21:16:52Z

This looks like unexpected behaviour. Since it concerns the logic that is being rewritten in #820, I will check it once #820 is fixed (soon). Thanks for the great reproduction!

tcrs · 2021-07-28T19:46:23Z

Thanks for the quick fix!

mkbdes · 2022-08-09T22:25:47Z

Your code:

    except Exception as e:
        print('download exception: {}: {}'.format(url, e))

To remove the warning:

    except:
        print('download exception: {}'.format(url))

mxschmitt added the P2-bug label Jul 27, 2021

mxschmitt mentioned this issue Jul 27, 2021

fix: migrate to new waitForEventInfo handling #824

Merged

mxschmitt closed this as completed in #824 Jul 28, 2021
