Encoding detection results in performance difference with other clients #1811
Comments
Not sure if … If you need better performance and you know the page encoding, you could use …
I easily fixed the problem by using …
@serathius, could you check whether cchardet is installed and uses its C extension?
The problem is: if … But a documentation update mentioning the possible performance issue … @serathius, would you make a PR for the doc update?
@asvetlov Yes, after some more research and experiments with chardet …
Do we have anything actionable for this ticket?
Long story short
When using aiohttp to fetch pages, I ran into strange performance problems. We compared timings against other clients such as curl and requests, and the difference was significant: with the other clients, fetching a page was about 6 times faster than with aiohttp. After some digging, we found that the cause was the "text" method, which uses chardet for encoding detection.
Requests also uses chardet, but it skips detection when the Content-Type has no charset and contains the word "text", falling back to 'ISO-8859-1' instead: https://github.com/kennethreitz/requests/blob/master/requests/utils.py#L362
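The difference between the two policies can be sketched in a few lines. This is a minimal, stdlib-only sketch, not the actual aiohttp or requests code: detect_first_encoding and header_first_encoding are hypothetical names, the detector is passed in as a plain callable (chardet's detect(body)["encoding"] has that shape), header parsing uses Python's email.message.Message as a stand-in for each library's own parser, and the "utf-8" fallback when detection gives up is an assumption. What it shows is that the header-first policy never runs the detector for a text/* response without a charset, so it skips a pass over the whole body.

```python
from email.message import Message
from typing import Callable, Optional, Tuple

# Hypothetical detector type: takes the raw body, returns an encoding name or None.
# chardet's detect(body)["encoding"] has this shape; any stand-in works here.
Detector = Callable[[bytes], Optional[str]]


def _charset_and_type(content_type: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (charset, media type) parsed from a Content-Type value, or (None, None)."""
    if not content_type:
        return None, None
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases the charset; get_content_type() the media type.
    return msg.get_content_charset(), msg.get_content_type()


def detect_first_encoding(content_type: str, body: bytes, detect: Detector) -> str:
    """Charset from the header if present, otherwise run the detector over the body."""
    charset, _ = _charset_and_type(content_type)
    if charset:
        return charset
    return detect(body) or "utf-8"  # assumed fallback when detection gives up


def header_first_encoding(content_type: str, body: bytes, detect: Detector) -> str:
    """Same as above, but text/* without a charset defaults to ISO-8859-1."""
    charset, media_type = _charset_and_type(content_type)
    if charset:
        return charset
    if media_type and "text" in media_type:
        return "ISO-8859-1"  # no detection pass over the body at all
    return detect(body) or "utf-8"  # assumed fallback when detection gives up
```

The ISO-8859-1 default matches what RFC 2616 (section 3.7.1) prescribed for text/* media types received over HTTP. It avoids the detection cost, but a UTF-8 page served as plain "text/html" will then be decoded as Latin-1.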
Expected behaviour
Behavior matching that of other popular clients.
Actual behaviour
A huge performance hit for pages without an explicit encoding, for example "Content-Type: text/html" with no charset parameter.
For 300 kB pages, the time with and without encoding detection is 8 s versus 2 s respectively (using the "text" method instead of "read"); for a 1.3 MB page it is 33 s versus 4.5 s.
Steps to reproduce
I'm sorry, I cannot disclose the page I used for testing.
Your environment
I tested it in two environments:
Linux 4.8 Ubuntu 16.10 Python 3.6
aiohttp==1.0.5
chardet==2.3.0
Linux 4.8 Ubuntu 16.10 Python 3.5.2
aiohttp==2.0.6
chardet==3.0.1
In this second environment, the problem was 30% smaller.