Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

issue 2176 Time service removal #2182

Closed

Conversation

greg-barnett
Copy link

removed TimeService, which had bugs causing it to return very incorrect dates.

It has been mentioned that TimeService was written as a performance optimization, but I have not been able to recreate any conditions in which the very small performance improvement would be noticeable.

Greg Barnett added 5 commits August 7, 2017 12:24
The main bug was that ceil was called on time returned by loop.time()
before being used to schedule the next iteration of the callback that
updated the TimeService.  The TimeService was then incrementing its
stored time by one second.  This resulted in the time being updated by
one second about every 1.5 seconds, quickly causing it to fall behind.

There was code to reset TimeService's internal time every 10 minutes,
but that has been removed.  Benchmarking has shown that just calling
time.time() has no statistically valid performance difference than the
code that checked for the need to reset.  Even if it were slightly
slower, it is only called once per second. Also, the counter that was
used to compare to the reset limit was never incremented, so the reset
never actually ever happened.

I was tempted to remove the properties for accessing time and loop time,
as time.time() and loop.time() are not expensive calls.  But I didn't
want to break the API for anyone using this.  The only significant
performance gain I was able to find was with strtime.  My use case
doesn't push aiohttp very hard, so I can't replicate whatever
performance issue TimeService was intended to solve.
More simplification of TimeService, only used to cache formatted date
slightly faster and cleaner Date formatting code
removed TimeService in favor of a simple strftime call
@fafhrd91
Copy link
Member

fafhrd91 commented Aug 8, 2017

-1 we should cache date value

@codecov-io
Copy link

codecov-io commented Aug 8, 2017

Codecov Report

Merging #2182 into master will decrease coverage by <.01%.
The diff coverage is 87.09%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2182      +/-   ##
==========================================
- Coverage   97.12%   97.11%   -0.01%     
==========================================
  Files          39       39              
  Lines        7884     7864      -20     
  Branches     1366     1368       +2     
==========================================
- Hits         7657     7637      -20     
  Misses        101      101              
  Partials      126      126
Impacted Files Coverage Δ
aiohttp/test_utils.py 98.59% <ø> (-0.03%) ⬇️
aiohttp/web_server.py 100% <ø> (ø) ⬆️
aiohttp/web.py 99.65% <ø> (ø) ⬆️
aiohttp/hdrs.py 100% <100%> (ø) ⬆️
aiohttp/web_protocol.py 88.27% <100%> (+0.46%) ⬆️
aiohttp/web_request.py 99.67% <100%> (+0.66%) ⬆️
aiohttp/web_response.py 98.69% <100%> (-0.02%) ⬇️
aiohttp/helpers.py 96.33% <82.6%> (-1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dda632b...ba3b7ab. Read the comment docs.

@greg-barnett
Copy link
Author

@fafhrd91 If my benchmarking is correct, caching the formatted date string saves approximately
0.00000142 seconds per request (will vary on hardware of course). That is a small enough amount of time that I don't think it will ever be noticed.

assert time_service._strtime is None
assert time_service._time > time
assert time_service._count == 0
def time_service():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed some dead code in tests, have removed it.

@fafhrd91
Copy link
Member

fafhrd91 commented Aug 8, 2017

oh, so that means all my assumptions about performance was wrong. all my performance related work is probably wrong as well, nice :)

@asvetlov comments?

@greg-barnett
Copy link
Author

@fafhrd91 Do you have any notes on the benchmarking/profiling you did? If you do, I can try to replicate your results. There might be some case where time.time() or time.gmtime() is slow, in which case TimeService/caching would make sense.

@fafhrd91
Copy link
Member

fafhrd91 commented Aug 8, 2017

I don't have any notes, that was relatively long time ago. but @asvetlov came to similar conclusion, he added TimeService.

@greg-barnett
Copy link
Author

greg-barnett commented Aug 8, 2017

If it matters, the strftime format I'm using is more performant than the one used in TimeService. Don't have the numbers handy right now, but it ran about 30-40% faster. After I tried that, I noticed that the same format was used elsewhere, so I made it into a constant. hdrs.RFC822_FORMAT

@fafhrd91
Copy link
Member

fafhrd91 commented Aug 8, 2017

I am fine with removing cache. just waiting for @asvetlov to approve.

@asvetlov
Copy link
Member

asvetlov commented Aug 10, 2017

IIRC adding time service reduced response time by at about 10% for cases like

async def handler(request):
    return web.Response('OK')

Your number is 0.00000142 sec per request for time service disabled.
Now aiohttp server shows about 16_000 RPS for handling trivial requests.
0.00000142 * 16000 == 0.02272, about 2% of response time is spent on date string formatting.

It's much less than 10% but still significant degradation.
Honestly I don't recall what numbers was at time of adding time service.

But your proposal to use RFC822_FORMAT = '%a, %d %b %Y %H:%M:%S GMT' as time format is wrong.
Formatting used by aiohttp is borrowed from wsgiref module IIRC.
It explicitly uses English names for weekdays and months but %a is Locale’s abbreviated weekday name.
On servers with non-English locale it produces incorrect string.

Maybe we need resurrection of #2180 but I still don't like global vars, we could rewrite TimeService class to use the approach -- yet another syscall during response processing is cheap but string formatting is expensive.

@fafhrd91
Copy link
Member

I think global var is fine, it is private. Removing time service reduces amount of doc we need

@asvetlov
Copy link
Member

We don't mention TimeService in docs, it's totally private API.
Why not just drop updating on call_later and use time.time?

Anyway, if @fafhrd91 @socketpair and other committers prefer globals in this particular case -- I can live with it.
#2180 changes private API only, we could change it again without breaking our users in next releases.

@fafhrd91
Copy link
Member

fafhrd91 commented Aug 10, 2017

it is part of WebRequest api, technically we should document it.

@asvetlov
Copy link
Member

asvetlov commented Aug 10, 2017

@fafhrd91 really? Why?
I have strong feeling either TimeService nor get_formatted_datetime from #2180 should be documented.
We could add underscore prefix to names if needed but users should never import these names explicitly -- we don't provide them in __all__ for aiohttp and aiohttp.web.

@socketpair
Copy link
Contributor

@greg-barnett Could you benchmark old and new code in same environment ?
0.00000142 got from one computer, and 16_000 RPS got from another.

@fafhrd91
Copy link
Member

@asvetlov time_service is public attribute.

I'd prefer to replace class with one function if we touch it in any case. Amount of functionality is too small for whole class. But I am fine with rather decision

@asvetlov
Copy link
Member

@fafhrd91 I see your point now.
After your split web_reqrep.py into web_request.py and web_response.py the history is lost, I'm too lazy for digging deeper.
Maybe I've added the property but maybe you.
Most likely it's my change.

Nevermind, the property should be deleted. Fortunately it's not documented.

I'm fine with rather decision too.

@socketpair please reopen #2180
Wait for @greg-barnett activity for a week (we don't schedule next release yet, a week is good enough for getting feedback).
If he explicitly argue against the proposal -- let's discuss.
If not -- please merge your PR.

@socketpair
Copy link
Contributor

socketpair commented Aug 10, 2017

It's better to fix current PR (add caching) than edit mine. Well, I will mix them up tomorrow.

@greg-barnett
Copy link
Author

I'm probably not going to have time to work on this for at least a few days.

If the solution is going to involve global variables, I'd also prefer that someone else do the commit. I don't want someone to see a global, do a a git blame, and then go away with a bad opinion of me :)

@asvetlov I missed that it was using Locale's names. So, in addition to TimeService, the existing code for headers where I replaced format string with the constant will also need to be fixed.

I suspect that the proper way to do this is something like the following (completely untested)

class TimeService:
    def __init__(self):
        if locale.get_locale(LC_TIME) in LOCALES_WITH_ENGLISH_MONTH_AND_DAY_NAMES:
            self._format_method = self._rfc822_fast
        else:
            self._format_method = self._rfc822_safe

        self._formatted_now = None
        self._time = None

    def rfc822_format(timestamp):
        return self._format_method(timestamp)

    def rfc822_now():
         now = time.time()
        if now != self._time:
            self._time = now
            self._formatted_now = self._format_method(self._time)
        return self._formatted_now

    def _rfc822_fast(timestamp):
         return strtime(RFC822_FORMAT, timestamp)

    def _rfc822_safe(timestamp):
         # use the slightly slower formatting code from existing TimeService

@asvetlov
Copy link
Member

@greg-barnett I like your sketch.
Let's modify time service behavior but not add global vars.
I hate them, @fafhrd91 agree with any decision, @socketpair has no strong objection at least.
Perhaps we need a new PR.
@greg-barnett @socketpair I hope you'll find a time for working on it soon.
I love to get the problem fixed but personally prefer to work on other issues.

@socketpair
Copy link
Contributor

I have nechmarked .strftime() approach and one, used in original code (with tuple lookup). And guess what ? .strftime() is SLOWER (!)

In [29]: %time xxx()
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 28.1 µs
%time time.strftime('%a, %d %b %Y %H:%M:%S GMT')
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 38.6 µs
def xxx():
   ...:     year, month, day, hh, mm, ss, wd, y, z = time.gmtime(time.time())
   ...:     return "%s, %02d %3s %4d %02d:%02d:%02d GMT" % (_weekdayname[wd], da
   ...: y, _monthname[month], year, hh, mm, ss)

@socketpair
Copy link
Contributor

I updated my PR with code borrowed from @greg-barnett. This PR probably could be closed.

@greg-barnett
Copy link
Author

@socketpair your PR doesn't cover the additional issue we've discovered: the locale dependent format being used for headers was already being used for last-modified headers in aiohttp/webresponse.py

This is what I believe to be the requirements to fix this issue:

  • Properly format all date/time header values to match rfc822.
  • Always use the current time for the Date header.
  • Either improve performance, or have only a trivial degradation.

These are not requirements, but they should probably be followed anyway:

  • not introduce any global variables
  • the code should be simpler if at all possible

My current plan, which I hope to have completed within the next 2 days.

Write benchmarking code that tests:

  • getting an rfc822 formatted time for the Date header
  • getting an rfc822 formatted time for the Last-Modified header, passing in a timestamp
  • how many requests per second for a GET request that returns Date header
  • how many requests per second for a GET request that returns both Date and Last-Modified headers

Run the benchmarking code twice, once in en_US, once in a locale where "%a" breaks rfc822 format.

I think my solution is going to be similar to the TimeService I posted in my last comment, but make the class a singleton (with a different name, probably RFC822DateFormatter), and have one public method in helpers.py that calls it. I'm thinking a singleton is less ugly than private global variables.

@greg-barnett
Copy link
Author

greg-barnett commented Aug 14, 2017

I created a very simple server, and ran apache bench against it.

All variations tried came in around 2,000 requests per second, with a fairly large amount of variation.

Code that attempted to pick a formatting strategy based on locale did not increase performance noticeably.

My latest version uses class variables, which are probably just as bad as global variables. @socketpair's code, if expanded to fix Last-Modified headers, is probably the way to go.

If my current approach is the way you want to go, I'll add some unit tests.

@asvetlov
Copy link
Member

asvetlov commented Oct 5, 2017

I believe it's fixed by #2180

@asvetlov
Copy link
Member

asvetlov commented Oct 5, 2017

@socketpair ?

@asvetlov asvetlov closed this Oct 5, 2017
@socketpair
Copy link
Contributor

Yes, TimeService is removed, so bug is closed. But I did not check if caching of datetime may be removed completely. Maybe it saves nano-seconds. I don't know.

@lock
Copy link

lock bot commented Oct 28, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a [new issue] for related bugs.
If you feel like there's important points made in this discussion, please include those exceprts into that [new issue].
[new issue]: https://github.com/aio-libs/aiohttp/issues/new

@lock lock bot added the outdated label Oct 28, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Oct 28, 2019
@psf-chronographer psf-chronographer bot added the bot:chronographer:provided There is a change note present in this PR label Oct 28, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bot:chronographer:provided There is a change note present in this PR outdated
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants