Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce memory & redundant work for concurrent TimeZones construction #356

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

NHDaly
Copy link
Contributor

@NHDaly NHDaly commented Aug 19, 2021

This commit improves the thread-local caching scheme introduced in
#344, by sharing TimeZones
across all thread-local caches (so there's only ever one TimeZone
object instance created per process), and reduces redundant work caused
by multiple concurrent Tasks starting to construct the same TimeZone at
the same time.

Before this commit, multiple Tasks that try to construct the same
TimeZone on different threads would have constructed the same object
multiple times. Whereas now, they would share it, via the "Future".


It's difficult to show this effect, but one way to show it is by logging
when a TimeZone is constructed, via this diff:

diff --git a/src/types/timezone.jl b/src/types/timezone.jl
index 25d36c3..1cea69e 100644
--- a/src/types/timezone.jl
+++ b/src/types/timezone.jl
@@ -68,6 +68,7 @@ function TimeZone(str::AbstractString, mask::Class=Class(:DEFAULT))
     # Note: If the class `mask` does not match the time zone we'll still load the
     # information into the cache to ensure the result is consistent.
     tz, class = get!(_tz_cache(), str) do
+        Core.println("CONSTRUCTING $str")
         tz_path = joinpath(TZData.COMPILED_DIR, split(str, "/")...)

         if isfile(tz_path)

Before this commit, you can see that every thread constructs the object
twice - once before the clear and once after (total of 8 times):

julia> Threads.nthreads()
4

julia> TimeZones.TZData.compile();

julia> foo() = begin
           @sync for i in 1:20
               if (i == 10)
                   @info "---------clear-------"
                   TimeZones.TZData.compile()
               end
               Threads.@Spawn begin
                   TimeZone("US/Eastern", TimeZones.Class(:LEGACY))
               end
           end
           @info "done"
       end
foo (generic function with 1 method)

julia> @time foo()
[ Info: ---------clear-------
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
[ Info: done
  0.391298 seconds (1.51 M allocations: 64.544 MiB, 2.46% gc time, 0.00% compilation time)

After this commit, it's constructed only twice - once before the clear
and once after:

julia> @time foo()
[ Info: ---------clear-------
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: done
  0.414059 seconds (1.46 M allocations: 61.972 MiB, 4.55% gc time)

Finally, the other problem this avoids is if we ever accidentally
introduce a Task yield inside the constructor, which could happen if we
used @info instead of Core.println(), then without this PR, the old
code could potentially do redundant work to construct the TimeZone
multiple times - even on the same thread - since each Task's constructor
would see that there's no TZ in the cache, start the work, and then
yield to the next Task, which would do the same, until finally they all
report their work into the cache, overwriting each other.

This is what happens if we use @info in the above diff, instead:

Before this commit - the results are nondeterministic, but in this run,
you can see it redundantly constructed the value all 20 times!:

julia> @time foo()
[ Info: ---------clear-------
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: done
  0.494492 seconds (1.55 M allocations: 66.754 MiB, 16.67% gc time)

After this commit, just the two we expect. 😊 :

julia> @time foo()
[ Info: ---------clear-------
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: done
  0.422677 seconds (1.47 M allocations: 62.228 MiB, 4.66% gc time)

@NHDaly
Copy link
Contributor Author

NHDaly commented Aug 19, 2021

Sorry the diff is hard to read; it'll be easier after #344 is merged.

The entirety of this PR is just in this single commit, here: bd68cf9
You can view the diff by clicking that. :)


I think this PR is maybe worth merging separately, since it's nice to see the performance difference and keep it separate from the correctness of the previous PR. Also, we'll want to measure the CPU perf impact to make sure that it's okay before merging this one. I'm not sure how you were measuring it, @omus - can you run that measurement again, for consistency? Thanks!


CC: @Sacha0, @omus.

This commit improves the thread-local caching scheme introduced in
JuliaTime#344, by sharing TimeZones
across _all_ thread-local caches (so there's only ever one TimeZone
object instance created per process), and reduces redundant work caused
by multiple concurrent Tasks starting to construct the same TimeZone at
the same time.

Before this commit, multiple Tasks that try to construct the same
TimeZone on different threads would have constructed the same object
multiple times. Whereas now, they would share it, via the "Future".

------------------------------------------------------------

It's difficult to show this effect, but one way to show it is by logging
when a TimeZone is constructed, via this diff:
```julia
diff --git a/src/types/timezone.jl b/src/types/timezone.jl
index 25d36c3..1cea69e 100644
--- a/src/types/timezone.jl
+++ b/src/types/timezone.jl
@@ -68,6 +68,7 @@ function TimeZone(str::AbstractString, mask::Class=Class(:DEFAULT))
     # Note: If the class `mask` does not match the time zone we'll still load the
     # information into the cache to ensure the result is consistent.
     tz, class = get!(_tz_cache(), str) do
+        Core.println("CONSTRUCTING $str")
         tz_path = joinpath(TZData.COMPILED_DIR, split(str, "/")...)

         if isfile(tz_path)
```

Before this commit, you can see that every thread constructs the object
twice - once before the clear and once after (total of 8 times):
```julia
julia> Threads.nthreads()
4

julia> TimeZones.TZData.compile();

julia> foo() = begin
           @sync for i in 1:20
               if (i == 10)
                   @info "---------clear-------"
                   TimeZones.TZData.compile()
               end
               Threads.@Spawn begin
                   TimeZone("US/Eastern", TimeZones.Class(:LEGACY))
               end
           end
           @info "done"
       end
foo (generic function with 1 method)

julia> @time foo()
[ Info: ---------clear-------
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
[ Info: done
  0.391298 seconds (1.51 M allocations: 64.544 MiB, 2.46% gc time, 0.00% compilation time)
```

After this commit, it's constructed only twice - once before the clear
and once after:
```julia
julia> @time foo()
[ Info: ---------clear-------
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: done
  0.414059 seconds (1.46 M allocations: 61.972 MiB, 4.55% gc time)
```

------------------------------------------------------------------

Finally, the other problem this avoids is if we ever accidentally
introduce a Task yield inside the constructor, which could happen if we
used `@info` instead of `Core.println()`, then without this PR, the old
code could potentially do _redundant work_ to construct the TimeZone
multiple times - even on the same thread - since each Task's constructor
would see that there's no TZ in the cache, start the work, and then
yield to the next Task, which would do the same, until finally they all
report their work into the cache, overwriting each other.

This is what happens if we use `@info` in the above diff, instead:

Before this commit - the results are nondeterministic, but in this run,
you can see it redundantly constructed the value all 20 times!:
```julia
julia> @time foo()
[ Info: ---------clear-------
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: done
  0.494492 seconds (1.55 M allocations: 66.754 MiB, 16.67% gc time)
```

After this commit, just the two we expect. 😊 :
```julia
julia> @time foo()
[ Info: ---------clear-------
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: done
  0.422677 seconds (1.47 M allocations: 62.228 MiB, 4.66% gc time)
```
I'm not really sure if this is necessary because probably no one is
using threading on 1.0 anyway, but i guess they could be doing something
with `@threads`, so better safe than sorry.

I didn't find `@lock` in Compat.jl, so this seemed like the easiest
approach?
src/types/timezone.jl Outdated Show resolved Hide resolved
@NHDaly NHDaly requested a review from omus August 23, 2021 19:27
@NHDaly
Copy link
Contributor Author

NHDaly commented Aug 25, 2021

@omus: I think this PR is ready for review whenever you get time. if there's anyone else who you'd prefer to do the review, just let me know. 👍 🙁

src/types/timezone.jl Outdated Show resolved Hide resolved
# Since we use thread-local caches, we spawn a task on _each thread_ to clear that
# thread's local cache.
Threads.@threads for i in 1:Threads.nthreads()
@assert Threads.threadid() === i "TimeZones.TZData.compile() must be called from the main, top-level Task."
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this guaranteed when calling compile/_reset_tz_cache from the main task?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember / understand the reason, but the Threads.@threads macro only works from the top-level task.

The behavior of @threads is that it evenly divides the for-loop across the number of threads, so if you have exactly nthreads() iterations, exactly one iteration will go on each thread.

help?> Threads.@threads
  Threads.@threads [schedule] for ... end

  A macro to parallelize a for loop to run with multiple threads. Splits the iteration space among multiple tasks and runs those tasks on threads according
  to a scheduling policy. A barrier is placed at the end of the loop which waits for all tasks to finish execution.

  The schedule argument can be used to request a particular scheduling policy. The only currently supported value is :static, which creates one task per
  thread and divides the iterations equally among them. Specifying :static is an error if used from inside another @threads loop or from a thread other than
  1.

  The default schedule (used when no schedule argument is present) is subject to change.

  │ Julia 1.5
  │
  │  The schedule argument is available as of Julia 1.5.

It only works from thread 1, for reasons i can't quite remember, but so this basically means you have to start it from the main Task. (I didn't remember that it was a "thread 1" requirement - i thought it was actually a "main Task" requirement.. i can consider changing the assertion message, perhaps? But i think it's easier guidance to say "don't call this concurrently, dude")

Copy link
Member

@omus omus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about being slow to review. Overall this is good just needs a couple of touch ups.

Just wanted to call out that the TimeZone mutex changes is essentially equivalent to a readers/writers lock similar to this implementation: 6522f7a. I'm just calling this out as a possible cleanup to this implementation in the future if a well tested readers/writer lock package emerges.

src/types/timezone.jl Outdated Show resolved Hide resolved
src/types/timezone.jl Outdated Show resolved Hide resolved
src/types/timezone.jl Outdated Show resolved Hide resolved
src/types/timezone.jl Outdated Show resolved Hide resolved
Copy link
Contributor Author

@NHDaly NHDaly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review, @omus! Sorry it was me who took so long to get back to it this time 😁

Hope you've had a nice holiday! ⭐

src/types/timezone.jl Outdated Show resolved Hide resolved
src/types/timezone.jl Outdated Show resolved Hide resolved
src/types/timezone.jl Outdated Show resolved Hide resolved
src/types/timezone.jl Outdated Show resolved Hide resolved
src/types/timezone.jl Outdated Show resolved Hide resolved
@NHDaly
Copy link
Contributor Author

NHDaly commented Jan 2, 2022

Just wanted to call out that the TimeZone mutex changes is essentially equivalent to a readers/writers lock similar to this implementation: 6522f7a. I'm just calling this out as a possible cleanup to this implementation in the future if a well tested readers/writer lock package emerges.

FWIW, i still think this will be more performant than a reader/writer lock could be, since this kind of cache locking takes advantage of the cache being append-only, so that you don't need to lock at all on cache hits. I could imagine we could separate out this kind of thread-safe shared cache into a separate package though, so that others could benefit from this implementation 👍

@omus
Copy link
Member

omus commented Feb 17, 2022

Kick starting CI

@omus omus closed this Feb 17, 2022
@omus omus reopened this Feb 17, 2022
Copy link
Member

@omus omus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There definitely seems to be an issue with this implementation as the CI consistently stalls out and I see the same thing when running the tests locally. Some additional work needs to be done here before we can proceed

"Unable to find time zone \"$str\". Try running `TimeZones.build()`."
))
else
throw(ArgumentError("Unknown time zone \"$str\""))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exceptions while constructing will cause threads to be blocked upon waiting for a channel that will never be populated

@omus
Copy link
Member

omus commented Feb 17, 2022

@NHDaly sorry for not giving this the attention it deserves. Been swamped lately. I'll try to be faster on additional reviews

@NHDaly
Copy link
Contributor Author

NHDaly commented Feb 17, 2022

No worries at all - i fully believe that there are bugs in the implementation here.

We actually split this out into its own package, here: https://github.com/NHDaly/MultiThreadedCaches.jl

We found several bugs in the implementation while extracting it out, so the code is much cleaner now. I think this PR can wait until we scrub out all this and use that package instead! :)

It's also worth doing another comparison against a Read/Write Lock and also against a ConcurrentHashTable.

Feel free to review that package directly, if you'd like - most of the code was added in: JuliaConcurrent/MultiThreadedCaches.jl#1, plus a couple follow ups.

Thanks for your patience @omus, sorry for the delay on my end. ALSO i'm about to go on vacation for three weeks, so 👋 we can talk about this again soon! 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants