Azure Table indexing issues #7

Open
alexswan10k opened this issue Dec 19, 2018 · 3 comments

@alexswan10k

Hi,

Firstly, good stuff so far. This project is looking pretty interesting.

So, I am seriously considering using this; however, I have a couple of scalability concerns which I feel need addressing first. Let me clarify:

Azure Table storage only indexes the partition key and row key, which together form a sort of composite primary key. As I understand it there are no secondary indexes at all, so if I were to query a table field which is not the partition or row key, I would effectively be running a 'partition scan' (a mini table scan).

Because of this, it is imperative to store at least a copy of the data in a read-optimized way if it is going to be used as a primary store. Having a dig through the source code, it seems there are a handful of common query patterns:

  • Get all events in stream

  • Get all events in stream where offset > myLocalOffset (there are others but this is most prevalent IMO)

  • Get a specific stream summary by streamId

  • Get all stream summaries (so I can determine if there are new events to accumulate over)

  • Get a specific stream summary (as above, probably more important in practice)

A couple of things jumped out at me:

  • If I want to query multiple stream summaries, I have to do a cross-partition query which is inefficient
  • If I want to do a range query, Position is not indexed, which means scanning all events every time.

The latter worries me the most, and I think it can easily be fixed by simply swapping the Id field with the Position field. As far as I can tell the event Ids are arbitrary and are rarely (if ever) queried against. Perhaps we could make the RowKey the position and store the Id as an ordinary secondary property? The benefit is that range queries would then execute against a clustered index, and thus be massively more efficient.
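To make the RowKey-as-position idea concrete, here is a minimal sketch (in Python for illustration, not the project's actual F# code) of why it hinges on fixed-width zero-padding: table keys are compared as strings, so lexicographic order only matches numeric order when every key has the same width. The helper name and padding width are my own assumptions.

```python
# Illustrative only: encode an event's numeric Position as a RowKey
# string whose lexicographic order matches numeric order.

PAD = 19  # wide enough for an int64 position; the width is an assumption

def position_to_row_key(position: int) -> str:
    # Zero-pad to a fixed width so string comparison == numeric comparison.
    return str(position).zfill(PAD)

# Padded keys sort in true numeric order...
keys = [position_to_row_key(p) for p in [2, 15, 3760, 100]]
assert sorted(keys) == [position_to_row_key(p) for p in [2, 15, 100, 3760]]

# ...whereas raw number strings do not ("10" sorts before "9").
assert sorted(["9", "10"]) == ["10", "9"]
```

With keys shaped like this, a range query such as `RowKey ge position_to_row_key(X) and RowKey lt position_to_row_key(Y)` runs against the clustered RowKey index.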

The summaries issue could be reduced by putting the summaries in a separate partition (or similar) where the RowKey is the streamId, although this has its own scalability issues if you have a lot of streams. Alternatively, you probably at least want an API that gets a single stream by partition key and row key (stream id). This would be far more efficient than a cross-partition query, and seems like the most common operation to me. I often know what I am projecting from, so providing that explicitly in a query makes sense (although there is an argument that I could just directly get the events after X, which will be 0 events most of the time).
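The difference between the two access patterns can be sketched with the OData filter strings that Table Storage queries use. This is plain string construction for illustration; the stream id and row-key values are hypothetical, not names from the library.

```python
def point_lookup_filter(stream_id: str, row_key: str) -> str:
    # Both PartitionKey and RowKey are pinned: the service can do a
    # direct seek on the composite primary key.
    return f"PartitionKey eq '{stream_id}' and RowKey eq '{row_key}'"

def cross_partition_filter(row_key: str) -> str:
    # No PartitionKey predicate: the service must fan out across
    # every partition to satisfy the query.
    return f"RowKey eq '{row_key}'"

assert point_lookup_filter("stream-1", "summary") == \
    "PartitionKey eq 'stream-1' and RowKey eq 'summary'"
assert cross_partition_filter("summary") == "RowKey eq 'summary'"
```

The first form is the one a "get single stream summary" API should generate; the second is what you get today when querying summaries without knowing the partition.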

I would be happy to contribute or help out here, by the way; I'm just aware this is a massive breaking change for existing users, so I didn't want to dive right in. Let me know if/how I can help though!

There is a very comprehensive article about all this indexing stuff, along with patterns for solving it, here:
https://docs.microsoft.com/en-gb/azure/cosmos-db/table-storage-design-guide#index-entities-pattern

@Dzoukr
Owner

Dzoukr commented Dec 20, 2018

Hi @Metal10k,

thanks for looking at this!

I'll start with the Stream metadata/summary: it is stored in the same partition as the rest of the Stream data because it is also used for optimistic concurrency. AFAIK Table Storage only guarantees transactions within a single partition (a Stream, in this case), so I need to know that everything went ok within a single transaction: storing the events plus setting the metadata. Moving the metadata to a different partition could theoretically end up with the stream data and its metadata out of sync. And storing all streams in one row could hit the row-size limit.
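The same-partition atomicity argument can be sketched as a precondition check. The constraints themselves (a single PartitionKey per batch, at most 100 operations) are Table Storage's documented entity-group transaction rules; the function and entity values are illustrative, not the library's code.

```python
def validate_entity_group_batch(entities: list[dict]) -> None:
    # Table Storage entity-group transactions are atomic only when
    # every operation targets the same partition, with <= 100 ops.
    partition_keys = {e["PartitionKey"] for e in entities}
    if len(partition_keys) != 1:
        raise ValueError("batch spans multiple partitions; not atomic")
    if len(entities) > 100:
        raise ValueError("batch exceeds 100 operations")

# Because the events and the stream metadata row share a PartitionKey,
# they can be committed together in one atomic batch:
batch = [
    {"PartitionKey": "stream-1", "RowKey": "0000000001"},  # event
    {"PartitionKey": "stream-1", "RowKey": "metadata"},    # summary
]
validate_entity_group_batch(batch)  # passes: single partition
```

Moving the metadata row to its own partition would make this batch invalid, which is exactly the out-of-sync risk described above.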

About Position as RowKey: I agree, this would probably be a better design, and I was considering doing it like you said. But the RowKey is forced to be a string type, isn't it? How would that work with queries like "position > X && position < Y", with sorting and so on? Frankly, I don't know.

But generally speaking, I don't have any problem with moving to version 2.0.0 with an improved design. We just need to:

  • be sure it's really worth making such a breaking change
  • provide a migration tool for existing users, plus a clear description of the migration

Thanks again and let's discuss further

@alexswan10k
Author

Thanks for getting back :)

I take your point on the stream metadata; that is sensible. Perhaps there should simply be an API to get a single metadata entry by PK-RK? At the moment there is no way to do this.

As for range RowKey queries: yes, they are indeed strings, which is somewhat misleading. It turns out there is a trick to this. Keys are compared byte by byte, so numeric strings sort in numeric order as long as they are padded to a fixed width (so '0003' > '0002', '3760' > '0015', etc.); without padding, '10' would sort before '9'. This lets you effectively fudge it and get the same effect as a numerical index while actually using strings containing numbers.

I have previously used a derivative of the example below in production to store time-bound groups of documents. My specific scenario was an audit list where only the latest entry would ever be retrieved for a key. The actual row key was composed of a conceptual string key of my choosing plus a unix timestamp. I could then ask for the latest entry (they were implicitly ordered by RowKey, so I could just take the top 1), or for everything where (key > partialKey && key < partialKey + 1).
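A minimal sketch of that composite-RowKey trick, in Python for illustration (the key shape, separator, and helper names are my own assumptions, not from any library):

```python
# RowKey = "<logical key>-<zero-padded unix timestamp>". Entities are
# stored in RowKey order, so "all entries for a key" becomes a
# half-open prefix range and the latest entry is the last in range.

def make_row_key(key: str, ts: int) -> str:
    return f"{key}-{str(ts).zfill(10)}"

def prefix_bounds(key: str) -> tuple[str, str]:
    # [key + '-', key + '.') covers every timestamp suffix, because
    # '.' is the next ASCII character after '-'.
    return key + "-", key + "."

rows = sorted(make_row_key("audit", t)
              for t in [1545200000, 1545100000, 1545300000])
lo, hi = prefix_bounds("audit")
in_range = [r for r in rows if lo <= r < hi]

assert in_range == rows                       # all three match the prefix
assert in_range[-1].endswith("1545300000")    # latest is last in key order
```

The `lo <= r < hi` filter here stands in for a Table Storage query of the form `RowKey ge lo and RowKey lt hi`, which runs against the clustered key index.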

The principle is demonstrated more simply here:
Storing time series in table storage

Interestingly, this might also resolve the ordering issue: I would automatically get events in ascending order, because the RowKey ensures they are stored that way.

Let me know what you think.

@Dzoukr
Owner

Dzoukr commented Jan 2, 2019

Hi again and happy new year!

I just added a new function GetStream to get a single stream's metadata by StreamId. It uses the PK-RK combination, so it should be the fastest way.

About storing Position as the RowKey... I think we should try it (feel free to fork the project), see if it passes the already existing tests, and then benchmark the difference between the existing v1.* and a possible v2 (I would use something like benchmarkdotnet.org). Based on that we can make a final decision on whether such a breaking change makes sense. Given how fast Table Storage can be, I still have a feeling (not measured, so it is a personal opinion only) that the final difference will be smaller than we think. :)
