Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented pre-filtering procedure on block metadata #1503

Open
wants to merge 64 commits into
base: master
Choose a base branch
from

Conversation

realHannes
Copy link
Collaborator

What is added so far to this pull requst:

  1. The actual pre-filtering procedure is declared and defined in CompressedBlockPrefiltering.h / CompressedBlockPrefiltering.cpp
  2. The filtering procedure is extensively tested in CompressedBlockPrefilteringTest.cpp

realHannes and others added 30 commits April 25, 2024 19:07
Co-authored-by: Johannes Kalmbach <joka921@users.noreply.github.com>
Co-authored-by: Johannes Kalmbach <joka921@users.noreply.github.com>
Co-authored-by: Johannes Kalmbach <joka921@users.noreply.github.com>
Co-authored-by: Johannes Kalmbach <joka921@users.noreply.github.com>
Co-authored-by: Johannes Kalmbach <joka921@users.noreply.github.com>
Co-authored-by: Johannes Kalmbach <joka921@users.noreply.github.com>
Copy link

codecov bot commented Sep 18, 2024

Codecov Report

Attention: Patch coverage is 92.34973% with 14 lines in your changes missing coverage. Please review.

Project coverage is 88.29%. Comparing base (fb7b50b) to head (6c19bf6).

Files with missing lines Patch % Lines
src/index/CompressedBlockPrefiltering.cpp 91.87% 3 Missing and 10 partials ⚠️
src/global/ValueIdComparators.h 92.85% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1503      +/-   ##
==========================================
+ Coverage   88.27%   88.29%   +0.02%     
==========================================
  Files         361      363       +2     
  Lines       27198    27373     +175     
  Branches     3663     3692      +29     
==========================================
+ Hits        24009    24170     +161     
- Misses       1954     1958       +4     
- Partials     1235     1245      +10     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for this intial implementation,
I have left some comments, I have looked at everything but the unit tests so far.

src/index/CMakeLists.txt Outdated Show resolved Hide resolved
src/index/CompressedBlockPrefiltering.h Show resolved Hide resolved
src/index/CompressedBlockPrefiltering.h Outdated Show resolved Hide resolved
src/index/CompressedBlockPrefiltering.h Outdated Show resolved Hide resolved
src/index/CompressedBlockPrefiltering.h Outdated Show resolved Hide resolved
src/index/CompressedBlockPrefiltering.cpp Show resolved Hide resolved
src/index/CompressedBlockPrefiltering.cpp Outdated Show resolved Hide resolved
src/index/CompressedBlockPrefiltering.cpp Outdated Show resolved Hide resolved
test/SparqlExpressionTestHelpers.h Outdated Show resolved Hide resolved
src/index/CompressedBlockPrefiltering.h Outdated Show resolved Hide resolved
Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really going into the right direction, keep up the good work.

test/CompressedBlockPrefilteringTest.cpp Outdated Show resolved Hide resolved
test/CompressedBlockPrefilteringTest.cpp Outdated Show resolved Hide resolved
test/CompressedBlockPrefilteringTest.cpp Outdated Show resolved Hide resolved
test/CompressedBlockPrefilteringTest.cpp Outdated Show resolved Hide resolved
test/CompressedBlockPrefilteringTest.cpp Show resolved Hide resolved
src/index/CompressedBlockPrefiltering.h Outdated Show resolved Hide resolved
src/index/CompressedBlockPrefiltering.h Outdated Show resolved Hide resolved
src/index/CompressedBlockPrefiltering.cpp Outdated Show resolved Hide resolved
src/index/CompressedBlockPrefiltering.cpp Outdated Show resolved Hide resolved
src/index/CompressedBlockPrefiltering.cpp Outdated Show resolved Hide resolved
Copy link

sonarcloud bot commented Oct 10, 2024

Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some reviews,
This looks really good, I just have some questions about the complex setUnionimplementation, I'll think about it, you'll think about it, and then we'll find a solution:)

public:
virtual ~PrefilterExpression() = default;

virtual std::unique_ptr<PrefilterExpression> logicalComplement() const = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a short comment for this function please.

// to be smaller than Ids of all other types.
static auto getTiedMaskedTriple(const BlockMetadata::PermutedTriple& triple,
size_t ignoreIndex = 3) {
static const auto undefined = Id::makeUndefined();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a function Id::min() which does the same, but is clearer here.
otherwise you can static_assert that makeUndefined().getBits() == 0.

// HELPER FUNCTIONS
//______________________________________________________________________________
// Given a PermutedTriple retrieve the suitable Id w.r.t. a column (index).
static auto getIdFromColumnIndex(const BlockMetadata::PermutedTriple& triple,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function returns an ID, you can explicitly state the return type.

static const auto undefined = Id::makeUndefined();
switch (ignoreIndex) {
case 3:
return std::tie(triple.col0Id_, triple.col1Id_, triple.col2Id_);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::tie returns a tuple of references, make sure that the lifetime works, or retunr std::tuple{...}

"values.");
}
// Check for order of IDs on evaluation Column.
if (checkInvalid(std::greater<>{}, evaluationColumn + 1)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need fully sorted blocks, so you can always pass 3 as the second argument.

getTiedMaskedTriple(block2.lastTriple_, evalCol);
} else {
return lastTripleBlock1 < firstTripleBlock2;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IU somehow think that you here are masing in the wrong direction.
Can't you just compare the full triples on the first block
block1.firstTriple < block2.firstTriple , and if they are equal you can even AD_CORRECTNESS_CHECK that then the last triples are also equal, because they are the same block (otherwise you have violated invariants).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or you can use the logic you already have, but without any masking (which I think shouldn't be necessary and again also think that it is wrong.

Comment on lines +163 to +171
try {
expression->evaluate(input, evaluationColumn);
FAIL() << "Expected thrown error message.";
} catch (const std::runtime_error& error) {
EXPECT_EQ(std::string(error.what()), expectedErrorMessage);
} catch (...) {
FAIL() << "Expected an error of type std::runtime_error, but a "
"different error was thrown.";
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have AD_EXPECT_THROW_WITH_MESSAGE int GtestHelpers.h for exactly this case.

using ad_utility::testing::VocabId;
using sparqlExpression::TestContext;
using namespace prefilterExpressions;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In test, move everything into the anonymouse namespace { <everything here } to avoid linker problems with common names in different tests.

{blocks.b1, blocks.b2, blocks.b5, blocks.b6, blocks.b7, blocks.b8,
blocks.b9, blocks.b10, blocks.b11, blocks.b12, blocks.b13, blocks.b14});

testExpression(2, IntId(-12), blocks.blocks, {blocks.b7, blocks.b14});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 2 and the blocks.blocks are always redundant, write a common lambda that factors out that code, such that only the "moving parts" are visible (same for all tests below probably)

Comment on lines +292 to +296
{blocks.b5, blocks.b6, blocks.b7, blocks.b11, blocks.b12,
blocks.b13, blocks.b14});
testExpression(2, IntId(100), blocks.blocks,
{blocks.b1, blocks.b2, blocks.b3, blocks.b4, blocks.b5,
blocks.b6, blocks.b7, blocks.b8, blocks.b9, blocks.b10,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if you write this test inside a test fixture (TEST_F in Gtest, just google it),which has the blocks b1 - b... directly as members you can always write b1 instead of blocks.b1 which also increases the readability.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants