Skip to content
This repository has been archived by the owner on Sep 27, 2019. It is now read-only.

[15721] Compiled Catalog Query Access #1339

Open
wants to merge 18 commits into
base: master
Choose a base branch
from
Open

Conversation

nwang57
Copy link
Contributor

@nwang57 nwang57 commented May 4, 2018

We enabled complied catalog lookup in the first pull request. Based upon this, we further support complied insert and delete queries for all catalogs. We also fix bugs for the following issues:

  • QueryCache Bug #1298 Need to manually bind the tuple value expression so that equality checks will work correctly.

  • Sequential scan assumes that the output columns should start at offset 0 otherwise PerformBinding will not correctly find the corresponding column attributes.

  • ZoneMapCatalog needs special care to avoid chicken and egg problem. Each sequential plan needs to check the zone map to know whether to scan the tile or not, but checking the zone map requires ZoneMap catalog access which leads to an infinity loop. So we ensure in the ZoneMap catalog manager that if the scanning table is ZoneMap catalog then it will just scan it without checking the ZoneMap

nwang57 and others added 7 commits May 2, 2018 23:01
add in predicate

fix

enable debug

enable debug logger

use seq scan for get table

test compile

fix unused

write table object constructor

fix bug

query execute

fix bug

fix bug

fix bug

fix query param

print wrapped tuple

fix bug

cache

cache

fix try

fix shared pointer

clean up

performance test

GetDatabaseObject

column_catalog.pp/h

database_catalog

fix seqscan

fix

db_catalog

db_catalog

fix bug

db_catalog

column_catalog.cpp/h

index catalog

lan catalog

column stat

lang catalog

settings catalog

trigger catalog

zone map catalog

proc_catolog.cpp/h

delete code

clean up db and trigger

cleanup settings catalog

delete code

restore catalog test

format

fix include format

fix bug in scan plan by prashanth

add index test

fixed binding for tuple_value_expression, and changed query_cache_test

added Insert and Delete with Compiled Query in abstract_catalog and table_catalog

compiled seq plan for table catalog by looking up table_oid

fix trigger

changed trigger_test, changed the wrong assumption that triggers are in a certain order

fix settings catalog

query metrics catalog

query metrics catalog

Changed zone_map_catalog, having issue running zone_map_scan_test

using expressionPtr

Edited cloumn_catalog, index_catalog, proc_catalog, settings_catalog
Added Insert and Delete with Compiled Query
Fixed Binding for TupleValueExpression

database catalog insert

database catalog bound

index metrics catalog insert

query history catalog

table metrics insert

database catalog delete

modify catalog inserts

change to complied insert plan

intex metrics catalog delete

table metrics catalog delete

update catalog to use compiled delete plan

fixed code review addressed issue

trigger catalog bound

table catalog bound

language catalog bound

added insert, delete and bouding for zone_map_catalog

clean up

deleted redundant comments in index_catalog

index cache uncomment

fix proc catalog

fix proc catalog

added bind in language_catalog's delete

add comment to zone map manager
* Adding new mapping table

* Revert "Fixing non-unique key insert problem"

This reverts commit 4267752.

* Revert LOG_INFO to LOG_TRACE

* Fix segment fault problem by moving munmap() to after ~EpochManager()

* Avoid compiler error

* Enhance log message for mmap()'ed mapping table
Copy link
Contributor

@db-ol db-ol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure header files newly added are necessary. And optimizations can be made in the engineering side.

* @param insert_values tuples to be inserted
* @param txn TransactionContext
* @return Whether insertion is Successful
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned last time, move comments to header files.

* @param predicate Predicate used in the seq scan
* @param txn TransactionContext
* @return Whether deletion is Successful
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, move comments to header files.

* @param txn TransactionContext
*
* @return Unique pointer of vector of logical tiles
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move comments to header files.

* @param column_offsets columns used for seq scan
* @param predicate Predicate used in the seq scan
* @return true if successfully executes
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move comments to header files.

auto constant_expr_7 = new expression::ConstantValueExpression(
val7);

tuples.push_back(std::vector<ExpressionPtr>());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use emplace_back instead of push_back as the former would construct the object immediately.

config_value = (*result_tiles)[0]->GetValue(0, 0).ToString();
}
PELOTON_ASSERT(result_tuples.size() <= 1);
if (result_tuples.size() != 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use empty method to check for emptiness.

config_value = (*result_tiles)[0]->GetValue(0, 0).ToString();
}
PELOTON_ASSERT(result_tuples.size() <= 1);
if (result_tuples.size() != 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use empty method to check for emptiness.

// EXPECT_EQ(nullptr, table_object_1);
// txn_manager.CommitTransaction(txn);
//}
//
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have problems passing this test? Are you going to uncomment it later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the test we create for #1336 . Now it has been fixed.

@@ -24,6 +24,7 @@
#include "storage/storage_manager.h"
#include "type/ephemeral_pool.h"
#include "sql/testing_sql_util.h"
#include "common/timer.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need to include this header file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed for performance testing. We plan to remove it later.

@@ -17,7 +17,7 @@
// 0: oid (pkey)
// 1: tgrelid : table_oid
// 2: tgname : trigger_name
// 3: tgfoid : function_oid
// 3: tgfoid : function_name
Copy link
Contributor

@db-ol db-ol May 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should tgfoid be changed if function_oid was meant to be function_name?

codegen::BufferingConsumer buffer{{}, context};


bool cached;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be better to define cached below where it's first assigned to a value.


codegen::BufferingConsumer buffer{column_offsets, context};

bool cached;
Copy link
Contributor

@db-ol db-ol May 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move the cached definition to where it's first assigned.


// Create consumer
codegen::BufferingConsumer buffer{column_offsets, scan_context};
bool cached;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move the cached definition to where it's first assigned.


codegen::BufferingConsumer buffer{column_offsets, context};

bool cached;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move the cached definition to where it's first assigned.


// execute SELECT a FROM table where a == 40;
codegen::BufferingConsumer buffer_1{{0, 1}, context_1};
bool cached;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move the cached definition to where it's first assigned.

@gvos94 gvos94 assigned gvos94 and unassigned gvos94 May 10, 2018
@gvos94 gvos94 self-requested a review May 10, 2018 21:14
@@ -142,9 +145,9 @@ bool ColumnCatalog::InsertColumn(oid_t table_oid,
const std::vector<Constraint> &constraints,
type::AbstractPool *pool,
concurrency::TransactionContext *txn) {
(void) pool;
Copy link

@gvos94 gvos94 May 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could probably use the "UNUSED ATTRIBUTE" specifier instead.

The same applies to a bunch of other places where "(void) attr" has been used to pacify the compiler.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure why we would want to keep the parameter if we are not using it at all.

values.emplace_back(new expression::ConstantValueExpression(
val11));
values.emplace_back(new expression::ConstantValueExpression(
val12));

Copy link

@gvos94 gvos94 May 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, the multiple mallocs invocations can be coalesced using an array of objects.


auto *table_oid_expr =
new expression::TupleValueExpression(type::TypeId::INTEGER, 0,
ColumnId::TABLE_OID);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the allocated memory has been freed. Irrespective, you should probably use smart pointers instead of working with raw pointers. The same applies to a bunch of other places in the code where raw pointers could be replaced with smart pointers.

most_common_freqs = tuple.GetValue(ColumnId::MOST_COMMON_FREQS);
hist_bounds = tuple.GetValue(ColumnId::HISTOGRAM_BOUNDS);
column_name = tuple.GetValue(ColumnId::COLUMN_NAME);
has_index = tuple.GetValue(ColumnId::HAS_INDEX);
Copy link

@gvos94 gvos94 May 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pooja mentioned in the last meeting that tuple.GetValue() is going to deprecated. However, I vaguely remember her mentioning that it could still be used with catalog tables. You might want to talk to her/Andy about this.

@@ -2864,6 +2866,21 @@ class BwTree : public BwTreeBase {
* the mapping table rather than CAS with nullptr
*/
void InitMappingTable() {
mapping_table = (std::atomic<const BaseNode *> *) \
mmap(NULL, 1024 * 1024 * 1024,
Copy link

@gvos94 gvos94 May 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. The value (1024 x 1024 x 1024) shouldn't be hardcoded (the MAPPING_TABLE_SIZE is #defined to 1MB).

  2. InitMappingTable() seems to be invoked from the bwtree constructor - is 1GB mmaped for every index created? If so, this seems like an overkill.

// NOTE: Only unmap memory here because we need to access the mapping
// table in the above routine. If it was unmapped in ~BwTree() then this
// function will invoke illegal memory access
int munmap_ret = munmap(tree_p->mapping_table, 1024 * 1024 * 1024);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, a hard-coded value shouldn't be used.

@@ -107,6 +107,7 @@ class ColumnStatsCatalog : public AbstractCatalog {

private:
ColumnStatsCatalog(concurrency::TransactionContext *txn);
std::vector<oid_t> all_column_ids = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

Copy link

@gvos94 gvos94 May 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems the purpose of all_column_ids is to use it with a copy constructor to construct a vector with all the column oid's.

Firstly, I don't think you need to declare an all_column_ids vector, it can be easily constructed explicitly at runtime. Moreover hardcoding the values it is not a good idea since if anyone changes the schema of the catalog table, he/she must also remember to update the vector declaration.

A better way to construct this at runtime without breaking abstractions is by using the "NumColumns" method and filling the vector using std::iota.

The same applies to every declaration of all_columns_ids and column_ids.

type::Value* getDefaultValue() {
return default_value.get();
}
type::Value *getDefaultValue() { return default_value.get(); }

Copy link

@gvos94 gvos94 May 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can probably be made inline.

TYPE_OFF = 2
};
enum ZoneMapOffset { MINIMUM_OFF = 0, MAXIMUM_OFF = 1, TYPE_OFF = 2 };

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why a strongly typed enum has been replaced by an old-style enum. It's better to stick to stongly typed enums.

The same applies to a bunch of places where old-style enums have been used.

val6);
auto constant_expr_7 = new expression::ConstantValueExpression(
val7);

Copy link

@gvos94 gvos94 May 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the number of objects to be allocated is known at compile time. Multiple malloc invocations can be replaced by a single invocation by allocating an array of objects with parameterized constructor.

eg: arr_constant_expr = new expression::ConstantValueExpression[8]{{val0},{val1}...{val7}}

This same applies to a bunch of other places where the number of objects to be allocated is known at compile time.

Copy link

@gvos94 gvos94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, the code looks clean, other than some minor changes. Good job on including the doxygen comments.


// search for query
codegen::Query *query = codegen::QueryCache::Instance().Find(insert_plan);;
std::unique_ptr<codegen::Query> compiled_query(nullptr);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This object compiled_query is never used if the query is cached. Why create it every time? Can be moved inside the if block

query->Execute(std::move(executor_context), buffer,
[&ret](executor::ExecutionResult result) { ret = result; });

return ret.m_result == peloton::ResultType::SUCCESS;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we keep the query in our cache even if it fails?
I am not sure if this is an overkill or even correct but we could probably keep only the compiled queries that return success in our cache

new executor::ExecutorContext(txn, std::move(parameters)));

// search for query
codegen::Query *query = codegen::QueryCache::Instance().Find(insert_plan);;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably make this a function and reuse the code instead of re writing it for every other function. Something like GetCompiledQuery() which searches for the query in the cache and if not found, compiles it and adds it to the cache.

We can even make it a macro or an inline function if it is on the critical path and we want to save a function call


size_t column_count = catalog_table_->GetSchema()->GetColumnCount();
for (size_t col_itr = 0; col_itr < column_count; col_itr++) {
// Skip any column for update

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what this comment means here. Could you please help me understand what this means

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is only looking for the columns in the tuple which need to be updated. For more detailed explanation you can turn to @mengranwo

: table_oid(tile->GetValue(tupleId, ColumnCatalog::ColumnId::TABLE_OID)

ColumnCatalogObject::ColumnCatalogObject(codegen::WrappedTuple wrapped_tuple)
: table_oid(wrapped_tuple.GetValue(ColumnCatalog::ColumnId::TABLE_OID)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned by @gandeevan, GetValue is deprecated now. Please talk to @poojanilangekar to make sure you can use this for a wrapped tuple object safely

tuple->SetValue(ColumnId::COLUMN_NAME, val_column_name, pool);
tuple->SetValue(ColumnId::HAS_INDEX, val_has_index, nullptr);
tuples.emplace_back();
// tuples.push_back(std::vector<ExpressionPtr>());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dead code

LanguageCatalogObject::LanguageCatalogObject(executor::LogicalTile *tuple)
: lang_oid_(tuple->GetValue(0, 0).GetAs<oid_t>()),
lang_name_(tuple->GetValue(0, 1).GetAs<const char *>()) {}
LanguageCatalogObject::LanguageCatalogObject(codegen::WrappedTuple tuple)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to make a copy of the WrappedTuple parameter? Wouldn't using a reference suffice?


namespace peloton {
namespace catalog {

#define PROC_CATALOG_NAME "pg_proc"

ProcCatalogObject::ProcCatalogObject(executor::LogicalTile *tile,
ProcCatalogObject::ProcCatalogObject(codegen::WrappedTuple wrapped_tuple,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above. Is a reference not good enough? This applies to all such uses in other constructors as well

GetResultWithCompiledSeqScan(column_ids, predicate, txn);

// carefull! the result could be null!
LOG_INFO("size of the result tiles = %lu", result_tuples.size());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still needed? Should it be LOG_INFO?

@@ -40,7 +40,7 @@ namespace catalog {

class ColumnCatalogObject {
public:
ColumnCatalogObject(executor::LogicalTile *tile, int tupleId = 0);
ColumnCatalogObject(codegen::WrappedTuple wrapped_tuple);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a reference instead

Copy link

@latelatif latelatif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the test cases fail when running make check

The following tests FAILED:
52 - stats_test (Failed)
54 - zone_map_scan_test (Failed)
Errors while running CTest
make[3]: *** [test/CMakeFiles/check] Error 8
make[2]: *** [test/CMakeFiles/check.dir/all] Error 2
make[1]: *** [test/CMakeFiles/check.dir/rule] Error 2
make: *** [check] Error 2
akanjani@dev3:~/review/peloton/build$ git status
On branch pcq-cp2
Your branch is up-to-date with 'origin/pcq-cp2'.

nothing to commit, working directory clean
akanjani@dev3:~/review/peloton/build$

Zeninma added 5 commits May 12, 2018 16:58
1. valgrind memory leak during fillepredicatearray
2. uninitialized Value being moved/destroyed - this has been solved
3. Concurrency issue, need to put an issue on this
…alue to pass by reference. Also add UNUSED_ATTRIBUTE instead of (void).
@coveralls
Copy link

Coverage Status

Coverage increased (+0.04%) to 77.595% when pulling 89161d8 on nwang57:pcq-cp2 into 5686479 on cmu-db:master.

@apavlo
Copy link
Member

apavlo commented Jun 21, 2018

I am reviving this PR. We will need this when we get rid of the interpreted engine. It also has bug fix for #1362

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants