This repository has been archived by the owner on Sep 27, 2019. It is now read-only.

[15721] Index Suggestion #1347

Status: Open. Wants to merge 182 commits into master (changes from 160 commits shown).

Commits (182):
d18033d
added the files for cost evaluation
pbollimp Mar 29, 2018
5fdadea
llvm for mac
vkonagar Mar 29, 2018
ec6c94b
Basic classes
sivaprasadsudhir Mar 30, 2018
492b95f
added the configuration enumeration files
pbollimp Mar 30, 2018
8410136
Add Whatif API
vkonagar Mar 30, 2018
96eadf4
Add optimizer cost query func skeleton
vkonagar Mar 30, 2018
9087931
Complete what if API implementation. Testing pending.
vkonagar Apr 5, 2018
0908588
Ignore query planning
vkonagar Apr 5, 2018
5e2cbff
Analyze tables was missing. Fixed it
vkonagar Apr 6, 2018
fcfe058
fix the query
vkonagar Apr 6, 2018
04e49f8
add comments, fix some code style
vkonagar Apr 6, 2018
d62462b
Fix whatif API test
vkonagar Apr 8, 2018
2e19c1c
run formatter
sivaprasadsudhir Apr 8, 2018
ac653aa
Add index selection module skeleton
vkonagar Apr 9, 2018
4d44009
skeleton for admissible column parsing
vkonagar Apr 9, 2018
371fd38
adding cost model classes
sivaprasadsudhir Apr 9, 2018
c23cc36
cleanup and reorganize the code
sivaprasadsudhir Apr 10, 2018
4d694ec
Intermediate changes. Query parser not complete.
vkonagar Apr 10, 2018
a51fe84
Intermediate changes. Query parser not complete.
vkonagar Apr 10, 2018
d043128
removed cost model class
sivaprasadsudhir Apr 11, 2018
32f9040
Add IndexObject Pool
vkonagar Apr 11, 2018
324e430
Memoization support completed
sivaprasadsudhir Apr 11, 2018
5978d32
Complete query parser
vkonagar Apr 11, 2018
a24ded7
Complete query parser
vkonagar Apr 11, 2018
11bc159
multi column index, wip
sivaprasadsudhir Apr 11, 2018
e0cac79
Add tests for admissible indexes
vkonagar Apr 11, 2018
83c1b44
Fix what-if index and admissible indexes test
vkonagar Apr 11, 2018
1e5925c
added outline for naive enumeration method
pbollimp Apr 11, 2018
4b463dc
Fix get admissible indexes test
vkonagar Apr 11, 2018
96a41b1
Fix get admissible indexes test
vkonagar Apr 11, 2018
12a343a
Added the IndexConfiguration set difference
pbollimp Apr 11, 2018
e98461a
Minor bug fix
sivaprasadsudhir Apr 11, 2018
1ec6f55
Split computing and getting const
sivaprasadsudhir Apr 11, 2018
d23d0dc
Fix compilation error and typos
vkonagar Apr 11, 2018
a94cac9
Finish Configuration Enumeration module
pbollimp Apr 11, 2018
11adba0
Fix the main index selection algorithm
vkonagar Apr 11, 2018
4c8dce7
Finish Merging
pbollimp Apr 12, 2018
6f67e0c
Merge
vkonagar Apr 12, 2018
aa63a5f
cleanup
sivaprasadsudhir Apr 12, 2018
f8a8180
Restructure code
vkonagar Apr 12, 2018
b619333
More refactoring
vkonagar Apr 12, 2018
d01d018
added comments to index selection context
sivaprasadsudhir Apr 12, 2018
d9d0cfc
Added the comparator for the candidate index enumeration
pbollimp Apr 12, 2018
d984e89
Adding comments
pbollimp Apr 12, 2018
11fdce2
Restructure generate candidate indexes
vkonagar Apr 12, 2018
afa1582
Fix merge
vkonagar Apr 12, 2018
3178695
partial test for multi-column index generation
sivaprasadsudhir Apr 12, 2018
5f4a822
Add candidate index gen test
vkonagar Apr 12, 2018
fd2de46
Minor change to ComputeCost. Formatting and comments.
pbollimp Apr 12, 2018
3db49a7
Add comments
vkonagar Apr 12, 2018
b7c4f9c
comments
sivaprasadsudhir Apr 12, 2018
756ecb8
More formatting and comments.
pbollimp Apr 12, 2018
0d336d0
more comments
vkonagar Apr 12, 2018
f58cf77
brief comments.
pbollimp Apr 12, 2018
213a351
rename pl_assert to peloton_assert
sivaprasadsudhir Apr 12, 2018
e846956
Remove GetCost and rename ComputeCost to GetCost
pbollimp Apr 12, 2018
85705dd
fix multicolumnindex generation
sivaprasadsudhir Apr 12, 2018
920083a
minor fixes
sivaprasadsudhir Apr 12, 2018
93b2214
Fix admissible index and candidate pruning tests
vkonagar Apr 13, 2018
e3b43d0
Fix unused variables
vkonagar Apr 13, 2018
c907ef3
Add more tests to WhatIfAPI and IndexSelection
vkonagar Apr 16, 2018
342f6a3
Implement the suggestions mentioned in the code review
vkonagar Apr 16, 2018
c54f4e0
Uncomment the choose best plan call
vkonagar Apr 16, 2018
39259fb
Fix tests
vkonagar Apr 23, 2018
f323ed9
Add support for multi-column index
chenboy Apr 1, 2018
6330ab6
Fix conflicts after merge
chenboy May 2, 2018
b291f58
nit fixes
sivaprasadsudhir May 3, 2018
f4ce787
Fix what-if index tests
vkonagar May 4, 2018
c6915f7
Add more multi-column index sets in the test cases.
vkonagar May 4, 2018
49b95df
Add testing utility class for index suggestion tests
vkonagar May 4, 2018
a6da36d
Add to cmake for the files in the previous commit
vkonagar May 4, 2018
01c994e
Modify what-if tests to use the utility class
vkonagar May 4, 2018
e1dad43
Fix formatting
vkonagar May 4, 2018
90e7d65
Code review fix
vkonagar May 4, 2018
57c1c83
fix tests
sivaprasadsudhir May 4, 2018
4b4e256
nit
sivaprasadsudhir May 4, 2018
61786ae
Fix memory leaks and misc nit fixes
vkonagar May 5, 2018
fa1dbba
fixed the test temporarily for the index bug
sivaprasadsudhir May 5, 2018
6bbaa94
Rename IndexObject to HypotheticalIndexObject
vkonagar May 5, 2018
5591755
debugging the shared pointer issue
sivaprasadsudhir May 5, 2018
5d0d2b8
Fix segfault. Some more Renames
vkonagar May 5, 2018
28e818b
check the exact indexes
sivaprasadsudhir May 5, 2018
8fd0bf4
Fix the tests to use the util
vkonagar May 5, 2018
3f394f7
fixing the index selection
sivaprasadsudhir May 5, 2018
8f1b897
Fix formatting
vkonagar May 5, 2018
40576fe
Rebase and fix conflicts while rebasing
vkonagar May 5, 2018
10843ca
latest tests
sivaprasadsudhir May 5, 2018
3085a58
Better tests
sivaprasadsudhir May 6, 2018
1e9b959
Add get workload support to the testing utility class.
vkonagar May 6, 2018
55354b9
Fix stray
vkonagar May 6, 2018
96f500b
Comment out the debug code in optimizer
vkonagar May 6, 2018
eb3da24
Add index suggestion task skeleton
vkonagar May 7, 2018
2657e76
Add query history catalog GET methods.
vkonagar May 7, 2018
a564372
Fix formatting
vkonagar May 7, 2018
9f5bdc5
Update index suggestion task
vkonagar May 8, 2018
e290797
Add new workload
vkonagar May 8, 2018
57955b4
Add new test - incomplete
vkonagar May 8, 2018
ecec9ce
Add more than 3 columns cost model test
vkonagar May 8, 2018
4e3370c
Fix join query parsing for table name extraction
vkonagar May 8, 2018
818c583
Add more queries to workload D
vkonagar May 8, 2018
e4865c4
DEBUG -> TRACE
vkonagar May 8, 2018
53c1101
Changed the columns from a set to vector
sivaprasadsudhir May 8, 2018
ae3e26b
Merge branch 'auto_index' of https://github.com/sivaprasadsudhir/pelo…
sivaprasadsudhir May 8, 2018
7152d46
Fix compilation error
vkonagar May 8, 2018
0062cc5
Merge branch 'auto_index' of https://github.com/sivaprasadsudhir/pelo…
sivaprasadsudhir May 8, 2018
fee2bea
Complete the index suggestion task - RPC is pending.
vkonagar May 8, 2018
4642b34
Merge remote-tracking branch 'origin/auto_index' into auto_index
vkonagar May 8, 2018
490677f
Get args at RPC handler
vkonagar May 8, 2018
51d7f56
Refactored the tests
sivaprasadsudhir May 8, 2018
fc0d60e
Merge branch 'auto_index' of https://github.com/sivaprasadsudhir/pelo…
sivaprasadsudhir May 8, 2018
a48e085
Fix compilation issue and list serialization
vkonagar May 8, 2018
a3ac507
Merge remote-tracking branch 'origin/auto_index' into auto_index
vkonagar May 8, 2018
f6b18d0
Complete RPC handler
vkonagar May 8, 2018
eb5239f
fix logs
sivaprasadsudhir May 8, 2018
693516b
Fix compilation error in peloton-bin
vkonagar May 8, 2018
6017790
Merge remote-tracking branch 'origin/auto_index' into auto_index
vkonagar May 8, 2018
b024304
Add dropIndex RPC
vkonagar May 9, 2018
8b2169c
run brain and server together in one process for testing
sivaprasadsudhir May 9, 2018
f718511
Merge branch 'auto_index' of https://github.com/sivaprasadsudhir/pelo…
sivaprasadsudhir May 9, 2018
8639124
Moved tunable knobs into a separate structure
sivaprasadsudhir May 9, 2018
3a5227a
changed the arguments of the constructor
sivaprasadsudhir May 9, 2018
aeabd94
completed the refactor
sivaprasadsudhir May 9, 2018
7ee9b0f
Fix index selection job -- rename some stuff
vkonagar May 9, 2018
99be940
Merge branch 'auto_index' of github.com:sivaprasadsudhir/peloton into…
vkonagar May 9, 2018
1e3cd9c
minor style changes
sivaprasadsudhir May 9, 2018
bd4593b
Rename more stuff
vkonagar May 9, 2018
5fe0108
Merge remote-tracking branch 'origin/auto_index' into auto_index
vkonagar May 9, 2018
a8af555
More renames
vkonagar May 9, 2018
273b89b
Fix DML statement handling in workload
vkonagar May 9, 2018
7091c7f
Fix cost model bug for more than 2 column indexes
vkonagar May 9, 2018
67ff655
Add an extensive test on multi-column optimizer cost model test
vkonagar May 9, 2018
51139e6
concrete test case to show the issues with non-deterministic set of i…
sivaprasadsudhir May 9, 2018
f9b2c5e
Add drop indexes RPC
vkonagar May 9, 2018
cb8d209
Merge branch 'auto_index' of https://github.com/sivaprasadsudhir/pelo…
sivaprasadsudhir May 9, 2018
3c3559e
Run formatter
vkonagar May 9, 2018
2da21af
Merge remote-tracking branch 'origin/auto_index' into auto_index
vkonagar May 9, 2018
71d4213
Fix drop indexes
vkonagar May 9, 2018
7d6fc37
Fix a bug in config enumeration for case where no index is better
pbollimp May 10, 2018
6d48e80
Fix formatter issue
vkonagar May 10, 2018
d22b7bb
Merge remote-tracking branch 'origin/auto_index' into auto_index
vkonagar May 10, 2018
1060627
Fix travis error
vkonagar May 10, 2018
0b12801
Fix the test that is failing non-deterministically due to the optimize…
pbollimp May 10, 2018
5029ed1
Merge branch 'auto_index' of https://github.com/sivaprasadsudhir/pelo…
pbollimp May 10, 2018
1e31d2a
Use only one transaction for the entire run of the job. Also, generat…
vkonagar May 10, 2018
8b937da
hopefully, final version of the algorithm
sivaprasadsudhir May 11, 2018
f8262cd
added multiple choices for the output
sivaprasadsudhir May 11, 2018
f4bca42
more index selection tests
sivaprasadsudhir May 11, 2018
4c37855
Add missing populate index
vkonagar May 11, 2018
38757ac
Consider non-equality predicates for index scan in the cost model
chenboy May 10, 2018
4792d91
Drop the indexes only if it is not suggested this time
vkonagar May 11, 2018
5460082
fixed precision issues
sivaprasadsudhir May 11, 2018
3b757f1
Merge branch 'auto_index' of https://github.com/sivaprasadsudhir/pelo…
sivaprasadsudhir May 11, 2018
8bc5170
minor fixes
sivaprasadsudhir May 12, 2018
51f5a1a
Fix the AnalyzeStats crash
vkonagar May 12, 2018
5c322c1
Fix: Index Selection returns empty set because the
vkonagar May 12, 2018
3ef9128
Fix a bug during where clause parsing to make it work with TPCC
pbollimp May 12, 2018
146100d
Fix the compilation error
vkonagar May 12, 2018
d250fbe
Address some of the code review comments
pbollimp May 12, 2018
3230ec3
Fix create/drop index -- running TPCC
vkonagar May 13, 2018
dc424ea
Fix analyze stats crash. Fix query history logging for PREPARED state…
vkonagar May 13, 2018
43b742b
Change knobs
vkonagar May 13, 2018
c422a63
More misc
vkonagar May 13, 2018
27a0df0
addressing commits
sivaprasadsudhir May 14, 2018
a06189a
Restructure code
vkonagar May 14, 2018
332543f
Reformat code
vkonagar May 14, 2018
9d0a005
small correction to make it compile in debug mode
pbollimp May 15, 2018
11d2f3e
remove the unnecessary commented parts of test and code
pbollimp May 15, 2018
59ee8d3
Restructure code, fix nits
vkonagar May 15, 2018
6817300
remove #define
pbollimp May 15, 2018
3546f6a
Merge remote-tracking branch 'origin/auto_index' into auto_index
pbollimp May 15, 2018
e2e4578
Restructure code
vkonagar May 15, 2018
4f48831
Run formatter
vkonagar May 15, 2018
4dc06ac
fix errors for compilation in debug mode
pbollimp May 15, 2018
65d5a06
Merge remote-tracking branch 'origin/auto_index' into auto_index
pbollimp May 15, 2018
480ae4d
fix query logger test
pbollimp May 15, 2018
81420e7
trying to pass the compilation on travis
pbollimp May 15, 2018
28483e5
change debug logging to trace level logging
pbollimp May 15, 2018
e1bd8ba
Fix warning in IndexConfigComparator
vkonagar May 17, 2018
f8e6eda
trace-->debug
vkonagar May 17, 2018
597e798
Hack to make travis pass the build.
vkonagar May 17, 2018
b99312a
Hack to make travis pass the build.
vkonagar May 17, 2018
50db015
remove multiple of unnecessary debug statements
pbollimp May 17, 2018
487 changes: 487 additions & 0 deletions src/brain/index_selection.cpp

Large diffs are not rendered by default.
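The index_selection.cpp diff is not rendered on this page. Judging from the commit messages (naive and configuration enumeration, the fix for the case where no index is better) and the knobs the job logs (num_indexes_, naive_enumeration_threshold_), the module appears to search index configurations in two phases. The C++ below is a minimal, self-contained sketch of that general Greedy(m, k)-style technique, not the PR's code: Index, Config, CostFn, EnumerateSubsets, and GreedySelect are hypothetical names, and the caller-supplied cost function stands in for the what-if optimizer cost.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins: an index is a string such as "t1(a,b)" and a
// configuration is a set of such indexes.
using Index = std::string;
using Config = std::set<Index>;
// Estimated cost of the whole workload under a configuration (lower is
// better); stands in for the what-if optimizer cost.
using CostFn = std::function<double(const Config &)>;

// Phase 1 helper: visit every subset of candidates[pos..] that keeps
// `current` at <= m indexes, remembering the cheapest one seen.
static void EnumerateSubsets(const std::vector<Index> &candidates,
                             std::size_t pos, std::size_t m, Config &current,
                             const CostFn &cost, Config &best,
                             double &best_cost) {
  double c = cost(current);
  if (c < best_cost) {
    best_cost = c;
    best = current;
  }
  if (current.size() == m) return;
  for (std::size_t i = pos; i < candidates.size(); i++) {
    current.insert(candidates[i]);
    EnumerateSubsets(candidates, i + 1, m, current, cost, best, best_cost);
    current.erase(candidates[i]);
  }
}

// Greedy(m, k)-style selection over distinct candidates:
// 1. exhaustive search over configurations of at most m indexes;
// 2. greedily add the single index that lowers the cost most, stopping at
//    k indexes or when no remaining candidate lowers the cost.
// The empty configuration is a valid answer when no index helps.
Config GreedySelect(const std::vector<Index> &candidates, std::size_t m,
                    std::size_t k, const CostFn &cost) {
  assert(m <= k);
  Config best;
  Config current;
  double best_cost = cost(best);
  EnumerateSubsets(candidates, 0, m, current, cost, best, best_cost);

  while (best.size() < k) {
    Config next = best;
    double next_cost = best_cost;
    for (const auto &idx : candidates) {
      if (best.count(idx) != 0) continue;
      Config trial = best;
      trial.insert(idx);
      double c = cost(trial);
      if (c < next_cost) {
        next_cost = c;
        next = trial;
      }
    }
    if (next_cost >= best_cost) break;  // no candidate improves the cost
    best = next;
    best_cost = next_cost;
  }
  return best;
}
```

Here m plays the role of a naive-enumeration threshold and k the number of indexes to keep. Exhaustive search is exponential in m, which is why it is capped before switching to greedy extension.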

23 changes: 23 additions & 0 deletions src/brain/index_selection_context.cpp
@@ -0,0 +1,23 @@
//===----------------------------------------------------------------------===//
//
// Peloton
//
// index_selection_context.cpp
//
// Identification: src/brain/index_selection_context.cpp
//
// Copyright (c) 2015-2018, Carnegie Mellon University Database Group
//
//===----------------------------------------------------------------------===//

#include "brain/index_selection_context.h"
#include "common/logger.h"

namespace peloton {
namespace brain {

IndexSelectionContext::IndexSelectionContext(IndexSelectionKnobs knobs)
    : knobs_(knobs) {}

} // namespace brain
} // namespace peloton
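As shown, IndexSelectionContext only stores the knobs it is given. The commit history also mentions memoization support, and a context object is a natural home for cached what-if costs. The sketch below is hypothetical: the three knob fields are named after the ones IndexSelectionJob logs, but their default values, the GetCost/CacheSize methods, and the string-keyed cache are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Field names mirror the knobs logged by IndexSelectionJob; the types and
// default values are illustrative assumptions, not Peloton's defaults.
struct IndexSelectionKnobs {
  std::size_t num_indexes_ = 3;
  std::size_t naive_enumeration_threshold_ = 2;
  std::size_t num_iterations_ = 2;
};

// Hypothetical context: holds the knobs and memoizes the cost of a
// (query, configuration) pair so repeated what-if calls are skipped.
class IndexSelectionContext {
 public:
  explicit IndexSelectionContext(IndexSelectionKnobs knobs) : knobs_(knobs) {}

  const IndexSelectionKnobs &Knobs() const { return knobs_; }

  // Returns the cached cost if present; otherwise calls `compute`, stores
  // the result, and returns it. `config_key` is any canonical string form
  // of the index configuration.
  double GetCost(const std::string &query, const std::string &config_key,
                 const std::function<double()> &compute) {
    auto key = std::make_pair(query, config_key);
    auto it = memo_.find(key);
    if (it != memo_.end()) return it->second;
    double cost = compute();
    memo_.emplace(key, cost);
    return cost;
  }

  std::size_t CacheSize() const { return memo_.size(); }

 private:
  IndexSelectionKnobs knobs_;
  std::map<std::pair<std::string, std::string>, double> memo_;
};
```

Because the optimizer is invoked once per (query, configuration) pair, caching pays off when the enumeration revisits the same configurations across iterations.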
174 changes: 174 additions & 0 deletions src/brain/index_selection_job.cpp
@@ -0,0 +1,174 @@
//===----------------------------------------------------------------------===//
//
// Peloton
//
// index_selection_job.cpp
//
// Identification: src/brain/index_selection_job.cpp
//
// Copyright (c) 2015-2018, Carnegie Mellon University Database Group
//
//===----------------------------------------------------------------------===//

#include "brain/index_selection_util.h"
#include "brain/index_selection_job.h"
#include "brain/index_selection.h"
#include "catalog/query_history_catalog.h"
#include "catalog/system_catalogs.h"
#include "optimizer/stats/stats_storage.h"

namespace peloton {
namespace brain {

#define BRAIN_SUGGESTED_INDEX_MAGIC_STR "brain_suggested_index"

void IndexSelectionJob::OnJobInvocation(BrainEnvironment *env) {
  auto &txn_manager = concurrency::TransactionManagerFactory::GetInstance();
  auto txn = txn_manager.BeginTransaction();
  LOG_INFO("Started Index Suggestion Task");

  optimizer::StatsStorage *stats_storage =
      optimizer::StatsStorage::GetInstance();

  ResultType stats_result = stats_storage->AnalyzeStatsForAllTables(txn);
  if (stats_result != ResultType::SUCCESS) {
    LOG_ERROR(
        "Cannot generate stats for table columns. Not performing index "
        "suggestion...");
    txn_manager.AbortTransaction(txn);
    return;
  }

  // Query the catalog for new SQL queries.
  // New SQL queries are the queries that were added to the system
  // after last_timestamp_.
  auto &query_catalog = catalog::QueryHistoryCatalog::GetInstance(txn);
  auto query_history =
      query_catalog.GetQueryStringsAfterTimestamp(last_timestamp_, txn);
  if (query_history->size() > num_queries_threshold_) {
    LOG_INFO("Tuning threshold has been crossed. Time to tune the DB!");

    // Run the index selection.
    std::vector<std::string> queries;
    for (const auto &query_pair : *query_history) {
      queries.push_back(query_pair.second);
    }

    // TODO: Handle multiple databases
    brain::Workload workload(queries, DEFAULT_DB_NAME, txn);
    LOG_INFO("Knob Num Indexes: %zu",
             env->GetIndexSelectionKnobs().num_indexes_);
    LOG_INFO("Knob Naive: %zu",
             env->GetIndexSelectionKnobs().naive_enumeration_threshold_);
    LOG_INFO("Knob Num Iterations: %zu",
             env->GetIndexSelectionKnobs().num_iterations_);
    brain::IndexSelection is = {workload, env->GetIndexSelectionKnobs(), txn};
    brain::IndexConfiguration best_config;
    is.GetBestIndexes(best_config);

    if (best_config.IsEmpty()) {
      LOG_INFO("Best config is empty");
    }

    // Get the existing indexes and drop the brain-suggested ones that are
    // not in the new best configuration.
    // TODO: Handle multiple databases
    auto database_object = catalog::Catalog::GetInstance()->GetDatabaseObject(
        DEFAULT_DB_NAME, txn);
Review comment (Member):

This should be addressed, at least by adding a wrapper function that takes the database name as an argument. Handling multiple databases is important, especially after the catalog refactor.

-- Tianyu, Justin & Tianyi

    auto pg_index = catalog::Catalog::GetInstance()
                        ->GetSystemCatalogs(database_object->GetDatabaseOid())
                        ->GetIndexCatalog();
    auto indexes = pg_index->GetIndexObjects(txn);
    for (const auto &index : indexes) {
      auto index_name = index.second->GetIndexName();
      // TODO [vamshi]: REMOVE THIS IN THE FINAL CODE
      // This is a hack for now. Add a boolean to the index catalog to
      // find out if an index is a brain suggested index/user created index.
      if (index_name.find(BRAIN_SUGGESTED_INDEX_MAGIC_STR) !=
          std::string::npos) {
        bool found = false;
        for (const auto &installed_index : best_config.GetIndexes()) {
          if ((index.second->GetTableOid() == installed_index->table_oid) &&
              (index.second->GetKeyAttrs() == installed_index->column_oids)) {
            found = true;
Review comment (Contributor):

nit: a break; here would help, since the loop can stop at the first match.

          }
        }
        // Drop only indexes which are not suggested this time.
        if (!found) {
          LOG_DEBUG("Dropping Index: %s", index_name.c_str());
          DropIndexRPC(database_object->GetDatabaseOid(), index.second.get());
        }
      }
    }

    for (const auto &index : best_config.GetIndexes()) {
      // Create RPC for index creation on the server side.
      CreateIndexRPC(index.get());
    }

    // Update last_timestamp_ to be the latest query's timestamp in
    // the current workload, so that only new queries are fetched next time.
    // TODO[vamshi]: Make this efficient. Currently assuming that the latest
    // query can be anywhere in the vector. If the latest query is always at
    // the end, then we can avoid a scan over all the queries.
    last_timestamp_ = GetLatestQueryTimestamp(query_history.get());
  } else {
    LOG_INFO("Tuning - not this time");
  }
  txn_manager.CommitTransaction(txn);
}

void IndexSelectionJob::CreateIndexRPC(brain::HypotheticalIndexObject *index) {
  // TODO: Remove hardcoded database name and server end point.
  capnp::EzRpcClient client("localhost:15445");
  PelotonService::Client peloton_service = client.getMain<PelotonService>();

  // Create the index name: concat - db_id, table_id, col_ids
  std::stringstream sstream;
  sstream << BRAIN_SUGGESTED_INDEX_MAGIC_STR << "_" << index->db_oid << "_"
          << index->table_oid << "_";
  for (auto col : index->column_oids) {
    sstream << col << "_";
  }
  auto index_name = sstream.str();

  auto request = peloton_service.createIndexRequest();
  request.getRequest().setDatabaseOid(index->db_oid);
  request.getRequest().setTableOid(index->table_oid);
  request.getRequest().setIndexName(index_name);
  request.getRequest().setUniqueKeys(false);

  // An index needs at least one key column.
  PELOTON_ASSERT(index->column_oids.size() > 0);
  auto col_list =
      request.getRequest().initKeyAttrOids(index->column_oids.size());
  for (auto i = 0UL; i < index->column_oids.size(); i++) {
    col_list.set(i, index->column_oids[i]);
  }

  auto response = request.send().wait(client.getWaitScope());
Review comment (Member):

Can you check the response and emit a warning if it does not succeed?

}

void IndexSelectionJob::DropIndexRPC(oid_t database_oid,
                                     catalog::IndexCatalogObject *index) {
  // TODO: Remove hardcoded database name and server end point.
  capnp::EzRpcClient client("localhost:15445");
  PelotonService::Client peloton_service = client.getMain<PelotonService>();

  auto request = peloton_service.dropIndexRequest();
  request.getRequest().setDatabaseOid(database_oid);
  request.getRequest().setIndexOid(index->GetIndexOid());

  auto response = request.send().wait(client.getWaitScope());
}

uint64_t IndexSelectionJob::GetLatestQueryTimestamp(
    std::vector<std::pair<uint64_t, std::string>> *queries) {
  uint64_t latest_time = 0;
  for (const auto &query : *queries) {
    if (query.first > latest_time) {
      latest_time = query.first;
    }
  }
  return latest_time;
}

}
}
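The bookkeeping in IndexSelectionJob can be checked without a running server or catalog. The following standard-library-only sketch mirrors three pieces of it: the index name built in CreateIndexRPC, the scan in GetLatestQueryTimestamp, and the rule that a brain-suggested index is dropped only when it is absent from the new best configuration. IndexDesc, MakeIndexName, LatestTimestamp, and ShouldDrop are hypothetical names; the RPCs, transactions, and catalog lookups are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// (timestamp, query string) pairs, the shape the job reads from query history.
using QueryLog = std::vector<std::pair<std::uint64_t, std::string>>;

// Same prefix the job uses to tag the indexes it creates.
const std::string kBrainPrefix = "brain_suggested_index";

// Hypothetical, simplified index descriptor: table oid + key column oids.
struct IndexDesc {
  std::uint32_t table_oid;
  std::vector<std::uint32_t> column_oids;
  bool operator==(const IndexDesc &o) const {
    return table_oid == o.table_oid && column_oids == o.column_oids;
  }
};

// Mirrors CreateIndexRPC's naming: prefix_db_table_col1_col2_...
std::string MakeIndexName(std::uint32_t db_oid, const IndexDesc &idx) {
  std::ostringstream ss;
  ss << kBrainPrefix << "_" << db_oid << "_" << idx.table_oid << "_";
  for (auto col : idx.column_oids) ss << col << "_";
  return ss.str();
}

// Same result as GetLatestQueryTimestamp: the largest timestamp, or 0 if
// the log is empty.
std::uint64_t LatestTimestamp(const QueryLog &log) {
  std::uint64_t latest = 0;
  for (const auto &q : log) latest = std::max(latest, q.first);
  return latest;
}

// Drop an existing index only if it carries the brain prefix and no index
// in the new best configuration matches it (same table, same key columns).
bool ShouldDrop(const std::string &name, const IndexDesc &existing,
                const std::vector<IndexDesc> &best) {
  if (name.find(kBrainPrefix) == std::string::npos) return false;
  return std::none_of(best.begin(), best.end(),
                      [&](const IndexDesc &b) { return b == existing; });
}
```

Using std::none_of for the membership test also gives the early exit suggested in the review nit, since the scan stops at the first matching index. User-created indexes never match the prefix, so they are always kept.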