Parser and lexer:
* Support for parentheses
* Type casts
* Ternary conditional operator ("?:")
* Allow location specifiers to be attached to constants
* Unicode identifiers and string literals
* In string literals, support the "\000" syntax (octal character
escape) from C89, or perhaps some syntax for specifying Unicode
code points?
* Error reporting; location of syntax error
* Support for assignments?
* Warn about redundant joins: "foo(A), foo(A);" is redundant
Planner:
* Add support for stratification
* Exploit implied equalities more effectively, suppress duplicate
predicate evaluations
* Avoid Cartesian products, if possible
* We can skip projection in an operator if no operator in the
remainder of the op chain requires values that are NOT in the input
to that operator; e.g. if we do a join and there isn't anything in
the rest of the op chain that needs a value in the scan_rel, we can
skip projection on the output of the join
* This might be complicated by the uniqification of variable names
Executor/Router/Operators:
* More efficient joins: the existing approach to joins essentially
re-materializes intermediate join results. Stems and stairs from
earlier Eddies work might point toward a better way of doing this.
* Use a heap to implement timer deadlines
* Aggs: don't emit a deletion/insertion pair if the agg value is
unchanged (e.g. sum<> on input 0, min<> on non-min input, etc.)
* Similarly, if an agg moves from n => m, we currently delete n and insert
m. Might be more efficient to emit a single "update n => m" tuple
* Implement DRed (delete and rederive)
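Sketch for the "use a heap to implement timer deadlines" item above: a
fixed-capacity binary min-heap keyed on deadline, so the next timer to fire
is always at the root. All names here (TimerHeap, timer_heap_push, and so on)
are hypothetical and are not taken from the C4 source; a real version would
presumably store a timer struct rather than a bare deadline.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TIMER_HEAP_MAX 64   /* hypothetical fixed capacity */

typedef struct {
    int64_t deadline[TIMER_HEAP_MAX];   /* min-heap: smallest deadline at [0] */
    size_t len;
} TimerHeap;

static void timer_heap_init(TimerHeap *h)
{
    assert(h != NULL);
    h->len = 0;
}

/* Insert a deadline; returns 0 on success, -1 if the heap is full. */
static int timer_heap_push(TimerHeap *h, int64_t deadline)
{
    size_t i;

    if (h->len == TIMER_HEAP_MAX)
        return -1;
    i = h->len++;
    /* Sift up: move larger parents down until the slot fits. */
    while (i > 0) {
        size_t parent = (i - 1) / 2;
        if (h->deadline[parent] <= deadline)
            break;
        h->deadline[i] = h->deadline[parent];
        i = parent;
    }
    h->deadline[i] = deadline;
    return 0;
}

/* Remove the earliest deadline into *out; returns -1 if the heap is empty. */
static int timer_heap_pop(TimerHeap *h, int64_t *out)
{
    size_t i = 0;
    int64_t last;

    if (h->len == 0)
        return -1;
    *out = h->deadline[0];
    last = h->deadline[--h->len];
    /* Sift down: pull the smaller child up until "last" fits. */
    for (;;) {
        size_t child = 2 * i + 1;
        if (child >= h->len)
            break;
        if (child + 1 < h->len && h->deadline[child + 1] < h->deadline[child])
            child++;
        if (last <= h->deadline[child])
            break;
        h->deadline[i] = h->deadline[child];
        i = child;
    }
    if (h->len > 0)
        h->deadline[i] = last;
    return 0;
}
```

Push and pop are both O(log n), and peeking at the next deadline is O(1).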
Network:
* Add an ad-hoc compression method, to avoid resending table names in
every single packet
* Have the client and server negotiate that "table 10" means "table
foo, with schema bar" once, and use that information for the
remainder of the session
* Add a UDP transport
* Consider adding an SCTP transport
* Consider adding an SSL-over-TCP transport, and/or secure
communication in general
* Consider adding a multicast transport?
* Consider using TCP_NODELAY in TCP transport
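Sketch for the table-name compression item above: the sender keeps a
per-session map from table name to a small integer ID. The first time a name
is seen, the caller would send a one-off "ID n means table foo, with schema
bar" message; afterwards packets carry only the ID. The names TableIdMap and
table_id_intern are hypothetical, and the wire format and schema handling are
left out entirely.

```c
#include <assert.h>
#include <string.h>

#define MAX_TABLES 32       /* hypothetical per-session limits */
#define MAX_NAME   64

typedef struct {
    char names[MAX_TABLES][MAX_NAME];
    int count;
} TableIdMap;

static void table_id_map_init(TableIdMap *m)
{
    assert(m != NULL);
    m->count = 0;
}

/*
 * Return the session-local ID for "name". *is_new is set to 1 when the
 * mapping was just created (the sender must announce it to the peer before
 * using the ID), and to 0 when it already existed. Returns -1 if the map
 * is full or the name is too long.
 */
static int table_id_intern(TableIdMap *m, const char *name, int *is_new)
{
    int i;

    assert(m != NULL && name != NULL && is_new != NULL);
    for (i = 0; i < m->count; i++) {
        if (strcmp(m->names[i], name) == 0) {
            *is_new = 0;
            return i;
        }
    }
    if (m->count == MAX_TABLES || strlen(name) >= MAX_NAME)
        return -1;
    strcpy(m->names[m->count], name);
    *is_new = 1;
    return m->count++;
}
```

The linear scan is only meant to keep the sketch short; a hash table keyed on
the table name would be the obvious replacement.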
Data types and expressions:
* Consider using a variable-size length word for C4String: more
storage-efficient for short strings, which is the common case (or
special-case this just for network format?)
* Replace string location specifier type with an IPv4 endpoint (scalar
value containing IPv4 address + port)
* Consider removing refcount from Tuple OR use the resulting padding
on LP64 machines for something useful (e.g. cache tuple_hash())
* Consider using a packed tuple representation; reorder Tuple fields
to reduce padding requirements
* Allow sum, avg to work on a broader range of data types
* The sum of int4s might be an int8
* Check for integer overflow in addition, multiply, etc.
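Sketch for the integer overflow item above: checked 64-bit addition and
multiplication that test the operands before doing the arithmetic, so no
signed overflow (undefined behavior in C) ever occurs. The function names are
hypothetical; how C4 would report the error to the user is not addressed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Store a + b in *out and return true, or return false on overflow. */
static bool int64_add_checked(int64_t a, int64_t b, int64_t *out)
{
    assert(out != NULL);
    if ((b > 0 && a > INT64_MAX - b) || (b < 0 && a < INT64_MIN - b))
        return false;
    *out = a + b;
    return true;
}

/* Store a * b in *out and return true, or return false on overflow. */
static bool int64_mul_checked(int64_t a, int64_t b, int64_t *out)
{
    assert(out != NULL);
    if (a == 0 || b == 0) {
        *out = 0;
        return true;
    }
    if (a > 0) {
        if (b > 0) {
            if (a > INT64_MAX / b)      /* positive * positive */
                return false;
        } else {
            if (b < INT64_MIN / a)      /* positive * negative */
                return false;
        }
    } else {
        if (b > 0) {
            if (a < INT64_MIN / b)      /* negative * positive */
                return false;
        } else {
            if (a < INT64_MAX / b)      /* negative * negative */
                return false;
        }
    }
    *out = a * b;
    return true;
}
```

GCC and Clang also provide overflow-checking builtins; the version above
sticks to portable C99.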
Tables and storage:
* Add internal "table IDs", and use them instead of table names
* Support for event tables
* Support for BDB persistent tables
* Can we store in-memory Tuple/Datum format directly to BDB?
* Consider adding a "regexp" table type: given a string input, parses
into tuple format by applying a regular expression
* Rather than regexp, look at PADS and related work
* Also do output to external format
* Optimize the hash table implementation
* Consider using Judy trees/arrays instead of hash tables
* Import red-black tree code
* Support for secondary indexes, index scans on PK
* Push predicates down to SQLite table scan
Build system:
* Make use of profile-guided optimization with GCC
* Support GCC 4.5's interprocedural optimization mode
* Only export the official C4 client API from the shared library
* All symbols prefixed with c4_, etc.
* Implement via linker scripts?
APR:
* Report queue performance issue
* Add support for "apr-config --configure"
* Modify queue type to allow variably-sized queue elements, to avoid
the need to malloc() small queue messages
Broader issues:
* Unit testing framework
* Error handling: exceptions via longjmp?
* Add a "$LOCALHOST" variable that expands to the network address of
the evaluating C4 instance
* Complicated by the fact that a machine can have multiple network
addresses (one per interface + localhost + IPv4 vs. IPv6, etc.)
* Perhaps adopt something like Reactor's "ref" concept instead
* Invent something similar to makeNode() from Postgres: infer node
size from node type tag
* Consider caching per-tuple hash code
* Change the node system to work with strict aliasing per C99
* Add a concept of "programs" or "modules"
* Implement a simple interactive shell
* As a first step, read input program from stdin unless stdin is a terminal
* Locking / concurrency control
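Sketch for the makeNode()-style item above: a table indexed by node type tag
gives the allocation size, so callers pass only the tag. The node types and
names here (NodeTag, ConstNode, VarNode, make_node) are invented for
illustration and do not reflect C4's actual node system.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { NODE_CONST, NODE_VAR, NODE_TAG_COUNT } NodeTag;

/* Every node struct starts with a Node header, so a pointer to the struct
 * may be converted to a pointer to its first member. */
typedef struct { NodeTag tag; } Node;
typedef struct { Node base; long value; } ConstNode;
typedef struct { Node base; char name[32]; } VarNode;

/* Allocation size for each tag. */
static const size_t node_sizes[NODE_TAG_COUNT] = {
    [NODE_CONST] = sizeof(ConstNode),
    [NODE_VAR]   = sizeof(VarNode),
};

/* Allocate a zeroed node of the size implied by "tag" and stamp the tag
 * into its header. Returns NULL if allocation fails. */
static void *make_node(NodeTag tag)
{
    Node *n;

    assert((int) tag >= 0 && tag < NODE_TAG_COUNT);
    n = calloc(1, node_sizes[tag]);
    if (n != NULL)
        n->tag = tag;
    return n;
}
```

Usage would look like "ConstNode *c = make_node(NODE_CONST);", with the
caller freeing the node (or a memory pool owning it).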
Minor:
* If a fact is defined at node X but has a location specifier for node
Y, should we send the tuple to node Y, or simply ignore it?
* Make core dumps more obvious
* Print backtrace on core dump