Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Utility modules for lists and maps #18

Merged
merged 7 commits into from
Sep 11, 2024
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -8,24 +8,11 @@ import Grammar;
import ParseTree;
import util::Maybe;

import Prelude;

import lang::rascal::grammar::Util;
import util::ListUtil;

alias DelimiterPair = tuple[Maybe[Symbol] begin, Maybe[Symbol] end];

data Direction // Traverse lists of symbols (in productions)...
= forward() // - ...from left to right;
| backward() // - ...from right to left.
;

@synopsis{
Reorder a list according to the specified direction
}

list[&T] reorder(list[&T] l, forward()) = l;
list[&T] reorder(list[&T] l, backward()) = reverse(l);

@synopsis{
Gets the unique leftmost delimiter (`begin`) and the unique rightmost
delimiter `end`, if any, that occur **inside** productions of symbol `s`
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -22,20 +22,9 @@ import util::Math;
import util::Maybe;

import lang::rascal::grammar::Util;
import util::ListUtil;
import util::MaybeUtil;

@synopsis{
Representation of a traversal direction along a list of symbols
}

data Direction // Traverse lists of symbols (in productions)...
= forward() // - ...from left to right;
| backward() // - ...from right to left.
;

private list[&T] reorder(list[&T] l, forward()) = l;
private list[&T] reorder(list[&T] l, backward()) = reverse(l);

@synopsis{
Computes the *last* set of symbol `s` in grammar `g`
}
Expand All @@ -58,8 +47,8 @@ private map[Symbol, Maybe[set[Symbol]]] firstBySymbol(Grammar g, bool(Symbol) pr

Maybe[set[Symbol]] firstOf([])
= just({});
Maybe[set[Symbol]] firstOf([Symbol h, *Symbol t])
= \set: just({\empty(), *_}) := ret[delabel(h)]
Maybe[set[Symbol]] firstOf([h, *t])
= Maybe[set[Symbol]] \set: just({\empty(), *_}) := ret[delabel(h)]
? util::MaybeUtil::union(\set, firstOf(t))
: ret[delabel(h)];

Expand Down
16 changes: 2 additions & 14 deletions rascal-textmate-core/src/main/rascal/lang/textmate/Conversion.rsc
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,8 @@ import lang::textmate::ConversionConstants;
import lang::textmate::ConversionUnit;
import lang::textmate::Grammar;
import lang::textmate::NameGeneration;
import util::ListUtil;
import util::MapUtil;

alias RscGrammar = Grammar;

Expand Down Expand Up @@ -400,20 +402,6 @@ private list[Symbol] toTerminals(set[Segment] segs) {
return terminals;
}

// TODO: This function could be moved to a separate, generic module
private list[&T] dupLast(list[&T] l)
= reverse(dup(reverse(l))); // TODO: Optimize/avoid `reverse`-ing?

// TODO: This function could be moved to a separate, generic module
private map[&K, list[&V]] insertIn(map[&K, list[&V]] m, map[&K, &V] values)
// Updates the mapping of each key `k` in map of lists `m` to be the union
// of: (1) the existing list `m[k]`, and (2) the new elements-to-be-inserted
// `values[k]`. For instance:
// - m = ("foo": [1, 2, 3], "bar": [], "baz": [1, 2])
// - values = ("foo": [4, 5], "bar": [123], "qux": [3, 4])
// - return = ("foo": [1, 2, 3, 4, 5], "bar": [123], "baz": [1, 2])
= (k: m[k] + (k in values ? [values[k]] : []) | k <- m);

private TmRule toTmRule(RegExp re)
= match(
re.string,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@ import lang::rascal::grammar::analyze::Delimiters;
import lang::textmate::ConversionConstants;
import lang::textmate::Grammar;
import lang::textmate::NameGeneration;
import util::ListUtil;

@synopsis{
Representation of a production in a Rascal grammar to be converted to a rule
Expand Down Expand Up @@ -164,10 +165,6 @@ set[ConversionUnit] removeStrictPrefixes(set[ConversionUnit] units)
bool isStrictPrefix(ConversionUnit u1, ConversionUnit u2)
= isStrictPrefix(u1.prod.symbols, u2.prod.symbols);

// TODO: This function could be moved to a separate, generic module
private bool isStrictPrefix(list[&T] l1, list[&T] l2)
= size(l1) < size(l2) && !any(i <- [0..size(l1)], l1[i] != l2[i]);

@synopsis{
Representation of a *decomposition* of a list of units (i.e., the lists of
symbols of their productions) into their maximally common *prefix*
Expand Down
13 changes: 0 additions & 13 deletions rascal-textmate-core/src/main/rascal/lang/textmate/Grammar.rsc
Original file line number Diff line number Diff line change
Expand Up @@ -79,19 +79,6 @@ str toJSON(TmGrammar g, int indent = 2, loc l = |unknown:///|) {
return asJSON(g, indent = indent);
}

@synopsis{
Adds a TextMate rule to both the repository and the patterns of TextMate
grammar `g`
}

TmGrammar addRule(TmGrammar g, TmRule r)
= g [repository = g.repository + (r.name: r)]
[patterns = appendIfAbsent(g.patterns, include("#<r.name>"))];

// TODO: This function could be moved to a separate, generic module
private list[&T] appendIfAbsent(list[&T] vs, &T v)
= v in vs ? vs : vs + v;

Comment on lines -82 to -94
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code has become unused and is just removed

@synopsis{
Converts list of strings `names` (typically categories) to a map of captures
}
Expand Down
33 changes: 33 additions & 0 deletions rascal-textmate-core/src/main/rascal/util/ListUtil.rsc
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
module util::ListUtil

import List;

@synopsis{
Representation of a traversal direction along a list
}

data Direction // Traverse lists...
= forward() // - ...from left to right;
| backward() // - ...from right to left.
;

@synopsis{
Reorder a list according to the specified direction
}

list[&T] reorder(list[&T] l, forward()) = l;
list[&T] reorder(list[&T] l, backward()) = reverse(l);

@synopsis{
Removes multiple occurrences of elements in a list. The last occurrence
remains (cf. `List::dup`).
}

list[&T] dupLast(list[&T] l) = reverse(dup(reverse(l))); // TODO: Optimize/avoid `reverse`-ing?

@synopsis{
Checks if list `l1` is a strict prefix of list `l2`
}

bool isStrictPrefix(list[&T] l1, list[&T] l2)
= size(l1) < size(l2) && !any(i <- [0..size(l1)], l1[i] != l2[i]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ps, in the challenge:

bool isStrictPrefix(list[&T] l1, list[&T] l2)
    = size(l1) < size(l2) && l1 == l2[..size(l2);

is even shorter ;)

Or even this ;)

bool isStrictPrefix(list[&T] l1:[_,*_], [*l1, _, *_]) = true;
default bool isStrictPrefix(_, _) = false;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's brilliant, I concede 😉

Copy link
Collaborator Author

@sungshik sungshik Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got a bit interested about performance of the various implementations (not really relevant for the current use case; just out of curiosity) and took a few anecdotal measurements. For posterity, here it goes...

Implementations

bool isStrictPrefix1([], [])
    = false;
bool isStrictPrefix1([], [_, *_])
    = true;
bool isStrictPrefix1([_, *_], [])
    = false;
bool isStrictPrefix1([head1, *tail1], [head2, *tail2])
    = head1 == head2 && isStrictPrefix1(tail1, tail2);

// -------------------------------------------------------------------------- //

bool isStrictPrefix2(list[&T] l1, list[&T] l2) {
  if (size(l1) >= size(l2)) {
     return false;
  }
  if (i <- [0..size(l1)], l1[i] != l2[i]) {
     return false;
  }
  return true;
}

// -------------------------------------------------------------------------- //

bool isStrictPrefix3(list[&T] l1, list[&T] l2)
    = size(l1) < size(l2) && !any(i <- [0..size(l1)], l1[i] != l2[i]);

// -------------------------------------------------------------------------- //

bool isStrictPrefix4(list[&T] l1, list[&T] l2)
    = size(l1) < size(l2) && l1 == l2[..size(l1)];

// -------------------------------------------------------------------------- //

bool isStrictPrefix5(list[&T] l1:[_,*_], [*l1, _, *_]) = true;
default bool isStrictPrefix5(_, _) = false;

Measurements

import IO;
import util::Benchmark;

void main(int i = 0) {
    cases = (
        "v1": void() { for (_ <- [0..1000]) { isStrictPrefix1([0..i], [0..1000]); } },
        "v2": void() { for (_ <- [0..1000]) { isStrictPrefix2([0..i], [0..1000]); } },
        "v3": void() { for (_ <- [0..1000]) { isStrictPrefix3([0..i], [0..1000]); } },
        "v4": void() { for (_ <- [0..1000]) { isStrictPrefix4([0..i], [0..1000]); } },
        "v5": void() { for (_ <- [0..1000]) { isStrictPrefix5([0..i], [0..1000]); } }
    );
    println(benchmark(cases));
}

Quiz

Question: Which implementation is slowest?

Answer...

isStrictPrefix1 is always slowest. For small prefixes, all other implementations are comparably fast. For larger prefixes, isStrictPrefix2 gets increasingly slower than ...3/4/5 (but never as slow as ...1).

Question: Which implementation is fastest?

Answer...

isStrictPrefix3/4/5 are comparably fast for small prefixes and non-prefixes. For larger prefixes, ...4/5 are clearly fastest (but there is no clear winner between the two).

Question: Which implementation crashes, if any?

Answer...

isStrictPrefix1 crashes with a java.lang.StackOverflowError for a list of size 1000 and a prefix of size 999 (not necessarily a minimal example).

Results

First series

With isStrictPrefix1:

rascal>main(i = 9)
("v1":404,"v2":304,"v3":322,"v4":312,"v5":306)
ok
rascal>main(i = 99)
("v1":1461,"v2":663,"v3":413,"v4":288,"v5":320)
ok
rascal>main(i = 999)
java.lang.StackOverflowError
Second series

Without isStrictPrefix1:

rascal>main(i = 9)
("v2":319,"v3":387,"v4":344,"v5":374)
ok
rascal>main(i = 99)
("v2":594,"v3":450,"v4":332,"v5":421)
ok
rascal>main(i = 999)
("v2":18232,"v3":1713,"v4":577,"v5":520)
ok
rascal>main(i = 1000)
("v2":524,"v3":526,"v4":517,"v5":513)
ok
rascal>main(i = 9999)
("v2":3021,"v3":3003,"v4":3098,"v5":2975)
ok

Copy link
Member

@DavyLandman DavyLandman Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Micro-benchmarks are hard ;) these are doing to much in the loop that is not part of the focus.

On my machine, I get these numbers:

rascal>main_old();
Reloading module Bench
("v2":213,"v3":171,"v4":176,"v5":218)

and after rewriting the benchmark:

void main(int i = 0) {
    full = [0..1000];
    needle = [0..i];
    rounds = [0..1000];
    cases = (
        "v2": void() { for (_ <- rounds) { isStrictPrefix2(needle, full); } },
        "v3": void() { for (_ <- rounds) { isStrictPrefix3(needle, full); } },
        "v4": void() { for (_ <- rounds) { isStrictPrefix4(needle, full); } },
        "v5": void() { for (_ <- rounds) { isStrictPrefix5(needle, full); } }
    );
    println(benchmark(cases));
}

it prints:

Reloading module Bench
("v2":71,"v3":47,"v4":37,"v5":24)

suddenly v5 is the fastest, and the numbers are all quite a bit lower, let's increase the rounds to 100000.

rascal>main()
("v2":908,"v3":852,"v4":839,"v5":187)

so yeah, v5 might be a quite a bit faster than the rest of them ;)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, that's interesting! 🙂

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I would have kept V4 for readability ;)

13 changes: 13 additions & 0 deletions rascal-textmate-core/src/main/rascal/util/MapUtil.rsc
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
module util::MapUtil

@synopsis{
Updates the mapping of each key `k` in map of lists `m` to be the union of:
(1) the existing list `m[k]`, and (2) the new elements-to-be-inserted
`values[k]`. For instance:
- m = ("foo": [1, 2, 3], "bar": [], "baz": [1, 2])
- values = ("foo": [4, 5], "bar": [123], "qux": [3, 4])
- return = ("foo": [1, 2, 3, 4, 5], "bar": [123], "baz": [1, 2])
}

map[&K, list[&V]] insertIn(map[&K, list[&V]] m, map[&K, &V] values)
= (k: m[k] + (k in values ? [values[k]] : []) | k <- m);