Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support #tagging within document text #185

Open
cloverich opened this issue Jun 15, 2024 · 1 comment
Open

Support #tagging within document text #185

cloverich opened this issue Jun 15, 2024 · 1 comment

Comments

@cloverich
Copy link
Owner

When I did the work to add tagging in #126 I started with an in-document solution, but ultimately decided to kiss and use just a text box in the header, then do the rest of the integration. To some extent this implies a front matter approach like #127 -- tags would live on the markdown's title lines:

-----
title: My document title
tags: #foo, #bar, #baz
----

...rest of content

The alternative, which apps like Bear and Apple notes use, is to integrate them directly into the editor. I think this can make sense, although if I actually look at how I us that in practice, I always slap my tags at the very top of the document -- hence my decision to go with the front matter approach. Still, I implemented some of the parsing logic and want to track that here, since I may end up doing this in the future.

After fidgeting with my unified parser setup, and failing to fully grasp how I should overide it so that this works:

const slateToStringProcessor = unified()
  .use(slateToRemark)
  .use(tagsTransformer)
  .use(remarkStringify);

const stringToSlateProcessor = parser
  .use(remarkUnwrapImages)
  .use(tagsTransformer)
  .use(remarkToSlate);

Also saving this for context:

// | ........................ process ........................... |
// | .......... parse ... | ... run ... | ... stringify ..........|
//
//           +--------+                     +----------+
// Input ->- | Parser | ->- Syntax Tree ->- | Compiler | ->- Output
//           +--------+          |          +----------+
//                               X
//                               |
//                        +--------------+
//                        | Transformers |
//                        +--------------+

I ended up implementing it as a transformer instead:

export default function transformTags(tree: Root) {
  visit(tree, "text", (node: any, index, parent) => {
    const tagRegex = /#\w+/g;
    const parts = node.value.split(tagRegex);
    const matches = node.value.match(tagRegex);
    const newNodes: any = [];

    if (!matches) return;

    parts.forEach((part, i) => {
      // todo: When case is #mytag, parts will be [""] - add test
      if (part) newNodes.push({ type: "text", value: part });
      if (matches[i]) {
        newNodes.push({
          type: "tag",
          children: { type: "text", value: matches[i] },
        });
      }
    });

    parent?.children.splice(index, 1, ...newNodes);
  });

  return tree;
}

With the implied useage of:

let mdast = myunifiedparser.parse(document);
mdast =  transformTags(mdast)
return myunifiedparser.stringify(mdast) // or compile, etc

There are some useful documentations on their side

I then added supporting basic implementations to mdast-to-slate and slate-to-mdast transformers:

// mdast types

/**
 * (Custom Type) Tag - Represents a tag in markdown
 *
 * ex: #mytag
 */
export interface Tag {
  type: "tag";
  children: [Text]; // todo: probably this should actually just be a single text node?
}


// mdast-to-slate

//...

    case "tag":
      return [createTag(node)];
//...

export interface Tag {
  text: string;
  tag: true;
}

function createTag(node: mdast.Tag): Tag {
  const { type, children } = node;
  return {
    text: children[0].value,
    tag: true,
  };
}


// slate-to-mdast
type Decoration = {
  italic: true | undefined;
  bold: true | undefined;
  strikethrough: true | undefined;
  code: true | undefined;
  tag: true | undefined;
};

const DecorationMapping = {
  italic: "emphasis",
  bold: "strong",
  strikethrough: "delete",
  code: "inlineCode",
  tag: "tag",
};


convertNodes( // ...

              case "bold":
              case "italic":
              case "strikethrough":
              case "tag":
                res = {
                  type: DecorationMapping[k] as any,
                  children: [res],
                };
                break;

// ....
@cloverich
Copy link
Owner Author

I also had some WIP tests:


describe("Tags", function () {
  it.skip("text -> mdast base case", function () {
    const input = "This **text** has a #tag1 and another #tag2";
    const output = stringToMdast(input);

    expect(output).to.deep.equal({
      type: "root",
      children: [
        {
          type: "paragraph",
          depth: 1,
          children: [
            // {
            //   type: "text",
            //   value: "tag1 #tag2 #tag3",
            // },
          ],
        },
      ],
    });
  });

  // todo(chris): re-disable truncateThreshold
  chai.config.truncateThreshold = 0; // Disable truncation

  it("A basic example tag #mytag", function () {
    const input1 = "A basic example tag #mytag";
    const output = stringToMdast(input1);

    // todo: expect one paragraph child to simplify test, or maybe even
    // GET children, and throw a useful error when they aren't present...
    expect(output.children.length).to.equal(1);
    expect(output.children[0].type).to.equal("paragraph");
    expect(output.children[0].children).to.deep.equal([
      {
        type: "text",
        value: "A basic example tag ",
      },
      {
        type: "tag",
        children: [
          {
            type: "text",
            value: "mytag",
          },
        ],
      },
    ]);
  });

  it("#mytag", function () {
    const output = stringToMdast("#mytag");

    expect(output.children.length).to.equal(1);
    expect(output.children[0].type).to.equal("paragraph");
    expect(output.children[0].children).to.deep.equal([
      {
        type: "tag",
        children: [
          {
            type: "text",
            value: "mytag",
          },
        ],
      },
    ]);
  });

  it("#mytag1 #mytag2", function () {
    const output = stringToMdast("#mytag1 #mytag2");

    expect(output.children.length).to.equal(1);
    expect(output.children[0].type).to.equal("paragraph");
    expect(output.children[0].children).to.deep.equal([
      {
        type: "tag",
        children: [
          {
            type: "text",
            value: "mytag1",
          },
        ],
      },
      {
        type: "text",
        value: " ",
      },
      {
        type: "tag",
        children: [
          {
            type: "text",
            value: "mytag2",
          },
        ],
      },
    ]);
  });

  it("# My heading has a tag #mytag", function () {
    const output = stringToMdast("# My heading has a tag #mytag");
    expect(output.children.length).to.equal(1);
    expect(output.children[0].type).to.equal("heading");
    expect(output.children[0].children).to.deep.equal([
      {
        type: "text",
        value: "My heading has a tag ",
      },
      {
        type: "tag",
        children: [
          {
            type: "text",
            value: "mytag",
          },
        ],
      },
    ]);
  });

  it.skip("`#mytag` is not a tag");
  it.skip("Tag inside a block \n ```\n #mytag\n ```");
  it.skip("Tag is child of bold inside a code block \n ```\n **#mytag**\n ```");

  describe("slate -> mdast", function () {
    it("A basic example tag #mytag", function () {
      const input: SlateNode = {
        type: "root",
        children: [
          {
            type: "p",
            children: [
              {
                text: "A basic example tag ",
              },
              {
                text: "mytag",
                tag: true,
              },
            ],
          },
        ],
      };

      const output = slateToMdast(input);

      expect(output).to.exist;
      expect(output.type).to.equal("root");
      expect(output.children).to.exist;
      expect(output.children).to.have.length(1);
      expect(output.children).to.deep.equal([
        {
          type: "paragraph",
          children: [
            {
              type: "text",
              value: "A basic example tag ",
            },
            {
              type: "tag",
              children: [
                {
                  type: "text",
                  value: "mytag",
                },
              ],
            },
          ],
        },
      ]);
    });
  });
});

I broke them when I changed the structure of the output slightly, but its straight forward to go either way, and I think they are in the right direction.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant