From 42e591925056841d1299effa0112d8134d5d3070 Mon Sep 17 00:00:00 2001
From: "YOMO.LEE" <75362129@qq.com>
Date: Sat, 26 Oct 2024 14:21:46 +0800
Subject: [PATCH] [Fix][Doc] Fix LocalFile doc (#7887)

Continue to improve the documentation on filtering files and add some examples [(#7887)](https://github.com/apache/seatunnel/issues/7887)
---
 docs/en/connector-v2/source/LocalFile.md | 62 +++++++++++++++++++++++-
 1 file changed, 61 insertions(+), 1 deletion(-)

diff --git a/docs/en/connector-v2/source/LocalFile.md b/docs/en/connector-v2/source/LocalFile.md
index 533d7fa91bf..077537f6887 100644
--- a/docs/en/connector-v2/source/LocalFile.md
+++ b/docs/en/connector-v2/source/LocalFile.md
@@ -256,10 +256,70 @@ Filter pattern, which used for filtering files.
 
 The filtering format is similar to wildcard matching file names in Linux.
 
+| Wildcard             | Meaning                                                                                              | Example                                                                                                                    |
+|----------------------|------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------|
+| `*`                  | Matches zero or more characters                                                                      | `f*`: any file whose name starts with f<br>`b*.txt`: any file whose name starts with b and ends with .txt                  |
+| `[]`                 | Matches any single character inside the brackets                                                     | `[abc]*`: any file whose name starts with a, b, or c                                                                       |
+| `?`                  | Matches exactly one character                                                                        | `f?.txt`: any file named f, followed by exactly one character, followed by .txt                                            |
+| `[!]`                | Matches any single character not inside the brackets                                                 | `[!abc]*`: any file whose name does not start with a, b, or c                                                              |
+| `[a-z]`              | Matches any single character in the range a to z                                                     | `[a-z]*`: any file whose name starts with a character from a to z                                                          |
+| `{a,b,c}` / `{a..z}` | Comma-separated: any one of the listed characters<br>Two dots: any character in the continuous range | `{a,b,c}*`: any file whose name starts with a, b, or c<br>`{a..z}*`: any file whose name starts with a character from a to z |
+
-However, it should be noted that unlike Linux wildcard characters, when encountering file suffixes, the middle dot cannot be omitted. For example, `abc20241022.csv`, the normal Linux wildcard `abc*` is sufficient, but here we need to use `abc*.*` , Pay attention to a point in the middle.
+Note that, unlike Linux wildcards, the dot before a file suffix cannot be omitted. For example, to match `abc20241022.csv`, the Linux wildcard `abc*` would be enough, but here you need `abc*.*`, with a dot between the two asterisks.
+File structure example:
+```
+report.txt
+notes.txt
+input.csv
+abch20241022.csv
+abcw20241022.csv
+abcx20241022.csv
+abcq20241022.csv
+abcg20241022.csv
+abcv20241022.csv
+abcb20241022.csv
+old_data.csv
+logo.png
+script.sh
+helpers.sh
+```
+Matching examples:
+
+**Example 1**: *Match all .txt files*. Pattern:
+```
+*.txt
+```
+This matches:
+```
+report.txt
+notes.txt
+```
+**Example 2**: *Match all files starting with abc*. Pattern:
+```
+abc*.csv
+```
+This matches:
+```
+abch20241022.csv
+abcw20241022.csv
+abcx20241022.csv
+abcq20241022.csv
+abcg20241022.csv
+abcv20241022.csv
+abcb20241022.csv
+```
+**Example 3**: *Match all files starting with abc whose fourth character is x or g*. Pattern:
+```
+abc[xg]*.csv
+```
+This matches:
+```
+abcx20241022.csv
+abcg20241022.csv
+```
 ### compress_codec [string]
 
 The compress codec of files and the details that supported as the following shown:
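The `abc*.*` note in the filter pattern section above is easiest to understand if the pattern is treated as a regular expression that must match the whole file name, rather than as a shell wildcard. The sketch below assumes exactly that (whole-name match, file name only, no directory part). It uses Python's `re.fullmatch` as a stand-in for Java's `Matcher.matches()`, and `filter_names` is a hypothetical helper. It is not SeaTunnel's implementation, so check the connector source before relying on these semantics.

```python
import re

# A few names from the file structure example above.
NAMES = [
    "report.txt",
    "notes.txt",
    "input.csv",
    "abch20241022.csv",
    "abcx20241022.csv",
    "abcg20241022.csv",
    "old_data.csv",
]


def filter_names(pattern, names):
    """Hypothetical helper: keep the names that `pattern` matches in full.

    Assumes regex semantics with a whole-name match (like Java's
    Matcher.matches()); this is an illustration, not SeaTunnel's code.
    """
    compiled = re.compile(pattern)
    return [name for name in names if compiled.fullmatch(name)]


# As a regex, "abc*" means "ab" followed by zero or more "c", so it never
# matches a whole name such as abch20241022.csv.
print(filter_names(r"abc*", NAMES))  # []
# "abc*.*" appends ".*" (any characters), so every name starting with "ab" matches.
print(filter_names(r"abc*.*", NAMES))  # ['abch20241022.csv', 'abcx20241022.csv', 'abcg20241022.csv']
# Regex spellings of the three examples: ".*" for "any characters", "\." for a literal dot.
print(filter_names(r".*\.txt", NAMES))  # ['report.txt', 'notes.txt']
print(filter_names(r"abc.*\.csv", NAMES))  # ['abch20241022.csv', 'abcx20241022.csv', 'abcg20241022.csv']
print(filter_names(r"abc[xg].*\.csv", NAMES))  # ['abcx20241022.csv', 'abcg20241022.csv']
```

Under this assumption, a pattern that starts with `*`, such as `*.txt`, is not a valid regular expression: Python's `re.compile` raises `re.error`, and Java's `Pattern.compile` throws `PatternSyntaxException`. The regex forms `.*\.txt`, `abc.*\.csv`, and `abc[xg].*\.csv` express the same three examples.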