Multiple issues in components/prism-nginx.js #1718
Would you like to add a PR implementing your proposed solutions?
Don't
That would be ideal but it's usually quite hard to find this expression.
No, it isn't. If they get in your way or produce too many errors, feel free to remove or change them.
They do. But
nginx syntax is pretty strict. A directive can be found by its position: it is placed at the beginning of the text or preceded by `;`, `{` or `}`:
```nginx
name_1 parameters_1;
name_2 {
    name_2_1 parameters2_1;
    # A comment...
    name_2_2 parameters_2_2;
}
name_3 parameters_3;
```
Names of directives can be described with:
```js
'DIRECTIVE': {
    pattern: /((?:^|[;{}])\s*)\w+[^;{]*?(?=\s*[;{]|$)/, // Directive’s name and parameters
    lookbehind: true,
    inside: {
        'NAME': /^\w+(?=[\s;{]|$)/ // Name of a directive
    }
}
```
We have to find name and parameters first, because
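A minimal sketch of the position-based idea in plain JavaScript (not through Prism): it runs the `DIRECTIVE` pattern above over the sample config and extracts each name with the `NAME` pattern. Prism's `lookbehind: true` is emulated by slicing off capture group 1. The helper `directiveNames` is hypothetical. Comments are stripped first with a naive `/#.*$/gm` (which ignores strings), assuming the `comment` token is matched before the directive pattern, as in the grammars later in this thread. Without that step `name_2_2` would be missed, because the text before it ends in a comment rather than `;`, `{` or `}`.

```javascript
// Patterns from the comment above, with the `g` flag added for scanning.
const DIRECTIVE = /((?:^|[;{}])\s*)\w+[^;{]*?(?=\s*[;{]|$)/g;
const NAME = /^\w+(?=[\s;{]|$)/;

// Hypothetical helper: returns the directive names found in `config`.
function directiveNames(config) {
  // Naive comment stripping (ignores strings); stands in for Prism
  // matching the `comment` token before the directive pattern.
  const text = config.replace(/#.*$/gm, "");
  const names = [];
  DIRECTIVE.lastIndex = 0;
  let m;
  while ((m = DIRECTIVE.exec(text)) !== null) {
    // Emulate `lookbehind: true`: drop the text matched by group 1.
    const directive = m[0].slice(m[1].length);
    const name = directive.match(NAME);
    if (name) names.push(name[0]);
  }
  return names;
}

const sample = [
  "name_1 parameters_1;",
  "name_2 {",
  "    name_2_1 parameters2_1;",
  "    # A comment...",
  "    name_2_2 parameters_2_2;",
  "}",
  "name_3 parameters_3;",
].join("\n");

console.log(directiveNames(sample));
// → ["name_1", "name_2", "name_2_1", "name_2_2", "name_3"]
```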
Sure! Most ideas are already implemented. But I have doubts about directives. What do you think about searching for directives by their position? Doesn’t it sound too naive?
I’ll try to insert your regex into my code.
👍
This is a suggestion from my side because highlight.js also highlights
I think it's a good approach but your current pattern does not account for strings:
```nginx
name ';oh no';
```
It's good to know that the strict comment rules work in our favour here.
Personally, I can’t see the difference between Anyway, I’ll try to add it for compatibility.
I forgot to mention I also have a greedy string pattern. It catches
I’ll make a PR on the weekend.
But then this causes problems, doesn't it?
```nginx
name ';oh no' parameter;
```
It does. Thank you. Maybe, this works better.
```js
'comment': {
    pattern: /(^|[\s;{}])#.*/,
    lookbehind: true
},
'function': {
    pattern: /((?:^|[;{}])\s*)\w+(?:(["'])(?:\\\2|(?!\2)[\S\s])*\2|\\["';{]|[^"';{])+/,
    lookbehind: true,
    inside: {
        'attr-name': /^\w+(?=[\s;{]|$)/
    }
}
```
Here are some test cases I used:
```nginx
name parameter1;
name "parameter1";
name parameter1 parameter2;
name parameter1 "parameter2";
name "parameter1" parameter2;
name para\;meter1;
name "para;meter1";
name "para\;meter1";
internal;
internal ;
# A multiline parameter
name "para
meter1";
name {
    name parameter1 'parameter2' \; par#ameter3;
    name parameter1 \" 'hello' par#ameter2;
    name parameter1; name parameter1;
    name parameter1 \{ 'hello';
    name {
        internal;
        name parameter1 parameter2;
        name para;meter1; # An error!
        name para{meter1; # An error!
        name parameter1 {; # An error!
    }
}
```
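A rough sketch that runs this `comment`/`function` pair over a few of the test cases with plain `RegExp.prototype.exec`, again emulating `lookbehind: true` by dropping capture group 1. The helper `matches` is hypothetical. It shows that escaped and quoted semicolons stay inside the directive, while `internal ;` keeps its trailing space and the `comment` pattern starts matching at the `#` inside `" #foo"`.

```javascript
// The `comment` and `function` patterns from this comment, plus `g`.
const COMMENT = /(^|[\s;{}])#.*/g;
const FUNCTION = /((?:^|[;{}])\s*)\w+(?:(["'])(?:\\\2|(?!\2)[\S\s])*\2|\\["';{]|[^"';{])+/g;

// Hypothetical helper: all matches of `re` in `text`, with capture
// group 1 removed to emulate Prism's `lookbehind: true`.
function matches(re, text) {
  const out = [];
  re.lastIndex = 0;
  let m;
  while ((m = re.exec(text)) !== null) out.push(m[0].slice(m[1].length));
  return out;
}

console.log(matches(FUNCTION, "name para\\;meter1;")); // escaped ";" stays inside
console.log(matches(FUNCTION, 'name "para;meter1";')); // quoted ";" stays inside
console.log(matches(FUNCTION, "internal ;"));          // trailing space is captured
console.log(matches(COMMENT, 'name " #foo"; #bar'));   // match starts inside the string
```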
This has problems with this:
```nginx
name " #foo"; #bar
```
Also, I don't think that we should capture trailing spaces like in:
```nginx
internal ;
```
This should fix all of that:
```js
Prism.languages.nginx = {
    'comment': {
        pattern: /((?:^|[{};])\s*)#.*/m,
        lookbehind: true
    },
    'directive': {
        pattern: /\b\w+(?:[^;{}"'\\]|\\.|("|')(?:(?!\1)[^\\]|\\.)*\1)*?(?=\s*[;{])/,
        greedy: true
    },
    'punctuation': /[{};]/
};
```
The lookbehind of the comment pattern is a little more complex so we can avoid greedy rematching, which causes #1492. I also simplified the escape rules, but that shouldn't cause problems. Also, we don't have to concern ourselves with how to parse syntactically incorrect code.
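A small sketch checking the revised patterns on the two problem cases, outside Prism. The helper `tokens` is hypothetical; for the comment pattern it drops capture group 1 to emulate `lookbehind: true`, while the directive pattern's group 1 is the quote character and is left alone. The `#` inside `" #foo"` is no longer treated as a comment, the string stays inside the directive, and `internal ;` is matched without its trailing space.

```javascript
// Patterns from the grammar above, with `g` added for scanning.
const COMMENT = /((?:^|[{};])\s*)#.*/gm;
const DIRECTIVE = /\b\w+(?:[^;{}"'\\]|\\.|("|')(?:(?!\1)[^\\]|\\.)*\1)*?(?=\s*[;{])/g;

// Hypothetical helper: collects matches; if `dropGroup1` is set, the text
// of capture group 1 is removed to emulate `lookbehind: true`.
function tokens(re, text, dropGroup1) {
  const out = [];
  re.lastIndex = 0;
  let m;
  while ((m = re.exec(text)) !== null) {
    out.push(dropGroup1 ? m[0].slice(m[1].length) : m[0]);
  }
  return out;
}

console.log(tokens(COMMENT, 'name " #foo"; #bar', true)); // only the real comment
console.log(tokens(DIRECTIVE, 'name " #foo"; #bar'));     // the string stays in the directive
console.log(tokens(DIRECTIVE, "internal ;"));             // no trailing space
```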
Hm, this looks similar to the lookbehind I used before. Could you explain what exactly helps to avoid the bug you mentioned? The example has problems with:
```nginx
location = "/example" # This is a comment
{}
```
The reason I made these changes to the lookbehind is to avoid greedy rematching: Because
```nginx
name "#foo"; name; #bar
```
The greedy matching of
This is why I want to avoid rematching. It avoids matching non-comments like this:
```nginx
location = "/example" # This is not a comment
{}
```
Thank you!
This answer from Stack Exchange is not complete.
```nginx
location = "/example" # This IS a comment. There is a space
{# This is a comment after "{". No spaces required
    return 200 "Hello, world!";# This is a comment after ";". No spaces required
}# This is a comment after "}". No spaces required
location = /example # This IS a comment. There is a space
{}
location = "/example"# This is NOT a comment. There is no space
{}
location = /example# This is NOT a comment. There is no space
{}
```
It doesn’t guarantee that the comment is preceded by
Sorry, but I don't see the proof... I assume
Well, true. It doesn't in the case of the start of the file, but that should be it. Also, is there a spec page for the NGINX configuration syntax?
Damn. There are examples of
I’m not sure. I haven’t found it yet. I use
```nginx
events # A comment
{
    worker_connections 512;
}
# A comment
http# This is not a comment. There is no space
{
    server {# A comment
        listen 80;# A comment
        location = "/example1" # This is a comment. There is a space
        {# This is a comment after "{". No spaces required
            return 200 "Hello, world!";# This is a comment after ";". No spaces required
        }# This is a comment after "}". No spaces required
        location = /example2 # This is a comment. There is a space
        {}
        location = "/example3"# This is not a comment. There is no space
        {}
        location = /example4# This is not a comment. There is no space
        {}
    }# A comment
}
```
nginx outputs the following:
```
nginx: [emerg] unknown directive "http#" in /path/to/nginx.conf:9
nginx: [emerg] unexpected "#" in /path/to/nginx.conf:21
nginx: [emerg] invalid number of arguments in "location" directive in /path/to/nginx.conf:25
```
Thus, the
```nginx
location = "/example" # This is not a comment
{}
```
P.S. I hope I don’t bother you too much :)
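These results suggest a simple rule: `#` starts a comment at the beginning of the input or right after whitespace, `;`, `{` or `}`, but not right after a quote or a word character. A minimal sketch of that rule, assuming the pattern `/(^|[\s{};])#.*/` and checking one line at a time. The helper `commentIn` is hypothetical and knows nothing about strings, so a case like `name " #foo"` still needs the greedy directive/string handling discussed above.

```javascript
// Assumed rule: `#` starts a comment at the start of the input or right
// after whitespace, `;`, `{` or `}`.
const COMMENT = /(^|[\s{};])#.*/;

// Hypothetical helper: the comment in `line` (without the preceding
// character), or null if there is none.
function commentIn(line) {
  const m = COMMENT.exec(line);
  return m ? m[0].slice(m[1].length) : null;
}

const lines = [
  'location = "/example1" # This is a comment. There is a space',
  '{# This is a comment after "{". No spaces required',
  'return 200 "Hello, world!";# This is a comment after ";". No spaces required',
  '}# This is a comment after "}". No spaces required',
  'location = "/example3"# This is not a comment. There is no space',
  'location = /example4# This is not a comment. There is no space',
  'http# This is not a comment. There is no space',
];

// Prints true for the first four lines and false for the last three.
for (const line of lines) console.log(commentIn(line) !== null, line);
```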
These examples I can work with! But, it's the
Just to confirm this:
Alright, assuming strings can indeed only be followed by a white space or
```js
Prism.languages.nginx = {
    'comment': {
        pattern: /(^|[\s{};])#.*/,
        lookbehind: true
    },
    'directive': {
        pattern: /\b\w+(?:[^;{}"'\\]|\\.|("|')(?:(?!\1)[^\\]|\\.)*\1)*?(?=\s*[;{])/,
        greedy: true,
        inside: {
            'string': {
                // I assumed that " and ' can be escaped using \" and \' and that \ can be escaped using \\
                pattern: /((?:^|[^\\])(?:\\\\)*)("|')(?:(?!\2)[^\\]|\\.)*\2/,
                lookbehind: true
            },
            'comment': {
                pattern: /(\s)#.*/,
                lookbehind: true,
                greedy: true
            },
            'keyword': /^\S+/
            // other patterns
        }
    },
    'punctuation': /[{};]/
};
```
I tested everything with the examples here and this:
```nginx
events # A comment
{
    worker_connections 512;
}
# A comment
http# This is not a comment. There is no space
{
    server {# A comment
        listen 80;# A comment
        location = "/example1" # This is a comment. There is a space
        {# This is a comment after "{". No spaces required
            return 200 "Hello, world!";# This is a comment after ";". No spaces required
        }# This is a comment after "}". No spaces required
        location = /example2 # This is a comment. There is a space
        {}
        location = /example4# This is not a comment. There is no space
        {}
    }# A comment
}
name parameter1;
name "parameter1";
name parameter1 parameter2;
name parameter1 "parameter2";
name "parameter1" parameter2;
name para\;meter1;
name "para;meter1";
name "para\;meter1";
internal;
internal ;
# A multiline parameter
name "para
meter1";
name {
    name parameter1 'parameter2' \; par#ameter3;
    name parameter1 \" 'he"llo' par#ameter2;
    name parameter1; name parameter1;
    name parameter1 \{ 'hello';
    name {
        internal;
        name parameter1 parameter2;
        name para;meter1; # An error!
        name para{meter1; # An error!
        name parameter1 {; # An error!
    }
}
name "#foo"; name; #bar
name " #foo"; #bar
name ';oh no' parameter;
```
It looks that way.
Your solution is perfect!
It isn't...
```nginx
name # foo;
```
Also, I have another question: How does NGINX handle the following:
```nginx
name # am I a comment;
;
```
Anyway, here's a corrected version. I also updated the name directive expression for the name: Only
```js
Prism.languages.nginx = {
    'comment': {
        pattern: /(^|[\s{};])#.*/,
        lookbehind: true
    },
    'directive': {
        pattern: /(^|\s)\w(?:[^;{}"'\\]|\\.|("|')(?:(?!\2)[^\\]|\\.)*\2)*?(?=[ \t]*[;{])/,
        lookbehind: true,
        greedy: true,
        inside: {
            'string': {
                // I assumed that " and ' can be escaped using \" and \' and that \ can be escaped using \\
                pattern: /((?:^|[^\\])(?:\\\\)*)("|')(?:(?!\2)[^\\]|\\.)*\2/,
                lookbehind: true
            },
            'comment': {
                pattern: /(\s)#.*(?=[\r\n])/,
                lookbehind: true,
                greedy: true
            },
            'keyword': /^\S+/
            // other patterns
        }
    },
    'punctuation': /[{};]/
};
```
nginx recognizes
```nginx
location # A comment;
{
}
```
causes

I think, your previous solution could be fixed easily:
```js
'directive': {
    pattern: /\b\w+(?:("|')(?:(?!\1)[^\\]|\\.)*\1|(\s)#.*|[^;{}"'\\]|\\.)*?(?=\s*[;{])/,
    // ...
}
```
No syntax errors. The first

Oh, I see the GitHub highlighter fails on some examples :)
I applied your fix!
```js
Prism.languages.nginx = {
    'comment': {
        pattern: /(^|[\s{};])#.*/,
        lookbehind: true
    },
    'directive': {
        pattern: /(^|\s)\w(?:[^;{}"'\\\s]|\\.|("|')(?:(?!\2)[^\\]|\\.)*\2|\s(?:#.*)?)*?(?=[ \t]*[;{])/,
        lookbehind: true,
        greedy: true,
        inside: {
            'string': {
                pattern: /((?:^|[^\\])(?:\\\\)*)("|')(?:(?!\2)[^\\]|\\.)*\2/,
                lookbehind: true
            },
            'comment': {
                pattern: /(\s)#.*(?=[\r\n])/,
                lookbehind: true,
                greedy: true
            },
            'keyword': /^\S+/
            // other patterns
        }
    },
    'punctuation': /[{};]/
};
```
I just have a bad feeling that
@RunDevelopment, thank you! I’ll test it and get back to you in a couple of days. What about the following pattern?
```js
'directive': {
    pattern: /(^|\s)\w(?:[^;{}"'\\\s]|\\.|("|')(?:(?!\2)[^\\]|\\.)*\2|\s#.*|\s+)*?(?=\s*[;{])/,
    // ...
}
```
Upon testing the two regexes I found that my pattern fails in linear time while yours takes exponential time. So please use my pattern. |
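A rough check of the `directive` pattern from the applied fix, outside Prism. The helper `directives` is hypothetical and drops group 1 to emulate `lookbehind: true`. The `;` inside `# A comment;` does not end the directive, a quoted `;` stays inside it, and a long line with no `;` or `{` simply fails. In that last input every character can be consumed by only one alternative of the repeated group, so the pattern has nothing to retry. The test does not measure timing, and the `\s+` variant is deliberately not run here.

```javascript
// The `directive` pattern from the applied fix, with `g` added for scanning.
const DIRECTIVE = /(^|\s)\w(?:[^;{}"'\\\s]|\\.|("|')(?:(?!\2)[^\\]|\\.)*\2|\s(?:#.*)?)*?(?=[ \t]*[;{])/g;

// Hypothetical helper: directive texts, with group 1 dropped to emulate
// Prism's `lookbehind: true`.
function directives(text) {
  const out = [];
  DIRECTIVE.lastIndex = 0;
  let m;
  while ((m = DIRECTIVE.exec(text)) !== null) out.push(m[0].slice(m[1].length));
  return out;
}

// The ";" inside the comment does not end the directive; the match runs
// up to the "{" on the next line (including the line break).
console.log(directives("location # A comment;\n{\n}"));
// A quoted ";" stays inside the directive.
console.log(directives('name "para;meter1";'));
// No ";" or "{" at all: the pattern fails and returns no matches.
console.log(directives("name" + " x".repeat(1000)).length);
```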
- `nginx` shouldn’t inherit anything from `clike`. `nginx` doesn’t have classes, booleans, operators, etc.;
- `keyword`, because some parameters are keywords too.

Comments
Directives
Problem statement
The directives list is obsolete and inconsistent. AFAIK, the current list is based largely on a third‑party gist.
Currently Prism supports 339 directives. nginx contains more than 693 directives. Besides that, some directives shown in `components/prism-nginx.js` are undocumented, removed years ago or not directives at all. See the table below.
CONTENT_
,DOCUMENT_
, etc.auth
pop3_auth
in 2007 ¹devpoll_changes
devpoll_events
epoll_events
fastcgi_redirect_errors
fastcgi_intercept_errors
in 2006 ¹if_not_empty
fastcgi_param
andscgi_param
kqueue_changes
kqueue_events
log_format_combined
log_format combined
.combined
is a parametermore_set_headers
ngx_headers_more
. It is not distributed with nginx ²optimize_server_names
server_name_in_redirect
in 2008 ¹, removed in 2015 ³post_action
proxy
proxy_redirect_errors
proxy_intercept_errors
in 2006 ¹proxy_upstream_fail_timeout
proxy_upstream_max_fails
rtsig_overflow_events
rtsig_overflow_test
rtsig_overflow_threshold
rtsig_signo
satisfy_any
satisfy
in 2008 ¹, removed in 2015 ³so_keepalive
listen
worker_rlimit_sigpending
xslt_entities
xml_entities
?

¹ https://nginx.org/en/CHANGES.
² https://github.com/openresty/headers-more-nginx-module.
³ https://hg.nginx.org/nginx/rev/2911b7e5491b.
⁴ https://hg.nginx.org/nginx/rev/967594ba7571.
There are many more removed directives. Here they are:
- `aio` directives and `rtsig_signo` (see http://hg.nginx.org/nginx/rev/adba26ff70b5) were removed in 2015;
- `limit_zone` was removed in 2014;
- `secure_link_expires` was removed in 2010;
- `open_file_cache_retest` was removed in 2007;
- `fastcgi_upstream_fail_timeout`, `fastcgi_upstream_max_fails`, `fastcgi_x_powered_by`, `memcached_upstream_fail_timeout`, `memcached_upstream_max_fails`, `proxy_pass_server`, `proxy_pass_x_powered_by`, `restrict_host_names` were removed in 2006;
- `fastcgi_root`, `fastcgi_set_var`, `fastcgi_params`, `default_charset`, `post_accept_timeout`, `proxy_add_x_forwarded_for`, `proxy_pass_unparsed_uri`, `proxy_preserve_host`, `proxy_set_x_real_ip`, `proxy_set_x_url`, `proxy_x_var`, `redirect`, `server_names_hash_threshold`, `server_names_hash` were removed in 2005.

There are many more undocumented directives. These are `degradation`, `degrade`, `eventport_events`, `gzip_hash`, `gzip_no_buffer`, `gzip_window`, `http2_pool_size`, `http2_streams_index_size`, `postpone_gzipping`, `ssi_ignore_recycled_buffers`, `uwsgi_string` (see https://forum.nginx.org/read.php?29,277920,277920).

A proposed solution
On the one hand, there is a relatively fast way to update the list:
- `components/prism-nginx.js` as `old.txt`;
- replace `|` in `old.txt` with `\n`;
- sort `old.txt` alphabetically;
- `new.txt`;
- `new.txt`: `\s+\(\w+\)`;
- replace `(^[\w]+)(?:\n\1)+$` in `new.txt` with `\1` to remove duplicates;
- `diff old.txt new.txt | grep '^<' | sed 's/^< *//'`;
- `diff old.txt new.txt | grep '^>' | sed 's/^> *//'`.

On the other hand, the resulting list is gonna be really long (I get 631 entries without third‑party modules).
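The diff steps above can be approximated in a few lines without temporary files. This is only a sketch: it assumes the old list is the `|`-separated alternation from `components/prism-nginx.js` and that the new list has one directive per line, optionally followed by a module name in parentheses (which is what the `\s+\(\w+\)` step appears to remove). `diffDirectives` is a hypothetical helper, and the sample input is made up.

```javascript
// Hypothetical helper. `oldAlternation` is the `|`-separated directive list
// (as in components/prism-nginx.js); `newList` has one directive per line,
// optionally followed by " (module_name)".
function diffDirectives(oldAlternation, newList) {
  const oldSet = new Set(
    oldAlternation.split("|").map((s) => s.trim()).filter(Boolean)
  );
  const newSet = new Set(
    newList
      .split("\n")
      .map((s) => s.replace(/\s+\(\w+\)$/, "").trim()) // drop " (module)"
      .filter(Boolean)
  );
  // Like `diff ... | grep '^<'` and `grep '^>'` on sorted, de-duplicated lists.
  const onlyOld = [...oldSet].filter((d) => !newSet.has(d)).sort();
  const onlyNew = [...newSet].filter((d) => !oldSet.has(d)).sort();
  return { onlyOld, onlyNew };
}

// Made-up sample input.
const result = diffDirectives(
  "listen|root|satisfy_any",
  "listen (ngx_http_core_module)\nroot (ngx_http_core_module)\nsatisfy (ngx_http_core_module)\nroot (ngx_http_core_module)"
);
console.log(result); // onlyOld: ["satisfy_any"], onlyNew: ["satisfy"]
```

Using sets makes the sorting and de-duplication steps implicit; the two `filter` calls play the role of the two `diff | grep` commands.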
Maybe it’s better to replace the list with a single regular expression?
Numbers
Is it obligatory to highlight numbers? What about `domain2.com`, `128k`, `192.168.0.1:8000`?

Strings
- `\"`, `\'`, `\\`, `\n`, `\r`, `\t`. I’ve checked all the other symbols from `0x00` to `0x7E` inside both `"` and `'`. `\xFF`, `\uFFFF`, `\u{FFFF}` are not supported.
- `"$example"` and `"${example}"`). As I previously mentioned, the dollar sign (`$`) can’t be escaped with `\`, so nginx interprets `\$x` as a slash (`\`) and a subsequent `$x` variable.

Variables
- `$geoip_city_country_code3`, `$geoip_country_code3`, `$http2`, `$time_iso8601`. See https://nginx.org/en/docs/varindex.html.
- `set $name value;`) can only consist of digits, uppercase and lowercase English letters and underscores. Position is irrelevant (`$0`, `$_` are OK). `/\$\w+/` is the best choice for a regular variable.
- `$arg_`, `$cookie_`, `$http_`, `$jwt_claim_`, `$jwt_header_`, `$sent_http_`, `$sent_trailer_`, `$upstream_cookie_`, `$upstream_http_`, `$upstream_trailer_`) accept a trailing part. This part can contain any character except `/[\x00-\x1F\s"';\\{]/`. `/[;{]/` are only allowed inside strings (`"$arg_;"`, `"$arg_{"` are OK). Escaping with a slash (`\`) is useless.
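A sketch of these variable rules as two JavaScript regexes, one for use outside strings and one inside. The prefix list and the excluded characters come from this issue. The alternation order (prefixed variables first, then `/\$\w+/`) and the helper `variables` are assumptions for illustration, not the Prism grammar.

```javascript
// Prefixes from the issue whose variables accept an arbitrary trailing part.
const PREFIXES = [
  "arg_", "cookie_", "http_", "jwt_claim_", "jwt_header_", "sent_http_",
  "sent_trailer_", "upstream_cookie_", "upstream_http_", "upstream_trailer_",
].join("|");

// Outside strings the trailing part may not contain [\x00-\x1F\s"';\\{];
// inside strings ";" and "{" are allowed too. Other variables use /\$\w+/.
const OUTSIDE = new RegExp(String.raw`\$(?:(?:${PREFIXES})[^\x00-\x1F\s"';\\{]*|\w+)`, "g");
const INSIDE = new RegExp(String.raw`\$(?:(?:${PREFIXES})[^\x00-\x1F\s"'\\]*|\w+)`, "g");

// Hypothetical helper: the variables found in `text`.
function variables(text, insideString = false) {
  return Array.from(text.matchAll(insideString ? INSIDE : OUTSIDE), (m) => m[0]);
}

console.log(variables("$http2 $time_iso8601 $0 $_")); // regular variables
console.log(variables("set $x $arg_a-b;"));             // "-" belongs to the $arg_ tail
console.log(variables("$arg_;"));                        // outside a string: $arg_
console.log(variables("$arg_;", true));                  // inside a string: $arg_;
console.log(variables("\\$x", true));                    // "\" does not escape "$"
```

Trying the prefixed form first matters: with `/\$\w+/` alone, `$arg_a-b` would stop at `$arg_a`.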