awscurl doesn't work for AWS Elasticsearch when query contains CJK multi-byte Unicode characters #106

Open
ToshihikoMakita opened this issue Mar 15, 2021 · 6 comments

ToshihikoMakita commented Mar 15, 2021

It works fine if the domain allows open access

I'm confident that my query works when the Elasticsearch domain allows open access and I issue the query with the curl command.

PS C:\Users\toshi\OneDrive\Documents\ElasticSearch\command-2021> curl -XGET "https://search-tmtest-xxxxxxxxxxxxxxxxxxxxxxxxxx.yyyyyyyyy.es.amazonaws.com/search-ngram-and-kuromoji/_search?pretty"  -H "Content-Type: application/json" -d "@search-search-ngram-and-kuromoji-2.json"
{
  "took" : 163,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 144.35374,
    "hits" : [
      {
        "_index" : "search-ngram-and-kuromoji",       
        "_type" : "_doc",
        "_id" : "ucNKC3gBNKbjtYmSAtQp",
        "_score" : 144.35374,
        "_source" : {
          "section_url" : "001.html#topic_i5x_lkz_bgb"
        },
        "highlight" : {
          "section_text" : [
            "突や衝突に近い状態(<em>SRSエアバッグの作動および路上障害物との接触</em>など)が発生した時に車"
          ],
          "section" : [
            "イベントデータレコーダー"
          ]
        }
      }
    ]
  }
}

I have attached the search query, search-search-ngram-and-kuromoji-2.json, zipped:
search-search-ngram-and-kuromoji-2.zip

If I launch awscurl without specifying --data-binary, no search results are returned.

I changed the domain access policy to prohibit open access and to allow a single IAM role named ESFullAccess. ESFullAccess in turn trusts an IAM user called ESProgram.

The Windows PowerShell command-line script, awsescurl.ps1, zipped:

awsescurl.zip

PS C:\Users\toshi\OneDrive\Documents\ElasticSearch\command-2021> ./awsescurl.ps1 -X GET -d "@search-search-ngram-and-kuromoji-2.json" "https://search-tmtest-xxxxxxxxxxxxxxxxxxxxxxxxxx.yyyyyyyyy.es.amazonaws.com/search-ngram-and-kuromoji/_search?pretty"

{'access_key': 'ASIAUPPPMJZZ2PAQ6B74',
 'data': '@search-search-ngram-and-kuromoji-2.json',
 'data_binary': False,
 'header': ['Content-Type: application/json'],
 'include': False,
 'insecure': True,
 'profile': 'default',
 'region': 'yyyyyyyyy',
 'request': 'GET',
 'secret_key': 'IpYWCqt3OScsBxA0/dOmVrSFWN7NfEbX1VyQwye9',
 'security_token': 'FwoGZXIvYXdzEMr//////////wEaDKOZeknTfG/fc5Ng+iKwAXTv+jazLkF0NMNGPiSYtytG3WqA1U1cUCU4ElfcHNixm+LFTOphsYQh9iY7xFO9cBh+iRrvF6qB10IeG7Ta+PJtcLZnzOUfOGE8w6a94YqpWciIRQ5CEAL3UDeYNru0IGeulJxVSzHaTRs8crJ7d3DOqSRDVGSKXfNpCQjzOKXwr/nam3JAkPGyyd4u2B8iWhOmPl9lhxORchF5fBb84Npw8YlSGDFPoLSsBM+NjX8jKJLTuoIGMi0eGpiRN7Wo1OKjCQy2C1V5UNVr6u66Q/cY7r0RveYwNeZhaz4DI/ThXpMPD4A=',
 'service': 'es',
 'session_token': None,
 'uri': 'https://search-tmtest-xxxxxxxxxxxxxxxxxxxxxxxxxx.yyyyyyyyy.es.amazonaws.com/search-ngram-and-kuromoji/_search?pretty',
 'verbose': True}
'pretty='
('\n'
 'CANONICAL REQUEST = GET\n'
 '/search-ngram-and-kuromoji/_search\n'
 'pretty=\n'
 'host:search-tmtest-xxxxxxxxxxxxxxxxxxxxxxxxxx.yyyyyyyyy.es.amazonaws.com\n'
 'x-amz-date:20210315T002554Z\n'
 'x-amz-security-token:FwoGZXIvYXdzEMr//////////wEaDKOZeknTfG/fc5Ng+iKwAXTv+jazLkF0NMNGPiSYtytG3WqA1U1cUCU4ElfcHNixm+LFTOphsYQh9iY7xFO9cBh+iRrvF6qB10IeG7Ta+PJtcLZnzOUfOGE8w6a94YqpWciIRQ5CEAL3UDeYNru0IGeulJxVSzHaTRs8crJ7d3DOqSRDVGSKXfNpCQjzOKXwr/nam3JAkPGyyd4u2B8iWhOmPl9lhxORchF5fBb84Npw8YlSGDFPoLSsBM+NjX8jKJLTuoIGMi0eGpiRN7Wo1OKjCQy2C1V5UNVr6u66Q/cY7r0RveYwNeZhaz4DI/ThXpMPD4A=\n'
 '\n'
 'host;x-amz-date;x-amz-security-token\n'
 '41bb10889ba70ce26a2cda05a6d33b4d057c9caef53d6986f252093450167211')
('\n'
 'STRING_TO_SIGN = AWS4-HMAC-SHA256\n'
 '20210315T002554Z\n'
 '20210315/yyyyyyyyy/es/aws4_request\n'
 '09176940c4429ffeab8230044aa84447ff1f6d91a76b84e0b0069495e2538a75')
'\nHEADERS++++++++++++++++++++++++++++++++++++'
{'Authorization': 'AWS4-HMAC-SHA256 '
                  'Credential=ASIAUPPPMJZZ2PAQ6B74/20210315/yyyyyyyyy/es/aws4_request, '
                  'SignedHeaders=host;x-amz-date;x-amz-security-token, '
                  'Signature=66a1ec2e970b6d5d76628f4e5493d3c6d8b6edc3a6cabd1f54b868290ae81418',
 'Content-Type': 'application/json',
 'x-amz-content-sha256': '41bb10889ba70ce26a2cda05a6d33b4d057c9caef53d6986f252093450167211',
 'x-amz-date': '20210315T002554Z',
 'x-amz-security-token': 'FwoGZXIvYXdzEMr//////////wEaDKOZeknTfG/fc5Ng+iKwAXTv+jazLkF0NMNGPiSYtytG3WqA1U1cUCU4ElfcHNixm+LFTOphsYQh9iY7xFO9cBh+iRrvF6qB10IeG7Ta+PJtcLZnzOUfOGE8w6a94YqpWciIRQ5CEAL3UDeYNru0IGeulJxVSzHaTRs8crJ7d3DOqSRDVGSKXfNpCQjzOKXwr/nam3JAkPGyyd4u2B8iWhOmPl9lhxORchF5fBb84Npw8YlSGDFPoLSsBM+NjX8jKJLTuoIGMi0eGpiRN7Wo1OKjCQy2C1V5UNVr6u66Q/cY7r0RveYwNeZhaz4DI/ThXpMPD4A='}
'\nBEGIN REQUEST++++++++++++++++++++++++++++++++++++'
('Request URL = '
 'https://search-tmtest-xxxxxxxxxxxxxxxxxxxxxxxxxx.yyyyyyyyy.es.amazonaws.com/search-ngram-and-kuromoji/_search?pretty')
'\nRESPONSE++++++++++++++++++++++++++++++++++++'
'Response code: 200\n'
{'Date': 'Mon, 15 Mar 2021 00:25:55 GMT', 'Content-Type': 'application/json; charset=UTF-8', 'Transfer-Encoding': 'chunked', 'Connection': 'close', 'Access-Control-Allow-Origin': '*', 'Content-Encoding': 'gzip', 'Vary': 'Accept-Encoding, User-Agent'}

{
  "took" : 47,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

If I launch awscurl with --data-binary specified, the following error occurs.

The Windows PowerShell command-line script, awsescurl-db.ps1, zipped:

awsescurl-db.zip

PS C:\Users\toshi\OneDrive\Documents\ElasticSearch\command-2021> ./awsescurl-db.ps1 -X GET -d "@search-search-ngram-and-kuromoji-2.json" "https://search-tmtest-xxxxxxxxxxxxxxxxxxxxxxxxxx.yyyyyyyyy.es.amazonaws.com/search-ngram-and-kuromoji/_search?pretty"

{'access_key': 'ASIAUPPPMJZZUXODBZUO',
 'data': '@search-search-ngram-and-kuromoji-2.json',
 'data_binary': True,
 'header': ['Content-Type: application/json'],
 'include': False,
 'insecure': True,
 'profile': 'default',
 'region': 'yyyyyyyyy',
 'request': 'GET',
 'secret_key': 'kfGOQpb/p4Ckm4vYMocotYu56K15BI+NIQq3VZei',
 'security_token': 'FwoGZXIvYXdzEMr//////////wEaDJnWC1Gw7ePjDa6+fCKwASSe7Ro/gxR1BN7sts/Kn+RfFBJmYmjRZS+mvLClqjvg2OU/MdnZxJQeCsvxIP9Vzk5Ogdmq0tvp3uib3qVDQoEFHAfIf/AIPB/dRyp7KGLQpEXjjw5/v4/KRx3KDipAbH4aMFGjW0KrGymKsGiH1VMsuqfagWAl34tgclAtxoTQO+XGuxJTBafv1MEM9yAnMgN8S2zHtEEwNLOdJyb5CAQgTAAAJEgjSpjUgTpDsa4mKNLZuoIGMi0E3Ok7VjMeYV8WcpmwSwtMetlgSMIwvRRPArr8AYQVoEPINIzz6ufDE6IIquo=',
 'service': 'es',
 'session_token': None,
 'uri': 'https://xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.yyyyyyyyy.es.amazonaws.com/search-ngram-and-kuromoji/_search?pretty',
 'verbose': True}
'pretty='
Traceback (most recent call last):
  File "c:\users\toshi\appdata\local\programs\python\python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\toshi\appdata\local\programs\python\python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\toshi\AppData\Local\Programs\Python\Python39\Scripts\awscurl.exe\__main__.py", line 7, in <module>
  File "c:\users\toshi\appdata\local\programs\python\python39\lib\site-packages\awscurl\awscurl.py", line 500, in main
    inner_main(sys.argv[1:])
  File "c:\users\toshi\appdata\local\programs\python\python39\lib\site-packages\awscurl\awscurl.py", line 478, in inner_main
    response = make_request(args.request,
  File "c:\users\toshi\appdata\local\programs\python\python39\lib\site-packages\awscurl\awscurl.py", line 100, in make_request
    canonical_request, payload_hash, signed_headers = task_1_create_a_canonical_request(
  File "c:\users\toshi\appdata\local\programs\python\python39\lib\site-packages\awscurl\awscurl.py", line 200, in task_1_create_a_canonical_request
    payload_hash = sha256_hash_for_binary_data(data) if data_binary else sha256_hash(data)
  File "c:\users\toshi\appdata\local\programs\python\python39\lib\site-packages\awscurl\utils.py", line 20, in sha256_hash_for_binary_data
    return hashlib.sha256(val).hexdigest()
TypeError: Unicode-objects must be encoded before hashing

Fixing TypeError

I barely know Python, but based on the error message "TypeError: Unicode-objects must be encoded before hashing", I modified the following code from:

https://github.com/okigan/awscurl/blob/master/awscurl/awscurl.py line 200

payload_hash = sha256_hash_for_binary_data(data) if data_binary else sha256_hash(data)

to

payload_hash = sha256_hash(data)

because the payload should be hashed as UTF-8 whether or not --data-binary is specified.

However, the modified awscurl still reports the following error:
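
The TypeError can be reproduced outside awscurl. The following is a minimal sketch, not the awscurl implementation: `sha256_hex` is a hypothetical helper, and UTF-8 is assumed as the encoding for text payloads. It shows that `hashlib.sha256` rejects a `str` and that encoding first gives the same digest as hashing the bytes directly.

```python
import hashlib


def sha256_hex(payload):
    """Hypothetical helper: hash a payload, encoding text as UTF-8 first (assumed encoding)."""
    if isinstance(payload, str):
        payload = payload.encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


query = '{"query": "エアバッグ"}'

# hashlib only accepts bytes-like objects, so a str raises TypeError.
try:
    hashlib.sha256(query)
except TypeError as exc:
    print("str input rejected:", exc)

# Encoding first works, and str and bytes inputs hash to the same value.
print(sha256_hex(query) == sha256_hex(query.encode("utf-8")))
```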

PS C:\Users\toshi\OneDrive\Documents\ElasticSearch\command-2021> ./awsescurl-db.ps1 -X GET -d "@search-search-ngram-and-kuromoji-2.json" "https://search-tmtest-xxxxxxxxxxxxxxxxxxxxxxxxxx.yyyyyyyyy.es.amazonaws.com/search-ngram-and-kuromoji/_search?pretty"

{'access_key': 'ASIAUPPPMJZZ2XEUBLF7',
 'data': '@search-search-ngram-and-kuromoji-2.json',
 'data_binary': True,
 'header': ['Content-Type: application/json'],
 'include': False,
 'insecure': False,
 'profile': 'default',
 'region': 'yyyyyyyyy',
 'request': 'GET',
 'secret_key': 'd9Fd2UQbWWuQSo1YzJeGU74pmi+ERNvI4gN7RDre',
 'security_token': 'FwoGZXIvYXdzEMr//////////wEaDIGnfgGrBvi2QJemgiKwAWOZo7D/VAIvgtVf5gL+z+yF610K45iNCG7q6HHf+vIpxFJfiji+uIEJZXQMWOHTkVONfMvm5dBz8g3Ss8aTVQxjEkXTP3tw1MUPiq15qLiYW6ZeRvv9+kw6gkM2r2TIZm1k3oGOknzz8GTwQQoHySjj+zaDqNdHxN1l/rXMcyCdsaghuH12FvNsAmZV0TelhGJ3ceo9X6omS8BqRHCO5YYhs4DV2ApRI80yCsCOAofxKPTguoIGMi1nDJqy/cmXak2pL0VHU2puG5pbnjSV5kgaVD3oerY2bcbCzDu+t8ml+TplHXI=',
 'service': 'es',
 'session_token': None,
 'uri': 'https://search-tmtest-xxxxxxxxxxxxxxxxxxxxxxxxxx.yyyyyyyyy.es.amazonaws.com/search-ngram-and-kuromoji/_search?pretty',
 'verbose': True}
'pretty='
('\n'
 'CANONICAL REQUEST = GET\n'
 '/search-ngram-and-kuromoji/_search\n'
 'pretty=\n'
 'host:search-tmtest-xxxxxxxxxxxxxxxxxxxxxxxxxx.yyyyyyyyy.es.amazonaws.com\n'
 'x-amz-date:20210315T005516Z\n'
 'x-amz-security-token:FwoGZXIvYXdzEMr//////////wEaDIGnfgGrBvi2QJemgiKwAWOZo7D/VAIvgtVf5gL+z+yF610K45iNCG7q6HHf+vIpxFJfiji+uIEJZXQMWOHTkVONfMvm5dBz8g3Ss8aTVQxjEkXTP3tw1MUPiq15qLiYW6ZeRvv9+kw6gkM2r2TIZm1k3oGOknzz8GTwQQoHySjj+zaDqNdHxN1l/rXMcyCdsaghuH12FvNsAmZV0TelhGJ3ceo9X6omS8BqRHCO5YYhs4DV2ApRI80yCsCOAofxKPTguoIGMi1nDJqy/cmXak2pL0VHU2puG5pbnjSV5kgaVD3oerY2bcbCzDu+t8ml+TplHXI=\n'
 '\n'
 'host;x-amz-date;x-amz-security-token\n'
 '41bb10889ba70ce26a2cda05a6d33b4d057c9caef53d6986f252093450167211')
('\n'
 'STRING_TO_SIGN = AWS4-HMAC-SHA256\n'
 '20210315T005516Z\n'
 '20210315/yyyyyyyyy/es/aws4_request\n'
 '21e16f8efede645709fa4ad0d9b0ca272cf17e4679ab9e75048a315d01ba1d45')
'\nHEADERS++++++++++++++++++++++++++++++++++++'
{'Authorization': 'AWS4-HMAC-SHA256 '
                  'Credential=ASIAUPPPMJZZ2XEUBLF7/20210315/yyyyyyyyy/es/aws4_request, '
                  'SignedHeaders=host;x-amz-date;x-amz-security-token, '
                  'Signature=30d36e27a4962fd248cd58052403e1da72d077c214acd5d67ab984d425927c63',
 'Content-Type': 'application/json',
 'x-amz-content-sha256': '41bb10889ba70ce26a2cda05a6d33b4d057c9caef53d6986f252093450167211',
 'x-amz-date': '20210315T005516Z',
 'x-amz-security-token': 'FwoGZXIvYXdzEMr//////////wEaDIGnfgGrBvi2QJemgiKwAWOZo7D/VAIvgtVf5gL+z+yF610K45iNCG7q6HHf+vIpxFJfiji+uIEJZXQMWOHTkVONfMvm5dBz8g3Ss8aTVQxjEkXTP3tw1MUPiq15qLiYW6ZeRvv9+kw6gkM2r2TIZm1k3oGOknzz8GTwQQoHySjj+zaDqNdHxN1l/rXMcyCdsaghuH12FvNsAmZV0TelhGJ3ceo9X6omS8BqRHCO5YYhs4DV2ApRI80yCsCOAofxKPTguoIGMi1nDJqy/cmXak2pL0VHU2puG5pbnjSV5kgaVD3oerY2bcbCzDu+t8ml+TplHXI='}
'\nBEGIN REQUEST++++++++++++++++++++++++++++++++++++'
('Request URL = '
 'https://search-tmtest-xxxxxxxxxxxxxxxxxxxxxxxxxx.yyyyyyyyy.es.amazonaws.com/search-ngram-and-kuromoji/_search?pretty')
Traceback (most recent call last):
  File "C:\Users\toshi\AppData\Local\Programs\Python\Python39\Scripts\awscurl-script.py", line 33, in <module>
    sys.exit(load_entry_point('awscurl==0.21', 'console_scripts', 'awscurl')())
  File "c:\users\toshi\appdata\local\programs\python\python39\lib\site-packages\awscurl\awscurl.py", line 499, in main
    inner_main(sys.argv[1:])
  File "c:\users\toshi\appdata\local\programs\python\python39\lib\site-packages\awscurl\awscurl.py", line 477, in inner_main
    response = make_request(args.request,
  File "c:\users\toshi\appdata\local\programs\python\python39\lib\site-packages\awscurl\awscurl.py", line 135, in make_request
    return __send_request(uri, data, headers, method, verify)
  File "c:\users\toshi\appdata\local\programs\python\python39\lib\site-packages\awscurl\awscurl.py", line 330, in __send_request
    response = requests.request(method, uri, headers=headers, data=data, verify=verify)
  File "c:\users\toshi\appdata\local\programs\python\python39\lib\site-packages\requests\api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "c:\users\toshi\appdata\local\programs\python\python39\lib\site-packages\requests\sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "c:\users\toshi\appdata\local\programs\python\python39\lib\site-packages\requests\sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "c:\users\toshi\appdata\local\programs\python\python39\lib\site-packages\requests\adapters.py", line 439, in send
    resp = conn.urlopen(
  File "c:\users\toshi\appdata\local\programs\python\python39\lib\site-packages\urllib3\connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "c:\users\toshi\appdata\local\programs\python\python39\lib\site-packages\urllib3\connectionpool.py", line 394, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "c:\users\toshi\appdata\local\programs\python\python39\lib\site-packages\urllib3\connection.py", line 234, in request
    super(HTTPConnection, self).request(method, url, body=body, headers=headers)
  File "c:\users\toshi\appdata\local\programs\python\python39\lib\http\client.py", line 1255, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "c:\users\toshi\appdata\local\programs\python\python39\lib\http\client.py", line 1300, in _send_request
    body = _encode(body, 'body')
  File "c:\users\toshi\appdata\local\programs\python\python39\lib\http\client.py", line 164, in _encode
    raise UnicodeEncodeError(
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 322-357: Body ('繧ィ繧「繝舌ャ繧ー縺ョ菴懷虚縺翫h縺ウ) is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.

Last challenge: fixing "UnicodeEncodeError"

According to the error message, I modified the following code:

https://github.com/okigan/awscurl/blob/master/awscurl/awscurl.py line 135

    if data_binary:
        return __send_request(uri, data, headers, method, verify)
    else:
        return __send_request(uri, data.encode('utf-8'), headers, method, verify)

to:

    if data_binary:
        return __send_request(uri, data.encode('utf-8'), headers, method, verify)
    else:
        return __send_request(uri, data.encode('utf-8'), headers, method, verify)

The error message has vanished, but the query returns nothing.

PS C:\Users\toshi\OneDrive\Documents\ElasticSearch\command-2021> ./awsescurl-db.ps1 -X GET -d "@search-search-ngram-and-kuromoji-2.json" "https://search-tmtest-xxxxxxxxxxxxxxxxxxxxxxxxxx.yyyyyyyyy.es.amazonaws.com/search-ngram-and-kuromoji/_search?pretty"

{'access_key': 'ASIAUPPPMJZZ5EN6WTOX',
 'data': '@search-search-ngram-and-kuromoji-2.json',
 'data_binary': True,
 'header': ['Content-Type: application/json'],
 'include': False,
 'insecure': False,
 'profile': 'default',
 'region': 'yyyyyyyyy',
 'request': 'GET',
 'secret_key': 'F3kemHCI+/UNHAE19ms59sBi9XHcMaMN4vpb14as',
 'security_token': 'FwoGZXIvYXdzEMv//////////wEaDPJUgKt3g2RxHCHWGiKwAaY60g/Lm1Hp48nED39tUP/34Ia2tqUT/Ljgqe2Rg2SBOGhAlTvQKpkypyNyAS8+vLFEmRGfw11UM6UOZvFmx3NeWi8g6zpV7QpeSCPFRKbwZLnSxTEn2r7n9p3QRXNNkNWlJSPCLDzTZecORr46FGYmdnX6ZkKL97p6dWpTiQ53O62FarMbUID90zReTEDKMEbM5n2oaS9hcfZCz8M7Zr7+zXC8C+5Pw01fi0TKGnYUKL7muoIGMi3hmp5Jw1uw3fZK4azfs7WUg+/EV2N5qN81sQyrF/sxH+X59xJ2cBjfeAYJvjQ=',
 'service': 'es',
 'session_token': None,
 'uri': 'https://search-tmtest-xxxxxxxxxxxxxxxxxxxxxxxxxx.yyyyyyyyy.es.amazonaws.com/search-ngram-and-kuromoji/_search?pretty',
 'verbose': True}
'pretty='
('\n'
 'CANONICAL REQUEST = GET\n'
 '/search-ngram-and-kuromoji/_search\n'
 'pretty=\n'
 'host:search-tmtest-xxxxxxxxxxxxxxxxxxxxxxxxxx.yyyyyyyyy.es.amazonaws.com\n'
 'x-amz-date:20210315T010710Z\n'
 'x-amz-security-token:FwoGZXIvYXdzEMv//////////wEaDPJUgKt3g2RxHCHWGiKwAaY60g/Lm1Hp48nED39tUP/34Ia2tqUT/Ljgqe2Rg2SBOGhAlTvQKpkypyNyAS8+vLFEmRGfw11UM6UOZvFmx3NeWi8g6zpV7QpeSCPFRKbwZLnSxTEn2r7n9p3QRXNNkNWlJSPCLDzTZecORr46FGYmdnX6ZkKL97p6dWpTiQ53O62FarMbUID90zReTEDKMEbM5n2oaS9hcfZCz8M7Zr7+zXC8C+5Pw01fi0TKGnYUKL7muoIGMi3hmp5Jw1uw3fZK4azfs7WUg+/EV2N5qN81sQyrF/sxH+X59xJ2cBjfeAYJvjQ=\n'
 '\n'
 'host;x-amz-date;x-amz-security-token\n'
 '41bb10889ba70ce26a2cda05a6d33b4d057c9caef53d6986f252093450167211')
('\n'
 'STRING_TO_SIGN = AWS4-HMAC-SHA256\n'
 '20210315T010710Z\n'
 '20210315/yyyyyyyyy/es/aws4_request\n'
 '0a5cea16f924e59ed1d26254b5239ff26a8dee5ee5583c6b51d6922f0eefb46e')
'\nHEADERS++++++++++++++++++++++++++++++++++++'
{'Authorization': 'AWS4-HMAC-SHA256 '
                  'Credential=ASIAUPPPMJZZ5EN6WTOX/20210315/yyyyyyyyy/es/aws4_request, '
                  'SignedHeaders=host;x-amz-date;x-amz-security-token, '
                  'Signature=d4243cc6a1264c77bc90743bca2e03cebe5a408b25b1bf238d8a638bea7bda9b',
 'Content-Type': 'application/json',
 'x-amz-content-sha256': '41bb10889ba70ce26a2cda05a6d33b4d057c9caef53d6986f252093450167211',
 'x-amz-date': '20210315T010710Z',
 'x-amz-security-token': 'FwoGZXIvYXdzEMv//////////wEaDPJUgKt3g2RxHCHWGiKwAaY60g/Lm1Hp48nED39tUP/34Ia2tqUT/Ljgqe2Rg2SBOGhAlTvQKpkypyNyAS8+vLFEmRGfw11UM6UOZvFmx3NeWi8g6zpV7QpeSCPFRKbwZLnSxTEn2r7n9p3QRXNNkNWlJSPCLDzTZecORr46FGYmdnX6ZkKL97p6dWpTiQ53O62FarMbUID90zReTEDKMEbM5n2oaS9hcfZCz8M7Zr7+zXC8C+5Pw01fi0TKGnYUKL7muoIGMi3hmp5Jw1uw3fZK4azfs7WUg+/EV2N5qN81sQyrF/sxH+X59xJ2cBjfeAYJvjQ='}
'\nBEGIN REQUEST++++++++++++++++++++++++++++++++++++'
('Request URL = '
 'https://search-tmtest-xxxxxxxxxxxxxxxxxxxxxxxxxx.yyyyyyyyy.es.amazonaws.com/search-ngram-and-kuromoji/_search?pretty')
'\nRESPONSE++++++++++++++++++++++++++++++++++++'
'Response code: 200\n'
{'Date': 'Mon, 15 Mar 2021 01:07:12 GMT', 'Content-Type': 'application/json; charset=UTF-8', 'Transfer-Encoding': 'chunked', 'Connection': 'close', 'Access-Control-Allow-Origin': '*', 'Content-Encoding': 'gzip', 'Vary': 'Accept-Encoding, User-Agent'}

{
  "took" : 68,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}
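
One possible explanation, offered as an assumption rather than a confirmed diagnosis: the garbled text in the UnicodeEncodeError above looks like UTF-8 bytes that were decoded with a Japanese Windows code page. When a file is opened in text mode without an explicit encoding, Python by default uses the locale's preferred encoding, so re-encoding the resulting string as UTF-8 would send different bytes than the file contains. The sketch below assumes `cp932` as that locale encoding and uses a short sample word instead of the real query.

```python
original = "エアバッグ"
utf8_bytes = original.encode("utf-8")   # the bytes as stored in a UTF-8 file

# Assumption: the file was decoded with cp932 instead of UTF-8.
garbled = utf8_bytes.decode("cp932")

# Encoding the garbled text as UTF-8 does not restore the original bytes.
sent = garbled.encode("utf-8")

print(garbled)
print(sent == utf8_bytes)
```

If this is what happens, the request body is valid UTF-8 and the request succeeds, but the search terms no longer match anything in the index.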

Do you have any ideas to fix this problem?

Regards,

@okigan
Owner

okigan commented Mar 15, 2021

Wow, what an issue report -- 5⭐.

Unicode and Python seem to be a unique kind of Pandora's box.

As for the query coming back empty, can you audit which query is received by ES, and whether it matches the one received from curl (when using open access)?

@ToshihikoMakita
Author

OK. I will investigate.

@ToshihikoMakita
Author

ToshihikoMakita commented Mar 18, 2021

Hi, when the Elasticsearch domain allows open access, I was able to capture the JSON data by using curl on Ubuntu together with Wireshark. The key point is to set the SSLKEYLOGFILE environment variable before launching curl.

https://everything.curl.dev/usingcurl/tls/sslkeylogfile

See attached query-json.txt.

query-json.txt

Does awscurl support SSLKEYLOGFILE environment variable? If it is supported, I can send you the JSON dump file.

@okigan
Owner

okigan commented Mar 18, 2021 via email

@ToshihikoMakita
Author

ToshihikoMakita commented Mar 20, 2021

Unfortunately, the "-v" option does not display the contents of the JSON specified with the "-d" option.
AWS technical support suggested using CloudWatch to debug the query that is sent to the Elasticsearch Service.
Here I attach the test results for several patterns.

Results using curl when the ES domain is open

Command-line:
search-ngram-and-kuromoji-open-access-curl-cmd.txt
CloudWatch log:
search-ngram-and-kuromoji-open-access-curl-cloud-watch.txt

Results using awscurl when ES domain access requires an IAM role (without specifying --data-binary)

Command-line:
search-ngram-and-kuromoji-iam-role-access-awscurl-no-data-binary-cmd.txt
CloudWatch log:
search-ngram-and-kuromoji-iam-role-access-awscurl-no-data-binary-cloud-watch.txt

Results using awscurl when ES domain access requires an IAM role (specifying --data-binary, with the changes from "Last challenge: fixing "UnicodeEncodeError"" above)

Command-line:
search-ngram-and-kuromoji-iam-role-access-awscurl-data-binary-cmd-fail.txt
CloudWatch log:
search-ngram-and-kuromoji-iam-role-access-awscurl-data-binary-cloud-watch-fail.txt

From these results, I found that the JSON file specified with the -d parameter should be treated as UTF-8 encoded when --data-binary is specified. I have made several changes to awscurl.py and now it works fine.

Command-line:
search-ngram-and-kuromoji-iam-role-access-awscurl-data-binary-cmd-success.txt

CloudWatch log:
search-ngram-and-kuromoji-iam-role-access-awscurl-data-binary-cloud-watch-success.txt

I will submit a pull request with this fix. However, this pull request will not be compatible with #90, because that one seems to use true binary data that should be uploaded to S3.

Please take a look at my pull request and consider how to handle both UTF-8 encoded JSON and real binary data with the awscurl -d parameter.
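
One way both cases might coexist is sketched below. It is not taken from the pull request: `prepare_body` is a hypothetical helper, and UTF-8 is an assumed encoding for text. The idea is to convert the payload to bytes exactly once and derive both the signature hash and the request body from those same bytes, so text is encoded as UTF-8 while real binary data passes through untouched.

```python
import hashlib


def prepare_body(payload):
    """Hypothetical helper: return (body_bytes, sha256_hex) for str or bytes input."""
    # Text is encoded as UTF-8 (assumption); bytes are used unchanged.
    body = payload.encode("utf-8") if isinstance(payload, str) else bytes(payload)
    return body, hashlib.sha256(body).hexdigest()


# UTF-8 encoded JSON text.
json_body, json_hash = prepare_body('{"query": "エアバッグ"}')

# Arbitrary binary data that is not valid UTF-8.
blob_body, blob_hash = prepare_body(b"\xff\xfe\x00")

print(len(json_body), json_hash)
print(len(blob_body), blob_hash)
```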

Regards,

@rdegraaf

I had a related issue (trying to send a request that contains 0xff, which is an invalid byte in any UTF-8 sequence). I think that I fixed it by making this change to awscurl.py (line 490 in the current head):

Original:

with open(filename, "r") as post_data_file:

New:

with open(filename, "rb" if args.data_binary else "r") as post_data_file:
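
As a self-contained illustration of this idea, here is a sketch with a hypothetical `read_payload` helper (not part of awscurl). The explicit `encoding="utf-8"` in the text branch is my own assumption, added so that the result does not depend on the platform's default encoding.

```python
import os
import tempfile


def read_payload(filename, data_binary):
    """Hypothetical helper: return bytes for --data-binary, otherwise UTF-8 text."""
    if data_binary:
        with open(filename, "rb") as post_data_file:
            return post_data_file.read()
    # Assumption: text payloads are UTF-8, regardless of the locale encoding.
    with open(filename, "r", encoding="utf-8") as post_data_file:
        return post_data_file.read()


with tempfile.TemporaryDirectory() as tmp:
    binary_path = os.path.join(tmp, "payload.bin")
    with open(binary_path, "wb") as f:
        f.write(b"\xff\x00abc")          # 0xff is never valid in UTF-8
    text_path = os.path.join(tmp, "query.json")
    with open(text_path, "wb") as f:
        f.write('{"query": "エアバッグ"}'.encode("utf-8"))

    raw = read_payload(binary_path, data_binary=True)
    text = read_payload(text_path, data_binary=False)

print(raw)
print(text)
```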
