Manual Testing of testnet.polykey.io #487

Closed
14 of 16 tasks
CMCDragonkai opened this issue Oct 24, 2022 · 17 comments
Labels
procedure Action that must be executed r&d:polykey:core activity 4 End to End Networking behind Consumer NAT Devices

@CMCDragonkai
Member

CMCDragonkai commented Oct 24, 2022

Signalling Triad:

[image: signalling triad]

Tasks

Setup

  1. Set up testnet.polykey.io as the default testnet seed node configuration in the source code. Put it in ./src/config.ts.
  2. CloudWatch no longer has information from NLB. Therefore we do not have connection visualisation at this point.

Client Service

  1. Start Polykey locally and run an agent status command. Pass it a --client-port of 1315 and a --client-host set to the static IP address.
  2. Success is if we see an agent status report.

Office Node to Seed Node

The office is a carrier-grade NATted network. Therefore it has to contact the seed node and maintain a connection.

  1. Start Polykey locally with npm run polykey -- agent start. Observe a successful command execution.
  2. Observe the agent logs and check if a connection has been made.
  3. Observe the cloudwatch logs of the remote agent and check if a connection has been made.
  4. Success is if the DNS resolution resolves to the IP address, and the NodeGraph has been set up.
  5. Adjust automated test for tests/integration in Integration tests for testnet.polykey.io #441 to suit.

Home Node to Seed Node

Home is a regular NATted network. Therefore it has to contact the seed node and maintain a connection.

  1. Start Polykey locally. Observe a successful command execution.
  2. Observe agent logs.
  3. Observe cloudwatch logs.
  4. Success is if the DNS resolution resolves to the IP address, and the NodeGraph has been set up.

Note that the automated tests which test connection startup to seed nodes are only one-way. There's no way to simulate 2 nodes in the tests atm. So we will only do the required tests as above for tests/testnet/testnetConnection.test.ts.

Office Node to Home Node

  1. Start Polykey in office system, and Polykey on home system
  2. Observe agent logs for signaling operations
  3. Observe that the hole punching works

CGNAT to CGNAT - Office (CGNAT) to Home (CGNAT)

This would resolve #383. This would be necessary for mobile networks.

This is low priority atm, so we will put this to #383.

Multiple Office/Home Nodes to Seed Node

We should be able to start 2 nodes on different ports on the same machine to connect to the same testnet. This should be integrated into #441.

  1. Repeat Office Node to Seed Node for 2 nodes on different ports.
  2. Integrate the change into Integration tests for testnet.polykey.io #441.

It turns out this is actually more complex. It's not possible to do this without the router performing hairpinning. And most routers don't have hairpinning. See: tailscale/tailscale#188 and #487 (comment)

@tegefaulkes
Contributor

We can get the status of the seed node. I think the status should show the current version of Polykey. We can take that from the package.json right?

[nix-shell:~/matrixcode/polykey/js-polykey]$ npm run polykey -- agent status --node-id v1mnaq2ppfrbfk5le1i7j68p5sodh3904v12lp4u04pflq1gumks0 --client-host 3.106.178.29  --client-port 1315 -v

> polykey@1.0.1-alpha.0 polykey
> ts-node src/bin/polykey.ts "agent" "status" "--node-id" "v1mnaq2ppfrbfk5le1i7j68p5sodh3904v12lp4u04pflq1gumks0" "--client-host" "3.106.178.29" "--client-port" "1315" "-v"

INFO:PolykeyClient:Creating PolykeyClient
INFO:Session:Creating Session
INFO:Session:Setting session token path to /home/faulkes/.local/share/polykey/token
INFO:Session:Starting Session
INFO:Session:Started Session
INFO:Session:Created Session
INFO:GRPCClientClient:Creating GRPCClientClient connecting to 3.106.178.29:1315
INFO:GRPCClientClient:Created GRPCClientClient connecting to 3.106.178.29:1315
INFO:PolykeyClient:Starting PolykeyClient
INFO:PolykeyClient:Started PolykeyClient
INFO:PolykeyClient:Created PolykeyClient
✔ Please enter the password … ******
INFO:PolykeyClient:Stopping PolykeyClient
INFO:GRPCClientClient:Destroying GRPCClientClient connected to 3.106.178.29:1315
INFO:GRPCClientClient:Destroyed GRPCClientClient connected to 3.106.178.29:1315
INFO:Session:Stopping Session
INFO:Session:Stopped Session
INFO:PolykeyClient:Stopped PolykeyClient
status  LIVE
pid     1
nodeId  v1mnaq2ppfrbfk5le1i7j68p5sodh3904v12lp4u04pflq1gumks0
clientHost      0.0.0.0
clientPort      1315
proxyHost       0.0.0.0
proxyPort       1314
agentHost       127.0.0.1
agentPort       42793
forwardHost     127.0.0.1
forwardPort     39185
rootPublicKeyPem        -----BEGIN PUBLIC KEY-----
        MIICIjANBgkqhkiG9w0BAQEFAAOCAg8AMIICCgKCAgEAvGV2S76OPXIW3aap6j2o
        lH6BxsJchhUKcIA+kxttXTN/AaXue+8qDp+mpqagHvk5aiZyJ6eGEmDWyqTUle+f
        uESb8A3CkWy+neUDFm6k++Psyvy4Lsblhil3lm3PqgfFG0vjQ3DKUVfn9dq2Bl4c
        +63ArqJioD6Q78+hdxPCbKwP8yn0tFUfw3YzzaXEBBUTLbsW4B5VzNA5jtgqPF3N
        Dvcl0RqMFEij38IHWgYyoe1jxIWwuQJJ6q/Wxl42t3iLFMrsugz8n8AAFscsvkj8
        4Pik52Rk8Si+/6xh+Qq3GG58ov5HECNH/+ckZdhOdJYYcd5SzzS7L6mGu87ng6M+
        coiI/HbkiX6CveCiu6MMJbK3Co3IqY4Pcx2OGzm17JY3maidWYQg8TrSwPSKypN+
        V4/vCKVnRIxRccFFdEnykxmSwVcsRypv85+PZEX2ofj5Bw5EoumRsN9bdxCZrqAD
        vPJcjmsdCI39rhgmx7btY3w68T5JzGxig3qSpERL5DpVpUDvl5s5AJYWEgjiF9Bv
        bKZ5LwHerz6SY6QK/vQbwrZKYnYhR7ZXMkZKj98yQVmKiIlgAzyIcdCxO9oSDYdH
        /1kylZwYJnfS1osMGHXLKaXu2yKhto9qVjw/t/hlanBvR4yrENSzDrurL1a2d0uA
        Ra8EBSILNxX9xJY3M3M6/u8CAwEAAQ==
        -----END PUBLIC KEY-----
rootCertPem     -----BEGIN CERTIFICATE-----
        MIIIKDCCBhCgAwIBAgIFFmZlgpIwDQYJKoZIhvcNAQELBQAwQDE+MDwGA1UEAxM1
        djFtbmFxMnBwZnJiZms1bGUxaTdqNjhwNXNvZGgzOTA0djEybHA0dTA0cGZscTFn
        dW1rczAwHhcNMjIxMDI1MDAzODExWhcNMjMxMDI1MDAzODExWjBAMT4wPAYDVQQD
        EzV2MW1uYXEycHBmcmJmazVsZTFpN2o2OHA1c29kaDM5MDR2MTJscDR1MDRwZmxx
        MWd1bWtzMDCCAiIwDQYJKoZIhvcNAQEBBQADggIPADCCAgoCggIBALxldku+jj1y
        Ft2mqeo9qJR+gcbCXIYVCnCAPpMbbV0zfwGl7nvvKg6fpqamoB75OWomcienhhJg
        1sqk1JXvn7hEm/ANwpFsvp3lAxZupPvj7Mr8uC7G5YYpd5Ztz6oHxRtL40NwylFX
        5/XatgZeHPutwK6iYqA+kO/PoXcTwmysD/Mp9LRVH8N2M82lxAQVEy27FuAeVczQ
        OY7YKjxdzQ73JdEajBRIo9/CB1oGMqHtY8SFsLkCSeqv1sZeNrd4ixTK7LoM/J/A
        ABbHLL5I/OD4pOdkZPEovv+sYfkKtxhufKL+RxAjR//nJGXYTnSWGHHeUs80uy+p
        hrvO54OjPnKIiPx25Il+gr3gorujDCWytwqNyKmOD3Mdjhs5teyWN5monVmEIPE6
        0sD0isqTfleP7wilZ0SMUXHBRXRJ8pMZksFXLEcqb/Ofj2RF9qH4+QcORKLpkbDf
        W3cQma6gA7zyXI5rHQiN/a4YJse27WN8OvE+ScxsYoN6kqRES+Q6VaVA75ebOQCW
        FhII4hfQb2ymeS8B3q8+kmOkCv70G8K2SmJ2IUe2VzJGSo/fMkFZioiJYAM8iHHQ
        sTvaEg2HR/9ZMpWcGCZ30taLDBh1yyml7tsiobaPalY8P7f4ZWpwb0eMqxDUsw67
        qy9WtndLgEWvBAUiCzcV/cSWNzNzOv7vAgMBAAGjggMnMIIDIzAMBgNVHRMEBTAD
        AQH/MAsGA1UdDwQEAwIC9DA7BgNVHSUENDAyBggrBgEFBQcDAQYIKwYBBQUHAwIG
        CCsGAQUFBwMDBggrBgEFBQcDBAYIKwYBBQUHAwgwEQYJYIZIAYb4QgEBBAQDAgD3
        MFgGA1UdEQRRME+CNXYxbW5hcTJwcGZyYmZrNWxlMWk3ajY4cDVzb2RoMzkwNHYx
        MmxwNHUwNHBmbHExZ3Vta3MwhwR/AAABhxAAAAAAAAAAAAAAAAAAAAABMB0GA1Ud
        DgQWBBSYGAJFOw0Nh2uS6/OdwQUbk2yz/TAhBgsrBgEEAYO+TwICAQEB/wQPVg0x
        LjAuMS1hbHBoYS4wMIICGAYLKwYBBAGDvk8CAgIBAf8EggIERIICAI4zZGHhU4Ff
        vxrVdHRR3LQejwMMZ69ETzJaIaRknWCPBScgaZpt74FtUXGYygzXZsqc5VszuDmc
        tMi5W5E2jrG2h76eH9kp3WTC+BJC/kwZ3YMk1KdG+XasbUYkVdJBTNZ4cB60APE3
        x+R5wImYcZw2sVlcQOuJbIhVdEkkR0JN8IbBBvXXgno03hQyFlsz79Z2jNc0a1vS
        NdBOPjDrQ+unm7EYhG8C6s2jPqaPF1vPmfoH5mPulNGjEuRJ/403LjvQBaIM/l/G
        9olCc5ZJxqyWFyu4VZ3vkSmtJtNXdE4r24RnRt7a3b+uzq55hT9/8ai1PrZAofWK
        NwO1baqptSRbimLKceB/HJMfbQcpQn7cYkDNY+cKuGZfvaawBKTg2X4a2A/B0tRF
        bohlQmLHOfg8clXETDTb1fI2YoNzMXhlyrqAyLwMxwsBwEpnw6rrv1nDJdz96hAZ
        IKZgrVlQqf+0dBA+F93norRTL1uChhbmCQgbj6MiVT/hl7X9odvpCQMF/oI5WlGJ
        rvNEE1xNMuBV02OJ9M66VVCsUoS6zc9xQ2McZkVe/SDdQPO4e77uOSn5gdgnJUVL
        3iDFHujos/PDFT2WX0KdOkN3M9CoFpn9zN32Aomp+RC6kgmknZf3jq5Z6eU5Uzkp
        2oomASDDvfBQSBQrHDNUK8HQ/kmJKut2MA0GCSqGSIb3DQEBCwUAA4ICAQChe0vw
        S9wQpuMTC+KVgV0Bcb0Fd5mt6H4hvBeHR0d3v1322vEYs55XAq0UI1XfDc/8KjWH
        IOg1MczcgRCpLXVBxxPRT7F8Fj2Qz6XpWpcvuDqDdBHOAt+/ou4Xy8ZB85G1wD5R
        x5AcH6UGrjWaXtBnnf18ZYnd5snlxmnbWo/mOznb4pmtVHbAl9d1jab51iRNFbdF
        tcAlZ6w6zjxsgLPm9Q4oUXpXjz1uEhmjLkf0QroBZuakFqVOihm6mtA7paon54A3
        PmlKkaXT3lktgxniLZ3i9CJTrs84MkswQASiB7l7xh15fsLZnb7+kWnfZ0kyJJNV
        yUV9gF9VKhJfJIbjeRF7oEBdicwGC9NGN1a1TJihs0djSV50605I6jhPOK9GZGcz
        luABe1HwKuNRwlM4O+I7CRgYGbX6T+ee2xXUDuCtP7u+GeniYAtcepVCcNpd5mRx
        QUavJdRCgLZ8jQby1gjaRect7FS8uAujHciqJweX6Z4xUzYSv1OAmagNKftVXwed
        A/AYNwxyfldskoCmUrZXkOHlVKLZhCaYFEGvJBSCGkNvrFXnZMvawrzHN4Bf2fAM
        NQ84CIcHBL0sSQYDU7lxtN++AZ79sM3Sdt5mJSLM9hA+zpFA7dYm7R2vpSC6NSMv
        L0yTxwuI3Jv36EPMOnIHF0CTZIwyGJ57+z1c1w==
        -----END CERTIFICATE-----

@tegefaulkes
Contributor

Incidentally, the client commands don't support a hostname for --client-host?

[nix-shell:~/matrixcode/polykey/js-polykey]$ npm run polykey -- agent status --node-id v1mnaq2ppfrbfk5le1i7j68p5sodh3904v12lp4u04pflq1gumks0 --client-host testnet.polykey.io --client-port 1314

> polykey@1.0.1-alpha.0 polykey
> ts-node src/bin/polykey.ts "agent" "status" "--node-id" "v1mnaq2ppfrbfk5le1i7j68p5sodh3904v12lp4u04pflq1gumks0" "--client-host" "testnet.polykey.io" "--client-port" "1314"

error: option '-ch, --client-host <host>' argument 'testnet.polykey.io' is invalid. Host must be an IPv4 or IPv6 address

Usage: polykey agent status [options]

Get the Status of the Polykey Agent

Options:
  -np, --node-path <path>      Path to Node State (default: "/home/faulkes/.local/share/polykey", env:
                               PK_NODE_PATH)
  -pf, --password-file <path>  Path to Password
  -f, --format <format>        Output Format (choices: "human", "json", default: "human")
  -v, --verbose                Log Verbose Messages (default: 0)
  -ni, --node-id <id>           (env: PK_NODE_ID)
  -ch, --client-host <host>    Client Host Address (env: PK_CLIENT_HOST)
  -cp, --client-port <port>    Client Port (env: PK_CLIENT_PORT)
  -h, --help                   display help for command

@CMCDragonkai
Member Author

Can you create a task to support hostnames for client commands? Specifically, when we "connect to" a host we should support hostnames, but when we are "listening", hostnames should not be allowed.
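
A minimal sketch of that rule, not taken from the Polykey codebase. The names `HostRole`, `hostnamePattern` and `isValidHost` are hypothetical; `isIP` is the real function from Node's `net` module. The hostname pattern is deliberately loose and only checks the label syntax.

```typescript
import { isIP } from 'node:net';

// Hypothetical: whether the host is being dialled or bound to.
type HostRole = 'connect' | 'listen';

// Loose hostname check: dot-separated labels of letters, digits and hyphens.
const hostnamePattern =
  /^(?=.{1,253}$)([a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)(\.[a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;

function isValidHost(host: string, role: HostRole): boolean {
  // IPv4 and IPv6 literals are acceptable in both roles.
  if (isIP(host) !== 0) return true;
  // Hostnames are only acceptable when connecting out.
  return role === 'connect' && hostnamePattern.test(host);
}
```

With this shape, `testnet.polykey.io` would pass for a "connect to" option and be rejected for a "listening" option, while IP literals pass in both cases.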

@CMCDragonkai
Member Author

The status can show the current PK version. This is already available as part of the config.sourceVersion. Add as new task.

@tegefaulkes
Contributor

tegefaulkes commented Oct 25, 2022

Starting the agent with the seed node has a problem. It's failing to fully connect to the seed node. We can see the connection in the seed node's logs.

Also note, failing to connect to the seed node here is causing a crash. The seed node still works.

[nix-shell:~/matrixcode/polykey/js-polykey]$ npm run polykey -- agent start --node-path tmp/PK5 -v --seed-nodes v1mnaq2ppfrbfk5le1i7j68p5sodh3904v12lp4u04pflq1gumks0@testnet.polykey.io:1314

> polykey@1.0.1-alpha.0 polykey
> ts-node src/bin/polykey.ts "agent" "start" "--node-path" "tmp/PK5" "-v" "--seed-nodes" "v1mnaq2ppfrbfk5le1i7j68p5sodh3904v12lp4u04pflq1gumks0@testnet.polykey.io:1314"

✔ Please enter the password … ********
INFO:PolykeyAgent:Creating PolykeyAgent

...

INFO:Proxy:Starting Forward Proxy from 127.0.0.1:0 to 0.0.0.0:0 and Reverse Proxy from 0.0.0.0:0 to 127.0.0.1:44333
INFO:Proxy:Started Forward Proxy from 127.0.0.1:46073 to 0.0.0.0:54060 and Reverse Proxy from 0.0.0.0:54060 to 127.0.0.1:44333
INFO:NodeConnectionManager:Starting NodeConnectionManager
WARN:NodeManager:Duplicate refreshBucket task was found for bucket 255, cancelling
INFO:NodeConnectionManager:Started NodeConnectionManager
INFO:NodeManager:Syncing nodeGraph
INFO:ConnectionForward 3.106.178.29:1314:Starting Connection Forward
INFO:NodeConnection 3.106.178.29:1314:Creating NodeConnection
INFO:clientFactory:Creating GRPCClientAgent connecting to 3.106.178.29:1314
INFO:Proxy:Handling CONNECT to 3.106.178.29:1314
INFO:NodeConnection 3.106.178.29:1314:Destroying NodeConnection
INFO:NodeConnection 3.106.178.29:1314:Destroyed NodeConnection
ErrorNodeConnectionTimeout: Polykey error
  exitCode      69
  timestamp     Tue Oct 25 2022 12:45:10 GMT+1100 (Australian Eastern Daylight Time)
  cause: ErrorGRPCClientTimeout: Client connection timed out
    exitCode    69
    timestamp   Tue Oct 25 2022 12:45:10 GMT+1100 (Australian Eastern Daylight Time)
    cause: ErrorGRPCClientTimeout: Client connection timed out
      exitCode  69
      timestamp Tue Oct 25 2022 12:45:10 GMT+1100 (Australian Eastern Daylight Time)

The seed node's logs.

time log
2022-10-25T12:44:51.162+11:00 {"level":"INFO","key":"Proxy","msg":"Handling connection from 138.199.33.227:3391"}
  2022-10-25T12:44:51.194+11:00 {"level":"INFO","key":"ConnectionReverse 138.199.33.227:3391","msg":"Starting Connection Reverse"}
  2022-10-25T12:45:11.184+11:00 {"level":"WARN","key":"Proxy","msg":"Failed connection from 138.199.33.227:3391 - ErrorConnectionStartTimeout"}
  2022-10-25T12:45:11.184+11:00 {"level":"INFO","key":"Proxy","msg":"Handled connection from 138.199.33.227:3391"}
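
When comparing agent logs against seed node logs like the ones above, it can help to filter for the failed connections. This is a rough sketch that assumes each line is a timestamp followed by a JSON object with `level`, `key` and `msg` fields, as in the excerpt. `SeedLogEntry`, `parseLogLine` and `failedConnections` are hypothetical names.

```typescript
// Hypothetical shape of one seed node log line from the excerpt.
interface SeedLogEntry {
  time: string;
  level: string;
  key: string;
  msg: string;
}

// Split "<timestamp> <json>" into its parts; returns undefined for
// lines that do not fit (for example the "time log" header).
function parseLogLine(line: string): SeedLogEntry | undefined {
  const trimmed = line.trim();
  const split = trimmed.indexOf(' ');
  if (split === -1) return undefined;
  try {
    const body = JSON.parse(trimmed.slice(split + 1));
    return {
      time: trimmed.slice(0, split),
      level: String(body.level),
      key: String(body.key),
      msg: String(body.msg),
    };
  } catch {
    return undefined;
  }
}

// Keep only the WARN entries that report a failed inbound connection.
function failedConnections(lines: string[]): SeedLogEntry[] {
  return lines
    .map(parseLogLine)
    .filter(
      (e): e is SeedLogEntry =>
        e !== undefined &&
        e.level === 'WARN' &&
        e.msg.startsWith('Failed connection from'),
    );
}
```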

@CMCDragonkai
Member Author

Seed node connection appears to be working now. So we are proceeding to consolidate our manual tests into the automated tests in PR #441.

@tegefaulkes
Contributor

I'm getting the same problem as before where the seed node is failing to connect back to me. I've narrowed it down to the VM I'm using, since running on the laptop works.

@CMCDragonkai
Member Author

Is your VM using a NATted network or a bridged one?

@tegefaulkes
Contributor

The default switch is using a NAT. I also configured it to share the host's physical adapter and that had the same problem. It might just be a quirk with VMs or the Windows hypervisor.

I can test it with the VM I set up on my NAS. AFAIK it has its own IP address, so it's a separate machine as far as my home router cares.

@CMCDragonkai
Member Author

Don't bother right now. If it works on the laptop from your home, let's proceed to office to home and double node to testnet.

@CMCDragonkai
Member Author

From the office with 2 nodes, we are seeing the signalling operation complete. However upon attempting to hole punch each other (with the same IP address but different ports), while we see these packets get sent out, we are not receiving any of these packets.

This is true even after opening the firewall on the local systems. One would have to assume that the CGNAT in the office is blocking these packets.

At home however, it's not behind a CGNAT, so now we are switching to testing with 2 nodes from home. Also we used tailscale to connect to the home system, which shows up with a direct connection in tailscale status. This confirms that hole punching can work between office and home, which should imply that the home NAT is a normal NAT, and not a CGNAT.


Some additional clarification on CGNAT is required:

Basically hairpinning is technically a solution to the problem of internal communication within the same CGNAT. But the reason why CGNAT will not allow hole punching is due to the NAT translation.

The IP and port observed by the seed node is not the same IP and port that is within the CGNAT. If the CGNAT can rewrite the IP and port to be as if it came from the outside IP/port, then it would work, but this is not what CGNATs do, with or without hairpinning.

This means we expect this test not to work in our CGNAT situation, and we can only do this with relaying.

@tegefaulkes I think at this point you just have to merge in the fixes to the signalling mechanism.

And we will need to test on the home network.

@CMCDragonkai
Member Author

I noticed on tailscale, if we try to connect to one of the IPs, it recognises that the system is on the same local network, and instead of going through the larger network, it just uses the direct IP:

100.117.43.4    matrix-win-1         tagged-devices windows active; direct 192.168.1.100:41641, tx 140520 rx 776152

Where 192.168.1.100 is the local IP.

This means it's not even doing hole punching or anything, it's a direct connection.


This is pretty smart. I wonder if it's possible for the seed node to also acquire this information and send this back so that way it's possible for us to know that a local IP should be used instead.
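
One rough way to recognise an address like 192.168.1.100 as a candidate local IP is to check it against the RFC 1918 private IPv4 ranges. This is only a sketch: `isPrivateIPv4` is a hypothetical helper, and a private address is not proof that the peer is on the same LAN. Note that tailscale's 100.x addresses fall in the separate 100.64.0.0/10 shared address space, which this check does not treat as private.

```typescript
// Hypothetical helper: true for RFC 1918 private IPv4 addresses
// (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), false otherwise.
function isPrivateIPv4(host: string): boolean {
  const parts = host.split('.');
  if (parts.length !== 4) return false;
  // Each part must be 1 to 3 decimal digits and at most 255.
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  if (octets.some((o) => Number.isNaN(o) || o > 255)) return false;
  const [a, b] = octets;
  return (
    a === 10 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}
```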

@CMCDragonkai
Member Author

CMCDragonkai commented Oct 28, 2022

Ok home to office connection succeeded. That's NAT to CGNAT.

It's possible even CGNAT to CGNAT could work, but we don't know this until we have 2 CGNATs.

It turns out that it doesn't work if we are in the same network. This is because lots of routers lack hairpinning support including both the office router and the home router.

Without hairpinning, one cannot do hole punching between hosts on the same subnet. This is why our double node to seed node test didn't work. It didn't work in the office, and it didn't work at home either.
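
The same-network case can be sketched as a check on the addresses the seed node observes. This is an illustration under assumptions, not Polykey's logic: `ObservedAddress`, `PunchOutlook` and `punchOutlook` are hypothetical names, and the rule only captures "same external IP, different ports". Differing external IPs do not guarantee that hole punching succeeds.

```typescript
// Hypothetical: the external address of a node as seen by the seed node.
interface ObservedAddress {
  host: string;
  port: number;
}

type PunchOutlook = 'direct-punch-plausible' | 'needs-hairpinning-or-relay';

// Two nodes that share an external IP sit behind the same NAT, so their
// hole punching packets have to loop back through that NAT (hairpinning).
function punchOutlook(a: ObservedAddress, b: ObservedAddress): PunchOutlook {
  return a.host === b.host
    ? 'needs-hairpinning-or-relay'
    : 'direct-punch-plausible';
}
```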

This comment https://news.ycombinator.com/item?id=8229792 illustrates some reasons why. Furthermore when comparing to tailscale, they switch to using the local IPs instead of the remote IPs without actually doing any hole punching. Both entail requiring additional logic on the node graph/node connection manager that can take local address data and prioritise it over remote address data. Basically MatrixAI/js-mdns#1 is needed.

In the comment there appears to be a couple solutions:

  1. Use PCP or PMP to configure the router - this again can lack support
  2. Use local discovery "LAN locator beacons" - multicast... etc
  3. Gather all the subnet information and send it to the seed node. The seed node can then send back all possible IPs, and have the client nodes attempt all of them.
  4. Relay as a last resort.

At 3., we have this problem where network information can be leaked. If we are expecting decentralised NAT #365, then this basically provides internal network information to random third parties which is not a good situation.

For now, we will prefer solution 2. So if at any point we end up trying to contact a particular node, and it has a local address, it should avoid sending a signalling message at all... I'm not sure if it's possible to know whether a particular node address is local or not. We may tag these addresses as "local" if they were acquired through a LAN discovery process.
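
A possible shape for that tagging, as a sketch only. It assumes each known address carries a `source` tag set when it was learned; `NodeAddress`, `ConnectionPlan` and `planConnections` are hypothetical names and not existing NodeGraph or NodeConnectionManager APIs.

```typescript
// Hypothetical tag: 'local' if learned through LAN discovery,
// 'remote' if learned through the seed node or the wider network.
type AddressSource = 'local' | 'remote';

interface NodeAddress {
  host: string;
  port: number;
  source: AddressSource;
}

interface ConnectionPlan {
  address: NodeAddress;
  // Whether a signalling message should be sent before connecting.
  signal: boolean;
}

// Try local addresses first and skip signalling for them;
// remote addresses keep the signalling step.
function planConnections(addresses: NodeAddress[]): ConnectionPlan[] {
  const rank = (a: NodeAddress): number => (a.source === 'local' ? 0 : 1);
  return [...addresses]
    .sort((x, y) => rank(x) - rank(y))
    .map((address) => ({ address, signal: address.source === 'remote' }));
}
```

The sort keeps the original order within each group, so any existing preference among remote addresses is preserved.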

@CMCDragonkai
Member Author

Still need the post on the logs and full connection after you're ready @tegefaulkes.

@CMCDragonkai
Member Author

It's important to see that both seed nodes are capable of discovering each other, as well as new nodes when they enter the cluster.

@CMCDragonkai
Member Author

This can be closed now. We know a couple of things:

  1. It is not possible to connect to a node on the same network without Local Network Traversal - Multicast Discovery js-mdns#1. Thus any connection tests from the same network are bound to fail.
  2. Local NAT simulation tests are working now again according to @tegefaulkes in ci: merge staging to master #474.
  3. There are still problems with the testnet nodes failing when automated testnet connection tests terminate/finish the agent process.
  4. Network is still flaky and causes timeout errors as per ci: merge staging to master #474.
  5. We know that NAT to CGNAT works. And seed nodes can contact each other.

A final test is required involving NAT to CGNAT and the 2 seed nodes together. In total 4 nodes should be tested. However with the amount of failures we're going to be blocked on this until we really simplify our networking and RPC code.

@CMCDragonkai
Member Author

More sophisticated NAT simulation testing will go to Polykey-Simulation repo.

@CMCDragonkai CMCDragonkai added the r&d:polykey:core activity 4 End to End Networking behind Consumer NAT Devices label Jul 10, 2023