vjunosswitch ssh issue. #231

Closed
HeikkiLavaste opened this issue Jul 17, 2024 · 12 comments · Fixed by #235

@HeikkiLavaste

Hi,

I pulled the latest updates from the repo and rebuilt my vjunosswitch image.
I'm not able to ssh to it anymore.
ssh debug just says:

debug1: Connecting to 172.20.20.3 [172.20.20.3] port 22.
debug1: Connection established.

and just hangs.

If I connect to the docker container and do an ssh to localhost, I get access to the vjunos cli.

Thanks,
Heikki

ssasso commented Jul 26, 2024

Hi, I am having the same issue.

As a workaround, I tried running a socat instance in the container (on a different port, 222), and that works.

root@r1:/# socat TCP-LISTEN:222,fork TCP:127.0.0.1:22

...

root@hippo:~# ssh admin@clab-vjunos_initi-r1

root@hippo:~# ssh admin@clab-vjunos_initi-r1 -p 222
Warning: Permanently added '[clab-vjunos_initi-r1]:222,[192.168.121.101]:222' (ECDSA) to the list of known hosts.
Password:
Last login: Fri Jul 26 09:52:18 2024
--- JUNOS 23.2R1.14 Kernel 64-bit  JNPR-12.1-20230613.7723847_buil
admin@r1> exit

Connection to clab-vjunos_initi-r1 closed.

It seems that for some reason the qemu/kvm process is not able to perform correct "internal" port forwarding. Using socat, like other containers do, could solve the problem.
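To illustrate what the socat workaround above does, here is a minimal Python sketch of a forking TCP relay roughly equivalent to `socat TCP-LISTEN:222,fork TCP:127.0.0.1:22`. It is not vrnetlab code; the function names (`pipe`, `handle`, `serve`, `start_relay`) are made up for this example. The point to notice is that the relay opens a brand-new connection to the target, so the target sees the relay's own local address as the peer rather than the original client's.

```python
import socket
import threading


def pipe(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes from src to dst until src reaches EOF, then half-close dst."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass  # peer went away; nothing more to relay
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass


def handle(client: socket.socket, target: tuple[str, int]) -> None:
    """Open a fresh connection to target and relay traffic in both directions."""
    with client, socket.create_connection(target) as upstream:
        back = threading.Thread(target=pipe, args=(upstream, client), daemon=True)
        back.start()
        pipe(client, upstream)
        back.join()


def serve(listener: socket.socket, target: tuple[str, int]) -> None:
    """Accept clients forever, one thread per connection (like socat's `fork`)."""
    while True:
        try:
            client, _ = listener.accept()
        except OSError:
            return  # listener was closed
        threading.Thread(target=handle, args=(client, target), daemon=True).start()


def start_relay(listen_port: int, target: tuple[str, int]) -> socket.socket:
    """Listen on all addresses and relay every connection to target."""
    listener = socket.create_server(("0.0.0.0", listen_port))
    threading.Thread(target=serve, args=(listener, target), daemon=True).start()
    return listener
```

Calling `start_relay(222, ("127.0.0.1", 22))` would mirror the command above; binding port 222 normally requires root, as in the original shell session.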

hellt (Owner) commented Jul 26, 2024

Hi @ssasso
In #229 I removed the socat forwarding since in my testing qemu hostfwd was able to stitch 22 and other ports just fine

I did my tests with SR OS, though, not with vjunos. Did you build vjunos after #229 was merged?

ssasso commented Jul 26, 2024

I built the image today, but I'll do some deeper testing.

ipspace commented Jul 26, 2024

> Hi @ssasso In #229 I removed the socat forwarding since in my testing qemu hostfwd was able to stitch 22 and other ports just fine

FWIW, I can confirm that the vjunos-switch container works OK with the old code ;) I have the code from the #205 days and it built the container just fine.

ssasso commented Jul 28, 2024

It seems that some devices work with hostfwd and others do not.

For example, in addition to vjunos-switch:

  • Juniper vSRX --> working
  • Aruba CX --> not working

The TCP three-way handshake seems OK, but then no data is forwarded.

11:02:53.977354 IP 192.168.121.1.40890 > 192.168.121.101.22: Flags [S], seq 1232171827, win 64240, options [mss 1460,sackOK,TS val 1304792152 ecr 0,nop,wscale 7], length 0
11:02:53.977405 IP 192.168.121.101.22 > 192.168.121.1.40890: Flags [S.], seq 3749194560, ack 1232171828, win 65160, options [mss 1200,sackOK,TS val 3026846260 ecr 1304792152,nop,wscale 7], length 0
11:02:53.977435 IP 192.168.121.1.40890 > 192.168.121.101.22: Flags [.], ack 1, win 502, options [nop,nop,TS val 1304792152 ecr 3026846260], length 0
11:02:53.978041 IP 192.168.121.1.40890 > 192.168.121.101.22: Flags [P.], seq 1:42, ack 1, win 502, options [nop,nop,TS val 1304792152 ecr 3026846260], length 41: SSH: SSH-2.0-OpenSSH_8.9p1 Ubuntu-3ubuntu0.6
11:02:53.978055 IP 192.168.121.101.22 > 192.168.121.1.40890: Flags [.], ack 42, win 509, options [nop,nop,TS val 3026846260 ecr 1304792152], length 0
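As a rough way to check the "handshake OK, no data" reading of a capture like this one, the sketch below parses the five tcpdump lines quoted above and totals the payload bytes in each direction. The regex only covers the fields present in these lines, and the helper names (`parse`, `summarize`) are made up for this example.

```python
import re

# The five packets quoted above, copied verbatim from the capture.
TRACE = """\
11:02:53.977354 IP 192.168.121.1.40890 > 192.168.121.101.22: Flags [S], seq 1232171827, win 64240, options [mss 1460,sackOK,TS val 1304792152 ecr 0,nop,wscale 7], length 0
11:02:53.977405 IP 192.168.121.101.22 > 192.168.121.1.40890: Flags [S.], seq 3749194560, ack 1232171828, win 65160, options [mss 1200,sackOK,TS val 3026846260 ecr 1304792152,nop,wscale 7], length 0
11:02:53.977435 IP 192.168.121.1.40890 > 192.168.121.101.22: Flags [.], ack 1, win 502, options [nop,nop,TS val 1304792152 ecr 3026846260], length 0
11:02:53.978041 IP 192.168.121.1.40890 > 192.168.121.101.22: Flags [P.], seq 1:42, ack 1, win 502, options [nop,nop,TS val 1304792152 ecr 3026846260], length 41: SSH: SSH-2.0-OpenSSH_8.9p1 Ubuntu-3ubuntu0.6
11:02:53.978055 IP 192.168.121.101.22 > 192.168.121.1.40890: Flags [.], ack 42, win 509, options [nop,nop,TS val 3026846260 ecr 1304792152], length 0
""".splitlines()

# Matches "<time> IP <src> > <dst>: Flags [<flags>] ... length <n>".
PACKET_RE = re.compile(
    r"^\S+ IP (?P<src>\S+) > (?P<dst>\S+): "
    r"Flags \[(?P<flags>[^\]]+)\].*?length (?P<length>\d+)"
)


def parse(line):
    """Return src, dst, flags and payload length for one tcpdump line, or None."""
    m = PACKET_RE.match(line)
    if m is None:
        return None
    return {
        "src": m["src"],
        "dst": m["dst"],
        "flags": m["flags"],
        "length": int(m["length"]),
    }


def summarize(lines, server):
    """Report whether SYN, SYN-ACK and ACK were seen, plus payload per direction."""
    packets = [p for p in map(parse, lines) if p is not None]
    to_server = [p for p in packets if p["dst"] == server]
    from_server = [p for p in packets if p["src"] == server]
    return {
        "handshake_seen": (
            any(p["flags"] == "S" for p in to_server)
            and any(p["flags"] == "S." for p in from_server)
            and any(p["flags"] == "." for p in to_server)
        ),
        "payload_to_server": sum(p["length"] for p in to_server),
        "payload_from_server": sum(p["length"] for p in from_server),
    }


summary = summarize(TRACE, server="192.168.121.101.22")
print(summary)
```

For this trace the summary shows a completed handshake, 41 bytes sent towards port 22 (the client's SSH banner) and 0 bytes coming back: the banner is acknowledged, but no SSH banner is returned.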

I tried forcing a lower MTU and clamping the MSS, but the result did not change.

Maybe it would be worth adding a per-device flag to choose between direct hostfwd and socat?

ssasso commented Jul 28, 2024

Ok, seems like I found the issue.

When using socat, the inbound connection to the VM is coming from the local address.

OTOH, when using hostfwd, the inbound connection to the VM is coming from the containerlab host address.

For example, running tcpdump from a bash shell on the Aruba CX VM:

11:18:22.293153 IP 192.168.121.1.56070 > 10.0.0.15.22: Flags [S], seq 27136001, win 65535, options [mss 1460], length 0
11:18:24.510390 IP 192.168.121.1.42648 > 10.0.0.15.22: Flags [S], seq 32832001, win 65535, options [mss 1460], length 0
11:18:31.303120 IP 192.168.121.1.42648 > 10.0.0.15.22: Flags [S], seq 32832001, win 65535, options [mss 1460], length 0

That means all the vrnetlab devices must have a default route on the management interface (via 10.0.0.2) - even better if it's on a dedicated management vrf, to avoid clashing with any other routes of the lab.

After adding the management default route, I was able to connect via SSH and hostfwd.
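The routing argument can be sketched with a small longest-prefix-match lookup in Python. The addresses come from the capture above (client 192.168.121.1, VM 10.0.0.15, gateway 10.0.0.2); the /24 mask on the management subnet and the `lookup` helper are assumptions made for this illustration, not something taken from the vrnetlab code.

```python
import ipaddress


def lookup(routes, destination):
    """Longest-prefix match: return the next hop for destination, or None."""
    dest = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop in routes:
        net = ipaddress.ip_network(prefix)
        if dest in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return None if best is None else best[1]


# VM with only an IP address on the management interface (assumed /24).
CONNECTED_ONLY = [("10.0.0.0/24", "connected")]

# Same VM after adding the management default route via 10.0.0.2.
WITH_DEFAULT = CONNECTED_ONLY + [("0.0.0.0/0", "10.0.0.2")]

CLIENT = "192.168.121.1"  # source address seen in the VM's tcpdump

print(lookup(CONNECTED_ONLY, CLIENT))      # None: the SYN-ACK has nowhere to go
print(lookup(WITH_DEFAULT, CLIENT))        # 10.0.0.2
print(lookup(CONNECTED_ONLY, "10.0.0.2"))  # connected
```

The last lookup suggests why the socat path kept working: if the relayed connection appears to come from an address inside the management subnet (assumed here to be 10.0.0.2), the connected route alone is enough for the reply.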

Regarding vSRX vs vJunos-switch: if you look at the initial config, vSRX has the default route in the mgmt_junos routing instance, while vJunos-switch has only the IP address.

ipspace commented Jul 28, 2024

The root cause might be that some devices don't want to accept the default route from DHCP, and by the time the device has booted far enough for the script to start configuring it, the DHCP response has already been processed.

It might be better to make this configurable and switch to hostfwd only when someone tests that it works on a particular device (or fixes the code like @ssasso did). Who knows how much stuff is broken right now (hopefully nothing else, but...)

hellt (Owner) commented Jul 29, 2024

Thanks @ssasso
it is always a great idea to add the def route in the NOS, since this would allow traffic originating from the router to be properly routed.

Thanks for spotting this. I did it a long time ago for SR OS, and other contributors did it for some other NOSes, but not consistently.

ssasso commented Jul 29, 2024

> Thanks @ssasso it is always a great idea to add the def route in the NOS, since this would allow traffic originating from the router to be properly routed.
>
> Thanks for spotting this. I did it a long time ago for SR OS, and other contributors did it for some other NOSes, but not consistently.

Unfortunately, very few devices' initial configs have the default route set. E.g., Cisco Cat, Nexus, ... are missing the default route, OcNOS is missing it, openwrt is missing it, PanOS is missing it, sonic is missing it, vEOS is missing it... :(

I can try to fix some of them, but I won't be able to test them.

Could it be worth asking each device's initial contributor, by opening issues and assigning them?

ipspace commented Jul 29, 2024

> Unfortunately, very few devices' initial configs have the default route set. E.g., Cisco Cat, Nexus, ... are missing the default route, OcNOS is missing it, openwrt is missing it, PanOS is missing it, sonic is missing it, vEOS is missing it... :(

Which effectively means vrnetlab is currently broken for all of them as most devices use static IP configuration on the management interface, not DHCP (I've spent too long in the libvirt world ;). Maybe it's not a good idea to open the issues and wait for someone to fix stuff?

hellt (Owner) commented Aug 1, 2024

sonic automatically installed the default route

@kaelemc added def routes to all cisco gear (#238)
@SimPeccaud def route has been added to veos (#237)

openwrt/panos/ocnos might have issues still

ipspace commented Aug 1, 2024

Great job. Thank you all!
