Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code review and hardening #73

Open
MarekPikula opened this issue Apr 3, 2023 · 29 comments
Open

Code review and hardening #73

MarekPikula opened this issue Apr 3, 2023 · 29 comments
Assignees

Comments

@MarekPikula
Copy link
Collaborator

Hi, first of all, thank you for the great work. I consider using headscale in my company, and having a GUI is crucial for efficient operation. Since VPN is crucial in terms of security for the company, I want to thoroughly review the code of the GUI as it has direct access to all administration tasks (via API key) and is a significant attack surface both from outside and from inside. I have some questions regarding the current code style because it seems to me that some important things are implemented in a poor style, and many low-hanging fruits are left out. I have a few years of experience in Python programming, and honestly, some constructs look to me a little unsettling and make me not want to use this package for production.

  1. Do you use any autoformatter? I personally prefer black as it has a good set of sane defaults and is constrained enough to have repetitive behaviour. If you have your preference it's no problem for me, but let's use it. Formatting code by hand is tedious and error-prone.
  2. You have pylint installed, but maaaany linter warnings seem to be ignored.
  3. I can see that you don't use static type checks. For a security-focused project, it's strongly advised to enable static type checks where possible to prevent obvious bugs from getting to the code base.
  4. Would you like me to spend ~2 days reformating the code, working on static checks, fixing linter warnings and incorporating all these checks to CI?
  5. After some basic code style and lining fixes, I would like to do a security review.
@iFargle
Copy link
Owner

iFargle commented Apr 3, 2023

  1. I do not. I'm not super familiar with coding in general. I have no formal training, just reading tutorials and docs online. I've heard of black before though.
  2. From what i can tell, most of the ignored errors are false positives? Unless I'm mistaken. I get a ton for all the "app.logger" instances, which work fine.
  3. I'm going to be completely honest. I don't know what static type checks are (d'oh :( )
  4. If you have the spare time, I'll take all the help I can get! This code is a bit of spaghetti code since I'm basically learning as I go.

If you have any resources for Python development you'd like to share I'd be eager to give them a read! 99% of my Python experience is just back-end / server automation type scripts. Mostly super simple stuff.

Thank you! Let me know what you need from me!

@MarekPikula
Copy link
Collaborator Author

All right, so I'll spend the next two days or so making the code more production-ready. I'll let you know about the progress 🙂

@iFargle
Copy link
Owner

iFargle commented Apr 3, 2023

Thank you!

@MarekPikula
Copy link
Collaborator Author

Here is a quick update from my side. I'm in the middle of a relatively big refactor, so it will take a few more days to complete and properly test it. I already published part of my work in a separate package (https://pypi.org/project/headscale-api/). It's a Headscale API abstraction, which could be useful also for other projects connecting to Headscale – I already use it in the refactor and am pretty happy about the current state of it. In the meantime, please hold any development so that there are no conflicts for me to solve once the PR is ready later this week or in the beginning of the next week.

@iFargle
Copy link
Owner

iFargle commented Apr 6, 2023

This is already wayyy more advanced than my capabilities. I will be watching closely and learning!

@MarekPikula
Copy link
Collaborator Author

A quick update from my side: I'm mostly done. I tried to test everything and after some bughuting everything seems to work. Now, I'll be cleaning up the code, adding some CI and preparing the PR with full description of changes. I'll let you know once I push some code. Unfortunately, due to the fact that the refactor is rather big and touches many parts of the app, it will be one big commit with many changes, so it will be a little harder to see all the specific aspects of the refactor, but I hope the description will fill the gap. I hope to finish it by the end of the week.

@MarekPikula MarekPikula self-assigned this Apr 20, 2023
@iFargle
Copy link
Owner

iFargle commented Apr 21, 2023

Sounds good! I've been slightly busier than usual so I may not have a lot of time to work on this. I'll try and check daily.

MarekPikula added a commit to MarekPikula/headscale-webui that referenced this issue Apr 21, 2023
Major part of iFargle#73

Unfortunately, it wasn't possible to split it to multiple smaller
commits, since the changes touched the entire application substantially.
Here is a short list of major changes:

1. Create a separate library (headscale-api), which is used as a
   convenient abstraction layer providing Pythonic interface with
   Pydantic. Headscale API is fully asynchronous library, benefitting
   from improved concurrency for backend requests thus increasing page
   load speed, e.g., on "Machines" page.
2. Create a common common, validated with flask-pydantic API passthrough
   layer from GUI to the backend.
3. Move authentication to a separate (auth.py), consolidating the
   functionality in a single place (with better place for expansion in
   the future).
4. Move configuration management to a separate module (config.py). Use
   Pydantic's BaseSettings for reading values from environment, with
   extensive validation and error reporting.
5. Reduce the number of health checks.
    - Now, most are performed during server initialization. If any test
      fails, the server is started in tainted mode, with only the error
      page exposed (thus reducing the surface of attack in invalid
      state).
    - Key checks are implicit in the requests to the backend and
      guarded by `@headscale.key_check_guard` decorator.
    - Key renewal is moved to server-side scheduler.
6. Introduce type hints to the level satisfactory for mypy static
   analysis. Also, enable some other linters in CI and add optional
   pre-commit hooks.
7. Properly handle some error states. Instead of returning success and
   handling different responses, if something fails, there is HTTP error
   code and standard response for it.
8. General formatting, small rewrites for clarity and more idiomatic
   Python constructs.

Signed-off-by: Marek Pikuła <marek.pikula@embevity.com>
MarekPikula added a commit to MarekPikula/headscale-webui that referenced this issue Apr 21, 2023
Major part of iFargle#73

Unfortunately, it wasn't possible to split it to multiple smaller
commits, since the changes touched the entire application substantially.
Here is a short list of major changes:

1. Create a separate library (headscale-api), which is used as a
   convenient abstraction layer providing Pythonic interface with
   Pydantic. Headscale API is fully asynchronous library, benefitting
   from improved concurrency for backend requests thus increasing page
   load speed, e.g., on "Machines" page.
2. Create a common common, validated with flask-pydantic API passthrough
   layer from GUI to the backend.
3. Move authentication to a separate (auth.py), consolidating the
   functionality in a single place (with better place for expansion in
   the future).
4. Move configuration management to a separate module (config.py). Use
   Pydantic's BaseSettings for reading values from environment, with
   extensive validation and error reporting.
5. Reduce the number of health checks.
    - Now, most are performed during server initialization. If any test
      fails, the server is started in tainted mode, with only the error
      page exposed (thus reducing the surface of attack in invalid
      state).
    - Key checks are implicit in the requests to the backend and
      guarded by `@headscale.key_check_guard` decorator.
    - Key renewal is moved to server-side scheduler.
6. Introduce type hints to the level satisfactory for mypy static
   analysis. Also, enable some other linters in CI and add optional
   pre-commit hooks.
7. Properly handle some error states. Instead of returning success and
   handling different responses, if something fails, there is HTTP error
   code and standard response for it.
8. General formatting, small rewrites for clarity and more idiomatic
   Python constructs.

Signed-off-by: Marek Pikuła <marek.pikula@embevity.com>
@iFargle
Copy link
Owner

iFargle commented Apr 21, 2023

Hm, that seems to have broken my Jenkins build.

[2023-04-21 22:04:00 +0900] [1] [INFO] Starting gunicorn 20.1.0
[2023-04-21 22:04:00 +0900] [1] [INFO] Listening at: http://0.0.0.0:5000 (1)
[2023-04-21 22:04:00 +0900] [1] [INFO] Using worker: sync
[2023-04-21 22:04:00 +0900] [8] [INFO] Booting worker with pid: 8
[2023-04-21 22:04:03,207] CRITICAL in config: Following environment variables are required but are not declared or have an invalid value:
[2023-04-21 22:04:03,207] CRITICAL in config:   AUTH_TYPE with type AuthType: type_error.enum
[2023-04-21 22:04:03,207] CRITICAL in config:   BUILD_DATE with type datetime: value_error.datetime
[2023-04-21 22:04:03,207] CRITICAL in config: Environment error for AUTH_TYPE
[2023-04-21 22:04:03,207] CRITICAL in config: Environment error for BUILD_DATE
[2023-04-21 22:04:03,208] ERROR in server: Encountered error when trying to run initialization checks. Running in tainted mode (only the error page is available). Correct all errors and restart the server.
Failed to find attribute 'app' in 'server'.
[2023-04-21 22:04:03 +0900] [8] [INFO] Worker exiting (pid: 8)
[2023-04-21 22:04:03 +0900] [1] [INFO] Shutting down: Master
[2023-04-21 22:04:03 +0900] [1] [INFO] Reason: App failed to load.

@iFargle
Copy link
Owner

iFargle commented Apr 21, 2023

Both BUILD_DATE and AUTH_TYPE should be set unless they're being overwritten somewhere?

@tgkenney
Copy link

tgkenney commented Apr 21, 2023

Chiming in here, but I was setting up this app for the first time and it looks like latest is broken like you said. I was setting AUTH_TYPE but the app was failing to load same as the error above. Using v0.6.1 works fine

Edit: Definitely an issue with the way you're doing the new config.py. I can get AUTH_TYPE to set by using lower "basic", I wasn't able to get BUILD_DATE to take correctly, but the default definitely isn't being used

@iFargle
Copy link
Owner

iFargle commented Apr 22, 2023

I'm troubleshooting, give me a bit :)

@MarekPikula
Copy link
Collaborator Author

To be honest, I didn't expect you to already release the code 😛 But I guess this way, we will get more bug reports quicker.

I think that I know what the problem might be. I'll troubleshoot in a moment.

@iFargle
Copy link
Owner

iFargle commented Apr 22, 2023

ha, I'm super new to all this -- The only way I can get Jenkins to build is either by branching from my repo (my mirror doesn't pick up on pull requests) or by committing to main. So YOLO, I committed to main!

but I realize I should be more careful -- There ARE at least a few active users of this project.

@iFargle
Copy link
Owner

iFargle commented Apr 22, 2023

FYI Jenkins is running. Since it's multiarch it takes around an hour to build.

@iFargle
Copy link
Owner

iFargle commented Apr 22, 2023

I lied. Jenkins doesn't seem to like that new string -- No logs output at all

@MarekPikula
Copy link
Collaborator Author

I'll try to make a GH-based Docker build sometime next week.

@MarekPikula
Copy link
Collaborator Author

Hmm, ok, so I'll revert it to general string. Not that it's needed to be in some concrete format anyways.

@MarekPikula
Copy link
Collaborator Author

Done in #90.

@iFargle
Copy link
Owner

iFargle commented Apr 22, 2023

Now I'm getting other errors:

[2023-04-22 16:56:05 +0900] [1] [INFO] Starting gunicorn 20.1.0
[2023-04-22 16:56:05 +0900] [1] [INFO] Listening at: http://0.0.0.0:5000 (1)
[2023-04-22 16:56:05 +0900] [1] [INFO] Using worker: sync
[2023-04-22 16:56:05 +0900] [8] [INFO] Booting worker with pid: 8
[2023-04-22 16:56:08,399] CRITICAL in config: Following environment variables are required but are not declared or have an invalid value:
[2023-04-22 16:56:08 +0900] [8] [ERROR] Exception in worker process
Traceback (most recent call last):
  File "/app/server.py", line 66, in create_app
    auth = AuthManager(config)
           ^^^^^^^^^^^^^^^^^^^
  File "/app/auth.py", line 131, in __init__
    oidc_info = OpenIdProviderMetadata.parse_obj(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pydantic/main.py", line 526, in pydantic.main.BaseModel.parse_obj
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 5 validation errors for OpenIdProviderMetadata
response_types_supported -> 1
  unexpected value; permitted: 'code', 'id_token', 'id_token token', 'code id_token', 'code token', 'code id_token token' (type=value_error.const; given=token; permitted=('code', 'id_token', 'id_token token', 'code id_token', 'code token', 'code id_token token'))
response_types_supported -> 5
  unexpected value; permitted: 'code', 'id_token', 'id_token token', 'code id_token', 'code token', 'code id_token token' (type=value_error.const; given=token id_token; permitted=('code', 'id_token', 'id_token token', 'code id_token', 'code token', 'code id_token token'))
response_types_supported -> 6
  unexpected value; permitted: 'code', 'id_token', 'id_token token', 'code id_token', 'code token', 'code id_token token' (type=value_error.const; given=code token id_token; permitted=('code', 'id_token', 'id_token token', 'code id_token', 'code token', 'code id_token token'))
response_types_supported -> 7
  unexpected value; permitted: 'code', 'id_token', 'id_token token', 'code id_token', 'code token', 'code id_token token' (type=value_error.const; given=none; permitted=('code', 'id_token', 'id_token token', 'code id_token', 'code token', 'code id_token token'))
response_modes_supported -> 0
  unexpected value; permitted: 'query', 'fragment' (type=value_error.const; given=form_post; permitted=('query', 'fragment'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/.venv/lib/python3.11/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
    worker.init_process()
  File "/app/.venv/lib/python3.11/site-packages/gunicorn/workers/base.py", line 134, in init_process
    self.load_wsgi()
  File "/app/.venv/lib/python3.11/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
    self.wsgi = self.app.wsgi()
                ^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.11/site-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
                    ^^^^^^^^^^^
  File "/app/.venv/lib/python3.11/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
    return self.load_wsgiapp()
           ^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.11/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
    return util.import_app(self.app_uri)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.11/site-packages/gunicorn/util.py", line 359, in import_app
    mod = importlib.import_module(module)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1206, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1178, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1149, in _find_and_load_unlocke

I'm going to see what troubleshooting I can do myself. Some of this code is above my head but I'll do my best :)

@iFargle
Copy link
Owner

iFargle commented Apr 22, 2023

So when I initially tested I just cloned your pull request and did a local docker build. I'm really unsure of what the difference would be

@iFargle
Copy link
Owner

iFargle commented Apr 22, 2023

Figured that part out, now we're onto an issue with the scheduler:

[2023-04-22 21:08:08 +0900] [1] [INFO] Starting gunicorn 20.1.0
[2023-04-22 21:08:08 +0900] [1] [INFO] Listening at: http://0.0.0.0:5000 (1)
[2023-04-22 21:08:08 +0900] [1] [INFO] Using worker: sync
[2023-04-22 21:08:08 +0900] [8] [INFO] Booting worker with pid: 8
[2023-04-22 21:08:11,761] INFO in server: Headscale-WebUI Version: v0.7.0 / 7.0.1-testing
[2023-04-22 21:08:11,761] INFO in server: Logger level set to 10.
[2023-04-22 21:08:11,761] INFO in server: Debug state: False
[2023-04-22 21:08:11,773] INFO in base: Scheduler started
[2023-04-22 21:08:11,773] DEBUG in base: Looking for jobs to run
[2023-04-22 21:08:11,773] DEBUG in base: No jobs; waiting until a job is added
[2023-04-22 21:08:11,791] INFO in base: Added job "register_scheduler.<locals>.renew_api_key" to job store "default"
[2023-04-22 21:08:11,791] DEBUG in base: Looking for jobs to run
[2023-04-22 21:08:11,791] INFO in server: Key renewal schedule triggered...
[2023-04-22 21:08:11,791] DEBUG in base: Next wakeup is due at 2023-04-22 22:08:11.773651+09:00 (in 3599.982351 seconds)
Failed to find attribute 'app' in 'server'.
[2023-04-22 21:08:11 +0900] [8] [INFO] Worker exiting (pid: 8)
Job "register_scheduler.<locals>.renew_api_key (trigger: interval[1:00:00], next run at: 2023-04-22 22:08:11 JST)" raised an exception
Traceback (most recent call last):
  File "/app/.venv/lib/python3.11/site-packages/apscheduler/executors/base.py", line 125, in run_job
    retval = job.func(*job.args, **job.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/server.py", line 377, in renew_api_key
    if app.ensure_sync(headscale.renew_api_key)() is None:  # type: ignore
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.11/site-packages/asgiref/sync.py", line 213, in __call__
    loop_future = loop_executor.submit(
                  ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 169, in submit
    raise RuntimeError('cannot schedule new futures after '
RuntimeError: cannot schedule new futures after interpreter shutdown
/usr/local/lib/python3.11/traceback.py:240: RuntimeWarning: coroutine 'AsyncToSync.main_wrap' was never awaited
  tb.tb_frame.clear()
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
[2023-04-22 21:08:11,848] INFO in base: Scheduler has been shut down
[2023-04-22 21:08:11,848] DEBUG in base: Looking for jobs to run
[2023-04-22 21:08:11,848] DEBUG in base: No jobs; waiting until a job is added
[2023-04-22 21:08:12 +0900] [1] [INFO] Shutting down: Master
[2023-04-22 21:08:12 +0900] [1] [INFO] Reason: App failed to load.

Not 100% sure about this one. if you'd like to take a look it's in the 7.0.1-testing branch
https://github.com/iFargle/headscale-webui/tree/7.0.1-testing

I'm off to bed for now

@iFargle
Copy link
Owner

iFargle commented Apr 25, 2023

I can't really do much more -- My system runs out of RAM when trying to install packages via poetry. All 64GB's.. Hm.

@MarekPikula
Copy link
Collaborator Author

Hi, sorry for a long break. I had to catch up at work. I noticed that you reverted the PR. Could you summarize what was problematic in the last revision so that we can go back to work on merging it to main?

@MarekPikula
Copy link
Collaborator Author

MarekPikula commented Jun 4, 2023

The issue you mentioned has a rather misleading message. The thing that is the failing point is Failed to find attribute 'app' in 'server'. and that is because I forgot to update the gunicorn target in Dockerfile 🤦 Fixed in 8e66e3c and pushed change to 0.7.0-dev. I let myself rebase the changes you've made with a sensible commit message.

Side note: Please, don't push commits with such messages to a place where collaboration happens. I suggest reading https://cbea.ms/git-commit/ to make other developers happy. Also, git rebase is your friend before pushing changes to a public repository.

@MarekPikula
Copy link
Collaborator Author

MarekPikula commented Jun 4, 2023

Based on the git history, there are the new features currently available on main since I worked on 0.7.0:

@MarekPikula
Copy link
Collaborator Author

Ok, so all changes guaranteeing feature parity should be pushed to 0.7.0-dev. Merging it will be tricky, since you reverted the previous PRs in 2033f80. I'll figure out a way to make it mergable.

@MarekPikula
Copy link
Collaborator Author

All right. It will be messy, but possible. Opened #107 to have it pass CI. Further discussion there.

@iFargle
Copy link
Owner

iFargle commented Jun 4, 2023

My apologies -- I know you didn't sign up for teaching anyone :) I'll read through the docs and everything later tonight. Thank you!

@MrTinnysis
Copy link

Are there news on the new version?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants