Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 639 prints in cdxj #640

Merged
merged 26 commits into from
Jul 6, 2020

Conversation

anatoly-scherbakov
Copy link
Contributor

No description provided.

@anatoly-scherbakov
Copy link
Contributor Author

Closes #639

@codecov
Copy link

codecov bot commented Apr 27, 2020

Codecov Report

Merging #640 into master will increase coverage by 2.18%.
The diff coverage is 55.84%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #640      +/-   ##
==========================================
+ Coverage   27.07%   29.26%   +2.18%     
==========================================
  Files           7       10       +3     
  Lines        1241     1244       +3     
  Branches      190      184       -6     
==========================================
+ Hits          336      364      +28     
+ Misses        880      857      -23     
+ Partials       25       23       -2     
Impacted Files Coverage Δ
ipwb/__main__.py 0.00% <0.00%> (ø)
ipwb/settings.py 0.00% <0.00%> (ø)
ipwb/replay.py 14.06% <22.22%> (+0.26%) ⬆️
ipwb/indexer.py 50.19% <66.66%> (-0.19%) ⬇️
ipwb/util.py 43.71% <80.76%> (+8.62%) ⬆️
ipwb/error_handler.py 100.00% <100.00%> (ø)
ipwb/exceptions.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4fca791...16bbb8f. Read the comment docs.

ipwb/__main__.py Outdated Show resolved Hide resolved
ipwb/check_for_update.py Outdated Show resolved Hide resolved
ipwb/check_for_update.py Outdated Show resolved Hide resolved
ipwb/util.py Outdated
def logError(errIn):
print(errIn, file=sys.stderr)
except Exception as e:
raise Exception('Unknown error in retrieving daemon status.') from e
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the removal of logError(sys.exc_info()[0]) provide a sufficient traceback for debugging?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is an example:

In [1]: try: 
   ...:     raise ValueError('Ugly internal error, GRRRRR!!!') 
   ...: except ValueError as err: 
   ...:     raise Exception('Nice user-friendly error :)') from err 
   ...:                                                                                                           
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-8ed378f4cf49> in <module>
      1 try:
----> 2     raise ValueError('Ugly internal error, GRRRRR!!!')
      3 except ValueError as err:

ValueError: Ugly internal error, GRRRRR!!!

The above exception was the direct cause of the following exception:

Exception                                 Traceback (most recent call last)
<ipython-input-1-8ed378f4cf49> in <module>
      2     raise ValueError('Ugly internal error, GRRRRR!!!')
      3 except ValueError as err:
----> 4     raise Exception('Nice user-friendly error :)') from err
      5 

Exception: Nice user-friendly error :)

As you can see, the user will see the error message you supplied in raise, together with its stack trace. And if the user is to scroll a bit upper they will see the original exception properly preserved.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I am convinced. PEP 415'sraise exc from None will also help suppress context, which is useful for development but noisy for the end user who should just be told directly what is wrong. I am fine using the pattern provided in this PR.

ipwb/util.py Outdated

except OSError:
raise Exception(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is raising an Exception within the catch of an exception good practice? Prior implementation exited when this occurred but I do not see this same behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The prior implementation exited; an unhandled Exception causes the program to do the same. So, no fundamental difference. But, using raise ... from ... prints the original stacktrace without any extra actions - just using the language's built-in mechanisms. That's why I prefer it to explicit sys.exit().

What is the reason of re-raising the exception though? The same reason which probably motivated to catch these exceptions, in the first place - to provide a bit more friendly error message to the user, I believe.

The more proper way would be to create custom exception classes and to catch them at an upper level of the application converting to nice readable messages, but I believe this can be implemented a bit later.

ipwb/util.py Show resolved Hide resolved
Copy link
Member

@ibnesayeed ibnesayeed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some functions such as isDaemonAlive are now raising exceptions instead of logging errors inline, which is great. However, when these functions are called, exceptions are not caught there, which may result in ugly stack trace for end users. Should this be rectified or am I missing something here?

@ibnesayeed
Copy link
Member

Looks like Mat and I were looking at the same code at the same time. 😄

@anatoly-scherbakov
Copy link
Contributor Author

I do not believe unhandled exceptions are inherently a bad thing. What I believe needs to be done is creating custom exceptions for various situations specific for the system, and handling them all in one place.

Something like this:

@dataclasses.dataclass(frozen=True)
class IndexNotFound(Exception):
    path: str
    backend: str

    def __str__(self):
        return f'The CDXJ index file {self.path} was not found by {self.backend} backend. Please check the correctness of the path.'
...

try:
    return replay(path)

except (IndexNotFound, InvalidIndex, ...) as err:
    if verbose is True:
        logger.exception()
    else:
        logger.warning(str(err))
    return

But this is tightly coupled with backends system because most of the operations with the index which can cause exceptions are probably going to be in the backend.

@ibnesayeed
Copy link
Member

ibnesayeed commented May 2, 2020

I am happy with raising custom exceptions if generic exception messages are too vague and a custom one can provide more context-specific message. However, I am against leaving known exceptions unhanded and let the program crash. There are situations where the program must exit/terminate on specific exceptions, but those exits should be preformed after catching those exceptions. There is a thin line between an application exit and application crash, unhanded exceptions suggest the latter. While stack trace is helpful for development, it is so not welcome by end users. Besides, if we know the cause of certain exceptions because we raised them, then we do not need to see the stack trace. Unknown issues should be the ones that need to show stack trace and be handled after being discovered. I have no issues in having a top level exception handler, as long as it is specific enough to not mask previously unknown exceptions.

@machawk1 machawk1 mentioned this pull request May 4, 2020
2 tasks
@anatoly-scherbakov
Copy link
Contributor Author

I will look into implementing some custom exceptions and their printouts in nearest days.

@anatoly-scherbakov
Copy link
Contributor Author

I again fell into the same pit of accumulating changes, so let me try to describe them in detail.

Communicating errors using exceptions

I added a wrapper called exception_handler(), which is used as a decorator around def main(). It catches unhandled exceptions. When caught an exception, it does one of two things:

  • if environment variable DEBUG is False (the default) - we will see only a logging message. Like this:
$ ipwb index ~/projects/covid-in-russia/warc/stopcoronavirus-russia.1588018596.cdxj
2020-07-05 23:51:43,336 [CRITICAL] ipwb: Daemon is not running at: /dns/localhost/tcp/5001/http

But we also can get full traceback:

$ DEBUG=true ipwb index ~/projects/covid-in-russia/warc/stopcoronavirus-russia.1588018596.cdxj
# --snip --
...exceeded with url: /api/v0/id?stream-channels=true (Caused by NewConnectionError('<ipfshttpclient.requests_wrapper.HTTPConnection object at 0x7f7bba80f4f0>: Failed to establish a new connection: [Errno 111] Connection refused'))

The above exception was the direct cause of the following exception:

# -- snip --

    raise Exception(f'Daemon is not running at: {daemonMultiaddr}') from err
Exception: Daemon is not running at: /dns/localhost/tcp/5001/http

The DEBUG variable also increases the logging level to ease debugging.

IPFS client

After adding commits, codecov was unhappy about their coverage. I decided to try improving the situation by adding some tests, and when that didn't help - went on to try simplifying a few things. Fewer the lines, better the coverage.

createIPFSClient() function is changed to ipfs_client(). That function is doing what the original function did - it creates the client.

However, it also is wrapped in @lru_cache decorator which means that the value of the client will be cached. This removes the need for global IPFS_API variables - and thus I removed them.

...and only at that point did I realize that my goal of passing coverage barrier should have been achieved by just writing tests for exception_handler() - and I got to write them, which seems to finally make codecov happy.

@anatoly-scherbakov
Copy link
Contributor Author

@machawk1 fyi :)

Copy link
Member

@ibnesayeed ibnesayeed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are commendable contributions @anatoly-scherbakov, applauds for taking care of this. I have a couple of minor suggestions and a concern which I would like to hear your thoughts on.

ipwb/util.py Outdated Show resolved Hide resolved
ipwb/__main__.py Outdated Show resolved Hide resolved
except IOError as err:
raise Exception(
'IPFS config not found. Have you installed ipfs and run ipfs init?'
) from err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since sys.exit() call is removed, and the wrapper exception handler will simply log the exception message when DEBUG is set to false, how will the process terminate? There is no point in keeping the service alive in an defined state when critical pieces are missing, no matter DEBUG is on or off. DEBUG should make logging more verbose with stack trace, but it should not change the process termination behavior. Current implementation, if I could understand it well, means it will terminate the process by raising the exception when DEBUG is on even when the exception is rather innocent and handlable and will not terminate the process even on critical exceptions when the DEBUG is off. Is this the case or am I missing something here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ibnesayeed let us describe the calls sequence:

if __name__ == "__main__":
    main()

seemingly calls main(), but actually calls wrapper(*args, **kwargs) (see the implementation of exception_logger).

Then, wrapper calls the original main() inside a try - except block.

Now what happens if an unhandled exception pops out somewhere in main() or one of the functions it calls? Being unhandled, that exception pops up through the call stack, reaches main(), passes it and arrives at wrapper().

The execution of main() at this point is already interrupted by a propagating exception. wrapper() now can either reraise the exception or just log it - that does not matter because, when its except clause is done its work, wrapper() function returns and the process stops.

At least it should if everything works as I believe it does.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anatoly-scherbakov I understand how wrapper/decorator functions work in Python. I have actually used them heavily in an HTTP testing framework I wrote a while ago. My concern here is a little different. Not every exception needs to be reraised to bubble out of the main and let the process die. For example, if the system is configured to use a local IPFS node (which is the case for now as the code needs some changes to decouple IPFS from IPWB) and the IPFS daemon is not accessible because the binary is not installed on the system then the process should die because there is no hope it will ever be accessible. However, if the daemon is not accessible because it is not running, but is installed, then we may not want to let the process die, because the admin control panel has an option to attempt to start/stop IPFS daemon. In the earlier code, some exceptions were logging a message and returning None while other exceptions were logging a message and exiting. I am not sure if that behavior is still preserved. I think it is great that we have a catch-all wrapper exception handler, but we may want to handle some exceptions inline and take appropriate actions instead of bubbling every exception up with a custom message.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, I think the custom exception IPFSDaemonNotAvailable is handled inline in the replay. This looks good to me now. Thanks @anatoly-scherbakov for your valuable contributions. We look forward to see more from you.

anatoly-scherbakov and others added 2 commits July 6, 2020 10:56
e → err

Co-authored-by: Sawood Alam <ibnesayeed@gmail.com>
f-strings forever!

Co-authored-by: Sawood Alam <ibnesayeed@gmail.com>
Copy link
Member

@ibnesayeed ibnesayeed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@machawk1 please merge this.

except IOError as err:
raise Exception(
'IPFS config not found. Have you installed ipfs and run ipfs init?'
) from err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, I think the custom exception IPFSDaemonNotAvailable is handled inline in the replay. This looks good to me now. Thanks @anatoly-scherbakov for your valuable contributions. We look forward to see more from you.

@machawk1
Copy link
Member

machawk1 commented Jul 6, 2020

Did some manual verification on my end as well. Moving to merge this into master. Thanks, @anatoly-scherbakov!

@machawk1 machawk1 merged commit 1575b1e into oduwsdl:master Jul 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants