cron fails on two servers, approximately same time, without error #423

Closed
Tracked by #674
ghost opened this issue May 10, 2019 · 19 comments
Labels
type:bug Bug reports and bug fixes

Comments

@ghost

ghost commented May 10, 2019

I have two AWS servers with identical setups:

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.2 LTS
Release: 18.04
Codename: bionic

uname -r
4.15.0-1035-aws

node --version
v8.16.0

cat package.json | grep cron
"cron": "^1.7.1",

both have cron running with this code:
new CronJob('*/10 * * * * *', procNewBlock, null, true, 'Europe/Berlin');

Both stopped running at approximately the same time. The first one last ran at:
May 2 09:05:47 UTC

The second one last ran at:
May 2 09:06:30 UTC

So far this is the only time this has happened (the jobs have been running for about a month now). To me it looks as if some scheduled system task outside of Node.js is killing the crons. Any help is greatly appreciated. The Node.js processes themselves did not see any interruption.

@ghost
Author

ghost commented May 10, 2019

probably related:
#231
#232

@ghost
Author

ghost commented May 10, 2019

For me, it does not seem to depend on the server load: one of my servers is rather idle, the other rather busy. Both have the same resources.
Also, even though both crons stopped at almost the same time, their starting times are more than 1 day apart.

@ncb000gt
Member

That's really strange. Is there a service that both crons are using? What I'm wondering is if there is a possibility that they were using the same service that went down at some time which caused an exception in your calling code which cascaded and caused the crons to die.

Let me know, thanks!

@ghost
Author

ghost commented May 14, 2019

Both servers synchronize to the ubuntu time server every 30 min, but at different times. However, sometimes the synchronization shows some timeouts (10s) - see the second log - which are longer than the cron interval. Could this be a problem?

Server 1:
May 2 09:00:57 ip-10-0-0-162 systemd-timesyncd[6168]: Network configuration changed, trying to establish connection.
May 2 09:00:57 ip-10-0-0-162 systemd-timesyncd[6168]: Network configuration changed, trying to establish connection.
May 2 09:00:57 ip-10-0-0-162 systemd-timesyncd[6168]: Synchronized to time server 91.189.89.198:123 (ntp.ubuntu.com).

Server 2:
May 2 09:18:17 ip-10-0-0-7 systemd-timesyncd[23072]: Network configuration changed, trying to establish connection.
May 2 09:18:17 ip-10-0-0-7 systemd-timesyncd[23072]: Network configuration changed, trying to establish connection.
May 2 09:18:27 ip-10-0-0-7 systemd-timesyncd[23072]: Timed out waiting for reply from 91.189.89.199:123 (ntp.ubuntu.com).
May 2 09:18:27 ip-10-0-0-7 systemd-timesyncd[23072]: Synchronized to time server 91.189.89.198:123 (ntp.ubuntu.com).

@jSherz

jSherz commented Jul 3, 2019

Hi @ChristophDFine. Did you ever get to the bottom of what caused this?

@ghost
Author

ghost commented Jul 3, 2019

No, not yet. I am still waiting for it to happen again so I can get more debug info.
So far it has happened once more, which showed me that:
a) it is actually not related to the cron package but to some deeper source in Node.js
b) it was probably a coincidence that it happened on both servers at the same time

@eero-lehtinen

eero-lehtinen commented Jul 9, 2019

Hey,
I've had the exact same issue. Once a month timers would just stop working. Turned out that node v10 has an integer overflow issue that kills all timers after 25 days. It's fixed in v10.9.

Edit: might not be the same issue since your processes were launched at different times and stopped at the same time.
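
For reference, the 25-day figure mentioned above is consistent with a millisecond value overflowing a signed 32-bit integer. The snippet below is only a back-of-the-envelope check under that assumption; the comment does not state the exact width of the counter, and the constant names are made up for illustration.

```javascript
// Assumption: the overflow involves a signed 32-bit millisecond value.
// The original comment only says "integer overflow ... after 25 days".
const MS_PER_DAY = 24 * 60 * 60 * 1000; // 86,400,000 ms in one day
const INT32_LIMIT_MS = 2 ** 31;         // 2,147,483,648 ms

// Number of days of uptime before such a counter would wrap around.
const daysUntilOverflow = INT32_LIMIT_MS / MS_PER_DAY;
console.log(daysUntilOverflow.toFixed(2)); // 24.86
```

If this is the cause, comparing `process.uptime()` (which returns seconds) against roughly 24.86 days at the moment the timers stop would be one way to confirm or rule it out.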

@ncb000gt
Member

Has anyone been able to reproduce this issue? If not I'll close the issue and chalk it up to a node issue.

Let me know. Thanks.

@EdClaus

EdClaus commented Mar 28, 2020

This morning I got this same error and tried it out with a fresh install of Raspbian and node-red. I was able to pinpoint it to an inject node with specific content. With the node below it crashed; with other configured inject nodes there are no problems.

[{"id":"a02e32cd.28c53","type":"tab","label":"Flow 6","disabled":false,"info":""},{"id":"a78a0f31.f1bc9","type":"inject","z":"a02e32cd.28c53","name":"Init & reset","topic":"","payload":"auto","payloadType":"str","repeat":"","crontab":"*/20 2 * * *","once":true,"onceDelay":0.1,"x":130,"y":120,"wires":[[]]}]

@awaismehmood88

awaismehmood88 commented Jan 23, 2023

Hi,
I have the same issue: the code runs for 2 to 3 days, then stops without an error.

Code

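// Assumes the `cron` package; `logger` and `findJob` are defined elsewhere in the application.
const { CronJob } = require('cron');

let cronJob = null;
let isRunning = false; // guards against overlapping runs
const options = {
    cronTime: `*/5 * * * * *`, // six fields: fires every 5 seconds
    onTick: fireTick,
    onComplete: () => logger.debug('CRON:TICK job completed'),
    runOnInit: true,
};
cronJob = new CronJob(options);
cronJob.start();

function fireTick() {
    if (isRunning) {
        logger.debug('CRON:TICK Already in execution');
        return true;
    }
    isRunning = true;
    logger.debug('CRON:TICK Finding X job start');
    findJob()
        .then(() => {
            logger.debug('CRON:TICK Finding X job end');
            isRunning = false;
        })
        .catch(e => {
            // Log the failure instead of discarding it silently
            // (assumes the logger exposes an `error` method).
            logger.error(`CRON:TICK Finding X job failed: ${e}`);
            isRunning = false;
        });
    return true;
}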

Logs

{"level":"debug","message":"CRON:TICK Finding X job start","timestamp":"2023-01-21 03:06:10"}
{"level":"debug","message":"CRON:TICK job completed","timestamp":"2023-01-21 03:06:10"}
{"level":"debug","message":"CRON:TICK Finding X job end","timestamp":"2023-01-21 03:06:10"}
{"level":"debug","message":"CRON:TICK Finding X job start","timestamp":"2023-01-21 03:06:15"}
{"level":"debug","message":"CRON:TICK job completed","timestamp":"2023-01-21 03:06:15"}
{"level":"debug","message":"CRON:TICK Finding X job end","timestamp":"2023-01-21 03:06:15"}
{"level":"debug","message":"CRON:TICK Finding X job start","timestamp":"2023-01-21 03:06:20"}
{"level":"debug","message":"CRON:TICK job completed","timestamp":"2023-01-21 03:06:20"}
{"level":"debug","message":"CRON:TICK Finding X job end","timestamp":"2023-01-21 03:06:20"}
{"level":"debug","message":"CRON:TICK job completed","timestamp":"2023-01-21 03:06:39"}
{"level":"debug","message":"CRON:TICK Finding X job start","timestamp":"2023-01-21 03:06:40"}
{"level":"debug","message":"CRON:TICK job completed","timestamp":"2023-01-21 03:06:41"}
{"level":"debug","message":"CRON:TICK Finding X job end","timestamp":"2023-01-21 03:06:44"}
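
One thing worth noting about the `isRunning` guard in the code above: if `findJob()` throws synchronously before returning a promise, the flag is never reset and every later tick is skipped. Below is a minimal sketch of a guard that always releases the flag. `makeGuardedTick` is a hypothetical helper, not part of the cron package, and this is not confirmed to be the cause of the stall reported here.

```javascript
// Hypothetical helper (not part of the cron package): wraps a task so that
// overlapping runs are skipped and the guard is released on every exit path.
function makeGuardedTick(task, log = console.log) {
  let isRunning = false;
  return async function guardedTick() {
    if (isRunning) {
      log('tick skipped: previous run still in progress');
      return false;
    }
    isRunning = true;
    try {
      // `await` inside try also covers a task that throws synchronously.
      await task();
      return true;
    } catch (err) {
      log(`tick failed: ${String(err)}`);
      return false;
    } finally {
      isRunning = false; // released whether the task succeeded or failed
    }
  };
}
```

The returned function could be passed as `onTick`. A promise that never settles would still hold the guard indefinitely, so a timeout around the task may also be needed.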

@intcreator
Collaborator

@awaismehmood88 are you able to reproduce this behavior?

@ChristophDFine what indicates that the error is with Node as opposed to node-cron?

@awaismehmood88

Yes. Unfortunately, I had to choose another library over this one.

@gramakri

@awaismehmood88 out of curiosity, which other library did you choose?

@awaismehmood88

Using node-cron

@intcreator
Collaborator

@awaismehmood88 I'm trying to reproduce your issue. what does the findJob function do? I use this library for processes that run for months without stopping so it must only happen under specific circumstances

@awaismehmood88

awaismehmood88 commented Mar 27, 2023

The findJob function finds jobs in the DB and executes an IO-intensive task. It may take less than a second or up to one minute. The code also contains time intervals, callbacks, etc.

There was no fixed time at which the cron stopped. I tried the same code on Mac and Linux, and it stops after a few days.

@intcreator
Collaborator

you're sure there's no error message when it stops? I started a test cron job right after my last comment just to test this and it still works after a few days. my guess is it's related to #467 if it's unpredictable and caused by the environment.

can you start a very simple test cron to see if a console log causes it to fail after a couple of days?

could you also try starting an even more complex job with more DB/IO operations to see if you can get it to fail in a few seconds or minutes? I'm surprised actually because if it's time intensive you should be able to just make it async so it fires off and then cron doesn't worry about it anymore
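
For tests like the ones suggested above, it can help to record when the last tick happened so that a silent stop shows up quickly. The following is a rough sketch only: `createHeartbeat` is a hypothetical helper, not a cron API, and the 15-second threshold is an arbitrary choice for a job that fires every 5 seconds.

```javascript
// Hypothetical helper: remembers the time of the last tick so that another
// part of the system can ask whether ticks have stopped arriving.
// `now` is injectable so the logic can be exercised without waiting.
function createHeartbeat(maxGapMs, now = Date.now) {
  let lastBeat = now();
  return {
    // Call this at the top of the job's onTick handler.
    beat() {
      lastBeat = now();
    },
    // True when no beat has been recorded for longer than maxGapMs.
    isStalled() {
      return now() - lastBeat > maxGapMs;
    },
  };
}

// Example: treat three missed 5-second ticks as a stall.
const heartbeat = createHeartbeat(15000);
heartbeat.beat();
console.log(heartbeat.isStalled()); // false
```

If the stall is caused by Node.js timers themselves, an in-process `setInterval` check would stop as well, so `isStalled()` is probably better queried from something outside the timer system, such as an HTTP health endpoint polled by an external monitor.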

@awaismehmood88

There was no error shown in logs, I'll try simple cron job on same machine and share more details.

@sheerlox added the type:bug label on Aug 15, 2023
@sheerlox
Collaborator

sheerlox commented Nov 4, 2023

Closing this issue since OP established it wasn't an issue with the library, but rather with Node.js.

EDIT: leaving the type:bug label even though it's not one, so the issue can be found easily in case someone encounters it again in the future.

@sheerlox closed this as not planned (won't fix, can't repro, duplicate, stale) on Nov 4, 2023
@sheerlox added and removed the type:bug label on Nov 4, 2023

8 participants