
Jenkins machine configuration on Windows test machines need to update #23

Closed
sophia-guo opened this issue Sep 21, 2017 · 46 comments

Comments

@sophia-guo

sophia-guo commented Sep 21, 2017

The OpenJDK test build on Windows fails with a permission error:

java.nio.file.AccessDeniedException: C:\Users\jenkins\workspace\openjdk_test_x86-64_windows\openjdk-test\OpenJDK_Playlist\openjdk-jdk8u\jdk\test\sun\management\windows\revokeall.exe

According to the last two comments in adoptium/aqa-tests#37, the Jenkins machine configuration needs to be updated to specify the tool location for git.

The issue is still present, presumably because the configuration hasn't been updated yet.

@tellison tellison self-assigned this Sep 21, 2017
@tellison
Contributor

I've updated the git tool location in Jenkins for node test-packet-x64-windows-2012r2-1 to point to C:\Users\jenkins\AppData\Local\Programs\Git\bin\git.exe rather than the default Cygwin version that Brad identified in your linked issue.

Please check to see if this is sufficient.

@sophia-guo
Author

sophia-guo commented Sep 27, 2017

The problem persists; see my comment in adoptium/aqa-tests#37.
The main problem is that Jenkins cannot wipe out the workspace it created itself.

Could someone with root/Administrator permissions log in and wipe out the workspace C:\Users\jenkins\workspace\openjdk_test_x86-64_windows? (I only have jenkins user permissions.)

With a clean workspace we can check whether the problem persists. As I mentioned initially, it started after the test machine went offline (either for a regular Jenkins restart or for an unknown reason), which suggests that some configuration changed. Thanks.
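If the files blocking the wipe only carry the Windows read-only attribute, a job running as the jenkins user can usually delete them by clearing that attribute and retrying. A minimal Python sketch, assuming Python 3 is available on the agent (nothing in this thread confirms that); `wipe_workspace` is a hypothetical helper. It will not help if the files' ACLs deny the jenkins user access, which is why an administrator may still be needed here.

```python
import os
import shutil
import stat
import sys


def _make_writable_and_retry(func, path, _exc):
    """rmtree error handler: clear the read-only bit on `path`, then retry the failed call."""
    os.chmod(path, stat.S_IWRITE)
    func(path)


def wipe_workspace(path):
    """Delete the directory tree at `path`, retrying once on read-only entries."""
    if not os.path.exists(path):
        return
    if sys.version_info >= (3, 12):
        # Python 3.12 deprecated `onerror` in favour of `onexc`.
        shutil.rmtree(path, onexc=_make_writable_and_retry)
    else:
        shutil.rmtree(path, onerror=_make_writable_and_retry)
```

For example, `wipe_workspace(r"C:\Users\jenkins\workspace\openjdk_test_x86-64_windows")` before the checkout step.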

@karianna
Contributor

@tellison Maybe you and I can figure this out today?

@sophia-guo
Author

I suppose someone has acted on my previous comment: this issue no longer occurs in recent builds.

@sophia-guo
Author

@gdams
Member

gdams commented Nov 21, 2017

@sophia-guo what is the current status of this?

@sophia-guo
Author

@gdams
Member

gdams commented Nov 21, 2017

right okay, we need to get this fixed

@smlambert
Contributor

We have disabled the Windows test builds until we can get a clean Windows machine:
https://ci.adoptopenjdk.net/view/work%20in%20progress/job/openjdk_test_x86-64_windows/

@gdams, perhaps you can give @sophia-guo and me a quick tutorial on how to reconfigure the machines at Adopt and run the Ansible playbooks, so that we can help bring up new test machines or refresh ones that are in a bad state.

@bblondin
Contributor

@gdams will be reprovisioning this machine over the next day or two.
Then we'll work together to deploy it via Ansible from AWX.

@bblondin bblondin self-assigned this Jan 27, 2018
@bblondin
Contributor

The system has been reprovisioned and the playbook has been run:
147.75.32.146

It needs to be added to Jenkins. @gdams, can you do that?

@gdams
Member

gdams commented Jan 31, 2018

Okay, added to Jenkins. @smlambert, can you let us know whether this new machine fixes the issues?

@sophia-guo
Author

@sxa
Member

sxa commented Mar 27, 2018

Based on investigation by @pnstanton, the full Cygwin install wasn't on the system PATH, which likely caused the failure to find the nohup command.
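A quick way to confirm this kind of problem from inside a job is to check which required commands resolve on the PATH the job actually sees. A minimal Python sketch using the standard library's `shutil.which`, assuming Python is available on the agent; `missing_tools` is a hypothetical helper and the tool list below is only illustrative.

```python
import shutil


def missing_tools(names, path=None):
    """Return the commands in `names` that cannot be resolved.

    Searches the current PATH, or the PATH-style string `path` if given.
    On Windows, shutil.which also tries the PATHEXT extensions (.exe, .bat, ...).
    """
    return [name for name in names if shutil.which(name, path=path) is None]
```

For example, `missing_tools(["nohup", "wget", "bash"])` returns an empty list when all three are found; any name it returns needs its directory (such as C:\cygwin64\bin) added to the PATH.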

@bblondin bblondin removed their assignment Apr 4, 2018
@sophia-guo
Author

@sxa555 sorry for the confusion.
To reproduce the first issue (using my test files as an example):
Remote into the machine test-packet-x64-windows-2012r2-1, then:
cd C:\Users\jenkins\workspace\TestBuild_Sandbox\openjdk-tests
C:\Users\jenkins\workspace\TestBuild_Sandbox\openjdk-tests>get1.sh
The only thing get1.sh does is download an SDK with wget.

For the second issue: when you run get1.sh this way, the script runs in a new cmd window, which is similar to what happens in the Jenkins job. When a Jenkins job uses a bat step to run a script, the script runs in a new cmd window, and the original window (the Jenkins job console) receives no output.

If you look at this job https://ci.adoptopenjdk.net/view/work%20in%20progress/job/TestBuild_Sandbox/259/console you will see:
21:40:22 [TestBuild_Sandbox] Running batch script
21:40:22
21:40:22 C:\Users\jenkins\workspace\TestBuild_Sandbox>C:\Users\jenkins\workspace\TestBuild_Sandbox/openjdk-tests/get.sh -s C:\Users\jenkins\workspace\TestBuild_Sandbox -t C:\Users\jenkins\workspace\TestBuild_Sandbox/openjdk-tests -p x64_win -v openjdk8 -r releases -c
[Pipeline] }

[Pipeline] // script

The fatal error message comes from this get.sh command, but because of the second issue described above, no error message appears in this console.

@sophia-guo
Author

To be clear, I remote in as the jenkins user, and the Jenkins jobs also run as the jenkins user.

@sxa
Member

sxa commented Apr 16, 2018

If you run the script with bash name_of_script.sh then it doesn't have the linker failure, but for that to work you need to run the script through dos2unix to change the line endings to UNIX format first. Also running it with bash name_of_script.sh will stop it firing up a separate window so you will be able to get any output into the log.

When I run ldd against wget using a separate window (using a shell script just running the name of the script) then it seems to be picking up all the DLLs from /c/cygwin64/bin whereas when run inline from cmd.exe it is finding them at /usr/bin. I would have thought that /c/cygwin64/bin and /usr/bin would be the same within a cygwin environment but perhaps not ... I'll need to come back to this unless you can run it directly with bash name_of_script.sh (Or not use a .sh script at all given that it's just one wget command)
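Running the script as `bash name_of_script.sh`, as suggested above, keeps it in the calling process instead of relying on the .sh file association, so its output can be captured for the log. A minimal Python sketch of that explicit invocation, assuming Python is available and that `bash` on the PATH is the Cygwin one; `run_with_bash` is a hypothetical helper. The script still needs UNIX line endings.

```python
import subprocess


def run_with_bash(script, *args, bash="bash"):
    """Run `script` as `bash script args...` and return (exit code, stdout, stderr).

    Invoking bash explicitly keeps the script attached to the caller,
    so both output streams are captured rather than lost in another window.
    """
    result = subprocess.run([bash, script, *args], capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr
```

For example, `code, out, err = run_with_bash("get.sh", "-p", "x64_win")`, after which `out` and `err` can be printed into the job console.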

@sxa
Member

sxa commented Apr 16, 2018

(For the record, rebooting didn't make any difference to this)

@sophia-guo
Author

sophia-guo commented Apr 16, 2018

Yes, in the Jenkins pipeline, if we use an sh step to invoke the shell script on Windows, we do get the Windows/UNIX line-ending issue:

sh '$OPENJDK_TEST/get.sh -s $WORKSPACE -t $OPENJDK_TEST -p platform -v ${JVM_VERSION} -r sdk_resource -c url'

19:52:14 + 'C:\Users\jenkins\workspace\TestBuild_Sandbox/openjdk-tests/get.sh' -s 'C:\Users\jenkins\workspace\TestBuild_Sandbox' -t 'C:\Users\jenkins\workspace\TestBuild_Sandbox/openjdk-tests' -p platform -v openjdk8 -r sdk_resource -c url
19:52:14 C:\Users\jenkins\workspace\TestBuild_Sandbox/openjdk-tests/get.sh: line 14: $'\r': command not found
19:52:14 C:\Users\jenkins\workspace\TestBuild_Sandbox/openjdk-tests/get.sh: line 21: $'\r': command not found
19:52:14 C:\Users\jenkins\workspace\TestBuild_Sandbox/openjdk-tests/get.sh: line 22: syntax error near unexpected token `$'\r''
19:52:14 C:\Users\jenkins\workspace\TestBuild_Sandbox/openjdk-tests/get.sh: line 22: `usage ()

https://ci.adoptopenjdk.net/view/work%20in%20progress/job/TestBuild_Sandbox/260/console

The same happens when running bash name_of_script.sh directly in cmd:

C:\Users\jenkins\workspace\TestBuild_Sandbox\openjdk-tests>bash get1.sh
get1.sh: line 14: $'\r': command not found
get1.sh: line 21: $'\r': command not found
get1.sh: line 22: $'\r': command not found

dos2unix is not guaranteed to be available on Windows machines.

So in the Jenkins job on Windows I'm using a bat step to invoke the shell script instead, which avoids the line-ending issue but hits this Cygwin issue. In the Jenkinsfile this step is:

if ( SPEC.contains('win') ) {
    bat "$OPENJDK_TEST/get.sh -s $WORKSPACE -t $OPENJDK_TEST -p ${getPlatformAndLabel(SPEC)[0]} -v ${JVM_VERSION} -r ${params.SDK_RESOURCE} -c ${params.CUSTOMIZED_SDK_URL}"
}
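Where dos2unix is not installed, the same CRLF-to-LF conversion can be done with any scripting language present on the machine. A minimal Python sketch, assuming Python 3 is available on the agent; `crlf_to_lf` is a hypothetical helper that only rewrites `\r\n` pairs (the source of the `$'\r': command not found` errors above) and leaves the rest of the file untouched.

```python
from pathlib import Path


def crlf_to_lf(path):
    """Convert Windows CRLF line endings in `path` to UNIX LF, in place.

    Returns True if the file was changed, False if it already used LF endings.
    Works on raw bytes, so the file's text encoding does not matter.
    """
    p = Path(path)
    original = p.read_bytes()
    converted = original.replace(b"\r\n", b"\n")
    if converted == original:
        return False
    p.write_bytes(converted)
    return True
```

Calling `crlf_to_lf("get.sh")` once before `bash get.sh` would be enough; running it again is harmless and returns False.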

@sophia-guo
Author

sophia-guo commented Apr 16, 2018

(Or not use a .sh script at all given that it's just one wget command)

Did you mean using wget in command line? Yes, if using wget in cmd directly it's ok as I mentioned in above comment.

@sxa
Member

sxa commented Apr 17, 2018

Did you mean using wget in command line? Yes, if using wget in cmd directly it's ok as I mentioned in above comment.

But is there a reason not to do that in your cmd script instead of calling a UNIX shell script?

@sxa
Member

sxa commented Apr 17, 2018

I tried reinstalling Cygwin. It made no difference. I eventually discovered that .sh files had been associated with the shell supplied with git instead of the Cygwin one, which was causing the mismatch. Now resolved.

However, I now have two versions of Cygwin on the machine. The new current one has been set up using exactly what's in the Windows playbook (plus wget, which wasn't there). Let me know if it's working.

@sophia-guo
Author

I just tried and still got:
2 [main] wget (2888) C:\cygwin64\bin\wget.exe: *** fatal error - cygheap base mismatch detected - 0x1802F7408/0x1802FD410.
This problem is probably due to using incompatible versions of the cygwin DLL.
Search for cygwin1.dll using the Windows Start->Find/Search facility
and delete all but the most recent version. The most recent version should
reside in x:\cygwin\bin, where 'x' is the drive on which you have
installed the cygwin distribution. Rebooting is also suggested if you
are unable to find another cygwin DLL.
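The error message itself advises finding every copy of cygwin1.dll and keeping only the most recent one; with two Cygwin installations now on this machine, listing the copies is a reasonable first step. A minimal Python sketch that walks the given directory trees, assuming Python is available; `find_cygwin_dlls` is a hypothetical helper. It only reports copies; deciding which one to delete is left to a person.

```python
import os


def find_cygwin_dlls(roots):
    """Return the sorted paths of every cygwin1.dll under the directories in `roots`.

    Names are compared case-insensitively, as Windows file names are.
    Unreadable directories are skipped (os.walk ignores errors by default).
    """
    hits = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if name.lower() == "cygwin1.dll":
                    hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

For example, `find_cygwin_dlls(["C:\\"])` searches the whole system drive (slowly); more than one result means more than one Cygwin runtime could be loaded.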

@sxa
Member

sxa commented Apr 18, 2018

Try now - I think the change I made yesterday may only have affected the Administrator account - the jenkins account still seemed to have the previous association (Either that or something changed it back :-) )

@sophia-guo
Author

Now if I run script directly in cmd, cygwin64 I will hit that windows ending format issue.

If I run script directly in gitbash I will get:
2 [main] wget (1436) C:\cygwin64\bin\wget.exe: *** fatal error - cygheap base mismatch detected - 0x1802F7408/0x1802FD410.
This problem is probably due to using incompatible versions of the cygwin DLL.
Search for cygwin1.dll using the Windows Start->Find/Search facility
and delete all but the most recent version. The most recent version should
reside in x:\cygwin\bin, where 'x' is the drive on which you have
installed the cygwin distribution. Rebooting is also suggested if you
are unable to find another cygwin DLL.

which is the same behavior as when I run the script in the Jenkins job.

@sxa
Member

sxa commented Apr 18, 2018

If I run script directly in gitbash

So what is the current problem you are facing that's holding you up? Do you need the bash supplied with git?

@sxa
Member

sxa commented Apr 18, 2018

Now if I run script directly in cmd, cygwin64 I will hit that windows ending format issue.

Can you give me instructions to recreate it? As far as I can see, it's working correctly in this situation.

@sophia-guo
Author

sophia-guo commented Apr 18, 2018

I saw that you have an openjdk-tests directory, so try the following in cmd and you should be able to reproduce it:
C:\Users\jenkins\sxa\openjdk-tests>get.sh -s C:\Users\jenkins\sxa\openjdk-tests/
get.sh -t C:\Users\jenkins\sxa\openjdk-tests/ -p x64_win -v openjdk8 -r releases
-c
C:\Users\jenkins\sxa\openjdk-tests\get.sh: line 14: $'\r': command not found
C:\Users\jenkins\sxa\openjdk-tests\get.sh: line 21: $'\r': command not found
C:\Users\jenkins\sxa\openjdk-tests\get.sh: line 22: syntax error near unexpected token `$'\r''
C:\Users\jenkins\sxa\openjdk-tests\get.sh: line 22: `usage ()

@sophia-guo
Author

In gitbash:
$ openjdk-tests/get.sh -s C:\Users\jenkins\sxa -t C:\Users\jenkins\sxa/openjdk-tests -p x64_win -v openjdk8 -r releases -c
openjdk-tests/get.sh: line 99: cd: C:Usersjenkinssxa/openjdk-tests: No such file or directory
Cloning into 'openj9'...
remote: Counting objects: 10285, done.
remote: Compressing objects: 100% (5254/5254), done.
remote: Total 10285 (delta 5136), reused 6135 (delta 4440), pack-reused 0
Receiving objects: 100% (10285/10285), 18.71 MiB | 7.13 MiB/s, done.
Resolving deltas: 100% (5136/5136), done.
Checking out files: 100% (8868/8868), done.
Rewrite ad90192b41828ba2603ec424efc1d448e5434ceb (1/1) (0 seconds passed, remaining 0 predicted)
Ref 'refs/heads/master' was rewritten
openjdk-tests/get.sh: line 104: cd: C:Usersjenkinssxa/openjdk-tests: No such file or directory
mv: cannot stat 'openj9': No such file or directory
openjdk-tests/get.sh: line 66: cd: C:Usersjenkinssxa: No such file or directory
Get binary openjdk...
2 [main] wget (3228) C:\cygwin64\bin\wget.exe: *** fatal error - cygheap base mismatch detected - 0x1802F7408/0x1802FD410.
This problem is probably due to using incompatible versions of the cygwin DLL.
Search for cygwin1.dll using the Windows Start->Find/Search facility
and delete all but the most recent version. The most recent version should
reside in x:\cygwin\bin, where 'x' is the drive on which you have
installed the cygwin distribution. Rebooting is also suggested if you
are unable to find another cygwin DLL.
Failed to retrieve the jdk binary, exiting

jenkins@test-packet-x64-windows-2012r2-1 MINGW64 ~/sxa
$ pwd
/c/Users/jenkins/sxa
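The `cd: C:Usersjenkinssxa/openjdk-tests` errors above come from bash treating each unquoted backslash as an escape character, so `C:\Users\jenkins\sxa` arrives as `C:Usersjenkinssxa`. Quoting the arguments or passing forward-slash paths avoids this; under Cygwin, `cygpath -u` performs the conversion. A minimal Python sketch of the conversion for a caller that builds the command line itself, assuming Python is available; `to_posix_drive_path` is a hypothetical helper that handles absolute drive-letter paths only.

```python
from pathlib import PureWindowsPath


def to_posix_drive_path(win_path, prefix=""):
    """Convert an absolute drive-letter Windows path to a POSIX-style path.

    prefix=""          -> /c/Users/jenkins/sxa           (Git Bash / MSYS style)
    prefix="/cygdrive" -> /cygdrive/c/Users/jenkins/sxa  (Cygwin style)
    Mixed separators are accepted. Other paths (relative, drive-relative, UNC)
    only have their separators switched to forward slashes.
    """
    p = PureWindowsPath(win_path)
    drive = p.drive
    if not (len(drive) == 2 and drive.endswith(":") and p.root):
        return p.as_posix()
    rest = p.as_posix()[len(drive):]  # e.g. "/Users/jenkins/sxa"
    return f"{prefix}/{drive[0].lower()}{rest}"
```

The MSYS form matches the `pwd` output shown above (`/c/Users/jenkins/sxa`).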

@sxa
Member

sxa commented Apr 19, 2018

My last comment appears to have got lost so repeating it:

Why are you choosing to run from the git bash shell instead of cygwin's shell when running the script from there?

I've been able to reproduce the issue you had running from cmd now although it oddly doesn't occur everywhere - the fact it doesn't complain until line 14 is rather odd and is possibly why I didn't see it with a smaller example.

Have these scripts been run in another environment on Windows before?

@sxa
Member

sxa commented Apr 19, 2018

From discussion in slack - let's try using the existing Windows version of get.bat for now instead of running under cygwin in order to avoid this being a blocking issue.

@sophia-guo
Author

I will try that. If I recall correctly, even when using get.bat the Jenkins job uses Cygwin rather than Git Bash, though I'm not sure. Will update.

@sophia-guo
Author

sophia-guo commented Apr 19, 2018

I have updated the story for Windows only, to test whether the issue is fixed. Using get.bat I haven't hit the wget/Cygwin issue. However, I realized that we need the Perl JSON and Text::CSV modules installed on the test machine, and they don't seem to be there.

@sxa
Member

sxa commented Apr 20, 2018

@sophia-guo Can you give me recreate instructions please? As far as I can see test.bat does not do any perl related things.

Can you check what your PATH is at the time - and can you check if it works if you put C:\strawberry\bin at the start of it, since you may be picking up the cygwin perl instead.

@sophia-guo
Author

@sxa555 the Perl JSON and Text::CSV modules are needed by our test framework, TestKitGen, which is in the Eclipse OpenJ9 repo.

@sxa
Member

sxa commented Apr 20, 2018

@sophia-guo I understand that, but do you get the problem when running with the Strawberry Perl implementation, which has had the additional modules installed?

EDIT: The strawberry perl installation on that machine should be the default perl for the jenkins user and does already have JSON 2.94 and Text::CSV 1.95 installed
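Whether a particular perl can load the modules TestKitGen needs can be checked without running the framework, using perl's `-M` switch: `perl -MJSON -MText::CSV -e 1` exits with status 0 only if both modules load. A minimal Python sketch wrapping that check so it can be pointed at a specific interpreter (for example the Strawberry one versus whichever `perl` the PATH finds first), assuming Python is available; `perl_has_modules` is a hypothetical helper.

```python
import subprocess


def perl_has_modules(perl, modules):
    """Return True if the interpreter `perl` can load every module in `modules`.

    Runs `perl -MMod1 -MMod2 ... -e 1`, which exits with status 0 only
    when all modules load. A missing interpreter counts as failure.
    """
    cmd = [perl] + [f"-M{m}" for m in modules] + ["-e", "1"]
    try:
        result = subprocess.run(cmd, capture_output=True)
    except OSError:
        return False
    return result.returncode == 0
```

Comparing `perl_has_modules(r"C:\Strawberry\perl\bin\perl.exe", ["JSON", "Text::CSV"])` with `perl_has_modules("perl", ["JSON", "Text::CSV"])` shows whether the job is picking up a different perl.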

@sophia-guo
Author

sophia-guo commented Apr 20, 2018

@sxa555 I'm not sure which perl is picked up in the Jenkins job.
However, this is the PATH of the Jenkins job:
PATH=/cygdrive/c/ProgramData/Oracle/Java/javapath:/cygdrive/c/windows/system32:/cygdrive/c/windows:/cygdrive/c/windows/System32/Wbem:/cygdrive/c/windows/System32/WindowsPowerShell/v1.0:/cygdrive/c/Program Files/Mellanox/MLNX_WinOF2/Management Tools:/cygdrive/c/Program Files/Mellanox/MLNX_WinOF2/Performance Tools:/cygdrive/c/Program Files/Mellanox/MLNX_WinOF2/Diagnostic Tools:/cygdrive/c/Program Files/Mellanox/MLNX_VPI/IB/Tools:/cygdrive/c/Program Files/Mellanox/MLNX_CIMProvider/lib/mft:/cygdrive/c/Strawberry/bin:/cygdrive/c/Program Files/Java/jdk8u144-b01/bin:/cygdrive/c/apache-ant/apache-ant-1.10.1/bin:/cygdrive/c/Program Files (x86)/Windows Kits/8.1/Windows Performance Toolkit:/cygdrive/c/Program Files/Microsoft SQL Server/110/Tools/Binn:/cygdrive/c/Program Files (x86)/Microsoft SDKs/TypeScript/1.0:/cygdrive/c/Program Files/Microsoft SQL Server/120/Tools/Binn:/cygdrive/c/Program Files/Git/cmd:/usr/bin

I printed the PATH right before calling perl:
C:\ProgramData\Oracle\Java\javapath;C:\windows\system32;C:\windows;C:\windows\System32\Wbem;C:\windows\System32\WindowsPowerShell\v1.0;C:\Program Files\Mellanox\MLNX_WinOF2\Management Tools;C:\Program Files\Mellanox\MLNX_WinOF2\Performance Tools;C:\Program Files\Mellanox\MLNX_WinOF2\Diagnostic Tools;C:\Program Files\Mellanox\MLNX_VPI\IB\Tools;C:\Program Files\Mellanox\MLNX_CIMProvider\lib\mft;C:\Strawberry\bin;C:\Program Files\Java\jdk8u144-b01\bin;C:\apache-ant\apache-ant-1.10.1\bin;C:\Program Files (x86)\Windows Kits\8.1\Windows Performance Toolkit;C:\Program Files\Microsoft SQL Server\110\Tools\Binn;C:\Program Files (x86)\Microsoft SDKs\TypeScript\1.0;C:\Program Files\Microsoft SQL Server\120\Tools\Binn;C:\Program Files\Git\cmd;C:\cygwin64\bin;

I remoted into the machine, ran perl in cmd and got the same issue. I also checked that Strawberry Perl is actually under c:\Strawberry\perl\bin, and running c:\Strawberry\perl\bin\perl I didn't get the issue. Maybe this wrong path is the problem?
[Screenshot: 2018-04-20 1:10:10 PM]
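The PATH above is consistent with the Cygwin perl being picked up, as suspected earlier: C:\Strawberry\bin comes before C:\cygwin64\bin, but Strawberry's perl lives in C:\Strawberry\perl\bin, so the first perl found along the PATH is not Strawberry's. A minimal Python sketch that lists every match for a command along a PATH string in search order, assuming Python is available; `path_matches` is a hypothetical helper, and it simplifies the real lookup (cmd.exe, for instance, checks the current directory first).

```python
import os


def path_matches(command, path_value, extensions=("",)):
    """List every file named command+extension along the PATH-style string `path_value`.

    Results are in search order, so the first entry is the one a simple
    PATH lookup would run. On Windows pass e.g. extensions=(".exe", ".bat").
    """
    matches = []
    for directory in path_value.split(os.pathsep):
        if not directory:
            continue
        for ext in extensions:
            candidate = os.path.join(directory, command + ext)
            if os.path.isfile(candidate):
                matches.append(candidate)
    return matches
```

On the agent, `path_matches("perl", os.environ["PATH"], (".exe",))` would show the Cygwin perl listed first until C:\Strawberry\perl\bin is on the PATH ahead of C:\cygwin64\bin.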

@sxa
Member

sxa commented Apr 20, 2018

Hmmm yep - I had to restart the machine to get the jenkins agent to update the path properly to be C:\Strawberry\perl\bin. Should be ok now ...

@sophia-guo
Author

Using our former test build story on Windows (a batch script instead of a shell script), two test builds of openjdk_test_x86-64_windows have finished successfully without this configuration issue. The next step is to update the Windows test jobs.

Closing this issue.


8 participants