Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DAOS-16708 - provide PMDK pkg with NDCTL enabled (2.1.0-2) #38

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

grom72
Copy link
Contributor

@grom72 grom72 commented May 16, 2024

This PR create PMDK package with NDCTL.

Explicitly enable Valgrind

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
…purpose

Test-master-branch: grom72/ndctl-validation

Skip-list-master:  test_ior_intercept_libioil:DAOS-16260  test_osa_offline_reintegration_without_checksum:DAOS-15608  test_daos_drain_simple:DAOS-15271  test_daos_rebuild_ec:DAOS-14982  test_osa_offline_reintegration_without_checksum:DAOS-14570
Skip-list-release/2.6: test_cart_rpc:DAOS-15989 test_daos_rebuild_ec:DAOS-14982 test_dfuse_daos_build_wt_pil4dfs:DAOS-16215 test_daos_rebuild_simple:DAOS-15290 test_dfuse_daos_build_wb:DAOS-16215 test_dfuse_daos_build_wt:DAOS-16215 test_dfuse_daos_build_metadata:DAOS-16215 test_dfuse_daos_build_data:DAOS-16215 test_dfuse_daos_build_nocache:DAOS-16215
Allow-unstable-test: true
Skip-func-hw-test-large: false

Priority: 2

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
Check if 18kB stack size can be used with Valgrind enabled.

Test-master-branch: grom72/ndctl-validation

Skip-list-master: test_ior_intercept_libioil:DAOS-16260 test_osa_offline_reintegration_without_checksum:DAOS-15608  test_daos_drain_simple:DAOS-15271  test_daos_rebuild_ec:DAOS-14982  test_osa_offline_reintegration_without_checksum:DAOS-14570
Skip-list-release/2.6: test_cart_rpc:DAOS-15989 test_daos_rebuild_ec:DAOS-14982 test_dfuse_daos_build_wt_pil4dfs:DAOS-16215 test_daos_rebuild_simple:DAOS-15290 test_dfuse_daos_build_wb:DAOS-16215 test_dfuse_daos_build_wt:DAOS-16215 test_dfuse_daos_build_metadata:DAOS-16215 test_dfuse_daos_build_data:DAOS-16215 test_dfuse_daos_build_nocache:DAOS-16215
Allow-unstable-test: true

Priority: 2

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
Test-master-branch: grom72/ndctl-validation

Skip-list-master: test_ior_intercept_libioil:DAOS-16260 test_osa_offline_reintegration_without_checksum:DAOS-15608  test_daos_drain_simple:DAOS-15271  test_daos_rebuild_ec:DAOS-14982  test_osa_offline_reintegration_without_checksum:DAOS-14570
Skip-list-release/2.6: test_cart_rpc:DAOS-15989 test_daos_rebuild_ec:DAOS-14982 test_dfuse_daos_build_wt_pil4dfs:DAOS-16215 test_daos_rebuild_simple:DAOS-15290 test_dfuse_daos_build_wb:DAOS-16215 test_dfuse_daos_build_wt:DAOS-16215 test_dfuse_daos_build_metadata:DAOS-16215 test_dfuse_daos_build_data:DAOS-16215 test_dfuse_daos_build_nocache:DAOS-16215
Allow-unstable-test: true
Skip-func-hw-test: true
Skip-func-test: true
Priority: 2

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
Test-master-branch: grom72/ndctl-validation

Skip-list-master: test_ior_intercept_libioil:DAOS-16260 test_osa_offline_reintegration_without_checksum:DAOS-15608  test_daos_drain_simple:DAOS-15271  test_daos_rebuild_ec:DAOS-14982  test_osa_offline_reintegration_without_checksum:DAOS-14570
Skip-list-release/2.6: test_cart_rpc:DAOS-15989 test_daos_rebuild_ec:DAOS-14982 test_dfuse_daos_build_wt_pil4dfs:DAOS-16215 test_daos_rebuild_simple:DAOS-15290 test_dfuse_daos_build_wb:DAOS-16215 test_dfuse_daos_build_wt:DAOS-16215 test_dfuse_daos_build_metadata:DAOS-16215 test_dfuse_daos_build_data:DAOS-16215 test_dfuse_daos_build_nocache:DAOS-16215
Allow-unstable-test: true
Skip-func-hw-test: true
Skip-func-test: true
Priority: 2

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
Test-master-branch: grom72/ndctl-validation

Skip-list-master: test_ior_intercept_libioil:DAOS-16260 test_osa_offline_reintegration_without_checksum:DAOS-15608  test_daos_drain_simple:DAOS-15271  test_daos_rebuild_ec:DAOS-14982  test_osa_offline_reintegration_without_checksum:DAOS-14570
Skip-list-release/2.6: test_cart_rpc:DAOS-15989 test_daos_rebuild_ec:DAOS-14982 test_dfuse_daos_build_wt_pil4dfs:DAOS-16215 test_daos_rebuild_simple:DAOS-15290 test_dfuse_daos_build_wb:DAOS-16215 test_dfuse_daos_build_wt:DAOS-16215 test_dfuse_daos_build_metadata:DAOS-16215 test_dfuse_daos_build_data:DAOS-16215 test_dfuse_daos_build_nocache:DAOS-16215
Allow-unstable-test: true
Priority: 2

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
@grom72 grom72 changed the title WIP - enabling NDCTL on the top of pmdk 2.1.0 DAOS-14514 - enabling NDCTL on the top of pmdk 2.1.0 Aug 20, 2024
@grom72 grom72 changed the title DAOS-14514 - enabling NDCTL on the top of pmdk 2.1.0 DAOS-14408 - enabling NDCTL on the top of pmdk 2.1.0 Aug 21, 2024
@grom72 grom72 marked this pull request as ready for review September 3, 2024 18:19
@grom72 grom72 requested a review from a team as a code owner September 3, 2024 18:19
Copy link
Contributor

@brianjmurrell brianjmurrell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this was ready for review or not, but since it's not a Draft, I have provided some suggestions.

pmdk.spec Show resolved Hide resolved
pmdk.spec Outdated
Comment on lines 22 to 24
%define _make_common_args EXTRA_CFLAGS="-Wno-error" NORPATH=1 BUILD_EXAMPLES=n BUILD_BENCHMARKS=n

%if %{with ndctl}
%define make_common_args %{_make_common_args}
%else
%define make_common_args %{_make_common_args} NDCTL_ENABLE=n PMEMOBJ_IGNORE_DIRTY_SHUTDOWN=y PMEMOBJ_IGNORE_BAD_BLOCKS=y
%endif
%define make_common_args %{_make_common_args}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is no use-case for --without-ndctl then this indirection through %{_make_common_args} is no longer necessary and just adds confusion (this suggestion may look goofy because GH doesn't really support suggesting on non-changed lines, but I suspect you will get the meaning regardless):

Suggested change
%define _make_common_args EXTRA_CFLAGS="-Wno-error" NORPATH=1 BUILD_EXAMPLES=n BUILD_BENCHMARKS=n
%if %{with ndctl}
%define make_common_args %{_make_common_args}
%else
%define make_common_args %{_make_common_args} NDCTL_ENABLE=n PMEMOBJ_IGNORE_DIRTY_SHUTDOWN=y PMEMOBJ_IGNORE_BAD_BLOCKS=y
%endif
%define make_common_args %{_make_common_args}
%define make_common_args EXTRA_CFLAGS="-Wno-error" NORPATH=1 BUILD_EXAMPLES=n BUILD_BENCHMARKS=n

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Test-master-branch: grom72/ndctl-validation

Skip-list-master: test_ior_intercept_libioil:DAOS-16260 test_osa_offline_reintegration_without_checksum:DAOS-15608  test_daos_drain_simple:DAOS-15271  test_daos_rebuild_ec:DAOS-14982  test_osa_offline_reintegration_without_checksum:DAOS-14570
Skip-list-release/2.6: test_cart_rpc:DAOS-15989 test_daos_rebuild_ec:DAOS-14982 test_daos_rebuild_simple:DAOS-15290
Allow-unstable-test: true
Priority: 2

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
daltonbohning pushed a commit to daos-stack/daos that referenced this pull request Oct 9, 2024
This PR prepares DAOS to be used with NDCTL enabled in PMDK, which means:
- NDCTL must not be used when non-DCPM (simulate PMem) - `storage class: "ram"` is used:
`PMEMOBJ_CONF=sds.at_create=0` env variable disables NDCTL features in the PMDK
This change affects all tests run on simulated PMem (e.g. inside VMs).
Some DOAS utility applications may also require `PMEMOBJ_CONF=sds.at_create=0` to be set.

- The default ULT stack size must be at least 20KiB to avoid stack overuse by PMDK with NDCTL enabled and be aligned with Linux page size.
`ABT_THREAD_STACKSIZE=20480` env variable is used to increase the default ULT stack size.
This env variable is set by control/server module just before engine is started.
Much bigger stack is used for pmempool open/create-related tasks e.g. `tgt_vos_create_one` to avoid stack overusage. 

This modification shall not affect md-on-ssd mode as long as `storage class: "ram"` is used for the first tier in the `storage` configuration.
This change does not require any configuration changes to existing systems.

The new PMDK package with NDCTL enabled (daos-stack/pmdk#38) will land as soon as this PR is merged and backported to stable/2.6.

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
grom72 added a commit to daos-stack/daos that referenced this pull request Oct 10, 2024
This PR prepares DAOS to be used with NDCTL enabled in PMDK, which means:
- NDCTL must not be used when non-DCPM (simulate PMem) - `storage class: "ram"` is used:
`PMEMOBJ_CONF=sds.at_create=0` env variable disables NDCTL features in the PMDK
This change affects all tests run on simulated PMem (e.g. inside VMs).
Some DOAS utility applications may also require `PMEMOBJ_CONF=sds.at_create=0` to be set.

- The default ULT stack size must be at least 20KiB to avoid stack overuse by PMDK with NDCTL enabled and be aligned with Linux page size.
`ABT_THREAD_STACKSIZE=20480` env variable is used to increase the default ULT stack size.
This env variable is set by control/server module just before engine is started.
Much bigger stack is used for pmempool open/create-related tasks e.g. `tgt_vos_create_one` to avoid stack overusage.

This modification shall not affect md-on-ssd mode as long as `storage class: "ram"` is used for the first tier in the `storage` configuration.
This change does not require any configuration changes to existing systems.

The new PMDK package with NDCTL enabled (daos-stack/pmdk#38) will land as soon as this PR is merged.

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
grom72 added a commit to daos-stack/daos that referenced this pull request Oct 10, 2024
This PR prepares DAOS to be used with NDCTL enabled in PMDK, which means:
- NDCTL must not be used when non-DCPM (simulate PMem) - `storage class: "ram"` is used:
`PMEMOBJ_CONF=sds.at_create=0` env variable disables NDCTL features in the PMDK
This change affects all tests run on simulated PMem (e.g. inside VMs).
Some DOAS utility applications may also require `PMEMOBJ_CONF=sds.at_create=0` to be set.

- The default ULT stack size must be at least 20KiB to avoid stack overuse by PMDK with NDCTL enabled and be aligned with Linux page size.
`ABT_THREAD_STACKSIZE=20480` env variable is used to increase the default ULT stack size.
This env variable is set by control/server module just before engine is started.
Much bigger stack is used for pmempool open/create-related tasks e.g. `tgt_vos_create_one` to avoid stack overusage.

This modification shall not affect md-on-ssd mode as long as `storage class: "ram"` is used for the first tier in the `storage` configuration.
This change does not require any configuration changes to existing systems.

The new PMDK package with NDCTL enabled (daos-stack/pmdk#38) will land as soon as this PR is merged.

Allow-unstable-test: true
Priority: 2

Required-githooks: true

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
Skip-list: test_dfuse_daos_build_wt_pil4dfs:DAOS-16556

Allow-unstable-test: true
Priority: 2

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
@grom72 grom72 changed the title DAOS-14408 - enabling NDCTL on the top of pmdk 2.1.0 DAOS-16708 - provide PMDK pkg with NDCTL enabled (2.1.0-2) Oct 15, 2024
@brianjmurrell
Copy link
Contributor

There are a lot of failures in the downstream DAOS testing on both master and release/2.6. Are we sure these are unrelated to this PR? I have a hard time imagining that the tip of release/2.6 has 147 failures on it.

daltonbohning pushed a commit to daos-stack/daos that referenced this pull request Oct 16, 2024
)

* DAOS-14408 common: enable NDCTL for DCPM

This PR prepares DAOS to be used with NDCTL enabled in PMDK, which means:
- NDCTL must not be used when non-DCPM (simulate PMem) - `storage class: "ram"` is used:
`PMEMOBJ_CONF=sds.at_create=0` env variable disables NDCTL features in the PMDK
This change affects all tests run on simulated PMem (e.g. inside VMs).
Some DOAS utility applications may also require `PMEMOBJ_CONF=sds.at_create=0` to be set.

- The default ULT stack size must be at least 20KiB to avoid stack overuse by PMDK with NDCTL enabled and be aligned with Linux page size.
`ABT_THREAD_STACKSIZE=20480` env variable is used to increase the default ULT stack size.
This env variable is set by control/server module just before engine is started.
Much bigger stack is used for pmempool open/create-related tasks e.g. `tgt_vos_create_one` to avoid stack overusage.

This modification shall not affect md-on-ssd mode as long as `storage class: "ram"` is used for the first tier in the `storage` configuration.
This change does not require any configuration changes to existing systems.

The new PMDK package with NDCTL enabled (daos-stack/pmdk#38) will land as soon as this PR is merged.

Signed-off-by: Jan Michalski <jan.michalski@intel.com>
Skip-list: test_dfuse_daos_build_wt_pil4dfs:DAOS-16556

Allow-unstable-test: true

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
@grom72
Copy link
Contributor Author

grom72 commented Oct 16, 2024

There are a lot of failures in the downstream DAOS testing on both master and release/2.6. Are we sure these are unrelated to this PR? I have a hard time imagining that the tip of release/2.6 has 147 failures on it.

release/2.6 failures are expected in combination when the old DAOS (w/o NDCTL support) DAOS is used w/ PMDK that uses NDCTL:
*ERROR* usc_ndctl.c: 55: pmem2_source_device_usc: Unsafe shutdown count is not supported for this source
All errors are related to tests on VMs when NDCTL-related features must be intentionally disabled as simulated PMem (RAM-based) does not support it => see daos-stack/daos#14371 (comment): PMEMOBJ_CONF=sds.at_create=0

master's tests failures are related to md-on-ssd, but let us wait for the next validation cycle

@brianjmurrell
Copy link
Contributor

release/2.6 failures are expected in combination when the old DAOS (w/o NDCTL support) DAOS is used w/ PMDK that uses NDCTL: *ERROR* usc_ndctl.c: 55: pmem2_source_device_usc: Unsafe shutdown count is not supported for this source All errors are related to tests on VMs when NDCTL-related features must be intentionally disabled as simulated PMem (RAM-based) does not support it => see daos-stack/daos#14371 (comment): PMEMOBJ_CONF=sds.at_create=0

Regardless of why, what this is telling us is that as soon as we land this, we are going to break both relelase/2.6 future releases and all ongoing (i.e. daily/weekly/etc.) testing on relelase/2.6. We cannot do that as we are just throwing >100 "known failures" on to the pile and further creating a situation that will need to be resolved to make another 2.6.x release (which we might be forced to do to fix a bug, even if we currently have no plans for a 2.6.2).

So ultimately, one way or another, we need to resolve the >100 test failures that we are going to introduce to release/2.6 by landing this. Should relelase/2.6 not get whatever is necessary for it to work with this new pmdk backported to it?

Skip-list: test_ior_intercept_libpil4dfs:DAOS-16343

Priority: 2
Allow-unstable-test: true

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
Skip-list: test_ior_intercept_libpil4dfs:DAOS-16260 test_dfuse_daos_build_wt_pil4dfs:DAOS-16732

Allow-unstable-test: true

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
@grom72
Copy link
Contributor Author

grom72 commented Oct 22, 2024

release/2.6 failures are expected in combination when the old DAOS (w/o NDCTL support) DAOS is used w/ PMDK that uses NDCTL: *ERROR* usc_ndctl.c: 55: pmem2_source_device_usc: Unsafe shutdown count is not supported for this source All errors are related to tests on VMs when NDCTL-related features must be intentionally disabled as simulated PMem (RAM-based) does not support it => see daos-stack/daos#14371 (comment): PMEMOBJ_CONF=sds.at_create=0

Regardless of why, what this is telling us is that as soon as we land this, we are going to break both relelase/2.6 future releases and all ongoing (i.e. daily/weekly/etc.) testing on relelase/2.6. We cannot do that as we are just throwing >100 "known failures" on to the pile and further creating a situation that will need to be resolved to make another 2.6.x release (which we might be forced to do to fix a bug, even if we currently have no plans for a 2.6.2).

So ultimately, one way or another, we need to resolve the >100 test failures that we are going to introduce to release/2.6 by landing this. Should relelase/2.6 not get whatever is necessary for it to work with this new pmdk backported to it?

@brianjmurrell , please take a look at daos-stack/daos#15203

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants