Bryan Drewery edited this page Nov 13, 2015 · 8 revisions

Todo and Known Issues

3.0

These should be fixed in trunk and then merged back to 3.0.x after enough testing.

  • Possible ZFS ports tree issue where the tree is checked out before the filesystem is created. This might have been operator error.
  • Document hooks
  • Validate that the version passed to jail -c -v is a number (e.g., reject -v amd64)!
  • Poudriere jails have major issues with error handling/cleanup:
    • jail -c does not clean up the jail on failure
    • Failure in jail -u freebsd-update can leave 91amd64 running with no way to stop it
  • Having the pkg bootstrap skip sanity is no good, since it skips all version checks. Need to have a bootstrap mode so it reruns.
  • This is wrong: shell-meta should be skipped because npm was skipped, not because node failed:
====>> [01] Finished build of www/node: Failed: build
====>> [01] Skipping build of www/npm: Dependent port www/node failed
====>> [01] Skipping build of local/shell-meta: Dependent port www/node failed
  • False positive old ver delete seen: ====>> Deleting old version: dovecot-sieve-1.2 0.1.19.txz
  • New dependency checking should account for DELETED deps too (might not be an issue)
  • Options checking needs to account for DELETED options (reported by feld)
  • Document/test portshaker usage
  • pkgclean/options MOVED support
  • Display DEPRECATED and UNMAINTAINED ports at the end.

3.1

  • Convert+import updated processonelog
  • options needs to be jailed. (#214)
  • logclean to clean up old logs (bdrewery has started this). Note that it is not easily possible to show a "free space" calculation because the logs mostly consist of hard links.
  • Add hook for ports updating, before/after (status ptname path)
  • Add hook between various stages of startup for profiling
  • Allow building from existing /usr/src - currently it skips build.
  • Replace dependency checking with new Mk/Scripts version
  • "soft exit", "exit when idle" mode, SIGUSR1 (#356)
  • Missing poudriere.conf/BASEFS checks fail due to not having msg_error defined yet.
  • Need to run pkgclean automatically during startup. This avoids pkg-repo finding unwanted packages.
    • Needs to be configurable.
    • Don't do this with -Ct specified or testport.
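
The "soft exit" / "exit when idle" item above could be handled by a signal handler that only sets a flag, which the queue loop checks before starting each new job. The following is a minimal sketch under that assumption, not poudriere's implementation. The names SOFT_EXIT, on_soft_exit, run_queue, and start_job are hypothetical.

```shell
# Hypothetical sketch: SIGUSR1 requests a soft exit. Running jobs are left
# alone; the loop simply stops starting new ones.
SOFT_EXIT=0
on_soft_exit() { SOFT_EXIT=1; }
trap on_soft_exit USR1

# Read one job name per line from stdin and hand each to start_job, a
# caller-supplied function (assumed not to read stdin itself). Stops when
# the queue is empty or a soft exit has been requested.
run_queue() {
    while [ "$SOFT_EXIT" -eq 0 ] && read -r job; do
        start_job "$job"
    done
}
```

With this shape, sending SIGUSR1 to the main process lets the current job finish and leaves the rest of the queue untouched.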

Stability

  • There is substantial risk that several large ports build at once, consume all RAM/swap, and cause an OOM or panic. Either the queue needs to wait on these known large ports, or it needs to monitor the remaining memory and current CPU load and delay builds while they are high. Note that hidden in this task is reworking the queue to allow delaying builds. This is not possible currently and conflicts with detecting a stuck/deadlocked queue. A more flexible queue would also allow retrying fetch failures or builds that failed due to memory constraints.
# An attempt to get % memory used, considering ARC_MFU as non-free.
sysctl -n vm.stats.vm.v_inactive_count vm.stats.vm.v_cache_count vm.stats.vm.v_free_count hw.physmem kstat.zfs.misc.arcstats.size vfs.zfs.mfu_size hw.pagesize|tr '\n' ' '| { read vm1 vm2 vm3 physmem arc arc_mfu pagesize; echo "scale=2;((${physmem} - ((${vm1} + ${vm2} + ${vm3}) * ${pagesize}) - ${arc} + ${arc_mfu}) / ${physmem}) * 100"|bc;}
  • rctl(8) support to limit jail memory usage. This may hurt performance, but will ensure no jail uses all RAM. ulimit -m does not work per-jail; it only applies per-process.
  • Assign cpusets to jails to ensure misbehaving ports don't hog all CPUs and only use the expected 1 CPU.
  • Fetch retrying is critical. A major port failing to fetch due to intermittent issues leads to the whole tree being skipped. This probably should be fixed in Mk/bsd.port.mk, as currently make checksum only retries the fetch on checksum failure, not on download failure.
  • If poudriere crashes it can leave behind its ref jail, which confuses poudriere status into thinking a build is still running. This is problematic for automated builds. This can be fixed by having poudriere record its pid somewhere and having status kill -0 the pid to see if it is still running.
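
The pid check described in the last item can be sketched as follows. This is only an illustration of the kill -0 idea. The helper names record_pid and build_is_running and the pidfile location are assumptions, not existing poudriere code.

```shell
# Hypothetical helpers; where the pidfile lives is left to the caller.

# Record the pid of the current shell in the given file.
record_pid() {
    echo "$$" > "$1"
}

# Return 0 if the pid stored in the file belongs to a live process.
# kill -0 sends no signal; it only checks that the process exists and
# that we may signal it (status would run as root, so EPERM is not
# expected). A reused pid would still be reported as alive.
build_is_running() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1
    read -r pid < "$pidfile" || return 1
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric content
    esac
    kill -0 "$pid" 2>/dev/null
}
```

A status command could call build_is_running on the recorded file and treat a failure as a stale ref jail that is safe to clean up.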

Poudriered

  • -j all support needed in clients, or poudriered. Placing it into poudriered would allow logically grouping the job together better.
  • queue needs support for queuing to the socket
  • Remove daemon.sh
  • Configuration for how many jobs can run at once for poudriered
  • Queueing is asynchronous now, but only 1 (or configuration per above) should run at a time
  • HTML Queue page linking to builds
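
One way to cap how many queued jobs run at once is a directory of numbered slots claimed with mkdir, which is atomic. This is a rough sketch of that general technique, not how poudriered works. slot_acquire, slot_release, and the slot directory are hypothetical.

```shell
# Hypothetical sketch: at most max_jobs slots exist under slot_dir.

# Try to claim a free slot; print its number on success, return 1 if all
# slots are taken. mkdir fails if the directory already exists, so two
# clients can never claim the same slot.
slot_acquire() {
    slot_dir=$1
    max_jobs=$2
    i=1
    while [ "$i" -le "$max_jobs" ]; do
        if mkdir "$slot_dir/slot.$i" 2>/dev/null; then
            echo "$i"
            return 0
        fi
        i=$((i + 1))
    done
    return 1
}

# Give a slot back once the job has finished.
slot_release() {
    rmdir "$1/slot.$2"
}
```

With max_jobs set to 1 this gives the "only 1 at a time" behavior, and a larger value would come from the configuration item above. Slots left behind by a crashed job would need separate cleanup.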

Future

  • Package seeding (download from elsewhere). This must be smart and check remote options/dependencies before downloading and compare them to local. If the incremental build will end up deleting a package right away, then don't download it. If we download after sanity, then we need to ensure the packages downloaded match what the build actually needs.
  • statsd
  • Start building faster, during compute-deps if possible. E.g., ports-mgmt/pkg can build immediately, before any dependency checking, since it has no dependencies. Generalizing this may allow more ports to build as soon as they are determined to have few dependencies.
  • Status does not show running jail -c or jail -u operations
  • Speed up dep calculation more. The original idea here was to detect if the ports tree changed and to cache everything. Detecting if the tree changed, though, is slow because it requires a stat(2) of all files.

4.0

4.0 will essentially be a rewrite with new architecture. Most code will move to C and we will design the architecture to support distributed builds.

  • Need updated Design for 3.x and one for 4.x
  • Need to document requirements (3.x functionality) to ensure we don't lose functionality or checks, or regress.
  • Privilege separation with libnv(3). Sub-commands should not be run as root; rather, specific needs should be passed to poudriered, such as jail_start or jail_switch_networking.
  • Sandboxing with Capsicum
  • Currently poudriere executes everything from the context of the host using jexec. This hurts performance a lot, as every jexec and fork/exec creates a lot of lock contention. The jail should spawn a client, and the host (master) should communicate with it using commands. This also improves sandboxing substantially, as it avoids accidentally forgetting injail in the code. This architecture lends itself to remote jails/builders as well.
  • REST daemon
  • Use automated testing from the start.

Wishlist

  • Unionfs fixed
  • tmpfs -o clone with COW support. This would greatly reduce the amount of RAM needed to use tmpfs and speed up jail creation, rollback, etc. ZFS almost accomplishes this, but it will still eventually write blocks back to disk. If ZFS had a tmpfs option to never write the data to disk, this would also suffice.