
Prepping data for the BLM project


NOTE: All scripts for the BLM project can be found in growth-yield-batch/scripts/blm. There may be other copies on other machines, but this is the authoritative source.

Initial Prep

  • Using QGIS, export the shapefile to SQLite as stands_orig.sqlite
  • Copy to master.sqlite
cp stands_orig.sqlite master.sqlite
  • Attributes for BLM stands. Here's the list of fields we need on the stands table:
Shapefile   Database   Notes
---------   --------   -----
PrjID       standid    unique across study area
AvgElevM    elev       meters, mean
b_slope     slope      median/mean
b_aspect    aspect     majority based on classified slope map?
SiteClass   sitecls    numeric site class code; corresponds to a site tree/index combo
acres       acres
GNN_FCID    gnnfcid    zonal majority
LocCode     location   3-digit location or region code, based on our hand-drawn map
LAT         lat
--          habitat    LEAVE BLANK FOR NOW (plant association or habitat code)
--          batch      LEAVE BLANK FOR NOW (a code to divide up the FVS runs)
--          rx         LEAVE BLANK FOR NOW (a comma-delimited list of valid rxs)
Variant     variant    based on our hand-drawn map
DIST_NAME   district
--          blmra      resource area
Category    mgmtgrp    BLM, NSOW, or GO (Exclusion/GrowOnly)
  • Fix location codes; use only National Forest code. See scripts/blm/fix_location.sql

  • Change field names on the stands table to match the Database column above; see ~/src/growth-yield-batch/scripts/blm/blm_stands_structure.sql (sketched below)
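A minimal sketch of the rebuild-and-rename pattern, assuming that is what the script does (blm_stands_structure.sql is authoritative; column list abbreviated):
-- hypothetical sketch; older SQLite has no ALTER TABLE ... RENAME COLUMN,
-- so rebuild the table under the new names
CREATE TABLE stands_new (standid TEXT, elev REAL, slope REAL /* ... */);
INSERT INTO stands_new SELECT PrjID, AvgElevM, b_slope /* ... */ FROM stands;
DROP TABLE stands;
ALTER TABLE stands_new RENAME TO stands;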

  • Fill in rx based on mgmt group and variant (e.g. Exclusion gets a '1', NSOW gets a '1,3,4', everything else stays blank); use ~/src/growth-yield-batch/scripts/blm/blm_apply_rxs_to_stands.py (see the sketch below)
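In SQL the mapping might look like this (a sketch; assumes the mgmtgrp codes from the table above, with GO covering Exclusion/GrowOnly; blm_apply_rxs_to_stands.py is authoritative):
-- sketch only; value codes assumed from the mgmtgrp notes above
UPDATE stands SET rx = '1' WHERE mgmtgrp = 'GO';       -- Exclusion/GrowOnly
UPDATE stands SET rx = '1,3,4' WHERE mgmtgrp = 'NSOW';
-- everything else keeps a blank rx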

  • Fill in batch (all, or some sort of stratified random sampling); use ~/src/growth-yield-batch/scripts/blm/blm_stratified_sampling_batch.py

  • Import treelists and climate

Treelists can be found at /g/projects/projects2013/BLMForestTool/Data/Processed/Treelists and Climate inputs at /g/projects/projects2013/BLMForestTool/Data/Source/climate/Climate-FVS/GNN_Climate-FVS_Input_Data

python ~/src/growth-yield-batch/scripts/csv2sqlite.py treelists/FVS_GNN_Treelist.csv master.sqlite treelist
python ~/src/growth-yield-batch/scripts/csv2sqlite.py treelists/FVS_GNN_Snaglist.csv master.sqlite treelist
python ~/src/growth-yield-batch/scripts/csv2sqlite.py climate/FVSClimAttrs.csv master.sqlite climate
  • Modify DEmtwm in the climate data, replacing zeros
UPDATE climate
SET DEmtwm = 0.000001
WHERE DEmtwm = 0;
-- maybe do the same for the other DE* columns
  • Create indexes
sqlite3 master.sqlite "CREATE INDEX stand_idx ON climate(StandID);"
sqlite3 master.sqlite "CREATE INDEX scenario_idx ON climate(Scenario);"
sqlite3 master.sqlite "CREATE INDEX gnn_fcid_idx ON treelist(GNN_FCID);"
sqlite3 master.sqlite "CREATE INDEX standid_idx ON stands(standid);"
  • Edit and run blm/prep_gyb.py <batch>, where <batch> is the name of a sample batch or all
unbuffer python ~/src/growth-yield-batch/scripts/blm/prep_gyb.py <batch> | tee <batch>.log

Run Growth & Yield Batch

Within the batch project directory (e.g. ./<batch>/) there should already be a cond directory. Next, copy the entire rx directory over from projects/BLMClimate/rx, where the master rxs for this project reside.

Edit config.json accordingly.

Transfer to ford at /storage/gyb_projects/BLM_Batch1/

cd /storage/gyb_projects/BLM_Batch1/<batch>
build_keys.py --cores=16
batch_fvs.py --cores=16

Post-processing

  • Fix failures

batch_fvs.py --fix --cores=X

Keep rerunning and debugging until you can confirm that the remaining failed plots simply are not valid.

  • (OPTIONAL) Put the master db on a RAM disk to make IO super-fast (tmpfs is volatile, so copy any updated files back when done)
sudo mkdir /mnt/ramdisk
sudo mount -t tmpfs -o size=50G tmpfs /mnt/ramdisk
cp master.sqlite /mnt/ramdisk
cp <batch>/final/data.db /mnt/ramdisk
cd /mnt/ramdisk
  • Copy the fvsaggregate table to master
$ sqlite3 master.sqlite
-- data.db is in <batch>/final/data.db or at /mnt/ramdisk if applicable
ATTACH 'data.db' AS D; 

-- confirm
.databases

-- Drop the table if it already exists
DROP TABLE IF EXISTS fvsaggregate;

-- Create the table
CREATE TABLE fvsaggregate AS SELECT * FROM D.trees_fvsaggregate;

-- To merge the data into an existing table instead, use INSERT
INSERT INTO fvsaggregate SELECT * FROM D.trees_fvsaggregate;
  • Identify and remove stands with any failures
create index ix_condrxoff ON fvsaggregate (cond, rx, "offset");

-- should give max of 189 (21 time periods x 9 climates) good!
SELECT max(n) 
FROM (SELECT count(*) as n, cond, rx, "offset" 
      FROM fvsaggregate
      GROUP by cond, rx, "offset") t;

-- identify all cond that do not have complete records
.mode csv
.output bad.csv
SELECT cond
FROM (SELECT count(*) as cnt, cond, rx, "offset"
      FROM fvsaggregate
      GROUP BY cond, rx, "offset" )
WHERE cnt < 189
GROUP BY cond;
.output stdout

-- Construct a SQL DELETE from `bad.csv`
-- (or use the single-query alternative below)
DELETE FROM fvsaggregate
WHERE cond in ("287", "374", "944", "1644", "1805", "1846", "1847", "2236", "2752", "2850", "3063", "3287", "3459", "3759", "3878", "3996", "4145", "5453", "5473", "5486", "5503", "5648", "5740", "5757", "5762", "5764", "5873", "5914", "6001", "6057", "6069", "6108", "6348", "6402", "6563", "6564", "6706", "6743", "6744", "6974", "7027", "7063", "7097", "7137", "7194", "7202", "7265", "7447", "7548", "7813", "8058", "17567", "18862", "19491", "25442", "25461", "32985", "34342", "37224", "48388", "48392", "48435", "48813", "48836", "49049", "49123", "49157", "49704", "49709", "49876", "49907", "49919", "49965", "50220", "50236", "50422", "50596", "50631", "50662", "50748", "50895", "50964", "51068", "51071", "51176", "51214", "51297", "51306", "51311", "51381", "51415", "51422", "51426", "51493", "51593", "51603", "51691", "51721", "51755", "51811", "51897", "51965", "51991", "52044", "52115", "52140", "52437", "52451", "52541", "52544", "52569", "52627", "52713", "52719", "52732", "52809", "52816", "52885", "53078", "53093", "53238", "53636", "53802", "53854", "53871", "53993", "54054", "54055", "54060", "54136", "54200", "54263", "54364", "54408", "54409", "54482", "54524", "54532", "54577", "54679", "54680", "54751", "54778", "54811", "54815", "54859", "54910", "54927", "55242", "55346", "55402", "55568", "56330", "56584", "56636", "57935", "58207", "58284", "58312", "58431", "58626", "58646", "80699", "86549");
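
-- Alternatively (a sketch): skip the bad.csv round-trip and delete the
-- incomplete conds in a single statement:
DELETE FROM fvsaggregate
WHERE cond IN (
    SELECT cond
    FROM (SELECT count(*) AS cnt, cond, rx, "offset"
          FROM fvsaggregate
          GROUP BY cond, rx, "offset")
    WHERE cnt < 189);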

-- don't need this anymore
DROP index ix_condrxoff;
  • Create other indexes
CREATE INDEX idx_fvsaggregate_var ON fvsaggregate (var);
CREATE INDEX idx_fvsaggregate_year ON fvsaggregate (year);
CREATE INDEX idx_fvsaggregate_cond ON fvsaggregate (cond);
CREATE INDEX idx_fvsaggregate_rx ON fvsaggregate (rx);
CREATE INDEX idx_fvsaggregate_climate ON fvsaggregate (climate);
CREATE INDEX idx_fvsaggregate_offset ON fvsaggregate ("offset");
CREATE INDEX idx_fvs ON fvsaggregate ("var", "rx", "cond", "site", "climate", "offset");
CREATE INDEX idx_yearclim ON fvsaggregate ("year", "climate");

Run summary queries

TODO:

  • how many stand ids have GNN pixels without treelists
  • how many stands actually have fvsaggregate results
  • counts of stands, gnnfcids, years, climates, variants, rxs, offsets (summary)
  • area broken down by mgmtgrp and district
  • output shapefile
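
A few of these can be sketched directly in SQL, assuming the table and column names used above:

-- stand ids whose GNN pixel has no treelist records
SELECT count(*) FROM stands
WHERE gnnfcid NOT IN (SELECT DISTINCT GNN_FCID FROM treelist);

-- stands that actually have fvsaggregate results
SELECT count(DISTINCT cond) FROM fvsaggregate;

-- area broken down by mgmtgrp and district
SELECT mgmtgrp, district, sum(acres) AS acres
FROM stands
GROUP BY mgmtgrp, district;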

Run Scheduler

Preprocess using a SQL query

drop table if exists fvs_stands;

create table fvs_stands as 
    select  
        total_stand_carbon * acres as carbon,
        removed_merch_bdft * acres / 1000.0 as timber, -- mbf
        after_merch_bdft * acres / 1000.0 as standing, --mbf
        NSONEST * acres as owl,
        (CASE WHEN FIREHZD > 3 THEN acres ELSE 0 END) as fire,
        removed_merch_bdft * slope * acres / 1000.0 as cost, -- OK as proxy
        removed_total_ft3 * acres as removed_vol, -- per period
        accretion * acres * 5 as accretion_vol, -- annual
        mortality * acres * 5 as mortality_vol, -- annual
        start_total_ft3 * acres as start_vol, -- per period
        fortype,
        year,
        standid,
        variant,
        district,
        mgmtgrp,
        fvs.rx as rx,
        "offset",
        climate,
        acres
    from fvsaggregate fvs
    join stands as s
    on s.standid = fvs.cond
    where fvs.total_stand_carbon is not null -- should remove any blanks
    ORDER BY standid, year;

-- add indexes for year and climate
create index ix_fvs_stands_rxoff ON fvs_stands (rx, "offset");
create index ix_fvs_stands_all ON fvs_stands (standid, rx, "offset", climate);

Important note: the scheduler only optimizes on a single climate scenario.
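To see which climate scenarios are available before picking one, a quick check against the fvs_stands table built above:

SELECT DISTINCT climate FROM fvs_stands;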

  • Copy growth-yield-batch/scripts/blm/schedule_blm.py to your local dir and modify accordingly

Run it, "calibrate" it, rinse, repeat.

HINT: you can run the scheduler once on the server and transfer the cache* files to run in a graphical environment in order to view the realtime graph.

When scheduling is complete, copy the optimal rxs to SQLite:

for i in results/*_stands_rx.csv; do
    echo $i
    python /home/mperry/src/growth-yield-batch/scripts/csv2sqlite.py $i ../../master.sqlite optimalrx
done

Query it

Join the fvs_stands results with the optimal rxs

create index ix_optimalrx_all ON optimalrx (stand, rx, "offset", climate);

SELECT s.year, s.climate as climate,
                sum(carbon) as carbon, sum(timber) as timber, sum(owl) as owl, sum(cost) as cost
    FROM fvs_stands as s
    JOIN optimalrx as o
    ON s.standid = o.stand
    AND s.rx = o.rx
    AND s."offset" = o."offset"
    AND s.climate = o.climate
    GROUP BY s.year, s.climate;

Alternatively, the same information can be output by the scheduler into a CSV and used directly:

csv2sqlite results.csv master.sqlite results

Acres of land under each mgmt regime

SELECT o.rx as rx, s.climate as climate, sum(s.acres) as acres, count(s.acres) as num
    FROM fvs_stands as s
    JOIN optimalrx as o
    ON s.standid = o.stand
    AND s.rx = o.rx
    AND s."offset" = o."offset"
    AND s.climate = o.climate
    WHERE s.year = 2013
    GROUP BY o.rx, s.climate;

Output to CSV and graph in R.
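
The export can reuse the dot-command pattern from the post-processing section (a sketch; the output filename is arbitrary):

.mode csv
.output acres_by_mgmt.csv
-- rerun the acres query above
.output stdout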