Skip to content

clindocu/sasxpt-r

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 

Repository files navigation

sasxpt-r

Import of SAS .xpt files into R. Generation of a Data Definition Table (DDT) in SAS.

Data checks regarding values in data base vs. format definitions/value labels.

Overview

The goal of this SAS program is to import SAS .xpt files by generating automatically an executable .R script file that generates R data frames - step by step - with variable factors, levels, labels and R date/times.

For importing of .xpt files two steps must be followed:

  1. Run SAS program sasxpt_r.sas in SAS
  2. Run impsas_xpt.R (generated by the SAS program) in R or RStudio

The results are:

  1. A Data Definition Table (SAS Output)
  2. An .R script file that contains R statements/functions to import .xpt files in R data frames including variable factors, levels, labels and R date/times

Platform

SAS on Windows, R or RStudio

Usage

Modify the SAS program sasxpt_r.sas

Enter manually:

  1. Directory where SAS .xpt files are stored (e.g. C:\Temp\R)
filename dir_list pipe 'dir "C:\Temp\R"  /a:-d-h-s /b';
  1. R working directory (the same directory as in 1. but with slash /)
%let RSETWD=C:/Temp/R/;
  1. Create and specify a Subdirectory where SAS .xpt files will be converted temporary to .sas7bdat files
libname tmpsas "C:\Temp\R\SAS_TEMP";
  1. Specify the location of the SAS program with proc format statements (alternatively copy SAS format catalog to 'work.formats')
%include "C:\Temp\R\Formats01.sas";

Afterwards:

-> Run program sasxpt_r.sas in SAS

-> Open impsas_xpt.R (generated by the SAS program) in R or RStudio

-> Set encoding to UTF-8 (e.g. RStudio -> File -> Reopen with Encoding -> UTF-8) and run the script file

Result

Following SAS output (example) will be automatically generated by the SAS program:

Data Definition Table (SAS ouput)

Member Column                                                 Column
Name   Name        Column Label                        Type   Format      Decode

DM     STUDYID     Study Identifier                    char   $13.
       DOMAIN      Domain Abbreviation                 char   $2.
       USUBJID     Unique Subject Identifier           char   $20.
       SUBJID      Subject Identifier for the Study    char   $3.
       RFSTDTC     Subject Reference Start Date/Time   char   $16.
       RFENDTC     Subject Reference End Date/Time     char   $10.
       RFXSTDTC    Date/Time of First Study Treatment  char   $16.
       RFXENDTC    Date/Time of Last Study Treatment   char   $16.
       RFICDTC     Date/Time of Informed Consent       char   $10.
       RFPENDTC    Date/Time of End of Participation   char   $10.
       DTHDTC      Date/Time of Death                  char   $10.
       DTHFL       Subject Death Flag                  char   $YES.       Y = Yes
       SITEID      Study Site Identifier               char   $2.
       INVNAM      Investigator Name                   char   $28.
       AGE         Age                                 num    BEST8.
       AGEU        Age Units                           char   $5.
       SEX         Sex                                 char   $SEX.       F = Female, M = Male
       RACE        Race                                char   $41.
       ARMCD       Planned Arm Code                    char   $20.
       ARM         Description of Planned Arm          char   $11.
       TESTDTM     Datetime of (Test)                  dtim   DATETIME16.
       TESTDT      Date of (Test)                      date   DATE9.
       TESTTM      Time of (Test)                      time   TIME6.
          
a.s.o.

File impsas_xpt.R (automatically generated)

Following .R script file (example) will be automatically generated by the SAS program:

#### Set R working directory
setwd("C:/Temp/R/")
#### Use install.packages("foreign"), install.packages("Hmisc"), install.packages("chron") if necessary
library(foreign)
library(Hmisc)
library(chron)
#### Read all .xpt files from working directory. Derive date/times. Set factor, levels, labels.
DM <- read.xport("DM.xpt")
DS <- read.xport("DS.xpt")
TA <- read.xport("TA.xpt")
# a.s.o.;
DM$TESTDTM <- as.POSIXct(DM$TESTDTM, tz = "GMT", origin = "1960-01-01")
DM$TESTDT <- as.Date(DM$TESTDT, origin = "1960-01-01")
DM$TESTTM <- as.POSIXct(DM$TESTTM, tz = "GMT", origin = "0001-01-01")
DM$TESTTM <- times(format(DM$TESTTM, "%H:%M:%S")) # chron times class
# a.s.o.;
label(DM$STUDYID) <- "Study Identifier"
label(DM$DOMAIN) <- "Domain Abbreviation"
label(DM$USUBJID) <- "Unique Subject Identifier"
label(DM$SUBJID) <- "Subject Identifier for the Study"
label(DM$RFSTDTC) <- "Subject Reference Start Date/Time"
label(DM$RFENDTC) <- "Subject Reference End Date/Time"
label(DM$RFXSTDTC) <- "Date/Time of First Study Treatment"
label(DM$RFXENDTC) <- "Date/Time of Last Study Treatment"
label(DM$RFICDTC) <- "Date/Time of Informed Consent"
label(DM$RFPENDTC) <- "Date/Time of End of Participation"
label(DM$DTHDTC) <- "Date/Time of Death"
DM$DTHFL <- factor(DM$DTHFL, c("Y"), exclude = "")
levels(DM$DTHFL) <- c("Yes")
label(DM$DTHFL) <- "Subject Death Flag"
label(DM$SITEID) <- "Study Site Identifier"
label(DM$INVNAM) <- "Investigator Name"
label(DM$AGE) <- "Age"
label(DM$AGEU) <- "Age Units"
DM$SEX <- factor(DM$SEX, c("F", "M"), exclude = "")
levels(DM$SEX) <- c("Female", "Male")
label(DM$SEX) <- "Sex"
label(DM$RACE) <- "Race"
label(DM$ARMCD) <- "Planned Arm Code"
label(DM$ARM) <- "Description of Planned Arm"
label(DM$ACTARMCD) <- "Actual Arm Code"
label(DM$TESTDTM) <- "Datetime of (Test)"
label(DM$TESTDT) <- "Date of (Test)"
label(DM$TESTTM) <- "Time of (Test)"
# a.s.o.;

After running this .R script file in R or RStudio -> look for warnings (there should not be any) -> finished

Note: Beside the R base functions (as.POSIXct, as.Date, factor, levels and format) only the functions foreign::read.xport, Hmisc::label and chron::times will be used for generating R data frames.

Alternative method

Note: There is an alternative - very much shorter - method to import SAS .xpt files into R, e.g. using the function Hmisc::sasxport.get:

library(foreign)
# Formats exported in SAS with 'proc format cntlout=format; run; quit;'
form <- read.xport("format.xpt")

library(Hmisc)
# Convert .xpt files to R data frames with formats, levels, labels and R date/times;
DM <- sasxport.get("DM.xpt", formats = form, lowernames = FALSE, allow = "_")

Tests

Test platform: SAS 9.4 windows 7 64-bit, R 3.3.2

See test data in .../testsas

Restrictions

SAS Formats

Only "simple" user-defined SAS formats (numeric or character) can be used, e.g.:

********** Definition of SAS Formats in library 'work';
proc format;
    value $NY (notsorted)
              "N"="No"
              "Y"="Yes"
              "U"="Unknown";
              
    valee $YES "Y"="Yes";
              
    value $SEX "F"="Female"
               "M"="Male";

    value AESEV 1="MILD"
                2="MODERATE"
                3="SEVERE";
               
run;
quit;

Note that the notsorted option is very useful if the rank order should be maintained (instead of an alphabetic order), here No, Yes, Unknown

Formats like e.g. 30 - 50 = 'Between 30 and 50', or picture formats cannot be used.

SAS xport Files

SAS XPORT files should be created by the COPY Procedure in SAS Version 5, Version 6 and higher of the SAS Software.

In case that the .xpt files cannot be loaded, modify the statements in macro %ImpDset for loading the .xpt files (just as you usually do):

********** Read .xpt Files and Copy to .sas7bdat files (library 'tmpsas');
%macro ImpDset(XPTFILE);
    %local File;

    %let File = &RSETWD.&XPTFILE&..xpt;
    ;/********** Library File */ 
    libname xportin xport "&File";
 
    ;/********** .xpt File to .sas7bdat File */ 
    proc copy in=xportin out=tmpsas;
        select &XPTFILE;
    run;
    
    libname xportin clear;
%mend ImpDset;

Or, for creating .xpt files use code similar to:

%let RSETWD=C:/Temp/R/;

%macro WriteXPT(INDB);
    ;/********** Library for output must be defined */ 
    libname xportout xport "&RSETWD.&INDB..xpt";
     
    ;/********** Export to x port File */ 
    proc copy in=work out=xportout memtype=data; 
        select &INDB;
    run;

    libname xportout clear;
%mend WriteXPT;

%WriteXPT(DM);

or:

%let SASWD=C:\Temp\R\;

%macro WriteXPT(INDB, MYLABEL=);
    ;/********** Set Dataset Label to File Name as default*/ 
    %if &MYLABEL= %then %let MYLABEL="&INDB";
    
    ;/********** Library for output must be defined */ 
    libname xportout xport "&SASWD&INDB..xpt";
     
    ;/********** Export to x port File */ 
    data xportout.&INDB (label=&MYLABEL);
        set &INDB;
    run;

    libname xportout clear;
%mend WriteXPT;

%WriteXPT(DS, MYLABEL="Disposition");

"Exotic" date and datetime formats

When you use date or datetime formats other than 'DATE', 'DDMMYY', 'MMDDYY', 'YYMMDD', 'E8601DT', 'EURDFD' , 'TIME', 'HHMM', 'DATETIME', 'DATEAMPM', 'EURDFDT' extend the SAS code in:

    ********** Identify Dates (extend code, if neccessary);
    if index(upcase(FORMAT), 'DATE') gt 0 then TYP="date";
    if index(upcase(FORMAT), 'DDMMYY') gt 0 then TYP="date";
    if index(upcase(FORMAT), 'MMDDYY') gt 0 then TYP="date";
    if index(upcase(FORMAT), 'YYMMDD') gt 0 then TYP="date";
    if index(upcase(FORMAT), 'E8601DT') gt 0 then TYP="date";
    if index(upcase(FORMAT), 'EURDFD') gt 0 then TYP="date"; * can be overwritten with EURDFDT (dtim);

    ********** Identify Time Formats;
    if index(upcase(FORMAT), 'TIME') gt 0 then TYP="time";
    if index(upcase(FORMAT), 'HHMM') gt 0 then TYP="time";

    ********** Identify DateTime Formats;
    if index(upcase(FORMAT), 'DATETIME') gt 0 then TYP="dtim";
    if index(upcase(FORMAT), 'DATEAMPM') gt 0 then TYP="dtim";
    if index(upcase(FORMAT), 'EURDFDT') gt 0 then TYP="dtim";

R Times without Date Part

Times will be converted 1st to POSIXct with origin = '0001-01-01'. Afterwards the function chron::times will be applied to convert the date/times to a variable with a times class (w/o date part). Check if this is desired (otherwise change or delete code in the SAS program, e.g. if you prefer time variables in POSIXct format).

    ********** Derive Time. Note: Prior to the time the date 0001-01-01 is assigned (POSIXct format);
    if TYP="time" then do;
       RCODE=trim(R_NAME) || " <- as.POSIXct(" || strip(R_NAME) 
             || ", tz = " || quote('GMT') || ", origin = " || quote("0001-01-01") || ")";
       output;
       ********** Delete code below or remove *times*, if you prefer POSIXct format for time variables;
       RCODE=trim(R_NAME) || " <- times(format("  || strip(R_NAME) 
             || ", " || quote(strip('%H:%M:%S')) || ")) # chron times class";
       output;
       end;

Times will be displayed as HH:MM:SS (w/o milliseconds, if any)

Labels and Formats

Recommendation: All SAS variables should have "labels". If SAS labels are missing the labels will be set (by the SAS program) to <- "" in R.

Consider also to assign 'formats' to all variables, e.g. $20., BEST12., 8., date9., datetime22.3, a.s.o., depending on the type of the variables.

Discrepancies between Values in Database and Formats

Critical Errors

There may be some values in the SAS database - where a user-defined format is assigned - but with no corresponding format definition/value label, e.g. when $SEX in 'proc format' is defined as "F", "M" but in the database also "U" (for Unknown) is present. In this case the value "U" will be set to 'NA' in R, w/o any warning.

Consider extension of definitions in SAS proc format (highly recommended!).

Non-Critical Discrepancies

On the other hand, there may be SAS formats/value labels defined, but the value in the database is missing, e.g. $SEX is defined as "F", "M" and "U", but the value "U" (for Unknown) is missing in the database.

Check, if changes are necessary. Such categorization can be intended e.g. for tabulation of categories with missing values, here category Unknown with N=0.

SAS program to check for these Discrepancies

Include or run the SAS program sasformat_chk.sas directly after sasxpt_r.sas. This results in an output of all Critical Errors and Non-Critical Discrepancies (if any).

Conclusion

This SAS program allows a quick transformation of many SAS .xpt files to R data frames. To implement the program only a few parameters - namely the directory where the xpt files are stored and the location of the user-defined SAS formats - have to be entered and in two steps R data frames were generated automatically - step by step - with variable factors, levels, labels and R date/times. The program provides also a Data Definition Table to see the variable properties and the decodes of the formats. Finally, a 2nd SAS program can be run to check if there are values in the database w/o corresponding value labels/format definitions (and vice versa).

Beside the R base functions only the functions foreign::read.xport, Hmisc::label and chron::times will be used for generating R data frames.

About

Import of SAS .xpt files into R

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages