Skip to content

The objective of the project is to estimate the time-to-death for patients diagnosed with lung cancer.

License

Notifications You must be signed in to change notification settings

justinbosscher/survival-analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Python v3.8.1 Jupyter Notebook License

Lung Cancer Survival Analysis

Table of Contents

Project Objective

The objective of the project is to estimate the time-to-death for patients diagnosed with lung cancer.

Methods

  • Data Visualization
  • Survival Analysis
    • Kaplan-Meier Estimator
    • Nelson-Aalen Estimator
    • Log-Rank Test
    • Cox's Proportional Hazard Model

Technologies

  • Jupyter Notebook
  • Python 3.8.1
  • Python Packages
    • pandas
    • numpy
    • lifelines KaplanMeierFitter
    • lifelines NelsonAalenFitter
    • lifelines logrank_test
    • lifelines CoxPHFitter
  • Python Virtual Environment

Getting Started

  1. If necessary, install the python3-venv package using the following command: sudo apt install python3.8-venv
  2. Create a virtual environment with: python3 -m venv survival-wkspc
  3. cd into the survival-wkspace folder
  4. Activate the environment: source bin/activate
  5. Install python packages: python3 -m pip -r requirements.txt

Project Description

In oncology, typical questions to be answered are:

  • What is the impact of specific clinical characteristics on patients' survival?
    • EG: Is there a difference between people who have high blood sugar and those who do not?
  • What is the probability that an individual survives a specific time?
    • EG: Given a set of cancer patients, what is the probability that a patient will be alive at that time if some set of time has passed?
  • What, if any, are the differences in survival between groups of patients?
    • EG: Compare the effects of two different treatments.

Survival Analysis Fundamentals:

  • Survival Time - The amount of time until an event
    • Events could be birth, death, product failure, etc...
  • Relapse - A deterioration in the subject's state of health after a temporary improvement
  • Progression - The process of developing or moving gradually towards a more advanced state
  • Data Censoring - Occurs when observations do not experience the event of interest
    • EG: In the study of a product's failure, some products might not fail
    • Right Censoring - Occurs when the subject under observation does not or has not yet experienced the event
      • EG: When studying death, a patient might still be alive
    • Left Censoring - Occurs when the event cannot be observed
      • EG: The event could have occurred before the study began
    • Interval Censoring - Occurs when the data is for a specific time interval
      • EG: The event of interest might occur before or after, but not during the study

Two related probabilities are used to analyze survival data:

  1. Survival Function (S)
    • Defined as the probability that a subject wurvives from the diagnosis of a disease to a specified future time, t
    • Focuses on the survival of a subject
    • EG: S(200) = 0.7 means that after 200 days, a subject's probability of survival is 0.7
    • Kaplan-Meier Estimator is used to find the survival probability of a subject
  2. Hazard Function (H)
    • Defined as the a subject under observation at time t has an event at time t
    • Focuses on the death of a subject
    • EG: H(200) = 0.7 means that on or after 200 days, the probability of death is 0.7
    • NOTE: The Hazard Function gives us the cumulative probability

Models Results

Kaplan-Meier Estimator

  • Is a non-parametric statistic used to estimate the survival function from the lifetime data
  • Often used to measure the fraction of patients living for a specific time after treatment or diagnosis
  • These plots help to visualize survival curves

The probability of survival at time ti, S(ti), is calculated as follows:

S(ti)

More simply:

S(ti)

Survival Probability

Survival Function w/ Confidence Interval

Cumulative Density Plot

Cumulative Density Plot w/ Confidence Interval

Conditional Median Time To Event

Nelson-Aalen Estimator

  • Is a non-parametric statistic used to estimate hazard rates
  • Often used to measure the fraction of patients who have died for at or by a specified time after treatment or diagnosis
  • Aggregate information regarding survival can be visualized using the Nelson-Aalen hazard function, h(t). The hazard function gives us the probability that a subject under observation at time t has an event of interest at that time
  • NOTE: The hazard rate CANNOT be found by transforming the survival rate

The cumulative probability of hazard at time t, H(t), is calculated as follows:

H(t)

Cumulative Probability of Event of Interest

Cumulative Hazard w/ CI

Cumulative Hazard v. Cumulative Density

Survival Rates by Group

Cumulative Density

Cumulative Hazard Plot

To calculate non-cumulative hazard probability at a specified time, t:

H(t)

Cumulative Hazard v. Cumulative Density

Conditional Median Times to Event

Survival Function w/ CI - Male Group

Survival Function w/ CI - Female Group

Cumulative Density & Cumulative Hazard - Male Group

Cumulative Density & Cumulative Hazard - Female Group

Log-Rank Test

  • Is a hypothesis test that is used to compare the survival distributions of two samples with the goal of determining if there is any significant difference between the groups compared.
  • The null hypothesis states that there is no significant difference between groups being studied.

Cox's Proportional Hazard Regression

  • Is a regression model used, generally, by medical researchers to determine the relationship between the survival time of a subject and one or more predictor variables. It helps us to determine how different parameters such as age, sex, weight, height, etc.. affect the survival time of a subject.
  • Unlike the Kaplan-Meier Estimator, the Nelson-Aalen Estimator, and the Log-Rank Test, Cox's Proportional Hazard Regression Analysis works for both categorical and non-categorical predictors.
  • In short, it is used to determine how different factors impact the event of interest.
  • Hazard Ratio (HR)
    • HR = value of exp(bi) from the Hazard Function
    • HR = 1 : No Effects
    • HR < 1 : Reduction in Hazard
    • HR > 1 : Increase in Hazard

Cox's Proportional Hazard Plot

Survival Chances by Subject

About

The objective of the project is to estimate the time-to-death for patients diagnosed with lung cancer.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published