Heart Disease Analysis

This dataset records the presence of heart disease in 303 patients from the Cleveland Clinic and was donated in 1988. It has 14 attributes, including cholesterol level, electrocardiographic results, and artery health. The best-performing models (logistic regression and a ridge classifier) predict the presence of heart disease with 93% accuracy. This project taught me new visualization techniques, feature scaling, model analysis, and classification techniques.
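Feature scaling matters here because attributes like cholesterol (hundreds of mg/dl) and ST depression (single digits) sit on very different scales. The sketch below shows z-score standardization, where each value becomes (value - mean) / standard deviation. It uses only the standard library. The cholesterol readings are made-up illustration values, not rows from the dataset, and the function name `standardize` is hypothetical.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale a numeric column to zero mean and unit variance (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (ddof=0)
    if sigma == 0:
        # A constant column carries no information; map it to all zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical cholesterol readings (mg/dl), for illustration only.
chol = [233, 250, 204, 236, 354]
scaled = standardize(chol)
print([round(z, 2) for z in scaled])
```

In a real train/test workflow, the mean and standard deviation should be computed on the training split only and then reused to scale the test split, so no information leaks from the test data.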

Our attributes:

  • age: The age of the patient
  • sex: The sex of the patient
    • 1: Male
    • 0: Female
  • cp: Chest pain type
    • 1: Typical angina
    • 2: Atypical angina
    • 3: Non-anginal pain
    • 4: Asymptomatic
  • trestbps: Resting blood pressure (in mm Hg on admission to the hospital)
  • chol: Serum cholesterol in mg/dl
  • fbs: Fasting blood sugar > 120 mg/dl
    • 1: True
    • 0: False
  • restecg: Resting electrocardiographic results
    • 0: Normal
    • 1: Having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
    • 2: Showing probable or definite left ventricular hypertrophy by Estes' criteria
  • thalach: Maximum heart rate achieved
  • exang: Exercise-induced angina
    • 1: Yes
    • 0: No
  • oldpeak: ST depression induced by exercise relative to rest
  • slope: The slope of the peak exercise ST segment
    • 1: Upsloping
    • 2: Flat
    • 3: Downsloping
  • ca: Number of major vessels (0-3) colored by fluoroscopy
  • thal: A blood disorder called thalassemia
    • 3: Fixed defect (no blood flow in some part of the heart)
    • 6: Normal blood flow
    • 7: Reversible defect (blood flow is observed, but it is not normal)
  • num: Heart disease presence and severity (we convert this into a binary target)
    • 1: Yes
    • 0: No
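Converting num into a yes/no target is a one-line mapping. The sketch below assumes the raw column follows the UCI release of the Cleveland data, where 0 means no disease and 1 through 4 indicate increasing severity. Under that assumption, any nonzero value counts as "disease present". The helper name `to_binary_target` and the sample values are hypothetical.

```python
def to_binary_target(num):
    """Collapse the raw 'num' diagnosis into a yes/no target.

    Assumes 'num' is 0 for no disease and 1-4 for increasing severity.
    """
    return 1 if num > 0 else 0

# Hypothetical raw 'num' values, for illustration only.
raw_num = [0, 2, 1, 0, 3, 4]
target = [to_binary_target(n) for n in raw_num]
print(target)  # [0, 1, 1, 0, 1, 1]
```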

Below is a quick visualization of the patient attributes:

 

. . . and here are the results! I went with just these 6 models.

 

Our logistic regression and ridge classifiers both reached 93% accuracy! (Unfortunately, they produced more false negatives than false positives. In a diagnostic setting, a false negative means a patient with heart disease goes undetected.)
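Accuracy alone does not show whether a model's mistakes are false negatives or false positives. Counting the four confusion-matrix cells separately makes that visible. The sketch below uses made-up labels and predictions, not this project's results, and the function name `confusion_counts` is hypothetical. Recall (sensitivity) is the share of real heart disease cases that the model catches, so a model with many false negatives has low recall even when its accuracy looks good.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels where 1 = heart disease."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1  # sick patient correctly flagged
        elif t == 0 and p == 1:
            fp += 1  # healthy patient wrongly flagged
        elif t == 0 and p == 0:
            tn += 1  # healthy patient correctly cleared
        else:
            fn += 1  # sick patient missed
    return tp, fp, tn, fn

# Hypothetical labels and predictions, for illustration only.
y_true = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0, 0, 1]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn)  # share of actual cases the model caught
print(tp, fp, tn, fn, accuracy, recall)  # 3 1 4 2 0.7 0.6
```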

Overall, patients with one or more major vessels colored by fluoroscopy were the most at risk of heart disease (around a 3–4% increase in risk for each vessel). Male patients were also more at risk, as were those with thalassemia and abnormal chest pain.
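The post does not say how the per-vessel risk figure was derived. One common way to read a single feature's effect from a logistic regression is sketched here. Each extra vessel adds the coefficient on ca to the log-odds, which multiplies the odds by exp(coefficient). The resulting change in predicted probability therefore depends on the patient's baseline risk rather than being a fixed percentage. The intercept and coefficient below are made-up numbers, not values fitted in this project, and all other features are assumed to be held fixed inside the intercept.

```python
import math

def sigmoid(z):
    """Logistic function: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical fitted parameters, NOT the project's actual model.
INTERCEPT = -1.0  # log-odds for a patient with ca = 0 (other features fixed)
COEF_CA = 0.8     # weight on 'ca' (number of vessels colored by fluoroscopy)

risks = [sigmoid(INTERCEPT + COEF_CA * ca) for ca in range(4)]
for ca, p in enumerate(risks):
    print(f"ca={ca}: predicted risk {p:.1%}")

# Each additional vessel multiplies the odds by this constant factor.
print("odds ratio per vessel:", round(math.exp(COEF_CA), 2))
```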

I completed this in the summer of 2022, and it was my first machine learning project. (I know that everyone and their mother has worked with this dataset, but it was more for the learning experience!)


 
