Elements Reference - Deployed Models by HTML5 UP

ML model to predict patient with heart disease
using the RandomForest, Linear SVC classifier.

Data Source: Kaggle

Libraries and Packages: Scikit-learn, Python-Numpy, Pandas

Keywords:

AUC: Area Under the Curve

Fpr-False Positive rate

Tpr-True Positive rate

Outline

Understanding the objective

Methodology

1. Data Setup

2. Data Cleaning

3. Creating a feature matrix

4. Choosing the right model and hyperparameters

5. Model Evaluation

6. Saving and Loading

7. Summary of Results

Objective

The objective of this project is to provide a technology to determine the prices of new cars that is being produced. Employing the use of predictions models will help to set the most accurate prices for cars based on market value. thereby leading to better growth and outcomes for manufacturers. Features

Age: displays the age of the individual.

sex: displays the gender of the individual using the following format : 1 = male 0 = female

Chest-pain type: displays the type of chest-pain experienced by the individual using the following format : 1 = typical angina 2 = atypical angina 3 = non — anginal pain 4 = asymptotic

Resting Blood Pressure: displays the resting blood pressure value of an individual in mmHg (unit)

Serum Cholestrol: displays the serum cholesterol in mg/dl (unit)

Fasting Blood Sugar: compares the fasting blood sugar value of an individual with 120mg/dl. If fasting blood sugar > 120mg/dl then : 1 (true) else : 0 (false)

Resting ECG : displays resting electrocardiographic results 0 = normal 1 = having ST-T wave abnormality 2 = left ventricular hyperthrophy

Max heart rate achieved : displays the max heart rate achieved by an individual.

Exercise induced angina : 1 = yes 0 = no

ST depression induced by exercise relative to rest: displays the value which is an integer or float.

Peak exercise ST segment : 1 = upsloping 2 = flat 3 = downsloping

Number of major vessels (0–3) colored by flourosopy : displays the value as integer or float.

Thal : displays the thalassemia : 3 = normal 6 = fixed defect 7 = reversible defect

Target Variable: The target shows the diagnosis of heart disease : Displays whether the individual is suffering from heart disease or not : 0 = absence 1 = present The score is being used as a conditional statement for classification as viewed in the deployed model section.

1.Data Setup

Pip installing the dependencies.
Importing the dataset and required libraries

2. Data Cleaning

The data was inspected for null values to remove abnormality. However, the dataset showed no missing values.

3. Creating a feature matrix

Data was split into x and y to separate features to be trained from the target .

4. Choosing the right model and hyperparameters

Checklink

Model Fitting

The Random Forest classifier and Linear SVC was chosen as an estimator as prescribed by the scikit learn sheet in the documentation above. The x data was split into training and testing,the data used in training is about 20% indicated by test_size=0.2.The shape of the model was examined and the model fit into train-data. RandomForest Classifier

5. Model Evaluation

The confusion matrix and scoring parameters were adopted in evaluating this model. The classification result shows an 100% indicating the RandomForest model is a perfect one.

The Confusion matrix summary shows the same result with the classification report showing that the classification is accurate.

The mean absolute error for the predicted value or its likely deviation from the original value is 0.0.. which is perfect. Scoring Parameters

Linear SVC The LinearSVC gave a score of 79.5%. This means about 80% of the data should be accurate

Evaluating the Linear SVC model Evaluation Metrics Score Parameters: 'accuracy', 'average precision', 'recall', 'f1' Metric functions AUC/ROC score , ' Confusion matrix, Classification report 1. Source Parameter- cross_validation On the LinearSVC:

On the RandomForest

Accuracy LinearSVC

RandomForest

Recall LinearSVC

RandomForest

F1 LinearSVC

RandomForest

From several iterations,the Linear SVC model have about 75% likelihood of predicting the right label i.e about 7 out of 10 times while the RandomForest have over 90% likelihood Metric Functions AUC/ROC comparison of a model true positive rate vs model false positive rate *True positive= model predicts 1 when truth is 1 *False positive= model predicts 1 when truth is 0 *True negative =model predicts 0 when truth is 0 *False negative =model predicts 0 when truth is 1 Plotting the ROC

Confusion Matrix The confusion matrix will help to compare the labels a model predicts and the actual labels it was meant to predict. It reveals where the model is getting confused.

Making the confusion matrix more visual using the Heat Map

7. Saving and Loading

The pickle module was used to save the datasets to be deployed.

8. Summary of results

A perfect model has an f1-score of 1.00. f1-score of 1.00 is perfect, perfect accuracy is equal to 1.0, we recorded an accuracy of 1.0 *Accuracy is a good measure to start with if all classes are balanced (e.g same amount of samples which are labelled with 0 or 1 *Precision and recall become more important when classes are imbalanced *if false positive are worse tan false negatives, aim for higher precision *If false negative predictions are worse than false positives, aim for higher recall. *F1-score is a combination of precision and recall. The RandomForest model showed the best performance for this prediction.

scroll to top

Data Source: Kaggle

Libraries and Packages: Scikit-learn, Python-Numpy, Pandas

Keywords:

AUC: Area Under the Curve

Fpr-False Positive rate

Tpr-True Positive rate

Outline

Understanding the objective

Methodology

1. Data Setup

2. Data Cleaning

3. Creating a feature matrix

4. Choosing the right model and hyperparameters

5. Model Evaluation

6. Saving and Loading

7. Summary of Results

Objective

1.Data Setup

2. Data Cleaning

3. Creating a feature matrix

4. Choosing the right model and hyperparameters

Model Fitting

5. Model Evaluation

7. Saving and Loading

8. Summary of results

Social