OCI Data Science - Useful Tips

Check for Public Internet Access

```python
import requests

response = requests.get("https://oracle.com")
assert response.status_code == 200, "Internet connection failed"
```
Helpful Documentation
Typical Cell Imports and Settings for ADS

```python
%load_ext autoreload
%autoreload 2
%matplotlib inline

import warnings
warnings.filterwarnings('ignore')

import logging
logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.ERROR)

import ads
from ads.dataset.factory import DatasetFactory
from ads.automl.provider import OracleAutoMLProvider
from ads.automl.driver import AutoML
from ads.evaluations.evaluator import ADSEvaluator
from ads.common.data import ADSData
from ads.explanations.explainer import ADSExplainer
from ads.explanations.mlx_global_explainer import MLXGlobalExplainer
from ads.explanations.mlx_local_explainer import MLXLocalExplainer
from ads.catalog.model import ModelCatalog
from ads.common.model_artifact import ModelArtifact
```
Useful Environment Variables

```python
import os

print(os.environ["NB_SESSION_COMPARTMENT_OCID"])
print(os.environ["PROJECT_OCID"])
print(os.environ["USER_OCID"])
print(os.environ["TENANCY_OCID"])
print(os.environ["NB_REGION"])
```

We begin by importing the relevant packages and libraries.

Getting the Data

The CT dataset archives (CT-0 and CT-23) were uploaded to the notebook session and unzipped.

Processing the Data

Functions for processing the CT images are below. The images are read in, normalized, and resized.

Normalization of chest CT data reduces variation. Here, we use the min/max technique to normalize the data. The mathematical formulation is:

volume_scaled = (volume - min(volume)) / (max(volume) - min(volume))

One important thing to keep in mind when using MinMax scaling is that it is highly sensitive to the minimum and maximum values in the data, so if the data contains outliers the result will be biased. MinMax scaling rescales the data set so that all feature values lie in the range [0, 1]; this is done feature-wise, independently for each feature.
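The min/max formula above can be sketched as a small NumPy function (the sample Hounsfield-unit values below are placeholders, not taken from the dataset):

```python
import numpy as np

def normalize(volume):
    """Min/max scale a CT volume to the range [0, 1]."""
    volume = volume.astype("float32")
    vmin, vmax = volume.min(), volume.max()
    return (volume - vmin) / (vmax - vmin)

# Placeholder slice of CT intensity values for illustration.
scan = np.array([[-1000.0, -400.0], [200.0, 400.0]])
scaled = normalize(scan)
```

After scaling, the smallest voxel maps to 0 and the largest to 1, with everything else linearly in between.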

For resizing the volumes, we use Spline Interpolated Zoom (SIZ). We take each volume, calculate its depth D, and zoom it along the z-axis by a factor of (desired depth)/D using spline interpolation, where the interpolant is of order three. The input volume is thus squeezed or expanded along the depth/z-axis to the desired depth, and because spline interpolation is used, a substantial amount of information from the original 3D volume is retained.
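A minimal sketch of the depth-resizing step, assuming `scipy.ndimage.zoom` and a hypothetical `resize_depth` helper (the notebook's actual function name may differ):

```python
import numpy as np
from scipy import ndimage

def resize_depth(volume, desired_depth=64):
    """Zoom a 3D volume along the z-axis to `desired_depth` slices
    using order-3 spline interpolation (SIZ)."""
    current_depth = volume.shape[-1]
    depth_factor = desired_depth / current_depth
    # Zoom only the last (depth) axis; in-plane resolution is untouched here.
    return ndimage.zoom(volume, (1, 1, depth_factor), order=3)

# Placeholder volume: 32x32 slices, 40 deep, squeezed/expanded to 64 deep.
vol = np.random.rand(32, 32, 40)
resized = resize_depth(vol, desired_depth=64)
```

The zoom factor (desired depth)/D is exactly the ratio described above: values below 1 squeeze the stack, values above 1 expand it.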

The paths of all images are stored in a list. Images in CT-0 are scans of normal lungs, and images in CT-23 are scans of abnormal lungs.

Labeling and Train-Test Split

With the stored paths, images are loaded iteratively using the "process_scan" function. The loaded scans are converted and stored in 3D arrays, and labels are created for abnormal and normal scans. The data is then split into train and validation sets.
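The labeling and split can be sketched as follows; the arrays here are random placeholders standing in for the scans produced by "process_scan", and the 70/30 split ratio is an assumption:

```python
import numpy as np

# Placeholder volumes standing in for the processed CT scans.
normal_scans = np.random.rand(10, 32, 32, 16)
abnormal_scans = np.random.rand(10, 32, 32, 16)

normal_labels = np.zeros(len(normal_scans))     # CT-0  -> label 0 (normal)
abnormal_labels = np.ones(len(abnormal_scans))  # CT-23 -> label 1 (abnormal)

# 70/30 train/validation split, keeping both classes in each split.
split = 7
x_train = np.concatenate((abnormal_scans[:split], normal_scans[:split]))
y_train = np.concatenate((abnormal_labels[:split], normal_labels[:split]))
x_val = np.concatenate((abnormal_scans[split:], normal_scans[split:]))
y_val = np.concatenate((abnormal_labels[split:], normal_labels[split:]))
```

Splitting each class separately, as above, keeps the validation set class-balanced.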

Rotating Train set

Functions for preprocessing the training and validation data for the CNN are below. The training data is rotated and an additional channel is added; the validation data only has the additional channel added.
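A sketch of such preprocessing functions, assuming `scipy.ndimage.rotate` for the in-plane rotation and a small set of candidate angles (the actual angles used in the notebook may differ):

```python
import numpy as np
from scipy import ndimage

def rotate_volume(volume):
    """Randomly rotate the volume in-plane by a few degrees (augmentation)."""
    angle = np.random.choice([-20, -10, -5, 5, 10, 20])
    volume = ndimage.rotate(volume, angle, reshape=False)
    # Rotation interpolation can push values slightly outside [0, 1].
    return np.clip(volume, 0.0, 1.0)

def train_preprocessing(volume):
    volume = rotate_volume(volume)
    return np.expand_dims(volume, axis=-1)  # add channel dimension

def validation_preprocessing(volume):
    return np.expand_dims(volume, axis=-1)  # channel only, no augmentation

vol = np.random.rand(16, 16, 8)
out = train_preprocessing(vol)
```

With `reshape=False` the rotated volume keeps its original shape, so only the trailing channel axis changes the tensor shape.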

TensorFlow pipelines are built below. They make it easier to train the CNN without running out of resources (e.g. memory).
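A minimal `tf.data` pipeline sketch; the arrays are placeholders for the real `x_train`/`y_train`, and batch size and the channel-adding `map` step are assumptions about the notebook's setup:

```python
import numpy as np
import tensorflow as tf

# Placeholder preprocessed volumes and labels.
x_train = np.random.rand(4, 32, 32, 16).astype("float32")
y_train = np.array([0, 1, 0, 1])

batch_size = 2
train_dataset = (
    tf.data.Dataset.from_tensor_slices((x_train, y_train))
    .shuffle(len(x_train))
    # Add a trailing channel dimension so volumes match Conv3D's input.
    .map(lambda volume, label: (tf.expand_dims(volume, axis=-1), label))
    .batch(batch_size)
    .prefetch(tf.data.AUTOTUNE)  # overlap preprocessing with training
)
```

Because elements are streamed and prefetched in batches, only a couple of volumes need to be resident in memory at a time.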

Build and compile CNN model
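A hedged sketch of a small 3D CNN of the kind described here; the layer sizes, depth, optimizer, and learning rate are assumptions, not the notebook's exact architecture:

```python
from tensorflow import keras
from tensorflow.keras import layers

def get_model(width=128, height=128, depth=64):
    """Build a small 3D CNN for binary CT-scan classification."""
    inputs = keras.Input((width, height, depth, 1))

    x = layers.Conv3D(64, kernel_size=3, activation="relu")(inputs)
    x = layers.MaxPool3D(pool_size=2)(x)
    x = layers.BatchNormalization()(x)

    x = layers.Conv3D(64, kernel_size=3, activation="relu")(x)
    x = layers.MaxPool3D(pool_size=2)(x)
    x = layers.BatchNormalization()(x)

    x = layers.GlobalAveragePooling3D()(x)
    x = layers.Dense(512, activation="relu")(x)
    x = layers.Dropout(0.3)(x)

    # Single sigmoid unit: P(abnormal) for binary classification.
    outputs = layers.Dense(1, activation="sigmoid")(x)
    return keras.Model(inputs, outputs, name="3dcnn")

model = get_model()
model.compile(
    loss="binary_crossentropy",
    optimizer=keras.optimizers.Adam(learning_rate=1e-4),
    metrics=["acc"],
)
```

Binary cross-entropy pairs with the single sigmoid output; accuracy is tracked because, as noted below, the balanced validation set makes it a fair metric.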

Visualizing model performance

Here the model accuracy and loss for the training and the validation sets are plotted. Since the validation set is class-balanced, accuracy provides an unbiased representation of the model's performance.
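The plotting step can be sketched like this; the `history` dict mimics the shape of a Keras `History.history`, and the numbers in it are placeholders, not real training results:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs outside a notebook
import matplotlib.pyplot as plt

# Placeholder metrics in the shape Keras' model.fit(...) returns.
history = {
    "acc": [0.55, 0.62, 0.70], "val_acc": [0.52, 0.60, 0.66],
    "loss": [0.69, 0.61, 0.52], "val_loss": [0.70, 0.64, 0.58],
}

fig, ax = plt.subplots(1, 2, figsize=(12, 4))
for i, metric in enumerate(["acc", "loss"]):
    ax[i].plot(history[metric], label="train")
    ax[i].plot(history["val_" + metric], label="validation")
    ax[i].set_title("Model " + metric)
    ax[i].set_xlabel("epoch")
    ax[i].legend()
```

Plotting train and validation curves side by side makes divergence (overfitting) easy to spot.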

Make predictions on a single CT scan
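Since the model ends in a single sigmoid unit, its output for one scan can be turned into the two class scores; the helper name and the example output value below are hypothetical:

```python
def scan_class_scores(sigmoid_output):
    """Convert a single sigmoid output for one CT scan into
    {normal, abnormal} probabilities."""
    abnormal = float(sigmoid_output)
    return {"normal": 1.0 - abnormal, "abnormal": abnormal}

# Hypothetical model output for one scan; in the notebook this would come
# from something like model.predict(...) on a single expanded volume.
scores = scan_class_scores(0.83)
```

The two scores sum to 1, so they can be reported directly as the model's confidence in each class.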

Create Model Artifact

Saving Model in the Model Catalog