Fraud Detection Classification

Importing Packages

Getting data from Object Storage

To read the data from Object Storage, follow these steps:

  1. Go to the OCI Console.

    a. Click the profile icon in the top-right corner, then click "oracleidentitycloudservice/".

    b. Under Resources, click API Keys.

    c. If you don't already have an API key, create one:

     - Select the Generate API Key Pair radio button.
     - Copy the contents of the Configuration File Preview.
     - Download the private key.
     - Click Add.

  2. Return to the Data Science platform and open a terminal.

    a. Inside the /home/datascience directory, create a .oci directory (mkdir .oci).

    b. Inside .oci, create an empty config file (touch config) and an empty private_key.pem file (touch private_key.pem).

(For the following steps, use a Unix text editor; we used vi.)

  3. Copy the contents of the private key you downloaded earlier into private_key.pem.
  4. Copy the contents of the Configuration File Preview into the config file inside .oci. Replace "< path to your private keyfile > #TODO" with /home/datascience/.oci/private_key.pem.

  5. Use the ADS DatasetFactory in your notebook to load your data. The code is provided below.
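Steps 3 and 4 can also be scripted instead of edited by hand in vi. The sketch below is a minimal example, assuming you have pasted the Configuration File Preview and the private key contents into Python strings. PREVIEW holds placeholder values, not real OCIDs, and write_oci_config is a hypothetical helper, not part of ADS or the OCI SDK. It also restricts both files to owner-only permissions (0o600), a common precaution for private keys.

```python
import os
import re
import tempfile
from pathlib import Path

# Placeholder preview text: replace with what the OCI Console shows you.
PREVIEW = """[DEFAULT]
user=ocid1.user.oc1..example
fingerprint=aa:bb:cc:dd
tenancy=ocid1.tenancy.oc1..example
region=us-ashburn-1
key_file=< path to your private keyfile > #TODO
"""


def write_oci_config(preview_text: str, private_key_pem: str, oci_dir: Path) -> Path:
    """Write oci_dir/private_key.pem and oci_dir/config, pointing key_file at the key."""
    oci_dir.mkdir(parents=True, exist_ok=True)

    key_path = oci_dir / "private_key.pem"
    key_path.write_text(private_key_pem)
    os.chmod(key_path, 0o600)  # owner read/write only

    # Swap the whole key_file line (placeholder and #TODO comment) for the real path.
    config_text = re.sub(
        r"(?m)^key_file=.*$", lambda _m: f"key_file={key_path}", preview_text
    )
    config_path = oci_dir / "config"
    config_path.write_text(config_text)
    os.chmod(config_path, 0o600)
    return config_path


if __name__ == "__main__":
    # Demo in a throwaway directory; in the notebook use Path("/home/datascience/.oci").
    with tempfile.TemporaryDirectory() as tmp:
        cfg = write_oci_config(PREVIEW, "dummy key contents\n", Path(tmp) / ".oci")
        print(cfg.read_text())
```

Replacing the entire key_file line, rather than matching the exact placeholder text, keeps the helper working even if the spacing of the placeholder in the preview differs slightly.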

Exploratory Data Analysis using the Oracle Accelerated Data Science (ADS) SDK

Our EDA shows that the data is highly unbalanced: the ADS show_in_notebook() method warns that Class has 284315 (99.83%) zeros, and our visualization confirms this. The next step is therefore to preprocess the unbalanced data so that the model is as little biased as possible toward the majority (non-fraud) class.
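The imbalance that show_in_notebook() reports can also be checked by counting the labels directly. A minimal standard-library sketch, run here on toy labels rather than the real dataset; class_distribution is a hypothetical helper:

```python
from collections import Counter


def class_distribution(labels):
    """Return {label: (count, percentage)} for a sequence of class labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, 100.0 * n / total) for label, n in sorted(counts.items())}


# Toy labels (not the real data): 998 normal (0) and 2 fraud (1) transactions.
labels = [0] * 998 + [1] * 2
for label, (n, pct) in class_distribution(labels).items():
    print(f"Class {label}: {n} ({pct:.2f}%)")
# Class 0: 998 (99.80%)
# Class 1: 2 (0.20%)
```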

Preprocessing - Oversampling

We convert the ADS dataset (ds) to a pandas DataFrame so that we can preprocess it with other packages.

As the output below shows, the fraud and normal classes now have the same number of instances.
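The text does not say which oversampling method was used (libraries such as imbalanced-learn also offer synthetic methods like SMOTE). The sketch below shows the simplest variant, random oversampling with replacement, using only the standard library: minority-class rows are duplicated at random until every class matches the largest one. random_oversample is a hypothetical helper, not the notebook's code.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample with replacement from this class's existing rows.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy usage: 8 normal rows and 2 fraud rows become 8 and 8.
rows = [[float(i)] for i in range(10)]
labels = [0] * 8 + [1] * 2
X, y = random_oversample(rows, labels)
print(dict(Counter(y)))  # {0: 8, 1: 8}
```

Because the new fraud rows are exact copies, the model sees no genuinely new fraud examples; synthetic methods such as SMOTE interpolate new points instead.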

Model Training and Testing using AutoML

Splitting the Data (Test/Train)

The balanced dataset is then split into training and test sets. The training set is used by AutoML to train several different ML models.
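The split ratio and helper used in the notebook are not stated. A minimal standard-library sketch of a stratified split, assuming an 80/20 ratio; stratified_split is a hypothetical helper. Each class is shuffled and split separately, so both sets keep the same class balance.

```python
import random


def stratified_split(rows, labels, test_fraction=0.2, seed=0):
    """Split (row, label) pairs into train and test lists, shuffling each
    class separately so both sets keep roughly the same class balance."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    train, test = [], []
    for label, members in by_class.items():
        members = members[:]  # copy so the caller's data is not reordered
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend((row, label) for row in members[:n_test])
        train.extend((row, label) for row in members[n_test:])

    rng.shuffle(train)
    rng.shuffle(test)
    return train, test


# Toy usage: 100 rows, already balanced 50/50 as after oversampling.
train, test = stratified_split(list(range(100)), [0] * 50 + [1] * 50)
print(len(train), len(test))  # 80 20
```

One design note: when the split happens after oversampling, copies of the same fraud row can end up in both sets, which can inflate test scores. Splitting first and oversampling only the training set avoids this.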

AutoML trains several models on the training set and returns the best-performing one.
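Oracle AutoML's pipeline does considerably more than this, but its core idea (fit several candidate models, score each on held-out data, keep the best) can be sketched in a few lines. In this sketch the two candidate models, the F1 metric, and all helper names are assumptions chosen for illustration; none of it calls the AutoML API.

```python
from statistics import mean


def fit_majority(X, y):
    """Baseline: always predict the most frequent class."""
    majority = max(set(y), key=y.count)
    return lambda x: majority


def fit_nearest_centroid(X, y):
    """Predict the class whose mean feature vector is closest (squared Euclidean)."""
    centroids = {
        c: [mean(col) for col in zip(*[x for x, lab in zip(X, y) if lab == c])]
        for c in set(y)
    }

    def predict(x):
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    return predict


def f1(y_true, y_pred, positive=1):
    """F1 score for the positive (fraud) class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def select_best(candidates, X_train, y_train, X_val, y_val):
    """Fit every candidate, score it on held-out data, and return the winner."""
    scores = {}
    for name, fit in candidates.items():
        model = fit(X_train, y_train)
        scores[name] = f1(y_val, [model(x) for x in X_val])
    return max(scores, key=scores.get), scores


CANDIDATES = {"majority": fit_majority, "nearest_centroid": fit_nearest_centroid}
X_train = [[0.0], [0.5], [1.0], [1.5], [5.0], [6.0]]
y_train = [0, 0, 0, 0, 1, 1]
X_val, y_val = [[0.2], [5.8]], [0, 1]

best, scores = select_best(CANDIDATES, X_train, y_train, X_val, y_val)
print(best, scores)  # nearest_centroid {'majority': 0.0, 'nearest_centroid': 1.0}
```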

Visualization and Accuracy

Below we show the performance of the different algorithms that were trained, along with the results of each stage of the AutoML pipeline.

Model Evaluation and Explanation using ADS

ADS provides thorough model evaluation and explanation APIs through the ADSEvaluator and ADSExplainer objects. With ADS you can evaluate a model and explain its behavior both globally (across the whole dataset) and locally (for an individual prediction). Some demonstrations are shown below.

Model Evaluation
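Because the classes were so unbalanced, accuracy alone can be misleading: on the original data, a model that always predicts "normal" would be 99.83% accurate while catching no fraud. The sketch below computes the usual alternatives (precision, recall and F1 for the fraud class) from confusion-matrix counts, using only the standard library and toy labels. It does not call ADSEvaluator, whose metric names and output format may differ; metrics is a hypothetical helper.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) with `positive` as the fraud label."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def metrics(y_true, y_pred, positive=1):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy example: 998 normal, 2 fraud, and a model that always predicts "normal".
y_true = [0] * 998 + [1] * 2
always_normal = [0] * 1000
print(metrics(y_true, always_normal))
# {'accuracy': 0.998, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
```

For fraud detection, recall (the share of fraud cases caught) is usually the number to watch, traded off against precision (how many flagged transactions are really fraud).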

Model Explanation
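One common global explanation technique is permutation feature importance: shuffle one feature column, re-score the model, and measure how much the score drops. The standard-library sketch below illustrates the idea on a hypothetical toy model; it does not call ADSExplainer and is not necessarily what ADSExplainer computes.

```python
import random


def accuracy(model, X, y):
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Average drop in accuracy when each feature column is shuffled on its own."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild rows with only column j permuted; other features untouched.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def threshold_model(x):
    """Hypothetical toy model that only looks at feature 0."""
    return int(x[0] > 0.5)


X = [[i / 20, (i * 7 % 20) / 20] for i in range(20)]
y = [threshold_model(x) for x in X]
print(permutation_importance(threshold_model, X, y))  # feature 1 scores exactly 0.0
```

Feature 1 gets zero importance because the model never reads it, so shuffling it cannot change any prediction; feature 0 gets a positive score.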

Saving ML model to Model Catalog

What is Oracle Model Catalog?

Oracle Model Catalog is a feature of the OCI Data Science service that lets you store your model in a ready-to-deploy state, accessible from the OCI Console. To save your model to the catalog, you first need to package it as a model artifact: a directory containing the trained model together with the files it depends on.
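To make the idea of a model artifact concrete, the sketch below builds a minimal artifact directory with the standard library: a pickled model plus a score.py that exposes load_model() and predict(), the two functions OCI Data Science expects score.py to define. Everything else is an assumption for illustration: the threshold "model", the file name model.pkl, and the helpers build_artifact and load_score_module are hypothetical. Other files that ADS generates, such as runtime.yaml, are omitted, and nothing here saves to the catalog.

```python
import importlib.util
import pickle
import tempfile
from pathlib import Path

# Hypothetical "model": a threshold rule stored as plain data so it pickles cleanly.
MODEL = {"feature": 0, "threshold": 0.5}

SCORE_PY = '''\
import os
import pickle


def load_model():
    """Load the serialized model that sits next to this file."""
    here = os.path.dirname(os.path.realpath(__file__))
    with open(os.path.join(here, "model.pkl"), "rb") as f:
        return pickle.load(f)


def predict(data, model=None):
    """Score a list of feature rows; 1 means fraud."""
    model = model or load_model()
    f, t = model["feature"], model["threshold"]
    return {"prediction": [int(row[f] > t) for row in data]}
'''


def build_artifact(model, artifact_dir: Path) -> Path:
    """Write model.pkl and score.py into artifact_dir."""
    artifact_dir.mkdir(parents=True, exist_ok=True)
    with open(artifact_dir / "model.pkl", "wb") as f:
        pickle.dump(model, f)
    (artifact_dir / "score.py").write_text(SCORE_PY)
    return artifact_dir


def load_score_module(artifact_dir: Path):
    """Import score.py from the artifact directory as a local smoke test."""
    spec = importlib.util.spec_from_file_location("score", artifact_dir / "score.py")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        score = load_score_module(build_artifact(MODEL, Path(tmp) / "artifact"))
        print(score.predict([[0.1, 3.0], [0.9, 1.0]]))  # {'prediction': [0, 1]}
```

Importing score.py from the artifact directory before saving is a cheap way to catch missing files or broken paths locally.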

Model Saving

Here you can see that the model has been saved under "Models" in the same project:

[Figure: model_catalog.png, the saved model listed under "Models" in the project]

Model deployment is explained on the website.