Algorithm

The algorithm.py file should follow this schema:

import logging
from typing import Any, Dict, List

from pandas import DataFrame

from plugins.common.algorithm import Algorithm

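# Module-level logger for this plugin
logger = logging.getLogger(__name__)


class FooBarAlgorithm(Algorithm):
    def __init__(self, algo_data: Dict[str, Any]):
        super().__init__(algo_data)

    def preprocess(self) -> str:
        # Pre-process the training data.
        # Return "" on success, or an error message to show the user.
        pass

    def train(self) -> str:
        # Train the model and store it with self.set_data_value().
        # Return "" on success, or an error message to show the user.
        pass

    def predict(self, prefix: List[dict]) -> dict:
        # Predict the result for the given prefix (a list of events)
        pass

    def predict_df(self, df: DataFrame) -> dict:
        # Predict the result for every case in the DataFrame, keyed by case ID
        pass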

The following sections explain each method in detail.

preprocess

The preprocess function prepares the training dataframe. PrCore calls this function when training data is available.

Use self.get_df() to get the original training dataframe. PrCore renames its columns as described below.

The column name can be one of the following:

CASE_ID
ACTIVITY
TIMESTAMP
START_TIMESTAMP
END_TIMESTAMP
RESOURCE
DURATION
COST
OUTCOME
TREATMENT
TREATMENT_RESOURCE

TREATMENT_RESOURCE is the first resource that triggered the treatment in a case. If a case has no treatment, the value is empty.

If the column is not in the above list but is marked as a case attribute, it is renamed to CASE_ATTRIBUTE_<data_type>_<attribute_name>. For example, a case attribute named age with the NUMBER data type becomes CASE_ATTRIBUTE_NUMBER_age.

The data type can be one of the following:

TEXT
NUMBER
BOOLEAN
DATETIME
CATEGORICAL

If the column is neither in the above list nor a case attribute, it is renamed to EVENT_ATTRIBUTE_<data_type>_<attribute_name>.
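Because attribute columns follow a fixed pattern, you can split a column name back into its parts. The helper below is a hypothetical sketch, not part of PrCore. It assumes the prefix and the data type are always separated from the attribute name by a single underscore, as in the examples above.

```python
from typing import NamedTuple, Optional

# Data types listed in the documentation above
DATA_TYPES = ("TEXT", "NUMBER", "BOOLEAN", "DATETIME", "CATEGORICAL")


class AttributeColumn(NamedTuple):
    scope: str      # "CASE" or "EVENT"
    data_type: str  # one of DATA_TYPES
    name: str       # original attribute name


def parse_attribute_column(column: str) -> Optional[AttributeColumn]:
    """Split a renamed attribute column into scope, data type, and name.

    Returns None for fixed columns such as CASE_ID or ACTIVITY.
    """
    for scope in ("CASE", "EVENT"):
        prefix = f"{scope}_ATTRIBUTE_"
        if not column.startswith(prefix):
            continue
        rest = column[len(prefix):]
        for data_type in DATA_TYPES:
            if rest.startswith(data_type + "_"):
                # Everything after "<data_type>_" is the attribute name,
                # which may itself contain underscores
                return AttributeColumn(scope, data_type, rest[len(data_type) + 1:])
    return None
```

For example, parse_attribute_column("CASE_ATTRIBUTE_NUMBER_age") returns AttributeColumn(scope='CASE', data_type='NUMBER', name='age'), while a fixed column such as ACTIVITY returns None.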

Use self.get_parameter_value(key) to get the value of a parameter. Parameters can come from the config.py file or be set by the user. The key should be a string.

Similarly, use self.get_additional_info_value(key) to get the value of an additional information entry, which the user can set. The key should be a string.

With this information, you can prepare the training data for your algorithm.

If you need to persist any data, use self.set_data_value(key, value). The key should be a string. The value can be of any type, as long as it can be serialized with pickle. The data is stored after the algorithm is trained.

You can read stored data with self.get_data()[key].

The return value is a string. An empty string means the algorithm is ready to be trained. A non-empty string is treated as an error message and shown to the user.
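The following is a minimal sketch of how these helpers fit together in preprocess. PrCore's Algorithm class is not available outside the framework, so the sketch uses StubAlgorithm. This is a hypothetical stand-in that mimics only get_df, get_parameter_value, set_data_value, and get_data as described above. It is not the real base class. The min_case_length parameter and the "traces" key are also hypothetical, and the sketch assumes the dataframe has a TIMESTAMP column.

```python
import pickle
from typing import Any, Dict, List

import pandas as pd


class StubAlgorithm:
    """Hypothetical stand-in for PrCore's Algorithm base class.

    Mimics only the helpers described above so the sketch runs on its own.
    """

    def __init__(self, df: pd.DataFrame, parameters: Dict[str, Any]):
        self._df = df
        self._parameters = parameters
        self._data: Dict[str, Any] = {}

    def get_df(self) -> pd.DataFrame:
        return self._df

    def get_parameter_value(self, key: str) -> Any:
        return self._parameters[key]

    def set_data_value(self, key: str, value: Any) -> None:
        pickle.dumps(value)  # raise early if the value cannot be pickled
        self._data[key] = value

    def get_data(self) -> Dict[str, Any]:
        return self._data


class FooBarAlgorithm(StubAlgorithm):
    def preprocess(self) -> str:
        df = self.get_df()
        missing = {"CASE_ID", "ACTIVITY", "TIMESTAMP"} - set(df.columns)
        if missing:
            # A non-empty string is shown to the user as an error message
            return f"Missing columns: {', '.join(sorted(missing))}"

        # "min_case_length" is a hypothetical parameter used for illustration
        min_length = self.get_parameter_value("min_case_length")

        # Build one activity sequence (trace) per case, ordered by time
        df = df.sort_values(["CASE_ID", "TIMESTAMP"])
        traces: List[List[str]] = [
            group["ACTIVITY"].tolist() for _, group in df.groupby("CASE_ID")
        ]
        traces = [trace for trace in traces if len(trace) >= min_length]
        if not traces:
            return "No case is long enough for training."

        self.set_data_value("traces", traces)
        return ""  # empty string: ready to be trained
```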

train

When preprocess finishes, PrCore calls the train function to train the algorithm.

Store your trained model with self.set_data_value(key, value), for example self.set_data_value("model", model).

PrCore takes care of persistence and loads the data when needed. Following the previous example, you can fetch the model with self.get_data()["model"].

The return value is a string. An empty string means the algorithm is ready to be used. A non-empty string is treated as an error message and shown to the user.
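As an illustration, train could fit a toy model that maps each activity to its most frequent successor and store it under "model". In this sketch, StubAlgorithm is a hypothetical stand-in for PrCore's base class that mimics only set_data_value and get_data. The sketch assumes that preprocess saved a list of activity sequences under the hypothetical key "traces". The model itself is for demonstration only.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, List


class StubAlgorithm:
    """Hypothetical stand-in for the PrCore data helpers used below."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def set_data_value(self, key: str, value: Any) -> None:
        self._data[key] = value

    def get_data(self) -> Dict[str, Any]:
        return self._data


class FooBarAlgorithm(StubAlgorithm):
    def train(self) -> str:
        # Assumes preprocess stored activity sequences under "traces"
        traces: List[List[str]] = self.get_data().get("traces", [])
        if not traces:
            return "No training data available."

        # Count which activity directly follows each activity
        followers: Dict[str, Counter] = defaultdict(Counter)
        for trace in traces:
            for current, following in zip(trace, trace[1:]):
                followers[current][following] += 1

        # Toy model: each activity maps to its most frequent successor
        model = {act: counts.most_common(1)[0][0] for act, counts in followers.items()}
        self.set_data_value("model", model)
        return ""  # empty string: ready to be used
```

Later, predict can read the model back with self.get_data()["model"].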

predict

When a prefix needs to be predicted, PrCore calls the predict function.

The input is a list of dictionaries. Each dictionary has the following schema:

{
    "column_name": "value"
}

column_name is the name of the attribute and follows the same naming convention described in the preprocess section. The list is ordered by event timestamp.
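Inside predict, it can be convenient to turn the prefix back into a DataFrame so you can reuse feature code written for the training data. The helper below is hypothetical and is not a PrCore API. It assumes that timestamps arrive as ISO 8601 strings, which the schema above does not guarantee.

```python
from typing import Any, Dict, List

import pandas as pd


def prefix_to_df(prefix: List[Dict[str, Any]]) -> pd.DataFrame:
    """Turn a predict() prefix into a DataFrame with the training column names."""
    df = pd.DataFrame(prefix)
    if "TIMESTAMP" in df.columns:
        # The list is already ordered by timestamp; parsing the column
        # lets date-based feature code from preprocess work unchanged
        df["TIMESTAMP"] = pd.to_datetime(df["TIMESTAMP"])
    return df


# Example prefix with two events of the same case
prefix = [
    {"CASE_ID": "c1", "ACTIVITY": "A", "TIMESTAMP": "2024-01-01T09:00:00"},
    {"CASE_ID": "c1", "ACTIVITY": "B", "TIMESTAMP": "2024-01-01T10:30:00"},
]
```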

The predict function should return a dictionary with the following schema:

{
    "date": datetime.now().isoformat(),
    "type": self.get_basic_info()["prescription_type"],
    "output": "can be any json serializable data, can be null",
    "plugin": {
        "name": self.get_basic_info()["name"],
        "model": "Model name",
        "accuracy": 0.8
    }
}

The date, type, output, and plugin fields, as well as plugin.name, are required. You can add other fields to plugin if you want. It is recommended to add metric fields to plugin to help users understand the plugin's performance.

Always return a dictionary, even if no prediction is available. In that case, set the output field to null, or use the get_null_output function to get a null output.
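A small helper can assemble this structure. In the sketch below, build_prediction and predict_next_activity are hypothetical names. The basic_info argument stands in for self.get_basic_info(), and the model is assumed to be a dictionary that maps an activity to its predicted successor. Passing None as output produces null once the result is serialized to JSON.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional


def build_prediction(
    basic_info: Dict[str, Any],
    output: Optional[Any],
    model_name: str,
    **metrics: float,
) -> Dict[str, Any]:
    """Assemble a result in the schema expected from predict()."""
    return {
        "date": datetime.now().isoformat(),
        "type": basic_info["prescription_type"],
        "output": output,  # None becomes null in JSON
        "plugin": {
            "name": basic_info["name"],
            "model": model_name,
            **metrics,  # optional metric fields, e.g. accuracy=0.8
        },
    }


def predict_next_activity(
    model: Dict[str, str],
    prefix: List[Dict[str, Any]],
    basic_info: Dict[str, Any],
) -> Dict[str, Any]:
    """Hypothetical predict body: still returns a dict when no prediction exists."""
    last_activity = prefix[-1]["ACTIVITY"] if prefix else None
    return build_prediction(basic_info, model.get(last_activity), "Most frequent successor")
```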

predict_df

The predict_df function makes predictions for a whole dataframe. The input is a dataframe with the same schema as the training dataframe. The output should be a dictionary with the following schema:

{
    "case_id_1": {
        "date": datetime.now().isoformat(),
        "type": self.get_basic_info()["prescription_type"],
        "output": "can be any json serializable data, can be null",
        "plugin": {
            "name": self.get_basic_info()["name"],
            "model": "Model name",
            "accuracy": 0.8
        }
    },
    "case_id_2": {
        "date": datetime.now().isoformat(),
        "type": self.get_basic_info()["prescription_type"],
        "output": "can be any json serializable data, can be null",
        "plugin": {
            "name": self.get_basic_info()["name"],
            "model": "Model name",
            "accuracy": 0.8
        }
    }
}

Each key of the dictionary is a case ID, and each value has the same schema as the output of the predict function.

Always return a dictionary, even if no prediction is available. In that case, set the value for that case to an empty dictionary, {}.
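The sketch below shows one way to implement predict_df by grouping the dataframe by CASE_ID. It is written as a standalone function in which model stands in for self.get_data()["model"] and basic_info stands in for self.get_basic_info(). The toy successor model is hypothetical. Case IDs are converted to strings so the result can be serialized as a JSON object. This is an assumption, not a documented requirement.

```python
from datetime import datetime
from typing import Any, Dict

import pandas as pd


def predict_df(
    df: pd.DataFrame,
    model: Dict[str, str],
    basic_info: Dict[str, Any],
) -> Dict[str, Any]:
    """Sketch of predict_df: one result per case, keyed by case ID."""
    results: Dict[str, Any] = {}
    # Sort so that each case's events are in time order within its group
    ordered = df.sort_values(["CASE_ID", "TIMESTAMP"])
    for case_id, events in ordered.groupby("CASE_ID"):
        next_activity = model.get(events["ACTIVITY"].iloc[-1])
        if next_activity is None:
            results[str(case_id)] = {}  # no prediction available for this case
            continue
        results[str(case_id)] = {
            "date": datetime.now().isoformat(),
            "type": basic_info["prescription_type"],
            "output": next_activity,
            "plugin": {"name": basic_info["name"], "model": "Most frequent successor"},
        }
    return results
```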