Getting Started with FastScore v1.9

This is a guide for installing and running FastScore. It contains instructions for first-time and novice users, as well as reference instructions for common tasks. This guide was last updated for v1.9 of FastScore.

If you need support or have questions, please email us: support@opendatagroup.com

Contents

  1. Installing FastScore
    1. Prerequisites
    2. Installing the FastScore CLI
    3. Start FastScore Microservices Suite with Docker Swarm (Recommended)
  2. Configuring and Starting FastScore
  3. Using the FastScore Dashboard
  4. Working with Models and Streams
    1. Creating and Loading Models
    2. Models in Python and R
    3. Input and Output Schema
    4. Input and Output Streams
    5. Engine Parameters
    6. Running a Model in FastScore

Installing FastScore

This guide will walk you through installing and running Open Data Group’s FastScore microservices suite. The following instructions generally assume that you’re working on a Linux machine. There are slight differences if you’re running FastScore on MacOS, which are indicated by a special note. The differences for Windows have not yet been fully documented.

Prerequisites

The FastScore Microservices Suite is hosted on DockerHub (https://hub.docker.com/u/fastscore/). As such, one must first install Docker. For example, on Ubuntu Linux:

$ sudo apt-get install docker.io

It’s also useful (recommended but not mandatory) to have Docker Compose installed. Installation instructions can be found here: docs.docker.com/compose/install/.

On MacOS, Docker actually runs inside a virtual machine (see Docker’s documentation: https://docs.docker.com/machine/). To make sure all of the ports and IP addresses are handled correctly, you’ll need to run the commands from inside this virtual machine. To create and start a virtual machine named “default”, use the following command:

$ docker-machine create --driver=virtualbox default

This uses VirtualBox as the driver for the virtual machine. If you don’t have it already, download the VirtualBox client, which you can use to manage the docker-machine; among other things, it can set up port forwarding for the virtual machine, which may be needed later. To point your shell’s Docker commands at the default virtual machine, use the following command:

$ eval $(docker-machine env default)

The virtual machine’s IP address can be retrieved with the docker-machine ip command, e.g.,

$ docker-machine ip
192.168.99.100

This IP address should be used as the FastScore host machine IP address.

Once Docker has been installed, there are only a few steps needed to get FastScore running.

  1. Install the FastScore CLI.
  2. Start the FastScore services, either manually or via Docker Swarm (recommended).
  3. Write a FastScore configuration file and apply it with the FastScore CLI.
  4. Connect to the FastScore Dashboard with your browser.

Let’s go through each step carefully.

Installing the FastScore Command-Line Interface (CLI)

The FastScore CLI can be downloaded and installed using the following command:

pip install fastscore-cli

This will install the required dependencies. The FastScore CLI is a Python tool, so it doesn’t need to be compiled, and the setup script should automatically add the CLI to $PATH.

python-pip, python-setuptools and python-dev (i.e. header files) are required to properly install the FastScore CLI. These may or may not be already present on your system. If not, you will need to install them. For example:

$ sudo apt-get install python-pip python-dev python-setuptools

Once you’ve installed the FastScore CLI, check that it works by executing the following command in your terminal. Also see FastScore Command Line Interface for more information on subcommands.

$ fastscore help
FastScore CLI v1.9.1
Usage: fastscore <command> [<subcommand> ...]
Available commands:
  help             Explain commands and options
  connect          Establish a FastScore connection
  login            Authenticate a FastScore connection
  config           Configure the FastScore fleet
  fleet            Examine status of the FastScore fleet
  use              Select the target instance
  run              Run easy model setups
  model            Manage analytic models
  attachment       Manage model attachments
  schema           Manage Avro schemas
  snapshot         Manage model snapshots
  policy           Manage import policies
  stream           Manage streams/stream descriptors
  engine           Manage engine state
  sensor           Manage sensors/sensor descriptors
  stats            Show assorted statistics
  debug            Watch debugging messages
  profile          Profile internal operations
  pneumo           Access Pneumo messages
  monitor          Monitor data processing
Run 'fastscore help <command>' to get more details on <command> usage

This displays a list of all of the FastScore CLI commands.

Start FastScore Microservices Suite with Docker Swarm (Recommended)

We will be referencing files from the Getting-Started repository on GitHub. This setup uses Docker Swarm to bring up the FastScore containers.

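Items in the repo include:

  1. Makefile, with the deploy and stop targets used below
  2. docker-compose.yaml, which describes the FastScore containers
  3. setup.sh, which connects the CLI to FastScore and applies the configuration
  4. config.yaml, the FastScore configuration file
  5. load.sh, which loads sample models, attachments, schemas, and streams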

To quickly start FastScore, run the following from the directory containing the Makefile:

make deploy

To quickly tear down FastScore:

make stop

First, let’s look at the containers described in the docker-compose.yaml file that are now up and running.

View all Docker containers that are running with the docker ps command. The output should look something like this:

CONTAINER ID        IMAGE                              COMMAND                  CREATED             STATUS              PORTS                                    NAMES
14bfaee9100e        fastscore/model-manage-mysql:1.9   "sh -c 'cd /var/lib/…"   About an hour ago   Up About an hour                                             fs-vanilla_database.1.vb7bticmnpauo3hz6bor4r8qx
ada9681679e8        fastscore/engine:1.9               "/fastscore/startup.…"   About an hour ago   Up About an hour                                             fs-vanilla_engine-2.1.ubonr83uuaqw2en4yk3lsc9w0
880b6882742d        fastscore/engine:1.9               "/fastscore/startup.…"   About an hour ago   Up About an hour                                             fs-vanilla_engine-1.1.or1ocif21hfoej8xxfo5qykbm
cb67e0090cf2        fastscore/connect:1.9              "bin/connect"            About an hour ago   Up About an hour                                             fs-vanilla_connect.1.wxswv0g8hojps2f2c0i5lc96l
318eedda837a        fastscore/dashboard:1.9            "npm --no-update-not…"   About an hour ago   Up About an hour                                             fs-vanilla_dashboard.1.v7ia7we8qrwpyeawkk7jzg3se
b89be509f74d        fastscore/kafka:1.9                "/entry.sh"              About an hour ago   Up About an hour    2181/tcp, 2888/tcp, 3888/tcp, 9092/tcp   fs-vanilla_kafka.1.w1tc6grz7iil56x2b8pcdbjal
84b4f75fccda        fastscore/model-manage:1.9         "bin/model_manage"       About an hour ago   Up About an hour                                             fs-vanilla_model-manage.1.o750isigzx7v17kwddn17nmzl

Configuring and Starting FastScore

Next, let’s take a look at the setup.sh file that is referenced in the Makefile.

fastscore connect https://localhost:8000
fastscore config set config.yaml
fastscore fleet -wait

The fastscore connect command tells the CLI how to reach FastScore: the host it is running on and the port of the Dashboard. In this example, we use localhost, and the Dashboard is listening on port 8000.

FastScore’s microservices architecture requires each microservice component to communicate with other components. These communications are managed by the Connect microservice. For Connect to do its job, it needs information about the other microservice components, which is supplied in a configuration file (config.yaml in setup.sh above). A sample configuration file is shown below:

fastscore:
  fleet:
    - api: model-manage
      host: model-manage
      port: 8002
    - api: engine
      host: engine-1
      port: 8003
    - api: engine
      host: engine-2
      port: 8003

  db:
    type: mysql
    host: database
    port: 3306
    username: root
    password: root

  pneumo:
    type: REST
    #type: kafka
    #bootstrap:
    #  - kafka:9092
    #topic: notify

Configuration files are written in YAML. The configuration file above specifies the hosts and ports of the Model Manage container, the MySQL database container used by Model Manage, and two Engine containers, all hosted on the same machine. Additionally, it configures Pneumo, the asynchronous notification library used by FastScore, to communicate via REST; the commented-out lines show the settings for using Kafka instead.
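Because a typo in config.yaml usually only shows up later as an unhealthy fleet, it can be worth sanity-checking the file before running fastscore config set. The sketch below is a hypothetical helper, not part of FastScore: it assumes the YAML has already been parsed into Python dicts and lists (for example with the third-party PyYAML package's yaml.safe_load), and its rules (known API names, Pneumo types, port range) are inferred from the sample configuration above, so FastScore may accept values it flags.

```python
# Hypothetical sanity check for a FastScore configuration that has already
# been parsed into Python dicts/lists. The rules are inferred from the
# sample configuration in this guide, not from FastScore's own validation.
KNOWN_APIS = {"model-manage", "engine"}   # APIs used in the sample fleet
KNOWN_PNEUMO_TYPES = {"REST", "kafka"}    # active and commented-out options

SAMPLE_CONFIG = {
    "fastscore": {
        "fleet": [
            {"api": "model-manage", "host": "model-manage", "port": 8002},
            {"api": "engine", "host": "engine-1", "port": 8003},
            {"api": "engine", "host": "engine-2", "port": 8003},
        ],
        "db": {"type": "mysql", "host": "database", "port": 3306,
               "username": "root", "password": "root"},
        "pneumo": {"type": "REST"},
    }
}


def check_config(config):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    fs = config.get("fastscore") or {}
    fleet = fs.get("fleet") or []
    if not fleet:
        problems.append("fastscore.fleet is empty")
    seen = set()
    for i, entry in enumerate(fleet):
        api, host, port = entry.get("api"), entry.get("host"), entry.get("port")
        if api not in KNOWN_APIS:
            problems.append(f"fleet[{i}]: unexpected api {api!r}")
        if not isinstance(host, str) or not host:
            problems.append(f"fleet[{i}]: missing host")
        if not isinstance(port, int) or not 0 < port < 65536:
            problems.append(f"fleet[{i}]: bad port {port!r}")
        elif isinstance(host, str) and (host, port) in seen:
            problems.append(f"fleet[{i}]: duplicate {host}:{port}")
        if isinstance(host, str) and isinstance(port, int):
            seen.add((host, port))
    pneumo_type = (fs.get("pneumo") or {}).get("type")
    if pneumo_type not in KNOWN_PNEUMO_TYPES:
        problems.append(f"pneumo: unexpected type {pneumo_type!r}")
    return problems


print(check_config(SAMPLE_CONFIG))  # []
```

The two API names and the REST/kafka Pneumo types are the only values this guide shows; extend KNOWN_APIS and KNOWN_PNEUMO_TYPES if your deployment uses others.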

We can then check the status of our containers using the fleet command:

$ fastscore fleet -wait
Name            API           Health
--------------  ------------  --------
engine-1        engine        ok
engine-2        engine        ok
model-manage-1  model-manage  ok
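If you script your deployment, you may want to check this output programmatically rather than by eye. The following is a minimal sketch that assumes the three-column Name/API/Health layout shown above and treats ok as the only healthy value; parse_fleet, all_healthy, and fleet_status are hypothetical helpers, and fleet_status simply shells out to the same fastscore fleet -wait command used in setup.sh (capture_output requires Python 3.7+).

```python
import subprocess


def parse_fleet(output):
    """Turn the table printed by `fastscore fleet` into {name: health}.

    Assumes the three-column layout (Name, API, Health) shown in this guide."""
    status = {}
    for line in output.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("$"):
            continue  # blank line or a pasted shell prompt
        if set(stripped.replace(" ", "")) == {"-"}:
            continue  # dashed separator row
        fields = stripped.split()
        if fields[:3] == ["Name", "API", "Health"]:
            continue  # header row
        name, _api, health = fields[:3]
        status[name] = health
    return status


def all_healthy(status):
    """True only if at least one service is listed and every one reports 'ok'."""
    return bool(status) and all(health == "ok" for health in status.values())


def fleet_status():
    """Run the CLI and parse its output (requires fastscore on PATH)."""
    result = subprocess.run(["fastscore", "fleet", "-wait"],
                            capture_output=True, text=True, check=True)
    return parse_fleet(result.stdout)
```

Keeping the parsing separate from the subprocess call means parse_fleet can be tested against saved output without a running fleet.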

Finally, the load.sh script loads some models, attachments, schemas, and streams into FastScore so you can start deploying! In this example, they are stored in the MySQL database that backs Model Manage. Alternatively, you can connect Model Manage to your own GitHub repository that already contains models, attachments, streams, and schemas. Now we’re ready to start scoring.

Using the FastScore Dashboard

FastScore’s Dashboard provides a convenient user interface for reviewing engine status and managing models, schemas, sensors, and streams. However, as compared to the FastScore CLI, it requires a few additional setup steps to get things running.

First, if you are not running FastScore on your local machine (for example, if you have FastScore running on a cloud service platform), you will need to allow incoming and outgoing traffic on port 8000 (used by the FastScore Dashboard). You will also need to have configured FastScore as described in the previous section.

To access the Dashboard, point your browser at the FastScore host machine on port 8000 using HTTPS (for example, https://localhost:8000 in the setup above); it won’t work over plain HTTP. If all goes well, you will be greeted by this screen:

[Screenshot: Home Screen]

On the left-hand side of the Dashboard are four sections: engine-1, engine-2, model-manage-1, and Connect. These correspond to the two Engine microservices, the Model Manage microservice, and the Connect microservice. The green dots next to the engines and Model Manage indicate that they are running correctly. If you have configured additional engine containers, they will also appear on the side.

If, instead, you get an “Application Initialization Error,” check your configuration file for any errors, and verify that you have followed all of the FastScore CLI configuration steps. If the fastscore fleet command shows both Model Manage and your Engine containers working properly, then the problem most likely has to do with Dashboard’s proxy service or your host machine’s network traffic settings.

Working with Models and Streams

FastScore is a streaming analytic engine: its core functionality is to read in records from a data stream, score them, and write the scores to another data stream. As such, running any model consists of three steps:

  1. Loading models, streams, schemas, and sensors
  2. Setting Engine parameters
  3. Running the model

Creating and Loading Assets into FastScore Model Manage

Version 1.9 of FastScore supports models in Python, R, Java, MATLAB, PFA, PrettyPFA and C formats. Some setup steps differ slightly between Python/R models and PFA, Java, MATLAB, or C models. As a model interchange format, PFA can provide some benefits in performance, scalability, and security relative to R and Python. PrettyPFA is a human-readable equivalent to PFA. However, as the majority of users will be more familiar with R and Python, we focus on these two languages in this section.

Loading Assets

The FastScore CLI allows a user to load models directly from the command line. The list of models currently loaded in FastScore can be viewed using the model list command:

$ fastscore model list
Name    Type
------  ------
MyModel Python

Models can be added with fastscore model add <name> <file> and removed with fastscore model remove <name>. The fastscore model show <name> command displays the named model.

Models via the Dashboard

The Dashboard provides functionality to add and manage models. To upload a model, under the Models tab, select the “Upload model” button, and choose a model from your local machine. Alternatively, “select model”, depicted below, allows you to select an existing model from the model manager by name.

[Screenshot: Model Load]

Additionally, models can be added, removed, inspected, and edited from the Models tab under Model Manage:

[Screenshot: Model Load]

The screenshot above shows the Models tab under Model Manage, with an existing “auto_gbm.py” model. Models can be removed, saved, created, uploaded, or downloaded from this view. Note that after a model is created or modified in this view, it must still be selected for use from the Engine tab.

Models in Python and R

All models are added to FastScore and executed using the same CLI commands, namely:

fastscore model add <modelname> <path/to/model.extension>

Note that, to determine a model’s language, Engine requires the model file to have an appropriate extension (.py for Python, .R for R, .pfa for PFA, and .ppfa for PrettyPFA). Also, to be scored, a Python or R model must satisfy certain constraints on its form.

FastScore includes both a Python2 and a Python3 model runner. By default, .py files are interpreted as Python2 models. To load a Python3 model, either use the file extension .py3 or pass the -type:python3 option to fastscore model add:

fastscore model add -type:python3 my_py3_model path/to/model.py
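The extension rules above can be restated as a small lookup, which is handy for checking a batch of model files before adding them. This is a hypothetical helper that covers only the extensions listed in this guide (Java, MATLAB, and C are not included); the returned labels are descriptive names rather than FastScore’s internal type identifiers, and the type_python3 flag stands in for the -type:python3 option.

```python
import os

# Hypothetical restatement of the extension rules described in this guide.
# Labels are descriptive, not FastScore's internal type names.
EXTENSION_TO_LANGUAGE = {
    ".py": "Python2",     # default interpretation of .py files
    ".py3": "Python3",
    ".R": "R",
    ".pfa": "PFA",
    ".ppfa": "PrettyPFA",
}


def model_language(path, type_python3=False):
    """Return the language implied by a model file's extension.

    type_python3 mirrors passing -type:python3 to `fastscore model add`."""
    ext = os.path.splitext(path)[1]
    if ext not in EXTENSION_TO_LANGUAGE:
        raise ValueError(f"unrecognized model extension {ext!r} for {path}")
    language = EXTENSION_TO_LANGUAGE[ext]
    if type_python3 and language == "Python2":
        language = "Python3"
    return language


print(model_language("path/to/model.py"))                     # Python2
print(model_language("path/to/model.py", type_python3=True))  # Python3
```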

Python Models

Python models must declare a one-argument action() function. A minimal example of a Python model is the following:

# fastscore.input: input-schema
# fastscore.output: output-schema

def action(datum):
    yield 0

This model will produce a 0 for every input.

Additionally, Python models may declare begin() and end() functions, which are called at initialization and completion of the model, respectively. A slightly more sophisticated example of a Python model is the following:

# fastscore.input: input-schema
# fastscore.output: output-schema

try:
    import cPickle as pickle  # Python2 (the default runner for .py files)
except ImportError:
    import pickle             # Python3 runner

def begin(): # perform any initialization needed here
    global myObject
    with open('object.pkl', 'rb') as f:  # open pickles in binary mode
        myObject = pickle.load(f)
    # ... do something with the unpickled object here

def action(datum): # datum is a decoded record, e.g. {"x": 5, "y": 6}
    x = datum['x']
    y = datum['y']
    yield x + y

def end(): # perform any cleanup needed here
    pass

This model returns the sum of two numbers. Note that we are able to import Python’s standard modules, such as the pickle module. Non-default packages can also be added using Import Policies, as described here. Custom classes and packages can be loaded using attachments, as described in the Gradient Boosting Regressor tutorial.
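Before uploading a model, it can be useful to exercise its functions locally. The following sketch approximates the lifecycle described above: begin() once, action() for each record (collecting everything it yields), then end(). It is a hypothetical test driver, not the FastScore engine: it performs no schema checking or stream I/O, and it avoids Python 3-only syntax so it can be run with the same Python version as the model. A model file can be loaded with runpy.run_path, which executes the file’s top-level code and returns its globals.

```python
import runpy


def run_locally(model, records):
    """Hypothetical local driver approximating the begin/action/end lifecycle.

    `model` is a dict of the model's globals, e.g. from runpy.run_path().
    No schema checking or stream I/O is performed."""
    begin, end = model.get("begin"), model.get("end")
    if callable(begin):
        begin()
    outputs = []
    try:
        for datum in records:
            outputs.extend(model["action"](datum))  # action() is a generator
    finally:
        if callable(end):
            end()
    return outputs


def load_model(path):
    """Execute a model file and return its globals (runs top-level code)."""
    return runpy.run_path(path)


# Inline stand-in for the sum model above, without the pickle step.
def action(datum):
    yield datum["x"] + datum["y"]


print(run_locally({"action": action}, [{"x": 5, "y": 6}, {"x": 1.5, "y": 2.5}]))
# [11, 4.0]
```

To run the full sum model this way, for example with run_locally(load_model("sum_model.py"), records) where sum_model.py is a hypothetical file name, object.pkl must be present in the working directory, since begin() opens it.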

R Models

R models feature much of the same functionality as Python models, as well as the same constraint: the user must define an action function to perform the actual scoring. For example, the R analogue of the Python model above is:

# fastscore.input: input-schema
# fastscore.output: output-schema

# Sample input: {"x":5.0, "y":6.0}
action <- function(datum) {
  x <- datum$x
  y <- datum$y
  emit(x + y)
}

Input and Output Schema

FastScore enforces strong typing on both the inputs and outputs of its models using Avro schemas. For R and Python models, this typing is enforced by specifying schema names in a smart comment at the top of the model file:

# fastscore.input: array-double
# fastscore.output: double

Python and R models must specify schemas for their inputs and outputs. PrettyPFA and PFA models already contain the input and output schemas as part of the model definition, so they do not need separately specified schemas.

For example, a model that expects to receive records of two doubles as inputs might have the following schema:

{
  "name": "Input",
  "type": "record",
  "fields" : [
    {"name": "x", "type": "double"},
    {"name": "y", "type": "double"}
  ]
}

The model might then produce a stream of doubles as its output:

{
  "name": "Output",
  "type": "double"
}

Input and output schemas must be uploaded to FastScore separately from the model, under the names that the model’s smart comments refer to. To upload the schemas with the CLI, use the following commands:

fastscore schema add input input.avsc
fastscore schema add output output.avsc
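The .avsc files referenced by these commands are plain JSON documents containing the schemas shown above. As a minimal sketch, the following writes both files from Python dicts; the write_schemas helper and its directory argument are illustrative, and only the file names come from the commands above.

```python
import json
from pathlib import Path

# The input and output schemas shown above, as Python dicts.
INPUT_SCHEMA = {
    "name": "Input",
    "type": "record",
    "fields": [
        {"name": "x", "type": "double"},
        {"name": "y", "type": "double"},
    ],
}
OUTPUT_SCHEMA = {"name": "Output", "type": "double"}


def write_schemas(directory="."):
    """Write input.avsc and output.avsc into `directory`; return their paths."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for filename, schema in (("input.avsc", INPUT_SCHEMA),
                             ("output.avsc", OUTPUT_SCHEMA)):
        path = directory / filename
        path.write_text(json.dumps(schema, indent=2) + "\n")
        paths.append(path)
    return paths
```

Calling write_schemas() in the directory where you run the CLI produces the two files named in the commands above.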

Schemas can also be managed from within the Dashboard, using the Model Manage view.

Input and Output Streams

Before a model can be run, it has to have some data to run on. Input and output streams are used to supply the incoming data to the model, and to return the corresponding scores. Currently, ten types of stream transports are supported: file, Kafka, Authenticated Kafka, Executable, HTTP, TCP, UDP, ODBC, debug, and console streams. All of these types are configured using a Stream Descriptor file.

Stream Descriptors are small JSON files containing information about the stream. An example of a Stream Descriptor for a Kafka stream is displayed below:

{
  "Description": "read Avro-typed JSON records from a Kafka stream",
  "Transport": {
    "Type": "kafka",
    "BootstrapServers": ["127.0.0.1:9092"],
    "Topic": "data-feed-1",
    "Partition": 0
  },
  "Encoding": "json",
  "Schema": {"type": "record", ...}
}

Stream descriptors are documented in more detail on the stream descriptor page. The easiest type of stream to use is a file stream, which reads or writes records directly from/to a file inside of the FastScore engine container. Here is an example of such a stream:

{
  "Description": "read input from the specified file",
  "Loop": false,
  "Transport": {
    "Type": "file",
    "Path": "/root/data/neural_net_input.jsons"
  },
  "Envelope": "delimited",
  "Encoding": "json",
  "Schema": {"type": "array", "items": "double"}
}

This file stream expects each line of the neural_net_input.jsons file to be a vector of doubles, encoded as a JSON array, with records delimited by newlines. The file is located in the /root/data/ directory of the engine container. The "Loop": false setting tells FastScore to stop reading when it reaches the end of the file, rather than looping back over its lines.
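To produce test data in this format, each record is serialized as a JSON array on its own line. The sketch below writes and reads such a file; the helper names and the data/ location are assumptions, and the file only becomes visible at /root/data inside the engine container if that host directory is linked into the container (as in the Docker volume example later in this guide).

```python
import json
from pathlib import Path


def write_delimited_json(path, vectors):
    """Write each vector as a JSON array of doubles, one per line."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w") as f:
        for vector in vectors:
            f.write(json.dumps([float(v) for v in vector]) + "\n")
    return path


def read_delimited_json(path):
    """Read the file back into a list of lists, skipping blank lines."""
    with Path(path).open() as f:
        return [json.loads(line) for line in f if line.strip()]
```

For example, write_delimited_json("data/neural_net_input.jsons", [[0.1, 0.2, 0.3], [1, 2, 3]]) writes the two lines [0.1, 0.2, 0.3] and [1.0, 2.0, 3.0], since every value is converted to a float.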

Streams via FastScore CLI

The FastScore CLI can be used to configure data streams. The stream list command displays a list of existing streams:

$ fastscore stream list
demo-1
demo-2

By default, two demo file streams are included in FastScore. The demo-1 data set consists of random numbers. The demo-2 data set consists of JSON-encoded arrays of records with the following Avro schema:

{
  "type":"array",
  "items": {
    "type": "record",
    "fields": [
      {"name":"x", "type":"double"},
      {"name":"y", "type":"string"}]
  }
}

These demo streams can be used to test whether or not a simple model is working correctly. Additional streams can be added using the fastscore stream add <stream-name> <stream-descriptor-file> command. Existing streams can be sampled (displaying the most recent items of the stream) with fastscore stream sample <stream-name>.
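The load.sh script in the Getting-Started repository performs this kind of bulk loading in shell; its contents are not reproduced in this guide, so the following is an independent, hypothetical Python sketch that uses only the add subcommands described above (fastscore schema add, fastscore stream add, fastscore model add). By default it only prints the commands (a dry run); the schemas-then-streams-then-models order is a choice made here, not a documented requirement.

```python
import shlex
import subprocess


def load_commands(schemas=(), streams=(), models=()):
    """Build `fastscore <kind> add <name> <file>` invocations.

    Each argument is a sequence of (name, file) pairs."""
    commands = []
    for kind, assets in (("schema", schemas), ("stream", streams),
                         ("model", models)):
        for name, path in assets:
            commands.append(["fastscore", kind, "add", name, path])
    return commands


def run_commands(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print("$ " + " ".join(shlex.quote(arg) for arg in command))
        if not dry_run:
            subprocess.run(command, check=True)
```

For example, run_commands(load_commands(schemas=[("input", "input.avsc")], models=[("MyModel", "model.py")])) prints $ fastscore schema add input input.avsc followed by $ fastscore model add MyModel model.py; pass dry_run=False to actually run them (model.py is a placeholder file name).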

For file streams, the easiest way to manage container input and output is to link a directory on the host machine to the engine container. This can be done in the Docker Compose file by modifying the engine service as follows:

[...]

  engine-1:
    image: fastscore/engine:1.9
    network_mode: "host"
    environment:
      CONNECT_PREFIX: https://127.0.0.1:8001
    volumes:                           # new volume section
      - ./data:/root/data


[...]

This links the ./data directory on the host machine to the /root/data directory of the engine container. FastScore can then read a file “mydata.jsons” located in ./data on the host machine using the following stream descriptor:

{
  "Loop": false,
  "Transport": {
    "Type": "file",
    "Path": "/root/data/mydata.jsons"
  },
  "Envelope": "delimited",
  "Encoding": "json",
  "Schema": [...]
}

A similar stream descriptor can be used for the output stream to write the output scores to a file in the same directory.
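Since the input and output descriptors differ mainly in file path and schema, it can help to generate them. The sketch below builds a descriptor with the same fields as the file-stream examples above and saves it as JSON for fastscore stream add; the helper names, and any output file name or output schema you pass in, are hypothetical.

```python
import json


def file_stream_descriptor(container_path, schema, loop=False, description=None):
    """Build a file-stream descriptor shaped like the examples in this guide.

    container_path is the path inside the engine container, e.g. under
    /root/data when ./data on the host is linked there."""
    descriptor = {}
    if description:
        descriptor["Description"] = description
    descriptor["Loop"] = loop
    descriptor["Transport"] = {"Type": "file", "Path": container_path}
    descriptor["Envelope"] = "delimited"
    descriptor["Encoding"] = "json"
    descriptor["Schema"] = schema
    return descriptor


def write_descriptor(descriptor, path):
    """Save a descriptor as JSON for `fastscore stream add <name> <path>`."""
    with open(path, "w") as f:
        json.dump(descriptor, f, indent=2)
        f.write("\n")
```

For instance, file_stream_descriptor("/root/data/mydata.jsons", {"type": "array", "items": "double"}) reproduces the input descriptor above with the schema filled in, and a matching output descriptor could point at a file such as /root/data/scores.jsons with {"type": "double"}.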

When using Docker volume linking to link a directory on the host machine to the Engine container, Docker must have permission to read from and write to the specified directory. Additionally, choose the directory on the container carefully: while the volume is linked, that directory’s original contents are hidden and replaced by the contents of the corresponding host directory. /root/data is safe (it only contains the demo data files), but other directories on the container (e.g., /usr) may not be.

Streams via the Dashboard

Analogously to models, streams can also be manipulated from the Dashboard. Selecting the “Streams” tab under Model Manage displays the following view:

[Screenshot: Streams]

On the left, existing Stream Descriptors are displayed. New Stream Descriptors can be added, and existing ones edited, from this view. The example above displays a simple file stream, which will load the input_data.jsons file located in the /root/data directory of the Engine Docker container.

Engine Parameters

Engine parameters, such as the number of Engine instances currently running, as well as information about the model, are displayed on the Dashboard Engine tab.

Running a Model in FastScore

When using the Dashboard, models will begin scoring as soon as both the model and input/output streams are set from the Engine tab, and no further action from the user is required. Various statistics about performance and memory usage are displayed on the Engine tab. To run a model using the FastScore CLI, use the fastscore job sequence of commands:

This concludes the FastScore Getting Started guide. Additional FastScore API documentation is available at https://opendatagroup.github.io/Reference/FastScore%20API/. Happy scoring!