This is a guide for installing and running FastScore. It contains instructions for first-time and novice users, as well as reference instructions for common tasks. This guide was last updated for v1.9 of FastScore.
If you need support or have questions, please email us: support@opendatagroup.com
This guide will walk you through installing and running Open Data Group’s FastScore microservices suite. The following instructions will generally assume that you’re working on a Linux machine. There are slight differences if you’re running FastScore on MacOS, which will be indicated by a special note. The differences if you’re running Windows have not yet been fully charted.
The FastScore Microservices Suite is hosted on DockerHub (https://hub.docker.com/u/fastscore/). As such, one must first install Docker. For example, on Ubuntu Linux:
$ sudo apt-get install docker.io
It’s also useful (recommended but not mandatory) to have Docker Compose installed. Installation instructions can be found here: docs.docker.com/compose/install/.
On MacOS, Docker actually runs inside of a virtual machine (see Docker’s documentation here: https://docs.docker.com/machine/ ). In order to make sure all of the ports and IP addresses are handled correctly, you’ll need to run the commands from inside this virtual machine. To start the virtual machine and give it the name “default”, use the following command:
$ docker-machine create --driver=virtualbox default
This uses VirtualBox as the driver for the virtual machine. If you don’t have it already, you should download the VirtualBox client to manage the docker-machine. Among other things, this can be used to set up port forwarding for the virtual machine, which may be needed later. To switch to this environment for the default virtual machine, use the following command:
$ eval $(docker-machine env default)
The virtual machine’s IP address can be retrieved with the docker-machine ip command, e.g.,
$ docker-machine ip default
192.168.99.100
This IP address should be used as the FastScore host machine IP address.
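For example, if the virtual machine reports 192.168.99.100 as above, you would later connect the FastScore CLI to that address rather than localhost (assuming the Dashboard's port 8000 is reachable on the VM):
$ fastscore connect https://192.168.99.100:8000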
Once Docker has been installed, there are only a few steps needed to get FastScore running.
Let’s go through each step carefully.
The FastScore CLI can be downloaded and installed using the following command:
pip install fastscore-cli
This will install the required dependencies. The FastScore CLI is a Python tool, so it doesn't need to be compiled, and the setup script should automatically add the CLI to $PATH.
Note: python-pip, python-setuptools, and python-dev (i.e., header files) are required to properly install the FastScore CLI. These may or may not already be present on your system. If not, you will need to install them. For example:
$ sudo apt-get install python-pip python-dev python-setuptools
Once you’ve installed the FastScore CLI, check that it works by executing the following command in your terminal. Also see FastScore Command Line Interface for more information on subcommands.
$ fastscore help
FastScore CLI v1.9.1
Usage: fastscore <command> [<subcommand> ...]
Available commands:
help Explain commands and options
connect Establish a FastScore connection
login Authenticate a FastScore connection
config Configure the FastScore fleet
fleet Examine status of the FastScore fleet
use Select the target instance
run Run easy model setups
model Manage analytic models
attachment Manage model attachments
schema Manage Avro schemas
snapshot Manage model snapshots
policy Manage import policies
stream Manage streams/stream descriptors
engine Manage engine state
sensor Manage sensors/sensor descriptors
stats Show assorted statistics
debug Watch debugging messages
profile Profile internal operations
pneumo Access Pneumo messages
monitor Monitor data processing
Run 'fastscore help <command>' to get more details on <command> usage
This displays a list of all of the FastScore CLI commands.
We will be referencing files from the Getting-Started GitHub repository found here. This setup uses Docker Swarm to bring up the FastScore containers.
Items in the repo:
From the directory with the Makefile, to quickly start FastScore:
make deploy
To quickly tear down FastScore:
make stop
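The Makefile itself is not reproduced in this guide, but since this setup uses Docker Swarm and the resulting services are prefixed fs-vanilla_ (see the docker ps output below), a rough sketch of its deploy and stop targets might look like the following; the stack name, swarm initialization, and script invocations are assumptions based on the files referenced in this section:
deploy:
	docker swarm init || true
	docker stack deploy -c docker-compose.yaml fs-vanilla
	./setup.sh
	./load.sh

stop:
	docker stack rm fs-vanilla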
First, let’s look at the containers described in the docker-compose.yaml file that are now up and running.
View all running Docker containers with the docker ps command. The output should look something like this:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
14bfaee9100e fastscore/model-manage-mysql:1.9 "sh -c 'cd /var/lib/…" About an hour ago Up About an hour fs-vanilla_database.1.vb7bticmnpauo3hz6bor4r8qx
ada9681679e8 fastscore/engine:1.9 "/fastscore/startup.…" About an hour ago Up About an hour fs-vanilla_engine-2.1.ubonr83uuaqw2en4yk3lsc9w0
880b6882742d fastscore/engine:1.9 "/fastscore/startup.…" About an hour ago Up About an hour fs-vanilla_engine-1.1.or1ocif21hfoej8xxfo5qykbm
cb67e0090cf2 fastscore/connect:1.9 "bin/connect" About an hour ago Up About an hour fs-vanilla_connect.1.wxswv0g8hojps2f2c0i5lc96l
318eedda837a fastscore/dashboard:1.9 "npm --no-update-not…" About an hour ago Up About an hour fs-vanilla_dashboard.1.v7ia7we8qrwpyeawkk7jzg3se
b89be509f74d fastscore/kafka:1.9 "/entry.sh" About an hour ago Up About an hour 2181/tcp, 2888/tcp, 3888/tcp, 9092/tcp fs-vanilla_kafka.1.w1tc6grz7iil56x2b8pcdbjal
84b4f75fccda fastscore/model-manage:1.9 "bin/model_manage" About an hour ago Up About an hour fs-vanilla_model-manage.1.o750isigzx7v17kwddn17nmzl
Next, let’s take a look at the setup.sh file that is referenced in the Makefile.
fastscore connect https://localhost:8000
fastscore config set config.yaml
fastscore fleet -wait
The fastscore connect command points the CLI at the host FastScore is running on and the port the Dashboard is listening on. In this example, we are using localhost, and the Dashboard is listening on port 8000.
FastScore’s microservices architecture requires each microservice component to communicate with other components. These communications are managed by the Connect microservice. In order for Connect to connect, it has to be given information about the other microservices components in a configuration file. A sample configuration file is shown below:
fastscore:
  fleet:
    - api: model-manage
      host: model-manage
      port: 8002
    - api: engine
      host: engine-1
      port: 8003
    - api: engine
      host: engine-2
      port: 8003

  db:
    type: mysql
    host: database
    port: 3306
    username: root
    password: root

  pneumo:
    type: REST
    #type: kafka
    #bootstrap:
    #  - kafka:9092
    #topic: notify
Configuration files are written in YAML. The configuration file above specifies the host machines and ports for the Model Manage container, the MySQL database container used by Model Manage, and two Engine containers, all hosted on the same machine. Additionally, Pneumo, an asynchronous notification library used by FastScore, is configured to communicate via Kafka or REST.
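Each additional Engine gets its own entry in the fleet section of this file. For example, a third Engine with the hypothetical host name engine-3 would be declared by appending:
    - api: engine
      host: engine-3
      port: 8003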
We can then check the status of our containers using the fleet command:
$ fastscore fleet -wait
Name API Health
-------------- ------------ --------
engine-1 engine ok
engine-2 engine ok
model-manage-1 model-manage ok
Finally, the load.sh file loads some models, attachments, schemas, and streams into FastScore so you can start deploying! In this example, these assets are loaded into the MySQL database backing Model Manage. Alternatively, you can connect Model Manage to your own GitHub repository that already contains models, attachments, streams, and schemas.
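The contents of load.sh are not reproduced in this guide, but based on the CLI commands covered in the sections that follow, a minimal sketch might look like this (model, schema, and stream names are illustrative):
fastscore model add MyModel models/MyModel.py
fastscore schema add input schemas/input.avsc
fastscore schema add output schemas/output.avsc
fastscore stream add file-in streams/file-in.json
fastscore stream add file-out streams/file-out.json
Now we're ready to start scoring.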
FastScore’s Dashboard provides a convenient user interface for reviewing engine status and managing models, schemas, sensors, and streams. However, as compared to the FastScore CLI, it requires a few additional setup steps to get things running.
First, if you are not running FastScore on your local machine (for example, if you have FastScore running on a cloud service platform), you will need to allow incoming and outgoing traffic on port 8000 (used by the FastScore Dashboard). You will also need to have configured FastScore as described in the previous section.
To access the Dashboard, point your browser to the FastScore host machine at port 8000 using the HTTPS protocol (it won't work with HTTP). If all goes well, you will be greeted by this screen:
On the left-hand side of the Dashboard are four sections: engine-1, engine-2, model-manage-1, and Connect. These correspond to the two Engine microservices, the Model Manage microservice, and the Connect microservice. The green dots next to the Engines and Model Manage indicate that they are currently running correctly. If you have configured additional Engine containers, they will also appear in the sidebar.
If, instead, you get an "Application Initialization Error," check your configuration file for any errors, and verify that you have followed all of the FastScore CLI configuration steps. If the fastscore fleet command shows both Model Manage and your Engine containers working properly, then the problem most likely has to do with Dashboard's proxy service or your host machine's network traffic settings.
FastScore is a streaming analytic engine: its core functionality is to read records from a data stream, score them, and output the scores to another data stream. As such, running any model consists of three steps: loading the model, setting up the input and output streams, and running the job.
Version 1.9 of FastScore supports models in Python, R, Java, MATLAB, PFA, PrettyPFA and C formats. Some setup steps differ slightly between Python/R models and PFA, Java, MATLAB, or C models. As a model interchange format, PFA can provide some benefits in performance, scalability, and security relative to R and Python. PrettyPFA is a human-readable equivalent to PFA. However, as the majority of users will be more familiar with R and Python, we focus on these two languages in this section.
The FastScore CLI allows a user to load models directly from the command line. The list of models currently loaded in FastScore can be viewed using the model list command:
$ fastscore model list
Name Type
------ ------
MyModel Python
Models can be added with model add <name> <file>, and removed with model remove <name>. Additionally, the fastscore model show <name> command will display the named model.
The Dashboard provides functionality to add and manage models. To upload a model, under the Models tab, select the “Upload model” button, and choose a model from your local machine. Alternatively, “select model”, depicted below, allows you to select an existing model from the model manager by name.
Additionally, models can be added, removed, inspected, and edited from the Models tab under Model Manage:
The screenshot above shows the model manager tab, and an existing “auto_gbm.py” model. Models can be removed, saved, created, uploaded, or downloaded from this view. Note that after creating or modifying a model in this view, it must still be selected for use from the Engine tab.
All models are added to FastScore and executed using the same CLI commands, namely:
fastscore model add <modelname> <path/to/model.extension>
Note that, in order to determine whether a model is Python or R, Engine requires that it have an appropriate file extension (.py for Python, .R for R, .pfa for PFA, and .ppfa for PrettyPFA). Also, in order to score a Python/R model, there are certain constraints on the form the model must take.
FastScore includes both a Python2 and Python3 model runner. By default, .py files are interpreted as Python2 models. To load a Python3 model, use the file extension .py3, or pass the -type:python3 option to fastscore model add:
fastscore model add -type:python3 my_py3_model path/to/model.py
Python models must declare a one-argument action() function. A minimal example of a Python model is the following:
# fastscore.input: input-schema
# fastscore.output: output-schema

def action(datum):
    yield 0
This model will produce a 0 for every input.
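Assuming this model is saved as minimal_model.py (an illustrative file name), it could be loaded and inspected using the model commands shown earlier:
$ fastscore model add MyModel minimal_model.py
$ fastscore model show MyModel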
Additionally, Python models may declare begin() and end() functions, which are called at initialization and completion of the model, respectively. A slightly more sophisticated example of a Python model is the following:
# fastscore.input: input-schema
# fastscore.output: output-schema

import cPickle as pickle

def begin(): # perform any initialization needed here
    global myObject
    myObject = pickle.load(open('object.pkl'))
    pass # or do something with the unpickled object

def action(datum): # datum is expected to be of the form '{"x":5, "y":6}'
    record = datum
    x = record['x']
    y = record['y']
    yield x + y

def end():
    pass
This model returns the sum of two numbers. Note that we are able to import Python's standard modules, such as the pickle module. Non-default packages can also be added using Import Policies, as described here. Custom classes and packages can be loaded using attachments, as described in the Gradient Boosting Regressor tutorial.
R models feature much of the same functionality as Python models, as well as the same constraint: the user must define an action function to perform the actual scoring. For example, the analogous model to the Python model above is:
# fastscore.input: input-schema
# fastscore.output: output-schema
# Sample input: {"x":5.0, "y":6.0}
action <- function(datum) {
    x <- datum$x
    y <- datum$y
    emit(x + y)
}
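Note that where the Python model uses yield to return its score, the R model calls emit(). Given the sample input {"x":5.0, "y":6.0}, both models produce 11.0.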
FastScore enforces strong typing on both the inputs and outputs of its models using Avro schemas. For R and Python models, this typing is enforced by specifying schema names in a smart comment at the top of the model file:
# fastscore.input: array-double
# fastscore.output: double
Python and R models must specify schemas for their inputs and outputs. PrettyPFA and PFA models already contain the input and output schemas as part of the model definition, so they do not require separate schema files.
For example, a model that expects to receive records of two doubles as inputs might have the following schema:
{
  "name": "Input",
  "type": "record",
  "fields": [
    {"name": "x", "type": "double"},
    {"name": "y", "type": "double"}
  ]
}
The model might then produce a stream of doubles as its output:
{
  "name": "Output",
  "type": "double"
}
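Concretely, under these schemas each JSON-encoded input record and its corresponding output score would look like this on the wire:
{"x": 5.0, "y": 6.0}
11.0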
Input and output schemas must be uploaded separately to FastScore. To upload the schemas to FastScore with the CLI, use the following commands:
fastscore schema add input input.avsc
fastscore schema add output output.avsc
Schemas can also be managed from within the Dashboard, using the Model Manage view.
Before a model can be run, it has to have some data to run on. Input and output streams are used to supply the incoming data to the model, and to return the corresponding scores. Currently, ten types of stream transports are supported: file, Kafka, Authenticated Kafka, Executable, HTTP, TCP, UDP, ODBC, debug, and console streams. All of these types are configured using a Stream Descriptor file.
Stream Descriptors are small JSON files containing information about the stream. An example of a Stream Descriptor for a Kafka stream is displayed below:
{
  "Description": "read Avro-typed JSON records from a Kafka stream",
  "Transport": {
    "Type": "kafka",
    "BootstrapServers": ["127.0.0.1:9092"],
    "Topic": "data-feed-1",
    "Partition": 0
  },
  "Encoding": "json",
  "Schema": { "type": "record", ... }
}
Stream descriptors are documented in more detail on the stream descriptor page. The easiest type of stream to use is a file stream, which reads or writes records directly from/to a file inside of the FastScore engine container. Here is an example of such a stream:
{
  "Description": "read input from the specified file",
  "Loop": false,
  "Transport": {
    "Type": "file",
    "Path": "/root/data/neural_net_input.jsons"
  },
  "Envelope": "delimited",
  "Encoding": "json",
  "Schema": {"type": "array", "items": "double"}
}
This file stream expects each line of the neural_net_input.jsons file to be a vector of doubles, encoded as a JSON array and delimited by newlines. The file is located in the /root/data/ directory of the engine container. The "Loop": false line tells FastScore to stop reading the file after reaching the end, as opposed to looping over the lines in the file.
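Under this descriptor, each line of the data file is a JSON-encoded array of doubles, so its contents might look like:
[0.5, 1.5, 2.5]
[1.0, 2.0, 3.0]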
The FastScore CLI can be used to configure data streams. The stream list command displays a list of existing streams:
$ fastscore stream list
demo-1
demo-2
By default, two demo file streams are included in FastScore. The demo-1 data set consists of random numbers. The demo-2 data set consists of lists of JSON records with the following Avro schema:
{
  "type": "array",
  "items": {
    "type": "record",
    "fields": [
      {"name": "x", "type": "double"},
      {"name": "y", "type": "string"}
    ]
  }
}
These demo streams can be used to test whether or not a simple model is working correctly.
Additional streams can be added using the fastscore stream add <stream-name> <stream-descriptor-file> command. Existing streams can be sampled (displaying the most recent items of the stream) with fastscore stream sample <stream-name>.
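For example, to peek at the most recent items of one of the built-in demo streams:
$ fastscore stream sample demo-1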
For file streams, it is easiest to manage container input and output by linking a directory on the host machine to the engine container. This can be done in the Docker Compose file by modifying the engine service as follows:
[...]
  engine-1:
    image: fastscore/engine:1.9
    network_mode: "host"
    environment:
      CONNECT_PREFIX: https://127.0.0.1:8001
    volumes: # new volume section
      - ./data:/root/data
[...]
This will link the ./data directory on the host machine to the /root/data directory of the engine container. A file stream reading the file "mydata.jsons" located in data on the host machine can then be accessed by FastScore using the stream descriptor:
{
  "Loop": false,
  "Transport": {
    "Type": "file",
    "Path": "/root/data/mydata.jsons"
  },
  "Envelope": "delimited",
  "Encoding": "json",
  "Schema": [...]
}
A similar stream descriptor can be used for the output stream to write the output scores to a file in the same directory.
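For example, an output stream descriptor writing scores to an output.jsons file (an illustrative name) in the same linked directory might look like this, with the Schema matching the model's output type:
{
  "Transport": {
    "Type": "file",
    "Path": "/root/data/output.jsons"
  },
  "Envelope": "delimited",
  "Encoding": "json",
  "Schema": "double"
}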
When using Docker volume linking to link a directory on the host machine to the Engine instance, Docker must have privileges to read and write from the specified directory. Additionally, the directory on the container must be chosen carefully, as its contents will be overwritten with the contents of the corresponding host directory upon linking. /root/data is safe (as it only contains the demo data files), but other directories on the container (e.g., /usr) may not be.
Analogously to models, streams can also be manipulated from the Dashboard. Selecting the “Streams” tab under Model Manage displays the following view:
On the left, existing Stream Descriptors are displayed. New Stream Descriptors can be added, and existing ones edited, from this view. The example above displays a simple file stream, which will load the input_data.jsons file located in the /root/data directory of the Engine Docker container.
Engine parameters, such as the number of Engine instances currently running, as well as information about the model, are displayed on the Dashboard Engine tab.
When using the Dashboard, models will begin scoring as soon as both the model and input/output streams are set from the Engine tab, and no further action from the user is required. Various statistics about performance and memory usage are displayed on the Engine tab.
To run a model using the FastScore CLI, use the fastscore job sequence of commands:
fastscore job run <model-name> <input-stream-name> <output-stream-name> runs the model named <model-name> with the specified input and output streams.
fastscore job stop halts the currently running model.
fastscore job status and fastscore job statistics display various information about the currently running job.
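Putting it all together, a complete scoring session might look like the following; the model and stream names are illustrative and assume assets like those loaded by load.sh:
$ fastscore job run MyModel file-in file-out
$ fastscore job status
$ fastscore job stop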
Some of the statistics displayed by the fastscore job statistics command, such as memory usage, are also shown on the Dashboard.
This concludes the FastScore Getting Started guide. Additional FastScore API documentation is available at https://opendatagroup.github.io/Reference/FastScore%20API/. Happy scoring!