
Download and install Aurelius. This article was tested with Aurelius 0.8.3; newer versions should work with no modification. R >= 3.0.1 is required.

Launch an R prompt and load the `aurelius` library:

```
R version 3.0.1 (2013-05-16) -- "Good Sport"
Copyright (C) 2013 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(aurelius)
>
```

The glmnet library for regularized linear models usually isn’t packaged with an R distribution, but it is available on CRAN. This page assumes that you are not only familiar with the glmnet package, but have already created and fine-tuned your linear model, producing a `glmnetObject` of class `"cv.glmnet"`.

Conversion to PFA proceeds in three steps:

- Extract parameters from the `glmnetObject`.
- Format them as an R list-of-lists that is equivalent to a data structure in PFA.
- Create the PFA document, including the line of PFA code that evaluates the linear model.

These steps are kept separate, rather than combined into one function call, to allow for variations in how the model is invoked, including preprocessing, postprocessing, and attaching additional information to the linear fit object.

The following pulls all the relevant data from the `glmnetObject`, leaving the training data behind.

```
fit <- pfa.glmnet.modelParams(
    pfa.glmnet.extractParams(
        glmnetObject, lambdaval = "lambda.chosen"))
```

where `lambdaval` allows you to choose a field within the `glmnetObject`. This `fit` has a `fit$const` term (a number or a length *N* list of numbers) and a `fit$coeff` factor (a length *N* list of numbers or an *N* by *M* list of lists).
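To make the two possible shapes of `fit$const` and `fit$coeff` concrete, here is a minimal base-R sketch of the arithmetic a linear model performs on them. The helper name `linearScore` and all values are invented for illustration and are not part of Aurelius; the sketch only assumes that each output is the constant plus the dot product of a coefficient list with the input.

```r
# Hypothetical helper (not part of Aurelius): scores one numeric input
# vector x against a const/coeff pair shaped like fit$const and fit$coeff.
linearScore <- function(const, coeff, x) {
  if (is.list(coeff[[1]])) {
    # list of lists: one output per inner coefficient list
    mapply(function(row, c0) c0 + sum(unlist(row) * x), coeff, const)
  } else {
    # flat list of numbers: a single output
    const + sum(unlist(coeff) * x)
  }
}

x <- c(10, 100)                      # made-up input
single <- linearScore(1, list(2, 3), x)
multi  <- linearScore(list(1, 0), list(list(2, 3), list(0.5, 0.5)), x)
print(single)  # 321
print(multi)   # 321 55
```

A helper like this can also serve as an independent cross-check when you later test the scoring engine.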

The field names can be extracted as follows:

```
# avoid features selected out by the lasso
mat <- coef(glmnetObject, s = glmnetObject$lambda.chosen)
fieldNames <- rownames(mat)[as.numeric(mat) != 0.0]
fieldNames <- as.list(fieldNames[-1])
```
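The same filtering logic can be tried on a stand-in matrix, with no fitted model required. This is only a sketch: `toyMat` and its row names are invented to mimic the one-column shape of a coefficient matrix (a real glmnet fit returns a sparse matrix instead).

```r
# Invented stand-in for a one-column coefficient matrix
toyMat <- matrix(c(0.7, 1.5, 0.0, -2.0), ncol = 1,
                 dimnames = list(c("(Intercept)", "age", "height", "weight"), NULL))

# keep rows with nonzero coefficients, then drop the first entry (the intercept)
toyNames <- rownames(toyMat)[as.numeric(toyMat) != 0.0]
toyNames <- as.list(toyNames[-1])

str(toyNames)  # "height" is gone because its coefficient is zero
```

Note that dropping the first entry assumes the intercept itself is nonzero; if it were exactly zero, the first surviving name would be a real regressor.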

If you want all of these regressors to be in the input schema, as a record of named fields, create the input record like this:

```
fieldNames <- fit$regressors
fieldTypes <- rep(avro.double, length(fieldNames))
names(fieldTypes) <- fieldNames
inputSchema <- avro.record(fieldTypes, "Input")
```

PFA’s `model.reg.linear` expects the input to be an array of anonymous numbers, not a record of named numbers, so you need to preprocess the input with some generated code. Here’s how to generate the code:

```
# build one "input.<name>" expression per regressor, in fieldNames order
makeVectorCode <- list(type = avro.array(avro.double),
                       new = lapply(fieldNames, function (n) {
                           paste("input.", n, sep = "")
                       }))
```

You can look at what you’ve created with `cat(json(makeVectorCode))` to see how it will look in the completed PFA document. It builds the array from each input field explicitly, and thus encodes the order in which the named fields are matched to the anonymous columns of the coefficient vector or matrix.
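As a quick check of what the `lapply` call generates, the sketch below runs it on two placeholder field names (`x1` and `x2` are hypothetical, not taken from any real model). It uses only base R, so the `type` entry is left out.

```r
exampleNames <- list("x1", "x2")   # hypothetical field names

# same generator as in makeVectorCode: one attribute-access string per field
newEntries <- lapply(exampleNames, function(n) {
  paste("input.", n, sep = "")
})

str(newEntries)
```

The entries come out in the same order as the names, which is why the order of the field names has to match the order of the coefficients.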

If you have no reason to name the input fields (e.g. they never had names, and it doesn’t make sense to invent “field1”, “field2”, “field3”, etc.), just make the input schema an array and skip the preprocessing step.

```
inputSchema <- avro.array(avro.double)
```

It is good practice to use an `avro.typemap` to ensure that named types are declared only once in the output PFA.

```
tm <- avro.typemap(
    Input = inputSchema,
    Output = avro.double,
    Regression = avro.record(list(const = avro.double,
                                  coeff = avro.array(avro.double))))
```

Naturally, if your `fit$const` is a list and your `fit$coeff` is a list-of-lists, this should be reflected in your `Output` and `Regression` types:

```
tm <- avro.typemap(
    Input = inputSchema,
    Output = avro.array(avro.double),
    Regression = avro.record(list(const = avro.array(avro.double),
                                  coeff = avro.array(avro.array(avro.double)))))
```

Here’s an example PFA document with our `makeVectorCode` preprocessing:

```
pfaDocument <- pfa.config(
    input = tm("Input"),
    output = tm("Output"),
    cells = list(regression =
        pfa.cell(tm("Regression"), list(const = fit$const,
                                        coeff = unname(fit$coeff)))),
    action = expression(
        vector <- makeVectorCode,
        model.reg.linear(vector, regression)
    ))
```

Without it, we’d just pass `input` as the first argument to `model.reg.linear`. If this is logistic regression (or there’s some other reason you want to postprocess with a link function), wrap the `model.reg.linear` call with `m.link.logit` or another link function.
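For reference, a logit link maps a linear score to a probability through the logistic function. The base-R sketch below shows that mapping on made-up scores; `logitLink` is a hypothetical helper name, and the sketch assumes that `m.link.logit` computes 1 / (1 + exp(-x)), as described in the PFA function library.

```r
# Hypothetical helper mirroring the assumed behavior of m.link.logit
logitLink <- function(x) 1 / (1 + exp(-x))

scores <- c(-2, 0, 2)          # made-up linear model outputs
probs <- logitLink(scores)

# base R's plogis implements the same logistic function
print(all.equal(probs, plogis(scores)))
```

Computing the expected probabilities this way gives you reference values to compare against the scoring engine’s output.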

To write the PFA to a file, use

```
json(pfaDocument, fileName = "mymodel.pfa")
```

If you have Titus and rPython installed (see installation page), you can test the scoring engine without leaving R.

```
engine <- pfa.engine(pfaDocument) # verifies that pfaDocument is internally consistent
engine$action(list(field1 = 3.14, field2 = 3.14))
```

where `field1`, `field2`, etc. are named fields, assuming a record-based input schema. If your input schema is an anonymous array, use

```
engine$action(list(3.14, 3.14))
```

with the appropriate number of dimensions and values that test the correctness of the model.