Algorithm Development with Model Versioning

In the past, when shipping algorithms into production on Algorithmia, it’s been a challenge to ensure the integrity of model files. It’s also been difficult to update models, as it’s often the case that model file paths are hard-coded into the algorithm source code itself. Until now, there’s never been a standardized way to manage the serialized model files and other data files that are used by algorithms. Enter the model manifest system.

Introduction to the model manifest

Here we introduce the concept of model manifests: JSON files that define the types of data files your algorithm requires (or conditionally requires) to process requests. Model manifests can describe anything from weights files, to labels, to optional language models for more complex workflows.

The following is a brief example of a model manifest:

{
  "required_files": [
    {
      "name": "scikit_model",
      "source_uri": "data://zeryx/scikit_demos/digits_classifier0001ABC.pkl",
      "fail_on_tamper": true
    }
  ],
  "optional_files": []
}

Schemas

Below are schemas for model manifest file objects.

Model manifest schema

required_files - Any file objects defined here are downloaded eagerly and prepared for use by the algorithm.
optional_files - Any file objects defined here are downloaded lazily, when a get_model(..) operation is invoked.
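The difference between eager and lazy downloading can be illustrated with a small sketch. This is a toy model of the behavior described above, not the ADK's actual implementation: the ManifestLoader class, the fake_fetch helper, and the optional "labels" entry with its URI are all hypothetical names introduced for illustration.

```python
class ManifestLoader:
    """Toy illustration of eager vs. lazy fetching (not the ADK's implementation)."""

    def __init__(self, manifest, fetch):
        # `fetch` is a hypothetical callable: source_uri -> local file path.
        self._fetch = fetch
        self._optional = {f["name"]: f for f in manifest.get("optional_files", [])}
        self._paths = {}
        # Required files are fetched eagerly, before any request is served.
        for f in manifest.get("required_files", []):
            self._paths[f["name"]] = fetch(f["source_uri"])

    def get_model(self, name):
        # Optional files are fetched lazily on first access, then cached.
        if name not in self._paths:
            if name not in self._optional:
                raise KeyError(f"{name!r} is not declared in the manifest")
            self._paths[name] = self._fetch(self._optional[name]["source_uri"])
        return self._paths[name]


fetched = []  # records every URI that was actually "downloaded"


def fake_fetch(uri):
    # Stand-in for a real download; it only records the call.
    fetched.append(uri)
    return "/tmp/" + uri.rsplit("/", 1)[-1]


manifest = {
    "required_files": [
        {"name": "scikit_model",
         "source_uri": "data://zeryx/scikit_demos/digits_classifier0001ABC.pkl"}
    ],
    "optional_files": [
        # Hypothetical optional entry, for illustration only.
        {"name": "labels",
         "source_uri": "data://example_user/example_collection/labels.json"}
    ],
}

loader = ManifestLoader(manifest, fake_fetch)
```

After construction only the required file has been fetched; the optional file is fetched the first time get_model("labels") is called.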

Data file object schema

name - The unique identifier for this particular file object; this is the ID you’ll use to reference and manipulate the data file in your algorithm.
source_uri - An Algorithmia data URI (e.g., prefixed with data://, s3://, gcp://, etc.) that points to your model file; first make sure your account has access to the file at this location.
fail_on_tamper - An optional boolean field that defines whether your algorithm should fail if the model manifest system detects that this file has changed since the initial build.
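As a rough illustration of the two schemas above, the following sketch checks a manifest dictionary before it is committed. The validate_manifest function is a hypothetical helper, not part of the ADK or the algo CLI, and it assumes that both top-level lists are present, as they are in the example manifest.

```python
def validate_manifest(manifest):
    """Return a list of problems; an empty list means the manifest looks well-formed."""
    problems = []
    seen = set()  # names must be unique across the whole manifest
    for section in ("required_files", "optional_files"):
        entries = manifest.get(section)
        if not isinstance(entries, list):
            problems.append(f"{section} must be a list")
            continue
        for i, entry in enumerate(entries):
            where = f"{section}[{i}]"
            if not isinstance(entry, dict):
                problems.append(f"{where}: must be an object")
                continue
            name = entry.get("name")
            uri = entry.get("source_uri")
            if not isinstance(name, str) or not name:
                problems.append(f"{where}: missing name")
            elif name in seen:
                problems.append(f"{where}: duplicate name {name!r}")
            else:
                seen.add(name)
            # Only checks for a scheme prefix such as data:// or s3://.
            if not isinstance(uri, str) or "://" not in uri:
                problems.append(f"{where}: source_uri must be a data URI")
            # fail_on_tamper is optional, but must be a boolean when present.
            if "fail_on_tamper" in entry and not isinstance(entry["fail_on_tamper"], bool):
                problems.append(f"{where}: fail_on_tamper must be a boolean")
    return problems


example = {
    "required_files": [
        {"name": "scikit_model",
         "source_uri": "data://zeryx/scikit_demos/digits_classifier0001ABC.pkl",
         "fail_on_tamper": True}
    ],
    "optional_files": [],
}
problems = validate_manifest(example)
```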

Frozen manifest

The model manifest isn’t the whole story, however. In order to truly ensure that your model files haven’t been altered, a new freeze command has been added to the algo CLI.

If you’re in your algorithm directory after cloning the repository locally with git clone, simply run algo freeze to automatically generate a frozen model manifest file. This file includes an md5_checksum element for each file and a lock_checksum element for the manifest file itself, along with a timestamp. These hash values can then be used to detect tampering.

{
  "required_files": [
    {
      "name": "scikit_model",
      "source_uri": "data://zeryx/scikit_demos/digits_classifier0001ABC.pkl",
      "fail_on_tamper": true,
      "md5_checksum": "bbb2113cb37feaae8f0989f25021aafd"
    }
  ],
  "optional_files": [],
  "timestamp": "1635788201.4994009",
  "lock_checksum": "c30654c2359d42c8d6e36918516c52ad"
}
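For reference, a per-file MD5 value like the md5_checksum field above can be computed with Python's standard hashlib module. This is a minimal sketch of the hashing step only; the md5_of_file helper is a hypothetical name, and the exact procedure algo freeze uses (including how lock_checksum is derived) is not described here.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large model files never sit fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The result is a 32-character hexadecimal string, the same shape as the md5_checksum values in the frozen manifest.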

If the MD5 hash calculated at runtime doesn’t match the hash calculated when the algorithm was compiled, a tamper event is sent; this special type of event can be picked up and used to throw an exception.

You don’t want your production algorithm to be tampered with; for example, imagine if an attacker subtly adjusted your model to do something illegal or malicious! By default, any data file object with fail_on_tamper set to true will throw an exception on an MD5 mismatch.
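The tamper check described above might look roughly like the following. This is a sketch under stated assumptions, not the ADK's code: TamperError and verify_file are hypothetical names, and treating a missing fail_on_tamper field as false is an assumption.

```python
import hashlib


class TamperError(Exception):
    """Hypothetical exception type raised when a frozen file no longer matches."""


def verify_file(entry, local_path):
    """Compare a file's MD5 with the frozen md5_checksum from its manifest entry.

    Returns True on a match and False on a mismatch; raises TamperError on a
    mismatch when the entry sets fail_on_tamper to true.
    """
    with open(local_path, "rb") as f:
        actual = hashlib.md5(f.read()).hexdigest()
    if actual == entry["md5_checksum"]:
        return True
    # Assumption: a missing fail_on_tamper field is treated as false.
    if entry.get("fail_on_tamper", False):
        raise TamperError(
            f"{entry['name']}: expected {entry['md5_checksum']}, got {actual}"
        )
    return False
```

Raising only when fail_on_tamper is set leaves the algorithm free to log a mismatch and continue for files where a change is not critical.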

Using the load function to leverage the model manifest in your algorithms

Now that we’ve walked through the various elements of this custom model manifest structure, let’s take a look at how we can actually use it in an algorithm.

The key component is the get_model method, e.g.:

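import pickle

def load(model_data):
    # get_model returns the local path of the file registered as "scikit_model"
    model_path = model_data.get_model("scikit_model")
    with open(model_path, 'rb') as f:
        model = pickle.load(f)
    # Return the model together with the client so apply() can use both
    return model, model_data.client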
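Because load only relies on get_model and the client attribute, it can be exercised locally with a small stand-in object. The following is a sketch for local experimentation only: FakeModelData is a hypothetical stub, not an ADK class, and a plain dictionary is pickled in place of a trained scikit-learn model.

```python
import pickle
import tempfile


class FakeModelData:
    """Hypothetical stand-in exposing only what load() uses: get_model() and client."""

    def __init__(self, paths, client=None):
        self._paths = paths  # maps manifest names to local file paths
        self.client = client

    def get_model(self, name):
        return self._paths[name]


def load(model_data):
    model_path = model_data.get_model("scikit_model")
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    return model, model_data.client


# Pickle a placeholder object where a real model file would normally be.
with tempfile.NamedTemporaryFile(suffix=".pkl", delete=False) as tmp:
    pickle.dump({"kind": "placeholder model"}, tmp)

model, client = load(FakeModelData({"scikit_model": tmp.name}))
```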

With our overhauled ADK system, load functions can now interact with and use model manifest objects directly, without even needing to import them. No boilerplate is needed; it’s all handled automatically for you. The following is a complete example algorithm:

from Algorithmia import ADK
import Algorithmia
import numpy as np
from PIL import Image
import pickle
import sklearn

def load(model_data):
    model_path = model_data.get_model("scikit_model")
    with open(model_path, 'rb') as f:
        model = pickle.load(f)
    return model, model_data.client

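def format_image(url, client):
    # Download the image through the Algorithmia client and get its local path
    local_image_path = client.file(url).getFile().name
    img = Image.open(local_image_path)
    img = np.asarray(img)
    # Flatten the pixel array into a 1-D feature vector
    return img.flatten()

def apply(input, load_data):
    # load_data is the (model, client) tuple returned by load()
    model, client = load_data
    image = format_image(input, client)
    # predict expects a batch, so wrap the single image and take the first result
    results = model.predict([image])[0]
    return int(results)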

algorithm = ADK(apply, load)
algorithm.init("Algorithmia")

Conclusion

With recent updates to the ADK, and with this new model manifest data-versioning system, we’re making algorithm development simpler and more standardized.

If you want to see the model manifest system in action, please check out this model manifest example algorithm, and keep an eye on our algorithmia-adk repo for the latest improvements.