
Integrating Metacog with customer’s backend infrastructure: a reference guide

Introduction

From its conception, the Metacog platform was envisioned as a growing set of RESTful API endpoints, grouped into services such as Data Ingestion, Data Retrieval, Data Processing and others.

As a customer, you are free to interact directly with those endpoints in whatever programming language best suits your needs. For some specific use cases, like Data Ingestion from HTML5 interactive widgets, you may use the Metacog Client Library: a JavaScript component that implements services like network-safe data injection, playback and scoring support on top of the Metacog API.

The official documentation presents a large number of demos and tutorials designed to highlight specific features of both the Metacog API and the Metacog Client Library.

All of these demos and tutorials work with demo security credentials and predefined values for concepts like “JobIDs” and “VisualizationIDs”. In other cases the Metacog Client Library quietly provides randomly generated default “Session IDs” and “Learner IDs”. This approach is good for demos, but questions remain: what role do those values play when you integrate Metacog into your own infrastructure? How do you generate and manage them? What are the recommended practices at the architectural level, and how do you deal with Metacog’s architectural directive of opaque learner IDs?

This guide tries to answer these questions by documenting the architectural rationale and by presenting the “Toy Server”: a minimal (but fully working) cloud-based backend designed to simulate a customer’s infrastructure, fully integrated with the Metacog API.

Metacog’s Architectural Concepts

There are two main concepts in the Metacog architecture that you need to know before designing the integration with your own infrastructure: the Opaque IDs directive and the Async-Retrieval pattern.

Opaque IDs Directive

The Opaque IDs directive says that no real IDs should travel from the customer’s system to the Metacog Platform. Although all the API endpoints offer high security standards, Metacog holds that the best way to avoid compromising sensitive customer data is to neither manipulate nor store that data at all.

In Metacog’s context, there are two sources of sensitive data: the learner ID and session ID pair, and the credentials used to interact with the API endpoints.

Learner and session ids

The Metacog Client Library expects a session_id and a learner_id among its initialization parameters. If they are not found, randomly generated values are provided under the hood.

The learner_id is important because it is the only way to identify a group of sessions as belonging to the same learner. By the same rationale, session_ids should be unique to the learner, in order to clearly differentiate the data belonging to each execution.

A random session_id is generated each time the Metacog Client Library is initialized, which covers a large set of use cases: most of the time you want to identify each attempt of the learner with the widget as an independent session.

But there are other use cases. Imagine that you are tracking the interactions of a learner with a game or simulation designed to run for some days, maybe a few weeks. In that case, you may want to provide the same session ID each time the user logs into the game.

Another interesting use case is multiplayer games: in a multiplayer scenario all the users might be registered under the same session ID, with a new session ID generated each time a new game starts.
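The three session strategies described above (one session per attempt, one long-lived session per learner, one shared session per multiplayer match) can be sketched on the customer side as follows. This is only an illustration under stated assumptions: the policy names, the sessionIdFor function and the in-memory Map are hypothetical and are not part of the Metacog API or of the Metacog Client Library. A real backend would persist the long-lived IDs.

```javascript
const crypto = require('crypto');

// Hypothetical helper: a fresh random identifier with no meaning outside this system.
function newSessionId() {
  return crypto.randomBytes(16).toString('hex');
}

// Long-lived session ids, keyed by whatever the policy treats as "the same session".
// Kept in memory only for the sake of the example.
const longLivedSessions = new Map();

// policy 'per-attempt': a new id every time a widget is launched.
// policy 'per-learner': the same id whenever this learner returns to this widget.
// policy 'per-match':   the same id for every player of the same match.
function sessionIdFor(policy, context) {
  if (policy === 'per-attempt') {
    return newSessionId();
  }
  let key;
  if (policy === 'per-learner') {
    key = 'learner:' + context.learnerId + ':' + context.widgetId;
  } else if (policy === 'per-match') {
    key = 'match:' + context.matchId;
  } else {
    throw new Error('Unknown session policy: ' + policy);
  }
  if (!longLivedSessions.has(key)) {
    longLivedSessions.set(key, newSessionId());
  }
  return longLivedSessions.get(key);
}
```

The value returned would then be passed as the session_id initialization parameter, instead of letting the library generate a random one.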

Authorization tokens

Metacog does not use the concept of system users and therefore does not keep a list of the user IDs of your organization. The reasoning is that a Metacog user model would require keeping an internal store of users, credentials and roles that may not fit the model of each organization that wants to interact with the platform.

Instead, there are API endpoints that you can use to generate authorization tokens. Each token is associated with a specific subset of data, and by restricting which of your system users can access which tokens in your platform, you can implement different levels of data isolation.

Async-Retrieval pattern

With the exception of some specific real-time services, all the services designed to retrieve data follow the same pattern:

  1. Make a POST call to create a Request (e.g. DataRequest, VisualizationRequest). This call returns a request ID (DataRequestID, VisualizationRequestID). Optionally, you can pass an email address for notification.
  2. Store the RequestID and poll a GET endpoint for the status of the Request. It will respond “PROCESSING” and eventually “DONE”. If you passed a notification email address at step 1, you may omit this step and trigger step 3 when the email arrives.
  3. GET the Request.

From the point of view of integration, it means that RequestIDs should be kept at least until the “DONE” status is reached. RequestIDs are also useful to obtain the original parameters used as filters for the Request.
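The three steps of the pattern can be sketched as a small polling loop. The functions createRequest, getStatus and getResult are hypothetical placeholders that the caller injects. They stand for the POST and GET calls described above, whose exact URLs and payloads are not shown here. Treating any status other than “PROCESSING” or “DONE” as an error is also an assumption of this sketch.

```javascript
// Minimal sketch of the Async-Retrieval pattern with injected, hypothetical calls.
async function retrieve(api, params, options = {}) {
  const { intervalMs = 5000, maxAttempts = 60 } = options;

  // Step 1: create the Request and keep its id.
  const requestId = await api.createRequest(params);

  // Step 2: poll the status until it is DONE (or give up).
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await api.getStatus(requestId);
    if (status === 'DONE') {
      // Step 3: fetch the finished Request.
      return api.getResult(requestId);
    }
    if (status !== 'PROCESSING') {
      throw new Error('Unexpected status: ' + status);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error('Request ' + requestId + ' was not DONE after ' + maxAttempts + ' polls');
}
```

A backend that relies on the notification email instead of polling would skip the loop and call the equivalent of getResult directly with the stored RequestID.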

Toy Server Design

The Toy Server was designed to represent the minimal infrastructure of an educational organization connected to Metacog, with users that may be either learners or teachers.

Learners use a set of interactive widgets that feed data into Metacog, and teachers can track their interaction, both globally and also at the detail level.

The following illustration presents the covered use cases:

 

Use Case: Learners: Use Widgets

After login, a learner will be presented with a set of interactive widgets. Picking any of the widgets will generate a Metacog session that will be associated with the learner’s ID.

Use Case: Teachers: Manage Learner’s identities

Teachers should be able to create mappings between learner IDs and Metacog IDs. The idea is to satisfy the Opaque IDs directive by keeping learner IDs within the boundaries of the Toy Server and using another value, the Metacog ID, to store sessions in the Metacog Platform. All CRUD operations should be supported in this use case.

Ideally, all interactions that involve learner IDs should behave as in the following diagram:

Frontend and backend exchange and understand real learner IDs, but all the information sent to and received from the Metacog API is transformed so that only opaque Metacog IDs are shared.
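The translation between real learner IDs and opaque Metacog IDs can be sketched as a two-way lookup applied at the boundary with the Metacog API. The IdMapper class below is hypothetical. It generates the opaque ID automatically on first use and keeps the mappings in memory, whereas the Toy Server lets teachers manage the mappings and persists them. The learner_id field on returned records is likewise an assumption made for illustration.

```javascript
const crypto = require('crypto');

// Hypothetical two-way mapping between real learner ids and opaque Metacog ids.
class IdMapper {
  constructor() {
    this.toOpaque = new Map(); // real learner id -> opaque id
    this.toReal = new Map();   // opaque id -> real learner id
  }

  // Returns the opaque id for a learner, creating one the first time it is needed.
  metacogIdFor(learnerId) {
    if (!this.toOpaque.has(learnerId)) {
      const opaque = crypto.randomBytes(16).toString('hex');
      this.toOpaque.set(learnerId, opaque);
      this.toReal.set(opaque, learnerId);
    }
    return this.toOpaque.get(learnerId);
  }

  // Returns the real learner id, or undefined if the opaque id is unknown.
  learnerIdFor(metacogId) {
    return this.toReal.get(metacogId);
  }

  // Outbound: replace real ids before they are sent to Metacog.
  outbound(learnerIds) {
    return learnerIds.map((id) => this.metacogIdFor(id));
  }

  // Inbound: restore real ids on records received from Metacog.
  inbound(records) {
    return records.map((record) =>
      Object.assign({}, record, { learner_id: this.learnerIdFor(record.learner_id) }));
  }
}
```

Because the opaque ID is random rather than derived from the learner ID, nothing about the real identity can be recovered from the data stored in Metacog.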

Use Case: Teachers: Manage Query Projects

As already mentioned in the “Async-Retrieval pattern” section, data retrieval in Metacog works by creating a Request object (there are different types of them) and then polling, or waiting for a notification, until the object reaches the “DONE” status.

Once the Request is “DONE”, the associated data will always be available for fast retrieval.

We refer to Metacog’s DataRequest object as a “Data Exploration Project” or just “a Project”. The Toy Server is in charge of keeping track of the original request parameters, the Request object’s ID and the polling of the status.

Use Case: Teachers: View General Data

Once a project reaches the “DONE” status, the Toy Server provides access to a custom visualization of all the associated data. The visualization is interactive, allowing the teacher to select specific learner paths and compare the paths of different learners.

Use Case: Teachers: Replay Individual Data

The teacher should be able to select a specific session, for any learner included in the original Data Request, and play back the learner’s behavior alongside a visualization of the event flow.

Toy Server Architecture

The following diagram presents the top-level architecture of the Toy Server:

 

It was conceived as a RESTful backend on top of AWS services, with an independent frontend developed as an SPA built with the Angular 2 framework, plus some visualizations made with the D3.js library. The frontend also contains some of the widgets that are already instrumented on the Metacog website, but wired so that they use known learner IDs from the system instead of random ones.

Both the Toy Server’s backend and frontend communicate with the Metacog API. The backend performs raw calls, while the frontend delegates to the Metacog Client Library.

Instead of the standard approach of writing the server as a codebase to be deployed on an EC2 instance, the Toy Server uses the AWS services Lambda and API Gateway, which in combination allow the creation of serverless backends: API Gateway handles the HTTP requests and responses, while the actual code runs in Lambda functions, written in JavaScript for a Node.js execution environment. This approach not only reduces deployment complexity but also makes the test environment reproducible: the whole configuration is available for download as a simple script that recreates the Toy Server in your own AWS environment, carefully isolated from any other resources that you may have there. More details are provided in the “Cloning the Toy Server” section.

The diagram above presents the main components of the Toy Server backend infrastructure, highlighting the three types of implemented endpoints according to their usage of other AWS and Metacog resources:

Type 1 endpoints: AWS Gateway redirections

The first step in the workflow through the AWS API Gateway is to apply the Toy Server Authentication Strategy to the incoming request, in order to validate the auth token.

Then, at the Integration Request step, the publisher_id and application_id are added to the HTTP headers before calling the target Metacog endpoint.

The response from the endpoint is passed through as is, with no further manipulation.

Type 2 endpoints: Calling Metacog API from AWS Lambda functions

Type 2 endpoints invoke a Lambda function at the integration step. The function returns either the expected data or predefined error strings that are mapped to the proper HTTP error codes in the Integration Response and Method Response stages.

This is an example of one of the lambda functions:

 

In the above code, the headers object is fed with the publisher and application IDs that are stored in the credentials.js file. By zipping all the JS files of the project into a single package and associating that package with all the Lambda functions, you can easily share and reuse code and configuration files.
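To make the shape of a Type 2 endpoint more concrete, the following is a hypothetical sketch of such a function, not the Toy Server’s actual code. It assumes the callback-style Node.js Lambda handler signature, header names equal to the credential names, an event.path field produced by the API Gateway mapping, and an injected callMetacog function standing in for the real HTTP call. The error string 'METACOG_ERROR' is also an invented example of a predefined error string.

```javascript
// Stand-in for the credentials.js file mentioned in the text; values are placeholders.
const credentials = { publisher_id: 'XXXXXX', application_id: 'XXXXXXXXXXX' };

// Builds a Lambda handler around an injected HTTP function (hypothetical),
// so the sketch does not depend on any particular HTTP client.
function makeHandler(callMetacog) {
  return function handler(event, context, callback) {
    // The headers object is fed with the publisher and application ids.
    const headers = {
      publisher_id: credentials.publisher_id,
      application_id: credentials.application_id,
    };
    callMetacog({ path: event.path, headers: headers }).then(
      (data) => callback(null, data),      // expected data
      () => callback('METACOG_ERROR')      // predefined error string, mapped to an HTTP code by the gateway
    );
  };
}
```

In a deployed function, credentials would be loaded with require('./credentials.js') from the shared package, and the exported handler would be the result of makeHandler applied to the real HTTP call.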

Type 3 endpoints: toy-server functionality with persistence

The main feature of these endpoints is that they implement functionality exclusive to the Toy Server. They may or may not access Metacog endpoints, but more often they rely on custom persistence.

The above Lambda function makes use of the dynamodb-doc library, provided by default in the Node.js execution context of AWS Lambda, and accesses the mappings table to obtain a list of learner IDs.
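A Type 3 endpoint of this kind might look like the following hypothetical sketch. The database client is injected and is only assumed to expose a scan(params, callback) method whose result carries an Items array, in the style of the AWS SDK. The learner_id attribute name and the 'DB_ERROR' string are assumptions. Pagination of large tables is ignored for brevity.

```javascript
// Builds a Lambda handler that lists the learner ids stored in the mappings table.
// docClient is injected; it is assumed to offer scan(params, callback).
function makeListLearners(docClient) {
  return function handler(event, context, callback) {
    docClient.scan({ TableName: 'mappings' }, (err, data) => {
      if (err) {
        // Predefined error string, to be mapped to an HTTP error code by the gateway.
        return callback('DB_ERROR');
      }
      // learner_id is a hypothetical attribute name for the real learner id.
      callback(null, data.Items.map((item) => item.learner_id));
    });
  };
}
```

Injecting the client keeps the function easy to exercise locally with a fake table before it is deployed.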

Authentication Strategy

This section covers the authentication mechanisms chosen for the Toy Server. There are two authentication zones: Toy Server to Metacog and Toy Server’s frontend to backend.

When a new account is created for an organization in the Metacog Platform, a pair of credentials is provided, known as the publisher ID and the application ID.

Anyone who owns these two keys has access to all Metacog endpoints, either directly or by using them to create authentication tokens that can be used to access the remaining endpoints.

Deciding when the credentials should be passed down to the user is an important design decision. We have the following cases:

Toy server Backend to Metacog

 We have the following cases:

These are the most secure cases, as the Metacog credentials are never exposed to the front-end.

 

Toy Server Frontend to Metacog

There are cases where direct communication between the client side and Metacog is desirable. This is the case for the JavaScript Metacog Client Library: the library provided by Metacog that implements the logger logic, but that can also be used for more advanced functionality, like playing back and scoring training sessions.

In the toy server, we identify two security zones for the usage of the Metacog Client Library based on the role.

Secure zone

We assume that teachers log in from a secure environment. This means that the frontend has access to both the publisher ID and the application ID, allowing the teacher to make direct calls to the Metacog API from JavaScript. Learners should never have access to this zone.

Unsecure zone

For learners, we assume that they log in from an unsecure environment. Therefore, they do not have access to all the credentials, only to the publisher_id, and they use a special learner_token provided by Metacog.

This is how the process works:

  1. During authentication, the Toy Server makes an additional call to the Metacog API endpoint /access/learner. After passing the publisher, application and learner IDs, it gets a special learner_token, valid only for the given learner.
  2. The publisher_id and the learner_token are returned to the frontend, keeping the application_id safe in the Toy Server backend context.
  3. The Metacog Client Library is started with the learner_token instead of the application_id. In this mode the Client Library is only able to log events to Metacog and cannot use any of the other endpoints.
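The backend half of this process (steps 1 and 2) can be sketched as follows. The function requestLearnerToken is a hypothetical placeholder for the call to /access/learner, whose HTTP method and payload format are not specified here, and the field names of the objects are assumptions.

```javascript
// Sketch of the learner login on the backend.
// deps.credentials holds publisher_id and application_id (never sent to the frontend together).
// deps.requestLearnerToken is an injected, hypothetical wrapper around /access/learner.
async function loginLearner(learnerId, deps) {
  // Step 1: exchange the full credentials plus the learner id for a learner_token.
  // Under the Opaque IDs directive, learnerId here would be the opaque Metacog id.
  const learnerToken = await deps.requestLearnerToken({
    publisher_id: deps.credentials.publisher_id,
    application_id: deps.credentials.application_id,
    learner_id: learnerId,
  });

  // Step 2: return only what the unsecure zone is allowed to see.
  // The application_id is deliberately left out of the response.
  return {
    publisher_id: deps.credentials.publisher_id,
    learner_token: learnerToken,
  };
}
```

The frontend would then start the Metacog Client Library with the returned publisher_id and learner_token, as described in step 3.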

Toy Server Frontend to Backend authentication

The Toy Server offers three login mechanisms:

  1. A teacher can authenticate with an external OAuth provider (Google). Once the OAuth authentication succeeds, the user ID (email) is compared against a whitelist stored as a JSON file in an S3 bucket. The OAuth token is returned to the frontend, and future calls pass it as an HTTP header.
  2. A demo login is provided. It overrides the OAuth mechanism and bypasses the whitelist.
  3. Learners log in by selecting a learner ID from a list. No password is asked for. This means there is an unsecured call to an endpoint that returns the list of learner IDs, so this implementation is for demo purposes only.
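The decision logic behind the first two mechanisms can be sketched as follows. The sketch assumes that the OAuth identity has already been verified elsewhere, that the whitelist file is a JSON array of email addresses, and that the demo login is recognized by comparing a token with a configured value. All function and field names are hypothetical.

```javascript
// Checks an email against the whitelist file contents (assumed: JSON array of emails).
function isWhitelisted(email, whitelistJson) {
  const allowed = JSON.parse(whitelistJson).map((entry) => entry.trim().toLowerCase());
  return allowed.includes(email.trim().toLowerCase());
}

// Decides whether a teacher login is accepted.
// login is { demoToken } for the demo login, or { email } for a verified OAuth identity.
function authorizeTeacher(login, config) {
  if (login.demoToken !== undefined) {
    // The demo login overrides OAuth and bypasses the whitelist.
    return login.demoToken === config.demoToken;
  }
  return typeof login.email === 'string' && isWhitelisted(login.email, config.whitelistJson);
}
```

Comparing emails case-insensitively is a design choice of this sketch. A stricter backend might compare the exact value reported by the OAuth provider.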

The following picture presents the screen-flow of the three mechanisms:

Client Side Implementation

This section covers some details of the client side implementation, mostly focused on the interaction with Metacog services and widget integration. The client is built on Angular 2 and TypeScript, with D3.js used for the implementation of the group visualization.

Managing Learner Identities

This view is a CRUD interface over the DynamoDB “mappings” table of the Toy Server.  

It was implemented as an Angular 2 component plus a set of services, as presented below:

The toyServerAPI service encapsulates the HTTP calls to the Toy Server endpoints, and the mappings.service provides the abstracted CRUD interface, properly handling possible network errors. A similar pattern was followed for the other UI components.

Managing Projects

As mentioned in the Toy Server Design section, a Project represents the union of a ScoreRequest and a VisualizationRequest that share the same parameters.

The screenshots below show how those parameters are captured:

Notice that the Add Learner button presents learner IDs to the user, instead of Metacog IDs.

The backend should take care of converting those IDs before calling the Metacog API.

The other parameter that needs to be known at this step is the list of widget IDs. The demo currently uses three: “BLL002”, “TEI001” and “chernoff_tutorial”. Please check the “Widget Integration” section below to learn more about the widgets.

Visualization Workflow

Once a project is created, the associated Visualization Request and Score Request start processing in the Metacog Platform and the UI will display the label “processing..”.

Only when both Requests reach the DONE status does the UI offer the user the option to explore the data:

The first available visualization presents every session performed by all the learners in each of the available widgets. Some interactive features are available, such as visualizing the “learning path”: a line that joins all the sessions for the selected students. The user can also click on any learning path to display a label at each session in the path, with additional information like the duration of the session and its number of events.

By clicking on one of those labels, the user is redirected to a detail view where they can play back the chosen session, along with a learnogram visualization. This view is well suited for detailed analysis of the behavior of a learner in a particular widget.

Widget Integration

In the previous section we saw that widgets can be embedded to work in sync with a visualization. But widgets can also be accessed directly by a user, to create new sessions.

For widgets to support this kind of integration, some considerations should be kept in mind:

Cloning the Toy Server

Using AWS management console:

Metacog provides a CloudFormation template that you can use to create your own copy of the Metacog Toy Server in your own AWS account. To use it, go to the CloudFormation service console and click the create stack button.

Then, in the “choose a template” section, click on “Specify an Amazon S3 template URL” and enter the template URL (https://s3.amazonaws.com/metacogtoyserverpublic/metacogToyServer.template). After that, click next and a list of the following parameters will appear:

After specifying the parameters, click next and the options section will appear. Click next again and, at the bottom of the review section, check the checkbox in the capabilities section:


After that click the create button and wait for the stack to reach the status “CREATE_COMPLETE”.  

Once the stack is created, select it and use the invoke URL shown in the output section to start using it.


To delete a stack, select it, click on the actions button and then choose the delete stack option.

Some general guidelines:

Using AWS command line interface:

In order to create a stack with the aws cli you can use the following command:

aws cloudformation create-stack \
  --stack-name toyserverCLI \
  --capabilities CAPABILITY_IAM \
  --template-url https://s3.amazonaws.com/metacogtoyserverpublic/metacogToyServer.template \
  --parameters \
    ParameterKey=ApiName,ParameterValue=MetacogToyServerAPI \
    ParameterKey=MetacogEndpoint,ParameterValue=https://api.metacog.com \
    ParameterKey=PublisherID,ParameterValue=XXXXXX \
    ParameterKey=ApplicationID,ParameterValue=XXXXXXXXXXX \
    ParameterKey=Auth,ParameterValue=XXXXXXX \
    ParameterKey=createDemoData,ParameterValue=yes \
    ParameterKey=demoToken,ParameterValue=test \
    ParameterKey=EmailList,ParameterValue=lfcaro@gmail.com

You can replace the ParameterValue arguments with your own values. After executing the command, wait for the stack to reach the status “CREATE_COMPLETE”.

In order to delete a stack use the following command:

aws cloudformation delete-stack --stack-name toyserverCLI

Configuring the client side project

The demo page at Metacog’s site offers a running version of the client side project connected to a living instance of the Toy Server.

You can also download the client side project, either to use it against the existing demo Toy Server or in combination with your own server, after cloning the Toy Server into your own AWS infrastructure as described in the previous section.

The Client Project is running at:  https://www.metacog.com/developer/examples/session_explorer

If you visit the link, you will see in the menu bar the “Sources” link, that allows the download of a zip file:  

The Zip file presents the following structure:

In addition, the zip also contains login.component.js and login.component.map. These files are the transpiled versions of the typescript file, and can be safely removed, if you prefer to process the typescript on your own.

Pointing to your custom instance of the Toy Server

There is a config.js (not a typescript one) that should be configured to customize the URL of the Toy Server instance and the URL of the widgets used for demonstration.

This is how the config.js file looks for the demo site:

var Config = {
  api: "https://klc6oo1xsd.execute-api.us-east-1.amazonaws.com/stage/",
  demo_token: '1Olfa9,b208lmd?}cg3SF3qP"48"z$;?`iRo}9A1@J&dBF2nl)x`7,5+4-19vu*',
  widgets: {
    'TEI001': 'http://www.metacog.com/developer/examples/tei/logging.html',
    'BLL002': 'http://www.metacog.com/developer/examples/phet/concentration-logging.html',
    'chernoff_tutorial': 'http://www.metacog.com/developer/examples/chernoff/logger/index.html'
  }
};

You may also need to adjust the base path in the index.html file to match your client side deployment strategy. For example, if you want to run the application at the root, you may need to change the base to “/”.

Once everything is properly configured, running ‘npm start’ in the root folder should start the Angular application.