Tutorial

Launch Our Product on AWS Marketplace

Step 1: Go to AWS Marketplace

  1. Visit our product's page on AWS Marketplace.

Step 2: Select the Product

  1. On the product details page, click the "Continue to Subscribe" button.

Step 3: Choose EC2 Launch

  1. On the subscription confirmation page, click the "Continue to Configuration" button.
  2. Choose your configuration options, such as software version and region, then click the "Continue to Launch" button.
  3. On the launch page, select "Launch through EC2".
  4. Configure the EC2 instance as per your needs, then click the "Launch" button.

Important Considerations During Launch

  • Name: Assign a name to your instance, such as recommend-hq.
  • Instance Type: Select an instance type with at least 4 GB of memory, such as t4g.medium.
  • Key Pair: Create a new key pair or select an existing one. You will need the private key file (.pem) to connect to your instance via SSH. For more details, refer to the AWS Key Pair documentation.
  • VPC Configuration: Configure the VPC settings to ensure network connectivity for your instances. If you are unsure, you can use the default VPC provided by AWS. For more details, refer to the AWS VPC documentation.

Step 4: Connect to Your EC2 Instance

  1. Open an SSH client.
  2. Locate your private key file. The key used to launch this instance is YOUR-KEY-PAIR.pem
  3. Run this command, if necessary, to ensure your key is not publicly viewable.
    chmod 400 "YOUR-KEY-PAIR.pem"
    
  4. Connect to your instance using its Public DNS:
    ssh -i YOUR-KEY-PAIR.pem ubuntu@EC2-INSTANCE-PUBLIC-DNS
    
    For example, if your key pair file is named recommend-hq.pem and your instance's public DNS is ec2-43-207-37-42.ap-northeast-1.compute.amazonaws.com, the command would look like this:
    ssh -i "recommend-hq.pem" ubuntu@ec2-43-207-37-42.ap-northeast-1.compute.amazonaws.com
    

Step 5: Configure AWS Permissions

Setup Permissions

It is recommended to attach the AdministratorAccess AWS managed policy to the IAM role associated with your EC2 instance. This ensures that the role has comprehensive permissions required for the setup.

  1. Once connected to your EC2 instance, ensure that the instance has the necessary AWS permissions. You can configure this in several ways:

    • IAM Roles: Attach an appropriate IAM role to the EC2 instance with the necessary permissions. This method is recommended as it provides temporary security credentials and simplifies access key management. For more information, visit IAM Roles for Amazon EC2.
    • AWS CLI Configuration: Manually configure AWS credentials using the aws configure command:
      aws configure
      
    • Environment Variables: Set AWS credentials and region using environment variables:
      export AWS_ACCESS_KEY_ID=your_access_key_id
      export AWS_SECRET_ACCESS_KEY=your_secret_access_key
      export AWS_DEFAULT_REGION=your_region
      
    • Configuration Files: Use the AWS credentials and config files located in ~/.aws/credentials and ~/.aws/config. For more information, visit Configuration and Credential Files.
  2. Use the following command to verify the IAM role and permissions:

    aws sts get-caller-identity
    
    This command confirms that the instance has the correct IAM role and permissions by returning the AWS account and IAM role details. If the command executes successfully, your setup is correct.

Generate Dataset Schemas

Step 1: Review Sample Data

Before generating the dataset schemas, it is crucial to understand the structure of your data. Below is an overview of the sample data used for this process.

Sample Users Data (example/dataset/users.csv)

This file contains information about users, including their ID, gender, age, location, and membership ID.

USER_ID,GENDER,AGE,LOCATION,MEMBERSHIP_ID
1,M,81,18,92999
2,F,73,8,34816
3,F,32,10,56994

Sample Items Data (example/dataset/items.csv)

This file lists the items (e.g., books) with their ID, title, author, and categories.

ITEM_ID,TITLE,AUTHOR,CATEGORIES
1,Non-Fiction_book_9506,Author_54,Romance|Mystery|Non-Fiction
2,Self-Help_book_7631,Author_62,Self-Help
3,Horror_book_3376,Author_22,Horror|History

Sample Interactions Data (example/dataset/events.csv)

This file records interactions between users and items, such as ratings, with event type, value, and timestamp.

USER_ID,ITEM_ID,EVENT_TYPE,EVENT_VALUE,TIMESTAMP
4,547,rating,4,1716315807
20,443,rating,3,1706705124
3,233,rating,3,1716563701
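
Before generating schemas, you can sanity-check that each CSV file contains the columns shown above. This is a standalone sketch (not part of the recommend-hq CLI); the expected column sets mirror the sample files:

```python
import csv

# Expected header columns for each dataset, mirroring the sample CSVs above.
EXPECTED_COLUMNS = {
    "users": {"USER_ID", "GENDER", "AGE", "LOCATION", "MEMBERSHIP_ID"},
    "items": {"ITEM_ID", "TITLE", "AUTHOR", "CATEGORIES"},
    "interactions": {"USER_ID", "ITEM_ID", "EVENT_TYPE", "EVENT_VALUE", "TIMESTAMP"},
}

def missing_columns(csv_lines, dataset):
    """Return the expected columns that are absent from the CSV header row."""
    header = set(next(csv.reader(csv_lines)))
    return EXPECTED_COLUMNS[dataset] - header
```

For example, `missing_columns(open("example/dataset/users.csv"), "users")` returns an empty set when the users file has the expected header.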

To learn more about the format required by AWS Personalize, refer to the official AWS Personalize documentation on datasets and schemas.

Step 2: Generate AWS Personalize Schemas

To generate the schemas for AWS Personalize based on your dataset, run the following command in your terminal:

recommend-hq schema-generation \
  --user=example/dataset/users.csv \
  --item=example/dataset/items.csv \
  --interaction=example/dataset/events.csv \
  --textual-fields=TITLE

This command processes the provided CSV or JSON files and generates the appropriate schema files for AWS Personalize.

Default IMPRESSION Field

The IMPRESSION field is included by default when this tool generates the interaction schema, and it can be populated when logging events.

The IMPRESSION field is also supported during retraining: when the train mode is set to update_dataset_group, the field is used in the retraining process.

Sample Command Output

After executing the command, you should see output similar to the following:

Load Input Users File: example/dataset/users.csv
Generating Users Schema ...

View the Processed Result

┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━┓
┃    Field Name   ┃ Data Type ┃ Categorical ┃ Textual ┃
┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━┩
│    IS_ACTIVE    │  string   │      X      │    X    │
│     USER_ID     │  string   │      X      │    X    │
│     GENDER      │  string   │      O      │    X    │
│       AGE       │    int    │      X      │    X    │
│     LOCATION    │    int    │      X      │    X    │
│  MEMBERSHIP_ID  │  string   │      X      │    X    │
└─────────────────┴───────────┴─────────────┴─────────┘

Output User Schema: conf/schema/user_schema.json


Load Input Items File: example/dataset/items.csv
Generating Items Schema ...

View the Processed Result

┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━┓
┃   Field Name   ┃  Data Type  ┃ Categorical ┃ Textual ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━┩
│    IS_ACTIVE   │   string    │      X      │    X    │
│  TEXTUAL_FIELD │ null,string │      X      │    O    │
│     ITEM_ID    │   string    │      X      │    X    │
│     AUTHOR     │   string    │      O      │    X    │
│   CATEGORIES   │   string    │      O      │    X    │
└────────────────┴─────────────┴─────────────┴─────────┘

Output Item Schema: conf/schema/item_schema.json


Load Input Interactions File: example/dataset/events.csv
Generating Interactions Schema ...

View the Processed Result

┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ Field Name  ┃  Data Type  ┃ Categorical ┃ Textual ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━┩
│   USER_ID   │   string    │      X      │    X    │
│   ITEM_ID   │   string    │      X      │    X    │
│ EVENT_TYPE  │   string    │      X      │    X    │
│ EVENT_VALUE │ float,null  │      X      │    X    │
│  TIMESTAMP  │    long     │      X      │    X    │
│ IMPRESSION  │ string,null │      X      │    X    │
└─────────────┴─────────────┴─────────────┴─────────┘

Output Interaction Schema: conf/schema/interaction_schema.json

Output ETL Configure: conf/etl/configure.json

This output indicates the status of schema generation for each file and shows a processed result table with field names, data types, and whether the fields are categorical or textual.

Understanding AWS Personalize Schema

The command generates the schemas required by AWS Personalize. For more details on custom datasets and schemas, refer to the official AWS documentation.

Step 3: Check Generated Schemas

After running the command, the generated schemas can be found in the conf/schema folder. Verify the following files:

  • user_schema.json
  • item_schema.json
  • interaction_schema.json
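
Amazon Personalize schemas are defined in Avro JSON format. The exact contents that recommend-hq writes may differ, but an interactions schema consistent with the processed-result table above would look like the following sketch (built in Python for illustration):

```python
import json

# Sketch of an Avro-format Amazon Personalize interactions schema, matching
# the field names and data types in the processed-result table above. The
# JSON that recommend-hq actually writes to interaction_schema.json may differ.
interaction_schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "EVENT_TYPE", "type": "string"},
        {"name": "EVENT_VALUE", "type": ["float", "null"]},
        {"name": "TIMESTAMP", "type": "long"},
        {"name": "IMPRESSION", "type": ["string", "null"]},
    ],
    "version": "1.0",
}

print(json.dumps(interaction_schema, indent=2))
```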

Step 4: Check ETL Configuration

The ETL (Extract, Transform, Load) configuration is also generated and can be found in the conf/etl folder. Verify the following configuration files:

  • configure.json

Completion and Verification

By following these steps, you will successfully generate the dataset schemas required for using Recommend HQ with AWS Personalize. Ensure that you verify the output files and configurations to avoid any issues in subsequent steps.

Configuration Files

This section covers the configuration files necessary for setting up your project. We will go through conf/project.yaml, conf/filter.yaml, conf/campaign.yaml, and conf/solution.yaml. Each file has specific settings that may need to be customized according to your requirements.

conf/project.yaml

In this file, we configure the necessary settings for AWS Personalize. This includes specifying the project name, AWS region, and paths to schema files. The configuration details ensure that AWS Personalize can correctly identify and process your datasets.

project_name: recommend-hq
aws:
  region_name: ap-northeast-1
personalize:
  item_schema_file_path: conf/schema/item_schema.json
  user_schema_file_path: conf/schema/user_schema.json
  interaction_schema_file_path: conf/schema/interaction_schema.json
  etl_configure_file_path: conf/etl/configure.json
  force_reprocess_lookback_days: 8
  log_lookback_days: 180

For more detailed information on configuring project.yaml, refer to the Project Configuration Page.

conf/solution.yaml

In this file, we configure various recommendation recipes, including user_personalization, similar_items, and personalized_ranking. Each recipe has specific target event types (target_event_type), event value thresholds (event_value_threshold), parameters (parameters), and excluded dataset columns (excluded_dataset_columns).

The target_event_type must match the event type defined in your event dataset. For more details on the event dataset format, refer to the Sample Interactions Data.

For more details on configuring these recipes, refer to the documentation for each recipe.

recipes:
  user_personalization:
    target_event_type: "rating"
    event_value_threshold: "0"
    parameters:
      - name: "hidden_dimension"
        value: 128
    excluded_dataset_columns:
      - name: "Item"
        value:
          - ""
          - ""
      - name: "User"
        value:
          - ""
      - name: "Interaction"
        value:
          - ""
  similar_items:
    target_event_type: "rating"
    event_value_threshold: "0"
    parameters:
      - name: "popularity_discount_factor"
        value: 0.5
    excluded_dataset_columns:
      - name: "Item"
        value:
          - ""
          - ""
      - name: "User"
        value:
          - ""
      - name: "Interaction"
        value:
          - ""
  personalized_ranking:
    target_event_type: "rating"
    event_value_threshold: "0"
    parameters:
      - name: "bptt"
        value: 8
    excluded_dataset_columns:
      - name: "Item"
        value:
          - ""
          - ""
      - name: "User"
        value:
          - ""
      - name: "Interaction"
        value:
          - ""

conf/campaign.yaml

In this file, we define the campaigns for each recipe, including user_personalization, similar_items, and personalized_ranking. Each campaign specifies the minimum provisioned transactions per second (min_provisioned_tps). This setting determines the baseline capacity for your campaign, ensuring that the service can handle a minimum number of requests per second.

For more details on this setting, refer to the AWS Create Campaign Documentation.

campaigns:
  user_personalization:
    min_provisioned_tps: 1
  similar_items:
    min_provisioned_tps: 1
  personalized_ranking:
    min_provisioned_tps: 1

conf/filter.yaml

In this file, we define filters to be applied to the recommendation system. Filters allow you to include or exclude items based on specific criteria. For instance, the filter_by_category filter includes items where the categories contain a specified value.

filters:
  - name: "filter_by_category"
    expression: "INCLUDE ItemID WHERE Items.CATEGORIES IN ($CATEGORY)"

This filter ensures that only items with categories containing the specified $CATEGORY are included in the recommendations. Filters are particularly useful for tailoring recommendations to meet specific business requirements or user preferences.

For more detailed information on configuring filters, refer to the AWS Personalize Filter Documentation.
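
To see how filter_value_dict relates to the $CATEGORY placeholder, the sketch below imitates the substitution locally. Note that Amazon Personalize performs this substitution server-side when you pass filter values with a request; this function exists only to illustrate the mapping:

```python
import re

FILTER_EXPRESSION = "INCLUDE ItemID WHERE Items.CATEGORIES IN ($CATEGORY)"

def resolve_placeholders(expression, filter_value_dict):
    """Illustrate how values in filter_value_dict fill the $-placeholders
    of a filter expression. (Done server-side by Amazon Personalize.)"""
    return re.sub(
        r"\$(\w+)",
        lambda m: '"%s"' % filter_value_dict[m.group(1)],
        expression,
    )
```

Calling `resolve_placeholders(FILTER_EXPRESSION, {"CATEGORY": "Horror"})` shows the effective expression used for the request in Step 4 below.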

Execute the Full Deployment

Once you have configured the necessary settings in the project.yaml, campaign.yaml, solution.yaml, and filter.yaml files, the next step is to execute the full deployment for your recommendation system using AWS Personalize. This deployment process includes:

  • Synchronizing configuration files
  • Initializing AWS Personalize resources
  • Synchronizing filters
  • Deploying AWS infrastructure with AWS CDK
  • Deploying API services

Re-deployment Needed After Configuration Changes

If you make any changes to the configuration files (project.yaml, campaign.yaml, solution.yaml, or filter.yaml), you need to run the deploy command again to apply those changes. These configuration files contain parameters and variables that the recommend-hq service will use during its operation.

To deploy the setup, use this command:

recommend-hq deploy --yes

For more detailed information on the deployment process and additional options, refer to the Deploy Command Documentation.

Upload Dataset

To upload or update your dataset, use the following three APIs:

  • POST /update-items
  • POST /update-users
  • POST /log-event

Dataset Requirements

  • At minimum, 1,000 interaction records from users interacting with items in your catalog. These interactions can come from bulk imports, streamed events, or both.
  • At minimum, 25 unique user IDs, each with at least two interactions.
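
These minimums can be checked locally before uploading. A small sketch (not part of the recommend-hq CLI) that applies both rules to a list of interaction rows:

```python
from collections import Counter

def meets_personalize_minimums(interactions):
    """Apply the two minimums above: at least 1,000 interaction records,
    and at least 25 unique users with two or more interactions each."""
    per_user = Counter(row["USER_ID"] for row in interactions)
    users_with_two_plus = sum(1 for n in per_user.values() if n >= 2)
    return len(interactions) >= 1000 and users_with_two_plus >= 25
```

The rows can come from your events CSV, e.g. `meets_personalize_minimums(list(csv.DictReader(open("example/dataset/events.csv"))))`.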

Production Integration

If you intend to use this system in a production environment, you need to integrate these APIs into your service to ensure real-time updates and data consistency.

For more detailed information, see the AWS Dataset Documentation and the Data API Documentation.

You will need the API endpoint and key; retrieve them with the following command:

recommend-hq status api

You will see output similar to the following:

API
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ API URL                                                           ┃ API Key                                  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ https://kwwwv2zwxk.execute-api.ap-northeast-1.amazonaws.com/prod/ │ hGWRohg1Qk9RgWq4RKXpY9kLlI0ClqQr7xm4Q2NS │
└───────────────────────────────────────────────────────────────────┴──────────────────────────────────────────┘

Here is a tool that uses the Data API to upload the example dataset. You can use it as follows:

API_HOST=<api_endpoint> X_API_KEY=<api_key> python -m cli.upload_dataset.upload_dataset upload_user example/dataset/users.csv

API_HOST=<api_endpoint> X_API_KEY=<api_key> python -m cli.upload_dataset.upload_dataset upload_items example/dataset/items.csv

API_HOST=<api_endpoint> X_API_KEY=<api_key> python -m cli.upload_dataset.upload_dataset upload_event example/dataset/events.csv

Once these commands complete successfully, the datasets have been uploaded and are ready for use by the recommendation system.

Train the Recommendation Model

Once the dataset has been uploaded, the next step is to train the recommendation model. For the initial training, you need to execute the recommend-hq train command with the specified recipes.

To perform the initial training, use the following command:

recommend-hq train \
  --recipes=user_personalization,similar_items,personalized_ranking \
  --training-mode=init

You will see output similar to the following:

View State machine execution: arn:aws:states:ap-northeast-1:191395820281:execution:RecommendHqPersonalizeFullTrainingJobStateMachine906A8CC2-7Vw6GQ95aQed:3228ef22-7b5a-4dd9-8beb-787cd0b56fca

This ARN links to the execution in the AWS Step Functions console, where you can view the state machine details and monitor the progress and status of the training job.

For more detailed information on the training process and additional options, refer to the Train Command Documentation.

Test Recommend API

Recommend Items to User

This API recommends items to a user based on their preferences.

Request Example:

curl -XPOST '{API_URL}/recommend-items-to-user' \
--header 'x-api-key: {API_KEY}' \
--header 'Content-Type: application/json' \
--data '{
    "user_id": "1",
    "limit": 10,
    "filter_name": "",
    "filter_value_dict": {}
}'
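
The same request can be issued from Python using only the standard library. The API URL and key below are placeholders; substitute the values reported by `recommend-hq status api`:

```python
import json
import urllib.request

API_URL = "https://YOUR-API-ID.execute-api.YOUR-REGION.amazonaws.com/prod"  # placeholder
API_KEY = "YOUR-API-KEY"  # placeholder

def build_request(user_id, limit=10, filter_name="", filter_value_dict=None):
    """Assemble the same POST request as the curl example above."""
    payload = {
        "user_id": user_id,
        "limit": limit,
        "filter_name": filter_name,
        "filter_value_dict": filter_value_dict or {},
    }
    return urllib.request.Request(
        f"{API_URL}/recommend-items-to-user",
        data=json.dumps(payload).encode(),
        headers={"x-api-key": API_KEY, "Content-Type": "application/json"},
    )

# With real credentials:
# print(urllib.request.urlopen(build_request("1")).read().decode())
```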

Recommend Items to Item

This API recommends items related to a specific item.

Request Example:

curl -XPOST '{API_URL}/recommend-items-to-item' \
--header 'x-api-key: {API_KEY}' \
--header 'Content-Type: application/json' \
--data '{
    "item_id": "1",
    "limit": 10,
    "filter_name": "",
    "filter_value_dict": {}
}'

Rank Items for User

This API ranks a list of items for a user based on their preferences.

Request Example:

curl -XPOST '{API_URL}/rank-items-to-user' \
--header 'x-api-key: {API_KEY}' \
--header 'Content-Type: application/json' \
--data '{
    "user_id": "1",
    "item_ids": ["1", "2", "4", "6", "7"]
}'

For detailed API documentation, please refer to Recommend API Documentation.

Synchronize Filters

Step 1: Synchronize Config

The recommend-hq deploy command synchronizes the filter settings defined in filter.yaml during deployment, ensuring that all filter configurations are uploaded and that the filters in Amazon Personalize stay consistent with your configuration file.

recommend-hq deploy

Note

Filters in Amazon Personalize cannot be modified once they are created. If you need to update a filter with the same name, you must first delete the existing filter and then run the sync command again to recreate it.

Step 3: Check Filter Status

recommend-hq status filter

Filter
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
┃ Name                                         ┃ ARN                                                                                                ┃ Status ┃ Filter Expression                                      ┃ Last Updated        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
│ recommend-hq-filter_by_category-1718651139   │ arn:aws:personalize:ap-northeast-1:191395820281:filter/recommend-hq-filter_by_category-1718651139  │ ACTIVE │ INCLUDE ItemID WHERE Items.CATEGORIES IN ($CATEGORY)   │ 2024-06-17 19:05:53 │
└──────────────────────────────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────┴────────┴────────────────────────────────────────────────────────┴─────────────────────┘

Step 4: Test Filter

Request Example:

curl -XPOST '{API_URL}/recommend-items-to-user' \
--header 'x-api-key: {API_KEY}' \
--header 'Content-Type: application/json' \
--data '{
    "user_id": "1",
    "limit": 5,
    "filter_name": "filter_by_category",
    "filter_value_dict": {"CATEGORY": "Horror"}
}'

Response:

{
  "request": {
    "item_id": "1",
    "limit": 5,
    "filter_name": "filter_by_category",
    "filter_value_dict": {
      "CATEGORY": "Horror"
    }
  },
  "response": {
    "items": [
      {
        "ITEM_ID": "811",
        "TITLE": "Horror_book_2526",
        "AUTHOR": "Author_3",
        "CATEGORIES": "Health|Horror"
      },
      {
        "ITEM_ID": "687",
        "TITLE": "Horror_book_1810",
        "AUTHOR": "Author_90",
        "CATEGORIES": "Horror"
      },
      {
        "ITEM_ID": "335",
        "TITLE": "Technology_book_1771",
        "AUTHOR": "Author_72",
        "CATEGORIES": "Young Adult|Health|Technology|Horror|Travel"
      },
      {
        "ITEM_ID": "77",
        "TITLE": "Horror_book_6371",
        "AUTHOR": "Author_43",
        "CATEGORIES": "Biography|Children|Romance|Horror|Mystery"
      },
      {
        "ITEM_ID": "216",
        "TITLE": "Religion_book_7347",
        "AUTHOR": "Author_96",
        "CATEGORIES": "Fantasy|Horror|Religion"
      }
    ]
  }
}

For more details on filter examples, refer to the Filter expression examples.

Show Status and Resources

To check the current status of the recommend-hq tool, use the following command:

recommend-hq status

To list all the resources managed by the recommend-hq tool, use the following command:

recommend-hq resources

Uninstall Resources

If you need to uninstall all resources managed by the recommend-hq tool, use the following command:

recommend-hq uninstall -y