Tutorial¶
Launch Our Product on AWS Marketplace¶
Step 1: Go to AWS Marketplace¶
- Visit our product's page on AWS Marketplace.
Step 2: Select the Product¶
- On the product details page, click the "Continue to Subscribe" button.
Step 3: Choose EC2 Launch¶
- On the subscription confirmation page, click the "Continue to Configuration" button.
- Choose your configuration options, such as software version and region, then click the "Continue to Launch" button.
- On the launch page, select "Launch through EC2".
- Configure the EC2 instance as per your needs, then click the "Launch" button.
Important Considerations During Launch
- Name: Assign a name to your instance, such as recommend-hq.
- Instance Type: Select an instance type with at least 4 GB of memory, such as t4g.medium.
- Key Pair: Create a new key pair or select an existing one. You will need the private key file (.pem) to connect to your instance via SSH. For more details, refer to the AWS Key Pair documentation.
- VPC Configuration: Configure the VPC settings to ensure network connectivity for your instances. If you are unsure, you can use the default VPC provided by AWS. For more details, refer to the AWS VPC documentation.
Step 4: Connect to Your EC2 Instance¶
- Open an SSH client.
- Locate your private key file. The key used to launch this instance is YOUR-KEY-PAIR.pem.
- Run this command, if necessary, to ensure your key is not publicly viewable:
chmod 400 YOUR-KEY-PAIR.pem
- Connect to your instance using its Public DNS.
For example, if your key pair file is named recommend-hq.pem and your instance's public DNS is ec2-43-207-37-42.ap-northeast-1.compute.amazonaws.com, the command would look like this (replace <user> with the login user of your instance's AMI):
ssh -i recommend-hq.pem <user>@ec2-43-207-37-42.ap-northeast-1.compute.amazonaws.com
Step 5: Configure AWS Permissions¶
Setup Permissions
It is recommended to attach the AdministratorAccess AWS managed policy to the IAM role associated with your EC2 instance. This ensures that the role has the comprehensive permissions required for the setup.
Once connected to your EC2 instance, ensure that the instance has the necessary AWS permissions. You can configure this in several ways:
- IAM Roles: Attach an appropriate IAM role to the EC2 instance with the necessary permissions. This method is recommended as it provides temporary security credentials and simplifies access key management. For more information, visit IAM Roles for Amazon EC2.
- AWS CLI Configuration: Manually configure AWS credentials using the aws configure command.
- Environment Variables: Set AWS credentials and region using environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_DEFAULT_REGION).
- Configuration Files: Use the AWS credentials and config files located in ~/.aws/credentials and ~/.aws/config. For more information, visit Configuration and Credential Files.
Use the following command to verify the IAM role and permissions:
aws sts get-caller-identity
This command confirms that the instance has the correct IAM role and permissions by returning the AWS account and IAM role details. If the command executes successfully, your setup is correct.
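As a rough illustration of what to look for in that output, the sketch below parses JSON in the shape returned by `aws sts get-caller-identity` and reports whether the caller is an assumed IAM role. The function name, the role name `recommend-hq-role`, and the sample account and instance IDs are hypothetical placeholders, not values from this product.

```python
import json


def summarize_caller_identity(raw_json):
    """Summarize JSON shaped like the output of `aws sts get-caller-identity`."""
    identity = json.loads(raw_json)
    arn = identity["Arn"]
    # ARN layout: arn:partition:service:region:account-id:resource
    resource = arn.split(":", 5)[5]
    kind, _, rest = resource.partition("/")
    uses_role = kind == "assumed-role"
    return {
        "account": identity["Account"],
        "uses_role": uses_role,
        # For assumed roles the resource is assumed-role/<role-name>/<session-name>.
        "role_name": rest.split("/")[0] if uses_role else None,
    }


# Hypothetical output for an instance whose attached role is "recommend-hq-role".
sample = json.dumps({
    "UserId": "AROAEXAMPLEID:i-0123456789abcdef0",
    "Account": "123456789012",
    "Arn": "arn:aws:sts::123456789012:assumed-role/recommend-hq-role/i-0123456789abcdef0",
})
print(summarize_caller_identity(sample))
```

If `uses_role` is false, the instance is probably using credentials from `aws configure`, environment variables, or configuration files rather than an attached IAM role.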
Generate Dataset Schemas¶
Step 1: Review Sample Data¶
Before generating the dataset schemas, it is crucial to understand the structure of your data. Below is an overview of the sample data used for this process.
Sample Users Data (example/dataset/users.csv)¶
This file contains information about users, including their ID, gender, age, location, and membership_ID.
Sample Items Data (example/dataset/items.csv)¶
This file lists the items (e.g., books) with their ID, title, author, and categories.
ITEM_ID,TITLE,AUTHOR,CATEGORIES
1,Non-Fiction_book_9506,Author_54,Romance|Mystery|Non-Fiction
2,Self-Help_book_7631,Author_62,Self-Help
3,Horror_book_3376,Author_22,Horror|History
Sample Interactions Data (example/dataset/events.csv)¶
This file records interactions between users and items, such as ratings, with event type, value, and timestamp.
USER_ID,ITEM_ID,EVENT_TYPE,EVENT_VALUE,TIMESTAMP
4,547,rating,4,1716315807
20,443,rating,3,1706705124
3,233,rating,3,1716563701
To learn more about the data format required by AWS Personalize, refer to the AWS Personalize documentation on dataset and schema requirements.
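To make the sample formats concrete, here is a minimal sketch that parses a few of the rows shown above using only the Python standard library. It assumes that CATEGORIES holds several values separated by `|` and that TIMESTAMP is Unix epoch time in seconds, which is how the samples read; the function names are illustrative and not part of the recommend-hq tool.

```python
import csv
import io
from datetime import datetime, timezone

ITEMS_CSV = """ITEM_ID,TITLE,AUTHOR,CATEGORIES
1,Non-Fiction_book_9506,Author_54,Romance|Mystery|Non-Fiction
2,Self-Help_book_7631,Author_62,Self-Help
"""

EVENTS_CSV = """USER_ID,ITEM_ID,EVENT_TYPE,EVENT_VALUE,TIMESTAMP
4,547,rating,4,1716315807
20,443,rating,3,1706705124
"""


def read_items(text):
    """Parse item rows, splitting the pipe-separated CATEGORIES column into a list."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["CATEGORIES"] = row["CATEGORIES"].split("|")
        rows.append(row)
    return rows


def read_events(text):
    """Parse interaction rows, converting EVENT_VALUE and TIMESTAMP to typed values."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["EVENT_VALUE"] = float(row["EVENT_VALUE"])
        # Assumption: TIMESTAMP is Unix epoch time in seconds.
        row["TIMESTAMP"] = datetime.fromtimestamp(int(row["TIMESTAMP"]), tz=timezone.utc)
        rows.append(row)
    return rows


items = read_items(ITEMS_CSV)
events = read_events(EVENTS_CSV)
print(items[0]["CATEGORIES"])
print(events[0]["TIMESTAMP"].isoformat())
```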
Step 2: Generate AWS Personalize Schemas¶
To generate the schemas for AWS Personalize based on your dataset, run the following command in your terminal:
recommend-hq schema-generation \
--user=example/dataset/users.csv \
--item=example/dataset/items.csv \
--interaction=example/dataset/events.csv \
--textual-fields=TITLE
This command processes the provided CSV or JSON files and generates the appropriate schema files for AWS Personalize.
Default IMPRESSION Field
The IMPRESSION field is now included by default when generating the interaction schema using this tool. This field can be used in log events.
Additionally, the IMPRESSION field is supported during the retraining process. When the train mode is set to update_dataset_group, this field will be utilized in the retraining.
Sample Command Output¶
After executing the command, you should see output similar to the following:
Load Input Users File: example/dataset/users.csv
Generating Users Schema ...
View the Processed Result
┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ Field Name ┃ Data Type ┃ Categorical ┃ Textual ┃
┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━┩
│ IS_ACTIVE │ string │ X │ X │
│ USER_ID │ string │ X │ X │
│ GENDER │ string │ O │ X │
│ AGE │ int │ X │ X │
│ LOCATION │ int │ X │ X │
│ MEMBERSHIP_ID │ string │ X │ X │
└─────────────────┴───────────┴─────────────┴─────────┘
Output User Schema: conf/schema/user_schema.json
Load Input Items File: example/dataset/items.csv
Generating Items Schema ...
View the Processed Result
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ Field Name ┃ Data Type ┃ Categorical ┃ Textual ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━┩
│ IS_ACTIVE │ string │ X │ X │
│ TEXTUAL_FIELD │ null,string │ X │ O │
│ ITEM_ID │ string │ X │ X │
│ AUTHOR │ string │ O │ X │
│ CATEGORIES │ string │ O │ X │
└────────────────┴─────────────┴─────────────┴─────────┘
Output Item Schema: conf/schema/item_schema.json
Load Input Interactions File: example/dataset/events.csv
Generating Interactions Schema ...
View the Processed Result
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ Field Name ┃ Data Type ┃ Categorical ┃ Textual ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━┩
│ USER_ID │ string │ X │ X │
│ ITEM_ID │ string │ X │ X │
│ EVENT_TYPE │ string │ X │ X │
│ EVENT_VALUE │ float,null │ X │ X │
│ TIMESTAMP │ long │ X │ X │
│ IMPRESSION │ string,null │ X │ X │
└─────────────┴─────────────┴─────────────┴─────────┘
Output Interaction Schema: conf/schema/interaction_schema.json
Output ETL Configure: conf/etl/configure.json
Understanding AWS Personalize Schema¶
The command generates the schemas required by AWS Personalize. For more details on custom datasets and schemas, refer to the official AWS documentation.
Step 3: Check Generated Schemas¶
After running the command, the generated schemas can be found in the conf/schema folder. Verify the following files:
user_schema.json
item_schema.json
interaction_schema.json
Step 4: Check ETL Configuration¶
The ETL (Extract, Transform, Load) configuration is also generated and can be found in the conf/etl folder. Verify the following configuration file:
configure.json
Completion and Verification
By following these steps, you will successfully generate the dataset schemas required for using Recommend HQ with AWS Personalize. Ensure that you verify the output files and configurations to avoid any issues in subsequent steps.
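One way to verify a generated schema is to load it and confirm the expected fields are present. The sketch below assumes the generated files follow the Avro-style JSON layout that AWS Personalize uses for schemas (a top-level `fields` list of `name`/`type` entries); the sample document is a hypothetical reconstruction based on the interactions table in the sample output, not the actual file contents.

```python
import json

# Fields that AWS Personalize requires in an interactions schema.
REQUIRED_INTERACTION_FIELDS = ("USER_ID", "ITEM_ID", "TIMESTAMP")


def schema_field_names(schema_text):
    """Return the field names declared in an Avro-style schema document."""
    schema = json.loads(schema_text)
    return [field["name"] for field in schema["fields"]]


def missing_fields(schema_text, required=REQUIRED_INTERACTION_FIELDS):
    """Return the required field names that the schema does not declare."""
    present = set(schema_field_names(schema_text))
    return [name for name in required if name not in present]


# Hypothetical stand-in for conf/schema/interaction_schema.json.
SAMPLE_SCHEMA = json.dumps({
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "EVENT_TYPE", "type": "string"},
        {"name": "EVENT_VALUE", "type": ["float", "null"]},
        {"name": "TIMESTAMP", "type": "long"},
        {"name": "IMPRESSION", "type": ["string", "null"]},
    ],
    "version": "1.0",
})

print(schema_field_names(SAMPLE_SCHEMA))
print(missing_fields(SAMPLE_SCHEMA))
```

To check a real file, read its text (for example with `pathlib.Path("conf/schema/interaction_schema.json").read_text()`) and pass it to `missing_fields`.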
Configuration Files¶
This section covers the configuration files necessary for setting up your project. We will go through conf/project.yaml, conf/filter.yaml, conf/campaign.yaml, and conf/solution.yaml. Each file has specific settings that may need to be customized according to your requirements.
conf/project.yaml¶
In this file, we configure the necessary settings for AWS Personalize. This includes specifying the project name, AWS region, and paths to schema files. The configuration details ensure that AWS Personalize can correctly identify and process your datasets.
project_name: recommend-hq
aws:
region_name: ap-northeast-1
personalize:
item_schema_file_path: conf/schema/item_schema.json
user_schema_file_path: conf/schema/user_schema.json
interaction_schema_file_path: conf/schema/interaction_schema.json
etl_configure_file_path: conf/etl/configure.json
force_reprocess_lookback_days: 8
log_lookback_days: 180
For more detailed information on configuring project.yaml, refer to the Project Configuration Page.
conf/solution.yaml¶
In this file, we configure various recommendation recipes, including user_personalization, similar_items, and personalized_ranking. Each recipe has specific target event types (target_event_type), event value thresholds (event_value_threshold), parameters (parameters), and excluded dataset columns (excluded_dataset_columns).
The target_event_type must match an event type that appears in your event dataset. For more details on the event dataset format, refer to the Sample Interactions Data.
For more details on configuring these recipes, refer to the documentation for each recipe.
recipes:
  user_personalization:
    target_event_type: "rating"
    event_value_threshold: "0"
    parameters:
      - name: "hidden_dimension"
        value: 128
    excluded_dataset_columns:
      - name: "Item"
        value:
          - ""
          - ""
      - name: "User"
        value:
          - ""
      - name: "Interaction"
        value:
          - ""
  similar_items:
    target_event_type: "rating"
    event_value_threshold: "0"
    parameters:
      - name: "popularity_discount_factor"
        value: 0.5
    excluded_dataset_columns:
      - name: "Item"
        value:
          - ""
          - ""
      - name: "User"
        value:
          - ""
      - name: "Interaction"
        value:
          - ""
  personalized_ranking:
    target_event_type: "rating"
    event_value_threshold: "0"
    parameters:
      - name: "bptt"
        value: 8
    excluded_dataset_columns:
      - name: "Item"
        value:
          - ""
          - ""
      - name: "User"
        value:
          - ""
      - name: "Interaction"
        value:
          - ""
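Because each recipe's target_event_type has to match an event type in the interactions data, a quick consistency check can catch typos before training. The following is a minimal sketch: it takes the recipes as an already-parsed dictionary (loading the YAML file is left out to keep the example free of third-party dependencies) and the event rows as dictionaries keyed by the CSV header. The function name is illustrative.

```python
def unmatched_event_types(recipes, event_rows):
    """Return {recipe_name: target_event_type} for targets absent from the events."""
    seen = {row["EVENT_TYPE"] for row in event_rows}
    return {
        name: config["target_event_type"]
        for name, config in recipes.items()
        if config["target_event_type"] not in seen
    }


# Parsed form of the "recipes" mapping, reduced to the key being checked.
recipes = {
    "user_personalization": {"target_event_type": "rating"},
    "similar_items": {"target_event_type": "rating"},
    "personalized_ranking": {"target_event_type": "rating"},
}

# Rows in the shape of example/dataset/events.csv.
event_rows = [
    {"USER_ID": "4", "ITEM_ID": "547", "EVENT_TYPE": "rating"},
    {"USER_ID": "20", "ITEM_ID": "443", "EVENT_TYPE": "rating"},
]

print(unmatched_event_types(recipes, event_rows))
```

An empty result means every recipe targets an event type that actually occurs in the data.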
conf/campaign.yaml¶
In this file, we define the campaigns for each recipe, including user_personalization, similar_items, and personalized_ranking. Each campaign specifies the minimum provisioned transactions per second (min_provisioned_tps). This setting determines the baseline capacity for your campaign, ensuring that the service can handle a minimum number of requests per second.
For more details on this setting, refer to the AWS Create Campaign Documentation.
campaigns:
  user_personalization:
    min_provisioned_tps: 1
  similar_items:
    min_provisioned_tps: 1
  personalized_ranking:
    min_provisioned_tps: 1
conf/filter.yaml¶
In this file, we define filters to be applied to the recommendation system. Filters allow you to include or exclude items based on specific criteria. For instance, the filter_by_category filter includes items where the categories contain a specified value.
filters:
  - name: "filter_by_category"
    expression: "INCLUDE ItemID WHERE Items.CATEGORIES IN ($CATEGORY)"
This filter ensures that only items with categories containing the specified $CATEGORY are included in the recommendations. Filters are particularly useful for tailoring recommendations to meet specific business requirements or user preferences.
For more detailed information on configuring filters, refer to the AWS Personalize Filter Documentation.
Execute the Full Deployment¶
Once you have configured the necessary settings in the project.yaml, campaign.yaml, solution.yaml, and filter.yaml files, the next step is to execute the full deployment for your recommendation system using AWS Personalize. This deployment process includes:
- Synchronizing configuration files
- Initializing AWS Personalize resources
- Synchronizing filters
- Deploying AWS infrastructure with AWS CDK
- Deploying API services
Re-deployment Needed After Configuration Changes
If you make any changes to the configuration files (project.yaml, campaign.yaml, solution.yaml, or filter.yaml), you need to run the deploy command again to apply those changes. These configuration files contain parameters and variables that the recommend-hq service will use during its operation.
To deploy the setup, run the deploy command:
recommend-hq deploy
For more detailed information on the deployment process and additional options, refer to the Deploy Command Documentation.
Upload Dataset¶
To update your dataset, you need to use the following three APIs:
POST /update-items
POST /update-users
POST /log-event
Dataset Requirements
- At minimum, 1,000 item interaction records from users interacting with items in your catalog. These interactions can come from bulk imports, streamed events, or both.
- At minimum, 25 unique user IDs with at least two item interactions for each.
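The two minimums above can be checked before uploading. Here is a minimal sketch that counts interactions per user from a list of (user_id, item_id) pairs; the function name and the returned dictionary layout are illustrative choices, and the thresholds are simply the ones stated in the requirements.

```python
from collections import Counter


def check_dataset_minimums(events, min_interactions=1000, min_users=25, min_per_user=2):
    """Check the interaction-count and per-user minimums for a list of (user_id, item_id) pairs."""
    per_user = Counter(user_id for user_id, _item_id in events)
    # Users who have at least `min_per_user` interactions.
    qualifying_users = sum(1 for count in per_user.values() if count >= min_per_user)
    return {
        "interactions": len(events),
        "qualifying_users": qualifying_users,
        "ok": len(events) >= min_interactions and qualifying_users >= min_users,
    }


# Synthetic data: 25 users with 40 interactions each.
events = [(f"user-{u}", f"item-{i}") for u in range(25) for i in range(40)]
print(check_dataset_minimums(events))
```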
Production Integration
If you intend to use this system in a production environment, you need to integrate these APIs into your service to ensure real-time updates and data consistency.
For more detailed information, see the AWS Dataset Documentation and the Data API Documentation.
Since you need to know the API endpoint and key, you can use the following command to retrieve them:
You will see the following output:
API
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ API URL ┃ API Key ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ https://kwwwv2zwxk.execute-api.ap-northeast-1.amazonaws.com/prod/ │ hGWRohg1Qk9RgWq4RKXpY9kLlI0ClqQr7xm4Q2NS │
└───────────────────────────────────────────────────────────────────┴──────────────────────────────────────────┘
Here is a tool that uses the Data API to upload the example dataset. You can use it as follows:
API_HOST=<api_endpoint> X_API_KEY=<api_key> python -m cli.upload_dataset.upload_dataset upload_user example/dataset/users.csv
API_HOST=<api_endpoint> X_API_KEY=<api_key> python -m cli.upload_dataset.upload_dataset upload_items example/dataset/items.csv
API_HOST=<api_endpoint> X_API_KEY=<api_key> python -m cli.upload_dataset.upload_dataset upload_event example/dataset/events.csv
After all three commands complete successfully, the datasets are uploaded and ready to be used by the recommendation system.
Train the Recommendation Model¶
Once the dataset has been uploaded, the next step is to train the recommendation model. For the initial training, you need to execute the recommend-hq train command with the specified recipes.
To perform the initial training, use the following command:
recommend-hq train \
--recipes=user_personalization,similar_items,personalized_ranking \
--training-mode=init
You will see output similar to the following:
View State machine execution: arn:aws:states:ap-northeast-1:191395820281:execution:RecommendHqPersonalizeFullTrainingJobStateMachine906A8CC2-7Vw6GQ95aQed:3228ef22-7b5a-4dd9-8beb-787cd0b56fca
This ARN is a clickable link that opens a webpage where you can view the state machine execution details and monitor the progress and status of the training job.
For more detailed information on the training process and additional options, refer to the Train Command Documentation.
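If you want to work with the execution ARN in a script rather than clicking it, its parts can be split out as in the sketch below. It assumes the standard colon-separated layout of a Step Functions execution ARN, as seen in the sample output; the function name is illustrative.

```python
def parse_execution_arn(arn):
    """Split a Step Functions execution ARN into its named parts."""
    parts = arn.split(":")
    # Expected layout: arn:partition:states:region:account:execution:state-machine:execution-id
    if len(parts) != 8 or parts[2] != "states" or parts[5] != "execution":
        raise ValueError(f"not a Step Functions execution ARN: {arn}")
    return {
        "region": parts[3],
        "account_id": parts[4],
        "state_machine": parts[6],
        "execution_id": parts[7],
    }


arn = (
    "arn:aws:states:ap-northeast-1:191395820281:execution:"
    "RecommendHqPersonalizeFullTrainingJobStateMachine906A8CC2-7Vw6GQ95aQed:"
    "3228ef22-7b5a-4dd9-8beb-787cd0b56fca"
)
print(parse_execution_arn(arn))
```

The same ARN can also be passed to `aws stepfunctions describe-execution --execution-arn <arn>` to read the execution status from the command line.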
Test Recommend API¶
Recommend Items to User¶
This API recommends items to a user based on their preferences.
Request Example:
curl -XPOST '{API_URL}/recommend-items-to-user' \
--header 'x-api-key: {API_KEY}' \
--header 'Content-Type: application/json' \
--data '{
"user_id": "1",
"limit": 10,
"filter_name": "",
"filter_value_dict": {}
}'
Recommend Items to Item¶
This API recommends items related to a specific item.
Request Example:
curl -XPOST '{API_URL}/recommend-items-to-item' \
--header 'x-api-key: {API_KEY}' \
--header 'Content-Type: application/json' \
--data '{
"item_id": "1",
"limit": 10,
"filter_name": "",
"filter_value_dict": {}
}'
Rank Items for User¶
This API ranks a list of items for a user based on their preferences.
Request Example:
curl -XPOST '{API_URL}/rank-items-to-user' \
--header 'x-api-key: {API_KEY}' \
--header 'Content-Type: application/json' \
--data '{
"user_id": "1",
"item_ids": ["1", "2", "4", "6", "7"]
}'
Reference Links¶
For detailed API documentation, please refer to Recommend API Documentation.
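For integrating these endpoints into a service, the curl requests above translate directly to code. The following is a minimal sketch using only the Python standard library; the helper name `build_request` and the placeholder URL and key are assumptions for illustration. It strips a trailing slash from the API URL, since a URL ending in `/prod/` would otherwise produce a double slash when the path is appended.

```python
import json
import urllib.request


def build_request(api_url, api_key, path, payload):
    """Build a POST request equivalent to the curl examples above."""
    url = api_url.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder endpoint and key; substitute the values for your deployment.
request = build_request(
    "https://example.execute-api.ap-northeast-1.amazonaws.com/prod/",
    "YOUR_API_KEY",
    "recommend-items-to-user",
    {"user_id": "1", "limit": 10, "filter_name": "", "filter_value_dict": {}},
)
print(request.get_method(), request.full_url)

# To actually send the request:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```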
Synchronize Filters¶
Step 1: Synchronize Config¶
The recommend-hq deploy command will initially synchronize the filter settings defined in filter.yaml. This process ensures that all filter configurations are uploaded to the system and the filters in Amazon Personalize are consistent with your configuration file.
Note
Filters in Amazon Personalize cannot be modified once they are created. If you need to update a filter with the same name, you must first delete the existing filter and then run the sync command again to recreate it.
Step 3: Check Filter Status¶
Filter
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
┃ Name ┃ ARN ┃ Status ┃ Filter Expression ┃ Last Updated ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
│ recommend-hq-filter_by_category-1718651139 │ arn:aws:personalize:ap-northeast-1:191395820281:filter/recommend-hq-filter_by_category-1718651139 │ ACTIVE │ INCLUDE ItemID WHERE Items.CATEGORIES IN ($CATEGORY) │ 2024-06-17 19:05:53 │
└──────────────────────────────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────┴────────┴────────────────────────────────────────────────────────┴─────────────────────┘
Step 4: Test Filter¶
Request Example:
curl -XPOST '{API_URL}/recommend-items-to-user' \
--header 'x-api-key: {API_KEY}' \
--header 'Content-Type: application/json' \
--data '{
"user_id": "1",
"limit": 5,
"filter_name": "filter_by_category",
"filter_value_dict": {"CATEGORY": "Horror"}
}'
Response:
{
"request": {
"user_id": "1",
"limit": 5,
"filter_name": "filter_by_category",
"filter_value_dict": {
"CATEGORY": "Horror"
}
},
"response": {
"items": [
{
"ITEM_ID": "811",
"TITLE": "Horror_book_2526",
"AUTHOR": "Author_3",
"CATEGORIES": "Health|Horror"
},
{
"ITEM_ID": "687",
"TITLE": "Horror_book_1810",
"AUTHOR": "Author_90",
"CATEGORIES": "Horror"
},
{
"ITEM_ID": "335",
"TITLE": "Technology_book_1771",
"AUTHOR": "Author_72",
"CATEGORIES": "Young Adult|Health|Technology|Horror|Travel"
},
{
"ITEM_ID": "77",
"TITLE": "Horror_book_6371",
"AUTHOR": "Author_43",
"CATEGORIES": "Biography|Children|Romance|Horror|Mystery"
},
{
"ITEM_ID": "216",
"TITLE": "Religion_book_7347",
"AUTHOR": "Author_96",
"CATEGORIES": "Fantasy|Horror|Religion"
}
]
}
}
For more details on filter examples, refer to the Filter expression examples.
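A simple way to confirm the filter behaved as intended is to check that every returned item carries the requested category. The sketch below assumes the response layout shown above and that CATEGORIES holds multiple values separated by `|`; the function name is illustrative.

```python
def items_missing_category(response_body, category):
    """Return the ITEM_IDs of returned items that do not list the given category."""
    return [
        item["ITEM_ID"]
        for item in response_body["response"]["items"]
        if category not in item["CATEGORIES"].split("|")
    ]


# Abbreviated copy of the sample response above.
response_body = {
    "response": {
        "items": [
            {"ITEM_ID": "811", "CATEGORIES": "Health|Horror"},
            {"ITEM_ID": "687", "CATEGORIES": "Horror"},
            {"ITEM_ID": "216", "CATEGORIES": "Fantasy|Horror|Religion"},
        ]
    }
}

print(items_missing_category(response_body, "Horror"))
```

An empty list means every recommended item matched the filter value.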
Show Status and Resources¶
To check the current status of the recommend-hq tool, use the following command:
To list all the resources managed by the recommend-hq tool, use the following command:
Uninstall Resources¶
If you need to uninstall all resources managed by the recommend-hq tool, use the following command: