Monday, April 3, 2023

Data refresh in QuickSight

 Amazon QuickSight does not have a direct mechanism to be notified when underlying data sources (such as an S3 file, RDS database, or Redshift table) are refreshed. However, the recommended approach is to use an AWS Lambda function or an AWS Step Functions workflow in combination with the QuickSight API to trigger dataset refreshes programmatically.

Suggested Approach:

  1. Use QuickSight API to Trigger a Dataset Refresh: After your data is updated (e.g., when an ETL job completes or an S3 file is updated), invoke the QuickSight API to refresh the dataset. This can be done via the AWS SDK or using an AWS Lambda function.

  2. Event-Driven Automation with CloudWatch or Lambda: You can automate the refresh process using Amazon CloudWatch Events (now Amazon EventBridge), Amazon S3 event notifications, or database triggers to detect data changes and initiate a refresh using the QuickSight API.

  3. QuickSight Refresh Programmatically: To refresh a dataset programmatically, use the CreateIngestion API, which starts a new SPICE ingestion. The UpdateDataSet API is meant for changing a dataset's definition rather than for routine refreshes.

Implementation Options:

1. CreateIngestion API to Refresh Dataset:

This approach triggers a new SPICE ingestion (data load) for the dataset in QuickSight. You can automate this API call using AWS Lambda, Step Functions, or another AWS service.


More details can be found in the CreateIngestion section of the Amazon QuickSight API Reference.

Python Example:

```python
import datetime

import boto3

# Replace with your actual AWS Region and Dataset ID
AWS_REGION = 'us-east-1'
DATASET_ID = 'your-dataset-id'
AWS_ACCOUNT_ID = 'your-account-id'
QUICKSIGHT_ROLE_NAME = 'quicksight'

# Initialize QuickSight client
quicksight_client = boto3.client('quicksight', region_name=AWS_REGION)


def refresh_quicksight_dataset():
    response = quicksight_client.create_ingestion(
        DataSetId=DATASET_ID,
        IngestionId='Ingestion_' + datetime.datetime.now().strftime("%Y%m%d%H%M%S"),
        AwsAccountId=AWS_ACCOUNT_ID
    )
    print(f"Ingestion Response: {response}")
    return response


# Call the function to trigger ingestion
refresh_quicksight_dataset()
```

This function triggers a dataset refresh for the specified dataset ID. You can automate the execution of this script using Lambda functions or scheduled events.

2. Automation using AWS Lambda:

If the data refresh is triggered by another event (e.g., S3 file update, Glue job completion, or RDS data change), you can use a Lambda function with an S3 event or a CloudWatch Event Rule to call the create_ingestion API.

Example Lambda Function Configuration:

  • Trigger: Set up S3 Event Notification (PUT) or a CloudWatch Event Rule (for Glue job completion).
  • Action: Call the create_ingestion API to refresh the QuickSight dataset.
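
The trigger and action above could be wired together roughly as follows. This is a minimal sketch, not a definitive implementation: the environment variable names QS_ACCOUNT_ID and QS_DATASET_ID and the helper names are assumptions made for illustration, and the handler only logs the S3 records it receives before starting the ingestion.

```python
import datetime
import os


def build_ingestion_id(now=None):
    """Return a timestamp-based ingestion ID (hypothetical naming scheme)."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    return "Ingestion_" + now.strftime("%Y%m%d%H%M%S")


def start_refresh(client, account_id, dataset_id):
    """Start a SPICE ingestion and return its ID and initial status."""
    response = client.create_ingestion(
        AwsAccountId=account_id,
        DataSetId=dataset_id,
        IngestionId=build_ingestion_id(),
    )
    return {
        "IngestionId": response["IngestionId"],
        "IngestionStatus": response["IngestionStatus"],
    }


def lambda_handler(event, context):
    # boto3 is imported here so the helpers above can be exercised without it.
    import boto3

    # Log which S3 objects triggered the refresh (standard S3 event shape).
    for record in event.get("Records", []):
        s3 = record.get("s3", {})
        bucket = s3.get("bucket", {}).get("name")
        key = s3.get("object", {}).get("key")
        print(f"Changed object: s3://{bucket}/{key}")

    client = boto3.client("quicksight")
    # QS_ACCOUNT_ID and QS_DATASET_ID are assumed Lambda environment variables.
    return start_refresh(
        client, os.environ["QS_ACCOUNT_ID"], os.environ["QS_DATASET_ID"]
    )
```

Passing the client into start_refresh keeps the API call easy to test with a stub; the Lambda execution role still needs the quicksight:CreateIngestion permission on the dataset.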

3. Event-Driven Approach using Step Functions:

For complex workflows (e.g., multiple datasets need to be refreshed sequentially), you can use AWS Step Functions to define a state machine that:

  1. Triggers data ingestion using the QuickSight API.
  2. Checks for the status of the ingestion.
  3. Updates dashboards once ingestion is complete.
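
The start-then-check loop that such a state machine performs can be sketched in plain Python to make the logic concrete. This is an illustration under assumptions, not a Step Functions definition: the function names, polling interval, and poll limit are hypothetical, and the client is expected to be a boto3 QuickSight client (or a stub with the same create_ingestion and describe_ingestion methods).

```python
import time
import uuid

# Statuses after which an ingestion will not change any further.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED"}


def wait_for_ingestion(client, account_id, dataset_id, ingestion_id,
                       poll_seconds=30, max_polls=40, sleep=time.sleep):
    """Poll describe_ingestion until the ingestion reaches a terminal status."""
    for _ in range(max_polls):
        response = client.describe_ingestion(
            AwsAccountId=account_id,
            DataSetId=dataset_id,
            IngestionId=ingestion_id,
        )
        status = response["Ingestion"]["IngestionStatus"]
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"Ingestion {ingestion_id} did not finish in time")


def refresh_sequentially(client, account_id, dataset_ids, **wait_kwargs):
    """Refresh datasets one after another, stopping at the first failure."""
    results = {}
    for dataset_id in dataset_ids:
        ingestion_id = str(uuid.uuid4())
        client.create_ingestion(
            AwsAccountId=account_id,
            DataSetId=dataset_id,
            IngestionId=ingestion_id,
        )
        status = wait_for_ingestion(
            client, account_id, dataset_id, ingestion_id, **wait_kwargs
        )
        results[dataset_id] = status
        if status != "COMPLETED":
            break
    return results
```

In a real Step Functions workflow the sleep would typically be a Wait state and the status check a Choice state, so no Lambda function sits idle while the ingestion runs.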

Monitoring and Notifications:

  • Use CloudWatch Alarms to monitor the status of the ingestions and send notifications if ingestion fails.
  • Automate retry mechanisms using Lambda or Step Functions.
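
A retry mechanism of the kind mentioned above might look like the following sketch. The function name, attempt count, and delays are assumptions chosen for illustration; start_and_wait stands for any callable that starts an ingestion, waits for it, and returns the final status string.

```python
import time


def run_with_retries(start_and_wait, max_attempts=3, base_delay=60,
                     sleep=time.sleep):
    """Call start_and_wait until it returns "COMPLETED" or attempts run out.

    Returns the attempt number that succeeded. Waits base_delay seconds after
    the first failure and doubles the wait after each further failure.
    """
    for attempt in range(1, max_attempts + 1):
        status = start_and_wait()
        if status == "COMPLETED":
            return attempt
        if attempt < max_attempts:
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"Ingestion failed after {max_attempts} attempts")
```

Raising after the last attempt lets the caller (a Lambda function or a Step Functions task) surface the failure so a notification can be sent.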

Key Considerations:

  • Dataset IDs: Ensure that you have the dataset ID and account ID.
  • Permissions: The AWS identity making the calls needs appropriate permissions (quicksight:CreateIngestion to start a refresh, quicksight:DescribeIngestion to check its status, etc.) to manage dataset ingestion.
  • Ingestion Limits: Each dataset can have only one active ingestion at a time, so coordinate to avoid conflicts.
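
One way to coordinate around the ingestion limit is to check for an ingestion that is still in progress before starting a new one. The sketch below assumes a boto3 QuickSight client (or a stub exposing list_ingestions); the function name and the choice to treat INITIALIZED, QUEUED, and RUNNING as "active" are assumptions for illustration.

```python
# Statuses treated here as "still in progress".
ACTIVE_STATUSES = {"INITIALIZED", "QUEUED", "RUNNING"}


def has_active_ingestion(client, account_id, dataset_id):
    """Return True if any listed ingestion for the dataset is still active."""
    kwargs = {"AwsAccountId": account_id, "DataSetId": dataset_id}
    while True:
        page = client.list_ingestions(**kwargs)
        for ingestion in page.get("Ingestions", []):
            if ingestion["IngestionStatus"] in ACTIVE_STATUSES:
                return True
        token = page.get("NextToken")
        if not token:
            return False
        # Follow pagination until all ingestions have been inspected.
        kwargs["NextToken"] = token
```

A caller could skip or delay the create_ingestion call when this returns True; the check requires the quicksight:ListIngestions permission.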

By combining these approaches, you can effectively notify and trigger QuickSight to reflect changes in your underlying data, ensuring that your visualizations remain up-to-date.


Use SSH Keys to clone GIT Repository using SSH

  1. Generate a New SSH Key Pair bash ssh-keygen -t rsa -b 4096 -C "HSingh@MindTelligent.com" -t rsa specifies the type of key (...