Wednesday, December 13, 2023

TypeScript-first schema declaration using Zod

Zod is a TypeScript-first schema declaration and validation library used to define the shape of data in TypeScript. It allows you to create schemas for your data structures, validate incoming data against those schemas, and ensure type safety within your TypeScript applications.


Here's a simple example demonstrating how Zod can be used:


```typescript
import * as z from 'zod';

// Define a schema for a user object
const userSchema = z.object({
  id: z.string(),
  username: z.string(),
  email: z.string().email(),
  age: z.number().int().positive(),
  isAdmin: z.boolean(),
});

// Data to be validated against the schema
const userData = {
  id: '123',
  username: 'johndoe',
  email: 'john@example.com',
  age: 30,
  isAdmin: true,
};

// Validate the data against the schema
try {
  const validatedUser = userSchema.parse(userData);
  console.log('Validated user:', validatedUser);
} catch (error) {
  console.error('Validation error:', error);
}

```


In the above example:


1. We import `z` from 'zod', which provides access to Zod's functionality.

2. We define a schema for a user object using `z.object()`. Each property in the object has a specific type and validation constraint defined by Zod methods like `z.string()`, `z.number()`, `z.boolean()`, etc.

3. `userData` represents an object we want to validate against the schema.

4. We use `userSchema.parse()` to validate `userData` against the defined schema. If the data matches the schema, it returns the validated user object; otherwise, it throws a validation error.
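If you would rather not wrap parsing in try/catch, Zod also provides `safeParse`, which returns a result object instead of throwing. A minimal sketch, reusing the `userSchema` and `userData` from the example above:

```typescript
// safeParse never throws; it returns a discriminated union
const result = userSchema.safeParse(userData);

if (result.success) {
  console.log('Validated user:', result.data);
} else {
  // result.error is a ZodError with a list of detailed issues
  console.error('Validation issues:', result.error.issues);
}
```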


Zod helps ensure that the incoming data adheres to the defined schema, providing type safety and validation within TypeScript applications. This prevents runtime errors caused by unexpected data shapes or types.
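Because the schema doubles as the type definition, the static TypeScript type can be derived from it with `z.infer`, keeping the compile-time type and the runtime validation in sync. A small sketch, again assuming the `userSchema` and `userData` defined earlier:

```typescript
// Derive the static TypeScript type from the schema
type User = z.infer<typeof userSchema>;

// The compiler checks this function against the inferred type,
// while parse() enforces the same shape at runtime
function greet(user: User): string {
  return `Hello, ${user.username}!`;
}

console.log(greet(userSchema.parse(userData)));
```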

Monday, December 11, 2023

AWS Glue Job to read data from Amazon Kinesis

Here's an example of how to use AWS Glue with PySpark to read from an Amazon Kinesis stream. AWS Glue lets you create ETL (Extract, Transform, Load) jobs that process data from Kinesis streams.

First, make sure you have the necessary AWS Glue libraries and dependencies. The IAM role that your Glue job runs under will also need permission to read from the Kinesis stream.

Here is a basic example of how to set up a Glue job to read from a Kinesis stream:


```python
import sys
import json

from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

# Initialize the Glue context and Spark session
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

# Define the Kinesis stream parameters
# (with a Data Catalog read, the table definition carries this information;
# they are needed when reading the stream directly, as sketched further below)
stream_name = "your_kinesis_stream_name"
region_name = "your_region_name"

# Create a DataFrame from the Data Catalog table that points to the Kinesis stream
data_frame = glueContext.create_data_frame.from_catalog(
    database="your_database",
    table_name="your_table"
)

# create_data_frame already returns a Spark DataFrame, so it can be used directly
df = data_frame

# Perform transformations on the DataFrame
# For example, if your Kinesis data is in JSON format, you might need to parse it
parsed_df = df.rdd.map(lambda x: json.loads(x["data"])).toDF()

# Show the parsed data
parsed_df.show()

# Write the data to an S3 bucket or another destination
# Note: if the table is registered as a streaming (Kinesis) source, show() and a
# direct write will not work; process micro-batches with forEachBatch instead
# (see the streaming sketch after the explanation below)
output_path = "s3://your_output_bucket/output_path/"
parsed_df.write.format("json").save(output_path)

# Commit the job
job.commit()
```

Explanation:

  1. Initialize Glue context and Spark session: This sets up the necessary context for running Glue jobs.
  2. Define Kinesis stream parameters: Specify your Kinesis stream name and region.
  3. Create a DataFrame: Use Glue's create_data_frame.from_catalog method to read from the Data Catalog table that is backed by the Kinesis stream (a direct read of the stream is sketched after this list).
  4. Transformations: Parse the JSON data or perform other transformations as required.
  5. Write the data: Save the transformed data to an S3 bucket or another desired destination.
  6. Commit the job: This finalizes the Glue job.
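If the stream is not registered in the Data Catalog, Glue's streaming API can also read Kinesis directly and process it in micro-batches. The following is a rough sketch rather than a drop-in replacement for the script above; the stream ARN, window size, and S3 paths are placeholders, and the connection options assume JSON records:

```python
from pyspark.context import SparkContext
from awsglue.context import GlueContext

sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)

# Read the Kinesis stream directly as a streaming DataFrame
# (connection options are illustrative; adjust to your stream and record format)
kinesis_frame = glueContext.create_data_frame.from_options(
    connection_type="kinesis",
    connection_options={
        "streamARN": "arn:aws:kinesis:your_region_name:123456789012:stream/your_kinesis_stream_name",
        "classification": "json",
        "startingPosition": "TRIM_HORIZON",
        "inferSchema": "true",
    },
)

# Process each micro-batch and write it to S3
def process_batch(batch_df, batch_id):
    if batch_df.count() > 0:
        (batch_df
            .write
            .mode("append")
            .format("json")
            .save("s3://your_output_bucket/output_path/"))

glueContext.forEachBatch(
    frame=kinesis_frame,
    batch_function=process_batch,
    options={
        "windowSize": "100 seconds",
        "checkpointLocation": "s3://your_output_bucket/checkpoints/",
    },
)
```

The checkpointLocation lets a restarted job resume reading from where it left off instead of reprocessing the whole stream.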

Prerequisites:

  • Ensure your job runs in an AWS Glue environment that provides the Glue and PySpark libraries; the Kinesis connector is included in the Glue streaming runtime.
  • The job's IAM role needs appropriate permissions to read the Kinesis stream and write to the S3 buckets.
  • Replace placeholders like your_kinesis_stream_name, your_region_name, your_database, your_table, and s3://your_output_bucket/output_path/ with actual values specific to your setup.

Make sure to test this script in your AWS Glue environment, as the configuration might vary based on your specific use case and AWS environment settings.

Amazon Bedrock and AWS Rekognition comparison for Image Recognition

 Both Amazon Bedrock and AWS Rekognition are services provided by AWS, but they cater to different use cases, especially when it comes to ...