Oracle Event Hub Cloud Service delivers the power of Kafka as a managed streaming data platform integrated into the Oracle Cloud ecosystem. You can create topics and start streaming right away, or deploy and manage your own dedicated Kafka cluster with elastic scalability.
Perform the following steps to create an Oracle Event Hub Cloud Service - Topic instance. You can skip this section if you already have an Oracle Event Hub Cloud Service - Topic instance and plan to use it for this demo.
Log in to your Oracle Event Hub Cloud Service - Topic account.
In the Services page, click Create Service.
The Create Service screen appears. Provide the following details and click Next.
Service Name: topicdemo
Service Description: Example to demo topic
Hosted On: platformdemo
Number of Partitions: 2
Retention Period (Hours): 24
Note: platformdemo is the name of the Oracle Event Hub Cloud Service - Platform cluster in which the topic will be created. You can provide a different name if you want to host the topic in a different Oracle Event Hub Cloud Service - Platform cluster.
In the Confirm page, verify that the details are correct and click Create.
Control returns to the Services page, where you can now see the new topicdemo service listed.
Click on the Event Hub icon adjacent to the topicdemo instance to go to the Service Overview page.
In the Service Overview page, note the Topic field. This is the name of the topic that will be used in the programs demonstrated in this tutorial.
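If your cluster also exposes a plain Kafka endpoint for administration, the same topic layout can be created programmatically with Kafka's AdminClient. This is a minimal sketch, assuming direct admin access is allowed; the broker address and the replication factor of 1 are illustrative assumptions, not values taken from the Event Hub console.

```scala
import java.util.{Collections, Properties}
import org.apache.kafka.clients.admin.{AdminClient, AdminClientConfig, NewTopic}

object CreateTopicDemo {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    // Placeholder endpoint; substitute the connect string for your cluster.
    props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker-host:9092")

    val admin = AdminClient.create(props)
    try {
      // Mirrors the console values: topic "topicdemo" with 2 partitions.
      // Replication factor 1 is an assumption suitable only for a demo cluster.
      val topic = new NewTopic("topicdemo", 2, 1.toShort)
        // 24-hour retention period, expressed in milliseconds.
        .configs(Collections.singletonMap("retention.ms", (24L * 60 * 60 * 1000).toString))
      // Block until the broker confirms topic creation.
      admin.createTopics(Collections.singleton(topic)).all().get()
    } finally {
      admin.close()
    }
  }
}
```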
Kafka is a publish-subscribe messaging system that provides a reliable source for Spark Streaming. The Kafka project introduced a new consumer API between versions 0.8 and 0.10, so there are two separate corresponding Spark Streaming packages available. The 0.10 API provides a one-to-one mapping between Kafka partitions and the partitions of the RDDs generated by the DStream, along with access to message metadata and offsets, as shown in the sketch below.
The following diagram shows end-to-end integration with Kafka: consuming messages from it, performing simple to complex windowing ETL, and pushing the desired output to various sinks such as memory, console, files, databases, and back to Kafka itself.
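To make the flow concrete, here is a minimal sketch of such a pipeline using the 0.10 direct-stream package (spark-streaming-kafka-0-10). The broker address, group id, and batch/window durations are illustrative assumptions; a windowed word count and the console sink stand in for the ETL and sinks shown in the diagram. The connection properties used here are explained in the list that follows.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, HasOffsetRanges, KafkaUtils, LocationStrategies}

object KafkaSparkDemo {
  def main(args: Array[String]): Unit = {
    // local[2] is for demo runs only; use your cluster master in practice.
    val conf = new SparkConf().setAppName("KafkaSparkDemo").setMaster("local[2]")
    val ssc  = new StreamingContext(conf, Seconds(10))

    // Placeholder connection values; replace with the broker address and
    // topic shown on your Service Overview page.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker-host:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "topicdemo-group",
      "auto.offset.reset"  -> "earliest"
    )

    // Direct stream: one RDD partition per Kafka partition of "topicdemo".
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("topicdemo"), kafkaParams)
    )

    // Offset and metadata access exposed by the 0.10 API.
    stream.foreachRDD { rdd =>
      val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      ranges.foreach(r => println(s"${r.topic}/${r.partition}: ${r.fromOffset} -> ${r.untilOffset}"))
    }

    // Simple windowing ETL: count words seen in the last 60 seconds,
    // recomputed every 10 seconds, pushed to the console sink.
    stream.map(_.value)
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKeyAndWindow(_ + _, Seconds(60), Seconds(10))
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```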
The following set of properties needs to be supplied to the Spark Streaming API to integrate Kafka with Spark as a source (they are collected into a kafkaParams map in the sketch after this list):
bootstrap.servers: A comma-separated list of host:port pairs for the Kafka broker(s).
key.deserializer: The name of the class used to deserialize the key of messages from Kafka.
value.deserializer: The name of the class used to deserialize the value of messages from Kafka.
group.id: Uniquely identifies the consumer group this consumer belongs to.
auto.offset.reset: Determines where consumption starts when the consumer has no initial offset in Kafka, or when the current offset no longer exists on the server; in those cases, one of the following options applies.
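Collected into code, the properties above become the kafkaParams map handed to the consumer strategy. This is a minimal sketch with a placeholder endpoint and group id; earliest, latest, and none are the standard Kafka values for auto.offset.reset.

```scala
import org.apache.kafka.common.serialization.StringDeserializer

// Placeholder endpoint and group id; replace with your cluster's values.
val kafkaParams = Map[String, Object](
  // Comma-separated host:port pairs of the Kafka broker(s).
  "bootstrap.servers" -> "broker1:9092,broker2:9092",
  // Classes used to deserialize the key and the value of each message.
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  // Unique identifier of the consumer group.
  "group.id" -> "topicdemo-group",
  // Starting point when no committed offset exists (or it has expired):
  // "earliest" replays from the beginning of the topic, "latest" starts
  // at the tail, and "none" raises an error to the consumer.
  "auto.offset.reset" -> "earliest"
)
```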