Oracle Event Hub Cloud Service delivers the power of Kafka as a managed streaming data platform integrated into the Oracle Cloud ecosystem. You can create topics and start streaming right away, or deploy and manage your own dedicated Kafka cluster with elastic scalability.
Perform the following steps to create an Oracle Event Hub Cloud Service - Topic instance. You can skip this section if you already have an Oracle Event Hub Cloud Service - Topic instance and plan to use it for this demo.
Log in to your Oracle Event Hub Cloud Service - Topic account.
In the Services page, click Create Service.
The Create Service screen appears. Provide the following details and click Next.
Service Name: topicdemo
Service Description: Example to demo topic
Hosted On: platformdemo
Number of Partitions: 2
Retention Period (Hours): 24
Note: platformdemo is the name of the Oracle Event Hub Cloud Service - Platform cluster in which the topic will be created. You can provide a different name if you want to host the topic in a different Oracle Event Hub Cloud Service - Platform cluster.
In the Confirm page, verify that the details are correct and click Create.
Control returns to the Services page, where you can now see the new topicdemo service listed.
Click on the Event Hub icon adjacent to the topicdemo instance to go to the Service Overview page.
In the Service Overview page, note the Topic field. This is the name of the topic that will be used in the programs demonstrated in this tutorial.
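If your cluster also exposes a plain Kafka endpoint for administration, the same topic layout can be created programmatically with Kafka's AdminClient. This is a minimal sketch, assuming direct admin access is allowed; the broker address and the replication factor of 1 are illustrative assumptions, not values taken from the Event Hub console.

```scala
import java.util.{Collections, Properties}
import org.apache.kafka.clients.admin.{AdminClient, AdminClientConfig, NewTopic}

object CreateTopicDemo {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    // Placeholder endpoint; substitute the connect string for your cluster.
    props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker-host:9092")

    val admin = AdminClient.create(props)
    try {
      // Mirrors the console values: topic "topicdemo" with 2 partitions.
      // Replication factor 1 is an assumption suitable only for a demo cluster.
      val topic = new NewTopic("topicdemo", 2, 1.toShort)
        // 24-hour retention period, expressed in milliseconds.
        .configs(Collections.singletonMap("retention.ms", (24L * 60 * 60 * 1000).toString))
      // Block until the broker confirms topic creation.
      admin.createTopics(Collections.singleton(topic)).all().get()
    } finally {
      admin.close()
    }
  }
}
```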
Kafka is a publish-subscribe messaging system that provides a reliable source for Spark Streaming. The Kafka project introduced a new consumer API between versions 0.8 and 0.10, so there are two separate corresponding Spark Streaming packages available. The 0.10 API provides a one-to-one mapping between Kafka partitions and the partitions of the RDDs generated by the DStream, along with access to message metadata and offsets, as shown in the sketch below.
The following diagram shows end-to-end integration with Kafka: consuming messages from it, performing simple to complex windowing ETL, and pushing the desired output to various sinks such as memory, console, files, databases, and back to Kafka itself.
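To make the flow concrete, here is a minimal sketch of such a pipeline using the 0.10 direct-stream package (spark-streaming-kafka-0-10). The broker address, group id, and batch/window durations are illustrative assumptions; a windowed word count and the console sink stand in for the ETL and sinks shown in the diagram. The connection properties used here are explained in the list that follows.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, HasOffsetRanges, KafkaUtils, LocationStrategies}

object KafkaSparkDemo {
  def main(args: Array[String]): Unit = {
    // local[2] is for demo runs only; use your cluster master in practice.
    val conf = new SparkConf().setAppName("KafkaSparkDemo").setMaster("local[2]")
    val ssc  = new StreamingContext(conf, Seconds(10))

    // Placeholder connection values; replace with the broker address and
    // topic shown on your Service Overview page.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker-host:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "topicdemo-group",
      "auto.offset.reset"  -> "earliest"
    )

    // Direct stream: one RDD partition per Kafka partition of "topicdemo".
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("topicdemo"), kafkaParams)
    )

    // Offset and metadata access exposed by the 0.10 API.
    stream.foreachRDD { rdd =>
      val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      ranges.foreach(r => println(s"${r.topic}/${r.partition}: ${r.fromOffset} -> ${r.untilOffset}"))
    }

    // Simple windowing ETL: count words seen in the last 60 seconds,
    // recomputed every 10 seconds, pushed to the console sink.
    stream.map(_.value)
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKeyAndWindow(_ + _, Seconds(60), Seconds(10))
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```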
The following set of properties needs to be supplied to the Spark Streaming API to integrate Kafka with Spark as a source (they are collected into a kafkaParams map in the sketch after this list):
bootstrap.servers: A comma-separated list of host:port pairs for the Kafka broker(s).
key.deserializer: The name of the class used to deserialize the key of messages from Kafka.
value.deserializer: The name of the class used to deserialize the value of messages from Kafka.
group.id: Uniquely identifies the consumer group this consumer belongs to.
auto.offset.reset: Determines where consumption starts when the consumer has no initial offset in Kafka, or when the current offset no longer exists on the server; in those cases, one of the following options applies.
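Collected into code, the properties above become the kafkaParams map handed to the consumer strategy. This is a minimal sketch with a placeholder endpoint and group id; earliest, latest, and none are the standard Kafka values for auto.offset.reset.

```scala
import org.apache.kafka.common.serialization.StringDeserializer

// Placeholder endpoint and group id; replace with your cluster's values.
val kafkaParams = Map[String, Object](
  // Comma-separated host:port pairs of the Kafka broker(s).
  "bootstrap.servers" -> "broker1:9092,broker2:9092",
  // Classes used to deserialize the key and the value of each message.
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  // Unique identifier of the consumer group.
  "group.id" -> "topicdemo-group",
  // Starting point when no committed offset exists (or it has expired):
  // "earliest" replays from the beginning of the topic, "latest" starts
  // at the tail, and "none" raises an error to the consumer.
  "auto.offset.reset" -> "earliest"
)
```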