Orlando, FL
October 6-9
IBM TechXchange 2025
2976

Duncan Scott Martinson
Americas Sales Leader, Databases, IBM

Open Source Crash Course
Integrating Kafka, dbt, Airflow, and Ranger with watsonx.data

Agenda
01 Reference Architecture
02 Apache Kafka
03 dbt
04 Apache Airflow
05 Apache Ranger
06 Full Data Platform

IBM TechXchange | 2025 IBM Corporation

The Usual Suspects
Open Reference Architecture
[Diagram labels: Producers, Consumers, watsonx.data, Kafka (Event Streaming)]

Apache Kafka
Apache Kafka is an open-source distributed event streaming platform used to:
- Publish and subscribe to streams of records (events)
- Store streams durably and reliably
- Process streams in real time

Core Concepts:
- Producer: Sends data (events) to topics
- Consumer: Reads data from topics
- Broker: Kafka server that stores and manages events
- Topic: Named category of events, split into partitions for scalability
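The four core concepts (producer, consumer, broker, topic with partitions) can be illustrated with a small in-memory toy model. This is only a sketch for intuition, not the Kafka client API: the `Broker`, `Producer`, and `Consumer` classes, the `orders` topic, and the customer keys are all hypothetical, and `zlib.crc32` stands in for a real partitioner purely to keep the example inside the standard library.

```python
import zlib
from collections import defaultdict


class Broker:
    """Toy in-memory stand-in for a Kafka broker (hypothetical, not a client API)."""

    def __init__(self):
        # topic name -> list of partitions; each partition is an append-only list
        self.topics = {}

    def create_topic(self, name, num_partitions=3):
        self.topics[name] = [[] for _ in range(num_partitions)]

    def append(self, topic, key, value):
        partitions = self.topics[topic]
        # Assumption: crc32 is used only to stay stdlib-only; real clients
        # ship their own partitioner.
        index = zlib.crc32(key.encode("utf-8")) % len(partitions)
        partitions[index].append((key, value))
        return index, len(partitions[index]) - 1  # (partition, offset)

    def read(self, topic, partition, offset):
        # Return every record in the partition from the given offset onward.
        return self.topics[topic][partition][offset:]


class Producer:
    """Sends keyed events to a topic on the broker."""

    def __init__(self, broker):
        self.broker = broker

    def send(self, topic, key, value):
        return self.broker.append(topic, key, value)


class Consumer:
    """Reads events from a topic and remembers its position per partition."""

    def __init__(self, broker, topic):
        self.broker = broker
        self.topic = topic
        self.offsets = defaultdict(int)  # partition index -> next offset to read

    def poll(self):
        records = []
        for index in range(len(self.broker.topics[self.topic])):
            batch = self.broker.read(self.topic, index, self.offsets[index])
            self.offsets[index] += len(batch)
            records.extend(batch)
        return records


broker = Broker()
broker.create_topic("orders", num_partitions=3)
producer = Producer(broker)
placements = [
    producer.send("orders", "customer-1", "created"),
    producer.send("orders", "customer-2", "created"),
    producer.send("orders", "customer-1", "shipped"),
]
consumer = Consumer(broker, "orders")
first_poll = consumer.poll()   # all three events
second_poll = consumer.poll()  # nothing new since the last poll
```

Two properties of the sketch mirror the slide: events with the same key land in the same partition, so their relative order is preserved, and the consumer tracks an offset per partition so a second poll returns only new events.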
Kafka
How it fits in with watsonx.data
[Diagram labels: Event Streaming, Producers, Consumers, watsonx.data, Data Replication, DataStage, StreamSets]

dbt (Data Build Tool)
ELT
dbt is an open-source command-line tool that enables data analysts and engineers to transform data in their warehouse more effectively.

Core Concepts:
- SQL-based transformations: Write modular SQL queries to define models (tables/views).
- Version control: Integrates with Git for collaborative development.
- Abstraction: Logic is independent of source system syntax. Object names can be updated in one place.
- Testing & documentation: Built-in tools for testing data quality and generating docs.

-- This model pivots order statuses into columns using Jinja
{% set statuses = ['completed', 'shipped',