
7: Ecosystem.

Frederic G. MARAND, 3 years ago
commit 04ce809da3
2 changed files with 66 additions and 1 deletion
  1. docs/07 Ecosystem.md  (+65 -0)
  2. pom.xml  (+1 -1)

+ 65 - 0
docs/07 Ecosystem.md

@@ -0,0 +1,65 @@
+# 7 The Kafka ecosystem and its future
+
+## Use cases and challenges
+
+- Primary use cases
+    - Connecting disparate sources of data
+    - Given the client API, it is possible to write data connectors and sinks for any data source
+    - Data supply chain pipelines, replacing old ETL environments.
+    - More generally, "Big data" integration (Hadoop, Spark)
+- Remaining challenges
+    - Governing data evolution
+    - Intrinsic data (in)consistency
+    - Handling both big data (the focus of the previous 5 years, as of 2016) and fast data (the focus of the next 5 years)
+
+## Governance and evolution
+
+- Each producer defines its message contract via the key.serializer / value.serializer properties
+  (a producer configuration sketch follows this section)
+- The built-in serializers are not enough for most cases, hence the need for custom serializers
+- These contracts have versions, which all coexist
+- Consumers need to be aware of these versions in order to deserialize the data
+- One Kafka limitation is the lack of a registry of message formats and their versions
+- Confluent's answer: the Kafka Schema Registry
+    - Almost universal format: Apache Avro, which is self-describing
+    - First-class Avro (de)serializers
+    - Schema registration and version management in the cluster
+    - RESTful service for schema discovery
+    - Acts as a compatibility broker between schema versions
+
+Alternative serialization formats: Protobuf, Apache Thrift, MessagePack
+
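+A minimal sketch of such a producer contract, assuming Confluent's KafkaAvroSerializer
+is on the classpath and a Schema Registry runs at http://localhost:8081 (the addresses,
+topic name, and schema below are illustrative assumptions, not part of these notes):
+
+```java
+import java.util.Properties;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.kafka.clients.producer.KafkaProducer;
+import org.apache.kafka.clients.producer.ProducerRecord;
+
+public class AvroProducerSketch {
+    public static void main(String[] args) {
+        Properties props = new Properties();
+        props.put("bootstrap.servers", "localhost:9092");          // assumed broker address
+        // The key/value serializer properties define the message contract.
+        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
+        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
+        // The Avro serializer registers and looks up schemas in the Schema Registry.
+        props.put("schema.registry.url", "http://localhost:8081"); // assumed registry address
+
+        // A self-describing Avro schema; several versions of this contract can coexist.
+        Schema schema = new Schema.Parser().parse(
+            "{\"type\":\"record\",\"name\":\"PageView\","
+            + "\"fields\":[{\"name\":\"url\",\"type\":\"string\"}]}");
+        GenericRecord record = new GenericData.Record(schema);
+        record.put("url", "/index.html");
+
+        try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
+            producer.send(new ProducerRecord<>("page-views", "user-42", record));
+        }
+    }
+}
+```
+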
+## Consistency and productivity
+
+- Lots of duplicated effort goes into writing producers and consumers that are all mostly the same
+- Lack of a common framework for integrating the sources and targets, although
+  they are not _that_ numerous for each category:
+    - producers: file systems, NoSQL stores, RDBMS, ...
+    - consumers: search engines, HDFS, RDBMS, ...
+- Confluent after Kafka 0.10 => Kafka Connect and Connector Hub
+    - Common framework for integration
+    - Makes writing consumers and producers easier and more consistent
+    - Platform connectors:
+        - Oracle, HP, ...
+        - 50+ and growing
+    - Connector Hub: a catalog open to anyone providing such an integration
+      (a sample connector configuration follows this list)
+    - Should make K integration faster and cheaper
+
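+As an illustration, a minimal standalone source-connector configuration using the
+FileStreamSource connector that ships with Kafka; the file path and topic name are
+made-up placeholders:
+
+```properties
+# Connector name and implementation class.
+name=local-file-source
+connector.class=org.apache.kafka.connect.file.FileStreamSourceConnector
+tasks.max=1
+# Hypothetical input file and destination topic.
+file=/tmp/access.log
+topic=connect-file-input
+```
+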
+## Fast data
+
+- Needs: real-time processing, predictive analytics, machine learning
+- Apache Platforms include: Apache Storm, Apache Spark, Apache Cassandra, Apache Hadoop, Apache Flink
+- Problem: each of these includes its own cluster management, multiplying the
+  operational cost
+- With Kafka in the middle, that still means lots of producers and consumers to build
+  and keep running at scale
+- Confluent after Kafka 0.10 => Kafka Streams
+    - Leverages K machinery instead of writing all these integrations
+    - Single infrastructure solution
+        - At least for streaming-based processing
+    - Embeddable within existing applications
+    - Java library, used just like KafkaConsumer and KafkaProducer
+      (see the minimal topology sketch below)
+
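+A minimal Kafka Streams sketch using the current StreamsBuilder API; the application
+id, broker address, and topic names are illustrative assumptions:
+
+```java
+import java.util.Properties;
+
+import org.apache.kafka.common.serialization.Serdes;
+import org.apache.kafka.streams.KafkaStreams;
+import org.apache.kafka.streams.StreamsBuilder;
+import org.apache.kafka.streams.StreamsConfig;
+import org.apache.kafka.streams.kstream.KStream;
+
+public class StreamsSketch {
+    public static void main(String[] args) {
+        Properties props = new Properties();
+        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "page-view-filter");   // hypothetical app id
+        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed broker
+        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
+        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
+
+        // Topology: read a topic, keep non-empty values, write them to another topic.
+        StreamsBuilder builder = new StreamsBuilder();
+        KStream<String, String> views = builder.stream("page-views");
+        views.filter((key, value) -> value != null && !value.isEmpty())
+             .to("page-views-clean");
+
+        // Runs embedded in the application process: no separate processing cluster to operate.
+        KafkaStreams streams = new KafkaStreams(builder.build(), props);
+        streams.start();
+        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
+    }
+}
+```
+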
+## Ecosystem
+
+- Scale-ups: LinkedIn, Netflix, Twitter, Uber
+- Publisher: Confluent

+ 1 - 1
pom.xml

@@ -6,7 +6,7 @@
 
     <groupId>fr.osinet.ps.kafka</groupId>
     <artifactId>samples</artifactId>
-    <version>1.0-SNAPSHOT</version>
+    <version>0.1-SNAPSHOT</version>
 
     <properties>
         <maven.compiler.source>8</maven.compiler.source>