Flow Architectures: The Future of Streaming and Event-Driven Integration
How event-driven integration and real-time data streaming will transform business and create the next wave of platform economies.
Key Insights
- The World Wide Flow (WWF). “Like HTTP created the World Wide Web and linked the world’s information, what I call ‘flow’ will create the World Wide Flow and link the world’s activity.” The bet is that standardized event streaming becomes the connective tissue of business the same way HTTP became the connective tissue of information — and the platform opportunities that follow are of the same order of magnitude.
- Flow defined. “Flow is networked software integration that is event-driven, loosely coupled, and highly adaptable and extensible.” The key mechanics: consumers self-service subscribe to producer streams; once connected, data is pushed automatically; producers retain control over what goes to whom. The critical shift from REST/webhook patterns is that the consumer need not be known ahead of time — unlocking integrations that couldn’t be negotiated in advance.
- Value lives in the interaction, not the pipe. “flow is just the movement of data. Value is created by interacting with that flow.” When designing a flow-based product, the infrastructure (Kafka, Kinesis, etc.) is commodity; the differentiation is what you do with the stream — transform, aggregate, act on it in near-real time.
- The Jevons Paradox applied to streaming. When common interfaces lower the cost of integration, total demand for streaming goes up, not down — the same dynamic that made coal consumption rise when steam engines got more efficient. “Lower the cost of integration, and people will find new uses for streaming that will boost the overall demand for streaming technologies.” The standardization fight (CloudEvents, discovery APIs) is where the next platform market gets created.
- Five requirements for fluent architectures (the book’s structural skeleton): Security (producers control access), Agility (both sides can adapt), Timeliness (data arrives in a relevant time frame), Memory (streams can be replayed), Manageability (observability and controllability). A flow architecture missing any one of these fails as a business-grade system, so the list doubles as a procurement or design checklist.
- Data provenance as the billion-dollar gap. “the next billion-dollar startups in the enterprise software market will be the company that solves maintaining data provenance in a high-volume, rapidly changing environment like the WWF.” In any regulated industry or mission-critical pipeline, knowing the lineage of every data point in the stream matters as much as the data itself.
Kindle Highlights
Like HTTP created the World Wide Web and linked the world’s information, what I call “flow” will create the World Wide Flow and link the world’s activity. — location: 73 ^ref-45846
If flow was in place before this pandemic started, we might have seen a different response. Once contact tracing data was captured, it could have easily been shared with anyone who was authorized to use it. Real-time data from different brands of smartphones and other disparate sources could have been more easily combined to create a holistic view of a person’s contact risks. The up-to-date inventory data of every supplier of medical equipment could quickly have been combined into a single view of national or even global supply. As new providers of masks, face shields, and other critical equipment started producing goods, they could have added their data to that inventory without having to develop code to manage the connections, data packaging, and flow control needed to do so. — location: 344 ^ref-63156
Flow is networked software integration that is event-driven, loosely coupled, and highly adaptable and extensible. — location: 354 ^ref-38393
This definition leaves a lot to be desired in terms of the mechanics of flow, so let’s define some key properties that differentiate flow from other integration options. In information theory — which studies the quantification, storage, and communication of information — the sender of a set of information is called the producer, and the receiver is known as the consumer (the terms source and receiver are also used, but I prefer producer and consumer). With this in mind, flow is the movement of information between disparate software applications and services characterized by the following:
- Consumers (or their agents) request streams from producers through self-service interfaces
- Producers (or their agents) choose which requests to accept or reject
- Once a connection is established, consumers do not need to actively request information — it is automatically pushed to them as it is available
- Producers (or their agents) maintain control of the transmission of relevant information — i.e., what information to transmit when, and to whom
- Information is transmitted and received over standard network protocols — including to-be-determined protocols specifically aligned with flow mechanics

— location: 358 ^ref-26470
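To make the mechanics concrete for myself, here is a minimal in-memory sketch of the first four properties: self-service requests, producer accept/reject, push delivery, and producer-side control. Everything in it (`Producer`, `request_stream`, `revoke`, the consumer IDs, the inventory event) is a hypothetical name I made up for illustration; it is not an API from the book or from any real flow protocol, and it skips the network layer entirely.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Event = dict
Handler = Callable[[Event], None]


@dataclass
class Producer:
    """Hypothetical in-memory producer; an illustration, not a real protocol."""
    name: str
    # Producer-side policy: decides which subscription requests to accept.
    accept: Callable[[str], bool]
    _subscribers: Dict[str, Handler] = field(default_factory=dict)

    def request_stream(self, consumer_id: str, handler: Handler) -> bool:
        """Self-service request from a consumer; the producer may reject it."""
        if not self.accept(consumer_id):
            return False
        self._subscribers[consumer_id] = handler
        return True

    def revoke(self, consumer_id: str) -> None:
        """The producer keeps control and can cut a consumer off later."""
        self._subscribers.pop(consumer_id, None)

    def publish(self, event: Event) -> None:
        """Push the event to every accepted consumer; nobody has to poll."""
        for handler in list(self._subscribers.values()):
            handler(event)


received: List[Event] = []
inventory = Producer("mask-inventory", accept=lambda cid: cid != "blocked-co")

ok = inventory.request_stream("hospital-a", received.append)
rejected = inventory.request_stream("blocked-co", received.append)

inventory.publish({"sku": "n95", "on_hand": 1200})  # delivered to hospital-a
inventory.revoke("hospital-a")
inventory.publish({"sku": "n95", "on_hand": 900})   # no longer delivered
```

The point of the sketch is that the producer never needs to know its consumers in advance: it only needs a policy for deciding on requests as they arrive.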
flow is just the movement of data. Value is created by interacting with that flow. — location: 384 ^ref-52760
Even where application programming interfaces (APIs) exist for streaming, which can enable action as soon as the signal is sent, those APIs are largely proprietary for each offering. Twitter, for example, has a widely used API for consuming their social media streams, but it is completely proprietary to them. There is no consistent and agreed-upon mechanism for exchanging signals for immediate action across companies or industries. — location: 425 ^ref-1157
flow architectures must be designed to be asynchronous, highly adaptable, and extensible. — location: 438 ^ref-40551
This global graph of flowing data, and the software systems where that data is analyzed, transformed, or otherwise processed, will create an activity network that will rival the extent and importance of the World Wide Web. This global graph of activity is what I call the World Wide Flow (WWF). The WWF promises not only to democratize the distribution of activity data, but to create a platform on which new products or services can be discovered through trial and error at low cost. — location: 478 ^ref-35893
The latter (where the producer calls the consumer) is an excellent pattern for integrations in which there is a triggering action at the producer (e.g., a user clicks a button to start the transaction) and the consumer service is known ahead of time. However, this requirement — that the consumer and their needs be known ahead of time — is what limits the use of this pattern in commercial integrations. It is expensive for the consumer to negotiate a connection with the producer, and then implement an API endpoint that meets their expectations. — location: 506 ^ref-36467
lowering the cost of stream processing, increasing the flexibility in composing data flows, and creating and utilizing a rich market ecosystem. Lower cost, increased flexibility, and increased choice are three natural attractors for any organization looking to “do more with less” while remaining adaptable to changing conditions. — location: 877 ^ref-38403
William Stanley Jevons, a nineteenth-century economist and logician, noticed that the introduction of a more efficient steam engine to English factories resulted in a large increase in the consumption of coal, the primary fuel for these engines. Even though each steam engine used less coal to produce the same power as older technology, the increased efficiency made coal a cheaper power source. This made steam power more attractive to new use cases, which resulted in more steam engines consuming more coal overall. So, what’s the “steam engine” when it comes to flow? What is the technology that can improve in such a way that it fundamentally changes the economics of integration? Unsurprisingly, I suppose, I argue that it is stream processing. Also, I think the innovation that will change the economics of data and event streaming will be the common interfaces and protocols that enable flow. Lower the cost of integration, and people will find new uses for streaming that will boost the overall demand for streaming technologies. The Jevons paradox at work. — location: 903 ^ref-1993
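A quick arithmetic check of when the paradox actually holds. This is my own toy model, not something from the book: I assume demand for power follows a constant-elasticity curve, and the constants (`k`, `coal_price`, the elasticity values) are arbitrary. Under that assumption, total coal use rises with efficiency only when demand is elastic (elasticity above 1), which is the implicit bet behind the streaming analogy.

```python
def coal_use(efficiency: float, elasticity: float,
             k: float = 100.0, coal_price: float = 1.0) -> float:
    """Total coal burned under an assumed constant-elasticity demand curve."""
    # Better engines make each unit of power cheaper.
    cost_per_unit_power = coal_price / efficiency
    # Assumed demand curve: power demanded = k * cost^(-elasticity).
    power_demanded = k * cost_per_unit_power ** (-elasticity)
    # Each unit of power needs 1/efficiency units of coal.
    return power_demanded / efficiency


# Elastic demand: doubling efficiency increases total coal use.
elastic_before = coal_use(efficiency=1.0, elasticity=1.5)
elastic_after = coal_use(efficiency=2.0, elasticity=1.5)

# Inelastic demand: the same efficiency gain reduces total coal use.
inelastic_before = coal_use(efficiency=1.0, elasticity=0.5)
inelastic_after = coal_use(efficiency=2.0, elasticity=0.5)
```

Read "efficiency" as cheaper integration and "coal" as streaming infrastructure: the prediction depends on there being a lot of latent demand for integrations that are currently too expensive to build.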
You can easily imagine the value of health-care data streams for building services around patient data. — location: 1079 ^ref-53823
five requirements of fluent architectures and the WWF needed to be useful to businesses and other institutions: security, agility, timeliness, memory, and manageability:
- Security: Producers (or their agents) must maintain control over who can access their streams, and data must be protected as it is shared with authorized consumers.
- Agility: Flow must allow both consumers and producers to adapt and experiment in a changing environment, and for the system itself to rapidly adapt in response.
- Timeliness: Data must arrive in a time frame that is relevant to the context for which it is being applied.
- Memory: Where relevant, flow must accommodate the need to replay streams to create long-term and short-term memory in the system.
- Manageability: It must be possible for both producers and consumers to understand the behavior of the system from their perspective (observability), and to take

— location: 1343 ^ref-12025
Enabling near real-time integration between disparate organizations through event-driven architectures. — location: 1463 ^ref-61692
components of the protocol that I think are worth calling out separately are the metadata format and the payload format; I think these are likely to be distinct specifications that are the key building blocks in defining an overall protocol for flow. — location: 1525 ^ref-44734
- Metadata format: Protocols for describing metadata that can be used by flow libraries to understand payload formatting, encrypt/decrypt payloads, understand the payload origin, and so on.
- Payload data format: The protocol for understanding the specific data payloads sent by the producer to the consumer. These formats will likely vary significantly from use case to use case, but standard payload formats may be defined for common streams in a given

— location: 1529 ^ref-62482
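A small sketch of the metadata/payload split, using the CloudEvents 1.0 context attribute names (`specversion`, `id`, `source`, `type`, plus the optional `time` and `datacontenttype`) as the metadata layer. Treating CloudEvents as "the" metadata format for flow is my assumption based on the CloudEvents highlights in these notes; the `source` URI, event type, and payload fields below are invented, and `make_event` and `split_event` are hypothetical helpers, not part of any SDK.

```python
import json
import uuid
from datetime import datetime, timezone


def make_event(source: str, event_type: str, payload: dict) -> dict:
    """Wrap a use-case-specific payload in a CloudEvents-style envelope."""
    return {
        # Metadata: the same attribute set regardless of what the payload is.
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": source,
        "type": event_type,
        "time": datetime.now(timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        # Payload: the shape varies from use case to use case.
        "data": payload,
    }


def split_event(event: dict) -> tuple:
    """Separate the generic metadata from the use-case-specific payload."""
    metadata = {k: v for k, v in event.items() if k != "data"}
    return metadata, event.get("data")


event = make_event(
    source="/suppliers/acme/inventory",       # hypothetical source URI
    event_type="com.example.stock.updated",   # hypothetical event type
    payload={"sku": "n95", "on_hand": 1200},
)
wire = json.dumps(event)                       # what would travel on the network
metadata, payload = split_event(json.loads(wire))
```

A generic library can route, filter, or log on `metadata` alone without understanding `payload`, which is why the two are worth specifying separately.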
Discovery APIs, on the other hand, are largely nonexistent, except as components of proprietary product offerings (see Chapter 4). I am not aware of a product or service that has been widely adopted that has a discovery API. For this reason, I think discovery is largely in the Genesis phase, though its importance is becoming more broadly understood. As more developers start to address the discovery problem, I expect general use discovery APIs to evolve quickly. — location: 1738 ^ref-35500
formats defined by Apache Kafka, AWS, and others for their respective technologies. Though some standardization efforts are underway, such as CNCF CloudEvents, they are only now just starting to appear in products and services, and are far from ubiquitous. — location: 1747 ^ref-40762
you are interested in studying streaming architectures in more depth, I suggest Tyler Akidau, Slava Chernyak, and Reuven Lax’s Streaming Systems: The What, Where, When, and How of Large-Scale Data Processing (O’Reilly), Fabian Hueske and Vasiliki Kalavri’s Stream Processing with Apache Flink (O’Reilly), or Gerard Maas and Francois Garillot’s Stream Processing with Apache Spark (O’Reilly). — location: 1829 ^ref-1053
In event sourcing, the record of state for any given entity is represented entirely by the stream of state changes recorded in its topic. — location: 2111 ^ref-25784
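A minimal sketch of the idea, assuming a topic is just an ordered list of state-change events: current state is never stored, only derived by replaying the events. The event types (`stock_added`, `stock_removed`) and fields are made up for illustration.

```python
from typing import Dict, Iterable


def replay(events: Iterable[dict]) -> Dict[str, int]:
    """Rebuild inventory state purely from the recorded state changes."""
    state: Dict[str, int] = {}
    for event in events:
        sku = event["sku"]
        if event["type"] == "stock_added":
            state[sku] = state.get(sku, 0) + event["qty"]
        elif event["type"] == "stock_removed":
            state[sku] = state.get(sku, 0) - event["qty"]
    return state


# The topic is the record of state; it is append-only.
topic = [
    {"type": "stock_added", "sku": "n95", "qty": 1000},
    {"type": "stock_removed", "sku": "n95", "qty": 300},
    {"type": "stock_added", "sku": "n95", "qty": 500},
]

current = replay(topic)            # state after every event
as_of_second = replay(topic[:2])   # state as it was after two events
```

Replaying a prefix of the topic gives the state at any earlier point, which is the "memory" requirement in practice.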
An alternative to processing streams as queues or logs is to maintain running a model of system state and build your processing around that model. — location: 2122 ^ref-61867
In a stateful processing system, an understanding of the problem domain is used to create and maintain a stateful representation of the real world. In our traffic example, the model uses software agents — — location: 2128 ^ref-50220
Stateful stream processing platforms give you a digital representation of your real-world system that is constantly updated by the event stream used to define it. — location: 2133 ^ref-62553
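A rough sketch of the contrast with replaying a log: here a long-lived model is updated in place as each event arrives, and questions are asked of the model rather than of the stream. The highlight about the traffic example is cut off, so the agent design below (one `IntersectionAgent` per intersection, a congestion threshold of 3) is entirely my own assumption and not the book's design.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class IntersectionAgent:
    """Hypothetical software agent mirroring one real-world intersection."""
    vehicles_waiting: int = 0

    def apply(self, event: dict) -> None:
        """Update this agent's state from a single event."""
        if event["kind"] == "arrived":
            self.vehicles_waiting += 1
        elif event["kind"] == "departed":
            self.vehicles_waiting = max(0, self.vehicles_waiting - 1)

    @property
    def congested(self) -> bool:
        # Arbitrary threshold chosen for the example.
        return self.vehicles_waiting >= 3


# The running model: one agent per intersection, created on first sight.
model = defaultdict(IntersectionAgent)


def on_event(event: dict) -> None:
    """Route each incoming event to the agent that represents its entity."""
    model[event["intersection"]].apply(event)


events = [
    {"intersection": "5th-and-main", "kind": "arrived"},
    {"intersection": "5th-and-main", "kind": "arrived"},
    {"intersection": "oak-st", "kind": "arrived"},
    {"intersection": "5th-and-main", "kind": "arrived"},
    {"intersection": "5th-and-main", "kind": "departed"},
]

for event in events[:4]:
    on_event(event)
congested_before = model["5th-and-main"].congested

on_event(events[4])
congested_after = model["5th-and-main"].congested
```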
we worked to understand our purpose: to create a simple way to integrate applications across organization boundaries using events and standard interfaces and protocols. — location: 2222 ^ref-30453
CNCF CloudEvents team is the most committed to solving this problem that I have found to date. — location: 2553 ^ref-6092
the next billion-dollar startups in the enterprise software market will be the company that solves maintaining data provenance in a high-volume, rapidly changing environment like the WWF. Knowing the data consumed in mission-critical (or even life-critical) applications is accurate is just too important. — location: 2610 ^ref-3382
an agreed-upon, standard interface for connecting to uniquely identified streams does not yet exist. The next five years or so will see this problem get more and more attention, so expect a plethora of options to be announced in that time frame. Using the examples of cloud computing and container management as a guide, it shouldn’t take more than another five to seven years after that for the market to declare a winner. — location: 2676 ^ref-40996
The central value proposition of flow is really built around two things: lowering the cost of integrating via event streams, and using event streams to signal state changes in near-real time. — location: 2712 ^ref-8751
There are two traits that have to be in place for stream processing to be manageable. First, the operator (a producer or a consumer) has to be able to see what is going on in the system, a capability known today as observability. Second, the operator must be able to take action that affects the behavior of the system in predictable ways, which is known as controllability. — location: 2807 ^ref-15354
event stream history as a form of memory could be one of the biggest opportunities for some new forms of software enabled by flow. — location: 2922 ^ref-15647
Control of Intellectual Property: The last requirement is perhaps one of the most important if flow is to be at the heart of any fundamental change to the way business is done. Without the ability for flow to ensure that the owners of intellectual property (IP) can maintain control of that property, flow cannot be trusted. And, as I noted in “Security”, trust is always the first requirement of a business system of value. — location: 2928 ^ref-56717
CloudEvents, is a specification for describing events that shows great promise for flow. Defined as a common metadata model that can be mapped (or “bound”) to any number of connection or pubsub protocols, CloudEvents is simple and capable of carrying a wide variety of payload types. Version 1.0 of CloudEvents, released in October 2019, focuses entirely on data packaging, and I will describe that in the Protocols section. However, the committee has now turned its attention to two interfaces that will be key to simplifying flow consumption: subscriptions and discovery. The subscription API is most interesting for this section. Defined as a common API for publish-and-subscribe activities that does not interfere with existing mechanisms, such as MQTT, the CloudEvents Subscription API may be defining a key interface for flow’s success. — location: 4073 ^ref-63780