
Much of Hadoop's adoption is driven by organizations realizing they have the opportunity to measure every aspect of their operation (website hits, ad impressions, purchases, server logs, and so on), unify those data sources, and act on the patterns that Hadoop and other Big Data tools uncover.

For an e-commerce site, an advertising broker, or a hosting provider, there's no wonder inherent in being able to measure every customer interaction, no controversy that it's enormously valuable to uncover patterns in those interactions, and no lack of tools to act on those patterns in real time.

A political campaign can use the clickstream of interactions with each email it sends to understand and tune its message; this was one of the cardinal advantages cited in the success of Barack Obama's 2012 campaign.

This chapter's techniques will help, say, a hospital process the stream of data from every ICU patient; a retailer process the entire purchase-decision process; or a political campaign understand and tune the response to each email batch and advertising placement.

Hadoop cut its teeth at Yahoo, where it was primarily used for processing internet-sized web crawls (see the next chapter on text processing) and massive volumes of server logs.

Quite likely, server log processing either a) is the reason you got this book or b) seems utterly boring. If your livelihood depends on web traffic, mining its logs may be exactly why you picked up this book; if it doesn't, logs can look like mere plumbing. For the latter folks, stick with it: hidden in this chapter are basic problems of statistics (finding the histogram of pageviews), text processing (regular expressions for parsing log lines), and graphs (constructing the tree of paths that users take through a website).
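
To make the statistics and text-processing pieces concrete, here is a minimal sketch, in plain Python rather than the chapter's Hadoop toolchain, of parsing web server log lines with a regular expression and tallying a histogram of pageviews. The log format (Apache combined), field names, and sample lines are illustrative assumptions, not the chapter's actual dataset.

[source,python]
----
# Illustrative sketch only (not the book's code): parse hypothetical web server
# log lines in Apache "combined" format with a regular expression, then tally
# a histogram of pageviews per path.
import re
from collections import Counter

# One named capture group per field we care about: client IP, timestamp,
# request method, request path, and HTTP status code.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) \S+'
)

# Hypothetical sample lines standing in for a real log file.
sample_lines = [
    '10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.2 - - [10/Oct/2023:13:55:40 +0000] "GET /products HTTP/1.1" 200 1045',
    '10.0.0.1 - - [10/Oct/2023:13:56:02 +0000] "GET /products HTTP/1.1" 200 1045',
]

pageviews = Counter()
for line in sample_lines:
    match = LOG_PATTERN.match(line)
    if match:                     # skip malformed lines rather than crash
        pageviews[match.group('path')] += 1

# Histogram of pageviews per path -- the "basic statistics" problem above.
for path, count in pageviews.most_common():
    print(f'{count:6d}  {path}')
----

In a Hadoop job the same regular expression would run in the map phase and the counting in the reduce phase; the sketch just shows the core logic on a handful of lines.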