Simple Scala examples for using Apache Flink's DataStream and SQL API for the same problem, that is simple enough not to involve other libraries.
The example approximates Pi with the Monte Carlo method using Flink's
DataStream API
: ds.PiEstimatorSQL API (Blink)
: sql.PiEstimatorDataStream and SQL API
: mixed.PiEstimator
All three examples have the same functionality, by iteratively
approaching better and better estimate for the value of Pi.
All write the actual best estimate in the given intervals (3 sec)
till an explicit user interruption.
As a continuous DataSource, (random/sequnetial) IDs are generated either by IdGenerator or by DataGen SQL Connector.
The next step is the creation is random point in a 2x2 box , centered in the origin, for each generated ID.
The ratio of withinCircle
points in the sample estimates Pi/4.
Rationale : The area of the 2x2 box is 4, the area of the 1 unit radius circle inside is 1*1*Pi = Pi
by definition. As a consequence, the ratio of these two areas are :
Pi/4 = P(isWithinCircle)/1
=> Pi = 4 * P(isWithinCircle)
The data from the millions of trials (resulting 4.0's and 0.0's) are aggregated in two phases. The first
phase is on each thread (the bulk of the workload), while the second is a global aggregation.
The output is collected in Print SQL Sink or Flink's DataStreamSink.