
Add Policy Recommendation Spark job and image #16

Merged · 2 commits · Jun 3, 2022

Conversation

@dreamtalen (Contributor) commented on May 5, 2022:

In this PR, we add the Policy Recommendation Spark job and image to the Theia repo.
The policy recommendation Spark job previously received several rounds of review on the Antrea repo in antrea-io/antrea#3064. The main changes compared with that closed PR are:

  1. Add a feature that removes auto-generated Pod labels so that recommended policies can be merged.
  2. Add the capability to recommend toService ANPs for Pod-to-Service flows.
  3. Make the corresponding changes to work with the Antctl CLI on the Flow Aggregator side.

This is only the first PR for the Policy Recommendation feature. I will create subsequent PRs, including documentation, the Antctl CLI, unit tests, and e2e tests, to complete this new feature.

@salv-orlando (Contributor) left a comment:
Partial review

@@ -0,0 +1,19 @@
FROM gcr.io/spark-operator/spark-py:v3.1.1
Contributor:
TODO for a future PR: consider whether this image can be moved to the antrea Docker Hub, as we did for the other third-party images used by Theia.

@dreamtalen (Contributor, Author) replied on May 22, 2022:

Got it, yes, we can tag this image and push it to our Docker Hub too.

"""Returns the model properties as a dict"""
result = {}

for attr, _ in six.iteritems(self.attribute_types):
Contributor:
For instance, here you can just use .items() because you don't have to worry about Python 2 compatibility.
If the image defaults to Python 2, you can change the /usr/bin/python symlink or explicitly launch the job with python3.

@dreamtalen (Contributor, Author) replied:
Thanks, changed.
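A minimal sketch of the suggested change, assuming the generated-model to_dict() style shown in the quoted snippet (the attribute_types mapping of attribute names to types is taken from that context):

def to_dict(self):
    """Returns the model properties as a dict"""
    result = {}
    # dict.items() is enough on Python 3; six.iteritems() was only needed
    # for Python 2 compatibility.
    for attr, _ in self.attribute_types.items():
        result[attr] = getattr(self, attr)
    return result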

print(help_message)
sys.exit(2)
for opt, arg in opts:
if opt in ("-h", "--help"):
Contributor:
I'd check for -h before entering the loop; a user may also specify other options, and we don't want to parse them if -h is specified.

@dreamtalen (Contributor, Author) replied:
Sounds good to me, thanks!
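A minimal sketch of handling --help before the option loop, assuming the getopt-style parsing visible in the quoted snippet; help_message is a placeholder here, and the -s/-e flags mirror the job's help text quoted further down:

import getopt
import sys

help_message = "usage: policy_recommendation_job.py [options]"  # placeholder text
start_time = None
end_time = None
opts, _ = getopt.getopt(sys.argv[1:], "hs:e:", ["help", "start_time=", "end_time="])
# Handle -h/--help first so no other options are parsed when help is requested.
if any(opt in ("-h", "--help") for opt, _ in opts):
    print(help_message)
    sys.exit(0)
for opt, arg in opts:
    if opt in ("-s", "--start_time"):
        start_time = arg
    elif opt in ("-e", "--end_time"):
        end_time = arg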

@dreamtalen force-pushed the policy-reco branch 2 times, most recently from d5b1586 to 5670828 on May 22, 2022 22:41.
# Select user trusted denied flows when unprotected equals False
sql_query += " WHERE trusted == 1"
if start_time:
    sql_query += " AND flowEndSeconds >= '{}'".format(start_time)
Contributor:
UX question: the condition above captures flows that were completed after the requested start time.
In the case of start_time, would it make sense to instead capture the flows that were already started at start_time?

@dreamtalen (Contributor, Author) replied:
Sounds good to me, changed it to check flowStartSeconds instead.
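A minimal sketch of the adjusted filter, reusing the column and variable names from the quoted snippet; the base query string and the end_time clause (matching the -e/--end_time option documented below) are illustrative:

sql_query = "SELECT * FROM flows"  # illustrative base query, not the job's exact SQL
start_time = "2022-05-01 00:00:00"
end_time = None

# Select user trusted denied flows when unprotected equals False
sql_query += " WHERE trusted == 1"
if start_time:
    # Filter on flowStartSeconds so the recommendation covers flows that started
    # within the requested window, not just flows that happened to end after it.
    sql_query += " AND flowStartSeconds >= '{}'".format(start_time)
if end_time:
    sql_query += " AND flowEndSeconds <= '{}'".format(end_time)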

)
else:
print("Warning: egress tuple {} has wrong format".format(egress))
return ""
Contributor:
should this be considered an error? If so, should we fail the job instead of returning an empty string?

@dreamtalen (Contributor, Author) replied:
Makes sense to me; I marked this as a fatal error and now stop the Spark job immediately.
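A minimal sketch of the fail-fast behaviour, assuming the egress value is a (peer, port) tuple; the function name and rule format are illustrative, not the PR's exact helper:

import logging
import sys

logger = logging.getLogger("policy_recommendation")

def generate_egress_rule(egress):
    if isinstance(egress, tuple) and len(egress) == 2:
        peer, port = egress
        return "{} -> {}".format(peer, port)
    # A malformed tuple indicates a bug in the upstream flow processing, so stop
    # the Spark job immediately instead of silently returning an empty rule.
    logger.error("Error: egress tuple {} has wrong format".format(egress))
    sys.exit(1)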

def recommend_antrea_policies(flows_df, option=1, deny_rules=True, to_services=True):
    ingress_rdd = flows_df.filter(flows_df.flowType != "pod_to_external")\
        .rdd.map(map_flow_to_ingress)\
        .reduceByKey(lambda a, b: (a[0]+PEER_DELIMITER+b[0], ""))
Contributor:
(Not to be addressed in this PR) You could consider using a NamedTuple for src and dest, so that instead of referring to item 0 and item 1 you can refer to them as "src" and "dest".

@dreamtalen (Contributor, Author) replied:
In PySpark, I think achieving a namedtuple-like data structure would require changing the current RDDs to the DataFrame type, which would involve a lot of changes in the computation code. Could we mark this as a TODO for now?

Contributor:
Sure, it can be a TODO; ignore it for this PR.

The namedtuple would actually just be some syntactic sugar: you access items in the tuple as if you were accessing attributes of an object.

I would not think this requires using a DataFrame, but if that's the case, it's surely not worth the effort.
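A minimal sketch of the syntactic sugar being discussed; the field names are illustrative and not the PR's actual record layout:

from collections import namedtuple

FlowPeer = namedtuple("FlowPeer", ["src", "dst"])

peer = FlowPeer(src="frontend-pod", dst="backend-svc")
# A namedtuple is still a plain tuple, so it can be used inside RDD map/reduceByKey
# lambdas, but fields read as peer.src / peer.dst instead of peer[0] / peer[1].
print(peer.src, peer.dst)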

.option("password", os.getenv("CH_PASSWORD")) \
.option("dbtable", table_name) \
.save()
return recommendation_id
Contributor:
@yanjunz97 do you think we need to update the ClickHouse monitor to periodically clean up recommendation results as well? Do you think we might need to define an expiration time for results, as it is perhaps not OK to bluntly delete recommendation results when memory exceeds the threshold?

In any case, I am OK with not supporting periodic collection of old recommendation results in Theia's first release.

Contributor (@yanjunz97) replied:
As we do not expect the recommendation results to occupy too much space, I think an expiration time might be more reasonable compared to cleanup by the monitor.

But I'm not sure what expiration time should be chosen; a recommended policy might remain useful for a long time. Maybe it is more reasonable to delete results only when users trigger a deletion task from the UI?

Contributor:
Thanks @yanjunz97, that's valuable feedback. Nothing we need to address here, but we will surely need a mechanism to handle the lifecycle of policy recommendation results.

@salv-orlando (Contributor) commented:
It might also make sense to use the Python logging library instead of printing to stdout.
It should be fairly easy to introduce logging in this job.

@dreamtalen (Contributor, Author) replied:
It might also make sense to use the Python logging library instead of printing to stdout. It should be fairly easy to introduce logging in this job.

Sure, I added code to use the Spark logger to replace the print statements.

Signed-off-by: Yongming Ding <dyongming@vmware.com>
@dreamtalen force-pushed the policy-reco branch 2 times, most recently from ff8ef3e to 98ef479 on June 1, 2022 00:21.
]

spark = SparkSession.builder.getOrCreate()
spark.sparkContext.setLogLevel("info")
Contributor:
Should we make the log level configurable? It may help for live debugging.

Contributor:
Let's see what @dreamtalen reckons. From what I gather, this log will only emit what we are logging in this job, and, obviously, we are not logging anything at debug level.

@dreamtalen (Contributor, Author) replied:
Thanks Salvatore for helping me answer this question.
Yes, I hard-coded the log level to info because I only added logs at the info, warning, and error levels. I also tried changing it to "debug" and saw lots of debug logs automatically generated by Spark.
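A hypothetical sketch of making the level configurable, as suggested above; the -l/--log_level flag is an assumption for illustration and is not part of this PR:

import getopt
import sys
from pyspark.sql import SparkSession

log_level = "INFO"  # default, matching the current hard-coded behaviour
opts, _ = getopt.getopt(sys.argv[1:], "l:", ["log_level="])
for opt, arg in opts:
    if opt in ("-l", "--log_level"):
        log_level = arg.upper()  # hypothetical flag, e.g. --log_level=debug

spark = SparkSession.builder.getOrCreate()
spark.sparkContext.setLogLevel(log_level)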

Comment on lines +752 to +761
-s, --start_time=None: The start time of the flow records considered for the policy recommendation.
Format is YYYY-MM-DD hh:mm:ss in UTC timezone. Default value is None, which means no limit of the start time of flow records.
-e, --end_time=None: The end time of the flow records considered for the policy recommendation.
Format is YYYY-MM-DD hh:mm:ss in UTC timezone. Default value is None, which means no limit of the end time of flow records.
Contributor:
Since we may not have suitable flow records in the DB, maybe we need a warning message to indicate this case?
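A minimal sketch of such a warning, assuming the flows_df DataFrame and a logger as used elsewhere in the job; the message text is illustrative:

def warn_if_no_flows(flows_df, logger):
    # Warn when no flow records match the requested start/end time window,
    # since the job would otherwise silently recommend nothing.
    if flows_df.rdd.isEmpty():
        logger.warning("No flow records found in the database for the requested "
                       "time range; no policies will be recommended.")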

@salv-orlando (Contributor) left a comment:
Code looks pretty good to me.
There are a few pending questions from Ziyou; waiting on those before approving.

@dreamtalen (Contributor, Author) commented on Jun 1, 2022:

It might also make sense to use the Python logging library instead of printing to stdout. It should be fairly easy to introduce logging in this job.

Sure, I added code to use the Spark logger to replace the print statements.

Update regarding logs: I found that the Spark logger only works on the driver; mapped functions running on the executors hit an error: "SparkContext can only be created/accessed/used on the driver." The Python logging library doesn't work either (no logs are emitted inside mapped functions). I'm trying to find another approach for logging.
Ref: https://stackoverflow.com/questions/36022988/in-pyspark-how-can-i-log-to-log4j-from-inside-a-transformation
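A minimal sketch of the driver-side logging that does work, assuming Spark's bundled log4j; the logger name is illustrative. The limitation described above is exactly that this logger, reached through the SparkContext, cannot be used inside functions shipped to the executors:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Obtain Spark's log4j logger through the Py4J gateway; this only works in the
# driver process, where the SparkContext lives.
log4j = spark.sparkContext._jvm.org.apache.log4j
logger = log4j.LogManager.getLogger("PolicyRecommendationJob")
logger.info("Starting policy recommendation job")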

@dreamtalen force-pushed the policy-reco branch 2 times, most recently from 63fbf9e to 4b55b90 on June 2, 2022 00:15.
svc_acnp_list = svc_acnp_rdd.collect()
if deny_rules:
    if option == 1:
        # Recommend deny ANPs for the applied to groups of allow policies
Contributor:
Nit: appliedTo

@dreamtalen (Contributor, Author) replied:
Thanks Jianjun, addressed.

logger.error("Error: option {} is not valid".format(option))
return []
if option == 3:
# Recommend k8s native network policies for unprotected flows
Contributor:
Nit: NetworkPolicies

Contributor:
We capitalize the first letter of K8s resource and CRD kinds.

@dreamtalen (Contributor, Author) replied:
Addressed.

@salv-orlando (Contributor) left a comment:
I think the code LGTM. I hope the issues with logging have been sorted out.

    return flow_df

def write_recommendation_result(spark, result, recommendation_type, db_jdbc_address, table_name, id):
    if not id:
Contributor:
Nit (perhaps to be addressed in a future PR): id is a Python built-in. In this case it will be interpreted correctly, but using it is a risk from a maintainability perspective (e.g. if we rename the parameter to something else, no error is thrown when running the code, but "if not id" will then always be false!).

@dreamtalen (Contributor, Author) replied:
Thanks Salvatore, that's a fair concern. I renamed this parameter to recommendation_id_input instead.
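A sketch of the renamed signature; the other parameters follow the quoted snippet, and the uuid fallback for a missing id is an assumption used for illustration:

import uuid

def write_recommendation_result(spark, result, recommendation_type,
                                db_jdbc_address, table_name, recommendation_id_input):
    # The parameter no longer shadows the built-in id(), so a later rename cannot
    # silently turn "if not id" into a test of the built-in function.
    if not recommendation_id_input:
        recommendation_id = str(uuid.uuid4())  # assumed fallback when no id is supplied
    else:
        recommendation_id = recommendation_id_input
    return recommendation_id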

Signed-off-by: Yongming Ding <dyongming@vmware.com>
@dreamtalen dreamtalen merged commit 75f6968 into antrea-io:main Jun 3, 2022
Labels: none yet
Projects: none yet
8 participants