Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[sonic-xcvrd] add new daemon sonic-xcvrd to platform monitor #17

Merged
merged 9 commits into from
Jul 30, 2018
Merged

[sonic-xcvrd] add new daemon sonic-xcvrd to platform monitor #17

merged 9 commits into from
Jul 30, 2018

Conversation

keboliu
Copy link
Collaborator

@keboliu keboliu commented Jul 3, 2018

[sonic-xcvrd] add new daemon sonic-xcvrd to fetch Transceiver informations to DB

* This new daemon will periodcally read DOM information from eeprom and post to DB
* This new daemon will listen to the SFP plug in/out event and update the DB accordingly

* file change list

    new file:   scripts/xcvrd
    new file:   setup.py

Design doc: https://github.com/Azure/SONiC/blob/gh-pages/doc/OIDsforSensorandTransciver.MD

…tions to DB

* This new daemon will periodcally read DOM information from eeprom and post to DB
* This new daemon will listen to the SFP plug in/out event and update the DB accordingly

* file change list

	new file:   scripts/xcvrd
	new file:   setup.py

 Signed-off-by Liu Kebo kebol@mellanox.com
PHYSICAL_PORT_NOT_EXIST = -1
SFP_EEPROM_NOT_READY = -2

TEMPE_UNIT = 'C'
Copy link
Contributor

@jleveque jleveque Jul 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest rename to TEMP_UNIT as "temp" is a common abbreviation for "temperature." #Resolved

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

REDIS_HOSTNAME = "localhost"
REDIS_PORT = 6379
REDIS_TIMEOUT_USECS = 0
SELECT_TIMEOUT = 1000
Copy link
Contributor

@qiluo-msft qiluo-msft Jul 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SELECT_TIMEOUT [](start = 0, length = 14)

SELECT_TIMEOUT_USECS? #Closed

Copy link
Contributor

@jleveque jleveque Jul 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 - add units to name. #Resolved

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revised.


VERSION = '1.0'

SYSLOG_IDENTIFIER = "xcvrd"
Copy link
Contributor

@qiluo-msft qiluo-msft Jul 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xcvrd [](start = 21, length = 5)

get the script basename programatically? #Closed

Copy link
Contributor

@jleveque jleveque Jul 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. This is copied from other daemons I've written, and I've also been meaning to go back over them and obtain the basename programmatically.

SYSLOG_IDENTIFIER = os.path.basename(__file__) should do the trick. #Resolved

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revised.

VERSION = '1.0'

SYSLOG_IDENTIFIER = "xcvrd"
XCVRD_MODULE_NAME = "xcvrd"
Copy link
Contributor

@qiluo-msft qiluo-msft Jul 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

XCVRD_MODULE_NAME [](start = 0, length = 17)

not used. #Closed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed.

REDIS_TIMEOUT_USECS = 0
SELECT_TIMEOUT = 1000

DOM_INFO_UPDATE_PERIOD = 60
Copy link
Contributor

@qiluo-msft qiluo-msft Jul 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DOM_INFO_UPDATE_PERIOD [](start = 0, length = 22)

in seconds? #Closed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, and revised


DOM_INFO_UPDATE_PERIOD = 60

SFP_STATUS_PLUGIN = '1'
Copy link
Contributor

@qiluo-msft qiluo-msft Jul 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SFP_STATUS_PLUGIN [](start = 0, length = 17)

plugin->inserted
plugout->removed #Closed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest rename, plug* are not good names.


In reply to: 200794543 [](ancestors = 200794543)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

log_info("Caught SIGTERM - exiting...")
sys.exit(128 + sig)
else:
log_warning("Caught unhandled signal '" + sig + "'")
Copy link
Contributor

@qiluo-msft qiluo-msft Jul 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and return? #Closed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revised.

@@ -0,0 +1,430 @@
#!/usr/bin/env python
Copy link
Contributor

@qiluo-msft qiluo-msft Jul 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

python [](start = 15, length = 6)

python2 #Closed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revised

stdout = proc.communicate()[0]
proc.wait()
hwsku = stdout.rstrip('\n')
except OSError, e:
Copy link
Contributor

@qiluo-msft qiluo-msft Jul 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

except [](start = 4, length = 6)

try-except added no value except a message which may be not accurate. Suggest remove.

Update: actually you swallow the inner exception, which make it worse. #Closed

(platform, hwsku) = get_platform_and_hwsku()

# Load platform module from source
#platform_path = "/".join([PLATFORM_ROOT_PATH, platform])
Copy link
Contributor

@qiluo-msft qiluo-msft Jul 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#platform_path = "/".join([PLATFORM_ROOT_PATH, platform]) [](start = 4, length = 57)

remove all commented code. #Closed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revised.

continue

(key, op, fvp) = sst.pop()
if key in ["PortConfigDone", "PortInitDone"]:
Copy link
Contributor

@qiluo-msft qiluo-msft Jul 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PortConfigDone", "PortInitDone" [](start = 20, length = 31)

Will it be a problem if port configured down long before you subscribe? #Closed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. I have tested, even Xcvrd started after port config already done, still can get this after subscribe.


#Spawn the task to handle the SFP plug in/out event
sfp_change_event_monitoring_thread = MonitoringThread(int_tbl, dom_tbl)
sfp_change_event_monitoring_thread.start()
Copy link
Contributor

@qiluo-msft qiluo-msft Jul 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

start [](start = 39, length = 5)

after starting timer and thread, the main thread is doing nothing. maybe you can use one less thread. #Closed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved the thread to the main task as the main loop.

def monitor_sfp_change_task(int_tbl, dom_tbl):
log_info('Xcvrd sub thread started....')
while True:
status, port_dict = platform_sfputil.get_transceiver_change_event()
Copy link
Contributor

@qiluo-msft qiluo-msft Jul 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

status [](start = 8, length = 6)

if some error happen, the while-loop will burn CPU. #Closed

post_port_dom_info_to_db(logical_port_name, table)

global timer
timer = threading.Timer(DOM_INFO_UPDATE_PERIOD, periodically_post_dom_info_to_db, [table])
Copy link
Contributor

@qiluo-msft qiluo-msft Jul 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

periodically_post_dom_info_to_db [](start = 52, length = 32)

This is weird recursion. Lots of timers will be created quickly 1->2->4->... ? #Closed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, if using a local variable 'timer' it will be the case as you said, here I am using a global 'timer' variable, so no this issue, timer triggered as expected.

Copy link

@hui-ma hui-ma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add Txpower into database and SNMP? If you look at 8436.py and 8472.py the Sonic github (https://github.com/Azure/sonic-platform-common/blob/master/sonic_sfp), there both are a function for reading TX power calc_tx_poweron. Could you use it to get the TX power?

Copy link
Contributor

@qiluo-msft qiluo-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated and there are still open comments.


# Start main loop to listen to the SFP change event.
log_info("Start main loop")
while True:
Copy link
Contributor

@qiluo-msft qiluo-msft Jul 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while True: [](start = 4, length = 11)

if some error happen immediately in get_transceiver_change_event(), the while-loop will burn CPU #Closed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revised.

log_error("Error: thread_exit_flag is set and timer thread exit")
thread_exit_flag_lock.release()
return
thread_exit_flag_lock.release()
Copy link
Contributor

@qiluo-msft qiluo-msft Jul 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may use threading.Event to reduce code. #Closed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revised.

for logical_port_name in logical_port_list:
post_port_dom_info_to_db(logical_port_name, table)

global timer
Copy link
Contributor

@qiluo-msft qiluo-msft Jul 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

global [](start = 4, length = 6)

suggest define a class and move global var used by the thread inside the class. #Closed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revised.

# TODO, SFP return error code, need handle accordingly.
continue
else:
# If get_transceiver_change_event() return error, will clean up the DB and then exit
Copy link
Contributor

@qiluo-msft qiluo-msft Jul 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may just simply break the while loop, and do the clean up with less indentation. #Closed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revised

for logical_port_name in logical_port_list:
del_port_sfp_dom_info_to_db(logical_port_name, int_tbl, dom_tbl)
log_error("Error: got error from get_transceiver_change_event, exit")
return
Copy link
Contributor

@qiluo-msft qiluo-msft Jul 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return [](start = 12, length = 6)

return an non zero code to indicate daemon abnormal exit. #Closed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revised


# Start the dom sensor info update timer thread
dom_update_threade_event = threading.Event()
dom_update_thread_timer = None
Copy link
Contributor

@qiluo-msft qiluo-msft Jul 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

event and timer objects could be hidden in the update class. #Closed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revised.

class dom_info_update_task:
def __init__(self, timer, event, table):
self.task_running_event = event
self.task_running_event.set()
Copy link
Contributor

@qiluo-msft qiluo-msft Jul 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

task_running_event [](start = 13, length = 18)

How about a event for stopping. It is more intuitive and readable. #Closed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed.

class dom_info_update_task:
def __init__(self, table):
self.task_stopping_event = threading.Event()
self.task_stopping_event.set()
Copy link
Contributor

@qiluo-msft qiluo-msft Jul 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

task_stopping_event [](start = 13, length = 19)

self.task_stopping_event.clear() #Closed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revised.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by default the event is set to false, so we don't need to call clear() here.

self.dom_table = table

def task_run(self):
if not self.task_stopping_event.wait(1):
Copy link
Contributor

@qiluo-msft qiluo-msft Jul 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if not self.task_stopping_event.wait(1): [](start = 8, length = 40)

if self.task_stopping_event.isSet(): #Closed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use isSet() to get the event status.

self.task_timer.start()

def task_stop(self):
self.task_stopping_event.clear()
Copy link
Contributor

@qiluo-msft qiluo-msft Jul 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.task_stopping_event.clear() [](start = 8, length = 32)

self.task_stopping_event.set() #Closed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revised.

@qiluo-msft qiluo-msft merged commit b1b9169 into sonic-net:master Jul 30, 2018
@keboliu keboliu deleted the sonic-xcvrd branch August 22, 2018 10:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants