Bug #271
EC's checking of active nodes is too resource consuming
| Status: | Closed | Start date: | 20/04/2010 |
|---|---|---|---|
| Priority: | Normal | Due date: | |
| Assignee: | Thierry Rakotoarivelo | % Done: | 100% |
| Category: | OMF Software | | |
| Target version: | OMF 6.0 | | |
Description
As reported by Giovanni Di Stasi
The current mechanism for the EC to check whether the nodes defined in an experiment are "active" is not efficient, and may consume too many resources in some OMF deployments.
Here is the current mechanism:
- When a new experiment starts with a group of nodes, the Experiment Controller (= EC, the software launched with the 'omf exec' command) checks whether the nodes defined in the experiment are "active" (= they exist in the testbed and are not 'broken').
- To do that, for each node, the current EC v5.2 requests the full list of "active" nodes from the CMC service of the Aggregate Manager (= AM) and tests whether the defined node is in that list.
- To build that list, the CMC service v5.2 has 2 options:
-- if you are at Winlab, the CMC service queries the CM of the defined node to see if it is working,
-- if you have OMF installed without a CM card on the nodes, the CMC service will build a mock list with all the nodes set as "active"; to do that, it takes X_MAX and Y_MAX and generates a list with X_MAX * Y_MAX entries.
- In your case, I guess it is the 2nd option, thus the list for you would have 770 entries (7*110, right?)
- Therefore your EC needs to test whether your defined node is within the 770 entries.
- Then the whole process loops again to test the next defined node in your experiment... So depending on the machines that run the EC and AM service, their load, and the number of nodes defined in your experiment, that might take some time.
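To make the cost of this mechanism concrete, here is a minimal Python sketch of the per-node loop described above. It is not OMF code: the function names are hypothetical, and it assumes nodes are identified by (x, y) positions numbered from 1 on a 7 x 110 grid, as guessed in the report.

```python
# Hypothetical model of the v5.2 check; names are illustrative, not OMF code.
X_MAX, Y_MAX = 7, 110  # grid size assumed from the report (7 * 110 = 770)


def build_mock_active_list(x_max, y_max):
    """Mimic the CMC stub: mark every (x, y) position as active."""
    return [(x, y) for x in range(1, x_max + 1) for y in range(1, y_max + 1)]


def check_nodes_v52(experiment_nodes, x_max=X_MAX, y_max=Y_MAX):
    """Return (active nodes, total list entries generated by the AM)."""
    entries_built = 0
    active = []
    for node in experiment_nodes:
        # One full list is rebuilt (and transferred) per defined node.
        full_list = build_mock_active_list(x_max, y_max)
        entries_built += len(full_list)
        if node in full_list:  # linear scan over up to 770 entries
            active.append(node)
    return active, entries_built


if __name__ == "__main__":
    nodes = [(1, 1), (3, 50), (7, 110), (8, 1)]  # (8, 1) is outside the grid
    active, cost = check_nodes_v52(nodes)
    print(active, cost)
```

With 4 defined nodes, the AM generates 4 * 770 = 3080 list entries in this model, and the EC scans a 770-entry list for each of them, so the work grows with both the testbed size and the experiment size.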
Here is a proposed solution (IMPORTANT: we should think this through a bit more, to see if another design would be better):
- have the EC issue a specific query to the CMC service to check whether a single node is active. Thus the EC will not have to test the inclusion of X in a list of N entries.
- have the CMC service (in the 2nd case, when no CM card is installed on the node) return not a list, but the specific reply to the EC's query, i.e. node is active or not (this could be built out of the Inventory database information). Thus the AM will not have to build an N-entry list for each query.
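The proposal could look roughly like the following Python sketch. Everything here is an assumption made for illustration: the function names, the (x, y) node identifiers numbered from 1, and the `BROKEN_NODES` set standing in for the Inventory database.

```python
# Hypothetical sketch of the proposed per-node query; not the OMF API.
X_MAX, Y_MAX = 7, 110  # grid size assumed from the report

# Stand-in for the Inventory database: positions flagged as broken.
BROKEN_NODES = {(2, 5)}


def is_node_active(node, x_max=X_MAX, y_max=Y_MAX, broken=BROKEN_NODES):
    """AM side: answer for one node without building an N-entry list."""
    x, y = node
    in_testbed = 1 <= x <= x_max and 1 <= y <= y_max
    return in_testbed and node not in broken


def check_nodes_proposed(experiment_nodes):
    """EC side: one small yes/no query per defined node."""
    return [node for node in experiment_nodes if is_node_active(node)]


if __name__ == "__main__":
    print(check_nodes_proposed([(1, 1), (2, 5), (8, 1), (7, 110)]))
```

In this model each query costs a bounds check and a set lookup, independent of the testbed size, instead of building and scanning an X_MAX * Y_MAX list.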
History
Updated by Thierry Rakotoarivelo almost 2 years ago
- Target version changed from OMF 5.3 to OMF 6.0
- % Done changed from 0 to 20
For the upcoming 5.3 release, we deactivate the EC check for "active" nodes. This is because the EC later issues ENROLL messages to each resource requested for the experiment. If a resource does not respond within a certain time window, the EC considers it "inactive". This dynamic check by the EC therefore makes the 1st check redundant.
For the future 5.4 release, we will also use a similar dynamic check but with the possible reply from an AM or RM instead of the reply from an RC (with some potential modification of the ENROLL process).
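The dynamic check described in this update can be illustrated with a small Python sketch. It is a simplified model, not the EC implementation: the function name, the 10-second window, and the `reply_delays` mapping (seconds until a resource answers the ENROLL message, absent if it never answers) are all assumptions.

```python
# Hypothetical model of the timeout-based check; not the EC's actual code.
def enroll_check(resources, reply_delays, window_s=10.0):
    """Split resources into (active, inactive) by reply time.

    reply_delays maps a resource name to the delay in seconds before it
    answers the ENROLL message; a missing key means it never answers.
    """
    active, inactive = [], []
    for resource in resources:
        delay = reply_delays.get(resource)
        if delay is not None and delay <= window_s:
            active.append(resource)
        else:
            # No reply inside the window: treated as "inactive".
            inactive.append(resource)
    return active, inactive


if __name__ == "__main__":
    delays = {"node1": 0.4, "node2": 12.0}  # node3 never replies
    print(enroll_check(["node1", "node2", "node3"], delays))
```

Because every requested resource is contacted anyway, a separate lookup against the CMC list adds cost without adding information, which is the redundancy noted above.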
Updated by Thierry Rakotoarivelo 9 months ago
- Status changed from New to Closed
- Assignee set to Thierry Rakotoarivelo
- % Done changed from 20 to 100
I am closing this issue now. The current (and future) version of the EC does not perform a check against all resources known by the CMC, but instead tries to contact each resource directly to see if it is responding. This is implemented by the ENROLL/ENROLLED handshake in the current OMF 5.3, and will probably be replaced by another mechanism in later major OMF revisions.
(Note that the current EC's only call to the CMC AM at the start of the experiment is a request to turn on individual specific nodes. In the case of testbeds without a CM card, the CMC stub AM service will always return OK to that call.)