The Plugin Portal is a component that allows BigFix Platform to manage devices that do not have a BigFix Agent installed, for example Amazon Web Services or mobile devices.
It adds BigFix features to the devices it manages - in particular, it facilitates gathering of sites, evaluation of Fixlets and running actions on the discovered devices.
In short, the Plugin Portal acts as the agent for all the devices it manages.
You can find more details in the Plugin Portal doc page.
The changes made in BigFix 10.0.5* aim to increase the Plugin Portal's efficiency by reducing both its use of machine resources and the quantity of content (Fixlets, analyses) it manages for the discovered devices.
To reduce the use of resources, we:
- Optimized the code
- Used the internal SQLite databases not only for result persistence but also during runtime execution, to reduce the amount of data loaded into process memory.
To reduce unnecessary content, two new filtering methods have been added to the Portal:
- Custom Site subscription
- Content provided by Sites
*Patch 4 already includes some of these improvements; in the rest of the article, comparisons refer to versions prior to Patch 4.
Machine resources optimization
The Plugin Portal main thread uses loops to execute operations on the involved devices (evaluating Fixlets, running actions…). To avoid overloading the server by sending all the reports at one time, each main cycle is split into mini-cycles that perform Fixlet evaluation and action execution on groups of 500 devices.
In previous versions the portal kept in memory the results of Fixlets, properties and actions for all the discovered devices, using SQLite Databases for result persistence.
Now the DBs are used at runtime: the results are loaded from them at the beginning of each mini-cycle only for the group of 500 involved devices. At the end of the mini-cycle they are released, reducing the average memory load.
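The mini-cycle pattern described above can be sketched roughly as follows. This is only an illustration of the load-at-start, release-at-end idea, not the Portal's actual code: the `results` table, its columns, and the `evaluate` callback are hypothetical names chosen for the example.

```python
import sqlite3

BATCH_SIZE = 500  # group size per mini-cycle, as stated in the article


def mini_cycles(device_ids, batch_size=BATCH_SIZE):
    """Yield consecutive groups of at most batch_size device ids."""
    for start in range(0, len(device_ids), batch_size):
        yield device_ids[start:start + batch_size]


def load_results(conn, batch):
    """Load stored results for one batch only (hypothetical 'results' table)."""
    placeholders = ",".join("?" * len(batch))
    rows = conn.execute(
        f"SELECT device_id, payload FROM results WHERE device_id IN ({placeholders})",
        batch,
    ).fetchall()
    return dict(rows)


def run_main_cycle(conn, device_ids, evaluate):
    """Process all devices in mini-cycles; return the largest in-memory batch size."""
    peak = 0
    for batch in mini_cycles(device_ids):
        results = load_results(conn, batch)  # loaded at the start of the mini-cycle
        for device_id in batch:
            results[device_id] = evaluate(device_id, results.get(device_id))
        conn.executemany(
            "INSERT OR REPLACE INTO results (device_id, payload) VALUES (?, ?)",
            list(results.items()),
        )
        conn.commit()
        peak = max(peak, len(results))
        del results  # released at the end of the mini-cycle
    return peak
```

With this shape, the number of results held in memory is bounded by the batch size rather than by the total number of discovered devices, which is the effect the article attributes to the runtime use of the databases.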
The behavior of multi-threaded evaluation has changed. In previous versions, _BESPluginPortal_Performance_ThreadLimit was set to 1 by default and the suggested value was (number of CPU cores minus 1).
Now the Portal can use all the CPU cores during the evaluation phase, and the default value is automatically set to the best-performing one, that is, the number of CPU cores.
A Plugin Portal upgrade from previous versions does not change a value that is already set; for performance reasons, however, it is strongly suggested to unset _BESPluginPortal_Performance_ThreadLimit after the upgrade so that the new default applies.
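The effect of leaving the setting in place versus unsetting it can be illustrated with a small sketch. The `settings` dictionary and the `effective_thread_limit` helper are hypothetical; they only mirror the rule described above (explicit value wins, otherwise the number of CPU cores) and are not the Portal's implementation.

```python
import os

SETTING_NAME = "_BESPluginPortal_Performance_ThreadLimit"


def effective_thread_limit(settings):
    """Return the evaluation thread count implied by a settings mapping.

    'settings' is a hypothetical dict of setting name -> string value.
    """
    raw = settings.get(SETTING_NAME)
    if raw is None:
        # Unset: fall back to the number of CPU cores.
        # os.cpu_count() may return None, so default to 1 in that case.
        return os.cpu_count() or 1
    # An explicitly set value is kept as-is (never below 1).
    return max(1, int(raw))
```

On an 8-core machine, a leftover value of "1" from an earlier version would keep evaluation on a single thread under this rule, while removing the setting would yield 8.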
Reduce unnecessary content
As noted earlier, the Plugin Portal acts as the Agent for all the devices it discovers. For performance reasons, it is therefore essential to limit the Plugin Portal overhead by filtering the content it manages: content that is not intended for the discovered devices should not reach the Plugin Portal.
Since version 10.0.4, two new features help to filter content from sites. It is important to understand them and apply them to Custom and External Sites.
Filtering by relevance. Before 10.0.4, the Plugin Portal subscribed the devices to a Custom Site after evaluating the applicability relevance resulting from the subscription condition of the site.
Now the subscription relevance of the site is evaluated by the Plugin Portal only if it uses at least one of the following inspectors:
- in agent context
- in proxy agent context
- in plugin portal context
You can find details and examples in the Custom Site management doc page.
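As a rough illustration of the rule above, the check can be pictured as a test for the presence of one of the three inspector phrases in the site's subscription relevance. This sketch uses plain, case-insensitive substring matching, which is a simplification assumed for the example; the real Portal evaluates relevance with its own engine, and the function name is hypothetical.

```python
# The three inspector phrases listed in the article.
PORTAL_INSPECTORS = (
    "in agent context",
    "in proxy agent context",
    "in plugin portal context",
)


def portal_evaluates(subscription_relevance):
    """Return True if the relevance text mentions a context inspector.

    Whitespace is collapsed and case is ignored; both are assumptions
    of this sketch rather than documented Portal behavior.
    """
    text = " ".join(subscription_relevance.lower().split())
    return any(phrase in text for phrase in PORTAL_INSPECTORS)
```

Under this reading, a Custom Site whose subscription condition never mentions any of these inspectors is not evaluated by the Plugin Portal at all.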
Filtering by content name. After a Site is downloaded, at gathering time, the Plugin Portal can exclude some Fixlets or analyses from the evaluation process. To do so, list the content to exclude in a JSON file named PluginPortalSubscriptionOptions.json.
You can find details and examples in the Filtering the content of subscribed sites doc page.
Performance results
BigFix Plugin Portal is the component in charge of processing content of different nature on behalf of the managed devices; just like the BESClient processes content for a single endpoint, the Portal needs to do the same for all the devices it manages.
The processing rate of this content and the resources required to process it constitute a set of KPIs (Key Performance Indicators) to be measured. These include:
- Content processing in terms of actions, Fixlets, properties, etc.
- Resources used to process it
- Time to process batches of devices
The Patch 5 improvements and optimizations provide a huge boost in core Portal performance, which was the main factor limiting how far the number of managed devices could scale out.
To assess and verify the benefits of this rework, we ran a suitable benchmarking workload and compared the main KPIs.
To better highlight the improvement, results are presented broken down into:
- Code Optimization effects
- Unnecessary content processing
- New scale out numbers
Code optimization
The code has been refactored and improved to optimize resource utilization and processing logic, as described above. To quantify this improvement alone, the Portal was artificially shielded from processing custom content: the previous version of the Plugin Portal was made to process exactly the same content it would process with Patch 5.
The result is a significant boost in all the relevant KPIs, together with reduced utilization of the underlying resources.
Code refactoring and optimization show improvements in the processing rate of all major BigFix entities (properties being the most prominent). Overall, this is reflected in the batch rate improvement, which gives an overall measurement of Portal processing capabilities.
The batch rate (like all the other KPIs) improves with the number of devices managed, as shown by comparing the same set when managing 10k and 50k devices respectively: for 50k devices, the batch rate is almost 200% faster than before.
One important part of the optimization work was to reduce resource usage, mostly memory consumption.
The picture shows the average improvement percentage with respect to previous levels. Besides a huge gain (yes, it's more than 500%), it's worth noting that this improvement scales well with the number of devices managed.
Unnecessary content processing removal
The results so far refer to an artificial condition imposed for a fair comparison: the Portal was prevented from processing unnecessary content. When custom content not pertinent to the Portal is processed, performance degrades; put bluntly, the more unnecessary content the Plugin Portal processes, the worse it performs.
In the current test environment, the average net improvement for 10k discovered devices is depicted below: the batch rate is over three times higher than before.
Even in the discovery phase (which is heavier than the update phase), the boost obtained is considerable.
Scale out: up to 300k
From all the above, it is evident that the Portal can scale out as never before.
Patch 5 now allows the Portal to manage up to 300k devices (75k per Portal instance); refer to the capacity planning blog.
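The two figures above imply a simple sizing calculation. The sketch below only restates that arithmetic; the helper name is hypothetical, and the capacity planning blog remains the reference for real deployments.

```python
import math

DEVICES_PER_PORTAL = 75_000   # per Portal instance, as stated in the article
MAX_DEVICES = 300_000         # overall limit stated in the article


def portals_needed(device_count):
    """Return how many Portal instances the stated limits imply."""
    if not 0 <= device_count <= MAX_DEVICES:
        raise ValueError(f"device_count must be between 0 and {MAX_DEVICES}")
    return math.ceil(device_count / DEVICES_PER_PORTAL)
```

For example, the full 300k devices work out to 300,000 / 75,000 = 4 Portal instances.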
Conclusion
Patch 5 includes a strong effort to improve overall BigFix Portal performance. This leads to a much lower processing time and much more controlled resource usage, which in turn allows the Portal to scale seamlessly up to 300k managed devices.
Patch 5 represents a major milestone for the BigFix Portal, but further improvements are yet to come, so stay tuned…
As always, BigFix is able to find it and fix it, even faster!
Authors of this Blog:
Emilio De Angelis
Valeria Mazza
Massimo Marra