Telemetry
Set-top boxes (STBs) expose many metrics that are valuable for trend analysis and performance insight. Sources of those metrics include applications (subsystems) and SNMP/TR-069 values. An easy way to obtain application metrics is from the strings that applications write to log files. Until now, the only way to get at this content has been the daily upload of these files to file servers, where custom tools such as Sphinx index the text. Ad-hoc searches and a few reports give engineers access to the content.
But this content has not been available to the wider range of operations engineers, analysts, managers, and others who would like to search for and monitor issues. More importantly, it is not timely: a 24-hour delay is too late in many cases. To improve timeliness, a new effort has the boxes send specific metrics from specific logs and SNMP/TR-069 at near-real-time frequencies.
RDK Telemetry improved on the early practice where the RDK would simply bundle log files once a day and transfer the bundle to a file server. RDK Telemetry implemented several key features:
- More real time data
- Configurable metrics or events that the box should retrieve and send
- Improved upload frequency (every 15 minutes), which makes the data more timely and relevant to analysts
- More sources of metrics and events: log files and SNMP or TR-069 parameters
- Improved cloud solution for analytics: big data solutions for receiving streams and storing the data for reporting
- Real time metrics use terse key/value pairs
Eg: {"searchResult":[{"Total Space available":"30"},{"Heartbeat":"61"},{"Success tune":"280"},{"Failure tune":"0"},{"mac":"FF:FF:FE:FE:FF:FE"},{"Version":"XYZ_PROD_master_HYB_021622172018"}]}
All of these features have been big improvements over the earlier log file transfers.
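The example payload above is a list of single-key objects under "searchResult". A minimal sketch of how a receiver might flatten such a payload into one dictionary is shown here; the function name `flatten_report` is hypothetical and is not part of RDK.

```python
import json

# Payload in the shape of the example above (values copied from it).
payload = '''{"searchResult":[{"Total Space available":"30"},{"Heartbeat":"61"},
{"Success tune":"280"},{"Failure tune":"0"},{"mac":"FF:FF:FE:FE:FF:FE"},
{"Version":"XYZ_PROD_master_HYB_021622172018"}]}'''


def flatten_report(raw: str) -> dict:
    """Merge the single-key objects under "searchResult" into one dict.

    Hypothetical helper for illustration; later keys overwrite earlier
    ones if a key appears more than once.
    """
    report = json.loads(raw)
    flat = {}
    for entry in report.get("searchResult", []):
        flat.update(entry)
    return flat


metrics = flatten_report(payload)
print(metrics["Heartbeat"], metrics["mac"])
```

Note that every value arrives as a string, so a consumer that needs numbers would have to convert fields such as "Heartbeat" itself.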
Key Features
- RDK telemetry data is sent from every RDK device at a specific interval
- Ability to generate critical metrics from the configured parameters
- Leverages logs, system status information, and SNMP commands
- Uses key-value pairs for data upload
RDK telemetry data is sent from every RDK device periodically. During bootup, Xconf sends the device the parameters (the metrics it wants to gather), specifying each parameter and where to retrieve it from.
For example, parameters can be derived from logs, from status information (CPU load, memory usage) obtained with system commands, or via SNMP commands.
The STB sends a request to the server with its version, device details, etc. Based on server rules, the telemetry agent retrieves the telemetry markers from the server. Using that information, it retrieves data from logs, SNMP, etc. Finally, the telemetry agent packages all of this data into a JSON message and sends it to a server, where it is processed and loaded into Splunk. The server can configure how often it needs telemetry data.
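To make the marker idea concrete, the sketch below counts how many log lines contain each marker's pattern. It is an illustration only: the `Marker` fields, the log file name, and the log lines are assumptions made for this example and do not reflect the actual Xconf marker format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    header: str   # key reported in the upload (hypothetical field name)
    pattern: str  # substring to look for in the log
    source: str   # log file the pattern is searched in


def count_markers(markers, logs):
    """Count, per marker, the log lines that contain its pattern.

    `logs` maps a log file name to its text. Counts are returned as
    strings to match the quoted values in the example upload.
    """
    counts = {}
    for marker in markers:
        lines = logs.get(marker.source, "").splitlines()
        counts[marker.header] = str(sum(marker.pattern in line for line in lines))
    return counts


# Made-up markers and log content, for illustration only.
markers = [
    Marker("Success tune", "Tune succeeded", "receiver.log"),
    Marker("Failure tune", "Tune failed", "receiver.log"),
]
logs = {"receiver.log": "Tune succeeded ch=5\nTune succeeded ch=9\nHeartbeat ok\n"}
counts = count_markers(markers, logs)
print(counts)
```

Parameters that come from system commands or SNMP would need their own collectors; only the log-based case is sketched here.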
How it works
The telemetry upload process is controlled through the dcm-log service:
- DCMscript.sh communicates with the Xconf server and fetches the predefined markers
- Using the markers, the DCM script prepares a sorted map file for the log lookup and creates a DCA agent cron job
- The cron job retrieves data from the device
- From the retrieved data, it creates a JSON-formatted message
- The JSON-formatted data is uploaded to the server
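The map-building and packaging steps above can be sketched as follows. This is not the DCM or DCA implementation: it assumes the map is sorted by log file name so that each log only needs to be scanned once, and the tuple layout and function names are hypothetical.

```python
import json
from collections import defaultdict


def build_sorted_map(markers):
    """Group (header, pattern, log_file) tuples by log file.

    Entries are sorted by log file, then header, so all patterns for one
    log sit together (an assumption about what "sorted map" means).
    """
    grouped = defaultdict(list)
    for header, pattern, log_file in sorted(markers, key=lambda m: (m[2], m[0])):
        grouped[log_file].append((header, pattern))
    return dict(grouped)


def build_message(values):
    """Package collected values as a JSON string in the "searchResult" shape."""
    return json.dumps({"searchResult": [{k: str(v)} for k, v in values.items()]})


# Made-up markers, for illustration only.
markers = [
    ("Success tune", "Tune succeeded", "receiver.log"),
    ("Heartbeat", "Heartbeat ok", "app.log"),
    ("Failure tune", "Tune failed", "receiver.log"),
]
sorted_map = build_sorted_map(markers)
message = build_message({"Heartbeat": 61, "Success tune": 280})
print(sorted_map)
print(message)
```

The upload itself (an HTTP request to the configured server) is left out, since the endpoint and transport details are not given here.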