Use Azure Log Analytics to alert on critical events occurring on Microsoft Teams Room Systems

In the previous post we had an overview of how to create an Azure Log Analytics workspace and configure it to collect data from Windows systems. Once the information is ingested into the workspace, we can create alerts that notify the responsible team based on various signal logics, which is useful for monitoring these devices.

These alerts are scoped to each Log Analytics workspace. It is a good idea to isolate the services, group the devices for each one in a single workspace, and create separate alerts for critical events happening on the monitored devices.

To create the alerts, navigate to Alerts on the same workspace and click New Alert Rule.

Navigate to signal logic and choose the signal. Several signals are available, so it is worth checking whether any others suit our requirement and can be added here.

Now we have the critical signals on which the alerts will be triggered. Usually the signal type comes from the collected events and the performance counters. In our scenario we can go with some default signals from the list and also with a custom log search.

Device Restart Alert:

For the default signal in our example, we chose the Heartbeat signal from the existing list (useful when the device turns off).

Select the required devices, set the operator threshold value to 0, the aggregation period to 5 minutes and the frequency of evaluation to 1 minute (both can be chosen based on how often we want to check the heartbeat). For a large volume of devices it is generally better not to choose a short evaluation frequency; a shorter frequency is best reserved for critical devices only.
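The same heartbeat condition can also be expressed as a custom log search. This is a minimal sketch, assuming the devices report to the standard Heartbeat table of the workspace; the 5-minute window mirrors the aggregation period above and is a value to tune, not a recommendation.

```kusto
// Computers whose most recent heartbeat is older than 5 minutes.
// Assumption: the alert rule's query time range is longer than 5 minutes
// (for example 1 hour), otherwise silent devices drop out of the result.
Heartbeat
| summarize LastHeartbeat = max(TimeGenerated) by Computer
| where LastHeartbeat < ago(5m)
```

An alert rule built on this query would fire when the number of results is greater than 0, and each returned row names a device that has stopped reporting.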

Disk Threshold Alert:

As with the device restart alert, a disk threshold signal is available by default and can be configured.

It notifies us when the disk crosses the configured threshold. Select the resource configured for Teams, select the condition, select the computers and the object name, then set the rule to fire whenever % Free Space is greater than the chosen value, 60 percent in our case. The percentage can be altered based on our requirement. Keep in mind that % Free Space measures the space remaining rather than the space used, so to be warned about a disk that is filling up the operator should be "less than".

Then we need to select the required object, instance, counter path and source system. In our case we have selected one performance counter, % Free Space. The alert fires when the counter crosses the 60 percent threshold.

The chosen aggregation period is 5 minutes and the evaluation frequency is 1 minute. Again, we can change the evaluation frequency, for example to twice a day, once in the morning and once in the evening.
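For reference, the disk condition can be sketched as a log query too. The example below assumes the LogicalDisk object with the % Free Space counter is being collected into the Perf table, and that the goal is to catch disks that are more than 60 percent full, which corresponds to less than 40 percent free. Both the threshold and the 5-minute bin are illustrative values, and FreeSpaceThreshold and AvgFreePercent are names chosen for this example.

```kusto
// Average free space per computer and drive over 5-minute bins.
// Assumption: "more than 60 percent used" is expressed as "< 40 percent free".
let FreeSpaceThreshold = 40;
Perf
| where ObjectName == "LogicalDisk" and CounterName == "% Free Space"
| where InstanceName != "_Total"   // look at individual drives only
| summarize AvgFreePercent = avg(CounterValue) by Computer, InstanceName, bin(TimeGenerated, 5m)
| where AvgFreePercent < FreeSpaceThreshold
```

Writing the threshold as a named value at the top makes it easy to adjust per workspace without touching the rest of the query.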

Custom Alerts:

Custom alerts are more interesting. With custom alerts we can cover most of our alerting needs. For these we select the signal Custom log search.

Event
| where EventLog == "System"
| where EventLevelName == "Error"
// != compares the whole string literally, so use !contains for substring exclusions
| where RenderedDescription !contains "updatefailed"
| where EventData !contains "DCOM"
| project TimeGenerated, Computer, RenderedDescription

The example above uses this query to report only the events with error messages, excluding Windows Update and DCOM events. We can filter further with the not-contains operator (!contains) and build custom queries based on our requirement.

When any error message other than the excluded events appears on the targeted devices, we are alerted.
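To keep the notification short when one device logs the same error repeatedly, the results can be grouped before alerting. The variant below is a sketch that reuses the same exclusions and assumes the standard Source and EventID columns of the Event table; the column names ErrorCount and LastSeen are made up for illustration.

```kusto
Event
| where EventLog == "System" and EventLevelName == "Error"
| where RenderedDescription !contains "updatefailed"
| where EventData !contains "DCOM"
// One row per device and event type instead of one row per event.
| summarize ErrorCount = count(), LastSeen = max(TimeGenerated) by Computer, Source, EventID
| order by ErrorCount desc
```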

Note that there are multiple action types. Email/SMS/Push/Voice, ITSM and Webhook are the most convenient for us in this case of Teams room systems monitoring.

Email: we can send Email/SMS/Push/Voice notifications when the alert is triggered. This is the most convenient and easiest option to start with. It helps us collect all the use cases initially and see which ones are really helpful and which are not. Once we devise a strategy from the email alerts, we can move on to the other alerting mechanisms.

ITSM: we can integrate with an IT service desk management tool to create incidents when these alerts are triggered. Most IT service desk management tools support API integration, especially with Azure AD, so this requirement should be easy to meet.

Webhook: we can configure push notifications to Teams channels when these alerts are triggered. A dedicated Teams channel can be created for the first-level NOC monitoring team, and the webhook can then be configured to post the critical event alerts to that channel.

Now for the email alert: we created an action group and chose the action type Email/SMS/Push/Voice.

By default no action group exists, so one must be created and targeted at the NOC team email group.

We added the email address for notification. There are other options as well, such as SMS and voice, which could also be leveraged.

We do have an option to modify the email subject based on the alert details.

Finally we name the alert, set the severity, enable the rule and create it.

We have the option to see all the configured rules.

Once configured, we can see the statistical dashboards, which provide a summary of the total alerts that have been triggered and their status.

We receive the email alerts when the disk space exceeds the configured level of 60 percent.

Similarly when the device was turned off, the configured heartbeat alert triggered an email to the recipient.

In the same way, we can create as many alerts as required for critical events.

At this moment we have the option to create alerts for every action type, which can be targeted at all computers, and they are charged individually at a very nominal price. So for multiple alerting types we need to create multiple action types. These alerts are based purely on the collected logs present in the Azure Log Analytics workspace. If the details we want to alert on are not present in the collected logs, we cannot create alerts for them. The Azure Log Analytics alerting mechanism provides a great way to alert on critical events happening across the monitored systems.

Thanks in Advance

Sathish Veerapandian
