Application Insights: A cure for your Sitecore Log pains

March 20, 2017

Any system administrator or developer who has maintained a Sitecore application in production knows the log pains I’m talking about. They start every time we notice an issue in an installation. First come the file-system log files that roll over every time Sitecore restarts, and then comes searching through a bajillion lines of information while you try to find the elusive details of the log error that might explain what problem the site is having.

Oh wait, you were looking at the log file on the wrong server.

Now that I’ve forced you to relive that horror, let us look at a different way of going about things. Microsoft’s Application Insights isn’t really new to the scene, but with version 8.2 Update 1 and later, Sitecore has taken big leaps to make its integration with Application Insights super-tight. The Sitecore PaaS offering in Azure can’t rely on local log files and performance counters on its App Service instances, so that data is streamed to Application Insights instead. While that makes PaaS pretty awesome right away, we can steal all those goodies for any other Sitecore instance, too.

[Screenshot: Application Insights dashboard]

[Screenshot: Application Insights, total server requests by performance]

What does Sitecore send to Application Insights?

From a logging perspective, Sitecore sends what you would expect (errors, warnings and info messages) to Application Insights, but Application Insights telemetry also lets you see additional details such as:

  • Server response times
  • Number of server requests
  • Page view load times
  • Failed requests
  • Dependencies
  • Windows events
  • Performance counters
  • Application Maps of the entire topology

Ermahgerhd… Wer er mah lerghz?

That sounds great and all, but the system administrator from our earlier example is probably just trying to find the error in the logs and figure out why the site is blowing up. How do we find the errors in the logs in this new world of insights and fancy dashboards?

The Sitecore documentation covers this topic really well, and you should read it for the specific steps. As a quick summary:
  1. Use keyword search if you have a specific error message you are looking for.
  2. Use Event Type filters on the search to restrict results to Trace and Exception messages.
  3. Use Role filters on the search to restrict results to a particular role (like CD).
  4. Use a Time Range filter to restrict your search to relevant days and times.
  5. Use a Severity level filter to narrow the results to Errors and Warnings.
  6. Use the event details panel to review the message details.
  7. Use navigation to find messages related to the selected event.

Consider a scenario where a user reports that a page is blowing up (your typical 500 error page). Without Application Insights, the system administrator would likely take the following steps:
  1. Ask for the page the user is viewing.
  2. RDP into the content delivery server.
  3. Navigate to the logs directory to find the log file from the date the user said they visited the site (which isn’t always the correct date).
  4. Open the log file in some tool (like Notepad++).
  5. Execute a text search on the file contents to find the error.
  6. If not found, repeat for any other log files from that date.
  7. If still not found, repeat for the log files on the other delivery servers.
  8. Once the exception is identified, look at the log entries immediately preceding and following it for any indication of the cause of the error.
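To make the pain concrete, steps 5 through 8 boil down to something like the Python sketch below. It rests on a few assumptions: every server’s log folder is reachable from one machine (the UNC paths are hypothetical), and the files follow Sitecore’s usual `log.<date>*.txt` naming. `search_logs` is an illustrative helper, not a Sitecore tool; it returns every matching line with a few lines of context on either side.

```python
import re
from pathlib import Path


# Hypothetical helper sketching manual steps 5-8: search every log file in every
# server's log folder for a pattern and keep the surrounding lines for context.
def search_logs(log_dirs, pattern, file_glob="log.*.txt", context=5):
    regex = re.compile(pattern)
    hits = []
    for log_dir in log_dirs:  # step 7: every delivery server
        for log_file in sorted(Path(log_dir).glob(file_glob)):  # step 6: every file
            lines = log_file.read_text(encoding="utf-8", errors="replace").splitlines()
            for i, line in enumerate(lines):  # step 5: text search
                if regex.search(line):
                    # step 8: keep the entries just before and after the match
                    start = max(0, i - context)
                    hits.append((log_file, i + 1, lines[start:i + context + 1]))
    return hits


if __name__ == "__main__":
    # Assumed UNC paths to each CD server's log folder; adjust for your environment.
    servers = [r"\\cd1\Sitecore\Data\logs", r"\\cd2\Sitecore\Data\logs"]
    for path, line_no, snippet in search_logs(servers, r"ERROR", file_glob="log.20170320*.txt"):
        print(f"{path}:{line_no}")
        print("\n".join(snippet))
```

Even automated, this still needs file access to every server and still depends on the user remembering the right date, which is exactly the friction the next workflow removes.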

In an Application Insights world, the flow is very similar, except we can eliminate some of the painful steps.

  1. Log in to Application Insights.
  2. Apply a Role filter for CD to pull the events from all delivery instances.
  3. Filter by the Trace and Exception event types.
  4. Filter by severity to show only Errors and Warnings.
  5. Apply a Time Range filter based on when the user said they saw the error. Widen the span a little on either side in case the user misremembers the time.
  6. Scan the events for something relevant and select an event to view its details.
  7. Navigate to related messages to look for an indication of the cause of the error.
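The filters in steps 2 through 7 amount to a query over structured events instead of a text search over files. The Python sketch below mimics that logic over an in-memory list so the idea is easy to see; it is not the Application Insights API. The `TelemetryEvent` record and its field names, the one-hour default padding, and the assumption that “related” means “shares the same operation ID” are all simplifications made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TelemetryEvent:
    # Hypothetical, simplified event record; real Application Insights items carry many more fields.
    timestamp: datetime
    role: str          # cloud role, e.g. "CD" or "CM"
    event_type: str    # e.g. "Trace", "Exception", "Request"
    severity: str      # e.g. "Error", "Warning", "Information"
    message: str
    operation_id: str  # assumed to tie together events raised by the same request


def find_events(events, reported_at, padding=timedelta(hours=1), role="CD",
                event_types=("Trace", "Exception"), severities=("Error", "Warning"),
                keyword=None):
    """Apply the role, event type, severity, padded time range and optional keyword filters."""
    start, end = reported_at - padding, reported_at + padding
    return [
        e for e in events
        if e.role == role
        and e.event_type in event_types
        and e.severity in severities
        and start <= e.timestamp <= end
        and (keyword is None or keyword.lower() in e.message.lower())
    ]


def related_events(events, selected):
    """Return the other events sharing the selected event's operation ID, oldest first."""
    return sorted(
        (e for e in events if e.operation_id == selected.operation_id and e is not selected),
        key=lambda e: e.timestamp,
    )
```

For example, `find_events(events, reported_at=datetime(2017, 3, 20, 14, 30))` keeps only the CD errors and warnings logged between 13:30 and 15:30, across every delivery instance at once, and `related_events` then plays the role of step 7.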

As you can see, the approach is similar, but a multi-server setup is much easier to handle, and you don’t need remote access to any server to do this. That’s great if you want developers investigating the issue instead of the system administrator. Hello, DevOps improvement!

I don’t always use telemetry, but when I do…

Application Insights isn’t just for Sitecore. With its free option (1GB per month), you can tie it to pretty much anything you have running. Just follow these instructions for your .NET web application and you’ll be up and running with this advanced analytics platform in no time!

Below is an example report from a simple MVC web application I run internally on the Nonlinear network to offer automation services to the team. I added the Application Insights telemetry configuration, deployed, and could immediately view data. The graph shows average response times over roughly three days. Effort to plug it in? Less than 30 minutes end to end, and that includes the time I spent reading up on how to do it AND the time taken to do the deployment.

[Screenshot: Application Insights analytics, average response times for the Nonlinear automation app]

CAUTION: The free version of Application Insights includes 1GB of data per month, after which you pay. However, selecting the free version does not change the telemetry collection caps in any way, so nothing stops you from going past that allowance. Make sure to go in and cap your daily usage so you don’t accidentally start getting charged before you are ready.

Setting a daily cap:
  1. Log in to the Azure Portal.
  2. Open your Application Insights resource.
  3. On the left side, in the ‘Configure’ group, select “Features + pricing”.
  4. Ensure you have “Application Insights Basic” selected as the plan.
  5. At the top, select “Daily cap”.
  6. Enter a daily limit value.
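To pick a value for step 6, one rough approach is to divide the monthly allowance by the number of days in the month and round down, so that even a month of maxed-out days stays inside the free tier. The sketch below assumes the 1GB monthly figure quoted above and a cap entered in GB per day; `daily_cap_gb` is a hypothetical helper, not part of any Azure tool.

```python
import math


def daily_cap_gb(monthly_allowance_gb=1.0, days_in_month=31, decimals=3):
    """Largest daily cap (rounded down) that keeps a full month of capped days within the allowance."""
    scale = 10 ** decimals
    # Rounding down guarantees days_in_month * cap never exceeds the allowance.
    return math.floor(monthly_allowance_gb / days_in_month * scale) / scale


if __name__ == "__main__":
    print(daily_cap_gb())
```

With the defaults this gives 0.032 GB (about 32 MB) per day, which totals 0.992 GB over a 31-day month. Rounding down trades a little unused allowance for never tipping over into paid usage.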

Some related reading before you go