Telemetry is everything, you cannot have enough data when users start asking you why the system is behaving differently than yesterday or performance is changing over time.
This is where Azure SQL stands out from On Premises. You can get so much more data and in an easy way to analyse.
However, you need to know where to find it because not everyting is setup automatically after you create a database. Some is, some is not.
This blog is about how to connect Azure SQL Analytics to your Azure Monitor.
The steps how to do this are described in this docs entry and I don’t want to repeat existing documentation. I will add some screenshots of some results for a 220 GB Microsoft Dynamics NAV database with 80 concurrent users.
Step 1 – Patience!
After you have activated Azure SQL Analytics it will not be visible for a while. It takes time in the background to be generated and but together by the Microsoft Minions who control your tenant in the background. Remember that these Minions have labour contracts and a rights to have a break every now and then.
Step 2 – Azure Monitor & More…
When the Minions are finished the data will show up in Azure Monitor. Search for it in your environment
And then, at least in my case you have to click on More…
This should show a link to your Azure SQL Analysis. In my case with two databases. DEV and PROD.
Step 3 – The Dashboard
The first dashboard you’ll see is something like this, except for the fact that this shows data 24 hours after activation and we had a busy friday with a performance incident. I’ll get back to that.
There are some interesting statistics here already visible like wait stats, deadlocks and autotuning. I’ll handle wait stats in this blog and maybe I’ll get back to deadlocks and autotuning later. There is a “good” reason the autotuning is red and I’ll look at that tomorrow (sunday) when nobody is working on the system.
Step 4 – Drill Down | Database Waits
If we drill down into the Database Waits we see more details on what types of waits we are dealing with here.
It does not help looking at these waits without narrowing down into specific moments in time when “things go wrong” because specific events relate to specific wait stats and some waits are just there whether you like it or not. We all know CXPPACKET because NAV/Business Central fires a lot of simple queries to the Azure SQL engine resulting in CPU time wasted. There is not much you can do about that. (As far as I know).
Step 5 – Houston we have a problem!
It’s 3:51pm on friday afternoon when my teammate sends me a message on Skype that users are complaining about performance. Since we just turned on this great feature I decide to use it and see what goes wrong.
We drill down again one more time and click on the graph showing the waits.
Note that this screenshot was created a day after the incident but it clearly illustrates and confirms that “someting” is off around the time my teammate sent me a message. The wait time on LCK_M_U goes through the roof! We have a blocker in our company.
Hey, this is KQL again!
Now we are in a familiar screen, because this is the same logging that Business Central Application Insights is using. Drilling down into the graph actually generated a KQL query.
Step 6 – What is causing my block?
To see what query is causing my block I have to go back to the Azure Dashboard and click on Blocks like this
From here we have two options. If I click on the database graph I get taken into the KQL editor and if I click on a specific block event I get a more UI like information screen. Let’s click on the latter.
Step 7 – Get the Query Hash
This is where it get’s nerdy. The next screen shows the blocking victim and the blocking process.
It also shows a Query Hash.
This is where I had to use google, but I learned that each “Ad-Hoc” query targetted against SQL Server gets logged internally with a Query Hash.
Since NAV/Business Central only used Ad-Hoc queries we have a lot of them and it’s important to understand how to read them.
What worries me a bit here is the Blocking Process’ Status which is sleeping. I have to investigate this more, but I interpret this as a process that went silent and the user is not actively doing something.
Step 8 – Get the Query
Using Google I (DuckDuckGo actually) also found a way to get these queries as long as they still exist in the cache of your SQL Server. Simply use this query
SELECT deqs.query_hash ,
FROM sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_query_plan(deqs.plan_handle) AS deqp
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
WHERE deqs.query_hash = 0xB569219D4B1BE79E
This will give you both the query and the execution plan. You have to use SQL Server Management studio to execute this against your Azure SQL Database
Step 9 – Restart the service tier
Unfortunately for me this journey resulted in having to restart the service tier. We could not identify the exact person/user who had executed the query that was locking. Maybe we will be able to do that in a future incident since I’m learning very fast how to use this stuff and time is off the most essence when incidents like this happen on production environments.
Needless to say that the NAV Database Locks screen was not showing anything. I would have used that otherwise off course.