MySQL Case Statement and COUNT(*) question
I'm attempting to normalize OS names for a report that provides totals of each OS per location. I can use IPs to set location names with a CASE statement, that works fine, but when I do the same sort of maneuver, use a CASE statement to normalize reported OS names, the query still returns multiple grouped counts, even though the OS_NAME is now the same. For example, we have MS Windows 7 Professional reporting in from various offices as 'Microsoft Windows 7 专业版', 'Microsoft Windows 7 ПрофеÑÑÐ¸Ð¾Ð½Ð°Ð»ÑŒÐ½Ð°Ñ x64', 'Microsoft Windows 7 Professional x64', I'd like a report that normalizes all of these to 'Microsoft Windows 7 Professional'. The CASE statement does the normalizing fine, but the count is still coming in as though there are three separate OS names; so rather than a result like so:
Microsoft Windows 7 Professional :: 6.1.7601 :: Office a :: 50 Microsoft Windows 7 Professional :: 6.1.7601 :: Office b :: 20 Microsoft Windows 7 Professional :: 6.1.7601 :: Office c :: 30
I'd get a result like this:
Microsoft Windows 7 Professional :: 6.1.7601 :: Office a :: 12
Microsoft Windows 7 Professional :: 6.1.7601 :: Office a :: 8
Microsoft Windows 7 Professional :: 6.1.7601 :: Office a :: 15
Microsoft Windows 7 Professional :: 6.1.7601 :: Office a :: 5
Microsoft Windows 7 Professional :: 6.1.7601 :: Office a :: 10
Microsoft Windows 7 Professional :: 6.1.7601 :: Office b :: 16
Microsoft Windows 7 Professional :: 6.1.7601 :: Office b :: 4
Microsoft Windows 7 Professional :: 6.1.7601 :: Office c :: 18
Microsoft Windows 7 Professional :: 6.1.7601 :: Office c :: 12
Microsoft Windows 7 Professional :: 6.1.7601 :: Office c :: 30
If I removed the normalizing OS_NAME CASE statement, the results are the same count but with the various names...
Here is the SQL I have so far:
SELECT CASE WHEN OS_NAME = 'Microsoft Windows 7 专业版' THEN 'Microsoft Windows 7 Professional' WHEN OS_NAME = 'Microsoft Windows 7 ПрофеÑÑÐ¸Ð¾Ð½Ð°Ð»ÑŒÐ½Ð°Ñ x64' THEN 'Microsoft Windows 7 Professional' WHEN OS_NAME = 'Microsoft Windows 7 МакÑÐ¸Ð¼Ð°Ð»ÑŒÐ½Ð°Ñ x64' THEN 'Microsoft Windows 7 Ultimate' WHEN OS_NAME = 'Microsoft Windows 7 Professional x64' THEN 'Microsoft Windows 7 Professional' WHEN OS_NAME = 'Microsoft Windows 7 Ultimate x64' THEN 'Microsoft Windows 7 Ultimate' WHEN OS_NAME = 'Microsoft Windows 7 Enterprise x64' THEN 'Microsoft Windows 7 Enterprise' WHEN OS_NAME = ' (ppc)' THEN 'Mac OS X Mountain Lion' WHEN OS_NAME = ' (x86)' THEN 'Mac OS X Mountain Lion' WHEN OS_NAME LIKE 'Mac OS X 10.7.%' THEN 'Mac OS X Lion' WHEN OS_NAME LIKE 'Mac OS X 10.6.%' THEN 'Mac OS X Snow Leopard' ELSE OS_NAME END AS OS_NAME, OS_VERSION,
CASE WHEN IP LIKE '10.1.%' THEN 'office a' WHEN IP LIKE '10.2.%' THEN 'office b' WHEN IP LIKE '10.3.%' THEN 'office c' ELSE IP END AS IP_Location, COUNT(*) FROM MACHINE GROUP BY IP_Location, OS_NAME
Thanks!
Answers (3)
Another possibility would be to use something other than OS_NAME for the CASE statement alias. Since OS_NAME is already a column in the MACHINE table, it appears to be grouping by those entries and not your alias (also named OS_NAME). Again, I tested with my English/French site and this approach also worked. So, for example:
SELECT
CASE
WHEN OS_NAME = 'Microsoft Windows 7 专业版' THEN 'Microsoft Windows 7 Professional'
WHEN OS_NAME = 'Microsoft Windows 7 ПрофеÑÂÑÂиональнаÑ x64' THEN 'Microsoft Windows 7 Professional'
WHEN OS_NAME = 'Microsoft Windows 7 МакÑÂимальнаÑ x64' THEN 'Microsoft Windows 7 Ultimate'
WHEN OS_NAME = 'Microsoft Windows 7 Professional x64' THEN 'Microsoft Windows 7 Professional'
WHEN OS_NAME = 'Microsoft Windows 7 Ultimate x64' THEN 'Microsoft Windows 7 Ultimate'
WHEN OS_NAME = 'Microsoft Windows 7 Enterprise x64' THEN 'Microsoft Windows 7 Enterprise'
WHEN OS_NAME = ' (ppc)' THEN 'Mac OS X Mountain Lion'
WHEN OS_NAME = ' (x86)' THEN 'Mac OS X Mountain Lion'
WHEN OS_NAME LIKE 'Mac OS X 10.7.%' THEN 'Mac OS X Lion'
WHEN OS_NAME LIKE 'Mac OS X 10.6.%' THEN 'Mac OS X Snow Leopard'
ELSE 'Microsoft'
END AS Operating_System,
OS_VERSION,
CASE
WHEN IP LIKE '10.1.%' THEN 'office a'
WHEN IP LIKE '10.2.%' THEN 'office b'
WHEN IP LIKE '10.3.%' THEN 'office c'
ELSE IP
END AS IP_Location,
COUNT(*)
FROM MACHINE
GROUP BY IP_Location, Operating_System
Hope that helps!
John
Comments:
-
Thanks John! - kpm8 12 years ago
One other idea unrelated to your question - you could condense your OS CASE statement by using RLIKE (REGEXP) to only list the unique characters (your foreign characters didn't come across correctly, so I'm including the entire string, but you could truncate further). Like this:
SELECT
CASE
WHEN OS_NAME RLIKE '7 专业版|7 ПрофеÑÂÑÂиональнаÑÂ|7 Pro' THEN 'Microsoft Windows 7 Professional'
WHEN OS_NAME RLIKE '7 МакÑÂимальнаÑ|7 Ult' THEN 'Microsoft Windows 7 Ultimate'
WHEN OS_NAME RLIKE '7 Ent' THEN 'Microsoft Windows 7 Enterprise'
WHEN OS_NAME RLIKE ' (ppc)| (x86)' THEN 'Mac OS X Mountain Lion'
WHEN OS_NAME LIKE 'Mac OS X 10.7.%' THEN 'Mac OS X Lion'
WHEN OS_NAME LIKE 'Mac OS X 10.6.%' THEN 'Mac OS X Snow Leopard'
ELSE 'Microsoft'
END AS Operating_System,
OS_VERSION,
CASE
WHEN IP LIKE '10.1.%' THEN 'office a'
WHEN IP LIKE '10.2.%' THEN 'office b'
WHEN IP LIKE '10.3.%' THEN 'office c'
ELSE IP
END AS IP_Location,
COUNT(*)
FROM MACHINE
GROUP BY IP_Location, Operating_System
John
Comments:
-
...and thanks again! - kpm8 12 years ago
Here's hopefully a simple adjustment - assuming the version numbers are the same for (what I'm assuming to be) the different language versions of the same OS, try changing the last line to:
GROUP BY IP_Location, OS_VERSION
I tested this with my site that has English and French name differences (WinXP Professional, WinXP Professionnel) and it worked properly, but they are all standardized on the same version.
If you are dealing with different versions, we'll need to try something different.
John