On Tuesday, July 21, 2009, the WAA broadcast its most recent webcast, entitled Web Analytics Lab - Data Warehouse/BI Integration for Predictive Insights. This event was created and presented by our corporate sponsor Unica. If you missed the presentation, don't worry: you can view the archived recording. Michelle Rutan from National Instruments continues the discussion by answering additional questions not aired during the presentation.
Q: Do you apply any kind of grouping or thesaurus to the keyword side of your analysis? Have you tried an information and/or entropy approach to the essence calculation? Do you subtract essence for keywords that were a bounce for a landing page?
A: As far as grouping goes, we start by running SQL queries on the page/keyword index to identify all the keyword variations that we want to include. We are building a thesaurus of sorts, but each requester has their own views, which makes this interesting. As far as entropy or weighting of different keywords, we are not doing this beyond the initial weight a keyword gets from its number of referrals. This project is a long journey that changes all the time as we fine-tune things. I mentioned that we take certain generic pages out of our index and wouldn't search on high-volume keywords, because the accuracy would be low. We are constantly looking at how to fine-tune this.
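As a rough illustration of that first step, here is a minimal Python sketch of querying a page/keyword index for keyword variations. The table name `keyword_index`, its columns, the sample rows, and the helper `find_keyword_variations` are all hypothetical; the actual schema and queries were not described in the answer.

```python
import sqlite3


def find_keyword_variations(conn, patterns):
    """Return (keyword, total_referrals) rows whose keyword contains any pattern."""
    if not patterns:
        return []
    where = " OR ".join("keyword LIKE ?" for _ in patterns)
    sql = (
        "SELECT keyword, SUM(referrals) AS total "
        "FROM keyword_index "
        f"WHERE {where} "
        "GROUP BY keyword "
        "ORDER BY total DESC, keyword"
    )
    return conn.execute(sql, [f"%{p}%" for p in patterns]).fetchall()


# Hypothetical page/keyword index with made-up referral counts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE keyword_index (page TEXT, keyword TEXT, referrals INTEGER)")
conn.executemany(
    "INSERT INTO keyword_index VALUES (?, ?, ?)",
    [
        ("/daq", "daq", 790),
        ("/daq", "usb daq", 120),
        ("/daq", "data acquisition", 90),
        ("/labview", "labview", 950),
        ("/labview", "daq", 20),
        ("/labview", "labview daq", 30),
    ],
)

variations = find_keyword_variations(conn, ["daq", "data acquisition"])
for keyword, total in variations:
    print(keyword, total)
```

The resulting list would still need the manual review described above, since each requester may want a different set of variations.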
Q: The keywords that drive traffic to the site depend on the pages ranking well for those words. Is this kind of a self-fulfilling process? How do you account for this in your SEO efforts?
A: Addressed during the webinar. Please check the archived recording if you didn't get a chance to attend.
Q: Data quality: with such a large volume of data from various sources, did you use a formal master data management (MDM) approach?
A: I would love to say we are that far along. In the data life cycle, our essence is still really new. We are still formulating all of the uses and learning all of the caveats. When we finalize these uses and are ready to make them a standard part of our customer data, we will integrate them into our MDM. To get as far as we have, I consulted with database administrators on the database design and with statisticians on how to do the calculations, and worked with our search guru on general concepts.
Q: Could you expand a bit more on how you calculate the essence score?
A: Each page is broken down into the keywords that have referred visitors to it, so that the keyword index scores for a page total 1. So DAQ might be .79 for ni.com/daq. Then, looking at a person's visit history, we take all of the pages they visited, look for the keywords that we are interested in, and add the keyword index scores together for each person. So if they visited /daq and /labview, and DAQ had a keyword index of .79 for /daq and .02 for /labview, then their DAQ essence is .81.
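The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and referral counts are hypothetical, the index is assumed to be each keyword's share of a page's search referrals, and each distinct page in a visit history is assumed to count once.

```python
from collections import defaultdict


def build_keyword_index(referrals):
    """Turn (page, keyword, count) rows into {page: {keyword: share}}.

    Assumption: a keyword's index is its share of the page's referrals,
    so the shares for each page total 1.
    """
    rows = list(referrals)
    totals = defaultdict(int)
    for page, _keyword, count in rows:
        totals[page] += count
    index = defaultdict(dict)
    for page, keyword, count in rows:
        index[page][keyword] = index[page].get(keyword, 0.0) + count / totals[page]
    return dict(index)


def essence_score(index, visited_pages, keywords):
    """Sum the index scores of the keywords of interest over the pages visited.

    Assumption: each distinct page counts once, however often it was visited.
    """
    return sum(
        index.get(page, {}).get(keyword, 0.0)
        for page in set(visited_pages)
        for keyword in keywords
    )


# Hypothetical referral counts chosen to reproduce the .79 and .02 example.
index = build_keyword_index([
    ("/daq", "daq", 79),
    ("/daq", "other keywords", 21),
    ("/labview", "daq", 2),
    ("/labview", "labview", 98),
])

daq_essence = essence_score(index, ["/daq", "/labview"], {"daq"})
print(round(daq_essence, 2))  # .79 + .02, as in the example above
```

Passing several keyword variations in `keywords` would add their scores together, which is one plausible reading of how grouped keywords contribute to a single essence.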
Q: What has been the biggest surprise "learning" in implementing this?
A: A couple of things:
Keywords - I initially imagined just typing in what you were looking for, but because we have such a long tail of keywords, it takes more effort than I thought to identify the variations you are interested in. This is the first, manual step in our process for calculating each essence request.
Score breadth - Again, I initially thought this would be a really clean, clear score. The score range is significantly different for each query, so we have been focusing a lot on what percentage of the results to use for each query. I never imagined we would get essences of .001 to .00001, but because of the number of keyword referrals it is a very long tail.
Q: Would unsupervised cluster analysis be helpful to find clusters? Is it fair to extrapolate from search referred visitors to all visitors?
A: We are not doing any unsupervised cluster analysis. It is all manually extracted, cleaned, and clustered (of sorts) through trial and error.
Q: How do you know their email IDs? Do they have to log in through an authentication system?
A: There are many sections of our site where, if you give us information, we offer personalization or will save your cart or project. There are also many advanced technical resources that require an end user to provide their information. This web site identifier is how we know about them and tie them to back-end contacts.

