Bill Slawski Interview

Bill Slawski is one of the lesser-known SEOs. He has a blog called SEO by the Sea, and his writing can best be described as thorough and honest. Bill often digs deep into patents and other material that most would find too difficult to grasp, which is exactly what keeps me visiting his blog. Here is my interview with the SEO who lives by the sea.

Hello Bill, do you really live by the sea?

Hi Aaron, Right now, I’m about 30 minutes from the Delaware Bay, and I spent most of last year, when I started the blog, working about 300 feet away from the Chesapeake Bay. My early, formative years were spent running around the New Jersey shore, fishing and crabbing and riding around the different bays and inlets. If I didn’t like the small college town I live in so much, I would probably move a little closer to the coast.

I noticed that you and I appeared in the ODP at the same time. I cannot figure out who submitted my blog; did someone also place your blog in the directory, or did you submit it yourself for inclusion?

When I saw the “Web Design and Development: Promotion: Weblogs” category, and noticed the blogs of a lot of friends there, I decided that it was a good place to be listed. At the time, they were also listing RSS feeds for sites, so I submitted my blog and tried to submit my RSS feed. There was a problem with the submission of the RSS feed, and the accompanying instructions suggested that I try to validate the feed at feedvalidator.org. The validator had no problems with the feed, so I started a thread at Resource Zone. While the feed issue wasn’t resolved, I ended up being listed the same day as my submission. And I’m pretty happy with the description for my blog that a DMOZ editor came up with.

Speaking of links and inclusion, Matt Cutts and Vanessa Fox from Google have made it clear that they have ways of determining whether a link is natural or not. You have looked into Google’s patents and are a smart guy; what do you think?

A link is a link is a link. One of the main conceptual underpinnings of the World Wide Web is the ability to link freely to wherever you like. At the top of most pages of SEO by the Sea is a quote from Tim Berners-Lee, from his page on Links and Law: Myths. Here’s an expanded version of that statement, from Dr. Berners-Lee’s note:

There are some fundamental principles about links on which the Web is based. These principles allow the world of distributed hypertext to work. Lawyers, users and technology and content providers must all agree to respect these principles which have been outlined.

It is difficult to emphasize how important these issues are for society. The first amendment to the Constitution of the United States, for example, addresses the right to speak. The right to make reference to something is inherent in that right. On the web, to make reference without making a link is possible but ineffective – like speaking but with a paper bag over your head.

It doesn’t say that a link is a vote. That’s an assumption made in the early days of Backrub. There is a benefit to using links and hypertext analysis to index the unstructured documents on the web. But by indexing and measuring based upon links, the meaning of a link has been transformed from a reference into a vote.

I see that transformation, from reference to popularity contest, as unnatural. Of course, there is no right to be listed and indexed in Google, but the search engine’s popularity means that if you want to be found on the web, being in Google is important. And that means paying attention to their definition of links.

What assumptions do they make when deciding whether a link is natural or unnatural? How can they programmatically decide whether or not a link should have value as a vote? There may be some indicators that they can look at.

1. Look at the placement of links on a page, and which regions of the page they appear within. Advertisements may all be clustered together within the same block, or region, of a page. Identifying that block may give the search engine a chance to devalue ranking achieved from links within that section. A patent application from November 2004, Methods and systems for determining a meaning of a document to match the document to content, shows that they have been working on the same type of Block Level Link Analysis that Microsoft came out with at around the same time.

2. The Historical Data patent application from Google looked at time-related factors based upon links – such as the freshness of links, the frequency of growth of links to pages (“A typical, ‘legitimate’ document attracts back links slowly.”), and others.

3. It’s really not difficult to imagine that Google is exploring areas similar to what Yahoo described in their recent patent involving TrustRank – Link-based spam detection.

4. Ranking Documents Based on Anchor Phrases – Should a link to a page count more if the anchor text within the link seems to be related to the content on the page it links to? Should it count even more if it also appears to be related to the content of the page it is linked from? Those ideas appear in a recent patent application from Google, Multiple index based information retrieval system.

Those really just scratch the surface, and there are other ideas, in other patent filings from Google that may or may not have been implemented, that approach this concept of unnatural linking in the context of indexing based in part upon links. We also seem to be seeing a movement towards considering different ways to take user behavior into account in rankings – from measuring the amount of time spent on a page and the distance down that page that someone may scroll, to reviewing logfiles from the search engines and ISPs that capture the popularity of phrases and pages, to the frequency with which pages are bookmarked, to looking at pages that someone may have visited and that are included in their history and browser cache files.
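Just to make that concrete, here is a rough Python sketch of how a handful of those signals might be folded into a single score for a link. The signal names, weights, and thresholds are entirely hypothetical – they are mine, not anything pulled from those patent filings – but they show the flavor of what a programmatic decision could look like.

```python
# Purely illustrative: a toy "naturalness" score for a link built from a few
# of the kinds of signals discussed above. The signal names and weights are
# invented for this sketch and are not taken from any search engine or patent.

def link_naturalness(in_ad_block, anchor_matches_target, anchor_matches_source,
                     links_gained_last_month, site_age_months):
    score = 1.0
    if in_ad_block:                 # link sits in a block/region full of ads
        score -= 0.5
    if anchor_matches_target:       # anchor text relates to the page it points to
        score += 0.2
    if anchor_matches_source:       # ...and to the page it sits on
        score += 0.1
    # a very young site attracting links unusually quickly looks less "legitimate"
    if site_age_months < 6 and links_gained_last_month > 1000:
        score -= 0.4
    return max(score, 0.0)

# A brand-new site whose links all sit inside one advertising block
print(link_naturalness(True, False, False, 5000, 2))    # 0.1
print(link_naturalness(False, True, True, 40, 36))      # 1.3
```

A real system would of course weigh far more signals, and weigh them against each other in ways we can only guess at from the patent filings.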

I have been spending a few weeks in the Google Sitemaps Group reading people’s concerns, and have noticed that 99% of the time, people who complain of going up and down in the SERPs do not have a 301 redirect set between the “www” and non-“www” versions of their domain. Even Matt Cutts wishes that canonical issues and .htaccess were easier for admins, and he also said that the Big Daddy datacenter would improve upon canonicals – but the word “improve” does not mean “fix,” correct? What is it about Google’s algorithm that makes it choke while Yahoo and MSN seem to be getting it right?

I believe that Matt addressed some of the reasons for the differences in a blog post or two he made on Big Daddy, by saying that sometimes they prefer to show the page that they think is the best source page regardless of a redirect – often based upon which version of a URL they think someone will be more likely to click on from search results (see SEO Advice: Discussing 302 Redirects).

But, he also offered some good advice in this post: SEO Advice: URL Canonization. Many canonicalization issues are things that people can take into their own hands, so that they don’t have to rely upon the search engine to get it right for them.
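One quick way to see whether a site already handles this is to send a HEAD request to both versions of the hostname and compare the status codes. Here's a minimal Python sketch along those lines; example.com is only a placeholder, so substitute your own domain and preferred hostname.

```python
# A minimal check of www/non-www canonicalization: the non-preferred hostname
# should answer with a 301 pointing at the preferred one. example.com is a
# placeholder domain for this sketch.
import http.client

def check_redirect(host, path="/"):
    conn = http.client.HTTPConnection(host, timeout=10)
    conn.request("HEAD", path)
    response = conn.getresponse()
    print(host + path, "->", response.status, response.reason,
          "| Location:", response.getheader("Location"))
    conn.close()

check_redirect("example.com")      # ideally 301, Location: http://www.example.com/
check_redirect("www.example.com")  # ideally 200, no redirect
```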

I also notice that sites with lots of backlinks, like Matt Cutts’ blog, do not have canonical issues, but we weenie bloggers with few links do. Have you seen this, and do you believe it is true?

It’s funny which pages you’ll sometimes see this issue on. One site with a lot of back links, The New York Times, is a toolbar rank 7 without the “www,” and a 10 with it. So, it’s not just mom and pop sites that aren’t taking these matters into their own hands. And it’s not just blogs that face these issues.

Notice something different on my blog: my sitemap is a PR5, but my index.php is only a PR3. Do you also believe, as I do, that PR is only needed for the deeper crawl? In other words, if you only have 20 pages you don’t need much PR, but if you have 10,000 you need more?

I’ve seen a number of sites where deeper pages had higher PageRanks than the main page. There are a few pages at SEO by the Sea, like the one on Google Acquisitions, that show more green in the toolbar. I’ve also seen sites with popular tools and applications on interior pages that outrank the index pages of those sites. Since PageRank is based upon specific URLs (and sometimes the pages they point to, if canonical issues are resolved), rather than being a site-wide measure, that should be expected.

One of the original working papers behind Google was Efficient Crawling Through URL Ordering by Junghoo Cho, Hector Garcia-Molina, and Lawrence Page. It describes the value of PageRank as one of the importance metrics that could be used in deciding which pages to crawl and send to an index. From what I’ve seen, it does have a fair amount of value in determining how deeply a site will be indexed.
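As a toy illustration of that per-URL behavior, here is a small PageRank iteration in Python over a made-up graph; the pages, links, and damping factor are invented for the example. An interior page that several outside pages link to ends up with a higher score than the home page that sits "above" it.

```python
# Toy PageRank over a made-up link graph, to illustrate that the score belongs
# to individual URLs rather than to a site as a whole. All URLs and links here
# are invented for the example.

graph = {
    "site.example/":     ["site.example/tool"],
    "site.example/tool": ["site.example/"],
    "blog-a.example/":   ["site.example/tool"],
    "blog-b.example/":   ["site.example/tool"],
    "blog-c.example/":   ["site.example/tool"],
}

def pagerank(graph, damping=0.85, iterations=50):
    pages = list(graph)
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_ranks = {}
        for page in pages:
            incoming = sum(ranks[other] / len(graph[other])
                           for other in pages if page in graph[other])
            new_ranks[page] = (1 - damping) / n + damping * incoming
        ranks = new_ranks
    return ranks

for page, score in sorted(pagerank(graph).items(), key=lambda item: -item[1]):
    print(f"{score:.3f}  {page}")
# site.example/tool scores higher than site.example/ because of its extra in-links
```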

The last question leads me to a new thought: is there such a thing as having too much PageRank?

There might be. Pointing links to pages from a URL that has a PageRank 9 is kind of intoxicating and frightening at the same time. It’s highly recommended.

I liked your post on decay and dead links. I hate to keep referring to Matt Cutts, but he also mentioned that this is an important thing for webmasters to monitor. Do you follow every word Matt Cutts speaks like I do? What do you think of Mr. Cutts?

Thanks. There have been a lot of subtle hints from the search engines and from academic papers that it’s not a bad idea to keep on top of dead links and redirects on the pages of a site, even before that patent application came out. For a number of large sites, going through pages clearing away dead links and redirects, and re-evaluating where working links point, can have a positive impact upon how well a page will rank. Add to that addressing canonical issues and making sure that pages are spiderable and distinctly unique from each other, and you have a good foundation to build upon with engaging and linkable content.
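For anyone who wants to start that kind of sweep, here is a minimal Python sketch that fetches one page, collects its absolute links, and reports the ones that answer with an error. The starting URL is only a placeholder, and a real sweep would also resolve relative URLs, honor robots.txt, and flag redirects separately.

```python
# A small dead-link sweep: fetch one page, pull out its absolute hrefs, and
# report any that answer with an error. The starting URL is a placeholder;
# a fuller tool would resolve relative URLs and check redirects as well.
import urllib.request
import urllib.error
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects absolute hrefs from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value and value.startswith("http"):
                    self.links.append(value)

def find_dead_links(page_url):
    html = urllib.request.urlopen(page_url, timeout=10).read().decode("utf-8", "replace")
    collector = LinkCollector()
    collector.feed(html)
    for link in collector.links:
        request = urllib.request.Request(link, method="HEAD")
        try:
            urllib.request.urlopen(request, timeout=10)
        except (urllib.error.HTTPError, urllib.error.URLError) as error:
            print("dead or unreachable:", link, "-", error)

find_dead_links("http://www.example.com/")
```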

I had the chance to meet Matt really briefly at the last New York SES. I woke up on the last day of presentations and checked my email to find a message from my friend Peter Da Vanzo, with just a smiley face and a link to a post on Matt’s blog which was calling me out. I caught up with Matt after he presented during the “Meet the Crawlers” session and said hello. He surprised me a little by telling me that he liked my blog. I do have Matt’s blog in my RSS reader, along with another 540 or so blogs. Matt seems ideally suited for the job he has at Google – he’s friendly and personable, and does seem to care about what he does.

Have you found any good examples of search engine bias? Some have recently said that Yahoo is lowering the rankings of sites with Google AdSense on them. I do not believe this is the case, but what happens if search engines start playing these games?

I mentioned a paper (pdf) by Alejandro M. Diaz in a post on that topic, and I think I agree with a premise that he follows, which is that the question is not really whether bias exists, but rather which sort of bias we prefer. How do you determine quality and relevance and popularity? Which pages should appear within results when there is duplicate content, and which pages shouldn’t? Should some topics not be indexed at all, and should the laws of the jurisdiction you are within influence what you show to people? Keep in mind, too, that “relevance” is a subjective measure.

Some of the webmaster guidelines that the search engines publish are easy to follow, and some are aspirational, and more difficult to define or understand. I’m willing not to link to bad neighborhoods, but I’d reserve the right to link to one anyway – especially, for instance, when someone posts a link to their site over at Cre8asite Forums, where I’m an administrator, and they want to know what they can do to improve their site. One of the main reasons we have the crazy redirect at the forums that we do is so that people can make those posts, and link to those pages, and the forum doesn’t get harmed by Google for trying to help someone.

My own experiences with Yahoo and AdSense don’t seem to show a bias based upon the use of AdSense, and I’d be happy to take a look at some sites where people might feel that this is a problem, to see if there is something else that might be impacting their rankings in Yahoo.

Hey, can I get on your blogroll sometime? I know it is sometimes hard to understand what the hell I am talking about, but gheesh, ain’t anyone going to give me a little link love one of these days? I am trying to be more serious here and here, but boy is it a freakin’ struggle. 😉

I think that can be arranged. 🙂

I’m trying to keep my blogroll from getting too long, but I like having a lot of blogs listed on it. One of the blogs that I remember visiting a lot when I first started blogging a few years ago was Camworld – as much for the list of blogs on the site, as the content of the posts there.

Name five blogs you visit; I am looking for lesser-known ones that pack a punch like yours does. I am noticing a trend that SEOs are all hanging out on just a few blogs and forums now; they are falling into a predictable pattern.

I do a lot of my visiting via RSS feed rather than directly, but there’s a chance of missing something doing that all the time. Lesser-known blogs, OK. Here are some that you might not have seen before:

  • Data Mining
  • Interdigital Strategies
  • Geeking with Greg
  • Screenwerk
  • Mauro Cherubini’s Moleskine

What is it you currently do for business, SEO? Got any cool personal projects or plans in the coming years? Are you ready to come clean and show us your spammiest website or blog?

I’m doing some SEO consulting, and working on some local sites. I have been trying to launch a blog that focuses upon the community I live within. I see the power of forums and blogs to bring together people from around the globe, but they also create the opportunity to join together neighbors, and folks from a few blocks away who may share common interests but have never run into each other before. I’m thinking about enlisting the aid of some local college journalism students to help me with posts to that blog, once the school year starts back up.

I’ve had a fair amount of success without crossing over the line of what some search engines might call spam. Though a recent call for papers for the Adversarial Information Retrieval on the Web workshop (SIGIR 2006) in Seattle, to be held in August, made me wonder a little. Is reading academic papers and patent applications about search, and writing about them on a blog, something that one could consider “reverse engineering of ranking algorithms”? Maybe it is. Is trying out some of the ideas presented in those documents spamming the search engines, or is it just a good proactive business practice? Maybe it’s just being thorough.

Yes sir, thanks for stopping in, Bill, and keep up your scholarly, non-Web 2.0 insight – very refreshing indeed!

Thank you, Aaron.