Search Engine Optimisation Tools, Blogs
News & Comment  - SeoTools.co.nz
Google Webmasters Blog
Discover your links
Thursday, 22 February 2007
You asked, and we listened: We've extended our support for querying links to your site well beyond the link: operator you might have used in the past. Now you can use webmaster tools to view a much larger sample of links to pages on your site that we found on the web. Unlike the output of the link: operator, this data is much more comprehensive and can be classified, filtered, and downloaded. All you need to do is verify site ownership to see this information.



To make this data even more useful, we have divided the world of links into two types: external and internal. Let's understand what kind of links fall into which bucket.



What are external links?

External links to your site are the links that reside on pages that do not belong to your domain. For example, if you are viewing links for http://www.google.com/, all the links that do not originate from pages on any subdomain of google.com would appear as external links to your site.



What are internal links?



Internal links to your site are the links that reside on pages that belong to your domain. For example, if you are viewing links for http://www.google.com/, all the links that originate from pages on any subdomain of google.com, such as http://www.google.com/ or mobile.google.com, would appear as internal links to your site.
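
As a rough illustration of the two buckets described above, here is a minimal Python sketch that classifies a linking page as internal or external. The function names are hypothetical, and the "last two labels" rule for finding the domain is a simplifying assumption (it would be wrong for suffixes such as co.nz); it is not how webmaster tools itself does the classification.

```python
from urllib.parse import urlsplit


def registered_domain(host: str) -> str:
    """Naive guess at the domain: the last two labels of the host name.

    Assumption for illustration only; multi-part suffixes such as
    "co.nz" would need a public-suffix list to handle correctly.
    """
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def classify_link(site_url: str, linking_page_url: str) -> str:
    """Return "internal" if the linking page is on the site's domain
    or any of its subdomains, otherwise "external"."""
    site_host = urlsplit(site_url).hostname or ""
    source_host = urlsplit(linking_page_url).hostname or ""
    domain = registered_domain(site_host)
    if source_host == domain or source_host.endswith("." + domain):
        return "internal"
    return "external"


print(classify_link("http://www.google.com/", "http://mobile.google.com/page"))
print(classify_link("http://www.google.com/", "http://example.org/"))
```

Matching on "." + domain rather than on the bare domain keeps a look-alike host such as notgoogle.com from being counted as a subdomain of google.com.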



Viewing links to a page on your site



You can view the links to your site by selecting a verified site in your webmaster tools account and clicking on the new Links tab at the top. Once there, you will see the two options on the left: external links and internal links, with the external links view selected. You will also see a table that lists pages on your site, as shown below. The first column of the table lists pages of your site with links to them, and the second column shows the number of the external links to that page that we have available to show you. (Note that this may not be 100% of the external links to this page.)





This table also provides the total number of external links to your site that we have available to show you.

When in this summary view, click the linked number and go to the detailed list of links to that page.

When in the detailed view, you'll see the list of all the pages that link to a specific page on your site, and the time we last crawled each link. Since you are on the External Links tab on the left, this list shows the external pages that point to the page.





Finding links to a specific page on your site

To find links to a specific page on your site, you first need to find that specific page in the summary view. You can do this by navigating through the table, or if you want to find that page quickly, you can use the handy Find a page link at the top of the table. Just fill in the URL and click See details. For example, if the page you are looking for has the URL http://www.google.com/?main, you can enter "?main" in the Find a page form. This will take you directly to the detailed view of the links to http://www.google.com/?main.





Viewing internal links



To view internal links to pages on your site, click on the Internal Links tab in the left sidebar. This takes you to a summary table that, just like the external links view, displays information about pages on your site with internal links to them.



However, this view also provides you with a way to filter the data further: to see links from any of the subdomains on the domain, or links from just the specific subdomain you are currently viewing. For example, if you are currently viewing the internal links to http://www.google.com/, you can either see links from all the subdomains, such as links from http://mobile.google.com/ and http://www.google.com/, or you can see links only from other pages on http://www.google.com/.




Downloading links data

There are three different ways to download links data about your site. First, you can download the current view of the table: navigate to any summary or details table and download the data shown there. Second, and probably the most useful, is the list of all external links to your site. This allows you to download a list of all the links that point to your site, along with the information about the page they point to and the last time we crawled that link. Third, we provide a similar download for all internal links to your site.





We do limit the amount of data you can download for each type of link (for instance, you can currently download up to one million external links). Google knows about more links than the total we show, but the overall fraction of links we show is much, much larger than the link: command currently offers. Why not visit us at Webmaster Central and explore the links for your site?
 
A quick word about Googlebombs
Thursday, 22 February 2007
Co-written with Ryan Moulton and Kendra Carattini



We wanted to give a quick update about "Googlebombs." By improving our analysis of the link structure of the web, Google has begun minimizing the impact of many Googlebombs. Now we will typically return commentary, discussions, and articles about the Googlebombs instead. The actual scale of this change is pretty small (there are under a hundred well-known Googlebombs), but if you'd like to get more details about this topic, read on.



First off, let's back up and give some background. Unless you read all about search engines all day, you might wonder "What is a Googlebomb?" Technically, a "Googlebomb" (sometimes called a "linkbomb" since they're not specific to Google) refers to a prank where people attempt to cause someone else's site to rank for an obscure or meaningless query. Googlebombs very rarely happen for common queries, because the lack of any relevant results for that phrase is part of why a Googlebomb can work. One of the earliest Googlebombs was for the phrase "talentless hack," for example.



People have asked about how we feel about Googlebombs, and we have talked about them in the past. Because these pranks are normally for phrases that are well off the beaten path, they haven't been a very high priority for us. But over time, we've seen more people assume that they are Google's opinion, or that Google has hand-coded the results for these Googlebombed queries. That's not true, and it seemed like it was worth trying to correct that misperception. So a few of us who work here got together and came up with an algorithm that minimizes the impact of many Googlebombs.



The next natural question to ask is "Why doesn't Google just edit these search results by hand?" To answer that, you need to know a little bit about how Google works. When we're faced with a bad search result or a relevance problem, our first instinct is to look for an automatic way to solve the problem instead of trying to fix a particular search by hand. Algorithms are great because they scale well: computers can process lots of data very fast, and robust algorithms often work well in many different languages. That's what we did in this case, and the extra effort to find a good algorithm helps detect Googlebombs in many different languages. We wouldn't claim that this change handles every prank that someone has attempted. But if you are aware of other potential Googlebombs, we are happy to hear feedback in our Google Web Search Help Group.



Again, this new algorithm is very limited in scope and impact, but we hope that the affected queries are more relevant for searchers.
 
About badware warnings
Thursday, 22 February 2007
Some of you have asked about the warnings we show searchers when they click on search results leading to sites that distribute malicious software. As a webmaster, you may be concerned about the possibility of your site being flagged. We want to assure you that we take your concerns very seriously, and that we are very careful to avoid flagging sites incorrectly. It's our goal to avoid sending people to sites that would compromise their computers. These exploits often result in real people losing real money. Compromised bank accounts and stolen credit card numbers are just the tip of this identity theft iceberg.



If your site has been flagged for badware, we let you know this in webmaster tools. Often, we find that webmasters aren't aware that their sites have been compromised, and this warning in search results is a surprise. Fixing a compromised site can be quite hard. Simply cleaning up the HTML files is seldom sufficient. If a rootkit has been installed, for instance, nothing short of wiping the machine and starting over may work. Even then, if the underlying security hole isn't also fixed, the site may be compromised again within minutes.



We are looking at ways to provide additional information to webmasters whose sites have been flagged, while balancing our need to keep malicious site owners from hiding from Google's badware protection. We aim to be responsive to any misidentified sites too. If your site has been flagged, you'll see information on the appeals process in webmaster tools. If you can't find anything malicious on your site and believe it was misidentified, simply send an email to [e-mail address obscured in the source] for evaluation. If you'd like to discuss this with us or have ideas for how we can better communicate with you about it, please post in our webmaster discussion forum.
 
Better understanding of your site
Monday, 25 December 2006
SES Chicago was wonderful. Meeting so many of you made the trip absolutely perfect. It was as special as if (Chicago local) Oprah had joined us!

While hanging out at the Google booth, I was often asked about how to take advantage of our webmaster tools. For example, here's one tip on Common Words.

Common Words: Our prioritized listing of your site's content
The common words feature lists in order of priority (from highest to lowest) the prevalent words we've found in your site, and in links to your site. (This information isn't available for subdirectories or subdomains.) Here are the steps to leveraging common words:

1. Determine your website's key concepts. If it offers getaways to a cattle ranch in Wyoming, the key concepts may be "cattle ranch," "horseback riding," and "Wyoming."

2. Verify that Google detected the same phrases you believe are of high importance. Log in to webmaster tools, select your verified site, and choose Page analysis from the Statistics tab. Here, under "Common words in your site's content," we list the phrases detected from your site's content in order of prevalence. Do the common words lack any concepts you believe are important? Are they listing phrases that have little direct relevance to your site?

2a. If you're missing important phrases, you should first review your content. Do you have solid, textual information that explains and relates to the key concepts of your site? If, in the cattle-ranch example, "horseback riding" were absent from common words, you may then want to review the "activities" page of the site. Does it include mostly images, or only list a schedule of riding lessons, rather than conceptually relevant information?

It may sound obvious, but if you want to rank for a certain set of keywords, but we don't even see those keyword phrases on your website, then ranking for those phrases will be difficult.

2b. When you see general, non-illustrative common words that don't relate helpfully to your site's content (e.g. a top listing of "driving directions" or "contact us"), then it may be beneficial to increase the ratio of relevant content on your site. (Although don't be too worried if you see a few of these common words, as long as you also see words that are relevant to your main topics.) In the cattle ranch example, you would give visitors "driving directions" and "contact us" information. However, if these general, non-illustrative terms surface as the highest-rated common words, or the entire list of common words is only these types of terms, then Google (and likely other search engines) could not find enough "meaty" content.

2c. If you find that many of the common words still don't relate to your site, check out our blog post on unexpected common words.

3. Here are a few of our favorite posts on improving your site's content:
Target visitors or search engines?

Improving your site's indexing and ranking

NEW! SES Chicago - Using Images

4. Should you decide to update your content, please keep in mind that we will need to recrawl your site in order to recognize changes, and that this may take time. Of course, you can notify us of modifications by submitting a Sitemap.
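
Before logging in, you can run a rough local version of the checks in steps 1 and 2 on your own page text. The sketch below is only an approximation under stated assumptions: the stop-word list, the function names, and the sample text are all hypothetical, and simple word counting is not a description of how the common words feature actually ranks terms.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for the example.
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in", "for",
              "on", "our", "us", "is", "at"}


def common_words(text, top_n=5):
    """Return the top_n most frequent non-stop words with their counts."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)


def missing_concepts(text, key_concepts):
    """Return the key concepts that never appear in the text."""
    lowered = text.lower()
    return [c for c in key_concepts if c.lower() not in lowered]


page_text = (
    "Contact us for driving directions to our cattle ranch in Wyoming. "
    "The cattle ranch is open all year."
)
print(common_words(page_text, top_n=3))
print(missing_concepts(page_text, ["cattle ranch", "horseback riding", "Wyoming"]))
```

For this sample text the second call reports "horseback riding" as missing, which is the situation step 2a describes.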

Happy holidays from all of us on the Webmaster Central team!

SES Chicago: Googlers Trevor Foucher, Adam Lasnik and Jonathan Simon
 
Deftly dealing with duplicate content
Monday, 25 December 2006
At the recent Search Engine Strategies conference in freezing Chicago, many of us Googlers were asked questions about duplicate content. We recognize that there are many nuances and a bit of confusion on the topic, so we'd like to help set the record straight.



What is duplicate content?

Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Most of the time when we see this, it's unintentional or at least not malicious in origin: forums that generate both regular and stripped-down mobile-targeted pages, store items shown (and -- worse yet -- linked) via multiple distinct URLs, and so on. In some cases, content is duplicated across domains in an attempt to manipulate search engine rankings or garner more traffic via popular or long-tail queries.
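
To get a feel for what "appreciably similar" can mean in practice, here is a small Python sketch that compares two texts using word shingles and Jaccard similarity. This is a common textbook technique chosen for illustration; it is an assumption of this example, not a description of the algorithms Google uses, and the function names are hypothetical.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word runs) in text."""
    words = text.lower().split()
    if len(words) < k:
        # Too short for a full shingle: treat the whole text as one.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_similarity(a, b, k=3):
    """Shared shingles divided by all distinct shingles, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty texts are trivially identical
    return len(sa & sb) / len(sa | sb)


regular = "the quick brown fox jumps"
variant = "the quick brown fox sleeps"
print(jaccard_similarity(regular, variant))
```

A score near 1.0 suggests two pages carry substantially the same text (for example, a regular and a printer version), while a score near 0.0 suggests distinct content.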



What isn't duplicate content?

Though we do offer a handy translation utility, our algorithms won't view the same article written in English and Spanish as duplicate content. Similarly, you shouldn't worry about occasional snippets (quotes and otherwise) being flagged as duplicate content.



Why does Google care about duplicate content?

Our users typically want to see a diverse cross-section of unique content when they do searches. In contrast, they're understandably annoyed when they see substantially the same content within a set of search results. Also, webmasters become sad when we show a complex URL (example.com/contentredir?value=shorty-george&lang=en) instead of the pretty URL they prefer (example.com/en/shorty-george.htm).



What does Google do about it?

During our crawling and when serving search results, we try hard to index and show pages with distinct information. This filtering means, for instance, that if your site has articles in "regular" and "printer" versions and neither set is blocked in robots.txt or via a noindex meta tag, we'll choose one version to list. In the rare cases in which we perceive that duplicate content may be shown with intent to manipulate our rankings and deceive our users, we'll also make appropriate adjustments in the indexing and ranking of the sites involved. However, we prefer to focus on filtering rather than ranking adjustments ... so in the vast majority of cases, the worst thing that'll befall webmasters is to see the "less desired" version of a page shown in our index.



How can Webmasters proactively address duplicate content issues?

  • Block appropriately: Rather than letting our algorithms determine the "best" version of a document, you may wish to help guide us to your preferred version. For instance, if you don't want us to index the printer versions of your site's articles, disallow those directories or make use of regular expressions in your robots.txt file.
  • Use 301s: If you have restructured your site, use 301 redirects ("RedirectPermanent") in your .htaccess file to smartly redirect users, the Googlebot, and other spiders.

  • Be consistent: Endeavor to keep your internal linking consistent; don't link to /page/ and /page and /page/index.htm.
  • Use TLDs: To help us serve the most appropriate version of a document, use top level domains whenever possible to handle country-specific content. We're more likely to know that .de indicates Germany-focused content, for instance, than /de or de.example.com.
  • Syndicate carefully: If you syndicate your content on other sites, make sure they include a link back to the original article on each syndicated article. Even with that, note that we'll always show the (unblocked) version we think is most appropriate for users in each given search, which may or may not be the version you'd prefer.
  • Use the preferred domain feature of webmaster tools: If other sites link to yours using both the www and non-www version of your URLs, you can let us know which way you prefer your site to be indexed.

  • Minimize boilerplate repetition: For instance, instead of including lengthy copyright text on the bottom of every page, include a very brief summary and then link to a page with more details.
  • Avoid publishing stubs: Users don't like seeing "empty" pages, so avoid placeholders where possible. This means not publishing (or at least blocking) pages with zero reviews, no real estate listings, etc., so users (and bots) aren't subjected to a zillion instances of "Below you'll find a superb list of all the great rental opportunities in [insert cityname]..." with no actual listings.
  • Understand your CMS: Make sure you're familiar with how content is displayed on your Web site, particularly if it includes a blog, a forum, or related system that often shows the same content in multiple formats.
  • Don't worry be happy: Don't fret too much about sites that scrape (misappropriate and republish) your content. Though annoying, it's highly unlikely that such sites can negatively impact your site's presence in Google. If you do spot a case that's particularly frustrating, you are welcome to file a DMCA request to claim ownership of the content and have us deal with the rogue site.
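
For the "Block appropriately" tip above, one way to sanity-check a robots.txt rule before publishing it is Python's standard urllib.robotparser module. In this sketch the /print/ directory, the example.com URLs, and the is_crawlable helper are hypothetical; note also that this parser does simple path-prefix matching, so it will not evaluate wildcard patterns the way a particular crawler might.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps printer versions out of the crawl.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""


def is_crawlable(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_crawlable(ROBOTS_TXT, "http://example.com/print/shorty-george.htm"))
print(is_crawlable(ROBOTS_TXT, "http://example.com/en/shorty-george.htm"))
```

Running a check like this against a handful of representative URLs makes it easier to confirm that only the printer versions are blocked and the preferred versions remain crawlable.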


In short, a general awareness of duplicate content issues and a few minutes of thoughtful preventative maintenance should help you to help us provide users with unique and relevant content.
 