How much of the Internet has Google catalogued? Where would I find this out? Perhaps, I would see a summary of the catalogues of each major search engine?
How fast is Google's cataloguing? I am just curious about all of these things...
How much Webspace do they have? How much do the cached pages take up?
Archived topic from Iceteks, old topic ID:699, old post ID:5623
- Red Squirrel
- Posts: 29209
- Joined: Wed Dec 18, 2002 12:14 am
- Location: Northern Ontario
- Contact:
Google's Catalogue?
As we all know, Google has 3,083,324,652 pages in its index (it says so at the bottom of the page). I took a typical cached page (I searched for "page" and got M$), saved it, and it was 32KB counting the Google header, which may or may not be dynamically generated, but let's say it's not.
So if each page is cached and takes 32KB of space, the cache alone uses about 91.9 TB of disk space. (1TB = 1024GB)
That's a lot of space!
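The back-of-envelope math above can be redone as a quick script (all figures are the ones assumed in the post: the page count from Google's front page and a 32KB sample cached page, using binary units):

```python
# Back-of-envelope estimate of Google's cache size (all inputs assumed, per the post).
PAGES = 3_083_324_652   # page count shown on Google's front page at the time
KB_PER_PAGE = 32        # size of one sampled cached page, Google header included

total_kb = PAGES * KB_PER_PAGE
total_tb = total_kb / 1024**3   # KB -> MB -> GB -> TB, with 1 TB = 1024 GB
print(f"{total_tb:.1f} TB")     # -> 91.9 TB
```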
As for speed, cataloguing takes quite a while, but Google is said to be the fastest at it. It took about 2 months for the new IceTeks to show up, but once it knows of a site, updated content shows up within days to weeks.
For example, sites that are affiliated with us already show up when I search for "IceTeks", and they finally cleaned up the old reliexec.dynu.net URLs (dropping us to 95 results for the keyword IceTeks!).
It would be interesting to know how much of the internet is on it and how much is not, though. Google probably has almost every site out there; even "private" warez sites show up. Google spiders the web, following URLs to find new sites. This is also the basis of PageRank: when other sites link to a page, that page is probably popular, so it ranks higher.
There are lots of other algorithms it uses too, most of which they don't want anyone to know. There are a lot of formulas and such as well. It's quite interesting how it works.
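The link-popularity idea described above is the core of PageRank. A minimal sketch of the standard power-iteration version (a textbook simplification, not Google's actual implementation) might look like:

```python
# Simplified PageRank by power iteration (illustrative only, not Google's real code).
def pagerank(links, damping=0.85, iters=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}          # start with equal rank everywhere
    for _ in range(iters):
        new = {p: (1 - damping) / n for p in pages}
        for p, outs in links.items():
            if not outs:                        # dangling page: spread rank evenly
                for q in pages:
                    new[q] += damping * rank[p] / n
            else:                               # pass rank along each outgoing link
                for q in outs:
                    new[q] += damping * rank[p] / len(outs)
        rank = new
    return rank

# A page that many others link to ends up with the highest rank.
web = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(web)
print(max(ranks, key=ranks.get))  # -> c
```

Here "c" wins because two pages link to it while the others get one inbound link or none, which is exactly the "others linking to you means you're popular" intuition from the post.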
Archived topic from Iceteks, old topic ID:699, old post ID:5669
Honk if you love Jesus, text if you want to meet Him!
- Posts: 5140
- Joined: Fri Jan 10, 2003 1:14 am
Google's Catalogue?
Thanks for the info, Red.
Archived topic from Iceteks, old topic ID:699, old post ID:5690