As we all know, Google has 3,083,324,652 pages in its index (it says so at the bottom of the results page). I took a typical cached page (I searched for "page" and got M$'s site), saved it, and it came out to 32KB including the Google header, which may or may not be dynamically generated, but let's say it's not.
So if each page is cached and takes 32KB of space, that works out to roughly 91.9TB of disk space. (1TB = 1024GB)
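Just to spell out the arithmetic, here's a quick back-of-the-envelope Python sketch; the 32KB-per-page figure is only the assumption from the one sample page above, not a confirmed number:

    # Back-of-the-envelope cache size estimate, assuming 32KB per cached page
    pages = 3083324652                 # page count Google reports at the bottom
    kb_per_page = 32                   # size of the one sample cached page
    total_kb = pages * kb_per_page
    total_tb = total_kb / 1024.0**3    # KB -> MB -> GB -> TB
    print(round(total_tb, 1))          # prints 91.9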
That's a lot of space!
As for time, indexing takes quite a while, but Google is said to be the fastest at cataloging pages. It took about 2 months for the new IceTeks to show up, but once it knows of a site, updated content shows up within days to weeks.
For example, sites that are affiliated with us already show up when I search for "IceTeks", and they finally cleaned up the old reliexec.dynu.net URLs (dropping us to 95 results for the keyword IceTeks!).
It would be interesting to know how much of the internet is indexed and how much is not, though. Google probably has almost every site out there; even "private" warez sites show up. Google spiders the web, following URLs to find new sites. This is also the basic idea behind PageRank: when other sites link to a site, that site is presumably popular, so it ranks higher.
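To illustrate that link-counting idea, here's a minimal Python sketch of the simplified PageRank iteration; the tiny link graph and the 0.85 damping factor are made-up examples for illustration, not anything from Google:

    # Minimal sketch of the simplified PageRank idea: pages that are linked
    # to by many (and by highly ranked) pages end up with a higher score.
    links = {
        "A": ["B", "C"],   # page A links to pages B and C
        "B": ["C"],
        "C": ["A"],
        "D": ["C"],
    }
    damping = 0.85
    rank = {page: 1.0 / len(links) for page in links}

    for _ in range(50):  # iterate until the scores settle
        new_rank = {}
        for page in links:
            # rank flowing in from every page that links here, split across its outlinks
            incoming = sum(rank[p] / len(out) for p, out in links.items() if page in out)
            new_rank[page] = (1 - damping) / len(links) + damping * incoming
        rank = new_rank

    print(rank)  # "C" comes out on top, since the most pages link to it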
There are lots of other algorithms it uses, most of which they don't want anyone to know about. There are a lot of formulas involved as well. It's quite interesting how it works.