inteliture.com
Google

Thursday, July 26, 2007

Wednesday, July 18, 2007

Hacking With Javascript

Intro Javascript is used as a client side scripting language, meaning that your browser is what interprets it. It is used on webpages and is secure (for the most part) since it cannot touch any files on your hard drive (besides cookies). It also cannot read/write any files on the server. Knowing javascript can help you in both creating dynamic webpages, meaning webpages that change, and hacking. First I will start with the basic javascript syntax, then I will list a few sites where you can learn more, and then I will list a few ways you can use javascript to hack. There are a few benifits of knowing javascript. For starters, it is really the only (fully supported) language that you can use on a website making it a very popular language on the net. It is very easy to learn and shares common syntax with many other languages. And it is completely open source, if you find something you like done in javascript you can simply view the source of the page and figure out how it's done. The reason I first got into javascript was because back before I got into hacking I wanted to make my own webpage. I learned HTML very quickly and saw Dynamic HTML (DHTML) mentioned in a few tutorials. I then ventured into the land of javascript making simple scripts and usful features to my site. It was only after I was pretty good with javascript and got into hacking that I slowly saw it's potential to be used milisously. Many javascript techniques are pretty simple and involve tricking the user into doing something. Almost pure social engineering with a bit of help from javascript. After using simple javascript tricks to fake login pages for webbased email I thought about other ways javascript could be used to aid my hacking, I studied it on and off for around a year. Some of these techniques are used by millions of people, some I came up with an are purely theorectical. I hope you will realize how much javascript can aid a hacker. 1. Basic Syntax 2. Places To Learn More Advanced Javascript 3. Banner Busting & Killing Frames 4. Getting Past Scripts That Filter Javascript 5. Stealing Cookies 6. Stealing Forms 7. Gaining Info On Users 8. Stories Of Javascript Hacks 9. Conclusion
1. Basic Syntax The basics of javascript are fairly easy if you have programmed anything before, although javascript is not java, if you know java you should have no problems learning it. Same for any other programming language, as most share the same basics as javascript uses. This tutorial might not be for the complete newbie. I would like to be able to do a tutorial like that, but I don't have the time or patience to write one. To begin if you don't know html you must learn it first! Javascript starts with the tag
Anything between these two tags is interpreted as javascript by the browser. Remember this! Cause a few hacks use the fact that if you use
.. either way is fine. I would also like to mention that many scripts have right before the tag, this is because they would like to make it compatible with other browsers that do not support javascript. Again, either way is fine, but I will be using the because that is how I learned to script and I got used to putting it in. Javascript uses the same basic elements as other programming languages.. Such as variables, flow control, and functions. The only difference is that javascript is a lot more simplified, so anyone with some programming experience can learn javascript very quickly. The hardest part of scripting javascript is to get it to work in all browsers. I will now go over the basics of variables: to define a variable as a number you do: var name = 1; to define a variable as a string you do: var name = 'value'; A variable is basically the same in all programming languages. I might also point out that javascript does not support pointers. No structs to make your own variables either. Only variable types are defined by 'var'. This can be a hard thing to understand at first, but javascript is much like C++ in how it handles variables and strings. A string is a group of characters, like: 'word', which is a string. When you see something like document.write(something); it will try to print whatever is in the variable something. If you do document.write('something'); or document.write("something"); it will print the string 'something'. Now that you got the variables down lets see how to use arithmetic operators. This will make 2 variables and add them together to make a new word:
first we define the variable 'name' as b0iler, then I define 'adjective' as owns. Then the document.write() function writes it to the page as 'name'+'adjective' or b0ilerowns. If we wanted a space we could have did document.write(name+' '+adjective); Escaping characters - This is an important concept in programming, and extremely important in secure programming for other languages.. javascript doesn't really need to worry about secure programming practice since there is nothing that can be gained on the server from exploitting javascript. So what is "escaping"? It is putting a \ in front of certain characters, such as ' and ". If we wanted to print out: b0iler's website We couldn't do: document.write('b0iler's website'); because the browser would read b0iler and see the ' then stop the string. We need to add a \ before the ' so that the browser knows to print ' and not interpret it as the ending ' of the string. So here is how we could print it: document.write('b0iler\'s website'); There are two types of comments in javascript. // which only lasts till the end of the line, and /* which goes as many as far as possible until it reaches */ I'll demonstrate:
The only thing that script will do is print "this will show up". Everything else is in comments which are not rendered as javascript by the browser. Flow Control is basically changing what the program does depending on whether something is true or not. Again, if you have had any previous programming experience this is old stuff. You can do this a few different ways different ways. The simplest is the if-then-else statements. Here is an example:
Lets break this down step by step. First I create the variable 'name' and define it as b0iler. Then I check if 'name' is equal to "b0iler" if it is then I write 'b0iler is a really cool guy!', else (if name isn't equal to b0iler) it prints 'b0iler can not define variables worth a hoot!'. You will notice that I put { and } around the actions after the if and else statements. You do this so that javascript knows how much to do when it is true. When I say true think of it this way: if (name == 'b0iler') as if the variable name is equal to 'b0iler' if the statement name == 'b0iler' is false (name does not equal 'b0iler') then whatever is in the {} (curely brackets) is skipped. We now run into relational and equality operators. The relational operators are as follows: > - Greater than, if the left is greater than the right the statement is true. < - Less than, if the left is lesser than the right the statement is true. >= - Greater than or equal to. If the left is greater than or equal to the right it is true. <= - Less than or equal to. If the left is lesser than or equal to the right it is true. So lets run through a quick example of this, in this example the variable 'lower' is set to 1 and the variable 'higher' is set to 10. If lower is less than higher then we add 10 to lower, otherwise we messed up assigning the variables (or with the if statement). and now the equality operators, you have already seen one of them in an example: if (name == 'b0iler') the equality operators are == for "equal to" and != for "not equal to". Make sure you always put two equal signs (==) because if you put only one (=) then it will not check for equality. This is a common mistake that is often overlooked. Now we will get into loops, loops continue the statements in between the curly brackets {} until they are no longer true. There are 2 main types of loops I will cover: while and for loops. Here is an example of a while loop: First 'name' is set to b0iler, then 'namenumber' is set to 1. Here is where we hit the loop, it is a while loop. What happens is while namenumber is less than 5 it does the following 3 commands inside the brackets {}: name = name + name; document.write(name); namenumber = namenumber + 1; The first statement doubles the length of 'name' by adding itself on to itself. The second statement prints 'name'. And the third statement increases 'namenumber' by 1. So since 'namenumber' goes up 1 each time through the loop, the loop will go through 4 times. After the 4th time 'namenumber' will be 5, so the statement namenumber < type="text/javascript"> text now upload that page and view it in a browser. View the source of the page and find where the site added it's banner html. If it came after the and before the then you need to see if it came before or after the which is in between those. If it is before, then it is the tag that is the key tag which the site adds it's banner after. If it is under the than you know it puts it after the tag. So now that we know where the site adds it's banner html what do we do to stop it? We try to make a "fake" tag and hopefully the site adds it's banner html to the fake one instead. Then we use javascript to print the real one. We can do a few things, here is the list:
the basic to stop it. -this keytag is the real one.


If all worked out you should have a page with no annoying popups or flashing banners. If not I guess you will have to play around a little and figure it out for yourself. Since every free host uses different keytags and methods of adding it's banner I can't go over them all one by one. I decided to go over a real example of a free site that add popup ads or banners to every page you have. I'll be using angelfire since I hate them and because that's the one I picked out of my lucky hat. Just remember that sites can change the way they add banners anytime they feel like, so this method might not work the same way as I am showing. Doing this also breaks the TOS (Terms Of Service) with your host, so you might get your site taken down without any warning. Always have complete backups of your site on your harddrive, espechially if you have a hacking site or are breaking the TOS. angelfire ------------------------ begin ------------------------

rest of test page

------------------------ end ------------------------ as you can see angelfire puts their ad right after the tag. All they are using to protect us from getting rid of the ad is a so.. we can put something like this to defeat the ad:
So angelfire's server will add the javascript for thier advertisment after the first they see. That will put the ad after
. This means that user's browsers will think that and the angelfires ad is css (cascading style sheet).. which is the

Monday, July 16, 2007

Koool Computer Trick..check this out....

Administrator like account


1-Introduction

This article introduce very simple way to get Administrator like account and do the job and after finish recover your way, after that Get Admin Password later in your home by Cracking, After get the Admin Password Create a hidden user account and do all your jobs free, and Explain how to make a USB Storage Device Bootable corresponding to any system boot, and how to bypass Mother Board password by Default Passwords, and how to extract it if you are in the system

2-To Hackers / Security Systems Engineers

First All must know that both Hackers / Security Systems Engineers Are 2 faces to the same coin Any way, I try this on Windows XP SP2 I want all to try it on Windows Server 2003, Windows Vista Any Windows NT and POST a Message to make all know what versions exactly this idea can apply for

3-Close Look to hole

Microsoft stores all Security Information in many files but the main file is the SAM file (
Security Accounts Manager)! this file contain critical information about users account you can explore the folder
$windir$\system32\config
You will find all things and may discover some thing new, but what amazing here is that the file is available, so we can apply our idea
shot1 You will Not be able To copy them Under XP

4-Dose Microsoft Know and Why!?

Yes Microsoft Know all things, and done on purpose why? I always for many years ask my self why Microsoft doesn’t do real security on their systems from the CD setup to all security aspects In the system, I found(my opinion may wrong)that they need to achieve 2 strategic things

1-They need their software spread and all depend on it and in one day when they feel that they are the One The security will done and all money will go to One Pocket

2-They Forced/Like to Make Some Organizations Hack other systems

Proof:
They can make this File SAM Unavailable by storing the information in FAT, FAT32, NTFS Areas (Sectors reserved by The Operating SYSTEM to Store the Addresses of the files on the HardDisk File Allocation Table) So that it is hard to extract. But they don't!!!!!

5-Understand the Idea

The Idea is simple I will explain it manually and it can then be programmed it is so easy here is the idea

The SAM file is available and the SAM file contain a Security Information, so I created a Free Windows XP SP2 Logon account (Administrator Account without password) that means when windows Lunch it Will enter directly to the system without asking about any password And windows will store this Account in The SAM file on My PC So the SAM file on My PC contain an Account will Make you enter Directly to the Windows, so I will take My SAM File and Replace (by renaming, we will need the original file to recover our way) It with the other SAM File in The Other System or Machine So When you restart It will make you enter directly to the Windows With Administrator Like Account ,do what you need and then back all things to the previous state. All These Steps will be under other system bootable DOS, Knoppiex, Windows Live CD, Because Windows XP will not make u able to copy the Files

6-Get Admin Like Account (The Simple Way)


1- Download My 2 SAM files I Include them in Downloads
2- Go to the target Machine , and try to Access it and Boot from any device CD-ROM, Floppy.
3- After Get Access to the Boot Command prompt c:> or Boot Live OS CD, Go to the windows folder $windir$\system32\config And Copy the SAM File and System File (we will need it later) To other folder, Then go to $windir$\repair copy SAM file
And then Rename the 2 SAM Files to SAM1 in their original places
4- Copy My SAM/config File and Paste it in the windows folder $windir$\system32\config Copy My SAM/Repair File and Paste it in the windows folder $windir$\repair (may this step not required)
5- Reboot and Make windows enter Normally
6- Yeah, No You are in The System
7- Copy the files in step 3 to Floppy Disk or Flash Stick Or Send it to your mail via Internet
8- After finish repeat step 2 and delete My SAM files and Rename Both SAM1 to SAM
9- Reboot , Congratulation you recover your way

7-Crack the SAM-Know the real Admin Password and Apply Hint 8

There is many ways I will introduce 2 ways and explain 1 After you get the SAM File and System File there are Programs That extract the Accounts and their passwords, depending on the idea of cracking the HASH (the HASH is one way encryption method) so that The program will generate random passwords and convert them to HASH and then compare it with the HASHES in the SAM File , so it may take a long time but for fast you will pay more money for ready made HASHES with their user names and passwords the 2 program are

1-L0phtcrack v4.0 (LC4 alternate name) the most famous on the NET
2-SAMInside http://www.insidepro.com/I include on the Downloads

I will explain fast SAMInside

shot1
This is the main window press Ctrl+O or by mouse click Import SAM and SYSTEM

shot1
Window will open to import the 2 files and the program will start to crack the Accounts and get them, and then display users names and their passwords

Any other tool will do the job try all and select your best I Explain here SAMInside because he give me results with 6 character only password and get it FAST

8-Creat a Hidden User Accountn

Windows NT / Windows 2000 and Windows XP has a security setting to hide accounts from the Logon Screen/Control panel users accounts
shot1 Press
Ctrl+Alt+Delet
Give you another Access Dialog


Steps:

1-After getting Admin Password enter to the system
2-create an Account with password
3-click start - > Run - > type Regedit press Enter
4-Go to
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\WindowsNT\ CurrentVersion\Winlogon\SpecialAccounts\UserList

shot1

5- Create a new DWORD Value on the UserList
6-Name it with Name of Account to be Hidden
7-set the Value Data of this DWORD Value to 0 to hide it /1 to appear it
8- close Regedit and Reboot
9- Press Ctrl+Alt+Delete when logon Screen Appear another login dialog appear type You hidden user name and password and press Enter

Note:

1- the account profile will be visible in \Documents and Settings, But it will be hidden from Logon Screen and User Account in the control panel

2-there is other method that
Inject your Account directly to the Admin SAM without know the Admin Pass, but believe me you don't Expect the result, so if you want try it (if the password hard to get)

9-USB Boot for FAT32, NTFS or any File System


HP Always amazing me to do this we need 2 tools

1- HP USB Disk Storage Format Tool v 2.0.6 I include in Downloads If u want to find more go to http://www.hp.com/
2- NTFSDOS Professional Boot Disk Wizard I include in Downloads If u want to find more go to http://www.winternals.com/
shot1
Just connect your USB Storage
steps:
1- Prepare a Startup Disk or Startup CD , Or any Equivalent
2- In the HP tool select the Device->your USB Storage
3- Select File System FAT or FAT32
4- Check "create a DOS startup disk" checkbox and then select option "using DOS System Files Located at"
5- brows your location
6- Click Start
7- Now you have a Bootable USB Storage Device
8- Now in the NTFSDOS Professional Boot Disk Wizard follow the wizard and you will get a NTFS bootable USB Storage

Why we need NTFS ?
If the Partition of the Windows System is NTFS so with normal Startup you will not be able to access any files because the File System is not Recognized by MS-DOS when we install NTFSDOS Professional on the bootable disk it will allow you To Access any File Under NTFS

Note:
Make sure that the option in Mother board Setup of First Boot "USB-Hard Disk" if you want to boot from a USB


Haking "admin" from "user" mode n more






Haking "admin" from "user" mode n more







really that is possible !


u know why is it a "user" account because it lacks come service layer than that in "administrator" account

Using simple command line tools on a machine running Windows XP we will obtain system level privileges, and run the entire explorer process (Desktop), and all processes that run from it have system privileges. The system run level is higher than administrator, and has full control of the operating system and it’s kernel. On many machines this can be exploited even with the guest account. At the time I’m publishing this, I have been unable to find any other mention of people running an entire desktop as system, although I have seen some articles regarding the SYSTEM command prompt.

Local privilege escalation is useful on any system that a hacker may compromise; the system account allows for several other things that aren’t normally possible (like resetting the administrator password).

The Local System account is used by the Windows OS to control various aspects of the system (kernel, services, etc); the account shows up as SYSTEM in the Task Manager

Local System differs from an Administrator account in that it has full control of the operating system, similar to root on a *nix machine. Most System processes are required by the operating system, and cannot be closed, even by an Administrator account; attempting to close them will result in a error message. The following quote from Wikipedia explains this in a easy to understand way:


You can trick the system into running a program, script, or batch file with system level privileges.

One sample

One trick is to use a vulnerability in Windows long filename support.
Try placing an executable named Program.*, in the root directory of the "Windows" drive. Then reboot. The system may run the Program.*, with system level privileges. So long as one of the applications in the "Program Files" directory is a startup app. The call to "Program Files", will be intercepted by Program.*.

Microsoft eventually caught on to that trick. Now days, more and more, of the startup applications are being coded to use limited privileges.



Quote:
In Windows NT and later systems derived from it (Windows 2000, Windows XP, Windows Server 2003 and Windows Vista), there may or may not be a superuser. By default, there is a superuser named Administrator, although it is not an exact analogue of the Unix root superuser account. Administrator does not have all the privileges of root because some superuser privileges are assigned to the Local System account in Windows NT.


Under normal circumstances, a user cannot run code as System, only the operating system itself has this ability, but by using the command line, we will trick Windows into running our desktop as System, along with all applications that are started from within.
Getting SYSTEM
I will now walk you through the process of obtaining SYSTEM privileges.
To start, lets open up a command prompt (Start > Run > cmd > [ENTER]).
At the prompt, enter the following command, then press [ENTER]:

Code:

at


If it responds with an “access denied” error, then we are out of luck, and you’ll have to try another method of privilege escalation; if it responds with “There are no entries in the list” (or sometimes with multiple entries already in the list) then we are good. Access to the at command varies, on some installations of Windows, even the Guest account can access it, on others it’s limited to Administrator accounts. If you can use the at command, enter the following commands, then press [ENTER]:

Code:

at 15:25 /interactive “cmd.exe”


Lets break down the preceding code. The “at” told the machine to run the at command, everything after that are the operators for the command, the important thing here, is to change the time (24 hour format) to one minute after the time currently set on your computers clock, for example: If your computer’s clock says it’s 4:30pm, convert this to 24 hour format (16:30) then use 16:31 as the time in the command. If you issue the at command again with no operators, then you should see something similar to this:

When the system clock reaches the time you set, then a new command prompt will magically run. The difference is that this one is running with system privileges (because it was started by the task scheduler service, which runs under the Local System account). It should look like this:

You’ll notice that the title bar has changed from cmd.exe to svchost.exe (which is short for Service Host). Now that we have our system command prompt, you may close the old one. Run Task Manager by either pressing CTRL+ALT+DELETE or typing taskmgr at the command prompt. In task manager, go to the processes tab, and kill explorer.exe; your desktop and all open folders should disappear, but the system command prompt should still be there.
At the system command prompt, enter in the following:

Code:

explorer.exe



A desktop will come back up, but what this? It isn’t your desktop. Go to the start menu and look at the user name, it should say “SYSTEM”. Also open up task manager again, and you’ll notice that explorer.exe is now running as SYSTEM. The easiest way to get back into your own desktop, is to log out and then log back in. The following 2 screenshots show my results (click to zoom):

System user name on start menu



explorer.exe running under SYSTEM


What to do now
Now that we have SYSTEM access, everything that we run from our explorer process will have it too, browsers, games, etc. You also have the ability to reset the administrators password, and kill other processes owned by SYSTEM. You can do anything on the machine, the equivalent of root; You are now God of the Windows machine. I’ll leave the rest up to your imagination.





ADMINISTRATOR IN WELCOME SCREEN.



When you install Windows XP an Administrator Account is created (you are asked to supply an administrator password), but the "Welcome Screen" does not give you the option to log on as Administrator unless you boot up in Safe Mode.
First you must ensure that the Administrator Account is enabled:
1 open Control Panel
2 open Administrative Tools
3 open Local Security Policy
4 expand Local Policies
5 click on Security Options
6 ensure that Accounts: Administrator account status is enabled Then follow the instructions from the "Win2000 Logon Screen Tweak" ie.
1 open Control Panel
2 open User Accounts
3 click Change the way users log on or log off
4 untick Use the Welcome Screen
5 click Apply Options
You will now be able to log on to Windows XP as Administrator in Normal Mode.


EASY WAY TO ADD THE ADMINISTRATOR USER TO THE WELCOME SCREEN.!!



Start the Registry Editor Go to:
HKEY_LOCAL_MACHINE \ SOFTWARE \ Microsoft \ Windows NT \ CurrentVersion \ Winlogon \ SpecialAccounts \ UserList \
Right-click an empty space in the right pane and select New > DWORD Value Name the new value Administrator. Double-click this new value, and enter 1 as it's Value data. Close the registry editor and restart.



What is hacking ?> Explained here !!


What are hackers?

Technically, a hacker is someone who is enthusiastic about computer programming and all things relating to the technical workings of a computer. Under such a definition, I would gladly brand myself a hacker. (There is in fact more to it than that - hackerdom is an entire culture in its own right.) However, most people understand a hacker to be what is more accurately known as a 'cracker'. Worryingly, people tend to prefer to use the word 'hacker' over the more technically correct 'cracker'. This means that many are afraid to use the word for its correct meaning. On this website, when I refer to a hacker, I actually mean a cracker. This is because I prefer to use language that I feel most people understand, rather than language that is technically correct. If you want to know what a cracker is, please read ahead to the next section...

What are crackers?

Crackers are people who try to gain unauthorised access to computers. This is normally done through the use of a 'backdoor' program installed on your machine. A lot of crackers also try to gain access to resources through the use of password cracking software, which tries billions of passwords to find the correct one for accessing a computer. Obviously, a good protection from this is to change passwords regularly. Another good move is the use of software that supports intruder lockout, where no further passwords are accepted after a certain number of bad passwords have tried. Even the correct password wouldn't allow access. Such blocks are normally released after a period of time has elapsed (eg 15 minutes). Of course, an even better idea is never to put security-sensitive resources on the Internet in the first place. If you don't want something to be accessed from the Internet, then make it so that it is only accessible from your local network, or even just from one computer. However, backdoor programs are programs that can expose files to the Internet that were never meant to be shared with other people. You can protect yourself from these by using a firewall and a good up-to-date anti-virus program. You would normally get such a backdoor program by opening an e-mail attachment containing the backdoor program. It is normal for such a backdoor program to send out more copies of itself to everyone in your address book, so it is possible for someone you know to unintentionally send you a malicious program. Note that this can normally only be done if you are using Microsoft Outlook or Outlook Express. A few backdoor programs can work with any e-mail program by sitting in memory and watching for a connection to a mail server, rather than actually running from within a specific mail program. If you do use Outlook or Outlook Express, and you do not have the correct security patches installed, it may be possible for a malicious program to be executed from an e-mail when you receive it, without the need for you to click on any attachments. Note that the same bug also affects Internet Explorer. A security patch is available for this, but personally I would advise that you use different mail and web browsing software. There are other ways of cracking as well, some more widespread than others. See How do hackers hack? for more information. Note that on most of this website, I refer to 'hackers' instead of 'crackers'. I mean 'crackers'. I explain this in more detail above.

What damage can a hacker do?

This depends upon what backdoor program(s) are hiding on your PC. Different programs can do different amounts of damage. However, most allow a hacker to smuggle another program onto your PC. This means that if a hacker can't do something using the backdoor program, he can easily put something else onto your computer that can. Hackers can see everything you are doing, and can access any file on your disk. Hackers can write new files, delete files, edit files, and do practically anything to a file that could be done to a file. A hacker could install several programs on to your system without your knowledge. Such programs could also be used to steal personal information such as passwords and credit card information. Some backdoor programs even allow a hacker to listen in on your conversations using your computer's microphone if one is attached! Hackers can do great damage to your computer. They could delete vital files from your hard disk, without which your computer could not work. However, you can re-install these from backups (you do keep backups, don't you?) In theory, the absolute worst damage a hacker could do is turn your computer into a large paperweight. It is possible - the CiH virus demonstrated how. This virus attacked your computer using the then new Flash BIOS technology. This capability was intended to be used to upgrade your computer's BIOS. (The BIOS is a program stored on a chip inside your computer. It controls quite a lot of low-level stuff and is a very vital part of your computer. It is the BIOS that does all the memory checks when you turn on, and also performs the first stage in loading your operating system.) However, the virus used this 'feature' to destroy the BIOS. Without the BIOS, the computer can't work. The only way to recover from this would be to replace your computer's motherboard. At the time of writing this, there are no backdoor programs that can do the same thing, but it is easy enough for a hacker to install a virus that does. Since the CiH virus, many BIOSs have a "flash write protect" option in BIOS setup, and/or a jumper setting on the motherboard that has a similar effect. See your motherboard manual for details.

How does a firewall protect me?

Basically, firewalls protect your computer from unauthorised access attempts. There are two kinds of firewall. Networked computers tend to be connected to the Internet through just one or two computers (hence only one Internet connection is required). These computers behave as firewalls by blocking any unauthorised packets. Any other packets are passed on to the computer they are intended for. This kind of firewall is called a corporate firewall. The kind of firewall you may be more familiar with is a personal firewall- this is a program that runs on your computer, and blocks any unauthorised incoming packets. Personally, I use ZoneAlarm. The great thing about ZoneAlarm is that it is easy to configure. Also, it only allows chosen programs to access the Internet- allowing you to block hackers that use standard protocols such as FTP. In case of emergency, it also has an emergency stop button, which allows you to block allfree by private individuals and charities. Businesses, governments, and educational institutions can download ZoneAlarm on the basis of a 60-day free trial. See ZoneLab's website for more information. access to the Internet immediately. ZoneAlarm can be downloaded and used for Remember that although a firewall stops hackers from getting in, it will not remove any existing 'backdoor' software from your machine. For this, you need a good anti-virus product like Norton or Sophos. Also make sure that you use your anti-virus software regularly, and that you keep it up-to-date.

How do I report hackers?

When an access attempt occurs, if you have alert popups turned on, ZoneAlarm will tell you the IP address of the possible hacker. This looks something like 123.123.123.123 (example only). You can use this information to track down and report hackers to their ISP. Bare in mind that you are unlikely to get any response apart from a simple acknowledgement- they have to deal with hundreds of reports like yours every day. Here is a rough guide of how to report hackers (note: some of the programs referred to are only available in Windows):
  1. Make a note of all the information ZoneAlarm gives you. If possible, use ZoneAlarm's text log option- many ISPs prefer text log format (personally, I supply ZoneAlarm's text log and an English translation).
  2. Select Start, Run... In the Run box, type in "winipcfg" and then click OK. This will tell you what your IP address is (among other things). Write down the IP address.
  3. Use an Internet tool like SamSpade's address digger to look up which ISP uses the IP address given in your firewall's log.
  4. This will return a lot of technical information. Some ISPs add remarks to this information telling you where to send abuse reports to. Make a note of any such e-mail addresses. If there is no such information, look at the official name for the server (near the top), or the names of the domain name servers. To convert these to an e-mail address, remove everything before the first period, including the period itself, then add 'abuse@' in front of it.
  5. Now send an e-mail to the abuse address(es) you have. If the recipient obviously isn't English (eg if the e-mail address ends in .de (Germany) or .fr (France)), write it in their language, if you know it. If not, don't worry, most people speak at least a little English, and the technical language of computers is the same almost anywhere you go!
  6. Include in the message what ZoneAlarm told you. Also include your own IP address (this is what winipcfg told you), the date, the time, your time zone (in relation to GMT), and an indication of how accurate your computer's clock is (eg if you set it by the atomic clock every day, say so!)


What is a port scan?

A port scan is, quite simply, a test to see if a computer exists and responds to access attempts on a certain port (eg TCP port 80, used by the HTTP protocol). Port scans, on their own, are quite harmless and have many legitimate uses. However, they also have a malicious use, which is to test to see if any particular backdoor software is running on a computer for the purposes of then using such backdoor software. In my Internet logs, I include all unauthorised port scans of my computer. I tend to describe these port scans as hack attempts, since it is most likely that this is what they are. To be absolutely pedantic, I shouldn't really describe them as such, since there may be other explanations.


What is an IP address?

An IP address is a number that can uniquely identify any computer on the Internet. With the current Internet protocol (IPv4), an IP address is a 32-bit number. That means that as a binary number, it would be stored as 32 ones and zeroes. There are 4,294,967,296 possible IP addresses. However, we humans tend to split IP addresses into four 8-bit numbers, express these numbers using our decimal number system, and separate them with dots. With 8-bit numbers, each number must be a whole number in the range 0 to 255, inclusive. For example, an IP address of 2,071,690,107 would probably be expressed as 123.123.123.123 (example only). Some people might express an IP address in hexadecimal as well (7B7B7B7Bh in this case). The dotted IP address is by far the most common, however. As the Internet grows, plans are being made to increase the size of IP addresses. (The "next" Internet protocol, IPv6, uses 128-bit IP addresses.) The problem with that, of course, is that quite a few Internet protocols would need to be rewritten, since they are designed to work with 32-bit IP addresses. This includes the Internet Protocol itself (IP). Thankfully, Internet packets include an IP version flag, so it would be possible to have both old and new implementations of the IP communicate with each other. (The newer implementation would use the older protocol when communicating with older implementations. Implementations of the IP would know whether a computer was using the older or newer protocol from the version flag. Unfortunately, older implementations would not be able to access anything outside of the 32-bit IP range.) IP addresses can be statically or dynamically allocated. Statically allocated IP addresses always refer to the same computer. However, dynamically allocated IP addresses can refer to different computers at different times. For example, if you have a dial-up Internet connection, your IP address doesn't become unused when you hang up- it is allocated to someone else. When you reconnect, you are allocated a new IP address. This is dynamic allocation.


How can I hack?

I don't like that first person pronoun... I don't mind explaining how hackers hack, but I won't explain how you can hack. This is not a pro-hacking website. This is a computer security site. My aim is not to encourage or assist hacking in any way. I aim to try to inform people of the risks that they may be exposed to, so that they can better protect themselves from these risks. I also provide this website as a resource for those with an academic interest. If you want a rough idea of some of the cracking methods that other people (not you) use, just read on to the next section.


How do hackers hack?

There are many ways in which a hacker can hack. The most common way is by using a backdoor program. See What damage can a hacker do? for more information on these. However, there are some 'special' cases. Click a link below for more information. NetBIOS - UDP 137, UDP 138, TCP 139
ICMP Ping - Internet Control Message Protocol
FTP - TCP 21
rpc.statd - TCP 111, TCP 9704
lpr - TCP 515
HTTP - TCP 80



How can NetBIOS be harmful?

NetBIOS hacks are the worst kind, since they don't require you to have any hidden backdoor program running on your computer. This kind of hack exploits a bug in Windows 9x. NetBIOS is meant to be used on local area networks, so machines on that network can share information. Unfortunately, the bug is that NetBIOS can also be used across the Internet - so a hacker can access your machine remotely. Not all Windows computers are vulnerable to this kind of attack. If you have a firewall that blocks incoming NetBIOS packets, you are safe. Some network configurations will also be immune. To find out whether you are vulnerable, visit GRC's ShieldsUP!, and click the "Test My Shields!" image half way down the page. Note that GRC will attempt to connect to your computer using NetBIOS - this is just to test whether your computer is vulnerable. GRC will not retain any information about your computer, nor will any damage be done. NetBIOS uses TCP port 139, UDP port 137 and UDP port 138.


How can ICMP Ping be harmful?

ICMP is one of the main protocols that makes the Internet work. It standards for Internet Control Message Protocol. 'Ping' is one of the commands that can be sent to a computer using ICMP. Ordinarily, a computer would respond to this ping, telling the sender that the computer does exist. This is all pings are meant to do. Pings may seem harmless enough, but a large number of pings can make a Denial-of-Service attack, which overloads a computer. Also, hackers can use pings to see if a computer exists and does not have a firewall (firewalls can block pings). If a computer responds to a ping, then the hacker could then launch a more serious form of attack against a computer. People who do have firewalls normally don't bother to report pings, because they are innocent in themselves - allowing the hacker to continue hacking for quite a long period of time.


How can FTP be harmful?

FTP is a standard Internet protocol, standing for File Transfer Protocol. You might use it for file downloads from some websites. If you have a web page of your own, you might use FTP to upload it from your home computer to the web server. However, FTP can also be used by some hackers... FTP normally requires some form of authentication for access to private files, or for writing to files. Hackers can get round this by using programs called "backdoor programs". You wouldn't know if you had one of these, unless you used an up-to-date virus scanner regularly. You could get a backdoor program by opening an infected E-mail attachment. FTP backdoor programs, such as Doly Trojan, Fore, and Blade Runner, simply turn your computer into an FTP server, without any authentication. Using a known protocol such as FTP is easier for hackers because the protocol is already defined - not so much new software needs to be written to use it (a normal FTP client could be used - the hacker wouldn't need any specialist software). Also, since FTP has legitimate uses, many firewalls do not block it. Luckily, ZoneAlarm does.


How can rpc.statd be harmful?

This is a problem specific to Linux and Unix. I am not too sure with what precisely rpc.statd should be used for. I do, however, know that it is used by hackers. rpc.statd is typically used as a 'file locking status monitor' (whatever that is) on local area networks. Not all versions of Linux/Unix use it, and some versions have had the security glitch I am about to describe fixed. The problem is the infamous unchecked buffer overflow problem. This is where a fixed amount of memory is set aside for storage of data. If data is received that is larger than this buffer, the program should truncate the data or send back an error, or at least do something other than ignore the problem. Unfortunately, the data overflows the memory that has been allocated to it, and the data is written into parts of memory it shouldn't be in. This can cause crashes of various different kinds. However, a skilled hacker could write bits of program code into memory that may be executed to perform the hacker's evil deeds. That is the problem. rpc.statd uses TCP ports 111 and 9704.


How can lpr be harmful?

This is a similar problem specific to Linux and Unix. lpr is typically used as a printing system. Not all versions of Linux/Unix use it, and some versions have had the security glitch I am about to describe fixed. The problem is the infamous unchecked buffer overflow problem (again). See rpc.statd for more information on this problem. Basically, the result of this problem is that data can be written into parts of memory it shouldn't be written to. A skilled hacker could write program code into memory to perform his evil deeds. lpr uses TCP port 515.


How can HTTP be harmful?

HTTP stands for HyperText Transfer Protocol. It is one of the main protocols used on the Internet- it is what you are using right now to view this web page. HTTP hacks can only be harmful if you are using Microsoft web server software, such as Personal Web Server. There is a bug in this software called an 'unchecked buffer overflow'. If a user makes a request for a file on the web server with a very long name, parts of the request get written into parts of memory that contain active program code. A malicious user could use this to run any program they want on the server. The Code Red worm takes advantage of this. This worm even managed to infect the Microsoft Windows Update site at one point. Despite what I have just said, it is still possible for home users to become infected with such worms, since some people install Personal Web Server without knowing what it is. Some computers even have PWS pre-installed when you buy them. To see if PWS is running on your computer, hover your mouse over each of the icons in the bottom right corner of your screen, until a small description appears. If one of the icons is PWS, right-click it and choose to exit. Then, use Add/Remove Programs in Control Panel to remove the program from your system. Microsoft Personal Web Server is used to serve web pages directly from your computer to the rest of the world. Of course, you would need to be connected to the Internet 24 hours a day in order to do this. Most people will tend to upload Internet material to their ISP, rather than provide access to it directly from their own computer. And just to clear up any remaining confusion: Microsoft Personal Web Server is not required to surf the Internet- all you need to surf the Internet is a web browser and an Internet connection (such as dial-up).

HTTP uses TCP port 80. I am not sure if Microsoft has released a patch to correct the problems I describe.

Google Page Rank Explained






Google Page Rank Explained


Credit : David Austin
Grand Valley State University Imagine a library containing 25 billion documents but with no centralized organization and no librarians. In addition, anyone may add a document at any time without telling anyone. You may feel sure that one of the documents contained in the collection has a piece of information that is vitally important to you, and, being impatient like most of us, you'd like to find it in a matter of seconds. How would you go about doing it? Posed in this way, the problem seems impossible. Yet this description is not too different from the World Wide Web, a huge, highly-disorganized collection of documents in many different formats. Of course, we're all familiar with search engines (perhaps you found this article using one) so we know that there is a solution. This article will describe Google's PageRank algorithm and how it returns pages from the web's collection of 25 billion documents that match search criteria so well that "google" has become a widely used verb. Most search engines, including Google, continually run an army of computer programs that retrieve pages from the web, index the words in each document, and store this information in an efficient format. Each time a user asks for a web search using a search phrase, such as "search engine," the search engine determines all the pages on the web that contains the words in the search phrase. (Perhaps additional information such as the distance between the words "search" and "engine" will be noted as well.) Here is the problem: Google now claims to index 25 billion pages. Roughly 95% of the text in web pages is composed from a mere 10,000 words. This means that, for most searches, there will be a huge number of pages containing the words in the search phrase. What is needed is a means of ranking the importance of the pages that fit the search criteria so that the pages can be sorted with the most important pages at the top of the list. One way to determine the importance of pages is to use a human-generated ranking. For instance, you may have seen pages that consist mainly of a large number of links to other resources in a particular area of interest. Assuming the person maintaining this page is reliable, the pages referenced are likely to be useful. Of course, the list may quickly fall out of date, and the person maintaining the list may miss some important pages, either unintentionally or as a result of an unstated bias. Google's PageRank algorithm assesses the importance of web pages without human evaluation of the content. In fact, Google feels that the value of its service is largely in its ability to provide unbiased results to search queries; Google claims, "the heart of our software is PageRank." As we'll see, the trick is to ask the web itself to rank the importance of pages.

How to tell who's important

If you've ever created a web page, you've probably included links to other pages that contain valuable, reliable information. By doing so, you are affirming the importance of the pages you link to. Google's PageRank algorithm stages a monthly popularity contest among all pages on the web to decide which pages are most important. The fundamental idea put forth by PageRank's creators, Sergey Brin and Lawrence Page, is this: the importance of a page is judged by the number of pages linking to it as well as their importance. We will assign to each web page P a measure of its importance I(P), called the page's PageRank. At various sites, you may find an approximation of a page's PageRank. (For instance, the home page of The American Mathematical Society currently has a PageRank of 8 on a scale of 10. Can you find any pages with a PageRank of 10?) This reported value is only an approximation since Google declines to publish actual PageRanks in an effort to frustrate those would manipulate the rankings. Here's how the PageRank is determined. Suppose that page Pj has lj links. If one of those links is to page Pi, then Pj will pass on 1/lj of its importance to Pi. The importance ranking of Pi is then the sum of all the contributions made by pages linking to it. That is, if we denote the set of pages linking to Pi by Bi, then \[  I(P_i)=\sum_{P_j\in B_i} \frac{I(P_j)}{l_j}  \] This may remind you of the chicken and the egg: to determine the importance of a page, we first need to know the importance of all the pages linking to it. However, we may recast the problem into one that is more mathematically familiar. Let's first create a matrix, called the hyperlink matrix, $ {\bf H}=[H_{ij}] $ in which the entry in the ith row and jth column is \[  H_{ij}=\left\{\begin{array}{ll}1/l_{j} &   \hbox{if } P_j\in B_i \\  0 & \hbox{otherwise}  \end{array}\right.  \] Notice that H has some special properties. First, its entries are all nonnegative. Also, the sum of the entries in a column is one unless the page corresponding to that column has no links. Matrices in which all the entries are nonnegative and the sum of the entries in every column is one are called stochastic; they will play an important role in our story. We will also form a vector $ I=[I(P_i)] $ whose components are PageRanks--that is, the importance rankings--of all the pages. The condition above defining the PageRank may be expressed as \[  I = {\bf H}I  \] In other words, the vector I is an eigenvector of the matrix H with eigenvalue 1. We also call this a stationary vector of H. Let's look at an example. Shown below is a representation of a small collection (eight) of web pages with links represented by arrows. Google Page Rank Explained - The Ethical Hacking The corresponding matrix is
Google Page Rank Explained - The Ethical Hacking
with stationary vector Google Page Rank Explained - The Ethical Hacking
This shows that page 8 wins the popularity contest. Here is the same figure with the web pages shaded in such a way that the pages with higher PageRanks are lighter. Google Page Rank Explained - The Ethical Hacking

Computing I

There are many ways to find the eigenvectors of a square matrix. However, we are in for a special challenge since the matrix H is a square matrix with one column for each web page indexed by Google. This means that H has about n = 25 billion columns and rows. However, most of the entries in H are zero; in fact, studies show that web pages have an average of about 10 links, meaning that, on average, all but 10 entries in every column are zero. We will choose a method known as the power method for finding the stationary vector I of the matrix H. How does the power method work? We begin by choosing a vector I 0 as a candidate for I and then producing a sequence of vectors I k by \[  I^{k+1}={\bf H}I^k  \] The method is founded on the following general principle that we will soon investigate.
General principle: The sequence I k will converge to the stationary vector I.
We will illustrate with the example above.
I 0
I 1
I 2
I 3
I 4
... I 60
I 61
1 0 0 0 0.0278 ... 0.06 0.06
0 0.5 0.25 0.1667 0.0833 ... 0.0675 0.0675
0 0.5 0 0 0 ... 0.03 0.03
0 0 0.5 0.25 0.1667 ... 0.0675 0.0675
0 0 0.25 0.1667 0.1111 ... 0.0975 0.0975
0 0 0 0.25 0.1806 ... 0.2025 0.2025
0 0 0 0.0833 0.0972 ... 0.18 0.18
0 0 0 0.0833 0.3333 ... 0.295 0.295
It is natural to ask what these numbers mean. Of course, there can be no absolute measure of a page's importance, only relative measures for comparing the importance of two pages through statements such as "Page A is twice as important as Page B." For this reason, we may multiply all the importance rankings by some fixed quantity without affecting the information they tell us. In this way, we will always assume, for reasons to be explained shortly, that the sum of all the popularities is one.

Three important questions

Three questions naturally come to mind:
  • Does the sequence I k always converge?
  • Is the vector to which it converges independent of the initial vector I 0?
  • Do the importance rankings contain the information that we want?
Given the current method, the answer to all three questions is "No!" However, we'll see how to modify our method so that we can answer "yes" to all three. Let's first look at a very simple example. Consider the following small web consisting of two web pages, one of which links to the other:
Google Page Rank Explained - The Ethical Hacking
with matrix Google Page Rank Explained - The Ethical Hacking
Here is one way in which our algorithm could proceed:
I 0
I 1
I 2
I 3=I
1 0 0 0
0 1 0 0
In this case, the importance rating of both pages is zero, which tells us nothing about the relative importance of these pages. The problem is that P2 has no links. Consequently, it takes some of the importance from page P1 in each iterative step but does not pass it on to any other page. This has the effect of draining all the importance from the web. Pages with no links are called dangling nodes, and there are, of course, many of them in the real web we want to study. We'll see how to deal with them in a minute, but first let's consider a new way of thinking about the matrix H and stationary vector I.

A probabilitistic interpretation of H

Imagine that we surf the web at random; that is, when we find ourselves on a web page, we randomly follow one of its links to another page after one second. For instance, if we are on page Pj with lj links, one of which takes us to page Pi, the probability that we next end up on Pi page is then $ 1/l_j $ . As we surf randomly, we will denote by $ T_j $ the fraction of time that we spend on page Pj. Then the fraction of the time that we end up on page Pi page coming from Pj is $ T_j/l_j $ . If we end up on Pi, we must have come from a page linking to it. This means that \[  T_i = \sum_{P_j\in B_i} T_j/l_j  \] where the sum is over all the pages Pj linking to Pi. Notice that this is the same equation defining the PageRank rankings and so $  I(P_i) = T_i $ . This allows us to interpret a web page's PageRank as the fraction of time that a random surfer spends on that web page. This may make sense if you have ever surfed around for information about a topic you were unfamiliar with: if you follow links for a while, you find yourself coming back to some pages more often than others. Just as "All roads lead to Rome," these are typically more important pages. Notice that, given this interpretation, it is natural to require that the sum of the entries in the PageRank vector I be one. Of course, there is a complication in this description: If we surf randomly, at some point we will surely get stuck at a dangling node, a page with no links. To keep going, we will choose the next page at random; that is, we pretend that a dangling node has a link to every other page. This has the effect of modifying the hyperlink matrix H by replacing the column of zeroes corresponding to a dangling node with a column in which each entry is 1/n. We call this new matrix S. In our previous example, we now have
Google Page Rank Explained - The Ethical Hacking
with matrix Google Page Rank Explained - The Ethical Hacking
and eigenvector Google Page Rank Explained - The Ethical Hacking
In other words, page P2 has twice the importance of page P1, which may feel about right to you. The matrix S has the pleasant property that the entries are nonnegative and the sum of the entries in each column is one. In other words, it is stochastic. Stochastic matrices have several properties that will prove useful to us. For instance, stochastic matrices always have stationary vectors. For later purposes, we will note that S is obtained from H in a simple way. If A is the matrix whose entries are all zero except for the columns corresponding to dangling nodes, in which each entry is 1/n, then S = H + A.

How does the power method work?

In general, the power method is a technique for finding an eigenvector of a square matrix corresponding to the eigenvalue with the largest magnitude. In our case, we are looking for an eigenvector of S corresponding to the eigenvalue 1. Under the best of circumstances, to be described soon, the other eigenvalues of S will have a magnitude smaller than one; that is, $ |\lambda|  if  is an eigenvalue of S other than 1.   We will assume that the eigenvalues of <b>S</b> are <IMG alt= and that \[  1 = \lambda_1 > |\lambda_2| \geq |\lambda_3| \geq \ldots \geq |\lambda_n|   \] We will also assume that there is a basis vj of eigenvectors for S with corresponding eigenvalues $ \lambda_j $ . This assumption is not necessarily true, but with it we may more easily illustrate how the power method works. We may write our initial vector I 0 as \[  I^0 = c_1v_1+c_2v_2 + \ldots + c_nv_n  \] Then \begin{eqnarray*} I^1={\bf S}I^0 &=&c_1v_1+c_2\lambda_2v_2 + \ldots + c_n\lambda_nv_n \\ I^2={\bf S}I^1 &amp;=&c_1v_1+c_2\lambda_2^2v_2 + \ldots + c_n\lambda_n^2v_n \\ \vdots &amp; & \vdots \\ I^{k}={\bf S}I^{k-1} &amp;=&c_1v_1+c_2\lambda_2^kv_2 + \ldots + c_n\lambda_n^kv_n \\ \end{eqnarray*}  Since the eigenvalues $ \lambda_j $ with $ j\geq2 $ have magnitude smaller than one, it follows that $ \lambda_j^k\to0 $ if $ j\geq2 $ and therefore $ I^k\to I=c_1v_1 $ , an eigenvector corresponding to the eigenvalue 1. It is important to note here that the rate at which $ I^k\to I $ is determined by $ |\lambda_2| $ . When $ |\lambda_2| $ is relatively close to 0, then $ \lambda_2^k\to0 $ relatively quickly. For instance, consider the matrix \[  {\bf S} = \left[\begin{array}{cc}0.65 & 0.35 \\ 0.35 & 0.65 \end{array}\right].   \] The eigenvalues of this matrix are $ \lambda_1=1 $ and $ \lambda_2=0.3 $ . In the figure below, we see the vectors I k, shown in red, converging to the stationary vector I shown in green. Google Page Rank Explained - The Ethical Hacking Now consider the matrix \[  {\bf S} = \left[\begin{array}{cc}0.85 & 0.15 \\ 0.15 & 0.85 \end{array}\right].   \] Here the eigenvalues are $ \lambda_1=1 $ and $ \lambda_2=0.7 $ . Notice how the vectors I k converge more slowly to the stationary vector I in this example in which the second eigenvalue has a larger magnitude. Google Page Rank Explained - The Ethical Hacking

When things go wrong

In our discussion above, we assumed that the matrix S had the property that $ \lambda_1=1 $ and $  |\lambda_2| . This does not always happen, however, for the matrices S that we might find.   Suppose that our web looks like this:        In this case, the matrix <b>S</b> is   <IMG src= Then we see
I 0
I 1
I 2
I 3
I 4
I 5
1 0 0 0 0 1
0 1 0 0 0 0
0 0 1 0 0 0
0 0 0 1 0 0
0 0 0 0 1 0
In this case, the sequence of vectors I k fails to converge. Why is this? The second eigenvalue of the matrix S satisfies $ |\lambda_2|=1 $ and so the argument we gave to justify the power method no longer holds. To guarantee that $ |\lambda_2| , we need the matrix S to be primitive</i>. This means that, for some <i>m</i>, <b>S</b><i>m</i> has all positive entries. In other words, if we are given two pages, it is possible to get from the first page to the second after following <i>m</i> links. Clearly, our most recent example does not satisfy this property. In a moment, we will see how to modify our matrix <b>S</b> to obtain a primitive, stochastic matrix, which therefore satisfies <IMG alt=S is
Google Page Rank Explained - The Ethical Hacking
with stationary vector Google Page Rank Explained - The Ethical Hacking
Notice that the PageRanks assigned to the first four web pages are zero. However, this doesn't feel right: each of these pages has links coming to them from other pages. Clearly, somebody likes these pages! Generally speaking, we want the importance rankings of all pages to be positive. The problem with this example is that it contains a smaller web within it, shown in the blue box below. Google Page Rank Explained - The Ethical Hacking Links come into this box, but none go out. Just as in the example of the dangling node we discussed above, these pages form an "importance sink" that drains the importance out of the other four pages. This happens when the matrix S is reducible; that is, S can be written in block form as \[  S=\left[\begin{array}{cc} * & 0 \\ * & * \end{array}\right].  \] Indeed, if the matrix S is irreducible, we can guarantee that there is a stationary vector with all positive entries. A web is called strongly connected if, given any two pages, there is a way to follow links from the first page to the second. Clearly, our most recent example is not strongly connected. However, strongly connected webs provide irreducible matrices S. To summarize, the matrix S is stochastic, which implies that it has a stationary vector. However, we need S to also be (a) primitive so that $ |\lambda_2| and (b) irreducible so that the stationary vector has all positive entries.   A final modification</H3>  To find a new matrix that is both primitive and irreducible, we will modify the way our random surfer moves through the web. As it stands now, the movement of our random surfer is determined by <b>S</b>: either he will follow one of the links on his current page or, if at a page with no links, randomly choose any other page to move to. To make our modification, we will first choose a parameter <IMG alt= between 0 and 1. Now suppose that our random surfer moves in a slightly different way. With probability $\alpha$ , he is guided by S. With probability $ 1-\alpha $ , he chooses the next page at random. If we denote by 1 the $ n\times n $ matrix whose entries are all one, we obtain the Google matrix: \[  {\bf G}=\alpha{\bf S}+ (1-\alpha)\frac{1}{n}{\bf 1}  \] Notice now that G is stochastic as it is a combination of stochastic matrices. Furthermore, all the entries of G are positive, which implies that G is both primitive and irreducible. Therefore, G has a unique stationary vector I that may be found using the power method. The role of the parameter $\alpha$ is an important one. Notice that if $ \alpha=1 $ , then G = S. This means that we are working with the original hyperlink structure of the web. However, if $ \alpha=0 $ , then $ {\bf G}=1/n{\bf 1} $ . In other words, the web we are considering has a link between any two pages and we have lost the original hyperlink structure of the web. Clearly, we would like to take $\alpha$ close to 1 so that we hyperlink structure of the web is weighted heavily into the computation. However, there is another consideration. Remember that the rate of convergence of the power method is governed by the magnitude of the second eigenvalue $ |\lambda_2| $ . For the Google matrix, it has been proven that the magnitude of the second eigenvalue $ |\lambda_2|=\alpha $ . This means that when $\alpha$ is close to 1 the convergence of the power method will be very slow. As a compromise between these two competing interests, Serbey Brin and Larry Page, the creators of PageRank, chose $ \alpha=0.85 $ .

Computing I

What we've described so far looks like a good theory, but remember that we need to apply it to $ n\times n $ matrices where n is about 25 billion! In fact, the power method is especially well-suited to this situation. Remember that the stochastic matrix S may be written as \[  {\bf S}={\bf H} + {\bf A}  \] and therefore the Google matrix has the form \[  {\bf G}=\alpha{\bf H} + \alpha{\bf A} + \frac{1-\alpha}{n}{\bf 1}  \] Therefore, \[  {\bf G}I^k=\alpha{\bf H}I^k + \alpha{\bf A}I^k + \frac{1-\alpha}{n}{\bf 1}I^k  \] Now recall that most of the entries in H are zero; on average, only ten entries per column are nonzero. Therefore, evaluating HI k requires only ten nonzero terms for each entry in the resulting vector. Also, the rows of A are all identical as are the rows of 1. Therefore, evaluating AI k and 1I k amounts to adding the current importance rankings of the dangling nodes or of all web pages. This only needs to be done once. With the value of $\alpha$ chosen to be near 0.85, Brin and Page report that 50 - 100 iterations are required to obtain a sufficiently good approximation to I. The calculation is reported to take a few days to complete. Of course, the web is continually changing. First, the content of web pages, especially for news organizations, may change frequently. In addition, the underlying hyperlink structure of the web changes as pages are added or removed and links are added or removed. It is rumored that Google recomputes the PageRank vector I roughly every month. Since the PageRank of pages can be observed to fluctuate considerably during this time, it is known to some as the Google Dance. (In 2002, Google held a Google Dance!)

Summary

Brin and Page introduced Google in 1998, a time when the pace at which the web was growing began to outstrip the ability of current search engines to yield useable results. At that time, most search engines had been developed by businesses who were not interested in publishing the details of how their products worked. In developing Google, Brin and Page wanted to "push more development and understanding into the academic realm." That is, they hoped, first of all, to improve the design of search engines by moving it into a more open, academic environment. In addition, they felt that the usage statistics for their search engine would provide an interesting data set for research. It appears that the federal government, which recently tried to gain some of Google's statistics, feels the same way. There are other algorithms that use the hyperlink structure of the web to rank the importance of web pages. One notable example is the HITS algorithm, produced by Jon Kleinberg, which forms the basis of the Teoma search engine. In fact, it is interesting to compare the results of searches sent to different search engines as a way to understand why some complain of a Googleopoly.

David Austin
Grand Valley State University








Latest page update: made by rahuldutt1, Jun 20, 2:20 PM EDT
(about this update

About This Update


rahuldutt1

Edited by rahuldutt1







4052 words added


53 images added






view changes



-
complete history)


















Comment as Anonymous (sign in
or sign up!
)









































No comments for this page yet.