Yanick's Guide to Web Authoring

Introducing the Web and its Friends

This section presents general notions about the web and its paraphernalia. It is meant to give a first idea of what is what in the beautiful world of the 'net.

The Whole Truth About the WWW, HTML and Other Such Vowel-Deprived Acronyms

Ever wondered what the difference is between the internet and the web? What exactly a web page is? Why there are no blue rabbits? If so, this section is just for you.

First, there's the internet (also called the Net). The internet is simply the biggest network of computers in the whole wide world. And I'm not bragging: it is. It is also defined as a network of networks (hence the name internet), as it is composed of smaller networks, which are sometimes themselves composed of smaller networks, etc.

As an aside, a word that we hear from time to time is intranet. This refers to an internet-like network inside a particular company (or group). Intranets usually offer the exact same services as the internet, but with their usage restricted to the company itself. End of the aside.

So, the internet is a bunch of computers plugged together. This does not sound terribly exciting, does it? Well, things might get a little more exciting if I tell you that the internet hosts many services.

One of these services is the email transmission system, which permits us to send messages to anyone, anywhere, in a matter of seconds. Another is the web.

The World Wide Web, also called the WWW (the only acronym that actually takes more time to say than the original term) or, more simply, the web, is a huge document repository. These documents can contain text, images and sound, and can even be interactive.

Before going further, let's give ourselves a common vocabulary. I define a web site as a bunch of web documents forming a whole about a certain subject. I call a web page or a web document the atomic part of a web site, i.e., what appears in the web browser's window. For example, Yanick's Guide to Web Authoring is a web site while this introduction is a page. I'm not sure this is the standard definition, but let's pretend it is. Also, a web site can be part of a greater web site. To return to our example, Ygtwa is but a part of my home site (a home site being the site of a particular individual).

Where were we? Ah, yes, web pages. A web page is primarily written in a language known as HTML (HyperText Markup Language). A hypertext is simply a normal text with parts of it linked to other parts of the same text or to other texts. This is the neat concept that gives you the possibility to jump from one web page to another.
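
To make the idea of hypertext a bit more concrete, here is a minimal sketch in Python that reads a tiny HTML text and lists where its links point. The sample page and the names LinkCollector and find_links are made up for this illustration; only the html.parser module is standard.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the target of every <a href="..."> link found in an HTML text."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def find_links(html_text):
    """Return the link targets of html_text, in order of appearance."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links


# A made-up page, for illustration only.
SAMPLE_PAGE = """
<html>
  <body>
    <p>Read the <a href="introduction.html">introduction</a> first,
    then jump to <a href="#addresses">the part about addresses</a>.</p>
  </body>
</html>
"""

if __name__ == "__main__":
    for target in find_links(SAMPLE_PAGE):
        print(target)
```

The first link leads to another text, the second to another part of the same text: the two kinds of jumps that make a text a hypertext.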

While HTML gives you text formatting and the possibility to decorate your page with sound and images (without forgetting the spiffy links), it is a formatting language, and thus a static one. I repeat, HTML is not a programming language; I'll explain the difference in the section on HTML. To return to our present topic, this means that an HTML document always remains the same and cannot interact with its readers. However, many methods to add dynamism to a web page exist.

First, there are scripts, which are tiny programs that can be enclosed in a web page. A couple of web scripting languages exist, JavaScript being one of the most popular and widespread.

There are also CGI (Common Gateway Interface) programs. These are programs located outside of the web pages. Such programs can be written in any kind of programming language, although in general they tend to be written in either Perl or C.
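
Since any language will do, here is a rough sketch of what a CGI program can look like in Python. It assumes the web server hands over the request through the QUERY_STRING environment variable, as the CGI convention specifies; the "name" parameter and the build_response function are hypothetical, picked only for this example.

```python
#!/usr/bin/env python3
import html
import os
from urllib.parse import parse_qs


def build_response(query_string):
    """Return a complete CGI response: a header, a blank line, then the page."""
    params = parse_qs(query_string)
    # "name" is a hypothetical parameter chosen for this example.
    name = params.get("name", ["stranger"])[0]
    # Escape the value so that a visitor cannot sneak HTML into the page.
    body = "<html><body><p>Hello, {}!</p></body></html>".format(html.escape(name))
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    # The server puts the part of the address after "?" in QUERY_STRING.
    print(build_response(os.environ.get("QUERY_STRING", "")))
```

Unlike a static HTML document, the page produced here changes with each request, which is the whole point of the exercise.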

This list would not be complete without Java. Java is a full-fledged programming language. The interest of Java is that it is possible to embed a Java applet in a web page. Some Java lingo: an application is a Java program that runs by itself, while an applet is a Java program embedded in a web page. There's no size limit on an applet, so an applet can readily be bigger than an application.

Internet Addresses: Big, But Not Bad

Internet addresses are, you will agree, long beasties. Just take the address of this site, http://www.iro.umontreal.ca/~champoux/ygtwa/introduction.html. No less than 61 characters! Email addresses are slightly shorter, but not by much (one of mine is champoux@iro.umontreal.ca, which totals only 25 characters). Why are they so long? Well, because they are the specific address of a single document or person on the whole net. That's right, an internet address is as complete as a classic snail mail address.
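
For the skeptics, the two character counts can be checked with a couple of lines of Python. The constant names are only labels chosen for this sketch.

```python
WEB_ADDRESS = "http://www.iro.umontreal.ca/~champoux/ygtwa/introduction.html"
EMAIL_ADDRESS = "champoux@iro.umontreal.ca"

print(len(WEB_ADDRESS))    # 61
print(len(EMAIL_ADDRESS))  # 25
```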

Web Address

In order to understand the anatomy of a web address, I will use the address of this page and analyze it part by part.

http://www.iro.umontreal.ca/~champoux/ygtwa/introduction.html
http://
Tells the browser which transfer protocol to use to retrieve the document. The most common transfer protocols are http (HyperText Transfer Protocol), ftp (File Transfer Protocol) and file. The first two are used to retrieve files over the internet, while the last is used to retrieve a document present on your own computer.
Cool tip: Most browsers take http as the default protocol, which means that you don't really have to type this prefix (an economy of no less than 7 characters!).
www.iro.umontreal.ca
This is the web server on which the web page is. From left to right there is the server's name, then the server's network, then the network encompassing the server's network, etc. Here, for example, the web page is on the server named www of the Operational Research and Computer Science department (iro, it makes sense in French, trust me) of the University of Montréal (umontreal), which is located in Canada (ca). The last term defines either the server's location or its purpose. The following table presents some of the most common suffixes.
Suffix  Meaning
.com    Company
.org    Non-profit organization
.gov    Governmental organization
.ca     Canada
.au     Australia

Cool tip: As the vast majority of web servers are called www, most browsers will add this prefix to any given address that lacks it. Furthermore, they will also add .com at its end if needed. Thus you could have accessed this page by only typing iro.umontreal.ca/~champoux/ygtwa/introduction.html.
/~champoux/ygtwa/introduction.html
This is the path to the web page's physical file on the server. The initial tilde (~) is a UNIX thing that says "go look in this champoux fella's home directory". Once in my home directory, the server looks for a directory named "ygtwa", from which it will read the file "introduction.html". The file's extension (in this case .html) tells the browser what kind of file it is so that it can process it correctly. The extensions .html and .htm are for HTML documents, while .gif and .jpeg are for images. Your browser can be taught to recognize and process more file types. For example, it is possible to access a Word document from the web via your browser. The browser will recognize the Word document by its extension .doc and call MS Word to read it.
If a directory is specified but no file, most servers are configured to return a specific web page (usually named index.html, but sometimes default.html, depending on the server's configuration). If no such web page exists in the directory, the directory's contents will be shown instead (this default web page is thus a protection if you don't want your guests to peek at your site's guts). For example, if you access http://www.iro.umontreal.ca/~champoux/ygtwa, it is the index.html file that will be shown to you.
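
The dissection above can also be done by a program. Here is a small sketch in Python, built on the standard urllib.parse module. The dissect function and the KINDS table are inventions of this example, and the table only lists the extensions mentioned in the text; a real browser knows many more.

```python
import posixpath
from urllib.parse import urlsplit

# Only the extensions named in the text; real browsers know many more.
KINDS = {
    ".html": "HTML document",
    ".htm": "HTML document",
    ".gif": "image",
    ".jpeg": "image",
    ".doc": "Word document",
}


def dissect(address):
    """Split a web address into the three parts described above."""
    # Like most browsers, assume http when no protocol is given.
    if "://" not in address:
        address = "http://" + address
    parts = urlsplit(address)
    extension = posixpath.splitext(parts.path)[1].lower()
    return {
        "protocol": parts.scheme,
        "server": parts.hostname,
        # From left to right: the server's name, its network, the bigger network, ...
        "labels": parts.hostname.split("."),
        "path": parts.path,
        "kind": KINDS.get(extension, "unknown"),
    }


if __name__ == "__main__":
    page = "http://www.iro.umontreal.ca/~champoux/ygtwa/introduction.html"
    for part, value in dissect(page).items():
        print(part, "=", value)
```

Note that this sketch only guesses the file type from the extension, as described above; it does not try to imitate what a server does when the path names a directory.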

Email Address

Once one understands the web addressing system, email addresses are mere child's play. Let's take my email at the university (champoux@iro.umontreal.ca). The email address is formed of the user's login name (champoux), the '@' symbol (pronounced 'at') and the domain on which the user resides. In my case, I'm Champoux at the IRO department of the University of Montréal, located in Canada. Another of my emails is ychampou@newbridge.com, which can be read as user ychampou who works at Newbridge, which is a company.

Told you it was easy.
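
So easy, in fact, that a few lines of Python can do the reading for us. This is only a sketch: the split_email function is made up for the occasion, and it does not try to check that an address is fully valid.

```python
def split_email(address):
    """Return (login name, list of domain parts) for an email address."""
    # Split at the last '@': the login name is on the left, the domain on the right.
    user, at, domain = address.rpartition("@")
    if not at or not user or not domain:
        raise ValueError("not an email address: " + repr(address))
    return user, domain.split(".")


if __name__ == "__main__":
    print(split_email("champoux@iro.umontreal.ca"))
    print(split_email("ychampou@newbridge.com"))
```

The domain parts read exactly like those of a web server: most specific on the left, location or purpose on the right.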


Yanick Champoux