HTML

HyperText Markup Language, commonly abbreviated HTML or, in its latest version, HTML5, is the markup language designed to represent web pages.

This language allows:

to write hypertext, hence the name,

to semantically structure the page,

to format the content,

to create input forms,

to include multimedia resources including images, videos, and computer programs,

to create interoperable documents with a wide variety of equipment in accordance with web accessibility requirements.

It is often used in conjunction with the JavaScript programming language and Cascading Style Sheets (CSS). HTML is inspired by the Standard Generalized Markup Language (SGML). It is an open format.

Contents

1 Denominations

2 Evolution of language

2.1 1989-1992: Origin

2.2 1993: Contributions from NCSA Mosaic

2.3 1994: Contributions of Netscape Navigator

2.4 1995-1996: HTML 2.0

2.5 1997: HTML 3.2 and 4.0

2.6 2000-2006: XHTML

2.7 From 2007 to the present day: HTML 5 and abandonment of XHTML 2

2.8 The future of HTML: without a version number?

3 Description of HTML

3.1 HTML syntax

3.2 Structure of HTML documents

3.3 HTML elements

3.4 HTML attributes

3.5 Character set

3.5.1 Escape technique

4 HTML interoperability

5 Notes and references

6 See also

6.1 Related articles

6.2 External links

6.2.1 Official documents

Denominations

The name Hypertext Markup Language describes a language for marking up hypertext [1]. Usually, the acronym HTML is used, sometimes redundantly followed by the word "language", as in "HTML language". Hypertext is sometimes written HyperText to mark the T in the acronym HTML.

Some people speak of HTM instead of HTML; .htm is the filename extension truncated to three letters, a limitation found on older Microsoft operating systems.

Language evolution

During the first half of the 1990s, before the emergence of web technologies such as JavaScript (JS), Cascading Style Sheets (CSS), and the Document Object Model (DOM), the evolution of HTML dictated the evolution of the World Wide Web. Since 1997 and HTML 4, the evolution of HTML has slowed down considerably; 10 years later, HTML 4 was still used in web pages. In 2008, the specification for HTML5 was under study [2]; it came into common use in the second half of the 2010s.

1989-1992: Origin

HTML is one of the three inventions that formed the basis of the World Wide Web, along with the Hypertext Transfer Protocol (HTTP) and web addresses (URLs). HTML was invented to make it possible to write hypertext documents linking different Internet resources with hyperlinks. Today, these documents are referred to as "web pages". In August 1991, when Tim Berners-Lee publicly announced the web on Usenet, he mentioned only SGML but gave the URL of a document with a .html suffix.

In his book Weaving the Web [3], Tim Berners-Lee describes the decision to base HTML on SGML as being as "diplomatic" as it was technical: technically, he found SGML too complex, but he wanted to attract the hypertext community, who saw SGML as the most promising language for standardizing the format of hypertext documents. In addition, SGML was already in use by his employer, the European Organization for Nuclear Research (CERN).

The first elements of HTML include:


The title of the document,

Hyperlinks,

The structuring of the text into titles, sub-titles, lists or plain text,

A rudimentary index search mechanism.

The description of HTML was then quite informal and mainly defined by the support of the various web browsers of the time. Dan Connolly helped make HTML a true SGML application [4].

1993: Contributions from NCSA Mosaic

The state of HTML then corresponds to what one might call HTML 1.0. However, there is no specification with this name, mainly because the language was undergoing rapid development at the time. A standardization effort was nevertheless underway [5]. From late 1993, the term HTML+ was used to refer to the future version of HTML [6, 7]. Despite the standardization effort thus initiated, and until the late 1990s, HTML was primarily defined by browser implementations.

With the NCSA Mosaic browser, HTML gained two major inventions:

First, the invention of the IMG element makes it possible to integrate images (initially, only in GIF and XBM formats) into web pages (Mosaic 0.10).

Then the invention of forms (Mosaic 2.0pre5) makes the web interactive by allowing visitors to enter data in pages and send it to the web server. This invention makes it possible in particular to place orders, and therefore to use the web for e-commerce.

1994: Contributions of Netscape Navigator

With the release of Netscape Navigator 0.9 on October 13, 1994, support for many presentation elements was added: text attributes, blinking, centering, etc.

HTML development then takes two divergent paths:

On the one hand, browser developers are focused on maximizing the visual impact of web pages in response to user requests [8].

On the other hand, web designers propose to extend the semantic description capabilities (logos, footnotes, etc.) and application domains (mathematical formulas, tables) of HTML.

In this, the designers follow SGML's principle of leaving presentation to a style language; for HTML, that language is Cascading Style Sheets (CSS). Only table support is quickly integrated into browsers, in particular because it allows a very marked improvement in presentation. In addition to the multiplication of presentation elements, the software of the time that produced and consumed HTML often conceived of documents as a series of formatting commands rather than as markup representing the tree structure now known as the Document Object Model (DOM). The lack of structure of the HTML then in use is sometimes denounced as "tag soup".

1995-1996: HTML 2.0

In March 1995, the newly founded World Wide Web Consortium (W3C) released the result of its research on HTML+: the HTML 3.0 draft. It includes support for tables, figures, and mathematical expressions. This draft expired on September 28, 1995, without any direct follow-up. At the end of 1995, RFC 1866 [9], describing HTML 2.0, was finalized. Its main editor is Dan Connolly. This document describes HTML as it existed before June 1994, and so without the many additions of Netscape Navigator.

1997: HTML 3.2 and 4.0

On January 14, 1997, the W3C released the HTML 3.2 specification. It describes the current practice observed at the beginning of 1996 [10], and therefore includes part of the additions of Netscape Navigator and Internet Explorer. Its most important novelties are the standardization of tables and of many presentation elements. HTML 3.2 narrowly precedes HTML 4.0 and contains elements for styling and scripting support.

On December 18, 1997, the W3C published the HTML 4.0 specification, which standardizes many extensions supporting styles and scripts, frames, and objects (generalized content inclusion). HTML 4.0 also brings various improvements for the accessibility of content [11], mainly the possibility of a more explicit separation between the structure and the presentation of the document, and support for additional information on certain complex content such as forms, tables, or acronyms. HTML 4.0 introduces three variants of the format, intended to encourage the evolution towards more meaningful markup while taking into account the temporary limitations of production tools:


The strict variant excludes so-called "presentation" elements and attributes, intended to be replaced by CSS styles, as well as applet and frame elements, which are replaced by the object element designed to be more interoperable and accessible.

The transitional variant extends the strict variant by taking over the deprecated elements and attributes of HTML 3.2, the presentation elements of which were commonly used by HTML editors of the day.

The frameset variant standardizes the technique of framesets, which compose a single resource from several web pages assembled by the browser.

These variations continue thereafter without significant modifications in HTML 4.01 and in the XHTML 1.0 transition format from HTML. The latest HTML specification is version 4.01 dated 24

From 2007 to the present day: HTML 5 and abandonment of XHTML 2

In March 2007, drawing the consequences of the reluctance of part of the industry and of web content designers towards XHTML 2.0 [14], the W3C relaunched the development of HTML and created a new working group supervised by Chris Wilson (Microsoft) and initially Dan Connolly (W3C), now Michael Smith (W3C). Its goals include [15]:


To develop HTML to describe the semantics of documents but also online applications;

to achieve an extensible language via XML while maintaining a non-XML version compatible with the HTML parsers of current browsers;

and enrich user interfaces with specific controls: progress bars, menus, fields associated with specific types of data.

The work of the WHATWG was formally adopted in May 2007 as the starting point for a new HTML5 specification [16]. This document [17] was published as a Working Draft on January 22, 2008. The design principles mentioned by the working group include in particular [18]:


the compatibility of future HTML implementations with existing web content, and the ability for former user agents to leverage future HTML 5 content;

a pragmatic approach, preferring evolutions to radical modifications, and adopting technologies or practices already widely shared by the authors of current content;

the priority is given, in the event of a conflict of interest, to the needs of the users over those of the authors, and consequently, to those of the authors over the constraints of implementation by browsers;

the compromise between the semantic richness of the language and the practical utility of the solutions available to fulfill the major objective of independence from the restitution medium.

An Accessibility Task Force was created by the W3C in November 2009 in order to resolve the compatibility problems of the new format with accessibility standards [19], linked in particular to the implementation of ARIA, to textual alternatives, and to the new canvas and video elements [20].


Development of XHTML 2.0 initially continued in parallel, in response to the needs of other areas of the web, such as mobile devices, enterprise applications, and server applications [21]. Then, in July 2009, the W3C decided not to renew the XHTML 2 Working Group at the end of 2009 [22]. With the abandonment of XHTML 2, XHTML 1.1 therefore remains the standardized version. HTML5 will be compatible with XHTML and XML, and will therefore allow XHTML5 documents [23]. However, it is likely that the W3C is moving towards an outright abandonment of XHTML 1.1, because the XML serialization of HTML5 makes it unnecessary to define XHTML document types with their own version numbers [23].


The Future of HTML: Without a Version Number?

In January 2011, differences of opinion between Ian Hickson (engineer at Google), who wrote the HTML5 specification, and the members of the W3C working group led the WHATWG to create the HTML Living Standard, an HTML specification intended to evolve constantly in order to keep pace with the rapid development of new functionalities by browser developers [24] (as opposed to numbered, and therefore "fixed", versions).

The HTML Living Standard aims to include HTML5 and to develop it continuously [25]. In particular, in the version of August 22, 2012, the reference document [25] explains that the W3C HTML5, published on June 22, 2012, is based on a version of the HTML Living Standard, but that the HTML Living Standard does not stop at this version and continues to evolve. It details, in particular, the differences between the W3C version (HTML5) and the HTML Living Standard version (for example, newly reported bugs are not taken into account in HTML5, syntactic differences are listed, and new tags created by the HTML Living Standard are not included in HTML5).

Description of HTML

HTML is a document format description language that takes the form of a markup language whose syntax comes from the Standard Generalized Markup Language (SGML).

HTML syntax

Up to and including version 4.01, HTML is formally described as an application of the Standard Generalized Markup Language (SGML). However, successive specifications admit, through various means, that user agents are not, in practice, SGML-compliant parsers [26]. Web browsers have never been able to decipher all of the syntax variations allowed by SGML [27]; on the other hand, they are generally able to recover automatically from many syntax errors, in keeping with the first part of "Postel's law": "Be liberal in what you accept, and conservative in what you send" (RFC 791 [28]). In fact, developers of web pages and web browsers have always taken a great deal of freedom with the syntactic rules of SGML. Finally, HTML's document type definition (DTD), the formal technical description of HTML, was not written by Dan Connolly until a few years after the introduction of HTML [4].

Despite the freedoms taken with the standard, the terminology specific to SGML is used: document, element, attribute, value, tag, entity, validity, application, etc. Thanks to the DTD, it is possible to automatically check the validity of an HTML document using an SGML parser [29].

Originally, HTML was designed simply to mark up text, in particular to add hyperlinks to it. A minimum of tags was used, as in the following HTML document:

<TITLE> Example of HTML </TITLE>

This is a sentence with a <A HREF=target.html> hyperlink </A>.

<P>

This is a paragraph o&ugrave; there is no hyperlink.

This example contains text, five tags, and an entity reference:


<TITLE> is the opening tag of the TITLE element.

</TITLE> is the closing tag of the TITLE element.

Example of HTML is the content of the TITLE element.

<A HREF=target.html> is the opening tag of the A element, with:

HREF=target.html, the HREF attribute whose value is target.html.

<P> is the opening tag of the P element. However, it is used here as if it were a paragraph separator, and this is how it was often presented in older HTML documentation. It opens the paragraph whose content is "This is a paragraph o&ugrave; there is no hyperlink." The closing tag of the P element, which is optional, is omitted here. The P element is implicitly terminated when a new paragraph begins or when the parent element is closed (as is the case here).

&ugrave; is an entity reference representing the "ù" character.
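As a rough illustration, the breakdown above can be reproduced with the HTMLParser class from Python's standard library. The names SAMPLE and TagLogger are illustrative choices, not part of the original example. Note that this parser is a lenient tokenizer rather than an SGML parser, and that it reports tag and attribute names in lowercase.

```python
from html.parser import HTMLParser

# The sample document from the text, with the entity reference written
# without spaces.
SAMPLE = (
    "<TITLE>Example of HTML</TITLE>\n"
    "This is a sentence with a <A HREF=target.html>hyperlink</A>.\n"
    "<P>\n"
    "This is a paragraph o&ugrave; there is no hyperlink.\n"
)


class TagLogger(HTMLParser):
    """Records tags and text as the parser meets them (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=True decodes entity references such as &ugrave;
        super().__init__(convert_charrefs=True)
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.tags.append(("end", tag, {}))

    def handle_data(self, data):
        self.text.append(data)


logger = TagLogger()
logger.feed(SAMPLE)
logger.close()  # flush any text still buffered by the parser

full_text = "".join(logger.text)
for kind, tag, attrs in logger.tags:
    print(kind, tag, attrs)
```

Five tags are reported (two opening/closing pairs plus the lone opening P tag), and the entity reference appears in the collected text as the character "ù".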

Tags can be written in lowercase or uppercase. The use of lower case letters is becoming more common because XHTML requires them.

A valid HTML document is one that follows SGML syntax, uses only standardized elements and attributes, and respects the element nesting described by the standard. The previous example lacks only a document type declaration to be a valid HTML 2.0 document [30].

However, validity alone is not sufficient for a document to comply with the targeted HTML specification. Indeed, in addition to the validity requirement, a compliant document is subject to other constraints which are not expressed by the document type definition (DTD), but which are expressed by the specification itself. This is particularly the case with the content type of certain attributes, such as that of the datetime attribute: to be compliant with HTML 4.01, its value must itself comply with a subset of the ISO 8601 standard [31]. A strictly SGML parser such as the W3C HTML validator cannot, therefore, guarantee the conformity of an HTML document.
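To make the distinction concrete, here is a minimal sketch of a check that a DTD cannot express. The HTML 4.01 DTD declares the datetime attribute as plain character data, while the specification prose requires the shape YYYY-MM-DDThh:mm:ssTZD. The function name and the regular expression are assumptions made for this illustration; only the shape is tested, not the value ranges.

```python
import re

# Shape of the HTML 4.01 date-time format: YYYY-MM-DDThh:mm:ssTZD,
# where TZD is "Z" or an offset such as +01:00 or -05:00.
HTML4_DATETIME = re.compile(
    r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(Z|[+-]\d{2}:\d{2})"
)


def looks_like_html4_datetime(value: str) -> bool:
    """Return True if value has the HTML 4.01 date-time shape.

    Only the shape is checked; ranges such as month <= 12 are not.
    """
    return HTML4_DATETIME.fullmatch(value) is not None


# Both strings are acceptable character data for a DTD-based validator,
# but only the first has the shape the specification asks for.
print(looks_like_html4_datetime("1994-11-05T08:15:30-05:00"))
print(looks_like_html4_datetime("5 November 1994"))
```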

Structure of HTML documents

In the early years, HTML documents were often thought of as flat structures, and tags as style commands [32]. So the <p> tag was considered a line break, and the </p> tag was ignored. Likewise, when JavaScript 1.0 appeared, it only gave access to the links and forms of a document, through the document.forms and document.links arrays.

With the introduction of Cascading Style Sheets and the Document Object Model, it became necessary to consider that HTML documents have a true tree structure, with a root element containing all the other elements [33]. The opening and closing tags of some of these elements (html, head, and body in particular) remain optional. However, today there is a tendency to tag every element explicitly [34] and to declare the DTD. With the exception of the root element, each element has exactly one direct parent element; this "document tree" is notably used by the formatting structure which is derived from it for the application of cascading style sheets, where each element can have its own background, border, and margin.

In browsers that support it, the source code of a web page can also be viewed by adding view-source: in front of the page URL.
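The "exactly one direct parent" property can be sketched with a small stack-based walk over a fully tagged document. This is only an illustration under a stated assumption: every element in DOCUMENT is explicitly opened and closed, so no optional tags need to be inferred (a real browser does much more). The names DOCUMENT and ParentRecorder are made up for this example.

```python
from html.parser import HTMLParser

# A fully tagged document: no optional tags are omitted.
DOCUMENT = (
    "<html><head><title>T</title></head>"
    "<body><h1>H</h1><p>One</p><p>Two</p></body></html>"
)


class ParentRecorder(HTMLParser):
    """Records, for each element, the element that directly contains it."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # stack of elements currently open
        self.edges = []          # (element, parent) pairs in document order

    def handle_starttag(self, tag, attrs):
        parent = self.open_elements[-1] if self.open_elements else None
        self.edges.append((tag, parent))
        self.open_elements.append(tag)

    def handle_endtag(self, tag):
        # Assumes well-nested input: the closing tag matches the top of the stack
        if self.open_elements and self.open_elements[-1] == tag:
            self.open_elements.pop()


recorder = ParentRecorder()
recorder.feed(DOCUMENT)
recorder.close()

for element, parent in recorder.edges:
    print(element, "<-", parent)
```

Only the root element html is recorded with no parent; every other element has exactly one.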

HTML elements


HTML version 4 describes 91 elements. Following the specification of HTML 4, the functionalities implemented by HTML can be broken down as follows:

The general structure of an HTML document [35]

At the top level, an HTML document is divided into a header and a body. The header contains information about the document, including its title and possibly metadata. The body contains what is displayed.

Language information [36]

It is possible to specify the language of any part of the document and to manage the mixture of text written from left to right with text written from right to left.

Semantic markup [37]

HTML helps to differentiate specific content such as quotes from external works, computer code snippets, emphasized passages, and abbreviations. Some of these elements, initially designed to support technical documentation, are very rarely used (for example, the distinction between variables and sample values in computer code, or the defining instance of a term).

Lists [38]

HTML differentiates between unordered and ordered lists, depending on whether or not the order of the items in the code is itself meaningful. Definition lists also exist, but their scope is not precisely defined.

Tables [39]

This functionality is formally intended for the presentation of tabular data, but was mainly exploited for its layout capabilities before Cascading Style Sheets (CSS) reached a sufficient degree of maturity.

Hyperlinks [40]

The primary functionality of HTML.

Inclusion of images, applets, and miscellaneous objects [41]

Originally, HTML only allowed hyperlinks to external media. The invention of specialized elements for multimedia enabled the automatic inclusion of images, music, video, etc. in web pages.

Grouping elements [42]

These generic elements give no meaning to the content they mark up; they make it possible to apply presentation styles, perform processing via scripts, or carry out any other operation requiring the isolation of part of the content.

Presentation style [43]

Each element, or even the entire document, can have styles applied. Styles are defined in the document or come from external Cascading Style Sheets (CSS).

Text presentation markup [44]

Developed before CSS became widespread, to quickly provide graphic designers with formatting functionality. For the most part, it is now officially discouraged.

Frames [45]

An often-criticized feature that allows multiple HTML documents to be displayed in a single window.

Forms for interactive data entry [46]

Form elements allow visitors to enter text and files into web pages.

Scripts [47]

Scripts make it possible to associate pieces of programs with user actions on the document. The languages used are generally JavaScript and VBScript.

HTML attributes

Attributes allow you to specify the properties of HTML elements. There are 188 attributes in version 4 of HTML [48].

Some attributes apply to almost all elements:

The generic attributes id (unique identifier) and class (repeatable identifier) [49], intended to allow external processing such as the application of presentation styles or the manipulation of the document tree via a scripting language. Added to these are the style attribute [50], which defines the presentation style of the element (generally in CSS), and the title attribute [51], which provides additional information of a generally free nature (the major exception is the use of title to determine the persistent style and any alternate styles applied to a document via link elements);

The dir and lang internationalization attributes [36], specifying the writing direction and language of the content;

The onclick, ondblclick, onkeydown, onkeypress, onkeyup, onmousedown, onmousemove, onmouseout, onmouseover, and onmouseup event handlers [52], which capture events generated in the element in order to call a script.

Other attributes are unique to a single element, or to similar elements. For example:

The elements that include graphic resources in the document (img, object, iframe) have height and width attributes so that the browser can anticipate the size of the resource to display before it has been downloaded.

Specific elements are endowed with an attribute serving a single function, such as the label element for the labels of form controls and its for attribute, which designates the control concerned. Together with the usemap and ismap attributes of images, this is one of the very few explicit and formalized associations between elements in HTML that do not depend on their linear order in the source code.

Most of the attributes are optional. However, some elements have mandatory attributes:

By their nature: the img element must have an src attribute specifying the URI of the graphic resource it represents. The same is true of all so-called "empty" and "replaced" elements [53] which, at the cost of a departure from SGML rules, do not have their own content. This is also the case, for functional reasons, of some elements that are not empty, such as the form element, whose action attribute indicates the target server that will process the data after submission;

for reasons related to the accessibility of the content: images thus have a mandatory alt attribute, which provides raw textual content intended to replace the graphic resource in contexts where it cannot be rendered or perceived.
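The following sketch shows how such mandatory attributes might be checked mechanically. The REQUIRED table is deliberately partial and covers only the three attributes named above (src and alt on img, action on form); the class name and the sample markup are invented for the illustration.

```python
from html.parser import HTMLParser

# Partial, illustrative table of mandatory attributes in HTML 4.
REQUIRED = {"img": {"src", "alt"}, "form": {"action"}}


class RequiredAttributeChecker(HTMLParser):
    """Collects (element, missing attribute) pairs (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        present = {name for name, _ in attrs}
        for missing in sorted(REQUIRED.get(tag, set()) - present):
            self.problems.append((tag, missing))


checker = RequiredAttributeChecker()
checker.feed(
    '<img src="logo.png">'
    '<form action="/order"><img src="a.gif" alt="A"></form>'
)
checker.close()
print(checker.problems)
```

Only the first image is reported, because it lacks the alt text alternative.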

The content type of HTML attributes is partly outside the scope of this standard, and its validation falls under third-party standards such as URIs, content types, or language codes.

Finally, some attributes are of Boolean type. These are the only attributes whose syntax can validly be minimized in HTML: the selected attribute of a form control can thus be shortened to selected, replacing the full form selected="selected". This particular form is one of the points differentiating HTML from the syntax of "well-formed" documents in the XML sense.
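As a small illustration, Python's standard HTMLParser reports the shortened form with the value None and the full form with the value "selected"; a consumer that only tests for the presence of the attribute name treats both the same way. The helper names below are assumptions made for this sketch.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Keeps the raw attribute list of every opening tag."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, attrs))


def is_selected(attrs):
    # A Boolean attribute counts as set whenever its name is present,
    # whatever its value (None for the shortened form).
    return any(name == "selected" for name, _ in attrs)


collector = AttributeCollector()
collector.feed(
    "<option selected>A</option>"
    '<option selected="selected">B</option>'
    "<option>C</option>"
)
collector.close()

for tag, attrs in collector.seen:
    print(tag, attrs, is_selected(attrs))
```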

Character set

Web pages can be written in a variety of languages and a large number of characters can be used, requiring either one character set per type of writing or one universal character set. When HTML appeared, the Unicode universal character set was not yet invented, and many character sets were used alongside each other, including ISO-8859-1 for the Latin and West European alphabet, Shift-JIS for Japanese, KOI8-R for Cyrillic. Today, the UTF-8 encoding of Unicode is the most widely used.

The HTTP communication protocol transmits the name of the character set. The HTML header may include a reminder of this character set, which should be the same unless there is a setting error. Finally, following an incorrect setting, the character set actually used may still differ from the announced set. These incorrect settings generally cause text display errors, especially for characters not covered by the ASCII standard.
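The kind of display error described above can be reproduced in a few lines. In this sketch, a text is encoded as UTF-8 but decoded as if it were ISO-8859-1, which is what happens when the announced character set does not match the one actually used; the variable names are illustrative.

```python
text = "o\u00f9"                       # the word "où"
raw = text.encode("utf-8")             # bytes actually sent: b'o\xc3\xb9'
garbled = raw.decode("iso-8859-1")     # reader wrongly assumes ISO-8859-1

print(raw)
print(garbled)

# Characters covered by ASCII survive the mismatch unchanged.
ascii_only = "hyperlink".encode("utf-8").decode("iso-8859-1")
print(ascii_only)
```

The single character "ù" is displayed as two unrelated characters, while the ASCII letters are unaffected.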

Escape technique

Prior to the generalization of Unicode, entities were defined to represent certain non-ASCII characters. It started with the characters of ISO 8859-1 in the HTML 2.0 standard. For diacritics, these entities follow a simple principle: the letter followed by the abbreviation of the associated diacritic.

ISO 8859-1 Diacritics

Character   Entity reference   Remark
Á           &Aacute;           acute for the acute accent
Â           &Acirc;            circ for the circumflex accent
À           &Agrave;           grave for the grave accent
Å           &Aring;            ring for the ring above
Ã           &Atilde;           tilde for the tilde
Ä           &Auml;             uml (Umlaut) for the diaeresis
Ç           &Ccedil;           cedil for the cedilla
Ø           &Oslash;           slash for the slash (stroke)
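For illustration, Python's standard html module knows these entity names, so the naming principle can be checked directly. The sample strings below are invented for this sketch.

```python
import html
from html.entities import name2codepoint

# Decode entity references back into characters.
decoded = html.unescape("&Agrave; la carte, fa&ccedil;ade, o&ugrave;")
print(decoded)

# Each entity name maps to a single code point (here U+00F9 for "ù").
code_point = name2codepoint["ugrave"]
print(hex(code_point), chr(code_point))

# The reverse operation escapes the characters that are special in markup.
escaped = html.escape("a < b & c")
print(escaped)
```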

HTML interoperability

As formalized by the W3C, HTML is designed to optimize the interoperability of documents. HTML is not used to describe the final rendering of web pages. In particular, unlike desktop publishing, HTML is not designed to specify the visual appearance of documents. Instead, HTML is designed to give meaning to the different parts of the text: title, list, important passage, quote, etc. HTML was developed with the intuition that devices of all kinds would be used to browse the web: personal computers with screens of varying resolution and color depth, cell phones, speech synthesis and recognition devices, computers with low and high bandwidth, and so on.

Because HTML is not tied to the final rendering of the document, the same HTML document can be viewed using a variety of hardware and software. At the hardware level, a document can in particular be displayed on a computer screen in graphic mode or a computer terminal in text mode, it can be printed, or it can be spoken by voice synthesis. At the software level, HTML does not make assumptions either, and several types of software read HTML: web browser, crawler, and various scripts (in Perl, PHP) for automatic processing.
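As an example of such non-visual processing, here is a minimal sketch of the link-collecting step that a crawler or script might perform. It uses Python rather than the Perl or PHP mentioned above, and the class name and sample markup are assumptions; the program never renders the page, it only reads the markup.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the targets of hyperlinks without rendering anything."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


extractor = LinkExtractor()
extractor.feed(
    '<p>See <a href="target.html">this page</a> '
    "and <A HREF=other.html>that one</A>.</p>"
)
extractor.close()
print(extractor.links)
```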

A high degree of interoperability helps lower costs for content providers because a single version of each document serves a wide variety of needs. For the web user, interoperability allows for the existence of many competing browsers, all capable of viewing the entire web.

Each version of HTML has tried to reflect the greatest consensus among industry players, so that the investments made by content providers are not wasted and their documents do not quickly become unreadable. The separation of content and form was not always respected during the development of the language, as evidenced for example by the text style markup, which makes it possible to indicate in particular the desired font for display, its size, or its color.
