The history of the development of markup languages

Markup languages

A (text) markup language, in computer terminology, is a set of characters or sequences inserted into text to convey information about its presentation or structure. Markup languages belong to the class of computer languages. A text document written in a markup language contains not only the text itself (as a sequence of words and punctuation marks) but also additional information about its various parts, for example an indication of headings, emphasis, lists and so on. In more complex cases, a markup language also allows interactive elements and content from other documents to be inserted into a document.

Note that a markup language is, as a rule, not Turing complete and is not usually considered a programming language, although strictly speaking it is a computer language.

HTML (HyperText Markup Language) was developed by the British scientist Tim Berners-Lee around 1989-1991 at CERN, the European Organization for Nuclear Research, in Geneva (Switzerland). HTML was created as a language for exchanging scientific and technical documentation, suitable for use by people who are not layout specialists. HTML coped with the complexity of SGML by defining a small set of structural and semantic elements called descriptors, more often referred to as "tags". With HTML, it is easy to create a relatively simple yet well-designed document. Besides simplifying the structure of the document, HTML added support for hypertext. Multimedia features were added later.

HTML was originally conceived and created as a means of structuring and formatting documents without tying them to a particular means of reproduction (display). Ideally, text with HTML markup should be reproduced without stylistic or structural distortion on equipment of widely differing capabilities (the color screen of a modern computer, the monochrome screen of an organizer, the small screen of a mobile phone, or devices and programs that read text aloud). However, modern use of HTML is very far from this original purpose. For example, the <table> tag, widely used for page layout, was designed for creating ordinary tables in documents. Over time, the core idea of HTML's platform independence has been sacrificed to modern demands for multimedia and graphic design.

XML (eXtensible Markup Language; pronounced [ex-em-el]) is a markup language recommended by the World Wide Web Consortium (W3C). The XML specification describes XML documents and partially describes the behavior of XML processors (programs that read XML documents and provide access to their content). XML was designed as a language with a simple, formal syntax that is easy for programs to generate and process and, at the same time, easy for humans to read and write, with an emphasis on use on the web. The language is called extensible because it does not fix the markup used in documents: the developer is free to create markup to suit the needs of a particular domain and is limited only by the syntax rules of the language. The combination of a simple formal syntax, human-friendliness, extensibility and reliance on Unicode for representing document content has led to the widespread use of both XML itself and many specialized XML-based languages in a wide variety of software tools.
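To make the idea of extensibility concrete, here is a minimal sketch in Python using the standard xml.etree.ElementTree module. The vocabulary (library, book, title, year) is invented for this illustration and is not defined by XML itself; the only constraint on it is well-formedness.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: <library>, <book>, <title>, <year> are invented
# for this example; XML only requires that the document be well-formed.
DOCUMENT = """<?xml version="1.0" encoding="UTF-8"?>
<library>
  <book id="b1"><title>First title</title><year>1991</year></book>
  <book id="b2"><title>Second title</title><year>1998</year></book>
</library>"""


def list_books(xml_text):
    """Return (id, title, year) for every <book> element in the document."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [
        (book.get("id"), book.findtext("title"), int(book.findtext("year")))
        for book in root.iter("book")
    ]


books = list_books(DOCUMENT)
print(books)
```

The same generic parser reads any such vocabulary; only the element names passed to iter and findtext depend on the domain.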

XHTML (eXtensible HyperText Markup Language) is a family of XML-based web page markup languages that replicate and extend the capabilities of HTML 4. The XHTML 1.0 and XHTML 1.1 specifications are World Wide Web Consortium recommendations; however, development of XHTML has since been halted, with the recommendation to use HTML instead. No new versions of XHTML are being released.

The main difference between XHTML and HTML lies in how the document is processed. XHTML documents are processed by their own parser, in the same way as any XML document. During this processing, errors made by the author are not corrected.

XHTML conforms to the SGML specification, because XML is a subset of SGML. HTML, by contrast, has many processing peculiarities of its own and has in practice ceased to belong to the SGML family, a fact formalized in the draft HTML 5 specification.
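The difference in strictness can be sketched with Python's standard library. This is only an approximation of browser behavior: xml.etree.ElementTree stands in for an XML parser, and html.parser.HTMLParser is a tolerant tokenizer, not a browser's error-correcting HTML parser. The sample markup is invented for the illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Deliberately malformed: <b> and <i> overlap and <p> is never closed.
BROKEN = "<p>Some <b>bold <i>text</b></i>"


def parses_as_xml(markup):
    """Return True if the markup is well-formed XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Tolerant HTML tokenizer that simply records the start tags it sees."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


collector = TagCollector()
collector.feed(BROKEN)
collector.close()

xml_ok = parses_as_xml(BROKEN)       # the XML parser rejects the input
html_tags = collector.start_tags     # the HTML tokenizer carries on regardless
print(xml_ok, html_tags)
```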

The browser chooses the parser to process the document based on the content-type header received from the server:

HTML - text/html

XHTML - application/xhtml+xml

  • For local viewing on the client, the choice is based on the file extension.
  • Internet Explorer up to and including version 8 has no parser for XHTML documents.
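The selection rule described above can be sketched as a small Python function. The function name and the labels "html", "xml" and "unknown" are invented for this sketch and do not correspond to any real browser's internals.

```python
def choose_parser(content_type):
    """Map a Content-Type header value to the name of a parsing mode.

    The return values "html", "xml" and "unknown" are labels invented for
    this sketch, not names used by any real browser.
    """
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        return "html"
    if media_type == "application/xhtml+xml":
        return "xml"
    return "unknown"


print(choose_parser("text/html; charset=utf-8"))
print(choose_parser("application/xhtml+xml"))
```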

WML (Wireless Markup Language) is a document markup language for use in cell phones and other mobile devices according to the WAP standard.

Its structure resembles a somewhat simplified HTML, but there are key differences, since WML targets devices that lack the capabilities of personal computers (small screens, limited memory, and not every device can display graphics). All information in WML is contained in so-called "decks". A deck is the smallest unit of data that the server can transfer. Decks contain "cards" (each card is delimited by the <card> and </card> tags). A deck must always contain at least one card, but may contain several. Only one card is displayed on the device screen at a time, and the user switches between them by following links; this reduces the number of requests sent to the server. The size of a WML page should not exceed 1-4 kilobytes.
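As an illustration of the deck-and-card structure, the following Python sketch parses a small, made-up deck with two cards. The card ids and texts are invented, the document type declaration is omitted for brevity, and the 4 KB limit is simply the upper end of the range quoted above, not a value taken from the WAP specification.

```python
import xml.etree.ElementTree as ET

# A hypothetical deck with two cards; links of the form "#id" switch cards
# without another request to the server.
DECK = """<wml>
  <card id="home" title="Home">
    <p>Welcome. <a href="#news">News</a></p>
  </card>
  <card id="news" title="News">
    <p>No news today. <a href="#home">Back</a></p>
  </card>
</wml>"""

MAX_DECK_BYTES = 4 * 1024  # assumption: upper end of the 1-4 KB range above


def describe_deck(wml_text):
    """Return the card ids, the link targets and whether the deck is small enough."""
    root = ET.fromstring(wml_text)
    cards = [card.get("id") for card in root.findall("card")]
    links = [a.get("href") for a in root.iter("a")]
    size = len(wml_text.encode("utf-8"))
    return {"cards": cards, "links": links, "fits": size <= MAX_DECK_BYTES}


info = describe_deck(DECK)
print(info)
```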

VML (Vector Markup Language) was developed by Microsoft to describe vector graphics. VML was submitted to the W3C in 1998 by Microsoft, Macromedia, and others. Around the same time, Adobe, Sun, and several other companies submitted PGML for consideration. Both languages later became the basis for SVG.

PGML (Precision Graphics Markup Language) is an XML-based markup language for describing vector graphics on a web page (diagrams, individual interface elements) as text in XML format; it uses an imaging model similar to that of PDF and PostScript. It was submitted to the W3C by Adobe Systems, IBM, Netscape Communications and Sun Microsystems in 1998, but was not adopted as a recommendation. Almost simultaneously, Microsoft submitted its VML project for consideration, and a year later work began on a more advanced language, SVG, based on the ideas of both technologies. SVG became a W3C recommendation and the main format for describing vector graphics on a web page.

SVG (Scalable Vector Graphics) is a vector graphics markup language created by the World Wide Web Consortium (W3C). It is an application of XML and is designed to describe two-dimensional vector and mixed vector/bitmap graphics in XML format. It supports still, animated and interactive graphics, or, in other terms, declarative and scripted graphics. It does not support the description of three-dimensional objects. It is an open standard and a recommendation of the W3C, the organization behind standards such as HTML and XHTML. SVG is based on the VML and PGML markup languages and has been under development since 1999.
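Because SVG is ordinary XML, a generic XML parser can read it. The sketch below, in Python, parses a small invented drawing and lists the shapes it contains; the drawing itself is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"  # the SVG namespace

# A hypothetical drawing: a rectangle outline with a red circle inside it.
DRAWING = f"""<svg xmlns="{SVG_NS}" width="120" height="80">
  <rect x="10" y="10" width="100" height="60" fill="none" stroke="black"/>
  <circle cx="60" cy="40" r="20" fill="red"/>
</svg>"""


def shape_names(svg_text):
    """Return the local names of the top-level shapes in the drawing."""
    root = ET.fromstring(svg_text)
    # ElementTree reports namespaced tags as "{namespace}local-name".
    return [child.tag.replace("{" + SVG_NS + "}", "") for child in root]


shapes = shape_names(DRAWING)
print(shapes)
```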

XBRL (eXtensible Business Reporting Language) is an open standard for electronic financial reporting. The XBRL format is based on XML. XBRL uses XML syntax as well as XML-related technologies such as XML namespaces, XML Schema, XLink, and XPath. One of the purposes of XBRL is to represent and exchange financial information such as company financial statements. The XBRL specification is developed and published by XBRL International, Inc., an independent international organization.

To improve the visual presentation of the web, CSS has come into wide use; it allows uniform design styles to be set for many web pages. Another innovation worth noting is the URN (Uniform Resource Name) resource naming system.

A popular concept for the development of the World Wide Web is the creation of the Semantic Web. The Semantic Web is an add-on to the existing World Wide Web, designed to make the information posted on the network more understandable to computers. It is the concept of a network in which every resource written in human language is accompanied by a description that a computer can understand. The Semantic Web gives any application access to clearly structured information, regardless of platform and programming language. Programs will be able to find the resources they need, process information, classify data, identify logical relationships, draw conclusions, and even make decisions based on those conclusions. If widely adopted and well implemented, the Semantic Web has the potential to revolutionize the Internet. To create a machine-readable description of a resource, the Semantic Web uses RDF (Resource Description Framework), which can be written in XML syntax and uses URIs to identify resources. Newer developments in this area are RDFS (RDF Schema) and SPARQL (SPARQL Protocol and RDF Query Language), a query language for fast access to RDF data.

A markup language is a set of special instructions, called tags, designed to give documents a structure and to define the relationships between the elements of that structure. In other words, the markup shows which part of the document is the heading, which is the subheading, what should be treated as the author's name, and so on. Markup is divided into stylistic, structural and semantic markup.

Stylistic markup

Stylistic markup is responsible for the appearance of the document. In HTML, for example, this type of markup includes tags such as <i> (italic), <b> (bold), <u> (underline), <s> (strikethrough text), etc.

Structural markup

Structural markup defines the structure of a document. In HTML, this type of markup includes, for example, the tags <p> (paragraph), <h1> to <h6> (heading), <div> (section), etc.

Semantic markup

Semantic markup conveys information about the content of the data. Examples of this type of markup are the tags <title> (document title), <code> (code, used for code listings), <var> (variable) and <address> (author's address).

The basic concepts of any markup language are tags, elements, and attributes.

Tags and elements.

The terms "tag" and "element" are often confused.

Tags, or control descriptors as they are also called, are instructions that tell the client-side rendering program how to treat the tag's content. To set a tag apart from the main content of the document, angle brackets are used: a tag begins with a less-than sign (<) and ends with a greater-than sign (>), between which the name of the instruction and its parameters are placed. For example, in HTML the <i> tag indicates that the text following it should be set in italics.

An element is a pair of tags together with their content. The following construction is an example of an element:

<i>This text is in italics.</i>

The element consists of an opening tag (in our example, the <i> tag), the content of the tag (in the example, the text "This text is in italics.") and a closing tag (</i>), although in HTML the closing tag can sometimes be omitted.

Attributes

Attributes are used when defining an element to set parameters that refine its characteristics.

An attribute is a name="value" pair that can be specified in the start tag when the element is defined. Spaces may be left to the left and right of the equals sign. The attribute value is given as a string enclosed in single or double quotes.

Any tag can carry an attribute, provided that attribute is defined for it.

When an attribute is used, the element takes the following form:

<tag_name attribute="value">tag content</tag_name>

<p align="center">Text is center aligned</p>

One opening tag can contain several attributes, for example:

<font size="4" color="red">Text size and color specified</font>
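The distinction between tags, elements and attributes can be seen by feeding a fragment to a parser and logging what it reports. The following Python sketch uses the standard html.parser module; the sample fragment and the EventLogger class are made up for the illustration.

```python
from html.parser import HTMLParser

# Hypothetical sample: a <p> element with one attribute and a nested <i> element.
SAMPLE = '<p align="center">Text is <i>center</i> aligned</p>'


class EventLogger(HTMLParser):
    """Record start tags (with attributes), text and end tags in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        self.events.append(("start", tag, dict(attrs)))

    def handle_data(self, data):
        self.events.append(("text", data))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


logger = EventLogger()
logger.feed(SAMPLE)
logger.close()

for event in logger.events:
    print(event)
```

Everything between a "start" event and its matching "end" event, inclusive, makes up one element; the dictionary in the "start" event holds that element's attributes.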

The history of the development of markup languages.

The concept of hypertext was introduced by V. Bush in 1945, and from the 1960s onward the first applications using hypertext data began to appear. However, the technology received its main impetus when a real need arose for a mechanism that could combine a variety of information resources and make it possible to create and view non-linear text.

In 1986, ISO approved the Standard Generalized Markup Language (SGML). This language is intended for creating other markup languages: it defines the permitted set of tags, their attributes and the internal structure of the document. It is thus possible to create your own tags related to the content of the document. Clearly, such documents are difficult to interpret without a definition of the markup language, which is stored in a Document Type Definition (DTD). The DTD gathers all the rules of the language in accordance with the SGML standard. In other words, the DTD describes the relationships between tags and the rules for applying them. For each class of documents, its own set of rules is defined, describing the grammar of the corresponding markup language. Only with the help of the DTD can the correct use of tags be checked, and therefore it must be sent along with the SGML document or included in it.

At that time there were several other competing languages similar to SGML, but the popularity of HTML, one of its descendants, gave SGML an undeniable advantage over its counterparts.
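The role of a DTD can be illustrated with a deliberately simplified sketch in Python. It is not a real DTD or an SGML validator: the dictionary below only records which child elements each element may contain, and the vocabulary (memo, to, from, body) is hypothetical.

```python
import xml.etree.ElementTree as ET

# Toy stand-in for a DTD: for each element, the child elements it may contain.
# The vocabulary (memo, to, from, body) is hypothetical.
ALLOWED_CHILDREN = {
    "memo": {"to", "from", "body"},
    "to": set(),
    "from": set(),
    "body": set(),
}


def find_violations(element, rules):
    """Return a list of human-readable rule violations under `element`."""
    problems = []
    if element.tag not in rules:
        problems.append(f"unknown element <{element.tag}>")
        return problems
    for child in element:
        if child.tag not in rules[element.tag]:
            problems.append(f"<{child.tag}> not allowed inside <{element.tag}>")
        else:
            problems.extend(find_violations(child, rules))
    return problems


good = ET.fromstring("<memo><to>Ann</to><from>Bob</from><body>Hi</body></memo>")
bad = ET.fromstring("<memo><to>Ann</to><subject>Hi</subject></memo>")

print(find_violations(good, ALLOWED_CHILDREN))
print(find_violations(bad, ALLOWED_CHILDREN))
```

A real DTD also constrains the order and number of children and the attributes of each element; the sketch checks containment only.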

Using SGML, you can describe structured data, organize the information contained in documents, and present this information in some standardized format. But because of its complexity, SGML was used mainly to describe the syntax of other languages, and few applications dealt directly with SGML documents. SGML is usually used only in large projects, for example, to create a unified document management system for a large company.

The HTML markup language is much simpler and more convenient than SGML; its instructions are primarily intended to control how document content is displayed on the screen. HTML was created by Tim Berners-Lee in 1991 as a way to mark up technical documents, specifically for the scientific community. Initially, it was just one of the applications of SGML.

Despite the fact that the only thing that HTML can do is classify parts of a document and ensure that it displays correctly in a browser, it is the most popular markup language. This is because HTML is fairly easy to learn. All you have to do is learn the HTML commands. The DTD for HTML is stored in the browser. In addition, it should be noted that HTML is designed to work on a variety of platforms. But it has a number of significant limitations:

  1. HTML has a fixed set of tags, and this set cannot be extended or changed;
  2. HTML tags show only how the data should be presented, that is, the appearance of the document. HTML carries no information about the meaning of the content enclosed in the tags or about the structure of the document.

Any document has three components:

  • content;
  • structure;
  • style.

Usually the content of a document is not presented in arbitrary order but has a definite structure. The structure is the composition and sequence of the parts (blocks) of the document.

The style of a document defines the form in which its content is output to a particular device (for example, a printer or a display). The concept of style covers the font characteristics (name, size, color) of the whole document or of its individual blocks, the pagination, the arrangement of blocks on pages, and other parameters.

Document markup languages are artificial languages designed to describe the structure of a document and the relationships between the objects of that structure. Markup data is also called metadata.

The first markup language was GML. Its immediate successor was SGML, the Standard Generalized Markup Language, which defines the rules for writing document markup elements.

Document markup language requirements:

  1. The language must be human readable.
  2. Marked-up document files must be text files encoded in ASCII characters.
  3. The language can use links to both internal resources (in the same document) and external resources (in other documents).

SGML and similar languages use the following document markup tools:

  • document structure;
  • descriptors (tags), or elements, and their associated attributes;
  • entities;
  • comments.

SGML documents have a tree structure.

Descriptors (tags) in SGML are placed at the beginning (opening tag) and at the end (closing tag) of each element.

Attributes are simple symbolic constructs that are added to elements to refine the action of a descriptor.

Generalized markup languages like SGML allow the use of attributes with which up to 15 different types of values can be associated, including:

  • References to resources outside the document, usually called entities.
  • A unique identifier (ID) of an element in the document.
  • Identifier pointers (ID pointers), which provide cross-references to elements whose ID is mentioned in the document.
  • Element tags or attributes that define the tags in the element's content.
  • Character data (CDATA): any valid characters, taken literally rather than interpreted as markup.

Comments allow you to add information that will not be visible after the document is processed. Comments do not affect the speed of document processing and are not considered or processed as part of the content of an SGML document. They are simply included in the source text.

To check that a document conforms to the markup of a given type, special programs called parsers are used. Parsers are either standalone programs or part of an SGML document processing program. For the parser to be able to validate the document, a special document is created, called a document type definition (DTD).

HTML is an application of SGML for use on the Internet, with a fixed structure, a fixed set of elements (descriptors) and their attributes, and a fixed set of entities. XML (Extensible Markup Language) is a subset of SGML and is fully compatible with it.

The XML language provides a wide range of functionality that is not available in HTML.

4.3.2. Versions and extensions of HTML and XML

The first version of the hypertext markup language, HTML (HyperText Markup Language), like Web technology itself, was developed by Tim Berners-Lee in 1991. HTML is an application of the SGML rules to one document type, the HTML document. The language defines a fixed structure, a fixed set of tags and their attributes, and a fixed set of entities. Programs that process HTML documents are called Web browsers. The result of processing a document is a Web page displayed on the screen.

In 1994, the Internet Engineering Task Force (IETF) developed the HTML 2.0 specification, which marked the start of the widespread adoption of HTML on the Internet. In the same year the World Wide Web Consortium (W3C) was created, bringing together 165 commercial and academic organizations, developers and users; from its inception it was headed by Tim Berners-Lee. The HTML 4.01 specification was adopted by the consortium in December 1999.


The latest version of the XML language specification, XML 1.1, was adopted in April 2004.

On the basis of XML, the W3C developed XHTML (Extensible HTML) as the next stage in the development of HTML. The first version of this language, XHTML 1.0, was adopted in January 2000. This version is in effect a reformulation of HTML 4 as an application of XML 1.0. At the time, it was assumed that further development of HTML would proceed in accordance with the XHTML specifications.

A new version of XHTML, XHTML 1.1, was adopted by the W3C in May 2001. This recommendation defines a new document type, module-based XHTML. Each XHTML 1.1 module contains one or more HTML language elements and/or attributes.

According to the specification, XHTML 1.1 documents are built from the following groups of XHTML modules:

  • Core modules, which are required in any document type conforming to the XHTML specification (this group includes the Structure, Text, Hypertext and List modules).
  • Applet module, containing a single element, <applet> (this element is deprecated; the <object> element is recommended instead).
  • Text extension modules, which define additional text markup (this group includes the Presentation, Edit and Bi-directional Text modules).
  • Forms modules (this group includes the Basic Forms and Forms modules).
  • Table modules (this group includes the Basic Tables and Tables modules).
  • Image module, which provides basic image embedding (this module can also be used on its own in some implementations of client-side image maps).
  • Client-side Image Map module, which provides elements for client-side image maps (this module requires the Image module).
  • Object module, which provides support for including general-purpose objects.
  • Frames module, which provides frame-related elements.
  • Base module, which provides an element defining the base URL (the document's relative URLs are resolved using this element).
  • Name Identification module, used to identify specific elements in HTML documents.
  • Legacy module, which defines elements and attributes that were already deprecated in previous versions of HTML and XHTML.

4.3.3. Types of HTML and XHTML structures

According to the HTML 4.01 specification, three structures are defined for HTML documents, described by three DTDs. Web page developers must include one of the three document type declarations in their documents. The DTDs differ in the elements they support. The DTD declaration must be placed at the very beginning of the document.

The HTML 4.01 Strict DTD (strict definition) includes all elements and attributes that are not deprecated and that are not used in frameset documents.

The HTML 4.01 Transitional DTD (transitional definition) includes everything in the strict DTD, as well as deprecated elements and attributes.

The HTML 4.01 Frameset DTD (definition for frames) includes, in addition to the elements of the transitional DTD, the frame elements.
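As an illustration, the Python sketch below reads the document type declaration at the top of a document and reports which of the three DTDs it names. The public identifiers are those given in the HTML 4.01 specification; the DoctypeReader class and the dtd_variant function are invented for this example.

```python
from html.parser import HTMLParser

# Public identifiers of the three DTDs, as given in the HTML 4.01 specification.
PUBLIC_IDS = {
    "-//W3C//DTD HTML 4.01//EN": "strict",
    "-//W3C//DTD HTML 4.01 Transitional//EN": "transitional",
    "-//W3C//DTD HTML 4.01 Frameset//EN": "frameset",
}


class DoctypeReader(HTMLParser):
    """Remember which known DTD, if any, the document declares."""

    def __init__(self):
        super().__init__()
        self.variant = None

    def handle_decl(self, decl):
        # decl is the text between "<!" and ">" of the declaration.
        for public_id, name in PUBLIC_IDS.items():
            if f'"{public_id}"' in decl:
                self.variant = name


def dtd_variant(document):
    """Return "strict", "transitional", "frameset" or None."""
    reader = DoctypeReader()
    reader.feed(document)
    reader.close()
    return reader.variant


page = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" '
    '"http://www.w3.org/TR/html4/loose.dtd">'
    "<html><head><title>Example</title></head><body></body></html>"
)
print(dtd_variant(page))
```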

The first line of an HTML document written in accordance with the XHTML specification is the XML declaration:

<?xml version="1.0" encoding="UTF-8"?>

This line specifies the XML version in use and the character encoding of the document. XML uses the Unicode character set. The most common value of the encoding parameter is UTF-8, in which the first 128 characters are represented by a single byte, the characters of the most widespread alphabets (including Russian and Ukrainian) by two bytes, and the remaining characters by three or four bytes. In UTF-16, most characters are represented by two bytes and the rest by four (this encoding is sometimes recommended for Russian and Ukrainian HTML documents).
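The byte counts for different characters can be checked directly. The following is a small Python sketch; the four sample characters are chosen for illustration, and "utf-16-le" is used instead of "utf-16" only so that Python does not add a byte-order mark to the count.

```python
def encoded_lengths(ch):
    """Return the number of bytes `ch` occupies in UTF-8 and in UTF-16."""
    # "utf-16-le" is used so that no byte-order mark is added to the count.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


SAMPLES = [
    "A",           # Latin letter, one of the first 128 characters
    "\u042f",      # Cyrillic capital letter Ya
    "\u20ac",      # euro sign
    "\U0001f600",  # an emoji outside the Basic Multilingual Plane
]

for ch in SAMPLES:
    print(repr(ch), encoded_lengths(ch))
```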

HTML markup language

Today there are many technologies for creating Web pages that a Webmaster cannot do without. But the foundation for developing Web documents is, of course, the HTML hypertext markup language.

HTML is first and foremost a markup language, and code written in it is interpreted on the client's computer by a Web browser. This accounts for its relative simplicity and ease of learning.

Why do you need a markup language?

When you create a regular document in a word processor program, you can easily format the document, such as setting characters to italic or bold, giving a paragraph a heading or plain text style, and so on. What you do with the document on the monitor screen is transferred to paper in the same form when printed on the printer.

Whether you select an option from the drop-down menus or give a key command, you immediately see the result of your efforts on the screen. However, the specific commands that implement the display of the document on the screen or on paper will be hidden from you.

In the case of Web pages, the user deals not with paper but with electronic documents received over the Internet. Displaying each document with the formatting tools of the application that created it is not acceptable here: users would need too many applications, or all kinds of converters, on their computers in order to work effectively with the many possible document formats.

The solution to the problem of exchanging documents between different computers and applications over the Internet is based on the HyperText Markup Language (HTML). The language was created in the early 1990s as a document format standard and was adopted by the vast majority of Internet users and, most importantly, by all manufacturers of software and hardware for the Web. Documents marked up in HTML can be read on any computer that has a single viewer for such documents, a browser.

Thanks to the HTML markup language, a Web client can view a document on the screen of his computer in the form in which the developer intended it: with certain font sizes and paragraph breaks, with a certain arrangement of pictures, hyperlinks, and so on.

A text document written in HTML has a size in bytes several times smaller than the size of a similar document prepared in a word processor (for example, Word).

Berners-Lee based the language he developed on SGML and on methods of working with hypertext, which explains the name he gave it: HTML. The new language used basic SGML constructs to describe documents and hypertext links.

Hypertext is a way of organizing text, graphics, and other data in which the data elements are linked to one another. Both elements of a single document and elements of different documents can be linked. The hypertext structure is at the heart of the World Wide Web.

Hypertexts are electronic documents. You can work with hypertexts only on a computer; in printed form, hypertexts do not exist. An example of a hypertext system is the well-known Windows help system.

Connections in the hypertext structure are carried out using links. Thanks to links, the user can call another document from one document, the next document from it, and so on.

In 1989, Berners-Lee proposed an information system resembling a web of documents connected by links. The documents are stored on servers located all over the world and interconnected by Internet channels. He developed the HTTP protocol, the language in which servers exchange hypertext documents, and wrote the first Web server and browser programs. He approached the Internet community directly, and in 1991 enthusiasts began to create the first Web sites.

Over the years, the World Wide Web has grown rapidly to become the most popular service on the Internet. It currently satisfies the information needs of the widest range of users, including millions of Web sites. Large sites host thousands and hundreds of thousands of documents, and the total number of documents on the WWW is increasing every second, since a huge army of specialists and amateurs in different parts of the globe is working on their creation.

The World Wide Web, or the Web for short, is a global hypertext system for disseminating information that uses the Internet as its transport channel.

In effect, the World Wide Web is a hypertext space of documents that is unrelated to the geography of the Web sites themselves. In this space, therefore, the physical distance between nodes has no meaning. Web pages are viewed on the screen in exactly the same way whether they are stored on a computer disk in the next room or on a server in another country.

The World Wide Web operates according to standards developed and put into practice by an association of research and industrial organizations, the W3C consortium (short for World Wide Web Consortium).

The HTML markup language was based on SGML. The means of marking up paragraphs, headings, lists and other elements available in HTML were already provided in SGML. The merit of the inventor of HTML is that he introduced into the markup language something SGML did not have: hypertext links.
