HTML

What HTML actually is (and is not)

HTML is a document markup language, not a programming language. It describes what things are, not how they look or how they behave. Browsers parse HTML into a DOM tree that becomes the foundation for CSS styling, JavaScript behavior, accessibility, and security decisions. If the structure is wrong, everything built on top becomes fragile, so getting the HTML right pays off in every layer above it.

The document model (DOM)

When a browser loads HTML, it incrementally builds a Document Object Model (DOM).

  • Nodes represent elements, text, comments, etc.
  • Parent–child relationships define structure.
  • The DOM is live and mutable via JavaScript.
  • CSS selectors and JS APIs operate on this tree.
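
As a minimal sketch of the points above, the markup below parses into a small tree, and the inline script then changes that tree through the DOM API. The id value and the text content are made up for illustration.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>DOM sketch</title>
</head>
<body>
  <!-- Parsed tree (simplified): body > ul#tasks > li > text node -->
  <ul id="tasks">
    <li>Write markup</li>
  </ul>

  <script>
    // querySelector accepts the same selector syntax that CSS uses.
    const list = document.querySelector("#tasks");

    // The DOM is live: appending a node updates the rendered page.
    const item = document.createElement("li");
    item.textContent = "Inspect the tree";
    list.append(item); // ul#tasks now contains two li elements
  </script>
</body>
</html>
```

Opening the browser's element inspector after the page loads shows the second list item, even though it never appeared in the HTML source.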

Poor HTML leads to:

  • Unexpected layout behavior
  • Accessibility failures
  • Hard-to-debug JavaScript issues

Required document skeleton

Every HTML document follows this high-level structure:

<!DOCTYPE html>
<html lang="en">
<head>
<!-- metadata -->
</head>
<body>
<!-- visible content -->
</body>
</html>

<!DOCTYPE html>

Enables standards mode. Without it, browsers fall back to quirks mode (legacy layout behavior). Always include it as the first line of the document.

<html>

Root element of the document. The lang attribute is critical for accessibility and SEO: it tells screen readers and search engines which language the content is written in.

<head>

Contains metadata, not content. Common responsibilities:

  • Character encoding
  • Page title
  • CSS inclusion
  • Script loading hints
  • SEO and social metadata

<body>

Contains all visible, interactive content. This is where the application lives.

Core metadata tags (inside <head>)

<meta charset="utf-8">

  • Must appear early in <head> (within the first 1024 bytes of the document).
  • Prevents encoding bugs.

<title>

Browser tab title.

Used by search engines and bookmarks.

<meta name="viewport">

Essential for responsive layouts.

Controls how mobile browsers scale content.

<link>

Used for stylesheets, icons, preloads.

Declarative resource loading.

<script>

Loads JavaScript.

defer and async dramatically affect execution timing.
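
Put together, a typical metadata block might look like the sketch below. The title text and every asset path are hypothetical; only the element and attribute names are standard.

```html
<head>
  <!-- Encoding first, so the parser never has to guess it -->
  <meta charset="utf-8">

  <!-- Shown in the browser tab, bookmarks, and search results -->
  <title>Order history | Example Shop</title>

  <!-- Lets mobile browsers lay the page out at device width -->
  <meta name="viewport" content="width=device-width, initial-scale=1">

  <!-- Declarative resource loading (hypothetical paths) -->
  <link rel="stylesheet" href="/css/site.css">
  <link rel="icon" href="/favicon.ico">

  <!-- defer: download in parallel, run after parsing, in document order -->
  <script src="/js/app.js" defer></script>
</head>
```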

Structural content tags (the backbone)

These define meaningful sections, not layout.

<header>

Introductory content for a page or section.

Often contains navigation or headings.

<nav>

Major navigation blocks.

Helps screen readers and search engines.

<main>

Primary content of the document.

Should appear once per page.

<section>

Thematic grouping of content.

Should usually have a heading.

<article>

Self-contained, reusable content.

Examples: blog post, comment, card.

<aside>

Tangential or supplementary content.

Sidebars, callouts, related links.

<footer>

Footer for a page or section.

Metadata, links, legal text.

Semantic structure matters more than visual layout.
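
One way these sectioning elements can fit together is sketched below. The site name, link targets, and text are placeholders, and real pages will nest things differently depending on their content.

```html
<body>
  <header>
    <h1>Example Blog</h1>
    <nav aria-label="Primary">
      <ul>
        <li><a href="/">Home</a></li>
        <li><a href="/archive">Archive</a></li>
      </ul>
    </nav>
  </header>

  <!-- One main per page: the primary content -->
  <main>
    <article>
      <h2>Post title</h2>
      <section>
        <h3>Background</h3>
        <p>Thematic group inside the article.</p>
      </section>
      <!-- A footer can also belong to a section, not only to the page -->
      <footer><p>Posted by the author.</p></footer>
    </article>

    <aside>
      <h2>Related links</h2>
      <ul>
        <li><a href="/archive/older-post">An older post</a></li>
      </ul>
    </aside>
  </main>

  <footer>
    <p>Legal text and site-wide links.</p>
  </footer>
</body>
```

Nothing here says where the aside sits on screen; that decision belongs to CSS.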

Headings (critical and often misused)

Headings represent structure, not font size.

<h1> through <h6>

Define a logical outline.

<h1> is the top-level heading.

Do not skip levels arbitrarily.

Used heavily by screen readers.
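
A small illustration of a consistent outline, with invented heading text. The indentation is only there to make the hierarchy visible; it has no effect on parsing.

```html
<h1>HTML guide</h1>
  <h2>Document structure</h2>
    <h3>The head element</h3>
    <h3>The body element</h3>
  <h2>Forms</h2>
    <h3>Labels</h3>

<!-- Avoid: choosing a level for its default size.
     If a second-level heading should look smaller, keep the h2
     and change its size in CSS instead of switching to h4. -->
```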

Text-level semantics (meaning over style)

Prefer semantic tags over generic ones.

<p>

Paragraphs of text.

Automatic spacing and block behavior.

<strong> vs <b>

<strong> = importance

<b> = visual emphasis only

<em> vs <i>

<em> = emphasis (affects screen readers)

<i> = alternate voice or style

<span>

No semantic meaning.

Use only when nothing else fits.
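
The distinctions above might be applied as in this sketch. The sentences and the class name are illustrative only.

```html
<p>
  <!-- strong: the warning is important -->
  <strong>Warning:</strong> this action cannot be undone.
  <!-- em: stress that changes the meaning of the sentence -->
  You <em>must</em> export your data first.
</p>

<p>
  <!-- i: a technical term set off from the surrounding text -->
  Without a doctype the browser uses <i>quirks mode</i>.
  <!-- b: draws the eye without adding importance -->
  Press <b>Save</b> when you are done.
  <!-- span: no meaning of its own, just a styling hook -->
  Your total is <span class="price">$12.00</span>.
</p>
```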

Lists (structured repetition)

Lists convey structure to accessibility tools.

<ul>

Unordered list

<ol>

Ordered list

<li>

List item (must be a direct child of ul, ol, or menu)
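
A short sketch of both list types, with made-up items. The choice between them depends on whether reordering the items would change the meaning.

```html
<!-- Order does not matter -->
<ul>
  <li>Flour</li>
  <li>Eggs</li>
</ul>

<!-- Order matters -->
<ol>
  <li>Mix the ingredients.</li>
  <li>Bake for 30 minutes.</li>
</ol>
```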

<a>

Hyperlink element. href defines navigation target. Without href, it’s not a link.

Important notes:

  • Links are for navigation, not actions.
  • Buttons trigger actions; links navigate.
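
The link versus button distinction can be sketched as follows. The URL, the id, and the save() function are hypothetical.

```html
<!-- Navigation: takes the user to another URL -->
<a href="/settings">Account settings</a>

<!-- Action: changes something on the current page -->
<button type="button" id="save-button">Save changes</button>

<!-- Avoid: a link that goes nowhere and only runs script -->
<!-- <a href="#" onclick="save()">Save changes</a> -->
```

Assistive technologies announce the two differently, and users expect different keyboard behavior from each (Enter follows a link, while Enter or Space activates a button).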

Media elements

<img>

Embeds images.

alt is mandatory for accessibility. Use an empty alt="" only for purely decorative images.

Does not have a closing tag.

<video> / <audio>

Native media playback.

Support controls, captions, multiple sources.
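
A sketch of the media elements with hypothetical file paths. The width and height attributes on the image are an addition not discussed above; they let the browser reserve space before the file loads.

```html
<!-- Informative image: alt describes what it shows -->
<img src="/img/team.jpg" alt="Five engineers standing around a whiteboard"
     width="800" height="450">

<!-- Decorative image: empty alt tells screen readers to skip it -->
<img src="/img/divider.png" alt="">

<video controls width="640">
  <!-- The browser plays the first source it supports -->
  <source src="/media/intro.webm" type="video/webm">
  <source src="/media/intro.mp4" type="video/mp4">
  <!-- Captions in WebVTT format -->
  <track kind="captions" src="/media/intro.en.vtt" srclang="en" label="English">
  <!-- Fallback for browsers without video support -->
  <p><a href="/media/intro.mp4">Download the video</a></p>
</video>
```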

Forms (user input and data submission)

Forms define how users interact with data.

Form semantics affect:

  • Keyboard navigation
  • Validation
  • Screen reader behavior

<form>

Groups inputs and defines submission behavior.

<input>

Many types: text, email, checkbox, radio, password, etc.

Type determines validation and UI.

<textarea>

Multi-line text input.

<select> / <option>

Dropdown selection.

<label>

Associates text with an input. Crucial for accessibility.
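
A minimal form using the elements above. The action URL and field names are assumptions made for the example.

```html
<form action="/subscribe" method="post">
  <!-- for/id ties the label to its control -->
  <label for="email">Email address</label>
  <input id="email" name="email" type="email" autocomplete="email" required>

  <label for="plan">Plan</label>
  <select id="plan" name="plan">
    <option value="free">Free</option>
    <option value="pro">Pro</option>
  </select>

  <label for="notes">Notes</label>
  <textarea id="notes" name="notes" rows="3"></textarea>

  <!-- Wrapping the control in the label also associates them -->
  <label>
    <input type="checkbox" name="updates" value="yes">
    Send me product updates
  </label>

  <button type="submit">Subscribe</button>
</form>
```

With type="email" and required, the browser blocks submission of an empty or malformed address without any JavaScript, and clicking a label moves focus to its control.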

Tables (structured data only)

Misuse of tables for layout is a historical antipattern.

<table>

For tabular data, not layout.

Core elements:

  • <thead>, <tbody>, <tfoot>
  • <tr> rows
  • <th> header cells
  • <td> data cells
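
A small data table built from these elements, with illustrative figures. The caption element and the scope attribute are additions not listed above; they give the table a title and tell assistive technology which cells each header applies to.

```html
<table>
  <caption>Quarterly revenue (illustrative figures)</caption>
  <thead>
    <tr>
      <th scope="col">Quarter</th>
      <th scope="col">Revenue</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th scope="row">Q1</th>
      <td>$10,000</td>
    </tr>
    <tr>
      <th scope="row">Q2</th>
      <td>$12,000</td>
    </tr>
  </tbody>
  <tfoot>
    <tr>
      <th scope="row">Total</th>
      <td>$22,000</td>
    </tr>
  </tfoot>
</table>
```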

Script and style integration

<script>

Executes JavaScript.

  • defer → execute after parsing, before DOMContentLoaded.
  • async → execute as soon as loaded, order not guaranteed.

<style>

Embedded CSS written directly in the document (generally discouraged at scale).

Script placement and attributes directly affect performance and correctness.
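
The timing differences can be sketched side by side. All file names are hypothetical, and defer and async only apply to scripts loaded through src.

```html
<head>
  <!-- No attribute: parsing stops while this downloads and runs -->
  <script src="/js/legacy.js"></script>

  <!-- defer: both run after parsing, and vendor.js always runs before app.js -->
  <script src="/js/vendor.js" defer></script>
  <script src="/js/app.js" defer></script>

  <!-- async: runs whenever it arrives, independent of the others -->
  <script src="/js/analytics.js" async></script>

  <!-- Embedded CSS: fine for a few critical rules -->
  <style>
    body { margin: 0; font-family: system-ui, sans-serif; }
  </style>
</head>
```

A common rule of thumb is defer for code that touches the page and async for independent scripts such as analytics.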

Attributes (data and behavior)

Use attributes for declarative intent, not logic.

Attributes modify elements:

  • Global attributes: id, class, hidden, tabindex
  • ARIA attributes for accessibility
  • data-* for custom metadata
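
The three attribute groups might appear together as in this sketch. The ids, class names, and data values are invented for the example.

```html
<!-- ARIA: exposes the toggle state to assistive technology -->
<button type="button" aria-expanded="false" aria-controls="cart">Show cart</button>

<!-- Global attributes: id, class, hidden -->
<ul id="cart" class="cart-list" hidden>
  <!-- data-*: custom metadata that scripts can read -->
  <li class="cart-item" data-product-id="42" data-quantity="2">Notebook</li>
</ul>

<script>
  // data-product-id is exposed as dataset.productId; values are always strings.
  const item = document.querySelector(".cart-item");
  console.log(item.dataset.productId); // logs the string "42"
  console.log(item.dataset.quantity);  // logs the string "2"
</script>
```

The markup declares state and metadata; the logic that flips aria-expanded or removes hidden stays in JavaScript.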

HTML and accessibility

Correct HTML:

  • Enables keyboard navigation
  • Enables screen readers
  • Enables assistive technologies

Bad HTML:

  • Cannot be fully fixed with JavaScript
  • Breaks users silently

Accessibility starts with semantic HTML, not ARIA.
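
A common illustration of why semantics come before ARIA, using a hypothetical toggleMenu() handler:

```html
<!-- Fragile: role and tabindex make it focusable and announced as a button,
     but Enter and Space still do nothing until key handlers are added -->
<div role="button" tabindex="0" onclick="toggleMenu()">Menu</div>

<!-- Robust: focusable, keyboard-operable, and announced as a button by default -->
<button type="button" onclick="toggleMenu()">Menu</button>
```

The native element gets the right behavior for free; the ARIA version only works if every missing piece is rebuilt by hand.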

HTML error handling (browser reality)

HTML is forgiving by design.

Browsers:

  • Auto-close tags
  • Reorder invalid structures
  • Guess intent

This tolerance hides bugs. Invalid HTML leads to unpredictable DOMs.

Rule:

  • Write valid HTML even if browsers “fix” it.
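
Two well-known cases show how the parser's repairs can diverge from what was written. The second half of each pair is the tree the HTML parsing algorithm builds, written back out as markup.

```html
<!-- Written: a div inside a p (invalid nesting) -->
<p>Intro <div>Details</div> outro</p>

<!-- Built: the p is closed early, and the stray </p> creates an empty p -->
<p>Intro </p>
<div>Details</div>
 outro
<p></p>

<!-- Written: table rows without a tbody -->
<table>
  <tr><td>Cell</td></tr>
</table>

<!-- Built: a tbody is inserted automatically -->
<table>
  <tbody>
    <tr><td>Cell</td></tr>
  </tbody>
</table>
```

A selector such as p > div never matches in the first case, and table > tr never matches in the second, which is the kind of mismatch that makes CSS and JavaScript bugs hard to trace.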

Mental checklist

  • Does this element describe what it is?
  • Is this structure meaningful without CSS?
  • Would this work with a keyboard only?
  • Does the DOM match my mental model?