AWS Glue Data Catalog

AWS Glue Data Catalog - Persistent metadata store

  • It is a managed service that lets you store, annotate, and share metadata, which can then be used to query and transform data.
  • One AWS Glue Data Catalog per AWS account per Region.
  • IAM policies control access.
  • Can be used for data governance.
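
To make the access-control and per-Region points concrete, here is a minimal sketch of a read-only IAM policy document built as a Python dictionary. The helper name read_only_catalog_policy, the Region, the account ID, and the database name sales_db are all hypothetical placeholders; the action names and ARN shapes follow the AWS Glue resource format, in which the Region and account ID identify which catalog is meant.

```python
import json


def read_only_catalog_policy(region, account_id, database, table="*"):
    """Build an IAM policy document granting read access to one database's tables.

    Hypothetical helper for illustration; it only builds a dict and calls no AWS API.
    """
    prefix = f"arn:aws:glue:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "glue:GetDatabase",
                    "glue:GetTable",
                    "glue:GetTables",
                    "glue:GetPartitions",
                ],
                # Reading a table requires access to the catalog, the database,
                # and the table resources.
                "Resource": [
                    f"{prefix}:catalog",
                    f"{prefix}:database/{database}",
                    f"{prefix}:table/{database}/{table}",
                ],
            }
        ],
    }


# Placeholder Region, account ID, and database name.
policy = read_only_catalog_policy("us-east-1", "123456789012", "sales_db")
print(json.dumps(policy, indent=2))
```

Because the Region is part of every ARN, a policy like this grants access only to the catalog in that one Region.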

Key Metadata Components in AWS Glue

  • Databases: Logical groupings of tables.
  • Tables: Metadata definitions that describe the data (the data itself stays in its source store, such as S3). Each table has:
    • Name: Table identifier
    • Schema: List of columns and their data types
    • Location: Where the data is physically stored (like an S3 bucket path)
    • Partitions: Subdivisions of tables for optimization
  • Classifiers: Used by crawlers to recognize the data format and infer the schema; custom classifiers define patterns for formats the built-in ones do not cover
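
The components listed above can be sketched as a table definition in the shape that the Glue CreateTable API accepts as TableInput. This is an illustrative sketch only: the helper build_table_input, the table name orders, the columns, and the bucket path are hypothetical, and the definition leaves out the input/output format and SerDe settings that a query engine would normally need.

```python
def build_table_input(name, columns, location, partition_keys=(), classification="parquet"):
    """Assemble a Glue-style table definition from its metadata components.

    Hypothetical helper for illustration; it builds a dict and calls no AWS API.
    """
    return {
        "Name": name,  # table identifier
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {"classification": classification},  # data format label
        "StorageDescriptor": {
            # Schema: list of columns and their data types
            "Columns": [{"Name": n, "Type": t} for n, t in columns],
            # Location: where the data is physically stored
            "Location": location,
        },
        # Partitions: columns that subdivide the table
        "PartitionKeys": [{"Name": n, "Type": t} for n, t in partition_keys],
    }


table_input = build_table_input(
    name="orders",
    columns=[("order_id", "bigint"), ("amount", "double")],
    location="s3://example-bucket/orders/",  # placeholder path
    partition_keys=[("order_date", "string")],
)
print(table_input["StorageDescriptor"]["Columns"])

# With boto3 installed and credentials configured, the definition could be
# registered in a database (assumed here to be named "sales_db") like this:
#   boto3.client("glue").create_table(DatabaseName="sales_db", TableInput=table_input)
```

Note that nothing here touches the data files themselves; the catalog only records where they are and how they are structured.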

So, what is Metadata?
Simply put, metadata is information that describes other data. It’s like a label or summary attached to a file, object, or dataset.
Metadata is not the content itself; it describes characteristics of the data, such as its structure, origin, and size.

Examples:

  • Document: Title, author, creation date, file size, keywords, word count
  • Image: Camera model, capture date and time, location (GPS), image dimensions, resolution, color information
  • Email: Sender, recipient, subject line, date and time sent, attachment details
  • Website: Keywords, description, author, last updated, page language
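
As a small illustration of the distinction between data and metadata, the sketch below writes a temporary file and then reads its metadata from the file system without opening the content again. The file content and the keys of the metadata dictionary are arbitrary choices made for this example.

```python
import os
import tempfile
from datetime import datetime, timezone

content = b"hello, metadata"  # the data itself

# Write the data to a temporary file.
with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as f:
    f.write(content)
    path = f.name

# Read the metadata: facts about the file, not its content.
info = os.stat(path)
metadata = {
    "file_name": os.path.basename(path),
    "size_bytes": info.st_size,
    "modified_utc": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
}
print(metadata)

os.remove(path)  # clean up the temporary file
```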