OpenMetadata

OpenMetadata is an open-source project that is driving open metadata standards for data. It unifies all metadata in a centralized metadata store and helps people discover, collaborate, and get their data right.

OpenMetadata 0.6.0 Release — Metadata Versioning, Events API, One-Click Ingestion, and more

Written by Suresh Srinivas, Sriharsha Chintalapani

We are super excited to announce the OpenMetadata 0.6 release, whose key new features are metadata versioning and eventing. Metadata versioning is a significant development for data governance and for debugging issues, and it promotes user collaboration. We look forward to building amazing features around versioning, eventing, and lineage in future releases.

Community Update

It’s great to see the rapid growth of our community in the past 30 days. Thanks to all of our contributors who made valuable contributions to make 0.6 a successful release.

Key Metrics

  1. 100+ new members joined our Slack in the last 30 days
  2. 300+ commits were merged into the 0.6 release
  3. 28 contributors provided features/improvements to the 0.6 release
  4. We hold bi-weekly community meetings for OpenMetadata. Please join our meetup and RSVP to our bi-weekly meetings.

We are thrilled to witness the continued growth of our OpenMetadata community. Come and join our community. Let us shape the future of metadata together!

New in the OpenMetadata 0.6.0 Release

Metadata Versioning

The metadata in an organization is in constant flux. Technical metadata such as schemas for tables, pipeline tasks, or dashboard charts change over time. Users interact with and change business metadata including tags, ownership, descriptions, and so on. This history of changes is in itself important metadata and is currently not tracked in most organizations.

Starting in 0.6, OpenMetadata captures these changes as new versions of an entity. Entity versioning serves as an organizational memory for data. OpenMetadata maintains the version history of every entity using a Major.Minor version number, starting with 0.1 as the initial version of an entity.

Changes in metadata will result in version changes as follows:

  • Backward compatible changes result in a Minor version change. A change in the description, tags, or ownership will increase the version of the entity metadata by 0.1 (e.g., from 0.1 to 0.2).
  • Backward incompatible changes result in a Major version change. For example, when a column in a table is deleted, the version increases by 1.0 (e.g., from 0.2 to 1.2).
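To make the arithmetic concrete, here is a minimal sketch of the Major.Minor rule above. It assumes versions are stored as decimal numbers; the function name next_version is hypothetical and not part of the OpenMetadata API.

```python
from decimal import Decimal

# Hypothetical helper illustrating the versioning rule described above.
# Decimal avoids binary float rounding (0.1 + 0.2 != 0.3 with floats).
MINOR_STEP = Decimal("0.1")  # backward compatible change
MAJOR_STEP = Decimal("1.0")  # backward incompatible change

def next_version(current: str, backward_compatible: bool) -> str:
    """Return the entity version that follows `current` after one change."""
    step = MINOR_STEP if backward_compatible else MAJOR_STEP
    return str(Decimal(current) + step)

v = "0.1"                                       # initial version of an entity
v = next_version(v, backward_compatible=True)   # description updated -> "0.2"
v = next_version(v, backward_compatible=False)  # column deleted      -> "1.2"
print(v)
```

Note that a major change adds 1.0 rather than resetting the minor part, which is why 0.2 becomes 1.2 instead of 1.0.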

Entity views in OpenMetadata provide a timeline visualization of all metadata changes from version to version. Every version records what changed from the previous version. We've also built Versions APIs that let developers list all versions of an entity and retrieve any specific version.
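A small Python client for the Versions APIs might look like the sketch below. The endpoint paths (/api/v1/{collection}/{id}/versions and /api/v1/{collection}/{id}/versions/{version}), the default port 8585, the lack of authentication, and the helper names are assumptions for illustration. Check the API documentation for your deployment before relying on them.

```python
import json
import urllib.request
from typing import Optional

# Assumption: a local OpenMetadata server on port 8585 with no auth.
BASE_URL = "http://localhost:8585/api/v1"

def versions_url(collection: str, entity_id: str, version: Optional[str] = None) -> str:
    """Build the URL for an entity's version list, or for one specific version."""
    url = f"{BASE_URL}/{collection}/{entity_id}/versions"
    if version is not None:
        url += f"/{version}"
    return url

def get_json(url: str) -> dict:
    """GET a URL and decode the JSON response body."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)

# Usage against a running server (the entity id is a placeholder):
# history = get_json(versions_url("tables", "<table-id>"))
# v02 = get_json(versions_url("tables", "<table-id>", "0.2"))
```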

Metadata versioning helps simplify the debugging process. Users can view the changes and instantly identify if a recent change led to the data issue. Data owners and admins can review the changes and revert if necessary.

Versioning also enables broader collaboration between consumers and producers of data. Previously, the onus was on the owner or admin to keep metadata current. With metadata versioning, admins can let more users in the organization change certain fields. Crowdsourcing makes keeping metadata current the collective responsibility of the entire organization, helping it continuously improve its data.

Events API

When the state of metadata changes, an event is produced that indicates which entity changed, who changed it, and how it changed. These events can be used to integrate metadata into other tools or to trigger actions. Followers of data assets can now be notified of the events that interest them: for example, alerts can be sent to followers and downstream consumers about table schema changes or backward-incompatible changes, such as a deleted column. These events can also be used to build powerful apps and automation that respond to changes as they happen. See issue-1138 for more details about this feature.
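As an illustration of how a consumer might turn events into follower alerts, here is a minimal sketch. The ChangeEvent fields are loosely modeled on the "which entity, who, how" description above, but their names, the follower lookup, and the rule that any deleted field is backward incompatible are all assumptions, not the actual event schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ChangeEvent:
    # Hypothetical event shape: which entity changed, who changed it, and how.
    entity_type: str
    entity_fqn: str
    user_name: str
    fields_deleted: List[str] = field(default_factory=list)
    fields_updated: List[str] = field(default_factory=list)

def is_backward_incompatible(event: ChangeEvent) -> bool:
    # Assumption: deleting a field (e.g., a column) breaks downstream consumers.
    return bool(event.fields_deleted)

def notifications(event: ChangeEvent, followers: Dict[str, List[str]]) -> List[str]:
    """Build one alert message per follower of the changed entity."""
    severity = "BREAKING" if is_backward_incompatible(event) else "info"
    what = ", ".join(event.fields_deleted or event.fields_updated) or "metadata"
    return [
        f"to {user}: [{severity}] {event.user_name} changed {what} "
        f"on {event.entity_type} {event.entity_fqn}"
        for user in followers.get(event.entity_fqn, [])
    ]

event = ChangeEvent("table", "mysql.shop.orders", "alice", fields_deleted=["columns.discount"])
for msg in notifications(event, {"mysql.shop.orders": ["bob", "carol"]}):
    print(msg)
```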

One-Click Deployment of Ingestion Pipelines

We all agree that manually deploying pipelines to fetch metadata is cumbersome. With the user experience in mind, OpenMetadata now provides a UI integration with Apache Airflow, used as the workflow engine to run ingestion, data profiling, data quality, and other automation jobs. Admins can configure a service to run the OpenMetadata pipelines and add an ingestion schedule that kicks off the ingestion jobs automatically, all right from the UI. This deploys a workflow onto the Airflow cluster and forms the basis for all future automation workflows.
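Conceptually, what the UI assembles for a scheduled ingestion is a workflow definition plus a schedule. The sketch below builds such a definition as a plain dictionary. The key names, the "metadata-rest" sink, the server endpoint, and the cron default are assumptions modeled on ingestion workflow configs, not the exact payload the UI sends.

```python
import json

def ingestion_workflow(service_name: str, service_type: str, host_port: str,
                       schedule: str = "0 */6 * * *") -> dict:
    """Assemble a hypothetical ingestion workflow definition for one service."""
    return {
        "name": f"{service_name}_metadata_ingestion",
        "schedule_interval": schedule,  # cron expression used by the scheduler
        "workflow_config": {
            # Where metadata is read from.
            "source": {
                "type": service_type,
                "config": {"service_name": service_name, "host_port": host_port},
            },
            # Where metadata is written to (assumed REST sink into OpenMetadata).
            "sink": {"type": "metadata-rest", "config": {}},
            "metadata_server": {
                "type": "metadata-server",
                "config": {"api_endpoint": "http://localhost:8585/api",
                           "auth_provider_type": "no-auth"},
            },
        },
    }

cfg = ingestion_workflow("local_mysql", "mysql", "localhost:3306")
print(json.dumps(cfg, indent=2))
```

Keeping the definition as plain data means the same structure can be stored, reviewed in the UI, and handed to Airflow to generate the actual DAG.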

New Entities: ML Models and Data Models

With the 0.6.0 release, adding new entity types has been simplified. Two new data assets have been added: ML Models and Data Models. ML Models are algorithms trained on data to find patterns or make predictions. Data modeling tools such as DBT are being adopted by many organizations. In the 0.6 release, we added support for bringing DBT data models into OpenMetadata so that users can see which models are used to generate their tables.
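DBT writes a manifest.json artifact (by default under target/) whose "nodes" describe each model, where it is materialized, and what it depends on. The sketch below shows one simplified way to map tables back to the models that produce them. The function name is hypothetical and this is not the OpenMetadata DBT connector's code; only a handful of manifest fields are read.

```python
from typing import Dict

def models_by_table(manifest: dict) -> Dict[str, dict]:
    """Map each table a DBT model materializes to the model and its upstream nodes."""
    result = {}
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, etc.
        # A model lands in <database>.<schema>.<alias>; alias falls back to the name.
        table = ".".join([node["database"], node["schema"], node.get("alias") or node["name"]])
        result[table] = {
            "model": unique_id,
            "upstream": node.get("depends_on", {}).get("nodes", []),
        }
    return result

# Tiny inline stand-in for json.load(open("target/manifest.json")).
manifest = {"nodes": {
    "model.shop.orders_daily": {
        "resource_type": "model", "database": "analytics", "schema": "marts",
        "name": "orders_daily", "alias": "orders_daily",
        "depends_on": {"nodes": ["model.shop.stg_orders"]},
    },
    "test.shop.not_null_orders_id": {
        "resource_type": "test", "database": "analytics", "schema": "marts",
        "name": "not_null_orders_id",
    },
}}
print(models_by_table(manifest))
```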

New Connectors

We are continually expanding our data source integrations. Here’s a comprehensive list of connectors. If a connector you would like to see in an upcoming release is not listed, please file a ticket here.

In the 0.6 release, we added support for:

  1. AWS Glue
  2. DBT
  3. MariaDB

User Interface

  • In the 0.6 release, the UI displays all metadata changes to an entity over time as its Version History. By clicking the Version button, users can view an entity’s changelog from the very beginning. The earliest version reflects the metadata pulled from third-party systems by ingestion bots. Changes made by users or data engineers are also tracked and displayed with a timestamp, providing a single-pane view of how the metadata has evolved over time.
  • The UI supports setting up one-click metadata ingestion workflows.
  • Improvements have been made in showing the entity node details for lineage.
  • Guided steps have been added for setting up Elasticsearch, which powers Search and Suggest.
  • The entity details, search results page (Explore), landing pages, and components have been redesigned for better project structure and code maintenance.

OpenMetadata Quickstart

In the 0.6 release, we ship a Python package to simplify the OpenMetadata Docker installation.

pip install "openmetadata-ingestion[docker]"
metadata docker --start

Helm Charts

Installing OpenMetadata on your cloud provider or on-premises just got easier. We worked on Helm charts, and here is the documentation on how to get OpenMetadata and its dependencies up and running on Kubernetes.

Other Features

  • Upgraded from JDBI 2 to JDBI 3, keeping OpenMetadata on a current, supported version of the library.
  • Python3 client to access OpenMetadata APIs. This will help integrate OpenMetadata’s rich APIs into your applications.

OpenMetadata 0.6 release demo

We presented the high-level features in our bi-weekly meeting. Please join our meetup and RSVP to our bi-weekly meetings.

Planned for 0.7.0 Release

Here are several features and improvements that are planned for our upcoming 0.7.0 release.

Support for User Collaboration

The next release is planned around user collaboration: users will be able to ask questions, suggest changes, provide feedback, and request new features for data assets. Feature requests and feedback are critical metadata to gather in order to understand user needs, and they are an important foundation for transforming an organization’s data culture to treat Data as a Product.

New Features for Lineage

Users will have the ability to add lineage information manually at table and column levels. Lineage will be used for tier propagation to upstream datasets. Currently, we are working on propagating column-level tags and descriptions using lineage.
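Tier propagation can be pictured as a walk up the lineage graph. The sketch below assumes Tier1 is the most important tier (lower number means more important) and that an upstream dataset should inherit the most important tier of anything that depends on it, directly or transitively. Both the rule and the function are hypothetical illustrations of the planned feature, not its implementation.

```python
from collections import deque
from typing import Dict, List

def propagate_tiers(upstream: Dict[str, List[str]], tiers: Dict[str, int]) -> Dict[str, int]:
    """Push tiers upstream along lineage edges (child -> list of parents)."""
    result = dict(tiers)
    queue = deque(tiers)
    while queue:
        node = queue.popleft()
        for parent in upstream.get(node, []):
            # Upgrade the parent only if the child is more important; tiers only
            # ever decrease, so the loop terminates even if lineage has cycles.
            if parent not in result or result[node] < result[parent]:
                result[parent] = result[node]
                queue.append(parent)
    return result

lineage = {
    "sales_report": ["orders_daily"],
    "orders_daily": ["stg_orders", "stg_customers"],
    "stg_orders": ["raw_orders"],
}
print(propagate_tiers(lineage, {"sales_report": 1, "stg_customers": 3, "audit_log": 4}))
```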

Create Tests and Deploy them to Monitor Data Quality

In the upcoming release, we will add support for defining tests on data profile metrics and allow users to create custom metrics in addition to the default ones. These tests can be deployed onto Airflow to monitor data quality continuously. When the tests find quality issues, notifications and alerts will be sent to the owners and followers of the data.
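A test on profile metrics boils down to comparing a computed metric against a threshold. Here is a minimal sketch under that assumption; the MetricTest structure, the metric names, and the profile layout (column, then metric, then value) are hypothetical and do not reflect the planned OpenMetadata test definitions.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "==": operator.eq}

@dataclass
class MetricTest:
    column: str
    metric: str       # e.g. "null_proportion", "unique_count"
    op: str           # one of the keys in OPS
    threshold: float

def run_tests(profile: Dict[str, Dict[str, float]], tests: List[MetricTest]) -> List[str]:
    """Return a human-readable failure for every test that does not pass."""
    failures = []
    for t in tests:
        value = profile.get(t.column, {}).get(t.metric)
        if value is None:
            failures.append(f"{t.column}.{t.metric}: metric missing from profile")
        elif not OPS[t.op](value, t.threshold):
            failures.append(f"{t.column}.{t.metric} = {value} (expected {t.op} {t.threshold})")
    return failures

profile = {"order_id": {"null_proportion": 0.0, "unique_count": 1000},
           "email": {"null_proportion": 0.12}}
tests = [MetricTest("order_id", "null_proportion", "==", 0.0),
         MetricTest("email", "null_proportion", "<", 0.05),
         MetricTest("email", "unique_count", ">", 0)]
for failure in run_tests(profile, tests):
    print(failure)  # these failures are what would be sent to owners and followers
```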

Other Features

The forthcoming release will integrate metadata change events with Slack. An integration framework will also be built for other services, such as Kafka and other notification systems.
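One straightforward way to forward a change event to Slack is an incoming webhook, which accepts a JSON body with a "text" field. The sketch below formats an event summary and posts it. The function names and message format are hypothetical, the webhook URL is a placeholder you would create in Slack, and this is not the planned OpenMetadata integration.

```python
import json
import urllib.request

def slack_payload(entity_fqn: str, user_name: str, change_summary: str) -> dict:
    """Format a change event as a Slack incoming-webhook message."""
    return {"text": f"*{entity_fqn}* was changed by {user_name}: {change_summary}"}

def post_to_slack(webhook_url: str, payload: dict) -> int:
    """POST the payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status

payload = slack_payload("mysql.shop.orders", "alice", "column discount deleted")
# post_to_slack("https://hooks.slack.com/services/<your-webhook-path>", payload)
```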

Thanks to our Contributors

We are excited to see new community members joining and contributing right away. If you are interested in contributing code, we have created good first issues to get you going. If you have any questions about the code, installation, or docs, please reach out to us on Slack. If you have feature requests, please file a GitHub issue or reach out to us on Slack.

We are thankful to the following community members for their feedback and code contributions:

  1. Pere Miquel Brull Borràs, for his continued contributions to OpenMetadata. Pere added high-level OpenMetadata APIs, added Python tooling, and improved MLModel implementation.
  2. Vijay Mariadassou, for contributing integration tests to improve the Python connector framework and fixing the developer documentation.
  3. Avi Greenwald, for fixing bugs, improving the Redash connector and adding Redash charts to OpenMetadata.
  4. Mithun Mathew, for adding policies to entities and his continued contributions to improve OpenMetadata.
  5. Rong Fengliang, for improving the Elasticsearch connector and its integration with OpenMetadata.
  6. Tom Vijlbrief, for his contributions towards adding Spatial Types from data sources and handling DOT notation in columns.
  7. Akash Jain, for adding helm charts to install OpenMetadata on Kubernetes.
  8. Reyhan Patria, dec0deit, and Abhi Khune for their first contributions.

We would love to hear your feedback on our workflow infrastructure, data sources, and dashboard services to prioritize OpenMetadata integrations.

Published in OpenMetadata

Written by Sriharsha Chintalapani

Builder at OpenMetadata. Apache Kafka, Storm PMC & committer. Previously at Uber, Hortonworks, Mozilla, Yahoo!.
