OpenMetadata 0.9.0 Release

Published in

OpenMetadata

Mar 10, 2022

OpenMetadata 0.9.0 Release: Data Collaboration via Activity Feeds and Conversation Threads, Data Quality and Tests, Glossary Support, New Connectors, and Lineage Support for many data sources.

Written by Vivek Subramanian, Suresh Srinivas, Sriharsha Chintalapani

The OpenMetadata community is very proud to bring three significant features in release 0.9: Conversation Threads in the Activity Feed, Data Quality, and Glossary. The data ecosystem suffers from a too-many-tools problem that fragments metadata and breaks user workflows, causing user frustration and tool fatigue. The 0.9 release lays the groundwork for packaging many related functionalities coherently on a strong foundation of Collaboration.

We are iteratively building many important features based on community feedback.

With Conversation Threads, users can track the data as it changes, comment on it, request changes, and ask questions without having to jump to another tool. Threaded discussions on data assets enable users to collaborate to improve the data within an organization.
For Data Quality, we previously shipped the profiler and standard metrics. In 0.9, we enable users to choose which tables should be profiled and define data quality tests at a table and column level toward higher data quality and reliability.
OpenMetadata now lets you build a Glossary, a Controlled Vocabulary that describes important concepts and terminology within your organization, to foster a common and consistent understanding of data.
We’ve continued to improve our Lineage and Security features in addition to supporting a dozen new connectors in this release.

OpenMetadata is making progress towards providing a rich, collaborative experience for data teams. We’ve published two blog posts in the last few weeks, which explain our thinking and the details behind some of the features we are building.

Please check out Building Access Control for OpenMetadata by Matt to understand the motivation and technical implementation details of the Role-Based Access Control feature implemented in the 0.8 release. Pere Miquel Brull recently posted ML is not just about ML, which talks about the evolving landscape of machine learning and how OpenMetadata can play an important role in the ML lifecycle.

OpenMetadata will be offered as a SaaS option soon. If you are looking for a hosted solution, please sign up here.

Community Update

We’ve had an amazing response from the community in the last seven months since we started the OpenMetadata open source project. We heartily thank our contributors, committers, and community at large, who’ve been generous with their feedback, ideas, and time.

Key Metrics

100+ new members joined our Slack in the last 30 days
320 commits were merged into the 0.9 release
52 contributors provided features and improvements to the 0.9 release
We have bi-weekly meetings scheduled for OpenMetadata. Please join our Meetup and RSVP to our meetings.
Please give us a GitHub star if you like what we are doing. It really helps OpenMetadata reach a wider audience.

Stay tuned for all the awesome upcoming features around metadata. Join the OpenMetadata community so we can build it together.

New in 0.9 OpenMetadata Release

Activity Feeds & Conversation Threads

Data Collaboration via Activity Feeds and Conversation Threads

One of the challenges to user collaboration is the tool disconnect in the data ecosystem. Users have to jump from one tool to another (Slack, email, Jira, etc.) to ask questions, suggest a change, or comment on the data. This context switch breaks user workflows, reduces productivity, and causes user frustration and tool fatigue. There is no way to track all the rich information from such user interactions in a single place as metadata.

As part of the 0.7 release, we introduced Activity Feeds to keep users informed about how data is changing in their organization, such as the creation of new data, updates to tags, descriptions, schema changes, etc. Activity Feeds now support Conversations. You can not only track the changes happening to the data, but you can also now actively provide comments, ask for additional changes, and collaborate with others to get the data right as the changes are happening. You can do this right there in the rich UI of OpenMetadata without breaking your workflow.

You can also create conversations on existing data in the data asset details page. A chat icon is displayed next to all the data entities that support conversation threads. If a data asset is missing a description, there is no need to create a ticket in an external system. Just click the chat icon and ask for a description. The owner of the data asset will be notified about the request. Similarly, create a conversation if you want to suggest a change or have questions about the data. Bring other experts into the conversation with @mentions so you can collaborate toward a solution. Refer to other data assets in the conversation with #mentions to bring additional context.

In OpenMetadata, conversation threads belong to a data asset, similar to a channel in Slack. But unlike tools like Slack, the conversation is not limited to just the channel members. Every conversation has a dynamic context of participants: it automatically includes the data asset, the owner and followers of that data asset, people who are mentioned in the posts, and the user who created the thread. All the conversations relevant to a user are tracked on the user's homepage as Activity Feeds. These include feeds related to the data entities that the user owns or follows, as well as conversations in which the user was mentioned or has participated. A new tab called Activity Feed has been introduced in the Data Entity pages, where all the activities related to that particular data asset are tracked and displayed in one place. We hope this will encourage teams to collaborate and share tribal knowledge in one central place.
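The idea of a dynamic participant context can be illustrated with a minimal Python sketch. The class and function names below (DataAsset, Thread, mentions, participants) are hypothetical and are not taken from the OpenMetadata codebase; the sketch only shows how participants might be derived from the current state of the asset and the posts, rather than from a fixed member list.

```python
from dataclasses import dataclass, field


@dataclass
class DataAsset:
    """Hypothetical stand-in for a data asset such as a table."""
    name: str
    owner: str
    followers: set[str] = field(default_factory=set)


@dataclass
class Thread:
    """Hypothetical conversation thread attached to one data asset."""
    about: DataAsset
    created_by: str
    posts: list[str] = field(default_factory=list)


def mentions(post: str) -> set[str]:
    """Collect @mentions from a post using a naive whitespace tokenizer."""
    names = (w[1:].rstrip(".,!?") for w in post.split() if w.startswith("@"))
    return {n for n in names if n}


def participants(thread: Thread) -> set[str]:
    """Derive participants from current state instead of a fixed member list."""
    people = {thread.about.owner, thread.created_by} | thread.about.followers
    for post in thread.posts:
        people |= mentions(post)
    return people


orders = DataAsset(name="sales.orders", owner="alice", followers={"bob"})
thread = Thread(
    about=orders,
    created_by="carol",
    posts=["Can we get a description? @dave might know."],
)
print(sorted(participants(thread)))  # ['alice', 'bob', 'carol', 'dave']
```

Because the set is recomputed from the asset and the posts, a new follower or a new @mention would become a participant without anyone editing a membership list.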

In future releases, users or owners will be able to flag certain threads and pin them on the entity page. They’ll also be able to filter by pinned threads. We’ll work on providing the ability to convert conversation threads into Tasks so that we can steer collaboration toward an action. We call such interactions Micro Workflows: create tasks, work on them, and finally resolve them.

Data Quality and Writing Tests

The Data Profiler and Data Quality features are improving with every release. Previously, the data profiler ran together with the metadata ingestion and hence delayed the actual ingestion of metadata. Because profiling is expensive and time-consuming, we have separated metadata ingestion and data profiling into two different jobs. The ingestion job extracts the metadata from sources and updates the entities’ instances, while the profiling job extracts the metrics from SQL sources and runs data quality tests.

By default, the profiling job runs on all the tables that we are fetching data for and collects various metrics. It can instead be configured for specific tables, so that profiling focuses on important data assets and costs stay under control. Users can follow the same configuration as in the ingestion workflow to filter out specific tables and schemas. Metadata ingestion can be scheduled to run more frequently, while the profiling and testing workflows run on their own schedule.
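As a rough illustration of table filtering, here is a minimal Python sketch. The function name should_profile and the semantics (regex patterns, excludes taking priority over includes, an empty include list meaning "everything") are assumptions made for this example; they are not the actual OpenMetadata workflow configuration.

```python
import re


def should_profile(table_name: str, includes: list[str], excludes: list[str]) -> bool:
    """Return True if the table passes the include/exclude regex filters.

    Assumed semantics for illustration: excludes win over includes, and an
    empty include list means "include everything".
    """
    if any(re.search(p, table_name) for p in excludes):
        return False
    return not includes or any(re.search(p, table_name) for p in includes)


tables = ["sales.orders", "sales.orders_tmp", "hr.salaries", "sales.customers"]
selected = [
    t for t in tables
    if should_profile(t, includes=[r"^sales\."], excludes=[r"_tmp$"])
]
print(selected)  # ['sales.orders', 'sales.customers']
```

Keeping the filter a pure function of the table name makes it easy to reuse the same patterns for both the ingestion job and the profiling job, even when the two run on different schedules.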

Data quality and tests play a significant role in making data assets trustworthy. OpenMetadata now supports metadata standards for defining tests, inspired by Great Expectations. Users can add tests on the rich metrics compiled by the data profiler.

All the test results are stored in a time-series fashion and can be retrieved using the API. The UI currently shows the last execution, and we will continue to iterate on the UX to make testing an approachable and integral part of the data lifecycle. Users can now discover and understand the data, and also see whether a particular data asset is reliable, all in a single place.
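The relationship between profiler metrics, tests, and time-series results can be sketched as follows. This is a simplified model under stated assumptions: QualityTest, QualityResult, run_tests, the metric names, and the in-memory history list are all hypothetical and do not reflect the OpenMetadata schema or API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


@dataclass
class QualityTest:
    """Hypothetical test definition evaluated against one profiler metric."""
    name: str
    metric: str
    predicate: Callable[[float], bool]


@dataclass
class QualityResult:
    """One test outcome, timestamped so results form a time series."""
    name: str
    executed_at: datetime
    observed: float
    passed: bool


def run_tests(profile: dict[str, float], tests: list[QualityTest],
              history: list[QualityResult], now: datetime) -> None:
    """Evaluate each test on the profile and append the result to the history."""
    for test in tests:
        observed = profile[test.metric]
        history.append(QualityResult(test.name, now, observed, test.predicate(observed)))


tests = [
    QualityTest("row_count_between_100_and_10000", "row_count",
                lambda v: 100 <= v <= 10_000),
    QualityTest("email_null_ratio_below_5_percent", "email.null_ratio",
                lambda v: v < 0.05),
]

history: list[QualityResult] = []
run_tests({"row_count": 2500, "email.null_ratio": 0.01}, tests, history,
          datetime(2022, 3, 1, tzinfo=timezone.utc))
run_tests({"row_count": 40, "email.null_ratio": 0.02}, tests, history,
          datetime(2022, 3, 2, tzinfo=timezone.utc))

# Keep only the most recent result per test, as a UI showing the last run might.
latest: dict[str, QualityResult] = {}
for result in sorted(history, key=lambda r: r.executed_at):
    latest[result.name] = result

for name, result in latest.items():
    print(name, "passed" if result.passed else "failed", result.observed)
```

Appending every run rather than overwriting the previous one is what makes it possible to show the last execution today and trends later.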

This release showcases the first iteration of the Data Quality feature. We’re already working on several improvements, such as custom metrics, increasing the number of tests, and alert notifications around the test cases. We are also working on Great Expectations integration to gather test and test results metadata and store them in OpenMetadata.

Glossary

Glossaries are a Controlled Vocabulary in an organization used to define the concepts and terminology specific to a particular domain. A glossary helps to give terms a consistent meaning and to define them in a central place, which establishes a common understanding and builds a knowledge base. Glossary terms can also help to organize or discover data entities. OpenMetadata models the Glossary as a Thesaurus, a Controlled Vocabulary that organizes terms with hierarchical, equivalent, and associative relationships.

A glossary is a hierarchical collection of Glossary Terms that belong to a domain.

A glossary term is specified with a preferred term for a concept or terminology, for example, Customer.
A glossary term must have a unique and clear definition to establish consistent usage and understanding of the term.
A term can include Synonyms, which are other terms used for the same concept, for example, Client, Shopper, Purchaser, etc.
A term can have child terms that further specialize it. For example, the glossary term Customer can have the child terms Loyal Customer, New Customer, Online Customer, etc.
A term can also have Related Terms to capture related concepts. For Customer, related terms could be Customer LTV (LifeTime Value), Customer Acquisition Cost, etc.

A Glossary Term has Assets, through which you can discover all the data assets related to the term. Each term has a life cycle status (e.g., Draft, Active, Deprecated, and Deleted). A term also has a set of Reviewers who review and accept changes to the Glossary for Governance.

The terms from the glossary can be used for labeling or tagging as additional metadata of data assets for describing and categorizing things. Glossaries are important for data discovery, retrieval, and exploration through conceptual terms and help in data governance.
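The term structure described above (preferred name, definition, synonyms, child terms, related terms, status, reviewers) can be sketched as a small Python data model. The names Status, GlossaryTerm, and find are illustrative assumptions, not the OpenMetadata schema, and the definitions are made up for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(Enum):
    DRAFT = "Draft"
    ACTIVE = "Active"
    DEPRECATED = "Deprecated"
    DELETED = "Deleted"


@dataclass
class GlossaryTerm:
    """Hypothetical glossary term with thesaurus-style relationships."""
    name: str                                                    # preferred term
    definition: str
    synonyms: list[str] = field(default_factory=list)            # equivalent
    children: list[GlossaryTerm] = field(default_factory=list)   # hierarchical
    related: list[str] = field(default_factory=list)             # associative
    reviewers: list[str] = field(default_factory=list)
    status: Status = Status.DRAFT


def find(term: GlossaryTerm, label: str) -> Optional[GlossaryTerm]:
    """Depth-first lookup by preferred name or synonym, case-insensitive."""
    if label.lower() in (n.lower() for n in [term.name, *term.synonyms]):
        return term
    for child in term.children:
        hit = find(child, label)
        if hit is not None:
            return hit
    return None


customer = GlossaryTerm(
    name="Customer",
    definition="A person or organization that has purchased a product.",
    synonyms=["Client", "Shopper", "Purchaser"],
    children=[
        GlossaryTerm("Loyal Customer", "A customer with repeat purchases."),
        GlossaryTerm("New Customer", "A customer with a first purchase."),
    ],
    related=["Customer LTV", "Customer Acquisition Cost"],
    reviewers=["alice"],
    status=Status.ACTIVE,
)

print(find(customer, "shopper").name)       # Customer
print(find(customer, "new customer").name)  # New Customer
```

Resolving synonyms to the preferred term is what lets a search for "Shopper" land on the same concept, and therefore the same data assets, as a search for "Customer".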

Connectors

New Connectors

Of the 47 connectors that are supported by OpenMetadata, 12 are new in the 0.9 release. The following connectors have been added to fetch metadata and ingest it into OpenMetadata:

Apache Atlas — We are making it easy to migrate your older catalogs into the OpenMetadata platform. Use our Atlas connector to migrate your metadata into OpenMetadata.
Apache Iceberg — OpenMetadata now supports the ingestion of Apache Iceberg tables as a tableType. Since Iceberg tables are tightly linked to a metastore like Glue or Hive, they are pulled in as part of the Hive or Glue ingestion and marked as external tables.
Azure SQL
ClickHouse
ClickHouse Usage
Databricks
Delta Lake
DynamoDB
IBM Db2
Power BI
MSSQL Usage — Previously, we supported the MSSQL connector. Now we support the Usage connector for MSSQL to ingest usage metadata.
SingleStore

Lineage Support

Earlier, lineage was mostly around Airflow. Now, lineage support has been added for several connectors to fetch the upstream and downstream entities from the queries. All the lineage will be published when you run the metadata connector itself.

Lineage for Snowflake is supported via Usage as well as View definitions. Lineage via View Definitions is supported using SQLAlchemy for MySQL, Athena, AzureSQL, BigQuery, ClickHouse, Databricks, IBM Db2, Druid, Hive, MariaDB, MSSQL, Oracle, Postgres, Presto, Redshift, SingleStore, Snowflake, Trino, and Vertica.
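To give a feel for how lineage can be derived from a view definition, here is a deliberately naive Python sketch. It uses a regular expression to pick up table names after FROM and JOIN; this is an assumption made for illustration and not how the OpenMetadata connectors are implemented. A real implementation would need a proper SQL parser to handle subqueries, CTEs, quoted identifiers, and dialect differences. The view and table names are invented.

```python
import re

# Matches an unquoted, optionally schema-qualified name after FROM or JOIN.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def upstream_tables(view_definition: str) -> list[str]:
    """Return table names referenced after FROM/JOIN, in first-seen order."""
    seen: list[str] = []
    for name in TABLE_REF.findall(view_definition):
        if name not in seen:
            seen.append(name)
    return seen


view_sql = """
CREATE VIEW sales.daily_revenue AS
SELECT o.order_date, SUM(o.amount) AS revenue
FROM sales.orders o
JOIN sales.customers c ON c.id = o.customer_id
GROUP BY o.order_date
"""

# Each edge points from an upstream table to the view built on top of it.
edges = [(source, "sales.daily_revenue") for source in upstream_tables(view_sql)]
print(edges)
# [('sales.orders', 'sales.daily_revenue'), ('sales.customers', 'sales.daily_revenue')]
```

The output is a list of (upstream, downstream) pairs, which is the shape a lineage graph needs regardless of how the table references were extracted.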

We also support lineage from the dashboard connectors like Tableau, Metabase, and Superset. We get the data sources and query metrics from the dashboards and connect them to the relevant data sources.

Updates to Existing Connectors

In this release, there have been changes and updates to some of the existing connectors.

The Amundsen connector has been updated.
For the BigQuery and BigQuery Usage connectors, Application Default Credentials (ADC) have been implemented to provide a better and more secure approach to ingest data, without the use of credential files.
The Tableau connector has been upgraded to support Personal access token name and Secret.

UI Improvements

The Table Details page now has a Queries tab, which displays the queries that run against a table.
The trendline on the UI displays the number of rows next to it, so users need not hover over the trendline to look for more details.
Users can update teams based on updateTeams permission.
Admins can now remove users from Teams.
Users can delete their recently searched terms.

Other Features

OpenMetadata now supports Azure SSO as a new security integration.
Improvements have been made to Single Sign-On authentication with Okta and Google SSO.
Previously, OpenMetadata supported native SSO providers such as Google and Okta, and other SSO providers based on Auth0. Now, we also accommodate the OAuthProxy handler, which authenticates the user and returns the user’s email address in HTTP headers to log in to OpenMetadata.
Owner support has been added for all services and databases to ensure that all services have an owner. This will help in collaborating with the service owner to get the required changes done, such as creating a new database or making configuration changes in a database service. A database can have an owner independent of the tables it contains.
Role-Based Access Control has been implemented in OpenMetadata to provide role-based permissions. The default role allocation for users has been simplified. The Permissions API has been integrated into the UI to get permissions for a logged-in user. Authorization checks have been added to the UI based on the logged-in user permissions.
Admins can choose a default role to assign to users during sign-up.
Dataset level and column level lineage can be added manually.
Now, the table entity page loads the data incrementally.

Thanks to our Contributors

Yet again, thank you to our amazing community of data citizens! We are immensely grateful for your code contributions, active participation, continued support, and all the feedback that we’ve been receiving from you. We are listening! Please keep it coming.

Thank you to the following community members for your feedback and contributions:

Igor Kramer for providing improvements to the Metabase and Tableau ingestion; and for all the apt feedback around Postgres, Vertica, ClickHouse, and more.
Tom Vijlbrief for starting the discussion around Glossary and for the initial code contributions.
Haithem Souala for suggesting a dictionary that has now been added as a Glossary.
Teddy for code contributions on the Apache Iceberg ingestion.
Matt for working on the APIs.
Thank you Adam Sadek, Aleksandr Diamond, Comet, Dan Andreescu, Ebu, Eliseev Alexander, Francesco Mucio, Julia Valenti, Kevin Tai, Lihan, Rubens Rodrigues, SeungwanJo, Shuai Wang, and Sidharth Pallerla for providing valuable feedback that made it into the 0.9 release.

Register for our next Webinar on How to configure metadata extraction, usage, and data quality in OpenMetadata on 23rd March, 2022 at 9:00 AM PST.

Please reach out to us on Slack if you have any questions about code, installation, and docs.

For feature requests, please file a GitHub issue or reach out to us on Slack. Interested in contributing code? Here are some good starting issues to get you going. Together, let’s aim for the best possible code quality by driving the issues flagged by SonarCloud to zero.

A huge THANK YOU for taking the time to explore and contribute to OpenMetadata. We look forward to your continued support and partnership with feedback, questions, and comments.