No More Tableau Downtime: Metadata API for Proactive Data Health


In today’s world, the reliability of data solutions is everything. When we build dashboards and reports, one expects that the numbers reflected there are correct and up-to-date. Based on these numbers, insights are drawn and actions are taken. If, for any unforeseen reason, the dashboards are broken or the numbers are incorrect, it becomes a fire-fight to fix everything. If the issues are not fixed in time, it damages the trust placed in the data team and their solutions.

But why would dashboards be broken or show incorrect numbers? If the dashboard was built correctly the first time, then 99% of the time the issue comes from the data that feeds the dashboards: the data warehouse. Some possible scenarios are:

  • A few ETL pipelines failed, so the new data is not yet in
  • A table was replaced with a different new one
  • Some columns in the table were dropped or renamed
  • Schemas in the data warehouse have changed
  • And many more.

There is still a chance that the issue is on the Tableau site, but in my experience, most of the time it is due to some change in the data warehouse. Even though we know the root cause, it’s not always straightforward to start working on a fix. There is no central place where you can check which Tableau data sources rely on specific tables. If you have the Tableau Data Management add-on, it could help, but from what I know, it’s hard to trace the lineage of custom SQL queries used in data sources.

Nevertheless, the add-on is too expensive and most companies don’t have it. The real pain begins when you have to go through all the data sources manually to start fixing them. On top of that, you have a stream of users impatiently waiting for a quick fix. The fix itself might not be difficult; it would just be time-consuming.

What if we could anticipate these issues and identify impacted data sources before anyone notices a problem? Wouldn’t that just be great? Well, there is a way now, with the Tableau Metadata API. The Metadata API uses GraphQL, a query language for APIs that returns only the data that you’re interested in. For more info on what’s possible with GraphQL, do check out GraphQL.org.
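To give a flavour of GraphQL before we dive in: you describe the exact shape of the data you want, and the API responds in that shape and nothing more. For instance, a minimal query asking the Metadata API for nothing but the names of the workbooks on the site would look roughly like this (a sketch based on the documented schema):

{
  workbooks {
    name
  }
}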

In this blog post, I’ll show you how to connect to the Tableau Metadata API using Python’s Tableau Server Client (TSC) library to proactively identify data sources that use specific tables, so that you can act fast before any issues arise. Once you know which Tableau data sources are affected by a specific table, you can make the updates yourself or alert the owners of those data sources about the upcoming changes so they can be prepared.

Connecting to the Tableau Metadata API

Let’s connect to the Tableau Server using TSC. We need to import all the libraries we will need for this exercise!

### Import all required libraries
import tableauserverclient as t
import pandas as pd
import json
import ast
import re

In order to connect to the Metadata API, you will have to first create a personal access token in your Tableau account settings. Then update <API_TOKEN_NAME> & <TOKEN_KEY> with the token you just created. Also update <YOUR_SITE> with your Tableau site. If the connection is established successfully, then “Connected” will be printed in the output window.

### Connect to Tableau server using personal access token
tableau_auth = t.PersonalAccessTokenAuth("<API_TOKEN_NAME>", "<TOKEN_KEY>",
                                         site_id="<YOUR_SITE>")
server = t.Server("https://dub01.online.tableau.com/", use_server_version=True)

with server.auth.sign_in(tableau_auth):
    print("Connected")

Let’s now get a list of all the data sources that are published on your site. There are many attributes you can fetch, but for the current use case, let’s keep it simple and only get the id, name, and owner contact information for each data source. This will be our master list, to which we will add all other information.

############### Get all the list of data sources on your Site
all_datasources_query = """ {
  publishedDatasources {
    name
    id
    owner {
      name
      email
    }
  }
}"""
with server.auth.sign_in(tableau_auth):
    result = server.metadata.query(all_datasources_query)

Since I want this blog to be focused on how to proactively identify which data sources are affected by a specific table, I won’t be going into the nuances of the Metadata API. To better understand how the query works, you can refer to Tableau’s own very detailed Metadata API documentation.

One thing to note is that the Metadata API returns data in JSON format. Depending on what you are querying, you’ll end up with multiple nested JSON lists, and it can get very tricky to convert these into a pandas dataframe. For the above metadata query, you will end up with a result like the below (this is mock data, just to give you an idea of what the output looks like):

{ "data": { "publishedDatasources": [ { "name": "Sales Performance DataSource", "id": "f3b1a2c4-1234-5678-9abc-1234567890ab", "owner": { "name": "Alice Johnson", "email": "[email protected]" } }, { "name": "Customer Orders DataSource", "id": "a4d2b3c5-2345-6789-abcd-2345678901bc", "owner": { "name": "Bob Smith", "email": "[email protected]" } }, { "name": "Product Returns and Profitability", "id": "c5e3d4f6-3456-789a-bcde-3456789012cd", "owner": { "name": "Alice Johnson", "email": "[email protected]" } }, { "name": "Customer Segmentation Analysis", "id": "d6f4e5a7-4567-89ab-cdef-4567890123de", "owner": { "name": "Charlie Lee", "email": "[email protected]" } }, { "name": "Regional Sales Trends (Custom SQL)", "id": "e7a5f6b8-5678-9abc-def0-5678901234ef", "owner": { "name": "Bob Smith", "email": "[email protected]" } } ] } }

We need to convert this JSON response into a dataframe so that it’s easy to work with. Notice that we need to extract the name and email of the owner from inside the owner object.

### We need to convert the response into dataframe for easy data manipulation
col_names = result['data']['publishedDatasources'][0].keys()
master_df = pd.DataFrame(columns=col_names)

for i in result['data']['publishedDatasources']:
    tmp_dt = {k: v for k, v in i.items()}
    master_df = pd.concat([master_df, pd.DataFrame.from_dict(tmp_dt, orient='index').T])

# Extract the owner name and email from the owner object
master_df['owner_name'] = master_df['owner'].apply(lambda x: x.get('name') if isinstance(x, dict) else None)
master_df['owner_email'] = master_df['owner'].apply(lambda x: x.get('email') if isinstance(x, dict) else None)

master_df.reset_index(inplace=True)
master_df.drop(['index', 'owner'], axis=1, inplace=True)
print('There are ', master_df.shape[0], ' datasources in your site')
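As a side note, pandas’ json_normalize can flatten the nested owner object in a single call; a minimal sketch of the same conversion (equivalent to the loop above, up to column order):

# Alternative: json_normalize flattens nested dicts into dotted column names
master_df_alt = pd.json_normalize(result['data']['publishedDatasources'])
# Produces columns: name, id, owner.name, owner.email
master_df_alt = master_df_alt.rename(columns={'owner.name': 'owner_name',
                                              'owner.email': 'owner_email'})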

This is how the structure of master_df would look:

Sample output of code

Once we have the master list ready, we can go ahead and start getting the names of the tables embedded in the data sources. If you are an avid Tableau user, you know that there are two ways of selecting tables in a Tableau data source: one is to directly choose the tables and establish a relationship between them, and the other is to use a custom SQL query with one or more tables to produce a new resulting table. Therefore, we need to address both cases.

Processing of Custom SQL query tables

Below is the query to get the list of all custom SQLs used on the site, along with their data sources. Notice that I have filtered the list to get only the first 500 custom SQL queries. In case there are more in your org, you will have to use an offset to get the next set of custom SQL queries. There is also the option of using the cursor method for pagination when you want to fetch a large list of results (refer here). For the sake of simplicity, I just use the offset method, as I know there are fewer than 500 custom SQL queries used on the site.
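For reference, if your site does have more custom SQL queries than one page can return, a cursor-based loop along the following lines could replace the single query below. This is only a sketch: it assumes the connection exposes the standard pageInfo block described in Tableau’s pagination docs, and it must be called inside the sign-in context.

# Sketch of cursor-based pagination (call inside "with server.auth.sign_in(...)")
def fetch_all_custom_sql_nodes(server):
    nodes, cursor = [], None
    while True:
        # Add the "after" cursor from the second page onwards
        after_clause = ', after: "' + cursor + '"' if cursor else ''
        paged_query = """ {
          customSQLTablesConnection(first: 500""" + after_clause + """){
            nodes { id name downstreamDatasources { name } query }
            pageInfo { hasNextPage endCursor }
          }
        }"""
        result = server.metadata.query(paged_query)
        conn = result['data']['customSQLTablesConnection']
        nodes.extend(conn['nodes'])
        if not conn['pageInfo']['hasNextPage']:
            return nodes
        cursor = conn['pageInfo']['endCursor']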

# Get the data sources and the table names from all the custom sql queries used on your Site
custom_table_query = """ {
  customSQLTablesConnection(first: 500){
    nodes {
        id
        name
        downstreamDatasources {
          name
        }
        query
    }
  }
}
"""
with server.auth.sign_in(tableau_auth):
    custom_table_query_result = server.metadata.query(custom_table_query)

Based on our mock data, this is how our output would look:

{ "data": { "customSQLTablesConnection": { "nodes": [ { "id": "csql-1234", "name": "RegionalSales_CustomSQL", "downstreamDatasources": [ { "name": "Regional Sales Trends (Custom SQL)" } ], "query": "SELECT r.region_name, SUM(s.sales_amount) AS total_sales FROM ecommerce.sales_data.Sales s JOIN ecommerce.sales_data.Regions r ON s.region_id = r.region_id GROUP BY r.region_name" }, { "id": "csql-5678", "name": "ProfitabilityAnalysis_CustomSQL", "downstreamDatasources": [ { "name": "Product Returns and Profitability" } ], "query": "SELECT p.product_category, SUM(s.profit) AS total_profit FROM ecommerce.sales_data.Sales s JOIN ecommerce.sales_data.Products p ON s.product_id = p.product_id GROUP BY p.product_category" }, { "id": "csql-9101", "name": "CustomerSegmentation_CustomSQL", "downstreamDatasources": [ { "name": "Customer Segmentation Analysis" } ], "query": "SELECT c.customer_id, c.location, COUNT(o.order_id) AS total_orders FROM ecommerce.sales_data.Customers c JOIN ecommerce.sales_data.Orders o ON c.customer_id = o.customer_id GROUP BY c.customer_id, c.location" }, { "id": "csql-3141", "name": "CustomerOrders_CustomSQL", "downstreamDatasources": [ { "name": "Customer Orders DataSource" } ], "query": "SELECT o.order_id, o.customer_id, o.order_date, o.sales_amount FROM ecommerce.sales_data.Orders o WHERE o.order_status = 'Completed'" }, { "id": "csql-3142", "name": "CustomerProfiles_CustomSQL", "downstreamDatasources": [ { "name": "Customer Orders DataSource" } ], "query": "SELECT c.customer_id, c.customer_name, c.segment, c.location FROM ecommerce.sales_data.Customers c WHERE c.active_flag = 1" }, { "id": "csql-3143", "name": "CustomerReturns_CustomSQL", "downstreamDatasources": [ { "name": "Customer Orders DataSource" } ], "query": "SELECT r.return_id, r.order_id, r.return_reason FROM ecommerce.sales_data.Returns r" } ] } } }

Just like earlier, when we were creating the master list of data sources, here too we have nested JSON for the downstream data sources, from which we need to extract only the “name” part. The “query” column holds the entire custom SQL, so with a regex pattern we can easily search for the names of the tables used in each query.

We know that the table names always come after a FROM or a JOIN clause, and they generally follow the format <database_name>.<schema>.<table_name>. The <database_name> is optional and most of the time not used. There were some queries I found that used this format, where I ended up getting only the database and schema names, and not the complete table name. Once we have extracted the names of the data sources and the names of the tables, we need to merge the rows per data source, as there can be multiple custom SQL queries used in a single data source.

### Convert the custom sql response into dataframe
col_names = custom_table_query_result['data']['customSQLTablesConnection']['nodes'][0].keys()
cs_df = pd.DataFrame(columns=col_names)

for i in custom_table_query_result['data']['customSQLTablesConnection']['nodes']:
    tmp_dt = {k: v for k, v in i.items()}
    cs_df = pd.concat([cs_df, pd.DataFrame.from_dict(tmp_dt, orient='index').T])

# Extract the data source name where the custom sql query was used
cs_df['data_source'] = cs_df.downstreamDatasources.apply(lambda x: x[0]['name'] if x and 'name' in x[0] else None)
cs_df.reset_index(inplace=True)
cs_df.drop(['index', 'downstreamDatasources'], axis=1, inplace=True)

### We need to extract the table names from the sql query. We know the table name comes after a FROM or JOIN clause
# Note that the name of the table can be of the format <data_warehouse>.<schema>.<table_name>
# Depending on the format of how the table is called, you will have to modify the regex expression
def extract_tables(sql):
    # Regex to match database.schema.table or schema.table, avoiding aliases
    pattern = r'(?:FROM|JOIN)\s+((?:\[\w+\]|\w+)\.(?:\[\w+\]|\w+)(?:\.(?:\[\w+\]|\w+))?)\b'
    matches = re.findall(pattern, sql, re.IGNORECASE)
    return list(set(matches))  # Unique table names

cs_df['customSQLTables'] = cs_df['query'].apply(extract_tables)
cs_df = cs_df[['data_source', 'customSQLTables']]

# We need to merge datasources as there can be multiple custom sqls used in the same data source
cs_df = cs_df.groupby('data_source', as_index=False).agg({
    'customSQLTables': lambda x: list(set(item for sublist in x for item in sublist))  # Flatten & make unique
})

print('There are ', cs_df.shape[0], 'datasources with custom sqls used in it')
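To sanity-check the regex, we can run extract_tables over one of the mock queries from the output above:

sample_sql = ("SELECT o.order_id, o.customer_id FROM ecommerce.sales_data.Orders o "
              "JOIN ecommerce.sales_data.Customers c ON o.customer_id = c.customer_id")
print(extract_tables(sample_sql))
# ['ecommerce.sales_data.Orders', 'ecommerce.sales_data.Customers'] (set order may vary)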

After we perform all the above operations, this is how the structure of cs_df would look:

Sample output of code

Processing of regular Tables in Data Sources

Now we need to get the list of all the regular tables used in a data source that are not part of any custom SQL. There are two ways to go about it: either use the publishedDatasources object and check for upstreamTables, or use DatabaseTable and check for downstreamDatasources. I’ll go with the first method because I want the results at a data source level (basically, I want the code ready to reuse when I want to check a specific data source in further detail). Here again, for the sake of simplicity, instead of going for pagination, I’m looping through each data source to ensure I have everything. We get the upstreamTables inside the fields object, so that has to be cleaned out.
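For completeness, the table-first alternative would be a query along these lines. This is a sketch I haven’t used below; the field names mirror the downstreamDatasources field we queried on custom SQL tables earlier, but verify them against the Metadata API schema:

{
  databaseTables {
    name
    schema
    # Field name assumed by analogy with custom SQL tables; check the schema
    downstreamDatasources {
      name
    }
  }
}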

############### Get the data sources with the regular table names used in your site

### Its best to extract the tables information for every data source and then merge the results.
# Since we only get the table information nested under fields, in case there are hundreds of fields
# used in a single data source, we will hit the response limits and will not be able to retrieve all the data.
data_source_list = master_df.name.tolist()

col_names = ['name', 'id', 'extractLastUpdateTime', 'fields']
ds_df = pd.DataFrame(columns=col_names)

with server.auth.sign_in(tableau_auth):
    for ds_name in data_source_list:
        # Doubled curly braces escape GraphQL's literal braces inside the f-string
        query = f""" {{
            publishedDatasources (filter: {{ name: "{ds_name}" }}) {{
              name
              id
              extractLastUpdateTime
              fields {{
                name
                upstreamTables {{
                  name
                }}
              }}
            }}
        }} """
        ds_name_result = server.metadata.query(query)
        for i in ds_name_result['data']['publishedDatasources']:
            tmp_dt = {k: v for k, v in i.items() if k != 'fields'}
            tmp_dt['fields'] = json.dumps(i['fields'])
            ds_df = pd.concat([ds_df, pd.DataFrame.from_dict(tmp_dt, orient='index').T])

ds_df.reset_index(inplace=True)

This is how the structure of ds_df would look:

Sample output of code

We then need to flatten out the fields object and extract the field names as well as the table names. Since the table names repeat multiple times, we have to deduplicate to keep only the unique ones.

# Function to extract the values of fields and upstream tables in json lists
def extract_values(json_list, key):
    values = []
    for item in json_list:
        values.append(item[key])
    return values

ds_df["fields"] = ds_df["fields"].apply(ast.literal_eval)
ds_df['field_names'] = ds_df.apply(lambda x: extract_values(x['fields'], 'name'), axis=1)
ds_df['upstreamTables'] = ds_df.apply(lambda x: extract_values(x['fields'], 'upstreamTables'), axis=1)

# Function to extract the unique table names
def extract_upstreamTable_values(table_list):
    values = set()
    for inner_list in table_list:
        for item in inner_list:
            if 'name' in item:
                values.add(item['name'])
    return list(values)

ds_df['upstreamTables'] = ds_df.apply(lambda x: extract_upstreamTable_values(x['upstreamTables']), axis=1)
ds_df.drop(["index", "fields"], axis=1, inplace=True)

Once we do the above operations, the final structure of ds_df would look something like this:

Sample output of code

We have all the pieces, and now we just have to merge them together:

###### Join all the data together
master_data = pd.merge(master_df, ds_df, how="left", on=["name", "id"])
master_data = pd.merge(master_data, cs_df, how="left", left_on="name", right_on="data_source")

# Save the results to analyse further
master_data.to_excel("Tableau Data Sources with Tables.xlsx", index=False)

This is our final master_data:

Sample Output of code

Table-level Impact Analysis

Let’s say there were some schema changes on the “Sales” table and you want to know which data sources will be impacted. Then you can simply write a small function that checks if a table is present in either of the two columns, upstreamTables or customSQLTables, like below.

def filter_rows_with_table(df, col1, col2, target_table):
    """
    Filters rows in df where target_table is part of any value in either col1 or col2
    (supports partial match). Returns full rows (all columns retained).
    """
    return df[
        df.apply(
            lambda row:
                (isinstance(row[col1], list) and any(target_table in item for item in row[col1])) or
                (isinstance(row[col2], list) and any(target_table in item for item in row[col2])),
            axis=1
        )
    ]

# As an example
filter_rows_with_table(master_data, 'upstreamTables', 'customSQLTables', 'Sales')

Below is the output. You can see that three data sources will be impacted by this change. You can also alert the data source owners Alice and Bob in advance, so they can start working on a fix before something breaks on the Tableau dashboards.

Sample output of code
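Since the output already carries owner_name and owner_email, drafting that heads-up can be automated too. A minimal sketch that only composes the messages (wiring it up to your mail or chat system is left out, and the message text is illustrative):

# Draft one notification per owner for the impacted data sources
impacted = filter_rows_with_table(master_data, 'upstreamTables', 'customSQLTables', 'Sales')

for (owner, email), group in impacted.groupby(['owner_name', 'owner_email']):
    ds_list = ', '.join(group['name'])
    print(f"To: {email}")
    print(f"Hi {owner}, an upcoming change to the Sales table affects "
          f"your data source(s): {ds_list}. Please review.\n")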

You can check out the complete version of the code in my Github repository here.

This is just one of the possible use cases of the Tableau Metadata API. You can also extract the field names used in custom SQL queries and add them to the dataset to get a field-level impact analysis. One can also monitor stale data sources with the extractLastUpdateTime, to see if those have any issues or need to be archived if they are not used any more. We can also use the dashboards object to fetch information at a dashboard level.
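As an example of that second idea, flagging stale extracts takes only a few lines of pandas (a sketch: the 90-day cutoff is arbitrary, and extractLastUpdateTime will be empty for live connections):

# Flag data sources whose extracts haven't refreshed in the last 90 days
master_data['extractLastUpdateTime'] = pd.to_datetime(master_data['extractLastUpdateTime'], utc=True)
stale_cutoff = pd.Timestamp.now(tz='UTC') - pd.Timedelta(days=90)
stale = master_data[master_data['extractLastUpdateTime'] < stale_cutoff]
print(stale[['name', 'owner_name', 'extractLastUpdateTime']])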

Final Thoughts

If you have come this far, kudos. This is just one use case of automating Tableau data management. It’s time to reflect on your own work and think which of those other tasks you could automate to make your life easier. I hope this mini-project served as an enjoyable learning experience in understanding the power of the Tableau Metadata API. If you liked reading this, you might also like another one of my blog posts about Tableau, on some of the challenges I faced when dealing with large …

Also do check out my previous blog, where I explored building an interactive, database-powered app with Python, Streamlit, and SQLite.


Before you go…

Follow me so you don’t miss any new posts I write in future; you will find more of my articles on my . You can also connect with me on LinkedIn or Twitter!
