Posted on November 17, 2017 under SAP HANA

By:
Arman Avetisyan
Sr. SAP Basis/BI & Software Developer
aavetisyan@comerit.com

 

Nowadays, to achieve good results from analytics, you should analyze both your enterprise and your historical data. SAP HANA is a great database for your enterprise system, but for historical data, Hadoop is the most popular option for handling Big Data. Hadoop is also very cost-efficient because it runs on commodity hardware.

In this tutorial we will explore:

  • How to relocate our historical data from enterprise HANA to Hadoop
  • How to combine the two data sets, and
  • How to view the full data set from HANA Studio

In order to relocate the data from HANA to Hadoop, we are going to use the SAP HANA Data Lifecycle Manager tool, which is part of the SAP HANA Data Warehousing Foundation (DWF).

The Data Lifecycle Manager (DLM) is an SAP HANA XS-based tool that helps relocate less frequently used data in SAP HANA to storage destinations such as the SAP HANA Dynamic Tiering option, Hadoop, or SAP IQ.

First, the DLM tool relocates the data and creates a virtual table in HANA for the historical data now located in Hadoop. It then combines the historical data with the enterprise data remaining in the HANA table and, finally, creates a view over the combined data.
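To make that pattern concrete, here is a rough sketch of the kind of objects this produces, written against the remote source and demo table used later in this tutorial. The object names SALES_DATA_3_EXT and SALES_DATA_3_ALL are illustrative assumptions; DLM derives the actual generated names itself:

-- Hypothetical virtual table pointing at the relocated data in Hadoop
CREATE VIRTUAL TABLE "COMERIT_DEMO1"."SALES_DATA_3_EXT"
AT "CLAB_HADOOP"."<NULL>"."<hadoop schema>"."SALES_DATA_3";

-- Hypothetical union view combining enterprise and historical data
CREATE VIEW "COMERIT_DEMO1"."SALES_DATA_3_ALL" AS
SELECT * FROM "COMERIT_DEMO1"."SALES_DATA_3"      -- enterprise data in HANA
UNION ALL
SELECT * FROM "COMERIT_DEMO1"."SALES_DATA_3_EXT"; -- historical data in Hadoop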

[Screenshot: HANA Hadoop 1]

 

Procedure:

  • Create a remote source from HANA Studio for Hadoop
  • Build a storage destination in DLM
  • Build a lifecycle profile in DLM
  • Append data relocation rules
  • Start data relocation
  • View the combined data in HANA Studio

Create a remote source from HANA Studio for Hadoop

We will use SAP HANA Smart Data Access (SDA) to create the remote source for Hadoop. Use the code below in your HANA Studio’s SQL editor, fill in the parameters based on your Hadoop cluster, and execute it.

CREATE REMOTE SOURCE "<Remote Source Name>"
ADAPTER "sparksql"
CONFIGURATION 'port=7860;ssl_mode=disabled;server=<Hadoop ip>;'
WITH CREDENTIAL TYPE 'PASSWORD'
USING 'user=<username>;password=<password>';

(e.g. <Remote Source Name> = CLAB_HADOOP)

After executing the script, you will see the remote source for your Hadoop system under the “Provisioning” > “Remote Sources” folder.
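If you prefer to verify the remote source from SQL as well, the standard SYS.REMOTE_SOURCES system view lists it:

-- List configured remote sources and their adapters
SELECT REMOTE_SOURCE_NAME, ADAPTER_NAME
FROM "SYS"."REMOTE_SOURCES";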

[Screenshot: HANA Hadoop 2]

 

Execute the commands below in HANA Studio’s SQL editor to notify your indexserver and xsengine that you’ve created this remote source, which will be used for data aging.

ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM')
SET ('data_aging', 'spark_remote_source') = '<Remote Source Name>' WITH RECONFIGURE;

ALTER SYSTEM ALTER CONFIGURATION ('xsengine.ini', 'SYSTEM')
SET ('data_aging', 'spark_remote_source') = '<Remote Source Name>' WITH RECONFIGURE;
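To confirm that both parameters were written, you can read the configuration back through the standard M_INIFILE_CONTENTS monitoring view:

-- Show the data_aging section of the loaded ini files
SELECT FILE_NAME, SECTION, KEY, VALUE
FROM "SYS"."M_INIFILE_CONTENTS"
WHERE SECTION = 'data_aging';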

Build a Storage Destination

To create a new storage destination, click the “+” button in the “MANAGE STORAGE DESTINATIONS” tab.

[Screenshot: HANA Hadoop 3]

 

Enter the destination name, select SPARK SQL as the storage destination type, enter the schema name, and click “Save”.

[Screenshot: HANA Hadoop 4]

 

After saving the configuration, click the “Test Connection” button to test the connection.

[Screenshot: HANA Hadoop 5]

 

After testing, activate the connection by clicking the “Activate” button.

[Screenshot: HANA Hadoop 6]

 

Build a Lifecycle Profile

To create a new lifecycle profile, click the “+” button in the “MANAGE LIFECYCLE PROFILES” tab.

[Screenshot: HANA Hadoop 7]

 

Enter the name and storage destination details, and set the Trigger type to either schedule the execution or trigger it manually.

In our case, the source is the HANA table (SALES_DATA_3) inside the COMERIT_DEMO1 schema.

The nominal key will be used to split the data into two parts: historical and enterprise.

[Screenshot: HANA Hadoop 8]

 

Select the “Storage Destination” in the Destination Attributes tab.

[Screenshot: HANA Hadoop 9]

 

Append data relocation rules

Open the Rule Editor to set the rules for data relocation.

Append the rule and click “Validate Syntax”; the editor will then show the total and affected rows.

“Total Records” is the record count in the HANA table, and “Affected Records” is the number of records that will be relocated.

This rule will relocate the data from the SALES_DATA_3 table where TRANS_ID is less than 8000.
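If you want to sanity-check the rule outside of DLM, a simple count in HANA Studio’s SQL editor mirrors the rule’s condition (this is only an approximation of the affected-records figure, not DLM’s internal logic):

-- Preview how many records the rule TRANS_ID < 8000 would relocate
SELECT COUNT(*) AS AFFECTED_RECORDS
FROM "COMERIT_DEMO1"."SALES_DATA_3"
WHERE "TRANS_ID" < 8000;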

[Screenshot: HANA Hadoop 10]

 

Click "Save" and then click "Activate"

Start data relocation

Now we will run the data relocation process. You can run it immediately or on a scheduled basis. For now, we will run it immediately, but you can change the “Trigger type” to scheduled in order to run it periodically.

Now click “Run” > “Trigger Manually”.

[Screenshot: HANA Hadoop 11]

 

After the job execution, the Data Distribution graphs will be updated.

The Source graph shows the data remaining in the HANA database, and the clab_hadoop graph shows the data relocated to the Hadoop cluster.
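You can also verify the split with a quick SQL count on both sides; the virtual table name below is the same illustrative assumption used earlier (DLM generates the actual name during relocation):

-- Compare row counts in HANA vs. the Hadoop-backed virtual table
SELECT 'HANA' AS LOCATION, COUNT(*) AS RECORDS
FROM "COMERIT_DEMO1"."SALES_DATA_3"
UNION ALL
SELECT 'HADOOP', COUNT(*)
FROM "COMERIT_DEMO1"."SALES_DATA_3_EXT";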

[Screenshot: HANA Hadoop 12]

 

Finally, the union view of the historical and enterprise data will be created in HANA, and you can query it from HANA Studio.

[Screenshot: HANA Hadoop 13]
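Querying the generated view returns rows from both stores transparently. The view name here is again an illustrative assumption; check your schema for the name DLM actually generated:

-- Query the combined (HANA + Hadoop) data through the union view;
-- this predicate spans both the historical and the enterprise range
SELECT *
FROM "COMERIT_DEMO1"."SALES_DATA_3_ALL"
WHERE "TRANS_ID" < 10000;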


Written by Arman Avetisyan

Arman is an experienced SAP BI, Big Data, and software developer whose combined development and SAP skills allow him to offer unique value to his clients. Comerit has been proud to have Arman as part of our team since 2015.
