Friday, August 26, 2016

Incremental Aggregation in Informatica

Implementing Informatica Incremental Aggregation


Using incremental aggregation, we apply the captured changes in the source data (the CDC part) to the aggregate calculations in a session. If the source changes incrementally and we can capture those changes, we can configure the session to process only the changes. This allows the Integration Service to update the target incrementally, rather than deleting the previously loaded data, reprocessing the entire source, and recalculating the same aggregates each time the session runs.
Incremental Aggregation
When the session runs with incremental aggregation enabled for the first time, say in the 1st week of Jan, we use the entire source. This allows the Integration Service to read and store the necessary aggregate information. When we run the session again in the 2nd week of Jan, we pick up only the CDC records from the source, i.e. the records loaded after the initial load. The Integration Service then processes this new data and updates the target accordingly.
Use incremental aggregation when the changes do not significantly alter the target. If processing the incrementally changed source alters more than half of the existing target, the session may not benefit from incremental aggregation. In that case, drop and recreate the target from the entire source data and recalculate the same aggregations.
Incremental aggregation can be particularly helpful when we need to load a monthly fact on a weekly basis.
Sample Mapping
Let us see a sample mapping to implement incremental aggregation:
Incremental Aggregation Mapping
Look at the Source Qualifier query that fetches the CDC part using a BATCH_LOAD_CONTROL table, which stores the last successful load date for this particular mapping.
Incremental Aggregation Source Qualifier
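For reference, here is a minimal sketch of what such a Source Qualifier SQL override could look like. The source table name (SALES_SOURCE), the mapping name, and the BATCH_LOAD_CONTROL column names (MAPPING_NAME, LAST_LOAD_DATE) are assumptions for illustration only:

-- Fetch only the CDC part: records loaded after the last successful load
-- recorded for this mapping in BATCH_LOAD_CONTROL (column names assumed).
SELECT s.CUSTOMER_KEY,
       s.INVOICE_KEY,
       s.AMOUNT,
       s.LOAD_DATE
FROM   SALES_SOURCE s
WHERE  s.LOAD_DATE > (SELECT b.LAST_LOAD_DATE
                      FROM   BATCH_LOAD_CONTROL b
                      WHERE  b.MAPPING_NAME = 'm_LOAD_MONTHLY_SALES_FACT')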
Look at the ports tab of Expression transformation.
Incremental Aggregation Expression
Look at the ports tab of Aggregator Transformation.
Incremental Aggregation Aggregator
Now the most important part: the session properties configuration to implement incremental aggregation.
Incremental Aggregation Session property
If we want to reinitialize the aggregate cache, say during the first week of every month, we can configure the same session in a new workflow with the Reinitialize aggregate cache session property checked.
Incremental Aggregation Session property
Example with Data
Now have a look at the source table data:
CUSTOMER_KEY  INVOICE_KEY  AMOUNT  LOAD_DATE
1111          5001         100     01/01/2010
2222          5002         250     01/01/2010
3333          5003         300     01/01/2010
1111          6007         200     07/01/2010
1111          6008         150     07/01/2010
2222          6009         250     07/01/2010
4444          1234         350     07/01/2010
5555          6157         500     07/01/2010
After the first load in the 1st week of Jan 2010, the data in the target is as follows:
CUSTOMER_KEY  INVOICE_KEY  MON_KEY  AMOUNT
1111          5001         201001   100
2222          5002         201001   250
3333          5003         201001   300
Now during the 2nd week's load the session processes only the incremental data in the source, i.e. those records having a load date greater than the last session run date. After the 2nd week's load, incremental aggregation of this new source data with the data already held in the aggregate cache file updates the target table with the following dataset (for example, customer 1111 now aggregates to 100 + 200 + 150 = 450):
CUSTOMER_KEY  INVOICE_KEY  MON_KEY  AMOUNT  Remarks/Operation
1111          6008         201001   450     Cache file updated after aggregation
2222          6009         201001   500     Cache file updated after aggregation
3333          5003         201001   300     Cache file remains the same as before
4444          1234         201001   350     New group row inserted in cache file
5555          6157         201001   500     New group row inserted in cache file
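As a quick sanity check, a full (non-incremental) recalculation over the entire source should produce the same figures that the incremental run maintains in the cache. A minimal sketch, again assuming the hypothetical SALES_SOURCE table and DD/MM/YYYY load dates:

-- Recompute the monthly aggregate from scratch for comparison with the
-- incrementally maintained target (SALES_SOURCE is an assumed table name).
SELECT   CUSTOMER_KEY,
         MAX(INVOICE_KEY)              AS INVOICE_KEY,  -- latest invoice key per group, as in the sample output
         TO_CHAR(LOAD_DATE, 'YYYYMM')  AS MON_KEY,
         SUM(AMOUNT)                   AS AMOUNT        -- e.g. customer 1111: 100 + 200 + 150 = 450
FROM     SALES_SOURCE
GROUP BY CUSTOMER_KEY, TO_CHAR(LOAD_DATE, 'YYYYMM')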
Understanding Incremental Aggregation Process
The first time we run an incremental aggregation session, the Integration Service processes the entire source. At the end of the session, the Integration Service stores aggregate data for that session run in two files, the index file and the data file. The Integration Service creates the files in the cache directory specified in the Aggregator transformation properties.
Each subsequent time we run the session with incremental aggregation, we use the incremental source changes in the session. For each input record, the Integration Service checks historical information in the index file for a corresponding group. If it finds a corresponding group, the Integration Service performs the aggregate operation incrementally, using the aggregate data for that group, and saves the incremental change. If it does not find a corresponding group, the Integration Service creates a new group and saves the record data.
When writing to the target, the Integration Service applies the changes to the existing target. It saves modified aggregate data in the index and data files to be used as historical data the next time you run the session.
Each subsequent time we run a session with incremental aggregation, the Integration Service creates a backup of the incremental aggregation files. The cache directory for the Aggregator transformation must contain enough disk space for two sets of the files.
The Integration Service creates new aggregate data, instead of using the historical data, when we configure the session to reinitialize the aggregate cache or when we delete the cache files.
When the Integration Service rebuilds incremental aggregation files, the data in the previous files is lost.
Note: To protect the incremental aggregation files from file corruption or disk failure, periodically back up the files.


Tuesday, July 12, 2016

Type of tasks in Informatica:

Tasks are used to build workflows and worklets. Tasks are designed in the Workflow Manager and represent an executable set of actions that define the run-time behavior of the ETL process. There are several types of tasks available.
1.Session Task
  • It is used to run a mapping using a set of instructions.
2.Command task
  • Allows executing operating system commands.
  • In UNIX we can specify one or more UNIX commands or a shell script.
  • In Windows we can specify one or more DOS commands or batch files.
3.Decision Task 
  • It allows us to define our own condition that determines the execution path of the workflow.
  • The Integration Service evaluates the decision condition specified in a Decision task and returns true or false.
  • The result is stored in a predefined variable called $Decision_task_name.condition.
4.Event Wait/Raise Task
  • Event Raise Task: Event Raise task raises the event while running the workflow. It represents the location of a user-defined event. 
  • Event Wait Task: The Event Wait task waits for an event to occur.
  • There are two types of events:
                           Predefined event: A predefined event is a file-watch event.
                           User-defined event: A user-defined event is a sequence of tasks in the workflow.
  • When all the tasks in the branch from the start task to the Event raise task complete, the Event raise task triggers the event. The Event wait task waits for the Event-Raise task to trigger the event before continuing with the rest of the tasks in its branch. 
5.Assignment task
  • It allows us to assign a value to a user-defined workflow variable.
6.Timer 
  • It allows us to set a time delay between tasks so that dependencies in the workflow are handled properly.
                            Absolute time: We can specify the exact time at which a particular task should start.
                            Relative time: Instructs the Integration Service to wait for a specified period of time after the timer starts.
7.Email Tasks
  • Sends an email during the workflow.
  • Email addresses, a subject line and the email message text can be defined. When called from within a Session task, the message text can contain variable session-related metadata.
8.Control Task
  • It is used to stop or abort the workflow.

Fact Tables in Data Warehouse:


Fact Table in Data Warehouse:

Fact Table: The fact table contains the numerical values (measures) of what we measure. There are different types of fact tables.


Additive Fact:
 An additive fact is a fact that can be summed across all the dimensions in the fact table. A sales amount is an example of an additive fact.
Semi Additive Fact:
 A semi-additive fact is a fact that can be summed across only some of the dimensions in the fact table, for example an account balance, which can be summed across accounts but not across time.
Non Additive Fact:
 A non-additive fact is a fact that cannot be summed across any dimension in the fact table, for example a ratio or a percentage.
Factless Fact Table:
 It is a fact table which has keys from various dimensions but no facts/measures in it.
Cumulative Fact:
It is a fact that stores data accumulated over a period of time.
Snapshot Fact:
It is a fact that stores data as of a point in time.
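
To make the additive/semi-additive distinction concrete, here is a minimal sketch using a hypothetical daily balance fact table ACCT_BALANCE_FACT(ACCOUNT_KEY, DATE_KEY, BALANCE):

-- Valid: a balance can be summed across the account dimension for a single day.
SELECT DATE_KEY, SUM(BALANCE) AS TOTAL_BALANCE
FROM   ACCT_BALANCE_FACT
WHERE  DATE_KEY = 20100107
GROUP BY DATE_KEY;

-- Not meaningful: summing the same balance across days double-counts it.
-- Use an average (or the closing snapshot) over the time dimension instead.
SELECT ACCOUNT_KEY, AVG(BALANCE) AS AVG_DAILY_BALANCE
FROM   ACCT_BALANCE_FACT
WHERE  DATE_KEY BETWEEN 20100101 AND 20100131
GROUP BY ACCOUNT_KEY;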

Sunday, March 27, 2016

V.IMP SQL used in INFORMATICA


1) display duplicate values (e.g. a deptno occurring more than once)?

sql> select deptno from emp group by deptno having count(*)>1;
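
If the full duplicate rows are needed rather than just the repeated values, one option (the same idea as the delete in the next question, written as a select) is:

sql> select * from emp e1 where rowid > (select min(rowid) from emp e2 where e1.deptno = e2.deptno);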

  2) delete duplicate rows?

sql> delete from emp e1 where rowid>(select min(rowid) from emp e2 where e1.deptno=e2.deptno);

  3) update emp sal based on hiredate?

sql> update emp set sal=sal+500 where hiredate in(select hiredate from emp group by hiredate having count(*)>1);

  4) display top n max sal?

sql> select rownum, sal from (select sal from emp order by sal desc) where rownum<=5;

  5) display the 5th record of the table?

sql> select * from emp where empno=(select empno from emp where rownum<=5 minus select empno from emp where rownum<=4);

  6) select nth max salary?

sql> select min(sal) from (select distinct sal from emp order by sal desc) where rownum<=&n;
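
Alternatively, on Oracle versions that support analytic functions, the nth maximum salary can be fetched with DENSE_RANK (a sketch; &n is the same substitution variable as above):

sql> select distinct sal from (select sal, dense_rank() over (order by sal desc) rnk from emp) where rnk = &n;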

  7) display rownum with records?

sql> select rownum, emp.* from emp;

  8) Display the records between two range?

sql> select rownum,ename,empno from emp where rowid in(select rowid from emp where rownum <=&upto minus select rowid from emp where rownum<&Start);

  9) Odd number of records?

sql> select * from emp where (rowid,1) in (select rowid, mod(rownum,2) from emp);

  10) Even number of records?

sql> select * from emp where (rowid,0) in (select rowid, mod(rownum,2) from emp);
   
  11) How To Display last 5 records in a table?

sql> select * from (select rownum r, emp.* from emp) where r between (select count(*)-4 from emp) and (select count(*) from emp);