本文共 4883 字,大约阅读时间需要 16 分钟。
增量聚集(Incremental Aggregation)
PowerCenter Server 运行你增量的修改目标(target),而不是强迫process整个源和重新计算。
用增量聚集的条件:
不用增量聚集的情况:
增量策略中的一些决策条件:
源的分析(Source Analysis)
可能的场景:
识别需要比较的数据
决定比较的方法
基于加载策略的源
加载方法:直接加载到数据仓库中。
加载方法:对关系型数据,选择有效时间内的 数据进行加载,也可以在视图中进行选择,而不是集成在mapping中。把加载策略放到ETL组件中对ETL开发者和元数据来说更具可行性和可控制性。对非关系型数据,筛选有效的数据,在加载后修改筛选条件去删除旧的数据。对有效时间,可以放到mapping的变量中,也可以放到控制表中。
加载方法:Join目标表去判断哪些数据是新的。也可以去lookup目标表来判断是否存在。
基于加载策略的目标
I had seen several posts related to Incremental Aggregation in Informatica-I group of www.ittoolbox.com. Below is the step by step process to apply incremental aggregation.
Incremental aggregation is used when you need to apply captured changes in the source to aggregate calculations in a session. If the source changes only incrementally and you can capture changes, you can configure the session to process only those changes. This allows the Informatica Server to update your target incrementally, rather than forcing it to process the entire source and recalculate the same calculations each time you run the session.
When you run an incremental aggregation session for the first time, IS process the entire source and creates Data File and Index File in cache directory specified. The Integration Service names the index file as PMAGG*.idx* and the data file as PMAGG*.dat*.
Note:
1. If you are using percentile or median functions, do not use Incremental Aggregation because IS uses system memory to process these functions in addition to the cache memory configured in the session properties.
2. Don’t move the aggregate files without correcting the configured path or directory for the files in the session properties.
3. Don’t change the configured path or directory for the aggregate files without moving the files to the new location.
4. Don’t delete cache files or change the number of partitions.
Step-by-step Process
Source Table
empno | sal | deptno | hire_dt |
1 | 2000 | 10 | Nov 29 2010 |
2 | 5000 | 10 | Nov 29 2010 |
3 | 8000 | 10 | Nov 29 2010 |
4 | 2000 | 20 | Nov 29 2010 |
5 | 10000 | 20 | Nov 29 2010 |
6 | 12000 | 30 | Nov 29 2010 |
7 | 22000 | 30 | Nov 29 2010 |
8 | 31000 | 30 | Nov 29 2010 |
9 | 1000 | 10 | Nov 30 2010 |
10 | 1000 | 10 | Nov 30 2010 |
11 | 1000 | 30 | Nov 30 2010 |
Below is the flow of mapping
The data will be pulled from the source on the basis of hiredate. We will create a parameter to pass hiredate as shown below.
Filter transformation
Filter transformation will be used to filter the data on the basis of hiredate as shown below
Aggregator Transformation
Aggregator transformation will aggregate department wise salary.
Update Strategy
Update Strategy transformation is used to flag rows for update.
Once you are done with the mapping design, next step is to create a Session and Workflow.
Create a session and set the properties for Target as shown below.
Check the below session properties as shown below.
Set the value of $$hiredt parameter to 11/29/2010 and run the session. You will get the below rows in target.
Deptno | Sal |
10 | 15000 |
20 | 12000 |
30 | 65000 |
After first run, go to session properties and uncheck the option “Reinitialize aggregate cache” as shown below.
Set the value of $$hiredt parameter to 11/30/2010 and run the session again. Now the value in target table will be as shown below.
Deptno | Sal |
10 | 17000 |
20 | 12000 |
30 | 66000 |
Common Error:
Below is the common error person used to get while using Incremental Aggregation.
TE_7051 : Aggregate Error: Index file timestamp is earlier than the timestamp of AGG transformation or Mapping
TE_7029 : Aggregate Information: Creating New Index and Data Files
Resolution:
This error occurs when the mapping associated to the incremental aggregation session is modified. Make sure that the associated mapping should not be modified.
转载地址:http://mzrli.baihongyu.com/