|Some Important Faqs & also solutions [message #136908]
||Mon, 12 September 2005 04:27
Registered: July 2005
I am sending some DW FAQs. These faqs will be useful in interview point. If there is any wrong please modify it. All the best friends.
1. What is DWH?
A. DWH is subject-oriented, non-volatile, integrated of time variant data base. It keeps historical data and current data. Current data gets populated into data ware house on specific frequency period. It always keeps summarized data.
2. Difference between data mart &DWH.
A. Data mart is enterprise level where as DWH is an organization level.
3. Different types of SCD's( Slowly Changing Dimensions)
A. Type1: Rows containing changes to existing dimensions are updated in the target by overwriting the existing dimension. In the Type 1 Dimension mapping, all rows contain current dimension data. Use the Type 1 Dimension mapping to update a slowly changing dimension table when you do not need to keep any previous versions of dimensions in the table.
Type 2: The Type 2 Dimension Data mapping inserts both new and changed dimensions into the target. Changes are tracked in the target table by versioning the primary key and creating a version number for each dimension in the table. Use the Type 2 Dimension/Version Data mapping to update a slowly changing dimension table when you want to keep a full history of dimension data in the table. Version numbers and versioned primary keys track the order of changes to each dimension.
Type 3: The Type 3 Dimension mapping filters source rows based on user-defined comparisons and inserts only those found to be new dimensions to the target. Rows containing changes to existing dimensions are updated in the target. When updating an existing dimension, the Informatica Server saves existing data in different columns of the same row and replaces the existing data with the updates.
4. Different types of partitions in Informatica
5. How do we improve Performance tuning in Informatica?
A. The goal of performance tuning is optimize session performance so sessions run during the available load window for the Informatica Server. Increase the session performance by following.
1. The performance of the Informatica Server is related to network connections. Data generally moves across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times faster. Thus network connections often affect on session performance. So avoid network connections.
2. Flat files: If your flat files stored on a machine other than the informatica server, move those files to the machine that consists of informatica server.
3. Relational data sources: Minimize the connections to sources, targets and informatica server to improve session performance. Moving target database into server system may improve session performance.
4. Staging areas: If u uses staging areas u force informatica server to perform multiple data passes. Removing of staging areas may improve session performance.
5. Run the informatica server in ASCII data movement mode improves the session performance. Because ASCII data movement mode stores a character value in one byte. Unicode mode takes 2 bytes to store a character.
6. If a session joins multiple source tables in one Source Qualifier, optimizing the query may improve performance. Also, single table select statements with an ORDER BY or GROUP BY clause may benefit from optimization such as adding indexes.
7. We can improve the session performance by configuring the network packet size, which allows data to cross the network at one time. To do this go to server manger, choose server configure database connections. If u are target consists key constraints and indexes u slow the loading of data. To improve the session performance in this case drop constraints and indexes before u run the session and rebuild them after completion of session.
8. Running parallel sessions by using concurrent batches will also reduce the time of loading the data. So concurrent batches may also increase the session performance.
9. Partitioning the session improves the session performance by creating multiple connections to sources and targets and loads data in parallel pipe lines.
10. In some cases if a session contains an aggregator transformation, you can use incremental aggregation to improve session performance. Avoid transformation errors to improve the session performance.
11. If the session contained lookup transformation you can improve the session performance by enabling the look up cache.
12. If your session contains filter transformation, create that filter transformation nearer to the sources or you can use filter condition in source qualifier.
13. Aggregator, Rank and joiner transformation may often decrease the session performance. Because they must group data before processing it. To improve session performance in this case use sorted ports option.
6. What are Conformed Dimensions?
A. A conformed dimension is a just a dimension used by two marts that match either in the dimension or in the values in the dimension. These are used in bottom-up approach.
Three types of “conformed” dimensions:
a. Dimension table identical to master copy
b. Dimension table has subset of rows from the master copy
i. Can improve performance when many dimension rows are not relevant to a particular process
c. Dimension table has subset of attributes from master copy
i. Allows for roll-up dimensions at different grains
7. What are Junk Dimensions?
A. A junk dimension is a convenient grouping of typically low-cardinality flags and indicators. Or a "junk" dimension is a collection of random transactional codes, flags and/or text attributes that are unrelated to any particular dimension. The junk dimension is simply a structure that provides a convenient place to store the junk attributes. A good example would be a trade fact in a company that brokers equity trades.
8. How do you handle slowly growing dimensions?
A. Slowly growing dimensions are dimensional tables that have slowly increasing dimension data, without updates to existing dimensions. You maintain slowly growing dimensions by appending new data to the existing table.
9. What is meant by Constraint Based Loading?
A. Under certain conditions, the informatica server can load data to targets based on primary-foreign key constraints in the mapping.
10. How do we update records without using update strategy?
A. Use treat rows as update in session properties.
11. What is your role in current project?
12. How do u know that the performance of a mapping should be improved?
13. Difference between Active and Passive transformations?
A. Active transformations can change the no. of rows that pass through it. Passive transformations can not change the no. of rows that pass through it.