Thursday, June 14, 2012

Clean up spaces in DS

Cleaning up disk space in DataStage


Clearing Lookup Table files
Problem
When a DataStage job with a lookup stage aborts, lookup table files may be left in the resource directories, consuming space. The file names look like "lookuptable.20091210.513biba".
Cause
When a job aborts, it leaves its temporary files behind for postmortem review. Most temporary files are created in the scratch directories, but lookup files are created in the resource directories. Like regular data sets, lookup filesets are not removed automatically.
A lookup fileset looks like:
/opt/IBM/InformationServer/Server/Datasets/export.dsadm.abcdefg.P000000_F0000

A lookup file looks like:
/opt/IBM/InformationServer/Server/Datasets/lookuptable.20091210.513biba
Diagnosing the problem
Look for files with filenames similar to "lookuptable.yyyymmdd.nnnnnnn" left on disk when no jobs are running.
Resolving the problem
Any file whose name begins with "lookuptable" can be removed as long as no jobs are running. These files are recreated on every run of a job and are never reused; a successful run removes only the lookuptable file created during that run, leaving files from earlier aborted runs behind.
Commands
ls -ltr | grep '^lookuptable'
ls -ltr | grep '^export'
rm -rf <files> (only if the commands above return files and no jobs are running)
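The manual steps above can be wrapped in a small shell helper. This is an illustrative sketch, not an official utility; the name patterns and age threshold are assumptions you should adjust for your site, and it should only be run when no jobs are active:

```shell
#!/bin/sh
# Illustrative helper: remove stale lookup files left by aborted jobs.
# Deletes only files older than a given number of days, on the theory
# that anything that old cannot belong to a running job.
cleanup_lookup_files() {
    dir=$1     # resource directory, e.g. /opt/IBM/InformationServer/Server/Datasets
    days=$2    # minimum age in days before a file is deleted
    [ -d "$dir" ] || return 0
    # Orphaned lookup table files: lookuptable.yyyymmdd.nnnnnnn
    find "$dir" -maxdepth 1 -type f -name 'lookuptable.*' -mtime +"$days" -exec rm -f {} \;
    # Orphaned lookup fileset segments: export.user.xxxxxxx.P000000_F0000
    find "$dir" -maxdepth 1 -type f -name 'export.*' -mtime +"$days" -exec rm -f {} \;
}

# Example (only when no DataStage jobs are running):
# cleanup_lookup_files /opt/IBM/InformationServer/Server/Datasets 7
```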

Clearing &PH& files
Cause
There is a &PH& directory in each DataStage project directory. The files in it store runtime information while jobs are running and need to be cleared out periodically.
Answer
To clear the &PH& directory from within DataStage:
1. Ensure there are no DataStage jobs running anywhere on the system by running "ps -ef | grep dsrpc".
2. From the DataStage Administrator, go to the Projects page, select the project whose file you want to clear and click the Command button. The Command Interface dialog box opens.
3. Type the following into the command field: CLEAR.FILE &PH& (all uppercase)
4. Click Execute to run the command and clear the file.


You can also remove the contents of the &PH& directory using the rm command. However, we suggest scheduling this for when the system is least used.

** Important **
Please delete only the contents of the &PH& directory, not the &PH& directory itself.

You will find a &PH& directory in each project you have created on that server.

You must not remove any contents or directories for the following:
DS_TEMP*
RT_BP*.O
RT_BP*
RT_LOG*
RT_STATUS*
RT_CONFIG*

These directories and files are related to the jobs in your projects; if you manually remove or edit any of them, you are likely to corrupt the jobs and possibly the projects. These directories exist for each job created within your project. You could ask your developers to review whether any jobs are no longer used and can be removed using the DataStage clients. You can additionally shrink some of the RT_LOG* files by clearing any large log files that exist for some of the jobs. This can be done from within the DataStage Director client:
1. Select the job.
2. Click Job > Clear Log > Immediate purge (Clear all Entries).
You can create a shell script to delete the files manually. To ensure there are no locks, delete only files from finished jobs: make sure the files are older than your longest-running job. Generally, files older than a week or two are safe to delete (the example below removes files older than 14 days, and deletes only files, never the &PH& directories themselves).

DSPROJDIR=/opt/IBM/Ascential/DataStage/Projects
for project in `ls -l ${DSPROJDIR} | grep "^d" | grep -v "lost+found" | awk '{print $9}'`
do
    find "${DSPROJDIR}/${project}/&PH&" -type f -mtime +13 -exec rm -f {} \;
done
Link
http://www-304.ibm.com/support/docview.wss?uid=swg21457983

Clearing Data Set files
Data sets can be managed using the Data Set Management tool, invoked from the Tools > Data Set Management menu option within DataStage Designer (DataStage Manager in the 7.5 releases). Alternatively, the 'orchadmin' command line program can be used to perform the same tasks.
The files which store the actual data persist in the locations identified as resource disks in the configuration files. These files are named according to the pattern below:

descriptor.user.host.ssss.pppp.nnnn.pid.time.index.random

descriptor: Name of the data set descriptor file.
user: Your user name.
host: Hostname from which you invoked the job which created the data set.
ssss: 4-digit segment identifier (0000-9999)
pppp: 4-digit partition identifier (0000-9999)
nnnn: 4-digit file identifier (0000-9999) within the partition
pid: Process ID of the job on the host from which you invoked the job that creates the data set.
time: 8-digit hexadecimal time stamp in seconds.
index: 4-digit number incremented for each file.
random: 8 hexadecimal digits containing a random number to ensure unique file names.


For example, suppose that your configuration file contains the following node definitions:

{
    node node0
    {
         fastname "host1"
         pools ""
         resource disk "/orch/s0" {pools ""}
         resource scratchdisk "/scratch" {pools ""}
    }
    node node1
    {
         fastname "host1"
         pools ""
         resource disk "/orch/s0" {pools ""}
         resource scratchdisk "/scratch" {pools ""}
    }
}

A data set named mydata.ds created by a job using this configuration file will contain data in two partitions, one for each processing node declared in the configuration file. Because each processing node contains only a single disk specification, each partition of data would be stored in a single file on each processing node. Following the naming convention shown above, the data file for partition 0 would be located on the host1 machine, in the /orch/s0 filesystem, and the file would be named:

/orch/s0/mydata.ds.user1.host1.0000.0000.0000.1fa98.b61345a4.0000.88dc5aef

The data file for partition 1 data would be similarly named:

/orch/s0/mydata.ds.user1.host1.0000.0001.0000.1fa98.b61345a4.0001.8b3cb144

It is important to understand that the file referenced in the job, called mydata.ds in our example, does not contain any actual data. It is a data set descriptor file, and it contains information about how the data set is constructed. In order for DataStage jobs to access the data, both the descriptor and the actual segment files must exist.


Cleaning up Data Sets

A good plan for managing data sets is to identify the Data Sets that are no longer required, and to use the Data Set Management tool to delete them. If you have the jobs that reference the data sets, you can open each of the data set descriptor files using the Data Set Management tool and then view and delete the data set. If you do not have the jobs, another possible method is to look in the resource disk locations for segment files with very old modification dates. Once you have identified the segment files, you can determine what the data set descriptor file name was.

/orch/s0/mydata.ds.user1.host1.0000.0000.0000.1fa98.b61345a4.0000.88dc5aef

In the example segment file shown above, the leading "mydata.ds" portion is the file name of the data set descriptor. You can then locate this file on disk with the find command.

find /my_projects/datasets/ -name "mydata.ds" -print
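The derivation of the descriptor name can be sketched as a small shell helper that strips the nine trailing dot-separated fields (user, host, ssss, pppp, nnnn, pid, time, index, random) from a segment file name. This is an illustrative sketch assuming the naming pattern described earlier:

```shell
#!/bin/sh
# Illustrative helper: recover the descriptor file name from a segment
# file name by stripping the nine trailing dot-separated fields.
# The descriptor name itself may contain dots (e.g. "mydata.ds"),
# which is why we strip from the right rather than split on dots.
descriptor_of() {
    name=$(basename "$1")
    i=0
    while [ "$i" -lt 9 ]; do
        name=${name%.*}    # drop the last dot-separated field
        i=$((i + 1))
    done
    printf '%s\n' "$name"
}

descriptor_of /orch/s0/mydata.ds.user1.host1.0000.0000.0000.1fa98.b61345a4.0000.88dc5aef
# prints: mydata.ds
```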

Once you have located the descriptor file, you can then use the Data Set Management tool to view and delete the data set. If someone has already deleted the descriptor file, then the segments have been orphaned. There is no utility or function to recreate the descriptor file. In this situation, you can safely delete all the segment files named with the "mydata.ds" in the file name.
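Locating the leftover segments for a deleted descriptor can likewise be scripted. The sketch below only lists candidates (the directory and descriptor names are illustrative); review the output carefully before piping it to rm:

```shell
#!/bin/sh
# Illustrative helper: list segment files belonging to a descriptor.
# Segment file names start with the descriptor name followed by a dot.
orphan_segments() {
    resdir=$1   # resource disk directory, e.g. /orch/s0
    desc=$2     # descriptor file name, e.g. mydata.ds
    find "$resdir" -maxdepth 1 -type f -name "${desc}.*" -print
}

# Review the list first; only delete once you are sure the descriptor
# is really gone, e.g.:
# orphan_segments /orch/s0 mydata.ds | xargs rm -f
```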


Cleaning up Data Sets from the command line

It is also possible to use the orchadmin executable program to delete data sets. This program is located in $APT_ORCHHOME/bin.

cd `cat /.dshome`
. ./dsenv
APT_ORCHHOME=$DSHOME/../PXEngine; export APT_ORCHHOME
APT_CONFIG_FILE=/opt/IBM/Ascential/DataStage/Configurations/config-2x2.apt; export APT_CONFIG_FILE
LD_LIBRARY_PATH=$APT_ORCHHOME/lib:$LD_LIBRARY_PATH; export LD_LIBRARY_PATH
PATH=$APT_ORCHHOME/bin:$PATH; export PATH
orchadmin describe -c -p -f -s -v -e /dsd/od251dev/data/incoming/9997/20120414/9997_MRKT_FCTR_VOL.ds
orchadmin delete <full path to descriptor file>/<datasetname.ds>
Link
http://www.isecor.com/kb/documentation/orchadmin.html

Clear out RT_LOG
The RT_LOG* files are cleared when you clear the log of a job from the Director.
