Digital Library Center Documentation & Technologies: Archiving

Internal Archiving with Tivoli/NSAM

Open Systems/CNS/Tivoli service provider

Any items that are ready for archiving should be placed into the TIVOLI drop box, located on the archive server under Archive\DROPBOX.  Every hour, the folders in the drop box are examined and then moved into the area that TIVOLI picks up.  Every night at 6pm, the TIVOLI service runs.  Once it runs, the resource is deleted from our server.


Additional Considerations:

  • Folders should be in the flat form (e.g., UF00001532_00001, UF00001532_00002, UF00001532_00017, etc.)
    • The processor will not step down into the folders, so the digital resource folder must be placed at the highest level.
    • If you have an intermediary folder between the drop box and the resource folders, the processor will ignore everything beneath it (e.g., DROPBOX\DLOC\CA00001234\... ). If you need to do this temporarily for any reason, that is okay, but be aware that this will eat into the space we can use for the archiving process.
  • It is not necessary that the bibid and vid are actually present in the tracking database, as long as they appear to be valid.
  • Ensure that you are either done with all processing (such as sending to FDA) or that you retain an additional copy somewhere else.
    • Tivoli MOVES and then DELETES the digital resource files from our server.
  • A new status will appear in tracking indicating that some portion of the digital resource was archived into our TIVOLI solution.
  • If there is an identical file in the archive for a volume (as defined by the same bibid, vid, filename, filesize, and last write date), the new file will be discarded.  It will not be ‘double archived’.
    • Archiving to TIVOLI is done file by file, not by entire digital resource.
    • If you drop a new package into the dropbox with one new TIFF file and one new METS file, those two new files will be detected (by a mismatch on the last write date).  All the other files should exactly match the existing files, as their size and last write date will be identical to those of the archived files.
      • Only the two new files will be selected for archiving and the rest will be deleted. 
      • The two files will be moved over to the area that TIVOLI will pick up at 6pm.
      • TIVOLI does not care if two files with the same name and the same folder structure are archived.  It will retain both of them, and we would have to select which version we want to retrieve.  However, for administrative simplicity, we rename the file in this case.  If you were to retrieve this sample package (let's say you had updated 00002.tif and the METS file), you would get the following:
        00001.tif
        00002.tif (first tiff archived)
        00002 (2009_10_10).tif ( non-matching file; originally same name; found in dropbox 10/10/2009)
        00003.tif
        UF12345678_00001.mets
        UF12345678_00001 (2009_10_10).mets ( second METS file archived )
        This essentially matches what TIVOLI has archived for it.  
      • As a corollary to the above, if you were to add a NEW file with the second load (say 00004.tif) which had never been archived for this resource before, it would simply be loaded as 00004.tif, since there is no duplicate archived.  The date would not be appended to the filename in this case.
    • The DLC archiving and dissemination tool keeps a log of every file that has been archived, the date, the size, the location, and the last write date.  This local log will enable retrieval, as well as avoiding duplication in the archiving process.
  • When you put in a request for a copy to be disseminated, the dissemination tool might ask whether you just want the latest version of each file or truly want everything archived, as seen above.
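
The discard/rename/keep decision described above can be sketched as follows. This is only an illustration under stated assumptions, not the DLC tool's actual code: `ArchivedFile` and `plan_file` are hypothetical names, and the archive log is modeled as a plain list rather than the real tracking database.

```python
from datetime import date, datetime
from pathlib import PureWindowsPath
from typing import List, NamedTuple, Optional


class ArchivedFile(NamedTuple):
    """One entry of a hypothetical archive log for a single volume."""
    relative_path: str     # path inside the resource folder, e.g. "00002.tif"
    size: int              # file size in bytes
    last_write: datetime   # last write date recorded when it was archived


def plan_file(relative_path: str, size: int, last_write: datetime,
              archived: List[ArchivedFile], today: date) -> Optional[str]:
    """Return None to discard the file, otherwise the name to archive it under."""
    same_name = [a for a in archived if a.relative_path == relative_path]
    if not same_name:
        # Never archived for this volume: keep the original name.
        return relative_path
    for entry in same_name:
        if entry.size == size and entry.last_write == last_write:
            # Identical copy already archived: do not 'double archive'.
            return None
    # Same name but different size or last write date: append today's date.
    path = PureWindowsPath(relative_path)
    stamped = f"{path.stem} ({today:%Y_%m_%d}){path.suffix}"
    return str(path.with_name(stamped))
```

For example, with 00001.tif and 00002.tif already in the log, calling `plan_file` on 2009-10-10 for an unchanged 00001.tif returns `None`, a modified 00002.tif returns `00002 (2009_10_10).tif`, and a never-archived 00004.tif returns `00004.tif`.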

Implementation Notes:

  • Scheduled Tasks
    • The Tivoli Preparation Tool runs as a scheduled task directly on the archiving server every hour from 7am to 5pm each day, as the ufdc processor user.
    • The process of loading the data to CNS's Tivoli solution runs as a scheduled task on the archiving server once a day at 6pm.
  • Tivoli Preparation Tool process
    1. All empty folders are removed from the destination area
    2. Loops through all the top-level folders in the dropbox. If the folder is in the flattened form (e.g., UF12345678_00001), that folder is processed. If it appears to be just a bibid, the tool loops through all the subfolders that appear to be volume folders (e.g., UF12345678\00001 or UF12345678\VID00001).
      1. Skips the folder if it was written to in the last five minutes (in case files are still being written)
      2. A folder is created in the destination area for this resource. The folder path is based on the Bib ID (e.g., UF12345678_00001 maps to UF\12\34\56\78\00001).
      3. List of any files already archived for this volume is pulled from the database
      4. Processor recurses through all subfolders and files. If a file was already archived and has the exact same last write date and size, it is deleted. If a file in the same folder and with the same name was archived previously but had a different last write date or size, this file is renamed to include the current date at the end (e.g., '00001.tif' becomes '00001 (2009_10_29).tif'). In addition, a list of the files to be archived is created.
      5. All files and subfolders are moved into the tivoli archiving area. Whenever possible, entire directories are moved. If the directory already exists, files are just copied one at a time.
      6. All the file information is saved to the database's tivoli log
      7. Finally, the empty source directory is deleted
    3. Process is complete and application terminates
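
Two of the steps above, the five-minute skip and the mapping from a flattened folder name to its destination path, can be sketched as below. This is a minimal sketch, not the tool's actual implementation. The pattern (two letters, eight digits, underscore, five digits) is an assumption inferred from the examples on this page, and `destination_for` and `recently_written` are hypothetical names. Checking only the folder's own modification time is also a simplification, since it does not reflect writes deeper in the tree.

```python
import re
import time
from pathlib import Path, PureWindowsPath
from typing import Optional

# Assumed shape of a flattened resource folder name, e.g. UF12345678_00001.
FLAT_FOLDER = re.compile(r"^([A-Za-z]{2})(\d{8})_(\d{5})$")


def destination_for(folder_name: str) -> Optional[PureWindowsPath]:
    """Map UF12345678_00001 to UF\\12\\34\\56\\78\\00001, or None if not flat."""
    match = FLAT_FOLDER.match(folder_name)
    if match is None:
        return None
    prefix, digits, vid = match.groups()
    # Split the eight bibid digits into four two-digit directory levels.
    pairs = [digits[i:i + 2] for i in range(0, 8, 2)]
    return PureWindowsPath(prefix.upper(), *pairs, vid)


def recently_written(folder: Path, now: Optional[float] = None,
                     window_seconds: int = 300) -> bool:
    """True if the folder itself was modified within the last five minutes."""
    current = time.time() if now is None else now
    return current - folder.stat().st_mtime < window_seconds
```

A caller would skip any dropbox folder for which `recently_written` is true and try again on the next hourly run, and would skip (or descend into) any folder for which `destination_for` returns `None`.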