Clients often assume that they can migrate from one DM or CM system to another by writing a procedure to extract documents and indexing data one at a time. Although some small systems can be migrated using this technique, larger systems can take months (even years) to give up data at the document level.
Our migration process starts by capturing raw data. If you have backups of optical disk media we'll take these off-site. If you have only single copies we'll remove and replace them one by one. Each OD will be read as a single disk image at the maximum rated speed of the drive. This saves an enormous amount of time because we don't have the overheads of database access, optical disk rotational or seek latency, cache handling or disk exchange to contend with. At the same time we dump the document locator database and any other associated databases, usually as raw data, too.
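As a rough illustration of the capture step, the sketch below copies a source (a raw device or an existing image file) front to back in large sequential reads and records a checksum for later auditing. The function name, the 4 MiB chunk size and the choice of SHA-256 are assumptions made for the example, not a description of the tooling actually used.

```python
import hashlib


def capture_image(source_path, image_path, chunk_size=4 * 1024 * 1024):
    """Copy source_path to image_path sequentially.

    Returns (bytes_copied, sha256_hex). Hypothetical helper: reading in
    large sequential chunks avoids per-document seeks and database lookups.
    """
    digest = hashlib.sha256()
    total = 0
    with open(source_path, "rb") as src, open(image_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # end of media
                break
            dst.write(chunk)
            digest.update(chunk)
            total += len(chunk)
    return total, digest.hexdigest()
```

The checksum returned here can be stored alongside the image so that later stages can confirm they are working from an unaltered copy.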
Back at the lab we extract individual image pages or other objects (like Word files, PDFs or COLD print streams) from the raw OD images, identify them (in a document/page sense), convert them if required, and write them to staging media. We typically use at least as much disk space as was occupied by the data on all of the optical disks supplied. Keeping all of the data cached increases our efficiency.
While that's going on, we deconstruct the document locator and other databases to build a list of which objects to include in each document, and what index data to migrate with them.
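The list-building step can be pictured as a simple grouping operation. The sketch below assumes, purely for illustration, that the locator database has been dumped to rows of (document id, page number, object id) and the index data to rows of (document id, field, value); real locator schemas vary widely and are often far less tidy.

```python
from collections import defaultdict


def build_manifest(locator_rows, index_rows):
    """Group dumped rows into one manifest entry per document.

    locator_rows: iterable of (doc_id, page_no, object_id)  (assumed layout)
    index_rows:   iterable of (doc_id, field, value)        (assumed layout)
    """
    pages = defaultdict(list)
    for doc_id, page_no, object_id in locator_rows:
        pages[doc_id].append((page_no, object_id))

    index = defaultdict(dict)
    for doc_id, field, value in index_rows:
        index[doc_id][field] = value

    manifest = {}
    for doc_id, entries in pages.items():
        manifest[doc_id] = {
            # Objects listed in page order
            "objects": [object_id for _, object_id in sorted(entries)],
            "index": dict(index.get(doc_id, {})),
        }
    return manifest
```

Documents that have index data but no objects are left out of the result here; in practice those would be worth reporting as exceptions.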
Imaging systems frequently hold raster data in undocumented or obscure internal file formats, or in formats which are not supported by the next vendor's system. Image conversion takes the input format (say MO:DCA/IOCA, WIFF or undocumented raster data) and converts it to a new format (usually TIFF or PDF). At the same time, clients may wish to render (say) Microsoft Word files to TIFF for faster display and to avoid having expensive software on client machines.
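Before converting anything, it helps to know which extracted objects are already in a format the target accepts. A minimal sketch, assuming objects can be classified from their first few bytes: the TIFF and PDF signatures below are the standard ones, while the function names and the "accepted" list are hypothetical. Proprietary or undocumented formats would need their own recognisers.

```python
def sniff_format(header):
    """Classify an object from its leading bytes."""
    if header[:4] in (b"II*\x00", b"MM\x00*"):  # little/big-endian TIFF
        return "tiff"
    if header.startswith(b"%PDF-"):
        return "pdf"
    return "unknown"


def needs_conversion(header, accepted=("tiff", "pdf")):
    """True when the object is not already in a format the target accepts."""
    return sniff_format(header) not in accepted
```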
This process can be very time-consuming. To hit client timescales we usually run conversion software on multiple machines, transferring data by gigabit ethernet or removable drives.
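One simple way to spread conversion work across machines is to assign each object to a worker deterministically, so that a restarted run produces the same batches. The sketch below uses a CRC32 of the object id for that purpose; the scheme and the names are assumptions for illustration, not the scheduling method actually used.

```python
import zlib


def assign_worker(object_id, worker_count):
    """Map an object id to a worker number in range(worker_count)."""
    return zlib.crc32(object_id.encode("utf-8")) % worker_count


def partition(object_ids, worker_count):
    """Split object ids into one batch per worker."""
    batches = [[] for _ in range(worker_count)]
    for object_id in object_ids:
        batches[assign_worker(object_id, worker_count)].append(object_id)
    return batches
```

Hash-based assignment balances object counts rather than object sizes, so a queue-based scheme may suit workloads where conversion times vary a great deal.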
Once the image data is converted and the databases are denormalised, we transform the data so that it's ready for the target system. At this stage we also perform any required transformations on the index data.
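Index transformations are usually small, mechanical rules applied to every record: renaming fields, trimming padding, normalising dates. The sketch below shows the general shape; the field names, the mapping and the source date format (day/month/year) are all invented for the example.

```python
from datetime import datetime

# Hypothetical mapping from source field names to target field names
FIELD_MAP = {"ACCT_NO": "account_number", "DOC_DT": "document_date"}


def transform_index(record):
    """Return a target-ready copy of one source index record."""
    out = {}
    for source_field, value in record.items():
        target_field = FIELD_MAP.get(source_field)
        if target_field is None:
            continue  # field is not migrated
        value = value.strip()
        if target_field == "document_date":
            # Assumed source format dd/mm/yyyy, normalised to ISO 8601
            value = datetime.strptime(value, "%d/%m/%Y").date().isoformat()
        out[target_field] = value
    return out
```

A malformed date raises ValueError in this sketch, which is deliberate: it is better for a bad record to stop and be examined than to be loaded silently.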
Some CM and DM systems have bulk import facilities, and where they exist we'll use those. Clients are often surprised at how slowly DM systems will accept bulk data, so we'll frequently build an archive off-site and use non-standard import facilities (dropping indices, for example) to introduce data to target systems. When all else fails, we'll write code to import data as fast as your new system will accept it and supply hardware and personnel to keep the import running 24 hours a day, 7 days a week until it's complete.
Of course, a migration is not complete until we can prove that it's complete. Every step is audited, and the audit report (listing every component of every document migrated, source and destination...) is a vital part of the delivery. We liaise with clients from the outset to make sure that their audit requirements are fulfilled.
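At its simplest, proving completeness means reconciling what was expected to be migrated against what the target reports it holds. The sketch below compares two sets of (document id, page number) keys; the key structure and the report layout are assumptions, and a real audit would carry more detail per component, such as source and destination locations.

```python
def reconcile(expected, imported):
    """Compare expected components with those reported by the target.

    Both arguments are iterables of (doc_id, page_no) keys (assumed layout).
    """
    expected = set(expected)
    imported = set(imported)
    missing = sorted(expected - imported)      # expected but not imported
    unexpected = sorted(imported - expected)   # imported but never expected
    return {
        "missing": missing,
        "unexpected": unexpected,
        "complete": not missing and not unexpected,
    }
```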