Choosing a simple, complete set of patterns is the meat and potatoes of a content migration analysis. How do you take the ‘raw’ content on a network drive and move it into a document management system with ‘the right’ metadata? How will you accomplish this on budget and on time? Make the hard choices during business analysis to determine which metadata you need and what can be discarded, follow the investigations and analysis described in the other posts in this series, and you will get there.
Our current task: how do we describe the chaos in import-tool-friendly ways?
This article is part of a series discussing many aspects of content migration from an unstructured storage system to a document management system such as Autonomy iManage WorkSite, or Open Text Document Management.
The overview page is here: Content Migration to a DMS – Articles.
Describing Unstructured Content
As you analyze the environment baseline data, decide on the patterns to use for metadata descriptions. Your goal in defining the metadata descriptions is to describe the import data as efficiently and accurately as possible with a reasonable amount of effort. An import project will likely use a mixture of the patterns listed below, as the scenarios warrant.
Your goal during requirements definition is to use business priorities to determine what metadata to capture, and then to analyze the raw data enough to determine which of the following patterns is appropriate.
- Exclusion
- Inclusion
- Comprehensive mapping
- Simple rule
- Existing system lookup
These design patterns are specific to a content migration project. My notes on best practices for requirements collection are posted separately, and should be read side by side with the Requirements Guidelines and the Environment Baseline post.
Completing and verifying the list of rules or mappings is a design task. The pattern options are:
Exclusion
List folders, name patterns, or file extensions to leave out of the import. Everything not excluded should be imported. This technique is appropriate if:
- it’s easier, based on the number of data points, to define what to leave out
- the list of what to import may change significantly between analysis and import.
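The exclusion pattern can be sketched in a few lines of Python. This is only an illustration, not the behaviour of any particular import tool: the folder names, name patterns, and extensions below are hypothetical, and paths are assumed to be relative to the share root with forward slashes.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical exclusion rules; the real lists come out of the baseline analysis.
EXCLUDED_FOLDERS = {"temp", "archive/old"}
EXCLUDED_NAME_PATTERNS = ["~$*", "*.bak"]
EXCLUDED_EXTENSIONS = {".tmp", ".lnk"}


def is_excluded(relative_path):
    """Return True if any exclusion rule matches the path."""
    # Lowercase once so all comparisons are case-insensitive.
    path = PurePosixPath(relative_path.lower())
    # Folder rule: the file sits in or below an excluded folder.
    if any(str(parent) in EXCLUDED_FOLDERS for parent in path.parents):
        return True
    # Name-pattern rule: shell-style wildcards against the file name only.
    if any(fnmatchcase(path.name, pattern) for pattern in EXCLUDED_NAME_PATTERNS):
        return True
    # Extension rule.
    return path.suffix in EXCLUDED_EXTENSIONS


def select_for_import(paths):
    """Everything that is not explicitly excluded is imported."""
    return [p for p in paths if not is_excluded(p)]
```

Because the rule set only names what to leave out, files that appear between analysis and import are picked up without revisiting the rules.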
Inclusion
List folders, name patterns, or file extensions to import. Anything not described as an inclusion is implicitly excluded. This technique is appropriate when it’s easier, based on the number of data points, to define what to bring in.
File extensions are more easily specified as an inclusion rule. Be cautious, however, if a substantial share of the input files were created before 2000 or come from a UNIX file system: rigorous, consistent file extensions became the norm only in the late 1990s, driven by Microsoft applications and operating systems.
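A minimal sketch of an extension-based inclusion rule follows. The extension list and the three outcome labels are assumptions made for illustration. Files with no extension at all are routed to a manual review bucket rather than silently skipped, which is one way to act on the caution about older or UNIX-origin files.

```python
from pathlib import PurePosixPath

# Hypothetical inclusion list; anything else is implicitly excluded.
INCLUDED_EXTENSIONS = {".doc", ".xls", ".pdf"}


def classify(relative_path):
    """Return 'import', 'review', or 'skip' for one file path."""
    suffix = PurePosixPath(relative_path).suffix.lower()
    if suffix in INCLUDED_EXTENSIONS:
        return "import"
    if suffix == "":
        # No extension: common on older or UNIX-origin files, so flag
        # for a human decision instead of dropping it.
        return "review"
    return "skip"


def partition(paths):
    """Group paths by their classification."""
    buckets = {"import": [], "review": [], "skip": []}
    for p in paths:
        buckets[classify(p)].append(p)
    return buckets
```

The size of the review bucket is a quick indicator of whether an extension-based inclusion rule is safe for a given file set.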
Comprehensive mapping
A fairly long list, anywhere from tens to thousands of entries, itemizing network elements and the specific metadata value to be assigned to each. Best delivered as a spreadsheet, this information will likely be loaded into a database table to be used for lookup. Creating this spreadsheet, and validating that it is accurate, will be a time-consuming task.
Appropriate when all of the following are true:
- There is a long list of matches to make.
- It is important to make a high percentage of accurate assignments.
- It is not practical to modify the original data set to conform to a simple set of rules. If you have the flexibility (and time) to modify the existing data set, I recommend instead changing it to conform either to a simple rule, or to an existing system lookup.
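As a rough sketch, a comprehensive mapping could be loaded from a CSV export of the spreadsheet and consulted per file. The column names (`folder`, `client_code`), the sample rows, and the nearest-ancestor lookup are all assumptions for illustration; a real project would more likely load the rows into a database table.

```python
import csv
import io
from pathlib import PurePosixPath

# Hypothetical spreadsheet export: one row per folder and its metadata value.
MAPPING_CSV = """folder,client_code
clients/acme,C-1001
clients/acme/litigation,C-1001-LIT
clients/globex,C-2002
"""


def load_mapping(csv_text):
    """Load folder -> value rows, rejecting conflicting duplicates."""
    mapping = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        folder = row["folder"].strip().lower()
        value = row["client_code"].strip()
        if folder in mapping and mapping[folder] != value:
            raise ValueError(f"conflicting values for folder {folder!r}")
        mapping[folder] = value
    return mapping


def lookup(mapping, relative_path):
    """Return the value of the nearest mapped ancestor folder, or None."""
    # parents yields the closest folder first, so deeper rows win.
    for parent in PurePosixPath(relative_path.lower()).parents:
        value = mapping.get(str(parent))
        if value is not None:
            return value
    return None


def unmapped(mapping, paths):
    """List the paths with no mapping row, for validation of the spreadsheet."""
    return [p for p in paths if lookup(mapping, p) is None]
```

Running `unmapped` against the full baseline listing is one way to support the time-consuming validation step: an empty result means every file resolves to some row.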
Simple rule
A shorter list of rules, typically fewer than ten, describing some broad groupings to categorize a set of import elements. Deliver as a text-based list or as a spreadsheet; it will probably end up as a lookup table, but might be hard-coded by the import process. Appropriate when there are not as many matches to make, e.g. fewer than twenty values can be used to describe all the elements in the set. Unless the import set is extraordinarily consistent, this approach will not achieve nearly as high accuracy as the comprehensive mapping pattern.
Use this approach in two different scenarios: when it is important for every element to match precisely and you have reviewed the existing content to be confident this will happen; or, when a lower percentage of matches is acceptable and the unmatched elements can be assigned a standard default value.
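Both scenarios can be illustrated with a short, ordered rule list. The patterns, category names, and default value here are hypothetical. Passing `default=None` exposes unmatched elements, which suits the case where every element must match; the match rate helps judge whether a lower percentage is acceptable.

```python
from fnmatch import fnmatchcase

# Hypothetical rules, evaluated in order; the first match wins.
RULES = [
    ("hr/*", "Human Resources"),
    ("finance/*", "Finance"),
    ("*/contracts/*", "Legal"),
]
DEFAULT_CATEGORY = "Unclassified"


def categorize(relative_path, rules=RULES, default=DEFAULT_CATEGORY):
    """Return the value of the first matching rule, else the default."""
    path = relative_path.lower()
    for pattern, value in rules:
        # fnmatch's '*' also matches '/', so 'hr/*' covers all subfolders.
        if fnmatchcase(path, pattern):
            return value
    return default


def match_rate(paths, rules=RULES):
    """Fraction of paths matched by some rule (ignoring the default)."""
    if not paths:
        return 0.0
    matched = sum(1 for p in paths if categorize(p, rules, default=None) is not None)
    return matched / len(paths)
```

Since the first match wins, the order of the rules is part of the requirement and should be reviewed along with the rules themselves.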
Existing system lookup
In this approach, the requirement simply states that the import element (such as a folder name) will precisely match a value found in the database. Use this approach in two different scenarios: when it is important for every element to match precisely and you have prepared the database table to ensure this will happen; or, when a lower percentage of matches is acceptable and the unmatched elements can be assigned a standard default value. Using this approach shifts the effort from requirements documentation over into metadata definition and setup. Strongly favor this approach when you will already be doing metadata definition and setup, or when it is feasible to rename the elements in the import set to match the metadata.
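A minimal sketch of an existing system lookup, using an in-memory SQLite database as a stand-in for the document management system's metadata store. The `client` table, its `client_id` column, and the sample values are hypothetical and do not reflect the schema of any specific product.

```python
import sqlite3


def build_demo_db():
    """Create a stand-in metadata table with a few known values."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE client (client_id TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO client VALUES (?)", [("1001",), ("2002",)])
    return conn


def resolve(conn, folder_name, default="UNKNOWN"):
    """Return the folder name if it exactly matches a stored value, else the default."""
    row = conn.execute(
        "SELECT client_id FROM client WHERE client_id = ?",
        (folder_name.strip(),),
    ).fetchone()
    return row[0] if row else default


def unmatched_folders(conn, folder_names):
    """List folder names with no exact match, as candidates for renaming."""
    return [name for name in folder_names if resolve(conn, name, default=None) is None]
```

The unmatched list doubles as a to-do list: each entry is either renamed in the import set or added to the metadata before the import runs.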