How To Manage the Document Soup

The advent of open government initiatives enables citizens and rule-writers to generate large numbers of public comments on a daily basis. This influx of data can create information overload for federal rule-writers who must process massive amounts of redundant text data to find germane and novel contributions. Search engine technologies and document management systems help to sort out information in the proverbial “document soup”.

These tools, however, require users to know explicitly what they are looking for; they do not necessarily enable discovery of unique information and may inadvertently keep highly salient but rare text hidden. Existing software supports text annotation, but annotation likewise depends on users who already know what to look for.

The “Sifter” web-based platform, currently in development by Texifter, closes the gap between automated text processing and human annotator judgments. A key aspect of Sifter is its ability to automatically cluster and “sift” collections of text so that networks of human users can asynchronously annotate their findings. The platform scales up, allowing rule-writers to “crowdsource” the task of tracking down relevant information in public comment collections.
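
To make the idea of clustering and sifting concrete, here is a minimal Python sketch of one way redundant comments could be grouped so that rare ones stand out. It is not Sifter's actual method, which this post does not describe. The function names (shingles, jaccard, cluster_comments, rare_comments), the three-word shingle size, and the 0.5 similarity threshold are all assumptions chosen for illustration.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles in a lowercased comment."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: shared items over all items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_comments(comments, threshold=0.5):
    """Greedy one-pass clustering (an illustrative assumption).

    Each comment joins the first cluster whose representative (its first
    member) is at least `threshold` similar; otherwise it starts a new
    cluster. Returns a list of clusters, each a list of comment indices.
    """
    clusters = []  # pairs of (representative shingle set, member indices)
    for idx, text in enumerate(comments):
        current = shingles(text)
        for representative, members in clusters:
            if jaccard(current, representative) >= threshold:
                members.append(idx)
                break
        else:
            clusters.append((current, [idx]))
    return [members for _, members in clusters]


def rare_comments(comments, threshold=0.5):
    """Return comments that ended up alone in their cluster."""
    return [
        comments[members[0]]
        for members in cluster_comments(comments, threshold)
        if len(members) == 1
    ]
```

In a sketch like this, form-letter variants collapse into one cluster, and the singletons returned by rare_comments are the candidates a human annotator would review first. A production system would need a more careful similarity measure and human review of the cluster boundaries.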

Additionally, Sifter allows users to form credential-based peer networks with other system users in order to better incorporate expertise on specific rules and distribute the analytic work widely. Try the PCAT beta and let us know what you think.
