I spent last weekend reconfiguring and rebuilding the search indexes for a client. We indexed something like 3.5 million items, consisting of about three terabytes of data. It was a lengthy process, which gave me a lot of time to think about best practices for document management, document storage, and enterprise search. The first thing that struck me was the sheer magnitude of the document store. Granted, my client is a pretty big company, but I find it difficult to believe all those files are necessary for the day-to-day operations of the firm. In fact, we weren't indexing and searching the totality of documents in the organization, only those items contained within a particular collaboration/content management system.
Most of the content had no metadata associated with it beyond the obvious: author, time of creation, type of file, and so on. The net result was a full-text index of all those documents, which is not all that useful, because corporate documents tend to use the same terms over and over, producing search results that are difficult to sift through.
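To make that point concrete, here is a minimal sketch of why metadata matters. The field names and sample documents are hypothetical examples of my own, not anything from the client's system: a pure full-text query over repetitive corporate documents matches nearly everything, while the same query scoped by a couple of metadata fields narrows the results to something a person can actually review.

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    doc_id: str
    text: str
    metadata: dict = field(default_factory=dict)  # e.g. {"department": "claims", "doc_type": "report"}

# Hypothetical sample documents; real corporate stores repeat terms like "policy" constantly.
DOCS = [
    Document("1", "quarterly claims report covering policy renewals",
             {"department": "claims", "doc_type": "report"}),
    Document("2", "policy renewal checklist for underwriting",
             {"department": "underwriting", "doc_type": "checklist"}),
    Document("3", "claims policy handbook with renewal procedures",
             {"department": "claims", "doc_type": "handbook"}),
]

def full_text_search(docs, term):
    """Naive full-text match: common corporate terms hit nearly every document."""
    return [d for d in docs if term.lower() in d.text.lower()]

def metadata_search(docs, term, **filters):
    """Same term match, but restricted to documents whose metadata matches every filter."""
    hits = full_text_search(docs, term)
    return [d for d in hits if all(d.metadata.get(k) == v for k, v in filters.items())]

if __name__ == "__main__":
    # Full text alone: every document matches "policy".
    print([d.doc_id for d in full_text_search(DOCS, "policy")])   # ['1', '2', '3']
    # Adding metadata filters returns a manageable result set.
    print([d.doc_id for d in metadata_search(DOCS, "policy",
                                             department="claims",
                                             doc_type="handbook")])  # ['3']
```

The same idea scales up in any real indexing platform: the full-text engine does the term matching, but it is the metadata fields that let users (and the relevance ranking) separate three useful documents from three million near-identical ones.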
Paper to Bits?