As they say in editorial reviews, "this
guide for database and data warehouse developers and managers describes
the process of building and managing a document warehouse, the
organization of unstructured text to facilitate storage and retrieval,
and the use of text mining techniques." Although too general, this
sentence neatly summarizes the content of the book and describes the
intended audience. This was one of the first books covering such a broad
area, and I was eager to see how the author would succeed in presenting so
many related topics.
The answer is clear...
With his clear, no-nonsense approach, Dan Sullivan answered the
fundamental questions about text mining and document warehousing. He
provided lots of detailed examples for implementing document
warehouses, not just visions from 5000 feet up. Of course, he doesn't
really drill into numerous details of the actual implementation. To be
honest, I don't see how it could fit in only one book. It would be
nice to find a book covering such details, but in the
meantime, this is an essential resource for people building business
intelligence, decision support, market news, intellectual property and
other knowledge-based resources.
The book cover says that it teaches you to...
Design the architecture of a document warehouse
Find and retrieve text documents from multiple sources
Load information into the warehouse and transform it to the desired form
Select the right tools to thematically index, categorize, cluster, and summarize text
Adapt the appropriate meta data for your document warehouse
Use text mining for operational management, customer relationship management, and competitive analysis
Ensure the security and privacy of your document warehouse
...and I must admit this is only a part of the material covered
inside. It also provides you with "hands-on" tutorials on working with
various commercial text mining packages, along with short code snippets
in Perl, Python, SQL, etc. The author published various additional
materials at the companion Web site. If you want to get a feeling for the type of information contained in Sullivan's book, this is the place to visit.
I've been working with text mining and related techniques for many
years, but this is probably the first "down to earth" resource that
systematically describes the whole field. I used it during the design
process of aboutAI.net, as it employs various text mining solutions.
Thumbs up!