Web mining and text mining differ mainly in the kind of data they mine. Structured data is data that fits neatly into a spreadsheet and can be conveniently searched and analyzed. Text mining arrives at this kind of data by imposing structure on free text. Web mining, by contrast, works with content in its other forms, which are considered unstructured because of the varied free-form styles in which they are found, such as a Word document or a graph.
What Is Web Mining?
Web mining applies data mining techniques to extract knowledge from web content, structure, and usage. It can be used to discover useful, previously unknown information.
Web mining can be classified into the following categories:
- Web Content
- Web Structure
- Web Usage
What Is Text Mining?
Text mining is the transformation and interpretation (often mathematical) of unstructured text into structured data for purposes such as pattern identification. The idea is to find patterns and associations in documents, which can then be used for a variety of purposes.
Text mining, or text analysis, is an important research field with applications in many areas, such as information retrieval and lexical decomposition. The goal of text analytics tools is to provide insight into the meaning behind word frequencies.
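As a minimal sketch of what imposing structure on text can look like, the Python snippet below turns two free-form sentences into rows of word counts, one column per vocabulary word, much like a spreadsheet. The sample documents and the `tokenize` and `term_frequency_table` helpers are illustrative assumptions, not part of any particular text mining tool.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def term_frequency_table(documents):
    """Return (vocabulary, rows): one row of word counts per document."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    # A Counter returns 0 for words that a document does not contain.
    rows = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, rows

# Hypothetical sample documents, used only for illustration.
docs = [
    "Text mining finds patterns in text.",
    "Web mining finds patterns in web usage.",
]
vocabulary, rows = term_frequency_table(docs)
print(vocabulary)
for row in rows:
    print(row)
```

Each row is now structured data that can be searched, compared, or passed to a statistical method, which free text cannot be directly.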
Web Mining Technologies
- Web Content Mining: Web content mining turns the raw content of web pages from a specified website into useful information.
- Web Structure Mining: The web graph is a structure in which web pages are the nodes and the hyperlinks connecting them are the edges. Document-level analysis looks at the link structure within a single document, while hyperlink-level analysis assesses relationships among different documents on a website or across the web.
- Web Usage Mining: The Web is a collection of interrelated files housed on one or more servers. Web usage mining discovers meaningful patterns in the records of client-server transactions, such as server logs.
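The structure and usage items above can be sketched in a few lines of Python using only the standard library. This is an illustration under stated assumptions rather than a production approach: the pages, the log lines, and the log layout matched by `REQUEST` are hypothetical, and the helper names are made up for this example.

```python
import re
from collections import Counter
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML document."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def build_web_graph(pages):
    """Structure mining: map each page (node) to the pages it links to (edges)."""
    graph = {}
    for url, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        graph[url] = collector.links
    return graph

def inbound_link_counts(graph):
    """Count how many hyperlinks point at each page."""
    return Counter(target for targets in graph.values() for target in targets)

# Assumed log layout: the request line appears in quotes, e.g. "GET /home HTTP/1.1".
REQUEST = re.compile(r'"(?:GET|POST) (\S+)')

def page_hits(log_lines):
    """Usage mining: count requests per path across server log lines."""
    hits = Counter()
    for line in log_lines:
        match = REQUEST.search(line)
        if match:
            hits[match.group(1)] += 1
    return hits

# Hypothetical pages and log lines, used only for illustration.
pages = {
    "/home": '<a href="/about">About</a> <a href="/blog">Blog</a>',
    "/about": '<a href="/home">Home</a>',
    "/blog": '<a href="/home">Home</a> <a href="/about">About</a>',
}
log_lines = [
    '10.0.0.1 - - [01/Jan/2024:10:00:00 +0000] "GET /home HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2024:10:00:05 +0000] "GET /blog HTTP/1.1" 200 734',
    '10.0.0.1 - - [01/Jan/2024:10:00:09 +0000] "GET /home HTTP/1.1" 200 512',
]

graph = build_web_graph(pages)
print(inbound_link_counts(graph).most_common())
print(page_hits(log_lines).most_common(1))
```

Counting inbound links is only one simple signal that can be read off the web graph, and counting hits per path is only one simple usage pattern; real systems typically go well beyond both.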
Text Mining Technologies
- Summarization: Condensing a large amount of text while preserving its main ideas.
- Information Extraction: Using pattern matching to extract specific information from text.
- Categorization: A supervised learning technique that assigns documents to categories according to their content.
- Visualization: Using computer graphics to represent information and to visualize relationships.
- Clustering: An unsupervised technique that groups documents according to textual similarity.
- Question Answering: Using a list of patterns to answer a natural language question.
- Sentiment Analysis: Also known as opinion mining, it gauges people's opinions about a service or product.
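Two of the techniques above, information extraction and sentiment analysis, are easy to sketch in Python. The example below assumes a simple regular expression for email addresses and a tiny hand-made word list for sentiment; both are illustrative stand-ins, and practical systems usually rely on trained models and far larger resources. The reviews are invented sample data.

```python
import re

# Hypothetical extraction pattern: a simplified email address matcher.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

# Tiny hand-made sentiment lexicon, an assumption made for illustration.
POSITIVE = {"great", "love", "fast", "helpful"}
NEGATIVE = {"slow", "broken", "poor", "hate"}

def extract_emails(text):
    """Information extraction: return every email-like string in the text."""
    return EMAIL.findall(text)

def sentiment_score(text):
    """Sentiment analysis: positive word count minus negative word count."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives

# Invented sample reviews.
reviews = [
    "Great service and fast delivery, contact me at ana@example.com.",
    "The app is slow and the login page is broken.",
]
for review in reviews:
    print(sentiment_score(review), extract_emails(review))
```

A score above zero suggests a favorable opinion and a score below zero an unfavorable one. A word-counting approach like this misses negation and sarcasm, which is one reason it should be read as a sketch only.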
The basic difference between web mining and text mining arises from the nature of the data each one handles.
- Text mining imposes a structure on the specified data so that it can be mined for valuable information.
- Web mining deals with unstructured forms of data, which include Word documents, PDF files, and XML files.
Most of the world's data is unstructured, yet many businesses are data-driven. They depend on analyzed data for the information they need to make business decisions that generate growth and revenue. Natural Language Processing (NLP) is a powerful tool that can be used to create accurate and complete taxonomies, which in turn help in the metadata association process.