Saturday, August 4, 2012

A QUERY FORMULATION LANGUAGE FOR THE DATA WEB


ABSTRACT:

We present MashQL, a query formulation language for easily querying and fusing structured data on the web. The main novelty of MashQL is that it allows people with limited IT skills to explore and query one (or multiple) data sources without prior knowledge about the schema, structure, vocabulary, or any technical details of these sources. More importantly, to be robust and cover most cases in practice, we do not assume that a data source has a schema, whether offline or inline. This poses several language-design and performance complexities that we fundamentally tackle. To illustrate the query formulation power of MashQL, and without loss of generality, we chose the Data Web scenario. We also chose querying RDF, as it is the most primitive data model; hence, MashQL can be similarly used for querying relational databases and XML. We present two implementations of MashQL: an online mashup editor and a Firefox add-on. The former illustrates how MashQL can be used to query and mash up the Data Web as simply as filtering and piping web feeds; the Firefox add-on illustrates using the browser as a web composer rather than only a navigator. Finally, we evaluate MashQL on querying two datasets, DBLP and DBPedia, and show that our indexing techniques allow instant user interaction.


EXISTING SYSTEM:

Before formulating a query, one has to know the structure of the data and the attribute labels (i.e., the schema). End users are not expected to investigate “what is the schema” each time they search or filter information. In many cases, a data schema might even be dynamic, i.e., many kinds of items with different attributes are often added and dropped. Other sources might be schema-free.

Drawbacks:

  • Every user must know the schema and structure of the data before querying
  • Content is not displayed if the keyword does not match

PROPOSED SYSTEM:

We propose an interactive query formulation language, called MashQL. The novelty of MashQL (compared with related work) is that it considers all of the above assumptions together. Being a language (not merely an interface) while, at the same time, assuming data to be schema-free is one of the key challenges addressed in the design and development of MashQL. Without loss of generality, this article focuses on the Data Web scenario. We regard the Web as a database, where each data source is seen as a table. In this view, a data mashup becomes a query involving multiple data sources. To illustrate the power of MashQL, we chose to focus on querying RDF, which is the most primitive data model; hence, other models (such as XML and relational databases) can be easily mapped into it.
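
As a rough illustration of the "mashup as a query over multiple sources" view, the following Python sketch merges two small sources and joins across them. The two sources, their triples, and the function name are hypothetical; this is not how MashQL itself is implemented.

```python
# Two hypothetical data sources, each a list of (subject, property, object) triples.
source_a = [
    ("A1", "Title", "Web 2.0"),
    ("A1", "Author", "B1"),
]
source_b = [
    ("B1", "Name", "Tom"),
    ("B1", "Affiliation", "Uni X"),
]


def titles_with_author_names(*sources):
    """Join article titles with author names across all given sources."""
    # Treat all sources as one merged set of triples.
    merged = [t for src in sources for t in src]
    titles = {s: o for s, p, o in merged if p == "Title"}
    names = {s: o for s, p, o in merged if p == "Name"}
    # An Author triple links an article (subject) to an author (object).
    return sorted(
        (titles[s], names[o])
        for s, p, o in merged
        if p == "Author" and s in titles and o in names
    )


result = titles_with_author_names(source_a, source_b)
print(result)
```

Neither source alone can answer the question; the answer only appears once both are queried together.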

Advantages:
  • No prior knowledge about the data or its schema is needed
  • No keywords are needed for searching; the query is built from the content the user selects









SYSTEM MODELS
HARDWARE REQUIREMENT
CPU type             :    Intel Pentium 4
Clock speed          :    3.0 GHz
RAM size             :    512 MB
Hard disk capacity   :    40 GB
Monitor type         :    15 inch color monitor
Keyboard type        :    Internet keyboard

SOFTWARE REQUIREMENT
Operating system     :    Android
Language             :    Android SDK 2.3
Back end             :    SQLite
Documentation        :    MS Office









MODULES
  • Database design
  • Query Formulation Algorithm
  • Select the query subject
  • Select a property
  • Add an object filter
  • Query Language
  • Graph Signature Index
  • Implementation and Evaluation

MODULES DESCRIPTION

Database design
In this module we design the database and create its tables.

Query Formulation Algorithm
This algorithm is used by the MashQL editor. Its novelty is that it allows one to navigate through and query one or more data graphs without assuming that the end user knows the schema or that the data adheres to a schema.

Select The Query Subject
After specifying the dataset G, users select the subject S from a drop-down list that contains either: (i) ST, the set of the subject-types in G, such as Article; (ii) SI, the union of all subject and object identifiers in the dataset; or (iii) a user-defined subject label. In the last case, the subject is seen as a variable (S ∈ V) and displayed in italics; the default subject is the variable label Anything.
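
The two generated lists, ST and SI, can be sketched in Python over a tiny in-memory dataset. The sample triples, the `Lit` marker class, the `rdf:type` property name, and the function name are assumptions made for illustration only.

```python
class Lit(str):
    """Marks an object as a literal value rather than an identifier."""


TYPE = "rdf:type"  # assumed name of the typing property

# Hypothetical dataset G as (subject, property, object) triples.
G = [
    ("A1", TYPE, "Article"),
    ("A1", "Title", Lit("Web 2.0")),
    ("A1", "Author", "B1"),
    ("A2", TYPE, "Article"),
    ("A2", "Title", Lit("Data Web")),
    ("B1", TYPE, "Person"),
    ("B1", "Name", Lit("Tom")),
]


def subject_candidates(triples):
    """Return (ST, SI) for the subject drop-down list."""
    # ST: every type that some subject is declared to have.
    st = {o for s, p, o in triples if p == TYPE}
    # SI: all subject identifiers plus all objects that are not literals.
    si = {s for s, p, o in triples} | {
        o for s, p, o in triples if not isinstance(o, Lit)
    }
    return st, si


ST, SI = subject_candidates(G)
print(sorted(ST))
print(sorted(SI))
```

Taken literally, "all object identifiers" also includes type names such as Article, so they appear in SI in this sketch; an editor might choose to hide them.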

Select A Property
Depending on the subject chosen in step 1, a list of the possible properties for this subject is generated (Figure 6.B). There are four possibilities: (i) If S ∈ ST, such as Article, the list is the set of all properties that the instances of this subject-type have (e.g., Title, Author, Year). (ii) If S ∈ SI, such as A1, the list is the set of all properties that this particular instance has. (iii) If the subject is a variable (S ∈ V), the list is the set of all properties in the dataset. (iv) Users can also choose the property to be a variable by introducing their own label.
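
Cases (i) to (iii) can be sketched as follows in Python; case (iv) needs no list because the user types the label. The sample triples, the `kind` argument, and the function name are hypothetical.

```python
TYPE = "rdf:type"  # assumed name of the typing property

# Hypothetical dataset as (subject, property, object) triples.
G = [
    ("A1", TYPE, "Article"),
    ("A1", "Title", "Web 2.0"),
    ("A1", "Author", "B1"),
    ("A1", "Year", "2007"),
    ("A2", TYPE, "Article"),
    ("A2", "Title", "Data Web"),
    ("B1", TYPE, "Person"),
    ("B1", "Name", "Tom"),
]


def property_candidates(triples, subject, kind):
    """List the properties to offer for a subject.

    kind is "type" (S in ST), "instance" (S in SI), or "variable" (S in V).
    """
    if kind == "type":
        # (i) properties of every instance of the chosen subject-type.
        instances = {s for s, p, o in triples if p == TYPE and o == subject}
        return {p for s, p, o in triples if s in instances}
    if kind == "instance":
        # (ii) properties of this particular instance only.
        return {p for s, p, o in triples if s == subject}
    # (iii) a variable subject can have any property in the dataset.
    return {p for s, p, o in triples}


by_type = property_candidates(G, "Article", "type")
by_instance = property_candidates(G, "A2", "instance")
by_variable = property_candidates(G, "Anything", "variable")
print(sorted(by_type), sorted(by_instance), sorted(by_variable))
```

In this sketch `rdf:type` is treated like any other property, so it shows up in the lists. Note that every call scans the whole dataset, which is the cost the Graph Signature index is meant to avoid.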
Add An Object Filter
There are three types of filters the user can use to restrict the property P: a filtering function, an object identifier, or a query path. (i) A filtering function can be selected from a list. (ii) If a user wants to add an object identifier as a filter, a list of the possible objects is generated. For example, if a user previously chose Any Article as the subject and Author as the property, the list contains the identifiers of the objects that appear as Author values of articles.
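
A minimal Python sketch of the first two filter types is given below. The formal definition of the object list is not reproduced in this text, so the logic here is an assumption based on the Article/Author example; the sample triples, the `Lit` marker class, and the filter function names are hypothetical.

```python
class Lit(str):
    """Marks an object as a literal value rather than an identifier."""


TYPE = "rdf:type"  # assumed name of the typing property

# Hypothetical dataset as (subject, property, object) triples.
G = [
    ("A1", TYPE, "Article"),
    ("A1", "Author", "B1"),
    ("A2", TYPE, "Article"),
    ("A2", "Author", "B2"),
    ("A2", "Author", Lit("Anonymous")),
    ("N1", TYPE, "Note"),
    ("N1", "Author", "B3"),
]

# (i) A small, illustrative list of filtering functions to pick from.
FILTERS = {
    "equals": lambda value, arg: value == arg,
    "contains": lambda value, arg: arg in value,
}


def object_candidates(triples, subject_type, prop):
    """(ii) Object identifiers reachable via prop from instances of a type."""
    instances = {s for s, p, o in triples if p == TYPE and o == subject_type}
    return {
        o
        for s, p, o in triples
        if s in instances and p == prop and not isinstance(o, Lit)
    }


authors = object_candidates(G, "Article", "Author")
print(sorted(authors))
print(FILTERS["contains"]("Data Web", "Web"))
```

Only B1 and B2 are offered: B3 is the author of a Note, not an Article, and the literal value is not an identifier.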

Query Language
This module covers the notational system and constructs that make MashQL an expressive yet intuitive query language, supporting all constructs of SPARQL.
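
To give a feel for how a formulated query relates to SPARQL, the sketch below turns a small subject/property/filter structure into a SPARQL string. The input structure, the `http://example.org/` namespace, and the function name are hypothetical, and the actual MashQL-to-SPARQL mapping is not specified in this text.

```python
import json


def to_sparql(subject_type, restrictions):
    """Build a SPARQL SELECT query from a subject type and restrictions.

    restrictions is a list of (property, operator, value) tuples; an
    operator of None means the property is returned without a filter.
    """
    variables = []
    body = [f"  ?s a :{subject_type} ."]
    for prop, op, value in restrictions:
        var = f"?{prop.lower()}"
        variables.append(var)
        body.append(f"  ?s :{prop} {var} .")
        if op is not None:
            # json.dumps renders numbers bare and strings in double quotes.
            body.append(f"  FILTER ({var} {op} {json.dumps(value)})")
    lines = ["PREFIX : <http://example.org/>"]
    lines.append(f"SELECT {' '.join(variables)} WHERE {{")
    lines.extend(body)
    lines.append("}")
    return "\n".join(lines)


query = to_sparql("Article", [("Title", None, None), ("Year", ">", 2000)])
print(query)
```

The user only picks Article, Title, and Year with a filter; the variable names and triple patterns are generated for them.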

Graph Signature Index
Because the data is assumed to be schema-free, the previous algorithm has to query the whole dataset in real time, which can be a performance bottleneck because such queries may involve many self-joins. Hence, the interactivity of MashQL might be unacceptable. Thus, we propose a new way of indexing RDF, which we call the Graph Signature. The size of a Graph Signature is typically much smaller than the original graph, yielding fast response times.
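
The Graph Signature construction itself is not defined in this text. As a stand-in, the Python sketch below builds a much simpler type-based summary graph, only to illustrate why answering property-list lookups from a small summary is cheaper than scanning the full data. The dataset, the grouping rule, and all names are assumptions.

```python
TYPE = "rdf:type"  # assumed name of the typing property

# Hypothetical dataset: 100 articles and 10 persons.
G = []
for i in range(100):
    G.append((f"A{i}", TYPE, "Article"))
    G.append((f"A{i}", "Title", f"title {i}"))
    G.append((f"A{i}", "Author", f"P{i % 10}"))
for j in range(10):
    G.append((f"P{j}", TYPE, "Person"))
    G.append((f"P{j}", "Name", f"name {j}"))


def summarize(triples):
    """Collapse nodes into one group per type (simplified stand-in index).

    Assumes each node has at most one type; untyped nodes and literals
    fall into the group "Other".
    """
    node_type = {s: o for s, p, o in triples if p == TYPE}

    def group(node):
        return node_type.get(node, "Other")

    return {(group(s), p, group(o)) for s, p, o in triples if p != TYPE}


def properties_of(summary, subject_type):
    """Answer a property-list lookup from the summary instead of the data."""
    return {p for s, p, o in summary if s == subject_type}


summary = summarize(G)
article_properties = properties_of(summary, "Article")
print(len(G), len(summary), sorted(article_properties))
```

Here 320 triples collapse into 3 summary edges, and the lookup for Article touches only those 3 edges.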

Implementation and Evaluation
We present two implementations of MashQL: a server-side mashup editor and a Firefox add-on. We evaluate the response time of MashQL on two large datasets, DBLP and DBPedia, and compare it with Oracle’s Semantic Technology. We show that queries can be answered instantly, regardless of the data size.



FLOW DIAGRAM

[Figure: flow diagram]

CONCLUSIONS AND FUTURE WORK

We proposed a query formulation language, called MashQL. We have specified four assumptions that a Data Web query language should satisfy, and shown how MashQL implements all of them. The language-design and performance complexities of MashQL are fundamentally tackled. We have designed and formally specified the syntax and the semantics of MashQL as a language, not merely a single-purpose interface. We have also specified the query formulation algorithm, by which the complexity of understanding a data source (even if it is schema-free) is moved to the query editor. We addressed the challenge of achieving interactive performance during query formulation by introducing a new approach for indexing RDF data. We presented two different implementation scenarios of MashQL and evaluated our implementation on two large datasets.


