Tuesday, March 2, 2021

An idea for Search Engines

An idea for search engines: ‘Splits’ and ‘Wh-words at the boundaries’.



Let me first introduce the concept of ‘pauses’ within sentences. Let’s call them splits.


E.g. - 

Put the book on the table.

The splits here are - 

Put the book | on the table. A split marks the point where a speaker pauses slightly while saying the sentence.


Another example - 

The doctors called the operation team on the next day.

The splits here are - 

The doctors called | the operation team | on the next day.



A more rigorous and technical definition of a split - 


There is a split between two consecutive words A and B ('.... A | B ....') in a sentence if moving the splits to '.... | A B | ....' makes the sentence sound “odd” (or breaks the standard rules of pronunciation). Moving the splits this way means pausing just before A and just after B, saying A and B in quick succession, and likewise saying any other words that have no split between them in quick succession.


So, consider the first example above - Put the book | on the table.

The adjusted splits here would be - 

Put the | book on | the table.

Now try to pronounce this with pauses at these splits, saying the words that have no split between them in quick succession, so that ‘book’ and ‘on’ also run together. That leads to a pronunciation like: Put-the…...book-on…...the-table. That’s clearly odd.
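The "adjusted splits" test above is mechanical enough to sketch in code. This is only a rough illustration: the function names (parse_chunks, adjust_split, render, spoken) are made up for this post, and the code only moves the splits around. Deciding whether the result actually sounds odd is still left to a human ear.

```python
def parse_chunks(marked):
    """Turn 'Put the book | on the table.' into a list of word lists."""
    return [part.split() for part in marked.strip().rstrip(".").split("|")]


def adjust_split(chunks, i):
    """Move the split between chunks[i] and chunks[i + 1] so that the two
    words on either side of it (A and B) form a chunk of their own:
    '... A | B ...' becomes '... | A B | ...'."""
    left, right = chunks[i], chunks[i + 1]
    a, b = left[-1], right[0]
    new = chunks[:i] + [left[:-1], [a, b], right[1:]] + chunks[i + 2:]
    return [c for c in new if c]  # drop chunks that became empty


def render(chunks):
    """Write the chunks back in the 'x | y | z' notation used in this post."""
    return " | ".join(" ".join(c) for c in chunks)


def spoken(chunks):
    """Hyphens for words run together, dots for the pauses."""
    return " ... ".join("-".join(c) for c in chunks)


adjusted = adjust_split(parse_chunks("Put the book | on the table."), 0)
print(render(adjusted))  # Put the | book on | the table
print(spoken(adjusted))  # Put-the ... book-on ... the-table
```

A program could generate the adjusted form for every candidate split this way, but it would still need some model of pronunciation to judge the "odd" part.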



Now that we have defined the splits, here is a crucial property of the splits - 

Every chunk produced by the splits answers a wh-question asked of the chunk immediately before it.


Illustration - 

Put the book |where| on the table.

The doctors called |whom| the operation team |when| on the next day.
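To get a feel for what labelling the boundaries could look like in code, here is a toy sketch in Python. Everything in it is an assumption for illustration: the tiny word lists (PREPOSITIONS, TIME_WORDS, PERSON_WORDS) and the rule in guess_wh are hand-picked so that the two examples above work, and they would not generalise to real text without proper linguistic tooling. The splits themselves are taken as given.

```python
# Hand-picked toy lexicons (assumptions, far from complete).
PREPOSITIONS = {"on", "in", "at", "to", "into", "under", "near", "from"}
TIME_WORDS = {"day", "week", "month", "year", "summer", "winter",
              "morning", "evening", "tomorrow", "yesterday"}
PERSON_WORDS = {"team", "doctor", "doctors", "players", "teacher"}


def guess_wh(chunk):
    """Guess the wh-word that the given chunk answers (toy rule)."""
    words = [w.strip(".,").lower() for w in chunk.split()]
    if words and words[0] in PREPOSITIONS:
        # A prepositional chunk is read as time if it mentions a time word,
        # otherwise as place.
        return "when" if any(w in TIME_WORDS for w in words) else "where"
    return "whom" if any(w in PERSON_WORDS for w in words) else "what"


def annotate(marked):
    """Replace each '|' in a split sentence with '|wh-word|'."""
    chunks = [c.strip() for c in marked.strip().rstrip(".").split("|")]
    out = chunks[0]
    for chunk in chunks[1:]:
        out += f" |{guess_wh(chunk)}| {chunk}"
    return out


print(annotate("Put the book | on the table."))
# Put the book |where| on the table
print(annotate("The doctors called | the operation team | on the next day."))
# The doctors called |whom| the operation team |when| on the next day
```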


This can be exploited in search engines. The answer to a query lies in a chunk of a sentence (on a web page) if the chunk before it contains the “main” part of the query and the wh-word of the query is the wh-word at the split between the two chunks.



Illustration  -


Query : Where do flamingos migrate in summer?


Now, just the fact that all the keywords are present in a sentence doesn’t mean that that sentence is answering the query. 


Suppose there is a sentence on the web - In summer, flamingos migrate to the wetlands.

This splits as - In summer | flamingos migrate | to the wetlands.

Look at the second split. The wh-word that lies at it is ‘WHERE’, which is the wh-word of the query. And the “main” part of the query, ‘flamingos migrate in summer’, is what precedes the chunk on the right-hand side of the second split. Since both conditions are met, the chunk to the right of the second split, ‘to the wetlands’, is the answer to the query.
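The two conditions can be written down as a small check. The sketch below assumes the splits and the wh-words at the boundaries are already available (here they are typed in by hand), and it makes some simplifying assumptions of its own: the first word of the query is taken as its wh-word, a small hand-picked STOPWORDS set is dropped to get the "main" part, and the main part is looked for in everything that precedes the candidate chunk. The name find_answer is made up for this post.

```python
# Small hand-picked stopword list (an assumption, not a standard one).
STOPWORDS = {"do", "does", "did", "the", "a", "an", "in", "to", "is", "are"}


def words(text):
    """Lower-cased words with surrounding punctuation removed."""
    return [w.strip(".,?").lower() for w in text.split()]


def find_answer(query, chunks, boundary_wh):
    """Return the chunk that answers the query, or None.

    chunks      -- the sentence already cut at its splits
    boundary_wh -- boundary_wh[i] is the wh-word at the split between
                   chunks[i] and chunks[i + 1] (None if not labelled)
    """
    q = words(query)
    wh, main = q[0], {w for w in q[1:] if w not in STOPWORDS}
    for i, label in enumerate(boundary_wh):
        preceding = set(words(" ".join(chunks[: i + 1])))
        # Condition 1: the wh-word at the split is the query's wh-word.
        # Condition 2: the main part of the query precedes the split.
        if label == wh and main <= preceding:
            return chunks[i + 1]
    return None


chunks = ["In summer", "flamingos migrate", "to the wetlands"]
boundary_wh = [None, "where"]  # only the second split is labelled above
print(find_answer("Where do flamingos migrate in summer?", chunks, boundary_wh))
# to the wetlands
```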


The existing, prevalent keyword technique will also find this sentence, but consider another sentence, say: 'In summer, the Indian players migrate to Australian grounds, where wings of flamingos are seen dispersed'. The keyword technique would pick this sentence up, since it contains all the keywords, whereas the technique described in this write-up would not.
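The contrast between the two techniques can be shown side by side. Again this is only a sketch under assumptions: the splits and boundary wh-words of both sentences are assigned by hand (the 'which' label on the last split of the second sentence is my own guess), and keyword_match and split_match are hypothetical names, not anything a real search engine exposes.

```python
KEYWORDS = {"flamingos", "migrate", "summer"}  # "main" part of the query


def tokens(text):
    """Set of lower-cased words with surrounding punctuation removed."""
    return {w.strip(".,?'").lower() for w in text.split()}


def keyword_match(sentence):
    """Keyword technique: every keyword appears somewhere in the sentence."""
    return KEYWORDS <= tokens(sentence)


def split_match(chunks, boundary_wh, wh="where"):
    """Split technique: some split carries the query's wh-word and the
    keywords all appear before that split."""
    return any(
        label == wh and KEYWORDS <= tokens(" ".join(chunks[: i + 1]))
        for i, label in enumerate(boundary_wh)
    )


good = ["In summer", "flamingos migrate", "to the wetlands"]
good_wh = [None, "where"]

bad = ["In summer", "the Indian players migrate", "to Australian grounds",
       "where wings of flamingos are seen dispersed"]
bad_wh = [None, "where", "which"]  # hand-assigned labels

print(keyword_match(" ".join(good)), split_match(good, good_wh))  # True True
print(keyword_match(" ".join(bad)), split_match(bad, bad_wh))     # True False
```

In the second sentence 'flamingos' only shows up after the 'where' split, so the keywords never all sit in front of it, and the sentence is rejected.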


                                               ********************************


So, the help needed is as follows - 

IS IT POSSIBLE TO PROGRAM THESE SPLITS AND THE WH-QUESTIONS AT THE BOUNDARIES (BASICALLY WHAT IS DESCRIBED IN THE TECHNIQUE)?

