Sunday, December 27, 2020

A way to represent words

This is a way to represent words in language, on a computer. 



Take a number, say 100. I can do logical / computational operations on it: halve it, take its factorial, take its square root, etc. But take a word, say, 'boy'. Can I do operations on it? Can I take (boy)! or (boy)^(1/2) or (boy)/2 and get various properties of, or information about, a boy? No. Because a word is NOT LOGICALLY / COMPUTATIONALLY OPERABLE UPON.

-------------------
Inspiration : If just the word "radio" is written in a sentence, it tells me nothing about a radio. I cannot do imaginary things like, say, 'strip off the first and last characters' and get the material the radio is made of, OR take the central letter, follow it up with certain other letters, and get the shape of the radio, etc.

This write-up is also inspired by QR-codes, or bar-codes on books.

------------------
Suppose the following were the codes - 

10 = male
00 = female

age range : 
0-5 = 1
6-15 = 2
16-25 = 3
26-30 = 4
.....

A 'boy' would be = (10)(2) 
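
As a rough sketch of how such a code could be built on a machine (the names GENDER_CODES, AGE_BANDS, age_code and encode_person are my own illustrative choices, and I am assuming that each age band is closed at its upper end, so an age of exactly 5 falls in band 1):

```python
# Gender codes exactly as listed above.
GENDER_CODES = {"male": "10", "female": "00"}

# (upper bound of the band, code). Only the bands listed so far;
# the "....." in the list is left unfilled.
AGE_BANDS = [(5, 1), (15, 2), (25, 3), (30, 4)]


def age_code(age):
    """Return the code of the first band whose upper bound covers the age."""
    if age < 0:
        raise ValueError("age cannot be negative")
    for upper, code in AGE_BANDS:
        if age <= upper:
            return code
    raise ValueError("age is beyond the bands listed so far")


def encode_person(gender, age):
    """Build the (gender)(age range) code for a person."""
    return (GENDER_CODES[gender], age_code(age))


boy = encode_person("male", 10)
print(boy)  # ('10', 2)
```

The gender code is kept as a string here only so that '00' does not collapse into the number 0.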

So decide on some root words (like "co-ordinates") with which to represent all the other words in the language, and express every word in terms of those root-word-numbers (the values of the root-words). This makes the words operable upon, so that operations on them reveal SEMANTIC INFORMATION about the words. Every word will have a numerical code.
All dictionary words would have to be converted to this code. The whole dictionary would have to be re-organised in terms of these root-codes.
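
A small sketch of what such a re-organised dictionary might look like, using only the gender and age-range codes from above. The entry for 'girl' is my own assumption by analogy with 'boy', and LEXICON, words_with and differing_positions are hypothetical names for illustration:

```python
# Each word maps to its code: (gender code, age-range code).
LEXICON = {
    "boy": ("10", 2),   # from the example above
    "girl": ("00", 2),  # assumed by analogy: female, same age range
}


def words_with(position, value):
    """All words whose code has the given value at the given position."""
    return sorted(w for w, code in LEXICON.items() if code[position] == value)


def differing_positions(word_a, word_b):
    """Positions (counting from 0) at which two words' codes differ."""
    pairs = zip(LEXICON[word_a], LEXICON[word_b])
    return [i for i, (a, b) in enumerate(pairs) if a != b]


print(words_with(1, 2))                    # ['boy', 'girl']
print(differing_positions("boy", "girl"))  # [0]
```

So a question like "which words share an age range" or "how do 'boy' and 'girl' differ" becomes a comparison of code elements rather than something to be read out of English text.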

Consider this example - 
For an instrument, the root words (first row) and their respective instance codes (second row) are as below : 

living/non-living  -  shape    -  made of       -  requires     -  takes in / gives out  -  ..........  -  ............
100 / -100         -  50(box)  -  500(plastic)  -  1000(power)  -  -2 (gives out)

Legend related to the above instance : 

1) If living thing, then 100, if non-living, then (-100)

2) If 50, it's a box; if, say, 60, it's a sphere, etc.

3) If 500, it's plastic; if 600, it's metal, etc.

4) 1000 - power
1001 - chemical energy
1002 - light energy
1003 - mechanical energy
....

5) +2 - takes in something
-2 - gives out something

So, a radio = (-100)(50)(500)(1000)(-2) in the format '(l/nl)(shape)(made-of)(requires)(takes in/ gives out)', i.e. a non-living box made of plastic, which requires power and gives out something.
So, quite simply, counting the bracketed elements from 1,
         if substring[2] == 50 :
         then (its shape is a box)
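
The same check can be sketched in runnable form. This is only an illustration under my own assumptions: the names WordCode, SHAPES, MATERIALS, REQUIRES, FLOW and describe are hypothetical, the tables hold only the values given in the legend, and Python counts positions from 0, so the 2nd element (the shape) sits at index 1.

```python
from typing import NamedTuple


class WordCode(NamedTuple):
    living: int    # 100 = living, -100 = non-living
    shape: int     # 50 = box, 60 = sphere
    made_of: int   # 500 = plastic, 600 = metal
    requires: int  # 1000 = power, 1001 = chemical energy, ...
    flow: int      # +2 = takes in something, -2 = gives out something


# Legend tables, restricted to the values listed in the legend.
SHAPES = {50: "box", 60: "sphere"}
MATERIALS = {500: "plastic", 600: "metal"}
REQUIRES = {
    1000: "power",
    1001: "chemical energy",
    1002: "light energy",
    1003: "mechanical energy",
}
FLOW = {2: "takes in something", -2: "gives out something"}

radio = WordCode(-100, 50, 500, 1000, -2)


def describe(code):
    """Turn a numerical word back into a plain statement of its properties."""
    kind = "living" if code.living == 100 else "non-living"
    return (
        f"{kind} {SHAPES[code.shape]} made of {MATERIALS[code.made_of]}, "
        f"requires {REQUIRES[code.requires]}, {FLOW[code.flow]}"
    )


if radio[1] == 50:  # 2nd element of the code, i.e. the shape
    print("its shape is a box")

print(describe(radio))
# non-living box made of plastic, requires power, gives out something
```

Naming the positions (radio.shape instead of radio[1]) is just a convenience; the information is still carried by the position of each number in the code.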

Now, logical operations on this numerical word will give me semantic properties of the word, and such operations can be used extensively, at other places throughout the program, to generate various kinds of statements.

------------

The above description is vague and there would obviously be better choices of the root-words, and numerous operations one can think of on the code-words. The point is to convey the essence of the idea. 
-----
When the word 'radio' is entered into the machine, the machine knows automatically that the 3rd element of the numerical word, whatever its value, tells what the entity is made of, the 2nd element tells its shape, etc. It doesn't have to 'process and understand computationally' an additional line in English, which would otherwise have to be added, saying that a radio is made of so and so (or an additional line saying that a radio is shaped so and so).

Effectively, it is like naming entities with their properties somehow included in their very names.

