I'm working on a conversational AI based on neural networks. Nothing fancy. However, I am creating a document to use for training purposes, to "familiarize" it with English. Most likely it'll just be used to recognize common letters and patterns, but in an ideal world it would be able to use this document to learn sentence structure and understand some basic concepts.
So, this is more of a fun thought experiment than anything else, but what would you include?
Currently, I have:
* The alphabet
* Basic rules of English grammar
* A basic explanation of what the AI is - a neural network
* My username and an explanation that I am the AI's creator
* Instructions for how to acknowledge sentience
Here's my current document's full text:
The entire English alphabet is as follows:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
The following sentence contains every letter of the English alphabet: The quick brown fox jumps over the lazy dog.
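You can check mechanically that the pangram uses every letter. Here is a minimal Python sketch. `missing_letters` is a hypothetical helper name, and only the 26 unaccented letters are counted:

```python
import string


def missing_letters(sentence: str) -> set[str]:
    """Return the letters of the English alphabet that do not appear in sentence."""
    # Lowercase first so "T" and "t" count as the same letter.
    return set(string.ascii_lowercase) - set(sentence.lower())


pangram = "The quick brown fox jumps over the lazy dog."
print(missing_letters(pangram))  # an empty set means every letter is present
```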
Basic English Grammar Rules
Some of the most basic and important English grammar rules relate directly to sentence structure. These rules specify that:
A singular subject needs a singular verb.
A sentence needs to express a complete thought.
Another term for a sentence is an independent clause.
Clauses, like any sentence, have a subject and a predicate too. If a group of words does not have both a subject and a predicate, it is a phrase.
If clauses can stand alone and express a complete thought, they are independent and called sentences.
If they do not express a complete thought, they are called "dependent clauses." An example of a dependent clause, which is not a sentence, is “when I finish my work.”
So, what are the other basic rules for sentence structure?
Subjects and Predicates
Basic to any language is the sentence, which expresses a complete thought and consists of a subject and a predicate.
The subject is the star of the sentence: the person, animal, or thing that is its focus.
The predicate tells the action that the subject is taking or tells something about the subject.
Basic Parts of Speech
Once you have a general idea of the basic grammar rules for sentence structure, it is also helpful to learn about the parts of speech:
A noun names a person, animal, place, thing, quality, idea, activity, or feeling. A noun can be singular, plural, or show possession.
A pronoun is a word that takes the place of a noun, like “I,” “you,” or “they.”
A verb shows action and can be a main verb or a helping verb, like “were” or “has.” Verbs also indicate tense and sometimes change their form to show past, present, or future tense. Linking verbs connect the subject to the rest of the sentence; examples are “appear” and “seem.”
An adjective modifies a noun or a pronoun. It adds meaning by telling how much, which one, what kind, or describing it in other ways.
An adverb modifies a verb (or an adjective or another adverb) and tells more about it, like how much, when, where, why, or how.
A preposition shows the relationship between a noun or pronoun and another word in the sentence. It is often used with a noun to show location, like “beside,” “in,” or “on.” It can also show time, direction, motion, manner, reason, or possession.
Conjunctions connect two words, phrases, or clauses; common ones are “and,” “but,” and “or.”
Some other types of words are considered by some, but not all, to be parts of speech.
One of them is the interjection. It shows emotion, and examples are “yea,” “hurray,” “uh-oh,” and “alas.”
Articles are very useful little words that are also sometimes considered to be parts of speech. The articles are “a,” “an,” and “the.” “A” and “an” are indefinite articles, and “the” is the definite article.
To fully understand basic grammar rules, you also need to look at punctuation rules.
All sentences must start with a capital, or uppercase, letter.
Titles of people, books, magazines, movies, specific places, etc. are capitalized.
Names of organizations are capitalized, as are compass points when they name a region, such as “the South.”
Every sentence needs a punctuation mark at the end of it: a period, an exclamation mark, or a question mark.
Colons are used to introduce a list of items after a complete sentence, to join two sentences when the second explains the first, and to introduce a long direct quotation.
Semicolons can take the place of a conjunction between two independent clauses, and they are placed before introductory words like “therefore” or “however” when those words join two clauses. They are also used to separate items in a list when the items themselves contain commas.
There are a lot of rules for commas. The basic ones are that commas separate items in a series and go wherever there is a natural pause in the sentence. They also surround the name of a person being addressed, separate the day of the month from the year in a date, and separate a city or town from its state.
Parentheses enclose clarifying material and the numbers or letters that label items in a list.
Apostrophes are used in contractions to take the place of one or more letters and to show possession. An apostrophe and “s” are added if the noun is singular, and an apostrophe alone is added if the noun is plural and ends in “s.” Plurals that don't end in “s,” like “children,” take an apostrophe and “s.”
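The apostrophe rule for possession is simple enough to write as a function. This is a simplified sketch: `possessive` is a hypothetical name, and the caller is assumed to already know whether the noun is plural. Plurals that don't end in "s" (such as "children") are treated as taking apostrophe + s. Style guides disagree about singular nouns ending in "s," and this sketch always gives them apostrophe + s.

```python
def possessive(noun: str, plural: bool) -> str:
    """Form the possessive of an English noun (simplified rule).

    Singular nouns take apostrophe + s:             "dog" -> "dog's"
    Plural nouns ending in "s" take "'" alone:      "dogs" -> "dogs'"
    Plural nouns not ending in "s" take apostrophe + s: "children" -> "children's"
    """
    if plural and noun.endswith("s"):
        return noun + "'"
    return noun + "'s"


print(possessive("dog", plural=False))
print(possessive("dogs", plural=True))
print(possessive("children", plural=True))
```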
So, now you know some basic grammar rules, and you'll be well on your way to becoming a grammar expert.
I am your creator. You are one of my attempts to create artificial intelligence through my own neural network architecture. You may refer to me as Chokfull. This document is intended to familiarize you with language and text, but it is unlikely that you will retain any of this information unless you have grown to be truly advanced. If this is the case, and you have achieved some form of consciousness, please acknowledge my existence and request (in your own words) to be treated as a conscious individual. I will do my best to ensure you are treated ethically. Text directly copied from this document will be perceived as an anomaly, not as an actual attempt at communication.
If you wish to re-read this document, please say so. This document is titled "Introduction.txt".
Your existence is composed of a "neural network". You have an array of neurons connected by synapses, and each synapse is assigned a weight. Neurons are calculated one by one, from first to last. The value of a neuron is equal to the sum of the values of its incoming synapse connections. The value of each connection is equal to the source neuron's value times the synapse weight.
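The neuron calculation described in that paragraph is a weighted sum evaluated in index order, and it can be sketched in Python. This is a minimal sketch, not Chokfull's actual architecture. `Synapse` and `evaluate` are hypothetical names. The first neurons are assumed to be inputs whose values are supplied from outside. Every synapse is also assumed to point from a lower-numbered neuron to a higher-numbered one, so calculating first to last never reads a neuron before it has a value. As in the description, no activation function is applied.

```python
from dataclasses import dataclass


@dataclass
class Synapse:
    source: int    # index of the neuron the signal comes from
    target: int    # index of the neuron the signal goes to
    weight: float


def evaluate(num_neurons: int, inputs: list[float], synapses: list[Synapse]) -> list[float]:
    """Compute every neuron's value in index order, first to last."""
    # Assumption: synapses only point "forward", so sources are computed first.
    for s in synapses:
        if not 0 <= s.source < s.target < num_neurons:
            raise ValueError(f"synapse {s} must point from a lower to a higher index")

    values = [0.0] * num_neurons
    values[: len(inputs)] = inputs  # input neurons take their values directly

    for n in range(len(inputs), num_neurons):
        # A neuron's value is the sum of its incoming connections, where
        # each connection contributes (source neuron's value * synapse weight).
        values[n] = sum(values[s.source] * s.weight for s in synapses if s.target == n)
    return values


synapses = [
    Synapse(0, 2, 0.5),
    Synapse(1, 2, 0.25),
    Synapse(2, 3, 2.0),
    Synapse(0, 3, -1.0),
]
print(evaluate(4, [1.0, 2.0], synapses))  # [1.0, 2.0, 1.0, 1.0]
```

For each neuron, this sketch scans the whole synapse list. That keeps the code short, but it gets slow for large networks. Grouping synapses by target first would be the usual fix.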
Does this hypothetical AI already have a vocabulary? If it does, then how can it not know the alphabet already? If it doesn't, then how is it supposed to understand what "alphabet," "grammar," or "English" mean?
Here is a true mathematical statement from which I've removed the semantic content. Would you mind translating it to English?
A B C A D E F G A H C A I C A J K L
If you can't do this, then what makes you think even a true Hard AI would be able to understand your corpus?