Are you the sort of person who looks at Elvish script and realizes with excitement that those gorgeous squiggles actually mean something? When watching Avatar, did you find yourself trying to work out certain Na’vi words? Have you ever wished you could make up your own language, whether as something to share with your friends when you were young, or else a deeper element to flesh out your personal universe? If so, you’re damn weird. But you’re also lucky, because this article for the layman is going to make your linguistic dreams a reality.
We’re going to talk about language. More specifically, we’re going to talk about what happens when we take language apart into its little recognizable pieces, because it’s from these pieces of real, natural languages that we build our constructed languages, or conlangs. Essentially, you’re going to get a crash-course in the different facets of language, and then a little lecture on what this means to someone who wants to create one.
Because language is art, in the same way that architecture is art. It’s a foundation for society, and a lot of people don’t think of it as art because it isn’t something that a lot of folks sit down in front of their computer with a keyboard and a tablet and do. But to those with the knowledge necessary to be an architect, there is no finer creativity, and even for those with an appreciation of architecture, there is something beyond the science. This article assumes that you are like this latter party: you know little about the science behind language, but whether you’re walking onto a bus with five different languages being spoken around you, or walking into a Star Trek convention surrounded by Klingon, you know language is awesome. We will expand this by assuming that you also want to be able to create a masterpiece yourself – which I will be more than happy to help you with.
I am a trained linguist myself, a member of the Language Creation Society, and I have created a few languages, including Fyelli (an isolating language with a complex system of determiners and a phonology made for a species with gigantic teeth and no lips), Mésylþo (an ongoing project in which I attempt to create the most beautiful possible language), Lutrin (a language for otter-people using an entirely different set of oral and nasal articulators from humans), and a reconstruction of Ta’agra (the language of the Khajiit from Bethesda Softworks’ The Elder Scrolls series). It is from this basis of experience that I write this article.
Which will officially begin now.
1 – PHONOLOGY – What does it sound like?
2 – ORTHOGRAPHY – How does it read?
3 – GRAMMAR – How does it work?
4 – SEMANTICS, PRAGMATICS, ETYMOLOGY, AND THE LEXICON – What does it mean, and how is it used?
The most basic aspect of a language is its phonology. This is not so for every language – a purely written language, or a language that communicates manually (ie a sign language) will not have a phonology. Unless, of course, your alphabet is phonetic (like the IPA), or your sign language uses units below the morphological (like some Native American sign languages). If you don’t understand these terms, don’t worry: that means they’re not important yet. We’ll touch on them at times later anyway.
For now, we will simply define phonology as the individual sounds in a language. For instance, the R, A and T in “rat” are each individual phonemes. Specifically, in American English, they are /ɹ/ /æ/ and /t/. The word “rat” thus has three phonemes, which are represented using symbols in the International Phonetic Alphabet, or IPA. In some accents, the word “caught” rhymes with “rot”, and you can see it in its IPA transcription: /kɒt/. There are many, many different phonemes that can be uttered by humans. Nearly every single one of them is used by a language somewhere in the world, and it is represented somewhere in the IPA as its own special symbol or combination of symbols. This article will not strive to teach you everything about the IPA: you can learn about it on its Wikipedia article. It will, however, use the IPA.
What you should know is that there are different classes of phonemes. The obvious is consonants and vowels. These are further separated: let me go over the basics really quick.
Consonants are separated mainly by three things: where they are articulated in the mouth, how they are articulated, and whether or not they are voiced - that is, whether your voicebox is doing any work while you’re saying it. All of these are easy to notice at least trivially once you know what to look (or rather, listen) for.
Place isn’t hard. Think of your mouth divided into sections, lengthwise. You’ve got your lips. You’ve got your teeth. You’ve got this ridge behind them (called the alveolar ridge, but you don’t really need to know that). Behind that is a flatter area called the palate, which leads to your velum, and after that you’re working your way down your throat to the glottis. In order, then: labial (lips), dental (teeth), alveo-dental, alveolar, alveo-palatal, palatal, velar, uvular (you know, the dangly thing in the back of your mouth), pharyngeal (like that hard Arabic H sound), and glottal.
Now, perhaps while you were reading that, you were moving your tongue to sort of feel what I was talking about. That’s good; that’s what you should be doing. Try this: Say “pa”. Say “ta”. Say “ka”. Then for bonus points, say “uh-oh”. At each of those words (and the - part in uh-oh), notice the consonant sound being produced successively further back in your mouth. That’s all it really is.
Now, you'll also notice you use different parts of your tongue to produce different things, or else your lips. Things around your lips are labials, all in their own category. Everything from your teeth to the alveolar ridge is coronal: it uses the front (crown, get it?) of your tongue, and then you can divide that by apical (the tip of your tongue) and laminal (the blade). If you can't tell here which it is, try breathing in rather than out while making the consonant and feeling the air on your tongue. Everything at the palate and behind is dorsal. This is handy to know because languages tend to treat these three categories (and two sub-categories) similarly to one another. So a language may not have any voiced dorsal consonants (because those are harder) but may be fine with voiced coronals and labials.
Manner of articulation isn’t hard either. You actually probably did this in elementary school. You’ve got what are called stops or plosives, which are what they sound like: you say them, you get a burst of air, and that’s your consonant. The pa-ta-ka thing back there, those are all plosives. If you’re not sure if something is a plosive, try “holding it down”. You can say “mmmmm” or “lllllllll” or “ssssss”, but you can’t say “kkkkk”. Unless you’re a Korean zerg player.
Other manners include:
Nasals – things like [n], [m], and [ŋ] (like in thing). As the name implies, the sound for these comes out of your nose rather than your mouth. Try plugging your nose and saying these things. Fun for the whole family.
Fricatives – these are pretty common. Think things like [s], [f], [θ] (like in thistle), and [ʃ] (like in shoe). Say it long and hard: if you can feel the air between your tongue and the place of articulation, that’s probably a fricative. Especially noisy ones, like [s] and [ʃ], are called sibilant fricatives.
Affricates – They're a plosive followed immediately by a fricative. Things like [tʃ] (as in cheese) are affricates. The two halves are normally made at or near the same place: [ts] and [pf] are other common affricates, whereas something like [ks] is usually treated as a plain cluster. Affricates can also have a lateral fricative at the end instead, e.g. [tɬ].
Approximants – These are those nice warm consonant noises that are almost like vowels. You won’t feel the air, because there isn’t really a whole lot of air blockage going on. In American English, these are [ɹ], [j], and [w] (R, Y, and W).
Laterals – These are approximants where the air goes around the sides of your tongue rather than over the middle. In English, this is [l] (L). There are other kinds of laterals, too, namely lateral fricatives, which are like fricatives with the air squeezed out along the side of your tongue.
Trills – Think like a rolled R ([r]): these are things where your tongue (or your lips – try it!) vibrates up and down rapidly as air moves over it.
Now, for voicing. What is a voiced consonant? Well, it’s a consonant that has your voicebox going when you say it. Place your finger on your voicebox. Say “ffffff”. Say “vvvvv”. These two phonemes are exactly the same, except one does not have your voicebox going, while the other very clearly does. Most unvoiced obstruents (plosives and fricatives) have voiced equivalents, and vice-versa. /p/-/b/, /t/-/d/, /k/-/g/, /s/-/z/, etc.
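If it helps to see place, manner, and voicing as plain data, here is a minimal sketch. The inventory is a tiny illustrative sample, the names `Consonant`, `INVENTORY`, and `voicing_partner` are made up for this example, and the place labels are deliberately coarse (lumping [f] and [v] in with the labials, as above).

```python
from typing import NamedTuple, Optional

class Consonant(NamedTuple):
    symbol: str
    place: str    # where it is articulated
    manner: str   # how it is articulated
    voiced: bool  # is the voicebox going?

# A small sample inventory, nowhere near the full IPA chart.
INVENTORY = [
    Consonant("p", "labial", "plosive", False),
    Consonant("b", "labial", "plosive", True),
    Consonant("t", "alveolar", "plosive", False),
    Consonant("d", "alveolar", "plosive", True),
    Consonant("k", "velar", "plosive", False),
    Consonant("g", "velar", "plosive", True),
    Consonant("f", "labial", "fricative", False),
    Consonant("v", "labial", "fricative", True),
    Consonant("s", "alveolar", "fricative", False),
    Consonant("z", "alveolar", "fricative", True),
    Consonant("m", "labial", "nasal", True),
    Consonant("n", "alveolar", "nasal", True),
]

def voicing_partner(symbol: str) -> Optional[str]:
    """Return the consonant that differs from `symbol` only in voicing, if any."""
    target = next((c for c in INVENTORY if c.symbol == symbol), None)
    if target is None:
        return None
    for c in INVENTORY:
        same_slot = (c.place, c.manner) == (target.place, target.manner)
        if same_slot and c.voiced != target.voiced:
            return c.symbol
    return None

print(voicing_partner("f"))  # v
print(voicing_partner("m"))  # None: this sample has no voiceless nasal
```

Laying your own inventory out like this makes the gaps easy to spot, for example a voiced plosive with no voiceless partner.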
I’ve skipped over a very interesting thing, a whole new dimension, and that is the airstream mechanism. Most languages, including English, only have what we call pulmonic egressives – that is, air is coming out of your lungs and noise is made in your nose and mouth. However, some have things like ejectives, which are glottalic egressives (there’s closure of the glottis – say “yup” while popping the P and that’s an ejective), pulmonic ingressives (like the whisky-glugging noise you hear in Bugs Bunny), and lingual ingressives, where there’s closure in the back and your tongue moves to create a low pressure in your mouth, so when it releases it pops! This is called a click, and you hear them in some African languages.
For a nice full list of all the consonants out there, and even sound samples to help you produce them yourself, there’s a Wikipedia article aptly called “Consonant”. With this tutorial and perhaps a little help from it, you can literally make any language sound in the world!
Now we understand consonants, a quick run-down on vowels. These are a lot more complex, which means I’m not even going to bother trying to really outline things and will simply say that vowels are high (like /i/ in “peach”) or low (like /ɒ/ in “rot”), front or back depending on where they are formed in the mouth, and rounded or unrounded (which, yeah, is whether or not your lips are protruding and round, like the letter and sound “O”). In some languages, like French, a differentiation is made between nasal vowels (where sound comes out your nose) and oral ones.
Another differentiation that's not a technical differentiation, but one that is useful in feature-based phonology, is tenseness. If you look at the vowel chart on Wikipedia, you'll see that there are some vowels that are closer to the middle than others: these are laxer, while the ones around the edge are tense. Actually, if you divide that chart into three columns and three rows, each vowel will be differentiable from one another depending on whether it's: low/mid/high, front/central/back, tense/lax, and round/unround.
For a vowel system to look natural, it should be fairly symmetrical – that is, there should be about as many back vowels as front vowels. Back vowels are normally rounded and front vowels unrounded, so an unrounded back vowel should usually have a rounded equivalent, and vice-versa for front vowels.
Diphthongs are something else to think about. I have a habit of overcrowding my languages with diphthongs; try not to do this. English has at least four: /aɪ/ like in "file", /eɪ/ like in "fail", /ɔɪ/ like in "foil", and /aʊ/ like in "foul". Notice that they tend to start at a mid or low level vowel, and end in a high, lax vowel. This isn't always the case: Finnish has some diphthongs, like /ie/ and /uo/, that start high and go lower. However, it's a good rule to follow that most languages treat diphthongs like English does, and there are far fewer diphthongs than monophthongs (single vowel types).
There’s one final thing you can do for fun with vowels, and that’s employ a pitch accent or tonality system. Pitch accent systems are ones in which there are either two or three set patterns of pitch movement as you say a word – so your voice goes up and down, like in Swedish – or there’s one point in a word where your voice goes up, kind of like the stress in English (but a different system). Tonality is where each and every syllable has a certain pitch or pitch pattern, like in Cantonese – they can be high, low, mid, mid-high, mid-low, falling, rising, rising more… tonality is fun. Hard, but fun. Or you can just be like English and have stress, or like Irish and have stress, but always in the same place. Or nothing at all.
Like consonants, vowels have their own handy Wikipedia article. Have a look. And with that, phonology categorization lesson over!
Phonotactics and Choosing your Phonemes
A language requires at least 12 or so distinct phonemes. What do we mean when we say “distinct”? We mean that a phoneme is able to differentiate, on its own, between two words. Another way of putting it is that it has one or more minimal pairs. Still confused? Let’s have a look at these two familiar English words:
These two words are exactly the same apart from a single phoneme. In one word, this phoneme is /k/; in the other, it is /g/. They are completely different words as well: we know that trying to wash your hands with a rake isn’t going to work out too well for you. Therefore, /k/ and /g/ must be distinct. However, in some languages, such as Finnish, this difference doesn’t exist!
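Hunting for minimal pairs is mechanical enough to sketch in code. The snippet below assumes a made-up mini-lexicon in which each word is stored as a tuple of phonemes rather than letters; the four words and the function name `minimal_pairs` are purely illustrative.

```python
from itertools import combinations

# Hypothetical mini-lexicon: each word is a tuple of phonemes (IPA), not letters.
LEXICON = {
    "back": ("b", "æ", "k"),
    "bag": ("b", "æ", "g"),
    "bat": ("b", "æ", "t"),
    "rat": ("ɹ", "æ", "t"),
}

def minimal_pairs(lexicon):
    """Return (word1, word2, phoneme1, phoneme2) for words differing in exactly one phoneme."""
    pairs = []
    for (w1, p1), (w2, p2) in combinations(lexicon.items(), 2):
        if len(p1) != len(p2):
            continue  # different lengths can't be a one-phoneme swap
        diffs = [(a, b) for a, b in zip(p1, p2) if a != b]
        if len(diffs) == 1:
            pairs.append((w1, w2, diffs[0][0], diffs[0][1]))
    return pairs

for w1, w2, a, b in minimal_pairs(LEXICON):
    print(f"{w1} / {w2}: /{a}/ vs /{b}/")
```

Any two phonemes that turn up in the output are distinct in that lexicon; run it over your own word list and phonemes that never show up are candidates for allophones.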
On the flip-side, consider two sounds that are technically different: [k] like in “back” and [kʰ] like in “cab”. Can’t tell the difference? Well, that’s because they are allophones in English: they’re the same sound as far as we care. If you really want to tell the difference, put your hand in front of your mouth and say the aforementioned two words: “cab” will have a blast of air right after the C, whereas “back” will not. Believe it or not, in some languages, these two sounds have minimal pairs: you might have a “cab” where there is a blast of air, that means one thing, and another “cab” with no blast of air that means something entirely different. And this is really neat for us, because we can make languages that have these phonological oddities!
And you can go wild, depending on how you want your language to seem, and how realistic you want it to be. Maybe your language lacks all unvoiced consonants. Maybe it doesn’t have any consonants behind the alveolar ridge. Maybe it has no rounded vowels. You can omit certain phonemes and add certain other ones willy-nilly, but this sort of blanket-statement (perhaps with a few exceptions) can really give your language a very specific sound when spoken. For instance, a language with no unvoiced consonants will sound very earthy, probably quite primitive, tribal, or violent. Try it: say a sentence, but replace all your unvoiced consonants with their voiced equivalent (s turns into z, k turns into g, etc.) Neat, huh?
Let’s talk a little about syllables now. For a long time, we thought a syllable always had a vowel in it of some sort. Even if it’s just the basic schwa (ə), which is a very lazy sound you make automatically in English when you want to say a syllable that has no other vowel. For instance, in “little”, it’s /lɪ.ɾəl/. Turns out, there are languages out there with syllables, and even whole words, that have no vowel at all: just monstrous consonant clusters that should rightly make them impossible to pronounce.
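The “voice everything” experiment is easy to try on paper too. This is a minimal sketch that assumes your text is already in IPA and only covers a handful of voiceless/voiced pairs; the table `VOICED` and the function name are invented for the example.

```python
# Map a few voiceless consonants to their voiced equivalents (not an exhaustive list).
VOICED = str.maketrans({
    "p": "b", "t": "d", "k": "g",
    "f": "v", "s": "z", "ʃ": "ʒ", "θ": "ð",
})

def voice_everything(ipa: str) -> str:
    """Replace every voiceless consonant in the table with its voiced partner."""
    return ipa.translate(VOICED)

print(voice_everything("kæts sɪt"))  # gædz zɪd
```

The same trick works for any blanket rule: swap the table for one that fronts every dorsal consonant, or unrounds every vowel, and see how the sample text changes.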
What phonemes you allow to be together and which you do not can be very important. For instance, in English, we never, ever have a consonant immediately following M at the beginning of a word. However, in Swahili, you have words like “mtoto”. On the other hand, in Japanese, you can never have a consonant at the end of the word… unless it’s N. And for you Japanese-speakers out there who talk about “silent” Us and stuff, well, those Us aren’t silent, they’re voiceless. So there. Anyway, this is yet another method by which you can customize your language and make it have the feel you want it to. Are huge consonant clusters allowed, like they are in German or Nuxalk? Are you able to have strings and strings of vowels? Or is every single syllable always just a consonant, a vowel, and a consonant? This is important to listening to your language and parsing words as well. You could take courses and courses in phonology to learn what kind of patterns are out there – for instance, /t/ likes to turn into [ts] before front vowels – but what I might actually recommend is getting a little alcohol in your system and trying to speak your language, and notice what’s easier to say, and what kinds of mistakes you make. Two years of phonetic training in one bottle!
Finally, although it’s not too important to most conlangers, you can think about syllable timing. Some languages, as discussed before, like English, use a stress system, so the stressed syllable(s) are a little longer. Some use mora timing, like Japanese, so each consonant+vowel, each extra-long vowel or consonant, and each consonantal coda (the last consonant in a syllable, like the T in “rat”) have a mora, which is a fixed unit of timing. This gives the language a nice flow.
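Once you have settled on a syllable shape, you can write it down as a pattern and let a program both check words and churn out new ones. The sketch below assumes an invented rule, (C)V(n): an optional onset consonant, one vowel, and an optional final n, loosely in the spirit of the Japanese restriction above. The inventory and the names `is_legal` and `random_word` are made up for illustration.

```python
import random
import re

CONSONANTS = "ptkmnsl"
VOWELS = "aeiou"

# Hypothetical phonotactic rule: every syllable is (C)V(n).
SYLLABLE = re.compile(f"[{CONSONANTS}]?[{VOWELS}]n?")
WORD = re.compile(f"(?:{SYLLABLE.pattern})+")

def is_legal(word: str) -> bool:
    """True if the whole word can be split into legal syllables."""
    return WORD.fullmatch(word) is not None

def random_word(syllables: int, rng: random.Random) -> str:
    """Build a word out of randomly chosen legal syllables."""
    parts = []
    for _ in range(syllables):
        onset = rng.choice(CONSONANTS) if rng.random() < 0.8 else ""
        coda = "n" if rng.random() < 0.2 else ""
        parts.append(onset + rng.choice(VOWELS) + coda)
    return "".join(parts)

rng = random.Random(0)  # fixed seed so the word list is repeatable
print([random_word(2, rng) for _ in range(5)])
print(is_legal("kanpai"), is_legal("mtoto"))  # True False
```

Tightening or loosening the pattern (allowing clusters, banning codas) changes the feel of the generated words right away, which is a cheap way to audition a rule before committing to it.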
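Counting morae by the rule just described can be sketched in a few lines. This assumes a simplified romanization in which long vowels and long consonants are written doubled and there are no consonant clusters in the onset; the function name `count_morae` is made up for the example.

```python
VOWELS = set("aeiou")

def count_morae(word: str) -> int:
    """Count morae: one per vowel slot, one per consonant not followed by a vowel."""
    morae = 0
    for i, ch in enumerate(word):
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch in VOWELS:
            morae += 1  # a consonant+vowel or bare vowel; a doubled vowel counts twice
        elif nxt not in VOWELS:
            morae += 1  # a coda, or the first half of a doubled consonant
    return morae

print(count_morae("katakana"))  # 4
print(count_morae("nippon"))    # 4: ni-p-po-n
print(count_morae("rat"))       # 2: ra-t
```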
Orthography is a language’s system of writing, although not all languages have one. And there are many different ways to do it. Let’s first look at different kinds of writing systems.
The first, and oldest, is logographs: basically, little drawings that represent words you’ve got. Egyptian hieroglyphs are in large part logographic, along with eventually becoming alphabetic. Logographs are very difficult for a conlanger to produce, and you can’t exactly type them, but they give an awful lot of very archaic-seeming culture to your speakers.
The second is ideograms. These are like Chinese characters: like logographs, each character is its own word, but the character doesn’t necessarily look like anything anymore. For instance, the character for “death” is 死, which probably has some historical reason but obviously doesn’t look like a thing anymore. Like logographs, these are a bit tedious to create, but probably less-so. Unlike logographs, though, they are hard to give diacritics or other things to indicate bound morphemes (like the plural –s in English), so languages that use them tend to either use other symbols (as Japanese does) or else, like Chinese, simply pooh-pooh morphology entirely.
The third are syllabaries. These usually comprise 30-50 or more characters that look alphabetic, but actually represent a consonant and a vowel together. Famous ones include Japanese kana and Cree syllabics. For example, the word for one of Japanese’s two syllabaries, katakana, written in katakana, is カタカナ: ka-ta-ka-na. Syllabaries are a nice happy medium and are actually probably the most common system of writing, because thinking in terms of syllables is actually much more natural than thinking in terms of individual sounds!
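A syllabary is really just a lookup table from characters to syllables, which is worth seeing spelled out if you plan to build one. This minimal sketch only includes the three katakana needed for the example word; the table and the function name `romanize` are illustrative.

```python
# Only the three katakana needed for this one word; a real table has dozens of entries.
KATAKANA = {"カ": "ka", "タ": "ta", "ナ": "na"}

def romanize(text: str) -> str:
    """Spell out each syllabary character as its consonant+vowel value."""
    return "-".join(KATAKANA[ch] for ch in text)

print(romanize("カタカナ"))  # ka-ta-ka-na
```

For your own syllabary, the size of this table is the number of glyphs you will have to draw, so it is worth filling in before you fall in love with a large phoneme inventory.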
Then we have abjads and abugidas. These are similar: an abjad is like Arabic, which normally only writes its consonants (unless a syllable is comprised of only a vowel). I’d like to give an example, but Semitic abjads confuse the hell out of me. Just assume that, whenever you have a consonantal onset (beginning of a syllable) and then a nuclear vowel, you only have one character: the one for the consonant. Then, if you’re feeling nice, you can write a little diacritic above or underneath saying what the missing vowel is. An abugida is the same thing, except mercifully you have to write the vowel diacritic. Things like Hindi and J.R.R. Tolkien’s Elvish languages use abugidas.
There are some other interesting writing systems out there too, like Korean hangul, which organizes each syllable into a neat little box made up of smaller symbols for each sound in that syllable. Very handy.
Finally, alphabets. You may think, “Well, duh, of course I’m using an alphabet, it’s what everyone does!” but alphabets are actually very rare in the world. There are dozens of syllabaries, several abjads, a fair few abugidas and a ton of logographs and ideographic systems that have been created naturally, for natural languages, in the natural anthropological world. The alphabet, on the other hand, has been invented once. It came from Egyptian hieroglyphs, made its way through now-extinct evolutions into Phoenician, then Greek, which influenced the Nordic rune system, the Roman alphabet (which we use to this day in English), and the Cyrillic alphabet used in Slavic languages like Russian. Let’s have a look.
The similarities may be hard to see at times, but this means that if you’re creating an alphabet, you can shamelessly take from others if it’s feasible that they had contact with your fictional culture.
I personally would recommend trying to stick with syllabaries and alphabets, because you can create fonts and keyboard layouts for these. I use High-Logic Font Creator and Microsoft Keyboard Layout Creator (the latter is free).
Grammar is, in my experience, the most controversial topic in linguistics. In any linguistics department, the whole staff will have an opinion on what grammar is. Is it some fundamental part of the human mind? Is it bunched up and learned in little constructs like LEGO bricks? Is syntax in everything and morphology is just a useless response to that? And so on and so forth. My training was in a specific school of thought and, being who I am, the school of thought I adhere to is the one that pits itself directly against the school of thought I was trained in, and this bias is doubtless going to come out in this article.
I will quickly define these two “divisions” of grammar: syntax is word order. It’s the thing that says “In English, you must put your subject before your verb, and your adjectives before your noun.” Morphology is things attached to a word: It says “If you want a past-tense verb, you add –ed. If you want a masculine pronoun to be the subject of a sentence, you must say ‘he’ and cannot say ‘him’ or ‘she’.”
First of all, I take deep personal offence to the notion that morphology is a response to syntax. I rather see syntax and morphology as two sides of a see-saw: if you have a ton of morphology, you need less syntax, and vice-versa. Much of my experience is with Native American languages, some of which, arguably, have no syntax at all: you can put any word anywhere, and any preference is just the natural preference of human beings to like subjects before objects. On the other hand, Chinese languages seem to have no morphology at all, and even your tense is a separate word from your verb.
Let me take those two extreme examples: in Blackfoot, a language spoken by natives of the northwestern plains of North America, you can have the sentence “Nitsikakomimmoka kitaniksi.” This means “Your daughters love me” and you can just as easily say “Kitaniksi nitsikakomimmoka”. This is because, while there are just two words, there are eight morphemes: Nit-ik-wakomimm-ok-a k-itan-iksi, that tell you definitively what the subject of the verb is, what role it’s playing (this is actually an inverse sentence, so a little like “I am loved by your daughters” except this is the normal way to say it), the fact that the object of the verb is an animate third person, and everything else that needs to be said in the sentence. Order means little to nothing at all in Blackfoot, as long as all the morphemes are there.
In Chinese, the same sentence would be 你的女儿爱我 (Mandarin: Nǐ de nǚ'ér ài wǒ). That’s six morphemes, each written as its own character and each (near enough) its own word: You, a genitive marker, female, child, love, me. You cannot put these in any other order, or your sentence will not make sense, because the role of each one is determined by its physical place in the sentence.
Obviously, in most languages, the importance of syntax vs morphology is not all-or-none. English mostly uses syntax, but morphology is there in tense, plurality, and in our pronouns. Japanese mostly uses morphology, but you still need to have the verb at the end of the sentence or your professor will slap you upside the head.
When talking about morphology, it’s important to distinguish between two ends of a spectrum: fusion and agglutination. Fusion is when you have a single morpheme – one that can’t be split up any more – that means a lot of things. An example of fusion in English is the word “his”, which contains the third person singular, the masculine gender, and the genitive case. Agglutination is when you add lots of morphemes that mean only one thing each. An example of agglutination is the suffix -ed on verbs: you add it to a verb to mean a single thing: it's in the past tense.
When thinking of syntax – word order – you need to consider that it tends to follow a certain pattern: to put it simply, most languages can be categorized as either head-initial or head-final. Head-initial languages, like French, tend to have their nouns before adjectives and verbs before objects, while head-final languages, like Japanese, have the opposite. This is, of course, not all-or-none: English has its object after its verb, but its adjectives before nouns.
Word order is also quite variable. SVO (subject-verb-object) and SOV are not the only variants. VSO is also very common, and there are even some languages that put the object before the subject (OSV, OVS, VOS), though these are less common. There’s also a parameter called the V2 parameter that says that, in whatever word order you choose, the finite verb must come second in the sentence. German does this, so you have sentences like “Ich habe ihn getötet” (I have him killed), where the main verb is last (German’s main word order is SOV) and the auxiliary verb “habe” fills the second position, but then “Ich töte ihn” (I kill him), where there is no auxiliary, so the main verb has to take the second position itself.
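The difference between the two strategies shows up clearly if you try to store them. In this rough sketch the agglutinating suffixes are invented (they belong to no real language), while the fused forms are ordinary English pronouns; the names `SUFFIXES`, `agglutinate`, and `FUSED` are just for illustration.

```python
# Agglutination: one morpheme per meaning, stacked in order. (Invented suffixes.)
SUFFIXES = {"plural": "ki", "genitive": "no", "past": "ta"}

def agglutinate(stem: str, *features: str) -> str:
    """Build a word by gluing one suffix per feature onto the stem."""
    return stem + "".join(SUFFIXES[f] for f in features)

# Fusion: one unsplittable form per bundle of meanings (real English pronouns).
FUSED = {
    ("3sg", "masc", "nom"): "he",
    ("3sg", "masc", "acc"): "him",
    ("3sg", "masc", "gen"): "his",
    ("3sg", "fem", "nom"): "she",
    ("3sg", "fem", "acc"): "her",
    ("3sg", "fem", "gen"): "her",
}

print(agglutinate("mala", "plural", "genitive"))  # malakino
print(FUSED[("3sg", "masc", "gen")])               # his
```

An agglutinating system needs a short list of suffixes and a rule for stacking them; a fusional one needs a separate entry for every combination, which is why fusional paradigms take so much longer to design.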
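Basic word order is simple enough to treat as a setting you can flip. This is a minimal sketch, assuming a sentence is nothing more than a subject, a verb, and an object; the function name `linearize` is made up, and it ignores V2 and everything else a real grammar would need.

```python
def linearize(subject: str, verb: str, obj: str, order: str = "SVO") -> str:
    """Arrange subject, verb, and object according to a three-letter order code."""
    if sorted(order) != ["O", "S", "V"]:
        raise ValueError(f"order must use S, V and O once each, got {order!r}")
    slots = {"S": subject, "V": verb, "O": obj}
    return " ".join(slots[ch] for ch in order)

for order in ("SVO", "SOV", "VSO", "OSV", "OVS", "VOS"):
    print(order, "->", linearize("the cat", "chased", "the dog", order))
```

Reading English words in each of the six orders is a quick way to get a feel for how alien (or not) a given choice will sound before you have any vocabulary of your own.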
For this little tutorial, I’m going to list the different pieces of grammar that English employs and how it does it. Then I’m going to talk about other kinds of grammar you’ve probably never heard of but might want to consider.
Case – Case tells you what role a noun plays in a sentence. Is it the performer of the verb (nominative)? Is it the object (accusative)? Is it the owner of something (genitive)? English has lost all of its case morphology in its nouns apart from the genitive, which is formed with –’s (e.g. the cat’s ball). The dative has been entirely lost from Old English, but you used to use it whenever you had a preposition (to, with, from, etc.). Now we just use the preposition with the accusative. The nominative and accusative are now gone in nouns; obviously, we just use word order: the cat (nom) chased the dog (acc), vs. the dog (nom) chased the cat (acc). The exception is pronouns, which still have them: he chased her vs. she chased him – “he” is nominative, while “him” is accusative. Other languages have other cases that replace prepositions (like locative, which replaces “in”), denote other parts (e.g. adverbial), or do totally crazy other things. The language with the most cases that I’m aware of is Finnish, which has fifteen!
Person – Person when used in the context of nouns means, who owns it. The first person (me), the second person (you), the third person (him, her, it). English just does it with genitive pronouns, but you have some languages, like Blackfoot, that put a morpheme right on the noun to say who owns it.
Gender – Some languages have grammatical gender: every noun has a gender that often has nothing to do with its actual gender. Sometimes this takes the form of animacy, where each noun has an animacy that, more or less, correlates with whether or not it’s alive. Still others have noun classes, which group them semantically (according to what they actually are): e.g. people, animals, plants, artifacts, ideas, women, etc. English just has natural gender on its pronouns, so it’s easy, but French has genders that, among a lot of other things, tell you what word you use for “the”: “the window” is “la fenêtre”, while “the drug” is “le médicament”.
Definiteness – Some languages, like English, have definite and indefinite nouns, denoted with “the” and “a” accordingly. Some, like Swedish, use suffixes for these (it uses –en and –et for common and neuter gender respectively, for the definite). Many more languages don’t use them at all.
Number – Many languages distinguish between singular and plural. English does this, usually, with the suffix –s (cat – cats). Still others even have a dual, or a trial number, so not only do you have one cat and many cats, but a certain suffix you use for two cats, or three cats, so you don’t actually have to have the word “two” or “three” in there.
Noun Agreement – Who’s performing the action? In English, we do have a single morpheme that expresses this: the suffix –s that you use when the third person is the subject (I write vs. she writes). Other languages have morphology that tells you all about the subject: its person, its gender, its number. Some languages, including Blackfoot, even have their verbs agree with the object, too, with yet another set of morphemes.
Tense – English has two tenses: past (-ed) and nonpast. We indicate the future tense with a separate word, not inflectional morphology. Many languages only have these two tenses, or else have future and nonfuture (e.g. Blackfoot). Others have all three (e.g. Irish): past, present, and future.
Aspect – Aspect is sometimes tough to tell from tense. Aspect is what state of completion a verb is in. Some languages only have one or the other. Others, like English, have both (though English has aspect as separate words: “have run” (perfect), “is/am/are running” (progressive), etc.). Still others, like Greenlandic, have neither, but they do have…
Mood – Mood is sort of the way in which you’re saying something. Is it just a statement (indicative)? Is it a question (interrogative)? Is it a command (imperative)? English has the indicative and the interrogative, which, being a fairly isolating language, it differentiates between using word order (I have thrown the ball vs. Have I thrown the ball?). English is also very bizarre in that it has what we call “do-support” in questions: Other languages might just say “Threw you the ball?” but in English, when you’re using a verb without a supporter for aspect (see above), you must put “do” in its place: “Did you throw the ball?”
Voice – Voice is, in most languages, the difference between active and passive voice. Usually, we speak in the active voice: I wrote the letter. Sometimes, though, we can switch to the passive voice: The letter was written, or the letter was written by me. As you can see, we do this with both separate words (“be”, inflected to “was” for past-tense) and the suffix –en.
This is kind of a basic run-through, but some languages do really bizarre things. For instance, in Cree, you have a fourth person that you use for every third person that isn’t the focus of a conversation. So in the sentence “That man’s son eats children”, “man” would be third person, and “son” and “children” would be fourth. It also has a person hierarchy that goes first, second, third, and the object is not allowed to be higher on the hierarchy than the subject. If the second person is the subject and the first person is the object, for instance, you have to add an inverse suffix to the verb to keep the hierarchy. It’s all very confusing, but it’s very neat and fun to get a hang of it, and even more so to use it in a conlang!
SEMANTICS, PRAGMATICS, ETYMOLOGY and THE LEXICON
This is where you start having a lot of fun – and only just now do we start to think about what most people think language creation is, and that is making up words. Now, you may just want to go through a word list, like the Swadesh List, and start just grinding out lexical items, but there are a few things you need to consider first.
The first thing is your morphology. For every word you put out, you have to think about how it’s inflected: if you add this morpheme onto it, will it be pronounceable? If not, should all words of that type have the same beginning or ending (ie all of Swedish’s verbs end in A), or should you have a phonological rule that intervenes to make it sound okay (ie in Blackfoot, the third person animate singular suffix -wa becomes –a after a consonant)? Obviously, this means that first of all you need to figure out your inflections before you start thinking about actual open class words.
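A rule like “-wa becomes -a after a consonant” is easy to write down as a function, which also lets you test it against every stem you invent. This is a minimal sketch: the stems are made up (they are not real Blackfoot words), the five-vowel set is an assumption, and the function name `add_wa` is hypothetical.

```python
VOWELS = set("aeiou")  # assumed vowel letters for this toy example

def add_wa(stem: str) -> str:
    """Attach the suffix -wa, dropping the w when the stem ends in a consonant."""
    if stem and stem[-1] not in VOWELS:
        return stem + "a"   # consonant-final stem: -wa surfaces as -a
    return stem + "wa"      # vowel-final stem: the suffix is unchanged

# Invented stems, purely for illustration.
print(add_wa("tako"))   # takowa
print(add_wa("takom"))  # takoma
```

Writing each inflection this way, one small function per affix, gives you something to run your whole lexicon through whenever you change a rule.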
The second is your language’s culture, and this means all sorts of things. First of all, should they even have that word? A culture from some imaginary place that has no birds, reptiles, fish, etc. might not have a word for “egg”. A culture that has very few eggs and does not eat them at all may have a longer word for it. Another may only recently have come into contact with eggs, imported from another culture, and so may have a borrowed word for them. On the other hand, a culture may have a lot more near-synonyms for something that is important to it. Now, true synonyms don’t really exist, but for instance, English has words like snow, sleet, and blizzard, because it does indeed snow, sleet, and blizzard in England, whereas Swahili, a language spoken in East Africa, has only one word for “snow”, used whether it’s wet or in a storm (and it’s three syllables long).
Once you have built up enough words, you can start thinking about etymology. Might two words originally be from the same word? The words “corona” and “crown” both come from the same Latin word, which means “wreath”, and you can understand where that might come from. You can also come up with compound words, like “butterfly”, and then mess with them a little using your drunk phonological skills so they sound less lazy. That’s how we get English “sandblind”, which was originally “samblind”, meaning “half-blind”. You can also use what are called derivational affixes, which are like -tion and un-: they attach to words to turn them into a different kind of word or give them a different but similar meaning. Languages with more syntax than morphology, like English, tend to have more derivational affixes, while languages that rely more heavily on morphology for their grammar tend to just use the same word for a noun, verb, etc. (meaning that, in return for your efforts in making the morphology work, you need to create far fewer words!).
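Here is a rough sketch of the "compound it, then mess with it" idea: glue two roots together and run the result through an ordered list of sound changes. The roots and both rules are hypothetical; the point is only that the changes are applied in order, so you can reuse the same list for every compound in your lexicon.

```python
import re

# ASSUMPTION: these sound changes are invented for illustration.
# Each entry is (regex pattern, replacement), applied in order.
SOUND_CHANGES = [
    (r"n(?=[pb])", "m"),      # n becomes m before p or b
    (r"([aeiou])\1", r"\1"),  # a doubled vowel at the seam collapses to one
]


def compound(first, second, changes=SOUND_CHANGES):
    """Join two roots, then apply each sound change in order."""
    word = first + second
    for pattern, replacement in changes:
        word = re.sub(pattern, replacement, word)
    return word


print(compound("tan", "paro"))  # tanparo, then n -> m before p
print(compound("mela", "aki"))  # melaaki, then aa -> a
```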
Finally, think about the internal social relations between your people. Some languages have a lot of formality embedded in them: in Japanese, there are perhaps dozens of words for “you” that you must use appropriately depending on your status and relationship with the person you’re talking to. In many languages, the plural “you” doubles as a formal singular “you”. In fact, “you” in English was originally the plural, with “thou” as the singular, less formal form. The formal “you” eventually pushed “thou” out entirely, which is why English no longer distinguishes singular from plural in the second person.
One last suggestion for building your language, which seems like a no-brainer to me but which I rarely see people do, is to write examples in it. I've written poetry in my languages. I've written little stories or "famous quotes" from my characters. I've translated things like Bible passages and the Universal Declaration of Human Rights. I want to see my language working fully before I start really doing things with it, and often I speak it into a microphone so I can play it back and hear how it sounds. Or I sing. Singing is fun!
And that's it. Thanks for reading. I may add to this tutorial later, but for now, I think it’s quite long enough. If you have questions or things you’d like to learn more about, let me know, and I may add them! In the meantime, the last piece of advice I can give you here is to examine different languages. Look around Wikipedia to see all the bizarre and fascinating things that different languages do, and use them for inspiration! Tagalog’s verbal aspect affixes are infixes! Finnish has a complex vowel harmony system! Japanese has a three-syllable word that means “a female character in a piece of fiction that starts out cold and cranky but becomes warm and friendly as the story progresses”! There are so many bizarre linguistic things out there that you can get ideas from, and there’s an enormous source for them right at your fingertips!
Have fun enriching your universe with new, fascinating languages. It can take a lot of work, a lot of practice, and as you’ve seen, a lot of knowledge, but the result is more than worth it: a conlang is a huge source of immersion for your audience, both through the language itself and through the help it gives you in creating full and believable cultures. I look forward to seeing what you come up with, and I hope you enjoyed and benefited from this tutorial.
Best of luck!