Most Software is Not Built to Understand Language

Most of the software we use today is not designed to understand language.

If an application lets you enter language, the purpose is usually for that text to be understood either by yourself or by others.

Examples of text you provide to software that is meant to be understood by yourself:

  • Naming products in a product catalog
  • Notes
  • Personal tasks

Examples of text you provide to software that is meant to be understood by others:

  • Emails
  • Chat messages
  • Social media posts

And there is a good reason why most of today’s software is not designed to understand language.

Historically, developing software that understands language has been incredibly time-consuming, expensive, and cumbersome, and has required specialized knowledge. It has therefore also been associated with high risk.

Historically, from the moment you came up with a language-based task that could be solved in your application to the point where you had a first prototype, it typically took an inexperienced team 1-2 months.

This is due to the complicated process involved in developing AI models that can understand language.

The annotation process in particular, where humans manually review data and attach labels to each example, is a major bottleneck in the development of AI models and is typically monotonous and tedious to carry out.

In some cases, it has been possible to bypass this entire process by using open models that can solve general problems such as Named Entity Recognition or sentiment analysis. Alternatively, applications have been designed in such a way that users annotate the data.

However, for many tasks, it is not possible to design your application so that users annotate the data. Openly available models often solve very general problems, while the truly valuable language problems tend to be specific to the context of your own application, making open models useless there.

In these cases, which I dare say are the majority, there is no way around the slow annotation process.

All of this is why most people avoid thinking about use cases for their software products where language understanding is part of the solution.

However, the situation has changed dramatically over the past two years.

The introduction of ChatGPT and the growing number of APIs that provide access to the underlying general Large Language Models (LLMs) have significantly changed this.

Today, we have models that are so general in their knowledge and interface that they are capable of solving problems they haven’t specifically been trained to solve, with impressive accuracy.

This has eliminated most of the work from the previously slow process of developing applications that can understand and meaningfully process language.

It is now possible to reach a prototype in five minutes simply by writing a prompt that instructs a general language model to solve the problem.
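
As a rough sketch of what such a five-minute prototype can look like (assuming the OpenAI Python SDK and an OPENAI_API_KEY in the environment; the model name and the extraction task are illustrative placeholders):

```python
# Five-minute prototype sketch: one prompt makes a general LLM solve a
# task it was never specifically trained for. The task and model name
# below are illustrative assumptions, not recommendations.
from openai import OpenAI

client = OpenAI()

prompt = """Extract the product name and the customer's problem from this
support email, as JSON with keys "product" and "problem".

Email: My BrewMaster 3000 started leaking water after two weeks."""

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; any capable general model works
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```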

The cost and risk of getting started with language understanding in your software application have therefore almost disappeared.

Very few people I talk to think about the impact of LLMs in this way.

So far, thinking about what problems software can solve has centered on data types that can be meaningfully processed with logic, largely because it has been expensive to get started with language-understanding software.

However, much of the risk associated with integrating AI into software applications has disappeared, making it much cheaper to think of digital solutions that can understand and process language.


Posted

Action Engines and the Language-First Software Design Paradigm

Google is a search engine. Language in, documents with relevant information out.

The software products of the future will be action engines. Language in, relevant actions out.

Actions could be create X, search for information about Y, update Z, ask follow-up questions to the user, etc.

It can be either a single action or a series of actions.

Many simple single-action operations should already be possible to build with the technology we have today.

Simple multi-action operations should also already be possible, as long as they don’t become too complex.
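
As a sketch of what a minimal single-action engine could look like, assuming the OpenAI function-calling API (the create_task action and its schema are invented for illustration):

```python
# Sketch of a single-action engine: natural language in, a structured
# action out. Assumes the OpenAI Python SDK and an OPENAI_API_KEY in the
# environment; the "create_task" action is a made-up example.
import json

from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "create_task",
        "description": "Create a task in the user's task list.",
        "parameters": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "due_date": {"type": "string", "description": "ISO 8601 date"},
            },
            "required": ["title"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; any tool-capable model works
    messages=[{"role": "user", "content": "Remind me to send the invoice on Friday"}],
    tools=tools,
)

# Instead of prose, the model answers with an action to execute.
# (A real engine would also handle the case where no action is returned.)
call = response.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))
```

A multi-action engine would wrap this in a loop, executing each requested action and feeding the result back to the model until it stops requesting actions.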

Just like everyone started developing software “Mobile First” when smartphones became widely used, everyone will also have to start building software “Language First,” as language becomes an increasingly popular interface for computers.

I have no idea if that’s how it will be, but it’s my best guess at the moment.

And then I just started typing on my phone on the way home on the train, and suddenly this post had written itself 😂

Posted

A Statement About AI Before LLMs

A statement about AI, before Large Language Models and Generative AI went crazy.

AI is used in software to make logical decisions when the input is unstructured data, such as text, images, and audio.

I'm working on a similar statement for the "after" period.

Posted

7 Things to Focus on as an ML Nerd in 2024

I think many of us ML nerds underestimate how much insight we have into the developments happening in generative AI right now.

As I see it, 2024 is a fantastic opportunity to increase our insight further and thereby make our knowledge even more valuable in 2025 💎

I think that as ML nerds in 2024, we should focus on the following:

  1. Be hands-on. Get your hands in the dough and enjoy the building process. Engineering is a mix of craft and theory, but theory is in short supply in this field, so as I see it, hands-on is the only way. We'll write the theory afterwards.
  2. Focus on real usefulness. Does what you're building solve a real problem? Or is it a problem you think someone has, but that no one actually has? Optimize for real problems.
  3. Stay informed about what's happening in the field, and keep up to date with the latest knowledge and the newest open source projects.
  4. Form your own unique opinion on how to build with generative AI. Share it with others. Get their input. Get smarter. Repeat.
  5. Watch out for "fascination bias". Sometimes a technology/method gets hyped because you can build a fascinating demo with it. Try to see through the fascination. There may be aspects of the method that are useful, but it's quite possible that the fascination itself points people in the wrong direction. Zoom out, be critical, and cut the fascination away.
  6. Keep a cool head. Give a huge f***-finger to FOMO. You can't know everything, and things will work out anyway. Forget about that unicorn start-up everyone says there's a huge opportunity to start right now (unless you're building such a start-up yourself, of course 😅). Just do what you can to make sure you fulfill point 2, and you'll be moving in the right direction.
  7. Remember that this stuff is really hard, and there aren't necessarily any best practices or right answers for what you're working on. That's what we have to invent together in 2024!


Posted

2023: Thoughts on AI, Software, and Technology

Things I have read, thought, noted, speculated about, etc. in 2023, served here as a raw brain dump. Since it's fairly raw, some redundancy may occur 🙃

  1. Much suggests that deep learning models learn whatever there is to learn in the data they are trained on. If they don't learn it, it's either because of ambiguities in the data or because the data doesn't contain enough information to solve the task (see the Universal Approximation Theorem). Future progress in AI will therefore likely be enabled more by developing methods for providing higher-quality data than by developing new model architectures.
  2. Everyone underestimates evaluation of language models. That's a shame, because evaluation will be one of the most important things to take seriously if you want to build impressive and valuable things with LLMs. So get used to thinking in terms of evaluating AI systems, and never start an AI project without a strategy for evaluating it. The evaluation doesn't have to be perfect from the start, and you should expect to develop the evaluation of the system continuously.
  3. When you work with technology, users' needs should always come before the technology. A "user-first" mindset is crucial for success, as opposed to a "technology-first" mindset. It really sounds like a cliché, but it's seriously important, and it takes discipline to practice.
  4. Humans have a simplified understanding of how the world actually is. It can be very unproductive to convince yourself that your understanding of the world is perfect. If you do, you end up sitting in a meeting room with your peers, convincing yourselves that you can argue your way to what your users want. Challenging your view of the world is much easier said than done and takes constant discipline.
  5. The general language understanding in large language models (LLMs) is totally underrated. With LLMs, it has become many thousands of times cheaper and faster than before to build a proof of concept for custom NLP classification models; it can now practically be done for free. That creates huge potential for developing "oldschool" NLP, which a lot of people overlook because the generative aspect of LLMs is so fascinating.
  6. A useful way to think about large language models (LLMs) is as a new interface, on par with the smartphone, that helps users interact with software. The smartphone made interaction with software more mobile. LLMs make interaction with software more natural and can handle a whole lot of complexity for the user. Good examples are companies with a complex product that is hard for users to operate, navigate, and understand on their own, e.g., accounting software or a bank. At present, the user's interface to these complex products is often customer service. LLMs will be able to help customer service agents with this, and in time take it over entirely. In pure engineering terms, think of it like this (see the sketch after this list):
    1. Input: user intent formulated in natural language.
    2. Intermediate output/input: follow-up questions and answers.
    3. Final output: a sequence of function calls in the relevant software.
  7. If large language models (LLMs) continue to become smaller, faster, and cheaper at the same pace as they have in 2023, they can create opportunities that are hard to imagine at present, in the same way that advances in computing power have according to Moore's law.
  8. Development in 7B LLM models is moving so fast because they are accessible for open source enthusiasts to run, train, and experiment with.
  9. Fascination with technology often overshadows real business problems.
  10. If you spend a lot of time talking about AI use cases rather than business problems, it's either because you're still learning what AI fundamentally can do, or because you're an expert communicating/selling AI solutions to someone who doesn't understand it.
  11. Whether a technology is a success is 100% correlated with how much it is used. Cars are a successful technology because they are used. ChatGPT is a success because it is used. Toothbrushes are a success because they are used. So if you want to build technology that becomes a success, it's all about getting someone to use your technology.
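
To make the engineering framing in point 6 concrete, here is a minimal sketch of the three-stage structure (all type and field names are invented for illustration):

```python
# Sketch of the interaction flow from point 6: user intent in natural
# language, optional clarifying rounds, and a sequence of function calls
# as the final result. All names here are invented for illustration.
from dataclasses import dataclass, field

@dataclass
class FunctionCall:
    name: str        # an endpoint in the underlying software
    arguments: dict

@dataclass
class Interaction:
    # 1. Input: user intent formulated in natural language.
    user_intent: str
    # 2. Intermediate output/input: (question, answer) clarification rounds.
    clarifications: list[tuple[str, str]] = field(default_factory=list)
    # 3. Final output: the sequence of function calls to execute.
    plan: list[FunctionCall] = field(default_factory=list)

interaction = Interaction(
    user_intent="Move my meeting with Anna to Thursday",
    clarifications=[("Which week?", "This week")],
    plan=[FunctionCall("calendar.move_event",
                       {"event": "Meeting with Anna", "new_day": "Thursday"})],
)
print(interaction.plan[0].name)  # -> "calendar.move_event"
```
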
Posted

Language First

Ten years ago, many companies were concerned with a concept called "Mobile First," which prioritized developing software that worked well on a new interface gaining massive user adoption: the smartphone.

Within the near future, many companies will be concerned with a concept known as "Language First." This approach involves developing software that works well with an emerging interface gaining widespread use: Large Language Models (LLMs).

The chatbot interface has made LLMs famous. However, many things point towards LLMs being great not only for chat but also for interacting with software systems through language. Because of LLMs' reasoning capabilities and their ability to use tools, this new interface can help humans handle much of the cognitive load involved in learning and using complex systems.

In the near future, we as humans will be able to interact with much more complex software systems without being experts. Imagine if everyone could handle the most complex systems relevant in their daily lives as an expert would.

So, how are we as software developers going to design software for this future where "Language First" principles are going to dominate?

As the "Mobile First" paradigm was obsessed with graphical user interfaces and user experiences, the "Language First" paradigm will be obsessed with user intent. 

What problems does the user have that they can express in plain, non-expert language, and how can we use AI to understand them and orchestrate functionality to help the user achieve their goal?

This will involve organizing systems of prompts to build the reasoning framework for orchestrating functionality. It will also involve indexing and organizing functionality (basically endpoints) and processes to give LLMs the best possible conditions to do a good job.
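
As a toy sketch of what indexing functionality could mean in practice, here endpoints are registered with natural-language descriptions and matched against user intent (word overlap stands in for a real embedding-based search, and all endpoint names are invented):

```python
# Toy sketch: an index of endpoints with natural-language descriptions,
# searched by user intent. Word overlap stands in for a real embedding
# search; the endpoint names and descriptions are invented.

ENDPOINTS = {
    "invoices.create": "Create and send a new invoice to a customer",
    "invoices.search": "Find existing invoices by customer or amount",
    "customers.update": "Change a customer's contact or billing details",
}

def find_endpoint(user_intent: str) -> str:
    """Return the endpoint whose description best matches the intent."""
    intent_words = set(user_intent.lower().split())

    def overlap(name: str) -> int:
        return len(intent_words & set(ENDPOINTS[name].lower().split()))

    return max(ENDPOINTS, key=overlap)

print(find_endpoint("I need to send an invoice to a new customer"))
# -> "invoices.create"
```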

Much like search intent, which is a big thing in natural-language information search, user intent will be the main focus in functionality search and execution.

To be continued... (I'm not done writing yet 😅)

Posted

Beautiful AI-based Products

If your product has a UX element that aligns user interactions with the goal of your AI models, then you have a beautiful machine-learning-based product.

Think of Midjourney. You write a prompt, get 4 generated images. Midjourney's goal is to generate the best possible images given a prompt. The user's desire is to get the best possible image from a prompt. There is 100% alignment between the product's and the user's goals. And you don't need to give advanced instructions to Midjourney users on selecting the best picture. Their human intuition guides them effortlessly. The result is invaluable data for Midjourney to train their models.

Think of Netflix. Occasionally, while you're watching Netflix, you will be asked, 'Are you still watching X?' Netflix understands how crucial this subtle piece of information is for their recommender systems and the user experience of the product. The recommender system needs to know if you are actually watching to determine what to recommend next. And if you fell asleep, it's convenient that Netflix can predict this and automatically stop the series for you, so you don't have to search for where you stopped watching. When you are prompted with 'Are you still watching X?', you as a user have every interest in answering, and your answer gives Netflix almost exactly the information they need to evaluate their system.

More examples of products that do this are Google Search and every social media feed. Huge companies are built by integrating UX and AI to craft superior products. Yet, not many consider AI and UX in this integrated manner. It should be a consideration in every UX decision made.

Posted

Prompting Patterns: The Clarification Pattern

The more I use ChatGPT and develop software using LLM APIs, the more I realize that context is essential for LLMs to provide high-quality answers. When I use ChatGPT and receive unsatisfactory answers, it's typically due to a lack of information about the problem I'm presenting or my current situation. I often notice that I might be ambiguous about the task I want ChatGPT to solve, or ChatGPT perceives the issue in a manner I hadn't anticipated. However, I've observed that by adopting a simple pattern, I can significantly reduce these challenges, consistently leading to more accurate responses.

The pattern is as follows:

  1. Me: I instruct ChatGPT to perform a task. I tell it not to respond immediately but to ask clarifying questions if any aspect of my instruction is unclear.
  2. ChatGPT: Asks clarifying questions.
  3. Me: I answer the questions and tell it again not to execute the instruction but to ask further clarifying questions if any part of my answers is unclear.
  4. ChatGPT: It does one of two things.
    a) Asks additional clarifying questions. If this happens, return to step 3.
    b) Indicates it has no further questions. If this is the case, proceed to step 5.
  5. Me: I give the command to execute the instruction.
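
A minimal sketch of the pattern as a conversation loop, assuming the OpenAI Python SDK (the model name is a placeholder and the "NO FURTHER QUESTIONS" stop phrase is an invented convention):

```python
# Sketch of the Clarification Pattern as a loop: the model must ask
# clarifying questions before executing. Assumes the OpenAI Python SDK;
# the "NO FURTHER QUESTIONS" stop phrase is an invented convention.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # placeholder model name

messages = [{
    "role": "user",
    "content": (
        "Write a product description for my note-taking app. "
        "Do not answer yet: ask clarifying questions first. When everything "
        "is clear, reply only with NO FURTHER QUESTIONS."
    ),
}]

while True:
    reply = client.chat.completions.create(
        model=MODEL, messages=messages
    ).choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    if "NO FURTHER QUESTIONS" in reply:
        break  # step 4b: nothing left to clarify, move on to execution
    # Steps 2-3: answer the questions and invite further clarifications.
    answers = input(f"{reply}\nYour answers: ")
    messages.append({
        "role": "user",
        "content": answers + "\nDon't execute yet; ask more questions "
                             "if anything is still unclear.",
    })

# Step 5: give the command to execute the instruction.
messages.append({"role": "user", "content": "Now execute the instruction."})
final = client.chat.completions.create(model=MODEL, messages=messages)
print(final.choices[0].message.content)
```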

I call this the "Clarification Pattern." Recognizing this approach shifted my perspective from viewing prompt engineering solely as individual prompts to thinking in terms of human-AI conversations. Through these dialogues, I can build valuable context by clarifying ambiguities in both my understanding and that of ChatGPT, thus providing ChatGPT with the optimal conditions to deliver an excellent response.

Posted

Text Classifiers are an Underrated Application of LLMs

Before LLMs really became a thing, getting up and running with a text classifier for a non-standard problem from scratch, including the annotation of a dataset for training, would probably take at least 3 weeks of work hours. At 40 hours a week, that amounts to 120 hours, or 7,200 minutes. Today, getting up and running with a classifier using LLMs requires only writing a prompt, which takes about a minute.
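
A sketch of such a one-minute classifier, assuming the OpenAI Python SDK (the label set, task, and model name are illustrative placeholders):

```python
# One-prompt text classifier sketch. Assumes the OpenAI Python SDK;
# the label set and the task are illustrative placeholders.
from openai import OpenAI

client = OpenAI()
LABELS = ["complaint", "question", "praise"]

def classify(text: str) -> str:
    prompt = (f"Classify the following message as one of {LABELS}. "
              f"Answer with the label only.\n\nMessage: {text}")
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()

print(classify("Your app deleted all my notes!"))  # -> "complaint"
```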

That's a 7,200x productivity gain in the initial process of working with text classifiers.

One thing to note, however, is that in the 1-minute prompt scenario, you have collected zero data and therefore have nothing to measure your classifier's performance against. However, since you have a classifier, you can annotate much more efficiently using an active learning approach, and you have 7,199 minutes to knock yourself out with evaluating your classifier.

Everybody talks about chatbots and agents as the hot new thing, but honestly, a 7,200x productivity gain in text classifier development is also pretty huge!

Posted

Thought Debugging

Previously, tuning text classifiers required annotated datasets. This process entailed splitting the datasets into training and test sets, fine-tuning the model, and measuring its performance. Often, improving accuracy meant analyzing incorrect predictions to hypothesize about what the model failed to understand about the problem. Solutions for improving performance could involve adding more annotations, tweaking the annotation protocol, or adjusting preprocessing steps.

However, with the rise of Large Language Models (LLMs), the focus has shifted towards crafting effective prompts rather than constructing datasets. If a model doesn't respond accurately to a prompt, it is tuned by adjusting the prompt to accommodate potential misunderstandings. A significant advantage of LLMs is their ability to explain the reasoning behind their predictions. This interactive approach allows users to probe the model's understanding and further refine prompts. Moreover, because these models can express their thought processes, you can not only enhance their performance but also apply a technique that can be termed "Thought Debugging": diagnosing and correcting the model's reasoning directly.
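
A sketch of what thought debugging can look like in practice, again assuming the OpenAI Python SDK (the prompt and example review are invented):

```python
# Thought-debugging sketch: ask the model for its reasoning *and* its
# label, so a wrong prediction can be diagnosed by reading the reasoning
# and then patching the prompt. Assumes the OpenAI Python SDK.
from openai import OpenAI

client = OpenAI()

prompt = """Classify this review as "positive" or "negative".
First explain your reasoning step by step, then give the label
on a final line starting with "Label:".

Review: The battery lasts forever, although the screen scratches easily."""

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
# If the reasoning reveals a misunderstanding (e.g., the model overweights
# the scratch complaint), add a clarifying rule to the prompt and rerun.
```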

Posted