35 Matching Annotations
  1. Mar 2023
    1. When artificial intelligence software like ChatGPT writes, it considers many options for each word, taking into account the response it has written so far and the question being asked. It assigns a score to each option.

      ||JovanK|| a good visualization of how the algorithm works :)
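
      A minimal sketch of the idea in this passage (not ChatGPT's actual code): the model gives every candidate next word a score, the scores are normalized into probabilities, and one word is sampled. The words and numbers below are made up for illustration.

        import math
        import random

        def softmax(logits):
            # Turn raw scores into a probability distribution.
            exps = [math.exp(x) for x in logits]
            total = sum(exps)
            return [e / total for e in exps]

        # Toy scores a model might assign to candidate next words.
        candidates = ["cat", "dog", "car", "tree"]
        logits = [2.0, 1.5, 0.3, -1.0]

        probs = softmax(logits)
        next_word = random.choices(candidates, weights=probs, k=1)[0]
        print(dict(zip(candidates, [round(p, 3) for p in probs])), "->", next_word)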

    2. In the end, about 70 percent of the words in the generated text were on the special list — far more than would have been in text written by a person. A detection tool that knew which words were on the special list would be able to tell the difference between generated text and text written by a person.

      ||JovanK|| If we had the list of special words that the algorithm favors, and the user didn't know which words they are, they wouldn't know what to change to fool the detectors :)
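
      A toy detector along these lines, assuming (hypothetically) that we know the secret list of special words: count the fraction of words on the list and flag text that lands far above the human base rate. The word list and threshold here are invented for illustration.

        # Hypothetical secret list of words the generator is nudged toward.
        SPECIAL_WORDS = {"consider", "notably", "various", "overall"}

        def special_fraction(text):
            words = text.lower().split()
            return sum(w in SPECIAL_WORDS for w in words) / len(words) if words else 0.0

        def looks_generated(text, threshold=0.5):
            # The article's example: ~70% of generated words were on the list,
            # far more than human-written text would show.
            return special_fraction(text) >= threshold

        print(looks_generated("overall we consider various options, notably these"))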


  2. Apr 2022
    1. When we later tune our model to identify the difference between these positive and negative passages, we are teaching it to determine what are often very nuanced differences.
    2. Adding these ‘negative’ training examples (Q, P-) is a common approach used in many bi-encoder fine-tuning methods, including multiple negatives ranking and margin MSE loss (the latter of which we will be using). Using hard negatives in particular can significantly improve the performance of our models [3].
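
      A minimal fine-tuning sketch with the sentence-transformers library, assuming its MarginMSELoss and a placeholder base model; the margin label would normally come from the cross-encoder pseudo-labeling step.

        from torch.utils.data import DataLoader
        from sentence_transformers import SentenceTransformer, InputExample, losses

        model = SentenceTransformer("msmarco-distilbert-base-tas-b")  # example base model

        # Each example: (query Q, positive P+, negative P-), with a label equal
        # to the cross-encoder margin score(Q, P+) - score(Q, P-).
        train_examples = [
            InputExample(
                texts=["what is gpl?",
                       "GPL adapts dense retrievers using unlabeled text.",
                       "The GPL license is a free software license."],
                label=4.3,  # placeholder margin
            ),
        ]

        loader = DataLoader(train_examples, shuffle=True, batch_size=16)
        loss = losses.MarginMSELoss(model)
        model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=100)
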
    3. Excluding the positive passage (if returned), we assume all other returned passages are negatives. We then select one of these negative passages at random to become the negative pair for our query.

      remember

    4. remember

    5. Yes, those returned results are the most similar passages to our query, but they are not the correct passage for our query. We are, in essence, increasing the similarity gap between the correct passage and all other passages, no matter how similar they may be.
    6. It may seem counterintuitive at first. Why would we return the most similar passages and train a model to view these as dissimilar?
    7. Excluding the positive passage (if returned), we assume all other returned passages are negatives. We then select one of these negative passages at random to become the negative pair for our query.
    8. The negative mining process is a retrieval step where, given a query, we return the top_k most similar results.
    9. To fix this, we perform a negative mining step to find highly similar passages to existing P+ passages. As these new passages will be highly similar to, but not matches for, our query Q, our model will need to learn how to distinguish them from genuine matches P+. We refer to these non-matches as negative passages, written as P-.
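
      A sketch of this mining step using sentence-transformers semantic search; the retriever checkpoint is just an example, and a real pipeline would search a large indexed corpus rather than an in-memory list.

        import random
        from sentence_transformers import SentenceTransformer, util

        retriever = SentenceTransformer("msmarco-distilbert-base-tas-b")  # example model

        def mine_negative(query, positive, passages, top_k=10):
            corpus_emb = retriever.encode(passages, convert_to_tensor=True)
            query_emb = retriever.encode(query, convert_to_tensor=True)
            hits = util.semantic_search(query_emb, corpus_emb, top_k=top_k)[0]
            # Drop the positive passage, keep the rest as negative candidates.
            candidates = [passages[h["corpus_id"]] for h in hits
                          if passages[h["corpus_id"]] != positive]
            return random.choice(candidates)  # one random hard negative P-
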
    10. The (query, passage) pairs we have now are assumed to be positively similar, written as (Q, P+) where the query is Q, and the positive passage is P+.
    11. Query generation is not perfect. It can generate noisy, sometimes nonsensical queries. And this is where GPL improved upon GenQ. GenQ relies heavily on these synthetic queries being high-quality with little noise. With GPL, this is not the case, as the final cross-encoder step labels the similarity of pairs, meaning dissimilar pairs are likely to be labeled as such. GenQ does not have any such labeling step.
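
      A sketch of the query generation step with a T5 doc2query model via Hugging Face transformers; the checkpoint name is one commonly used option, not a requirement.

        from transformers import T5Tokenizer, T5ForConditionalGeneration

        name = "doc2query/msmarco-t5-base-v1"  # example doc2query checkpoint
        tokenizer = T5Tokenizer.from_pretrained(name)
        model = T5ForConditionalGeneration.from_pretrained(name)

        passage = "GPL adapts dense retrievers to new domains using unlabeled text."
        inputs = tokenizer(passage, return_tensors="pt", truncation=True)
        outputs = model.generate(**inputs, max_length=64, do_sample=True,
                                 top_p=0.95, num_return_sequences=3)
        # Several noisy synthetic queries for this one passage.
        queries = [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]
        print(queries)
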
    12. GPL is perfect for scenarios where we have no labeled data. However, it does require a large amount of unstructured text. That could be text data scraped from web pages, PDF documents, etc. The only requirement is that this text data is in-domain, meaning it is relevant to our particular use case.
    13. Each of these steps requires the use of a pre-existing model fine-tuned for each task. The team that introduced GPL also provided models that handle each task. We will discuss these models as we introduce each step and note alternative models where relevant.
    14. Pseudo labeling, using a cross-encoder model to assign similarity scores to pairs.
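
      A sketch of pseudo labeling with a cross-encoder from sentence-transformers; the checkpoint is a common example. The margin between the positive and negative scores becomes the training label for margin MSE.

        from sentence_transformers import CrossEncoder

        ce = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # example model

        query = "what is gpl?"
        pos = "GPL adapts dense retrievers using unlabeled text."
        neg = "The GPL license is a free software license."

        score_pos, score_neg = ce.predict([(query, pos), (query, neg)])
        margin = score_pos - score_neg  # label for (Q, P+, P-) fine-tuning
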
    15. Negative mining, retrieving similar passages that do not match (negatives).
    16. Query generation, creating queries from passages.
    17. At a high level, GPL consists of three data preparation steps and one fine-tuning step.
    18. As you may have guessed, the same applies to the first scenario of fine-tuning a pretrained model. It can be hard to find relevant, labeled data. With GPL we don’t need to. Unstructured text is all you need.
    19. GPL hopes to solve this problem by allowing us to take existing models and adapt them to new domains using nothing more than unlabeled data. By using unlabeled data we greatly enhance the ease of finding relevant data; all we need is unstructured text.

  3. Mar 2022
    1. fallback? how?

    2. Haystack can also be useful for fallback situations. In cases where the chatbot cannot easily classify the user's utterance into any of its predefined intents, Haystack can be called to help respond to the utterance which the chatbot would otherwise not know how to deal with.
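
      A sketch of how that fallback might look as a Rasa custom action backed by a Haystack extractive QA pipeline, based on the Rasa SDK and Haystack 1.x APIs; the document contents, model names, and action name are illustrative.

        from haystack.document_stores import InMemoryDocumentStore
        from haystack.nodes import TfidfRetriever, FARMReader
        from haystack.pipelines import ExtractiveQAPipeline
        from rasa_sdk import Action, Tracker
        from rasa_sdk.executor import CollectingDispatcher

        # Build a small QA pipeline over an in-memory document store.
        document_store = InMemoryDocumentStore()
        document_store.write_documents(
            [{"content": "Rasa is an open source framework for building assistants."}]
        )
        qa_pipeline = ExtractiveQAPipeline(
            reader=FARMReader(model_name_or_path="deepset/roberta-base-squad2"),
            retriever=TfidfRetriever(document_store=document_store),
        )

        class ActionHaystackFallback(Action):
            """Called when the NLU model cannot match any predefined intent."""

            def name(self) -> str:
                return "action_haystack_fallback"

            def run(self, dispatcher: CollectingDispatcher, tracker: Tracker, domain):
                question = tracker.latest_message.get("text", "")
                result = qa_pipeline.run(query=question,
                                         params={"Reader": {"top_k": 1}})
                answers = result.get("answers", [])
                if answers:
                    dispatcher.utter_message(text=answers[0].answer)
                else:
                    dispatcher.utter_message(text="Sorry, I don't know that one.")
                return []
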
    3. either an information-seeking intent from a user or a fallback intent, perform question answering on a large-scale database of documents, and then compose a well-informed answer. Of course, we are going to keep it open source. That's why we'll be using Haystack and Rasa.
    4. It's hard to anticipate all possible "intents" a future user might have.

  4. Feb 2022
    1. Stories represent training data to teach your assistant what it should do next.
    2. customer support logs, assuming data collection & re-use is covered in your privacy policy, or user conversations with your assistant.
    3. user generated text as well as conversational patterns.
    4. domain.yml is the configuration file of everything that your assistant "knows". It contains:
    5. The data folder contains data that your assistant will learn from.
    6. The config.yml file contains the configuration for your machine learning models.
    7. The domain.yml file is the file where everything comes together.
    8. This can be rule-based, in which case we may be using a regex, or it can be based on a neural network. Rasa comes with a neural network architecture, called DIET, that sorts texts into intents and entities based on the examples it's been provided.
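
      A toy illustration of the rule-based option mentioned here: match utterances against regular expressions per intent, falling back when nothing matches. DIET replaces this with a neural network trained on example utterances. The intents and patterns are hypothetical.

        import re

        # Hypothetical intents and patterns for illustration.
        INTENT_PATTERNS = {
            "greet": re.compile(r"\b(hi|hello|hey)\b", re.IGNORECASE),
            "goodbye": re.compile(r"\b(bye|goodbye|see you)\b", re.IGNORECASE),
        }

        def classify(utterance):
            for intent, pattern in INTENT_PATTERNS.items():
                if pattern.search(utterance):
                    return intent
            return "fallback"

        print(classify("Hey there!"))  # -> greet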

    1. This playlist contains a series of videos that will help you get started with NLP. It was originally hosted on YouTube, but we've since also moved it to our learning center.

      something

    2. Here's a basic example of what a config.yml file might look like.

      important
