author tmartins <thigm85@gmail.com> 2020-08-24 10:12:50 +0200
committer tmartins <thigm85@gmail.com> 2020-08-24 10:12:50 +0200
commit 2da007683c0a06380fb859f2ba124703d916f0fe (patch)
tree 157cb2a35bd8e686e51161316e564c4d90aa9b37
parent 24dadde24919aecb6ac8bab94d14f559d48bd9af (diff)
Add more text to connect to vespa tutorial
-rw-r--r-- python/vespa/notebooks/connect-to-vespa-instance.ipynb 751
1 files changed, 746 insertions, 5 deletions
diff --git a/python/vespa/notebooks/connect-to-vespa-instance.ipynb b/python/vespa/notebooks/connect-to-vespa-instance.ipynb
index 0b809ca2d8c..6dfecd0c099 100644
--- a/python/vespa/notebooks/connect-to-vespa-instance.ipynb
+++ b/python/vespa/notebooks/connect-to-vespa-instance.ipynb
@@ -15,10 +15,31 @@
"cell_type": "markdown",
"metadata": {},
"source": [
+ "`pyvespa` provides a Python API to [vespa.ai](https://vespa.ai). It allows us to create, modify, deploy and interact with running Vespa instances. The main goal of the library is to allow for faster prototyping and ML experimentation."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "This tutorial will show you how to connect to a pre-existing Vespa instance. We will use the https://cord19.vespa.ai/ app as an example. You can run this tutorial yourself in Google Colab by clicking on the badge located at the top of the tutorial."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
"## Install"
]
},
{
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The library is available on PyPI and can therefore be installed with `pip`."
+ ]
+ },
+ {
"cell_type": "code",
"execution_count": null,
"metadata": {},
@@ -37,6 +58,13 @@
]
},
{
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We can connect to a running Vespa instance by creating an instance of `Vespa` with the appropriate URL. The resulting `app` will then be used to communicate with the application."
+ ]
+ },
+ {
"cell_type": "code",
"execution_count": null,
"metadata": {},
@@ -57,12 +85,26 @@
]
},
{
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "When building a search application, we usually want to experiment with different query models. A `Query` model consists of a match phase and a ranking phase. The match phase defines how to match documents based on the query sent, and the ranking phase defines how to rank the matched documents. Both phases can get quite complex, and being able to easily express and experiment with them is very valuable."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "In the example below we define the match phase to be the `Union` of the `WeakAnd` and the `ANN` operators. The `WeakAnd` will match documents based on query terms while the Approximate Nearest Neighbor (`ANN`) operator will match documents based on the distance between the query and document embeddings. This is an illustration of how easy it is to combine term and semantic matching in Vespa. "
+ ]
+ },
+ {
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
- "from vespa.query import Query, Union, WeakAnd, ANN, RankProfile\n",
+ "from vespa.query import Union, WeakAnd, ANN\n",
"from random import random\n",
"\n",
"match_phase = Union(\n",
@@ -74,7 +116,23 @@
" hits = 10,\n",
" label=\"title\"\n",
" )\n",
- ")\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We then define the ranking to be done by the `bm25` rank-profile that is already defined in the application package. We set `list_features=True` to be able to collect ranking features later in this tutorial. After defining the `match_phase` and the `rank_profile`, we can instantiate the `Query` model."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from vespa.query import Query, RankProfile\n",
"\n",
"rank_profile = RankProfile(name=\"bm25\", list_features=True)\n",
"\n",
@@ -91,6 +149,13 @@
]
},
{
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We can use the `query_model` that we just defined to issue queries to the application via the `query` method."
+ ]
+ },
+ {
"cell_type": "code",
"execution_count": null,
"metadata": {},
@@ -103,10 +168,28 @@
]
},
{
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We can see the number of documents that were retrieved by Vespa:"
+ ]
+ },
+ {
"cell_type": "code",
"execution_count": null,
"metadata": {},
- "outputs": [],
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "965"
+ ]
+ },
+ "execution_count": null,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
"source": [
"query_result.number_documents_retrieved"
]
@@ -115,12 +198,53 @@
"cell_type": "markdown",
"metadata": {},
"source": [
+ "And the number of documents that were returned to us:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "10"
+ ]
+ },
+ "execution_count": null,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "len(query_result.hits)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
"## Labelled data\n",
"\n",
"> How to structure labelled data"
]
},
{
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We often need either to evaluate query models or to collect data to improve query models through ML. In both cases we usually need labelled data. Let's create some labelled data to illustrate its expected format and its usage in the library."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Each data point contains a `query_id`, a `query` and `relevant_docs` associated with the query."
+ ]
+ },
+ {
"cell_type": "code",
"execution_count": null,
"metadata": {},
@@ -157,10 +281,560 @@
]
},
{
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We can collect training data with the `collect_training_data` method according to a specific `query_model`. Below we will collect two documents for each query in addition to the relevant ones."
+ ]
+ },
+ {
"cell_type": "code",
"execution_count": null,
"metadata": {},
- "outputs": [],
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<style scoped>\n",
+ " .dataframe tbody tr th:only-of-type {\n",
+ " vertical-align: middle;\n",
+ " }\n",
+ "\n",
+ " .dataframe tbody tr th {\n",
+ " vertical-align: top;\n",
+ " }\n",
+ "\n",
+ " .dataframe thead th {\n",
+ " text-align: right;\n",
+ " }\n",
+ "</style>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr style=\"text-align: right;\">\n",
+ " <th></th>\n",
+ " <th>attributeMatch(authors.first)</th>\n",
+ " <th>attributeMatch(authors.first).averageWeight</th>\n",
+ " <th>attributeMatch(authors.first).completeness</th>\n",
+ " <th>attributeMatch(authors.first).fieldCompleteness</th>\n",
+ " <th>attributeMatch(authors.first).importance</th>\n",
+ " <th>attributeMatch(authors.first).matches</th>\n",
+ " <th>attributeMatch(authors.first).maxWeight</th>\n",
+ " <th>attributeMatch(authors.first).normalizedWeight</th>\n",
+ " <th>attributeMatch(authors.first).normalizedWeightedWeight</th>\n",
+ " <th>attributeMatch(authors.first).queryCompleteness</th>\n",
+ " <th>...</th>\n",
+ " <th>textSimilarity(results).queryCoverage</th>\n",
+ " <th>textSimilarity(results).score</th>\n",
+ " <th>textSimilarity(title).fieldCoverage</th>\n",
+ " <th>textSimilarity(title).order</th>\n",
+ " <th>textSimilarity(title).proximity</th>\n",
+ " <th>textSimilarity(title).queryCoverage</th>\n",
+ " <th>textSimilarity(title).score</th>\n",
+ " <th>document_id</th>\n",
+ " <th>query_id</th>\n",
+ " <th>relevant</th>\n",
+ " </tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr>\n",
+ " <th>0</th>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>...</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.062500</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.000000</td>\n",
+ " <td>0.142857</td>\n",
+ " <td>0.055357</td>\n",
+ " <td>0</td>\n",
+ " <td>0</td>\n",
+ " <td>1</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>1</th>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>...</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>1.0</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>97200</td>\n",
+ " <td>0</td>\n",
+ " <td>0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>2</th>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>...</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.266667</td>\n",
+ " <td>1.0</td>\n",
+ " <td>0.869792</td>\n",
+ " <td>0.571429</td>\n",
+ " <td>0.679189</td>\n",
+ " <td>69447</td>\n",
+ " <td>0</td>\n",
+ " <td>0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>3</th>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>...</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.142857</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.437500</td>\n",
+ " <td>0.142857</td>\n",
+ " <td>0.224554</td>\n",
+ " <td>3</td>\n",
+ " <td>0</td>\n",
+ " <td>1</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>4</th>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>...</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>1.0</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>97200</td>\n",
+ " <td>0</td>\n",
+ " <td>0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>5</th>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>...</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.266667</td>\n",
+ " <td>1.0</td>\n",
+ " <td>0.869792</td>\n",
+ " <td>0.571429</td>\n",
+ " <td>0.679189</td>\n",
+ " <td>69447</td>\n",
+ " <td>0</td>\n",
+ " <td>0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>6</th>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>...</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.111111</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.000000</td>\n",
+ " <td>0.083333</td>\n",
+ " <td>0.047222</td>\n",
+ " <td>1</td>\n",
+ " <td>1</td>\n",
+ " <td>1</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>7</th>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>...</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>1.0</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>116256</td>\n",
+ " <td>1</td>\n",
+ " <td>0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>8</th>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>...</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.187500</td>\n",
+ " <td>1.0</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>0.250000</td>\n",
+ " <td>0.612500</td>\n",
+ " <td>14888</td>\n",
+ " <td>1</td>\n",
+ " <td>0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>9</th>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>...</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.083333</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.000000</td>\n",
+ " <td>0.083333</td>\n",
+ " <td>0.041667</td>\n",
+ " <td>5</td>\n",
+ " <td>1</td>\n",
+ " <td>1</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>10</th>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>...</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>1.0</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>116256</td>\n",
+ " <td>1</td>\n",
+ " <td>0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>11</th>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>...</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.187500</td>\n",
+ " <td>1.0</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>0.250000</td>\n",
+ " <td>0.612500</td>\n",
+ " <td>14888</td>\n",
+ " <td>1</td>\n",
+ " <td>0</td>\n",
+ " </tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "<p>12 rows × 984 columns</p>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " attributeMatch(authors.first) \\\n",
+ "0 0.0 \n",
+ "1 0.0 \n",
+ "2 0.0 \n",
+ "3 0.0 \n",
+ "4 0.0 \n",
+ "5 0.0 \n",
+ "6 0.0 \n",
+ "7 0.0 \n",
+ "8 0.0 \n",
+ "9 0.0 \n",
+ "10 0.0 \n",
+ "11 0.0 \n",
+ "\n",
+ " attributeMatch(authors.first).averageWeight \\\n",
+ "0 0.0 \n",
+ "1 0.0 \n",
+ "2 0.0 \n",
+ "3 0.0 \n",
+ "4 0.0 \n",
+ "5 0.0 \n",
+ "6 0.0 \n",
+ "7 0.0 \n",
+ "8 0.0 \n",
+ "9 0.0 \n",
+ "10 0.0 \n",
+ "11 0.0 \n",
+ "\n",
+ " attributeMatch(authors.first).completeness \\\n",
+ "0 0.0 \n",
+ "1 0.0 \n",
+ "2 0.0 \n",
+ "3 0.0 \n",
+ "4 0.0 \n",
+ "5 0.0 \n",
+ "6 0.0 \n",
+ "7 0.0 \n",
+ "8 0.0 \n",
+ "9 0.0 \n",
+ "10 0.0 \n",
+ "11 0.0 \n",
+ "\n",
+ " attributeMatch(authors.first).fieldCompleteness \\\n",
+ "0 0.0 \n",
+ "1 0.0 \n",
+ "2 0.0 \n",
+ "3 0.0 \n",
+ "4 0.0 \n",
+ "5 0.0 \n",
+ "6 0.0 \n",
+ "7 0.0 \n",
+ "8 0.0 \n",
+ "9 0.0 \n",
+ "10 0.0 \n",
+ "11 0.0 \n",
+ "\n",
+ " attributeMatch(authors.first).importance \\\n",
+ "0 0.0 \n",
+ "1 0.0 \n",
+ "2 0.0 \n",
+ "3 0.0 \n",
+ "4 0.0 \n",
+ "5 0.0 \n",
+ "6 0.0 \n",
+ "7 0.0 \n",
+ "8 0.0 \n",
+ "9 0.0 \n",
+ "10 0.0 \n",
+ "11 0.0 \n",
+ "\n",
+ " attributeMatch(authors.first).matches \\\n",
+ "0 0.0 \n",
+ "1 0.0 \n",
+ "2 0.0 \n",
+ "3 0.0 \n",
+ "4 0.0 \n",
+ "5 0.0 \n",
+ "6 0.0 \n",
+ "7 0.0 \n",
+ "8 0.0 \n",
+ "9 0.0 \n",
+ "10 0.0 \n",
+ "11 0.0 \n",
+ "\n",
+ " attributeMatch(authors.first).maxWeight \\\n",
+ "0 0.0 \n",
+ "1 0.0 \n",
+ "2 0.0 \n",
+ "3 0.0 \n",
+ "4 0.0 \n",
+ "5 0.0 \n",
+ "6 0.0 \n",
+ "7 0.0 \n",
+ "8 0.0 \n",
+ "9 0.0 \n",
+ "10 0.0 \n",
+ "11 0.0 \n",
+ "\n",
+ " attributeMatch(authors.first).normalizedWeight \\\n",
+ "0 0.0 \n",
+ "1 0.0 \n",
+ "2 0.0 \n",
+ "3 0.0 \n",
+ "4 0.0 \n",
+ "5 0.0 \n",
+ "6 0.0 \n",
+ "7 0.0 \n",
+ "8 0.0 \n",
+ "9 0.0 \n",
+ "10 0.0 \n",
+ "11 0.0 \n",
+ "\n",
+ " attributeMatch(authors.first).normalizedWeightedWeight \\\n",
+ "0 0.0 \n",
+ "1 0.0 \n",
+ "2 0.0 \n",
+ "3 0.0 \n",
+ "4 0.0 \n",
+ "5 0.0 \n",
+ "6 0.0 \n",
+ "7 0.0 \n",
+ "8 0.0 \n",
+ "9 0.0 \n",
+ "10 0.0 \n",
+ "11 0.0 \n",
+ "\n",
+ " attributeMatch(authors.first).queryCompleteness ... \\\n",
+ "0 0.0 ... \n",
+ "1 0.0 ... \n",
+ "2 0.0 ... \n",
+ "3 0.0 ... \n",
+ "4 0.0 ... \n",
+ "5 0.0 ... \n",
+ "6 0.0 ... \n",
+ "7 0.0 ... \n",
+ "8 0.0 ... \n",
+ "9 0.0 ... \n",
+ "10 0.0 ... \n",
+ "11 0.0 ... \n",
+ "\n",
+ " textSimilarity(results).queryCoverage textSimilarity(results).score \\\n",
+ "0 0.0 0.0 \n",
+ "1 0.0 0.0 \n",
+ "2 0.0 0.0 \n",
+ "3 0.0 0.0 \n",
+ "4 0.0 0.0 \n",
+ "5 0.0 0.0 \n",
+ "6 0.0 0.0 \n",
+ "7 0.0 0.0 \n",
+ "8 0.0 0.0 \n",
+ "9 0.0 0.0 \n",
+ "10 0.0 0.0 \n",
+ "11 0.0 0.0 \n",
+ "\n",
+ " textSimilarity(title).fieldCoverage textSimilarity(title).order \\\n",
+ "0 0.062500 0.0 \n",
+ "1 1.000000 1.0 \n",
+ "2 0.266667 1.0 \n",
+ "3 0.142857 0.0 \n",
+ "4 1.000000 1.0 \n",
+ "5 0.266667 1.0 \n",
+ "6 0.111111 0.0 \n",
+ "7 1.000000 1.0 \n",
+ "8 0.187500 1.0 \n",
+ "9 0.083333 0.0 \n",
+ "10 1.000000 1.0 \n",
+ "11 0.187500 1.0 \n",
+ "\n",
+ " textSimilarity(title).proximity textSimilarity(title).queryCoverage \\\n",
+ "0 0.000000 0.142857 \n",
+ "1 1.000000 1.000000 \n",
+ "2 0.869792 0.571429 \n",
+ "3 0.437500 0.142857 \n",
+ "4 1.000000 1.000000 \n",
+ "5 0.869792 0.571429 \n",
+ "6 0.000000 0.083333 \n",
+ "7 1.000000 1.000000 \n",
+ "8 1.000000 0.250000 \n",
+ "9 0.000000 0.083333 \n",
+ "10 1.000000 1.000000 \n",
+ "11 1.000000 0.250000 \n",
+ "\n",
+ " textSimilarity(title).score document_id query_id relevant \n",
+ "0 0.055357 0 0 1 \n",
+ "1 1.000000 97200 0 0 \n",
+ "2 0.679189 69447 0 0 \n",
+ "3 0.224554 3 0 1 \n",
+ "4 1.000000 97200 0 0 \n",
+ "5 0.679189 69447 0 0 \n",
+ "6 0.047222 1 1 1 \n",
+ "7 1.000000 116256 1 0 \n",
+ "8 0.612500 14888 1 0 \n",
+ "9 0.041667 5 1 1 \n",
+ "10 1.000000 116256 1 0 \n",
+ "11 0.612500 14888 1 0 \n",
+ "\n",
+ "[12 rows x 984 columns]"
+ ]
+ },
+ "execution_count": null,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
"source": [
"training_data_batch = app.collect_training_data(\n",
" labelled_data = labelled_data,\n",
@@ -212,7 +886,74 @@
"cell_type": "code",
"execution_count": null,
"metadata": {},
- "outputs": [],
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<style scoped>\n",
+ " .dataframe tbody tr th:only-of-type {\n",
+ " vertical-align: middle;\n",
+ " }\n",
+ "\n",
+ " .dataframe tbody tr th {\n",
+ " vertical-align: top;\n",
+ " }\n",
+ "\n",
+ " .dataframe thead th {\n",
+ " text-align: right;\n",
+ " }\n",
+ "</style>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr style=\"text-align: right;\">\n",
+ " <th></th>\n",
+ " <th>query_id</th>\n",
+ " <th>match_ratio_retrieved_docs</th>\n",
+ " <th>match_ratio_docs_available</th>\n",
+ " <th>match_ratio_value</th>\n",
+ " <th>recall_10_value</th>\n",
+ " <th>reciprocal_rank_10_value</th>\n",
+ " </tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr>\n",
+ " <th>0</th>\n",
+ " <td>0</td>\n",
+ " <td>1033</td>\n",
+ " <td>127518</td>\n",
+ " <td>0.008101</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>1</th>\n",
+ " <td>1</td>\n",
+ " <td>928</td>\n",
+ " <td>127518</td>\n",
+ " <td>0.007277</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0</td>\n",
+ " </tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " query_id match_ratio_retrieved_docs match_ratio_docs_available \\\n",
+ "0 0 1033 127518 \n",
+ "1 1 928 127518 \n",
+ "\n",
+ " match_ratio_value recall_10_value reciprocal_rank_10_value \n",
+ "0 0.008101 0.0 0 \n",
+ "1 0.007277 0.0 0 "
+ ]
+ },
+ "execution_count": null,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
"source": [
"evaluation = app.evaluate(\n",
" labelled_data = labelled_data,\n",