update vespa cloud notebook

author: tmartins <thigm85@gmail.com> 2020-09-03 09:25:31 +0200
committer: tmartins <thigm85@gmail.com> 2020-09-03 09:25:31 +0200
commit: d793edfa5e28bab79b43c9888afb411ab8c46e77 (patch)
tree: 9cacf3d43944c7614ab090facaa8522cff013a79 /python
parent: d5a061ad42d7b2755864f33d9caee28074740d8b (diff)
1 files changed, 467 insertions, 165 deletions
diff --git a/python/vespa/docs/sphinx/source/create-and-deploy-vespa-cloud.ipynb b/python/vespa/docs/sphinx/source/create-and-deploy-vespa-cloud.ipynb
index 3c0f4a4201d..a5350a8f227 100644
--- a/python/vespa/docs/sphinx/source/create-and-deploy-vespa-cloud.ipynb
+++ b/python/vespa/docs/sphinx/source/create-and-deploy-vespa-cloud.ipynb
@@ -1,37 +1,21 @@
 {
  "cells": [
   {
-   "cell_type": "code",
-   "execution_count": 1,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# hide\n",
-    "%load_ext autoreload\n",
-    "%autoreload 2"
-   ]
-  },
-  {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Build end-to-end Vespa apps with pyvespa\n",
+    "# Build end-to-end Vespa apps and deploy to Vespa Cloud\n",
     "\n",
-    "> Python API to create, modify, deploy and interact with Vespa applications"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "`pyvespa` provides a python API to [vespa.ai](vespa.ai). It allow us to create, modify, deploy and interact with running Vespa instances. The main goal of the library is to allow for faster prototyping and ML experimentation. "
+    "> Python API to create, modify, deploy and interact with Vespa applications\n",
+    "\n",
+    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vespa-engine/vespa/blob/tgm/reference-doc/python/vespa/docs/sphinx/source/create-and-deploy-vespa-cloud.ipynb)"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "This tutorial will create a text search application from scratch based on the MS MARCO dataset, similar to our [text search tutorials](https://docs.vespa.ai/documentation/tutorials/text-search.html). We will first show how to define the app by creating an application package [REF]. Then we locally deploy the app in a Docker container. Once the app is up and running we show how to feed data to it. After the data is sent, we can make queries and inspect the results. We then show how to add a new rank profile to the application package and to redeploy the app with the latest changes. We proceed to show how to evaluate and compare two rank profiles with evaluation metrics such as Recall and Reciprocal Rank."
+    "This self-contained tutorial will create a simplified text search application from scratch based on the MS MARCO dataset, similar to our [text search tutorials](https://docs.vespa.ai/documentation/tutorials/text-search.html). We will then deploy the app to [Vespa Cloud](https://cloud.vespa.ai/) and interact with it by feeding data, querying and evaluating different query models."
    ]
   },
   {
@@ -50,7 +34,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 2,
+   "execution_count": 1,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -74,7 +58,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 3,
+   "execution_count": 2,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -97,7 +81,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": 3,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -124,86 +108,44 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "This tutorial shows how to deploy the application package to [Vespa Cloud](https://cloud.vespa.ai/). For the following to work you need to sign-up to Vespa Cloud, register an application name there and generate your user API key on the Vespa Cloud console."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "We first create a `VespaCloud` context named `cloud` that will handle the secure communication with Vespa Cloud servers. In order to do that, all we need is your Vespa Cloud tenant name, the application name that you registered and the user key you generated on the Vespa Cloud console:"
+    "To deploy to [Vespa Cloud](https://cloud.vespa.ai/), you need to sign up, register an application name on the Vespa Cloud console, and generate your user API key."
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "**Note:** It takes around 15 min to call `cloud.deploy` for the first time, as Vespa Cloud will have the setup the environment. Subsequent calls will be much faster, usually taking less than 10 seconds."
+    "We first create a `VespaCloud` instance that will handle the secure communication with Vespa Cloud servers. In order to do that, all we need is your Vespa Cloud tenant name, the application name that you registered, the user key you generated on the Vespa Cloud console and the application package that we created above."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 4,
    "metadata": {},
    "outputs": [],
    "source": [
     "from vespa.package import VespaCloud\n",
     "\n",
-    "with VespaCloud(\"vespa-team\", \"ms-marco\", \"/Users/tmartins/sample_application/tmartins.vespa-team.pem\") as cloud:\n",
-    "    vespa = cloud.deploy('from-notebook', app_package)"
+    "vespa_cloud = VespaCloud(\n",
+    "    tenant=\"vespa-team\", \n",
+    "    application=\"ms-marco\", \n",
+    "    key_location=\"/Users/username/sample_application/username.vespa-team.pem\", \n",
+    "    application_package=app_package\n",
+    ")"
    ]
   },
   {
-   "cell_type": "code",
-   "execution_count": 5,
+   "cell_type": "markdown",
    "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Deployment started in run 12 of dev-aws-us-east-1c for vespa-team.ms-marco.from-notebook. This may take about 15 minutes the first time.\n",
-      "INFO    [10:37:04]  Deploying platform version 7.278.21 and application version unknown ...\n",
-      "INFO    [10:37:05]  No services requiring restart.\n",
-      "INFO    [10:37:05]  Deployment successful.\n",
-      "INFO    [10:37:05]  Session 13751 for tenant 'vespa-team' prepared and activated.\n",
-      "INFO    [10:37:06]  ######## Details for all nodes ########\n",
-      "INFO    [10:37:06]  h711a.dev.aws-us-east-1c.vespa-external.aws.oath.cloud: expected to be UP\n",
-      "INFO    [10:37:06]  --- platform docker.ouroath.com:4443/vespa/centos-tenant:7.278.21\n",
-      "INFO    [10:37:06]  --- container on port 4080 has not started \n",
-      "INFO    [10:37:06]  h712a.dev.aws-us-east-1c.vespa-external.aws.oath.cloud: expected to be UP\n",
-      "INFO    [10:37:06]  --- platform docker.ouroath.com:4443/vespa/centos-tenant:7.278.21\n",
-      "INFO    [10:37:06]  --- logserver-container on port 4080 has config generation 13751, wanted is 13751\n",
-      "INFO    [10:37:06]  h713a.dev.aws-us-east-1c.vespa-external.aws.oath.cloud: expected to be UP\n",
-      "INFO    [10:37:06]  --- platform docker.ouroath.com:4443/vespa/centos-tenant:7.278.21\n",
-      "INFO    [10:37:06]  --- container-clustercontroller on port 19050 has config generation 13751, wanted is 13751\n",
-      "INFO    [10:37:06]  --- storagenode on port 19102 has config generation 13750, wanted is 13751\n",
-      "INFO    [10:37:06]  --- searchnode on port 19107 has config generation 13751, wanted is 13751\n",
-      "INFO    [10:37:06]  --- distributor on port 19111 has config generation 13751, wanted is 13751\n",
-      "INFO    [10:37:30]  Found endpoints:\n",
-      "INFO    [10:37:30]  - dev.aws-us-east-1c\n",
-      "INFO    [10:37:30]   |-- https://msmarco-container.from-notebook.ms-marco.vespa-team.aws-us-east-1c.dev.public.vespa.oath.cloud/ (cluster 'msmarco_container')\n",
-      "INFO    [10:37:31]  Installation succeeded!\n"
-     ]
-    }
-   ],
    "source": [
-    "from vespa.package import VespaCloud\n",
-    "\n",
-    "vespa_cloud = VespaCloud(\n",
-    "    \"vespa-team\", \n",
-    "    \"ms-marco\", \n",
-    "    \"/Users/tmartins/sample_application/tmartins.vespa-team.pem\", \n",
-    "    app_package\n",
-    ")\n",
-    "app = vespa_cloud.deploy('from-notebook', \"/Users/tmartins/sample_application\")"
+    "We then deploy the application to a particular instance (named `from-notebook` in this case) and specify a local folder in which to store required files, such as the certificates that secure data exchange between the client and the Vespa Cloud servers."
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "The app variable above will hold a `Vespa` instance that will be used to connect and interact with our text search application. We can see the deployment message returned by the Vespa engine:"
+    "**Note:** The first call to `vespa_cloud.deploy` takes around 15 minutes, as Vespa Cloud has to set up the environment. Subsequent calls are much faster, usually taking less than 10 seconds."
    ]
   },
   {
@@ -212,16 +154,17 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "app.__class__"
+    "app = vespa_cloud.deploy(\n",
+    "    instance='from-notebook', \n",
+    "    disk_folder=\"/Users/username/sample_application\"\n",
+    ")"
    ]
   },
   {
-   "cell_type": "code",
-   "execution_count": null,
+   "cell_type": "markdown",
    "metadata": {},
-   "outputs": [],
    "source": [
-    "app.deployment_message"
+    "The `app` variable above holds a `Vespa` instance that we use to connect to and interact with our text search application throughout this tutorial."
    ]
   },
   {
@@ -240,9 +183,20 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 6,
    "metadata": {},
-   "outputs": [],
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "(996, 3)"
+      ]
+     },
+     "execution_count": 6,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
    "source": [
     "from pandas import read_csv\n",
     "\n",
@@ -252,9 +206,67 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 7,
    "metadata": {},
-   "outputs": [],
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>id</th>\n",
+       "      <th>title</th>\n",
+       "      <th>body</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>D2185715</td>\n",
+       "      <td>What Is an Appropriate Gift for a Bris</td>\n",
+       "      <td>Hub Pages   Religion and Philosophy   Judaism...</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>D2819479</td>\n",
+       "      <td>lunge</td>\n",
+       "      <td>1lungenoun   ˈlənj  Popularity  Bottom 40  of...</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "         id                                    title  \\\n",
+       "0  D2185715  What Is an Appropriate Gift for a Bris    \n",
+       "1  D2819479                                    lunge   \n",
+       "\n",
+       "                                                body  \n",
+       "0   Hub Pages   Religion and Philosophy   Judaism...  \n",
+       "1   1lungenoun   ˈlənj  Popularity  Bottom 40  of...  "
+      ]
+     },
+     "execution_count": 7,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
    "source": [
     "docs.head(2)"
    ]
@@ -263,34 +275,16 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "To feed the data we need to specify the `schema` that we are sending data to. We name our schema `msmarco` in a previous section. Each data point needs to have a unique `data_id` associated with it, independent of having an id field or not. The `fields` should be a dict containing all the fields in the schema, which are `id`, `title` and `body` in our case. "
+    "To feed the data, we need to specify the `schema` that we are sending data to. We named our schema `msmarco` in a previous section. Each data point needs a unique `data_id`, whether or not the schema has an id field. `fields` should be a dict containing all the fields in the schema, which are `id`, `title` and `body` in our case."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "app.feed_data_point(\n",
-    "        schema = \"msmarco\", \n",
-    "        data_id = \"test\", \n",
-    "        fields = {\n",
-    "            \"id\": \"test\", \n",
-    "            \"title\": \"this is a test title\", \n",
-    "            \"body\": \"this is test body\"\n",
-    "        }\n",
-    "    )"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 8,
    "metadata": {},
    "outputs": [],
    "source": [
     "for idx, row in docs.iterrows():\n",
-    "    print(idx)\n",
     "    response = app.feed_data_point(\n",
     "        schema = \"msmarco\", \n",
     "        data_id = str(row[\"id\"]), \n",
@@ -306,31 +300,6 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Each call to the method `feed_data_point` sends a POST request to the appropriate Vespa endpoint and we can check the response of the requests if needed, such as the status code and the message returned."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "response.status_code"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "response.json()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
     "## Make a simple query"
    ]
   },
@@ -338,7 +307,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Once our application is fed we can start to use it by sending queries to it. The MS MARCO app expectes to receive questions as queries and the goal of the application is to return documents that are relevant to the questions made."
+    "Once our application is fed, we can start sending queries to it. The MS MARCO app expects questions as queries, and its goal is to return documents that are relevant to those questions."
    ]
   },
   {
@@ -350,7 +319,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 9,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -367,15 +336,6 @@
    ]
   },
   {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "results.hits"
-   ]
-  },
-  {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -384,9 +344,20 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 10,
    "metadata": {},
-   "outputs": [],
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "2"
+      ]
+     },
+     "execution_count": 10,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
    "source": [
     "len(results.hits)"
    ]
@@ -407,7 +378,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 11,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -429,7 +400,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "app = vespa_cloud.deploy('from-notebook', \"/Users/tmartins/sample_application\")"
+    "app = vespa_cloud.deploy('from-notebook', \"/Users/username/sample_application\")"
    ]
   },
   {
@@ -441,9 +412,20 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 15,
    "metadata": {},
-   "outputs": [],
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "2"
+      ]
+     },
+     "execution_count": 15,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
    "source": [
     "results = app.query(\n",
     "    query=\"Where is my text?\", \n",
@@ -479,7 +461,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 16,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -499,9 +481,25 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 17,
    "metadata": {},
-   "outputs": [],
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[{'query_id': '1',\n",
+       "  'query': 'what county is aspen co',\n",
+       "  'relevant_docs': [{'id': 'D1098819'}]},\n",
+       " {'query_id': '2',\n",
+       "  'query': 'where is aeropostale located',\n",
+       "  'relevant_docs': [{'id': 'D2268823'}]}]"
+      ]
+     },
+     "execution_count": 17,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
    "source": [
     "labelled_data[0:2]"
    ]
@@ -515,7 +513,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 18,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -527,7 +525,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 19,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -546,7 +544,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 20,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -564,7 +562,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 21,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -580,7 +578,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 22,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -603,9 +601,160 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 23,
    "metadata": {},
-   "outputs": [],
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>query_id</th>\n",
+       "      <th>match_ratio_retrieved_docs_default</th>\n",
+       "      <th>match_ratio_docs_available_default</th>\n",
+       "      <th>match_ratio_value_default</th>\n",
+       "      <th>recall_10_value_default</th>\n",
+       "      <th>reciprocal_rank_10_value_default</th>\n",
+       "      <th>match_ratio_retrieved_docs_bm25</th>\n",
+       "      <th>match_ratio_docs_available_bm25</th>\n",
+       "      <th>match_ratio_value_bm25</th>\n",
+       "      <th>recall_10_value_bm25</th>\n",
+       "      <th>reciprocal_rank_10_value_bm25</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>1</td>\n",
+       "      <td>914</td>\n",
+       "      <td>997</td>\n",
+       "      <td>0.916750</td>\n",
+       "      <td>1.0</td>\n",
+       "      <td>1.000</td>\n",
+       "      <td>914</td>\n",
+       "      <td>997</td>\n",
+       "      <td>0.916750</td>\n",
+       "      <td>1.0</td>\n",
+       "      <td>1.000000</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>2</td>\n",
+       "      <td>896</td>\n",
+       "      <td>997</td>\n",
+       "      <td>0.898696</td>\n",
+       "      <td>1.0</td>\n",
+       "      <td>0.125</td>\n",
+       "      <td>896</td>\n",
+       "      <td>997</td>\n",
+       "      <td>0.898696</td>\n",
+       "      <td>1.0</td>\n",
+       "      <td>1.000000</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>3</td>\n",
+       "      <td>970</td>\n",
+       "      <td>997</td>\n",
+       "      <td>0.972919</td>\n",
+       "      <td>1.0</td>\n",
+       "      <td>1.000</td>\n",
+       "      <td>970</td>\n",
+       "      <td>997</td>\n",
+       "      <td>0.972919</td>\n",
+       "      <td>1.0</td>\n",
+       "      <td>1.000000</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>4</td>\n",
+       "      <td>981</td>\n",
+       "      <td>997</td>\n",
+       "      <td>0.983952</td>\n",
+       "      <td>1.0</td>\n",
+       "      <td>1.000</td>\n",
+       "      <td>981</td>\n",
+       "      <td>997</td>\n",
+       "      <td>0.983952</td>\n",
+       "      <td>1.0</td>\n",
+       "      <td>1.000000</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>5</td>\n",
+       "      <td>748</td>\n",
+       "      <td>997</td>\n",
+       "      <td>0.750251</td>\n",
+       "      <td>1.0</td>\n",
+       "      <td>0.500</td>\n",
+       "      <td>748</td>\n",
+       "      <td>997</td>\n",
+       "      <td>0.750251</td>\n",
+       "      <td>1.0</td>\n",
+       "      <td>0.333333</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "  query_id  match_ratio_retrieved_docs_default  \\\n",
+       "0        1                                 914   \n",
+       "1        2                                 896   \n",
+       "2        3                                 970   \n",
+       "3        4                                 981   \n",
+       "4        5                                 748   \n",
+       "\n",
+       "   match_ratio_docs_available_default  match_ratio_value_default  \\\n",
+       "0                                 997                   0.916750   \n",
+       "1                                 997                   0.898696   \n",
+       "2                                 997                   0.972919   \n",
+       "3                                 997                   0.983952   \n",
+       "4                                 997                   0.750251   \n",
+       "\n",
+       "   recall_10_value_default  reciprocal_rank_10_value_default  \\\n",
+       "0                      1.0                             1.000   \n",
+       "1                      1.0                             0.125   \n",
+       "2                      1.0                             1.000   \n",
+       "3                      1.0                             1.000   \n",
+       "4                      1.0                             0.500   \n",
+       "\n",
+       "   match_ratio_retrieved_docs_bm25  match_ratio_docs_available_bm25  \\\n",
+       "0                              914                              997   \n",
+       "1                              896                              997   \n",
+       "2                              970                              997   \n",
+       "3                              981                              997   \n",
+       "4                              748                              997   \n",
+       "\n",
+       "   match_ratio_value_bm25  recall_10_value_bm25  reciprocal_rank_10_value_bm25  \n",
+       "0                0.916750                   1.0                       1.000000  \n",
+       "1                0.898696                   1.0                       1.000000  \n",
+       "2                0.972919                   1.0                       1.000000  \n",
+       "3                0.983952                   1.0                       1.000000  \n",
+       "4                0.750251                   1.0                       0.333333  "
+      ]
+     },
+     "execution_count": 23,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
    "source": [
     "from pandas import merge\n",
     "\n",
@@ -627,9 +776,60 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 24,
    "metadata": {},
-   "outputs": [],
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>match_ratio_value_default</th>\n",
+       "      <th>match_ratio_value_bm25</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>mean</th>\n",
+       "      <td>0.866650</td>\n",
+       "      <td>0.866650</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>std</th>\n",
+       "      <td>0.181307</td>\n",
+       "      <td>0.181307</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "      match_ratio_value_default  match_ratio_value_bm25\n",
+       "mean                   0.866650                0.866650\n",
+       "std                    0.181307                0.181307"
+      ]
+     },
+     "execution_count": 24,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
    "source": [
     "eval_comparison[[\"match_ratio_value_default\", \"match_ratio_value_bm25\"]].describe().loc[[\"mean\", \"std\"]]"
    ]
@@ -643,9 +843,60 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 25,
    "metadata": {},
-   "outputs": [],
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>recall_10_value_default</th>\n",
+       "      <th>recall_10_value_bm25</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>mean</th>\n",
+       "      <td>0.840000</td>\n",
+       "      <td>0.960000</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>std</th>\n",
+       "      <td>0.368453</td>\n",
+       "      <td>0.196946</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "      recall_10_value_default  recall_10_value_bm25\n",
+       "mean                 0.840000              0.960000\n",
+       "std                  0.368453              0.196946"
+      ]
+     },
+     "execution_count": 25,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
    "source": [
     "eval_comparison[[\"recall_10_value_default\", \"recall_10_value_bm25\"]].describe().loc[[\"mean\", \"std\"]]"
    ]
@@ -659,9 +910,60 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 26,
    "metadata": {},
-   "outputs": [],
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>reciprocal_rank_10_value_default</th>\n",
+       "      <th>reciprocal_rank_10_value_bm25</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>mean</th>\n",
+       "      <td>0.724750</td>\n",
+       "      <td>0.943333</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>std</th>\n",
+       "      <td>0.399118</td>\n",
+       "      <td>0.216103</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "      reciprocal_rank_10_value_default  reciprocal_rank_10_value_bm25\n",
+       "mean                          0.724750                       0.943333\n",
+       "std                           0.399118                       0.216103"
+      ]
+     },
+     "execution_count": 26,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
    "source": [
     "eval_comparison[[\"reciprocal_rank_10_value_default\", \"reciprocal_rank_10_value_bm25\"]].describe().loc[[\"mean\", \"std\"]]"
    ]
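The evaluation tables in the updated notebook report a match ratio, recall@10 and reciprocal rank@10 per query and rank profile. The sketch below shows what those columns are assumed to measure; it is a minimal stdlib illustration, not pyvespa's implementation, and the helper names `match_ratio`, `recall_at_k` and `reciprocal_rank_at_k` are hypothetical. It assumes the match ratio is retrieved documents divided by available documents (consistent with the output, e.g. 914 / 997 ≈ 0.916750), recall@k is the share of relevant documents found among the top k hits, and reciprocal rank@k is 1 divided by the position of the first relevant hit within the top k, or 0 if there is none.

```python
from typing import Iterable, Sequence


def match_ratio(retrieved_docs: int, docs_available: int) -> float:
    """Fraction of the corpus that matched the query (hypothetical helper)."""
    if docs_available == 0:
        return 0.0
    return retrieved_docs / docs_available


def recall_at_k(ranked_ids: Sequence[str], relevant_ids: Iterable[str], k: int = 10) -> float:
    """Share of the relevant documents that appear among the top-k hits."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    top_k = set(ranked_ids[:k])
    return len(relevant & top_k) / len(relevant)


def reciprocal_rank_at_k(ranked_ids: Sequence[str], relevant_ids: Iterable[str], k: int = 10) -> float:
    """1 / position of the first relevant hit within the top k, or 0.0 if none is found."""
    relevant = set(relevant_ids)
    for position, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant:
            return 1.0 / position
    return 0.0


# Hypothetical ranking: the only relevant document sits at position 8.
hits = [f"D{i}" for i in range(1, 11)]
print(match_ratio(914, 997))
print(recall_at_k(hits, ["D8"]))           # 1.0
print(reciprocal_rank_at_k(hits, ["D8"]))  # 0.125
```

A relevant document at position 8 gives a reciprocal rank of 1/8 = 0.125, the value shown for query 2 under the default profile, while recall@10 is still 1.0. This is why the two profiles can tie on recall for a query yet differ on reciprocal rank.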
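The comparison cells summarize each metric column with `describe().loc[["mean", "std"]]`, and the `std` that pandas `describe()` reports is the sample standard deviation (ddof=1). The stdlib sketch below computes the same two statistics, but only over the five reciprocal-rank@10 rows printed in the merged evaluation table. Its numbers therefore differ from the notebook's summary over the full labelled set. The `summarize` helper is hypothetical.

```python
from statistics import mean, stdev

# Reciprocal rank@10 for query_id 1-5, copied from the merged evaluation table.
reciprocal_rank_default = [1.0, 0.125, 1.0, 1.0, 0.5]
reciprocal_rank_bm25 = [1.0, 1.0, 1.0, 1.0, 0.333333]


def summarize(values):
    """Mean and sample standard deviation (n - 1 denominator), like pandas describe()."""
    return {"mean": mean(values), "std": stdev(values)}


for name, values in [("default", reciprocal_rank_default), ("bm25", reciprocal_rank_bm25)]:
    summary = summarize(values)
    print(f"{name}: mean={summary['mean']:.6f} std={summary['std']:.6f}")
```

Using `statistics.pstdev` instead would give the population standard deviation, which is smaller and would not match the `std` row pandas prints.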