From 8dad79af8b18c08e270382ce1b18a3956fa59626 Mon Sep 17 00:00:00 2001
From: Ty Dunn <ty@tydunn.com>
Date: Thu, 7 Sep 2023 11:19:38 -0700
Subject: adding support for Hugging Face Inference Endpoints (#460)

* stream complete sketch

* correct structure but issues

* refactor: :art: clean up hf_inference_api.py

* fix: :bug: quick fix in hf_inference_api.py

* feat: :memo: update documentation code for hf_inference_api

* hf docs

* now working

---------

Co-authored-by: Nate Sesti <sestinj@gmail.com>
---
 docs/docs/customization.md | 57 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

(limited to 'docs')

diff --git a/docs/docs/customization.md b/docs/docs/customization.md
index 5fc3eab5..fb7dc0c5 100644
--- a/docs/docs/customization.md
+++ b/docs/docs/customization.md
@@ -21,6 +21,7 @@ Open-Source Models (not local)
 
 - [TogetherLLM](#together) - Use any model from the [Together Models list](https://docs.together.ai/docs/models-inference) with your Together API key.
 - [ReplicateLLM](#replicate) - Use any open-source model from the [Replicate Streaming List](https://replicate.com/collections/streaming-language-models) with your Replicate API key.
+- [HuggingFaceInferenceAPI](#huggingface) - Use any open-source model from the [Hugging Face Inference API](https://huggingface.co/inference-api) with your Hugging Face token.
 
 ## Change the default LLM
 
@@ -206,6 +207,62 @@ config = ContinueConfig(
 
 If you don't specify the `model` parameter, it will default to `replicate/llama-2-70b-chat:58d078176e02c219e11eb4da5a02a7830a283b14cf8f94537af893ccff5ee781`.
 
+### Hugging Face
+
+The Hugging Face Inference API is a great option for newly released language models. To get set up: sign up for an account and add billing [here](https://huggingface.co/settings/billing), open the Inference Endpoints page [here](https://ui.endpoints.huggingface.co), click “New endpoint”, fill out the form (e.g. select a model like [WizardCoder-Python-34B-V1.0](https://huggingface.co/WizardLM/WizardCoder-Python-34B-V1.0)), and deploy your model by clicking “Create Endpoint”. Then change `~/.continue/config.py` to look like this:
+
+```python
+from continuedev.src.continuedev.core.models import Models
+from continuedev.src.continuedev.libs.llm.hf_inference_api import HuggingFaceInferenceAPI
+
+config = ContinueConfig(
+    ...
+    models=Models(
+        default=HuggingFaceInferenceAPI(
+            endpoint_url="<INFERENCE_API_ENDPOINT_URL>",
+            hf_token="<HUGGING_FACE_TOKEN>",
+        )
+    )
+)
+```
+
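+To sanity-check the endpoint outside of Continue, you can call it directly over HTTP. Below is a minimal sketch using `requests`, assuming a standard Hugging Face text-generation endpoint; the placeholders are the same values used in `config.py` above:
+
+```python
+import requests
+
+# Same placeholder values as in ~/.continue/config.py above
+API_URL = "<INFERENCE_API_ENDPOINT_URL>"
+headers = {"Authorization": "Bearer <HUGGING_FACE_TOKEN>"}
+
+# Text-generation endpoints accept a JSON body with an "inputs" field
+response = requests.post(API_URL, headers=headers, json={"inputs": "def fibonacci(n):"})
+print(response.json())
+```
+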
 ### Self-hosting an open-source model
 
 If you want to self-host on Colab, RunPod, HuggingFace, Haven, or another hosting provider you will need to wire up a new LLM class. It only needs to implement 3 primary methods: `stream_complete`, `complete`, and `stream_chat`, and you can see examples in `continuedev/src/continuedev/libs/llm`.
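+
+As a starting point, here is a rough sketch of such a class. The import path, field name, and method signatures below are assumptions based on the description above — check the examples in `continuedev/src/continuedev/libs/llm` for the real interface:
+
+```python
+from continuedev.src.continuedev.libs.llm import LLM  # base class location is an assumption
+
+class SelfHostedLLM(LLM):
+    # Illustrative placeholder: point this at your Colab/RunPod/etc. server
+    server_url: str = "http://localhost:8000"
+
+    async def complete(self, prompt: str, **kwargs) -> str:
+        # Return the whole completion for a prompt in a single response
+        ...
+
+    async def stream_complete(self, prompt: str, **kwargs):
+        # Yield chunks of the completion as the server produces them
+        yield ...
+
+    async def stream_chat(self, messages, **kwargs):
+        # Yield chat responses for a list of {role, content} messages
+        yield ...
+```
+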
-- 
cgit v1.2.3-70-g09d2