Commit
Update-multimodal-semantic-search-blog (#370)
sachitolani authored Feb 7, 2025
1 parent ac5b3f2 commit 092dfc5
Showing 2 changed files with 2 additions and 46 deletions.
45 changes: 1 addition & 44 deletions blog/en/i-built-a-deep-research-with-open-source-so-can-you.md
@@ -82,10 +82,8 @@ Research starts by defining a question. We take the question to be the user's

Here is an example reasoning trace as DeepSeek refines the question “How has the cast changed over time?”:

```bash
<think>
Alright, so I need to break down the question "How has the cast changed over time?" related to the evolution of The Simpsons. Let me think about what aspects are involved here.

First, I know that the cast has changed a lot, but I need to figure out the intermediate questions to approach this. The main question is about changes in the cast over time, so I should consider different areas that contribute to this change.
@@ -97,110 +95,69 @@ Also, some original voice actors have left, so I should include a sub-question a
The show has been popular for a long time, so recurring roles changing might be another aspect. Additionally, the role of the show in society might have influenced casting choices, so a sub-question about that.

Lastly, the overall impact on the cast's careers could be another angle. So, I should list these as sub-questions to cover all aspects.
</think>
```
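
For context, here is a minimal sketch of how this breakdown step could be wired up, assuming DeepSeek is reached through an OpenAI-compatible endpoint. The endpoint, model name, prompt, and parsing below are illustrative assumptions, not the post's exact implementation:

```python
from openai import OpenAI

# Illustrative client setup; swap in your own endpoint and API key
client = OpenAI(base_url="https://api.deepseek.com", api_key="<your key>")

question = "How has the cast changed over time?"
prompt = (
    "Break down the following research question into a short list of "
    f"sub-questions, one per line:\n\n{question}"
)

response = client.chat.completions.create(
    model="deepseek-reasoner",  # assumed model name
    messages=[{"role": "user", "content": prompt}],
)

# Drop any <think>...</think> reasoning trace, keeping only the final answer
answer = response.choices[0].message.content.split("</think>")[-1].strip()
sub_questions = [line.lstrip("-• ").strip() for line in answer.splitlines() if line.strip()]
```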

### Search
Next, we conduct a “literature review” of Wikipedia articles. For now, we read a single article and leave navigating links to a future iteration. We discovered during prototyping that link exploration can become very expensive if each link requires a call to the reasoning model. We parse the article and store its data in our vector database, Milvus, akin to taking notes.
Here is a code snippet showing how we store our Wikipedia page in Milvus using its LangChain integration:
```python
import wikipediaapi  # package: wikipedia-api
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_milvus import Milvus, Zilliz

# Assumes `page_title` and `embeddings` (the embedding model) are defined earlier
wiki_wiki = wikipediaapi.Wikipedia(user_agent='MilvusDeepResearchBot (<insert your email>)', language='en')
page_py = wiki_wiki.page(page_title)

# Split the article into overlapping chunks before embedding
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
docs = text_splitter.create_documents([page_py.text])

# Embed the chunks and store them in Milvus
vectorstore = Milvus.from_documents(  # or Zilliz.from_documents
    documents=docs,
    embedding=embeddings,
    connection_args={
        "uri": "./milvus_demo.db",
    },
    drop_old=True,
    index_params={
        "metric_type": "COSINE",
        "index_type": "FLAT",
        "params": {},
    },
)
```
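
The Analyze step below also relies on a `retriever` and a `format_docs` helper built on top of this vector store. They are not shown above, so here is a plausible sketch; the top-k value and the helper body are assumptions:

```python
# Expose the Milvus vector store as a LangChain retriever (k is an assumed value)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

def format_docs(docs):
    # Join the retrieved chunks into one context string for the prompt
    return "\n\n".join(doc.page_content for doc in docs)
```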

### Analyze
The agent returns to its questions and answers them based on the relevant information in the document. We will leave a multi-step analysis/reflection workflow for future work, as well as any critical thinking on the credibility and bias of our sources.
Here is a code snippet illustrating how we construct a RAG chain with LangChain and answer our sub-questions separately:
```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from tqdm import tqdm

# Define the RAG chain for response generation
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Prompt the RAG chain for each question
answers = {}
total = len(leaves(breakdown))
pbar = tqdm(total=total)

for k, v in breakdown.items():
    if v == []:
        # Leaf question: answer it directly, dropping the <think> reasoning trace
        print(k)
        answers[k] = rag_chain.invoke(k).split('</think>')[-1].strip()
        pbar.update(1)
    else:
        # Otherwise answer each of its sub-questions
        for q in v:
            print(q)
            answers[q] = rag_chain.invoke(q).split('</think>')[-1].strip()
            pbar.update(1)
```
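
The snippet assumes `breakdown` maps each question to its list of sub-questions (an empty list when the question is already a leaf) and that `leaves` collects the questions that will actually be answered. A plausible helper, not taken from the post, might look like:

```python
def leaves(breakdown):
    # Questions with no sub-questions, plus every sub-question, are the leaves
    return [k for k, v in breakdown.items() if v == []] + \
           [q for subs in breakdown.values() for q in subs]
```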

### Synthesize
After the agent has performed its research, it creates a structured outline, a skeleton of its findings, to summarize in a report. It then completes each section, filling it in with a section title and the corresponding content. We leave a more sophisticated workflow with reflection, reordering, and rewriting for a future iteration. This part of the agent involves planning, tool usage, and memory.
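
As a rough illustration, here is a minimal sketch of what this outline-then-fill loop might look like. The prompts, helper names, and reuse of the `answers` dict from the previous step are assumptions for illustration, not the post's exact code:

```python
# Ask the LLM for a skeleton outline, given the collected question/answer pairs
research_notes = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in answers.items())

outline_prompt = (
    "Given the following research notes, propose a report outline as a list "
    "of section titles, one per line:\n\n" + research_notes
)
outline = llm.invoke(outline_prompt).content.split("</think>")[-1].strip()
section_titles = [line.lstrip("-• ").strip() for line in outline.splitlines() if line.strip()]

# Fill in each section of the skeleton using the same research notes as context
report_sections = []
for title in section_titles:
    section_prompt = (
        f"Write the report section titled '{title}' using only these notes:\n\n"
        + research_notes
    )
    body = llm.invoke(section_prompt).content.split("</think>")[-1].strip()
    report_sections.append(f"## {title}\n\n{body}")

report = "\n\n".join(report_sections)
```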
3 changes: 1 addition & 2 deletions blog/en/multimodal-semantic-search-with-images-and-text.md
@@ -14,9 +14,8 @@ canonicalUrl: https://milvus.io/blog/multimodal-semantic-search-with-images-and-



![](https://assets.zilliz.com/Multimodal_Semantic_Search_with_Images_and_Text_1_3da9b83015.png)

<iframe width="560" height="315" src="https://www.youtube.com/embed/bxE0_QYX_sU?si=PkOHFcZto-rda1Fv" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
<iframe width="100%" height="315" src="https://www.youtube.com/embed/bxE0_QYX_sU?si=PkOHFcZto-rda1Fv" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>

As humans, we interpret the world through our senses. We hear sounds, we see images, video, and text, often layered on top of each other. We understand the world through these multiple modalities and the relationship between them. For artificial intelligence to truly match or exceed human capabilities, it must develop this same ability to understand the world through multiple lenses simultaneously.

