Commit
Update-multimodal-semantic-search-blog (#370)
sachitolani authored Feb 7, 2025
1 parent ac5b3f2 commit 092dfc5
Showing 2 changed files with 2 additions and 46 deletions.
45 changes: 1 addition & 44 deletions blog/en/i-built-a-deep-research-with-open-source-so-can-you.md
@@ -82,10 +82,8 @@ Research starts by defining a question. We take the question to be the user's

Here is an example reasoning trace as DeepSeek refines the question “How has the cast changed over time?”:

```bash
<think>
Alright, so I need to break down the question "How has the cast changed over time?" related to the evolution of The Simpsons. Let me think about what aspects are involved here.

First, I know that the cast has changed a lot, but I need to figure out the intermediate questions to approach this. The main question is about changes in the cast over time, so I should consider different areas that contribute to this change.
@@ -97,110 +95,69 @@ Also, some original voice actors have left, so I should include a sub-question a
The show has been popular for a long time, so recurring roles changing might be another aspect. Additionally, the role of the show in society might have influenced casting choices, so a sub-question about that.

Lastly, the overall impact on the cast's careers could be another angle. So, I should list these as sub-questions to cover all aspects.
</think>
```
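
For context, here is a minimal sketch of how this breakdown step could be wired up, assuming DeepSeek is reached through an OpenAI-compatible endpoint. The endpoint, model name, prompt, and parsing below are illustrative assumptions, not the post's exact implementation:

```python
from openai import OpenAI

# Illustrative client setup; swap in your own endpoint and API key
client = OpenAI(base_url="https://api.deepseek.com", api_key="<your key>")

question = "How has the cast changed over time?"
prompt = (
    "Break down the following research question into a short list of "
    f"sub-questions, one per line:\n\n{question}"
)

response = client.chat.completions.create(
    model="deepseek-reasoner",  # assumed model name
    messages=[{"role": "user", "content": prompt}],
)

# Drop any <think>...</think> reasoning trace, keeping only the final answer
answer = response.choices[0].message.content.split("</think>")[-1].strip()
sub_questions = [line.lstrip("-• ").strip() for line in answer.splitlines() if line.strip()]
```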

### Search
Next, we conduct a “literature review” of Wikipedia articles. For now, we read a single article and leave navigating links to a future iteration. We discovered during prototyping that link exploration can become very expensive if each link requires a call to the reasoning model. We parse the article and store its data in our vector database, Milvus, akin to taking notes.
Here is a code snippet showing how we store our Wikipedia page in Milvus using its LangChain integration:
```python
import wikipediaapi  # package: wikipedia-api
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_milvus import Milvus, Zilliz

# Assumes `page_title` and `embeddings` (the embedding model) are defined earlier
wiki_wiki = wikipediaapi.Wikipedia(user_agent='MilvusDeepResearchBot (<insert your email>)', language='en')
page_py = wiki_wiki.page(page_title)

# Split the article into overlapping chunks before embedding
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
docs = text_splitter.create_documents([page_py.text])

# Embed the chunks and store them in Milvus
vectorstore = Milvus.from_documents(  # or Zilliz.from_documents
    documents=docs,
    embedding=embeddings,
    connection_args={
        "uri": "./milvus_demo.db",
    },
    drop_old=True,
    index_params={
        "metric_type": "COSINE",
        "index_type": "FLAT",
        "params": {},
    },
)
```
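
The Analyze step below also relies on a `retriever` and a `format_docs` helper built on top of this vector store. They are not shown above, so here is a plausible sketch; the top-k value and the helper body are assumptions:

```python
# Expose the Milvus vector store as a LangChain retriever (k is an assumed value)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

def format_docs(docs):
    # Join the retrieved chunks into one context string for the prompt
    return "\n\n".join(doc.page_content for doc in docs)
```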

### Analyze
The agent returns to its questions and answers them based on the relevant information in the document. We will leave a multi-step analysis/reflection workflow for future work, as well as any critical thinking on the credibility and bias of our sources.
Here is a code snippet illustrating how we construct a RAG chain with LangChain and answer our sub-questions separately:
```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from tqdm import tqdm

# Define the RAG chain for response generation
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Prompt the RAG chain for each question
answers = {}
total = len(leaves(breakdown))
pbar = tqdm(total=total)

for k, v in breakdown.items():
    if v == []:
        # Leaf question: answer it directly, dropping the <think> reasoning trace
        print(k)
        answers[k] = rag_chain.invoke(k).split('</think>')[-1].strip()
        pbar.update(1)
    else:
        # Otherwise answer each of its sub-questions
        for q in v:
            print(q)
            answers[q] = rag_chain.invoke(q).split('</think>')[-1].strip()
            pbar.update(1)
```
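
The snippet assumes `breakdown` maps each question to its list of sub-questions (an empty list when the question is already a leaf) and that `leaves` collects the questions that will actually be answered. A plausible helper, not taken from the post, might look like:

```python
def leaves(breakdown):
    # Questions with no sub-questions, plus every sub-question, are the leaves
    return [k for k, v in breakdown.items() if v == []] + \
           [q for subs in breakdown.values() for q in subs]
```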

### Synthesize
After the agent has performed its research, it creates a structured outline, a skeleton of its findings, to summarize in a report. It then completes each section, filling it in with a section title and the corresponding content. We leave a more sophisticated workflow with reflection, reordering, and rewriting for a future iteration. This part of the agent involves planning, tool usage, and memory.
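
As a rough illustration, here is a minimal sketch of what this outline-then-fill loop might look like. The prompts, helper names, and reuse of the `answers` dict from the previous step are assumptions for illustration, not the post's exact code:

```python
# Ask the LLM for a skeleton outline, given the collected question/answer pairs
research_notes = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in answers.items())

outline_prompt = (
    "Given the following research notes, propose a report outline as a list "
    "of section titles, one per line:\n\n" + research_notes
)
outline = llm.invoke(outline_prompt).content.split("</think>")[-1].strip()
section_titles = [line.lstrip("-• ").strip() for line in outline.splitlines() if line.strip()]

# Fill in each section of the skeleton using the same research notes as context
report_sections = []
for title in section_titles:
    section_prompt = (
        f"Write the report section titled '{title}' using only these notes:\n\n"
        + research_notes
    )
    body = llm.invoke(section_prompt).content.split("</think>")[-1].strip()
    report_sections.append(f"## {title}\n\n{body}")

report = "\n\n".join(report_sections)
```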
3 changes: 1 addition & 2 deletions blog/en/multimodal-semantic-search-with-images-and-text.md
@@ -14,9 +14,8 @@ canonicalUrl: https://milvus.io/blog/multimodal-semantic-search-with-images-and-



![](https://assets.zilliz.com/Multimodal_Semantic_Search_with_Images_and_Text_1_3da9b83015.png)

<iframe width="560" height="315" src="https://www.youtube.com/embed/bxE0_QYX_sU?si=PkOHFcZto-rda1Fv" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
<iframe width="100%" height="315" src="https://www.youtube.com/embed/bxE0_QYX_sU?si=PkOHFcZto-rda1Fv" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>

As humans, we interpret the world through our senses. We hear sounds, we see images, video, and text, often layered on top of each other. We understand the world through these multiple modalities and the relationship between them. For artificial intelligence to truly match or exceed human capabilities, it must develop this same ability to understand the world through multiple lenses simultaneously.

