Milvus DB Introduction

Vectors of numbers, generated by an ML model, can represent complex objects such as words, images, videos and audio.
This high-dimensional vector data, containing multiple features, is essential to machine learning, natural language processing (NLP) and other AI tasks.
Some example uses of vector data include:

Text: Chatbots need to understand natural language. They do this by relying on vectors that represent words, paragraphs and entire documents.
Images: Image pixels can be described by numerical data and combined to make up a high-dimensional vector for that image.
Speech or audio: Like images, sound waves can also be broken into numerical data and represented as vectors, enabling AI applications such as voice recognition.
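
To make the image case concrete, here is a minimal sketch of turning pixel values into a single vector. The 2x2 image and the function name image_to_vector are made up for illustration. In practice an ML model would produce a learned embedding rather than a raw flattened pixel list; this only shows the "pixels become numbers become a vector" idea.

```python
# A tiny 2x2 RGB "image": each pixel is a (red, green, blue) triple in 0..255.
# The pixel values are invented purely for illustration.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]


def image_to_vector(pixels):
    """Flatten rows of RGB pixels into one list of floats scaled to 0..1."""
    return [channel / 255 for row in pixels for pixel in row for channel in pixel]


vector = image_to_vector(image)
print(len(vector))  # 2 x 2 pixels x 3 channels = 12 dimensions
```

Even this toy image yields 12 dimensions; real images and learned embeddings commonly have hundreds or thousands.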

Milvus was created in 2019 with a singular goal: store, index, and manage massive embedding vectors generated by deep neural networks and other machine learning (ML) models.

As a database specifically designed to handle queries over input vectors, it is capable of indexing vectors at trillion scale.
Unlike existing relational databases, which mainly deal with structured data following a pre-defined pattern, Milvus is designed from the ground up to handle embedding vectors converted from unstructured data.

As the Internet grew and evolved, unstructured data became increasingly common, including emails, papers, IoT sensor data, Facebook photos, protein structures, and much more.
For computers to understand and process unstructured data, it is converted into vectors using embedding techniques.
Milvus stores and indexes these vectors, and can analyze the correlation between two vectors by calculating their similarity distance.
If two embedding vectors are very similar, the original data sources are likely to be similar as well.
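
As an illustration of what "similarity distance" can mean, the sketch below computes two common measures, Euclidean distance and cosine similarity, in plain Python. The three-dimensional vectors and their labels (cat, kitten, car) are invented for the example; real embeddings come from a model and have far more dimensions. This is not Milvus code, only the underlying arithmetic.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b: closer to 1 means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, hand-written for illustration only.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.0]
car = [0.0, 0.1, 0.9]

print(cosine_similarity(cat, kitten) > cosine_similarity(cat, car))    # True
print(euclidean_distance(cat, kitten) < euclidean_distance(cat, car))  # True
```

Under both measures the "cat" vector sits closer to "kitten" than to "car", which is the sense in which similar vectors suggest similar source data.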

Vector databases are a popular way to power enterprise AI-based applications because they can deliver many benefits:

Vector databases use various indexing techniques to enable faster searching.
Vector indexing and distance-calculating algorithms such as nearest neighbour search can help optimize performance when searching for relevant results across large datasets with millions, if not billions, of data points.
One consideration is that vector databases provide approximate results.
Applications requiring greater accuracy might need to use a different kind of database at the cost of a slower processing speed.
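
The speed versus accuracy trade-off can be sketched with a toy example. The code below compares an exact scan of every vector with a simplified partition-based search that only looks inside the bucket nearest to the query. The data, the hand-picked centroids and all function names are assumptions made for illustration; real vector indexes are far more sophisticated, and this is not how any particular database implements them.

```python
import math


def exact_search(vectors, query, k):
    """Scan every vector: always correct, but cost grows with dataset size."""
    ids = range(len(vectors))
    return sorted(ids, key=lambda i: math.dist(vectors[i], query))[:k]


def build_partitions(vectors, centroids):
    """Assign each vector id to the bucket of its nearest centroid."""
    buckets = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: math.dist(v, centroids[c]))
        buckets[nearest].append(i)
    return buckets


def approximate_search(vectors, centroids, buckets, query, k):
    """Scan only the bucket whose centroid is closest to the query."""
    nearest = min(range(len(centroids)), key=lambda c: math.dist(query, centroids[c]))
    return sorted(buckets[nearest], key=lambda i: math.dist(vectors[i], query))[:k]


# Toy data and hand-picked centroids (illustrative only).
vectors = [(0.0, 0.0), (1.0, 1.0), (3.0, 3.0), (5.2, 5.2), (9.0, 9.0), (10.0, 10.0)]
centroids = [(0.5, 0.5), (9.5, 9.5)]
buckets = build_partitions(vectors, centroids)
query = (4.8, 4.8)

print(exact_search(vectors, query, 1))                            # [3]
print(approximate_search(vectors, centroids, buckets, query, 1))  # [2]
```

The approximate search inspects half as many vectors but misses the true nearest neighbour, because that vector falls just across the partition boundary. Production indexes reduce such misses by probing several partitions, at some cost in speed.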

Vector databases can store and manage massive amounts of unstructured data by scaling horizontally with additional nodes, maintaining performance as query demands and data volumes increase.

Because they enable faster data retrieval, vector databases speed the training of foundation models.

Vector databases typically provide built-in features to easily update and insert new unstructured data.

Vector databases are built to handle the added complexity of using images, videos or other multidimensional data.
Given the multiple use cases ranging from semantic search to conversational AI applications, vector databases can be customized to meet business and AI requirements.
Organizations can start with a general-purpose model such as IBM® Granite™ series models, Meta’s Llama-2 or Google’s Flan models, and then provide their own data in a vector database to enhance the output of the models and AI applications.

Key advantages of Milvus include:

High performance when conducting vector searches on massive datasets.
A developer-first community that offers multi-language support and a toolchain.
Cloud scalability and high reliability even in the event of a disruption.
Hybrid search, achieved by pairing scalar filtering with vector similarity search.
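
The idea behind hybrid search can be sketched in plain Python: narrow the candidates with an ordinary scalar condition, then rank what remains by vector distance. The product records, field names and the hybrid_search function are hypothetical and written only for this example. This is not the Milvus API, and a real engine may combine filtering and vector search in a different order internally.

```python
import math

# Hypothetical records: scalar fields plus a two-dimensional embedding.
products = [
    {"id": 1, "category": "shoes", "price": 50, "embedding": [0.9, 0.1]},
    {"id": 2, "category": "shoes", "price": 200, "embedding": [0.8, 0.2]},
    {"id": 3, "category": "hats", "price": 30, "embedding": [0.95, 0.05]},
    {"id": 4, "category": "shoes", "price": 60, "embedding": [0.1, 0.9]},
]


def hybrid_search(records, query, predicate, k):
    """Keep records matching the scalar predicate, then rank them by distance."""
    candidates = [r for r in records if predicate(r)]
    candidates.sort(key=lambda r: math.dist(r["embedding"], query))
    return [r["id"] for r in candidates[:k]]


# "Shoes under 100" that are most similar to the query vector.
results = hybrid_search(
    products,
    query=[1.0, 0.0],
    predicate=lambda r: r["category"] == "shoes" and r["price"] < 100,
    k=2,
)
print(results)  # [1, 4]
```

Without the filter, the closest record would be the hat (id 3); the scalar condition removes it before similarity ranking is applied.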

Applications that Milvus can support include:

Image similarity search: Make images searchable and instantaneously return the most similar images from a massive database.
Video similarity search: By converting key frames into vectors and then feeding the results into Milvus, billions of videos can be searched and recommended in near real-time.
Audio similarity search: Quickly query massive volumes of audio data such as speech, music, and sound effects, and surface similar sounds.
Recommender system: Recommend information or products based on user behaviours and needs.
Question answering system: Interactive digital QA chatbot that automatically answers user questions.
DNA sequence classification: Accurately classify a gene in milliseconds by comparing similar DNA sequences.
Text search engine: Help users find the information they are looking for by comparing keywords against a database of texts.

Praudyog