Comments for ArangoDB https://arangodb.com/ The database for graph and beyond Fri, 07 Feb 2025 05:18:05 +0000 Comment on C++ Memory Model: Migrating from x86 to ARM by Manuel Pöter https://arangodb.com/2021/02/cpp-memory-model-migrating-from-x86-to-arm/#comment-7750 Thu, 04 Jul 2024 12:29:32 +0000 https://www.arangodb.com/?p=36068#comment-7750 In reply to thomas.

Yes, all modifications to a particular atomic object occur in some particular total order, called the modification order. So relaxed read-modify-write operations are still totally ordered with respect to the modified object. Unfortunately, that part had to be omitted to keep the article short enough, but you can find more information in the referenced “Memory Models for C/C++ Programmers” white paper: https://arxiv.org/abs/1803.04432
It is important to note that the memory order only orders _surrounding_ operations. So in your example with the fetch_add, it is indeed guaranteed that no two threads see the same value.

]]>
Comment on C++ Memory Model: Migrating from x86 to ARM by thomas https://arangodb.com/2021/02/cpp-memory-model-migrating-from-x86-to-arm/#comment-7597 Mon, 01 Jul 2024 18:55:55 +0000 https://www.arangodb.com/?p=36068#comment-7597 Please explain this statement more clearly: “However, of course, relaxed atomic operations are still fully atomic, so a relaxed fetch_add is still guaranteed to increase monotonically without any lost or duplicate updates.”

So this means that if multiple threads execute counter.fetch_add(1, std::memory_order_relaxed), they will still execute it in some order, one after the other? They cannot execute it in parallel and see the same value at the same time?

]]>
Comment on C++ Memory Model: Migrating from x86 to ARM by Christian Convey https://arangodb.com/2021/02/cpp-memory-model-migrating-from-x86-to-arm/#comment-3051 Tue, 27 Feb 2024 17:46:09 +0000 https://www.arangodb.com/?p=36068#comment-3051 Hey Manuel, this is an AMAZINGLY helpful primer on the topic. Thanks so much for the effort it must have taken to produce such a well-polished article.

]]>
Comment on Fixing a Memory Leak in Go: Understanding time.After by Laura Cope https://arangodb.com/2020/09/a-story-of-a-memory-leak-in-go-how-to-properly-use-time-after/#comment-31 Fri, 31 Mar 2023 13:23:50 +0000 https://arangodb.com/?p=35186#comment-31 In reply to Todd.

Yes, the line below:
go func() { <-timeout }() // prevent leak
would also work, but it creates a separate goroutine whose lifetime depends on the timeout value. If the timeout is long, that goroutine stays alive for the whole duration, which is not good for performance. It is better to stop the timer as soon as we know it is no longer required.

]]>
Comment on Fixing a Memory Leak in Go: Understanding time.After by Todd https://arangodb.com/2020/09/a-story-of-a-memory-leak-in-go-how-to-properly-use-time-after/#comment-30 Thu, 16 Feb 2023 20:47:05 +0000 https://arangodb.com/?p=35186#comment-30 couldn’t you also do

```
timeout := time.After(time.Second)
select {
case <-timeout:
	// do something after 1 second.
case <-ctx.Done():
	go func() { <-timeout }() // prevent leak
	// do something when context is finished.
}
```

]]>
Comment on Introducing the new ArangoDB Datasource for Apache Spark by Michele Rastelli https://arangodb.com/2022/03/introducing-the-new-arangodb-datasource-for-apache-spark/#comment-11 Thu, 21 Jul 2022 13:13:33 +0000 https://www.arangodb.com/?p=39584#comment-11 In reply to quanns.

You can find a working PySpark demo at: https://github.com/arangodb/arangodb-spark-datasource/tree/main/demo#pythonpyspark-demo

]]>
Comment on Introducing the new ArangoDB Datasource for Apache Spark by quanns https://arangodb.com/2022/03/introducing-the-new-arangodb-datasource-for-apache-spark/#comment-10 Mon, 20 Jun 2022 06:36:25 +0000 https://www.arangodb.com/?p=39584#comment-10 Hi Rastelli,
Does this driver support PySpark? I tried to use it with PySpark and it doesn’t work. I cannot find any documentation for the integration between the arangodb-spark-datasource library and PySpark.

]]>
Comment on Word Embeddings in ArangoDB by Alex Geenen https://arangodb.com/2021/06/word-embeddings-in-arangodb/#comment-19 Tue, 06 Jul 2021 13:44:38 +0000 https://www.arangodb.com/?p=37375#comment-19 In reply to Fabio Mencoboni.

Hi Fabio,

If I understand correctly, this approach is using the DistilBERT model in Python to calculate embeddings for documents, which are then stored in ArangoDB.

Yes that’s correct!

I have seen elsewhere the use of ArangoSearch, which I think does tokenization and embedding directly in the database. Do I understand the difference between these approaches correctly?

Yes, ArangoSearch allows you to perform tokenization and full-text search directly in the database. At this point, word embeddings aren’t directly supported in the database, which is the gap this tutorial fills. ArangoSearch does support ranking models such as BM25 and TF-IDF for scoring search results. Please see here if you want to learn more about them.

The query uses the expression below to calculate the dot product of the query embedding with the document embedding. This implies a slower single-threaded approach, though if ArangoDB calculates this value for multiple documents concurrently under the hood, it would still get the benefit of multi-core processors. Any thoughts/comments on performance?

Great question! The answer is that it depends. If you’re querying a single server, it will use a sequential scan (so a single thread). If you’re querying a collection on a cluster, and the collection is sharded across different servers, then there will be concurrency at a database server level, but within those server processes it will also be scanned sequentially.

]]>
Comment on Word Embeddings in ArangoDB by Fabio Mencoboni https://arangodb.com/2021/06/word-embeddings-in-arangodb/#comment-18 Fri, 02 Jul 2021 12:24:49 +0000 https://www.arangodb.com/?p=37375#comment-18 Very cool tutorial, thanks for sharing. I am really excited about using ArangoDB with semantic queries, and this is a great overview. A couple of questions:
* If I understand correctly, this approach is using the DistilBERT model in Python to calculate embeddings for documents, which are then stored in ArangoDB.
* I have seen elsewhere the use of ArangoSearch, which I think does tokenization and embedding directly in the database. Do I understand the difference between these approaches correctly?
* The query uses the expression below to calculate the dot product of the query embedding with the document embedding. This implies a slower single-threaded approach, though if ArangoDB calculates this value for multiple documents concurrently under the hood, it would still get the benefit of multi-core processors. Any thoughts/comments on performance?
LET numerator = (SUM(
  FOR i IN RANGE(0, 767)
    RETURN TO_NUMBER(NTH(descr_emb, i)) * TO_NUMBER(NTH(v.word_emb, i))
))

]]>
Comment on Introducing Developer Deployments on ArangoDB ArangoGraph by Ewout Prangsma https://arangodb.com/2021/06/introducing-developer-deployments-on-arangodb-oasis/#comment-17 Wed, 23 Jun 2021 09:10:41 +0000 https://www.arangodb.com/?p=37257#comment-17 In reply to Rishav Sharan.

Hi Rishav,

Thanks for your comment.
We’ve chosen to offer a Free-to-try deployment with enough resources to let you really try out all of the features of ArangoDB and Oasis; it is free for 14 days.

The Developer deployments are aimed at individual developers.
If you need a fully free option, you can run the ArangoDB database in a Docker container on your laptop or on any small VPS.

Ewout

]]>