Comments on: AvocadoDB’s Design Objectives | ArangoDB Blog 2012

By: CoDEmanX

CoDEmanX — Wed, 04 Feb 2015 18:36:00 +0000

“[data shaping] processes run transparently behind the scenes for the developer’s eye.” – so is it behind the scenes, or transparent? Seems mutually exclusive.

After reading a blog post by Jan about shapes, it looks like there is a shape created for every unique document structure – is that right? What about nested objects in documents? Do they result in new shapes too, or can shapes also be nested? Is there a performance boost for sub-sub-(…)-attribute access if it’s a single shape, or just storage space savings?

By: paul_eg_carter

paul_eg_carter — Sat, 23 Feb 2013 21:53:00 +0000

I’m really liking what you’re aiming at and will try some stress tests, when I get some spare time.
I’ve got an application with 50 million or so document items that needs some subset views, e.g. of documents in progress. I would like to use null-suppressed indexes, maintained automatically by the database, but am wondering how long the in-memory index builder would take to retrieve the values. The extra start-up time might be too much for our impatient users.
An alternative would be graph edges, I suppose. There would be a small performance cost, but nothing too much. That cross-referencing would need to be maintained by application logic, in an action perhaps.
I’m also wondering whether writing data-entry-form code against MVCC alone may be significantly more complex than if the application had some node lock facility. The lock data could easily live in memory; after a server crash, any row-based locks must be discarded anyway, so I always wondered why some databases put them in the rows and didn’t just hold them in memory. Also, multi-master clustering gets a speed boost if the locks are in memory. I concede there’s quite a lot of complexity in lock negotiation in a cluster and guess that’s why I didn’t see it in the project.
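To make the null-suppressed index idea from this comment concrete, here is a minimal Python sketch. It assumes "null-suppressed" means that documents whose indexed attribute is missing or null get no index entry at all, so the index only covers the in-progress subset. The attribute name `in_progress_by` and the function `build_sparse_index` are hypothetical, and this is not ArangoDB's index implementation.

```python
def build_sparse_index(docs, key):
    """Map each non-null value of `key` to the ids of the documents holding it."""
    index = {}
    for doc_id, doc in docs.items():
        value = doc.get(key)
        if value is None:
            # Attribute missing or null: suppressed, no index entry is created.
            continue
        index.setdefault(value, []).append(doc_id)
    return index


# Hypothetical documents; only two of them are currently in progress.
docs = {
    1: {"title": "Invoice", "in_progress_by": "paul"},
    2: {"title": "Report", "in_progress_by": None},
    3: {"title": "Memo"},
    4: {"title": "Contract", "in_progress_by": "anna"},
}

index = build_sparse_index(docs, "in_progress_by")
in_progress_ids = sorted(i for ids in index.values() for i in ids)
```

In this toy model the index size grows with the number of in-progress documents, but the build still scans every document once, which is the start-up cost the comment is worried about.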

By: martin Schönert

martin Schönert — Wed, 06 Feb 2013 09:51:00 +0000

ArangoDB deals with shapes in a fully automatic way.

So if you modify existing documents and add additional fields, ArangoDB will automatically detect that they have a different shape and use that new shape.

Note that a collection may contain documents of different shapes. That is indeed the normal case. The assumption is that the number of shapes for a collection is much smaller than the number of documents in that collection (though one can of course construct pathological cases where the number of shapes = number of documents = 2 to the power of the number of attributes ;-).
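As a rough illustration of the idea, the following Python sketch treats a shape as nothing more than the attribute names and value types of a document. This is a toy model, not ArangoDB's internal representation: the function `shape_of` is hypothetical, and folding nested objects into the parent's shape is an assumption made only for this example.

```python
from itertools import combinations


def shape_of(value):
    """Describe a value's structure (attribute names and types), ignoring the data."""
    if isinstance(value, dict):
        pairs = ((name, shape_of(v)) for name, v in value.items())
        return tuple(sorted(pairs, key=lambda pair: pair[0]))
    return type(value).__name__


# Normal case: many documents, few shapes.
docs = [
    {"name": "a", "age": 1},
    {"name": "b", "age": 2},
    {"name": "c", "age": 3, "email": "c@example.org"},
]
shapes = {shape_of(d) for d in docs}

# Pathological case: one document for every subset of n optional attributes.
attrs = ["x", "y", "z"]
pathological = [
    {name: 0 for name in subset}
    for r in range(len(attrs) + 1)
    for subset in combinations(attrs, r)
]
pathological_shapes = {shape_of(d) for d in pathological}
```

Here the three ordinary documents share two shapes, while the constructed collection has one shape per document, 2 to the power of 3 in total.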

And an index will index all documents in a collection (if they have the index key attribute) independent of their shape (i.e. independent of the other attributes they may or may not contain). It follows that you do not need to rebuild an index when you modify documents in such a way that they have a new shape.
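A small Python sketch of why no rebuild is needed, under the assumption that an index only ever looks at its own key attribute. The helpers `build_index` and `insert` are hypothetical and only model the behaviour described here; they are not ArangoDB code.

```python
from collections import defaultdict


def build_index(docs, key):
    """Index every document that has `key`, whatever other attributes it carries."""
    index = defaultdict(list)
    for position, doc in enumerate(docs):
        if key in doc:
            index[doc[key]].append(position)
    return index


def insert(docs, index, key, doc):
    """Append a document and add one index entry; existing entries are untouched."""
    docs.append(doc)
    if key in doc:
        index[doc[key]].append(len(docs) - 1)


docs = [
    {"name": "alice", "city": "Cologne"},
    {"name": "carol"},  # no "city" attribute, so not indexed
]
city_index = build_index(docs, "city")

# A document with an additional field has a new shape, but is indexed the same way.
insert(docs, city_index, "city", {"name": "bob", "city": "Cologne", "age": 30})
```

After the insert, both Cologne documents are found through the same index entry even though they differ in shape.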

You cannot directly access the shape of a document. And so you cannot directly retrieve all documents with a given shape. But remember that the shape reflects the existence and types of attributes. And you can access that. So you can retrieve all documents that have a given attribute. In that way you can indirectly retrieve documents of a certain shape.
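The indirect approach can be sketched in Python as a filter on attribute existence and, optionally, type. The function `with_attribute` and the sample attribute names are hypothetical; in a real deployment this would be a database query rather than an in-memory list comprehension.

```python
def with_attribute(docs, name, expected_type=None):
    """Return the documents that have attribute `name`, optionally of a given type."""
    return [
        doc
        for doc in docs
        if name in doc
        and (expected_type is None or isinstance(doc[name], expected_type))
    ]


docs = [
    {"name": "a", "age": 1},                             # old shape A
    {"name": "b", "age": 2, "email": "b@example.org"},   # new shape A'
    {"name": "c", "age": 3},                             # old shape A
]

# Documents of either shape share the "name" attribute.
both = with_attribute(docs, "name", str)
# Only documents with the additional field match here.
new_only = with_attribute(docs, "email")
```

Selecting on the attribute that distinguishes the two shapes gives the same result as selecting on the shape itself would.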

Hope this answers your questions.

Regards, martin

By: atacamo

atacamo — Wed, 06 Feb 2013 08:54:00 +0000

Sounds great… About the schema-free schemas with “shapes”: how does it cope with changes in shapes? If, from now on, I have an additional field on each saved object that previously had shape A, do I need to rebuild some indices? Is it considered a different shape? Can I retrieve all objects with shape A and shape A’?