Grokking LDAP

An opinionated view.

January 10, 2016

Introduction

[Originally written on December 23, 2002. This article contains minor updates.]

As I worked on one of two personal LDAP projects I wanted to do, it occurred to me that after reaching a certain level of understanding of LDAP (it was the fourth LDAP project for me), I should share some epiphanies I’ve had about it over the years. This article constitutes those little bursts into a higher reality I’ve had while getting my head around LDAP (five minutes to learn, a lifetime to master!).

NOTE: This post is not an LDAP tutorial! Familiarity with LDAP at a conceptual level at least is assumed. Some of this may seem fairly esoteric if you don’t understand LDAP at all. This essay is instead a series of notes based on clearing up misconceptions someone who has started to delve into LDAP may have. I struggled under the wrong concepts for quite a while before it started falling into place.

CAUTION: Some of what I say in here isn’t “official LDAP speak”. In other words, some of the concepts I am talking about are constructs I’ve made up in my own head that I’ve found to be useful when dealing with LDAP-based directories, and you won’t find them anywhere else (that I’ve seen).

In addition, I have tried to use certain phrases in very specific ways throughout the article.

Author’s note – an early reviewer of this piece objected to my renaming LDAP’s objectclass to “object type” and felt the intended audience could make the distinction. He is probably right.

A Hierarchy That’s Not a Hierarchy

Everyone approaches LDAP as a hierarchy. This is wrong. That statement may seem like heresy or blasphemy, given the ‘obvious’ hierarchical nature of an LDAP directory, but I stand by it. The name space of an LDAP directory is hierarchical, but that is all. And note this – the name space hierarchy is not typically used to navigate! Repeat this to yourself 4,096 times until you get it! The name space hierarchy is not typically used to navigate!

Think about XML name spaces for a similar situation – they are URIs that uniquely name something without necessarily representing a valid URL to which you can navigate. In LDAP, the distinguishedName (dn) (pseudo)attribute of an object, a node in the directory, is simply a name, even though it looks like it has more meaning or relationship to the object’s contents than it does.

Consider the following distinguished name, in LDIF syntax:

dn: cn=James Lehmer,dc=dullroar,dc=com

If this were the fully qualified distinguished name (FQDN) for an inetOrgPerson object, then commonName (cn) would be an attribute of the object (required, in fact, by the person class, one of the parent class types for inetOrgPerson), and the contents of the cn attribute could be the target of a search. The leftmost part of the FQDN (cn=James Lehmer in our case) is the only component of an FQDN that has to be an attribute in the object with that FQDN.

Note that the domainComponent (dc) attribute, which our FQDN seems to imply is available, is not only not required, it is not even defined as an attribute for the object type inetOrgPerson at all, nor for its parents organizationalPerson or person, nor for the ultimate parent type top! This is often the case – some “component” of an FQDN, such as the dc “attributes” in this case (dc=dullroar,dc=com), is not actually an attribute of the object being named by the FQDN. Remember, the “N” at the end of FQDN stands for “Name”, and it is only that. You cannot infer object content from it.

In some sense, it is better to think of the dn (pseudo)attribute as an opaque string, not having any inherent meaning (as Tim Berners-Lee reminds us we are supposed to think of URIs). Yes, the objects in the directory must be created in a name space hierarchy. But the hierarchy is a name space hierarchy only. You will access the objects via searches that often bypass any component of the FQDN name space entirely.

Remember, you can only search for entries based on their attributes. In the end, dn seems like just another attribute, although in most LDAP implementations you typically can’t search on it (you can use it as the starting point of a search, however), which is why I call it a (pseudo)attribute in places. So, to find the above named entry, we would have to use an LDAP search like the following, perhaps setting the starting point for the search at dc=dullroar,dc=com:

(cn=James Lehmer)

NOTE: Some objects do have dc attributes. For those objects, you could perform a search such as the following.

(&(dc=dullroar)(dc=com))

But this search would not work for the inetOrgPerson object, since neither it nor any of its parent object types includes that attribute, and you don’t search “down” a hierarchy, you simply search for objects that themselves have specific attributes. What I mean by that is you don’t navigate by first finding the node with dc=com and then the node under it with dc=dullroar, etc. Instead, you are looking directly for the node with cn=James Lehmer.

Often, you must alter the schema and add a new object that simply has the attributes that are the parts of the FQDN, so that you can store objects that you can search for using the FQDN, by actually searching on the attributes you stored that comprise the FQDN instead. For example, if we created our own object type myInetOrgPerson, simply added the dc component as an optional component, then filled it in when we entered new entries which would be of both inetOrgPerson and myInetOrgPerson types, then we could do a search as follows:

(&(cn=James Lehmer)(dc=dullroar)(dc=com))

To summarize in one sentence: Do not count on the presence of an attribute in an object just because the FQDN seems to imply that attribute’s presence!
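The point can be sketched in a few lines of Python. This is only an illustration against a hypothetical in-memory list of entries, not a real LDAP client; the `DIRECTORY` data and the `search_equals` helper are assumptions made up for the example, and matching here is exact rather than case-insensitive as a real server would do for cn.

```python
# Hypothetical in-memory stand-in for a directory: each entry is a DN plus
# a dict of attribute name -> list of values. Not a real LDAP client.
DIRECTORY = [
    {
        "dn": "dc=dullroar,dc=com",
        "attrs": {"objectClass": ["top", "domain"], "dc": ["dullroar"]},
    },
    {
        "dn": "cn=James Lehmer,dc=dullroar,dc=com",
        "attrs": {
            "objectClass": ["top", "person", "organizationalPerson", "inetOrgPerson"],
            "cn": ["James Lehmer"],
            "sn": ["Lehmer"],
        },
    },
]


def search_equals(directory, attribute, value):
    """Return the DNs of entries whose stored attributes contain the value.

    The DN string itself is never consulted, only the stored attributes.
    """
    return [
        entry["dn"]
        for entry in directory
        if value in entry["attrs"].get(attribute, [])
    ]


# (cn=James Lehmer) finds the person, because cn is stored in the entry.
print(search_equals(DIRECTORY, "cn", "James Lehmer"))

# (dc=dullroar) finds only the domain entry. The person entry has no dc
# attribute even though "dc=dullroar" appears in its DN.
print(search_equals(DIRECTORY, "dc", "dullroar"))
```

The person entry is invisible to the second search purely because nothing named dc was ever stored in it, whatever its name suggests.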

Everything You Know Is Wrong

Everything you know about relational, hierarchical and network database management systems is wrong. Wipe it from your head while dealing with LDAP.

An LDAP directory is a database, yes, but it doesn’t work like anything you’ve ever seen before. Navigation, per se, is almost completely missing, other than specifying the starting point in the name space for searches. A directory is relatively slow to update, but almost mind-numbingly fast to search and retrieve from (“search” is the key word in LDAP). Testing at ScienceXchange (author’s note – now defunct) using Netscape’s Directory Server showed random search and retrieve speeds of 8,000+ objects per second from a very middle-range single 800MHz/512MB server hosting a decent sized (600MB) directory over 100Mb switched Ethernet network to multiple multi-threaded clients [1].

Think Venn Diagrams

Don’t think navigation or underlying organization. Think sets and set theory. Think Venn diagrams.

Once items have been added to a directory, barring update or deletion (which, remember, in the directory model is presumed to be much less common than search/retrieval), the primary mode of operation against the directory will be via searches and subsequent retrievals. For these searches, you are basically looking for some set of objects that contain some set of attributes that have (or do not have) the values you are looking for. An example would be helpful. You could have a search syntax as simple as the following.

(description=*)

If run from the base DN (the “root” of the directory name space) this could return a large number of objects of many different types, since lots of object types contain the description attribute, covering organizations, people and devices. Of course, you would only see those objects that actually had this attribute filled in with a value.

NOTE: The LDAP search syntax defined in RFC 2254 (since superseded by RFC 4515) is cryptic, with a sort of reverse reverse Polish notation (RRPN). I think it would be better served by a new language (as with everything, the answer is a new language!) that basically used the English syntax of set notation, such as “union”, “intersection”, “in”/“not in”, etc., rather than the highly symbolic one-off syntax in use by LDAP directory servers now. But I digress.
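To make the set-theory reading concrete, here is a rough Python sketch that evaluates a few filters as plain set operations. The sample entries and the helper names `present` and `equals` are invented for illustration; a real server does this work for you, and far faster.

```python
# Hypothetical sample data: entry name -> attributes that are filled in.
ENTRIES = {
    "printer-3": {"objectClass": ["device"], "description": ["Third floor laser"]},
    "jlehmer": {
        "objectClass": ["person"],
        "description": ["Author"],
        "telephoneNumber": ["555-0100"],
    },
    "asmith": {"objectClass": ["person"], "telephoneNumber": ["555-0101"]},
}


def present(attribute):
    """Set of entries that have any value for the attribute: (attribute=*)."""
    return {name for name, attrs in ENTRIES.items() if attribute in attrs}


def equals(attribute, value):
    """Set of entries holding exactly this value: (attribute=value)."""
    return {
        name
        for name, attrs in ENTRIES.items()
        if value in attrs.get(attribute, [])
    }


everything = set(ENTRIES)
described = present("description")        # (description=*)
people = equals("objectClass", "person")  # (objectClass=person)

# (&(description=*)(objectClass=person)) is an intersection.
described_people = described & people
# (|(description=*)(telephoneNumber=*)) is a union.
described_or_phone = described | present("telephoneNumber")
# (!(description=*)) is a complement against the whole search scope.
undescribed = everything - described

print(sorted(described_people))
print(sorted(described_or_phone))
print(sorted(undescribed))
```

Note that the first set mixes a printer and a person: nothing about the filter cares what kind of object it found.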

Some would say searching in a directory is very similar to SELECT in a relational database management system (RDBMS). Those persons would be wrong because in an RDBMS there is an implicit navigation – you (the programmer or user) SELECT against a table or set of tables with a join. You have “navigated” to the data container you want by naming table(s) in the operation, and you have defined what data type you want the content to be by selecting only certain columns from those tables.

In LDAP it’s all just in one big directory, and you search starting from a specific starting place in the name space (a starting dn, such as the base DN, or root), choosing to search against that starting entry only, only the entries one level “down” from it, or the starting entry and all child levels “downward” in the name space. There are no “tables” or other organizational units. There is simply the name space used to set a starting point for a search and whatever attributes the objects themselves contain.
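The three search scopes can be sketched as a simple test on names. The following Python is a simplification under stated assumptions: it uses the scope names base, one and sub, follows the RFC 4511 semantics in which a one-level search covers only the immediate children of the starting entry, and splits DNs naively on commas (so it would mishandle escaped commas). The `ou=People` entry is hypothetical.

```python
def parent_dn(dn):
    """Naively strip the leftmost component. Assumes no escaped commas."""
    _, _, rest = dn.partition(",")
    return rest


def in_scope(entry_dn, base_dn, scope):
    """Decide from the names alone whether an entry falls in the search scope."""
    entry = entry_dn.lower()
    base = base_dn.lower()
    if scope == "base":   # the starting entry only
        return entry == base
    if scope == "one":    # immediate children of the starting entry
        return parent_dn(entry) == base
    if scope == "sub":    # the starting entry and everything beneath it
        return entry == base or entry.endswith("," + base)
    raise ValueError(f"unknown scope: {scope}")


BASE = "dc=dullroar,dc=com"
DNS = [
    "dc=dullroar,dc=com",
    "ou=People,dc=dullroar,dc=com",
    "cn=James Lehmer,ou=People,dc=dullroar,dc=com",
]

for scope in ("base", "one", "sub"):
    print(scope, [dn for dn in DNS if in_scope(dn, BASE, scope)])
```

This is the only place the name space hierarchy matters during a search: it narrows the candidate set, and the filter then does the rest against attributes.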

If you ran the following from the root of the directory name space and said to search the entire directory from there downward, you would get all objects of all types in the entire directory that had anything whatsoever in the description attribute.

(description=*)

This is completely different from the “rectangular” view you’d see from something like the following SQL.

SELECT description FROM my_table

Here you would be presented with presumably uniform description data attached to some specific coherent application “object” (whatever was being described in my_table).

LDAP Schemas Are Easy

LDAP schemas are easy. Once you get the hang of them. In defining LDAP schema elements, you need to understand the following points.

ASN.1 Notation

The schema definition syntax uses ASN.1 notation, and as such, you need to have a unique identifier (“object identifier” or “OID”) for everything you define. There is a private enterprise number (PEN) branch you can use to which you can append your organization’s enterprise number to define a unique “name space” for your schema additions. You can get an enterprise number surprisingly easily from IANA – if your company doesn’t have one, it is good to get if you think you are going to do any SNMP or LDAP development in the future, since both use ASN.1 syntax.

It is then up to you to manage all number assignments “under” your enterprise number. For example, given an enterprise number of 314159, you could divide attributes into category ‘1’, object types into ‘2’, and reserve ‘3’ and above for the future. Then under your first two categories, you would hand out number assignments as you added new attributes or object types to the schema.

The prefix for all private enterprise numbers is:

iso.org.dod.internet.private.enterprise (1.3.6.1.4.1)

So, given our example of an IANA assigned enterprise number of 314159, anything we created in the schema needing an object id would have a prefix of:

1.3.6.1.4.1.314159

Using the prefixing scheme we discussed above, a new attribute could be defined with an object identifier as follows:

1.3.6.1.4.1.314159.1.1

…and an object type called dossierPerson that included that attribute could be defined with an object identifier as follows:

1.3.6.1.4.1.314159.2.1

It is totally up to the organization defining the schema items to manage their own object numbering scheme! To avoid clashing within your own organization, figure out an allocation scheme and stick with it.
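One way to “stick with it” is to hand out numbers from a single place. Below is a minimal Python sketch of such a registry, using the example enterprise number 314159 and the category split described above (1 for attributes, 2 for object types). The `OidRegistry` class is hypothetical, not part of any LDAP toolkit.

```python
ENTERPRISE_PREFIX = "1.3.6.1.4.1"  # iso.org.dod.internet.private.enterprise
ATTRIBUTE_ARC = 1                  # category for attributes
OBJECT_TYPE_ARC = 2                # category for object types


class OidRegistry:
    """Hands out sequential OIDs under one enterprise number."""

    def __init__(self, enterprise_number):
        self.base = f"{ENTERPRISE_PREFIX}.{enterprise_number}"
        self.assigned = {}  # schema element name -> OID
        self.next_item = {ATTRIBUTE_ARC: 1, OBJECT_TYPE_ARC: 1}

    def allocate(self, category, name):
        """Assign the next free number in the category to a new name."""
        if name in self.assigned:
            raise ValueError(f"{name} already has OID {self.assigned[name]}")
        item = self.next_item[category]
        self.next_item[category] += 1
        oid = f"{self.base}.{category}.{item}"
        self.assigned[name] = oid
        return oid


registry = OidRegistry(314159)
print(registry.allocate(ATTRIBUTE_ARC, "birthday"))
print(registry.allocate(OBJECT_TYPE_ARC, "dossierPerson"))
```

In practice the registry could just as well be a spreadsheet or a text file under version control; what matters is that there is exactly one of them.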

Adding New Attributes

In adding new object types, first you must define new attributes (if you need them). In defining attributes you define an attribute name (and aliases), an identifier (in ASN.1 syntax), a datatype (again, in ASN.1 syntax), whether the attribute is single or repeating, etc. Some examples follow:

attributetype ( 1.3.6.1.4.1.314159.1.1
    NAME 'birthday'
    DESC 'Birth date of person, expressed as a timestamp'
    EQUALITY generalizedTimeMatch
    SYNTAX 1.3.6.1.4.1.1466.115.121.1.24
    SINGLE-VALUE )
attributetype ( 1.3.6.1.4.1.314159.1.4
    NAME 'spouse'
    DESC 'Spouse of the person'
    EQUALITY caseIgnoreMatch
    SYNTAX 1.3.6.1.4.1.1466.115.121.1.15{255} )

Observe that the first attribute, birthday, is a single-value attribute – if a value exists for a given object with this attribute, there can be at most one (which makes sense, given the attribute). On the other hand, the spouse attribute will allow multiple entries (since in some countries polygamy is still allowed, and we want our directory application to be internationalized).

NOTE: It would not be a good idea to use this mechanism for holding ex-spouses, because attribute value order in an object is not guaranteed upon retrieval (so there would be no way to keep the spouse values in order from ex-wife v1.0 through ex-wife v2.0 to current production wife v3.0, for example). For that, it would be better to create new objects for each past or present spouse (containing perhaps marriage start and stop dates as attributes so you can order them), and then have an attribute in your object type that can contain one or more FQDN references to those entries.
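A rough Python sketch of that alternative follows. Everything in it is hypothetical: the `dc=example,dc=com` entries, the `spouseEntry` attribute holding references, and the `spouses_in_order` helper. Multi-valued attributes are modelled as sets to stress that no value order is promised, and the dates are generalized time strings of identical format, which sort correctly as plain text.

```python
FIRST = "cn=Spouse One,ou=Spouses,dc=example,dc=com"
SECOND = "cn=Spouse Two,ou=Spouses,dc=example,dc=com"
PERSON = "cn=Pat Example,ou=People,dc=example,dc=com"

# Hypothetical entries keyed by name. Values are sets: unordered on purpose.
ENTRIES = {
    FIRST: {
        "cn": {"Spouse One"},
        "startDate": {"19900615000000Z"},
        "endDate": {"19980301000000Z"},
    },
    SECOND: {
        "cn": {"Spouse Two"},
        "startDate": {"20010904000000Z"},
    },
    PERSON: {
        "cn": {"Pat Example"},
        # References to other entries by name, in no particular order.
        "spouseEntry": {SECOND, FIRST},
    },
}


def start_date(dn):
    """Return the single startDate value stored in the referenced entry."""
    return next(iter(ENTRIES[dn]["startDate"]))


def spouses_in_order(person_dn):
    """Order the referenced spouse entries by their own startDate."""
    return sorted(ENTRIES[person_dn]["spouseEntry"], key=start_date)


for dn in spouses_in_order(PERSON):
    print(start_date(dn), dn)
```

The ordering lives in the data of the referenced entries, so it survives no matter what order the server returns the references in.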

Adding New Objects

In adding new object types, after defining any new attributes you may require, you then define the new object types themselves. Object types are basically a collection of attributes that an object is required to contain and attributes that an object may contain. Object types have a type of “inheritance”, but it is not an inheritance of behavior or topography (layout in the directory). It is instead simply a union of all the attributes in all the object types that a directory element comprises.

For example, the object type top has only one attribute, a required one, objectClass. All other object types “descend” from top in that all object instances must have at least one objectClass attribute value. In fact, every class instance by definition will have multiple objectClass attribute values, since top only contains objectClass, so an object instance that was just a top object wouldn’t be very interesting.

First, let’s look at top. Observe the ABSTRACT keyword, denoting an object is never meant to be created and stored as an object of just type top:

objectclass ( 2.5.6.0 NAME 'top' ABSTRACT
    MUST objectClass )

Now, let’s look at a “descendant” of top, which is person:

objectclass ( 2.5.6.6 NAME 'person' SUP top STRUCTURAL
    MUST ( sn $ cn )
    MAY ( userPassword $ telephoneNumber $ seeAlso $
        description ) )

If we had an object instance of type person, which is descended from top, we would have the following three required attributes that must exist for our object to even be stored in the directory.

  1. objectClass – from top.
  2. sn – from person, alias surname.
  3. cn – from person, alias commonName.

In addition, we know there would be at least two values for objectClass, containing the values top and person. Then we would also have the ability of optionally storing the following information in our object (defined in the person object type as optional attributes).

  • userPassword
  • telephoneNumber
  • seeAlso
  • description

So when thinking of an object instance, I try not to think of something as being “an object of type person”, but instead, as an entry containing data for types top and person. Obviously, it is easy enough to talk of an entry of type person [2], but that often then obscures which object type a given attribute in an object is defined within. For example, we could say an object of type inetOrgPerson has attribute values for the objectClass attribute, but it would be more accurate to talk about the objectClass attribute coming from the top object type definition of the given object instance.
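The “union of attributes” idea is easy to express in code. This Python sketch transcribes the top and person definitions shown above into a hypothetical `SCHEMA` dict and walks each SUP chain to collect required and optional attributes; the function name `allowed_attributes` is made up for the example.

```python
# The two definitions from the text, transcribed into a plain dict.
SCHEMA = {
    "top": {"sup": None, "must": {"objectClass"}, "may": set()},
    "person": {
        "sup": "top",
        "must": {"sn", "cn"},
        "may": {"userPassword", "telephoneNumber", "seeAlso", "description"},
    },
}


def allowed_attributes(object_classes):
    """Union the MUST and MAY sets of every listed type and its parents."""
    must, may = set(), set()
    for name in object_classes:
        while name is not None:  # follow the SUP chain up to top
            definition = SCHEMA[name]
            must |= definition["must"]
            may |= definition["may"]
            name = definition["sup"]
    # An attribute required by one type is not merely optional overall.
    return must, may - must


must, may = allowed_attributes(["person"])
print("required:", sorted(must))
print("optional:", sorted(may))
```

There is no behavior being inherited here, only set union, which is why thinking of an entry as “containing data for types top and person” fits better than thinking of it as “a person”.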

Why Add New Objects?

You typically define new objects to contain some attributes, pre-existing or that you are also newly defining, that aren’t contained in any other object class in the schema. For example, countryName (c) and co (two types of country attributes – the former holds the two-character country code, the latter the country display name) are defined as part of the basic schema deployed by default with OpenLDAP. However, they are not defined as being a part of any of the person-based object types in the same basic schema. So, if you want to store the country someone lives in, you’d have to create a new object class that had c or co or both as (probably) optional attributes. Another reason to add a new object type is to allow values that appear as part of the FQDN of an object to be searched against.

The tasks for schema definition you must do in any LDAP project should include the following:

  • Look at the schemas already available to you, especially those installed by default. On Linux with OpenLDAP, that includes core schemas and schemas to support CORBA, Java, Kerberos, basic tracking of people, and others. Active Directory has similar schemas plus more to support Windows, Exchange and so on.
  • Inventory which attributes you think you are interested in tracking. Most may already exist in the base schema set.
  • Note the object assignments for each – many attributes are used in multiple object types – and try to come up with a coherent minimum set of object types that contain the attributes you need. This set should make some sort of sense, not ending up with an amalgamation of “person” and “device” object types.
  • Catalog which attributes you may need that are not in an object type already. c and co are two good examples of “orphan” attributes with the default OpenLDAP install.
  • Describe which attributes are missing. Give them names, aliases if you wish, and decide which data type (and possibly maximum length) they will hold and the syntax in ASN.1 for it.

Following is an example of a new object type for holding information about people. It is not descended from person or any of the other person-based object types. Instead, it is meant to be used in conjunction with those other types. It is meant to hold attributes not assigned to any other person-based object types. The attributes used are both pre-defined in the base schema (c, co, generationQualifier, houseIdentifier) and custom-defined for this object type (birthday, endDate, child, spouse, startDate, weddingAnniversary).

# Seemingly useful information, especially if you want
# to use the directory as either a contacts or HR
# repository (try to do either without birthday or
# spouse, for example).
objectclass ( 1.3.6.1.4.1.314159.2.1
       NAME 'dossierPerson'
       DESC 'Helpful supplementary contact information'
       SUP top AUXILIARY
       MAY ( c $ co $ generationQualifier $
             houseIdentifier $ birthday $
             endDate $ child $ spouse $
             startDate $ weddingAnniversary ) )

Object Types Are Like Interfaces

Objects in the directory can morph in real-time. When you update objects in the directory you can add new object types to an existing object instance, and that instance can then contain all the required and optional attributes defined in all the object types that the entry contains in the objectClass attribute, including the newly added object type. In this sense, objectClass is simply another object attribute (in fact, part of the top object, as we have seen), and it is a repeating attribute that all objects have at least two values of (one of top and one of some other object type). The set of all the values in the objectClass attribute in a directory entry (object instance) determines all the attributes the entry (object) can possibly contain. Many if not most of the optional attributes may not contain any value at any point in time, depending on the application.
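Here is a small Python sketch of that morphing, assuming a simplified schema check in which required and optional attributes are merged into one permitted set per object type. The `PERMITTED` table and `set_attribute` helper are illustrative only; a real server performs this check itself and reports an object class violation.

```python
# Attributes permitted by each object type (MUST and MAY merged for brevity).
PERMITTED = {
    "top": {"objectClass"},
    "person": {
        "sn", "cn", "userPassword", "telephoneNumber", "seeAlso", "description",
    },
    "dossierPerson": {
        "c", "co", "generationQualifier", "houseIdentifier", "birthday",
        "endDate", "child", "spouse", "startDate", "weddingAnniversary",
    },
}


def set_attribute(entry, name, values):
    """Store values only if one of the entry's object types permits the attribute."""
    permitted = set().union(*(PERMITTED[oc] for oc in entry["objectClass"]))
    if name not in permitted:
        raise ValueError(f"{name} is not permitted by {entry['objectClass']}")
    entry[name] = list(values)


entry = {
    "objectClass": ["top", "person"],
    "cn": ["James Lehmer"],
    "sn": ["Lehmer"],
}

try:
    set_attribute(entry, "birthday", ["19700101000000Z"])
except ValueError as error:
    print("rejected:", error)

# Adding one more objectClass value changes what the entry may hold.
entry["objectClass"].append("dossierPerson")
set_attribute(entry, "birthday", ["19700101000000Z"])
print(entry["birthday"])
```

The entry is the same entry before and after; only the value list of its objectClass attribute grew.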

It is better, however, especially given the generic tools that are out there, to define an object instance in advance as being of all the types you think it may ultimately contain. This allows browsers to see those types defined even if no attributes have been filled in for them yet. It forces the filling in of certain mandatory attributes (if any) for each type when an object instance is created and stored, increasing search options and speed, since often mandatory attributes are indexed.

Be Prepared to Code

You will probably have to write code (unless you’re working with an email client).

It is amazingly easy to write generic LDAP browsers (done that, been there). That’s why you see so many out there, including one with almost every LDAP toolkit and book. And yet, strangely enough, the one production application almost universally LDAP-enabled, which is email, strongly depends on the client implementation to pick up attributes correctly.

For example, Mozilla publishes an LDAP schema for its Thunderbird client and also conforms to others: if you point its LDAP address book client at an inetOrgPerson, it will pick up many but not all of the attributes correctly. In fact, some seemingly “standard” attributes such as telephoneNumber may or may not be picked up correctly by a given email client, because its authors have coded against a specific object class or classes and are looking for specific attributes, and the “obvious” choice wasn’t the one chosen.

That said, outside of email there are almost always only two ways to use LDAP:

  1. Through an existing generic browser/editor – For example, the ADSI and LDAP API based tools that come with Microsoft’s Active Directory.
  2. Via a custom application you write – With the plethora of LDAP APIs out there including all the Java JNDI and .NET DirectoryServices APIs that abstract a lot of it for that language, it isn’t hard to write LDAP applications. I even use the SQL Server ADSI interface a lot, which allows you to issue SQL SELECT statements against Active Directory (with some limitations – it doesn’t support multi-value attributes, for example). In fact, if you are adding custom attributes and object types, you are almost assuredly going to be using a custom application if you want to deploy to end users, as opposed to technical personnel only.
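One small chore that turns up in almost any custom application is building search filters from user input safely. The Python sketch below escapes the characters that RFC 4515 treats as special inside a filter value; the function names are my own, and most LDAP libraries ship an equivalent helper that you should prefer when one is available.

```python
# Characters that must be escaped inside a filter value (RFC 4515),
# each replaced by a backslash and its two-digit hex code.
ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\x00": r"\00",
}


def escape_filter_value(value):
    """Escape one character at a time so backslashes are never double-escaped."""
    return "".join(ESCAPES.get(character, character) for character in value)


def equality_filter(attribute, value):
    """Build an (attribute=value) filter from an untrusted value."""
    return f"({attribute}={escape_filter_value(value)})"


print(equality_filter("cn", "James Lehmer"))
print(equality_filter("cn", "Smith (Jr)*"))
```

Without the escaping, a value containing `*` or `)` would silently change the meaning of the search rather than fail.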

A User is Just Another Entry

A “user” is just another directory entry. Some object types can contain (clear text or hashed) password and user id attributes. You can use these to control accessing the directory, both signing in and then controlling access after that. In addition, some object types representing persons, for example, can be made members of other object types representing groups. These can all be used to control access to directory entries in rich ways using access control lists (which are nothing more than a list of FQDNs and the permissions attached to them).
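As a toy illustration of that last parenthetical, the Python sketch below treats an access control list as nothing more than a list of names and permissions, with a group being just another entry that lists its members. The entries, the `ACL` contents and the `permissions_for` helper are all hypothetical; real servers each have their own access control syntax.

```python
JAMES = "cn=James Lehmer,dc=dullroar,dc=com"
EDITORS = "cn=Editors,dc=dullroar,dc=com"

# A person and a group are both just entries; the group lists member names.
ENTRIES = {
    JAMES: {"objectClass": ["top", "person"], "cn": ["James Lehmer"]},
    EDITORS: {
        "objectClass": ["top", "groupOfNames"],
        "cn": ["Editors"],
        "member": [JAMES],
    },
}

# The access control list: names and the permissions attached to them.
ACL = [
    (EDITORS, {"read", "write"}),
    ("cn=Guest,dc=dullroar,dc=com", {"read"}),
]


def permissions_for(user_dn):
    """Collect permissions granted to the user directly or through a group."""
    granted = set()
    for subject_dn, permissions in ACL:
        members = ENTRIES.get(subject_dn, {}).get("member", [])
        if subject_dn == user_dn or user_dn in members:
            granted |= permissions
    return granted


print(sorted(permissions_for(JAMES)))
print(sorted(permissions_for("cn=Nobody,dc=dullroar,dc=com")))
```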

The thing to remember is a “user” is just another directory entry. In fact, in LDAP directories, everything is just another directory entry, including the metadata (just like system tables in an RDBMS are tables that hold information about other database entities, including tables, including themselves).

Conclusion

I hope that was worth the read! I attempted to alter your perception on some specific points about LDAP-based directories. To recap:

  1. Everyone approaches LDAP as a hierarchy. This is wrong.
  2. Everything you know about relational, hierarchical and network database management systems is wrong. Wipe it from your head while dealing with LDAP.
  3. Don’t think navigation or underlying organization. Think sets and set theory. Think Venn diagrams.
  4. LDAP schemas are easy. Once you get the hang of them.
  5. Object metadata in the directory can morph in real-time.
  6. You will probably have to write code (unless you’re working with an email client).
  7. A “user” is just another directory entry.

Each of these issues cost me some time and some pain before I fully “grokked” it and what it meant when working with LDAP. I hope by reading this, I save you some pain, if not now, then in the future when you land on that LDAP project, or start writing that cool new open source LDAP tool.


  1. In 2001.

  2. In some ways this is similar to “duck typing” in prototypical inheritance, where you care less about an object’s “type” and more about whether it has specific properties and methods you care about.