Tag Archives: semantic

The crowdsourcing of tagging: a form of sensemaking

Peter Wylie recently wrote, “As the Internet continues to accumulate more and more information, it becomes increasingly difficult to sort and prioritize that information in a way that provides optimal relevance for each individual user”. In that post Peter also describes Blekko, a search engine that asks users to rank the relevance of their search results using a preset list of tags. I have previously suggested in posts on tagging and the semantic web that crowdsourcing is an essential element of adding relevance to search findings.

Sensemaking is a user-centred theoretical-methodological approach developed by Brenda Dervin for understanding how and why users interact with information and information sources. It is based on the premise that life is marked by a series of discontinuities. As people move through life they face information (and other) gaps that block their ability to make sense of the world and to take decisions and actions. Gaining the needed information allows them to progress. This triad of situations, gaps, and uses/helps is fundamental to sensemaking [1, 2, 3, 4]. Simply put, when individuals realize that they are in a situation that requires information, they will move towards closing that gap by seeking help and/or information [2].

Tagging can be considered a sensemaking activity [5] as it involves assessing and describing content. Yew, Gibson and Teasley [6] write, “Social tagging facilitates the sense making efforts of the individual and the learning community through the collective act of associating keywords with documents/artifacts and by sharing those terms with the rest of the community” (p. 1010). The sensemaking framework is therefore well-suited to understanding the tagging process in this context.

Sensemaking in the context of tags

Tagging is intended to add meaning to the content. However, one “side effect” of this is known as tagging ambiguity [7]. For example, the tag “orange” can refer to the colour or the fruit of the same name. One way to focus a search is to combine multiple tags: the user can search on both “orange” and “fruit” to contextualize their search even further.
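The multiple-tag search described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular tagging site works: the item names, the tag sets and the `search_by_tags` helper are all invented for the example.

```python
# Hypothetical tagged items; the names and tags are made up for illustration.
TAGGED_ITEMS = {
    "navel-orange-photo": {"orange", "fruit", "citrus"},
    "sunset-painting": {"orange", "colour", "sky"},
    "orange-juice-recipe": {"orange", "fruit", "drink"},
    "traffic-cone": {"orange", "colour", "road"},
}


def search_by_tags(items, *tags):
    """Return the names of items that carry every one of the given tags."""
    wanted = set(tags)
    # An item matches only if the wanted tags are a subset of its own tags.
    return sorted(name for name, item_tags in items.items() if wanted <= item_tags)


# A single ambiguous tag matches both the fruit and the colour.
print(search_by_tags(TAGGED_ITEMS, "orange"))
# Adding a second tag narrows the results to the fruit sense.
print(search_by_tags(TAGGED_ITEMS, "orange", "fruit"))
```

Searching on “orange” alone returns all four items, while “orange” plus “fruit” returns only the two fruit-related ones. The second tag does the disambiguating work that a single tag cannot.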

It may seem foreign to my colleagues who work in technology to consider a theoretical framework when executing an initiative such as the crowdsourcing of tagging. However, it is this type of cross-pollination of ideas that leads to interdisciplinary collaborations. Who knows, maybe these concepts will be explored by graduate students who later initiate a start-up. Like Google.

References

[1] Dervin, B. 1977. Useful theory for librarianship: communication not information. Drexel Library Quarterly 13, 3, 16-32.

[2] Dervin, B. 1992. From the mind’s eye of the user: the sense-making qualitative-quantitative methodology. In D. Glazier & R. Powell (Eds.), Qualitative methods in information management (pp. 61-84). Englewood, CO: Libraries Unlimited.

[3] Dervin, B. 1998. Sense-making theory and practice: an overview of user interests in knowledge seeking and use. Journal of Knowledge Management, 2, 2, 36-45.

[4] Dervin, B. 1999. On studying information seeking methodologically: the implications of connecting metatheory to method. Information Processing and Management, 35, 727-750.

[5] Golder, S. A., & Huberman, B. A. 2006. Usage patterns of collaborative tagging systems. Journal of Information Science, 32, 2, 198-208.

[6] Yew, J., Gibson, F., & Teasley, S. 2006. Learning by tagging: group knowledge formation in a self-organizing learning community. In Proceedings of the 7th international conference on Learning sciences (pp. 1010-1011). Bloomington, Indiana: International Society of the Learning Sciences.

[7] Breslin, J., Passant, A., & Decker, S. 2009. The social semantic web. Springer.

Tables as a form of information visualization

Readers, you may find this blog posting of interest:

http://datamining.typepad.com/data_mining/2010/08/the-interpretation-of-tables-in-texts-2000.html

First, this guy (not to be rude, his name is Matthew Hurst) did his PhD on the depiction of data in tables. This is interesting in and of itself. By tables I mean a plain old box with fields in rows and columns. It may seem “useless” or “stupid” to a lot of people, but how many of us read data in this format today? Excel alone probably accounts for millions. I, for one, am glad that people are working on ways to improve this.

Now comes the value-added part. The author goes on to reference an article, “Exploiting a Web of Semantic Data for Interpreting Tables”, which can be found here:

http://journal.webscience.org/322/

The abstract states:

Much of the world’s knowledge is contained in structured documents like spreadsheets, database relations and tables in documents found on the Web and in print. The information in these tables might be much more valuable if it could be appropriately exported or encoded in RDF, making it easier to share, understand and integrate with other information. This is especially true if it could be linked into the growing linked data cloud. We describe techniques to automatically infer a (partial) semantic model for information in tables using both table headings, if available, and the values stored in table cells and to export the data the table represents as linked data. The techniques have been prototyped for a subset of linked data that covers the core of Wikipedia.
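To make the idea in the abstract a little more concrete, here is a rough sketch of exporting table rows as subject, predicate, object triples, which is the basic shape of RDF data. It is not the authors' technique: the paper infers the meaning of columns automatically, whereas this sketch simply assumes that the first column names the subject and that the headings can serve as predicates. The sample table, the `http://example.org/` namespace and the `table_to_triples` function are all assumptions made for illustration.

```python
# Hypothetical namespace; example.org is reserved for illustrations.
BASE = "http://example.org/"


def table_to_triples(headings, rows):
    """Turn each table cell into a (subject, predicate, object) triple.

    Assumes the first column identifies the thing each row describes.
    """
    triples = []
    for row in rows:
        subject = BASE + row[0].replace(" ", "_")
        # Every remaining column becomes one statement about the subject.
        for heading, value in zip(headings[1:], row[1:]):
            predicate = BASE + heading.lower().replace(" ", "_")
            triples.append((subject, predicate, value))
    return triples


# Made-up sample table: one heading row and two data rows.
headings = ["Name", "Colour", "Kind"]
rows = [
    ["Orange", "orange", "fruit"],
    ["Lime", "green", "fruit"],
]

for s, p, o in table_to_triples(headings, rows):
    # Print in an N-Triples-like form, with the cell value as a plain literal.
    print(f'<{s}> <{p}> "{o}" .')
```

Each cell outside the first column becomes one statement, so the two-row table above yields four triples. The hard part the paper tackles, working out what a column or cell actually refers to in the linked data cloud, is exactly what this sketch leaves out.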

I’m looking forward to what this collaboration yields.

Why Web 3.0 may be a step backwards from Web 2.0

First off, for the purposes of this post I would like to declare that by “web 3.0” I mean the semantic (that is, “meaning”) web, and by “web 2.0” I mean collaboration. My apologies in advance to my techie friends who would argue that web 2.0 means a lot more than that (and you know that I know that!).

I wanted to learn more about the semantic web so I read John Breslin, Alexandre Passant and Stefan Decker’s excellent book, “The Social Semantic Web”. The authors provide a nice introduction to semantics and the web, defining and describing various metadata formats (e.g. RDF) and ontologies (e.g. OWL).

I’m now concerned about the progress we’ve made moving the web (both technically and socially) towards a more collaborative environment.

There are many components of the semantic web that will still support collaboration (e.g. the semantic wiki, tagging and social networking). But my concern is about the automated, machine-driven ontologies and metadata schemas. The terms, definitions and applications associated with these are, again, decided by a closed group of individuals (qualified, I might add, but closed nonetheless) who determine what will be labelled what. This will affect who will find what.

Unless the terms used in these ontologies are already known to the information seeker using their favourite search engine, the end result may be the same as for someone trying to find a book indexed by the Dewey Decimal system without knowing the meaning of the numeric codes.

What if you have access to the number but not its meaning? Why are we denying access to the development of these standards and not using a community-driven folksonomy? Or what if you were denied access to the process? Where’s the collaboration in that?

Wikipedia and Flickr – a semantic marriage?

Wikipedia tends to primarily contain written content on practically every topic conceivable. Although it allows for pictures, not every entry has (or needs) visual images. There are rules about what content is considered appropriate, including that it must be of merit (e.g. an article cannot be posted about a person unless they are a “notable”). Wikipedia has also specifically stated that it is not a repository of images. Finding written content on Wikipedia somewhat depends upon whether a page has already been created. Flickr, on the other hand, is all about images. They also have terms of service pertaining to copyright and objectionable content. However, anyone can pretty much post a picture of anything. Many of the pictures at Flickr are created by amateur photographers and are of things or events that may be of interest only to a limited number of people, or only to the photographers themselves. There is little information about these images at Flickr; in some cases only a photo appears.

I searched for “rubber balls” on Wikipedia, guessing that it would not have its own page. The results confirmed this but provided links to other entries in which rubber balls were mentioned. I also ran a search at Flickr using the words “rubber balls”. The results listed 8,781 images that were tagged with “rubber balls”.

My intention in this exercise is not to compare the two sites (after all, this would be like comparing apples and oranges) but to draw attention to Wikipedia’s need to improve the indexing or labelling of their content, preferably using a community-based tagging system or folksonomy. Flickr would benefit from some descriptive information about the pictures that are posted, especially those of historical or public interest. For example, Wikipedia has an informative entry on anatomy but would likely benefit from the many images tagged as anatomy on Flickr.

Once we move more towards the semantic (providing meaning) web, which site will be in the better position in terms of preparing its content with metadata (data about data, or tags that describe content)? Although Wikipedia has plans for a semantic version, I think they have “dropped the rubber ball” by not building in a tagging feature now. Crowdsourcing is only going to take your site so far, especially when it is content that has already been created that needs to be tagged. We need to start building in tagging functions now, and in other collaborative environments as well (e.g. message forums). What we really need is a message forum at both Wikipedia and Flickr. Well, one social scientist with an interest in the power and value of collaboration can only dream…