Welcome to Laura O'Grady, PhD

The crowdsourcing of tagging: a form of sensemaking

June 4, 2011

Peter Wylie recently wrote, “As the Internet continues to accumulate more and more information, it becomes increasingly difficult to sort and prioritize that information in a way that provides optimal relevance for each individual user.” In the same post, Peter describes Blekko, a search engine that asks users to rank the relevance of their search results using a pre-set list of tags. I have previously suggested, in posts on tagging and the semantic web, that crowdsourcing is an essential element of adding relevance to search findings.

Sensemaking is a user-centred theoretical-methodological approach developed by Brenda Dervin for understanding how and why users interact with information and information sources. It is based on the premise that life is marked by a series of discontinuities. As people move through life they face information (and other) gaps that impede their ability to make sense of the world and to take decisions and actions. Gaining the needed information allows them to progress. This triad of situations, gaps, and uses/helps is fundamental to sensemaking [1, 2, 3, 4]. Simply put, when individuals realize that they are in a situation that requires information, they will move towards closing that gap by seeking help and/or information [2].

Tagging can be considered a sensemaking activity [5] as it involves assessing and describing content. Yew, Gibson and Teasley [6] write, “Social tagging facilitates the sense making efforts of the individual and the learning community through the collective act of associating keywords with documents/artifacts and by sharing those terms with the rest of the community” (p. 1010). The sensemaking framework is therefore well-suited to understanding the tagging process in this context.

Sensemaking in the context of tags

Tagging is intended to add meaning to content. However, one “side effect” of this is tagging ambiguity [7]. For example, the tag “orange” can refer to either the colour or the fruit of the same name. One way to focus a search is to combine multiple tags: the user can search on both “orange” and “fruit” to give the query more context.
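
A multiple-tag search of this kind can be sketched in a few lines of Python. This is only a minimal illustration, assuming a hypothetical in-memory index (`TAGGED_ITEMS`) that maps each item to its set of tags; the item names and the `search_by_tags` helper are invented for the example and do not describe how Blekko or any other search engine works.

```python
# Hypothetical tagged items: each key is an item, each value is its set of tags.
TAGGED_ITEMS = {
    "citrus-growing-guide": {"orange", "fruit", "gardening"},
    "paint-colour-chart": {"orange", "colour", "design"},
    "juice-recipes": {"orange", "fruit", "recipe"},
}


def search_by_tags(index, *tags):
    """Return the items that carry every one of the requested tags."""
    wanted = set(tags)
    return sorted(item for item, item_tags in index.items() if wanted <= item_tags)


# A single ambiguous tag matches both the colour and the fruit.
print(search_by_tags(TAGGED_ITEMS, "orange"))
# Adding a second tag narrows the results to the intended sense.
print(search_by_tags(TAGGED_ITEMS, "orange", "fruit"))
```

Requiring every tag to match (set intersection) is the simplest choice here; a real system might instead rank partial matches rather than exclude them.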

It may seem foreign to my colleagues who work in technology to consider a theoretical framework when executing an initiative such as the crowdsourcing of tagging. However, it is this type of cross-pollination of ideas that leads to interdisciplinary collaborations. Who knows, maybe these concepts will be explored by graduate students who later initiate a start-up. Like Google.

References

[1] Dervin, B. 1977. Useful theory for librarianship: communication not information. Drexel Library Quarterly, 13, 3, 16-32.

[2] Dervin, B. 1992. From the mind’s eye of the user: the sense-making qualitative-quantitative methodology. In D. Glazier & R. Powell (Eds.), Qualitative methods in information management (pp. 61-84). Englewood, CO: Libraries Unlimited.

[3] Dervin, B. 1998. Sense-making theory and practice: an overview of user interests in knowledge seeking and use. Journal of Knowledge Management, 2, 2, 36-45.

[4] Dervin, B. 1999. On studying information seeking methodologically: the implications of connecting metatheory to method. Information Processing and Management, 35, 727-750.

[5] Golder, S. A., & Huberman, B. A. 2006. Usage patterns of collaborative tagging systems. Journal of Information Science, 32, 2, 198-208.

[6] Yew, J., Gibson, F., & Teasley, S. 2006. Learning by tagging: group knowledge formation in a self-organizing learning community. In Proceedings of the 7th international conference on Learning sciences (pp. 1010-1011). Bloomington, Indiana: International Society of the Learning Sciences.

[7] Breslin, J., Passant, A., & Decker, S. 2009. The social semantic web. Springer.

6 Responses to The crowdsourcing of tagging: a form of sensemaking

  1. Yaser Alyounes on July 14, 2011 at 11:20 pm

    First of all, thank you for the great post. Along with the Social Media Matrix for Health Care I’m working on these days, it got me thinking of a potential way to automate adding hashtags on the user side while the tweet is being typed. Consider the following scenario:

    1- The user types a message.
    2- A function identifies key words in the post (verbs, nouns, adjectives) as the user is typing.
    3- A function identifies nouns related to health care, classifies them by category, then displays those categories (e.g. Person, Profession, Organization, Government, Disease, Technology, etc.) as #hashtags. Health care terms can be identified by using SNOMED, for example, as a reference library.
    4- Those #hashtags come from a central database (e.g. http://tagdef.com/).
    5- The #hashtags are displayed 3 or 4 lines below the tweet text and highlighted, for better visual effect.
    6- To save the user from selecting the #hashtag and then clicking again to send the tweet, the #hashtag itself becomes the submit button.

    In that way, we allow the user to select the appropriate #hashtag and submit the post with one click. This automation will simplify the use of #hashtags and provide health care organizations with better-categorized tweets for analysis.
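
Steps 3 and 6 of the scenario above could be sketched roughly as follows. This is a simplified illustration under loud assumptions: the `TERM_CATEGORIES` dictionary is a hypothetical stand-in for a terminology lookup such as SNOMED, the function names are invented, and the part-of-speech step (step 2) is skipped in favour of matching words directly.

```python
import re

# Hypothetical stand-in for a terminology lookup (the scenario suggests SNOMED);
# a real system would query a terminology service rather than a small dict.
TERM_CATEGORIES = {
    "diabetes": "disease",
    "pacemaker": "technology",
    "nurse": "profession",
    "hospital": "organization",
}


def suggest_hashtags(message, term_categories=TERM_CATEGORIES):
    """Return category hashtags for known terms, in order of first appearance."""
    suggestions = []
    for word in re.findall(r"[a-z]+", message.lower()):
        category = term_categories.get(word)
        if category is not None:
            tag = "#" + category
            if tag not in suggestions:
                suggestions.append(tag)
    return suggestions


def submit_with_hashtag(message, hashtag):
    """Step 6: choosing a hashtag appends it and yields the text to post."""
    return f"{message} {hashtag}"


draft = "The nurse at the hospital checked my diabetes"
tags = suggest_hashtags(draft)
print(tags)
print(submit_with_hashtag(draft, tags[0]))
```

An exact-match dictionary will miss misspellings and lay terms, which is where the NLP techniques mentioned later in this thread would come in.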

    Again, what got me thinking about this is that I’m trying to devise a statistical BI model to help healthcare organizations make sense of, track, and appropriately respond to social media engagement to influence decision making. I’ve developed the initial matrix and posted it on my blog, and I add more points every time I think of something. But the biggest challenge is classifying the posts based on subject.

    This is easy if we’re using a custom SM platform for patients (e.g. SharePoint 2010), since it has a great built-in tag function that allows users or the organization to maintain a list of tags and offers them as auto-suggested text as the user is typing. SharePoint also has its own BI engine and dashboards. But other platforms like Twitter, FB, Google+, etc. will each require a unique tool and sub-model for gathering and classifying data.

    • Laura O'Grady on July 15, 2011 at 2:37 pm

      Thank you for your comment Yaser. I read the two posts from your blog, “The Social Media Matrix. How to correlate your SM data to make decisions” and “How To Use Internal Analytics To Improve Your Hospital’s Performance”. I have a few questions and comments:

      1. What exactly is the research question/objective of this initiative? Are you using a theoretical framework in this study? I am thinking about how to ensure your findings will be generalizable.

      2. Who are the users (study participants) of this system? How are they known to you? How do you have access to this population? Will they be using SharePoint or have you just mentioned this as an example? Again, this relates to generalization (and how hard it will be for you to complete this study/project).

      3. Some Twitter clients already have autocomplete features: once you enter “#”, a list of tags populates. Would it not be more prudent to have your list cross-pollinated with the larger Twitter community? Are you going to “force” your participants to use your client of choice (SharePoint)? What impact will this have on how the results can be applied in other contexts?

      There are already various semantic web initiatives and a move towards shared standards such as OWL and RDF, so it may be prudent to “future proof” your efforts by considering this issue now. I think it is a good idea to consider SNOMED (it originated in the US). But if your system is to be truly interoperable you may also need to consider things like HL7 (which maintains the Arden Syntax) or even ICD, as these function at a global level.

      To execute steps #2 and #3 you may want to look at what has already been done in NLP if you have not already considered this. Steps #5 and #6 may require usability testing as most users will likely find these functions novel and may therefore have difficulty adopting them.

      I’m not sure if this is a thesis or a project. It is ambitious and you may want to consider scaling it back a bit.

      Also, you may want to take a look at Breslin et al., “The Social Semantic Web”. You can also follow John on Twitter @johnbreslin.

  2. Yaser Alyounes on July 15, 2011 at 5:48 pm

    Thank you for taking the time to go through my blog. In regard to the inquiries/comments, first I would like to clarify that the model I’m working on is intended not for a one-time study, but rather as a matrix for ongoing evaluation of data gathered from social media.

    1- I guess the primary objective of this initiative is to figure out a comprehensive “conceptual” statistical model, which health care organizations can use to evaluate their social media engagement and make decisions accordingly, through relying on indicators with true statistical significance. Again, this is intended to be for ongoing (monthly, quarterly, yearly) evaluation. So far I haven’t done any real research, but rather been brainstorming, and relying on my past work experience. The results were the two blog posts you mentioned and one before them titled: “Business Process Mapping, A Detailed Model” which focused more on the organization’s internal workflow and quality.

    2- The users of the system are the trickiest part here. I used SharePoint as an example of what a health care organization might choose if it builds its own custom SM tool. But we all know that patients use their preferred SM portals (Facebook, Twitter, Imedix, etc). In order to truly capture and analyze all the information provided by, say, diabetic patients, we need to figure out a way of collecting all the posts about diabetes from all these different platforms, and either convert them into a homogenous format that we can then analyze, or analyze the data in each platform individually and then aggregate it to get the bigger picture. Either way, the model I’m working on is intended to be high level, with the assumption that data can be collected in the way we want it. Automating hashtags was a part of this, focusing on how to help patients and the community use the proper hashtags to help us properly categorize and analyze all the health data on Twitter.

    3- Again, SharePoint was used as an example; for each major platform we have to consider a unique and applicable way to filter and collect the data we want. We always run the risk of users using non-standard abbreviations or misspellings, and that’s a part of the margin of error that we have to accept.

    The very first post in my blog is a research paper I wrote a few months back on the potential for the Semantic Web in health care. I referred to SNOMED in step 3, but given that we are collecting data from patients as well as healthcare practitioners, the terminology used might differ. It even differs between physicians and ICD. One example of physicians refusing to use ICD (a task assigned to the coders) was the diagnosis “Retinoblastoma”. In ICD-10-AM (the standard we use in the Middle East), it is Malignant Neoplasm of the Retina. I believe ICD is useful as the end result: once we identify the data gathered, converting to ICD will add value for processing, but it is not reliable for collection.

    I’ve led an HL7 integration project in the past, and it is useful only when we have full control over the databases. But for my purpose here, which is gathering data from social media, messages such as A01 or A03 are irrelevant. I have consulted with a friend of mine who works in AI and she recommended reading about NLP, specifically semantic clustering and semantic distances. The challenge here is that I’m not a programmer, so I’m trying to focus on the overall matrix and the benefits of correlating different indicators to influence decision making.

    This undertaking is neither a thesis nor an official project. It’s just something I’m doing during my spare time for fun, or at least that’s what it is at the moment.

    I quickly read through the “Social Semantic Web” when I was doing my research a few months back, but never got the chance to read the whole thing in detail. Looking forward to it, and looking forward to your feedback.

  3. Laura O'Grady on July 17, 2011 at 9:57 pm

    Regarding statistics and the statement on your blog, “This would mean that the data collection procedure should focus on collecting information of statistical significance only”, it should be clarified that one cannot collect data (information) that is known in advance to be statistically significant. The process entails collection first, then analysis that may or may not yield significant results.

    In the opening paragraph of your comment above you have indicated that the results are intended to be used by organizations and identified patients’ tweets as the data source. You will need to think through how analyzing a series of tweets, presumably by persons with an illness, perhaps caregivers and even health care professionals, will aid in organizational decision making. What types of decisions? Policy? Programs and services?

    Tweets by laypersons are more likely to contain organic hashtags in the form of a folksonomy. Having a list populate while writing a tweet could conceivably “force” a patient to use another system (e.g., SNOMED) that has been defined by others, most likely not patients. You may want to take a look at my blog post, “Why Web 3.0 may be a step backwards from Web 2.0”, which might help explain why I do not think this is a good approach.

    I think if you want to be sure that you collect all the tweets by a certain population (including those that use hashtags to identify terms and those that do not) you should query the Twitter firehose using their streaming API directly. Anything else would mean that your sample could be inadvertently tainted.

    I hope these comments and suggestions are of value. Your ideas are interesting and this work could be very useful in many ways.

  4. Yaser Alyounes on July 18, 2011 at 8:15 am

    Thank you for your valuable comments.

    I agree that from an inferential statistics viewpoint, statistical significance cannot be determined until we compare the probability (p-value) to our level of significance (alpha) in order to reject or fail to reject the null hypothesis. While we’ve been monitoring day-to-day operations and business processes for a while using well-established measures of performance, the same does not yet apply to analyzing and monitoring social media.

    Given that trends change much faster online than they do offline, what is statistically significant today might not be tomorrow. For example, if we are monitoring the correlation between which platform patients use and a specific topic (e.g. peptic ulcer), we might see a spike in the use of one social media platform over the others one month, and the next month the percentages of posts are close to equal. This spike might be an indication of a marketing campaign or other factors, and it can be a one-time thing.

    As for how the numbers will help organizations make decisions, I attempted to do this when I started working on the first draft of the matrix, as it is the primary objective. If you click on the image of the matrix in my blog post (it will open an HTML page) and hover the mouse over the intersections (e.g. Platform/Timestamp), you will read the following in the alt-text:

    “Comparing count of users & posts per platform, per time can help us drill down to identify if there are any patterns between the time “hour/day/week/month/quarter/year” of the post and the platform used. If any automated announcements or even manual are planned, they can be better targeted to the community when posted at the right time”

    I have attempted to add such information for each correlation of two or more variables in order to clarify how the collected data is to be used to make decisions. What I provided are examples, and I will add more later.
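
The Platform/Timestamp intersection described in that alt-text amounts to counting posts per platform per time bucket. A minimal sketch of that count in Python is shown here; the `POSTS` sample and the `count_by_platform_and_hour` helper are hypothetical and only illustrate the idea, not the matrix as implemented on the blog.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample of collected posts as (platform, ISO timestamp) pairs.
POSTS = [
    ("Twitter", "2011-07-14T09:15:00"),
    ("Twitter", "2011-07-14T09:40:00"),
    ("Facebook", "2011-07-14T09:05:00"),
    ("Twitter", "2011-07-15T21:30:00"),
]


def count_by_platform_and_hour(posts):
    """Count posts for each (platform, hour-of-day) combination."""
    counts = Counter()
    for platform, timestamp in posts:
        hour = datetime.fromisoformat(timestamp).hour
        counts[(platform, hour)] += 1
    return counts


# Print one line per platform and hour, e.g. to spot the busiest posting times.
for (platform, hour), n in sorted(count_by_platform_and_hour(POSTS).items()):
    print(f"{platform} {hour:02d}:00 {n}")
```

The same pattern extends to day, week, or month buckets by swapping `.hour` for another attribute of the parsed timestamp.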

    I read your post on Web 3.0 and while I agree that folksonomy should be a consideration, I personally find myself leaning more towards a formal method of standardizing ontologies when it comes to healthcare. Since the information we are gathering here is of great value and effect on people’s health, we need to ensure that the data is collected and classified as accurately as possible, in order to 1) minimize the data collection and analysis time, 2) minimize the margin of error in the gathered data, and 3) justify ROI for our initiatives.

    Once again, thank you for your much valued and appreciated comments. I will continue working on the model during my spare time and post updates on my blog whenever I add/modify anything in the matrix.

  5. Quora on August 4, 2011 at 5:55 pm

    How has the concept of hashtags changed the way people communicate?…

    Hashtags are a grassroots form of semantic data. It is about the community who uses the information labeling and taking control of how it is disseminated. In some ways it can be considered an “inside language” or code by which participation (in sow c…
