Friday, January 16, 2009

Dude, Where’s My Space?

Matt Cooper has a nice article on how to Identify your large attachments.


Those Zune guys make it look so easy: Zune 4 = up to 1,000 songs, while Zune 120 = up to 375 hours of video. But of course, they’re looking at average file sizes, and the “up to” acknowledges that not every song is exactly 4MB. Your business data isn’t MP3s and videos, but it’s pretty similar in that regard – it comes in different amounts. Maybe you stored just a telephone number for this contact, but a full family tree, criminal background check, and medical work-up for that one. CRM Online is similar to Zune, in that we offer “CRM 5GB” (Professional) and “CRM 20GB” (Professional Plus). Here the similarities end, I’m afraid: Zune comes in lots of colors, but CRM Online only comes in blue.

The Problem

So, you’re a Professional customer with 5GB of storage, and your notification & status page is starting to look like the screenshot below. You feel like you’re running out of space. Where’s that space going, and what can you do about it? Read on, because I’ve analyzed some reports from our datacenter, looked pretty deeply into the CRM internals, and just may be able to help you reduce your space consumption.

clip_image002

Obviously, you purchased CRM Online to store your business data (leads, contacts, accounts, orders, contracts, etc.). If you just knew how much space one lead took up, you’d know how many you could store, right? As some of you undoubtedly know, it’s not quite that simple.

<rant type=“geeky”>In computer science, and engineering in general, we know that almost anything is possible if you’re willing to make the right set of trade-offs. CRM is built on top of SQL server and a metadata-driven object model, each of which adds tons of capability, flexibility, extensibility, maintainability, and probably some other -abilities I’ve forgotten. And we want those features to give us reasonably fast response time, since a system with poor performance gets poor adoption by its potential user base. So what’s left to trade off against? Space and complexity, that’s what. For instance, to search a column in SQL quickly, we build an index. But that index itself consumes more space, and in interesting mathematical ways (read up on B-trees for some insight). Pile on the fact that our object model uses multiple physical tables, each with multiple indexes, to represent an entity and its relationships, and any predictive power you thought you had goes right out the window. You get an awesome object-relational database platform with powerful workflows and a strong security model, but you can’t tell a priori what the impact of adding 500 bytes of customer data will be on your storage size.</rant>

A Lucky Break (or, “A Nice Property of Dynamics CRM”)

Given the last paragraph, it seems hopeless that you could predict where to get major gains in space reduction, or even be able to control the growth of your business data, right? Fortunately, there’s a really nice property of CRM Online customer organizations: For most of them, the biggest use of space is one particular type of entity that’s pretty easy to measure. And it’s not core business data growing out of control. That entity is the E-mail Attachment. More on the solution later, but let’s talk for a moment about how I came to know about this happy circumstance.

How do I know? Simple, I asked our operations team to run some analysis scripts against a selection of the largest databases in the service. Those scripts gave me the space consumption of all different types of entities and system data, broken out by rows, indexes, partial pages, etc. A few manipulations in Excel later, I can say with confidence that e-mail attachments are the biggest space consumer in the largest databases on CRM Online.

A brief aside: I don’t want to scare anyone – at no point did anyone have access to any customer’s business data. Before our ops team will run scripts of this nature, we must prove to them that all we get are aggregate numbers (counts of rows, size of space consumed, etc.) and not any PII or other confidential business data. Operations team members go through special training and qualification before being allowed access to datacenter machines. PII is never allowed to leave the datacenter, and all our processes are designed around this constraint. If you want more information about how we protect your privacy at Microsoft, check out this document or feel free to e-mail me: I’m the Privacy Lead for Microsoft Dynamics CRM.

So now I’ll get into the details of the E-mail Attachment entity. Its internal name is ActivityMimeAttachment, and it specifically represents a MIME attachment to an E-mail activity. Where do these come from? Well, you can upload them when you create an E-mail record in CRM, say if you want to send a PDF to a lead. More likely, though, is that you’re using the CRM Outlook client and tracking your e-mails automatically. The Outlook client stores not only the text of the e-mail, but any attachments as well! As you know, attachments can be rather large these days: images, PDFs, engineering drawings, etc. In my screenshot below, you can see the original e-mail in Outlook on the left, with an attachment from our bug tracking system, and the corresponding E-mail activity / E-mail attachment in CRM on the right. Granted, it’s a tiny attachment, but it serves to illustrate my point. One stealthy way that e-mail attachments enter your CRM system is if people you correspond with have graphics in their signature. Every time you send or receive a CRM-tracked e-mail with such a person, a little more space gets eaten up by the graphic.

clip_image004

The Solution

How does this help us solve the problem? Well, if you can see it, you can search it – and if you can search it, you can Bulk Delete it! For various historical reasons, you can’t actually do an Advanced Find query for E-mail Attachments themselves, but it’s pretty simple to craft a query using the Regarding field on E-mails. Work through this example with me. (Some of this example will duplicate what Sharad wrote last April, but I wanted to give you an all-in-one resource for solving this particular problem. Sharad’s post was a more general introduction to the various data management facilities in CRM Online.)

First, we create an Advanced Find query that looks for all E-mails who have an E-mail Attachment whose File Size is greater than some number. In my example, I’ve chosen 3 million bytes (or roughly 3MB), but you might opt for a larger or smaller number based on your needs. The query and the results of that query in our “Dogfood” CRM server are below. You should save this view with a recognizable name.

clip_image006

clip_image008

Great, we’ve picked out the records, but how do we get rid of them? That’s a simple Bulk Delete job. Go to Settings, choose Data Management, and then Bulk Record Deletion. Choose New, and click Next. From the “Look for” dropdown, choose “E-mail”, and then you can select the Saved View you created above. But wait! Won’t that query delete ALL e-mails with big attachments? What if you have a recent e-mail that still has business value even though it’s taking up a lot of room?

Add more conditions to your Bulk Delete job. In my example, I’ve chosen to delete e-mails with big attachments that are over 3 months old. You should analyze your own business needs and determine whether age, creator, status of a related account, or some other criteria makes sense. Or, you may determine that your CRM system is not in the business of storing “big” attachments at all, and so may opt not to add another condition. Check with your business for its data retention policies, of course.

clip_image010

At any rate, click Next to see the next screen. Here you can name your job and set how often you want it to run. I’ve chosen every 30 days (since my job only deletes 3-month-old e-mails, there’s no reason to run more frequently). I run it at 9:00pm since that’s an off-peak hour both for my business and for CRM Online itself. One more Next button and a Submit button, and your Bulk Delete job will be set to run.

clip_image012

Deleting the e-mails will cascade the delete to the attachments, which will free up space. You should see your “Storage Used” indicator drop within a day after the job runs (as we have various other background jobs which examine the results of the bulk delete, defragment and compress the freed pages in SQL, etc., and some of those only run at night). One caveat: based on the way we currently calculate storage used, you may not see the drop for a few days, as slack space in the SQL database may be counted as “in use” for a time. I’ve got a feature in our next service update to change that calculation so that empty space is immediately recognized as empty. As I write this, we’re in the final stages of perf and security testing the new calculation, and I’m quite optimistic that we’ll get the green light.

Remember those top databases I talked about earlier? Assuming those orgs no longer wanted any of their big e-mail attachments, they would save on average 21% (average drop in size from 1184 MB to 939 MB). Of course YMMV since you may be using more or less space in e-mail attachments, or you may want to keep some of your attachments and not others.

Other fruitful areas you could consider:

  • Refactor your workflows. Each workflow action takes up some space as it’s expanded, processed, and completed by the workflow engine. If you’re a heavy user of workflow, take a look and see where you might be wasting space here – are you using 5 actions where 3 would suffice? Each action doesn’t take much room on its own, but if a workflow runs every time you create a contact, things can fill up fast. You could delete old System Jobs to get rid of these activities – that might be a blog entry all its own.
  • Notes take space, too. If your team uses a lot of Notes attached to entities, consider how much space those might be taking up. A page of text is roughly 1KB raw, and each annotation consumes not only its raw space but the same variety of links, indexes, etc., under the hood. Notes can be deleted by the bulk delete wizard.

Exercise left to the reader: what if you didn’t want to actively delete the big e-mails, but just create a Task for the owner of the big e-mail to take some action? I can imagine a workflow that would… you get the idea J I hope this article helps you better manage your storage in CRM Online, and that it leaves you with plenty of room to grow your business data.

Special Thanks

Special thanks to the following people for technical or business review of this article: Tommy Su, Pat Munns, Stephanie Dart, and Jim Glass.

Cheers,

Matt Cooper

Published Thursday, January 15, 2009 10:39 AM by crmblog

No comments: