SAP Table SOFFCONT1 Maintenance: Part #1 - Data Structure and Initial Analysis
*In this article and all future ones on this topic, I consider the following initial situation:
One day, every SAP Basis administrator may face a situation where the SOFFCONT1 table becomes too large. This can be related both to concerns about free space in the database and preparation for migration to SAP HANA, where every gigabyte matters.
Let's assume this moment has arrived, and you have an object that takes up too much space. There are two main approaches to solving this problem:
Each approach has its own advantages and disadvantages, described below:
Moving to a content server:
Analyzing and addressing database space utilization issues:
The optimal choice depends on the specific situation. If you need a quick solution, moving SOFFCONT1 to a content server might be an acceptable option. However, for long-term optimization and to prevent the issue from recurring, it is recommended to conduct a detailed analysis and address the database space utilization problem.
In this article, I want to share my experience and findings in seeking a sustainable solution, which can also be part of a hybrid approach. By addressing the root cause, you can then move the reduced and more static SOFFCONT1 table to a content server.
Data structure
Let's start by examining what and how SAP stores in SOFFCONT1.
SOFFCONT1 contains data related to SAPoffice and SAP Business Workplace. Specifically, this table is used to store the content of documents and objects such as attachments, messages, graphic files, and other types of files that are transmitted and stored in SAP.
SAP Note 904711 - "SAPoffice: Where are documents physically stored?" provides examples of typical scenarios for generating new entries in SOFFCONT1:
- Creating and sending documents using the send screen or the Business Workplace,
- Sending ABAP lists (for example, using System->List->Send),
- Sending emails or fax documents from an application,
- Sending documents using actions (Post Processing Framework PPF),
- Sending documents using message control,
- Sending documents using SAPscript or Smart Forms,
- Creating attachments in the GOS list
- Sending spool lists
*It is important to note that all the aforementioned cases do not include text documents, which will be stored directly in SOC3.
While SOFFCONT1 is the physical storage location for file content, the attribute information and logical connections of documents are distributed across several other tables. The most important of them are:
1) SOOD?
The SOOD table is used to store metadata of documents present in the system.
It is the primary source of file attribute information. Among the numerous columns, let's highlight the most interesting ones:
OBJTP - Code for document class.
OBJYR - Document creation year in specific format [[creation year] - 1975].
OBJNO - Document ID.
OWNNAM - Name of document owner.
CRDAT - Date of creation in format [YYYYMMDD].
EXTCT - Storage location code.
FILE_EXT - File extension for PC application.
OBJLEN - Size of document content in bytes.
The combination of OBJTP, OBJYR, and OBJNO uniquely identifies any file and used as a key in different SAPOffice tables.
Since we will be looking for the reasons for the growth of SOFFCONT1, we will be interested in entries that have the value "K" in the EXTCT column. This means that the object is stored in KPRO (knowledge provider), which by default is the database, i.e., its content has been loaded into SOFFCONT1. If the EXTCT field contains a space, then the file content will be stored directly in SOC3.
2) SOFM?
The SOFM table is used to store information about the logical connection of files and documents saved in the system.
Below are the columns which shall be noted for future use:
FOLTP - Code for folder type.
FOLYR - Folder creation year in specific format [[creation year] - 1975].
FOLNO - Folder ID.
DOCTP - Code for document class.
DOCYR - Document creation year in specific format [[creation year] - 1975].
DOCNO - Document ID.
For the study described in this article, the most important will be the connection between the document and its "folder," which is stored in SOFM:
The folder in this case can be either a logical grouping object, such as a composite data type, or a real directory in the SAPOffice structure.
3) SOC3?
The SOC3 table is key to determining the location of the content. If the object is located in the knowledge provider, SOC3 will have the corresponding logical file identifier recorded.
4) SOFFPHIO?
The SOFFPHIO table stores attributes of physical files. For documents that affect the size of the database, it is important to pay attention to entries where the storage class is recorded as "SOFFDB," which in transaction OAC0 is described as "Database storage for SAPOffice documents." Using data from SOFFPHIO, we can determine the physical file identifier and locate the file in the database, based on it's logical id.
5) SOFFCONT1?
The SOFFCONT1 table is the physical storage location of file content.
Description of table fields:
RELID - Description of information stored in data container field (CLUSTD).
PHIO_ID - File physical identifier.
SRTF2 - A counter for determining the sequence of entries in the data container. It is separate for each of RELIDs of the same file.
PH_CLASS - Physical document class.
CLUSTR - Length of field CLUSTD in bytes.
CLUSTD - Data container, storing information in encoded form.
Initial analysis
Having understood the data storage structure, we can make initial conclusions about the categories and types of SAPOffice content we have.
SOFFCONT1 side
To begin, let's gather general information about what is actually stored in the database, i.e., what is recorded in the SOFFCONT1 table.
[ Total number of all files stored in database ] ----------------------------------
select count(DISTINCT PHIO_ID) as TOTAL_NUM FROM SAPSR3.SOFFCONT1;
Output sample:
[ Total size of all files stored in database ] --------------------------------------
select ROUND(SUM(CLUSTR)/1024/1024/1024,0) as TOTAL_FILESIZE_GB from SAPSR3.SOFFCONT1;
Output sample:
领英推荐
[ List of files bigger than 50 MB ] -----------------------------------------------
select * FROM (select PHIO_ID, ROUND(SUM(CLUSTR)/1024/1024,0) as FILESIZE_MB from SAPSR3.SOFFCONT1 GROUP BY PHIO_ID)
where FILESIZE_MB>50
ORDER BY FILESIZE_MB DESC;
Output sample:
[ All stored docs grouped by size category ] ------------------------------------
SELECT
SIZE_CATEGORY as SIZE_CATEGORY_MB,
count(*) as NUM,
ROUND(SUM(FILESIZE_MB)/1024,0) as TOTAL_GB
FROM (
select
CASE
WHEN FILESIZE_MB >= 0 AND FILESIZE_MB <= 0.5 THEN '1. SIZE <= 0,5'
WHEN FILESIZE_MB > 0.5 AND FILESIZE_MB <= 1 THEN '2. 0,5 < SIZE <= 1'
WHEN FILESIZE_MB > 1 AND FILESIZE_MB <= 5 THEN '3. 1 < SIZE <= 5'
WHEN FILESIZE_MB > 5 AND FILESIZE_MB <= 10 THEN '4. 5 < SIZE <= 10'
WHEN FILESIZE_MB > 10 AND FILESIZE_MB <= 20 THEN '5. 10 < SIZE <= 20'
WHEN FILESIZE_MB > 20 AND FILESIZE_MB <= 30 THEN '6. 20 < SIZE <= 30'
WHEN FILESIZE_MB > 30 AND FILESIZE_MB <= 50 THEN '7. 30 < SIZE <= 50'
WHEN FILESIZE_MB > 50 AND FILESIZE_MB <= 100 THEN '8. 50 < SIZE <= 100'
WHEN FILESIZE_MB > 100 THEN '9. 100 < SIZE'
END AS SIZE_CATEGORY,
FILESIZE_MB
FROM (
select SUM(CLUSTR)/1024/1024 as FILESIZE_MB from SAPSR3.SOFFCONT1 GROUP BY PHIO_ID
)
)
GROUP BY
SIZE_CATEGORY
ORDER BY
SIZE_CATEGORY
;
Output sample:
SOOD/SOFM side
Now, let's gather general information about what SAP considers to be stored in DB, based on document header tables.
[ Total number of documents, stored in KRPO (DB), based on header ] ----------
select count(*) as TOTAL_NUM from SAPSR3.SOOD WHERE EXTCT='K';
Output sample:
[ Objects, stored in KRPO (DB), grouped by type ] ------------------------------
select OBJTP, count(*) as NUM from SAPSR3.SOOD WHERE EXTCT='K' group by OBJTP order by NUM DESC;
Output sample:
[ Objects, stored in KRPO (DB), grouped by file extension ] ---------------------
select UPPER(FILE_EXT) as FILE_EXT, count(*) as NUM from SAPSR3.SOOD WHERE EXTCT='K' GROUP BY UPPER(FILE_EXT) ORDER BY NUM DESC;
Output sample:
[ Objects, stored in KRPO (DB), grouped by owner ] ----------------------------
SELECT OWNNAM, count(*) as NUM FROM SAPSR3.SOOD WHERE EXTCT='K' GROUP BY OWNNAM ORDER BY NUM DESC;
Output sample:
[ Objects, stored in KRPO (DB), grouped by years ] ----------------------------
select SUBSTR(CRDAT,1,4) as CREATE_YEAR, count(*) as NUM FROM SAPSR3.SOOD where EXTCT = 'K' GROUP BY SUBSTR(CRDAT,1,4) ORDER BY CREATE_YEAR DESC;
Output sample:
Calculation of file sizes in DB
To conduct an analysis and identify the main space consumers, it is necessary to determine the method of calculating the file size. An obvious step seems to be using the OBJDBLEN or OBJLEN fields of the SOOD table:
However, the OBJDBLEN field is always empty (at least for NW753 with an Oracle database), and there is an important nuance with OBJLEN – this field contains the actual file size before being uploaded to SAP and does not reflect the space occupied in the database. Why does this happen?
Many online resources mistakenly claim that the reason is database-level compression. But in reality, the file structure is optimized by the SAP Application Server, as SOFFCONT1 belongs to the INDX table type, which is sometimes referred to as cluster data tables (not to be confused with clustered tables). Such objects should not be compressed (and not compressed by default) at the database level to avoid unnecessary computational resource expenditures.
For SAP on Oracle, references to this can be found in SAP OSS note 1431296 - "LOB conversion and table compression with BRSPACE 7.20":
We recommend not to compress the following tables in Oracle (if these tables have SecureFile LOBs, you should also not compress them):
...
- INDX-type tables (for example, BALDAT, SOC3)
...
?The majority of the tables mentioned above (pool, cluster, INDX-type, ABAP source, and ABAP loads) are already compressed in the SAP system. Therefore, a second compression in the Oracle system does not make sense.
To confirm the absence of compression in the database and, at the same time, observe a significant difference in file size before and after uploading to SAP, you can analyze the results of the following queries:
select COMPRESSION, DEDUPLICATION from dba_lobs where table_name='SOFFCONT1';
select ROUND(sum(dbms_lob.getlength (CLUSTD))/1024/1024/1024,0) as TOTAL_SIZE_GB from SAPSR3.SOFFCONT1;
select count(DISTINCT PHIO_ID) as TOTAL_NUM, ROUND(SUM(CLUSTR)/1024/1024/1024,0) as TOTAL_SIZE_GB from SAPSR3.SOFFCONT1;
select count(*) as TOTAL_NUM, ROUND(SUM(DECODE(LENGTH(TRIM(OBJLEN)),NULL,0,OBJLEN))/1024/1024/1024,0) as TOTAL_SIZE_GB from SAPSR3.SOOD WHERE EXTCT='K';
Formatted output organized in a table:
With an insignificant difference in the number of records, which could be caused by:
there is almost a twofold volume reduction in SOFFCONT1, confirming that the content is compressed by the SAP Application Server.
In some cases, the original file size might be sufficient to determine the pattern of files that occupy the most space, but this is more likely an exception when there are few documents and the category of consumers is statistically dominant. In most cases, the compression factor must be taken into account. For example, there are three different business processes that upload the following categories of files to SAP:
With an equal number of uploaded files, the difference in occupied space can be tenfold. Determining the exact volume of data for different categories of files is advisable, as it allows for accurate prediction of optimization results in the case of reorganization or migration work.
Determining the actual volume, utilized by specific document in the database is schematically shown in the diagram below:
The following occurs in the schematic:
From the perspective of speed and universality, the optimal method for data collection will be an SQL query. However, implementing it using database functions would be too complex because the SOC3 table field storing the logical identifier is clustered (like the content in SOFFCONT1) and requires decoding using FM SO_OBJECT_GET_CONTENT. Therefore, an ABAP program is a more optimal way to collect data.
To ensure the program execution results are available for repeated use, the output will be stored in the ZSOODFFCONT1_EXT table. This table's structure is copied from SOOD, with five additional fields added:
The idea is that we will create a table containing all the necessary data, and during the analysis phase, we will be able to make various queries directly from SQL console.
Here is a dummy ABAP program to collect data as described:
*&---------------------------------------------------------------------*
*& Report Z_FIND_DB_SIZE
*&---------------------------------------------------------------------*
*&
*&---------------------------------------------------------------------*
REPORT Z_FIND_DB_SIZE.
TYPES:
BEGIN OF ty_soffphio,
PHIO_ID TYPE SDOK_PHID,
STOR_CAT TYPE SDOK_STCAT,
END OF ty_soffphio.
DATA: lt_sood TYPE TABLE OF sood,
ls_sood TYPE sood,
lt_objcont TYPE TABLE OF SOLI,
ls_objcont TYPE SOLI,
lv_obj_id TYPE SOODK,
lv_offset TYPE i,
lv_loio_id TYPE string,
lv_phio_id TYPE string,
lv_phio_id_c TYPE C length 32,
lv_file_exists TYPE C LENGTH 1,
lv_message TYPE string,
lt_cont_info TYPE TABLE OF SCMS_ACINF,
ls_cont_info TYPE SCMS_ACINF,
lt_zsoodffcont1_ext TYPE TABLE OF ZSOODFFCONT1_EXT,
ls_zsoodffcont1_ext TYPE ZSOODFFCONT1_EXT,
lv_rows_inserted TYPE i, " Counter for rows inserted
lv_chunk_size TYPE i VALUE 1000, " Size of INSERT portion
lv_row TYPE i, " Counter for rows inserted
ls_soffphio TYPE ty_soffphio,
lv_stor_cat TYPE string,
lv_compsize TYPE INT4,
lv_real_size TYPE INT4
.
DELETE FROM ZSOODFFCONT1_EXT.
* Select data from the SPFLI table
SELECT * FROM sood
INTO TABLE lt_sood
WHERE
extct = 'K' " Stored externally (SOFFCONT1, CS)
* AND objyr = '47'
* AND objno = '000002762283'.
* AND CRDAT like '201401%'.
ORDER BY CRDAT CRTIM
.
IF lt_sood IS NOT INITIAL.
lv_row = 1.
lv_rows_inserted = 1.
LOOP AT lt_sood INTO ls_sood.
lv_loio_id = ''.
lv_phio_id = ''.
lv_stor_cat = ''.
lv_compsize = 0.
lv_real_size = 0.
CONCATENATE ls_sood-objtp ls_sood-objyr ls_sood-objno INTO lv_obj_id.
CALL FUNCTION 'SO_OBJECT_GET_CONTENT'
EXPORTING
OBJECT_ID = lv_obj_id
* ARCHIVE =
TABLES
* OBJHEAD =
OBJCONT = lt_objcont
* OBJPARA =
* OBJPARB =
* EXCEPTIONS
* ARCHIVE_ERROR = 1
* OBJECT_NOT_EXIST = 2
* OTHERS = 3
.
* IF SY-SUBRC <> 0.
* Implement suitable error handling here
* ENDIF.
READ TABLE lt_objcont INDEX 1 INTO ls_objcont.
FIND REGEX '[0-9A-F]{32,32}' IN ls_objcont
MATCH OFFSET lv_offset
.
IF SY-SUBRC = 0.
lv_loio_id = ls_objcont+lv_offset(32).
SELECT SINGLE
PHIO_ID STOR_CAT
INTO CORRESPONDING FIELDS OF
ls_soffphio
FROM
SOFFPHIO
WHERE
LOIO_ID = lv_loio_id
.
lv_phio_id = ls_soffphio-phio_id.
lv_stor_cat = ls_soffphio-stor_cat.
IF lv_phio_id IS NOT INITIAL.
IF lv_stor_cat = 'SOFFDB'.
CLEAR lt_cont_info.
MOVE lv_phio_id TO lv_phio_id_c.
CALL FUNCTION 'SCMS_R3DB_INFO'
EXPORTING
* MANDT = SY-MANDT
CREP_ID = 'SOFFDB'
DOC_ID = lv_phio_id_c
* PHIO_ID = lv_phio_id
* IMPORTING
* DOC_PROT =
* CREA_TIME =
* CHNG_TIME =
TABLES
CONTENT_INFO = lt_cont_info
EXCEPTIONS
ERROR_IMPORT = 1
ERROR_CONFIG = 2
OTHERS = 3
.
IF SY-SUBRC <> 0.
lv_file_exists = 'N'.
ENDIF.
IF lt_cont_info IS NOT INITIAL.
READ TABLE lt_cont_info INDEX 1 INTO ls_cont_info.
lv_file_exists = 'Y'.
lv_compsize = ls_cont_info-COMP_SIZE.
ELSE.
lv_file_exists = 'N'.
ENDIF.
SELECT
SUM( CLUSTR )
INTO lv_real_size
FROM
SOFFCONT1
WHERE
PHIO_ID = lv_phio_id
.
ELSE.
lv_file_exists = '-'.
ENDIF.
ELSE.
lv_phio_id = ''.
ENDIF.
ELSE.
lv_loio_id = ''.
lv_phio_id = ''.
lv_stor_cat = ''.
lv_file_exists = 'N'.
ENDIF.
MOVE-CORRESPONDING ls_sood TO ls_zsoodffcont1_ext.
ls_zsoodffcont1_ext-PHIO_ID = lv_phio_id.
ls_zsoodffcont1_ext-LOIO_ID = lv_loio_id.
ls_zsoodffcont1_ext-STOR_CAT = lv_stor_cat.
ls_zsoodffcont1_ext-COMP_SIZE = lv_compsize.
APPEND ls_zsoodffcont1_ext TO lt_zsoodffcont1_ext.
IF lv_row = lv_chunk_size.
INSERT ZSOODFFCONT1_EXT FROM TABLE lt_zsoodffcont1_ext.
lv_message = lv_rows_inserted.
CONDENSE lv_message NO-GAPS.
CONCATENATE 'Inserted in total [' lv_message '] rows.' INTO lv_message.
IF sy-batch = 'X'.
MESSAGE lv_message TYPE 'I'.
ELSE.
CALL FUNCTION 'SAPGUI_PROGRESS_INDICATOR'
EXPORTING
text = lv_message
EXCEPTIONS
OTHERS = 1.
ENDIF.
lv_row = 0.
CLEAR lt_zsoodffcont1_ext.
ENDIF.
lv_rows_inserted = lv_rows_inserted + 1.
lv_row = lv_row + 1.
ENDLOOP.
IF lv_row <> 1.
INSERT ZSOODFFCONT1_EXT FROM TABLE lt_zsoodffcont1_ext.
ENDIF.
ELSE.
WRITE: 'No data found.'.
ENDIF.
The logic of the program is identical to the process depicted in the diagram above. By running Z_FIND_DB_SIZE in the background, we wait for its execution and obtain extended data for analysis.
With the ZSOODFFCONT1_EXT table populated, which combines file attributes from SOOD with the actual file size from SOFFCONT1, we can gather different comprehensive statistics, such as average compression ratio of different file types:
select
FILE_EXT,
ROUND((INIT_SIZE_B-COMPR_SIZE_B)/INIT_SIZE_B*100,0) as REDUCTION_PERC
FROM (
select
UPPER(FILE_EXT) as FILE_EXT,
SUM(DECODE(LENGTH(TRIM(OBJLEN)),NULL,0,OBJLEN)) as INIT_SIZE_B,
SUM(FILE_SIZE_B) COMPR_SIZE_B
from SAPSR3.ZSOODFFCONT1_EXT
WHERE EXTCT='K'
GROUP BY UPPER(FILE_EXT)
)
ORDER BY INIT_SIZE_B DESC;
Output sample:
In addition to general data, we can return to the queries used in the initial analysis and expand them with a column showing the actual file sizes. For example, statistics of SOFFCONT1 utilization by user:
SELECT OWNNAM, TOTAL_SIZE_GB,
ROUND(TOTAL_SIZE_GB/SOFFCONT_SIZE_GB*100,0) as PERC_OF_SOFFCONT1_UTIL
FROM (SELECT OWNNAM, ROUND(SUM(FILE_SIZE_B)/1024/1024/1024,0) as TOTAL_SIZE_GB FROM SAPSR3.ZSOODFFCONT1_EXT WHERE EXTCT='K' GROUP BY OWNNAM ORDER BY TOTAL_SIZE_GB DESC
) table1,
(select ROUND(SUM(CLUSTR)/1024/1024/1024,0) as SOFFCONT_SIZE_GB from SAPSR3.SOFFCONT1) table2
WHERE ROWNUM<15;Output sample:
Conclusion:
By using the approach described in this article, you will likely be able to identify the source of file generation in the database, or at least narrow down the search to a specific area or time frame. However, the investigation may not end there for one of the following reasons:
To address such cases, I will describe methods for separating SAP functionalities and associating business modules for some of the most common scenarios in the following parts of this article.
SAP Cloud Architect at InnovatesApp
4 个月Hello, Mykola. Thank you for interesting article. Would it be possible to archive data using an elastic file system (EFS) instead using a content server? Our custom doesn't wanto to have a content server. Many thanks
SAP Technical Lead at Kyndryl
8 个月That was an interesting, well-organized and well-explained read. Thanks, Myko!
Co-Founder/SAP Technical Expert at DigitalMeta, s.r.o.
9 个月Hello, Mykola. Thank you for interesting article. Looking forward to see next part. Have one question: do I understand correctly that table SOC3?is still involved even for documents which are stored not in SOFFCONT1 but on external content server? BR, Artem.
Great stuff, waiting for #2